For a comprehensive list of my academic publications, please visit my Google Scholar profile.
Junda He, Jieke Shi, Terry Yue Zhuo et al.
Under review. 2025
This forward-looking SE 2030 paper aims to steer the research community toward advancing LLM-as-a-Judge for evaluating LLM-generated software artifacts, while also sharing potential research paths to achieve this goal. We provide a literature review of existing SE studies on LLM-as-a-Judge and envision these frameworks as reliable, robust, and scalable human surrogates capable of evaluating software artifacts with consistent, multi-faceted assessments by 2030 and beyond. To validate this vision, we analyze the limitations of current studies, identify key research gaps, and outline a detailed roadmap to guide the future development of LLM-as-a-Judge in software engineering.
Alessio Bucaioni, Martin Weyssow, Junda He, Yunbo Lyu, David Lo
IEEE International Conference on Software Architecture (ICSA) 2025
The integration of large language models into software systems is transforming capabilities such as natural language understanding, decision-making, and autonomous task execution. However, the absence of a commonly accepted software reference architecture hinders systematic reasoning about their design and quality attributes. This gap makes it challenging to address critical concerns like privacy, security, modularity, and interoperability, which are increasingly important as these systems grow in complexity and societal impact. In this paper, we describe our emerging results for a preliminary functional reference architecture as a conceptual framework.
Junda He, Christoph Treude, David Lo
ACM Transactions on Software Engineering and Methodology (TOSEM) 2025
Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift in the research landscape by offering cognitive abilities that are competitive with human planning and reasoning. This paper explores the transformative potential of integrating Large Language Models into Multi-Agent (LMA) systems for addressing complex challenges in software engineering (SE).
Junda He, Bowen Xu, Zhou Yang et al.
Empirical Software Engineering (EMSE) 2025
Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs in language modeling. PTM4Tag+ is implemented with a triplet architecture, which considers three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize several popular pre-trained models, including BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT) and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5).
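The triplet design is easy to picture in code. Below is a minimal sketch, assuming PyTorch and Hugging Face transformers: three independent encoders, one per post component, whose pooled outputs are fused for multi-label tag prediction. The encoder name, fusion head, and tag count are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal triplet-style tagger sketch (illustrative, not PTM4Tag+'s exact setup).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TripletTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_tags=100):
        super().__init__()
        # Independent encoders for Title, Description, and Code.
        self.title_enc = AutoModel.from_pretrained(encoder_name)
        self.desc_enc = AutoModel.from_pretrained(encoder_name)
        self.code_enc = AutoModel.from_pretrained(encoder_name)
        hidden = self.title_enc.config.hidden_size
        # Fuse the three [CLS] vectors and score every candidate tag.
        self.classifier = nn.Linear(3 * hidden, num_tags)

    def forward(self, title, desc, code):
        t = self.title_enc(**title).last_hidden_state[:, 0]
        d = self.desc_enc(**desc).last_hidden_state[:, 0]
        c = self.code_enc(**code).last_hidden_state[:, 0]
        logits = self.classifier(torch.cat([t, d, c], dim=-1))
        return torch.sigmoid(logits)  # independent per-tag probabilities

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TripletTagger()
enc = lambda s: tok(s, return_tensors="pt", truncation=True)
probs = model(enc("How to sort a dict?"),
              enc("I need to sort a dictionary by value."),
              enc("sorted(d.items(), key=lambda kv: kv[1])"))
print(probs.shape)  # (1, num_tags)
```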
Jieke Shi, Zhou Yang, Junda He, Bowen Xu, Dongsun Kim, DongGyun Han, David Lo
ACM Transactions on Software Engineering and Methodology (TOSEM) 2024
Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing control systems is falsification, which aims to find an input signal that causes the control system to violate a formal safety specification using optimization algorithms. However, applying falsification to AI-enabled control systems poses two significant challenges: (1) it requires the system to execute numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple safety requirements are typically defined as a conjunctive specification, which is difficult for existing falsification approaches to comprehensively cover.
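To make the falsification idea concrete, here is a minimal random-search sketch: sample candidate input signals, simulate the system under test, and minimize a robustness score that drops below zero exactly when the safety specification is violated. The toy dynamics, the speed-limit specification, and the search budget are all illustrative assumptions; the paper targets far more expensive AI-enabled systems and conjunctive specifications.

```python
# Random-search falsification sketch (illustrative, not the paper's method).
import random

def robustness(trace, limit=100.0):
    # Toy spec "speed stays below limit": margin to the worst violation.
    return min(limit - speed for speed in trace)

def simulate(signal):
    # Stand-in for an expensive control-system simulation.
    speed, trace = 0.0, []
    for u in signal:
        speed = 0.9 * speed + u
        trace.append(speed)
    return trace

def falsify(budget=1000, horizon=50):
    best = None
    for _ in range(budget):
        signal = [random.uniform(0.0, 15.0) for _ in range(horizon)]
        rob = robustness(simulate(signal))
        if best is None or rob < best[0]:
            best = (rob, signal)
        if rob < 0:  # specification violated: falsifying input found
            return best
    return best  # no violation found within the budget

print(falsify()[0])
```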
Terry Yue Zhuo, Minh Chien Vu*, Jenny Chim*, Junda He*, Indraneil Paul* et al. (* equal contribution)
International Conference on Learning Representations (ICLR) 2025
Task automation has been greatly empowered by recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short, self-contained algorithmic tasks or standalone function calls. Solving challenging and practical tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task requires compositional reasoning and an accurate understanding of complex instructions. Fulfilling both of these characteristics poses a great challenge for LLMs.
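As an illustration (not an actual benchmark task), the kind of instruction being tested forces a model to chain diverse function calls across libraries rather than write one standalone function. The CSV schema and file names below are hypothetical.

```python
# Hypothetical compositional task: "Load a CSV of daily sales, compute the
# 7-day rolling mean, and save a plot of it." Solving it requires chaining
# pandas I/O, time-series operations, and matplotlib plotting calls.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

def rolling_sales_plot(csv_path, out_path="rolling.png"):
    df = pd.read_csv(csv_path, parse_dates=["date"])
    series = df.set_index("date")["sales"].rolling(window=7).mean()
    ax = series.plot(title="7-day rolling mean of sales")
    ax.set_ylabel("sales")
    plt.savefig(out_path)
```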
Chen Gong, Zhou Yang, Yunpeng Bai, Junda He, Jieke Shi et al.
IEEE Symposium on Security and Privacy (SP) 2024
Reinforcement learning (RL) lets an agent learn from trial-and-error experiences gathered during interaction with the environment. Recently, offline RL has become a popular RL paradigm because it avoids costly interactions with the environment: data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control and autonomous driving. However, little attention has been paid to investigating the security threats to offline RL systems. This paper focuses on backdoor attacks, where perturbations are added to the data (observations) such that the agent takes high-reward actions on normal observations but low-reward actions on observations injected with triggers. In this paper, we propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an effective backdoor attack framework for offline RL.
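The attack surface is easiest to see at the dataset level. The sketch below illustrates the general poisoning idea the abstract describes, not Baffle's actual algorithm: stamp a trigger pattern into a small fraction of observations, pair them with an attacker-chosen action, and attach a high reward so an agent trained offline learns the association. The trigger value, poisoning rate, and dataset layout are illustrative assumptions.

```python
# Schematic offline-RL dataset poisoning (illustrative, not Baffle itself).
import numpy as np

def poison_dataset(observations, actions, rewards,
                   trigger_value=9.9, bad_action=0, rate=0.05, seed=0):
    rng = np.random.default_rng(seed)
    obs, act, rew = observations.copy(), actions.copy(), rewards.copy()
    idx = rng.choice(len(obs), size=int(rate * len(obs)), replace=False)
    # Stamp a trigger into selected observations, pair it with the
    # attacker-chosen action, and attach a high reward so the trained
    # agent learns to prefer that action whenever the trigger appears.
    obs[idx, -1] = trigger_value
    act[idx] = bad_action
    rew[idx] = rew.max()
    return obs, act, rew

# Toy usage on a random dataset of 1,000 four-dimensional observations.
obs, act, rew = poison_dataset(np.random.rand(1000, 4),
                               np.random.randint(0, 3, 1000),
                               np.random.rand(1000))
```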
Junda He, Zhou Yang, Jieke Shi et al.
International Conference on Software Engineering (ICSE) 2024
Sequential decision-making processes (SDPs) are fundamental for complex real-world challenges, such as autonomous driving, robotic control, and traffic management. While recent advances in Deep Learning (DL) have led to mature solutions for solving these complex problems, SDPs remain vulnerable to learning unsafe behaviors, posing significant risks in safety-critical applications. However, developing a testing framework for SDPs that can identify a diverse set of crash-triggering scenarios remains an open challenge. To address this, we propose CureFuzz, a novel curiosity-driven black-box fuzz testing approach for SDPs.
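The curiosity-driven idea can be sketched in a few lines: mutate scenario seeds, execute them against the decision maker, and retain mutants whose behavior signature is novel relative to an archive, so the search keeps surfacing diverse crash-triggering scenarios rather than rediscovering the same one. The environment hook and the one-dimensional behavior signature below are hypothetical placeholders, not CureFuzz's actual design.

```python
# Novelty-guided fuzzing sketch (illustrative, not CureFuzz itself).
import random

def novelty(signature, archive, k=3):
    # Mean distance to the k nearest behaviors seen so far.
    if not archive:
        return float("inf")
    dists = sorted(abs(signature - s) for s in archive)
    return sum(dists[:k]) / min(k, len(dists))

def fuzz(run_scenario, iterations=1000, threshold=0.5):
    seeds, archive, crashes = [random.random()], [], []
    for _ in range(iterations):
        mutant = random.choice(seeds) + random.gauss(0, 0.1)
        crashed, signature = run_scenario(mutant)
        if crashed:
            crashes.append(mutant)
        if novelty(signature, archive) > threshold:  # "curious" mutant
            archive.append(signature)               # remember its behavior
            seeds.append(mutant)                    # and keep exploring it
    return crashes
```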
Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo
ACM Transactions on Software Engineering and Methodology (TOSEM) 2024
The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation models for Stack Overflow posts. As the volume of literature on Stack Overflow continues to grow, so does the need for a powerful post representation model, driving researchers' interest in developing specialized models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural architectures such as convolutional neural networks and Transformers (e.g., BERT). Despite their promising results, these representation methods have not been comprehensively evaluated.
Xin Zhou, Bowen Xu, DongGyun Han, Zhou Yang, Junda He, David Lo
IEEE International Conference on Software Maintenance and Evolution (ICSME) 2023
We propose CCBERT (Code Change BERT), a new Transformer-based pre-trained model that learns a generic representation of code changes from a large-scale dataset of unlabeled code changes. CCBERT is pre-trained on four proposed self-supervised objectives that are specialized for code change representation learning.
Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, David Lo
IEEE/ACM International Conference on Program Comprehension (ICPC) 2022
Stack Overflow is often viewed as one of the most influential Software Question & Answer (SQA) websites, containing millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the contents of Stack Overflow and are vital to supporting a range of site operations, e.g., querying relevant content. Poorly selected tags often introduce extra noise and redundancy, which raises problems like tag synonyms and tag explosion. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate these problems. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs.
Zhou Yang, Jieke Shi, Junda He, David Lo
International Conference on Software Engineering (ICSE) 2022
In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Different from prior works, this paper considers the naturalness of generated examples while preserving the operational semantics of the original inputs. Our user study demonstrates that human developers consistently consider adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art work by Zhang et al., which ignores the naturalness requirement.
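A bare-bones version of the substitution idea, under stated assumptions: rename one identifier to a natural-looking alternative, which preserves the program's operational semantics, and keep the rename if the victim model's prediction flips. `victim_predict` and the ranked candidate list are hypothetical stand-ins; ALERT itself generates and searches substitutes far more carefully.

```python
# Naturalness-aware substitution sketch (illustrative, not ALERT's algorithm).
import re

def rename_identifier(code, old, new):
    # Whole-word rename preserves operational semantics in this toy setting.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def attack(code, victim_predict, target, natural_candidates):
    original = victim_predict(code)
    for new_name in natural_candidates:  # e.g., ranked by an MLM for naturalness
        adversarial = rename_identifier(code, target, new_name)
        if victim_predict(adversarial) != original:
            return adversarial  # behavior-preserving input, wrong output
    return None  # no successful substitution among the candidates
```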