Junda He
PhD Candidate
Research Engineer

I am Junda He (何俊达), a third-year PhD candidate and research engineer at Singapore Management University (SMU), supervised by Prof. David Lo, an ACM Fellow, IEEE Fellow, ASE Fellow, and ACM Distinguished Speaker. Before joining SMU, I obtained my MSc and BSc degrees from University College London (UCL). My current research focuses on both SE4AI (software engineering for AI) and AI4SE (AI for software engineering). Anyone interested in connecting or collaborating is warmly welcome to reach out.


Education
  • Singapore Management University
    School of Computing and Information Systems
    Ph.D. Candidate
    Aug. 2022 - present
  • University College London
    MSc in Software Engineering
    Sep. 2019 - Dec. 2020
  • University College London
    BSc in Computer Science
    Sep. 2016 - Jul. 2019
Honors & Awards
  • SMU Presidential Doctoral Fellowship
    2024
  • ACM SIGSOFT CAPS Travel Funds
    2024
News
2025
  • Mar 13: Our paper has been accepted to TOSEM!
  • Jan 23: Our paper has been accepted to ICLR 2025!
2024
  • Dec 18: Our paper has been accepted to TOSEM!
  • Nov 30: One work has been accepted to Empirical Software Engineering (EMSE).
Selected Publications
From Code to Courtroom: LLMs as the New Software Judges

Junda He, Jieke Shi, Terry Yue Zhuo et al.

Under review. 2025

This forward-looking SE 2030 paper aims to steer the research community toward advancing LLM-as-a-Judge for evaluating LLM-generated software artifacts, and shares potential research paths toward this goal. We review existing SE studies on LLM-as-a-Judge and envision these frameworks becoming reliable, robust, and scalable human surrogates that can evaluate software artifacts with consistent, multi-faceted assessments by 2030 and beyond. Toward realizing this vision, we analyze the limitations of current studies, identify key research gaps, and outline a detailed roadmap to guide future development of LLM-as-a-Judge in software engineering.

LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead

Junda He, Christoph Treude, David Lo

ACM Transactions on Software Engineering and Methodology (TOSEM) 2025

Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift in the research landscape, offering planning and reasoning abilities that are competitive with those of humans. This paper explores the transformative potential of LLM-based Multi-Agent (LMA) systems for addressing complex challenges in software engineering (SE).

PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Junda He, Bowen Xu, Zhou Yang et al.

Empirical Software Engineering (EMSE) 2025

Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs in language modeling. PTM4Tag+ is implemented with a triplet architecture, which considers three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize a number of popular pre-trained models, including BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT) and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5).
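The triplet design described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the PTM4Tag+ implementation: `toy_encoder` is a hypothetical hashed bag-of-words stand-in for the independent pre-trained models (such as CodeBERT), the classifier weights are random and untrained, and `TripletTagger`, `predict_proba`, and `recommend` are illustrative names. It shows only the data flow: Title, Description, and Code each get their own encoder, the three embeddings are concatenated, and a per-tag sigmoid scores every candidate tag.

```python
import hashlib
import math
import random

DIM = 16  # per-component embedding size (illustrative assumption)


def toy_encoder(component: str):
    """Hypothetical stand-in for one independent pre-trained model.

    Hashes tokens into a fixed-size, L2-normalized vector; salting the hash
    with the component name makes each encoder distinct, mimicking separate PTMs.
    """
    def encode(text: str) -> list[float]:
        vec = [0.0] * DIM
        for token in text.lower().split():
            digest = hashlib.md5(f"{component}:{token}".encode()).digest()
            vec[digest[0] % DIM] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]
    return encode


class TripletTagger:
    """Sketch of a triplet architecture: Title, Description, and Code are encoded
    separately, concatenated, and fed to a multi-label (per-tag) classifier."""

    def __init__(self, tags: list[str], seed: int = 0):
        self.tags = tags
        self.encoders = {c: toy_encoder(c) for c in ("title", "description", "code")}
        rng = random.Random(seed)
        # Untrained weights: one row per tag over the concatenated 3 * DIM features.
        self.weights = [[rng.uniform(-1, 1) for _ in range(3 * DIM)] for _ in tags]
        self.bias = [0.0] * len(tags)

    def predict_proba(self, title: str, description: str, code: str) -> dict[str, float]:
        features = (self.encoders["title"](title)
                    + self.encoders["description"](description)
                    + self.encoders["code"](code))
        probs = {}
        for tag, row, b in zip(self.tags, self.weights, self.bias):
            logit = sum(w * x for w, x in zip(row, features)) + b
            probs[tag] = 1.0 / (1.0 + math.exp(-logit))  # independent sigmoid per tag
        return probs

    def recommend(self, title: str, description: str, code: str, k: int = 3) -> list[str]:
        probs = self.predict_proba(title, description, code)
        return sorted(probs, key=probs.get, reverse=True)[:k]


tagger = TripletTagger(["python", "java", "pandas", "regex", "android"])
print(tagger.recommend("Merge two DataFrames", "How do I join on a key column?",
                       "pd.merge(a, b, on='id')"))
```

Keeping one encoder per component lets each model specialize on its input (for example, a code-oriented model for the Code field). The sketch uses a sigmoid per tag rather than a softmax because a post usually carries several tags at once.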

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Terry Yue Zhuo, Minh Chien Vu*, Jenny Chim*, Junda He*, Indraneil Paul* et al. (* equal contribution)

International Conference on Learning Representations (ICLR) 2025

Task automation has been greatly empowered by recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, most of their evaluations are limited to short, self-contained algorithmic tasks or standalone function calls. Solving challenging and practical tasks requires the capability to use diverse function calls as tools to efficiently implement functionalities such as data analysis and web development. In addition, using multiple tools to solve a task requires compositional reasoning grounded in an accurate understanding of complex instructions. Meeting both requirements poses a great challenge for LLMs.

Curiosity-Driven Testing for Sequential Decision-Making Process

Junda He, Zhou Yang, Jieke Shi et al.

International Conference on Software Engineering (ICSE) 2024

Sequential decision-making processes (SDPs) are fundamental to complex real-world challenges such as autonomous driving, robotic control, and traffic management. While recent advances in Deep Learning (DL) have led to mature solutions for these problems, DL-based sequential decision makers (SDMs) remain vulnerable to learning unsafe behaviors, posing significant risks in safety-critical applications. However, developing a testing framework for SDMs that can identify a diverse set of crash-triggering scenarios remains an open challenge. To address this, we propose CureFuzz, a novel curiosity-driven black-box fuzz testing approach for SDMs.

Representation Learning for Stack Overflow Posts: How Far Are We?

Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo

ACM Transactions on Software Engineering and Methodology (TOSEM) 2024

The tremendous success of Stack Overflow has produced an extensive corpus of software engineering knowledge, motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the choice of representation model for Stack Overflow posts. The growing body of literature on Stack Overflow highlights the need for a powerful post representation model and has driven interest in specialized models that can capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon a convolutional neural network and a Transformer architecture (BERT), respectively. Despite their promising results, these representation methods have not been comprehensively evaluated.

PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, David Lo

IEEE/ACM International Conference on Program Comprehension (ICPC) 2022

Stack Overflow is often viewed as one of the most influential Software Question & Answer (SQA) websites, containing millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the content of Stack Overflow and are vital to supporting a range of site operations, e.g., querying relevant content. Poorly selected tags often introduce extra noise and redundancy, leading to problems such as tag synonyms and tag explosion. Thus, an automated technique that accurately recommends high-quality tags is needed to alleviate these problems. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs.

Natural Attack for Pre-trained Models of Code

Zhou Yang, Jieke Shi, Junda He, David Lo

International Conference on Software Engineering (ICSE) 2022

In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Unlike prior work, ALERT considers the natural semantics of generated examples while preserving the operational semantics of the original inputs. Our user study shows that human developers consistently judge adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art approach of Zhang et al., which ignores the naturalness requirement.

Academic Service
Reviewer for
  • Communications of the ACM (2024-present)
  • ACM Transactions on Software Engineering and Methodology (2025-present)
  • Automated Software Engineering (2025-present)
  • ACM Transactions on Intelligent Systems and Technology (2025-present)
  • Neurocomputing (2024-present)
  • Subreviewer for conferences including ICSE 2023-2024, ASE 2022-2024, and FSE 2023
Other Service
  • ICSE Shadow PC member (2025)
  • MSR Junior PC member (2024)