PhD defence of Boqi (Percy) Chen – Domain-Driven and Consistent Integration of Large Language Models: An Input-Output Perspective
Abstract
The rapid advancement of large language models (LLMs) has sparked their application across diverse domains, including software engineering (SE). In SE, LLMs have shown strong performance on a range of tasks and enabled the development of practical assistant tools such as GitHub Copilot and Cursor. However, integrating LLMs into complex real-world applications remains a significant challenge, particularly for domain-specific tasks that are underrepresented in LLM training data.
Human domain experts typically follow systematic, domain-specific best practices when approaching complex tasks, relying on comprehensive workflows and a variety of specialized tools. To enhance LLMs’ ability to handle such tasks, existing research has focused on integrating these expert practices into LLM-based applications. Practical frameworks such as LangChain and LangGraph have been proposed to support this integration by enabling the construction of domain-specific workflows. However, the workflow representations in these frameworks remain limited in their expressiveness, often lacking modularity and the capacity to model complex behaviors.
Furthermore, domain-specific tools often impose strict constraints on input data. To integrate LLMs with such tools effectively, it is essential to ensure that LLM-generated outputs conform to these constraints. However, the inherent nondeterminism of LLMs makes it difficult to obtain outputs that consistently satisfy these constraints. While existing research has explored techniques such as constrained decoding to improve output consistency, these methods primarily target simple output formats and do not extend to more complex structures such as graphs. Additionally, the relationship between consistency and the overall quality of LLM-generated outputs, particularly for graphs, remains insufficiently understood.
In this thesis, I propose two systematic approaches to address the challenges of workflow representation and output consistency in LLM-based applications. The contributions are organized around two high-level research questions. To tackle the first challenge, workflow representation (HRQ1), I introduce SHERPA, a framework that models domain-specific workflows as state machines, facilitating the integration of LLMs with domain-specific tools. This framework decouples workflow representation from its concrete implementation, supporting a modular and flexible design of LLM-based applications. Systematic evaluation demonstrates that SHERPA enables rapid experimentation with diverse workflows, leading to improved task performance and a better balance between cost and effectiveness.
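To illustrate the underlying idea, and not SHERPA's actual API, the Python sketch below declares a workflow as a small state machine whose states and transitions are plain data, while the actions bound to each state (here a placeholder LLM call and a placeholder tool invocation) are supplied separately. All names (State, call_llm, run_tests, run) are hypothetical.

# A minimal sketch, assuming a hypothetical API: the workflow *representation*
# (states, events, transitions) is declared as data, decoupled from the
# *implementation* of each state's action. None of these names come from SHERPA.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class State:
    name: str
    action: Callable[[dict], str]   # runs on entry, returns an event label
    transitions: Dict[str, str]     # event label -> next state name


def call_llm(context: dict) -> str:
    # Placeholder for an LLM call that drafts a solution.
    context["draft"] = f"solution for: {context['task']}"
    return "drafted"


def run_tests(context: dict) -> str:
    # Placeholder for a domain-specific tool (e.g., a test runner or validator).
    return "passed" if "solution" in context["draft"] else "failed"


def run(states: Dict[str, State], start: str, context: dict) -> dict:
    current = start
    while current != "done":
        event = states[current].action(context)
        current = states[current].transitions[event]
    return context


# The workflow itself: swapping in a different state machine (e.g., adding a
# review state) requires no change to the action functions.
workflow = {
    "generate": State("generate", call_llm, {"drafted": "validate"}),
    "validate": State("validate", run_tests, {"passed": "done", "failed": "generate"}),
}

print(run(workflow, "generate", {"task": "fix the failing unit test"}))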
To address the second challenge, output consistency (HRQ2), I propose AbsCon, a framework designed to ensure the consistency of LLM-generated graphs by leveraging the nondeterministic nature of LLMs. Generalizing a constraint optimization-based approach that I originally proposed for scene graph generation, AbsCon guarantees that the generated graphs satisfy domain-specific constraints. Evaluation results further demonstrate that enforcing such consistency significantly improves the overall quality of the generated graphs when compared to human-constructed ground truths.
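The Python sketch below conveys the intuition only, under assumed details that are not AbsCon's actual formulation: the same prompt is sampled several times from a nondeterministic LLM, the candidate graphs are aggregated edge by edge, and a final graph is assembled so that an example constraint (acyclicity here) is guaranteed to hold. The greedy selection and the specific constraint are illustrative stand-ins.

# A minimal sketch, assuming acyclicity as the domain constraint and a greedy
# vote-based selection; both are illustrative, not AbsCon's actual algorithm.

from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def creates_cycle(edge: Edge, edges: Set[Edge]) -> bool:
    """Return True if adding `edge` to `edges` would create a directed cycle."""
    src, dst = edge
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(d for s, d in edges if s == node)
    return False


def consolidate(candidates: List[Set[Edge]]) -> Set[Edge]:
    """Keep well-supported edges, skipping any that would violate the constraint."""
    support = Counter(e for graph in candidates for e in graph)
    result: Set[Edge] = set()
    for edge, _count in support.most_common():
        if not creates_cycle(edge, result):
            result.add(edge)
    return result


# Three candidate graphs, e.g. sampled from the same prompt at temperature > 0.
samples = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "a")},   # this sample is cyclic on its own
    {("a", "b"), ("c", "a")},
]
print(consolidate(samples))   # acyclic by construction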