PhD defence of Boqi (Percy) Chen – Domain-Driven and Consistent Integration of Large Language Models: An Input-Output Perspective
Abstract
The rapid advancement of large language models (LLMs) has sparked their application across diverse domains, including software engineering (SE). In SE, LLMs have shown strong performance on a range of tasks and enabled the development of practical assistant tools such as GitHub Copilot and Cursor. However, integrating LLMs into complex real-world applications remains a significant challenge, particularly for domain-specific tasks that are underrepresented in LLM training data.
Human domain experts typically follow systematic, domain-specific best practices when approaching complex tasks, relying on comprehensive workflows and a variety of specialized tools. To enhance LLMs’ ability to handle such tasks, existing research has focused on integrating these expert practices into LLM-based applications. Practical frameworks such as LangChain and LangGraph have been proposed to support this integration by enabling the construction of domain-specific workflows. However, the workflow representations in these frameworks remain limited in their expressiveness, often lacking modularity and the capacity to model complex behaviors.
Furthermore, domain-specific tools often impose strict constraints on input data. To integrate LLMs with such tools effectively, it is essential to ensure that LLM-generated outputs conform to these constraints. However, the inherent nondeterminism of LLMs makes it difficult to obtain outputs that consistently satisfy these constraints. While existing research has explored techniques such as constrained decoding to improve output consistency, these methods primarily target simple output formats and do not extend to more complex structures such as graphs. Additionally, the relationship between consistency and the overall quality of LLM-generated outputs, particularly for graphs, remains insufficiently understood.
In this thesis, I propose two systematic approaches to address the challenges of workflow representation and output consistency in LLM-based applications. The contributions are organized around two high-level research questions. To tackle the first challenge, workflow representation (HRQ1), I introduce SHERPA, a framework that models domain-specific workflows as state machines, facilitating the integration of LLMs with domain-specific tools. This framework decouples workflow representation from its concrete implementation, supporting a modular and flexible design of LLM-based applications. Systematic evaluation demonstrates that SHERPA enables rapid experimentation with diverse workflows, leading to improved task performance and a better balance between cost and effectiveness.
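To illustrate the underlying idea, and not SHERPA's actual API, the Python sketch below declares a workflow as a small state machine whose states and transitions are plain data, while the actions bound to each state (here a placeholder LLM call and a placeholder tool invocation) are supplied separately. All names (State, call_llm, run_tests, run) are hypothetical.

# A minimal sketch, assuming a hypothetical API: the workflow *representation*
# (states, events, transitions) is declared as data, decoupled from the
# *implementation* of each state's action. None of these names come from SHERPA.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class State:
    name: str
    action: Callable[[dict], str]   # runs on entry, returns an event label
    transitions: Dict[str, str]     # event label -> next state name


def call_llm(context: dict) -> str:
    # Placeholder for an LLM call that drafts a solution.
    context["draft"] = f"solution for: {context['task']}"
    return "drafted"


def run_tests(context: dict) -> str:
    # Placeholder for a domain-specific tool (e.g., a test runner or validator).
    return "passed" if "solution" in context["draft"] else "failed"


def run(states: Dict[str, State], start: str, context: dict) -> dict:
    current = start
    while current != "done":
        event = states[current].action(context)
        current = states[current].transitions[event]
    return context


# The workflow itself: swapping in a different state machine (e.g., adding a
# review state) requires no change to the action functions.
workflow = {
    "generate": State("generate", call_llm, {"drafted": "validate"}),
    "validate": State("validate", run_tests, {"passed": "done", "failed": "generate"}),
}

print(run(workflow, "generate", {"task": "fix the failing unit test"}))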
To address the second challenge, output consistency (HRQ2), I propose AbsCon, a framework designed to ensure the consistency of LLM-generated graphs by leveraging the nondeterministic nature of LLMs. Generalizing a constraint optimization-based approach that I originally proposed for scene graph generation, AbsCon guarantees that the generated graphs satisfy domain-specific constraints. Evaluation results further demonstrate that enforcing such consistency significantly improves the overall quality of the generated graphs when compared to human-constructed ground truths.
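The Python sketch below conveys the intuition only, under assumed details that are not AbsCon's actual formulation: the same prompt is sampled several times from a nondeterministic LLM, the candidate graphs are aggregated edge by edge, and a final graph is assembled so that an example constraint (acyclicity here) is guaranteed to hold. The greedy selection and the specific constraint are illustrative stand-ins.

# A minimal sketch, assuming acyclicity as the domain constraint and a greedy
# vote-based selection; both are illustrative, not AbsCon's actual algorithm.

from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def creates_cycle(edge: Edge, edges: Set[Edge]) -> bool:
    """Return True if adding `edge` to `edges` would create a directed cycle."""
    src, dst = edge
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(d for s, d in edges if s == node)
    return False


def consolidate(candidates: List[Set[Edge]]) -> Set[Edge]:
    """Keep well-supported edges, skipping any that would violate the constraint."""
    support = Counter(e for graph in candidates for e in graph)
    result: Set[Edge] = set()
    for edge, _count in support.most_common():
        if not creates_cycle(edge, result):
            result.add(edge)
    return result


# Three candidate graphs, e.g. sampled from the same prompt at temperature > 0.
samples = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "a")},   # this sample is cyclic on its own
    {("a", "b"), ("c", "a")},
]
print(consolidate(samples))   # acyclic by construction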