Xiang Li (University of Pennsylvania)
Title: What Can Statistics Offer to Language Models: Watermarking and Evaluation
Abstract: Large language models (LLMs) have transformed how we generate and process information, yet two foundational challenges remain: ensuring the authenticity of their outputs and accurately evaluating their true capabilities. In this talk, I argue that both challenges are, at their core, statistical problems, and that statistical thinking can play an important role in advancing reliable and principled research on large language models. I will present two lines of work that approach these problems from a statistical perspective.
The first part introduces a statistical framework for language watermarks, which embed imperceptible signals into model-generated text for provenance verification. By formulating watermark detection as a hypothesis testing problem, this framework identifies pivotal statistics, provides rigorous Type I error control, and derives optimal detection rules that are both theoretically grounded and computationally efficient. It clarifies the theoretical limits of existing methods, such as the Gumbel-max and inverse-transform watermarks, and guides the design of more robust and powerful detectors.
The second part focuses on language model evaluation, where I study how to quantify the unseen knowledge that models possess but may not reveal through limited queries. To that end, I introduce a statistical pipeline, based on the smoothed Good–Turing estimator, to estimate the total amount of a model’s knowledge beyond what is observed in benchmark datasets. The findings reveal that even advanced LLMs often articulate only a fraction of their internal knowledge, suggesting a new perspective on evaluation and model competence.
Together, these projects represent an ongoing effort to develop statistical foundations for trustworthy and reliable language models, with applications ranging from watermark detection to model evaluation.
🔗 Zoom: https://mcgill.zoom.us/j/85469273736
Meeting ID: 854 6927 3736