PhD defence of Thi Kieu Khanh Ho – Time-Series Anomaly Detection with Graphs
Abstract
Time-Series Anomaly Detection (TSAD), the task of identifying patterns that deviate from expected behavior, is critical in domains such as e-commerce, cybersecurity, predictive maintenance, and healthcare. Despite substantial progress, TSAD remains challenging due to the complexity of time-series signals, the diversity of anomaly types, and the scarcity of high-quality labeled data. This thesis addresses these challenges through three complementary contributions.
First, the field lacks a systematic understanding of how emerging techniques such as graph modeling and self-supervised learning (SSL) can be leveraged for anomaly detection. Existing surveys often overlook the unique challenges of TSAD, leaving researchers without a roadmap for future work. To address this gap, we present the first comprehensive surveys on two topics: Graph-based TSAD (G-TSAD), a novel perspective that models time-series data as graph structures for the task of TSAD, and Self-Supervised Learning for Anomaly Detection (SSL-AD), which shows how proxy tasks can help TSAD methods learn robust representations from unlabeled data. These surveys review methodological advances and practical limitations, and outline promising future directions for TSAD.
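To make the idea of "modeling time-series data as graph structures" concrete, the sketch below turns one window of a multivariate time series into a sensor graph: each sensor is a node, and each node is linked to the k other sensors whose signals are most strongly correlated with it. This is a generic, correlation-based construction chosen for illustration only; it is not claimed to be the graph construction used by any surveyed G-TSAD method or by the frameworks in this thesis, and the function name `window_to_graph` and the parameter `k` are hypothetical. It assumes no sensor is constant within the window, since a constant signal makes the correlation undefined.

```python
import numpy as np

def window_to_graph(window: np.ndarray, k: int = 2) -> np.ndarray:
    """Hypothetical helper: build a sensor graph from one multivariate window.

    window: array of shape (T, N), T time steps and N sensors (each sensor
            assumed non-constant so that Pearson correlation is defined).
    k:      number of strongest neighbours kept per sensor (illustrative choice).
    Returns an (N, N) symmetric 0/1 adjacency matrix without self-loops.
    """
    # Absolute Pearson correlation between sensors (columns), shape (N, N).
    corr = np.abs(np.corrcoef(window, rowvar=False))
    # Exclude self-loops so a sensor never selects itself as a neighbour.
    np.fill_diagonal(corr, -np.inf)
    n = corr.shape[0]
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        # Indices of the k largest correlations in row i.
        neighbours = np.argsort(corr[i])[-k:]
        adj[i, neighbours] = 1
    # Symmetrise: keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)
```

For example, in a window where two sensors carry the same underlying signal, `window_to_graph(window, k=1)` connects those two sensors, giving downstream graph models an explicit notion of which sensors should behave alike. A graph-based detector can then flag a sensor whose behavior departs from that of its neighbours.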
Second, while graph-based approaches have recently been introduced to capture spatial relationships across sensors in multivariate sensory systems, they often overlook fine-grained local structures, such as sub-graphs, that can be critical for detecting anomalies. To address this, we propose EEG-CGS, a novel contrastive and generative SSL framework for anomaly detection in complex sensory systems. EEG-CGS incorporates local structural patterns into graph representations and requires no anomaly labels during training. This design improves robustness in multivariate TSAD and yields strong performance in detecting anomalous sensors and regions.
Finally, a key challenge in unsupervised TSAD lies in the assumption that training data are purely normal, which rarely holds in practice because of distribution shifts or labeling errors. Such contamination causes unsupervised methods to overfit to the anomalies present in the training data and to misclassify them. To address this, we introduce TSAD-C, a novel framework that combines graph representations with diffusion models to capture both long-term temporal and spatial dependencies in time series while explicitly handling contamination.
Furthermore, unlike existing TSAD approaches that are benchmarked on small, curated datasets with simplistic anomalies, this thesis advances TSAD towards frameworks that generalize to complex, real-world scenarios and detect richer anomaly types, from local signal deviations to sensor- and region-level failures, with direct applications in clinical and industrial domains.