PhD defence of Liheng Ma – Towards More Expressive Graph Neural Networks via Learning on Pseudo-Coordinates
Abstract
Learning on graphs has emerged as a fundamental direction in machine learning, as part of the broader field of geometric deep learning, which focuses on modeling the intricate structures and symmetries underlying complex data.
However, the prevailing standard paradigm of graph neural networks (GNNs) -- message-passing neural networks (MPNNs) -- faces inherent limitations in expressivity, capacity, flexibility, and generality.
This thesis introduces a novel paradigm—learning on pseudo-coordinates—which provides not only enhanced expressive and representational power but also greater modeling flexibility.
Pseudo-coordinates serve as coordinate-like spatial representations that relax the strict constraints of conventional coordinate systems.
They are particularly valuable for domains lacking canonical coordinate systems, such as graphs and manifolds.
The main body of this thesis is composed of three principal studies.
The first study focuses on the design of graph Transformers, presented as a realization of the paradigm of graph learning on pseudo-coordinates.
In this work, we introduce a powerful graph Transformer that combines a novel pseudo-coordinate design with an advanced model architecture, demonstrating the superior effectiveness of graph learning on pseudo-coordinates over the conventional message-passing paradigm.
The second study investigates alternative approaches within this paradigm.
We introduce a novel graph convolutional operator defined on pseudo-coordinates, leveraging the design of continuous convolutional kernels.
The proposed graph convolution is not only highly expressive but also generalizes many existing GNN formulations, including but not limited to MPNNs and polynomial spectral GNNs.
More importantly, it exhibits distinct characteristics from attention mechanisms in graph Transformers, thereby expanding the design space for developing diverse and powerful graph neural architectures.
The final study, inspired by recent advances in Transformer-based multimodal foundation models, revisits the design of graph Transformers. We demonstrate that, contrary to the prevailing trend of introducing complex architectural modifications, plain Transformer architectures—when equipped with our proposed lightweight enhancements—can serve as highly effective graph learners. These enhancements are broadly applicable across domains and gracefully reduce to the original Transformer when necessary. This finding underscores the versatility of vanilla Transformer architectures and highlights their strong potential as a unified backbone for multimodal learning across language, vision, and graph domains.
Collectively, this thesis underscores the broad potential of the paradigm of graph learning on pseudo-coordinates, which offers not only strong expressivity and flexibility but also a promising pathway toward unifying graph learning with multimodal foundation models.