PhD defence of Pavel Rumiantsev – Efficient Algorithms for Automatic Structure Identification and Adaptation in Deep Neural Networks
Abstract
The structure of a neural network is one of the critical determinants of its performance, dictating how information flows through the model. The architecture determines the representational capacity, inductive biases, and computational efficiency of a model. These factors directly influence how effectively the network learns from data, how well it performs on a given task, and how reliably it generalises to new examples.
The challenge of identifying a good neural network architecture has traditionally been addressed by manual design, relying on expert intuition and costly trial-and-error experimentation. Such handcrafted designs are commonly reused across multiple applications to reduce the costs associated with structural design.
While sharing the same architecture across a variety of tasks may bring benefits such as native multimodality, this choice is often suboptimal for the heterogeneous requirements of individual applications. As machine learning tasks grow more complex, rigorous architectural design is imperative to achieve robust generalisation and efficient utilisation of computational resources.
This thesis contributes several algorithms for efficient architecture design and adaptation. It addresses both paradigms for identifying structure: static, where the goal is to identify the most suitable fixed architecture for a given task; and dynamic, where the structure of a neural network is adjusted to identify a specialised configuration for each data point of the task.
In the static paradigm, two improvements are proposed in the field of Zero-Shot Neural Architecture Search, a domain where an architecture must be selected without training any neural networks. The first contribution is a set of Zero-Shot ranking functions specifically designed for fast and memory-efficient evaluation of candidate architectures. They outperform state-of-the-art approaches not only in terms of accuracy, but also in terms of computational efficiency. The second contribution is a statistical comparison procedure designed to achieve improved architecture search stability. This procedure is compatible with common search algorithms and effectively mitigates the variability of Zero-Shot ranking functions.
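To make the idea of a Zero-Shot ranking function concrete, the sketch below scores an untrained ReLU network by the diversity of its binary activation patterns on a random batch, in the spirit of well-known training-free proxies such as NASWOT. This is an illustrative example under assumed definitions, not the ranking functions proposed in the thesis.

```python
import numpy as np

def activation_pattern_score(weights, biases, batch, eps=1e-6):
    """Training-free proxy: score an untrained ReLU network by how
    distinctly it separates a batch of inputs via its binary activation
    patterns (NASWOT-style; illustrative, not the thesis's functions)."""
    h = batch
    codes = []
    for W, b in zip(weights, biases):
        pre = h @ W + b
        codes.append(pre > 0)        # binary activation pattern per layer
        h = np.maximum(pre, 0.0)     # ReLU forward pass
    c = np.concatenate(codes, axis=1).astype(float)
    # Hamming-similarity kernel between activation codes of the batch
    k = c @ c.T + (1.0 - c) @ (1.0 - c).T
    _, logdet = np.linalg.slogdet(k + eps * np.eye(len(batch)))
    return logdet  # higher = more diverse activation patterns

rng = np.random.default_rng(0)
dims = [8, 32, 32]  # a tiny hypothetical candidate architecture
weights = [rng.standard_normal((dims[i], dims[i + 1])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]
biases = [np.zeros(d) for d in dims[1:]]
batch = rng.standard_normal((16, 8))
score = activation_pattern_score(weights, biases, batch)
```

A search algorithm would compute such a score for each candidate and rank architectures without any training, which is exactly where the variability that the thesis's statistical comparison procedure mitigates comes from: different random batches and initialisations yield different scores.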
In the dynamic paradigm, the thesis presents two novel sparse Mixture of Experts methods that efficiently tackle the problem of expert specialisation. The first contribution is a novel expert routing system designed to enforce the specialisation of experts. The thesis demonstrates the benefits of the proposed system by explaining how it can be used to achieve effective knowledge transfer from a teacher Graph Neural Network into a more efficient student Mixture of Experts model, outperforming existing Graph Neural Network knowledge distillation approaches.
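For readers unfamiliar with sparse Mixtures of Experts, the following generic sketch shows top-1 routing: a learned gate sends each input to exactly one expert, so each expert only ever sees a subset of the data and can specialise on it. This is a standard baseline router, not the specialisation-enforcing routing system proposed in the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top1_moe(x, gate_w, expert_ws):
    """Sparse MoE with top-1 routing: each input activates one expert,
    weighted by its gate probability (generic sketch, hypothetical names)."""
    probs = softmax(x @ gate_w)               # (batch, n_experts)
    choice = probs.argmax(axis=1)             # hard top-1 routing decision
    out = np.zeros((x.shape[0], expert_ws[0].shape[1]))
    for e, W in enumerate(expert_ws):
        mask = choice == e
        if mask.any():
            # scale each expert output by its gate probability
            out[mask] = (x[mask] @ W) * probs[mask, e:e + 1]
    return out, choice

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 16))
gate_w = rng.standard_normal((16, 4))          # gate over 4 experts
experts = [rng.standard_normal((16, 8)) for _ in range(4)]
y, routes = top1_moe(x, gate_w, experts)
```

With a plain gate like this, experts often fail to specialise (a few experts capture most of the traffic), which is the problem the thesis's routing system is designed to address.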
The second contribution is a Mixture of Experts model that is specifically tailored to the graph domain. The thesis proposes a novel graph-structure-aware expert routing procedure that is used to distribute inference tasks to a set of heterogeneous experts. This allows the learning architecture to adapt to distinct graph patterns and exhibit robustness across a wide variety of graph learning tasks.