Andrew Mackenzie (McGill University)

Wednesday, September 27, 2023 13:00 to 14:00
Burnside Hall Room 1214, 805 rue Sherbrooke Ouest, Montreal, QC, H3A 0B9, CA

Title: Tensor Programs and µP

Abstract: We will discuss the limiting behaviour of large neural networks as the layer width goes to infinity. One of the factors that most affects this limiting behaviour is the specific parametrization used: beyond training stability, it determines whether or not the neural network can learn features. We show a technique for mechanically deriving the "best" parametrization, known as µP. As an additional empirical benefit, we demonstrate that under µP, hyperparameters transfer directly across models of different sizes, so that all experiments can be run at a small scale.
