Event

PhD defence of Yanyan Mu – From Foundational Components to Complex Factorizations: Task-Dependent Architectures via Undirected Graphical Models

Friday, September 19, 2025 09:30 to 11:15
McConnell Engineering Building Room 603, 3480 rue University, Montreal, QC, H3A 0E9, CA

Abstract

In this thesis we study generative models for high-dimensional static and temporal data. The guiding design principle is to balance the number of parameters, network complexity and explainability by determining the factors that represent the data, while maintaining a unified architecture across different vision tasks.

Restricted Boltzmann Machines (RBMs) and similar networks are difficult to extend beyond a limited spatial extent because the number of parameters grows exponentially with the large configuration spaces involved. We propose a generalization of a hierarchical undirected model that combines top-down and bottom-up information propagation for image super-resolution tasks. By adding a second hidden layer, the model aggregates nearby sub-receptive fields into a larger receptive field, while convolutional weight sharing keeps the number of free parameters under control.
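To illustrate why weight sharing keeps the parameter count under control, the following is a minimal counting sketch, not the model from the thesis. It assumes a fully connected RBM with one weight per visible-hidden pair, and a convolutional RBM with one square kernel and one hidden bias per filter plus a single shared visible bias. The function names and the sizes used are hypothetical.

```python
def dense_rbm_params(height, width, n_hidden):
    """Free parameters of a fully connected RBM on a height x width image."""
    n_visible = height * width
    # One weight per visible-hidden pair, plus visible and hidden biases.
    return n_visible * n_hidden + n_visible + n_hidden


def conv_rbm_params(n_filters, kernel):
    """Free parameters of a convolutional RBM (assumed layout, see lead-in)."""
    # One kernel x kernel filter and one hidden bias per filter,
    # plus a single visible bias shared by all pixels.
    return n_filters * kernel * kernel + n_filters + 1


if __name__ == "__main__":
    # Hypothetical sizes: as many hidden units as pixels in the dense case.
    for side in (16, 32, 64):
        dense = dense_rbm_params(side, side, n_hidden=side * side)
        shared = conv_rbm_params(n_filters=24, kernel=10)
        print(f"{side}x{side} image: dense={dense}, convolutional={shared}")
```

Under these assumptions the convolutional count does not depend on the image size at all, whereas the dense count grows with the product of visible and hidden units.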

For temporal data, we start from a fundamental principle of computer vision: temporal correlations are variations between related images caused by independent factors, namely object appearance and motion. The goal is to decouple the underlying explanatory factors rather than keep them mixed. Once decoupled, each factor lies in a lower-dimensional abstract space, and different computer vision tasks can be carried out more efficiently in the corresponding space than in the original pixel space. We present an algorithm that decouples object appearance and location in a static image into amplitude and phase, using complex factorization over orthogonal filter pairs. The filter pairs are learned in an unsupervised manner from multiple consecutive frames. In an optical flow recovery experiment, we demonstrate that with this factorization object movements are encoded in the phase gradient between frames over time. Test results show that small disparities are successfully captured by the factorized phase gradient.
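The amplitude and phase idea can be sketched in one dimension. The example below is an illustration only: it assumes a hand-built cosine/sine filter pair rather than the learned filters described above, and a synthetic sinusoidal signal whose frequency matches the filters. All function names are hypothetical. The two filter responses are treated as the real and imaginary parts of one complex number, so a small shift between two frames leaves the amplitude unchanged and shows up as a phase difference.

```python
import numpy as np


def quadrature_pair(n_samples, cycles):
    """Hand-built orthogonal filter pair (assumption: not learned from data)."""
    n = np.arange(n_samples)
    omega = 2.0 * np.pi * cycles / n_samples  # radians per sample
    return np.cos(omega * n), np.sin(omega * n), omega


def amplitude_phase(signal, f_even, f_odd):
    """Combine the two filter responses into one complex coefficient."""
    z = signal @ f_even + 1j * (signal @ f_odd)
    return np.abs(z), np.angle(z)


def estimate_shift(frame_a, frame_b, f_even, f_odd, omega):
    """Recover a small displacement from the phase difference between frames."""
    _, phase_a = amplitude_phase(frame_a, f_even, f_odd)
    _, phase_b = amplitude_phase(frame_b, f_even, f_odd)
    # Wrap the difference to (-pi, pi] so only small shifts are unambiguous.
    dphi = np.angle(np.exp(1j * (phase_b - phase_a)))
    return dphi / omega


# Synthetic frames: the same pattern displaced by half a sample.
f_even, f_odd, omega = quadrature_pair(n_samples=64, cycles=4)
n = np.arange(64)
frame_a = 2.0 * np.cos(omega * (n - 1.0))
frame_b = 2.0 * np.cos(omega * (n - 1.5))

amp_a, _ = amplitude_phase(frame_a, f_even, f_odd)
amp_b, _ = amplitude_phase(frame_b, f_even, f_odd)
shift = estimate_shift(frame_a, frame_b, f_even, f_odd, omega)
print(f"amplitudes: {amp_a:.3f}, {amp_b:.3f}; estimated shift: {shift:.3f}")
```

In this toy setting the amplitude carries the appearance (the strength of the pattern) and the phase carries the location. Because the phase difference is wrapped, the estimate is only unambiguous for displacements smaller than half the filter wavelength.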

In separate but related work, we consider stochastic mining simulation and put forward a solution using an RBM with two-phase learning. Test results show that this approach offers significant improvements over conventional pattern-based algorithms, as the RBM is better able to learn the underlying distribution of the sample data.

We believe that by considering the structural elements of neural networks, we can gain insight into how to develop architectures that can be trained using standard gradient-based methods and can tackle more complex problems without a growth in complexity.
