PhD defence of Loren Lugosch - Deep Neural Networks for Voice Control

Monday, May 15, 2023, 13:00 to 15:00
McConnell Engineering Building, Room 603, 3480 rue University, Montreal, QC, H3A 0E9, CA



Voice control systems enable people to control their computers by speaking to them. After a review of the state of the art in sequence modeling, speech recognition, and language understanding using deep learning, this thesis describes a number of contributions to the art of voice control. The first contribution is a study of large-scale semi-supervised learning through pseudo-labeling for massively multilingual speech recognition. The second contribution is a study of the use of autoregressive models for conditional computation with neural networks, using speech recognition as a test case. The third contribution is a method for training end-to-end spoken language understanding models using speech synthesis. The fourth contribution is a crowdsourced dataset, Timers and Such, for spoken language understanding involving numbers, along with baseline experimental results and open-source software infrastructure for using the dataset. The fifth contribution is our part in the design and implementation of SpeechBrain, an open-source software toolkit for speech processing. Finally, using some of the tools and techniques developed earlier in the thesis, we propose a simplified and unified approach to voice control in which the entire traditional pipeline, composed of an automatic speech recognition subsystem, a natural language understanding subsystem, and human-programmed control logic, is subsumed within a single deep neural network.
