The Doctoral Colloquium is open to all.
Music Research Colloquium presents:
"Bayesian Hierarchical Models for Unsupervised Audio Source Separation"
Julian Neri (PhD candidate, music technology)
Friday, 22 January 2021 at 16:30
Abstract: Source separation plays a key role in active music listening technology: it aims to un-mix a mono or stereo music recording into multiple audio tracks, ideally one per musical instrument. While supervised methods that rely on tailored datasets of clean target sounds can separate real music recordings, unsupervised methods have not reached the same quality, because learning from mixed audio alone requires strong prior information about the sources.
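To illustrate why prior information makes unmixing tractable, here is a minimal sketch (not the talk's actual method): if we assume each source is a single sinusoid at a known frequency, a mono mixture can be separated by projecting it onto sine/cosine bases at those frequencies. The function name and the test frequencies are hypothetical, chosen so each source occupies an exact DFT bin.

```python
import math

def project_amplitude(x, f, sr):
    """Estimate the amplitude of a sinusoidal component at frequency f (Hz)
    by projecting the mixture x onto sin/cos basis vectors (DFT-like)."""
    n = len(x)
    c = sum(x[t] * math.cos(2 * math.pi * f * t / sr) for t in range(n))
    s = sum(x[t] * math.sin(2 * math.pi * f * t / sr) for t in range(n))
    return 2 * math.hypot(c, s) / n

# Hypothetical mono mixture of two sinusoidal "sources" at 400 Hz and 1000 Hz.
sr, n = 8000, 800
mix = [0.8 * math.sin(2 * math.pi * 400 * t / sr)
       + 0.3 * math.sin(2 * math.pi * 1000 * t / sr)
       for t in range(n)]

# With prior knowledge of each source's frequency, amplitudes are recoverable:
a1 = project_amplitude(mix, 400.0, sr)   # ≈ 0.8
a2 = project_amplitude(mix, 1000.0, sr)  # ≈ 0.3
```

Without that prior knowledge of the source frequencies, the same mixture admits infinitely many decompositions, which is exactly the ill-posedness the abstract's unsupervised methods must address with stronger source models.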
This presentation proposes several new unsupervised separation methods, formulated within a Bayesian framework and enabled by fast inference. While most existing approaches rely on nonparametric representations of sound, such as the spectrogram or waveform, our unsupervised techniques disentangle a latent, parametric representation, namely the sinusoidal model. Two kinds of knowledge are exploited with the sinusoidal model to improve separation: parametric source models estimated in an unsupervised way, and perceptual grouping principles such as similarity, proximity, and common fate. A hierarchical model is designed for grouping partial trajectory data. Variational inference is used to invert the model, yielding a flexible stochastic analysis of sound.
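The common-fate principle says that partials whose parameters move together (e.g., shared vibrato) likely belong to the same source. As a crude, hypothetical stand-in for the hierarchical grouping model, the sketch below greedily clusters partial frequency tracks by the correlation of their modulation; all names and data here are illustrative, not from the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def group_by_common_fate(tracks, threshold=0.9):
    """Greedily group partial frequency tracks whose modulation is
    strongly correlated (a toy proxy for the common-fate principle)."""
    groups = []
    for track in tracks:
        for g in groups:
            if pearson(track, g[0]) > threshold:
                g.append(track)
                break
        else:
            groups.append([track])
    return groups

# Hypothetical partial tracks: two share 5 Hz vibrato, one drifts linearly.
vib = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
tracks = [[440 + 4 * v for v in vib],
          [880 + 8 * v for v in vib],
          [600.0 + 0.01 * t for t in range(100)]]
groups = group_by_common_fate(tracks)
# The two vibrato partials are grouped together; the drifting one stays apart.
```

A real hierarchical Bayesian model would replace the hard threshold with probabilistic group assignments inferred jointly with the source parameters.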
Finally, deep learning is leveraged in a variational auto-encoder that automatically infers the number of sources and enables high-quality separation in real time. Disentangling encoded sources not only proves effective in practice but also aligns with theories of human perception.
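At the heart of any variational auto-encoder is the reparameterization trick, which lets gradients flow through stochastic latent samples. A minimal sketch, assuming a diagonal Gaussian posterior (the function name is illustrative; the talk's architecture is not specified here):

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1),
    so the sample is a deterministic, differentiable function of mu and
    log_var given the noise eps (the VAE reparameterization trick)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps
```

In a separation VAE, each latent z would encode one candidate source; the encoder outputs (mu, log_var) per source, and the decoder reconstructs the mixture from the sampled latents.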