Thesis title: Deep Learning Methods Applied to Vector Quantile Regression and Simultaneous Music Generation and Separation
This thesis presents the work I carried out during my PhD in two areas of machine learning and statistics: multivalued quantile regression and deep learning applied to music processing.
Quantile regression refers to the problem of finding the conditional quantiles of a response variable X given a set of covariates Y.
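As a minimal illustration of the scalar case, the sketch below recovers a conditional quantile by minimising the pinball (check) loss, a standard characterisation of quantiles. It assumes a discrete covariate, so that conditioning reduces to grouping the samples; the names `pinball_loss`, `conditional_quantile` and `data`, as well as the synthetic Gaussian data, are hypothetical and chosen only for this example.

```python
import random


def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def conditional_quantile(pairs, y_value, tau):
    """Sample value of X minimising the mean pinball loss among pairs with Y == y_value."""
    group = [x for y, x in pairs if y == y_value]

    def mean_loss(q):
        return sum(pinball_loss(x - q, tau) for x in group) / len(group)

    # The mean pinball loss is piecewise linear in q, so a minimiser can be
    # found among the observed sample values.
    return min(group, key=mean_loss)


# Synthetic data (assumption): covariate Y in {0, 1}, response X ~ N(3 * Y, 1).
random.seed(0)
data = [(y, random.gauss(3.0 * y, 1.0)) for y in (0, 1) for _ in range(501)]

q0 = conditional_quantile(data, 0, 0.9)
q1 = conditional_quantile(data, 1, 0.9)
print(q0, q1)
```

With continuous covariates one would instead fit a parametric function of Y by minimising the same loss; the grouping used here is only the simplest instance.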
Quantile regression has been extended to the multivalued case in the literature by defining the multivalued quantile function as the map solving a Euclidean optimal transport problem between a uniform random variable on the unit hypercube and the random variable underlying the data.
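One way to write this definition is sketched below. The notation is an assumption made for illustration: X in R^d is the response, Y the covariates, mu the uniform law on the unit hypercube, and nu_y the conditional law of X given Y = y.

```latex
% Notation assumed for illustration (not fixed by the text above):
% \mu = \mathrm{Unif}([0,1]^d), \quad \nu_y = \text{law of } X \text{ given } Y = y.
\[
  Q_{X \mid Y}(\cdot, y) \in
  \operatorname*{arg\,min}_{T \,:\, T_{\#}\mu = \nu_y}
  \; \mathbb{E}_{U \sim \mu}\!\left[ \lVert U - T(U) \rVert^{2} \right],
  \qquad
  Q_{X \mid Y}(u, y) = \nabla_u \varphi(u, y),
\]
% where \varphi(\cdot, y) is convex. The gradient form follows from Brenier's
% theorem, assuming \nu_y has finite second moments.
```

The second identity is what makes the convex potential the natural object to learn: once the potential is known, the quantile map is its gradient in u.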
The algorithm proposed in the literature, however, suffers from scalability issues and yields a non-differentiable map, which makes it challenging to compute derived statistical quantities such as the CDF, the PDF and the inverse function.
The first part of this thesis is thus dedicated to developing a differentiable and more scalable algorithm for vector quantile regression (VQR), based on learning the convex potentials involved in the optimal transport problem through a partially input convex neural network (PICNN).
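To make the architectural idea concrete, the following is a minimal NumPy sketch of a partially input convex network: a scalar function that is convex in one input (`v`) but not necessarily in the other (`c`). It is a simplified, untrained variant written for illustration, not the model developed in the thesis; the function names, layer sizes and random initialisation are all assumptions.

```python
import numpy as np


def softplus(a):
    # Convex, non-decreasing activation, so it preserves convexity in v.
    return np.logaddexp(0.0, a)


def init_picnn(dim_v, dim_c, hidden=16, n_layers=3, seed=0):
    """Random parameters for a small PICNN (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    params, z_in, c_in = [], 1, dim_c
    for k in range(n_layers):
        out = 1 if k == n_layers - 1 else hidden
        params.append({
            # Non-negative weights on the convex path are what keep convexity.
            "Wz": np.abs(rng.normal(size=(out, z_in))) / z_in,
            "Wzc": rng.normal(size=(z_in, c_in)), "bz": np.ones(z_in),
            "Wv": rng.normal(size=(out, dim_v)),
            "Wvc": rng.normal(size=(dim_v, c_in)), "bv": np.ones(dim_v),
            "Wu": rng.normal(size=(out, c_in)), "b": np.zeros(out),
            "Wc": rng.normal(size=(hidden, c_in)), "bc": np.zeros(hidden),
        })
        z_in, c_in = out, hidden
    return params


def picnn_forward(params, v, c):
    """Scalar potential, convex in v for every fixed context c."""
    z = np.zeros(1)  # initial convex-path state; contributes nothing at layer 0
    for k, p in enumerate(params):
        gate_z = np.maximum(p["Wzc"] @ c + p["bz"], 0.0)  # non-negative gate
        gate_v = p["Wvc"] @ c + p["bv"]                    # unconstrained gate
        pre = p["Wz"] @ (z * gate_z) + p["Wv"] @ (v * gate_v) + p["Wu"] @ c + p["b"]
        # Hidden layers use softplus; the last layer stays linear in z.
        z = pre if k == len(params) - 1 else softplus(pre)
        c = np.tanh(p["Wc"] @ c + p["bc"])                 # free (non-convex) context path
    return float(z[0])


params = init_picnn(dim_v=2, dim_c=3)
value = picnn_forward(params, np.array([0.2, 0.7]), np.array([1.0, -0.5, 0.3]))
print(value)
```

Convexity in `v` holds by construction: each layer combines convex functions of `v` with non-negative coefficients, adds a term affine in `v`, and applies a convex non-decreasing activation. In a VQR setting, the quantile map would be obtained by differentiating such a potential with respect to its convex input, for example with an automatic differentiation library.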
Building on previous work, we then discuss our extension of the vector quantile regression framework to target variables defined on manifolds.
The second part of this thesis is dedicated to the development of a deep learning model for simultaneous music generation and music source separation.
The model operates directly on waveforms, and it is the first that can perform both total and partial music generation as well as source separation without requiring specialized training for each task.
It is based on diffusion models, a class of generative models that have been shown to generate high-quality samples in a wide range of domains, including images, audio and text.
The two topics can be seen as two faces of the very general task of sampling from a distribution learned from data.
In fact, vector quantile regression enables conditional sampling through a canonical map, which gives precise control over useful statistical quantities at the expense of scalability.
On the other hand, diffusion models allow sampling from very high-dimensional distributions, such as music waveforms, in a scalable way, but without the same level of control over the derived statistical quantities.