MASSIMILIANO MANCINI

Dottore di ricerca

ciclo: XXXII


supervisore: Barbara Caputo

Titolo della tesi: Towards Recognizing New Semantic Concepts in New Visual Domains

Deep learning is the leading paradigm in computer vision. However, deep models heavily rely on large scale annotated datasets for training. Unfortunately, labeling data is a costly and time-consuming process and datasets cannot capture the infinite variability of the real world. Therefore, deep neural networks are inherently limited by the restricted visual and semantic information contained in their training set. In this thesis, we argue that it is crucial to design deep neural architectures that can operate in previously unseen visual domains and recognize novel semantic concepts. In the first part of the thesis, we describe different solutions to enable deep models to generalize to new visual domains, by transferring knowledge from a labeled source domain(s) to a domain (target) where no labeled data are available. We first address the problem of unsupervised domain adaptation assuming that both source and target datasets are available but as mixtures of multiple latent domains. In this scenario, we propose to discover the multiple domains by introducing in the deep architecture a domain prediction branch and to perform adaptation by considering a weighted version of batch-normalization (BN). We also show how variants of this approach can be effectively applied to other scenarios such as domain generalization and continuous domain adaptation, where we have no access to target data but we can exploit either multiple sources or a stream of target images at test time. Finally, we demonstrate that deep models equipped with graph-based BN layers are effective in predictive domain adaptation, where information about the target domain is available only in the form of metadata. In the second part of the thesis, we show how to extend the knowledge of a pre-trained deep model incorporating new semantic concepts, without having access to the original training set. We first consider the problem of adding new tasks to a given network and we show that using simple task-specific binary masks to modify the pre-trained filters suffices to achieve performance comparable to those of task-specific models. We then focus on the open-world recognition scenario, where we are interested not only in learning new concepts but also in detecting unseen ones, and we demonstrate that end-to-end training and clustering are fundamental components to address this task. Finally, we study the problem of incremental class learning in semantic segmentation and we discover that the performances of standard approaches are hampered by the fact that the semantic of the background changes across different learning steps. We then show that a simple modification of standard entropy-based losses can largely mitigate this problem. In the final part of the thesis, we tackle a more challenging problem: given images of multiple domains and semantic categories (with their attributes), how to build a model that recognizes images of unseen concepts in unseen domains? We also propose an approach based on domain and semantic mixing of inputs and features, which is a first, promising step towards solving this problem.

Produzione scientifica

Connessione ad iris non disponibile

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma