Thesis title: Learning To Learn Objects From The WEB
Deep networks perform well when trained on large-scale datasets. Compared to traditional learning approaches (also called shallow approaches), where feature extraction and classification are two separate steps often laden with heuristics, deep networks offer two main advantages: first, they have been shown on numerous benchmarks to achieve much higher accuracy on nearly any visual recognition problem; second, they offer a conceptual simplicity of use that has quickly made them the dominant learning tool of the community. Despite these advantages, they also have limitations, such as high computational cost, long training times and the need for large datasets, to name a few. This has given datasets like ImageNet a major role in the growth of deep learning architectures for visual object recognition. Nonetheless, ImageNet was created a decade ago and has not been updated since, which makes it prone to aging as well as to dataset bias, especially since it was manually annotated and controlled. Moving beyond fixed training datasets will lead to more robust visual systems. Recent work has begun to investigate how to use the massive number of images available on the Web in place of manually annotated images. Yet even when following the simple route of downloading data from the Web, manually annotating the downloaded images is time consuming and costly. These factors have led most researchers to rely instead on transfer learning, taking deep architectures pre-trained on ImageNet and adapting them to their specific needs through fine-tuning on a smaller set of images. A very promising alternative consists in developing algorithms that automatically create annotated data from the Web through smart downloading approaches. In this thesis we present three different approaches for the creation of datasets from the Web.
The first method generates large-scale datasets with minimal noise and maximum visual variability, based on a concept expansion strategy that combines visual and natural language processing. We test the effectiveness of this approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot. The second method explores the use of seed images to produce datasets with visually similar features; we use it to create a benchmark for object recognition inspired by the RoboCup@Home competition and thus focused on home robots. The third method automatically creates annotated images from the Web for object detection and instance segmentation. We show how training on images of objects with backgrounds affects segmentation. We analyse the level of confusion of the models when faced with fine-grained categories versus higher-level categories, e.g. pasta plate vs. just plate. Finally, we explore the limitations of training on a single object per image and discuss its effects when testing on cluttered scenes with multiple or stacked objects.