Thesis title: Tackling the main challenges for an effective application of deep learning to earth observation
In recent years, deep learning (DL) methods have reached state-of-the-art results, surpassing traditional methods in several earth observation (EO) tasks, such as semantic segmentation and change detection (CD). In particular, we have witnessed both the increasing application to EO of models designed for generic computer vision (CV) tasks (e.g., Neural Radiance Fields) and the development of models designed specifically for EO data, which have their own peculiarities. However, with the sudden increase of studies in this direction, various issues have become apparent. The purpose of this thesis is to identify these problems and to propose effective and efficient solutions, enabling a wide and high-performance application of DL to EO. In particular, we identified three main issues:
1. Lack of large labeled datasets. Most EO tasks rely on supervised training, which strongly depends on annotations. More than in other fields, labeled datasets of aerial, drone and remote sensing (RS) images are hard to obtain, because annotation is costly, demands considerable time and effort, and requires well-founded expertise;
2. Lack of computational resources. Few laboratories and research institutes can afford large computational power. This is even more true for the field of EO, where the demand for GPUs has grown sharply in just the past few years. In addition, data are rarely available all at once: to avoid catastrophic forgetting, their continuous arrival generally forces the model to be retrained on the entire dataset, at a large cost in resources and time. This also raises an issue of unsustainable power consumption;
3. Lack of datasets for novel research lines. Several lines of research borrowed from CV have undergone strong and sudden development in the field of EO. In addition, some fields of application peculiar to EO deserve investigation. However, opening these lines of research requires ad hoc datasets and algorithms.
Given these problems, we propose three branches of solutions. For the first issue, we introduce self-supervised learning (SSL) as a means to reduce the amount of annotated data needed. The goal of SSL is to learn an effective visual representation of the input from a massive quantity of unlabeled data. In particular, along with preliminary studies, we show the effectiveness of the proposed self-supervised Multi-Attention REsu-Net (MARE), which combines Online Bag of Words (OBoW) and Multi-Attention ResU-Net (MAResU-Net), in improving semantic segmentation results on the ISPRS Vaihingen benchmark dataset. Then, to reduce the need for large computational resources while overcoming catastrophic forgetting, we propose a combination of Continual Learning (CL) and SSL. In particular, we present Continual Barlow Twins (CBT), which puts together a CL strategy (namely Elastic Weight Consolidation) and an SSL strategy (namely Barlow Twins). We show very encouraging performance on semantic segmentation of three datasets from non-overlapping domains (i.e., Potsdam, US3D, UAVid). Finally, we present a new research line, 3D Change Detection (3DCD), for which we introduce a new dataset (the 3DCD dataset) and a novel algorithm (MultiTask Bitemporal Image Transformer). In conclusion, the potential of DL in EO keeps widening, and the problems that have arisen have opened important and fruitful research lines that are increasingly being explored.
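
As an illustration of how a Barlow Twins objective and an Elastic Weight Consolidation penalty can be combined, the following is a minimal NumPy sketch. It is not the thesis implementation: the function names, the weighting values and the simple additive combination are assumptions made for illustration, and the sketch follows only the generally published forms of the two losses.

```python
import numpy as np


def barlow_twins_loss(z1, z2, off_diag_weight=5e-3, eps=1e-9):
    """Barlow Twins loss for two batches of embeddings of shape (N, D).

    z1 and z2 are embeddings of two augmented views of the same images.
    off_diag_weight is an assumed trade-off value, not taken from the thesis.
    """
    n = z1.shape[0]
    # Standardize each embedding dimension over the batch.
    z1n = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2n = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z1n.T @ z2n / n
    diag = np.diag(c)
    # Invariance term: diagonal entries are pushed towards 1.
    on_diag = np.sum((1.0 - diag) ** 2)
    # Redundancy-reduction term: off-diagonal entries are pushed towards 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + off_diag_weight * off_diag


def ewc_penalty(params, anchor_params, fisher_diag, strength=1.0):
    """Elastic Weight Consolidation penalty on a flat parameter vector.

    anchor_params are the weights learned on previous data; fisher_diag is a
    diagonal Fisher information estimate marking which weights matter most.
    """
    return 0.5 * strength * np.sum(fisher_diag * (params - anchor_params) ** 2)


def continual_objective(z1, z2, params, anchor_params, fisher_diag, strength=1.0):
    """Hypothetical combined objective: SSL loss plus EWC regularizer."""
    ssl_term = barlow_twins_loss(z1, z2)
    cl_term = ewc_penalty(params, anchor_params, fisher_diag, strength)
    return ssl_term + cl_term
```

In this sketch the EWC term vanishes when the current weights equal the anchored ones, so it only penalizes drift away from what was learned on earlier data, while the Barlow Twins term needs no labels at all.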