Thesis title: Multimodal Hypercomplex Learning for Biomedical Data
Deep learning (DL) is revolutionizing healthcare, from diagnostic applications to clinical research tools such as brain-computer interfaces (BCIs). In breast cancer screening, computer-aided diagnosis (CAD) systems based on neural models help clinicians produce more accurate diagnoses, reducing false positives, recall rates, unnecessary radiation exposure, patient anxiety, and costs. In BCI applications, DL serves both diagnostic purposes, like seizure detection, and neurocognitive research, such as understanding how the brain encodes visual information.
A crucial aspect of applying DL to medicine is addressing the inherently multimodal nature of medical data. This multimodality takes several forms: different views of the same exam (as in mammography and chest X-ray), different modalities of the same scan (such as T1- and T2-weighted MRI sequences), or multiple imaging modalities monitoring the same condition (mammography, breast ultrasound, and digital breast tomosynthesis). Similarly, in BCI applications such as emotion recognition, responses are multimodal, reflected in behavioral reactions and in physiological changes such as heart rate and brain activity. However, current multimodal methods often fail to effectively leverage information from different modalities. Studies in both computer vision and medical applications have shown that conventional multimodal networks tend to focus on a single modality, overlooking the complementary information contained in the different inputs.
Hypercomplex neural networks (HNNs), particularly quaternion neural networks (QNNs) and parameterized hypercomplex neural networks (PHNNs), have shown promising results in processing multidimensional data. These models operate in hypercomplex domains governed by distinct algebraic rules, such as the Hamilton product in the quaternion domain. This formulation endows QNNs with the ability to learn both global and local relations, unlike real-valued networks. While QNNs are limited to four-dimensional inputs, PHNNs extend their benefits, particularly correlation learning, to inputs of any dimensionality $n$. However, the learning capabilities of these models have not been investigated for multimodal inputs, such as different physiological signals or the views of a mammogram, as opposed to multidimensional inputs like RGB images. In this thesis, we explore this concept and define a new learning paradigm, namely \textit{multimodal hypercomplex learning} (MHL), to overcome the aforementioned shortcomings of traditional multimodal approaches. We redefine global relations as intra-modality features and local relations as inter-modality interactions. We investigate this learning paradigm across diverse medical settings and imaging modalities, studying cross-modal hypercomplex learning behavior at different levels of the network. We also explore architectures in which different components operate in different hypercomplex domains, allowing each input to be processed in its natural domain.
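For context, the two operations mentioned above can be written compactly; what follows is the standard formulation found in the QNN and PHNN literature, reported here for reference rather than as the exact parameterization adopted in each chapter. Given quaternions $q_1 = a_1 + b_1\mathrm{i} + c_1\mathrm{j} + d_1\mathrm{k}$ and $q_2 = a_2 + b_2\mathrm{i} + c_2\mathrm{j} + d_2\mathrm{k}$, the Hamilton product mixes every component of both operands,
\[
q_1 q_2 =
\begin{bmatrix}
a_1 & -b_1 & -c_1 & -d_1 \\
b_1 & a_1 & -d_1 & c_1 \\
c_1 & d_1 & a_1 & -b_1 \\
d_1 & -c_1 & b_1 & a_1
\end{bmatrix}
\begin{bmatrix}
a_2 \\ b_2 \\ c_2 \\ d_2
\end{bmatrix},
\]
while a parameterized hypercomplex multiplication (PHM) layer builds its weight matrix as
\[
\mathbf{W} = \sum_{i=1}^{n} \mathbf{A}_i \otimes \mathbf{F}_i,
\]
where $\otimes$ denotes the Kronecker product, the matrices $\mathbf{A}_i \in \mathbb{R}^{n \times n}$ encode the (learned) algebra rules, and the $\mathbf{F}_i$ contain the filter weights, so that the layer uses roughly $1/n$ of the parameters of its real-valued counterpart.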
In this regard, we propose several multimodal hypercomplex architectures. We design multi-view models for breast cancer detection, comprising both two-view and four-view approaches. These models are also evaluated on multi-view chest X-rays for multi-label disease classification and on multimodal brain MRIs for overall survival prediction and tumor segmentation. Additionally, we introduce a novel framework for breast cancer detection that combines attention-map augmentation with MHL and is capable of learning correlations between mammograms and their corresponding attention maps. We also validate this framework on multi-resolution histopathological images of breast cancer, demonstrating cross-resolution generalizability. Lastly, we propose multiple multimodal hypercomplex networks for emotion recognition from electroencephalograms (EEGs) and peripheral physiological signals, exploring various architectural designs including hypercomplex fusion modules and hierarchical hypercomplex structures.
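As a purely illustrative sketch of how a modality-per-component design can be realized (this is not the implementation used in the thesis; the layer, variable names, and shapes are hypothetical), the following PyTorch snippet builds a PHM linear layer and fuses the features of two mammographic views with $n = 2$:
\begin{verbatim}
import torch
import torch.nn as nn


class PHMLinear(nn.Module):
    """Parameterized hypercomplex multiplication (PHM) linear layer (sketch).

    The weight matrix is built as W = sum_i kron(A_i, F_i): the A_i encode a
    learned n-dimensional algebra and the F_i hold the filter weights, so the
    layer needs roughly 1/n of the parameters of a real-valued linear layer.
    """

    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # A_i: n x n matrices encoding the (learned) algebra rules.
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        # F_i: filter blocks of shape (out_features / n, in_features / n).
        self.F = nn.Parameter(
            torch.randn(n, out_features // n, in_features // n) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Sum of Kronecker products yields the full weight matrix
        # of shape (out_features, in_features).
        W = sum(torch.kron(self.A[i], self.F[i]) for i in range(self.n))
        return x @ W.t() + self.bias


# Hypothetical usage: fuse two mammographic views, one per component (n = 2).
cc_feats = torch.randn(8, 256)   # craniocaudal-view features from an encoder
mlo_feats = torch.randn(8, 256)  # mediolateral-oblique-view features
fusion = PHMLinear(n=2, in_features=512, out_features=128)
fused = fusion(torch.cat([cc_feats, mlo_feats], dim=-1))  # shape: (8, 128)
\end{verbatim}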
Finally, we address a direct extension of these works, focusing on the explainability of hypercomplex models. We study the learning behavior of HNNs by introducing interpretable PHNNs and quaternion-like models that do not require post-hoc methods, and we evaluate them qualitatively and quantitatively on both natural and medical images.
Through extensive experiments across diverse medical scenarios involving different imaging modalities and biomedical signals, this thesis thoroughly explores multimodal hypercomplex learning, addresses scenario-specific challenges, and examines the proposed models through the lens of explainability, advancing research in this field.