Thesis title: Deep Learning with Structured Data: Topology Inference and Higher-Order Modeling
Relational data, broadly defined as a set of data points and the relationships between them, plays a foundational role across diverse real-world applications, from social networks and biological systems to recommendation engines. For decades, graph representation has been the dominant paradigm for describing relational data, providing a framework that captures pairwise interactions between entities. In recent years, Graph Neural Networks (GNNs) have emerged as a powerful tool that enables the learning of complex patterns and dependencies within relational data.
GNNs have revolutionized relational data processing. However, traditional graph representations are inherently limited to encoding pairwise relations between entities. This is a fundamental constraint, as many real-world systems exhibit multi-way or higher-order interactions, where entities interact simultaneously in groups of three or more. These n-body interactions cannot be adequately captured by pairwise edges alone. Addressing this limitation requires moving beyond graphs to richer, more expressive frameworks. Recent advances in topological deep learning leverage combinatorial and algebraic-topological structures, such as hypergraphs, simplicial complexes, and cell complexes, to enable higher-order network representation and modeling.
The pairwise or higher-order relationships of relational data, also referred to as its topology, pose several challenges: relations may be missing, redundant, or absent altogether. This last scenario, where the topology is entirely absent, opens up an intriguing possibility: any type of data may possess an underlying structure, even when it is not explicitly defined. Topology learning aims to infer this underlying (latent) topology from the data itself.
This thesis contributes to methodological and practical aspects of relational data modeling, with a focus on the challenges of topology learning and higher-order network modeling. Its main contributions are the following. First, it introduces novel methodologies for latent topology inference, for both graphs and higher-order structures, and demonstrates their effectiveness across various applications, including missing data imputation, network traffic compression, and goal-oriented semantic communication tasks. Second, it introduces a novel conceptualization of homophily in higher-order networks. Third, it develops the MultiSet framework, which is capable of encompassing most current hypergraph neural network architectures. Fourth, it creates the first benchmarking framework (TopoBenchmark). Finally, it proposes a new approach that combines sequence modeling with topological representations to enable efficient information propagation across different ranks.