Thesis title: Algorithmic Methods for Biomedical Networks and Clinical Data
Medicine is certainly among the application domains revolutionised by the advent of Big Data and Data Science, through the availability of huge computational resources and methods. While there are countless successful applications of Data Science in Medicine, from image-based diagnosis to genome interpretation, from biomarker discovery to inferring health status through wearable devices, many open problems remain. Of particular interest are the development of integrated and comprehensive models for molecular biology, which we will study through the Network Medicine paradigm, and the fundamental problems of causal inference in observational clinical data sets.
On the one hand, Network Medicine aims to develop a new approach to biomedical science, combining principles and approaches from systems biology and network science to understand the causes of human diseases by identifying molecular relationships between distinct phenotypes [14], and to find new treatments through drug-repurposing strategies [183].
Machine Learning techniques have also proven necessary to solve complex problems in many sciences. Outstanding results in biomedicine have shown that Machine Learning can predict protein structure and function from genetic sequences, or define optimal diets from patients’ clinical and microbiome profiles. Other striking examples involve paralyzed patients, where algorithms were able to read cortical activity directly from the brain and transmit signals to the muscles, restoring motor control. As the patient data collected and medical technologies become more complex, Machine Learning will play an increasingly central role in clinical medicine.
From a computational point of view, the development of new technologies and algorithms for Biomedical Data applications can start from the theoretical modeling of the problem, studying algorithmic complexity and the limits of learning. Alternatively, one can apply known and well-defined techniques, adapting them to the specific use case. Accordingly, I focused my research on three main directions: (I) developing new Network Medicine algorithms for Genetic Data analysis, (II) applying well-studied Machine Learning algorithms to Clinical Data analysis, and (III) developing new algorithms that provide a theoretical framework for some common biological applications. The ultimate goal is to leverage genetic and clinical data at the same time, building a comprehensive understanding of human pathophysiology.
The Thesis is organized in two parts. Part I presents the research done in Network Medicine. It contains an introduction to Network Medicine adapted from "Molecular networks in Network Medicine: Development and applications" [215] (Chapter 1), followed by three applications of Network Medicine: a first work on Disease Gene Prediction [90], with an extended version recently submitted to Bioinformatics Journal (Chapter 2); a disease gene variant prioritization algorithm, which will shortly be submitted to Nature Reports Journal (Chapter 3); and a work on partial correlation network analysis (Chapter 4), which will also shortly be submitted to a peer-reviewed bioinformatics journal. Part II is divided into two chapters: an application of Machine Learning to short-term blood glucose prediction in Type 1 diabetes mellitus [196] (Chapter 5) and a theoretical analysis of all 1-center problems on Metric rational set similarities [31] (Chapter 6), a common problem in DNA sequence similarity analysis.