RICCARDO GIUBILEI

PhD

Cycle: XXXIII

Thesis title: Energy Trees: Classification and Regression With Structured and Mixed-Type Data

As data analyses continue to grow in complexity, so does the need for frameworks and models that keep pace. Object-Oriented Data Analysis plays a primary role in this direction because it works directly with structured data objects, i.e. with variables that have not undergone any (further) simplification. So far, however, the focus has been on variables of a single type at a time. In an attempt to fill this gap, Energy Trees are introduced in this work as a statistically sound model for classification and regression with structured and mixed-type covariates. Two successful and well-established ideas from the literature, namely Conditional Trees and Energy Statistics, are combined to build Energy Trees, so that the proposed model benefits from several of their properties. The problem of splitting with respect to structured covariates, however, is still not well defined. In this work, two alternative procedures, namely feature vector extraction and clustering, are proposed and compared. The choices that must be made are then outlined, both for traditional covariates, i.e. numeric and nominal, and for the structured covariates considered here, i.e. functions, graphs, and persistence diagrams. One of the striking advantages of Energy Trees is their great flexibility; hence, general guidance on how to change these choices, and on how to implement any other type of covariate, is also provided. Extensive simulation studies are employed to show that Energy Trees are unbiased, do not suffer from overfitting, and select meaningful covariates. These studies are performed at increasing levels of complexity, starting from traditional covariates only and arriving at the case of structured and mixed-type covariates. All of them provide positive results. Once Energy Trees are confirmed to work properly, their applicability, as well as some extensions, may be considered.
As for extensions, the ensemble models called bagging of Energy Trees and Random Energy Forests are presented. Additionally, the Unsupervised Random Energy Forest model for unlabeled learning samples is introduced and tested on simulated data. The Energy Trees framework is implemented in the R package etree, which is described in detail, covering all the main functions and features. A usage example is also included, followed by a description of current and future work. Finally, Energy Trees are employed to conduct four empirical analyses on data from the fields of human biology and medicine. Specifically, the main prediction tasks involve knee osteoarthritis, intelligence, schizophrenia, and brain tumors. Covariates include the shape of the bones, multimodal brain connectomes, brain metabolic networks, demographic information, and various others. The analyses show that the predictive ability of the model is adequate and suggest its potential utility in these important but intricate fields.
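
As an illustration of the Energy Statistics ingredient mentioned in the abstract, the following is a minimal sketch of a distance covariance permutation test of independence, written in Python with NumPy. It is not the etree implementation: the function names (pairwise_abs_dist, double_center, dcov2, dcov_perm_test), the toy data, and the choice of a permutation test are assumptions made here for illustration only. The sketch assumes that each variable is available through a matrix of pairwise distances between observations.

```python
import numpy as np


def pairwise_abs_dist(v):
    """Pairwise absolute differences for a 1-D numeric variable (illustrative helper)."""
    v = np.asarray(v, dtype=float)
    return np.abs(v[:, None] - v[None, :])


def double_center(d):
    """Subtract row and column means of a square distance matrix and add the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov2(dx, dy):
    """Squared sample distance covariance (V-statistic) from two distance matrices."""
    return float((double_center(dx) * double_center(dy)).mean())


def dcov_perm_test(dx, dy, n_perm=199, seed=0):
    """Permutation test of independence based on squared distance covariance.

    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    n = dx.shape[0]
    a = double_center(dx)
    b = double_center(dy)
    observed = float((a * b).mean())
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the observations of one variable breaks any dependence.
        if (a * b[np.ix_(p, p)]).mean() >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): y depends on x, z does not.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = x + 0.1 * rng.normal(size=50)
z = rng.normal(size=50)

dx, dy, dz = pairwise_abs_dist(x), pairwise_abs_dist(y), pairwise_abs_dist(z)
stat_dep, p_dep = dcov_perm_test(dx, dy)
stat_ind, p_ind = dcov_perm_test(dx, dz)
```

Because the test only needs pairwise distances, the same routine could in principle be applied to any covariate for which a suitable distance is available, which is one way to read the abstract's claim that covariates of different types can be handled within a single framework.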

Scientific output

11573/1672547 - 2022 - etree: Classification and Regression With Structured and Mixed-Type Data in R
Giubilei, Riccardo; Padellini, Tullia; Brutti, Pierpaolo - 04b Conference paper in volume
conference: The 51st Scientific Meeting of the Italian Statistical Society, SIS 2022 (Caserta, Italy)
book: Book of short papers. SIS 2022 - (9788891932310)

11573/1489615 - 2020 - Unsupervised Energy Trees: Clustering With Complex and Mixed-Type Variables
Giubilei, Riccardo; Padellini, Tullia; Brutti, Pierpaolo - 04b Conference paper in volume
conference: 50th Scientific Meeting of the Italian Statistical Society (Pisa)
book: SIS2020 Book of short papers - (9788891910776)

11573/1489608 - 2020 - Topological and Mixed-type learning of Brain Activity
Padellini, Tullia; Brutti, Pierpaolo; Giubilei, Riccardo - 04b Conference paper in volume
conference: 50th Scientific Meeting of the Italian Statistical Society (Pisa)
book: SIS2020 Book of short papers - (9788891910776)

11573/1489659 - 2019 - ETrees: A Generalization of Conditional Trees to Mixed-Type Data
Giubilei, Riccardo; Padellini, Tullia; Brutti, Pierpaolo - 04d Abstract in conference proceedings
conference: 32nd Edition of the European Meeting of Statisticians (Palermo)
book: EMS2019 Program and Book of Abstracts

11573/1189934 - 2018 - Supervised Learning for Link Prediction in Social Networks
Giubilei, Riccardo; Brutti, Pierpaolo - 04b Conference paper in volume
conference: 49th Scientific meeting of the Italian Statistical Society, SIS2018 (Palermo)
book: Book of short papers SIS 2018 - (9788891910233)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma