MARIAELENA BOTTAZZI SCHENONE

PhD Graduate

PhD program: XXXVIII


Supervisor: Maurizio Vichi

Thesis title: Advances in unsupervised learning: integrating clustering and latent structure modeling

In recent years, the explosive growth of data across many domains has underscored the need for statistical methodologies capable of capturing latent structures, handling high dimensionality, and yielding interpretable insights. Traditional clustering and factor analysis techniques often prove inadequate in these contexts because of their limited flexibility, their sensitivity to noise and outliers, and their inability to uncover complex, overlapping, or hierarchical patterns. This thesis addresses these challenges by introducing novel multivariate methods for unsupervised learning, simultaneous clustering and dimensionality reduction, and latent structure modeling. The research has resulted in ten scientific papers, all of which are either published or currently under review in leading international journals. The five most significant contributions are discussed in detail in this thesis, while the other works (three already published, two under second-round review and one recently submitted) are briefly referenced to provide a comprehensive overview of the research outcomes. Each chapter of the thesis offers new methodological developments tailored to a distinct data structure or problem setting, while collectively reinforcing a broader vision: developing tools for flexible, interpretable, and robust analysis of high-dimensional data. The first two contributions investigate the connection between clustering and ranking of multivariate observations through Linear Ordered Partitions. Unlike traditional clustering, which partitions units into groups with no inherent order, the first model identifies clusters as equivalence classes ordered along a latent univariate dimension, thus yielding an optimal ranked partition of the data. The proposed method uses a constrained Factorial K-Means model.
Complementing this, a second, bootstrap-based methodology for clustering and ranking in a univariate context is also briefly introduced. Named Cluster Ranking via Bootstrap K-Means, it aligns with the Linear Ordered Partitions framework by constructing ranked equivalence classes for univariate data. This model identifies the maximum number of statistically distinct clusters: units within a cluster are considered equivalent, while differences are observed across clusters. Ranking is achieved through bootstrap confidence intervals for K-Means centroids, enabling both estimation of the optimal number of clusters and ranking within and across equivalence classes. Building on the integration of latent structure and clustering, we introduce the Fuzzy Reduced K-Means. Unlike standard fuzzy clustering, this model incorporates dimensionality reduction into the clustering process, unveiling the latent dimensions that shape the data structure while permitting observations to belong to multiple clusters with varying degrees of membership. Such flexibility is crucial for capturing the overlapping and multifaceted nature of real-world phenomena. Expanding upon this idea, the Generalized Reduced K-Means model is presented. It extends the Reduced K-Means framework by allowing different clusters to be associated with distinct latent subspaces. This model is particularly suited to scenarios where variables contribute differently to different subgroups. Attention then shifts to hierarchical modeling of latent structures. A foundational method for achieving such hierarchies is Structural Equation Modeling, used in a minor work to model complex interrelationships among multiple air pollutants and their determinants. Building on this, the innovative Ultrametric Factor Analysis model is introduced. It reconstructs the correlation matrix of the manifest variables through a nested hierarchy of latent factors, forming an ultrametric tree.
Unlike traditional factor analysis, this approach uncovers a unique and interpretable latent hierarchy, offering deep insights into the underlying dimensions. Several works address the three-way setting. The Tucker3 model has been applied to socio-economic data and has been extended to a disjoint, interpretable version. Moreover, a fuzzy entropic Triple K-Means model has been proposed. Briefly summarized in the Introduction, a third work simultaneously integrates clustering and latent variable modeling, softly partitioning the occasion mode of a unit-by-variable-by-occasion array into K clusters and producing K consensus matrices. Each consensus matrix is analyzed with a Second-Order Disjoint Factor Analysis to extract first-order factors and a single General Factor, providing a compact, interpretable representation of the J variables within each cluster. The fourth work on three-way data, the one illustrated in most detail in this thesis, explores clustering each of the three modes of a data array within a robustness framework. It introduces dimension-wise and cell-wise robust extensions of the Triple K-Means algorithm, allowing for outlier detection and trimming at various levels: entire units, variables, or occasions (dimension-wise), or individual data cells (cell-wise). These methods, based on a trimmed Alternating Least-Squares optimization framework, incorporate a data-driven selection mechanism for outlier detection using the elbow method applied to second-order derivatives. Collectively, these ten studies contribute a cohesive toolkit for unsupervised learning in high-dimensional, noisy, and complex data. The proposed approaches are broadly applicable across fields where complexity and multidimensionality are the norm, offering powerful solutions for extracting insights from intricate data landscapes.
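
As a rough illustration of the idea behind ranking clusters through bootstrap confidence intervals for K-Means centroids in a univariate setting, the following is a minimal sketch only, not the procedure developed in the thesis. The function names, the quantile-based initialization, the percentile intervals, and the non-overlap check are all assumptions introduced here for illustration.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=100):
    """Plain Lloyd iterations on univariate data; returns centroids sorted ascending."""
    # Assumed deterministic start: centroids at evenly spaced sample quantiles.
    centroids = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each unit to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return np.sort(centroids)

def bootstrap_centroid_intervals(x, k, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the k ordered centroids (illustrative choice)."""
    rng = np.random.default_rng(seed)
    boot = np.empty((n_boot, k))
    for b in range(n_boot):
        # Resample units with replacement and refit K-Means on the resample.
        sample = rng.choice(x, size=x.size, replace=True)
        boot[b] = kmeans_1d(sample, k)
    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lower, upper

def intervals_are_separated(lower, upper):
    """True when each interval lies strictly below the next one (no overlap)."""
    return bool(np.all(upper[:-1] < lower[1:]))
```

In this sketch, non-overlapping intervals for consecutive ordered centroids are read as evidence that the corresponding clusters occupy distinct ranks. How the thesis actually selects the maximum number of distinct clusters is not specified in this abstract.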

Research products

11573/1758722 - 2026 - Three-way data analysis with explainable Tucker3 clustering (XT3Clus)
Bottazzi Schenone, Mariaelena; Iannaccio, Tiziano; Mozzetta, Ilaria; Vichi, Maurizio - 01a Articolo in rivista
paper: APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY ([Chichester]: John Wiley & Sons, ©1999-) pp. 1-16 - issn: 1526-4025 - wos: WOS:001695339600005 (0) - scopus: (0)

11573/1762113 - 2026 - Mapping potential natural vegetation through a quantitative and multivariate approach. The case study of Campobasso Functional Urban Area (Italy)
Montaldi, Alessandro; Stanisci, Angela; Varricchione, Marco; Carla De Francesco, Maria; Paura, Bruno; Antonietta Santoianni, Lucia; Ciaramella, Dario; Delchiaro, Michele; Pica, Alessia; Del Monte, Maurizio; Bottazzi Schenone, Mariaelena; Del Vico, Eva; Capotorti, Giulia - 01a Articolo in rivista
paper: GLOBAL ECOLOGY AND CONSERVATION (Amsterdam : Elsevier) pp. - - issn: 2351-9894 - wos: WOS:001721238800001 (0) - scopus: 2-s2.0-105033285205 (0)

11573/1756309 - 2025 - Fuzzy clustering and dimensionality reduction of a three-way data matrix
Bombelli, Ilaria; Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 01a Articolo in rivista
paper: STATISTICS (Amsterdam: Gordon and Breach) pp. 1-34 - issn: 1029-4910 - wos: WOS:001619657500001 (0) - scopus: 2-s2.0-105022711114 (0)

11573/1735673 - 2025 - Ultrametric factor analysis for building hierarchies of reliable and unidimensional latent concepts
Bottazzi Schenone, Mariaelena; Cavicchia, Carlo; Vichi, Maurizio; Zaccaria, Giorgia - 01a Articolo in rivista
paper: PSYCHOMETRIKA (Cambridge: Cambridge University Press New York: Springer) pp. 1-20 - issn: 1860-0980 - wos: (0) - scopus: (0)

11573/1733118 - 2025 - A Latent Curve Model to Estimate the Evolution of Urban Air Pollution
Bottazzi Schenone, Mariaelena; Grimaccia, Elena - 04b Atto di convegno in volume
conference: SIS 2024 - The 52nd Scientific Meeting of the Italian Statistical Society (Bari)
book: Methodological and applied statistics and demography III - (9783031644313)

11573/1744954 - 2025 - A novel clustering method with maximum number of ordered centroids and stable clusters for optimal ranking in a univariate setting
Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio - 01a Articolo in rivista
paper: STATISTICAL METHODS & APPLICATIONS (Physica-Verlag, Berlin) pp. - - issn: 1618-2510 - wos: (0) - scopus: (0)

11573/1742163 - 2025 - City quality of life by advanced tensor analysis
Bottazzi Schenone, Mariaelena; Iannaccio, Tiziano; Mozzetta, Ilaria; Vichi, Maurizio - 04b Atto di convegno in volume
conference: SIS 2025 - Statistics for Innovation (Genova)
book: Statistics for Innovation II, Short Papers, Contributed Sessions 1 - (9783031963025)

11573/1732297 - 2025 - Generalized Reduced K–Means
Bottazzi Schenone, Mariaelena; Rocci, Roberto; Vichi, Maurizio - 01a Articolo in rivista
paper: COMPUTATIONAL STATISTICS (Heidelberg: Physica-Verlag) pp. 1-26 - issn: 0943-4062 - wos: WOS:001387192800001 (1) - scopus: 2-s2.0-85213722921 (2)

11573/1736395 - 2025 - Uncovering socioeconomic disparities in European regions: a Tucker 3 clustering approach
Bottazzi Schenone, Mariaelena; Tomaselli, Venera; Vichi, Maurizio - 01a Articolo in rivista
paper: QUALITY & QUANTITY (Dordrecht; London; Boston; Amsterdam: Springer Nature Dordrecht; London; Boston; Amsterdam: Kluwer Academic Publishers; Elsevier Scientific) pp. 1- - issn: 0033-5177 - wos: (0) - scopus: 2-s2.0-105001994613 (2)

11573/1734403 - 2025 - Integrating fuzzy clustering and dimensionality reduction for enhanced analysis of large datasets on high-dimensional social phenomena
Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 04d Abstract in atti di convegno
conference: DSSR 2025 - Towards a holistic understanding of society: bridging Social Sciences, Statistics and Computational Sciences (Pescara)
book: Book of Abstracts: Data Science & Social Research (DSSR 2025) - (9781326620653)

11573/1742166 - 2025 - Clustering Hierarchical Disjoint Principal Component Analysis for environmental impact assessment of food products
Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 04b Atto di convegno in volume
conference: IES 2025 (Bressanone)
book: Innovation & Society: Statistics and Data Science for Evaluation and Quality - (9788854958494)

11573/1743382 - 2025 - Clustering for ranking multivariate data by Linear Ordered Partitions
Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 01a Articolo in rivista
paper: ASTA ADVANCES IN STATISTICAL ANALYSIS (Heidelberg ; Berlin : Springer) pp. 1-32 - issn: 1863-8171 - wos: WOS:001532143300001 (0) - scopus: 2-s2.0-105011182018 (0)

11573/1743882 - 2025 - Fuzzy Reduced K-Means for analyzing high-dimensional social phenomena
Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 01a Articolo in rivista
paper: ANNALS OF OPERATIONS RESEARCH (Switzerland: Springer Nature Dordrecht : Kluwer) pp. 1-25 - issn: 1572-9338 - wos: WOS:001536025500001 (0) - scopus: 2-s2.0-105011621996 (0)

11573/1699856 - 2024 - Structural equation models for simultaneous modeling of air pollutants
Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio - 01a Articolo in rivista
paper: ENVIRONMETRICS (Bognor Regis: John Wiley & Sons Limited) pp. 1-22 - issn: 1180-4009 - wos: WOS:001142082500001 (4) - scopus: 2-s2.0-85182166566 (4)

11573/1710416 - 2024 - Optimal number of clusters to rank a model-based index
Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio - 04b Atto di convegno in volume
conference: Conference of European Statistics Stakeholders (CESS) - 2022 (Roma)
book: High-quality and timely statistics - (978-3-031-63629-5)

11573/1739927 - 2024 - Advances in fuzzy clustering and dimensionality reduction
Bottazzi Schenone, Mariaelena; Vichi, Maurizio - 04d Abstract in atti di convegno
conference: MBC2 2024 - Models and Learning in Clustering and Classification 7th International Workshop (Catania)
book: Book of Abstracts: models and learning in clustering and classification 7th international workshop (2024) - (9791221067842)

11573/1692708 - 2023 - Computational assessment of k-means clustering on a Structural Equation Model based index
Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio - 04b Atto di convegno in volume
conference: SIS 2023 – Statistical Learning, Sustainability and Impact Evaluation (Ancona)
book: Book of short papers SIS 2023 - (9788891935618)

11573/1692718 - 2023 - Assessing environmental quality by clustering a structural equation model based index
Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio - 04b Atto di convegno in volume
conference: IES 2023 – Statistical methods for evaluation and quality: techniques, technologies, trends (Pescara)
book: Book of short papers IES 2023 - (9791280333698)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma