DANIEL MAURICIO JIMENEZ GUTIERREZ

PhD graduate

cycle: XXXVIII


supervisor: Andrea Vitaletti
co-supervisor: Ioannis Chatzigiannakis

Thesis title: Toward Robust and Fair Personalized Federated Learning under Quantified Non-IID Data

In the landscape of the Fourth Industrial Revolution, information flows across institutions (banks, hospitals, public agencies) and IoT devices such as smartphones, wearables, home assistants, and connected vehicles. However, centralizing data to build machine learning (ML) models is often constrained by privacy laws, regulatory requirements, bandwidth, and governance. Federated Learning (FL) enables collaborative model training without centralizing data, but real deployments face non-independent and identically distributed (non-IID) data, which slows or destabilizes convergence and harms the performance of both global and per-client models. This thesis develops a metrics-driven, fairness-aware approach to robust (personalized) FL under non-IID conditions. First, it presents FedArtML, a toolkit that partitions centralized datasets into federated clients with tunable levels of non-IID data across label, attribute (feature), quantity, and spatiotemporal skews, and includes metrics to quantify the level of non-IID data. Using FedArtML, a large-scale empirical study over eight datasets shows that label and spatiotemporal skews are the most damaging, while attribute and quantity skews are comparatively less harmful. For label skew, the study identifies two degradation regimes, at HD≈0.5 and HD≥0.75, where accuracy drops accelerate and the number of rounds needed to reach a target accuracy increases; transfer-learning models are particularly sensitive. Building on these findings, the thesis proposes three mitigation methods. FedLECC combines client clustering with loss-aware selection, improving accuracy while reducing communication overhead (up to 15× vs. FedAvg) and selecting roughly 20% of clients without loss of performance. PSI-PFL adapts the Population Stability Index (PSI) to label skew with data-driven thresholds; its weighted variant (WPSI) is more discriminative than other state-of-the-art metrics and yields higher global accuracy and stronger client fairness.
Clust-PSI-PFL forms PSI-guided client clusters and trains cluster-specific models, consistently improving global accuracy and reducing local disparity (up to 37% average distance reduction) across Dirichlet and Similarity partitions. Two real-world case studies validate feasibility and document performance/runtime trade-offs under IID and non-IID regimes: (i) multi-hospital 12-lead ECG arrhythmia classification, and (ii) a public-administration chatbot leveraging large language models (LLMs). Overall, the results demonstrate that quantifying, characterizing, and aligning selection/personalization to measured non-IID data produces more robust and fair FL.
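The idea of quantifying label skew before choosing a mitigation strategy can be illustrated with a minimal Python sketch. It is not FedArtML's API or the thesis's exact formulation: the function names are hypothetical, HD is read here as the Hellinger distance (an assumption), the PSI shown is the standard unweighted formula rather than WPSI, and the Dirichlet partition is a common generic recipe for simulating label skew.

```python
import numpy as np


def label_distribution(labels, num_classes):
    """Empirical class frequencies of a non-empty integer label array."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def psi(expected, actual, eps=1e-6):
    """Standard (unweighted) Population Stability Index.

    Probabilities are clipped at eps (an illustrative choice) so that
    classes missing from a client do not produce log(0).
    """
    e = np.clip(expected, eps, None)
    a = np.clip(actual, eps, None)
    e, a = e / e.sum(), a / a.sum()
    return float(np.sum((a - e) * np.log(a / e)))


def dirichlet_partition(labels, num_clients, alpha, num_classes, rng):
    """Split sample indices across clients, class by class.

    For each class, client shares are drawn from Dirichlet(alpha);
    a smaller alpha yields a stronger label skew.
    """
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


def mean_skew(labels, parts, num_classes):
    """Average HD and PSI of non-empty clients against the global labels."""
    global_dist = label_distribution(labels, num_classes)
    hds, psis = [], []
    for part in parts:
        if not part:  # a client may receive no samples when alpha is small
            continue
        local = label_distribution(labels[np.asarray(part, dtype=int)], num_classes)
        hds.append(hellinger_distance(global_dist, local))
        psis.append(psi(global_dist, local))
    return float(np.mean(hds)), float(np.mean(psis))


rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)  # synthetic labels, 10 classes
for alpha in (100.0, 0.1):
    parts = dirichlet_partition(labels, num_clients=10, alpha=alpha,
                                num_classes=10, rng=rng)
    hd, p = mean_skew(labels, parts, num_classes=10)
    print(f"alpha={alpha}: mean HD={hd:.3f}, mean PSI={p:.3f}")
```

In this sketch a large alpha gives clients whose label mix is close to the global one (both metrics near zero), while a small alpha concentrates each class on a few clients and raises both metrics. A per-client score of this kind is the sort of signal that could feed client selection or clustering.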

Scientific output

11573/1741911 - 2025 - Graph neural networks to model and optimize the operation of water distribution networks. A review
Vittori, Giacomo; Falkouskaya, Yelizaveta; Jimenez Gutierrez, Daniel Mauricio; Cattai, Tiziana; Chatzigiannakis, Ioannis - 01a Journal article
journal: JOURNAL OF INDUSTRIAL INFORMATION INTEGRATION (Amsterdam: Elsevier B.V.) pp. 1-22 - issn: 2467-964X - wos: WOS:001522355700001 (3) - scopus: 2-s2.0-105009077261 (7)

11573/1698341 - 2024 - Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals
Jimenez Gutierrez, Daniel M; Hassan, Hafiz Muuhammad; Landi, Lorella; Vitaletti, A; Chatzigiannakis, Ioannis - 04b Conference paper in volume
conference: 8th International Symposium on Algorithmic Aspects of Cloud Computing, ALGOCLOUD 2023 (Amsterdam; The Netherlands)
book: Algorithmic Aspects of Cloud Computing - (978-3-031-49361-4; 978-3-031-49360-7)

11573/1711428 - 2024 - FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research
Jimenez, Daniel; Anagnostopoulos, Aris; Chatzigiannakis, Ioannis; Vitaletti, Andrea - 01a Journal article
journal: IEEE ACCESS (Piscataway NJ: Institute of Electrical and Electronics Engineers) pp. 81004-81016 - issn: 2169-3536 - wos: WOS:001249285400001 (8) - scopus: 2-s2.0-85195379558 (18)

11573/1698339 - 2024 - Olive Leaf Infection Detection Using the Cloud-Edge Continuum
Sarantakos, Themistoklis; Jimenez Gutierrez, Daniel Mauricio; Amaxilatis, Dimitrios - 04b Conference paper in volume
conference: 8th International Symposium on Algorithmic Aspects of Cloud Computing, ALGOCLOUD 2023 (Amsterdam)
book: Algorithmic Aspects of Cloud Computing - (978-3-031-49360-7; 978-3-031-49361-4)

11573/1670724 - 2023 - Device discovery and tracing in the Bluetooth Low Energy domain
Locatelli, Pierluigi; Perri, Massimo; Jimenez Gutierrez, Daniel Mauricio; Lacava, Andrea; Cuomo, Francesca - 01a Journal article
journal: COMPUTER COMMUNICATIONS (Oxford: Butterworth Heinemann Publishers) pp. 42-56 - issn: 0140-3664 - wos: WOS:000943958100001 (6) - scopus: 2-s2.0-85148327049 (8)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma