DONATO CRISOSTOMI

PhD graduate

cycle: XXXVII


co-supervisor: Prof. Emanuele Rodolà

Thesis title: Model Merging: Foundations and Algorithms

Modern deep learning typically treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies an alternative paradigm, model merging: combining independently trained neural networks into a single model directly in weight space, without access to additional training data and with little or no optimization. The thesis is organized around two regimes. In the single-task setting, where models share a common objective but differ in initialization, we introduce C2M3, a cycle-consistent merging algorithm grounded in Frank–Wolfe optimization. C2M3 aligns collections of networks into a shared parameter space that serves as a reference-free aggregation point, making weight averaging meaningful without privileging any one model as the anchor. In the multi-task setting, where models are fine-tuned for distinct downstream tasks, we first develop a theoretical account of task vectors, the parameter differences between a fine-tuned model and its pretrained initialization. We show that task vectors admit a gradient-based interpretation under standard assumptions, clarifying both the success and the limits of task arithmetic. This gradient view has a direct consequence: gradients are known to exhibit low-rank structure, and task vectors inherit this property. We formalize and exploit this low-rank structure through Task Singular Vectors (TSV), a decomposition that supports both model compression and interference reduction in TSV-Merge. We then present MASS, an input-adaptive routing mechanism that uses TSV geometry to direct inference through task-relevant subspaces. Finally, we introduce MERGE3, an evolutionary merging framework that incorporates Item Response Theory to reduce evaluation costs by up to 50× while preserving solution quality. 
Taken together, these contributions place model merging on firmer theoretical and algorithmic foundations, advancing a paradigm in which learned capabilities can be composed, reused, and extended across models.
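The definition of task vectors given in the abstract (the parameter difference between a fine-tuned model and its pretrained initialization) can be illustrated with a minimal NumPy sketch. This is not the TSV-Merge, MASS, or C2M3 algorithm from the thesis: it only shows plain task arithmetic with an optional truncated-SVD step that keeps the leading singular directions of each task vector. The function names, the dictionary-of-arrays model layout, the scaling factor `alpha`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def task_vector(finetuned, pretrained):
    """Per-layer difference between fine-tuned and pretrained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def low_rank(matrix, rank):
    """Best rank-`rank` approximation of a 2-D matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def merge(pretrained, task_vectors, alpha=1.0, rank=None):
    """Task arithmetic: pretrained + alpha * sum of (optionally truncated) task vectors."""
    merged = {}
    for name, base in pretrained.items():
        delta = np.zeros_like(base)
        for tv in task_vectors:
            update = tv[name]
            # Only 2-D weight matrices are truncated; other shapes pass through.
            if rank is not None and update.ndim == 2:
                update = low_rank(update, rank)
            delta += update
        merged[name] = base + alpha * delta
    return merged


# Toy data (hypothetical): one 8x6 layer and two fine-tuned copies whose
# updates are rank-1 by construction.
rng = np.random.default_rng(0)
pretrained = {"layer.weight": rng.normal(size=(8, 6))}
finetuned_a = {
    "layer.weight": pretrained["layer.weight"]
    + np.outer(rng.normal(size=8), rng.normal(size=6))
}
finetuned_b = {
    "layer.weight": pretrained["layer.weight"]
    + np.outer(rng.normal(size=8), rng.normal(size=6))
}

tv_a = task_vector(finetuned_a, pretrained)
tv_b = task_vector(finetuned_b, pretrained)
merged = merge(pretrained, [tv_a, tv_b], alpha=0.5, rank=1)
```

Because the toy updates are rank-1, truncating each task vector to rank 1 loses nothing here; with real fine-tuned weights the truncation would be an approximation whose quality depends on how quickly the singular values decay.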

Scientific output

11573/1763252 - 2026 - MASS: MoErging through Adaptive Subspace Selection
Crisostomi, Donato; Zirilli, Alessandro; Gargiulo, Antonio Andrea; Bucarelli, Maria Sofia; Scardapane, Simone; Silvestri, Fabrizio; Masi, Iacopo; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: International Conference on Learning Representations (ICLR) (Rio de Janeiro, Brazil)
book: International Conference on Learning Representations (ICLR)

11573/1763254 - 2026 - Implicit Inversion turns CLIP into a Decoder
D'Orazio, Antonio; Briglia, Maria Rosaria; Crisostomi, Donato; Loi, Dario; Rodolà, Emanuele; Masi, Iacopo - 04b Conference paper in proceedings
conference: International Conference on Learning Representations (ICLR) (Rio de Janeiro, Brazil)
book: International Conference on Learning Representations (ICLR)

11573/1751174 - 2025 - Task Singular Vectors: Reducing Task Interference in Model Merging
Gargiulo, Antonio Andrea; Crisostomi, Donato; Bucarelli, Maria Sofia; Scardapane, Simone; Silvestri, Fabrizio; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 (Nashville, Tennessee, USA)
book: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

11573/1750761 - 2025 - MERGE$^3$: Efficient Evolutionary Merging on Consumer-grade GPUs
Mencattini, Tommaso; Minut, Adrian Robert; Crisostomi, Donato; Santilli, Andrea; Rodolà, Emanuele - 04f Poster
conference: International Conference on Machine Learning (Vancouver, Canada)
book: Proceedings of Machine Learning Research

11573/1750756 - 2025 - Mergenetic: a Simple Evolutionary Model Merging Library
Minut, Adrian Robert; Mencattini, Tommaso; Santilli, Andrea; Crisostomi, Donato; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: Association for Computational Linguistics (Vienna, Austria)
book: System Demonstrations

11573/1726455 - 2024 - C2M3: Cycle-Consistent Multi-Model Merging
Crisostomi, Donato; Fumero, Marco; Baieri, Daniele; Bernard, Florian; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: Thirty-eighth Annual Conference on Neural Information Processing Systems (Vancouver, Canada)
book: Advances in Neural Information Processing Systems

11573/1698997 - 2023 - Mitigating the Burden of Redundant Datasets via Batch-Wise Unique Samples and Frequency-Aware Losses
Crisostomi, Donato; Caciolai, Andrea; Pedrani, Alessandro; Rottmann, Kay; Manzotti, Alessandro; Palumbo, Enrico; Bernardi, Davide - 04b Conference paper in proceedings
conference: The 61st Annual Meeting of the Association for Computational Linguistics (ACL) (Toronto)
book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

11573/1698996 - 2023 - AVEN-GR: Attribute value extraction and normalization using product graphs
Crisostomi, Donato; Ricatte, Thomas - 04b Conference paper in proceedings
conference: The 61st Annual Meeting of the Association for Computational Linguistics (ACL) (Toronto)
book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

11573/1621756 - 2022 - Few-Shot Object Detection: A Survey
Antonelli, Simone; Avola, Danilo; Cinque, Luigi; Crisostomi, Donato; Foresti, Gian Luca; Galasso, Fabio; Marini, Marco Raoul; Mecca, Alessio; Pannone, Daniele - 01a Journal article
journal: ACM Computing Surveys (ACM / Association for Computing Machinery) - issn: 0360-0300 - wos: WOS:000886929000020 (82) - scopus: 2-s2.0-85145187840 (113)

11573/1672156 - 2022 - Metric Based Few-Shot Graph Classification
Crisostomi, Donato; Antonelli, Simone; Maiorca, Valentino; Moschella, Luca; Marin, Riccardo; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: First Learning on Graphs Conference (Virtual Conference)
book: Proceedings of the First Learning on Graphs Conference

11573/1672304 - 2022 - Play música alegre: A Large-Scale Empirical Analysis of Cross-Lingual Phenomena in Voice Assistant Interactions
Crisostomi, Donato; Manzotti, Alessandro; Palumbo, Enrico; Bernardi, Davide; Campbell, Sarah; Garg, Shubham - 04b Conference paper in proceedings
conference: Massively Multilingual Natural Language Understanding Workshop (MMNLU-22) (Abu Dhabi, United Arab Emirates)
book: Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma