Algorithmic Frontiers of Big Data
Instructors: Chris Schwiegelshohn and Francesco d'Amore
Duration: 20 hours
When: 27-28-29/10 from 10:00 to 13:30, 3-4-5-6-7/11 from 10:30 to 12:30
Where: Room B203, DIAG, Via Ariosto 25
Abstract
Large and high-dimensional data sets are a staple of modern data analysis. There are two approaches to addressing this challenge. To make existing algorithms viable, we can reduce the size and dimensionality of the data as much as possible. Alternatively, we can use different computational models, as studied in distributed or streaming algorithms, to mitigate the limitations of our existing hardware. In this PhD class, we will offer an introduction to both of these directions.
In the first part of the course, we will study sparsification techniques for memory-efficient algorithms. The techniques we introduce, such as random projections, PCA, and sensitivity sampling, also lend themselves directly to data analysis, which we will cover as well.
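As a taste of the first topic, here is a minimal sketch (illustrative only, not course material) of a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma: projecting n points into roughly log(n) / eps^2 dimensions approximately preserves all pairwise distances.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project n points in d dimensions down to k dimensions.

    By the Johnson-Lindenstrauss lemma, k = O(log(n) / eps^2)
    suffices to preserve all pairwise distances up to a factor
    (1 +/- eps) with high probability.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Gaussian projection matrix, scaled so that squared norms
    # are preserved in expectation.
    P = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ P

# 1000 points in 10,000 dimensions, reduced to 200 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10_000))
Y = random_projection(X, k=200)

# Spot-check the distortion of one pairwise distance.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"original: {orig:.1f}  projected: {proj:.1f}")
```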
The second part of the course focuses on distributed computing, a concept that is at the core of modern computational infrastructures—ranging from the internet and data centers to blockchain networks. We will present the LOCAL model of computation (Linial, FOCS '87), a synchronous distributed model in which the cost of an algorithm is measured by the number of communication rounds, under the assumption of unlimited local computation and unbounded message size. The LOCAL model captures the following fundamental question: how far must information propagate in a network to solve a given task? We will introduce the basic concepts of distributed algorithms and the principles of the LOCAL model, review classical and more modern complexity results, and explore recent developments in its quantum and even “super-quantum” extensions.
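For intuition about the LOCAL model, the following toy simulation (an illustration of the model, not an algorithm from the course) makes the round-complexity question concrete: in each synchronous round every node exchanges its entire state with its neighbours, so after r rounds a node can know everything within distance r of it.

```python
def local_round(graph, states):
    """One synchronous round in the LOCAL model.

    graph:  adjacency dict {node: list of neighbours}
    states: dict {node: current local state}
    Every node sends its whole state to all neighbours (message
    size is unbounded), then updates based on what it received.
    """
    inbox = {v: tuple(states[u] for u in graph[v]) for v in graph}
    return {v: (states[v], inbox[v]) for v in graph}

# A 4-cycle with unique IDs as initial states. After r rounds,
# each node's state encodes its entire radius-r neighbourhood:
# round complexity measures how far information must travel.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
states = {v: v for v in graph}
for _ in range(2):
    states = local_round(graph, states)
print(states[0])  # node 0 now "sees" everything within distance 2
```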
The course is aimed at PhD students and is self-contained; no particular background is necessary beyond familiarity with basic linear algebra and probability theory.
Short Bios
Chris Schwiegelshohn is an associate professor at Aarhus University who is nostalgic about his time as part of the Sapienza faculty. He works mainly in algorithms, with a focus on approximation algorithms and learning theory.
Francesco d’Amore is a postdoctoral researcher at the Gran Sasso Science Institute (GSSI). His research primarily focuses on the theory of distributed computation. He received his PhD from Université Côte d’Azur and Inria, and later held postdoctoral positions at Aalto University and Bocconi University, before joining GSSI.
---
Physics-Informed Statistical Learning for Spatial and Functional Data
Abstract. This course offers an introduction to a family of physics-informed statistical learning methods designed for spatial and functional data. These models build upon nonparametric and semiparametric regression frameworks with roughness penalties. The penalties incorporate differential operators—ranging from simple second derivatives to more complex Partial Differential Equations—encoding the physics of the underlying phenomena, and complying with the geometry of the domain over which the data are observed. The methods can handle spatial and spatio-temporal data, as well as functional data, observed over multidimensional domains that can have complex shapes, such as non-convex planar regions, curved surfaces, irregular volumes, and linear networks. Moreover, the use of unstructured mesh discretization endows the methods with high flexibility, enabling the capture of highly localized signals, strong anisotropies, and non-stationary patterns.
The course will explore these methods through real-world applications in environmental and life sciences, demonstrating their effectiveness in modeling intricate spatial and functional data structures. Practical lab sessions will utilize the Python package fdaPDE.
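As a minimal preview of the roughness-penalty idea, the sketch below (plain numpy, a 1-D second-derivative penalty; the course's methods handle full PDE penalties on complex multidimensional domains via fdaPDE) smooths noisy observations by penalised least squares.

```python
import numpy as np

def smooth_1d(y, lam):
    """Roughness-penalised smoothing on a regular 1-D grid.

    Minimises ||y - f||^2 + lam * ||D2 f||^2, where D2 is the
    discrete second derivative: the simplest instance of a
    differential-operator penalty.
    """
    n = len(y)
    # (n-2) x n second-difference matrix: rows are [1, -2, 1].
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Closed-form solution of the penalised least-squares problem.
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, y)

# Noisy samples of a smooth signal.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=x.size)
f_hat = smooth_1d(y, lam=50.0)  # larger lam => smoother estimate
```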
When: February 2026
Where:
Zoom – id: 867 5387 8695 – code: 627930 [link]
Sapienza University of Rome
Department of Computer, Control and Management Engineering (DIAG)
Via Ariosto, 25, 00185 Roma
Meeting room B101 – 1st floor
Schedule
09.00 – 10.00 | Session 1
10.00 – 10.20 | break
10.20 – 11.20 | Session 2
11.20 – 11.40 | break
11.40 – 12.40 | Session 3
12.40 – 14.20 | break
14.20 – 15.20 | Session 4
---
Title: Federated Learning - from data harmonization to federated AI training
Abstract:
This hackathon introduces participants to the process of developing and training AI models in a distributed, privacy-compliant research environment. During the event, participants will gain hands-on experience with cutting-edge federated learning technologies. They will learn how to build a decentralized infrastructure and harmonize heterogeneous clinical datasets using ontology-based methods. Participants will also build and operate a real federated network based on insights from the Horizon Europe projects dAIbetes (grant agreement number 101136305) and Microb-AI-ome (grant agreement number 101079777). Participants will further learn how secure data exchange enables cross-institutional collaboration. Finally, participants will train machine learning models on harmonized, clinic-local data without centralizing sensitive information. By the end of the hackathon, participants will have a solid understanding of the technical fundamentals, challenges, and research potential of federated AI for biomedical applications, as well as a working end-to-end federated learning workflow.
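To fix ideas, here is a minimal federated-averaging sketch in plain numpy (illustrative only; the hackathon builds a real federated network with its own infrastructure). Each simulated client trains locally, and only model weights are aggregated:

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One local gradient step of linear least squares at a client."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg_round(w_global, clients):
    """One round of federated averaging (FedAvg, McMahan et al. 2017).

    Each client trains on its own data; only the updated weights
    (never the raw data) are sent back and averaged, weighted by
    client dataset size.
    """
    updates = [local_step(w_global.copy(), X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

# Two simulated "clinics" holding private data from the same model.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for n in (100, 300):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(200):
    w = fedavg_round(w, clients)
print(w)  # converges towards w_true without pooling the data
```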
Lecturers: Simon Süwer and Julian Klemm
Short bio for Simon:
Simon has been a PhD student at CoSyBio since 2024, working on the Horizon Europe project dAIbetes. He completed a Bachelor of Science in Applied Computer Science at the University of Applied Sciences and Arts Hannover, followed by a Master of Science in Computer Science at the University of Vienna, specialising in Data Science. In his master's thesis he worked on combining session- and sequence-based recommender systems in a dynamic Graph Neural Network (GNN), in particular on the development of hierarchical dynamic GNNs.
Simon's current research aims at developing privacy-preserving tools that make federated collaboration not only easier, but almost effortless. By merging theory and practice, his goal is to create innovative solutions to complex challenges that redefine the way we think about data collaboration and privacy.
Short bio for Julian:
Julian has been a PhD student at CoSyBio since April 2023 as part of the Horizon Europe projects FeatureCloud and Microb-AI-ome. He received a Bachelor of Science in Biology as well as a Master of Science in Bioinformatics from the University of Hamburg. In his master's thesis he focused on privacy-enhancing techniques for federated learning, especially differential privacy, while in his PhD his focus is on a federated data warehouse that allows easier data sharing in a privacy-preserving manner. Generally, Julian is focused on building privacy-preserving tools for almost effortless federated collaboration.
---
CINECA & NVIDIA PhD Course: High-Performance Distributed Computing and Inference Optimizations
Dates: April 14-15, 2026
Program:
Day 1 - April 14, 2026
9:30-12:30: Introduction to high-performance and distributed computing (Room 3.01)
14:00-17:00: Leonardo cluster 101 [hands-on] - from login to running distributed training (Room 2.01)
Day 2 - April 15, 2026
9:30-12:30 & 14:00-17:00: Optimize your model for faster inference - overview of inference optimizations and practical exercises on quantization, KV-cache compression for LLMs and diffusion models with NVIDIA Model-Optimizer on Leonardo (Room 3.01)
Location: Building D, Complesso Regina Elena, Viale Regina Elena, 295
Instructors:
Andrea Pilzer (NVIDIA) - Andrea Pilzer is a Solution Architect at NVIDIA leading the NVIDIA AI Technology Center in Italy, where he focuses on supporting researchers on HPC clusters and NVIDIA technology adoption. His main interests are deep learning, video processing, VLMs, and uncertainty estimation. He was a postdoc at Aalto University working on uncertainty estimation for deep learning, worked at Huawei Ireland, and received his Ph.D. in CS from the University of Trento, working with Nicu Sebe and Elisa Ricci.
Sergio Orlandini (CINECA)
The course will include hands-on exercises using GPUs on the CINECA Leonardo cluster, providing you with practical experience in high-performance computing and modern inference optimization techniques.
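As a taste of the Day 2 material, the following is a minimal, library-free sketch of symmetric int8 weight quantization (NVIDIA Model-Optimizer, used in the course, implements far more sophisticated schemes, including the KV-cache compression mentioned above):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization.

    Weights are stored as int8 plus a single float scale, so
    dequantisation is one multiply. Production toolkits add
    calibration, per-channel scales, activation quantization, etc.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max abs round-trip error: {err:.4f}")  # roughly scale / 2
```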
---
Statistical Physics for Machine Learning
25-29 May
Phase Transitions, Inference, and High-Dimensional Landscapes
PhD Module · 5 Days × 4 Hours · Lecture | Whiteboard | Exercises & Open Problems
Abstract
Why do learning algorithms succeed — and when must they fail? This course builds a unified answer by developing the deep connections between statistical physics and machine learning. Rather than treating the two fields as adjacent disciplines, we show that they are asking the same fundamental questions: how do we learn models of complex systems from observations, and what are the fundamental limits of this process?
The course begins from first principles — entropy, free energy, and Bayesian inference — and establishes a common language in which maximum likelihood estimation, regularisation, and posterior inference all emerge naturally as instances of statistical mechanics. We derive and classify the phase transitions that govern the difficulty of learning: sharp boundaries between regimes where inference is information-theoretically impossible, computationally hard but statistically feasible, and efficiently solvable. These transitions are not artefacts of specific algorithms but intrinsic features of the problem geometry, reflecting thermodynamic phase transitions.
Central to the course is an introduction to disordered systems—the random-field Ising model, the Sherrington–Kirkpatrick spin glass, the p-spin model, and the random k-satisfiability problem—through which we develop the replica method, the cavity method, and mean-field theory to analyse the geometry of rugged energy landscapes. We further introduce the theory of random matrices—the semicircle law, the Stieltjes transform, and the BBP transition—as a complementary toolkit for understanding high-dimensional data. These tools are then deployed on canonical problems in modern machine learning: spectral methods and detectability transitions in PCA and community detection; storage capacity, memorisation, and retrieval in Hopfield networks and their modern exponential-capacity generalisations; the role of the Hessian in loss landscape geometry, gradient descent dynamics, and the edge of stability; and finally the statistical physics of generative diffusion models, where speciation and collapse emerge as counterparts of symmetry breaking and condensation in a glass phase.
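As one self-contained example of the random-matrix toolkit, the short numerical check below (an illustration, not part of the lecture notes) shows Wigner's semicircle law emerging from a single large random symmetric matrix:

```python
import numpy as np

# Eigenvalue density of a large random symmetric matrix with
# i.i.d. N(0, 1/n) off-diagonal entries converges to the
# semicircle rho(x) = sqrt(4 - x^2) / (2 pi) supported on [-2, 2].
n = 2000
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
H = (A + A.T) / np.sqrt(2 * n)  # symmetrise; entries have variance 1/n
eig = np.linalg.eigvalsh(H)

hist, edges = np.histogram(eig, bins=50, range=(-2.2, 2.2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(np.clip(4 - centers**2, 0, None)) / (2 * np.pi)
print(f"max deviation from semicircle: {np.abs(hist - semicircle).max():.3f}")
```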
Throughout, results are derived from first principles, making them tangible rather than axiomatic. The course closes by connecting these tools to open problems at the frontier of AI research: the geometry of overparameterised networks, implicit regularisation, and the fundamental limits of high-dimensional generation.
References
Amit D J (1989). Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press.
Baik J, Ben Arous G & Péché S (2005). Phase transition of the largest eigenvalue for non-null complex sample covariance matrices. Annals of Probability, 33(5), 1643–1697.
Barbier J (2025). Mean-Field Theory of High-Dimensional Bayesian Inference, through the Lens of Spiked Matrix Estimation. ICTP Lecture Notes.
Barucca P (2017). Spectral partitioning in equitable graphs. Physical Review E, 95, 062310.
Barucca P (2026). Lecture Notes for Statistical Physics for Machine Learning.
Biroli G, Bonnaire T, de Bortoli V & Mézard M (2024). Dynamical regimes of diffusion models. Nature Communications, 15, 9957.
Bonnaire T, Urfin R, Biroli G & Mézard M (2025). Why diffusion models don't memorize: the role of implicit dynamical regularization in training. NeurIPS 2025 (Best Paper Award).
Bouchaud J-P & Potters M (2022). A First Course in Random Matrix Theory. Cambridge University Press.
Castellani T & Cavagna A (2005). Spin-glass theory for pedestrians. Journal of Statistical Mechanics, P05012.
Charbonneau P, Marinari E, Mézard M, Parisi G, Ricci-Tersenghi F, Sicuro G & Zamponi F, eds. (2023). Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years. World Scientific.
Ferreira L S, Metz F L & Barucca P (2025). Random matrix ensemble for the covariance matrix of Ornstein–Uhlenbeck processes with heterogeneous temperatures. Physical Review E, 111, 014151.
Livan G, Novaes M & Vivo P (2018). Introduction to Random Matrices: Theory and Practice. Springer.
Mézard M & Montanari A (2009). Information, Physics, and Computation. Oxford University Press.
Mézard M, Parisi G & Virasoro M A (1987). Spin Glass Theory and Beyond. World Scientific.
Zdeborová L & Krzakala F (2016). Statistical physics of inference: thresholds and algorithms. Advances in Physics, 65(5), 453–552.
About the Instructor
Paolo Barucca is Associate Professor in the Department of Computer Science at University College London. He holds a PhD in Theoretical Physics from Sapienza University of Rome, where he developed his early research in the statistical physics of disordered systems. His work sits at the intersection of statistical physics, random matrix theory, network theory, and machine learning, with applications to complex systems in economics and finance. His research addresses questions of stability, inference, and predictability in high-dimensional and complex systems.
Contact: p.barucca@ucl.ac.uk · Web: paolobarucca.com
Prerequisites: Linear algebra, probability theory, and basic calculus. No prior knowledge of statistical physics or spin glasses is required.