STEFANO D'ARRIGO

PhD Graduate

PhD program: XXXVIII


Supervisor: Fabio Galasso

Thesis title: Video Anomaly Detection: Ensuring the Safety of Human Actions and Street Scenes

Artificial Intelligence, particularly Computer Vision, holds immense potential to enhance human safety and advance society's digital transition. This thesis addresses the challenges of developing robust and efficient AI for complex, human-centered tasks, spanning from behavior monitoring to driving scenes. We analyze the task of Video Anomaly Detection and its related applications in human action monitoring, crowd occupancy estimation, and out-of-distribution detection in street scenes. For human action monitoring, we propose two novel methods. COSKAD demonstrates the critical impact of latent space geometry on learning representations of expected human actions, proving that low-dimensional vectors can effectively embed complex spatio-temporal dependencies. MoCoDAD advances this by estimating the latent distribution of human motion, leveraging an action's inherent variability to robustly distinguish normal from abnormal behavior. Shifting from individual to group dynamics, STEERER-V introduces a method to precisely estimate a crowd's space occupancy, and by proxy its weight, directly from 2D RGB images. This approach bypasses computationally expensive intermediate steps and is accompanied by ANTHROPOS-V, a new benchmark to spur further research in this domain. Finally, to enhance the reliability of self-driving systems, CMS-OoD presents a cross-modal steering technique. It efficiently adapts a large Vision-Language Model to condition a semantic segmentation task model, significantly improving OOD detection. As a key benefit, this method also generates grounded textual explanations of the observed scene, fostering safer, more interpretable human-vehicle interaction. Collectively, these contributions demonstrate that through geometric priors, distributional assumptions, or cross-modal conditioning, we can develop AI systems that are more robust, efficient, and better aligned with human needs in complex environments.

Research products

11573/1741969 - 2025 - ANTHROPOS-V: Benchmarking the Novel Task of Crowd Volume Estimation
Collorone, Luca; D'Arrigo, Stefano; Pappa, Massimiliano; D'Amely Di Melendugno, Guido M.; Ficarra, Giovanni; Galasso, Fabio - 04b Conference paper in volume
conference: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 (Tucson, AZ, USA)
book: Proceedings of the 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 - (979-8-3315-1083-1)

11573/1757985 - 2025 - HierVision: Standardized and Reproducible Hierarchical Sources for Vision Datasets
Kasarla, Tejaswi; Hulikal Rooparaghunath, Ruthu; D'Arrigo, Stefano; Mago, Gowreesh; Jha, Abhishek; Ayoughi, Melika; Shreya Mishra, Swasti; Manzano Rodríguez, Ana; Long, Teng; Ghadimi Atigh, Mina; Van Spengler, Max; Mettes, Pascal - 04b Conference paper in volume
conference: IEEE International Conference on Computer Vision (Honolulu, Hawaii, USA)
book: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops

11573/1726559 - 2024 - Contracting skeletal kinematics for human-related video anomaly detection
Flaborea, Alessandro; D'Amely Di Melendugno, Guido Maria; D'Arrigo, Stefano; Sterpa, Marco Aurelio; Sampieri, Alessio; Galasso, Fabio - 01a Journal article
paper: PATTERN RECOGNITION (Elsevier Science Limited) - issn: 0031-3203 - wos: WOS:001291491800001 (18) - scopus: 2-s2.0-85200884155 (21)

11573/1699647 - 2023 - Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Flaborea, Alessandro; Collorone, Luca; D'Amely Di Melendugno, Guido Maria; D'Arrigo, Stefano; Prenkaj, Bardh; Galasso, Fabio - 04b Conference paper in volume
conference: IEEE/CVF International Conference on Computer Vision 2023 (Paris, France)
book: Proceedings of the IEEE/CVF International Conference on Computer Vision - (979-8-3503-0718-4)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma