ANXHELO DIKO

PhD Graduate

PhD program: XXXVII


co-supervisor: Prof. Luigi Cinque

Thesis title: Perceiving in Time: See, Understand, Remember

Rooted in the remarkable capability of human visual cognition to navigate the real world effortlessly, this thesis spans from perception to memory, examining theoretically and empirically the fundamental concepts needed to realize an artificial visual intelligence: a machine with the capacity to autonomously perceive the world, understand semantics, reason in time, and remember.
The journey begins with perception. Vision Transformers, despite their success, suffer from feature collapse, a loss of spatial structure in deeper layers caused by over-globalized attention. ReViT addresses this through a simple yet effective residual attention mechanism that preserves spatial awareness, the very foundation of visual perception.
However, perception alone is insufficient. The real world unfolds through interconnected events defined by motion, interactions, and semantic patterns. This leads to S-GEAR, which tackles semantic understanding and temporal intelligence in the context of action anticipation, a challenge that goes beyond isolated actions and explores sequences, aiming to understand what has happened in order to predict what comes next. S-GEAR does so by explicitly embedding semantic relationships and temporal co-occurrence patterns into visual representations. Where existing anticipation methods treat events in isolation, S-GEAR recognizes that semantics constrain what can happen next, dramatically reducing future uncertainty.
The final piece is memory, the mechanism that enables us to recall past experiences when they are needed most. ReWind introduces a query-guided memory system that determines not just how to compress information, but what to remember and what to forget. By coordinating reading, writing, and selection mechanisms, ReWind enables visual models to process long videos efficiently while maintaining coherent and relevant information.
Together, these contributions advance the path toward machines that not only see, but also understand and remember.

Research products

11573/1732914 - 2025 - Semantically Guided Representation Learning for Action Anticipation
Diko, Anxhelo; Avola, Danilo; Prenkaj, Bardh; Fontana, Federico; Cinque, Luigi - 04b Conference paper in volume
conference: European Conference on Computer Vision (Milan)
book: 18th European Conference on Computer Vision, ECCV 2024 - (978-3-031-73390-1)

11573/1756578 - 2025 - ReWind: Understanding Long Videos with Instructed Learnable Memory
Diko, Anxhelo; Wang, Tinghuai; Swaileh, Wassim; Sun, Shiyan; Patras, Ioannis - 04b Conference paper in volume
conference: Computer Vision and Pattern Recognition (CVPR) (Nashville, Tennessee, United States)
book: IEEE Proceedings of Computer Vision and Pattern Recognition (CVPR) - ()

11573/1732586 - 2024 - ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections
Diko, Anxhelo; Avola, Danilo; Cascio, Marco; Cinque, Luigi - 01a Journal article
paper: PATTERN RECOGNITION (Kidlington, Oxford : Elsevier Science Limited) pp. 1-13 - issn: 0031-3203 - wos: WOS:001290647100001 (17) - scopus: 2-s2.0-85200520823 (20)

11573/1713996 - 2024 - Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
Lanzino, Romeo; Fontana, Federico; Diko, Anxhelo; Marini, Marco Raoul; Cinque, Luigi - 04b Conference paper in volume
conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Seattle; USA)
book: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - (979-8-3503-6547-4; 979-8-3503-6548-1)

11573/1723846 - 2024 - Evaluation of Rehabilitation Outcomes in Patients with Chronic Neurological Health Conditions Using a Machine Learning Approach
Santilli, Gabriele; Mangone, Massimiliano; Agostini, Francesco; Paoloni, Marco; Bernetti, Andrea; Diko, Anxhelo; Tognolo, Lucrezia; Coraci, Daniele; Vigevano, Federico; Vetrano, Mario; Vulpiani, Maria Chiara; Fiore, Pietro; Gimigliano, Francesca - 01a Journal article
paper: JOURNAL OF FUNCTIONAL MORPHOLOGY AND KINESIOLOGY (Switzerland : MDPI AG, Basel) pp. 1-19 - issn: 2411-5142 - wos: (0) - scopus: (0)

11573/1674631 - 2023 - COVID-19 therapy optimization by AI-driven biomechanical simulations
Agrimi, E; Diko, A; Carlotti, D; Ciardiello, A; Borthakur, M; Giagu, S; Melchionna, S; Voena, C - 01a Journal article
paper: THE EUROPEAN PHYSICAL JOURNAL PLUS (Heidelberg ; Berlin : Springer) pp. 1-10 - issn: 2190-5444 - wos: WOS:000941121400004 (2) - scopus: 2-s2.0-85149304590 (3)

11573/1696639 - 2023 - Real-time GAN-based model for underwater image enhancement
Avola, D.; Cannistraci, I.; Cascio, M.; Cinque, L.; Diko, A.; Distante, D.; Foresti, G. L.; Mecca, A.; Scagnetto, I. - 04b Conference paper in volume
conference: Proceedings of the 22nd International Conference on Image Analysis and Processing, ICIAP 2023 (Udine)
book: Image Analysis and Processing – ICIAP 2023 - (978-3-031-43147-0; 978-3-031-43148-7)

11573/1695340 - 2023 - A machine learning approach for knee injury detection from magnetic resonance imaging
Mangone, M.; Diko, A.; Giuliani, L.; Agostini, F.; Paoloni, M.; Bernetti, A.; Santilli, G.; Conti, M.; Savina, A.; Iudicelli, G.; Ottonello, C.; Santilli, V. - 01a Journal article
paper: INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (Basel: MDPI 2003-) pp. 1-11 - issn: 1660-4601 - wos: (0) - scopus: 2-s2.0-85163685598 (12)

11573/1691701 - 2023 - The use of machine learning for inferencing the effectiveness of a rehabilitation program for orthopedic and neurological patients
Santilli, Valter; Mangone, Massimiliano; Diko, Anxhelo; Alviti, Federica; Bernetti, Andrea; Agostini, Francesco; Palagi, Laura; Servidio, Marila; Paoloni, Marco; Goffredo, Michela; Infarinato, Francesco; Pournajaf, Sanaz; Franceschini, Marco; Fini, Massimo; Damiani, Carlo - 01a Journal article
paper: INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (Basel: MDPI 2003-) pp. 1-16 - issn: 1660-4601 - wos: (0) - scopus: 2-s2.0-85153939862 (19)

11573/1655215 - 2022 - A novel GAN-based anomaly detection and localization method for aerial video surveillance at low altitude
Avola, D.; Cannistraci, I.; Cascio, M.; Cinque, L.; Diko, A.; Fagioli, A.; Foresti, G. L.; Lanzino, R.; Mancini, M.; Mecca, A.; Pannone, D. - 01a Journal article
paper: REMOTE SENSING (Basel : Molecular Diversity Preservation International) pp. 1-18 - issn: 2072-4292 - wos: WOS:000845372100001 (19) - scopus: 2-s2.0-85137772162 (35)

11573/1599991 - 2022 - Low-altitude aerial video surveillance via one-class SVM anomaly detection from textural features in UAV images
Avola, D.; Cinque, L.; Di Mambro, A.; Diko, A.; Fagioli, A.; Foresti, G. L.; Marini, M. R.; Mecca, A.; Pannone, D. - 01a Journal article
paper: INFORMATION (Basel: Molecular Diversity Preservation International) pp. 1-21 - issn: 2078-2489 - wos: WOS:000757982800001 (21) - scopus: 2-s2.0-85121686772 (28)

11573/1553510 - 2021 - MS-Faster R-CNN: multi-stream backbone for improved Faster R-CNN object detection and aerial tracking from UAV images
Avola, D.; Cinque, L.; Diko, A.; Fagioli, A.; Foresti, G. L.; Mecca, A.; Pannone, D.; Piciarelli, C. - 01a Journal article
paper: REMOTE SENSING (Basel : Molecular Diversity Preservation International) pp. 1-18 - issn: 2072-4292 - wos: WOS:000650746700001 (82) - scopus: 2-s2.0-85105436695 (98)

11573/1756577 - 2021 - In-silico analysis of airflow dynamics and particle transport within a human nasal cavity
Pratim Borthakur, Manash; Succi, Sauro; Sterpone, Fabio; Pérot, Franck; Diko, Anxhelo; Melchionna, Simone - 01a Journal article
paper: JOURNAL OF COMPUTATIONAL SCIENCE (Amsterdam : Elsevier) pp. - - issn: 1877-7503 - wos: WOS:000702816500006 (4) - scopus: 2-s2.0-85111039898 (4)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma