TAIBA MAJID XXX

PhD graduate

Cycle: XXXVIII


Supervisor: Irene Amerini
Co-supervisor: Marco Schaerf

Thesis title: Defending Against Audio Deepfakes: Robust Detection in the Synthetic Speech Era

Neural speech synthesis has transformed voice cloning, enabling highly realistic audio to be generated from minimal training data. While these technologies offer valuable applications in accessibility and creative media, their misuse is increasingly harmful: audio deepfakes now enable sophisticated fraud, social engineering attacks, and misinformation campaigns that threaten democratic discourse. The human ear, long a trusted means of verifying a speaker, is no longer reliable for distinguishing authentic from synthetic speech.

This thesis addresses the challenge of audio deepfake detection, focusing on three fundamental barriers that limit the real-world effectiveness of current methods. First, existing approaches generalize poorly across the datasets, generators, and audio processing conditions encountered in practice. Second, conventional architectures fail to capture the fine-grained spectro-temporal fingerprints essential for forensic discrimination, and they suffer from catastrophic forgetting when adapted to emerging synthesis techniques. Third, current methods lack the transparency and explainability demanded by forensic applications, where detection decisions must withstand rigorous legal scrutiny.

To address these limitations, we propose a comprehensive framework that tackles each challenge with dedicated methodologies. For the generalization problem, we develop hierarchical learning approaches that model complex acoustic structure through novel architectures, moving beyond conventional networks that discard crucial spatial-temporal information. We explore fusion strategies that combine complementary evidence streams, including spectral, temporal, and self-supervised representations, to build more robust detection systems, and we investigate ensemble methods that leverage diverse acoustic cues to improve cross-domain performance.

Because the synthetic speech landscape evolves continuously, we address the adaptation challenge through knowledge transfer methodologies. We propose continual learning strategies that enable detectors to incorporate new synthesis methods without catastrophic forgetting, together with a codec-aware framework that handles the compression artifacts and channel variations of real-world audio processing pipelines.

Finally, we target the need for explainable detection systems suitable for forensic workflows. We develop an interpretable framework that provides transparent explanations for detection decisions, moving beyond simple confidence scores to a detailed analysis of the acoustic features that indicate synthetic content.

The proposed methods offer insights for audio deepfake detection and for the broader multimedia forensics field: the hierarchical modeling approaches, fusion strategies, and interpretability frameworks can be adapted to other forensic domains, and our evaluation protocols establish new standards for assessing detection-system reliability in practical deployment scenarios. We consider this thesis a significant contribution to synthetic speech forensics. While the results are promising, the rapid evolution of generative technologies demands continuous development of new detection methodologies; this work provides a foundation for future research toward trustworthy systems capable of preserving information integrity in an era of increasingly sophisticated synthetic media.

Scientific publications

11573/1741899 - 2025 - Deepfake Media Forensics: Status and Future Challenges
Amerini, I.; Barni, M.; Battiato, S.; Bestagini, P.; Boato, G.; Bruni, V.; Caldelli, R.; De Natale, F.; De Nicola, R.; Guarnera, L.; Mandelli, S.; Majid, T.; Marcialis, G. L.; Micheletto, M.; Montibeller, A.; Orru, G.; Ortis, A.; Perazzo, P.; Puglisi, G.; Purnekar, N.; Salvi, D.; Tubaro, S.; Villari, M.; Vitulano, D. - 01a Journal article
journal: JOURNAL OF IMAGING (Basel : MDPI AG, 2015-) pp. - - issn: 2313-433X - wos: WOS:001453125100001 (22) - scopus: 2-s2.0-105001128461 (33)

11573/1737982 - 2025 - Audio Deepfake Detection: A Continual Approach with Feature Distillation and Dynamic Class Rebalancing
Wani, T. M.; Amerini, I. - 04b Conference paper in proceedings
conference: 27th International Conference on Pattern Recognition, ICPR 2024 (Kolkata)
book: Pattern Recognition - (9783031783043; 9783031783050)

11573/1741570 - 2025 - HCN-TA: Hierarchical Capsule Network with Temporal Attention for a Generalizable Approach to Audio Deepfake Detection
Wani, T. M.; Uecker, M.; Wani, F. A.; Amerini, I. - 04b Conference paper in proceedings
conference: 40th Annual ACM Symposium on Applied Computing, SAC 2025 (Sicily, Italy)
book: Proceedings of the 2023 International Conference on Communication, Signal Processing and Computer Engineering - (9798400706295)

11573/1714009 - 2024 - Advances and Challenges in Computer Vision for Image-Based Plant Disease Detection: A Comprehensive Survey of Machine and Deep Learning Approaches
Qadri, Syed Asif Ahmad; Huang, Nen-Fu; Wani, Taiba Majid; Bhat, Showkat Ahmad - 01d Review
journal: IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (New York : Institute of Electrical and Electronics Engineers, c2004-) pp. 1-32 - issn: 1545-5955 - wos: WOS:001214263100001 (22) - scopus: 2-s2.0-85192143901 (24)

11573/1714014 - 2024 - Detecting audio deepfakes: integrating CNN and BiLSTM with multi-feature concatenation
Wani, Taiba Majid; Qadri, Syed Asif Ahmad; Comminiello, Danilo; Amerini, Irene - 04b Conference paper in proceedings
conference: IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security (Baiona, Spain)
book: IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security - (979-8-4007-0637-0)

11573/1726267 - 2024 - ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection
Wani, T. M.; Gulzar, R.; Amerini, I. - 04b Conference paper in proceedings
conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 (Seattle; USA)
book: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - (979-8-3503-6547-4; 979-8-3503-6548-1)

11573/1714012 - 2023 - Plant Disease Detection and Segmentation using End-to-End YOLOv8: A Comprehensive Approach
Qadri, S. A. A.; Huang, N. -F.; Wani, T. M.; Bhat, S. A. - 04b Conference paper in proceedings
conference: 13th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2023 (Penang; Malaysia)
book: IEEE 13th International Conference on Control System, Computing and Engineering (ICCSCE) - (979-8-3503-2318-4; 979-8-3503-2317-7; 979-8-3503-2319-1)

11573/1693704 - 2023 - Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks
Wani, Taiba Majid; Amerini, I. - 04b Conference paper in proceedings
conference: 22nd International Conference on Image Analysis and Processing, ICIAP 2023 (Udine, Italy)
book: Image Analysis and Processing – ICIAP 2023 - (978-3-031-43152-4; 978-3-031-43153-1)

11573/1713909 - 2021 - A Comprehensive Review of Speech Emotion Recognition Systems
Wani, Taiba Majid; Gunawan, Teddy Surya; Qadri, Syed Asif Ahmad; Kartiwi, Mira; Ambikairajah, Eliathamby - 01a Journal article
journal: IEEE ACCESS (Piscataway NJ: Institute of Electrical and Electronics Engineers) pp. 47795-47814 - issn: 2169-3536 - wos: WOS:000637174900001 (158) - scopus: 2-s2.0-85103266589 (310)

11573/1714011 - 2021 - Stride Based Convolutional Neural Network for Speech Emotion Recognition
Wani, T. M.; Gunawan, T. S.; Qadri, S. A. A.; Mansor, H.; Arifin, F.; Ahmad, Y. A. - 04b Conference paper in proceedings
conference: 2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA) (Bandung; Indonesia)
book: 2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA) - (978-1-7281-7523-2; 978-1-7281-7522-5; 978-1-7281-7524-9)

11573/1714010 - 2020 - Speech emotion recognition using convolution neural networks and deep stride convolutional neural networks
Wani, T. M.; Gunawan, T. S.; Qadri, S. A. A.; Mansor, H.; Kartiwi, M.; Ismail, N. - 04b Conference paper in proceedings
conference: 6th International Conference on Wireless and Telematics, ICWT 2020 (Yogyakarta)
book: 2020 6th International Conference on Wireless and Telematics (ICWT) - (9781728175966)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma