SIMONE TEDESCHI

PhD Graduate

PhD program: XXXVII


Supervisor: Roberto Navigli

Thesis title: Towards Comprehensive and Efficient Information Extraction Across Languages

The exponential growth of textual data shared online has created an urgent need for methods that can effectively extract, structure, and interpret information from vast and varied texts. Information Extraction (IE), a key area within Natural Language Processing (NLP), addresses this need by transforming unstructured text into structured formats, enabling automated text analytics and decision-making. However, existing IE systems face substantial challenges in scalability and generalization. These challenges include limited labeled data for low-resource languages, computational demands that restrict accessibility to well-resourced institutions, and a predominant focus on popular entities. Additionally, most IE tasks are entity-centric (e.g., Named Entity Recognition, Entity Disambiguation, and Relation Extraction) and thus overlook the broader contextual richness present in many texts. This thesis aims to advance the field of IE by tackling these critical issues through novel resources, methodologies, and theoretical approaches that foster a multilingual, scalable, and semantically enriched IE framework. To bridge the multilingual gap, we leverage a combination of neural and knowledge-based approaches and create multilingual datasets for Named Entity Recognition (NER) and Relation Extraction, ensuring that IE systems can operate effectively across diverse linguistic settings. On the computational front, we propose optimizations designed to reduce the resource requirements of IE models, especially in the context of Entity Disambiguation, enabling broader adoption of NLP technologies by reducing dependence on high-performance hardware and extensive labeled datasets. Additionally, this work challenges traditional IE frameworks by expanding the focus beyond named entities to encompass abstract concepts, idiomatic expressions, and tail entities, which are essential for a more nuanced and comprehensive understanding of texts.
Through these contributions, this research aims to establish a robust foundation for multilingual, resource-efficient IE systems that can meet the evolving demands of global text analytics across varied languages, domains, and cultural contexts. Finally, to further encourage the usage and development of multilingual IE systems, we publicly release all the artifacts (datasets and models) introduced in this thesis.

Research products

11573/1724737 - 2024 - Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin
Ghinassi, Iacopo; Tedeschi, Simone; Marongiu, Paola; Navigli, Roberto; McGillivray, Barbara - 04b Conference paper in volume
conference: LREC-COLING 2024 (Turin; Italy)
book: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) - (9782493814104)

11573/1717749 - 2024 - CNER: Concept and Named Entity Recognition
Martinelli, Giuliano; Molfese, Francesco; Tedeschi, Simone; Fernández-Castro, Alberte; Navigli, Roberto - 04b Conference paper in volume
conference: North American Association for Computational Linguistics (Mexico City; Mexico)
book: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - (979-8-89176-114-8)

11573/1713460 - 2024 - CroCoAlign: A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts
Molfese, Francesco Maria; Bejgu, Andrei Stefan; Tedeschi, Simone; Conia, Simone; Navigli, Roberto - 04b Conference paper in volume
conference: European Association for Computational Linguistics (St. Julian's; Malta)
book: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) - (979-8-89176-088-2)

11573/1711963 - 2024 - Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models
Proietti, Lorenzo; Perrella, Stefano; Tedeschi, Simone; Vulpis, Giulia; Lavalle, Leonardo; Sanchietti, Andrea; Ferrari, Andrea; Navigli, Roberto - 04b Conference paper in volume
conference: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (Torino; Italy)
book: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) - (978-2-493814-10-4)

11573/1686592 - 2023 - REDFM: a Filtered and Multilingual Relation Extraction Dataset
Huguet Cabot, Pere-Lluis; Tedeschi, Simone; Ngonga Ngomo, Axel-Cyrille; Navigli, Roberto - 04b Conference paper in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: The 61st Conference of the Association for Computational Linguistics - (978-1-959429-72-2)

11573/1685069 - 2023 - What's the Meaning of Superhuman Performance in Today's NLU?
Tedeschi, Simone; Bos, Johan; Declerck, Thierry; Hajič, Jan; Hershcovich, Daniel; Hovy, Eduard; Koller, Alexander; Krek, Simon; Schockaert, Steven; Sennrich, Rico; Shutova, Ekaterina; Navigli, Roberto - 04b Conference paper in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics - (9781959429722)

11573/1664524 - 2022 - Human Attention Assessment Using A Machine Learning Approach with GAN-based Data Augmentation Technique Trained Using a Custom Dataset
Pepe, S.; Tedeschi, S.; Brandizzi, N.; Russo, S.; Iocchi, L.; Napoli, C. - 01a Journal article
paper: OBM NEUROBIOLOGY (Beachwood OH: Open Biomedical Publishing Corporation) pp. - - issn: 2573-4407 - wos: (0) - scopus: 2-s2.0-85144152564 (24)

11573/1671588 - 2022 - EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation
Scott Keh, Sedrick; Bharadwaj, Rohit K.; Liu, Emmy; Tedeschi, Simone; Gangal, Varun; Navigli, Roberto - 04b Conference paper in volume
conference: 3rd Workshop on Figurative Language Processing (FLP) (Abu Dhabi; United Arab Emirates)
book: Proceedings of the 3rd Workshop on Figurative Language Processing (FLP) - (9781959429111)

11573/1653020 - 2022 - ID10M: Idiom Identification in 10 Languages
Tedeschi, Simone; Martelli, Federico; Navigli, Roberto - 04b Conference paper in volume
conference: Findings of the Association for Computational Linguistics: NAACL 2022 (Seattle, United States)
book: Findings of the Association for Computational Linguistics: NAACL 2022 - (9781955917766)

11573/1653019 - 2022 - MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)
Tedeschi, Simone; Navigli, Roberto - 04b Conference paper in volume
conference: 2022 Findings of the Association for Computational Linguistics: NAACL 2022 (Seattle; United States)
book: Findings of the Association for Computational Linguistics: NAACL 2022 - (9781955917766)

11573/1653120 - 2022 - NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection
Tedeschi, Simone; Navigli, Roberto - 04b Conference paper in volume
conference: 16th International Workshop on Semantic Evaluation, SemEval 2022 (Seattle, United States)
book: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) - (9781955917803)

11573/1599786 - 2021 - Named entity recognition for entity linking: what works and what’s next
Tedeschi, Simone; Conia, Simone; Cecconi, Francesco; Navigli, Roberto - 04b Conference paper in volume
conference: Empirical Methods in Natural Language Processing (Punta Cana, Dominican Republic)
book: Findings of the Association for Computational Linguistics: EMNLP 2021

11573/1599784 - 2021 - WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER
Tedeschi, Simone; Maiorca, Valentino; Campolungo, Niccolò; Cecconi, Francesco; Navigli, Roberto - 04b Conference paper in volume
conference: Empirical Methods in Natural Language Processing (Punta Cana, Dominican Republic)
book: Findings of the Association for Computational Linguistics: EMNLP 2021

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma