SIMONE TEDESCHI

PhD Graduate

PhD program: XXXVII

supervisor: Roberto Navigli

Thesis title: Towards Comprehensive and Efficient Information Extraction Across Languages

The exponential growth of textual data shared online has created an urgent need for methods that can effectively extract, structure, and interpret information from vast and varied texts. Information Extraction (IE), a key area within Natural Language Processing (NLP), addresses this need by transforming unstructured text into structured formats, enabling automated text analytics and decision-making. However, existing IE systems face substantial challenges in scalability and generalization. These challenges include limited labeled data for low-resource languages, computational demands that restrict accessibility to well-resourced institutions, and a predominant focus on popular entities. Additionally, most IE tasks are entity-centric (e.g., Named Entity Recognition (NER), Entity Disambiguation, and Relation Extraction) and thus overlook the broader contextual richness present in many texts. This thesis aims to advance the field of IE by tackling these critical issues through novel resources, methodologies, and theoretical approaches that foster a multilingual, scalable, and semantically enriched IE framework. To bridge the multilingual gap, we leverage a combination of neural and knowledge-based approaches and create multilingual datasets for NER and Relation Extraction, so that IE systems can operate effectively across diverse linguistic settings. On the computational front, we propose optimizations designed to reduce the resource requirements of IE models, especially in the context of Entity Disambiguation, enabling broader adoption of NLP technologies by reducing dependence on high-performance hardware and extensive labeled datasets. Additionally, this work challenges traditional IE frameworks by expanding the focus beyond named entities to encompass abstract concepts, idiomatic expressions, and tail entities, which are essential for a more nuanced and comprehensive understanding of texts.
Through these contributions, this research aims to establish a robust foundation for multilingual, resource-efficient IE systems that can meet the evolving demands of global text analytics across varied languages, domains, and cultural contexts. Finally, to further encourage the use and development of multilingual IE systems, we publicly release all the artifacts (datasets and models) introduced in this thesis.

Research products

11573/1747704 - 2025 - How Much Do Encoder Models Know About Word Senses?

Teglia, Simone; Tedeschi, Simone; Navigli, Roberto - 04b Atto di convegno in volume

conference: Association for Computational Linguistics (Vienna; Austria)

book: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) - (979-8-89176-251-0)

11573/1724737 - 2024 - Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin

Ghinassi, Iacopo; Tedeschi, Simone; Marongiu, Paola; Navigli, Roberto; McGillivray, Barbara - 04b Atto di convegno in volume

conference: LREC-COLING 2024 (Turin; Italy)

book: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) - (9782493814104)

11573/1717749 - 2024 - CNER: Concept and Named Entity Recognition

Martinelli, Giuliano; Molfese, Francesco; Tedeschi, Simone; Fernández-Castro, Alberte; Navigli, Roberto - 04b Atto di convegno in volume

conference: North American Association for Computational Linguistics (Mexico City; Mexico)

book: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - (979-8-89176-114-8)

11573/1713460 - 2024 - CroCoAlign: A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts

Molfese, Francesco Maria; Bejgu, Andrei Stefan; Tedeschi, Simone; Conia, Simone; Navigli, Roberto - 04b Atto di convegno in volume

conference: European Association for Computational Linguistics (St. Julian's; Malta)

book: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) - (979-8-89176-088-2)

11573/1711963 - 2024 - Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models

Proietti, Lorenzo; Perrella, Stefano; Tedeschi, Simone; Vulpis, Giulia; Lavalle, Leonardo; Sanchietti, Andrea; Ferrari, Andrea; Navigli, Roberto - 04b Atto di convegno in volume

conference: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (Torino; Italy)

book: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) - (978-2-493814-10-4)

11573/1686592 - 2023 - REDFM: a Filtered and Multilingual Relation Extraction Dataset

Huguet Cabot, Pere-Lluis; Tedeschi, Simone; Ngonga Ngomo, Axel-Cyrille; Navigli, Roberto - 04b Atto di convegno in volume

conference: Association for Computational Linguistics (Toronto, Canada)

book: The 61st Conference of the Association for Computational Linguistics - (978-1-959429-72-2)

11573/1685069 - 2023 - What's the Meaning of Superhuman Performance in Today's NLU?

Tedeschi, Simone; Bos, Johan; Declerck, Thierry; Hajič, Jan; Hershcovich, Daniel; Hovy, Eduard; Koller, Alexander; Krek, Simon; Schockaert, Steven; Sennrich, Rico; Shutova, Ekaterina; Navigli, Roberto - 04b Atto di convegno in volume

conference: Association for Computational Linguistics (Toronto, Canada)

book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics - (9781959429722)

11573/1664524 - 2022 - Human Attention Assessment Using A Machine Learning Approach with GAN-based Data Augmentation Technique Trained Using a Custom Dataset

Pepe, S.; Tedeschi, S.; Brandizzi, N.; Russo, S.; Iocchi, L.; Napoli, C. - 01a Articolo in rivista

paper: OBM NEUROBIOLOGY (Beachwood OH: Open Biomedical Publishing Corporation) pp. - - issn: 2573-4407 - wos: (0) - scopus: 2-s2.0-85144152564 (26)

11573/1671588 - 2022 - EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Scott Keh, Sedrick; Bharadwaj, Rohit K.; Liu, Emmy; Tedeschi, Simone; Gangal, Varun; Navigli, Roberto - 04b Atto di convegno in volume

conference: 3rd Workshop on Figurative Language Processing (FLP) (Abu Dhabi; United Arab Emirates)

book: Proceedings of the 3rd Workshop on Figurative Language Processing (FLP) - (9781959429111)

11573/1653020 - 2022 - ID10M: Idiom Identification in 10 Languages

Tedeschi, Simone; Martelli, Federico; Navigli, Roberto - 04b Atto di convegno in volume

conference: Findings of the Association for Computational Linguistics: NAACL 2022 (Seattle, United States)

book: Findings of the Association for Computational Linguistics: NAACL 2022 - (9781955917766)

11573/1653019 - 2022 - MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)

Tedeschi, Simone; Navigli, Roberto - 04b Atto di convegno in volume

conference: 2022 Findings of the Association for Computational Linguistics: NAACL 2022 (Seattle; United States)

book: Findings of the Association for Computational Linguistics: NAACL 2022 - (9781955917766)

11573/1653120 - 2022 - NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection

Tedeschi, Simone; Navigli, Roberto - 04b Atto di convegno in volume

conference: 16th International Workshop on Semantic Evaluation, SemEval 2022 (Seattle, United States)

book: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) - (9781955917803)

11573/1599786 - 2021 - Named entity recognition for entity linking: what works and what’s next

Tedeschi, Simone; Conia, Simone; Cecconi, Francesco; Navigli, Roberto - 04b Atto di convegno in volume

conference: Empirical Methods in Natural Language Processing (Punta Cana, Dominican Republic)

book: Findings of the Association for Computational Linguistics: EMNLP 2021

11573/1599784 - 2021 - WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER

Tedeschi, Simone; Maiorca, Valentino; Campolungo, Niccolò; Cecconi, Francesco; Navigli, Roberto - 04b Atto di convegno in volume

conference: Empirical Methods in Natural Language Processing (Punta Cana, Dominican Republic)

book: Findings of the Association for Computational Linguistics: EMNLP 2021