PERE-LLUIS HUGUET CABOT

PhD Graduate

PhD program: XXXVII


supervisor: Roberto Navigli
co-supervisor: Daniele Nardi

Thesis title: From Text to Knowledge: Multilingual Information Extraction for Knowledge Graph Construction

In the era of Large Language Models (LLMs), Information Extraction (IE) may seem like a “Chronicle of a Death Foretold”. Between 2020 and 2023, it ranked among the top three most popular topics at conferences like ACL, yet by 2024, it had dropped to tenth place. The advent of Transformer Language Models (LMs), emerging just before work on this dissertation began, has transformed the field of Natural Language Processing (NLP), enabling unprecedented performance across a broad range of Natural Language Understanding (NLU) tasks. Surprisingly, scaling these models into LLMs has not led to diminishing returns but has instead further expanded their capabilities. However, there remains a need for efficient methods suitable for real-world applications that require low latency or the ability to process large volumes of real-time data, domains where large models are often impractical. Additionally, tasks reliant on LLMs’ parametric memory face limitations due to neural inference, where accuracy and recency of information cannot always be guaranteed. While LLMs show great promise, they increasingly require grounding in external knowledge sources for reliable results. This is where IE becomes indispensable. Rather than being replaced, IE complements and strengthens LLMs, supporting their reasoning with accurate, grounded information. Knowledge Graphs (KGs) serve as structured frameworks that bridge unstructured text and structured knowledge, enabling scalable, interpretable organization of vast amounts of information. Essential for applications like semantic search, recommendation systems, and question answering, KGs rely heavily on robust IE techniques. In this thesis, we focus on advancing multilingual IE methods to enhance KG construction and address limitations in existing IE systems.

Research products

11573/1726493 - 2024 - MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus
Conia, Simone; Barba, Edoardo; Martinez Lorenzo, Abelardo Carlos; Huguet Cabot, Pere Lluis; Orlando, Riccardo; Procopio, Luigi; Navigli, Roberto - 04b Atto di convegno in volume
conference: North American Association for Computational Linguistics (Mexico City; Mexico)
book: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) - (9798891761148)

11573/1726491 - 2024 - Mitigating Data Scarcity in Semantic Parsing across Languages with the Multilingual Semantic Layer and its Dataset
Martinez Lorenzo, Abelardo Carlos; Huguet Cabot, Pere Lluis; Ghonim, Karim; Xu, Lu; Choi, Hee-Soo; Fernández-Castro, Alberte; Navigli, Roberto - 04b Atto di convegno in volume
conference: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 (Hybrid; Bangkok)
book: Findings of the Association for Computational Linguistics: ACL 2024 - (9798891760998)

11573/1726492 - 2024 - ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
Orlando, Riccardo; Huguet Cabot, Pere Lluis; Barba, Edoardo; Navigli, Roberto - 04b Atto di convegno in volume
conference: Association for Computational Linguistics (Bangkok; Thailand)
book: Findings of the Association for Computational Linguistics: ACL 2024 - ()

11573/1726490 - 2024 - Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin
Stranisci, Marco; Huguet Cabot, Pere Lluis; Bassignana, Elisa; Navigli, Roberto - 04b Atto di convegno in volume
conference: 5th Workshop on Gender Bias in Natural Language Processing (Bangkok; Thailand)
book: Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP) - (979-8-89176-137-7)

11573/1686592 - 2023 - REDFM: a Filtered and Multilingual Relation Extraction Dataset
Huguet Cabot, Pere-Lluis; Tedeschi, Simone; Ngonga Ngomo, Axel-Cyrille; Navigli, Roberto - 04b Atto di convegno in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: The 61st Conference of the Association for Computational Linguistics - (978-1-959429-72-2)

11573/1688041 - 2023 - AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing
Martinez Lorenzo, Abelardo Carlos; Huguet Cabot, Pere Lluís; Navigli, Roberto - 04b Atto di convegno in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) - (9781959429715)

11573/1688042 - 2023 - Cross-lingual AMR Aligner: Paying Attention to Cross-Attention
Martinez Lorenzo, Abelardo Carlos; Huguet Cabot, Pere Lluís; Navigli, Roberto - 04b Atto di convegno in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: Findings of the Association for Computational Linguistics: ACL 2023 - (9781959429623)

11573/1688044 - 2023 - Incorporating Graph Information in Transformer-based AMR Parsing
Vasylenko, Pavlo; Huguet Cabot, Pere Lluís; Martinez Lorenzo, Abelardo Carlos; Navigli, Roberto - 04b Atto di convegno in volume
conference: Association for Computational Linguistics (Toronto, Canada)
book: Findings of the Association for Computational Linguistics: ACL 2023 - (9781959429623)

11573/1604164 - 2021 - REBEL: Relation Extraction By End-to-end Language generation
Huguet Cabot, Pere-Lluís; Navigli, Roberto - 04b Atto di convegno in volume
conference: 2021 Conference on Empirical Methods in Natural Language Processing (Punta Cana, Dominican Republic)
book: Findings of the Association for Computational Linguistics. EMNLP 2021 - (9781955917100)
