ANDREI STEFAN BEJGU

PhD Graduate

PhD program: XXXVII


supervisor: Roberto Navigli
advisor: Roberto Navigli

Thesis title: Enhancing Latent Alignment Methods for NLP: From Words to Concepts

A central challenge in artificial intelligence is developing systems that can truly understand human language. Achieving this requires models that capture nuanced meanings, relationships, and contextual subtleties, especially in specialized applications. This thesis explores strategies in Natural Language Processing (NLP) to address this challenge, beginning with foundational word representations, advancing to sentence-level representations, and then integrating both levels to enrich interpretive depth. Finally, we investigate synthetic data generation as a method to refine these representations further using Large Language Models (LLMs).
Word representations serve as the starting point, positioning words within a continuous semantic space to encode basic lexical relationships. While these embeddings effectively capture individual word associations, they fail to represent meaning beyond isolated terms, limiting their utility in complex tasks that demand contextual comprehension. To address this gap, the thesis progresses to sentence-level embeddings, which extend beyond individual words by integrating syntactic and semantic relationships. These context-aware embeddings provide the depth necessary for semantic similarity, sentence alignment, and cross-lingual applications, enabling models to interpret richer and more nuanced meanings within larger contexts.
By integrating word and sentence-level embeddings, we achieve a multi-layered representation that captures both fine-grained lexical details and broader contextual dependencies, offering a more comprehensive approach to language understanding. This layered integration sets the foundation for the final phase of the thesis: synthetic data generation, which further refines these representations by addressing data scarcity and tailoring embeddings to task-specific needs in specialized applications.
In sum, this thesis shows how each stage (foundational word embeddings, sentence-level representations, the integration of the two, and, ultimately, synthetic data generation) contributes to building robust and adaptable NLP models. This structured progression provides a comprehensive pathway for addressing the varied and intricate challenges of language understanding, advancing NLP toward effective language comprehension in AI systems.

Research products

11573/1717771 - 2024 - Word Sense Linking: Disambiguating Outside the Sandbox
Bejgu, Andrei Stefan; Barba, Edoardo; Procopio, Luigi; Fernández-Castro, Alberte; Navigli, Roberto - 04b Atto di convegno in volume
conference: 62nd Annual Meeting of the Association for Computational Linguistics (ACL) (Bangkok)
book: Findings of the Association for Computational Linguistics: ACL 2024 - (979-8-89176-099-8)

11573/1713460 - 2024 - CroCoAlign: A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts
Molfese, Francesco Maria; Bejgu, Andrei Stefan; Tedeschi, Simone; Conia, Simone; Navigli, Roberto - 04b Atto di convegno in volume
conference: European Association for Computational Linguistics (St. Julian's; Malta)
book: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) - (979-8-89176-088-2)

11573/1694184 - 2023 - XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs
Martelli, Federico; Bejgu, Andrei Stefan; Campagnano, Cesare; Čibej, Jaka; Costa, Rute; Gantar, Apolonija; Kallas, Jelena; Koeva, Svetla; Koppel, Kristina; Krek, Simon; Langemets, Margit; Lipp, Veronika; Nimb, Sanni; Olsen, Sussi; Sandford Pedersen, Bolette; Quochi, Valeria; Salgado, Ana; Simon, László; Tiberius, Carole; Ureña-Ruiz, Rafael-J; Navigli, Roberto - 04b Atto di convegno in volume
conference: Ninth Italian Conference on Computational Linguistics (Venice; Italy)
book: Proceedings of the Ninth Italian Conference on Computational Linguistics

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma