MARCO MARU

PhD graduate

cycle: XXXIII


supervisors: Claudia Angela Ciancaglini - Roberto Navigli

Thesis title: Delving into the Uncharted Territories of Word Sense Disambiguation

Word Sense Disambiguation, the automatic disambiguation of word senses, is a long-standing task in Natural Language Processing: an AI-complete problem that took its first steps more than half a century ago and that, to date, has apparently attained human-like performance on standard evaluation benchmarks. Unfortunately, the steady improvement in raw performance has not been matched by adequate theoretical support or by careful error analysis. Furthermore, we believe that the lack of an exhaustive bird's-eye view, one that accounts for the high-end and unrealistic computational architectures systems will soon need in order to further refine their performance, could lead the field to a dead end within a few years. In essence, taking advantage of the current moment of great accomplishments and renewed interest in the task, we argue that Word Sense Disambiguation is mature enough for researchers to take stock of the results obtained so far, evaluate what is actually missing, and answer the much sought-after question: "are current state-of-the-art systems really able to effectively solve lexical ambiguity?" Driven by the desire to be both architects of and participants in this period of reflection, we have identified a few macro areas representative of the challenges of automatic disambiguation. Accordingly, in this thesis we propose experimental solutions and empirical tools in order to bring unusual and unexplored points of view to the attention of the Word Sense Disambiguation community. We hope these will offer a new perspective from which to observe the current state of disambiguation and to foresee future paths along which the task can evolve.
Specifically, 1q) prompted by the growing concern that gains in performance are closely tied to the demand for ever more unrealistic computational architectures in all areas of application of Deep Learning techniques, we 1a) provide evidence for the untapped potential of knowledge-based approaches, via the exploitation of syntagmatic information. Moreover, 2q) driven by dissatisfaction with the use of cognitively inaccurate, finite inventories of word senses in Word Sense Disambiguation, we 2a) introduce an approach based on Definition Modeling paradigms to generate contextual definitions for target words and phrases, thereby going beyond the limits set by specific lexical-semantic inventories. Finally, 3q) moved by the desire to analyze the real implications behind the idea of "machines performing disambiguation on par with their human counterparts", we 3a) put forward a detailed analysis of the shared errors affecting current state-of-the-art systems based on diverse approaches to Word Sense Disambiguation, and show, by means of a novel evaluation dataset tailored to represent common and critical issues shared by all systems, performance far lower than that usually reported in the current literature.

Research output

11573/1656313 - 2022 - Fully-Semantic Parsing and Generation: the BabelNet Meaning Representation
Martinez Lorenzo, Abelardo Carlos; Maru, Marco; Navigli, Roberto - 04b Conference paper in proceedings volume
conference: Association for Computational Linguistics (Dublin, Ireland)
book: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics - (9781955917216)

11573/1653273 - 2022 - SemEval-2022 Task 9: R2VQ – Competence-based Multimodal Question Answering
Tu, Jingxuan; Holderness, Eben; Maru, Marco; Conia, Simone; Rim, Kyeongmin; Lynch, Kelley; Brutti, Richard; Navigli, Roberto; Pustejovsky, James - 04b Conference paper in proceedings volume
conference: 16th International Workshop on Semantic Evaluation (SemEval-2022) (Seattle; United States)
book: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) - (9781955917803)

11573/1465840 - 2020 - Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”
Bevilacqua, Michele; Maru, Marco; Navigli, Roberto - 04b Conference paper in proceedings volume
conference: The 2020 Conference on Empirical Methods in Natural Language Processing (Online)
book: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) - (978-1-952148-60-6)

11573/1424203 - 2020 - Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation
Scozzafava, Federico; Maru, Marco; Brignone, Fabrizio; Torrisi, Giovanni; Navigli, Roberto - 04b Conference paper in proceedings volume
conference: Association for Computational Linguistics (Seattle, WA, USA)
book: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations - (978-1-952148-04-0)

11573/1344950 - 2019 - SyntagNet: challenging supervised word sense disambiguation with Lexical-Semantic Combinations
Maru, Marco; Scozzafava, Federico; Martelli, Federico; Navigli, Roberto - 04b Conference paper in proceedings volume
conference: Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Hong Kong; China)
book: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) - (978-1-950737-92-5)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma