Research: Development of a transversal methodology for analysing and annotating mixed corpora in order to compile a corpus-based glossary for terms in Standard Arabic and Egyptian dialect. (Working title)
RESEARCH PROJECT'S ABSTRACT
Traditionally, Arabic is described as a typical example of 'diglossia', i.e. a situation in which speakers choose between two distinct varieties, one considered superior and the other inferior (Standard Arabic and the dialects), or mix them in different ways.
This approach is also reflected in lexicographic resources, both in print and in electronic form. Moreover, corpus research on Arabic is limited and moves in two opposite directions: the analysis of texts in Modern Standard Arabic (MSA), often from journalistic sources, or the analysis of oral texts in dialect varieties.
The aim of the project is therefore to encourage a paradigm shift towards considering Arabic "as one language" (M. Al-Batal (2018)). To this end, it can be useful to include in the research mixed corpora (in which written and spoken language are combined) and/or dialect corpora tout court, because they are representative of the multifaceted reality of the Arabic-speaking world.
This study will be carried out by analysing and annotating MSA and dialect texts on the basis of the same principles, starting from Universal Dependencies and rethinking its morphological and syntactic categories so as to better encode the information.
The intention is to semi-automatically compile a glossary of words in both Standard Arabic and Egyptian dialect, contributing to the development of a transversal methodology, possibly applicable to other kinds of texts.
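As a rough illustration of the glossary-compilation step, the sketch below reads Universal Dependencies annotations in CoNLL-U format and groups the attested word forms by lemma, part of speech and variety. It is only a minimal sketch under stated assumptions: the two toy sentences, the variety labels "MSA" and "EGY", and the helper names parse_conllu, compile_glossary and row are hypothetical and are not taken from the project's data or tools.

```python
from collections import defaultdict


def parse_conllu(text):
    """Yield (form, lemma, upos) for each ordinary token line of a CoNLL-U string."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        yield cols[1], cols[2], cols[3]


def compile_glossary(samples):
    """Map (lemma, upos) to {variety: sorted list of attested forms}."""
    glossary = defaultdict(lambda: defaultdict(set))
    for variety, text in samples.items():
        for form, lemma, upos in parse_conllu(text):
            glossary[(lemma, upos)][variety].add(form)
    return {
        key: {variety: sorted(forms) for variety, forms in by_variety.items()}
        for key, by_variety in glossary.items()
    }


def row(*cols):
    """Join the ten CoNLL-U columns with tabs (hypothetical convenience helper)."""
    return "\t".join(cols)


# Toy annotated sentences, invented for illustration only (not project data).
SAMPLES = {
    "MSA": "\n".join([
        "# text = ذهب الولد",
        row("1", "ذهب", "ذهب", "VERB", "_", "_", "0", "root", "_", "_"),
        row("2", "الولد", "ولد", "NOUN", "_", "_", "1", "nsubj", "_", "_"),
    ]),
    "EGY": "\n".join([
        "# text = الولد راح",
        row("1", "الولد", "ولد", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
        row("2", "راح", "راح", "VERB", "_", "_", "0", "root", "_", "_"),
    ]),
}

GLOSSARY = compile_glossary(SAMPLES)

for (lemma, upos), varieties in sorted(GLOSSARY.items()):
    print(lemma, upos, varieties)
```

Keying the entries on the lemma and part-of-speech pair lets a lemma shared by both varieties appear as a single entry, while a lemma attested in only one variety remains marked as such.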
---
CURRICULUM VITAE
11/2023 - Today: PhD student in Civilizations of Asia and Africa, Italian Institute of Oriental Studies (ISO) – Sapienza University of Rome (Italy).
2020 - 2023: Master’s Degree in Modern Languages for International Communication (LM-38) – Roma Tre University (Italy).
Dissertation title: "Computational analysis of a work in Egyptian Arabic: Universal Dependencies annotation and compilation of a glossary".
2016 - 2020: Bachelor's Degree in Language and Linguistic-Cultural Mediation (L-12) – Roma Tre University (Italy).
Dissertation title: "Analysis of a bilingual Italian-Arabic parallel corpus: UD annotation and marking of translation equivalents".