DARIO ONORATI

PhD graduate

cycle: XXXVII


supervisor: Fabio Massimo Zanzotto

Thesis title: Quantifying and Addressing Bias in Large Language Models: From Detection to Mitigation

This doctoral thesis investigates the detection and mitigation of social bias in Large Language Models (LLMs) and Instruction-Following Language Models (IFLMs). Although these models have achieved remarkable success across a wide variety of Natural Language Processing (NLP) tasks, they often reproduce and even amplify the social stereotypes present in the large-scale web data on which they are trained. The first part of this research focuses on quantifying bias in IFLMs. To this end, we introduced a new resource named P-AT, composed of a dataset and a set of evaluation metrics specifically designed to quantify stereotypical associations. An Italian version of P-AT was also developed to detect biases and stereotypes specific to Italian culture. Our findings revealed that IFLMs consistently generate outputs that are biased across multiple social dimensions, such as gender, race, and age. The second part of the work investigates the mechanisms behind these behaviors. Through a series of controlled experiments, we observed that LLMs have a large memorization capacity and achieve excellent performance on previously encountered data, but often generalize poorly to unseen inputs. This effect is particularly evident under extreme domain adaptation conditions: when exposed to domain-specific data, model performance increases, underscoring the models' strong reliance on memorized patterns. Based on these results, we explored debiasing through the extreme domain adaptation strategy, fine-tuning open LLMs on the PANDA dataset, which consists of anti-stereotypical sentences. To keep the approach computationally efficient, we used Low-Rank Adaptation (LoRA), a Parameter-Efficient Fine-Tuning (PEFT) method. The results demonstrate that this technique can be a viable way to mitigate social bias while preserving task performance.
In summary, this thesis provides a comprehensive analysis of social bias in various LLMs and IFLMs and introduces P-AT as a novel resource for bias assessment. Furthermore, it proposes a scalable and effective approach to mitigating bias in pre-trained models. Ultimately, I hope that work like this will lead to more equitable and accountable NLP systems.
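
The LoRA idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class name `LoRALinear`, the helper `matmul`, and the default values of `r` and `alpha` are assumptions made only for illustration. The sketch shows the general technique: the pre-trained weight matrix W stays frozen and only a low-rank update (alpha / r) * B A is trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a low-rank update (alpha / r) * B @ A.

    Hypothetical illustration class, not taken from the thesis.
    """

    def __init__(self, weight, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pre-trained weights, kept frozen
        self.scale = alpha / r        # scaling applied to the low-rank update
        # A gets small random values and B starts at zero, so the adapted
        # layer initially behaves exactly like the pre-trained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def effective_weight(self):
        """Return W + (alpha / r) * B @ A."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to a single input vector x."""
        return [sum(w * xi for w, xi in zip(row, x))
                for row in self.effective_weight()]

    def trainable_parameters(self):
        """Count only the entries of A and B; W is never updated."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For an 8 x 8 weight matrix with r = 2, only 32 values (A and B) would be trained instead of 64, and the saving grows with the layer size. This is why such methods are described as parameter-efficient.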

Scientific output

11573/1727083 - 2024 - Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
Ranaldi, F; Ruzzetti, Es; Onorati, D; Ranaldi, L; Giannone, C; Favalli, A; Romagnoli, R; Zanzotto, Fm - 04b Conference paper in volume
conference: 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) (Bangkok; Thailand)
book: Findings of the Association for Computational Linguistics: ACL 2024 - (9798891760998)

11573/1696819 - 2023 - The Dark Side of the Language: Syntax-based Neural Networks rivaling Transformers in Definitely Unseen Sentences
Onorati, Dario; Ranaldi, Leonardo; Nourbakhsh, Aria; Patrizi, Arianna; Ruzzetti, Elena Sofia; Mastromattei, Michele; Fallucchi, Francesca; Zanzotto, Fabio Massimo - 04b Conference paper in volume
conference: 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) (Venice; Italy)
book: 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology WI-IAT 2023. Proceeding - (9798350309188; 979-8-3503-0919-5)

11573/1696812 - 2023 - Measuring bias in Instruction-Following models with P-AT
Onorati, Dario; Ruzzetti, Elena Sofia; Venditti, Davide; Ranaldi, Leonardo; Zanzotto, Fabio Massimo - 04b Conference paper in volume
conference: Empirical Methods in Natural Language Processing (Sentosa Gateway; Singapore)
book: Findings of the Association for Computational Linguistics: EMNLP 2023 - (9798891760615)

11573/1696813 - 2023 - Investigating Gender Bias in Large Language Models for the Italian Language
Ruzzetti, Elena Sofia; Onorati, Dario; Ranaldi, Leonardo; Venditti, Davide; Zanzotto, Fabio Massimo - 04b Conference paper in volume
conference: Italian Conference on Computational Linguistics 2023 (Venice; Italy)
book: CLiC-it 2023 Italian Conference on Computational Linguistics Proceedings of the 9th Italian Conference on Computational Linguistics Venice, Italy, November 30 - December 2, 2023 - ()

11573/1670705 - 2022 - KERMIT for Sentiment Analysis in Italian Healthcare Reviews
Ranaldi, Leonardo; Mastromattei, Michele; Onorati, Dario; Ruzzetti, Elena Sofia; Fallucchi, Francesca; Zanzotto, Fabio Massimo - 04b Conference paper in volume
conference: 8th Italian Conference on Computational Linguistics, CLiC-it 2021 (Milan; Italy)
book: Italian Conference on Computational Linguistics 2021 - ()

11573/1643156 - 2020 - Pat-in-the-loop: Syntax-based neural networks with activation visualization and declarative control
Zanzotto, F. M.; Onorati, D.; Tommasino, P.; Santilli, A.; Ranaldi, L.; Fallucchi, F. - 04b Conference paper in volume
conference: 2020 Italian Workshop on Explainable Artificial Intelligence, XAI.it 2020 (Online)
book: CEUR Workshop Proceedings - ()

11573/1643092 - 2020 - KERMIT: Complementing transformer architectures with encoders of explicit syntactic interpretations
Zanzotto, F. M.; Santilli, A.; Ranaldi, L.; Onorati, D.; Tommasino, P.; Fallucchi, F. - 04b Conference paper in volume
conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 (Punta Cana, Dominican Republic)
book: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference - (9781952148606)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma