ANDREA SANTILLI

PhD

cycle: XXXVI


co-supervisor: Prof. Emanuele Rodolà

Thesis title: Effective, Efficient and Reliable Large Language Models

In recent years, Large Language Models (LLMs) have fundamentally transformed the field of Natural Language Processing (NLP), reshaping the landscape of AI research and applications. This thesis represents the culmination of four years of doctoral research that began in 2020, when LLMs were still an emerging technology and GPT-3 had just been introduced. Over the course of this research, we have both observed and contributed to the advancement of several technologies underpinning LLMs, from their early stages to their current role as cutting-edge AI systems. Specifically, this thesis brings together a selection of the work carried out during this period along three critical dimensions of LLMs: Effectiveness, Efficiency, and Reliability. On the Effectiveness dimension, we contributed to the development of instruction tuning, a key technique that is now ubiquitous in LLM training pipelines. Our work demonstrated that smaller instruction-tuned LLMs can outperform models up to 16 times their size, including GPT-3. We also developed PromptSource, an integrated development environment for creating, managing, and sharing natural language prompts, which has become a valuable resource for the NLP community. Both contributions were carried out within the BigScience Workshop, a year-long open research initiative started by Hugging Face to study LLMs. Finally, along this dimension, we studied how to enable these models to answer database-like queries over multimodal data. Addressing the Efficiency dimension, we tackled the challenge of accelerating LLM inference. We introduced three novel parallel decoding algorithms that significantly speed up text generation without compromising output quality. This line of work has since grown into an active research area known as speculative or parallel decoding.
Furthermore, we developed an efficient, language-specific instruction-tuned LLM for Italian, demonstrating a cost-effective approach to creating high-quality models for specific languages. Our research on Reliability addresses a critical issue: these models have been shown to systematically generate incorrect information, a phenomenon known as hallucination. In this direction, we investigated whether it is possible to estimate a model's confidence in its own outputs. We conducted a comprehensive assessment of current uncertainty quantification methods and their evaluation protocols, and explored novel ways of combining these methods to improve the detection and quantification of uncertainty in LLM outputs. Overall, this work paves the way for more Effective, Efficient, and Reliable large language models, addressing key challenges in their development and deployment while opening new avenues for future research in this rapidly evolving field.
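The abstract states that parallel decoding can speed up generation without changing the output. One way to see why this is possible is Jacobi-style decoding, which treats greedy decoding of a block of n tokens as a system of n equations and solves it by fixed-point iteration. The Python sketch below is illustrative only and is not the thesis's implementation: toy_next_token is a hypothetical deterministic stand-in for a model's argmax prediction, and in a real Transformer the n updates of each iteration would run as one batched forward pass. It checks that the fixed point coincides with ordinary greedy autoregressive decoding.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
NextToken = Callable[[Sequence[Token]], Token]


def toy_next_token(prefix: Sequence[Token]) -> Token:
    """Hypothetical stand-in for a model's greedy (argmax) next-token choice.

    Any deterministic function of the full prefix works for this demo.
    """
    return (sum(prefix) * 7 + len(prefix)) % 5


def greedy_autoregressive(prompt: Sequence[Token], n: int, f: NextToken) -> List[Token]:
    """Standard greedy decoding: n sequential calls, one new token per call."""
    out: List[Token] = []
    for _ in range(n):
        out.append(f(list(prompt) + out))
    return out


def jacobi_decode(
    prompt: Sequence[Token], n: int, f: NextToken, init: Token = 0
) -> Tuple[List[Token], int]:
    """Jacobi-style parallel decoding of a block of n tokens.

    Every position is re-predicted from the previous iteration's guess.
    Returns the decoded tokens and the number of iterations used.
    """
    y: List[Token] = [init] * n  # initial guess for the whole block
    # After k iterations the first k positions are guaranteed correct, so
    # n + 1 iterations always suffice to reach and confirm the fixed point.
    for iteration in range(1, n + 2):
        # All n updates read only the old guess y, so they are independent
        # of each other and could be computed in parallel.
        new_y = [f(list(prompt) + y[:i]) for i in range(n)]
        if new_y == y:  # fixed point: y[i] == f(prompt + y[:i]) for every i
            return y, iteration
        y = new_y
    raise RuntimeError("unreachable: Jacobi iteration converges within n + 1 steps")


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, iterations = jacobi_decode(prompt, 8, toy_next_token)
    assert tokens == greedy_autoregressive(prompt, 8, toy_next_token)
    print(tokens, "reached in", iterations, "Jacobi iterations")
```

Each iteration settles at least one more leading token, so the loop needs at most n + 1 iterations; any speedup over n sequential calls comes from iterations in which several tokens become correct at once.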

Scientific publications

11573/1672044 - 2023 - Latent Autoregressive Source Separation
Postolache, Emilian; Mariani, Giorgio; Mancusi, Michele; Santilli, Andrea; Cosmo, Luca; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: The Thirty-Seventh AAAI Conference on Artificial Intelligence (Washington DC, USA)
book: Proceedings of AAAI

11573/1706544 - 2023 - Accelerating Transformer Inference for Translation via Parallel Decoding
Santilli, Andrea; Severino, Silvio; Postolache, Emilian; Maiorca, Valentino; Mancusi, Michele; Marin, Riccardo; Rodolà, Emanuele - 04b Conference paper in proceedings
conference: The 61st Annual Meeting of the Association for Computational Linguistics (Toronto, Canada)
book: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

11573/1724128 - 2023 - Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Awal Md Shoeb, Abu; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adrià; Kluska, Agnieszka; Lewkowycz, Aitor; Agarwal, Akshat; Power, Alethea; Ray, Alex; Warstadt, Alex; Kocurek, Alexander W.; Safaya, Ali; Tazarv, Ali; Xiang, Alice; Parrish, Alicia; Nie, Allen; Hussain, Aman; Askell, Amanda; Dsouza, Amanda; Slone, Ambrose; Rahane, Ameet; Iyer, Anantharaman S.; Andreassen, Anders; Madotto, Andrea; Santilli, Andrea; Stuhlmüller, Andreas; Dai, Andrew; La, Andrew; Lampinen, Andrew; Zou, Andy; Jiang, Angela; Chen, Angelica; Vuong, Anh; Gupta, Animesh; Gottardi, Anna; Norelli, Antonio; Venkatesh, Anu; Gholamidavoodi, Arash; Tabassum, Arfa; Menezes, Arul; Kirubarajan, Arun; Mullokandov, Asher; Sabharwal, Ashish; Herrick, Austin; Efrat, Avia; Erdem, Aykut; Karakaş, Ayla; Ryan Roberts, B.; Sheng Loe, Bao; Zoph, Barret; Bojanowski, Bartłomiej; Özyurt, Batuhan; Hedayatnia, Behnam; Neyshabur, Behnam; Inden, Benjamin; Stein, Benno; Ekmekci, Berk; Yuchen Lin, Bill; Howald, Blake; Orinion, Bryan; Diao, Cameron; Dour, Cameron; Stinson, Catherine; Argueta, Cedrick; Ferri Ramírez, César; Singh, Chandan; Rathkopf, Charles; Meng, Chenlin; Baral, Chitta; Wu, Chiyu; Callison-Burch, Chris; Waites, Chris; Voigt, Christian; Manning, Christopher D.; Potts, Christopher; Ramirez, Cindy; Rivera, Clara E.; Siro, Clemencia; Raffel, Colin; Ashcraft, Courtney; Garbacea, Cristina; Sileo, Damien; Garrette, Dan; Hendrycks, Dan; Kilman, Dan; Roth, Dan; Freeman, Daniel; Khashabi, Daniel; Levy, Daniel; Moseguí González, Daniel; Perszyk, Danielle; Hernandez, Danny; Chen, Danqi; Ippolito, Daphne; Gilboa, Dar; Dohan, David; Drakard, David; Jurgens, David; Datta, Debajyoti; Ganguli, Deep; Emelin, Denis; Kleyko, Denis; Yuret, Deniz; Chen, Derek; Tam, Derek; Hupkes, Dieuwke; Misra, Diganta; Buzan, Dilyar; Coelho Mollo, Dimitri; Yang, Diyi; Lee, Dong-Ho; Schrader, Dylan; Shutova, Ekaterina; Dogus Cubuk, Ekin; 
Segal, Elad; Hagerman, Eleanor; Barnes, Elizabeth; Donoway, Elizabeth; Pavlick, Ellie; Rodolà, Emanuele; Lam, Emma; Chu, Eric; Tang, Eric; Erdem, Erkut; Chang, Ernie; Chi, Ethan A.; Dyer, Ethan; Jerzak, Ethan; Kim, Ethan; Engefu Manyasi, Eunice; Zheltonozhskii, Evgenii; Xia, Fanyue; Siar, Fatemeh; Martínez-Plumed, Fernando; Happé, Francesca; Chollet, Francois; Rong, Frieda; Mishra, Gaurav; Indra Winata, Genta; De Melo, Gerard; Kruszewski, Germán; Parascandolo, Giambattista; Mariani, Giorgio; Wang, Gloria; Jaimovitch-López, Gonzalo; Betz, Gregor; Gur-Ari, Guy; Galijasevic, Hana; Kim, Hannah; Rashkin, Hannah; Hajishirzi, Hannaneh; Mehta, Harsh; Bogar, Hayden; Shevlin, Henry; Schütze, Hinrich; Yakura, Hiromu; Zhang, Hongming; Mee Wong, Hugh; Ng, Ian; Noble, Isaac; Jumelet, Jaap; Geissinger, Jack; Kernion, Jackson; Hilton, Jacob; Lee, Jaehoon; Fernández Fisac, Jaime; Simon, James B.; Koppel, James; Zheng, James; Zou, James; Kocoń, Jan; Thompson, Jana; Wingfield, Janelle; Kaplan, Jared; Radom, Jarema; Sohl-Dickstein, Jascha; Phang, Jason; Wei, Jason; Yosinski, Jason; Novikova, Jekaterina; Bosscher, Jelle; Marsh, Jennifer; Kim, Jeremy; Taal, Jeroen; Engel, Jesse; Alabi, Jesujoba; Xu, Jiacheng; Song, Jiaming; Tang, Jillian; Waweru, Joan; Burden, John; Miller, John; Balis, John U.; Batchelder, Jonathan; Berant, Jonathan; Frohberg, Jörg; Rozen, Jos; Hernandez-Orallo, Jose; Boudeman, Joseph; Guerr, Joseph; Jones, Joseph; Tenenbaum, Joshua B.; Rule, Joshua S.; Chua, Joyce; Kanclerz, Kamil; Livescu, Karen; Krauth, Karl; Gopalakrishnan, Karthik; Ignatyeva, Katerina; Markert, Katja; Dhole, Kaustubh D.; Gimpel, Kevin; Omondi, Kevin; Mathewson, Kory; Chiafullo, Kristen; Shkaruta, Ksenia; Shridhar, Kumar; Mcdonell, Kyle; Richardson, Kyle; Reynolds, Laria; Gao, Leo; Zhang, Li; Dugan, Liam; Qin, Lianhui; Contreras-Ochando, Lidia; Morency, Louis-Philippe; Moschella, Luca; Lam, Lucas; Noble, Lucy; Schmidt, Ludwig; He, Luheng; Oliveros Colón, - 01a Journal article
journal: Transactions on Machine Learning Research (Amherst, Massachusetts: OpenReview.net, 2022-) - ISSN: 2835-8856 - WoS: (0) - Scopus: (0)

11573/1699202 - 2023 - Multimodal Neural Databases
Trappolini, G.; Santilli, A.; Rodolà, E.; Halevy, A.; Silvestri, F. - 04b Conference paper in proceedings
conference: ACM International Conference on Research and Development in Information Retrieval (Taipei, Taiwan)
book: SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval - (9781450394086)

11573/1672130 - 2022 - PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Bach, Stephen; Sanh, Victor; Yong, Zheng Xin; Webson, Albert; Raffel, Colin; Nayak, Nihal V.; Sharma, Abheesht; Kim, Taewoon; Bari, M Saiful; Fevry, Thibault; Alyafeai, Zaid; Dey, Manan; Santilli, Andrea; Sun, Zhiqing; Ben-David, Srulik; Xu, Canwen; Chhablani, Gunjan; Wang, Han; Fries, Jason; Al-Shaibani, Maged; Sharma, Shanya; Thakker, Urmish; Almubarak, Khalid; Tang, Xiangru; Radev, Dragomir; Jiang, Mike Tian-Jian; Rush, Alexander - 04b Conference paper in proceedings
conference: 60th Annual Meeting of the Association for Computational Linguistics (Dublin, Ireland)
book: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

11573/1724130 - 2022 - Explanatory Learning: Towards Artificial Scientific Discovery
Norelli, Antonio; Mariani, Giorgio; Moschella, Luca; Santilli, Andrea; Parascandolo, Giambattista; Melzi, Simone; Rodolà, Emanuele - 04f Poster
conference: Knowledge and Logical Reasoning in the Era of Data-driven Learning workshop at the International Conference on Machine Learning (Honolulu, Hawaii)
book: Knowledge and Logical Reasoning in the Era of Data-driven Learning workshop at the International Conference on Machine Learning

11573/1643085 - 2022 - KERMITviz: Visualizing Neural Network Activations on Syntactic Trees
Ranaldi, L.; Fallucchi, F.; Santilli, A.; Zanzotto, F. M. - 04b Conference paper in proceedings
conference: 15th International Conference on Metadata and Semantics Research, MTSR 2021 (London, United Kingdom)
book: Communications in Computer and Information Science - (978-3-030-98875-3; 978-3-030-98876-0)

11573/1672126 - 2022 - Multitask prompted training enables zero-shot task generalization
Sanh, Victor; Webson, Albert; Raffel, Colin; Bach, Stephen; Sutawika, Lintang; Alyafeai, Zaid; Chaffin, Antoine; Stiegler, Arnaud; Raja, Arun; Dey, Manan; Bari, M Saiful; Xu, Canwen; Thakker, Urmish; Sharma Sharma, Shanya; Szczechla, Eliza; Kim, Taewoon; Chhablani, Gunjan; Nayak, Nihal; Datta, Debajyoti; Chang, Jonathan; Jiang, Mike Tian-Jian; Wang, Han; Manica, Matteo; Shen, Sheng; Yong, Zheng Xin; Pandey, Harshit; Bawden, Rachel; Wang, Thomas; Neeraj, Trishala; Rozen, Jos; Sharma, Abheesht; Santilli, Andrea; Fevry, Thibault; Fries, Jason Alan; Teehan, Ryan; Le Scao, Teven; Biderman, Stella; Gao, Leo; Wolf, Thomas; Rush, Alexander M - 04b Conference paper in proceedings
conference: The Tenth International Conference on Learning Representations (Virtual Conference)
book: International Conference on Learning Representations

11573/1643156 - 2020 - Pat-in-the-loop: Syntax-based neural networks with activation visualization and declarative control
Zanzotto, F. M.; Onorati, D.; Tommasino, P.; Santilli, A.; Ranaldi, L.; Fallucchi, F. - 04b Conference paper in proceedings
conference: 2020 Italian Workshop on Explainable Artificial Intelligence, XAI.it 2020 (Online)
book: CEUR Workshop Proceedings

11573/1643092 - 2020 - KERMIT: Complementing transformer architectures with encoders of explicit syntactic interpretations
Zanzotto, F. M.; Santilli, A.; Ranaldi, L.; Onorati, D.; Tommasino, P.; Fallucchi, F. - 04b Conference paper in proceedings
conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 (Punta Cana, Dominican Republic)
book: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference - (9781952148606)

11573/1643150 - 2018 - A kernel-based approach for irony and sarcasm detection in Italian
Santilli, A.; Croce, D.; Basili, R. - 04b Conference paper in proceedings
conference: 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop, EVALITA 2018 (Turin, Italy)
book: CEUR Workshop Proceedings - (9788831978422; 9788831978699)

11573/1643152 - 2018 - SyntNN at SemEval-2018 Task 2: Is Syntax Useful for Emoji Prediction? Embedding Syntactic Trees in Multi Layer Perceptrons
Santilli, A.; Zanzotto, F. M. - 04b Conference paper in proceedings
conference: NAACL HLT 2018 - International Workshop on Semantic Evaluation, SemEval 2018 - Proceedings of the 12th Workshop (New Orleans, USA)
book: NAACL HLT 2018 - International Workshop on Semantic Evaluation, SemEval 2018 - Proceedings of the 12th Workshop

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma