JERIN GEORGE MATHEW

PhD Graduate

PhD program: XXXVII


Supervisor: Donatella Firmani
Co-supervisor: Massimo Mecella

Thesis title: Language Models for Information Quality: Methods and Applications

This thesis investigates how language models can improve information quality across data management tasks. As organizations increasingly rely on diverse data sources, maintaining high information quality has become crucial for decision-making and operational processes. The work covers four areas that contribute to information quality: duplicate data handling, uncertainty quantification, data-on-demand, and ethical considerations in data management. The core contributions are as follows. First, the thesis presents methodologies for detecting, managing, and removing duplicate data in different contexts: an efficient pipeline for entity count estimation that addresses duplicate records in large datasets, grounded approaches for integrating entity resolution with knowledge graphs, and a method that uses transformer-based language models to manage conceptual duplicates in multiple-choice question repositories. Second, it explores methods for quantifying and addressing uncertainty in the outputs of large language models in question-answering applications, improving the reliability of responses for decision-critical tasks. Third, it presents a system that uses language models to generate and integrate structured data on demand from multiple industrial sources. Finally, the thesis examines fairness in ranking systems that rely on subjective data and proposes a framework to assess group bias in collaborative rating platforms. Together, these contributions foster trust in data-driven systems across a variety of applications.

Research products

11573/1711344 - 2024 - INTEND: Intent-Based Data Operation in the Computing Continuum
Firmani, Donatella; Leotta, Francesco; Mathew, Jerin George; Rossi, Jacopo; Balzotti, Lorenzo; Song, Hui; Roman, Dumitru; Dautov, Rustem; Husom, Erik Johannes; Sen, Sagar; Balionyte-Merle, Vilija; Morichetta, Andrea; Dustdar, Schahram; Metsch, Thijs; Frascolla, Valerio; Khalid, Ahmed; Landi, Giada; Brenes, Juan; Toma, Ioan; Szabó, Róbert; Schaefer, Christian; Udroiu, Cosmin; Ulisses, Alexandre; Pietsch, Verena; Akselsen, Sigmund; Munch-Ellingsen, Arne; Pavlova, Irena; Kim, Hong-Gee; Kim, Changsoo; Allen, Bob; Kim, Sunwoo; Paulson, Eberechukwu - 04b Conference paper in volume
conference: International Conference on Advanced Information Systems Engineering (Limassol; Cyprus)
book: Proceedings of the Research Projects Exhibition Papers Presented at the 36th International Conference on Advanced Information Systems Engineering (CAiSE 2024)

11573/1729168 - 2024 - Composing Smart Data Services in Shop Floors Through Large Language Models
Mathew, Jerin George; Monti, Flavia; Firmani, Donatella; Leotta, Francesco; Mandreoli, Federica; Mecella, Massimo - 04b Conference paper in volume
conference: International Conference on Service Oriented Computing (Tunis)
book: Service-Oriented Computing - (9789819608072; 9789819608089)

11573/1709419 - 2024 - On the application of process management and process mining to Industry 4.0
Monti, Flavia; Mathew, Jerin George; Leotta, Francesco; Koschmider, Agnes; Mecella, Massimo - 01a Journal article
paper: SOFTWARE AND SYSTEMS MODELING (Springer Berlin Heidelberg) - issn: 1619-1366 - wos: WOS:001208203000001 (4) - scopus: 2-s2.0-85191515342 (5)

11573/1695891 - 2023 - NLP-Based Management of Large Multiple-Choice Test Item Repositories
Albano, Valentina; Firmani, Donatella; Laura, Luigi; Mathew, Jerin George; Paoletti, Anna Lucia; Torrente, Irene - 01a Journal article
paper: THE JOURNAL OF LEARNING ANALYTICS (UTS epress) pp. 28-44 - issn: 1929-7750 - wos: WOS:001133089300009 (2) - scopus: 2-s2.0-85180650601 (4)

11573/1685695 - 2023 - Using Knowledge Graphs for Record Linkage: Challenges and Opportunities
Andreou, A. S.; Firmani, D.; Mathew, J. G.; Mecella, M.; Pingos, M. - 04b Conference paper in volume
conference: International Conference on Advanced Information Systems Engineering (Zaragoza; Spain)
book: Advanced Information Systems Engineering Workshops - (978-3-031-34984-3; 978-3-031-34985-0)

11573/1685236 - 2023 - TopKontrol: a Monitoring and Quality Control System for the Packaging Production
Calamo, M.; De Franceschi, A.; De Santis, G.; Leotta, F.; Mazzaroppi, C.; Mathew, J. G.; Mecella, M.; Monti, F.; Sabatino, C.; Visani, L.; Visani, M. - 04b Conference paper in volume
conference: 35th International Conference on Advanced Information Systems Engineering, CAiSE 2023 (Zaragoza; Spain)
book: CAiSE-RPE 2023 CAiSE 2023 Research Projects Exhibition

11573/1687834 - 2023 - Bridging the Gap between Buyers and Sellers in Data Marketplaces with Personalized Datasets
Firmani, Donatella; Mathew, Jerin George; Santoro, Donatello; Simonini, Giovanni; Zecchini, Luca - 04b Conference paper in volume
conference: 31st Italian Symposium on Advanced Database Systems (Galzignano Terme; Italy)
book: Proceedings of the 31st Symposium of Advanced Database Systems (SEBD 2023), Galzingano Terme, Italy, July 2nd to 5th, 2023

11573/1638545 - 2022 - Electrospindle 4.0: Towards Zero Defect Manufacturing of Spindles
Amadori, Francesco; Bardani, Michele; Bernasconi, Eleonora; Cappelletti, Federica; Catarci, Tiziana; Drudi, Gianluca; Ferretti, Mario; Foschini, Luigi; Galli, Paolo; Germani, Michele; Grosso, Giuseppe; Leotta, Francesco; Mathew, Jerin George; Manuguerra, Luca; Mariucci, Nicola; Mecella, Massimo; Monti, Flavia; Pierini, Fabrizio; Rossi, Marta - 04b Conference paper in volume
conference: Joint 16th Research Challenges in Information Science Workshops and Research Projects Track, RCIS-WS and RP 2022 (Barcelona; Spain)
book: Joint Proceedings of RCIS 2022 Workshops and Research Projects Track co-located with the 16th International Conference on Research Challenges in Information Science (RCIS 2022)

11573/1643212 - 2022 - Supporting Zero Defect Manufacturing Through Cloud Computing and Data Analytics: the Case Study of Electrospindle 4.0
Leotta, F.; Mathew, J. G.; Mecella, M.; Monti, F. - 04b Conference paper in volume
conference: CAiSE 2022 International Workshops (Leuven; Belgium)
book: Advanced Information Systems Engineering Workshops CAiSE 2022 International Workshops, Leuven, Belgium, June 6–10, 2022, Proceedings - (978-3-031-07477-6; 978-3-031-07478-3)

11573/1683367 - 2022 - Towards an Information Systems-driven Maturity Model for Industry 4.0
Leotta, F.; Mathew, J. G.; Monti, F.; Mecella, M. - 04b Conference paper in volume
conference: BUSINESS PROCESS MODELING, DEVELOPMENT, AND SUPPORT (BPMDS) (Leuven)
book: Lecture Notes in Business Information Processing

11573/1643753 - 2021 - Automatic entity labeling through explanation techniques
Castano, S.; Ferrara, A.; Firmani, D.; Mathew, J. G.; Montanelli, S. - 04b Conference paper in volume
conference: 29th Italian Symposium on Advanced Database Systems, SEBD 2021 (Pizzo Calabro; Italy)
book: 29th Italian Symposium on Advanced Database Systems, SEBD 2021

11573/1638813 - 2021 - Unsupervised segmentation of human habits in smart home logs through process discovery
Esposito, L.; Veneruso, S.; Leotta, F.; Monti, F.; Mathew, J. G.; Mecella, M. - 04b Conference paper in volume
conference: 1st Italian Forum on Business Process Management, ITBPM 2021 (Italy)
book: Proceedings of the 1st Italian Forum on Business Process Management co-located with the 19th International Conference of Business Process Management (BPM 2021)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma