FEDERICO SCAFOGLIERI

PhD graduate

cycle: XXXIII


supervisor: Domenico Lembo

Thesis title: Ontology-Based Information Extraction experiences, framework, algorithms and tools

A significant portion of the information collected by enterprises and organizations resides in text documents and is thus inherently unstructured. Turning it into a structured form is the aim of Information Extraction (IE). Depending on the approach, the output of an IE process can fill forms, populate relational tables, or even be presented through an ontology. This last approach, known in the literature as Ontology-Based Information Extraction (OBIE), is particularly interesting, since ontologies may facilitate integration with other corporate and external data and enable data management and governance at an abstract, conceptual level. However, although OBIE has been the subject of several investigations, how to exploit the reasoning abilities offered by an ontology to improve the extraction process has not yet been specifically studied. This thesis is intended to be a first step in that direction. Starting from the experience we gained implementing OBIE systems with open-source technologies, and with the intent of addressing the weaknesses we encountered, we propose a formal framework for OBIE, called Ontology-Based Document Spanning (OBDS). We devise our proposal by revisiting the Ontology-Based Data Access (OBDA) paradigm, a sophisticated form of semantic data integration from relational databases, and by leveraging the research on Document Spanners, a recent formal study of rule-based information extraction that follows database principles. The reasoning service of main interest in OBDS, as usual in ontology-based data management approaches, is Query Answering (QA). We provide an analysis of this service in different settings and propose algorithms for QA, in the spirit of OBDA. In doing so, we show how the ontology plays a major role by mediating the extraction of information from text.
To demonstrate the applicability of our approach in practice, we illustrate MastroT, an OBDS tool that we have implemented using robust industrial technologies and tested on large document datasets. Last but not least, we formally treat the problem of Entity Resolution (ER), which is recurrent in the OBIE context, as in information integration approaches in general.

Scientific publications

11573/1690454 - 2023 - Comparing State of the Art Rule-Based Tools for Information Extraction
Scafoglieri, Federico - 04b Conference paper in volume
conference: 7th International Joint Conference on Rules and Reasoning (RuleML+RR) (Oslo)
book: Rules and Reasoning. RuleML+RR 2023.

11573/1674518 - 2022 - Automatic Information Extraction from Investment Product Documents
Scafoglieri, Federico; Monaco, Alessandra; Neccia, Giulia; Lembo, Domenico; Limosani, Alessandra; Medda, Francesca - 04b Conference paper in volume
conference: 30th Italian Symposium on Advanced Database Systems (SEBD 2022) (Tirrenia (Pisa))
book: SEBD 2022 Italian Symposium on Advanced Database Systems

11573/1628561 - 2021 - Boosting Information Extraction through Semantic Technologies: The KIDs use case at CONSOB
Scafoglieri, Federico; Lembo, Domenico; Limosani, Alessandra; Medda, Francesca; Lenzerini, Maurizio - 04b Conference paper in volume
conference: 20th International Semantic Web Conference (ISWC 2021) (Virtual Event)
book: ISWC-Posters-Demos-Industry 2021 International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks

11573/1476963 - 2020 - Ontology Mediated Information Extraction with MASTRO SYSTEM-T
Lembo, Domenico; Li, Yunyao; Popa, Lucian; Qian, Kun; Scafoglieri, Federico - 04b Conference paper in volume
conference: 19th International Semantic Web Conference (ISWC 2020) (Athens; Greece)
book: ISWC 2020 Posters, Demos, and Industry Tracks

11573/1476978 - 2020 - Ontology Mediated Information Extraction in Financial Domain with Mastro System-T
Lembo, Domenico; Li, Yunyao; Popa, Lucian; Scafoglieri, Federico - 04b Conference paper in volume
conference: 6th International Workshop on Data Science for Macro-Modeling (DSMM) (Portland, OR; USA)
book: DSMM '20: Proceedings of the Sixth International Workshop on Data Science for Macro-Modeling - (978-145038030-0)

11573/1476957 - 2020 - Ontology-based Document Spanning Systems for Information Extraction
Lembo, Domenico; Scafoglieri, Federico - 01a Journal article
journal: INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING (World Scientific Publishing Co.) pp. 3-26 - issn: 1793-351X - wos: WOS:000541467300002 (2) - scopus: 2-s2.0-85091996633 (5)

11573/1385329 - 2019 - Semantic technologies for the production and publication of open data in ACI - Automobile club d’Italia
Bouquet, P.; Caltabiano, D.; Catoni, E.; Fabrizi, A.; Lembo, D.; Minenna, M.; Molinari, A.; Pompermaier, F.; Punchina, M.; Ronconi, G.; Ruzzi, M.; Santarelli, V.; Scafoglieri, Federico - 04b Conference paper in volume
conference: 2019 ISWC Satellite Tracks (Posters and Demonstrations, Industry, and Outrageous Ideas), ISWC 2019-Satellites (Auckland; New Zealand)
book: ISWC 2019 Satellites. Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019)

11573/1276654 - 2019 - A Formal Framework for Coupling Document Spanners with Ontologies
Lembo, Domenico; Scafoglieri, Federico - 04b Conference paper in volume
conference: 2nd International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) (Cagliari; Italy)
book: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)

11573/1385381 - 2019 - Coupling ontologies with document spanners
Lembo, Domenico; Scafoglieri, Federico - 04b Conference paper in volume
conference: 32nd International Workshop on Description Logics (DL 2019) (Oslo; Norway)
book: Proceedings of the 32nd International Workshop on Description Logics

11573/1384007 - 2018 - Ontology Population for Open-Source Intelligence
Ganino, G.; Lembo, D.; Mecella, M.; Scafoglieri, F. - 04b Conference paper in volume
conference: 26th Italian Symposium on Advanced Database Systems, SEBD 2018 (Castellaneta Marina (Taranto); Italy)
book: SEBD 2018 Italian Symposium on Advanced Database Systems

11573/1184181 - 2018 - Ontology population for open-source intelligence: A GATE-based solution
Ganino, Giulio; Lembo, Domenico; Mecella, Massimo; Scafoglieri, Federico - 01a Journal article
journal: SOFTWARE-PRACTICE & EXPERIENCE (John Wiley & Sons Limited) pp. 2303-2330 - issn: 0038-0644 - wos: WOS:000449824700009 (11) - scopus: 2-s2.0-85053536013 (20)

11573/1079315 - 2018 - Ontology Population from Raw Text Corpus for Open-Source Intelligence
Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico - 04b Conference paper in volume
conference: 17th International Conference on Web Engineering, ICWE 2017 (Rome; Italy)
book: Current Trends in Web Engineering - (978-3-319-74432-2; 978-3-319-74433-9)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma