FEDERICO CROCE

PhD graduate

cycle: XXXIII

supervisor: Maurizio Lenzerini

Thesis title: Explaining Datasets in Ontology-based Data Management

We address the problem of explaining datasets in the context of Ontology-based Data Management (OBDM). An OBDM system has a three-layered architecture: an ontology layer provides a high-level, logic-based specification of a domain of interest; a data source layer stores the actual information; and a mapping layer semantically links the data sources to the specification of the domain. We study two scenarios in this context, each corresponding to a different facet of the problem of explaining datasets. The two scenarios are distinguished by the way in which the dataset is provided. When the dataset is provided as the result of evaluating a query over the OBDM system, we assume that the goal of an explanation is to show evidence that the provided dataset represents the answers to the query with respect to the OBDM system. We therefore refer to this problem as explaining query answers in OBDM. Alternatively, we consider the case in which the dataset is provided as such, and we assume that the goal of an explanation is to provide a semantic characterization of the content of the dataset, using the knowledge of the domain of interest represented by the ontology of the OBDM system. We refer to this problem as explaining the content of datasets. We provide several contributions for both scenarios. For the scenario of explaining query answers, we consider ontologies expressed in the popular DL-Lite family of Description Logics, and we address the problem of computing explanations for answers to queries in an OBDM system where queries are either positive, in particular conjunctive queries, or negative, i.e., negations of conjunctive queries.
We provide the following contributions: (i) we propose a formal, comprehensive framework for explaining query answers in OBDM systems based on DL-Lite; (ii) we present an algorithm that, given a tuple returned as an answer to a positive query and a weighting function, examines all the explanations of the answer and chooses the best explanation according to that function; (iii) we do the same for the answers to negative queries. Notably, on the way to the latter result, we present what appears to be the first algorithm that computes the answers to negative queries in DL-Lite. For the scenario of explaining the content of datasets, we study two variations of the problem. The first variation, which we call query characterization, aims at finding a semantic characterization of a single dataset by means of a logical expression that, when evaluated as a query over the ontology, returns exactly the dataset. The second variation, which we call query separation, generalizes the first: it aims at finding a semantic characterization of two datasets, one representing a set of positive examples and the other a set of negative examples. Such a characterization is a logical expression that, when evaluated as a query over the ontology, returns all the positive examples and none of the negative ones. For both variations, since an expression that properly characterizes an input dataset (resp. two input datasets) does not always exist, our first contribution is to propose (best) approximations of the proper characterization and separation, called (minimally) complete and (maximally) sound characterizations and separations. We do this by presenting a general framework for the query characterization and separation problems in OBDM.
Then, in a setting that uses the most popular languages for the OBDM paradigm, our second contribution is a comprehensive study of three natural computational problems associated with the framework: Verification, i.e., checking whether a given expression is a proper, complete, or sound characterization (resp. separation) of a given dataset (resp. of two given datasets); Existence, i.e., checking whether a proper or best approximated characterization (resp. separation) of a given dataset (resp. of two given datasets) exists at all; and Computation, i.e., computing any proper or any best approximated characterization (resp. separation) of a given dataset (resp. of two given datasets). Finally, we discuss possible customization strategies that can be applied to the problem of explaining datasets in OBDM, and hence to both scenarios we deal with, in order to obtain solutions that best fit a set of user-defined criteria describing the degree of comprehensibility of an explanation for a specific user.
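As a rough illustration of the Verification problem, the following Python sketch classifies a candidate expression once its answers over the OBDM system are already known. It is not the thesis's algorithm. It assumes the usual reading in which a complete characterization returns at least all tuples of the dataset and a sound one returns only tuples of the dataset; the abstract does not spell these definitions out. The names `characterization_status` and `is_proper_separation` are hypothetical, and computing the answers themselves (query rewriting over the ontology and mappings) is outside the sketch.

```python
from typing import FrozenSet, Set, Tuple

Row = Tuple[str, ...]


def characterization_status(answers: Set[Row], dataset: Set[Row]) -> FrozenSet[str]:
    """Classify a candidate expression from its precomputed answers.

    Assumed definitions (not stated in the abstract):
      complete: every tuple of the dataset is among the answers
      sound:    every answer is a tuple of the dataset
      proper:   the answers are exactly the dataset
    """
    labels = set()
    if dataset <= answers:
        labels.add("complete")
    if answers <= dataset:
        labels.add("sound")
    if answers == dataset:
        labels.add("proper")
    return frozenset(labels)


def is_proper_separation(answers: Set[Row], positives: Set[Row], negatives: Set[Row]) -> bool:
    """True if the answers contain all positive examples and no negative one."""
    return positives <= answers and answers.isdisjoint(negatives)


if __name__ == "__main__":
    dataset = {("alice",), ("bob",)}
    # Hypothetical answers of a candidate query that over-approximates the dataset.
    candidate_answers = {("alice",), ("bob",), ("carol",)}
    print(sorted(characterization_status(candidate_answers, dataset)))
    print(is_proper_separation(candidate_answers, dataset, {("dave",)}))
```

Under these assumptions the check is plain set comparison; the harder parts studied in the thesis, Existence and Computation, are not touched by this sketch.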

Publications

11573/1744773 - 2025 - A1C-Based Subgroups of Type 2 Diabetes and Their Association with Microvascular Complications—Cluster Analysis of a Large Italian Cohort

Acitelli, Elisa; Marconi, Lorenzo; Di Teodoro, Giulia; Palagi, Laura; Salvatore, Cecilia; Valentini, Riccardo; Croce, Federico; Grani, Giorgio; Rosati, Riccardo; Maranghi, Marianna - 04d Abstract in conference proceedings

journal: DIABETES (American Diabetes Association:National Service Center, 1701 North Beaureguard Street:Alexandria, VA 22311:(800)232-3472, (703)549-1500, INTERNET: http://www.diabetes.org/diabetscare, Fax: (703)549-6995) pp. - - issn: 0012-1797 - wos: WOS:001522824702425 (0) - scopus: (0)

conference: American Diabetes Association 85th Scientific Sessions (ADA 2025) (Chicago; United States of America)

book: Diabetes - ()

11573/1719621 - 2024 - Ontology-Based Data Preparation in Healthcare: The Case of the AMD-STITCH Project

Croce, Federico; Valentini, Riccardo; Maranghi, Marianna; Grani, Giorgio; Lenzerini, Maurizio; Rosati, Riccardo - 01a Journal article

journal: SN COMPUTER SCIENCE (Singapore: Springer Singapore) pp. - - issn: 2661-8907 - wos: (0) - scopus: 2-s2.0-85190461184 (6)

11573/1616540 - 2021 - Query Definability and Its Approximations in Ontology-based Data Management

Cima, G.; Croce, F.; Lenzerini, M. - 04b Conference paper in volume

conference: ACM International Conference on Information and Knowledge Management (Virtual; Australia)

book: CIKM ’21 Proceedings of the 30th ACM International Conference on Information & Knowledge Management - (9781450384469)

11573/1389347 - 2020 - Ontology-based explanation of classifiers

Croce, F.; Cima, G.; Lenzerini, M.; Catarci, T. - 04b Conference paper in volume

conference: Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020 (Copenhagen; Denmark)

book: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference - ()

11573/1359118 - 2019 - On queries with inequalities in DL-LiteR≠

Cima, G.; Croce, Federico; Lenzerini, M.; Poggi, A.; Toccacieli, E. - 04b Conference paper in volume

conference: 32nd International Workshop on Description Logics, DL 2019 (Oslo; Norway)

book: Proceedings of the 32nd International Workshop on Description Logics - ()

11573/1334012 - 2018 - A framework for explaining query answers in DL-Lite

Croce, F.; Lenzerini, M. - 04b Conference paper in volume

conference: 21st International Conference on Knowledge Engineering and Knowledge Management, EKAW 2018 (Nancy; France)

book: Knowledge Engineering and Knowledge Management - (978-3-030-03666-9; 978-3-030-03667-6)