VITTORIA LA SERRA

PhD graduate

cycle: XXXV

Thesis title: Anomaly detection in panel data identification features: comparing temporal and supervised classification record linkage-based methods

With repeated observations of the same units over time, panel data enable researchers to study the dynamics of a broad set of phenomena. Units in a panel are identified by key features so that they can be followed through the observation periods; these features must be chosen so that they remain fixed and consistent over time. Losing uniqueness on the key features means losing track of the units’ history; therefore, any errors in the reported key features must be identified. This work is motivated by a real-life problem observed in granular insurance data, reported since 2016 and used by some central banks to build statistics for the European System of Central Banks (ESCB). The reported insurance assets are uniquely identified by codes that are required to remain stable and consistent over time; nevertheless, reporting errors may cause unexpected changes in the codes, leading to inconsistencies when compiling insurance statistics. The resulting decrease in the quality of the produced statistics is limited but cannot be neglected. Two approaches are proposed in this work to deal with the described issue: a temporal one that uses ARIMA models for time series prediction, and a supervised classification one that uses machine learning models. The two apparently very different methodologies pursue the same goal from two different perspectives: the former exploits the temporal aspect of the data, while the latter focuses on consecutive pairs of reporting periods. Both rely on the idea, borrowed from record linkage, that records that do not share the same value for the key feature but refer to the same unit will be equal, or at least similar, on the other observed features. The two methodologies are trained and tested on Italian data from 2019-2022, with ad hoc procedures to ensure the robustness and reliability of the results.
Promising test results are presented to show the potential benefits of the two proposed methodologies on data quality management processes and the efficiency gains coming from automation.
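
The record linkage idea behind both approaches can be illustrated with a minimal sketch. It is not the thesis's ARIMA or supervised classification method: it swaps in a plain rule-based comparison, and the field names (`code`, `issuer`, `currency`, `maturity`), the toy records, and the 0.8 threshold are all assumptions made for the example. It pairs codes that disappear after one reporting period with codes that first appear in the next one, and flags pairs whose other features agree.

```python
from itertools import product


def similarity(a, b, features):
    """Fraction of the given features on which two records agree exactly."""
    return sum(a[f] == b[f] for f in features) / len(features)


def candidate_code_changes(prev, curr, features, threshold=0.8):
    """Pair codes that vanish after `prev` with codes that are new in `curr`.

    A pair is returned when the two records agree on at least `threshold`
    of the non-key features, suggesting the same unit reported under a
    different code. The threshold is an illustrative assumption.
    """
    prev_codes = {r["code"] for r in prev}
    curr_codes = {r["code"] for r in curr}
    dropped = [r for r in prev if r["code"] not in curr_codes]
    new = [r for r in curr if r["code"] not in prev_codes]

    matches = []
    for old, fresh in product(dropped, new):
        score = similarity(old, fresh, features)
        if score >= threshold:
            matches.append((old["code"], fresh["code"], score))
    return matches


# Hypothetical toy data for two consecutive reporting periods.
FEATURES = ["issuer", "currency", "maturity"]
q1 = [
    {"code": "A1", "issuer": "X", "currency": "EUR", "maturity": "2030-06"},
    {"code": "B2", "issuer": "Y", "currency": "USD", "maturity": "2027-01"},
]
q2 = [
    {"code": "A1", "issuer": "X", "currency": "EUR", "maturity": "2030-06"},
    {"code": "B9", "issuer": "Y", "currency": "USD", "maturity": "2027-01"},
    {"code": "C3", "issuer": "Z", "currency": "EUR", "maturity": "2035-12"},
]

flagged = candidate_code_changes(q1, q2, FEATURES)
print(flagged)
```

In this toy case the asset reported as `B2` in the first period is flagged as a probable match for `B9` in the second, while `C3` is treated as a genuinely new asset. A supervised classifier, as described in the abstract, would replace the fixed threshold with a model trained on labelled pairs.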

Publications

11573/1658174 - 2022 - Statistical matching for anomaly detection in insurance assets granular reporting

La Serra, Vittoria; Svezia, Emiliano - 01a Journal article

journal: BIS WORKING PAPERS (Basel: Bank for International Settlements) pp. 1-12 - issn: 1682-7678 - wos: (0) - scopus: (0)

11573/1551835 - 2021 - Exploring methods for the assessment of temporal trends in mortality and hospitalization in Italian industrially contaminated sites

La Serra, Vittoria; Pasetto, Roberto; Manno, Valerio; Iavarone, Ivano; Jona Lasinio, Giovanna; Minelli, Giada - 01a Journal article

journal: ANNALI DELL'ISTITUTO SUPERIORE DI SANITÀ (Roma: Istituto Superiore di Sanità Roma: Istituto poligrafico dello stato) pp. 183-192 - issn: 2384-8553 - wos: WOS:000660879200011 (1) - scopus: 2-s2.0-85108247034 (2)

11573/1564125 - 2021 - A Tucker3 method application on adjusted-PMRs for the study of work-related mortality

Malpassuti, Vittoria Carolina; La Serra, Vittoria; Massari, Stefania - 04b Conference paper in volume

conference: 51st Meeting of the Italian Statistical Society (Pisa)

book: Book of short papers SIS 2021

11573/1480718 - 2020 - A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

La Serra, Vittoria; Faes, Christel; Hens, Niel; Brutti, Pierpaolo - 04b Conference paper in volume

conference: 50th Meeting of the Italian Statistical Society (Pisa)

book: Book of short papers SIS 2020 - (9788891910776)