Thesis title: A Bayesian nonparametric approach to correct for underreporting in count data
This thesis addresses underreporting, which often affects administrative data, particularly in developing countries and in the more deprived areas of a country or region, with a focus on its effect on estimating the prevalence of a phenomenon. When health or vital statistics are concerned, underreporting can severely bias estimates, with consequences for government intervention policies and resource allocation. We propose a nonparametric compound Poisson model that introduces a latent clustering structure for the reporting probabilities and estimates them jointly with the model’s parameters by exploiting an auxiliary variable associated with the reporting process. The proposed model is then used to estimate the prevalence of Chronic Kidney Disease (CKD) in Apulia, Italy, from unpublished administrative data collected in a retrospective study conducted between 1 January 2011 and 31 December 2013. Although accurate estimation of the prevalence is necessary for monitoring, surveillance and management, counts are expected to be considerably underreported, especially in some areas of Apulia, which is one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and experts’ expectations and highlight interesting geographical patterns of the disease. We compare our model with existing approaches using both simulated data and real data described in previous literature, namely data on early neonatal mortality risk in Minas Gerais, Brazil. In comparison with alternative models, the proposed approach proves to be accurate and particularly suitable when prior information about data quality is not completely accurate.
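To make the effect of underreporting on a prevalence estimate concrete, the following minimal sketch simulates reported counts when each true case is recorded independently with some reporting probability. It is an illustration only and not the model proposed in the thesis: the latent clustering of reporting probabilities and the auxiliary variable are not reproduced. The function name `simulate_reported_count` and the area populations, prevalence and reporting probabilities are hypothetical values chosen for the example.

```python
import random


def simulate_reported_count(population, prevalence, reporting_prob, rng):
    """Simulate the true and the reported number of cases in one area."""
    # True number of cases in the area (unobserved in practice).
    true_cases = sum(rng.random() < prevalence for _ in range(population))
    # Each true case is recorded independently with probability reporting_prob.
    reported = sum(rng.random() < reporting_prob for _ in range(true_cases))
    return true_cases, reported


rng = random.Random(0)

# Hypothetical areas: (population, true prevalence, reporting probability).
areas = [(20_000, 0.05, 0.9), (20_000, 0.05, 0.5)]

results = []
for population, prevalence, reporting_prob in areas:
    true_cases, reported = simulate_reported_count(
        population, prevalence, reporting_prob, rng
    )
    # Naive estimate: treats the reported count as if it were complete.
    naive = reported / population
    # Corrected estimate: only available here because reporting_prob is known.
    corrected = naive / reporting_prob
    results.append((true_cases, reported, naive, corrected))
    print(
        f"reporting_prob={reporting_prob}: "
        f"naive={naive:.4f}, corrected={corrected:.4f}"
    )
```

In this sketch the correction is trivial because the reporting probability is given. In real administrative data it is unknown and must be estimated together with the other parameters, which is the problem the abstract describes.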