Contemporary biomedical research is increasingly constrained by the high costs of experimental studies, together with stringent ethical, regulatory, and privacy requirements governing the collection and dissemination of patient-level data. These constraints have fostered two distinct yet closely related statistical developments. On the one hand, synthetic data have been proposed as a tool to support data access, reproducibility, and privacy protection. On the other hand, observational data are increasingly used to emulate experimental settings through causal inference methods, particularly those based on propensity scores. While these approaches are usually treated as separate methodological domains, this talk argues that they should be understood as two responses to the same underlying problem.
The issue common to both is the limited attention often paid to the data-generating process. In the synthetic data framework, an inadequate representation of this process may lead to poor reproduction of rare events, distortion of dependence structures, and amplification of existing biases. In observational causal inference, covariate imbalance is only part of the problem: when propensity scores are estimated through standard logistic specifications that poorly represent the treatment assignment mechanism, adjustment procedures may be not only ineffective but may in fact exacerbate bias in causal effect estimation. Within this perspective, particular attention will be devoted to Bayesian networks as a flexible framework for representing complex dependence structures and for estimating propensity scores beyond standard parametric formulations. More broadly, the seminar highlights the centrality of the data-generating process as a key statistical problem in current biomedical research.
20 March 2026, 12:00
Clelia Di Serio
Vita-Salute San Raffaele University, Milano
In person: Room V (4th floor), building CU002, Scienze Statistiche
Webinar: https://uniroma1.zoom.us/j/83625004899?pwd=bXCtz0mp759PUh2lkqT0BUoVa0Uegg.1
Meeting ID: 836 2500 4899
Passcode: 123456