STEFANO PIERSANTI

PhD graduate

cycle: XXXIII



Thesis title: Will it fail and why? A large case study of company default prediction with highly interpretable machine learning models

Finding a model to predict the default of a firm is a well-known problem in the finance and data science communities. Many modern approaches seek well-performing models to forecast it, but those models often act as black boxes and do not give financial institutions the explanations they need to justify their choices. This project aims to find a robust predictive model using a tree-based machine learning algorithm which, paired with a game-theoretic approach, can provide sound explanations of the model's output. In our work we combine three large and important datasets in order to investigate both bankruptcy and bank default, a state of difficulty for companies that often precedes actual bankruptcy. We combine one dataset from the Italian Central Credit Register of the Bank of Italy, one containing balance sheet information on Italian firms, and information from the AnaCredit dataset, a novel source of credit data from the European Central Bank. We try to go beyond the academic study and show how our model, based on some promising machine learning algorithms, outperforms the current default predictions made by credit institutions and, at the same time, provides insights into the reasons that lead to a particular outcome. The default prediction problem has been studied for over fifty years but remains a very hard task even today. Since it retains remarkable practical relevance, we focus our efforts on obtaining the best possible predictive performance, also in comparison with the reference literature. Finally, we dedicate special effort to the analysis of predictions in highly unbalanced contexts. Imbalanced classes are a common problem in machine learning classification, typically addressed by removing the imbalance in the training set. We conjecture that this is not always the best choice and propose the use of a slightly unbalanced training set, showing that this approach helps maximize performance.
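The idea of training on a slightly unbalanced set, rather than a fully balanced one, can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the function name `undersample_to_ratio`, the parameter `negatives_per_positive`, the ratio values, and the toy labels are all assumptions made for illustration. The sketch keeps every positive example (a defaulted firm) and randomly undersamples the negatives to a chosen ratio.

```python
import random
from collections import Counter


def undersample_to_ratio(labels, negatives_per_positive, seed=0):
    """Return sorted indices of a training subset (hypothetical helper).

    Keeps every positive (label 1, e.g. a defaulted firm) and draws, without
    replacement, at most `negatives_per_positive` negatives (label 0) for
    each positive.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives),
                 int(round(negatives_per_positive * len(positives))))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


# Toy data, purely illustrative: 50 defaults among 1000 firms.
labels = [1] * 50 + [0] * 950

# Fully balanced training set (one negative per positive).
balanced = undersample_to_ratio(labels, negatives_per_positive=1.0)

# Slightly unbalanced training set (the ratio 2.0 is an arbitrary assumption).
slightly_unbalanced = undersample_to_ratio(labels, negatives_per_positive=2.0)

balanced_counts = Counter(labels[i] for i in balanced)
slightly_unbalanced_counts = Counter(labels[i] for i in slightly_unbalanced)
```

In practice the ratio would be treated as a hyperparameter and chosen by validating the trained model on data that preserves the original class distribution.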

Scientific output

11573/1390513 - 2020 - Firms Default Prediction with Machine Learning
Aliaj, T.; Anagnostopoulos, A.; Piersanti, S. - 04b Conference paper in volume
conference: 4th Workshop on Mining Data for Financial Applications, MIDAS 2019, held in conjunction with the 19th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 (Würzburg; Germany)
book: Mining Data for Financial Applications - (978-3-030-37719-9; 978-3-030-37720-5)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma