Thesis title: Tensor decomposition in mortality: identifying subgroups, modeling, forecasting and exploring causes of death
With the increasing availability of temporal data, researchers often analyze information stored in matrices whose entries are replicated on different occasions. For example, in underwriting, pricing, or forecasting, an actuary manages a growing amount of information and may have to deal with death rates (or log death rates) by age and year (or across different countries). The occasions can be time points or different conditions, and in these situations the data can be stored in a 3-way array, or tensor. Adding a further dimension (a second set of occasions) yields a 4-way array, or 4-way tensor. More generally, data can be stored in an N-way array, or N-way tensor. Such data are called multi-way data, and they are analyzed with multi-way models. The aim of this work is to illustrate the different uses of the DEDICOM, Tucker and CANDECOMP/PARAFAC models in the context of mortality: identifying subgroups, modeling, forecasting and exploring causes of death. To this end, we approach the problem gradually, considering three and then four dimensions, in different orders and in various applications. In particular, we use the Tucker method to model and explore data, the Canonical Polyadic decomposition, also known as CANDECOMP/PARAFAC (Canonical Decomposition/Parallel Factors), to forecast data, and the nonnegative 3-way DEcomposition into DIrectional COMponents (DEDICOM), a special case of the Tucker decomposition, to identify subgroups in the data. For subgroup identification, we show that DEDICOM can extract meaningful relational patterns from multi-population mortality data expressed as centered log death rates. By specifically describing the mesoscale interactions between countries, our work could help to design appropriate actions against longevity risk, which may affect the stability of life assurance and pensions.
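As an illustration of the multi-way layout described above, the following minimal sketch stores death rates in 3-way and 4-way NumPy arrays and centers the log death rates. Everything in it is an assumption made for illustration: the data are synthetic, the dimension sizes and variable names are hypothetical, and centering by the time average of each series is only one possible convention.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimension sizes (not taken from the thesis).
n_ages, n_years, n_countries, n_genders = 10, 25, 4, 2

# 3-way array: age x year x country, filled with synthetic death rates.
death_rates_3way = rng.uniform(0.001, 0.2, size=(n_ages, n_years, n_countries))
log_rates_3way = np.log(death_rates_3way)

# Centering (assumed convention): subtract, for each age and country,
# the average log death rate over the years.
centered_3way = log_rates_3way - log_rates_3way.mean(axis=1, keepdims=True)

# 4-way array: the same layout with gender as an additional mode.
death_rates_4way = rng.uniform(
    0.001, 0.2, size=(n_ages, n_years, n_countries, n_genders)
)
centered_4way = np.log(death_rates_4way)
centered_4way = centered_4way - centered_4way.mean(axis=1, keepdims=True)

print(centered_3way.shape, centered_4way.shape)
```

Each axis of the array corresponds to one mode of the tensor, so adding a further occasion amounts to adding one more axis.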
Concerning mortality modeling, we first refer to the three-way Lee-Carter model [72], which is based on the Tucker 3 decomposition and can be considered an extension of the classic Lee-Carter model [46]. This approach allows us to simplify the data structure and to obtain a reduced-rank representation. Then, following this line of research and focusing on forecasting, we propose a coherent mortality forecasting method based on a four-way CANDECOMP/PARAFAC decomposition, which takes a further dimension into account. The four-way structure makes it possible to manage mortality data aggregated in multi-dimensional settings according to common demographic features: age class, time, country, and gender. We deal with four-dimensional mortality data using two main approaches proposed in the literature: the first works on centered mortality rates, as in [20], and the second works on compositional data, as in [7]. Here we take two further methodological steps in the field of mortality analysis and forecasting in a high-dimensional space. First, compared with the current literature, we use an additional dimension by implementing a 4-way tensor decomposition. Second, we extend this framework to include compositional data (CoDa) analysis in the spirit of [8]. In the last part, we apply the Tucker 4 method to mortality by cause of death, again considering four dimensions and working with death rates. This four-way component analysis is useful for the exploratory analysis of four-way data, and in this context it reveals some peculiar aspects of the mortality phenomenon. In particular, it shows how the longevity improvements witnessed in many high-income countries during the twentieth century were driven mainly by reductions in a few specific major cause-of-death groups.
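To make the idea of a reduced-rank Tucker representation of a four-way array concrete, the sketch below computes a truncated higher-order SVD (HOSVD) with NumPy. It is a generic illustration and not the estimation procedure used in the thesis: the data are synthetic, the mode sizes and ranks are hypothetical, and the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) are introduced here only for this example.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: rows index `mode`, columns index all remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (new_dim x old_dim) along `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # put the target mode last
    result = moved @ matrix.T               # contract it with the matrix
    return np.moveaxis(result, -1, mode)    # restore the original axis order

def truncated_hosvd(tensor, ranks):
    """Return a Tucker core and one orthonormal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])         # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the approximation from the core and the factor matrices."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx

# Synthetic 4-way data: age class x time x country x gender (hypothetical sizes).
rng = np.random.default_rng(0)
shape = (20, 30, 5, 2)
age = rng.normal(size=shape[0])
time = np.linspace(1.0, -1.0, shape[1])
country = rng.uniform(0.5, 1.5, size=shape[2])
gender = np.array([1.0, 0.8])
signal = np.einsum("a,t,c,g->atcg", age, time, country, gender)
data = signal + 0.05 * rng.normal(size=shape)

# Reduced-rank representation with assumed ranks (2, 2, 2, 1).
core, factors = truncated_hosvd(data, ranks=(2, 2, 2, 1))
approx = reconstruct(core, factors)
rel_error = np.linalg.norm(data - approx) / np.linalg.norm(data)
print("core shape:", core.shape, "relative error:", rel_error)

# With full ranks the decomposition reproduces the data up to rounding.
full_core, full_factors = truncated_hosvd(data, ranks=shape)
full_approx = reconstruct(full_core, full_factors)
```

The truncated HOSVD is a simple, non-iterative way to obtain a Tucker-type approximation. Fitted Tucker or CANDECOMP/PARAFAC models are usually estimated with iterative algorithms such as alternating least squares, which this sketch does not implement.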