Compositional data analysis for modelling and forecasting mortality using the α-transformation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Mortality forecasting is crucial for demographic planning and actuarial studies, especially for projecting population ageing and longevity risk. Classical approaches largely rely on extrapolative methods, such as the Lee-Carter (LC) model, which use mortality rates as the mortality measure. In recent years, compositional data analysis (CoDA), which respects summability and non-negativity constraints, has gained increasing attention for mortality forecasting. While the centred log-ratio (CLR) transformation is commonly used to map compositional data to real space, the α-transformation, a generalisation of log-ratio transformations, offers greater flexibility and adaptability. This study contributes to mortality forecasting by introducing the α-transformation as an alternative to the CLR transformation within a non-functional CoDA model that has not been previously investigated in existing literature. To fairly compare the impact of transformation choices on forecast accuracy, zero values in the data are imputed, although the α-transformation can inherently handle them. Using age-specific life table death counts for males and females in 31 selected European countries/regions from 1983 to 2018, the proposed method demonstrates comparable performance to the CLR transformation in most cases, with improved forecast accuracy in some instances. These findings highlight the potential of the α-transformation for enhancing mortality forecasting within the non-functional CoDA framework.


💡 Research Summary

This paper investigates the use of compositional data analysis (CoDA) for mortality forecasting, focusing on the α‑transformation as an alternative to the widely used centred log‑ratio (CLR) transformation. Traditional mortality forecasting methods, such as the Lee‑Carter (LC) model, operate on mortality rates and often ignore the summability constraint inherent in life‑table death counts. By treating age‑specific death counts (dₜ,ₓ) as compositional data—positive vectors that sum to a constant radix—the authors preserve the natural demographic constraint and avoid the incoherence that can arise when forecasting rates independently across ages.

The authors first compile a comprehensive dataset of male and female death counts for 31 European countries/regions spanning 1983–2018, obtained from the Human Mortality Database. Ages are treated discretely (0–110+), and for ages 80 and above the Kannisto model is applied to smooth noisy high‑age rates. Because zeros appear in the raw death counts for younger ages, a small‑value imputation scheme is employed for both CLR and α‑transformation to ensure a fair comparison, even though the α‑transformation can theoretically handle zeros directly.
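The zero-handling step can be sketched as follows. This is a minimal illustration only: the summary does not state the paper's exact imputation rule, so the replacement value `eps`, the radix of 100,000, and the function name `impute_zeros` are all assumptions made for the example.

```python
import numpy as np

def impute_zeros(counts, eps=0.5, radix=100_000.0):
    """Replace zero death counts with a small value, then re-close rows.

    counts: array of shape (years, ages), each row summing to `radix`.
    `eps` and this replacement rule are illustrative assumptions,
    not the scheme used in the paper.
    """
    d = np.asarray(counts, dtype=float).copy()
    d[d == 0] = eps                              # small-value replacement
    d *= radix / d.sum(axis=1, keepdims=True)    # restore the constant row sum
    return d

# Toy life-table death counts: 2 years x 3 age groups, one zero entry.
raw = np.array([[0.0, 30_000.0, 70_000.0],
                [10.0, 29_990.0, 70_000.0]])
clean = impute_zeros(raw)
```

After imputation every entry is strictly positive, so the logarithms needed by the CLR are defined, and each row still sums to the radix.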

Two transformations are then applied to the centred death‑count matrix: the standard CLR, which maps a P‑part composition to a P‑dimensional real vector of log‑ratios whose components sum to zero (so the image lies in a (P‑1)‑dimensional subspace), and the α‑transformation, a one‑parameter power transformation that interpolates between log‑ratio analysis (α → 0) and Euclidean analysis (α = 1). The parameter α is tuned for each country–sex combination by minimizing out‑of‑sample prediction error, yielding a diverse set of optimal α values (e.g., 0.99 for Belgium, 0.81 for West Germany, 0.20 for France).
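The two transformations can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: it uses the form (P·u − 1)/α with u the closed power-transformed composition, and it omits the Helmert sub-matrix multiplication that some formulations of the α‑transformation apply to reduce the result to P − 1 coordinates. The function names are hypothetical.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log parts minus their mean log (components sum to zero)."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def alpha_transform(x, alpha):
    """Power (alpha) transformation without the Helmert step (an assumption here).

    For alpha -> 0 this tends to clr(x); for alpha = 1 it is linear in x.
    """
    x = np.asarray(x, dtype=float)
    if alpha == 0:
        return clr(x)
    p = x.shape[-1]
    u = x**alpha / np.sum(x**alpha, axis=-1, keepdims=True)  # closed power transform
    return (p * u - 1.0) / alpha

def alpha_inverse(z, alpha):
    """Map transformed values back to the simplex (rows sum to one).

    For alpha != 0 this requires alpha * z + 1 >= 0; forecast values
    outside that range would need separate handling.
    """
    z = np.asarray(z, dtype=float)
    if alpha == 0:
        w = np.exp(z)
    else:
        u = (alpha * z + 1.0) / z.shape[-1]
        w = u ** (1.0 / alpha)
    return w / w.sum(axis=-1, keepdims=True)

x = np.array([0.1, 0.2, 0.7])          # a toy 3-part composition
z_half = alpha_transform(x, 0.5)
x_back = alpha_inverse(z_half, 0.5)    # recovers x up to rounding
```

Because the power transform stays finite when a part equals zero (for α > 0), this form shows why the α‑transformation can in principle accept zeros, whereas the CLR cannot.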

After transformation, singular value decomposition (SVD) is used to obtain a low‑rank approximation of the data matrix. The rank k is chosen based on explained variance: k = 4 for males and k = 7 for females, capturing over 80 % of the total variance. This yields time series of mortality indices κₜ (one per retained component), which are then forecast over an eight‑year horizon using two ARIMA specifications: the conventional ARIMA(0,1,1) with drift and an ARIMA model selected automatically via a stepwise algorithm. The forecast κₜ series, together with the age‑specific loadings βₓ, are back‑transformed to the simplex, re‑scaled to the life‑table radix, and bias‑corrected to produce predicted death counts ˆdₜ,ₓ.
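The decomposition and forecasting step can be sketched as follows. Two simplifications are assumed and should be read as such: a random walk with drift stands in for the ARIMA(0,1,1)-with-drift and automatically selected ARIMA models described above, and the input is synthetic noise rather than transformed mortality data. The names `fit_low_rank` and `forecast_rw_drift` are hypothetical.

```python
import numpy as np

def fit_low_rank(z, k):
    """Centre the (years x ages) matrix and keep the first k SVD components."""
    mean = z.mean(axis=0)
    u, s, vt = np.linalg.svd(z - mean, full_matrices=False)
    kappa = u[:, :k] * s[:k]          # time-varying scores, shape (years, k)
    beta = vt[:k]                     # age loadings, shape (k, ages)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return mean, kappa, beta, explained

def forecast_rw_drift(kappa, h):
    """Random walk with drift for each score series (a stand-in for ARIMA)."""
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)   # mean first difference
    steps = np.arange(1, h + 1)[:, None]
    return kappa[-1] + steps * drift                    # shape (h, k)

# Synthetic stand-in for a transformed training matrix: 28 years x 10 ages.
rng = np.random.default_rng(0)
years, ages, k, h = 28, 10, 2, 8
z = np.cumsum(rng.normal(size=(years, ages)), axis=0)

mean, kappa, beta, explained = fit_low_rank(z, k)
z_future = mean + forecast_rw_drift(kappa, h) @ beta    # forecasts in transformed space
```

In a full pipeline the rows of `z_future` would then be passed through the inverse transformation, multiplied by the radix, and bias-corrected; those steps are left out of this sketch.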

Model performance is evaluated on the 2011–2018 test period using root mean squared error (RMSE) and mean absolute error (MAE) averaged across ages. The α‑transformation delivers comparable accuracy to CLR in most cases and outperforms it in several countries, achieving RMSE reductions of 5–10 % for nations such as Belgium, West Germany, and France. In a few instances (e.g., Switzerland, Spain) the optimal α collapses to zero, indicating that CLR remains the best choice for those datasets.
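For reference, the two error measures can be computed as below. This is a generic sketch on made-up numbers; the paper's exact averaging over ages, horizons, and countries may differ from the plain mean used here.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error over all entries."""
    diff = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(actual, forecast):
    """Mean absolute error over all entries."""
    diff = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(diff)))

# Toy held-out death counts (years x ages) and their forecasts.
actual = np.array([[100.0, 200.0], [300.0, 400.0]])
forecast = np.array([[110.0, 190.0], [330.0, 370.0]])

err_rmse = rmse(actual, forecast)
err_mae = mae(actual, forecast)
```

RMSE penalises large errors more heavily than MAE, so reporting both helps show whether a gain comes from fewer large misses or from uniformly smaller errors.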

The study’s contributions are threefold: (1) it introduces the α‑transformation into a non‑functional (discrete‑age) CoDA mortality model, a setting not previously explored; (2) it demonstrates that tuning α can yield modest but consistent gains in forecast accuracy; and (3) it provides a transparent preprocessing pipeline—including zero handling, smoothing, and bias correction—that can be replicated in other demographic contexts. Limitations include the partial loss of CoDA theoretical properties (e.g., scale invariance) when α ≠ 0, the risk of over‑fitting α to the training period, and the reliance on relatively simple ARIMA models for κₜ, which may be insufficient for long‑term forecasting.

Future research directions suggested by the authors include: employing Bayesian hierarchical or deep‑learning time‑series models for κₜ to capture more complex dynamics; extending the framework to multi‑cause mortality by integrating several compositional vectors; and investigating zero‑preserving implementations of the α‑transformation to fully exploit its theoretical advantage. Overall, the paper provides evidence that the α‑transformation is a viable and, in some cases, more accurate alternative to CLR for compositional mortality forecasting, opening new avenues for demographic and actuarial modelling.
