Performing Bayesian Risk Aggregation using Discrete Approximation Algorithms with Graph Factorization
Risk aggregation is a popular method used to estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable. Applications in the financial services industry include insurance, operational risk, stress testing, and sensitivity analysis, but the problem is widely encountered in many other application domains. This thesis contributes two algorithms for performing Bayesian risk aggregation when models exhibit hybrid dependencies and high-dimensional inter-dependencies. The first algorithm operates on a subset of the general problem, with an emphasis on convolution problems in the presence of continuous and discrete variables (so-called hybrid models), and the second algorithm offers a universal method for general-purpose inference over much wider classes of Bayesian Network models.
💡 Research Summary
This dissertation tackles the challenging problem of risk aggregation within the Bayesian network (BN) framework, focusing on models that combine continuous and discrete variables (hybrid models) and exhibit high‑dimensional inter‑dependencies. Traditional exact inference methods quickly become infeasible due to exponential growth in computational and memory requirements. To overcome these limitations, the author introduces two novel algorithms that together provide a comprehensive solution for both convolution‑type risk aggregation and general inference in large hybrid BNs.
The first algorithm, Bayesian Factorization and Elimination (BFE), is designed specifically for n‑fold convolution problems that arise when aggregating losses or other financial quantities. BFE begins by converting the original hybrid BN into a binary‑factorized (BF) representation. This step decomposes multivariate distributions into a set of small, conditionally independent factors, dramatically reducing the size of the conditional probability tables. After factorization, BFE applies a tailored Variable Elimination (VE) schedule that removes auxiliary variables while preserving exact marginal distributions. A key innovation is the Compound Density Factorization (CDF) technique, which efficiently handles the joint distribution of frequency (count) and severity (loss) variables, allowing the algorithm to compute compound loss densities without resorting to costly Monte‑Carlo simulations or FFT‑based convolutions. The author also presents a Log‑Based Aggregation (LBA) scheme that incrementally builds the convolution, further improving scalability. Experimental results on both independent‑severity and common‑cause models demonstrate that BFE matches the numerical accuracy of exact methods while achieving order‑of‑magnitude speed‑ups, and it successfully performs de‑convolution to infer latent frequency variables from observed aggregate losses.
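To make the n‑fold convolution problem concrete, the sketch below computes a compound loss density P(S) = Σₙ P(N = n) · f*ⁿ(s) by direct iterated convolution of a discretized severity distribution with a frequency distribution. This is a minimal illustration of the aggregation problem BFE targets, not an implementation of BFE itself (which avoids this brute-force convolution via binary factorization and variable elimination); the function name and unit-grid discretization are illustrative assumptions.

```python
import numpy as np

def compound_density(freq_pmf, sev_pmf):
    """Compound loss density by direct iterated convolution.

    freq_pmf[n] = P(N = n), the frequency (count) distribution.
    sev_pmf     = severity (loss) distribution discretized on a unit grid.
    Returns P(S = k) for aggregate loss S = X_1 + ... + X_N.
    Illustrative brute-force baseline, NOT the BFE algorithm.
    """
    max_n = len(freq_pmf) - 1
    out_len = max_n * (len(sev_pmf) - 1) + 1
    total = np.zeros(out_len)
    conv = np.array([1.0])              # 0-fold convolution: point mass at 0
    for p_n in freq_pmf:
        total[:len(conv)] += p_n * conv # weight n-fold convolution by P(N=n)
        conv = np.convolve(conv, sev_pmf)  # build the (n+1)-fold convolution
    return total
```

Because each `np.convolve` call grows the support, the cost of this direct approach rises quickly with the maximum frequency, which is exactly the scalability gap that BFE's factorized elimination scheme is designed to close.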
The second algorithm, Dynamic Discretized Belief Propagation (DDBP), addresses the broader class of inference problems in high‑dimensional hybrid BNs. DDBP integrates a Dynamic Discretization (DD) process with a new graph‑based region construction called Triplet Region Construction (TRC). DD adaptively partitions continuous variables into a small number of intervals, refining the partition boundaries iteratively based on posterior entropy reduction, thereby controlling approximation error. Once discretized, the network is transformed into a binary‑factorized graph, and TRC builds a region graph where each region (a “triplet”) consists of a primary variable and two neighboring variables. This structure minimizes cycles and ensures that the region graph satisfies the MaxEnt‑Normal property and has correct counting numbers (all equal to one). Consequently, Generalized Belief Propagation (GBP) messages passed on the TRC graph converge reliably and produce unbiased marginal estimates. The author proves convergence, demonstrates polynomial‑time complexity O(n·k³) (n = number of variables, k = number of discretization bins), and validates the approach on a suite of experiments: 5‑ to 10‑dimensional Conditional Gaussian Densely Connected Chain DAGs (CG‑DCCD), a 20‑dimensional CG‑DCCD model, and a linear Gaussian BN with observations. Across these benchmarks, DDBP’s marginal estimates exhibit KL divergences on the order of 10⁻³ compared with exact Junction Tree results or large‑scale Monte‑Carlo baselines, confirming both high accuracy and scalability.
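The adaptive-partitioning idea behind DD can be sketched as a greedy refinement loop: start with a coarse partition and repeatedly split the interval contributing the largest approximation error. The sketch below uses a simple density-variation proxy for the error (the thesis's DD scheme uses an entropy-based error measure and operates on posteriors, not a known density); the function name and error proxy are hypothetical illustrations of the general idea only.

```python
import numpy as np

def dynamic_discretize(pdf, lo, hi, n_iters=20):
    """Greedy adaptive partitioning of [lo, hi] for a continuous density.

    Each iteration splits the interval with the largest error proxy,
    here (density variation within the interval) x (interval width).
    Illustrative sketch of adaptive discretization, NOT the thesis's
    entropy-driven DD algorithm.
    """
    edges = [lo, hi]
    for _ in range(n_iters):
        errs = []
        for a, b in zip(edges[:-1], edges[1:]):
            fs = pdf(np.linspace(a, b, 5))      # sample density in interval
            errs.append((fs.max() - fs.min()) * (b - a))
        i = int(np.argmax(errs))                # worst-approximated interval
        edges.insert(i + 1, 0.5 * (edges[i] + edges[i + 1]))
    return np.array(edges)
```

Run on a standard normal density over [-5, 5], the loop concentrates interval boundaries where the density changes fastest and leaves the flat tails coarse, which is the behaviour that lets DD keep the bin count k (and hence the O(n·k³) inference cost) small.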
Beyond the core algorithms, the thesis provides an extensive review of BN fundamentals, factor graphs, region‑based free‑energy approximations, and existing approximate inference techniques (Loopy Belief Propagation, Expectation Propagation, Variational Inference, etc.). It also details the conversion of sparse BNs into Densely Connected Chain DAGs (DCCD) and the construction of binary factor graphs, laying the theoretical groundwork for the proposed methods.
In summary, the dissertation makes three principal contributions: (1) the BFE framework, which enables exact‑quality convolution for hybrid risk models through binary factorization, variable elimination, and compound density factorization; (2) the DDBP algorithm, which couples adaptive discretization with the TRC region graph to deliver fast, accurate inference for arbitrary‑size hybrid BNs; and (3) rigorous theoretical analysis and empirical validation that together demonstrate polynomial‑time scalability without sacrificing precision. The work opens avenues for real‑time risk aggregation in finance and insurance, and suggests future extensions such as non‑binary factorization, online discretization for streaming data, and parallel GPU implementations.