Data breaches in the catastrophe framework & beyond
Development of sustainable insurance for cyber risks, with its associated benefits, requires, inter alia, reducing the ambiguity of the risk. Considering cyber risk, and data breaches in particular, as a man-made catastrophe clarifies the actuarial need for multiple levels of analysis - going beyond claims-driven loss statistics alone to include exposure, hazard, breach size, and so on - and necessitates specific advances in the scope, quality, and standards of both data and models. The prominent human element, as well as the dynamic, networked, and multi-type nature of cyber risk, makes it perhaps uniquely challenging. Complementary top-down statistical and bottom-up analytical approaches are discussed. Focusing on data breach severity, measured in private information items (‘ids’) extracted, we exploit relatively mature open data for U.S. data breaches. We show that this extremely heavy-tailed risk is worsening for external attacker (‘hack’) events, both in frequency and severity. Writing in Q2-2018, the median predicted number of ids breached in the U.S. due to hacking over the last six months of 2018 is 0.5 billion, but there is a 5% chance that the figure exceeds 7 billion, which would double the historical total. ‘Fortunately’, the total breach in that period turned out to be near the median.
💡 Research Summary
The paper reframes data breaches as a man‑made catastrophe and argues that sustainable cyber‑insurance requires a multi‑layered risk analysis that goes far beyond simple claims‑driven loss statistics. By adopting the classic catastrophe‑modeling framework—exposure, hazard, vulnerability and loss—the authors show how each component can be quantified for cyber risk. Exposure is measured in terms of the volume and sensitivity of digital assets (databases, cloud services, IoT devices, etc.). Hazard captures the variety of threat vectors, with particular emphasis on external attackers (“hack” events). Vulnerability incorporates security controls, system complexity, and especially the human factor (security awareness, insider behavior). Loss is expressed not in monetary terms but in the number of private information items (“ids”) that are exfiltrated, providing a more direct metric for insurance and re‑insurance pricing.
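As a rough illustration of this decomposition, the Python sketch below multiplies exposure, hazard, vulnerability, and conditional loss to get an expected count of exfiltrated ids. The function name and all numerical inputs are placeholders of ours, not values or an implementation from the paper.

```python
# Minimal sketch (our illustration) of the catastrophe-framework decomposition:
# expected ids lost = exposure (records held) x hazard (attack rate)
#                     x vulnerability (probability an attack succeeds)
#                     x conditional loss (fraction of records exfiltrated given success).
# All figures below are illustrative placeholders, not estimates from the study.

def expected_ids_lost(exposure_records, annual_attack_rate,
                      p_breach_given_attack, fraction_exfiltrated):
    """Expected number of private information items ('ids') lost per year."""
    return (exposure_records * annual_attack_rate
            * p_breach_given_attack * fraction_exfiltrated)

# Example: a firm holding 10 million records, facing 50 serious attacks per year,
# each with a 2% chance of success, losing on average 30% of records per breach.
print(expected_ids_lost(10_000_000, 50, 0.02, 0.30))  # -> 3,000,000 ids/year
```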
The authors exploit publicly available U.S. breach databases (e.g., Privacy Rights Clearinghouse, Verizon DBIR) to build a statistical picture of breach severity. Their analysis reveals an extremely heavy‑tailed distribution: the frequency of hacking‑related breaches has risen by roughly 22% per year, and the upper quantiles of breach size have expanded dramatically. A Pareto‑type tail with an estimated shape parameter around 1.3 indicates that the mean of the distribution is close to diverging, making traditional average‑based risk measures inadequate.
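To make the tail estimate concrete, the sketch below applies a Hill estimator to synthetic Pareto-distributed breach sizes. The threshold, sample size, and seed are assumptions for illustration; only the cited shape parameter of about 1.3 comes from the summary above, and the real estimate would be computed on the breach databases, not simulated data.

```python
import numpy as np

# Sketch of a Pareto tail fit via the Hill estimator on synthetic breach sizes.
# Threshold x_min and sample size are assumed; real work would use reported
# breach sizes (e.g., from Privacy Rights Clearinghouse) above a chosen threshold.

rng = np.random.default_rng(0)
alpha_true, x_min = 1.3, 1e4                              # assumed tail index and threshold
sizes = x_min * (1 + rng.pareto(alpha_true, size=2000))   # synthetic breach sizes (ids)

def hill_estimator(data, threshold):
    """Hill estimate of the tail index alpha for observations above `threshold`."""
    tail = data[data > threshold]
    return len(tail) / np.sum(np.log(tail / threshold))

alpha_hat = hill_estimator(sizes, x_min)
print(f"estimated tail index: {alpha_hat:.2f}")   # close to 1.3
# An index near 1 means the mean is barely finite, so average-based pricing is fragile.
```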
For modeling, a compound Poisson framework is adopted: breach occurrences follow a Poisson process, while breach sizes are modeled with heavy‑tailed distributions such as Pareto, log‑normal or Beta‑prime. Using data up to 2017, the authors forecast the second half of 2018. The median (50th percentile) predicted number of ids breached in the United States is 0.5 billion, but there is a 5% probability that the total exceeds 7 billion—roughly double the historical cumulative total. In reality, the actual 2018‑H2 total fell close to the median, illustrating the usefulness of the probabilistic approach.
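The Monte Carlo sketch below shows the structure of such a compound-Poisson forecast: Poisson breach counts paired with Pareto severities, summarized by the median and an upper quantile of the aggregate. The rate and severity parameters are placeholders, not the fitted values from the paper; only the simulation structure is being demonstrated.

```python
import numpy as np

# Illustrative compound-Poisson forecast: Poisson-distributed breach counts
# combined with Pareto-distributed severities. Parameters are assumed.

rng = np.random.default_rng(1)
lam = 120                  # assumed expected number of hack breaches in 6 months
alpha, x_min = 1.3, 1e4    # assumed Pareto severity parameters (ids per breach)
n_sims = 100_000

totals = np.empty(n_sims)
for i in range(n_sims):
    n_breaches = rng.poisson(lam)
    severities = x_min * (1 + rng.pareto(alpha, size=n_breaches))
    totals[i] = severities.sum()

median, q95 = np.quantile(totals, [0.5, 0.95])
print(f"median total ids breached: {median:.3g}")
print(f"95th percentile:           {q95:.3g}")
# With alpha ~ 1.3 the 95th percentile sits far above the median, the same
# qualitative pattern behind the 0.5 billion vs. 7 billion figures above.
```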
The paper stresses that a purely top‑down statistical method is insufficient because of reporting delays, under‑reporting, and inconsistent breach definitions. Consequently, a complementary bottom‑up approach is advocated: constructing firm‑level network topologies, assessing security controls, and simulating attacker behavior to generate scenario‑based loss estimates. By integrating both perspectives, insurers can obtain robust loss distributions even when data are scarce, and can perform sensitivity analyses for regulatory or technological changes.
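As a toy version of the bottom-up idea, the sketch below propagates an attacker through a tiny firm network and tallies the records on compromised nodes. The topology, control failure rates, and record counts are entirely our assumptions for illustration; the paper's bottom-up approach is described only at the level of principle.

```python
import random

# Toy bottom-up scenario sketch (our illustration, not the authors' model):
# spread an attacker across a small firm network and count exfiltrated records.

network = {                        # node -> (records held, downstream neighbors)
    "web": (1_000_000, ["app"]),
    "app": (0,          ["db", "hr"]),
    "db":  (20_000_000, []),
    "hr":  (500_000,    []),
}
p_compromise = {"web": 0.30, "app": 0.15, "db": 0.10, "hr": 0.20}  # assumed control failure rates

def simulate_scenario(entry="web", seed=None):
    """One scenario: attacker spreads from `entry`; returns total ids exfiltrated."""
    rng = random.Random(seed)
    compromised, frontier = set(), [entry]
    while frontier:
        node = frontier.pop()
        if node in compromised or rng.random() > p_compromise[node]:
            continue                      # already hit, or the controls held
        compromised.add(node)
        frontier.extend(network[node][1])
    return sum(network[n][0] for n in compromised)

losses = [simulate_scenario(seed=s) for s in range(10_000)]
print(sum(losses) / len(losses))   # scenario-based expected ids lost
```

Repeating such scenarios across different topologies or control assumptions is what yields scenario-based loss distributions when historical data are scarce.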
From a practical insurance standpoint, the authors recommend pricing on tail‑risk metrics such as Value‑at‑Risk (VaR) or Tail‑Value‑at‑Risk (TVaR) rather than on simple averages. Reinsurance treaties should incorporate caps and excess‑of‑loss layers that explicitly account for the possibility of ultra‑large breaches (tens of billions of ids). Moreover, industry‑wide standards for breach reporting, a unified definition of “ids,” and real‑time threat‑intelligence sharing platforms are essential to improve data quality, reduce model uncertainty, and mitigate systemic risk.
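The short sketch below computes empirical VaR and TVaR from a simulated aggregate-loss sample, such as the `totals` array from the compound-Poisson sketch above. The 99% level and the placeholder loss sample are assumptions for illustration, not figures from the paper.

```python
import numpy as np

# Empirical Value-at-Risk and Tail-Value-at-Risk from a simulated loss sample.

def var_tvar(losses, level=0.99):
    """VaR and TVaR of `losses` at the given confidence level."""
    losses = np.asarray(losses)
    var = np.quantile(losses, level)
    tvar = losses[losses >= var].mean()   # average loss beyond the VaR threshold
    return var, tvar

rng = np.random.default_rng(2)
sample = 1e4 * (1 + rng.pareto(1.3, size=100_000))   # placeholder heavy-tailed losses
var99, tvar99 = var_tvar(sample, 0.99)
print(f"VaR(99%):  {var99:.3g}")
print(f"TVaR(99%): {tvar99:.3g}")   # TVaR far exceeds VaR under a heavy tail
```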
In conclusion, treating data breaches as catastrophes forces the insurance industry to adopt a more rigorous, data‑driven, and multi‑dimensional risk assessment. The heavy‑tailed nature of breach severity, the accelerating pace of hacking events, and the dominant role of human and network factors make cyber risk uniquely challenging. By advancing data standards, combining top‑down statistical models with bottom‑up scenario analysis, and aligning pricing and reinsurance structures with tail risk, insurers can build more resilient cyber‑insurance products that are capable of handling the evolving threat landscape.