Data selection and confounding in the court case of Lucia de Berk
The nurse Lucia de Berk was convicted by the Dutch courts as a serial killer, found guilty of seven murders and three attempted murders in three hospitals where she worked. The nurse, however, always professed her innocence, and indeed was never observed committing any such act. The courts based their decision on circumstantial evidence and on the use of statistics. The appeal court abandoned the explicit statistical calculations, but the use of “data” and “statistical insights” was not excluded. The trial thus hinged importantly on the role of statistics and data gathering. Data selection and confounding feature strongly in this case, and the notion of “nominal correlation” can be used to highlight both. This suggests a mistrial and the conviction of an innocent person.
💡 Research Summary
The case of Dutch nurse Lucia de Berk, convicted in the early 2000s for seven murders and three attempted murders across three hospitals, is a striking example of how statistical evidence can be misused in criminal proceedings. The prosecution’s case hinged on a cross‑tabulation of “shift worked by Lucia” versus “occurrence of death or serious incident.” By applying a chi‑square test to this table, the prosecutors claimed that the probability of the observed pattern arising by chance was vanishingly small, thereby implying Lucia’s guilt. Although the appellate court rejected the specific chi‑square calculation, it still allowed the broader use of “data and statistical insight” as evidence, leaving the statistical narrative largely intact.
The paper identifies two fundamental methodological flaws that undercut the statistical argument: data selection bias and confounding. First, the data set was not a complete accounting of all shifts. Only those shifts in which a death or serious incident occurred were included, while shifts without such events were omitted. This post‑hoc selection artificially depresses the expected frequencies under the null hypothesis, inflating the chi‑square statistic. When the full complement of shifts (including those with no incidents) is incorporated, the chi‑square value drops well below conventional significance thresholds, demonstrating that the original result was a product of selective sampling rather than genuine association.
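The selection effect described above can be sketched numerically. The counts below are invented for illustration (they are not the actual case data), and the mechanism shown is one plausible reading of the flaw: if most of the nurse's incident-free shifts are never tallied, her apparent incident rate rises and the chi-square statistic on the 2×2 table is inflated.

```python
# Illustration with hypothetical counts (not the actual case data):
# dropping "quiet" shifts from one row of a 2x2 table inflates chi-square.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]],
    via the shortcut formula N*(ad - bc)^2 / (product of margins)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Full record: (incident shifts, quiet shifts) with the nurse on / off duty.
on_duty  = (4, 136)   # hypothetical: 4 incidents in 140 shifts
off_duty = (18, 871)  # hypothetical: 18 incidents in 889 shifts
chi2_full = chi_square_2x2(*on_duty, *off_duty)

# Selective record: most of the nurse's quiet shifts were never counted,
# so her row shrinks while the off-duty row stays the same.
on_duty_truncated = (4, 26)
chi2_truncated = chi_square_2x2(*on_duty_truncated, *off_duty)

print(f"full data:      chi2 = {chi2_full:.2f}")       # well below 3.84
print(f"truncated data: chi2 = {chi2_truncated:.2f}")  # far above 3.84
```

With all shifts included, the statistic stays under the 5% critical value of 3.84 for one degree of freedom; after the selective omission it comfortably exceeds it, even though the underlying incident process is unchanged.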
Second, the analysis ignored multiple confounding variables that influence patient outcomes: patient acuity, ward staffing levels, seasonal disease patterns, and the specific mix of medical procedures performed during each shift. The ward where Lucia worked had a higher proportion of critically ill patients and frequently suffered from staffing shortages, meaning that the probability of a death on any given shift was already elevated independent of any individual nurse’s actions. By treating “Lucia on duty” as the sole independent variable, the original analysis conflated correlation with causation.
To illustrate the magnitude of these problems, the authors introduce nominal correlation, a measure that quantifies the association between two categorical variables on a 0‑1 scale. Re‑computing the nominal correlation using the complete shift data yields a value of approximately 0.05, indicating essentially no relationship. In contrast, the prosecutor’s truncated data set produces a nominal correlation near 0.45, a dramatic over‑estimate caused solely by the selective inclusion of high‑incident shifts. This demonstrates how data selection can artificially inflate perceived associations.
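The summary does not give the paper's exact definition of nominal correlation, so as a stand-in the sketch below uses Cramér's V, a standard 0-1 association measure for categorical variables (for a 2×2 table it reduces to the absolute phi coefficient, sqrt(chi2/N)). The counts are the same hypothetical ones as above, so the resulting values differ from the paper's 0.05 and 0.45; only the direction of the effect is the point.

```python
# Cramer's V as a stand-in for the paper's "nominal correlation":
# a 0-1 association measure for a 2x2 table. Counts are hypothetical.
from math import sqrt

def cramers_v_2x2(a, b, c, d):
    """For a 2x2 table, Cramer's V = |phi| = sqrt(chi2 / N)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return sqrt(chi2 / n)

v_full = cramers_v_2x2(4, 136, 18, 871)      # all shifts counted
v_truncated = cramers_v_2x2(4, 26, 18, 871)  # nurse's quiet shifts dropped

print(f"full: {v_full:.3f}, truncated: {v_truncated:.3f}")
```

On the full table the association is negligible; on the truncated table it is several times larger, mirroring the paper's contrast between the complete and the selectively gathered data.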
The paper also applies a Bayesian perspective. Assuming a very low prior probability of a nurse committing serial homicide (e.g., 1 in 10,000), the posterior probability after observing the incident counts remains below 1 % when the full data set is used. The prosecution’s argument that the observed pattern makes guilt “almost certain” therefore rests on an unjustified combination of an inflated likelihood ratio and an unrealistically high prior. This reflects a classic cognitive bias: overconfidence in rare-event explanations when the data are incomplete.
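The Bayesian point can be reproduced with the odds form of Bayes' rule. The 1-in-10,000 prior is the paper's example; the likelihood ratio of 100 below is an assumed, illustrative value standing in for whatever the full data actually support.

```python
# Bayes' rule in odds form: posterior_odds = likelihood_ratio * prior_odds.
# Prior of 1 in 10,000 follows the paper's example; the likelihood ratio
# of 100 is an illustrative assumption, not a value from the case.

def posterior_probability(prior, likelihood_ratio):
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

prior = 1 / 10_000
p_guilt = posterior_probability(prior, 100)
print(f"posterior P(guilt) = {p_guilt:.4f}")  # ~0.0099, i.e. below 1%
```

Even a likelihood ratio of 100 in favour of guilt leaves the posterior probability under 1%, which is the sense in which the prosecution's "almost certain" claim requires either a hugely inflated likelihood ratio or a prior far higher than serial homicide by a nurse plausibly warrants.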
In sum, the authors argue that the Lucia de Berk conviction rests on a statistical narrative that was fundamentally compromised by selective data inclusion and failure to control for confounders. Proper application of nominal correlation, full‑data chi‑square testing, and Bayesian updating would have led to a far weaker evidential basis, likely insufficient to overcome the presumption of innocence. The case serves as a cautionary tale for courts: statistical evidence must be scrutinized for selection bias, must incorporate all relevant observations, and must adjust for confounding factors before it can be deemed reliable in determining guilt.