Sensitivity of inferences in forensic genetics to assumptions about founding genes
Many forensic genetics problems can be handled using structured systems of discrete variables, for which Bayesian networks offer an appealing practical modeling framework and allow inferences to be computed by probability propagation methods. However, when standard assumptions are violated (for example, when allele frequencies are unknown, there is identity by descent, or the population is heterogeneous), dependence is generated among founding genes, which makes exact calculation of conditional probabilities by propagation methods less straightforward. Here we illustrate different methodologies for assessing sensitivity to assumptions about founders in forensic genetics problems. These include constrained steepest descent, linear fractional programming and representing dependence by structure. We illustrate these methods on several forensic genetics examples involving criminal identification, simple and complex disputed paternity and DNA mixtures.
💡 Research Summary
The paper addresses a fundamental challenge in forensic genetics: the reliance on the assumption that founding genes (the genes carried by founders, the individuals in a pedigree whose parents are not themselves included) are statistically independent. While Bayesian networks provide a powerful framework for modeling the discrete variables that arise in forensic problems, such as criminal identification, paternity disputes, and DNA mixture deconvolution, this independence assumption often fails in practice. Situations that break it include unknown allele frequencies, identity by descent (IBD) among contributors, and heterogeneous population structure. When independence is violated, dependence arises among the founding genes, and the exact probability propagation algorithms normally used in Bayesian networks, which take the founders to be independent, can no longer be applied straightforwardly.
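To see why unknown allele frequencies generate dependence, here is a minimal sketch of our own (not taken from the paper): a single biallelic locus, two founding genes that are each allele A with probability p independently given p, and an unknown p with a Beta(a, b) prior. The Beta prior and the values a = 2, b = 8 are illustrative assumptions. Averaging over p makes P(both genes are A) equal to E[p^2], which exceeds (E[p])^2, so the two genes become positively correlated.

```python
from fractions import Fraction


def founder_pair_given_beta(a, b):
    """Two founding genes, each allele A with probability p (independently
    given p), with the allele frequency p unknown and Beta(a, b) distributed.

    Returns exact P(gene is A), P(both genes are A) and the correlation
    between the two genes' allele indicators.
    """
    p_a = Fraction(a, a + b)                              # E[p]
    p_aa = Fraction(a * (a + 1), (a + b) * (a + b + 1))   # E[p^2]
    covariance = p_aa - p_a ** 2                          # equals Var(p) > 0
    correlation = covariance / (p_a * (1 - p_a))
    return p_a, p_aa, correlation


p_a, p_aa, rho = founder_pair_given_beta(2, 8)   # hypothetical prior
print(p_a, p_aa, p_a ** 2, rho)                  # 1/5 3/55 1/25 1/11
```

The correlation works out to 1/(a + b + 1), so it vanishes only as the prior concentrates on a single frequency; treating the founders as independent with plug-in frequency 1/5 would understate P(AA) here (1/25 instead of 3/55).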
To quantify how sensitive forensic inferences are to these violations, the authors develop and compare three methodological approaches.
- Constrained Steepest Descent (CSD). This technique treats the posterior probability of interest as an objective function and searches for its worst-case value under bounded perturbations of the uncertain parameters (e.g., allele frequencies, IBD coefficients). By computing the gradient of the objective with respect to the parameters and moving in the direction that maximally reduces (or inflates) the posterior, CSD yields a rapid, locally optimal sensitivity estimate. It is computationally cheap and well suited for time-critical forensic contexts, but it does not guarantee a global optimum.
- Linear Fractional Programming (LFP). A forensic posterior probability P(H | E) can be written as P(H, E) / P(E). When the joint distribution of the founding genes is treated as the unknown, both the numerator (the joint probability of the hypothesis and the evidence) and the denominator (the marginal probability of the evidence) are linear in that distribution. LFP therefore casts the worst-case sensitivity problem as a linear fractional program, which can be transformed into a standard linear program and solved to global optimality. This provides rigorous bounds on how much the posterior can change when the underlying assumptions are relaxed, at the cost of higher computational demand, especially as the number of loci or contributors grows.
- Structural Representation of Dependence. Instead of treating dependence as a perturbation, this approach augments the Bayesian network itself. Additional latent nodes (e.g., a common ancestor or a population-structure variable) are introduced to capture the correlation among founding genes explicitly. Once the network is expanded, conventional belief propagation can be applied unchanged. The advantage is a transparent graphical model that can be inspected by forensic experts; the drawback is a substantial increase in network size and inference time, which may become prohibitive for complex mixtures or multi-generation pedigrees.
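The constrained steepest descent idea can be sketched on a toy criminal-identification problem. Everything here is a hypothetical illustration: the posterior P(H_p | E) for a heterozygous i/j match with prior `prior` and random-match probability 2 p_i p_j, a box of half-width `delta` around nominal allele frequencies, and a projected-gradient form of constrained steepest descent; the paper's actual algorithm and parameterization may differ. Each step moves against the gradient of the posterior and then projects back onto the feasible set {sum(p) = 1, lo <= p <= hi}.

```python
def project_box_simplex(y, lo, hi, iters=200):
    """Euclidean projection of y onto {p : sum(p) = 1, lo <= p <= hi}.

    The projection has the form clip(y - lam, lo, hi) for a scalar shift lam,
    found by bisection; requires sum(lo) <= 1 <= sum(hi).
    """
    def clipped(lam):
        return [min(max(v - lam, l), h) for v, l, h in zip(y, lo, hi)]

    left = min(v - h for v, h in zip(y, hi))   # at this shift every coordinate sits at hi
    right = max(v - l for v, l in zip(y, lo))  # at this shift every coordinate sits at lo
    for _ in range(iters):
        mid = 0.5 * (left + right)
        if sum(clipped(mid)) > 1.0:
            left = mid
        else:
            right = mid
    return clipped(0.5 * (left + right))


def posterior_het_match(p, i, j, prior):
    """P(suspect is the source | heterozygous i/j match), i != j."""
    return prior / (prior + (1 - prior) * 2 * p[i] * p[j])


def gradient(p, i, j, prior):
    """Gradient of posterior_het_match with respect to the frequency vector p."""
    denom = prior + (1 - prior) * 2 * p[i] * p[j]
    scale = -prior * (1 - prior) * 2 / denom ** 2
    g = [0.0] * len(p)
    g[i] = scale * p[j]
    g[j] = scale * p[i]
    return g


def worst_case_posterior(p0, i, j, prior, delta, step=0.5, iters=200):
    """Projected steepest descent: lower the posterior as far as possible while
    each allele frequency stays within delta of its nominal value."""
    lo = [max(0.0, v - delta) for v in p0]
    hi = [min(1.0, v + delta) for v in p0]
    p = list(p0)
    for _ in range(iters):
        g = gradient(p, i, j, prior)
        p = project_box_simplex([v - step * gv for v, gv in zip(p, g)], lo, hi)
    return p, posterior_het_match(p, i, j, prior)


p_nominal = [0.1, 0.2, 0.3, 0.4]                      # hypothetical frequencies
print(round(posterior_het_match(p_nominal, 0, 1, 0.5), 4))   # 0.9615
p_worst, post_worst = worst_case_posterior(p_nominal, 0, 1, 0.5, delta=0.05)
print([round(v, 4) for v in p_worst], round(post_worst, 4))  # [0.15, 0.25, 0.25, 0.35] 0.9302
```

The descent pushes the two matching alleles to their upper bounds and takes the missing mass from the others. Because the objective need not be convex in general, a different starting point could in principle stop at a different local optimum, which is the limitation noted for CSD.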
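For a linear fractional program, the usual general route is the Charnes-Cooper transformation to a linear program, solved with any LP solver. When the feasible polytope is small enough that its vertices can be listed, a simpler check suffices: a linear-fractional function with a positive denominator is quasi-linear, so its minimum and maximum are attained at vertices. The sketch below is a hypothetical toy model, not the paper's code. It bounds P(H_p | E) for a homozygous A/A match over every joint distribution of the alternative source's two founding genes whose marginal frequency of A is fixed at p; the only free parameter is theta = P(both genes are A).

```python
def ratio(num, den, pi):
    """Evaluate (num . pi) / (den . pi)."""
    n = sum(a * x for a, x in zip(num, pi))
    d = sum(b * x for b, x in zip(den, pi))
    return n / d


def lfp_extremes(num, den, vertices):
    """Min and max of a linear-fractional objective over a polytope given by its
    vertices (valid because the objective is quasi-linear where den . pi > 0)."""
    values = [ratio(num, den, v) for v in vertices]
    return min(values), max(values)


def homozygous_match_lfp(p, prior):
    """Bounds on P(H_p | A/A match) over joint distributions of two founding
    genes with P(A) = p for each gene; states ordered (AA, Aa, aA, aa)."""
    num = [prior] * 4                                   # P(H_p, E | state): suspect matches
    den = [prior + (1 - prior), prior, prior, prior]    # P(E | state)

    def joint(theta):
        # theta = P(AA); the marginal constraints fix the other three cells
        return [theta, p - theta, p - theta, 1 - 2 * p + theta]

    vertices = [joint(max(0.0, 2 * p - 1)), joint(p)]   # endpoints of the theta range
    independent = ratio(num, den, joint(p * p))         # the usual independence assumption
    return lfp_extremes(num, den, vertices), independent


(low, high), indep = homozygous_match_lfp(0.2, 0.5)      # hypothetical p and prior
print(round(low, 4), round(high, 4), round(indep, 4))    # 0.8333 1.0 0.9615
```

Independence (theta = p^2) gives about 0.9615, while allowing arbitrary dependence widens the range to [0.8333, 1.0]. The lower end corresponds to the two genes always carrying the same allele, as under complete identity by descent.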
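Representing dependence by structure can be sketched with a latent population-structure node S, one of the options named above. In this hypothetical example (two subpopulations in equal proportions with allele-A frequencies 0.1 and 0.3, all invented for illustration) the two founding genes are independent given S. Summing out S, which is what exact propagation would do in the expanded network, makes them dependent, so P(AA) differs from the squared marginal frequency.

```python
from itertools import product


def founder_joint_with_latent(weights, freqs):
    """Joint distribution of two founding genes under the network S -> G1, S -> G2.

    G1 and G2 are independent given S, with P(G = A | S = s) = freqs[s];
    S is summed out by exact enumeration.
    """
    joint = {}
    for a1, a2 in product("Aa", repeat=2):
        total = 0.0
        for w, f in zip(weights, freqs):
            p1 = f if a1 == "A" else 1 - f
            p2 = f if a2 == "A" else 1 - f
            total += w * p1 * p2
        joint[a1 + a2] = total
    return joint


weights = [0.5, 0.5]     # hypothetical subpopulation proportions
freqs = [0.1, 0.3]       # hypothetical frequency of allele A in each subpopulation
joint = founder_joint_with_latent(weights, freqs)
p_A = joint["AA"] + joint["Aa"]                 # marginal frequency of A
print(round(joint["AA"], 4), round(p_A ** 2, 4))   # 0.05 0.04

# Posterior for a homozygous A/A match with prior 0.5, with and without the latent node
post_struct = 0.5 / (0.5 + 0.5 * joint["AA"])
post_indep = 0.5 / (0.5 + 0.5 * p_A ** 2)
print(round(post_struct, 4), round(post_indep, 4))  # 0.9524 0.9615
```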
The authors apply these three techniques to four representative forensic scenarios.
- Criminal Identification: A suspect’s profile is compared against a database. When allele frequencies are uncertain, CSD and LFP both indicate posterior probability shifts of roughly 5–12%, while the structural model reproduces the same shift but takes about three times as long to compute.
- Simple Paternity Testing: A child’s genotype is compared with the genotypes of two alleged fathers. Introducing IBD (e.g., due to consanguinity) reduces the posterior probability of paternity by more than 10% in the worst case. LFP yields the most conservative (lowest) posterior, CSD provides a fast approximation, and the structural model visualizes the IBD link but adds considerable overhead.
- Complex (Multiple-Child) Paternity: Several children are tested against multiple potential fathers. The dimensionality of dependence grows rapidly. CSD remains computationally tractable and identifies a near-worst-case configuration, whereas LFP’s memory requirements explode, making it impractical. The structural approach offers a clear depiction of the multi-parent dependencies but suffers from inference times measured in hours.
- DNA Mixture Deconvolution: A mixed sample containing DNA from several contributors is analyzed. Uncertainty in both allele frequencies and the number of contributors makes the posterior highly sensitive. CSD efficiently locates a parameter set that minimizes the probability that a particular person is among the contributors, LFP supplies a rigorous lower bound but takes orders of magnitude longer to solve, and the structural model, by adding a node for each contributor, yields an interpretable graph at the expense of severe over-parameterization and risk of over-fitting.
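One simple way identity by descent enters a paternity calculation is when the alternative candidate father is a relative of the alleged father. The sketch below is a hypothetical illustration using the standard kinship-coefficient argument, not the paper's network. It assumes a non-inbred alleged father, an obligate paternal allele that can be read off the child and mother, equal priors on the two candidates, and an allele frequency of 0.1. A gene transmitted by a relative with kinship coefficient phi is IBD to one of the alleged father's two genes with probability 2 * phi.

```python
def prob_relative_transmits(af_allele_count, p_allele, kinship):
    """P(a relative of the alleged father transmits the child's paternal allele).

    With probability 2 * kinship the transmitted gene is IBD to one of the
    alleged father's two genes (assumed non-inbred), picked at random;
    otherwise it is an independent draw from the population.
    """
    return 2 * kinship * (af_allele_count / 2) + (1 - 2 * kinship) * p_allele


def paternity_posterior(af_allele_count, p_allele, kinship, prior=0.5):
    """P(alleged father is the father | paternal allele), against one alternative man."""
    af = af_allele_count / 2                    # alleged father transmits the allele
    alt = prob_relative_transmits(af_allele_count, p_allele, kinship)
    return prior * af / (prior * af + (1 - prior) * alt)


# Alleged father carries one copy of the paternal allele (frequency 0.1, assumed).
for label, phi in [("unrelated", 0.0), ("first cousin", 1 / 16), ("full brother", 1 / 4)]:
    print(label, round(paternity_posterior(1, 0.1, phi), 4))
# unrelated 0.8333
# first cousin 0.7692
# full brother 0.625
```

With these assumed numbers the posterior probability of paternity falls from about 0.83 when the alternative man is unrelated to 0.625 when he is a full brother, which shows how much an unmodeled relationship between candidates can matter.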
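The dependence of mixture calculations on allele frequencies is visible even in the simplest case. The sketch below is a hypothetical example, not the paper's model: it computes the probability that untyped contributors explain the alleles in a mixture by enumerating their founding genes, each treated as an independent draw from assumed population frequencies, which is exactly the independence assumption under examination. The mixture, genotypes and frequencies are invented: alleles {A, B, C} observed, victim A/B, suspect C/C, prosecution hypothesis victim + suspect, defense hypothesis victim + one unknown.

```python
from itertools import product


def prob_unknowns_explain(alleles_seen, required, freqs, n_unknown):
    """P(all genes of n_unknown untyped contributors lie in alleles_seen and
    together include every allele in required).

    The 2 * n_unknown founding genes are treated as independent draws from
    freqs, i.e. the usual independence assumption on founders.
    """
    total = 0.0
    for genes in product(sorted(alleles_seen), repeat=2 * n_unknown):
        if set(required) <= set(genes):
            prob = 1.0
            for g in genes:
                prob *= freqs[g]
            total += prob
    return total


def mixture_likelihood(alleles_seen, known_alleles, n_unknown, freqs):
    """P(mixture shows exactly alleles_seen | known contributors + n_unknown unknowns)."""
    if not set(known_alleles) <= set(alleles_seen):
        return 0.0
    required = set(alleles_seen) - set(known_alleles)
    return prob_unknowns_explain(alleles_seen, required, freqs, n_unknown)


def mixture_lr(alleles_seen, victim, suspect, freqs):
    """LR for H_p (victim + suspect) against H_d (victim + one unknown)."""
    num = mixture_likelihood(alleles_seen, victim + suspect, 0, freqs)
    den = mixture_likelihood(alleles_seen, victim, 1, freqs)
    return num / den


seen = {"A", "B", "C"}
victim, suspect = ["A", "B"], ["C", "C"]
for p_c in (0.05, 0.10):
    freqs = {"A": 0.1, "B": 0.2, "C": p_c}     # hypothetical frequencies
    print(p_c, round(mixture_lr(seen, victim, suspect, freqs), 2))
# 0.05 30.77
# 0.1 14.29
```

Doubling the assumed frequency of allele C roughly halves the likelihood ratio. The enumeration over 2k genes also grows exponentially with the number k of unknown contributors, one reason mixture calculations become expensive.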
Across all examples, the paper demonstrates that sensitivity analysis is not optional but essential: small deviations from independence can materially alter the weight of DNA evidence presented in court. The three methods complement each other. CSD is best suited for rapid “what‑if” assessments; LFP is the gold standard for establishing provable worst‑case bounds; and structural modeling is valuable when expert testimony must convey the nature of the dependence to a judge or jury.
In the discussion, the authors propose several avenues for future work. Hybrid algorithms that combine the speed of CSD with the global guarantees of LFP could provide balanced solutions for high‑dimensional problems. Integrating Markov chain Monte Carlo (MCMC) sampling within the Bayesian network could allow approximate inference when exact propagation is infeasible. Finally, the paper calls for the development of standardized reporting guidelines so that forensic laboratories can consistently communicate sensitivity findings, thereby improving the transparency and reliability of DNA evidence in the legal system.