Forensic Identification: Database likelihood ratios and familial DNA searching
Familial Searching is the process of searching in a DNA database for relatives of a certain individual. It is well known that in order to evaluate the genetic evidence in favour of a given form of relatedness between two individuals, one needs to calculate the appropriate likelihood ratio, which is in this context called a Kinship Index. Suppose that the database contains, for a given type of relative, at most one related individual. Given prior probabilities for being the relative for all persons in the database, we derive the likelihood ratio for each database member in favour of being that relative. This likelihood ratio takes all the Kinship Indices between the target individual and the members of the database into account. We also compute the corresponding posterior probabilities. We then discuss two methods to select a subset from the database that contains the relative with a known probability, or at least a useful lower bound thereof. One method needs prior probabilities and yields posterior probabilities, the other does not. We discuss the relation between the approaches, and illustrate the methods with familial searching carried out in the Dutch National DNA Database.
💡 Research Summary
The paper presents a rigorous statistical framework for familial DNA searching—identifying relatives of a target individual within a large forensic DNA database. Traditional approaches evaluate each pair of individuals independently by computing a Kinship Index (KI), the likelihood ratio comparing the probability of the observed genetic profiles under a specific relationship hypothesis (e.g., sibling, parent‑child) versus the hypothesis of no relationship. While useful, this pairwise method ignores the fact that, in most operational settings, the database is assumed to contain at most one true relative of the target. The authors therefore embed the KI calculations within a Bayesian model that incorporates prior probabilities for each database member being the relative.
First, each individual i in the database is assigned a prior probability π_i, reflecting external information such as demographic data, geographic proximity, or crime‑type prevalence. The KI_i is computed using standard forensic genetics methods (allele frequencies, mutation models, etc.). The authors then derive a composite likelihood ratio for the entire database (LR_DB):
LR_DB = Σ_i (π_i · KI_i) / Σ_i π_i
The numerator aggregates the weighted evidence that each person is the relative, while the denominator represents the overall prior probability that a relative exists somewhere in the database. This formulation captures the dependence among candidates imposed by the “at most one relative” constraint.
Applying Bayes’ theorem then yields the posterior probability that a specific individual i is the true relative, given that the relative is indeed present in the database:
posterior_i = (π_i · KI_i) / Σ_j (π_j · KI_j)
These posterior probabilities provide a direct ranking of candidates and quantify, on an absolute probability scale, how likely each candidate is to be the relative, unlike raw KI values, which only support pairwise comparisons.
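The two formulas above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not the authors' code; the function names and the toy numbers are invented for the example.

```python
# Combining per-candidate Kinship Indices KI_i with prior probabilities
# pi_i under the "at most one relative in the database" assumption.
# Illustrative sketch only; names and numbers are not from the paper.

def database_lr(priors, kis):
    """LR_DB = sum_i(pi_i * KI_i) / sum_i(pi_i)."""
    numerator = sum(p * k for p, k in zip(priors, kis))
    denominator = sum(priors)  # prior probability a relative is in the database
    return numerator / denominator

def posteriors(priors, kis):
    """posterior_i = pi_i * KI_i / sum_j(pi_j * KI_j),
    i.e. the posterior given that the relative is in the database."""
    weights = [p * k for p, k in zip(priors, kis)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy database of three members: uniform priors, one strongly matching KI.
priors = [0.2, 0.2, 0.2]   # priors need not sum to 1 (relative may be absent)
kis = [1000.0, 2.0, 0.5]   # kinship indices computed from the genetic profiles

print(database_lr(priors, kis))  # evidence that the relative is in the database
print(posteriors(priors, kis))   # posterior ranking of the candidates
```

Note that the posteriors depend only on the products π_i · KI_i, which is what makes the greedy subset selection described below straightforward.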
The paper then addresses the practical problem of selecting a manageable subset of the database for further investigation. Two strategies are proposed:
- Prior‑probability‑driven subset – The investigator specifies a desired minimum inclusion probability α (e.g., 95%). The goal is to find the smallest subset S whose posterior probability of containing the relative is at least α, i.e. Σ_{i∈S} posterior_i ≥ α. Although subset selection is in general combinatorial, here each posterior_i is proportional to π_i · KI_i, so a greedy algorithm suffices: order candidates by decreasing π_i · KI_i and add them until the cumulative posterior mass reaches α. This method guarantees the chosen subset meets the prescribed confidence level and yields exact posterior probabilities for its members.
- KI‑only subset – A simpler, operationally attractive approach that ignores priors and selects the top N individuals by KI. While easy to implement, this method provides no formal guarantee on the inclusion probability and can be suboptimal when priors are highly heterogeneous.
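The contrast between the two strategies can be made concrete with a small sketch. This is an illustrative toy, not the paper's implementation; the candidate data are invented, and candidate "C" is deliberately given a huge KI but a tiny prior so that the two methods disagree.

```python
# Two subset-selection strategies over (id, prior, ki) candidate triples.
# Illustrative sketch; data and function names are not from the paper.

def greedy_subset(candidates, alpha):
    """Smallest subset whose cumulative posterior probability reaches alpha.
    Greedily taking the largest weights first is optimal here, because each
    candidate's posterior is proportional to prior * KI."""
    weights = {cid: p * k for cid, p, k in candidates}
    total = sum(weights.values())
    chosen, mass = [], 0.0
    for cid, w in sorted(weights.items(), key=lambda item: -item[1]):
        chosen.append(cid)
        mass += w / total
        if mass >= alpha:
            break
    return chosen, mass

def top_n_by_ki(candidates, n):
    """KI-only alternative: the n largest Kinship Indices, priors ignored."""
    return [cid for cid, _, _ in sorted(candidates, key=lambda c: -c[2])[:n]]

# "C" has an extreme KI but a near-zero prior (e.g., wrong region/age group).
candidates = [
    ("A", 0.3,   50.0),
    ("B", 0.1,  400.0),
    ("C", 0.001, 2000.0),
    ("D", 0.3,    1.0),
]

subset, mass = greedy_subset(candidates, alpha=0.95)
print(subset, mass)              # prior-driven subset with its posterior mass
print(top_n_by_ki(candidates, 2))  # KI-only shortlist of the same size
```

With these numbers the greedy method shortlists B and A (posterior mass just under 0.96), while the KI-only top‑2 picks C and B: exactly the failure mode the summary describes when priors are heterogeneous.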
To evaluate the framework, the authors apply it to the Dutch National DNA Database (≈6 million profiles) using real case data. Prior probabilities are derived from demographic statistics and crime‑type prevalence. The LR_DB and posterior_i values are computed for each candidate, and both subset selection methods are compared. Results show that the prior‑driven greedy subset reduces the number of profiles requiring follow‑up by roughly 30 % while maintaining a ≥95 % probability that the true relative is included. In contrast, the KI‑only method achieves similar reductions in workload but sometimes falls below the desired inclusion probability, especially when priors are uneven.
The discussion highlights policy implications: incorporating priors can improve efficiency and reduce false‑positive investigations, but the priors themselves must be derived transparently to avoid bias. The authors advocate for standardized, independently audited procedures for prior estimation.
In conclusion, the study advances familial searching from a pairwise KI comparison to a full Bayesian inference that respects database‑wide constraints and leverages external information. The derived LR_DB, posterior probabilities, and the two subset‑selection algorithms provide forensic practitioners with mathematically sound tools for balancing investigative thoroughness against resource constraints. Future work is suggested on extensions to scenarios with multiple possible relatives, continuous prior models, and cross‑jurisdictional validation.