Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers
The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances and refinements that led to some efficient algorithms, its applicability to real datasets has been limited by the absence of some important characteristics of these data in its formulation, such as mutations, genotyping errors, and missing data. In this work, we propose the Haplotype Configuration with Recombinations and Errors problem (HCRE), which generalizes the original MRHC formulation by incorporating the two most common characteristics of real data: errors and missing genotypes (including untyped individuals). Although HCRE is computationally hard, we propose an exact algorithm for the problem based on a reduction to the well-known Satisfiability problem. Our reduction exploits recent progress in the constraint programming literature and, combined with the use of state-of-the-art SAT solvers, provides a practical solution for the HCRE problem. Biological soundness of the phasing model and effectiveness (in both accuracy and performance) of the algorithm are experimentally demonstrated under several simulated scenarios and on a real dairy cattle population.
💡 Research Summary
The paper addresses the longstanding challenge of haplotype inference on pedigrees when real‑world data contain recombination events, genotyping errors, and missing calls. While the Minimum‑Recombinant Haplotype Configuration (MRHC) model provides a parsimonious formulation that minimizes the number of recombinations, it assumes complete and error‑free genotypes, limiting its applicability. To overcome this, the authors introduce the Haplotype Configuration with Recombinations and Errors (HCRE) problem, formally defined as the (r, e)‑HC problem: given a pedigree with possibly incomplete genotypes, find a haplotype configuration that respects Mendelian inheritance, deviates from the observed genotypes in at most e loci (errors), and contains at most r recombination events.
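To make the (r, e)-HC definition concrete, the following minimal Python sketch checks whether a given haplotype configuration is feasible for given bounds r and e. It is an illustration, not the authors' algorithm: it only verifies a candidate solution and does not search for one. The encoding is an assumption made for this example: alleles are 0/1, a missing genotype is None, each non-founder carries two source vectors (one per inherited haplotype) that select which parental haplotype is copied at each locus, and errors are counted per (individual, locus) pair. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Hap = List[int]                        # one allele (0 or 1) per locus
Genotype = Optional[Tuple[int, int]]   # sorted allele pair, None = missing call


def count_recombinations(source: List[int]) -> int:
    """A recombination is a switch of the source between adjacent loci."""
    return sum(1 for a, b in zip(source, source[1:]) if a != b)


def inherits(child_hap: Hap, parent: Tuple[Hap, Hap], source: List[int]) -> bool:
    """Mendelian check: each child allele is copied from the parental
    haplotype selected by the source vector at that locus."""
    return all(child_hap[l] == parent[source[l]][l] for l in range(len(child_hap)))


def count_errors(haps: Tuple[Hap, Hap], observed: List[Genotype]) -> int:
    """Count typed loci whose observed genotype differs from the haplotypes."""
    return sum(
        1
        for l, g in enumerate(observed)
        if g is not None and tuple(sorted((haps[0][l], haps[1][l]))) != g
    )


def check_configuration(
    haps: Dict[str, Tuple[Hap, Hap]],
    parents: Dict[str, Tuple[str, str]],          # child -> (father, mother)
    sources: Dict[str, Tuple[List[int], List[int]]],
    observed: Dict[str, List[Genotype]],
    r: int,
    e: int,
) -> bool:
    """Return True if the configuration is Mendelian-consistent and uses
    at most r recombinations and at most e genotype errors."""
    recombinations = errors = 0
    for ind, pair in haps.items():
        errors += count_errors(pair, observed[ind])
        if ind in parents:  # founders have no parents and no source vectors
            for k in (0, 1):  # 0 = haplotype from father, 1 = from mother
                parent_id = parents[ind][k]
                if not inherits(pair[k], haps[parent_id], sources[ind][k]):
                    return False
                recombinations += count_recombinations(sources[ind][k])
    return recombinations <= r and errors <= e


# Toy trio over three loci: father F, mother M, child C.
haps = {
    "F": ([0, 0, 1], [1, 1, 0]),
    "M": ([0, 1, 1], [0, 0, 0]),
    "C": ([0, 0, 0], [0, 1, 1]),
}
parents = {"C": ("F", "M")}
# The child's paternal haplotype switches source once (one recombination).
sources = {"C": ([0, 0, 1], [0, 0, 0])}
observed = {
    "F": [(0, 1), (0, 1), (0, 1)],
    "M": [(0, 0), None, (0, 1)],      # second locus is untyped
    "C": [(0, 0), (1, 1), (0, 1)],    # second locus disagrees with the haplotypes
}

print(check_configuration(haps, parents, sources, observed, r=1, e=1))
print(check_configuration(haps, parents, sources, observed, r=0, e=1))
```

In this toy trio the configuration needs one recombination and one error, so it is accepted for r = 1, e = 1 and rejected when either bound is lowered to zero. The missing genotype of the mother imposes no constraint, which is how untyped loci are treated in this sketch.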
The authors prove that HCRE is APX‑hard, so a polynomial‑time exact algorithm, or even a polynomial‑time approximation scheme, is ruled out unless P = NP. Nevertheless, they devise a practical exact solution by reducing HCRE to a Boolean satisfiability (SAT) instance. The reduction encodes three families of constraints: (1) Mendelian consistency between parent and child using source‑vector variables; (2) genotype consistency, where a per‑locus error flag e_i