A simple computational method for the identification of disease-associated loci in complex, incomplete pedigrees


📝 Original Info

  • Title: A simple computational method for the identification of disease-associated loci in complex, incomplete pedigrees
  • ArXiv ID: 0710.5625
  • Date: 2007-10-31
  • Authors: Not listed. (The source text does not include an author list.)

📝 Abstract

We present an approach, called the "Shadow Method," for the identification of disease loci from dense genetic marker maps in complex, potentially incomplete pedigrees. "Shadow" is a simple method based on an analysis of the patterns of obligate meiotic recombination events in genotypic data. This method can be applied to any high density marker map and was specifically designed to exploit the fact that extremely dense marker maps are becoming more readily available. We also describe how to interpret and associate meaningful P-Values to the results. Shadow has significant advantages over traditional parametric linkage analysis methods in that it can be readily applied even in cases in which the topology of a pedigree or pedigrees can only be partially determined. In addition, Shadow is robust to variability in a range of parameters and in particular does not require prior knowledge of mode of inheritance, penetrance or clinical misdiagnosis rate. Shadow can be used for any SNP data, but is especially effective when applied to dense samplings. Our primary example uses data from Affymetrix 100k SNPChip samples in which we illustrate our approach by analyzing simulated data as well as genome-wide SNP data from two pedigrees with inherited forms of kidney failure, one of which is compared with a typical LOD score analysis.

💡 Deep Analysis

📄 Full Content

Studies of genetic disease have been remarkably successful in identifying disease genes and novel biological pathways. For family-based analyses of phenotypes with single, highly penetrant disease alleles, the first step is the identification of a locus harboring the mutant allele. This requires the acquisition and subsequent analysis of a significant amount of genetic data. As regards the former, the ease with which investigators can accomplish genome-wide genotyping has increased tremendously in recent years. For example, one commercial microarray technology (Affymetrix SNPChip) now allows rapid chip-based genotyping of approximately 10,000, 100,000, and 500,000 SNPs (see Matsuzaki et al. (4) and (5)).

Most of the currently available linkage approaches were originally developed with the goal of extracting as much information as possible from a relatively small set of markers. We base our approach on the fact that with very dense genetic maps, we can ignore markers that are not fully informative and still extract most of the useful genetic information. In essence, our method is based on identifying obligate recombination events and using the distribution of these events to identify genomic regions inherited identical by descent (IBD). This allows us to handle the complicated requirements of real data and the often complex and incompletely known structures of available pedigrees. We call our technique the Shadow Method and introduce it in the next section.

Our motivation for the development of Shadow is severalfold. Perhaps most important is the fact that available software is overmatched by the great number of computations required to calculate parametric or non-parametric LOD scores for large pedigrees and large data sets. It is known that with standard methods, the size of the calculation (as measured in the number of arithmetic operations) increases exponentially in pedigree size or in the number of markers used (the various elaborations of the Elston-Stewart algorithm as in Ott (6) and the NPL algorithm as in Kruglyak et al. (3), respectively). In contrast, the computational load of Shadow grows only linearly with the number of markers and at a rate that is less than exponential in pedigree size. In the worst-case scenario, it increases exponentially in sample size, but is independent of pedigree size. This enables us to analyze large pedigrees.

Computational complexity is just one concern. We are also cognizant of the fact that in analyses of large complex pedigrees, it can be extremely useful for investigators to have an index of which regions are most likely to harbor disease genes by virtue of being shared IBD among affected individuals, as well as a measure, given data from a subset of a pedigree, of the distance from IBD for any region of the genome. This relies on the computation of something we call the Shadow function at the locus x, denoted S(x). It is effectively a measure of just how inconsistent the data are with the hypothesis that the pattern of inheritance at a given locus is IBD. In particular, S(x) = 0 implies IBD at x.
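To make the idea of a per-locus inconsistency measure concrete, the following Python sketch computes a toy score. It is not the Shadow function S(x), whose definition is not given in this excerpt; it is a simplified stand-in that rests on one standard observation: if two individuals are opposite homozygotes at a SNP (one AA, the other BB), they share no allele there, so that marker is inconsistent with all of them carrying a common haplotype. The function names, the "AA"/"AB"/"BB"/"NN" genotype encoding, and the `half_window` parameter are assumptions made only for this illustration.

```python
from typing import List, Sequence


def conflict_markers(affected: Sequence[Sequence[str]]) -> List[bool]:
    """Flag markers where affected individuals include both AA and BB calls.

    Each individual is a sequence of genotype calls, one per marker, using the
    assumed encoding "AA", "AB", "BB", with "NN" for a missing call.
    """
    n_markers = len(affected[0])
    flags = []
    for i in range(n_markers):
        calls = {individual[i] for individual in affected}
        # Opposite homozygotes cannot share an allele at this marker.
        flags.append({"AA", "BB"} <= calls)
    return flags


def toy_inconsistency(affected: Sequence[Sequence[str]], half_window: int) -> List[int]:
    """Count conflicting markers in a window centered on each marker.

    A score of 0 only means that no marker in the window rules out sharing;
    unlike the paper's S(x) = 0, it does not by itself imply IBD.
    """
    flags = conflict_markers(affected)
    n = len(flags)
    return [
        sum(flags[max(0, i - half_window): i + half_window + 1])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three hypothetical affected individuals typed at five markers.
    affected = [
        ["AA", "AB", "BB", "AA", "NN"],
        ["AA", "BB", "AA", "AB", "AA"],
        ["AB", "AB", "BB", "AA", "BB"],
    ]
    print(toy_inconsistency(affected, half_window=1))
```

In this toy setting, long runs of zeros along a chromosome would mark candidate shared regions, and the cost is one pass over the markers for a fixed set of individuals, which matches the linear scaling in marker count described above.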

Thus, the Shadow method is a conceptually and computationally simple technique with several features that we believe make it useful for the analysis of large, complex, and perhaps incomplete pedigrees, particularly for relatively rare diseases caused by uncommon genetic variants of large effect: (1) Shadow enables rapid identification of the genomic regions most likely to be IBD in pedigrees; (2) Shadow measures how far such regions (and in fact all regions) are from being IBD; and (3) Shadow helps to identify the source of the inconsistencies in “almost IBD” regions. We also develop methods to assess how likely we are to find such IBD or “almost IBD” regions by chance. The specifics of this measure and the details of its interpretation are presented in the next section.

We illustrate the use of Shadow by analyzing both simulated data as well as genome-wide SNP data from two pedigrees with inherited forms of kidney disease.

The pedigrees are illustrated in Figure 1. The family FS-Z has a relatively simple pedigree and it is known that the responsible gene defect is a point mutation in the TRPC6 gene on chromosome 11q (Reiser et al (2)). In this case a full multi-point linkage analysis will work well and we compare our results to a LOD score analysis. The second family we analyze, the FG-FM family, has an incomplete and large pedigree, a situation which makes standard linkage approaches unreliable and/or impossible.

At the core of the Shadow method is the idea that the sample data provides us with a means to measure for each locus x the degree to which the data is inconsistent with the hypothesis that the region around x is IBD and thus is possibly within a disease-harboring allele. We call this measure the Shadow function and denote it as S. Since we focus on inconsistency, a locus x that is consistent with the IBD assumption has S(x) =

Reference

This content is AI-processed based on open access ArXiv data.
