Model for Diversity Analysis of Antigen Receptor Repertoires
Grzegorz A. Rempala∗, Michał Seweryn†, and Leszek Ignatowicz‡
February 24, 2010
Abstract
In modern molecular biology, one of the most common ways of studying a vertebrate immune system
is to statistically compare the counts of sequenced antigen receptor clones (either immunoglobulins or
T-cell receptors) derived from various tissues under different experimental or clinical conditions. The
problem is difficult and does not fit readily into the standard statistical framework of contingency tables,
primarily due to serious under-sampling of the receptor populations. This under-sampling is caused, on
the one hand, by the extreme diversity of antigen receptor repertoires maintained by the immune system
and, on the other, by the high cost and labor intensity of the receptor data collection process. In most of
the recent immunological literature the differences across antigen receptor populations are examined via
non-parametric statistical measures of species overlap and diversity borrowed from ecological studies.
While this approach is robust in a wide range of situations, it seems to provide little insight into the
underlying clonal size distribution and the overall mechanism differentiating the receptor populations.
As a possible alternative, the current paper presents a parametric method that both adjusts for the
under-sampling of the data and provides a unifying approach to the simultaneous comparison of multiple
receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model
is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization
of the univariate Poisson-lognormal models used in ecological studies of biodiversity patterns. The procedure
for evaluating the model's fit is described, along with the public domain software developed to perform the
necessary diagnostics. The model-driven analysis is seen to compare favorably with traditional
methods when applied to the data from T-cell receptors in transgenic mouse populations.
Keywords: T-cells, antigen receptors, computational immunology, species diversity estimation, Poisson
abundance models, Lognormal distribution, dissimilarity measure, dendrogram, mutual information.
2010 AMS Subject Classification: 62P10, 92B05
∗ Corresponding author. Department of Biostatistics and the Cancer Center, Medical College of Georgia, Augusta, GA
30912. E-mail: grempala@mcg.edu
† Wydział Matematyki, Uniwersytet Łódzki, Łódź, Poland. E-mail: msewery@math.uni.lodz.pl
‡ Department of Medicine, Center for Biotechnology and Genomic Medicine, Medical College of Georgia. E-mail: lignatowicz@mcg.edu
arXiv:1003.1066v1 [q-bio.BM] 4 Mar 2010
1 Introduction
The major feature of the adaptive immune system is its capacity to generate clones of B and T-cells that
are able to recognize and neutralize specific antigens. Both cell types recognize antigens by a special
class of surface molecules called B- and T-cell receptors. Although the methodology developed in this paper
applies to both types of receptors, for the sake of clarity and simplicity we describe the background and
the overall problem in terms of T-cell receptors. For a general introduction to the molecular biology of
the immune system, we refer the interested reader to, e.g., Janeway (2005).
A single T-cell receptor (TCR) is composed of two chains, α and β, that are formed during T-cell
differentiation. Both chains are formed by rearrangements of genetic segments: Vα and Jα for the
TCRα chain and Vβ, Dβ and Jβ for the TCRβ chain. Since there are a number of segments of each
type in the genomic DNA, a great number of different α and β chains are generated. This chain
diversity is further increased by the recombination process, when individual nucleotides might be added
or deleted at the junctional sites. The region containing these highly variable junctions is the third of
three complementarity-determining regions (CDRs) that are seen crystallographically to contact antigen.
The sources of TCR diversity are thus naturally broken down hierarchically into gene segment family
(library), segment within family, CDR3 length and CDR3 nucleotide diversity.
Both combinatorial and insertional rearrangements result in a huge TCR repertoire, ensuring that the
immune system has the potential to recognize a large number of antigens. For instance, it is estimated
that in mice the number of different TCRs that can be formed exceeds 10^15 (Davis and Bjorkman, 1988;
Casrouge et al., 2000). For humans, it is estimated that over 10^18 different TCRs can be produced, and
the number of different TCR species (TCR richness) in a human at any given time has been estimated to
exceed 10^7 (Arstila et al., 1999; Naylor et al., 2005). This clonal diversity of TCR populations makes
them particularly challenging objects to analyze statistically.
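
To make the under-sampling problem concrete, the following minimal Python sketch simulates clone counts from a univariate Poisson-lognormal abundance model of the kind mentioned in the abstract, and reports how many clones are seen at all. It is an illustration under stated assumptions, not the authors' software: the function names, the number of clones, and the values of mu and sigma are hypothetical and are not taken from the paper. The Shannon entropy shown is a generic plug-in index from ecology, included only as an example of a non-parametric diversity measure.

```python
import numpy as np


def sample_poisson_lognormal(n_clones, mu, sigma, rng):
    """Draw one observed count per clone (hypothetical helper).

    Each clone gets a latent abundance lambda ~ LogNormal(mu, sigma);
    its observed count is then Poisson(lambda).
    """
    lam = rng.lognormal(mean=mu, sigma=sigma, size=n_clones)
    return rng.poisson(lam)


def shannon_entropy(counts):
    """Plug-in Shannon entropy (natural log) of the observed clones."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # unseen clones contribute nothing
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


# Illustrative values only (assumptions, not estimates from the paper).
rng = np.random.default_rng(0)
n_clones = 10_000
counts = sample_poisson_lognormal(n_clones, mu=-2.0, sigma=1.5, rng=rng)

observed = int((counts > 0).sum())       # clones sampled at least once
print(f"true richness: {n_clones}, observed richness: {observed}")
print(f"plug-in Shannon entropy of the sample: {shannon_entropy(counts):.3f}")
```

With a small mu most latent abundances are well below one, so many clones receive a zero count and the observed richness falls short of the true richness. Any index computed from the observed counts alone inherits this bias, which is the motivation for a parametric correction.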
In what follows, we are concerned with the statistical analysis of the diversity of TCR repertoire
samples obtained
…(Full text truncated)…