Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections

📝 Original Info

  • Title: Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections
  • ArXiv ID: 0704.1074
  • Date: 2010-02-18
  • Authors: Researchers from original ArXiv paper

📝 Abstract

We consider testing independence in group-wise selections with some restrictions on combinations of choices. We present models for frequency data of selections for which it is easy to perform conditional tests by Markov chain Monte Carlo (MCMC) methods. When the restrictions on the combinations can be described in terms of a Segre-Veronese configuration, an explicit form of a Gröbner basis consisting of moves of degree two is readily available for performing a Markov chain. We illustrate our setting with the National Center Test for university entrance examinations in Japan. We also apply our method to testing independence hypotheses involving genotypes at more than one locus or haplotypes of alleles on the same chromosome.

📄 Full Content

Suppose that people are asked to select items which are classified into categories or groups and there are some restrictions on combinations of choices. For example, when a consumer buys a car, he or she can choose various options, such as a color, a grade of air conditioning, a brand of audio equipment, etc. Due to space restrictions, for example, some combinations of options may not be available. The problem we consider in this paper is testing independence of people's preferences in group-wise selections in the presence of restrictions. We assume that the observations are the counts of people choosing various combinations in group-wise selections, i.e., the data are given in the form of a multiway contingency table with some structural zeros corresponding to the restrictions.

If there are m groups of items and a consumer freely chooses just one item from each group, then the combination of choices is simply a cell of an m-way contingency table. Then the hypothesis of independence reduces to the complete independence model of an m-way contingency table. The problem becomes harder if there are some additional conditions in a group-wise selection. A consumer may be asked to choose up to two items from a group or there may be a restriction on the total number of items. Groups may be nested, so that there are further restrictions on the number of items from subgroups. Some restrictions may concern several groups or subgroups. Therefore the restrictions on combinations may be complicated.
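As an illustration of the unrestricted case described above, the following is a minimal sketch of fitting the complete independence model to an m-way table: the fitted count of a cell is the total count times the product of the one-way marginal proportions. The function name `independence_fit` and the dictionary-based table layout are assumptions made for this sketch, not notation from the paper, and the sketch does not handle structural zeros.

```python
from itertools import product


def independence_fit(table):
    """Fit the complete independence model to an m-way table.

    `table` maps each cell (an m-tuple of levels, one per group) to its
    observed count. Returns (margins, fitted), where margins[k] holds the
    one-way marginal counts of group k and fitted[cell] is the expected
    count n * prod_k (margin_k[level_k] / n).
    """
    m = len(next(iter(table)))          # number of groups
    n = sum(table.values())             # total sample size
    margins = [dict() for _ in range(m)]
    for cell, count in table.items():
        for axis, level in enumerate(cell):
            margins[axis][level] = margins[axis].get(level, 0) + count

    fitted = {}
    for cell in product(*(sorted(mg) for mg in margins)):
        numerator = 1
        for axis, level in enumerate(cell):
            numerator *= margins[axis][level]
        fitted[cell] = numerator / n ** (m - 1)
    return margins, fitted


if __name__ == "__main__":
    # Hypothetical 2 x 2 data: two groups, one item chosen from each.
    counts = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
    margins, fitted = independence_fit(counts)
    print(margins)
    print(fitted)
```

The one-way margins are the sufficient statistics of this model, which is why a conditional test fixes them. With restrictions on combinations, the set of admissible cells shrinks and this simple product formula no longer applies directly.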

As a concrete example we consider restrictions on choosing subjects in the National Center Test (NCT hereafter) for university entrance examinations in Japan. Due to time constraints in the test schedule, the pattern of restrictions is rather complicated. However, we will show that the restrictions of the NCT can be described in terms of a Segre-Veronese configuration.

Another important application of this paper is a generalization of the Hardy-Weinberg model in population genetics. We are interested in testing various hypotheses of independence involving genotypes at more than one locus and haplotypes, i.e., combinations of alleles on the same chromosome. Although this problem seems to be different from the above introductory motivation on consumer choices, we can imagine that each offspring is required to choose two alleles for each gene (locus) from a pool of alleles for the gene. He or she can choose the same allele twice (homozygote) or different alleles (heterozygote). In the Hardy-Weinberg model the two choices are assumed to be independently and identically distributed. A natural generalization of the Hardy-Weinberg model for a single locus is to consider independence of genotypes at more than one locus. In many epidemiological studies, the primary interest is the correlation between a certain disease and the genotype of a single gene (or the genotypes at more than one locus, or the haplotypes involving alleles on the same chromosome). A further complication might arise if certain homozygotes are fatal and cannot be observed, thus becoming structural zeros.
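For the single-locus case, the assumption that the two allele choices are independent and identically distributed gives the familiar expected genotype probabilities: p_i^2 for a homozygote and 2 p_i p_j for a heterozygote. The sketch below computes these expected counts from observed genotype counts. The function names and the convention that each genotype is stored as a sorted pair of allele labels are assumptions made for illustration; the sketch ignores structural zeros such as unobservable homozygotes.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Estimate allele proportions by counting both alleles of each genotype."""
    allele_totals = Counter()
    for (a, b), count in genotype_counts.items():
        allele_totals[a] += count
        allele_totals[b] += count   # a homozygote (a, a) contributes twice
    total = sum(allele_totals.values())
    return {allele: c / total for allele, c in allele_totals.items()}


def hardy_weinberg_expected(genotype_counts):
    """Expected genotype counts when the two alleles are drawn i.i.d."""
    n = sum(genotype_counts.values())
    p = allele_frequencies(genotype_counts)
    alleles = sorted(p)
    expected = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            # Homozygote: p_a^2; heterozygote: 2 * p_a * p_b (two orderings).
            prob = p[a] ** 2 if a == b else 2 * p[a] * p[b]
            expected[(a, b)] = n * prob
    return expected


if __name__ == "__main__":
    # Hypothetical counts for a locus with two alleles "A" and "B".
    observed = {("A", "A"): 30, ("A", "B"): 40, ("B", "B"): 30}
    print(hardy_weinberg_expected(observed))
```

Comparing observed and expected counts cell by cell gives a simple discrepancy measure; the multi-locus hypotheses discussed in the text extend the same idea to products over loci.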

In this paper we consider conditional tests of independence hypotheses in the above two important problems from the viewpoint of Markov bases and Gröbner bases. Evaluation of P-values by Markov chain Monte Carlo (MCMC) methods using Markov bases and Gröbner bases was initiated in Diaconis and Sturmfels (1998). See also Sturmfels (1995). Since then, this approach has attracted much attention from statisticians as well as algebraists. Contributions of the present authors are found, for example, in Aoki and Takemura (2005, 2007), Ohsugi and Hibi (2005, 2006, 2007), and Takemura and Aoki (2004). Methods of algebraic statistics are currently actively applied to problems in computational biology (Pachter and Sturmfels, 2005). In algebraic statistics, results in commutative algebra may find somewhat unexpected applications in statistics. At the same time, statistical problems may present new problems to commutative algebra. A recent example is a conjunctive Bayesian network proposed in Beerenwinkel et al. (2006), where a result of Hibi (1987) is successfully used. In this paper we present applications of results on Segre-Veronese configurations to testing independence in NCT and Hardy-Weinberg models. In fact, these statistical considerations have prompted further theoretical developments of Gröbner bases for Segre-Veronese type configurations, and we will present these theoretical results in our subsequent paper (Aoki et al., 2007).
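To make the MCMC evaluation of P-values concrete, here is a minimal sketch for the simplest setting only: a two-way table with no structural zeros, where the well-known degree-two "basic moves" (+1 and -1 on the corners of a 2 x 2 minor) connect all tables with the same row and column sums. This is not the Segre-Veronese basis of the paper. The function names, the choice of the chi-square statistic, and the default chain lengths are assumptions for illustration, and the table is assumed to have positive row and column sums.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            e = rows[i] * cols[j] / n   # expected count; margins assumed > 0
            stat += (x - e) ** 2 / e
    return stat


def metropolis_step(table, rng):
    """One Metropolis step using a random basic move; edits `table` in place."""
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    # Move: +1 at (i1, j1) and (i2, j2), -1 at (i1, j2) and (i2, j1).
    a, b = table[i1][j2], table[i2][j1]
    if a == 0 or b == 0:
        return                          # move would create a negative cell
    c, d = table[i1][j1], table[i2][j2]
    # Target is proportional to 1 / prod(x_ij!), so the probability ratio
    # of the proposed table to the current one is a*b / ((c+1)*(d+1)).
    ratio = (a * b) / ((c + 1) * (d + 1))
    if rng.random() < min(1.0, ratio):
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def estimate_p_value(observed, steps=10000, burn_in=1000, seed=0):
    """Estimate the conditional P-value of the chi-square statistic."""
    rng = random.Random(seed)
    table = [row[:] for row in observed]
    threshold = chi_square(observed) - 1e-9   # tolerance for float ties
    hits = 0
    for step in range(burn_in + steps):
        metropolis_step(table, rng)
        if step >= burn_in and chi_square(table) >= threshold:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    data = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]   # hypothetical counts
    print(estimate_p_value(data, steps=5000, burn_in=500, seed=1))
```

Every accepted move preserves the row and column sums, so the chain stays in the set of tables sharing the observed sufficient statistics. With structural zeros, moves touching a forbidden cell would also have to be rejected, and basic moves alone may no longer connect the whole set, which is the difficulty a suitable Markov basis addresses.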

Even in two-way tables, if the positions of the structural zeros are arbitrary, then Markov bases may contain moves of high degrees (Aoki and Takemura, 2005). See also Huber et al. (2006) and Rapallo (2006) for Markov bases for problems with structural zeros. However, if the restrictions on the combinations can be described in terms of a Segre-Veronese configuration, then an explicit form of a Gröbner basis consisting of binomials of degree two with a squaref

…(Full text truncated)…
