Deep Dive into VC Dimension of Ellipsoids

We will establish that the VC dimension of the class of d-dimensional ellipsoids is (d^2+3d)/2, and that maximum likelihood estimation with N-component d-dimensional Gaussian mixture models induces a geometric class whose VC dimension is at least N(d^2+3d)/2. Keywords: VC dimension; finite-dimensional ellipsoid; Gaussian mixture model
For sets X ⊆ R^d and Y ⊆ X, we say that a set B ⊆ R^d cuts Y out of X if Y = X ∩ B. A class C of subsets of R^d is said to shatter a set X ⊆ R^d if every Y ⊆ X is cut out of X by some B ∈ C. The VC dimension of C, denoted by VCdim(C), is defined to be the maximum n (or ∞ if no such maximum exists) for which some subset of R^d of cardinality n is shattered by C.
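As a concrete illustration of these definitions (not from the paper), the class of closed intervals on the real line shatters every 2-point set but no 3-point set, so its VC dimension is 2. A minimal brute-force check, with helper names of our own choosing:

```python
from itertools import combinations

def cuts_out(points, subset):
    """Return True if some closed interval [a, b] cuts `subset` out of `points`."""
    if not subset:
        return True  # an interval disjoint from `points` cuts out the empty set
    a, b = min(subset), max(subset)
    # The tightest candidate interval works iff it captures no unwanted point.
    return all(p in subset for p in points if a <= p <= b)

def shattered(points):
    """Return True if the class of closed intervals shatters `points`."""
    pts = set(points)
    return all(
        cuts_out(pts, set(c))
        for r in range(len(pts) + 1)
        for c in combinations(pts, r)
    )

assert shattered({0.0, 1.0})           # every 2-point set is shattered
assert not shattered({0.0, 1.0, 2.0})  # {0.0, 2.0} cannot be cut out without 1.0
```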
The VC dimension of a class measures the complexity of the class and is employed in empirical process theory [4], statistical and computational learning theory [8,3], and discrete geometry [6]. Although asymptotic estimates of VC dimensions are available for many classes, exact values are known only for a few classes (e.g., the class of Euclidean balls [10] and the class of halfspaces [6]).
In Section 2, we prove:

Theorem 1. The VC dimension of the class of d-dimensional ellipsoids is (d^2 + 3d)/2.
Here G_d denotes the class of density functions of d-dimensional Gaussian distributions, where a covariance matrix of size d is, by definition, a real positive definite matrix. As in statistical learning theory [8], for a class P of probability density functions we consider the class D(P) of sets {x ∈ R^d : f(x) > s} such that f is any probability density function in P and s is any positive real number. Then D(G_d) is the class of d-dimensional ellipsoids.
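The identification of superlevel sets of Gaussian densities with ellipsoids can be sanity-checked numerically: {x : f(x) > s} is exactly a Mahalanobis ball, i.e., an ellipsoid. A minimal sketch, where the specific mean, covariance, and names are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
mu = np.array([1.0, -0.5])
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)   # a positive definite covariance matrix
Sigma_inv = np.linalg.inv(Sigma)
peak = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def density(x):
    """Gaussian density N(mu, Sigma) evaluated at x."""
    q = (x - mu) @ Sigma_inv @ (x - mu)
    return peak * np.exp(-0.5 * q)

s = 0.5 * peak                    # any threshold in (0, peak) works
c = -2.0 * np.log(s / peak)       # f(x) > s  <=>  (x-mu)' Sigma^{-1} (x-mu) < c

for x in rng.normal(size=(1000, d), scale=3.0):
    in_superlevel = density(x) > s
    in_ellipsoid = (x - mu) @ Sigma_inv @ (x - mu) < c
    assert in_superlevel == in_ellipsoid
```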
For a positive integer N, an N-component d-dimensional Gaussian mixture model [7] (an (N, d)-GMM) is, by definition, any probability distribution belonging to the convex hull of some N d-dimensional Gaussian distributions. Suppose we are given a sample from a population (N, d)-GMM but the number N of components is unknown. Selecting N from the sample is an instance of Akaike's model selection problem [1] (see [5] for a recent approach). The authors of [9] proposed choosing N by the structural risk minimization principle [8], in which an important role is played by the VC dimension of the class D((G_d)_N), where (G_d)_N is the class of (N, d)-GMMs. Our result is that the VC dimension of D((G_d)_N) is greater than or equal to N(d^2 + 3d)/2.
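For concreteness, an (N, d)-GMM density is simply a convex combination of N Gaussian densities. A minimal sketch (the function `gmm_density` is our illustrative name, not from [7] or [9]):

```python
import numpy as np

def gmm_density(x, weights, mus, sigmas):
    """Density of an N-component d-dimensional Gaussian mixture at x.

    weights: shape (N,), nonnegative and summing to 1 (the convex coefficients)
    mus:     shape (N, d), component means
    sigmas:  shape (N, d, d), component covariances, each positive definite
    """
    d = len(x)
    total = 0.0
    for w, m, S in zip(weights, mus, sigmas):
        q = (x - m) @ np.linalg.inv(S) @ (x - m)
        total += w * np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return total

# A (2, 1)-GMM: convex combination of two 1-dimensional Gaussians.
weights = np.array([0.3, 0.7])
mus = np.array([[0.0], [4.0]])
sigmas = np.array([[[1.0]], [[2.0]]])
assert abs(np.sum(weights) - 1.0) < 1e-12
assert gmm_density(np.array([1.0]), weights, mus, sigmas) > 0.0
```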
We will prove Theorem 1. For a positive integer B, a vector a ∈ R^B \ {0}, and c ∈ R, we write ℓ_{a,c}(x) := ᵗa x + c (x ∈ R^B) for the affine function and H_{a,c} := {x ∈ R^B : ℓ_{a,c}(x) < 0} for the open halfspace. We say a set W ⊆ R^B spans an affine subspace H ⊆ R^B if H is the smallest affine subspace that contains W. The cardinality of a set S is denoted by |S|. For a vector a = ᵗ(a_1, . . . , a
Proof. By an affine transformation we can assume without loss of generality that all the components of the vector a are 1 and that S is the canonical basis {e_1, . . . , e_B}.

Proof. Let B be the right-hand side. Let π be a map S^{d-1} → R^B which maps
there is some set S ⊂ S^{d-1} such that |S| = B and π(S) spans the hyperplane. Let a ∈ R^B be a vector with the first d components being 1 and the other components being 0. By Lemma 2, for any ε > 0 the family
By the definition of π, the class of sets defined by quadratic inequalities
But, when ε is sufficiently small, all of these sets are ellipsoids.
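Here the ambient dimension B = (d^2 + 3d)/2 counts the monomials of a quadratic polynomial on R^d: d(d+1)/2 products x_i x_j with i ≤ j, plus d linear terms x_i. A quick check of this count (the helper `quadratic_features` is our illustrative name, not the paper's map π):

```python
from itertools import combinations_with_replacement

def quadratic_features(x):
    """Map x in R^d to the vector of monomials (x_i * x_j for i <= j, then x_i)."""
    d = len(x)
    quad = [x[i] * x[j] for i, j in combinations_with_replacement(range(d), 2)]
    return quad + list(x)

# The feature vector lives in R^B with B = d(d+1)/2 + d = (d^2 + 3d)/2.
for d in (1, 2, 3, 5):
    assert len(quadratic_features([1.0] * d)) == (d * d + 3 * d) // 2
```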
We verify the converse inequality.
Below, the convex hull of a set A is denoted by conv(A).
If there are x = (u, x_B), y = (u, y_B) ∈ S such that x_B < y_B, then for any a ∈ R^B with the last component nonnegative and for any c ∈ R we have ℓ_{a,c}(x) ≤ ℓ_{a,c}(y), and thus x ∈ H_{a,c} = {x ∈ R^B : ℓ_{a,c}(x) < 0} whenever y ∈ H_{a,c}. This contradicts the assumption that C shatters S. Therefore, for the canonical projection π :
By applying Radon's theorem [6] to the set π(S) ⊂ R^{B-1}, there is a partition (T_1, T_2) of S such that we can take y from conv(π(T_1)) ∩ conv(π(T_2)). Then we see that there are z, z′ ∈ R such that (y, z) ∈ conv(T_1) and (y, z′) ∈ conv(T_2). Because C shatters S, there are some a ∈ R^B and some c ∈ R such that the last component a_B of a is nonnegative and the halfspace H_{a,c} ∈ C cuts T_1 out of S. Thus we have ℓ_{a,c}(x) < 0 for all x ∈ conv(T_1), while ℓ_{a,c}(x) ≥ 0 for all x ∈ conv(T_2), where T_2 = S \ T_1. Therefore ℓ_{a,c}(y, z) < ℓ_{a,c}(y, z′); since a_B ≥ 0, this forces a_B > 0, and hence z′ > z. On the other hand, some member H_{a′,c′} ∈ C cuts T_2 out of S. By a similar reasoning, we have z > z′, which is a contradiction.
Proof. Suppose 0 ∉ conv(A). Then for every finite subset A′ of A we have 0 ∉ conv(A′), and there is a hyperplane J through 0 such that conv(A′) is contained in one of the two open halfspaces determined by J. So there is a new rectangular coordinate system whose origin coincides with that of the old coordinate system, such that one of the new coordinate axes is normal to J and every a ∈ A′ is represented as (a_1, . . . , a_B) with a_B > 0. So VCdim({H_{a,c} : a ∈ A′, c ∈ R}) ≤ B by Lemma 4, and thus VCdim({H_{a,c} : a ∈ A, c ∈ R}) ≤ B.
The proof of Theorem 1 is as follows. By Lemma 3, it suffices to establish that the class of d-dimensional ellipsoids has VC dimension at most B := (d^2 + 3d)/2. Assume otherwise. For a = ᵗ(a_1, . . . , a_B) ∈ R^B and x = ᵗ(x_1, . . . , x_d), define a quadratic form q_a(x) and a quadratic polynomial p_a(x) by
Let A be the set of a ∈ R^B s
…(Full text truncated)…