Entropy Concentration and the Empirical Coding Game

Reading time: 5 minutes

📝 Original Info

  • Title: Entropy Concentration and the Empirical Coding Game
  • ArXiv ID: 0809.1017
  • Date: 2008-09-05 (v1)
  • Authors: Peter Grünwald (CWI, Amsterdam)

📝 Abstract

We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two 'strong entropy concentration' theorems. These theorems unify and generalize Jaynes' 'concentration phenomenon' and Van Campenhout and Cover's 'conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution Q conditioned on a given constraint, and the distribution P minimizing the relative entropy D(P||Q) over all distributions satisfying the constraint, are 'close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others.

📄 Full Content

arXiv:0809.1017v1 [cs.IT] 5 Sep 2008

Entropy Concentration and the Empirical Coding Game

Peter Grünwald* (CWI, P.O. Box 94079, 1090 GB Amsterdam, NL)

*Also: research fellow at EURANDOM, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. This is a slightly modified version of the paper with the same title that appeared in Statistica Neerlandica 62(3), 2008, pages 374–392, on the occasion of the 10th anniversary of EURANDOM. Some of the results presented here have already appeared in the conference paper [Grünwald, 2001a] and the technical report [Grünwald, 2001b]. Theorems 5.3 and 5.4 of Section 5 are new and have not been published before. The paper benefited enormously from various discussions with Richard Gill, Phil Dawid and Franz Merkl. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the author's views.

Abstract. We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two 'strong entropy concentration' theorems. These theorems unify and generalize Jaynes' 'concentration phenomenon' and Van Campenhout and Cover's 'conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution $Q$ conditioned on a given constraint and the distribution $\tilde P$ minimizing $D(P\|Q)$ over all $P$ satisfying the constraint are 'close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others.

1 Introduction

Jaynes' Maximum Entropy (MaxEnt) Principle is a well-known principle for inductive inference [Csiszár, 1975, 1991, Topsøe, 1979, van Campenhout and Cover, 1981, Cover and Thomas, 1991, Grünwald and Dawid, 2004]. It has been applied to statistical and machine learning problems ranging from protein modeling to stock market prediction [Kapur and Kesavan, 1992]. One of its characterizations (some would say 'justifications') is the so-called concentration phenomenon [Jaynes, 1978, 1982]. Here is an informal version of this phenomenon, in the words of Jaynes [2003]:

"If the information incorporated into the maximum-entropy analysis includes all the constraints actually operating in the random experiment, then the distribution predicted by maximum entropy is overwhelmingly the most likely to be observed experimentally."

For the case in which a prior distribution over the domain at hand is available, van Campenhout and Cover [1981] have proven the related conditional limit theorem. In Sections 2-4, we provide a strong generalization of both the concentration phenomenon and the conditional limit theorem. In Section 5, the results of Section 4 are used to extend an existing game-theoretic characterization (again, some would say "justification") of Maximum Entropy due to Topsøe [1979]. In this way, we provide sharper results on two of the most frequently cited characterizations of the maximum entropy principle.

2 Informal Overview

Maximum Entropy. Let $X$ be a random variable taking values in some set $\mathcal{X}$, which (only for the time being!) we assume to be finite: $\mathcal{X} = \{1, \ldots, m\}$. Let $P, Q$ be distributions for $X$ with probability mass functions $p$ and $q$. We define $H_Q(P)$, the Q-entropy of $P$, as

$$H_Q(P) = -E_P\left[\log \frac{p(x)}{q(x)}\right] = -D(P\|Q), \qquad (1)$$

where $D(\cdot\|\cdot)$ is the Kullback-Leibler (KL) divergence between $P$ and $Q$ [Cover and Thomas, 1991].
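As a concrete illustration (mine, not the paper's): a minimal numpy sketch of the Q-entropy in (1) for a finite sample space. The function name `q_entropy` and the example distributions are arbitrary choices.

```python
import numpy as np

def q_entropy(p, q):
    """Q-entropy H_Q(P) = -D(P||Q) for finite distributions.

    p, q: arrays of probability masses over X = {1, ..., m};
    assumes p[i] > 0 implies q[i] > 0 (P absolutely continuous w.r.t. Q).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                                      # 0 * log 0 = 0 by convention
    kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))  # D(P||Q) >= 0
    return -kl

# Example: Q uniform on {1,...,4}, P slightly tilted away from Q.
q = np.full(4, 0.25)
p = np.array([0.4, 0.3, 0.2, 0.1])
print(q_entropy(p, q))   # -D(P||Q) <= 0, with equality iff P = Q
```

Note that when $Q$ is uniform on $\{1, \ldots, m\}$, $H_Q(P) = H(P) - \log m$, so maximizing Q-entropy coincides with maximizing the ordinary Shannon entropy $H(P)$.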
In the usual MaxEnt setting, we are given a 'prior' distribution $Q$ and a moment constraint

$$E[T(X)] = \tilde t, \qquad (2)$$

where $T$ is some function $T: \mathcal{X} \to \mathbb{R}^k$ for some $k > 0$. (More general formulations with arbitrary convex constraints exist [Csiszár, 1975], but here we stick to constraints of form (2).) We define, if it exists, $\tilde P$ to be the unique distribution over $\mathcal{X}$ that maximizes the Q-entropy over all distributions (over $\mathcal{X}$) satisfying (2):

$$\tilde P = \arg\max_{P:\, E_P[T(X)] = \tilde t} H_Q(P) = \arg\min_{P:\, E_P[T(X)] = \tilde t} D(P\|Q). \qquad (3)$$

The MaxEnt Principle then tells us that, in the absence of any further knowledge about the 'true' or 'posterior' distribution according to which data are distributed, our best guess for it is $\tilde P$.

In practical problems we are usually not given a constraint of form (2). Rather, we are given an empirical constraint of the form

$$\frac{1}{n}\sum_{i=1}^n T(X_i) = \tilde t, \qquad (4)$$

which we always abbreviate to '$T^{(n)} = \tilde t$'. The MaxEnt Principle is then usually applied as follows: suppose we are given an empirical constraint of form (4). We then have to make predictions about new data coming from the same source. In the absence of knowledge of any 'true' distribution generating this data, we should make our predictions based on the MaxEnt distribution $\tilde P$ for the moment constraint (2) corresponding to empirical constraint (4). $\tilde P$ is extended to several outcomes by taking the product distribution.

The Concentration Phenomenon and the Conditional Limit Theorem. Why should this procedure make any sense? Here is one justification

…(Full text truncated)…
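The excerpt cuts off before the concentration results, but the optimization in (3) can already be made concrete. For finite $\mathcal{X}$ and a moment constraint of form (2), it is a standard fact (from Csiszár [1975] and Cover and Thomas [1991], not derived in the excerpt above) that the minimizer $\tilde P$ is an exponential tilting of $Q$: $\tilde p(x) \propto q(x)\exp(\lambda^\top T(x))$, with $\lambda$ chosen so the constraint holds. Below is a minimal Python sketch; the helper name `maxent_tilt` and the dice example are my own illustration, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_tilt(q, T, t_target):
    """Find P~ = argmin_{P: E_P[T(X)] = t} D(P||Q) over a finite X.

    Standard result: p~(x) is proportional to q(x) * exp(lambda . T(x)),
    with lambda chosen so that E_{P~}[T(X)] = t_target.  We find lambda
    by minimizing the convex dual
        psi(lam) = log sum_x q(x) exp(lam . T(x)) - lam . t_target.
    q: prior pmf (length m); T: (m, k) array of constraint functions.
    """
    q = np.asarray(q, float)
    T = np.atleast_2d(np.asarray(T, float))
    if T.shape[0] != len(q):
        T = T.T                       # allow T given as (k, m)
    t = np.atleast_1d(np.asarray(t_target, float))

    def psi(lam):
        a = T @ lam + np.log(q)       # log of unnormalized tilted masses
        return np.logaddexp.reduce(a) - lam @ t

    lam = minimize(psi, x0=np.zeros(len(t))).x
    logp = T @ lam + np.log(q)
    logp -= np.logaddexp.reduce(logp)  # normalize in log space
    return np.exp(logp)

# Example: fair-die prior, constrain the mean of X to 4.5 (Jaynes' dice).
x = np.arange(1, 7)
q = np.full(6, 1/6)
p_tilde = maxent_tilt(q, x[:, None], 4.5)
print(p_tilde, p_tilde @ x)   # tilted pmf over {1,...,6}; mean ~ 4.5
```

With a uniform prior and the mean constrained to 4.5, this recovers Jaynes' classic dice example. The paper's concentration theorems then quantify in exactly what sense $Q^n$ conditioned on the empirical constraint $T^{(n)} = \tilde t$ is 'close' to the product distribution of this $\tilde P$.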

Reference

This content is AI-processed based on ArXiv data.
