Entropy Concentration and the Empirical Coding Game

Reading time: 5 minutes

📝 Original Info

  • Title: Entropy Concentration and the Empirical Coding Game
  • ArXiv ID: 0809.1017
  • Date: 2008-09-05 (v1)
  • Authors: Peter Grünwald (CWI, Amsterdam)

📝 Abstract

We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two 'strong entropy concentration' theorems. These theorems unify and generalize Jaynes' 'concentration phenomenon' and Van Campenhout and Cover's 'conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution Q conditioned on a given constraint, and the distribution P minimizing the relative entropy D(P||Q) over all distributions satisfying the constraint, are 'close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others.

📄 Full Content

arXiv:0809.1017v1 [cs.IT] 5 Sep 2008

Entropy Concentration and the Empirical Coding Game

Peter Grünwald* (CWI, P.O. Box 94079, 1090 GB Amsterdam, NL)

*Also: research fellow at EURANDOM, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. This is a slightly modified version of the paper with the same title that appeared in Statistica Neerlandica 62(3), 2008, pages 374–392, on the occasion of the 10th anniversary of EURANDOM. Some of the results presented here have already appeared in the conference paper [Grünwald, 2001a] and the technical report [Grünwald, 2001b]. Theorems 5.3 and 5.4 of Section 5 are new and have not been published before. The paper benefited enormously from various discussions with Richard Gill, Phil Dawid and Franz Merkl. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the author's views.

Abstract. We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two 'strong entropy concentration' theorems. These theorems unify and generalize Jaynes' 'concentration phenomenon' and Van Campenhout and Cover's 'conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution $Q$ conditioned on a given constraint and the distribution $\tilde P$ minimizing $D(P\|Q)$ over all $P$ satisfying the constraint are 'close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others.

1 Introduction

Jaynes' Maximum Entropy (MaxEnt) Principle is a well-known principle for inductive inference [Csiszár, 1975, 1991, Topsøe, 1979, van Campenhout and Cover, 1981, Cover and Thomas, 1991, Grünwald and Dawid, 2004]. It has been applied to statistical and machine learning problems ranging from protein modeling to stock market prediction [Kapur and Kesavan, 1992]. One of its characterizations (some would say 'justifications') is the so-called concentration phenomenon [Jaynes, 1978, 1982]. Here is an informal version of this phenomenon, in the words of Jaynes [2003]:

"If the information incorporated into the maximum-entropy analysis includes all the constraints actually operating in the random experiment, then the distribution predicted by maximum entropy is overwhelmingly the most likely to be observed experimentally."

For the case in which a prior distribution over the domain at hand is available, van Campenhout and Cover [1981] have proven the related conditional limit theorem. In Sections 2-4, we provide a strong generalization of both the concentration phenomenon and the conditional limit theorem. In Section 5, the results of Section 4 are used to extend an existing game-theoretic characterization (again, some would say "justification") of Maximum Entropy due to Topsøe [1979]. In this way, we provide sharper results on two of the most frequently cited characterizations of the maximum entropy principle.

2 Informal Overview

Maximum Entropy. Let $X$ be a random variable taking values in some set $\mathcal{X}$, which (only for the time being!) we assume to be finite: $\mathcal{X} = \{1, \ldots, m\}$. Let $P, Q$ be distributions for $X$ with probability mass functions $p$ and $q$. We define $H_Q(P)$, the Q-entropy of $P$, as

$$H_Q(P) = -E_P\left[\log \frac{p(x)}{q(x)}\right] = -D(P\|Q), \qquad (1)$$

where $D(\cdot\|\cdot)$ is the Kullback-Leibler (KL) divergence between $P$ and $Q$ [Cover and Thomas, 1991].
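As a concrete illustration (mine, not the paper's): a minimal numpy sketch of the Q-entropy in (1) for a finite sample space. The function name `q_entropy` and the example distributions are arbitrary choices.

```python
import numpy as np

def q_entropy(p, q):
    """Q-entropy H_Q(P) = -D(P||Q) for finite distributions.

    p, q: arrays of probability masses over X = {1, ..., m};
    assumes p[i] > 0 implies q[i] > 0 (P absolutely continuous w.r.t. Q).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                                      # 0 * log 0 = 0 by convention
    kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))  # D(P||Q) >= 0
    return -kl

# Example: Q uniform on {1,...,4}, P slightly tilted away from Q.
q = np.full(4, 0.25)
p = np.array([0.4, 0.3, 0.2, 0.1])
print(q_entropy(p, q))   # -D(P||Q) <= 0, with equality iff P = Q
```

Note that when $Q$ is uniform on $\{1, \ldots, m\}$, $H_Q(P) = H(P) - \log m$, so maximizing Q-entropy coincides with maximizing the ordinary Shannon entropy $H(P)$.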
In the usual MaxEnt setting, we are given a 'prior' distribution $Q$ and a moment constraint

$$E[T(X)] = \tilde t, \qquad (2)$$

where $T$ is some function $T: \mathcal{X} \to \mathbb{R}^k$ for some $k > 0$. (More general formulations with arbitrary convex constraints exist [Csiszár, 1975], but here we stick to constraints of form (2).) We define, if it exists, $\tilde P$ to be the unique distribution over $\mathcal{X}$ that maximizes the Q-entropy over all distributions (over $\mathcal{X}$) satisfying (2):

$$\tilde P = \arg\max_{P:\, E_P[T(X)] = \tilde t} H_Q(P) = \arg\min_{P:\, E_P[T(X)] = \tilde t} D(P\|Q). \qquad (3)$$

The MaxEnt Principle then tells us that, in the absence of any further knowledge about the 'true' or 'posterior' distribution according to which data are distributed, our best guess for it is $\tilde P$.

In practical problems we are usually not given a constraint of form (2). Rather, we are given an empirical constraint of the form

$$\frac{1}{n}\sum_{i=1}^n T(X_i) = \tilde t, \qquad (4)$$

which we always abbreviate to '$T^{(n)} = \tilde t$'. The MaxEnt Principle is then usually applied as follows: suppose we are given an empirical constraint of form (4). We then have to make predictions about new data coming from the same source. In the absence of knowledge of any 'true' distribution generating this data, we should make our predictions based on the MaxEnt distribution $\tilde P$ for the moment constraint (2) corresponding to empirical constraint (4). $\tilde P$ is extended to several outcomes by taking the product distribution.

The Concentration Phenomenon and the Conditional Limit Theorem. Why should this procedure make any sense? Here is one justification

…(Full text truncated)…
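The excerpt cuts off before the concentration results, but the optimization in (3) can already be made concrete. For finite $\mathcal{X}$ and a moment constraint of form (2), it is a standard fact (from Csiszár [1975] and Cover and Thomas [1991], not derived in the excerpt above) that the minimizer $\tilde P$ is an exponential tilting of $Q$: $\tilde p(x) \propto q(x)\exp(\lambda^\top T(x))$, with $\lambda$ chosen so the constraint holds. Below is a minimal Python sketch; the helper name `maxent_tilt` and the dice example are my own illustration, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_tilt(q, T, t_target):
    """Find P~ = argmin_{P: E_P[T(X)] = t} D(P||Q) over a finite X.

    Standard result: p~(x) is proportional to q(x) * exp(lambda . T(x)),
    with lambda chosen so that E_{P~}[T(X)] = t_target.  We find lambda
    by minimizing the convex dual
        psi(lam) = log sum_x q(x) exp(lam . T(x)) - lam . t_target.
    q: prior pmf (length m); T: (m, k) array of constraint functions.
    """
    q = np.asarray(q, float)
    T = np.atleast_2d(np.asarray(T, float))
    if T.shape[0] != len(q):
        T = T.T                       # allow T given as (k, m)
    t = np.atleast_1d(np.asarray(t_target, float))

    def psi(lam):
        a = T @ lam + np.log(q)       # log of unnormalized tilted masses
        return np.logaddexp.reduce(a) - lam @ t

    lam = minimize(psi, x0=np.zeros(len(t))).x
    logp = T @ lam + np.log(q)
    logp -= np.logaddexp.reduce(logp)  # normalize in log space
    return np.exp(logp)

# Example: fair-die prior, constrain the mean of X to 4.5 (Jaynes' dice).
x = np.arange(1, 7)
q = np.full(6, 1/6)
p_tilde = maxent_tilt(q, x[:, None], 4.5)
print(p_tilde, p_tilde @ x)   # tilted pmf over {1,...,6}; mean ~ 4.5
```

With a uniform prior and the mean constrained to 4.5, this recovers Jaynes' classic dice example. The paper's concentration theorems then quantify in exactly what sense $Q^n$ conditioned on the empirical constraint $T^{(n)} = \tilde t$ is 'close' to the product distribution of this $\tilde P$.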

Reference

This content is AI-processed based on ArXiv data.
