Entropy, Optimization and Counting

In this paper we study the problem of computing max-entropy distributions over a discrete set of objects subject to observed marginals. Interest in such distributions arises due to their applicability in areas such as statistical physics, economics, biology, information theory, machine learning, combinatorics and, more recently, approximation algorithms. A key difficulty in computing max-entropy distributions has been to show that they have polynomially-sized descriptions. We show that such descriptions exist under general conditions. Subsequently, we show how algorithms for (approximately) counting the underlying discrete set can be translated into efficient algorithms to (approximately) compute max-entropy distributions. In the reverse direction, we show how access to algorithms that compute max-entropy distributions can be used to count, which establishes an equivalence between counting and computing max-entropy distributions.

Authors: Mohit Singh, Nisheeth K. Vishnoi

Mohit Singh, Microsoft Research, Redmond, USA. Email: mohits@microsoft.com
Nisheeth K. Vishnoi, Microsoft Research, Bangalore, India. Email: nisheeth.vishnoi@gmail.com

Contents

1 Introduction
  1.1 Our Contribution
    1.1.1 Informal Statement of Our Results
  1.2 Technical Overview
  1.3 Organization of the Rest of the Paper
2 Preliminaries
  2.1 Notation
  2.2 Combinatorial Polytopes, Separation Oracles, Counting Oracles and Interiority
  2.3 The Maximum Entropy Convex Program
  2.4 Formal Statement of Our Results
  2.5 The Ellipsoid Algorithm
3 Examples of Combinatorial Polytopes
4 New Algorithmic Approaches for the Traveling Salesman Problem
5 Bounding Box
6 Optimization via Counting
  6.1 Proof of Theorem 2.6
  6.2 Proof of Theorem 2.8
7 Counting via Optimization
  7.1 A Convex Program for Counting
  7.2 The Interior of P(M)
  7.3 A Separation Oracle for Interiority
  7.4 The Ellipsoid Algorithm for Theorem 2.11
A Omitted Proofs
  A.1 Duality of the Max-Entropy Program
  A.2 Optimal and Near-Optimal Dual Solutions
B Generalized Counting and Minimizing Kullback-Leibler Divergence
C The 2→2-norm of ∇f

1 Introduction

In this paper we study the computability of max-entropy probability distributions over a discrete set. Consider a collection M of discrete objects whose building blocks are the elements [m] = {1, 2, ..., m}; thus, M ⊆ {0,1}^m. Suppose there is some unknown distribution p on M and we are given access to it via observables θ, the simplest of which is the probability that an element is present in a random sample M from p; namely, P_{M←p}[e ∈ M] = θ_e. If θ is all we know, what is our best guess for p?
The principle of max-entropy [17, 18] postulates that the best guess is the distribution which maximizes (Shannon) entropy.^1 Roughly, the argument is that any distribution which has more information must violate some observable, and a distribution with less information must implicitly use additional independent observables, hence contradicting their maximality. Access to such a distribution could then be used to obtain samples which conform with the observed statistics and to obtain the most informed guess for further statistics. Given the fundamental nature of such a distribution, it should not be surprising that it shows up in various areas such as statistical physics, economics, biology, information theory, machine learning, combinatorics and, more recently, in the design of approximation algorithms; see for instance [26].

From a computational point of view, the question is how to find max-entropy distributions. Note that the entropy function is concave; hence, the problem of maximizing it over the set of all probability distributions over M with marginals θ is a convex programming problem. But what is the input? If θ and M are given explicitly, then a solution to this convex program can be obtained, using the ellipsoid method, in time polynomial in |M| and the number of bits needed to represent θ.^2 However, in most interesting applications, while θ is given explicitly, M may be an exponentially-sized set over the universe {0,1}^m and specified implicitly. For example, the input could be a graph G = (V, E) with m edges and θ ∈ R^m_{≥0}, whereas M could be all spanning trees or all perfect matchings in G; in such a scenario |M| could be exponential in m. This renders the convex program for computing the max-entropy distribution prohibitively large. Moreover, simply describing the distribution could require exponential space.
The good news is that one can use convex programming duality to convert the max-entropy program into one that has m variables. Additionally, under mild conditions on θ, strong duality holds and, hence, the max-entropy distribution is a product distribution, i.e., there exist γ_e for e ∈ [m] such that for all M ∈ M, p_M ∝ ∏_{e∈M} γ_e; see [4] or Lemma 2.3. Thus, the max-entropy distribution for θ can be described by m numbers γ = (γ_e)_{e∈[m]}. There are two main computational problems concerning max-entropy distributions. The first is, given θ and implicit access to M, to obtain γ̃ such that the entropy of the product distribution p̃ corresponding to γ̃ is close to that of the max-entropy distribution, and the observables obtained from p̃ are close to θ. The second is, given γ̃, to obtain a random sample from the distribution p̃. The second problem can be handled by invoking the equivalence between approximate counting and sampling due to Jerrum, Valiant and Vazirani [22] and, hence, we focus on the first issue of computing approximations to max-entropy distributions.^3 However, the existence of a γ̃ which requires polynomially-many bits in the input size is, a priori, far from clear. This raises the crucial question of whether good enough succinct descriptions exist for max-entropy distributions. While there is a vast amount of literature concerning the computation of max-entropy distributions (see for example the survey [37]), previous (partial) results on computing max-entropy distributions required exploiting some special structure of the particular problem at hand.
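To make the product form concrete, here is a minimal stdlib-only Python sketch (the instance and the γ values are illustrative, not taken from the paper): it enumerates the matchings of a 3-edge path, forms the product distribution p_M ∝ ∏_{e∈M} γ_e, and reads off its marginals and entropy. The point is that the entire distribution over five objects is described by just m = 3 numbers.

```python
import math
from itertools import combinations

# Matchings (including the empty one) of the path on edges 0, 1, 2,
# where edge pairs (0,1) and (1,2) share a vertex.
adjacent = {(0, 1), (1, 2)}
matchings = [S for r in range(4) for S in combinations(range(3), r)
             if all(pair not in adjacent for pair in combinations(S, 2))]

gamma = [0.5, 2.0, 0.5]  # one number per element of [m] (illustrative values)
weight = {M: math.prod(gamma[e] for e in M) for M in matchings}
Z = sum(weight.values())
p = {M: w / Z for M, w in weight.items()}  # p_M proportional to prod_{e in M} gamma_e

theta = [sum(p[M] for M in matchings if e in M) for e in range(3)]  # its marginals
H = -sum(q * math.log(q) for q in p.values())                       # its entropy
print(len(matchings), [round(t, 4) for t in theta], round(H, 4))
```

The forward problem discussed above runs in the opposite direction: given target marginals θ, find the γ that realizes them.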
In theoretical computer science, interest in rigorously computing max-entropy distributions derives from their applications to randomized rounding and the design of non-trivial approximation algorithms, notably to problems such as the symmetric and the asymmetric traveling salesman problem (TSP/ATSP). For example, using a very technical argument, [1] give an algorithm to compute the max-entropy distribution over spanning trees of a graph. This algorithm was then used by them to improve the approximation ratio for ATSP, and by [30] to improve the approximation ratio for (graphical) TSP, making progress on two long-standing problems. Subsequently, the ability to compute max-entropy distributions over spanning trees has also been used to design efficient privacy-preserving mechanisms for spanning tree auctions by [16]. In another example, [2] show how to compute max-entropy distributions over perfect matchings in a tree and use it to design approximation algorithms for a max-min fair allocation problem. The question of computing max-entropy distributions over perfect matchings in bipartite (and general) graphs, however, has been an important open problem. Recent applications of the ability to compute max-entropy distributions over perfect matchings in bipartite graphs include new approaches for TSP and ATSP; see [36] and Section 4.

^1 Recall that the Shannon entropy of a distribution p = (p_M)_{M∈M} is H(p) := ∑_{M∈M} p_M ln(1/p_M).
^2 The ellipsoid algorithm requires a bounding ball, which in this case is trivial since 0 ≤ p_M ≤ 1 for all M ∈ M.
^3 To be precise, this equivalence between random sampling and approximate counting holds when the combinatorial problem at hand is self-reducible; see also [32].
For counting problems, it is rare to obtain algorithms that can count exactly, notable exceptions being the problem of counting spanning trees in a graph [28] or counting for certain problems restricted to trees using dynamic programming. Most natural counting problems turn out to be #P-hard, including the problem of counting the number of perfect matchings in a bipartite graph [34, 35]. The goal then shifts to finding algorithms that approximately count up to any fixed precision [27, 33]. Here the most successful technique has been the Markov chain Monte Carlo (MCMC) method [20] which, when combined with the equivalence between approximate counting and sampling [22],^4 leads to many approximate counting algorithms. The technique has been applied to many problems including counting perfect matchings in a bipartite graph [21], counting bases in a balanced matroid [11], counting solutions to a knapsack problem [6] and counting the number of colorings in restricted graph families [19]. However, the problem of obtaining approximate counting oracles for several problems remains open as well; perhaps a prominent example is that of (approximately) counting the number of perfect matchings in a general graph.

In combinatorics, max-entropy distributions are often referred to as hard-core or Gibbs distributions and have been intensely studied. While it is nice that the hard-core distribution has a product form, i.e., p_M ∝ ∏_{e∈M} γ_e, the question of interest here is whether one can upper bound the γ_e's. Structurally, such a bound implies that hard-core distributions exhibit a significant amount of approximate stochastic independence. In an important result, [25] proved such a bound for the hard-core distribution over matchings in a graph. This led to resolving several questions involving asymptotic graph and hypergraph problems.
For instance, results of [23, 25, 24] prove that the fractional chromatic index of a graph asymptotically behaves as its chromatic index. However, the argument of [25] is quite difficult and seems to be specific to the setting of matchings, leaving it an interesting problem to understand under what conditions one can obtain upper bounds on the γ_e's.

1.1 Our Contribution

We first show that good enough succinct representations exist for max-entropy distributions. Subsequently, we give an algorithm that computes arbitrarily good approximations to max-entropy distributions given θ and access to a suitable counting oracle for M. Our algorithm is efficient whenever the corresponding counting oracle is efficient. Moreover, the counting oracle can be approximate and/or randomized. This allows us to leverage a variety of algorithms developed for several #P-hard problems to give algorithms to compute max-entropy distributions. Consequently, we obtain several new and old results about concrete algorithms to compute max-entropy distributions. Interesting examples for which we can use pre-existing counting oracles to obtain max-entropy distributions include spanning trees, matchings in general graphs, perfect matchings in bipartite graphs (using the algorithm from [21]) and subtrees of a rooted tree. The consequence for perfect matchings in bipartite graphs makes the algorithmic strategies for TSP/ATSP mentioned earlier computationally feasible; see Section 4. In the reverse direction, we show that if one can solve, even approximately, the convex optimization problem of computing the max-entropy distribution, one can obtain such counting oracles. This establishes an equivalence between counting and computing max-entropy distributions in a general setting. As a corollary, we obtain that the problem of computing max-entropy distributions over perfect matchings in general graphs is equivalent to the, hitherto unrelated, problem of approximately counting perfect matchings in a general graph.

^4 We note that MCMC methods efficiently sample (and count) given a fixed γ, usually for γ = 1 corresponding to the uniform distribution. The goal in our problem is to find a γ that maximizes the entropy. In fact, given a γ, problem-specific MCMC methods can be used to generate a random sample from M according to the product distribution corresponding to γ.

1.1.1 Informal Statement of Our Results

Before we describe our results a bit more technically, we introduce some basic notation. For M ⊆ {0,1}^m, let P(M) denote the convex hull of all M ∈ M, where each M is thought of as a 0/1 vector of length m, denoted 1_M. Thus, given a θ, for the max-entropy program to have any solution, θ ∈ P(M). Since we are concerned with the case when M is given implicitly and may be of exponential size, we can no longer hope to solve the max-entropy convex program directly, since that may require exponentially many variables, one for each M ∈ M. Thus, we work with the dual to the max-entropy convex program. The dual has m variables and, if θ is in the relative interior of P(M), the optimal dual solution can be used to describe the optimal solution to the max-entropy convex program, which is a product distribution. In fact, we assume one can put a ball of radius η around θ such that it still remains in the interior of P(M). Importantly, our algorithm requires access to a generalized counting oracle for M which, given γ, can compute ∑_{M∈M, M∋e} ∏_{e′∈M} γ_{e′} for all e ∈ [m] and also the sum ∑_{M∈M} ∏_{e∈M} γ_e.
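When M is small enough to enumerate, such a generalized counting oracle can be simulated by brute force; the following stdlib-only Python sketch (with an illustrative instance, not one from the paper) shows exactly the two kinds of sums the oracle must return.

```python
import math

def generalized_counting_oracle(members, gamma):
    """Brute-force generalized counting oracle: given M as a list of subsets of [m]
    (tuples of element indices) and weights gamma, return
    Z = sum_{M in members} prod_{e in M} gamma_e and, for each e,
    Z_e = the same sum restricted to members containing e."""
    m = len(gamma)
    w = {M: math.prod(gamma[e] for e in M) for M in members}
    Z = sum(w.values())
    Z_e = [sum(w[M] for M in members if e in M) for e in range(m)]
    return Z, Z_e

# Illustrative M: the three spanning trees of a triangle with edges 0, 1, 2.
trees = [(0, 1), (0, 2), (1, 2)]
Z, Z_e = generalized_counting_oracle(trees, [1.0, 1.0, 1.0])
print(Z, Z_e)  # with gamma = 1 this just counts: Z = |M| = 3, each edge in 2 trees
```

For the combinatorial families in the paper (spanning trees, matchings), this brute force is replaced by the efficient exact or approximate counting algorithms cited above.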
We also consider the case when the oracle is approximate (possibly randomized) and, for a given ε, can output the sums above up to a multiplicative error of 1 ± ε. The following is the first main result of the paper, stated informally here.

Theorem 1.1 (Counting Implies Optimization, see Theorems 2.6 and 2.8) There is an algorithm which, given access to a generalized (approximate) counting oracle for M ⊆ {0,1}^m, a θ which is promised to be in the η-interior of P(M) and an ε > 0, outputs a γ such that its corresponding product probability distribution p satisfies H(p) ≥ (1 − ε/η)·H(p⋆) and, for every e ∈ [m], |P_{M←p}[e ∈ M] − θ_e| ≤ ε. Here, p⋆ is the max-entropy distribution corresponding to θ. The number of calls the algorithm makes to the oracle is bounded by a polynomial in the input size, ln(1/η) and ln(1/ε).

A useful setting of η and ε to keep in mind is 1/m² and 1/m³ respectively. The bit-lengths of the inputs to the counting oracle are polynomial in 1/η and, hence, the running time of our algorithm depends polynomially on 1/η. If the generalized counting oracle is ε-approximate, the same guarantee holds. Note that for many approximate counting oracles, the dependence of their running time on ε is polynomial in 1/ε. Hence, in this case the running time depends polynomially on 1/ε. Finally, note that this result can be easily generalized to obtain algorithms for the problem of finding the distribution that minimizes the Kullback-Leibler divergence from a given product distribution subject to the marginal constraints; see Remark 2.12. At a very high level, the algorithm in this theorem is obtained by applying the framework of the ellipsoid algorithm to the dual of the max-entropy convex program.
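For orientation, the max-entropy convex program that the theorem refers to can be written in the following standard form (this is consistent with the definitions above; the paper's formal statement is in Section 2.3, outside this excerpt):

```latex
\begin{aligned}
\max_{p} \quad & H(p) \;=\; \sum_{M \in \mathcal{M}} p_M \ln \frac{1}{p_M} \\
\text{s.t.} \quad & \sum_{M \in \mathcal{M}:\, e \in M} p_M \;=\; \theta_e
                    \qquad \text{for all } e \in [m], \\
                  & \sum_{M \in \mathcal{M}} p_M \;=\; 1, \qquad p_M \;\ge\; 0
                    \quad \text{for all } M \in \mathcal{M}.
\end{aligned}
```

It has one variable per element of M, which is exactly why the algorithm works with the m-variable dual instead.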
While it is more convenient to work with the dual since it has m variables, two issues arise: the domain of optimization becomes unconstrained, and the separation oracle requires the ability to compute (possibly exponential) sums over subsets of M. While the counting oracles can be adapted to compute these exponential sums, the unboundedness of the domain of optimization is an important problem. One of the technical results in the proof of the theorem above is structural and shows that this dual optimization problem has an optimal solution in a box of size m/η when θ is in the η-interior of P(M); see Theorem 2.7. Since the γ_e's are exponential in the respective dual variables, there is an approximation γ̃ to the optimal solution to the max-entropy program, when θ is in the η-interior of P(M), such that the number of bits needed to represent each γ̃_e is at most m/η. Such a result has been obtained for the special case of spanning trees by [1] and for matchings in a general graph by [25].

Given that counting algorithms for many problems are still elusive, one may ask if they are really necessary to compute max-entropy distributions. The final result of this paper answers this question in the affirmative and establishes a converse to Theorem 1.1.

Theorem 1.2 (Optimizing Implies Counting, see Theorem 2.11) There is an algorithm which, given oracle access to an algorithm to compute an ε-approximation to the max-entropy convex program for an η-interior point of P(M), and a separation oracle for P(M), can compute a number Z such that (1 − ε)|M| ≤ Z ≤ (1 + ε)|M|. The number of calls made to the max-entropy oracle is polynomial in the input size and 1/ε.

This result can be extended to obtain generalized counting oracles; see Remark 2.12. For all polytopes of interest in this paper, separation oracles are known; see Section 3.
Moreover, this result continues to hold even when the separation oracle is approximate, or weak. As a corollary, using a separation oracle for the perfect matching polytope of general graphs [7, 31], we obtain that an algorithm to compute a good-enough approximation to the max-entropy distribution for any θ in the perfect matching polytope of a graph G implies an FPRAS to count the number of perfect matchings in the same graph.

1.2 Technical Overview

The starting point for our results is the following dual to the max-entropy convex program:

    inf_λ ⟨λ, θ⟩ + ln ∑_{M∈M} e^{−⟨λ, 1_M⟩},    (1)

where 1_M is the indicator vector for M. When θ lies in the relative interior of P(M), strong duality holds between the primal and the dual.^5 Hence, it follows from the first-order conditions on the optimal solution pair (p⋆, λ⋆) that p⋆_M ∝ e^{−⟨λ⋆, 1_M⟩} for each M ∈ M. Suppose we know that

1. λ⋆ is bounded, i.e., ‖λ⋆‖ ≤ R for some R, and
2. there is a generalized counting oracle that allows us to compute the gradient of the objective function f(λ) := ⟨λ, θ⟩ + ln ∑_{M∈M} e^{−⟨λ, 1_M⟩} at a specified λ. The gradient at λ, denoted ∇f(λ), turns out to be a vector whose coordinate corresponding to e ∈ [m] is

    θ_e − ( ∑_{M∈M, M∋e} e^{−⟨λ, 1_M⟩} ) / ( ∑_{M∈M} e^{−⟨λ, 1_M⟩} ).

Then, using the machinery of the ellipsoid method, it follows relatively straightforwardly that, for any ε, we can compute a point λ° such that f(λ°) ≤ f(λ⋆) + ε with at most poly(m, ln(1/ε), ln R) calls to the counting oracle. Note that since the numbers fed into the counting oracle are of the form e^{−λ_e}, for each e ∈ [m], the running time of the counting oracle depends polynomially on R rather than ln R. Thus, we need R to be polynomially bounded. Hence, the question is: can we bound ‖λ⋆‖?
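The pipeline just described, a counting oracle supplying ∇f and an iterative method driving it to zero, can be sketched in stdlib Python. This is only a toy illustration: the oracle is brute force, plain gradient descent stands in for the ellipsoid machinery (Section 1.2 later notes that gradient descent also converges), and the instance, step size and iteration count are illustrative choices, not the paper's.

```python
import math
from itertools import combinations

# M = matchings of the 3-edge path (edges 0, 1, 2; pairs (0,1) and (1,2) share a vertex).
adjacent = {(0, 1), (1, 2)}
members = [S for r in range(4) for S in combinations(range(3), r)
           if all(pair not in adjacent for pair in combinations(S, 2))]

theta = [0.25, 0.35, 0.25]  # target marginals; an interior point of P(M) (illustrative)

def grad_f(lam):
    # Brute-force generalized counting oracle: Z and Z_e for the weights e^{-lambda(M)},
    # assembled into the gradient coordinates theta_e - Z_e / Z.
    w = {M: math.exp(-sum(lam[e] for e in M)) for M in members}
    Z = sum(w.values())
    return [theta[e] - sum(w[M] for M in members if e in M) / Z for e in range(3)]

lam = [0.0, 0.0, 0.0]
for _ in range(20000):  # plain gradient descent on the convex dual objective f
    g = grad_f(lam)
    lam = [lam[e] - 0.5 * g[e] for e in range(3)]

# The (near-)max-entropy distribution is the product distribution p_M ∝ e^{-<lam, 1_M>}.
w = {M: math.exp(-sum(lam[e] for e in M)) for M in members}
Z = sum(w.values())
marginals = [sum(w[M] for M in members if e in M) / Z for e in range(3)]
print([round(x, 4) for x in marginals])  # recovers theta to high accuracy
```

At the optimum the gradient vanishes, i.e., the marginals of the product distribution match θ, which is exactly the first-order condition stated above.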
^5 When θ lies on the boundary of P(M), the infimum in the dual is not attained at any finite λ.

Indeed, a significant part of the work done in [1] was to bound this quantity for spanning trees, and in [25] for (not necessarily perfect) matchings in graphs. A priori, it is not clear why there should be any such bound. In fact, we observe that if P(M) lies in some low-dimensional affine space in R^m, the optimal solution is not unique and can be shifted in any direction normal to the space; see Lemma 2.5. Thus, one can only hope for the optimal solution to be bounded once one imposes the restriction that λ⋆ lies in the linear space corresponding to the affine space in which P(M) lives. One thing that works in our favor is that we have an absolute upper bound (independent of θ) on the optimal value of f(·), namely m. Roughly, this is because at optimality this quantity is an entropy over a discrete set of size at most 2^m. This implies that for all M ∈ M,

    ⟨λ⋆, θ⟩ − ⟨λ⋆, 1_M⟩ ≤ m.    (2)

Using this, the η-interiority of θ, and the fact that the diameter of P(M) is at most √m, it can then be shown that

    max_{M∈M} ⟨λ⋆, 1_M⟩ − min_{M∈M} ⟨λ⋆, 1_M⟩ ≤ m√m/η.

Let us show how this immediately implies a bound on R when M corresponds to all the spanning trees of a graph with no bridge. Suppose T and T′ are two trees such that T′ is obtained from T by deleting an edge e and adding an edge f; then

    |⟨λ⋆, 1_T⟩ − ⟨λ⋆, 1_{T′}⟩| = |λ⋆_e − λ⋆_f| ≤ m√m/η.

Thus, unless the graph has a bridge, this implies |λ⋆_e − λ⋆_f| ≤ m²√m/η for all e, f ∈ G. However, attempting a similar combinatorial argument for perfect matchings in a bipartite graph, where we do not have this exchange property, the bound is worse by a factor of 2^m.
Thus, we abandon combinatorial approaches and appeal to the geometric implication of (2) to obtain the desired polynomial bounding box for all P(M). The argument is surprisingly simple and we sketch it here. One way to interpret (2) is that the vector λ′ := −λ⋆/m has inner product at most 1 with v − θ for all v ∈ P(M). For now, neglecting the fact that P(M) might live in a lower-dimensional affine space and that 0 may not be in P(M), this implies that λ′ is in the polar of P(M). However, since θ is in the η-interior of P(M), P(M) contains an ℓ₂ ball of radius at least η inside it. Thus, the polar of P(M) must be contained in the polar of this ball, which is nothing but an ℓ₂ ball of radius 1/η. This gives a bound of 1/η on the ℓ₂ norm of λ′ and, hence, a bound of m/η on the norm of λ⋆, as desired.

Thus, the ellipsoid method can be used to obtain a solution λ° such that f(λ°) ≤ f(λ⋆) + ε. Why should this approximate bound imply that the product distribution obtained using λ° is close in the marginals to θ? The observation here is that f(λ°) − f(λ⋆) is the Kullback-Leibler (KL) divergence between the two distributions. This implies a bound of √ε on the marginals, using a standard upper bound on the total variation distance in terms of the KL-divergence.

In the case when we have access only to an approximate counting oracle for M, things are more complicated. Roughly, the approximate counting oracle translates to having access to an approximate gradient oracle for the function f(·), and one has to ensure that λ⋆ is not cut off during an iteration. Technically, we show that this does not happen and, hence, approximate counting oracles are equally useful for obtaining good approximations to max-entropy distributions.
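The KL step and the "standard upper bound" (Pinsker's inequality) can be written out explicitly; this is a routine derivation consistent with the definitions above, stated here with explicit constants (√(2ε), matching the √ε bound up to a constant factor):

```latex
% For any \lambda, with p^{\lambda}_M \propto e^{-\langle \lambda, 1_M \rangle},
% using E_{p^\star}[1_M] = \theta and strong duality H(p^\star) = f(\lambda^\star):
\mathrm{KL}\big(p^\star \,\big\|\, p^{\lambda^\circ}\big)
  = -H(p^\star) + \langle \lambda^\circ, \theta \rangle
    + \ln \sum_{M \in \mathcal{M}} e^{-\langle \lambda^\circ, 1_M \rangle}
  = f(\lambda^\circ) - f(\lambda^\star) \;\le\; \varepsilon.
% Pinsker's inequality then converts this into a bound on each marginal:
\big| \theta_e - \mathbb{P}_{M \leftarrow p^{\lambda^\circ}}[e \in M] \big|
  \;\le\; 2\, \big\| p^\star - p^{\lambda^\circ} \big\|_{\mathrm{TV}}
  \;\le\; 2 \sqrt{ \tfrac{1}{2}\,
      \mathrm{KL}\big(p^\star \,\big\|\, p^{\lambda^\circ}\big) }
  \;\le\; \sqrt{2\varepsilon}.
```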
Finally, note that the (projected-)gradient descent approach (see [29]) can also be shown to converge in polynomial time and, possibly, can result in practical algorithms for computing max-entropy distributions. In the case when the counting oracle is approximate, one has to deal with a noisy gradient, and the solution turns out to be similar to the one in the ellipsoid method-based algorithm in the presence of an approximate counting oracle. In addition to a bound on ‖λ⋆‖, one needs to bound the 2→2 norm of the gradient of f. While we omit the details of the gradient descent-based algorithm, we show that ‖∇f‖_{2→2} is polynomially bounded; see Remark 2.9 and Theorem C.1 in Appendix C. This bound may be of independent interest.

We now give an overview of the reverse direction: how to count approximately given the ability to solve the max-entropy convex program for any point θ in the η-interior of P(M). We start by noting that if we consider θ⋆ := (1/|M|) ∑_{M∈M} 1_M, then the optimal value of the convex program is ln|M|. Thus, given access to this vertex-centroid of P(M), one can get an estimate of |M|. However, computing θ⋆ can be shown to be as hard as counting |M|, for instance, when M consists of perfect matchings in a bipartite graph; see [10]. We bypass this obstacle and apply the ellipsoid algorithm to the following (convex-programming) problem:

    sup_θ inf_λ f_θ(λ),

where f_θ(λ) is the function in (1) and where we have chosen to highlight the dependence on θ. The ellipsoid algorithm proposes a θ and expects the max-entropy oracle to output an approximate value for inf_λ f_θ(λ). This raises a few issues: first, given our result on optimization via counting, it is unfair to assume that we have such an oracle that works for all θ, irrespective of the interiority of θ in P(M).
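The centroid fact is easy to check by brute force on a small instance (illustrative, stdlib-only): the uniform distribution on M has marginals exactly θ⋆ and entropy ln|M|, and since entropy over |M| outcomes never exceeds ln|M|, it is the max-entropy distribution for θ⋆. Exponentiating the optimal value therefore recovers the count.

```python
import math
from itertools import combinations

# M = matchings of the 3-edge path, as an illustrative instance.
adjacent = {(0, 1), (1, 2)}
members = [S for r in range(4) for S in combinations(range(3), r)
           if all(pair not in adjacent for pair in combinations(S, 2))]
n = len(members)

# Vertex centroid theta* = (1/|M|) sum_M 1_M.
centroid = [sum(1 for M in members if e in M) / n for e in range(3)]

# The uniform distribution has exactly these marginals and entropy ln |M|.
uniform_marginals = [sum(1 / n for M in members if e in M) for e in range(3)]
H_uniform = math.log(n)
print(n, centroid, round(H_uniform, 4))  # exp(H_uniform) recovers |M|
```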
Thus, we allow queries to the oracle only when θ is sufficiently in the interior of P(M). Note that our algorithm for computing the max-entropy distribution in our first theorem works under these guarantees. This requires, in addition, a separation oracle for checking whether a point is in the η-interior of P(M). We construct such an η-separation oracle from a separation oracle for P(M); the latter, given a point, either says it is in P(M) or returns an inequality valid for P(M) but violated by this point. The second issue is that θ⋆, our target point, may not be in the η-interior of P(M). In fact, there may not be any point in the η-interior of P(M) when η is 1/poly(m). However, under reasonable conditions on P(M), which are satisfied for all polytopes we are interested in, we can show that there is a point θ• in the η-interior of P(M). This allows us to recover a good enough estimate of |M|. Thus (in the way we apply the framework of the ellipsoid algorithm), we are able to recover a point close enough to θ• by doing a binary search on the target value of |M|. As in the forward direction, because we assume that the max-entropy algorithm is approximate, we must argue that θ• is not cut off during any iteration of the ellipsoid algorithm.

We conclude this overview with a couple of remarks. First, unlike our results in the forward direction, we cannot replace the ellipsoid method-based algorithm by a gradient descent approach; the reason is that we only have a separation oracle to detect whether a point is in P(M) or not. Second, we can extend our result to show that, using a max-entropy oracle, one can obtain generalized approximate counting oracles; see Remark 2.12.

1.3 Organization of the Rest of the Paper

The rest of the paper is organized as follows.
In Section 2, we formally define the objects of interest in our paper, including the convex program for optimizing the max-entropy distribution and its dual. We also define the counting oracles that are needed for solving the convex program. We then formally state our results and give a few lemmas stating properties of the optimal and near-optimal solutions to the dual of the max-entropy convex program. In Section 3 we provide examples of some combinatorial polytopes to which our results apply. In Section 4, we show how certain algorithmic approaches for approximating the symmetric and the asymmetric traveling salesman problem become feasible as a result of one of the main results of this paper. In Section 5, we prove that there is an optimal solution to the dual of the max-entropy convex program that is contained in a ball of small radius around the origin. In Section 6, we use this bound on the optimal solution to show that counting oracles, both exact and approximate, can be used to optimize the convex program via the ellipsoid algorithm. In Section 7, we show the other direction of the reduction and give an algorithm that can approximately count given an oracle that can approximately solve the max-entropy convex program. Standard proofs are omitted from the main body and appear in Appendix A. In Appendix B we show how generalized counting oracles can be obtained via max-entropy oracles; here, we also introduce the program for minimizing the KL-divergence with respect to a fixed distribution. Finally, in Appendix C, we give a bound on ‖∇f‖_{2→2}.

2 Preliminaries

2.1 Notation

In this section we introduce the general notation used throughout the paper. Vectors are denoted by plain letters such as a, b, c, d, x, y, u and v and are over R^m. We also use the Greek letters λ, θ, ν and γ to denote vectors.
0 is sometimes used to denote the all-zero vector; the usage should be clear from context. For reasons emanating from applications, we choose to index the set [m] by e. Hence, the components of a vector are denoted by x_e, λ_e, θ_e, etc. We also use notation such as x_0, x_1, ..., x_t and λ_0, λ_1, ..., λ_t to denote vectors; it should be clear from the context that these are vectors and not components. The Greek letters η, α, β, ε, ζ are used to denote positive real numbers. For a set M ∈ {0,1}^m, let 1_M denote the 0/1 indicator vector for M. We use 1_M(e) to denote its e-th component; thus, 1_M(e) = 1 if e ∈ M and 0 otherwise. The letters p, q and r are reserved to denote probability distributions over {0,1}^m. Of special interest are product probability distributions where, for M ∈ {0,1}^m, the probability of M is proportional to ∏_{e∈M} γ_e for some vector γ. We denote such a probability distribution by p^γ to emphasize its dependence on γ, and let p^γ_M denote the probability of M. Additionally, ⟨x, y⟩ denotes the inner product of two vectors, ‖·‖ denotes the Euclidean norm and ‖x‖_∞ := max_{e∈[m]} |x_e|. We also use the notation λ(M) to denote ⟨λ, 1_M⟩ for a vector λ and M ∈ {0,1}^m. |S| denotes the cardinality of a set S.

2.2 Combinatorial Polytopes, Separation Oracles, Counting Oracles and Interiority

The polytopes of interest arise as convex hulls of subsets of {0,1}^m for some m. For a set M ⊆ {0,1}^m, the corresponding polytope is denoted by P(M). Thus,

    P(M) := { ∑_{M∈M} p_M · 1_M : p_M ≥ 0, ∑_{M∈M} p_M = 1 }.

Another way to describe P(M) is to give a maximal set of linearly independent equalities satisfied by all its vertices, and to list the inequalities that define P(M).
Thus, P(M) can be described by (A_=, b) and (A_≤, c) such that, for all M ∈ M, A_= 1_M = b and A_≤ 1_M ≤ c. While the former set can contain no more than m equalities, the latter set can be exponential in m, and we do not assume that (A_≤, c) is given to us explicitly.

Separation Oracles. On occasion we require access to a separation oracle for P(M) of the following form: given λ ∈ R^m satisfying A_= λ = b, the separation oracle either asserts that A_≤ λ ≤ c or outputs an inequality (a′, c′) such that ⟨a′, λ⟩ > c′. Such an oracle is often termed a strong separation oracle. (In our results that depend on access to a strong separation oracle, the guarantee can be relaxed to that of a weak separation oracle; we omit the details.)

Counting Oracles. The standard counting problem associated to M is to determine |M|, i.e., the number of vertices of P(M). We are interested in a more general counting problem associated to M where there is a weight λ_e for each e ∈ [m] and the weight of M under this measure is e^{−λ(M)}. A generalized exact counting oracle for M then outputs the following two quantities:

1. Z_λ := ∑_{M∈M} e^{−λ(M)}, and
2. for every e ∈ [m], Z_λ,e := ∑_{M∈M, M∋e} e^{−λ(M)}.

The oracle is assumed to be efficient, in that it runs in time polynomial in m and the number of bits needed to represent e^{−λ_e} for any e ∈ [m]. (To deal with issues of irrationality, it suffices to obtain the first k bits of Z_λ and Z_λ,e in time polynomial in k and m.) While efficient generalized exact counting oracles are known for some settings, for many problems of interest the exact counting problem is #P-hard. However, for these #P-hard problems, efficient oracles which can compute arbitrarily good approximations to the quantities of interest are often known. Thus, we relax the notion to that of a generalized approximate counting oracle, which is possibly randomized. Such an oracle, given ε, α > 0 and weights λ ∈ R^m, returns Z̃_λ and Z̃_λ,e for each e ∈ [m].
The following guarantees hold with probability at least 1 − α:

1. (1 − ε) Z_λ ≤ Z̃_λ ≤ (1 + ε) Z_λ, and
2. for every e ∈ [m], (1 − ε) Z_λ,e ≤ Z̃_λ,e ≤ (1 + ε) Z_λ,e.

The running time is polynomial in m, 1/ε, log(1/α) and the number of bits needed to represent e^{−λ_e} for any e ∈ [m]. For the sake of readability, we ignore the fact that an approximate counting oracle may be randomized; the statements of the theorems that use randomized approximate counting oracles can be modified appropriately to include the dependence on α. Note that if the problem at hand is self-reducible, then having access to an oracle that outputs an approximation to just Z_λ suffices. We omit the details and refer the reader to the discussion on self-reducibility and counting in [32]. Finally, it can be shown that, in our setting, the existence of a generalized (exact or approximate) counting oracle is a stronger requirement than the existence of a separation oracle.

Interior of the Polytope. The dimension of P(M) is m − rank(A_=); the polytope restricted to this affine space is full dimensional. Since we work with polytopes that are not full dimensional, we extend the notion of the interior of the polytope P(M) and use the following definition.

Definition 2.1 For an η > 0, a point θ is said to be in the η-interior of P(M) if

{ θ′ : A_= θ′ = b, ‖θ − θ′‖ ≤ η } ⊆ P(M).

We say that θ is in the interior of P(M) if θ is in the η-interior of P(M) for some η > 0.

We are interested in the case where η ≥ 1/poly(m). Hence, it is natural to ask if, for every P(M), there is a point in its 1/poly(m)-interior. The following lemma, whose proof appears in Section 7, asserts that the answer is yes if the entries of A_≤ and c are reasonable (as is the case in all our applications).
Lemma 2.2 Let M ⊆ {0,1}^m and P(M) = { x ∈ R^m_{≥0} : A_= x = b, A_≤ x ≤ c } be such that all the entries of A_≤ and c lie in (1/poly(m))·Z and their absolute values are at most poly(m). Then there exists a θ̃ ∈ P(M) such that θ̃ is in the 1/poly(m)-interior of P(M).

At this point, the reader may wish to look at Section 3 for some examples of the combinatorial polytopes we consider in this paper.

2.3 The Maximum Entropy Convex Program

In this section we present the convex program for computing max-entropy distributions. Let P(M) be the polytope corresponding to M. While it does not matter in this section whether we have an oracle for M, the notion of interiority is important.

Any point θ ∈ P(M) can, by definition, be written as a convex combination of vertices of P(M), each of which is the indicator vector of some M ∈ M. Each such convex combination is a probability distribution over M. Of central interest in this paper is a way to find the convex combination that maximizes the entropy of the underlying probability distribution. Given θ, we can express the problem of finding the max-entropy distribution over the vertices of P(M) as the program in Figure 1, where 0 ln(1/0) is taken to be 0.

sup ∑_{M∈M} p_M ln(1/p_M)
s.t. ∑_{M∈M, M∋e} p_M = θ_e  ∀ e ∈ [m]
     ∑_{M∈M} p_M = 1
     p_M ≥ 0  ∀ M ∈ M

Figure 1: Max-Entropy Program for (M, θ)

This entropy function is easily seen to be concave and, hence, maximizing it is a convex programming problem. The following folklore lemma, whose proof appears in Appendix A.1, shows that if θ is in the interior of P(M), then the max-entropy distribution corresponding to it is unique and can be succinctly represented.
Recall the notation that, for λ : [m] → R and M ∈ M, λ(M) := ⟨λ, 1_M⟩ = ∑_{e∈M} λ_e.

Lemma 2.3 For a point θ in the interior of P(M), there exists a unique distribution p⋆ which attains the maximum entropy while satisfying ∑_{M∈M} p⋆(M) 1_M = θ. Moreover, there exists a λ⋆ : [m] → R such that p⋆(M) ∝ e^{−λ⋆(M)} for each M ∈ M.

As we observe soon, while p⋆ is unique, λ⋆ may not be. First, we record the following definitions about such product distributions.

Definition 2.4 For any λ ∈ R^m, we define the distribution p_λ on M by

p_λ(M) := e^{−λ(M)} / Z_λ  where  Z_λ := ∑_{N∈M} e^{−λ(N)}.

The marginals of such a distribution are denoted by θ_λ and defined by

θ_λ,e := ∑_{M∈M, M∋e} p_λ(M) = Z_λ,e / Z_λ  where  Z_λ,e := ∑_{M∈M, M∋e} e^{−λ(M)}.

The proof of Lemma 2.3 relies on establishing that strong duality holds for the max-entropy convex program of Figure 1 with marginals θ. The dual of this program appears in Figure 2. Thus, if θ is in the interior of P(M), then there is a λ⋆ such that p⋆ = p_{λ⋆} and f_θ(λ⋆) = H(p⋆). Note that λ⋆ may not be unique. Finally, an important property of the dual objective function is that f_θ does not change if we shift λ by a vector in the span of the rows of A_=. This is captured in the following lemma, whose proof appears in Appendix A.2.

Lemma 2.5 f_θ(λ) = f_θ(λ + (A_=)^⊤ d) for any d.

Thus, we can restrict our search for the optimal solution to the set {λ ∈ R^m : A_= λ = 0}. In this set there is a unique λ⋆ which achieves the optimal value, since the constraints (A_=, b) are assumed to form a maximal linearly independent set. We refer to this λ⋆ as the unique solution to the dual convex program.

inf f_θ(λ) := ∑_{e∈[m]} θ_e λ_e + ln ∑_{M∈M} e^{−λ(M)}
s.t.
λ_e ∈ R  ∀ e ∈ [m]

Figure 2: Dual of the Max-Entropy Program for (M, θ)

2.4 Formal Statement of Our Results

Our first result shows that if one has access to a generalized exact counting oracle, then one can indeed compute a good approximation to the max-entropy distribution for specified marginals.

Theorem 2.6 There exists an algorithm that, given a maximal set of linearly independent equalities (A_=, b) and a generalized exact counting oracle for P(M) ⊆ R^m, a θ in the η-interior of P(M) and an ε > 0, returns a λ° such that f_θ(λ°) ≤ f_θ(λ⋆) + ε, where λ⋆ is the optimal solution to the dual of the max-entropy convex program for (M, θ) from Figure 2. Assuming that the generalized exact counting oracle runs in time polynomial in its input parameters, the running time of the algorithm is polynomial in m, 1/η, log(1/ε) and the number of bits needed to represent θ and (A_=, b).

The proof of this theorem follows from an application of the ellipsoid algorithm to minimize the dual convex program. At first glance, it may seem enough to show that ‖λ⋆‖ ≤ 2^{poly(m)}/η, since the number of iterations of the ellipsoid algorithm depends on log ‖λ⋆‖. Unfortunately, this is not enough, since each call to the oracle with input λ takes time polynomial in the number of bits needed to represent e^{−λ_e} for any e ∈ [m]. We show the following theorem, which provides a polynomial bound on ‖λ⋆‖.

Theorem 2.7 Let θ be in the η-interior of P(M) ⊆ R^m. Then there exists an optimal solution λ⋆ to the dual of the max-entropy convex program such that ‖λ⋆‖ ≤ m/η.

We specifically note that the proof of this theorem needs that λ⋆ satisfies A_= λ⋆ = 0. Combinatorially, it is an interesting open problem to see whether one can obtain such a bound depending only on 1/η.
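Theorem 2.6 is proved via the ellipsoid method, but the counting-to-optimization reduction it formalizes can be sketched on a toy instance with much simpler machinery. The following is a minimal illustration, not the paper's algorithm: plain gradient descent on the dual objective f_θ, using a brute-force exact counting oracle over the spanning trees of a triangle, with marginals θ chosen (by us, for illustration) in the interior of P(M).

```python
import math

# Toy counting-to-optimization reduction: gradient descent on the dual
# f_θ(λ) = ⟨λ, θ⟩ + ln Z_λ, with a brute-force generalized exact counting
# oracle.  M = the three spanning trees of a triangle (indicator vectors).
family = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
m = 3
theta = [0.8, 0.7, 0.5]   # interior marginals (components sum to 2 = |tree|)

def oracle(lam):
    # brute-force oracle: returns Z_λ and all Z_λ,e
    w = [math.exp(-sum(lam[e] for e in range(m) if M[e])) for M in family]
    Z = sum(w)
    Ze = [sum(w[i] for i, M in enumerate(family) if M[e]) for e in range(m)]
    return Z, Ze

lam = [0.0, 0.0, 0.0]
for _ in range(20000):
    Z, Ze = oracle(lam)
    grad = [theta[e] - Ze[e] / Z for e in range(m)]   # ∇f_θ(λ) = θ - θ_λ
    lam = [lam[e] - grad[e] for e in range(m)]

Z, Ze = oracle(lam)
marginals = [Ze[e] / Z for e in range(m)]
# near the dual optimum the marginals of p_λ match θ, as Lemma 2.3 predicts
```

The step size and iteration count are ad hoc choices that happen to converge on this instance; the point is only that each gradient step costs exactly one call to the counting oracle.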
Next we generalize Theorem 2.6 to polytopes for which only an approximate counting oracle exists, for example, for the perfect matching problem in bipartite graphs. While we state this theorem in the context of deterministic counting oracles, it holds in the randomized setting as well.

Theorem 2.8 There exists an algorithm that, given a maximal set of linearly independent equalities (A_=, b), a generalized approximate counting oracle for P(M) ⊆ R^m, a θ in the η-interior of P(M) and an ε > 0, returns a λ° such that f_θ(λ°) ≤ f_θ(λ⋆) + ε. Here λ⋆ is an optimal solution to the dual of the max-entropy convex program for (M, θ). Assuming that the generalized approximate counting oracle runs in time polynomial in its input parameters, the running time of the algorithm is polynomial in m, 1/η, 1/ε and the number of bits needed to represent θ and (A_=, b).

It can be shown that, once we have a solution λ° to the dual convex program with f_θ(λ°) ≤ f_θ(λ⋆) + ε as in Theorems 2.6 and 2.8, the marginals of the distribution corresponding to λ° are close to those of λ⋆ (which are θ), i.e., ‖θ_{λ°} − θ‖_∞ ≤ O(√ε). See Appendix A.2, and in particular Corollary A.5, for a proof.

Remark 2.9 We can also obtain proofs of Theorems 2.6 and 2.8 by applying the framework of projected gradient descent. (See Section 3.2.3 in [29] for details on the gradient descent method.) For the gradient descent method to run in polynomial time, one would need upper bounds on ‖λ⋆‖ and ‖∇f‖_{2→2}. The first bound is provided by Theorem 2.7 and the second is proved in Theorem C.1 in Appendix C. We have chosen the ellipsoid method-based proofs of Theorems 2.6 and 2.8 since the ellipsoid method is required in the proof of Theorem 2.11.
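The multiplicative (1 ± ε) guarantee that Theorem 2.8 assumes can be simulated for illustration by jittering exact brute-force answers. Everything below (the family, λ, ε and the random seed) is an arbitrary choice of ours; a genuine approximate oracle, such as the permanent FPRAS of [21], would replace the `approx` stand-in.

```python
import math, random

# Simulated generalized approximate counting oracle: exact brute-force answers
# on the spanning trees of a triangle, perturbed by a factor in [1-ε, 1+ε].
family = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
m, eps = 3, 0.1
rng = random.Random(1)

def exact(lam):
    w = [math.exp(-sum(lam[e] for e in range(m) if M[e])) for M in family]
    return sum(w), [sum(w[i] for i, M in enumerate(family) if M[e])
                    for e in range(m)]

def approx(lam):
    Z, Ze = exact(lam)
    jitter = lambda x: x * rng.uniform(1 - eps, 1 + eps)
    return jitter(Z), [jitter(z) for z in Ze]

lam = [0.2, -0.3, 0.1]
Z, Ze = exact(lam)
Zt, Zet = approx(lam)
# the defining guarantee of a generalized approximate counting oracle:
assert (1 - eps) * Z <= Zt <= (1 + eps) * Z
assert all((1 - eps) * Ze[e] <= Zet[e] <= (1 + eps) * Ze[e] for e in range(m))
```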
Our final theorem proves the reverse direction: if one can compute good approximations to the max-entropy convex program for P(M) for a given marginal vector, then one can compute good approximations to the number of vertices of P(M). First, we need the notion of a max-entropy oracle for M.

Definition 2.10 An approximate max-entropy oracle for M, given a θ in the η-interior of P(M), a ζ > 0 and an ε > 0, either

1. asserts that inf_λ f_θ(λ) ≥ ζ − ε, or
2. returns a λ ∈ R^m such that f_θ(λ) ≤ ζ + ε.

The oracle is assumed to be efficient, i.e., it runs in time polynomial in m, 1/ε, 1/η and the number of bits needed to represent ζ. This is consistent with the algorithms given by Theorems 2.6 and 2.8.

Theorem 2.11 There exists an algorithm that, given a maximal set of linearly independent equalities (A_=, b) and a separation oracle and an approximate max-entropy oracle for M as above, returns a Z̃ such that (1 − ε)|M| ≤ Z̃ ≤ (1 + ε)|M|. Assuming that the running times of the separation oracle and the approximate max-entropy oracle are polynomial in their respective input parameters, the running time of the algorithm is bounded by a polynomial in m, 1/ε and the number of bits needed to represent (A_=, b).

Analogously, one can formulate and prove a randomized version of Theorem 2.11; we omit the details. As an important corollary of this theorem, if one is able to efficiently find approximate max-entropy distributions for the perfect matching polytope of general graphs, then one can approximately count the number of perfect matchings such graphs contain. Both problems have long been open, and this result, in particular, relates their hardness.

Remark 2.12 One may ask whether Theorem 2.11 can be strengthened to obtain generalized approximate counting oracles from max-entropy oracles.
This question is natural since Theorems 2.6 and 2.8 assume access to generalized counting oracles. The answer is yes and is provided in Theorem B.2 in Appendix B. It turns out that one needs access to a generalized max-entropy oracle: an oracle that can compute the distribution minimizing the KL-divergence with respect to a fixed product distribution subject to a given set of marginals. These latter programs are shown, in Appendix B, to be no more general than max-entropy programs. In fact, analogs of Theorems 2.6 and 2.8 can be proved for min-KL-divergence programs rather than max-entropy programs; see Theorem B.5.

2.5 The Ellipsoid Algorithm

In this section we review the basics of the ellipsoid algorithm. The ellipsoid algorithm is used in the proofs of our equivalence between optimization and counting: both in the proofs of Theorems 2.6 and 2.8 and in the proof of Theorem 2.11. Consider the following optimization problem, where g(·) is convex and the h_i(·) are affine functions:

inf g(λ)
s.t. h_i(λ) = 0  ∀ 1 ≤ i ≤ k
     λ ∈ R^m

We assume that g is differentiable everywhere and that its gradient, denoted ∇g, is defined everywhere. In our application, for a polytope P(M) and a θ in the η-interior of P(M), g = f_θ, the objective function of the dual program of Figure 2. The h_i(·) are the constraints A_= λ = 0, where (A_=, b) is the maximal set of linearly independent equalities satisfied by the vertices of M. Thus, as noted in Lemma 2.5, we can restrict our search for the optimal solution to the set K defined by

K := { λ ∈ R^m : A_= λ = 0 }.

Note that 0 ∈ K. The ellipsoid algorithm can be used to solve such a convex program under fairly general conditions, and we first state a version of it needed in the proof of Theorem 2.6.
A crucial requirement is a strong first-order oracle for g: a function which, given a λ, outputs g(λ) and ∇g(λ). Since we are only interested in λ ∈ K, and we are given the equalities describing K explicitly, we assume that we can project ∇g(λ) onto K; by abuse of notation, we denote the projection also by ∇g(λ). The following theorem states that, given access to a strong first-order oracle for g, one can use the ellipsoid algorithm to obtain an approximately optimal solution to the convex program above. This statement is easily derivable from [3] (Theorem 8.2.1).

Theorem 2.13 Given any β > 0 and R > 0, there is an algorithm which, given a strong first-order oracle for g, returns a point λ′ ∈ R^m such that

g(λ′) ≤ inf_{λ∈K, ‖λ‖_∞≤R} g(λ) + β ( sup_{λ∈K, ‖λ‖_∞≤R} g(λ) − inf_{λ∈K, ‖λ‖_∞≤R} g(λ) ).

The number of calls to the strong first-order oracle for g is bounded by a polynomial in m, log R and log(1/β).

While we do not explicitly describe the ellipsoid algorithm here, we need the following basic properties of minimum-volume enclosing ellipsoids, which form the basis of the ellipsoid algorithm. A set E ⊆ R^m is an ellipsoid if there exist a vector a ∈ R^m and a positive definite m × m matrix A such that

E = E(A, a) := { x ∈ R^m : (x − a)^⊤ A^{−1} (x − a) ≤ 1 }.

We denote by Vol(E) the volume enclosed by the ellipsoid E. The following theorem follows from the Löwner–John ellipsoid; we refer the reader to [15] for more details.

Theorem 2.14 Given an ellipsoid E(A, a) and a half-space {x : ⟨c, x⟩ ≤ ⟨c, a⟩} passing through a, there exists an ellipsoid E′ ⊇ E(A, a) ∩ {x : ⟨c, x⟩ ≤ ⟨c, a⟩} such that Vol(E′)/Vol(E) ≤ e^{−1/(2m)}.
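The ellipsoid E′ of Theorem 2.14 has a standard closed-form description (the minimum-volume ellipsoid containing the half-ellipsoid). The following sketch writes the update out in pure Python in dimension m = 2 and checks the claimed volume drop; the shape matrix, center and cut direction are arbitrary choices for illustration.

```python
import math

# One central-cut ellipsoid update in dimension m = 2.
# E(A, a) = {x : (x-a)^T A^{-1} (x-a) <= 1}; cut half-space {x : <c,x> <= <c,a>}.
m = 2
A = [[4.0, 1.0], [1.0, 3.0]]     # positive definite shape matrix (illustrative)
a = [0.5, -1.0]                  # center
c = [1.0, 2.0]                   # cut direction

Ac = [A[0][0]*c[0] + A[0][1]*c[1], A[1][0]*c[0] + A[1][1]*c[1]]
cAc = c[0]*Ac[0] + c[1]*Ac[1]
b = [Ac[0]/math.sqrt(cAc), Ac[1]/math.sqrt(cAc)]

# standard update: shift the center against b, shrink/stretch the shape matrix
a_new = [a[i] - b[i]/(m + 1) for i in range(m)]
scale = m*m / (m*m - 1.0)
A_new = [[scale*(A[i][j] - 2.0/(m + 1)*b[i]*b[j]) for j in range(m)]
         for i in range(m)]

det = lambda M: M[0][0]*M[1][1] - M[0][1]*M[1][0]
vol_ratio = math.sqrt(det(A_new)/det(A))   # volumes scale as sqrt(det A)
```

The ratio is independent of A, a and c: it always equals (m/(m+1)) · (m²/(m²−1))^{(m−1)/2}, which is at most e^{−1/(2m)}, matching Theorem 2.14.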
In our applications of this theorem, we in fact need the ellipsoid to lie in an affine space of dimension possibly lower than m. The definitions and the theorem continue to hold in such a setting.

3 Examples of Combinatorial Polytopes

The reader may wish to keep the following combinatorial polytopes in mind while interpreting the results of this paper.

The Spanning Tree Polytope. Given a graph G = (V, E), let

M := { 1_T ∈ R^{|E|} : T ⊆ E is a spanning tree of G }.

It follows from a result of Edmonds [9] that

P(M) = { x ∈ R^{|E|}_{≥0} : x(E(V)) = |V| − 1, x(E(S)) ≤ |S| − 1 ∀ S ⊆ V },

where, for S ⊆ V, E(S) := {e = {u,v} ∈ E : both u, v ∈ S} and, for a subset of edges H ⊆ E, x(H) := ∑_{e∈H} x_e. Edmonds [8] also shows the existence of a separation oracle for this polytope. A generalized exact counting oracle is known for the spanning tree polytope via Kirchhoff's matrix-tree theorem; see [14].

The Perfect Matching Polytope for Bipartite Graphs. Given a bipartite graph G = (V, E), let

M := { 1_M ∈ R^{|E|} : M is a perfect matching in G }.

It follows from a theorem of Birkhoff [13] that, when G is bipartite,

P(M) = { x ∈ R^{|E|}_{≥0} : x(δ(v)) = 1 ∀ v ∈ V },

where, for v ∈ V, δ(v) := {e ∈ E : v ∈ e}. Here, it can be shown that the facets, i.e., the defining inequalities, all come from the set of 2m inequalities 0 ≤ x_e ≤ 1 for e ∈ [m]. The exact counting problem is #P-hard, but a (randomized) generalized approximate counting oracle follows from the result of Jerrum, Sinclair and Vigoda [21] on computing permanents.

The Cycle Cover Polytope for Directed Graphs. Given a directed graph G = (V, A), let

M := { 1_M ∈ R^{|A|} : M is a cycle cover in G }.

A cycle cover in G is a collection of vertex-disjoint directed cycles that together cover all the vertices of G.
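Since a cycle cover uses each vertex exactly once as a tail and once as a head, the cycle covers of G = (V, A) are exactly the permutations σ of V with (v, σ(v)) ∈ A for every v; their number is the permanent of the 0/1 adjacency matrix. For a toy digraph this can be checked by brute force over permutations (exponential time, illustration only):

```python
import itertools

# Count cycle covers of a digraph as permutations supported on the arc set.
def count_cycle_covers(n, arcs):
    adj = [[0] * n for _ in range(n)]
    for u, v in arcs:
        adj[u][v] = 1
    return sum(all(adj[v][sigma[v]] for v in range(n))
               for sigma in itertools.permutations(range(n)))

# Complete loopless digraph on 3 vertices: the only cycle covers are the two
# directed triangles (identity and transpositions need missing self-loops).
arcs = [(u, v) for u in range(3) for v in range(3) if u != v]
count = count_cycle_covers(3, arcs)
```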
The corresponding cycle cover polytope is denoted by P(M). This polytope is easily seen to be a special case of the perfect matching polytope for bipartite graphs, as follows. For G = (V, A), construct a bipartite graph H = (V_L, V_R, E) where V_L = V_R = V; for each vertex v ∈ V we have v_L ∈ V_L and v_R ∈ V_R, and there is an edge between u_L ∈ V_L and v_R ∈ V_R in H if and only if (u, v) ∈ A. Thus, there is a one-to-one correspondence between cycle covers in G and perfect matchings in H. Hence, the algorithm of [21] gives a generalized approximate counting oracle in this case as well.

The Perfect Matching Polytope for General Graphs. Given a graph G = (V, E), let

M := { 1_M ∈ R^{|E|} : M is a perfect matching in G }.

A celebrated result of Edmonds [7] states that

P(M) = { x ∈ R^{|E|}_{≥0} : x(δ(v)) = 1 ∀ v ∈ V, x(E(S)) ≤ (|S| − 1)/2 ∀ S ⊆ V with |S| odd }.

The separation oracle for this polytope is non-trivial and follows from the characterization result of Edmonds; a direct separation oracle was also given by Padberg and Rao [31]. Coming up with a counting oracle for this polytope, even with uniform weights, i.e., counting the number of perfect matchings in a general graph, is a long-standing open problem.

4 New Algorithmic Approaches for the Traveling Salesman Problem

Max-entropy distributions over spanning trees have been successfully applied to obtain improved algorithms for the symmetric [30] as well as the asymmetric traveling salesman problem [1]. We outline here a different algorithmic approach, using max-entropy distributions over cycle covers, which becomes computationally feasible as a consequence of our results.

Let us consider the asymmetric traveling salesman problem (ATSP). We are given a complete directed graph G = (V, E) and a cost function c : E → R_{≥0} which satisfies the directed triangle inequality.
The goal is to find a Hamiltonian cycle of smallest cost. First, we formulate the subtour elimination linear program in Figure 3.

min ∑_{e∈E} c_e x_e
s.t. x(δ⁺(v)) = x(δ⁻(v)) = 1  ∀ v ∈ V
     x(δ⁺(S)) ≥ 1  ∀ ∅ ≠ S ⊊ V
     0 ≤ x_e ≤ 1  ∀ e ∈ E

Figure 3: Subtour Elimination LP for G = (V, E) and c

Here, for a vertex v, δ⁺(v) is the set of directed edges going out of v and δ⁻(v) is the set of directed edges coming into v. Let x⋆ denote the optimal solution to this linear program. The authors of [1] make the observation that θ_uv := ((n−1)/n)(x⋆_uv + x⋆_vu), defined on the undirected edges, is a point in the interior of the spanning tree polytope of G. Their algorithm then samples a spanning tree T from the max-entropy distribution with marginals given by θ and relies crucially on properties of such a T to obtain an O(log n / log log n)-approximation algorithm for ATSP.

Interestingly, there is another integral polytope which contains x⋆. Consider the convex hull P of all cycle covers of G (see Section 3). Then

P = { x ∈ R^{|E|}_{≥0} : x(δ⁺(v)) = x(δ⁻(v)) = 1 ∀ v ∈ V }.

It is easy to see that x⋆ ∈ P. Similar to the cycle cover algorithm of Frieze et al. [12], the following is a natural algorithm for the ATSP problem.

Randomized Cycle Cover Algorithm

1. Initialize H ← ∅.
2. While G is not a single vertex:
   • Solve the subtour elimination LP for G to obtain the solution x⋆.
   • Sample a cycle cover C from the max-entropy distribution with marginals x⋆.
   • Include in H all edges in C, i.e., H ← H ∪ (∪_{C∈C} C).
   • Select one representative vertex v_C in each cycle C ∈ C and delete all the other vertices.
3. Return H.

Before analyzing the performance of this algorithm, a basic question is whether it can be implemented in polynomial time.
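Setting that question aside for a moment, the loop structure of the algorithm can be sketched as follows. This is a toy mock-up, not an implementation: the LP solution and the max-entropy sampler are replaced by a brute-force *uniform* cycle-cover sampler on a complete loopless digraph (an assumption made purely so the sketch is self-contained), so only the sample-and-contract loop is exercised.

```python
import itertools, random

def cycle_covers(verts, arcs):
    # enumerate all cycle covers = permutations supported on the arc set
    vs = sorted(verts)
    for sigma in itertools.permutations(vs):
        if all(u != s and (u, s) in arcs for u, s in zip(vs, sigma)):
            yield dict(zip(vs, sigma))

def cycles_of(cover):
    # decompose a permutation (given as a dict) into its cycles
    seen, cycles = set(), []
    for v in cover:
        if v not in seen:
            cyc, u = [], v
            while u not in seen:
                seen.add(u)
                cyc.append(u)
                u = cover[u]
            cycles.append(cyc)
    return cycles

def randomized_cycle_cover(n, rng):
    verts, H = set(range(n)), set()
    while len(verts) > 1:
        arcs = {(u, v) for u in verts for v in verts if u != v}  # complete digraph
        cover = rng.choice(list(cycle_covers(verts, arcs)))      # stand-in sampler
        H |= {(u, cover[u]) for u in verts}
        verts = {min(c) for c in cycles_of(cover)}   # one representative per cycle
    return H

H = randomized_cycle_cover(5, random.Random(0))
```

Every cycle has length at least 2, so the vertex set at least halves in each round and the loop terminates; the first round already gives every original vertex an outgoing arc in H.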
As an application of Theorem 2.8 to the cycle cover polytope for directed graphs, it follows that one can sample a cycle cover from the max-entropy distribution in polynomial time and, thus, the question is answered affirmatively. The generalized (randomized) approximate counting oracle for cycle covers in a graph follows from the work of [21]. The technical condition of interiority of x⋆ can be satisfied with a slight loss in optimality of the objective function. The analysis of the worst-case performance of this algorithm is left open but, to the best of our knowledge, there is no example ruling out that the Randomized Cycle Cover Algorithm is an O(1)-approximation. Similarly, the application of Theorem 2.8 to the perfect matching polytope of bipartite graphs makes the permanent-based approach suggested in [36] for the (symmetric) TSP computationally feasible.

5 Bounding Box

In this section, we prove Theorem 2.7 and show that there is a bounding box of small radius containing the optimal solution λ⋆. We begin with the following lemma.

Lemma 5.1 Let θ be a point in the η-interior of P(M) ⊆ R^m and let λ⋆ be the optimal solution to the dual convex program. Then, for any x ∈ P(M),

⟨λ⋆, θ − x⟩ ≤ m.

Proof: First, note that the supremum of the primal convex program over all θ is ln |M| ≤ m. Hence, from strong duality, it follows that f_θ(λ⋆) ≤ m. This implies that

f_θ(λ⋆) = ⟨λ⋆, θ⟩ + ln ∑_{M∈M} e^{−λ⋆(M)} = ln ∑_{M∈M} e^{⟨λ⋆,θ⟩ − λ⋆(M)} ≤ m.

Hence, for every M ∈ M,

⟨λ⋆, θ⟩ − λ⋆(M) ≤ m.   (3)

Since x ∈ P(M), we have x = ∑_{M∈M} r_M 1_M where ∑_{M∈M} r_M = 1 and r_M ≥ 0 for each M ∈ M. Multiplying (3) by r_M and summing over M, we get

∑_{M∈M} r_M (⟨λ⋆, θ⟩ − λ⋆(M)) ≤ ∑_{M∈M} r_M · m.

This implies that ⟨λ⋆, θ⟩ − ∑_{M∈M} r_M λ⋆(M) ≤ m.
As a consequence, we obtain ⟨λ⋆, θ⟩ − ⟨λ⋆, x⟩ ≤ m, completing the proof of the lemma.

Proof of Theorem 2.7. Recall that A_= x = b denotes the maximal set of independent equalities satisfied by P(M). We now define the following objects. Let

B := { x ∈ R^m : A_= x = b, ‖x − θ‖ ≤ η }

be the ball of radius η centered at θ, restricted to the affine space A_= x = b. Since θ is in the η-interior of P(M), we have B ⊆ P(M). Let

Q := { y ∈ R^m : A_= y = b, ‖y − θ‖ ≤ 1/η }

be the ball of radius 1/η centered at θ in the same affine space, and let

Q̃ := { z ∈ R^m : A_= z = b, ⟨z − θ, x − θ⟩ ≤ 1 ∀ x ∈ B }.

Lemma 5.2 Q = Q̃.

Proof: We first prove that Q ⊆ Q̃. Let y ∈ Q. The constraints A_= y = b are clearly satisfied since y ∈ Q. For any x ∈ B,

⟨y − θ, x − θ⟩ ≤ ‖y − θ‖ ‖x − θ‖ ≤ (1/η) · η = 1.

Thus, y ∈ Q̃. Now we show that Q̃ ⊆ Q. Let z ∈ Q̃. The constraints A_= z = b are clearly satisfied since z ∈ Q̃. Now consider

z′ := θ + η · (z − θ)/‖z − θ‖.

We have that A_= z′ = A_= θ + η · (A_= z − A_= θ)/‖z − θ‖ = b. Moreover,

‖z′ − θ‖ = ‖ η (z − θ)/‖z − θ‖ ‖ = η.

Thus, z′ ∈ B. Hence, we must have ⟨z − θ, z′ − θ⟩ ≤ 1. This implies that η ⟨z − θ, (z − θ)/‖z − θ‖⟩ ≤ 1 and, therefore, ‖z − θ‖ ≤ 1/η. Thus, z ∈ Q, completing the proof.

We now show that λ̃ := −λ⋆/m + θ ∈ Q̃. To see this, first observe that A_= λ̃ = −A_= λ⋆/m + A_= θ = 0 + b; here we have used the fact that A_= λ⋆ = 0 (see Lemma 2.5). We now verify the second condition. Let x ∈ B. Then

⟨λ̃ − θ, x − θ⟩ = −⟨λ⋆, x − θ⟩/m ≤ (1/m) · m = 1,

where the last inequality follows from the fact that x ∈ B ⊆ P(M) and Lemma 5.1. Thus, λ̃ ∈ Q̃ = Q and, therefore, we must have ‖λ̃ − θ‖ ≤ 1/η. Hence ‖λ⋆/m‖ ≤ 1/η, proving Theorem 2.7.
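Lemma 5.1 can be sanity-checked numerically on a toy instance: approximate λ⋆ by gradient descent on the dual (our stand-in for the exact optimum; the instance and marginals are arbitrary choices), shift it so that A_= λ⋆ = 0 as Lemma 2.5 allows, and verify ⟨λ⋆, θ − 1_M⟩ ≤ m at every vertex of P(M).

```python
import math

# Spanning trees of a triangle; every tree has two edges, so A_= is the single
# equality x(E) = 2 and A_= λ = 0 just means the coordinates of λ sum to zero.
family = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
m = 3
theta = [0.8, 0.7, 0.5]   # interior marginals

def marginals(lam):
    w = [math.exp(-sum(lam[e] for e in range(m) if M[e])) for M in family]
    Z = sum(w)
    return [sum(w[i] for i, M in enumerate(family) if M[e]) / Z
            for e in range(m)]

lam = [0.0] * m
for _ in range(20000):
    th = marginals(lam)
    lam = [lam[e] - (theta[e] - th[e]) for e in range(m)]  # descend on f_θ

mean = sum(lam) / m
lam_star = [l - mean for l in lam]   # representative with A_= λ* = 0

# Lemma 5.1 at the vertices x = 1_M (and hence, by convexity, on all of P(M)):
gaps = [sum(lam_star[e] * (theta[e] - M[e]) for e in range(m)) for M in family]
```

On this instance the gaps are far below the bound m = 3, which is consistent with the lemma being an absolute, instance-independent guarantee.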
6 Optimization via Counting

In this section, we prove Theorems 2.6 and 2.8. The proofs of both theorems rely on the bounding box result of Theorem 2.7 and employ the framework of the ellipsoid algorithm from Section 2.5.

6.1 Proof of Theorem 2.6

We first use Theorem 2.13 to give a proof of Theorem 2.6. The algorithm assumes access to a strong first-order oracle for P(M); we then present the details of how to implement a strong first-order oracle using a generalized exact counting oracle.

Suppose λ⋆ is the optimum of our convex program. Theorem 2.7 implies that ‖λ⋆‖_∞ ≤ m/η. Thus, we may pick the bounding radius R := m/η, and doing so does not cut off the optimal λ⋆ we are looking for. The only thing left is to choose a β such that

β ( sup_{λ∈K, ‖λ‖_∞≤R} f_θ(λ) − inf_{λ∈K, ‖λ‖_∞≤R} f_θ(λ) ) ≤ ε.

This would imply that the solution λ° output by the ellipsoid method of Theorem 2.13 satisfies f_θ(λ°) ≤ f_θ(λ⋆) + ε. To establish a bound on β, start by noticing that inf_λ f_θ(λ) ≥ 0; this follows from weak duality and the fact that entropy is always non-negative. On the other hand, we have the following simple lemma.

Lemma 6.1 sup_{‖λ‖_∞≤R} f_θ(λ) ≤ (2m + 1) R.

Proof:

f_θ(λ) ≤ |⟨λ, θ⟩| + | ln ∑_{M∈M} e^{−λ(M)} | ≤ mR + ln(2^m e^{mR}) ≤ (2m + 1) R.

Here we have used the facts that |M| ≤ 2^m and that θ_e ∈ [0, 1] for each e ∈ [m].

Thus, β can be chosen to be ε/((2m + 1)R). Hence, the running time of the ellipsoid method depends polynomially on the time it takes to implement the strong first-order oracle for f_θ and on log(mR/ε). Since

f_θ(λ) = ⟨λ, θ⟩ + ln ∑_{M∈M} e^{−λ(M)} = ⟨λ, θ⟩ + ln Z_λ,

it is easily seen that

(∇f_θ(λ))_e = θ_e − ( ∑_{M∈M, M∋e} e^{−λ(M)} ) / ( ∑_{N∈M} e^{−λ(N)} ) = θ_e − Z_λ,e / Z_λ = θ_e − θ_λ,e.

Hence, ∇f_θ(λ) = θ − θ_λ.
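The gradient formula just derived is exactly what the counting oracle delivers, and it can be verified by finite differences on a toy instance (the family, θ, λ and step size below are arbitrary choices for illustration):

```python
import math

# Finite-difference check of ∇f_θ(λ) = θ - θ_λ for f_θ(λ) = ⟨λ, θ⟩ + ln Z_λ,
# on the spanning trees of a triangle.
family = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
m = 3
theta = [0.8, 0.7, 0.5]

def f(lam):
    Z = sum(math.exp(-sum(lam[e] for e in range(m) if M[e])) for M in family)
    return sum(lam[e] * theta[e] for e in range(m)) + math.log(Z)

def grad(lam):
    w = [math.exp(-sum(lam[e] for e in range(m) if M[e])) for M in family]
    Z = sum(w)
    # θ_e - Z_λ,e / Z_λ: exactly the quantities a counting oracle returns
    return [theta[e] - sum(w[i] for i, M in enumerate(family) if M[e]) / Z
            for e in range(m)]

lam, h = [0.3, -0.1, 0.5], 1e-6
errs = []
for e in range(m):
    lp, lm = lam[:], lam[:]
    lp[e] += h
    lm[e] -= h
    errs.append(abs((f(lp) - f(lm)) / (2 * h) - grad(lam)[e]))
```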
Recall that the strong first-order oracle for f_θ requires, for a given λ, the values f_θ(λ) and ∇f_θ(λ). The generalized exact counting oracle for P(M) provides exactly this, since it gives us Z_λ and Z_λ,e for all e ∈ [m]; this allows us to compute f_θ(λ) and ∇f_θ(λ) in one call to such an oracle. In addition, we need time proportional to the number of bits needed to represent θ. Thus, the number of calls to the counting oracle by the ellipsoid algorithm of Theorem 2.13 is bounded by a polynomial in m, log R and log(1/ε). Since each oracle call can be implemented in time polynomial in m and R (here we ignore the fact that e^{−λ_e} can be irrational; this issue can be dealt with in a standard manner, as is done in the implementation details of all ellipsoid algorithms, see [15]), this gives the required running time and concludes the proof of Theorem 2.6.

6.2 Proof of Theorem 2.8

Now we give the ellipsoid algorithm that works with a generalized approximate counting oracle and prove Theorem 2.8. Here, the fact that the counting oracle is approximate means that the gradient computed as in the previous section is approximate. This raises the possibility of cutting off the optimal λ⋆ during the run of the ellipsoid algorithm. We present an ellipsoid algorithm that checks, given θ and a guess ζ, whether |f_θ(λ⋆) − ζ| ≤ ε. The technical heart of the matter is to show that, when |f_θ(λ⋆) − ζ| ≤ ε, the point λ⋆ is never cut off from the successive ellipsoids obtained by adding the approximate gradient constraints. Moreover, in this case, once the radius of the ellipsoid becomes small enough, we can output its center as a guess for λ⋆. Since the radius of the final ellipsoid is small and the ellipsoid contains λ⋆, the following lemma, which bounds the Lipschitz constant of f_θ, implies that the value of f_θ at the center of the ellipsoid is close enough to f_θ(λ⋆).
Lemma 6.2 For any λ, λ′,

f_θ(λ) − f_θ(λ′) ≤ 2√m ‖λ − λ′‖.

Proof: We have

f_θ(λ) − f_θ(λ′) = ⟨θ, λ − λ′⟩ + ln ( ∑_{M∈M} e^{−λ(M)} / ∑_{M∈M} e^{−λ′(M)} )
  ≤ ‖θ‖ ‖λ − λ′‖ + ln max_{M∈M} e^{−λ(M)} / e^{−λ′(M)}
  ≤ √m ‖λ − λ′‖ + max_{M∈M} (λ′(M) − λ(M))
  ≤ √m ‖λ − λ′‖ + √m ‖λ − λ′‖ = 2√m ‖λ − λ′‖,

which completes the proof. Here we have used the Cauchy–Schwarz inequality in the first and third inequalities and, in the second inequality, the fact that θ ∈ [0, 1]^m.

Proceeding to the ellipsoid algorithm underlying the proof of Theorem 2.8, we do a binary search on the optimal value f_θ(λ⋆) up to an accuracy of ε/8. For a guess ζ ∈ (0, m], we check whether the guess is correct with the following ellipsoid algorithm. (For ease of analysis, we assume that all oracle calls are answered correctly with probability 1; the failure probability can be adjusted to arbitrary precision with a slight degradation in the running time.)

1. Input:
   (a) An error parameter ε > 0.
   (b) An interiority parameter η > 0.
   (c) A θ which is guaranteed to be in the η-interior of P(M).
   (d) A maximal linearly independent set of equalities (A_=, b) for P(M).
   (e) A generalized approximate counting oracle for M.
   (f) A guess ζ ∈ (0, m] (for f_θ(λ⋆)).
2. Initialization:
   (a) Let E_0 := E(B_0, c_0) be a sphere of radius R = m/η centered at the origin (thus containing λ⋆, by Theorem 2.7) and restricted to the affine space A_= x = 0.
   (b) Set t = 0.
3. Repeat until the ellipsoid E_t is contained in a ball of radius at most ε/(16√m) around its center:
   (a) Given the ellipsoid E_t := E(B_t, c_t), set λ_t := c_t.
   (b) Compute ζ_t using the counting oracle such that f_θ(λ_t) − ε/8 ≤ ζ_t ≤ f_θ(λ_t) + ε/8.
   (c) If ζ_t ≤ ζ:
       i. then return λ_t and stop;
       ii. else:
           A. Compute θ_t such that ‖θ_t − θ_{λ_t}‖_1 ≤ ε/(16R) using the counting oracle.
           B.
Compute the ellipsoid $E_{t+1}$ to be the smallest ellipsoid containing the half-ellipsoid $\{\lambda \in E_t : \langle \lambda - \lambda_t, \theta - \theta_t \rangle \le 0\}$ restricted to the affine space $A^{=}x = 0$.
iii. $t = t + 1$.
4. Let $T = t$ and compute $\zeta_T$ using the counting oracle such that $f_\theta(\lambda_T) - \varepsilon/8 \le \zeta_T \le f_\theta(\lambda_T) + \varepsilon/8$.
5. If $\zeta_T \le \zeta$
(a) then return $\lambda_T$ and stop.
(b) else return "$f_\theta(\lambda^\star) > \zeta$" and stop ($\zeta$ is not a good guess for $f_\theta(\lambda^\star)$).

We first show that the algorithm can be implemented using a polynomial number of queries to the approximate oracle. Steps (3b), (3(c)iiA) and (4) can be computed using calls to the generalized approximate counting oracle for $\mathcal{M}$ to obtain $\widetilde{Z}_{\lambda_t}$ such that $(1 - \varepsilon/16) Z_{\lambda_t} \le \widetilde{Z}_{\lambda_t} \le (1 + \varepsilon/16) Z_{\lambda_t}$. We set $\zeta_t = \langle \lambda_t, \theta \rangle + \ln \widetilde{Z}_{\lambda_t}$. A simple calculation then shows that $f_\theta(\lambda_t) - \frac{\varepsilon}{8} \le \zeta_t \le f_\theta(\lambda_t) + \frac{\varepsilon}{8}$, since $f_\theta(\lambda_t) = \langle \lambda_t, \theta \rangle + \ln Z_{\lambda_t}$. Similarly, using one oracle call to the counting oracle with error parameter $\frac{\varepsilon}{16R}$, we can compute $\theta_t$ with $\|\theta_t - \theta_{\lambda_t}\|_1 \le \frac{\varepsilon}{16R}$, as needed in Step (3(c)iiA) of the algorithm. Using Theorem 2.14, the number of iterations can be bounded by a polynomial in $m$ and $\log(R/\varepsilon)$; the analysis is quite standard and omitted. Each of the oracle calls can be implemented in time polynomial in $m$, $R$ and $1/\varepsilon$.

We now show the following lemma, which completes the proof of Theorem 2.8.

Lemma 6.3 Let $\zeta^\circ$ be the smallest guess for which the ellipsoid algorithm succeeds in finding a solution and let $\lambda^\circ$ denote the corresponding solution returned. Then $f_\theta(\lambda^\circ) \le f_\theta(\lambda^\star) + \varepsilon$.

Proof: Observe that we have $f_\theta(\lambda^\circ) - \frac{\varepsilon}{8} \le \zeta^\circ \le f_\theta(\lambda^\circ) + \frac{\varepsilon}{8}$. Since $\zeta^\circ$ is the smallest guess for which the algorithm succeeds in returning an answer, it fails for some $\zeta \in [\zeta^\circ - \varepsilon/8, \zeta^\circ]$. We show that $f_\theta(\lambda^\star) \ge \zeta - \varepsilon/4$. This suffices to prove the lemma since
$$f_\theta(\lambda^\circ) \le \zeta^\circ + \frac{\varepsilon}{8} \le \zeta + \frac{\varepsilon}{4} \le f_\theta(\lambda^\star) + \frac{\varepsilon}{2}.$$
Suppose for the sake of contradiction that
$$f_\theta(\lambda^\star) < \zeta - \frac{\varepsilon}{4}. \quad (4)$$
We then show that $\lambda^\star$ must be in the final ellipsoid $E_T$ when the ellipsoid algorithm is run with guess $\zeta$. Let $\lambda_t$ be the center of the ellipsoid $E_t$ in any iteration with guess $\zeta$, and let $\zeta_t$ be computed in Step (3b) such that $f_\theta(\lambda_t) - \frac{\varepsilon}{8} \le \zeta_t \le f_\theta(\lambda_t) + \frac{\varepsilon}{8}$. Since the algorithm does not return any answer with guess $\zeta$, we must have $\zeta_t > \zeta$. Thus,
$$f_\theta(\lambda_t) \ge \zeta_t - \frac{\varepsilon}{8} \ge \zeta - \frac{\varepsilon}{8} \ge f_\theta(\lambda^\star) + \frac{\varepsilon}{8},$$
where the last inequality follows from inequality (4). Let $\theta_t$ be computed in Step (3(c)iiA) of the algorithm such that $\|\theta_t - \theta_{\lambda_t}\|_1 \le \frac{\varepsilon}{16R}$. But then
$$\langle \lambda^\star - \lambda_t, \theta - \theta_t \rangle = \langle \lambda^\star - \lambda_t, \theta - \theta_{\lambda_t} \rangle + \langle \lambda^\star - \lambda_t, \theta_{\lambda_t} - \theta_t \rangle \le f_\theta(\lambda^\star) - f_\theta(\lambda_t) + \langle \lambda^\star - \lambda_t, \theta_{\lambda_t} - \theta_t \rangle \le -\frac{\varepsilon}{8} + \|\lambda^\star - \lambda_t\| \|\theta_{\lambda_t} - \theta_t\| \le -\frac{\varepsilon}{8} + 2R \cdot \frac{\varepsilon}{16R} \le 0,$$
where the first inequality follows from the convexity of $f_\theta$. Thus, $\lambda^\star$ satisfies the separating constraint added in Step (3(c)iiB) of the algorithm and, therefore, must be contained in the final ellipsoid $E_T$. Let $\lambda_T$ be the center of $E_T$, and let $\zeta_T$ be computed such that $|\zeta_T - f_\theta(\lambda_T)| \le \varepsilon/8$. Then
$$\zeta_T \le f_\theta(\lambda_T) + \frac{\varepsilon}{8} \le f_\theta(\lambda^\star) + \sqrt{m}\,\|\lambda^\star - \lambda_T\| + \frac{\varepsilon}{8} \le f_\theta(\lambda^\star) + \frac{\varepsilon}{8} + \frac{\varepsilon}{8} \le \zeta - \frac{\varepsilon}{4} + \frac{\varepsilon}{4} \le \zeta,$$
where we use Lemma 6.2 and the fact that both $\lambda^\star$ and $\lambda_T$ lie in a ball of radius $\frac{\varepsilon}{16\sqrt{m}}$. Therefore, the algorithm must have returned $\lambda^\circ = \lambda_T$ as a feasible solution for guess $\zeta$, a contradiction. This completes the proof of Lemma 6.3.

7 Counting via Optimization

In this section we present the proof of Theorem 2.11. We start by phrasing the problem of estimating $|\mathcal{M}|$ as a convex optimization problem.

7.1 A Convex Program for Counting

Let $g(\theta)$ denote the optimum of the max-entropy program of Figure 4 for $\mathcal{M}$ and a point $\theta$ in $P(\mathcal{M})$. If $\theta$ is in the interior of $P(\mathcal{M})$, then strong duality holds for this convex program and $g(\theta) = \inf_\lambda f_\theta(\lambda)$; see Lemma 2.3.
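On a family small enough to enumerate, this duality can be checked numerically: plain gradient descent on $f_\theta$ (an illustrative stand-in for the ellipsoid method; the family and marginals below are made up for the example) recovers the max-entropy value $g(\theta)$.

```python
import math

# Toy family over {0, 1}; theta = (0.5, 0.7) lies in the interior of P(M) and,
# for this family, uniquely determines the feasible p = (0.3, 0.5, 0.2).
M_sets = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
m, theta = 2, [0.5, 0.7]

def f(lam):
    # f_theta(lam) = <theta, lam> + ln sum_M e^{-lam(M)}
    s = sum(math.exp(-sum(lam[e] for e in M)) for M in M_sets)
    return sum(t * l for t, l in zip(theta, lam)) + math.log(s)

def grad(lam):
    # gradient coordinate e is theta_e minus the marginal of p_lam at e
    w = [math.exp(-sum(lam[e] for e in M)) for M in M_sets]
    Z = sum(w)
    return [theta[e] - sum(wi for wi, M in zip(w, M_sets) if e in M) / Z
            for e in range(m)]

lam = [0.0] * m
for _ in range(3000):                 # gradient descent on the dual
    lam = [l - 0.5 * g for l, g in zip(lam, grad(lam))]

# g(theta) is the entropy of the unique feasible p, and equals inf_lam f_theta.
g_theta = -sum(p * math.log(p) for p in (0.3, 0.5, 0.2))
```

After convergence, $f_\theta(\lambda)$ matches the entropy of the unique distribution with marginals $\theta$, as strong duality predicts.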
By the concavity of the Shannon entropy, $g(\cdot)$ is easily seen to be a concave function of $\theta$. Recall that in the setting of Theorem 2.11 we have access to an approximate max-entropy oracle which, given a $\theta$ in the $\eta$-interior of $P(\mathcal{M})$, a $\zeta > 0$ as a guess for $g(\theta)$, and an $\varepsilon > 0$, either asserts that $g(\theta) \ge \zeta - \varepsilon$ or returns a $\lambda$ such that $f_\theta(\lambda) \le \zeta + \varepsilon$. The running time of this oracle is polynomial in $m$, $1/\varepsilon$ and the number of bits needed to represent $\zeta$. Using this oracle, we hope to get an estimate on $|\mathcal{M}|$. The starting point of the proof is the observation that the point maximizing $g(\theta)$ over $\theta \in P(\mathcal{M})$ is $\theta^\star := \frac{1}{|\mathcal{M}|} \sum_{M \in \mathcal{M}} 1_M$, the vertex centroid of $P(\mathcal{M})$.

Lemma 7.1 $\sup_{\theta \in P(\mathcal{M})} g(\theta) \le \ln |\mathcal{M}|$ and $g(\theta^\star) = \ln |\mathcal{M}|$.

Proof: For any $\theta$ in the interior of $P(\mathcal{M})$, $g(\theta)$ is the entropy of some probability distribution over the elements of $\mathcal{M}$. A standard fact in information theory is that the maximum entropy of any distribution over a finite set is attained by the uniform distribution. The entropy of the uniform distribution on $\mathcal{M}$ is $\ln |\mathcal{M}|$; hence, $g(\theta)$ can be upper bounded by $\ln |\mathcal{M}|$. On the other hand, the uniform distribution over $\mathcal{M}$ has marginals equal to $\theta^\star$ and, thus, $g(\theta^\star) = \ln |\mathcal{M}|$.

Thus, if we could find $g(\theta^\star)$, then we could estimate $|\mathcal{M}|$. Finding $g(\theta^\star)$ is the same as solving the convex program $\sup_{\theta \in P(\mathcal{M})} g(\theta)$. We use the framework of the ellipsoid method to approximately solve this convex program and find a point which gives us a good enough estimate of $g(\theta^\star)$. Note that, unlike the results in the previous section, the bounding box here is easily obtained since $\theta \in P(\mathcal{M}) \subseteq [0,1]^m$.

7.2 The Interior of $P(\mathcal{M})$

To check how close a candidate point $\theta$ is to $\theta^\star$, we use the max-entropy oracle provided to us.
The main difficulty we encounter is that the running time of the max-entropy oracle with marginals $\theta$ depends inverse-polynomially on the interiority of the point $\theta$. Note that interiority of $\theta$ is a prerequisite for strong duality to hold and for a succinct representation of the entropy-maximizing probability distribution to exist, as in Lemma 2.3. The reason we assume a max-entropy oracle that works only if $\theta$ is in the inverse-polynomial interior of $P(\mathcal{M})$ is that such an oracle is the best we can hope for algorithmically and, indeed, Theorems 2.6 and 2.8 provide such an oracle. Without this restriction the proof of Theorem 2.11 is simpler, but the theorem itself is less useful, as there may not exist a max-entropy oracle whose running time does not depend on the interiority of $\theta$.

The first issue raised by interiority is that the point we are looking for, $\theta^\star$, may not be in the inverse-polynomial interior of $P(\mathcal{M})$. To tackle this, we show that there is a point $\theta^\bullet$ in the $\eta$-interior of $P(\mathcal{M})$ for $\eta = \mathrm{poly}(\varepsilon, 1/m)$ such that $g(\theta^\bullet) \approx g(\theta^\star) = \ln |\mathcal{M}|$. Thus, instead of aiming for $\theta^\star$, the ellipsoid algorithm aims for $\theta^\bullet$.

Lemma 7.2 Given an $\varepsilon > 0$, there exist an $\eta > 0$ and a $\theta^\bullet$ such that $\theta^\bullet$ is in the $\eta$-interior of $P(\mathcal{M})$ and
$$g(\theta^\bullet) \ge \left(1 - \frac{\varepsilon}{16m}\right) \ln |\mathcal{M}| \ge \ln |\mathcal{M}| - \frac{\varepsilon}{16}.$$
Moreover, $\eta$ is at least a polynomial in $1/m$ and $\varepsilon$.

Before we prove this lemma, we show that there exists some point $\tilde{\theta}$ in the $\mathrm{poly}(1/m)$-interior of $P(\mathcal{M})$. Such a point is then used to show the existence of $\theta^\bullet$. Note that we do not need to bound $g(\tilde{\theta})$ and only use the fact that it is non-negative.

Lemma 7.3 (Same as Lemma 2.2) Let $\mathcal{M} \subseteq \{0,1\}^m$ and $P(\mathcal{M}) = \{x \in \mathbb{R}^m_{\ge 0} : A^{=}x = b,\ A^{\le}x \le c\}$ be such that all the entries of $A^{\le}, c$ lie in $\frac{1}{k_l} \cdot \mathbb{Z}$ and their absolute values are at most $k_u$. Then there exists a $\tilde{\theta} \in P(\mathcal{M})$ such that $\tilde{\theta}$ is in the $\frac{1}{k_l k_u m^{1.5}}$-interior of $P(\mathcal{M})$. Thus, if $k_u, k_l = \mathrm{poly}(m)$, then $\tilde{\theta}$ is in the $\mathrm{poly}(1/m)$-interior of $P(\mathcal{M})$.

Proof: Let $r$ be the dimension of $P(\mathcal{M})$. Then there exist $r+1$ affinely independent vertices $z_0, \ldots, z_r \in \{0,1\}^m$. We claim that $\tilde{\theta} := \frac{1}{r+1} \sum_{i=0}^{r} z_i$ satisfies the conclusion of the lemma. Let
$$F_i := \{x \in P(\mathcal{M}) : A^{\le}_i x = c_i\}$$
be a facet of $P(\mathcal{M})$, where the inequality constraint is one of $(A^{\le}, c)$. Since the dimension of a facet is one less than that of the polytope, at least one of $z_0, z_1, \ldots, z_r$, say $z_0$, does not lie in $F_i$ and, hence, $A^{\le}_i z_0 < c_i$. Therefore, $A^{\le}_i z_0 \le c_i - \frac{1}{k_l}$, since all coefficients are $1/k_l$-integral. Thus, the distance of $z_0$ from $F_i$ is at least $\frac{1}{k_l \|A^{\le}_i\|} \ge \frac{1}{k_l k_u \sqrt{m}}$. Hence, the distance of $\tilde{\theta}$ from $F_i$ is at least $\frac{1}{m} \cdot \frac{1}{k_u k_l \sqrt{m}}$. Since this argument works for any facet $F_i$, the distance of $\tilde{\theta}$ from every facet of $P(\mathcal{M})$ is at least $\frac{1}{m} \cdot \frac{1}{k_u k_l \sqrt{m}}$.

Henceforth, we assume that $k_l, k_u = \mathrm{poly}(m)$.

Proof: [Proof of Lemma 7.2.] Let $\tilde{\theta}$ be the point in the interior of $P(\mathcal{M})$ guaranteed by Lemma 7.3. Consider the point
$$\theta^\bullet := \left(1 - \frac{\varepsilon}{16m^3}\right) \theta^\star + \frac{\varepsilon}{16m^3} \tilde{\theta}.$$
Since $\tilde{\theta}$ is in the $\mathrm{poly}(1/m, \varepsilon)$-interior of $P(\mathcal{M})$, $\theta^\bullet$ must also be in the $\mathrm{poly}(1/m, \varepsilon)$-interior of $P(\mathcal{M})$. On the other hand, since $g(\cdot)$ is a concave and non-negative function of $\theta$, we have that
$$g(\theta^\bullet) \ge \left(1 - \frac{\varepsilon}{16m^3}\right) g(\theta^\star) \ge g(\theta^\star) - \frac{\varepsilon}{16},$$
where we used the fact that $\ln |\mathcal{M}| \le m$.

7.3 A Separation Oracle for Interiority

Our final ingredient is a test for checking whether a point $\theta$ is in the inverse-polynomial interior of $P(\mathcal{M})$. We show that the separation oracle for $P(\mathcal{M})$ can be used to give such a test. We state the result in generality for any polyhedron $P$. For any $\eta > 0$, let
$$P_\eta := \{x : y \in P \text{ for all } y \text{ such that } \|x - y\| \le \eta\}$$
denote the set of $\eta$-interior points of $P$.
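When $P$ is given explicitly by inequalities (rather than only by a separation oracle, as the lemma below assumes), membership in $P_\eta$ can be checked directly from facet slacks; the following sketch is purely illustrative and sidesteps the simplex construction of Lemma 7.4.

```python
import math

def in_eta_interior(A, b, x, eta):
    """For an explicit full-dimensional P = {y : <a_i, y> <= b_i}, x lies in the
    eta-interior iff every facet slack b_i - <a_i, x> is at least eta * ||a_i||,
    i.e., the ball of radius eta around x stays inside P."""
    for a, bi in zip(A, b):
        norm = math.sqrt(sum(c * c for c in a))
        if bi - sum(c * xc for c, xc in zip(a, x)) < eta * norm:
            return False
    return True

# The unit square [0, 1]^2 written as {x : <a_i, x> <= b_i}.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]
inside = in_eta_interior(A, b, (0.5, 0.5), 0.4)       # center: slack 0.5 per side
near_edge = in_eta_interior(A, b, (0.05, 0.5), 0.1)   # too close to the side x1 = 0
```

The separation-oracle setting of Lemma 7.4 is harder precisely because these slacks are not available explicitly.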
Lemma 7.4 There exists an algorithm that, given a separation oracle for a polyhedron $P$, a set of maximal linearly independent equalities $(A^{=}, b)$ satisfied by $P$, an $\eta > 0$, and $\theta \in \mathbb{R}^m$, either
1. asserts that $\theta \in P_{\eta/2m}$, i.e., $\theta$ is in the $\eta/2m$-interior of $P$, or
2. returns an $a$ such that $\langle a, y \rangle < \langle a, \theta \rangle$ for each $y \in P_\eta$, or equivalently, a hyperplane which separates the $\eta$-interior of $P$ from $\theta$.

Proof: We use the separation oracle for $P$ on a small collection of points close to $\theta$ to deduce whether $\theta$ is in the interior of $P$. If even one of these points is not in $P$, we use a separating hyperplane for that point to separate $\theta$ from the interior of $P$. First, we describe the procedure when $P$ is full dimensional. Let $x_0, \ldots, x_m$ form an $\eta$-regular simplex with center $\theta$, i.e.,
$$\frac{1}{m+1} \sum_{i=0}^{m} x_i = \theta \quad \text{and} \quad \|x_i - x_j\| = \eta$$
for each $i \ne j$. Such $x_0, \ldots, x_m$ can be found by starting with a regular simplex and then translating and scaling it. Now the algorithm applies the separation oracle to each of the $x_i$. Suppose the separation oracle asserts that $x_i \in P$ for each $0 \le i \le m$. In this case, we assert that $\theta \in P_{\eta/2m}$. Observe that since each vertex of the simplex is in $P$, the whole simplex is in $P$. Since the simplex is regular, with each edge of length $\eta$ and with center $\theta$, there exists a ball of radius $\frac{\eta}{2m}$ centered at $\theta$ which is contained in the simplex and, hence, in $P$. Thus, $\theta$ is in the $\frac{\eta}{2m}$-interior of $P$, as asserted.

Now suppose that $x_i \notin P$ for some $i$, and let $a$ be the corresponding separating hyperplane, i.e., $\langle a, y \rangle < \langle a, x_i \rangle$ for each $y \in P$. Consider the constraint $\langle a, y \rangle < \langle a, \theta \rangle$. We claim that it is satisfied by each $y \in P_\eta$. Let $y \in P_\eta$ and consider $y' := y - \theta + x_i$. Since $\|\theta - x_i\| \le \eta$ and $y \in P_\eta$, we have that $y' \in P$, which implies that $\langle a, y' \rangle < \langle a, x_i \rangle$.
But this implies that
$$\langle a, y \rangle = \langle a, y' - x_i + \theta \rangle = \langle a, y' \rangle - \langle a, x_i \rangle + \langle a, \theta \rangle < \langle a, \theta \rangle,$$
which gives us the required separating hyperplane.

Now consider the case where $P$ is not full dimensional, and let $r$ be the dimension of $P$. Recall that in this case we define the interior of $P$ by restricting our attention to points in the affine space $\{x : A^{=}x = b\}$. We modify the algorithm to choose an $r$-dimensional simplex in this affine space and check whether each of the vertices of the simplex is in $P$. The analysis is identical in this case.

7.4 The Ellipsoid Algorithm for Theorem 2.11

Now we present the ellipsoid algorithm to approximately solve the convex program $\sup_{\theta \in P(\mathcal{M})} g(\theta)$ and prove Theorem 2.11. The starting ellipsoid is a ball of radius $\sqrt{m}$ that contains $[0,1]^m$, which in turn contains $P(\mathcal{M})$. Let us fix an $\varepsilon > 0$ and apply Lemma 7.2 to obtain an $\eta$ which is polynomial in $1/m$ and $\varepsilon$ and guarantees the existence of $\theta^\bullet$ in the $\eta$-interior such that $g(\theta^\bullet) \ge \ln |\mathcal{M}| - \frac{\varepsilon}{16}$. In the range $(0, \ln |\mathcal{M}|]$ we perform a binary search, with accuracy $\varepsilon/16$, for the highest $\zeta$ such that the set
$$S(\zeta, \eta) := \{\theta \in P_\eta(\mathcal{M}) : g(\theta) \ge \zeta\}$$
is non-empty. Given a guess $\zeta$ for $g(\theta^\bullet)$, at iteration $t$ of the ellipsoid algorithm we use the center $\theta_t$ of the ellipsoid as a guess for $\theta^\bullet$. Ideally, we would pass $\theta_t$ to the max-entropy oracle, which would either assert that $g(\theta_t) \ge \zeta - \varepsilon/16$ or return a $\lambda_t$ such that $f_{\theta_t}(\lambda_t) \le \zeta + \varepsilon/16$. In the first case we stop and return $\theta_t$. In the latter case, we continue the search and use this $\lambda_t$ returned by the max-entropy oracle to update the ellipsoid into one with a smaller volume. However, to get the guarantee on the running time, we need to first check that the candidate point $\theta_t$ is in the $\eta$-interior of $P(\mathcal{M})$. Here, we use the separation oracle from Lemma 7.4.
We proceed to the max-entropy oracle only if this separation oracle asserts that the point $\theta_t$ is in the $\eta/2m$-interior of $P(\mathcal{M})$. In case the separation oracle outputs a hyperplane separating $\theta_t$ from $P_\eta(\mathcal{M})$, we use this hyperplane to update the ellipsoid. The key technical fact we show is that when $|\zeta - g(\theta^\bullet)| \le \varepsilon/16$, the point $\theta^\bullet$ is always contained in every ellipsoid. Thus, once the radius of the ellipsoid becomes small enough, we can output its center as a guess for $\theta^\bullet$. Since the final ellipsoid has small radius and contains $\theta^\bullet$, the following lemma implies that the value of $g(\cdot)$ at the center of the ellipsoid is close enough to $g(\theta^\bullet)$ and, hence, by Lemma 7.2, to $g(\theta^\star)$.

Lemma 7.5 Let $\theta, \theta' \in P(\mathcal{M})$ be such that $\|\theta - \theta'\| \le \varepsilon$ and $\theta$ is in the $\eta$-interior of $P(\mathcal{M})$. Then $g(\theta') \ge (1 - \varepsilon/\eta)\, g(\theta)$.

Proof: Let $\lambda^\star$ be an optimal solution to $\inf_\lambda f_\theta(\lambda)$. Thus, $p_{\lambda^\star}$ is the optimal solution to the primal convex program and $H(p_{\lambda^\star}) = \inf_\lambda f_\theta(\lambda)$. We construct a probability distribution $q$ which is feasible for the primal convex program with parameter $\theta'$ and satisfies $H(q) \ge (1 - \varepsilon/\eta) H(p_{\lambda^\star})$, thus proving the lemma. We begin with a claim.

Claim 7.6 Let $\theta'' := \frac{\theta' - (1 - \varepsilon/\eta)\theta}{\varepsilon/\eta}$. Then $\theta'' \in P(\mathcal{M})$.

Proof: First, observe that any equality constraint for $P(\mathcal{M})$ of the form $\langle A^{=}_i, x \rangle = b_i$ is satisfied by both $\theta$ and $\theta'$. Therefore,
$$\langle A^{=}_i, \theta'' \rangle = \frac{\langle A^{=}_i, \theta' \rangle - (1 - \varepsilon/\eta) \langle A^{=}_i, \theta \rangle}{\varepsilon/\eta} = \frac{b_i - (1 - \varepsilon/\eta) b_i}{\varepsilon/\eta} = b_i.$$
Thus, it is enough to show that $\|\theta'' - \theta\| \le \eta$. To see this, note that
$$\|\theta'' - \theta\| = \left\| \frac{\theta' - (1 - \varepsilon/\eta)\theta}{\varepsilon/\eta} - \theta \right\| = \left\| \frac{\theta' - \theta}{\varepsilon/\eta} \right\| \le \frac{\varepsilon}{\varepsilon/\eta} = \eta,$$
where the last inequality follows from the fact that $\|\theta - \theta'\| \le \varepsilon$.

Let $q''$ be an arbitrary probability measure over $\mathcal{M}$ whose marginals equal $\theta''$, i.e., $\theta''_e = \sum_{M \in \mathcal{M} : e \in M} q''_M$.
Let $q$ be the probability measure defined by $q := (1 - \varepsilon/\eta)\, p_{\lambda^\star} + (\varepsilon/\eta)\, q''$. Then
$$\sum_{M \in \mathcal{M} : e \in M} q_M = (1 - \varepsilon/\eta)\, \theta_e + (\varepsilon/\eta)\, \theta''_e = \theta'_e.$$
By concavity and non-negativity of the entropy function, we have
$$H(q) \ge (1 - \varepsilon/\eta) H(p_{\lambda^\star}) + (\varepsilon/\eta) H(q'') \ge (1 - \varepsilon/\eta) H(p_{\lambda^\star}),$$
as required.

We now move on to the description of the ellipsoid algorithm and subsequently complete the proof of Theorem 2.11.

1. Input
(a) An error parameter $\varepsilon > 0$.
(b) A maximal linearly independent set of equalities $(A^{=}, b)$ for $P(\mathcal{M})$.
(c) A separation oracle for the facets $(A^{\le}, c)$ of $P(\mathcal{M})$.
(d) A max-entropy oracle for $\mathcal{M}$.
(e) A guess $\zeta \in (0, m]$ (for $g(\theta^\bullet)$).
2. Initialization
(a) Let $\eta$ be as in Lemma 7.2.
(b) Let $E_0 := E(B_0, c_0)$ be a sphere with radius $R = \sqrt{m}$ containing $[0,1]^m$, restricted to the affine space $A^{=}x = b$.
(c) Set $t = 0$.
3. Repeat until the ellipsoid $E_t$ is contained in a ball of radius at most $\frac{\varepsilon\eta}{16m}$.
(a) Given the ellipsoid $E_t := E(B_t, c_t)$, set $\theta_t := c_t$.
(b) Check, using the separation oracle for $P(\mathcal{M})$ as in Lemma 7.4, whether $\theta_t \in P_{\frac{\eta}{2m}}(\mathcal{M})$:
i. if so, go to Step (3c);
ii. else let $e_t$ be the separating hyperplane returned as in Lemma 7.4, i.e., $\langle e_t, \theta - \theta_t \rangle \ge 0$ for all $\theta \in P_\eta(\mathcal{M})$, and go to Step (3e).
(c) Call the max-entropy oracle with input $\theta_t$, $\zeta$ and $\varepsilon/16$.
(d) If $g(\theta_t) \ge \zeta - \varepsilon/16$
i. then return $\theta_t$ and stop.
ii. else the max-entropy oracle returns $\lambda_t$ such that $f_{\theta_t}(\lambda_t) \le \zeta + \varepsilon/16$; let $e_t = \lambda_t$.
(e) Compute the ellipsoid $E_{t+1}$ to be the smallest ellipsoid containing the half-ellipsoid $\{\theta \in E_t : \langle e_t, \theta - \theta_t \rangle \ge 0\}$ restricted to the affine space $A^{=}x = b$.
(f) $t = t + 1$.
4. Let $T = t$ and call the max-entropy oracle with input $\theta_T$, $\zeta$ and $\varepsilon/16$.
5. If $g(\theta_T) \ge \zeta - \varepsilon/16$
(a) then return $\theta_T$ and stop.
(b) else return "$g(\theta^\bullet) < \zeta$" and stop ($\zeta$ is not a good guess for $g(\theta^\bullet)$).
It is clear that any call to the approximate optimization oracle is made for points $\theta$ which are in the $\frac{\eta}{2m}$-interior. Thus, the running time of each oracle call is polynomially bounded in $m$ and $1/\varepsilon$. To bound the number of iterations, note that the starting ellipsoid has radius $\sqrt{m}$ and the final ellipsoid has radius $\mathrm{poly}(1/m, \varepsilon)$. Hence, by Theorem 2.14, the number of iterations can be bounded by $\mathrm{poly}(m, 1/\varepsilon)$.

It remains to prove the correctness of the algorithm. Towards this, let $\zeta^\circ$ be the largest guess of $\zeta$ for which the algorithm returns a positive answer, and let $\theta^\circ$ be the point returned by the algorithm for guess $\zeta^\circ$. We return $Z^\circ := e^{\zeta^\circ}$ as our estimate of $|\mathcal{M}|$. To complete the proof of Theorem 2.11, we show that $Z^\circ$ satisfies
$$(1 - \varepsilon)\, |\mathcal{M}| \le Z^\circ \le (1 + \varepsilon)\, |\mathcal{M}|. \quad (5)$$
First, we prove the following lemma.

Lemma 7.7 Consider the run of the ellipsoid algorithm for a guess $\zeta$, and let the hyperplane $\{\theta : \langle e_t, \theta - \theta_t \rangle \ge 0\}$ be used as a separating hyperplane in some iteration of the algorithm. Then this separating hyperplane does not cut off any point $\theta$ such that $\theta \in P_\eta(\mathcal{M})$ and $g(\theta) \ge \zeta + \varepsilon/16$.

Proof: If the hyperplane $e_t$ is obtained in Step (3(b)ii), then it is clearly a valid inequality for $P_\eta(\mathcal{M})$ and therefore does not cut off any of its points. Otherwise, suppose $e_t = \lambda_t$ is obtained in Step (3(d)ii). Then $f_{\theta_t}(\lambda_t) = \langle \lambda_t, \theta_t \rangle + \ln Z_{\lambda_t} \le \zeta + \frac{\varepsilon}{16}$. Hence,
$$\langle \lambda_t, \theta - \theta_t \rangle = f_\theta(\lambda_t) - f_{\theta_t}(\lambda_t) \ge f_\theta(\lambda_t) - \zeta - \frac{\varepsilon}{16}. \quad (6)$$
By the assumption in the lemma, $f_\theta(\lambda_t) \ge g(\theta) \ge \zeta + \varepsilon/16$ and, therefore, by (6), $\theta$ satisfies the constraint $\langle e_t, \theta - \theta_t \rangle \ge 0$.

We now show that $\zeta^\circ \ge \ln |\mathcal{M}| - \frac{4\varepsilon}{16}$. Consider the run of the algorithm for $\zeta' \in \left[\ln |\mathcal{M}| - \frac{4\varepsilon}{16},\ \ln |\mathcal{M}| - \frac{3\varepsilon}{16}\right]$. Since $g(\theta^\bullet) \ge \ln |\mathcal{M}| - \frac{\varepsilon}{16} \ge \zeta' + \frac{\varepsilon}{16}$, the point $\theta^\bullet$ cannot be cut off in any iteration, by Lemma 7.7.
If the algorithm returns an answer when run with guess $\zeta'$, then $\zeta^\circ \ge \zeta' \ge \ln |\mathcal{M}| - \frac{4\varepsilon}{16}$, as claimed. Otherwise, we end with an ellipsoid $E_T$ of radius at most $\frac{\varepsilon\eta}{16m}$. Let $\theta_T$ be the center of $E_T$. Since $\theta^\bullet \in E_T$, we have that $\|\theta^\bullet - \theta_T\| \le \frac{\varepsilon\eta}{16m}$. Since $\theta^\bullet$ is in the $\eta$-interior, it follows from Lemma 7.5 that
$$g(\theta_T) \ge \left(1 - \frac{\varepsilon\eta}{16m\eta}\right) g(\theta^\bullet) = \left(1 - \frac{\varepsilon}{16m}\right) g(\theta^\bullet) > \ln |\mathcal{M}| - \frac{2\varepsilon}{16}.$$
This contradicts the fact that the algorithm did not output $\theta_T$ in the last iteration and instead asserted $g(\theta_T) \le \zeta' + \frac{\varepsilon}{16} \le \ln |\mathcal{M}| - \frac{2\varepsilon}{16}$. Since $\zeta^\circ \ge \ln |\mathcal{M}| - \frac{4\varepsilon}{16}$ and $\zeta^\circ \le \ln |\mathcal{M}| + \frac{\varepsilon}{16}$, we obtain that $(1 - \varepsilon)\, |\mathcal{M}| \le Z^\circ \le (1 + \varepsilon)\, |\mathcal{M}|$, proving (5) and completing the proof of Theorem 2.11.

References

[1] A. Asadpour, M. X. Goemans, A. Madry, S. Oveis Gharan, and A. Saberi. An O(log n / log log n)-approximation algorithm for the asymmetric traveling salesman problem. In SODA, pages 379–389, 2010.
[2] A. Asadpour and A. Saberi. An approximation algorithm for max-min fair allocation of indivisible goods. SIAM J. Comput., 39(7):2970–2989, 2010.
[3] A. Ben-Tal and A. Nemirovski. Optimization III: Convex analysis, nonlinear programming theory, nonlinear programming algorithms. Lecture Notes, 2012.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Mar. 2004.
[5] T. M. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[6] M. E. Dyer, A. M. Frieze, R. Kannan, A. Kapoor, L. Perkovic, and U. V. Vazirani. A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem. Combinatorics, Probability & Computing, 2:271–284, 1993.
[7] J. Edmonds. Maximum matching and a polyhedron with 0,1 vertices. Journal of Research of the National Bureau of Standards, 69:125–130, 1965.
[8] J. Edmonds.
Submodular functions, matroids, and certain polyhedra. Combinatorial Structures and Their Applications, pages 69–87, 1970.
[9] J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:127–136, 1971.
[10] K. Elbassioni and H. R. Tiwary. Complexity of approximating the vertex centroid of a polyhedron. Theoretical Computer Science, 421:56–61, 2012.
[11] T. Feder and M. Mihail. Balanced matroids. In STOC, pages 26–38. ACM, 1992.
[12] A. M. Frieze, G. Galbiati, and F. Maffioli. On the worst-case performance of some algorithms for the asymmetric traveling salesman problem. Networks, 12(1):23–39, 1982.
[13] G. Birkhoff. Three observations on linear algebra (Spanish). Univ. Nac. Tucumán. Revista A., 5:147–151, 1946.
[14] C. D. Godsil and G. Royle. Algebraic Graph Theory. Springer, 2001.
[15] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, second corrected edition, 1993.
[16] Z. Huang and S. Kannan. The exponential mechanism for social welfare: Private, truthful, and nearly optimal. In FOCS, pages 140–149, 2012.
[17] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106:620–630, May 1957.
[18] E. T. Jaynes. Information theory and statistical mechanics. II. Physical Review, 108:171–190, Oct. 1957.
[19] M. Jerrum. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Struct. Algorithms, 7(2):157–166, 1995.
[20] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM J. Comput., 18(6):1149–1178, 1989.
[21] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J.
ACM, 51(4):671–697, 2004.
[22] M. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169–188, 1986.
[23] J. Kahn. Asymptotics of the chromatic index for multigraphs. J. Comb. Theory, Ser. B, 68(2):233–254, 1996.
[24] J. Kahn. Asymptotics of the list-chromatic index for multigraphs. Random Struct. Algorithms, 17(2):117–156, 2000.
[25] J. Kahn and P. M. Kayll. On the stochastic independence properties of hard-core distributions. Combinatorica, 17(3):369–391, 1997.
[26] J. N. Kapur. Maximum-Entropy Models in Science and Engineering. Wiley, New York, 1989.
[27] R. M. Karp and M. Luby. Monte-Carlo algorithms for enumeration and reliability problems. In FOCS, pages 56–64. IEEE Computer Society, 1983.
[28] G. Kirchhoff. Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Ann. Phys. und Chem., 72:497–508, 1847.
[29] Y. Nesterov. Introductory lectures on convex programming. Volume I: Basic course, 1998.
[30] S. Oveis Gharan, A. Saberi, and M. Singh. A randomized rounding approach to the traveling salesman problem. In FOCS, pages 550–559. IEEE, 2011.
[31] M. W. Padberg and M. R. Rao. Odd minimum cut-sets and b-matchings. Mathematics of Operations Research, 7(1):67–80, 1982.
[32] A. Sinclair and M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Inf. Comput., 82(1):93–133, July 1989.
[33] L. J. Stockmeyer. On approximation algorithms for #P. SIAM J. Comput., 14(4):849–861, 1985.
[34] L. G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189–201, 1979.
[35] L. G. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8(3):410–421, 1979.
[36] N. K. Vishnoi.
A permanent approach to the traveling salesman problem. In FOCS, pages 76–80, 2012.
[37] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

A Omitted Proofs

A.1 Duality of the Max-Entropy Program

Lemma A.1 For a point $\theta$ in the interior of $P(\mathcal{M})$, there exists a unique distribution $p^\star$ which attains the maximum entropy while satisfying $\sum_{M \in \mathcal{M}} p^\star_M 1_M = \theta$. Moreover, there exists $\lambda^\star : [m] \to \mathbb{R}$ such that $p^\star_M \propto e^{-\lambda^\star(M)}$ for each $M \in \mathcal{M}$.

Proof: Consider the convex program for computing the maximum-entropy distribution with marginals $\theta$, as in Figure 4. We first prove that the dual of this convex program is the one given in Figure 5.

Figure 4: Max-Entropy Program for $(\mathcal{M}, \theta)$:
$$\sup \sum_{M \in \mathcal{M}} p_M \ln \frac{1}{p_M} \quad \text{s.t.} \quad \forall e \in [m]: \sum_{M \in \mathcal{M},\, M \ni e} p_M = \theta_e, \quad (7)$$
$$\sum_{M \in \mathcal{M}} p_M = 1, \quad (8)$$
$$\forall M \in \mathcal{M}: p_M \ge 0. \quad (9)$$

Figure 5: Dual of the Max-Entropy Program for $(\mathcal{M}, \theta)$:
$$\inf f_\theta(\lambda) := \sum_{e \in [m]} \theta_e \lambda_e + \ln \sum_{M \in \mathcal{M}} e^{-\lambda(M)} \quad \text{s.t.} \quad \forall e \in [m]: \lambda_e \in \mathbb{R}. \quad (10)$$

To see this, consider multipliers $\lambda_e$ for the constraints (7) in Figure 4 and a multiplier $z$ for the constraint (8). The Lagrangian $L(p, \lambda, z)$ is defined to be
$$\sum_{M \in \mathcal{M}} p_M \ln \frac{1}{p_M} + \sum_{e \in [m]} \lambda_e \Big( \theta_e - \sum_{M \in \mathcal{M},\, M \ni e} p_M \Big) + z \Big( 1 - \sum_{M \in \mathcal{M}} p_M \Big),$$
which is the same as
$$\sum_{M \in \mathcal{M}} p_M \ln \frac{1}{p_M} - \sum_{M \in \mathcal{M}} p_M \lambda(M) - z \sum_{M \in \mathcal{M}} p_M + \sum_{e \in [m]} \lambda_e \theta_e + z. \quad (11)$$
Let $g(\lambda, z) := \inf_{p \ge 0} L(p, \lambda, z)$. The $p$ which achieves $g(\lambda, z)$ can be obtained by taking partial derivatives with respect to $p_M$ and setting them to 0 as follows:
$$\forall M \in \mathcal{M}: \quad \frac{\partial L}{\partial p_M} = \ln \frac{1}{p_M} - 1 - \lambda(M) - z = 0. \quad (12)$$
Thus, $p_M = e^{-1 - z - \lambda(M)}$ for all $M \in \mathcal{M}$. Summing this over all $M \in \mathcal{M}$, we obtain that
$$\sum_{M \in \mathcal{M}} p_M = e^{-1 - z} \sum_{M \in \mathcal{M}} e^{-\lambda(M)}.$$
(13)

For such a $(p, \lambda, z)$, if we multiply each equation in (12) by $p_M$ and add them up, we obtain
$$\sum_{M \in \mathcal{M}} \Big( p_M \ln \frac{1}{p_M} - p_M - p_M \lambda(M) - z p_M \Big) = 0,$$
which implies that
$$\sum_{M \in \mathcal{M}} \Big( p_M \ln \frac{1}{p_M} - p_M \lambda(M) - z p_M \Big) = \sum_{M \in \mathcal{M}} p_M.$$
Hence, combining this with (11) and using (13), the dual becomes to find the infimum of $g(\lambda, z)$, which is
$$\sum_{e \in [m]} \lambda_e \theta_e + z + e^{-1 - z} \sum_{M \in \mathcal{M}} e^{-\lambda(M)}.$$
Optimizing $g(\lambda, z)$ over $z$, one obtains that $g(\lambda, z)$ is minimized when $1 - e^{-1 - z} \sum_{M \in \mathcal{M}} e^{-\lambda(M)} = 0$, i.e., $z = \ln \sum_{M \in \mathcal{M}} e^{-\lambda(M)} - 1$. Thus, the Lagrangian dual becomes to minimize
$$\sum_{e \in [m]} \theta_e \lambda_e + \ln \sum_{M \in \mathcal{M}} e^{-\lambda(M)}.$$
This completes the proof that the dual of the convex program in Figure 4 is the one in Figure 5. Since $\theta$ is in the interior of $P(\mathcal{M})$, the primal-dual pair satisfies Slater's condition and strong duality holds (see [4]), implying that the optima of the two programs coincide. Moreover, by the strict concavity of the entropy function, the optimum is unique. Hence, at optimality,
$$p^\star_M = \frac{e^{-\lambda^\star(M)}}{\sum_{N \in \mathcal{M}} e^{-\lambda^\star(N)}},$$
where $\lambda^\star$ is the optimal dual solution and $p^\star$ is the optimal primal solution.

A.2 Optimal and Near-Optimal Dual Solutions

In this section we first prove that if $\lambda$ is a solution to the program in Figure 5 for $(\mathcal{M}, \theta)$ of value $\zeta$, then so is $\lambda + (A^{=})^\top d$ for any $d$. Recall that $(A^{=}, b)$ are the equality constraints satisfied by all vertices of $\mathcal{M}$. Hence, in our search for the optimal solution to the dual convex program, we may restrict ourselves to the space of $\lambda$ such that $A^{=}\lambda = 0$.

Lemma A.2 (Same as Lemma 2.5) $f_\theta(\lambda) = f_\theta(\lambda + (A^{=})^\top d)$ for any $d$.

Proof: First, note that $\langle \lambda + (A^{=})^\top d, \theta \rangle = \langle \lambda, \theta \rangle + \langle (A^{=})^\top d, \theta \rangle$. Since $\theta$ can be written as $\sum_{M \in \mathcal{M}} p_M 1_M$ and $A^{=} 1_M = b$ for all $M \in \mathcal{M}$, we have
$$\langle \lambda + (A^{=})^\top d, \theta \rangle = \langle \lambda, \theta \rangle + \sum_{M \in \mathcal{M}} p_M \langle d, b \rangle = \langle \lambda, \theta \rangle + \langle d, b \rangle,$$
since $\sum_{M \in \mathcal{M}} p_M = 1$.
On the other hand, note that
$$\ln \sum_{M \in \mathcal{M}} e^{-\langle \lambda + (A^{=})^\top d,\, 1_M \rangle} = \ln \Big( e^{-\langle d, b \rangle} \sum_{M \in \mathcal{M}} e^{-\langle \lambda, 1_M \rangle} \Big) = -\langle d, b \rangle + \ln \sum_{M \in \mathcal{M}} e^{-\lambda(M)}.$$
Combining, we obtain that $f_\theta(\lambda + (A^{=})^\top d)$ equals
$$\langle \lambda, \theta \rangle + \langle d, b \rangle - \langle d, b \rangle + \ln \sum_{M \in \mathcal{M}} e^{-\lambda(M)} = f_\theta(\lambda).$$
This completes the proof of the lemma.

Thus, we can assume that $A^{=}\lambda^\star = 0$, where $\lambda^\star$ is the optimal solution to the program of Figure 5 for $(\mathcal{M}, \theta)$. Next we prove that if $\lambda$ is such that $f_\theta(\lambda)$ is close to $f_\theta(\lambda^\star)$, then $p^\lambda$ and $p^{\lambda^\star}$ are close to each other; we relate the Kullback-Leibler distance between $p^\lambda$ and $p^{\lambda^\star}$ to $f_\theta(\lambda) - f_\theta(\lambda^\star)$. In particular, $\theta^\lambda$ and $\theta$ are close to each other. Before we state this lemma, we recall some basic measures of proximity between probability distributions.

Definition A.3 Let $p, q$ be two probability distributions over the same space $\Omega$. The following are natural measures of distance.
1. $\|p - q\|_{TV} := \max_{S \subseteq \Omega} |p(S) - q(S)|$.
2. $\|p - q\|_1 := \sum_{\omega} |p(\omega) - q(\omega)|$.
3. If $p, q > 0$, then the Kullback-Leibler distance between them is defined to be $D_{KL}(p \,\|\, q) := \sum_{\omega} p(\omega) \ln \frac{p(\omega)}{q(\omega)}$. This distance function is always non-negative but not necessarily symmetric.

The following lemma shows a close relation between the dual solutions and the Kullback-Leibler distance between the corresponding primal distributions.

Lemma A.4 Suppose $\lambda$ is such that $f_\theta(\lambda) \le f_\theta(\lambda^\star) + \varepsilon$, where $\lambda^\star$ is the optimal dual solution for the instance $(\mathcal{M}, \theta)$. Let $p^\lambda, p^{\lambda^\star}$ be the probability distributions corresponding to $\lambda$ and $\lambda^\star$ respectively:
$$p^\lambda_M := \frac{e^{-\lambda(M)}}{\sum_{N \in \mathcal{M}} e^{-\lambda(N)}} \quad \text{and} \quad p^{\lambda^\star}_M := \frac{e^{-\lambda^\star(M)}}{\sum_{N \in \mathcal{M}} e^{-\lambda^\star(N)}}.$$
Then $f_\theta(\lambda) - f_\theta(\lambda^\star) = D_{KL}(p^{\lambda^\star} \,\|\, p^\lambda) \le \varepsilon$.
Proof: Let $Z_\lambda, Z_{\lambda^\star}$ denote $\sum_{M \in \mathcal{M}} e^{-\lambda(M)}$ and $\sum_{M \in \mathcal{M}} e^{-\lambda^\star(M)}$ respectively. Then it follows from the optimality of $\lambda^\star$ that
$$\theta_e = \sum_{M \in \mathcal{M},\, M \ni e} \frac{e^{-\lambda^\star(M)}}{Z_{\lambda^\star}}. \quad (14)$$
Hence,
$$D_{KL}(p^\star \,\|\, p) = \sum_{M \in \mathcal{M}} p^\star_M \ln \frac{1}{p_M} - \sum_{M \in \mathcal{M}} p^\star_M \ln \frac{1}{p^\star_M},$$
which, since $p_M = e^{-\lambda(M)}/Z_\lambda$, equals
$$\sum_{M \in \mathcal{M}} p^\star_M \ln Z_\lambda + \sum_{M \in \mathcal{M}} p^\star_M \lambda(M) - \sum_{M \in \mathcal{M}} p^\star_M \ln \frac{1}{p^\star_M}.$$
This is equal to
$$\ln Z_\lambda - f_\theta(\lambda^\star) + \sum_{e \in [m]} \lambda_e \sum_{M \in \mathcal{M},\, M \ni e} p^\star_M = \ln Z_\lambda - f_\theta(\lambda^\star) + \langle \lambda, \theta \rangle = f_\theta(\lambda) - f_\theta(\lambda^\star),$$
where we have used (14). Hence, $D_{KL}(p^\star \,\|\, p) = f_\theta(\lambda) - f_\theta(\lambda^\star) \le \varepsilon$.

It is well known (see [5], Lemma 12.6.1, pp. 300–301) that for probability distributions $p, q$ over the same sample space,
$$\|p - q\|_{TV} \le O\left(\sqrt{D_{KL}(p \,\|\, q)}\right). \quad (15)$$
Hence, we obtain the following as a corollary of Lemma A.4.

Corollary A.5 Let $\lambda$ be such that $f_\theta(\lambda) \le f_\theta(\lambda^\star) + \varepsilon$. Then, for all $e \in [m]$, $|\theta^\lambda_e - \theta^{\lambda^\star}_e| \le O(\sqrt{\varepsilon})$.

Proof: This follows from the fact that $|\theta^\lambda_e - \theta^{\lambda^\star}_e| \le \|p^\lambda - p^{\lambda^\star}\|_{TV}$, which is at most
$$O\left(\sqrt{D_{KL}(p^{\lambda^\star} \,\|\, p^\lambda)}\right) = O\left(\sqrt{f_\theta(\lambda) - f_\theta(\lambda^\star)}\right) = O(\sqrt{\varepsilon}).$$

B Generalized Counting and Minimizing Kullback-Leibler Divergence

In this section, we outline how to obtain algorithms for generalized approximate counting from max-entropy oracles. Recall that the generalized approximate counting problem is: given $\varepsilon > 0$ and weights $\mu \in \mathbb{R}^m$, output $\widetilde{Z}_\mu$ and $\widetilde{Z}^e_\mu$ for each $e \in [m]$ such that the following guarantees hold:
1. $(1 - \varepsilon) Z_\mu \le \widetilde{Z}_\mu \le (1 + \varepsilon) Z_\mu$, and
2. for every $e \in [m]$, $(1 - \varepsilon) Z^e_\mu \le \widetilde{Z}^e_\mu \le (1 + \varepsilon) Z^e_\mu$.
Here $Z_\mu$ is defined to be $\sum_{M \in \mathcal{M}} e^{-\mu(M)}$. The running time should be a polynomial in $m$, $1/\varepsilon$, $\log 1/\alpha$ and the number of bits required to represent $e^{-\mu_e}$ for any $e \in [m]$, or $\|\mu\|_1$.
Towards constructing such algorithms, we need access to oracles that solve a more general problem than the max-entropy problem used in Theorem 2.11, namely, the min-Kullback-Leibler-divergence (min-KL-divergence) problem. This raises the following issue: while Theorems 2.6 and 2.8 output solutions to the max-entropy problem given access to generalized counting oracles, to obtain a generalized counting oracle we need access to a minimum-KL-divergence oracle. However, later in this section we show that, given a generalized counting oracle, we can solve not only the max-entropy convex programs as in Theorem 2.8, but also the min-KL-divergence program, in a straightforward manner.

The convex program for the min-KL-divergence problem is given in Figure 6. Given a $\mu\in\mathbb{R}^m$, recall that $p_\mu$ is the product distribution $p^\mu_M \stackrel{\mathrm{def}}{=} e^{-\mu(M)}/Z_\mu$. Observe that the objective is to find a distribution $p$ that minimizes, up to a shift, the KL-divergence between the distributions $p$ and $p_\mu$. This follows since the objective can be rewritten as
$$\sum_{M\in\mathcal{M}} p_M \ln\frac{p_\mu(M)}{p_M} + \ln Z_\mu = \ln Z_\mu - D_{\mathrm{KL}}(p \,\|\, p_\mu),$$
where $Z_\mu$ does not depend on the variable $p$ but only on the input $\mu$. The dual of this convex program is given in Figure 7.

[Footnote 10: The number of bits needed to represent $e^{-\mu_e}$ for any $e\in[m]$, up to an additive error of $2^{-m^2}$, is $\max\{\mathrm{poly}(m), \|\mu\|_1\}$ for some $\mathrm{poly}(m)$. Since all our running times depend on $\mathrm{poly}(m)$, we only track the dependence on $\|\mu\|_1$.]

Figure 6: Min-KL-Divergence Program for $(\mathcal{M},\theta,\mu)$:
$$\sup \ \sum_{M\in\mathcal{M}} p_M \ln\frac{p_\mu(M)}{p_M} + \ln Z_\mu$$
$$\text{s.t.}\quad \forall e\in[m]\ \ \sum_{M\in\mathcal{M},\,M\ni e} p_M = \theta_e, \qquad \sum_{M\in\mathcal{M}} p_M = 1, \qquad \forall M\in\mathcal{M}\ \ p_M \ge 0.$$

Figure 7: Dual of the Min-KL-Divergence Program for $(\mathcal{M},\theta,\mu)$:
$$\inf \ \sum_{e\in[m]} \theta_e \lambda_e + \ln \sum_{M\in\mathcal{M}} e^{-\lambda(M)-\mu(M)} \qquad \text{s.t.}\quad \forall e\in[m]\ \ \lambda_e \in \mathbb{R}.$$
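The rewriting of the objective as $\ln Z_\mu - D_{\mathrm{KL}}(p \,\|\, p_\mu)$ also makes the unconstrained optimum transparent: since $D_{\mathrm{KL}} \ge 0$, the objective never exceeds $\ln Z_\mu$, with equality exactly at $p = p_\mu$. A small numerical sketch of this, on a toy family invented for illustration:

```python
import math, random
from itertools import combinations

random.seed(0)
m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration
mu = [0.2, -0.3, 0.1, 0.4]

Z_mu = sum(math.exp(-sum(mu[e] for e in M)) for M in family)
p_mu = {M: math.exp(-sum(mu[e] for e in M)) / Z_mu for M in family}

def objective(p):
    # sum_M p_M ln(p_mu(M)/p_M) + ln Z_mu  =  ln Z_mu - D_KL(p || p_mu)
    return sum(p[M] * math.log(p_mu[M] / p[M]) for M in family) + math.log(Z_mu)

assert abs(objective(p_mu) - math.log(Z_mu)) < 1e-12  # optimum value is ln Z_mu
for _ in range(100):
    w = {M: random.random() + 0.01 for M in family}
    s = sum(w.values())
    p = {M: v / s for M, v in w.items()}
    assert objective(p) <= math.log(Z_mu) + 1e-12     # never exceeds ln Z_mu
```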
We use the following notation for the objective function of the dual:
$$f^\mu_\theta(\lambda) \stackrel{\mathrm{def}}{=} \sum_{e\in[m]} \theta_e\lambda_e + \ln\sum_{M\in\mathcal{M}} e^{-\lambda(M)-\mu(M)}.$$
When $\theta$ is in the interior of $P(\mathcal{M})$, strong duality holds between the programs of Figures 6 and 7. We assume that we are given the following approximate oracle to solve the above set of convex programs.

Definition B.1. An approximate KL-optimization oracle for $\mathcal{M}$, given a $\theta$ in the $\eta$-interior of $P(\mathcal{M})$, a $\mu\in\mathbb{R}^m$, a $\zeta > 0$, and an $\varepsilon > 0$, either
1. asserts that $\inf_\lambda f^\mu_\theta(\lambda) \ge \zeta - \varepsilon$, or
2. returns a $\lambda\in\mathbb{R}^m$ such that $f^\mu_\theta(\lambda) \le \zeta + \varepsilon$.
The oracle is assumed to be efficient, i.e., it runs in time polynomial in $m$, $1/\varepsilon$, $1/\eta$, the number of bits needed to represent $\zeta$, and $\|\mu\|_1$.

The following theorem is the appropriate generalization of Theorem 2.11 to this setting.

Theorem B.2. There exists an algorithm that, given a maximal set of linearly independent equalities $(A^{=}, b)$, a separation oracle for $P(\mathcal{M})$, a $\mu\in\mathbb{R}^m$, and an approximate KL-optimization oracle for $\mathcal{M}$ as above, returns a $\tilde{Z}$ such that $(1-\varepsilon)\, Z_\mu \le \tilde{Z} \le (1+\varepsilon)\, Z_\mu$. Assuming that the running times of the separation oracle and the approximate KL oracle are polynomial in their respective input parameters, the running time of the algorithm is bounded by a polynomial in $m$, $1/\varepsilon$, the number of bits needed to represent $(A^{=}, b)$, and $\|\mu\|_1$.

We omit the proof since it is a simple but tedious generalization of the proof of Theorem 2.11; we highlight below the key additional points that must be taken into account. The algorithm in the proof of Theorem B.2 is obtained by using the ellipsoid algorithm to maximize the concave function $g_\mu(\theta)$ over the interior of $P(\mathcal{M})$, where
$$g_\mu(\theta) \stackrel{\mathrm{def}}{=} \min_\lambda f^\mu_\theta(\lambda).$$
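When $\mathcal{M}$ is small enough to enumerate, a KL-optimization oracle in the sense of Definition B.1 can be realized directly: $f^\mu_\theta$ is convex in $\lambda$, so plain gradient descent minimizes it. The sketch below is a minimal illustration on a toy instance (family, weights and step size all invented for the example; the gradient in coordinate $e$ is $\theta_e$ minus the $e$-th marginal of the current distribution):

```python
import math
from itertools import combinations

m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration
mu = [0.2, -0.3, 0.1, 0.4]

def dist(lam):
    # p proportional to e^{-lambda(M) - mu(M)}
    w = {M: math.exp(-sum(lam[e] + mu[e] for e in M)) for M in family}
    z = sum(w.values())
    return {M: v / z for M, v in w.items()}

def marginals(p):
    return [sum(q for M, q in p.items() if e in M) for e in range(m)]

# a target theta in the interior of P(M): marginals of some hidden lambda
theta = marginals(dist([0.5, -0.2, 0.3, 0.1]))

# gradient descent on the convex dual: grad_e f = theta_e - Pr_{p_lambda}[e in M]
lam = [0.0] * m
for _ in range(5000):
    g = [t - q for t, q in zip(theta, marginals(dist(lam)))]
    lam = [l - 0.5 * gi for l, gi in zip(lam, g)]

# the returned lambda (approximately) realizes the target marginals
assert max(abs(a - b) for a, b in zip(theta, marginals(dist(lam)))) < 1e-6
```

The fixed step size works here because the instance is tiny; Appendix C bounds the Lipschitz constant of the gradient, which is what justifies such gradient-descent-based approaches in general.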
Indeed, the maximum is attained at $\theta^\star$, where
$$\theta^\star_e = \sum_{M:\, e\in M} \frac{e^{-\mu(M)}}{Z_\mu},$$
and the objective value at this maximum is $\ln Z_\mu$. Thus, we use the ellipsoid algorithm to search for $\theta^\star$. The issue of interiority, as in Lemma 7.2, is resolved by proving the following lemma.

Lemma B.3. Given an $\varepsilon > 0$, there exist an $\eta > 0$ and a $\theta^\bullet$ such that $\theta^\bullet$ is in the $\eta$-interior of $P(\mathcal{M})$ and
$$g_\mu(\theta^\bullet) \ge \left(1 - \frac{\varepsilon}{16\, m\, \|\mu\|_1}\right)\ln Z_\mu \ge \ln Z_\mu - \frac{\varepsilon}{16}.$$
Moreover, $\eta$ is at least a polynomial in $1/m$, $\varepsilon$ and $1/\|\mu\|_1$.

Similarly, Lemma 7.5 can be generalized to show the following:

Lemma B.4. Let $\theta, \theta'\in P(\mathcal{M})$ be such that $\|\theta - \theta'\| \le \varepsilon$ and $\theta$ is in the $\eta$-interior of $P(\mathcal{M})$. Then
$$g_\mu(\theta') \ge \left(1 - \frac{\varepsilon}{\eta}\,\|\mu\|_1\right) g_\mu(\theta).$$

Using the above two lemmas, it is straightforward to generalize the argument in Theorem 2.11 to prove Theorem B.2. To complete the picture, we show that, given access to a generalized counting oracle, we can solve the above pair of convex programs. This gives the following theorem, which is a generalization of Theorem 2.8.

Theorem B.5. There exists an algorithm that, given a maximal set of linearly independent equalities $(A^{=}, b)$, a generalized approximate counting oracle for $P(\mathcal{M})\subseteq\mathbb{R}^m$, a $\theta$ in the $\eta$-interior of $P(\mathcal{M})$, a $\mu\in\mathbb{R}^m$ and an $\varepsilon > 0$, returns a $\lambda^\circ$ such that $f^\mu_\theta(\lambda^\circ) \le f^\mu_\theta(\lambda^\star) + \varepsilon$, where $\lambda^\star$ is the optimal solution to the dual of the min-KL-divergence convex program for $(\mathcal{M},\theta,\mu)$ from Figure 7. Assuming that the generalized approximate counting oracle is polynomial in its input parameters, the running time of the algorithm is polynomial in $m$, $1/\eta$, $\log 1/\varepsilon$, the number of bits needed to represent $\theta$ and $(A^{=}, b)$, and $\|\mu\|_1$.

A similar theorem can be stated in the exact counting oracle setting, extending Theorem 2.6.
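The claimed maximizer $\theta^\star$ and optimal value $\ln Z_\mu$ can be verified on a small instance: at $\theta = \theta^\star$ the gradient of the convex dual objective vanishes at $\lambda = 0$, so $g_\mu(\theta^\star) = \ln Z_\mu$. A minimal numerical sketch, on a toy family invented for illustration (the dual objective is written without additive constants):

```python
import math, random
from itertools import combinations

random.seed(1)
m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration
mu = [0.2, -0.3, 0.1, 0.4]

def f(theta, lam):
    # dual objective: <theta, lambda> + ln sum_M e^{-lambda(M) - mu(M)}
    lse = math.log(sum(math.exp(-sum(lam[e] + mu[e] for e in M)) for M in family))
    return sum(t * l for t, l in zip(theta, lam)) + lse

Z_mu = sum(math.exp(-sum(mu[e] for e in M)) for M in family)
p_mu = {M: math.exp(-sum(mu[e] for e in M)) / Z_mu for M in family}
theta_star = [sum(q for M, q in p_mu.items() if e in M) for e in range(m)]

# at theta = theta_star, lambda = 0 is optimal and the value is ln Z_mu ...
assert abs(f(theta_star, [0.0] * m) - math.log(Z_mu)) < 1e-12
# ... and no random perturbation does better (convexity of f in lambda)
for _ in range(200):
    lam = [random.uniform(-1, 1) for _ in range(m)]
    assert f(theta_star, lam) >= math.log(Z_mu) - 1e-12
```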
The proof of Theorem B.5 is quite straightforward and relies on the following lemma, which states that the objective of the primal convex program is just an additive shift of the objective of the maximum entropy convex program. Thus, the primal optimal solution remains the same. This implies that the dual convex program can be solved as in the proof of Theorem 2.8, with one additional call to the generalized approximate counting oracle involving $\mu$.

Lemma B.6. Let $p$ be any feasible solution to the primal convex program given in Figure 6. Then
$$\sum_{M\in\mathcal{M}} p_M \ln\frac{p^\mu_M}{p_M} + \ln Z_\mu = \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} - \langle\theta,\mu\rangle.$$

Proof: We have
$$\sum_{M\in\mathcal{M}} p_M \ln\frac{p^\mu_M}{p_M} + \ln Z_\mu = \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} + \sum_{M\in\mathcal{M}} p_M \ln p^\mu_M + \ln Z_\mu$$
$$= \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} + \sum_{M\in\mathcal{M}} p_M \ln\frac{e^{-\mu(M)}}{Z_\mu} + \ln Z_\mu$$
$$= \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} - \sum_{M\in\mathcal{M}} p_M\, \mu(M) - \sum_{M\in\mathcal{M}} p_M \ln Z_\mu + \ln Z_\mu$$
$$= \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} - \sum_{e\in E} \mu_e \sum_{M:\, e\in M} p_M = \sum_{M\in\mathcal{M}} p_M \ln\frac{1}{p_M} - \sum_{e\in E} \mu_e\, \theta_e,$$
where we have used the facts that $\sum_M p_M = 1$ and $\sum_{M:\,e\in M} p_M = \theta_e$, since $p$ satisfies the constraints of the convex program in Figure 6. □

C The $2\to 2$-norm of $\nabla f$

For a fixed $\theta$, let
$$f(\lambda) \stackrel{\mathrm{def}}{=} \langle\theta,\lambda\rangle + \ln\sum_{M\in\mathcal{M}} e^{-\lambda(M)}. \qquad (16)$$
Recall that $\|\nabla f\|_{2\to 2}$ is the $2\to 2$ Lipschitz constant of $\nabla f$, defined to be the smallest non-negative number such that
$$\|\nabla f(\lambda) - \nabla f(\lambda')\|_2 \le \|\nabla f\|_{2\to 2} \cdot \|\lambda - \lambda'\|_2$$
for all $\lambda, \lambda'$. In this section, we prove the following theorem, which can be used to give alternative gradient-descent-based proofs of Theorems 2.6 and 2.8.

Theorem C.1. Let $f$ be defined as in (16). Then $\|\nabla f\|_{2\to 2} \le O(m\sqrt{m})$.

Proof: Given $\lambda_1$ and $\lambda_2$, let $p_{\lambda_1}$ and $p_{\lambda_2}$ be the corresponding product distributions and let $\theta_1$ and $\theta_2$ be the corresponding marginals. We break the calculation of $\|\nabla f\|_{2\to 2}$ into two cases: $\|\lambda_1-\lambda_2\|_2 > \frac{1}{10\sqrt{m}}$ and $\|\lambda_1-\lambda_2\|_2 \le \frac{1}{10\sqrt{m}}$.
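The additive-shift identity of Lemma B.6 is exact and can be checked directly: for any distribution $p$ on the family, taking $\theta$ to be its marginals, the min-KL objective equals the entropy of $p$ shifted by $\langle\theta,\mu\rangle$. A sketch on a toy family invented for illustration:

```python
import math
from itertools import combinations

m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration
mu = [0.2, -0.3, 0.1, 0.4]

Z_mu = sum(math.exp(-sum(mu[e] for e in M)) for M in family)
p_mu = {M: math.exp(-sum(mu[e] for e in M)) / Z_mu for M in family}

# an arbitrary distribution p on the family; its marginals play the role of theta
w = {M: float(i + 1) for i, M in enumerate(family)}
s = sum(w.values())
p = {M: v / s for M, v in w.items()}
theta = [sum(q for M, q in p.items() if e in M) for e in range(m)]

lhs = sum(p[M] * math.log(p_mu[M] / p[M]) for M in family) + math.log(Z_mu)
rhs = sum(p[M] * math.log(1 / p[M]) for M in family) - \
      sum(t * u for t, u in zip(theta, mu))
assert abs(lhs - rhs) < 1e-12  # Lemma B.6: shift by <theta, mu>
```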
Estimating $\|\nabla f\|_{2\to 2}$ in the first case is straightforward. To see this, recall that
$$\nabla f(\lambda_i) = \theta - \theta_i \qquad (17)$$
for $i = 1, 2$. Thus,
$$\|\nabla f(\lambda_i)\|_2 \le \sqrt{m} \qquad (18)$$
since $\theta, \theta_i \in P(\mathcal{M}) \subseteq [0,1]^m$, which implies that $\theta - \theta_i \in [-1,1]^m$ for $i=1,2$. Hence, $\|\nabla f(\lambda_1) - \nabla f(\lambda_2)\|_2 \le 2\sqrt{m}$. Thus, when
$$\|\lambda_1-\lambda_2\|_2 > \frac{1}{10\sqrt{m}}, \qquad (19)$$
we obtain
$$\|\nabla f(\lambda_1)-\nabla f(\lambda_2)\|_2 \overset{\triangle\text{-ineq.}}{\le} \|\nabla f(\lambda_1)\|_2 + \|\nabla f(\lambda_2)\|_2 \overset{(18)}{\le} 2\sqrt{m} = 20m\cdot\frac{1}{10\sqrt{m}} \overset{(19)}{<} 20m\cdot\|\lambda_1-\lambda_2\|_2.$$
Hence, we move on to proving the theorem in the case
$$\|\lambda_1-\lambda_2\|_2 \le \frac{1}{10\sqrt{m}}. \qquad (20)$$
Towards this, define
$$\varepsilon \stackrel{\mathrm{def}}{=} \sqrt{m}\,\|\lambda_1-\lambda_2\|_2. \qquad (21)$$
Note that by assumption (20), $\varepsilon \le 1/10$. It follows from (21) that for any $M\in\mathcal{M}$,
$$|\lambda_1(M)-\lambda_2(M)| = \Big|\sum_{e\in M}\big((\lambda_1)_e - (\lambda_2)_e\big)\Big| \overset{\triangle\text{-ineq.}}{\le} \sum_{e\in M} |(\lambda_1)_e-(\lambda_2)_e| \le \|\lambda_1-\lambda_2\|_1 \overset{\text{Cau.-Sch.}}{\le} \sqrt{m}\,\|\lambda_1-\lambda_2\|_2 = \varepsilon. \qquad (22)$$
The following series of claims establishes Theorem C.1.

Claim C.2. $e^{-\varepsilon} \le \frac{Z_{\lambda_1}}{Z_{\lambda_2}} \le e^{\varepsilon}$.

Proof: For each $M\in\mathcal{M}$, (22) implies that
$$e^{-\varepsilon} \le \frac{e^{-\lambda_1(M)}}{e^{-\lambda_2(M)}} \le e^{\varepsilon}. \qquad (23)$$
Thus,
$$e^{-\varepsilon} \le \min_{M\in\mathcal{M}} \frac{e^{-\lambda_1(M)}}{e^{-\lambda_2(M)}} \le \frac{\sum_{M\in\mathcal{M}} e^{-\lambda_1(M)}}{\sum_{M\in\mathcal{M}} e^{-\lambda_2(M)}} \le \max_{M\in\mathcal{M}} \frac{e^{-\lambda_1(M)}}{e^{-\lambda_2(M)}} \le e^{\varepsilon}. \qquad (24)$$
Here, we have used the inequality that for non-negative numbers $a_1, a_2, \ldots$ and positive numbers $b_1, b_2, \ldots$,
$$\min_i \frac{a_i}{b_i} \le \frac{\sum_i a_i}{\sum_i b_i} \le \max_i \frac{a_i}{b_i}. \qquad (25)$$
The claim follows by combining (24) with the definition $\frac{\sum_{M\in\mathcal{M}} e^{-\lambda_1(M)}}{\sum_{M\in\mathcal{M}} e^{-\lambda_2(M)}} = \frac{Z_{\lambda_1}}{Z_{\lambda_2}}$. □

Claim C.3. For each $M\in\mathcal{M}$, $e^{-2\varepsilon} \le \frac{p^{\lambda_1}_M}{p^{\lambda_2}_M} \le e^{2\varepsilon}$.

Proof: By definition,
$$\frac{p^{\lambda_1}_M}{p^{\lambda_2}_M} = \frac{e^{-\lambda_1(M)}}{e^{-\lambda_2(M)}}\cdot\frac{Z_{\lambda_2}}{Z_{\lambda_1}}.$$
Since all the numbers involved in this product are positive, (23) and Claim C.2 imply that both ratios on the right-hand side are bounded from below by $e^{-\varepsilon}$ and from above by $e^{\varepsilon}$.
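Claims C.2 and C.3 are easy to confirm numerically on a small instance. The sketch below uses a toy family invented for illustration, with perturbations kept small enough that $\varepsilon \le 1/10$:

```python
import math, random
from itertools import combinations

random.seed(2)
m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration

def Z(lam):
    return sum(math.exp(-sum(lam[e] for e in M)) for M in family)

def dist(lam):
    z = Z(lam)
    return {M: math.exp(-sum(lam[e] for e in M)) / z for M in family}

for _ in range(200):
    lam1 = [random.uniform(-1, 1) for _ in range(m)]
    lam2 = [a + random.uniform(-0.01, 0.01) for a in lam1]
    eps = math.sqrt(m) * math.sqrt(sum((a - b) ** 2 for a, b in zip(lam1, lam2)))
    # Claim C.2: the partition functions agree within e^{+-eps}
    assert math.exp(-eps) - 1e-12 <= Z(lam1) / Z(lam2) <= math.exp(eps) + 1e-12
    # Claim C.3: the point probabilities agree within e^{+-2 eps}
    p1, p2 = dist(lam1), dist(lam2)
    for M in family:
        assert math.exp(-2 * eps) - 1e-12 <= p1[M] / p2[M] <= math.exp(2 * eps) + 1e-12
```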
Hence, their product is bounded from below by $e^{-2\varepsilon}$ and from above by $e^{2\varepsilon}$, completing the proof of the claim. □

Claim C.4. For each $e\in[m]$, $e^{-2\varepsilon} \le \frac{(\theta_1)_e}{(\theta_2)_e} \le e^{2\varepsilon}$.

Proof: By the definition of $\theta_1$ and $\theta_2$,
$$\frac{(\theta_1)_e}{(\theta_2)_e} = \frac{\sum_{M\in\mathcal{M},\,M\ni e} p^{\lambda_1}_M}{\sum_{M\in\mathcal{M},\,M\ni e} p^{\lambda_2}_M}.$$
Combining Claim C.3 and (25), we obtain that
$$e^{-2\varepsilon} \le \min_{M\in\mathcal{M},\,M\ni e} \frac{p^{\lambda_1}_M}{p^{\lambda_2}_M} \le \frac{\sum_{M\in\mathcal{M},\,M\ni e} p^{\lambda_1}_M}{\sum_{M\in\mathcal{M},\,M\ni e} p^{\lambda_2}_M} \le \max_{M\in\mathcal{M},\,M\ni e} \frac{p^{\lambda_1}_M}{p^{\lambda_2}_M} \le e^{2\varepsilon},$$
completing the proof of the claim. □

Claim C.5. For $\varepsilon \le 1/10$, $\|\theta_1-\theta_2\|_1 \le 3\varepsilon m$.

Proof: By definition, $\|\theta_1-\theta_2\|_1 = \sum_{e\in[m]} |(\theta_1)_e - (\theta_2)_e|$. Since $\theta_1, \theta_2 \ge 0$, Claim C.4 implies that for each $e\in[m]$,
$$(e^{-2\varepsilon}-1)(\theta_2)_e \le (\theta_1)_e - (\theta_2)_e \le (e^{2\varepsilon}-1)(\theta_2)_e.$$
Since $\max\{|e^{-2\varepsilon}-1|, |e^{2\varepsilon}-1|\} \le 3\varepsilon$ for $\varepsilon \le 1/10$, the above inequality reduces to $|(\theta_1)_e-(\theta_2)_e| \le 3\varepsilon(\theta_2)_e$ for each $e\in[m]$. Thus,
$$\|\theta_1-\theta_2\|_1 = \sum_{e\in[m]} |(\theta_1)_e-(\theta_2)_e| \le \sum_{e\in[m]} 3\varepsilon(\theta_2)_e \le 3\varepsilon m,$$
since $\theta_2 \ge 0$ and $\theta_2 \in P(\mathcal{M}) \subseteq [0,1]^m$. This completes the proof of the claim. □

Finally, to complete the proof of Theorem C.1, note that when (20) holds,
$$\|\nabla f(\lambda_1)-\nabla f(\lambda_2)\|_2 \overset{(17)}{=} \|\theta_1-\theta_2\|_2 \overset{\ell_2\le\ell_1}{\le} \|\theta_1-\theta_2\|_1 \overset{\text{Claim C.5}}{\le} 3m\varepsilon \overset{(21)}{=} 3m\sqrt{m}\,\|\lambda_1-\lambda_2\|_2. \qquad\square$$
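The final bound can also be checked numerically: for nearby $\lambda_1, \lambda_2$ (so that case (20) applies), the gradient difference $\|\theta_1-\theta_2\|_2$ is at most $3m\sqrt{m}\,\|\lambda_1-\lambda_2\|_2$. A sketch on a toy family invented for illustration (note that the fixed $\theta$ cancels when differencing gradients, so only the marginals matter):

```python
import math, random
from itertools import combinations

random.seed(4)
m = 4
family = list(combinations(range(m), 2))  # toy family, invented for illustration

def marginals(lam):
    # marginals of the distribution p_lambda proportional to e^{-lambda(M)}
    w = {M: math.exp(-sum(lam[e] for e in M)) for M in family}
    z = sum(w.values())
    return [sum(v for M, v in w.items() if e in M) / z for e in range(m)]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

# ||grad f(l1) - grad f(l2)||_2 = ||theta_1 - theta_2||_2 <= 3 m sqrt(m) ||l1 - l2||_2
for _ in range(200):
    l1 = [random.uniform(-1, 1) for _ in range(m)]
    l2 = [a + random.uniform(-0.01, 0.01) for a in l1]
    lhs = norm2([a - b for a, b in zip(marginals(l1), marginals(l2))])
    rhs = 3 * m * math.sqrt(m) * norm2([a - b for a, b in zip(l1, l2)])
    assert lhs <= rhs
```

In practice the observed ratio is far below $3m\sqrt{m}$; the theorem only needs a polynomial bound, which suffices for the gradient-descent arguments.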
