Using TPA to count linear extensions


Authors: Jacqueline Banks, Scott Garrabrant, Mark L. Huber

Jacqueline Banks, University of California, Riverside, jbank003@student.ucr.edu
Scott M. Garrabrant, Pitzer College, scott@garrabrant.com
Mark L. Huber, Claremont McKenna College, mhuber@cmc.edu
Anne Perizzolo, Columbia University, aperizzolo11@students.claremont.edu

Abstract: A linear extension of a poset is a permutation of the elements of the set that respects the partial order. Let $\#(L)$ denote the number of linear extensions. It is a #P complete problem to determine $\#(L)$ exactly for an arbitrary poset, and so randomized approximation algorithms that draw randomly from the set of linear extensions are used. In this work, the set of linear extensions is embedded in a larger state space with a continuous parameter $\beta$. The introduction of a continuous parameter allows for the use of a more efficient method for approximating $\#(L)$ called TPA. Our primary result is that it is possible to sample from this continuous embedding in time that is as fast or faster than the best known methods for sampling uniformly from linear extensions. For a poset containing $n$ elements, this means we can approximate $\#(L)$ to within a factor of $1 + \epsilon$ with probability at least $1 - \delta$ using an expected number of random bits and comparisons in the poset which is at most $O(n^3 (\ln n)(\ln \#(L))^2 \epsilon^{-2} \ln \delta^{-1})$.

Keywords: perfect simulation, posets, counting, #P complete
MSC Classification: Primary: 65C05; 06A07

1 Introduction

Consider the set $[n] = \{1, \ldots, n\}$. A partial order $\preceq$ is a binary relation on $[n]$ that is reflexive, so $(\forall a \in [n])(a \preceq a)$; antisymmetric, so $(\forall a, b \in [n])(a \preceq b \wedge b \preceq a \Rightarrow a = b)$; and transitive, so $(\forall a, b, c \in [n])(a \preceq b \wedge b \preceq c \Rightarrow a \preceq c)$. A set equipped with a partial order is a partially ordered set, or poset for short. The values $\{1, \ldots, n\}$ can be viewed as items.
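These three axioms are mechanical to check for a small example. The following sketch is our own illustration (not from the paper); the relation shown generates the order $1 \preceq 3$, $2 \preceq 4$ used in the example of the next paragraph:

```python
from itertools import product

def is_partial_order(n, rel):
    """Check reflexivity, antisymmetry, and transitivity of a relation on [n].

    rel is a set of pairs (a, b) meaning a is below-or-equal-to b in the order.
    """
    items = range(1, n + 1)
    reflexive = all((a, a) in rel for a in items)
    antisymmetric = all(not ((a, b) in rel and (b, a) in rel and a != b)
                        for a, b in product(items, repeat=2))
    transitive = all((a, c) in rel
                     for a, b in product(items, repeat=2) if (a, b) in rel
                     for c in items if (b, c) in rel)
    return reflexive and antisymmetric and transitive

# The order generated by 1 <= 3 and 2 <= 4 (plus the reflexive pairs) on [4]:
rel = {(i, i) for i in range(1, 5)} | {(1, 3), (2, 4)}
```

Adding $(3, 1)$ to this relation would break antisymmetry, so the check would then fail.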
For a permutation $\sigma$, say that item $i$ has position $j$ if $\sigma(j) = i$. Then a permutation respects the partial order $\preceq$ if whenever $i \preceq j$, the position of item $i$ is less than the position of item $j$. Such a permutation is called a linear extension of the partial order.

Definition 1. A permutation $\sigma$ is a linear extension of the partial order $\preceq$ if for all $i$ and $j$ in $[n]$, $i \preceq j$ implies that $\sigma^{-1}(i) < \sigma^{-1}(j)$.

For example, if the partial order states that $1 \preceq 3$ and $2 \preceq 4$, then the permutation $(\sigma(1), \sigma(2), \sigma(3), \sigma(4)) = (1, 3, 2, 4)$ would be a linear extension. However, $(4, 1, 3, 2)$ would not, since the position of item 4 is 1, which is smaller than the position of item 2, which is 4. This definition follows that of Karzanov and Khachiyan [9]. Note that some authors, such as [1], define the linear extension to be the permutation $\sigma^{-1}$ rather than $\sigma$.

Let $L$ denote the set of linear extensions of a particular poset. Our goal here is to efficiently count the number of linear extensions, $\#(L)$. Finding $\#(L)$ is a #P complete problem [2] for general partial orders, and so instead of an exact deterministic method, we develop a randomized approximation method.

There are many applications of this problem. Morton et al. [10] have shown that a particular type of convex rank test for nonparametric models can be reduced to counting linear extensions. Many data sets, such as athletic competitions or product comparisons, do not have results for every possible pairing, but instead have an incomplete set of comparisons. Counting linear extensions can be used to develop estimates of the actual rank of the items involved (see [3]).

Previous results

Previous methods for this problem ([9, 6]) concentrated on sampling from the set of linear extensions where some of the permutation values are fixed ahead of time.
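Definition 1 is straightforward to test in code. A minimal sketch (our own, using the order generated by $1 \preceq 3$ and $2 \preceq 4$ from the example above):

```python
def is_linear_extension(sigma, relation):
    """Definition 1: i below j in the order implies sigma^{-1}(i) < sigma^{-1}(j).

    sigma is a list mapping position -> item (positions are 1-indexed);
    relation is a set of pairs (i, j) meaning i is below j in the partial order.
    """
    pos = {item: p for p, item in enumerate(sigma, start=1)}  # sigma^{-1}
    return all(pos[i] < pos[j] for i, j in relation if i != j)

relation = {(1, 3), (2, 4)}
```

On the paper's example, $(1, 3, 2, 4)$ passes and $(4, 1, 3, 2)$ fails, since there the position of item 4 precedes that of item 2.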
Generating a single uniform sample from the set of linear extensions takes an expected $O(n^3 \ln n)$ random bits, using a number of comparisons that is at most the number of random bits [6]. Using the self-reducibility method of Jerrum et al. [8], this can be used to estimate $\#(L)$ to within a factor of $1 + \epsilon$ with probability at least $1 - \delta$ in time $O(n^5 (\ln n)^3 \epsilon^{-2} \ln(1/\delta))$.

Here we take a different approach. Instead of sampling uniformly from the set of permutations, a weighted distribution is used that has a parameter $\beta$. The weight assigned to an element varies continuously with $\beta$, and this allows us to use a new method for turning samples from our weighted distribution into an approximation for $\#(L)$ called the Tootsie Pop Algorithm (TPA). The use of TPA gives us an algorithm that is $O((\ln \#(L))^2 n^3 (\ln n) \epsilon^{-2} \ln(1/\delta))$. In the worst case, $\ln \#(L)$ is $O(n \ln n)$ and the complexity is the same as the older algorithm; however, if $\#(L)$ is small compared to $n!$, this algorithm can be much faster. Even in the worst case, the constant hidden by the big-$O$ notation is much smaller for the new algorithm (see Theorem 4 of Section 7).

Organization

In the next section, we describe the self-reducibility method and TPA in detail. Section 3 illustrates the use of TPA on a simple example, and then Section 4 shows how it can be used on the linear extensions problem by adding the appropriate weighting. Section 5 then shows how the non-Markovian coupling from the past method introduced in [6] can also be used for this new embedding, and Section 7 collects results concerning the running time of the procedure, including an explicit bound on the expected number of random bits and comparisons used by the algorithm.

2 The Tootsie Pop Algorithm

In [8], Jerrum et al.
noted that for self-reducible problems, an algorithm for generating from a set could be used to build an approximation algorithm for finding the size of the set. Informally, a problem is self-reducible if the set of solutions can be partitioned into the solutions of smaller instances of the problem (for precise details, see [8]). For example, in linear extensions, once the value of $\sigma(n)$ is determined, the problem of drawing $\sigma(1), \ldots, \sigma(n-1)$ is just a smaller linear extension generation problem.

While a theoretical tour de force, as a practical matter using self-reducibility to build algorithms is difficult. The output of a self-reducibility algorithm is a scaled product of binomials, not the easiest distribution to work with or analyze precisely. The Tootsie Pop Algorithm (TPA) [7] is one way to solve this difficulty. Roughly speaking, TPA begins with a large set (the shell) containing a smaller set (the center). At each step, TPA draws a sample $X$ randomly from the shell, and reduces the shell as much as possible while still containing $X$. The process then repeats, drawing samples and contracting the shell. This continues until the sample drawn lands in the center. The number of samples drawn before one falls in the center has a Poisson distribution, with parameter equal to the natural logarithm of the ratio of the size of the shell to the center.

To be precise, TPA requires the following ingredients:

(a) A measure space $(\Omega, \mathcal{F}, \mu)$.
(b) Two finite measurable sets $B$ and $B'$ satisfying $B' \subset B$. The set $B'$ is the center and $B$ is the shell.
(c) A family of nested sets $\{A(\beta) : \beta \in \mathbb{R}\}$ such that $\beta < \beta'$ implies $A(\beta) \subseteq A(\beta')$. Also, $\mu(A(\beta))$ must be a continuous function of $\beta$, and $\lim_{\beta \to -\infty} \mu(A(\beta)) = 0$.
(d) Special values $\beta_B$ and $\beta_{B'}$ that satisfy $A(\beta_B) = B$ and $A(\beta_{B'}) = B'$.

With these ingredients, TPA can be run as follows.
Let $A = \ln(\mu(B)/\mu(B'))$, so that $\exp(A)$ is what we are trying to estimate. Then each run through the for loop in the algorithm requires on average $A + 1$ samples, making the total expected number of samples $r(A + 1)$. The value of $k$ in line 7 of the algorithm is Poisson distributed with parameter $rA$. This means that $r$ should be set to about $A$ so that $k/r$ is tightly concentrated around $A$. But we do not know $A$ ahead of time! This leads to the need for a two-phase algorithm. In the first phase $r$ is set to be large enough to get a rough approximation of $A$, and then in the second phase $r$ is set based on our estimate from the first run. That is:

1. Call TPA with $r_1 = 2\ln(2/\delta)$ to obtain $\hat{L}_1$, and set $\hat{A}_1 = \ln(\hat{L}_1)$.
2. Call TPA with $r_2 = 2(\hat{A}_1 + \sqrt{\hat{A}_1} + 2)[\ln(1+\epsilon)^2 - \ln(1+\epsilon)^3]^{-1} \ln(4/\delta)$ to obtain the final estimate.

Algorithm 2.1 TPA($r$, $\beta_B$, $\beta_{B'}$)
Input: number of runs $r$, initial index $\beta_B$, final index $\beta_{B'}$
Output: $\hat{L}$ (estimate of $\mu(B)/\mu(B')$)
1: $k \leftarrow 0$
2: for $i$ from 1 to $r$ do
3:   $\beta \leftarrow \beta_B$, $k \leftarrow k - 1$
4:   while $\beta > \beta_{B'}$ do
5:     $k \leftarrow k + 1$, draw $X$ from $\mu$ restricted to $A(\beta)$ (normalized to be a probability measure), $\beta \leftarrow \inf\{\beta' \in [\beta_{B'}, \beta_B] : X \in A(\beta')\}$
6:   end while
7: end for
8: $\hat{L} \leftarrow \exp(k/r)$

The result is output $\hat{L}_2$ that is within a factor of $1 + \epsilon$ of $\#(L)$ with probability at least $1 - \delta$. This is shown in Section 7.

3 Continuous embedding: simple example

To illustrate TPA versus the basic self-reducibility approach, consider a simple problem that will serve as a building block for our algorithm on linear extensions later. In this problem, we estimate the size of the set $\{1, 2, \ldots, n\}$ given the ability to draw samples uniformly from $\{1, 2, \ldots, b\}$ for any $b$. In the self-reducibility approach, begin by setting $\beta_1 = \lceil n/2 \rceil$ and drawing samples from $\{1, \ldots, n\}$. Count how many fall into $\{1, \ldots, \beta_1\}$
and use this number $\hat{a}_1$ (divided by the number of samples) as an estimate of $\beta_1/n$. Now repeat, letting $\beta_2 = \lceil \beta_1/2 \rceil$ and estimating $\hat{a}_2 \approx \beta_2/\beta_1$, until $\beta_k = 1$. Note that
$E[\hat{a}_1 \hat{a}_2 \cdots \hat{a}_k] = (\beta_1/n)(\beta_2/\beta_1) \cdots (\beta_k/\beta_{k-1}) = \beta_k/n = 1/n.$
Since the final estimate $\hat{a}$ of $1/n$ is the product of $k$ estimates, Fishman called this algorithm the product estimator [4]. The problem with analyzing the output of the product estimator is that it is the product of $k$ scaled binomials.

To use TPA on this problem, it needs to be embedded in a continuous setting. Consider the state space $[0, n]$. The family of sets needed for TPA will be $[0, \beta]$, where $\beta_B = n$ and $\beta_{B'} = 1$. This makes the ratio of the measure of $[0, \beta_B]$ to $[0, \beta_{B'}]$ equal to $n$. Note that you can draw uniformly from $[0, \beta]$ in the following two-step fashion. First draw $X \in \{1, 2, \ldots, \lceil \beta \rceil\}$ so that $P(X = i) = 1/\beta$ for $i < \lceil \beta \rceil$ and $P(X = \lceil \beta \rceil) = (1 + \beta - \lceil \beta \rceil)/\beta$. If $X < \lceil \beta \rceil$, draw $Y$ uniform on $[0, 1]$; otherwise draw $Y$ uniform on $[0, 1 + \beta - \lceil \beta \rceil]$. The final draw is $W = X - 1 + Y$.

TPA starts with $\beta_0 = n$, then draws $W$ as above. The infimum over all $\beta$ such that $W \in [0, \beta]$ is just $\beta = W$, so $\beta_1$ simply equals $W$. Next, redraw $W$ from $[0, \beta_1]$. Again, the infimum of $\beta$ satisfying $W \in [0, \beta]$ is just $W$, so $\beta_2$ equals this new value of $W$. This process repeats until $W$ falls into $[0, 1]$. The estimate $k$ for $\ln n$ is just the number of steps needed before the final step into $[0, 1]$. Note that $k$ can equal 0 if the very first step lands in $[0, 1]$. This random variable $k$ will be Poisson distributed with parameter $\ln n$. Recall that the sum of Poisson random variables is also Poisson with parameter equal to the sum of the individual parameters, so repeating the process $r$ times and summing the results yields a Poisson random variable with parameter $r \ln n$.
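This interval version of TPA takes only a few lines of code; a minimal sketch (our own; each draw landing outside the center increments $k$, so $k$ over $r$ runs is Poisson with parameter $r \ln n$):

```python
import math
import random

def tpa_interval(n, r, rng):
    """TPA for the nested family A(beta) = [0, beta]: shell [0, n], center [0, 1].

    Returns k, the total count over r runs of draws landing outside the
    center; k is Poisson distributed with parameter r * ln(n).
    """
    k = 0
    for _ in range(r):
        beta = float(n)
        while beta > 1.0:
            beta = rng.uniform(0.0, beta)  # W ~ Unif[0, beta]; the new beta is W
            if beta > 1.0:
                k += 1
    return k

rng = random.Random(12345)
k = tpa_interval(100, 5000, rng)  # k / 5000 should be near ln(100), about 4.61
```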
Dividing by $r$ and exponentiating then yields an estimate of $n$.

4 Continuous embedding: linear extensions

This approach can be extended to the problem of linear extensions by adding an auxiliary random variable. First we define a distance between an arbitrary permutation and a home linear extension. Note that we can assume without loss of generality that $(1, 2, \ldots, n)$ is a valid linear extension; otherwise, simply relabel the items so that it is. Then say that $i$ is the home position of item $i$.

Let $a^+ = \max\{a, 0\}$. In linear extension $\sigma$, item $j$ has position $\sigma^{-1}(j)$. Define the distance from item $j$ to its home position to be $(\sigma^{-1}(j) - j)^+$. The maximum of these distances over all items is the distance from $\sigma$ to the home position. That is, let
$d(\sigma, (1, 2, \ldots, n)) = \max_j (\sigma^{-1}(j) - j)^+.$
If the distance is 0, then no item is to the right of its home position. The only way that can happen is if $\sigma(i) = i$ for all $i$.

Right now the distance is discrete, falling into $\{0, 1, 2, \ldots, n-1\}$, and all linear extensions are equally likely. To finish the continuous embedding, it is necessary to change from a uniform distribution to one where some linear extensions are more likely than others. Let $\beta \in [0, n-1]$. Suppose that item $i$ is farther than $\beta$ to the right of its home position. Such an item has weight 0. If item $i$ is closer than $\lceil \beta \rceil$ to its home position, it has weight 1. If item $i$ is exactly distance $\lceil \beta \rceil$ from its home position, then it receives weight equal to $1 + \beta - \lceil \beta \rceil$. For instance, for $\sigma = (1, 4, 3, 2)$ and $\beta = 2$, the weight of items 1, 4, and 3 is 1, while the weight of item 2 (which is at distance exactly $2 = \lceil \beta \rceil$ from home) is $1 + 2 - 2 = 1$. If $\beta$ falls to 1.3, then the weight of item 2 drops to $1 + 1.3 - 2 = 0.3$. Let the weight of a linear extension be the product of the weights of each of its items.
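The weight just described can be computed directly from a permutation; a minimal sketch (our own illustration, with 1-indexed positions):

```python
import math

def item_weight(sigma, i, beta):
    """Weight of item i in permutation sigma at parameter beta.

    sigma is a list mapping position (1-indexed) -> item.
    """
    dist = max(sigma.index(i) + 1 - i, 0)  # (sigma^{-1}(i) - i)^+
    ceil_b = math.ceil(beta)
    if dist < ceil_b:
        return 1.0
    if dist == ceil_b:
        return 1.0 + beta - ceil_b
    return 0.0  # farther than beta to the right of home

def weight(sigma, beta):
    """Weight of the permutation: the product of its item weights."""
    w = 1.0
    for i in range(1, len(sigma) + 1):
        w *= item_weight(sigma, i, beta)
    return w
```

For example, in $\sigma = (2, 1, 3)$ item 1 is one unit right of home, so at $\beta = 0.5$ it contributes the fractional-part factor $0.5$ and the permutation has weight $0.5$.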
That is,
$w(\sigma, \beta) = \prod_{i \in [n]} w_i(\sigma, \beta), \quad w_i(\sigma, \beta) = (1 + \beta - \lceil \beta \rceil)\,\mathbb{1}(\sigma^{-1}(i) - i = \lceil \beta \rceil) + \mathbb{1}(\sigma^{-1}(i) - i < \lceil \beta \rceil). \quad (1)$
In other words, when $\beta$ is an integer, all weights are either 0 or 1, and are 1 if and only if all items are at distance at most $\beta$ to the right of their home position. When $\beta$ is not an integer, then all items must be at most $\lceil \beta \rceil$ distance from home, and every item whose distance from home equals $\lceil \beta \rceil$ receives a penalty factor equal to the fractional part of $\beta$. Note that $w(\sigma, \beta)$ is an increasing function of $\beta$.

Suppose $X$ is a random element of $L$ where $P(X = \sigma) \propto w(\sigma, \beta)$. Let $\text{Unif}(\Omega)$ denote the uniform distribution over the set $\Omega$. Given $X$, create the auxiliary variable $Y$ as $[Y \mid X] \sim \text{Unif}([0, w(X, \beta)])$. Let
$A(\beta) = \{(x, y) : x \in L,\ y \in [0, w(x, \beta)]\}.$
It is upon these sets $A(\beta)$ that TPA can be used. Here $A(n-1) = L \times [0, 1]$ is the shell and $A(0) = \{(1, 2, \ldots, n)\} \times [0, 1]$ is the center. Then since $w(\sigma, \beta)$ is an increasing function of $\beta$, for $\beta' \leq \beta$, $A(\beta') \subseteq A(\beta)$.

5 Sampling from the continuous embedding

For the continuous embedding to be useful for TPA, it must be possible to sample from the set of linear extensions with the weight function given in (1). Once the linear extension $X$ has been created, sampling the $Y$ to go along with it is simple. To sample from the set of weighted linear extensions, first consider a Markov chain whose stationary distribution matches the target distribution. This is done by using a Metropolis-Hastings approach. The proposal chain works as follows. With probability 1/2, the chain just stays where it is. With probability 1/2, a position $i$ is chosen uniformly from $\{1, \ldots, n-1\}$, and transposing the items at positions $i$ and $i+1$ is considered. If such a transposition obeys the partial order and does not move an item more than $\lceil \beta \rceil$ to the right of its home position, it is the proposed move.
If the proposal is to transpose the items, then one item might have acquired a weight factor of $1 + \beta - \lceil \beta \rceil$ if it moves to be exactly $\lceil \beta \rceil$ from its home position. So we only accept such a move with probability $1 + \beta - \lceil \beta \rceil$. This is encoded in Algorithm 5.1.

Algorithm 5.1 ChainStep($\sigma$, $i$, $C_1$, $C_2$) (with $\beta$ fixed)
Input: current linear extension Markov chain state $\sigma$
Output: next linear extension Markov chain state $\sigma$
1: $d \leftarrow i + 1 - \sigma(i)$
2: if $C_1 = 1$ and not $\sigma(i) \preceq \sigma(i+1)$ and $d \leq \lceil \beta \rceil$ then
3:   if $d < \beta$ or $C_2 = 1$ then
4:     $a \leftarrow \sigma(i+1)$, $\sigma(i+1) \leftarrow \sigma(i)$, $\sigma(i) \leftarrow a$
5:   end if
6: end if

Write $B \sim \text{Bern}(p)$ if $P(B = 1) = p$ and $P(B = 0) = 1 - p$. Then with the appropriate choice of random inputs, Algorithm 5.1 has the distribution on linear extensions with probabilities proportional to $w(\cdot, \beta)$ as a stationary distribution.

Lemma 1. Suppose $\sigma$ is drawn with probability proportional to $w(\cdot, \beta)$, $i \sim \text{Unif}(\{1, \ldots, n-1\})$, $C_1 \sim \text{Bern}(1/2)$, and $C_2 \sim \text{Bern}(1 + \beta - \lceil \beta \rceil)$. Then the output of ChainStep($\sigma$, $i$, $C_1$, $C_2$) is also distributed proportionally to $w(\cdot, \beta)$.

Proof. This follows from the reversibility (see for instance [12]) of the Markov chain with respect to $w$.

From this chain, it is possible to build a method for obtaining samples exactly from the target distribution. The method of coupling from the past (CFTP) was developed by Propp and Wilson [11] to draw samples exactly from the stationary distribution of Markov chains. For this problem, an extension called non-Markovian CFTP [6] is needed. The method works as follows. First, a bounding chain [5] is constructed for the chain in question. A bounding chain is an auxiliary chain on the set of subsets of the original state space. That is, $\Omega_{\text{bound}} = 2^\Omega$, where $\Omega$ is the state space of the original chain.
Moreover, there is a coupling between the original chain $\{\sigma_t\}$ and the bounding chain $\{S_t\}$ such that $\sigma_t$ evolves according to the kernel of the original chain, and $\sigma_t \in S_t \Rightarrow \sigma_{t+1} \in S_{t+1}$.

For us, the state of the bounding chain is indexed by a vector $B \in \{1, \ldots, n, \theta\}^n$. Let
$S(B) = \{\sigma : (\forall i, j, j')((B(j) = i) \wedge (\sigma(j') = i) \Rightarrow j' \leq j)\}.$
For instance, if $B(3) = 4$, then $\sigma \in S(B)$ requires that $\sigma(1) = 4$ or $\sigma(2) = 4$ or $\sigma(3) = 4$. In this setup $\theta$ is a special symbol: if $B(i) = \theta$, then there is no restriction on $\sigma$ whatsoever.

To visualize what is happening with the state and the bounding state, it will be useful to have a pictorial representation. For instance, $\sigma = (4, 2, 3, 1)$ and $B = (\theta, 4, 3, \theta)$ can be represented by:

4|θ  2|4  3|3  1|θ

The bounding state works by keeping track of the rightmost possible position of an item in the underlying state. If $B(i) = a$, say that bar $|a$ is at position $i$. To be a bounding state, if bar $|a$ is at position $i$, then item $a$ must be at a position in $\{1, 2, \ldots, i\}$.

Now suppose there is a single $|1$ at the rightmost position and all other positions contain $|\theta$. Then this state $B = (\theta, \ldots, \theta, 1)$ bounds all permutations. Next, suppose that there are no $|\theta$ anywhere in the bounding state. For instance, $B = (2, 4, 1, 3)$. Let $x$ be a state bounded by $B$. Then $B(1) = 2$ means that item 2 is in position 1. $B(2) = 4$ means that item 4 is in position 1 or 2. But item 2 is in position 1, so item 4 must be in position 2. Similarly, item 1 must be in position 3 and item 3 must be in position 4. In other words, if no component of $B$ is labeled $\theta$, then $S(B) = \{B\}$. In our example:

2|2  4|4  1|1  3|3

We are now ready to state the procedure for updating the current state and the bounding state simultaneously. This operates as in Algorithm 5.2.
Note that if the inputs to the algorithm have $i \sim \text{Unif}(\{1, \ldots, n-1\})$ and $C_1 \sim \text{Bern}(1/2)$, then the state $\sigma$ is updated using the same probabilities as the previous chain step. The key difference between how $\sigma$ and $B$ are updated is that if $\sigma(i) = B(i+1)$, then $\sigma$ is updated using the flipped coin $C_3 = 1 - C_1$; otherwise $C_3 = C_1$. In either case, since $C_1 \sim \text{Bern}(1/2)$, $C_3 \sim \text{Bern}(1/2)$ as well.

Algorithm 5.2 BoundingChainStep($\sigma$, $B$, $i$, $C_1$, $C_2$)
Input: current state and bounding state $(\sigma, B)$
Output: next state and bounding state $(\sigma, B)$
1: $C_3 \leftarrow (1 - C_1)\,\mathbb{1}(\sigma(i) = B(i+1)) + C_1\,\mathbb{1}(\sigma(i) \neq B(i+1))$
2: $\sigma \leftarrow$ ChainStep($\sigma$, $i$, $C_3$, $C_2$)
3: $B \leftarrow$ ChainStep($B$, $i$, $C_1$, $C_2$)
4: if $B(n) = \theta$ then
5:   $B(n) \leftarrow 1 + \#\{j : B(j) \neq \theta\}$
6: end if
7: Return $(\sigma, B)$

Note that $\sigma$ is being updated as in Algorithm 5.1. The only difference is the bounding state update. First, note that if $i + \lceil \beta \rceil \leq n$, then the rightmost position that item $i$ can occupy is $i + \lceil \beta \rceil$. Hence there should be a $|i$ at position $i + \lceil \beta \rceil$ or less.

Definition 2. A bounding state $B$ is $\beta$-tight if for all items $i$ with $i + \lceil \beta \rceil \leq n$, there exists $j \leq i + \lceil \beta \rceil$ such that $B(j) = i$.

Our main result is:

Theorem 1. If $\sigma \in S(B)$ for $B$ a $\beta$-tight bounding state, then after one step of Algorithm 5.2 the new state is still bounded by the new bounding state, regardless of the inputs $i$, $C_1$, and $C_2$.

Proof. When $C_1 = C_3 = 0$, neither the $\sigma$ state nor the $B$ state changes, and so the result is trivially true. Write $(\phi(\sigma), \phi(B))$ for the output of the algorithm, suppressing the dependence on $i$, $C_1$, and $C_2$. Given a permutation $x$, write $t(x, i)$ for the permutation where $x(i)$ and $x(i+1)$ have been transposed. In order for $\phi(\sigma) \notin S(\phi(B))$, there must be an item $a$ that moves to the right of the bar $|a$. If there is no $|a$ in the bounding state (so there does not exist $j$ with $B(j) = a$), then this trivially cannot happen.
Both bars and items can each move at most one step to the right or left. So if the position of $a$ is two or more to the left of the position of the bar $|a$, then a crossing also cannot happen in a single step.

With that in mind, suppose $a$ is exactly one position to the left of the bar $|a$. Then the only way that $a$ and $|a$ could cross is if $\sigma^{-1}(a) = i$, $B^{-1}(a) = i + 1$, $\phi(\sigma) = t(\sigma, i)$, and $\phi(B) = t(B, i)$. But when $\sigma(i) = a = B(i+1)$, $C_3 = 1 - C_1$, so either $\phi(\sigma) = \sigma$ or $\phi(B) = B$. So this bad case cannot occur.

Suppose $a$ and $|a$ are at the same position. If that position is $i$, then since $\phi(\sigma)$ and $\phi(B)$ are using the same inputs, and the weight factor incurred by moving $a$ to position $i + 1$ is the same for both, either both use the transpose or neither does. So either way no violation occurs.

The final possibility to consider is that $\sigma(i+1) = B(i+1) = a$. Is it possible for $|a$ to move one position to the left while $a$ stays at position $i + 1$? Fortunately, the answer is once again no. If $C_1 = 0$, then $C_3 = 0$, so both $\phi(\sigma) = \sigma$ and $\phi(B) = B$, and there is nothing to show. Suppose $C_1 = C_3 = 1$. Now consider the value of $\sigma(i)$. If $i = \sigma(i) + \lceil \beta \rceil$, then the transpose operation on $\sigma$ would move item $\sigma(i)$ too far to the right, and so $\phi(\sigma) = \sigma$. But in this case, since $B$ is $\beta$-tight, $i = B(i) + \lceil \beta \rceil$ as well, and so $\phi(B) = B$. Similarly, if $i = \sigma(i) + \lceil \beta \rceil - 1$, then either $B(i) = \sigma(i)$ or $B(i+1) = \sigma(i)$. But the $B(i+1) = \sigma(i)$ case was dealt with earlier, leaving the case $B(i) = \sigma(i)$. So now if $C_2 = 1$, then $\phi(\sigma) = t(\sigma, i)$ and $\phi(B) = t(B, i)$, and if $C_2 = 0$, then $\phi(\sigma) = \sigma$ and $\phi(B) = B$. Either way, they both move together. If $i < \sigma(i) + \lceil \beta \rceil - 1$, then $\phi(\sigma) = t(\sigma, i)$, so there can never be a violation.
Hence in all cases, $\sigma \in S(B) \Rightarrow \phi(\sigma) \in S(\phi(B))$. So if $\sigma$ is bounded by a $\beta$-tight $B$, it will still be bounded after taking one step in the bounding chain.

With this established, samples from the target distribution can be generated as in Algorithm 5.3 [6, 11] using non-Markovian CFTP.

Algorithm 5.3 Generate($t$)
Input: number of steps $t$ to use to generate a sample
Output: $\sigma$ drawn from the weighted distribution
1: $\sigma \leftarrow (1, 2, \ldots, n)$, $B \leftarrow (\theta, \ldots, \theta)$
2: for $i$ from 1 to $n - \lceil \beta \rceil$ do
3:   $B(i + \lceil \beta \rceil) \leftarrow i$
4: end for
5: $B_0 \leftarrow B$
6: for $j$ from 1 to $t$ do
7:   draw $i(j) \sim \text{Unif}(\{1, \ldots, n-1\})$, $C_1(j) \sim \text{Bern}(1/2)$, $C_2(j) \sim \text{Bern}(1 + \beta - \lceil \beta \rceil)$
8:   $(\sigma, B) \leftarrow$ BoundingChainStep($\sigma$, $B$, $i(j)$, $C_1(j)$, $C_2(j)$)
9: end for
10: if $B(i) \neq \theta$ for all $i$ then
11:   $\sigma \leftarrow B$
12: else
13:   $\sigma \leftarrow$ Generate($2t$), $B \leftarrow B_0$
14:   for $j$ from 1 to $t$ do
15:     $(\sigma, B) \leftarrow$ BoundingChainStep($\sigma$, $B$, $i(j)$, $C_1(j)$, $C_2(j)$)
16:   end for
17: end if

6 TPA for linear extensions

Now TPA can be applied to linear extensions. In the presentation earlier, given $X \sim w(\cdot)$, a single random variable $[Y \mid X] \sim \text{Unif}([0, w(X)])$ was used to make the joint distribution uniform. Since $w(x)$ has a product form, however, it is easier to generate $n$ different auxiliary random variables $Y_1, \ldots, Y_n$ to make it work. If $w(x) = \prod_{i \in [n]} w_i(x)$, let each $[Y_i \mid X]$ be independent and $\text{Unif}([0, w_i(X)])$.

Suppose that $Y_2 = 0.3$. Then if item 2 is 3 units to the right of its home position, that implies that $\beta \geq 2.3$. If item 2 is 2 units to the right of its home position, then $\beta \geq 1.3$; if it is 1 unit to the right of home, then $\beta \geq 0.3$. Finally, if item 2 is at home, then $\beta \geq 0$. In general, for item $j$ exactly $X^{-1}(j) - j$ units to the right of its home position,
$\beta \geq b_j = (X^{-1}(j) - j - 1 + Y_j)\,\mathbb{1}(X^{-1}(j) - j > 0).$
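In code, the update for $\beta$ implied by these inequalities can be sketched as follows (our own illustration; positions are 1-indexed):

```python
def next_beta(X, Y):
    """Return max over items j of (X^{-1}(j) - j - 1 + Y[j]) * 1(X^{-1}(j) - j > 0).

    X is a list mapping position -> item; Y maps item -> its auxiliary uniform.
    """
    beta = 0.0
    for j in range(1, len(X) + 1):
        dist = X.index(j) + 1 - j  # X^{-1}(j) - j
        if dist > 0:
            beta = max(beta, dist - 1 + Y[j])
    return beta
```

For instance, with item 1 three units to the right of home and $Y_1 = 0.3$, the item contributes $b_1 = 2.3$, matching the discussion above.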
So the next value of $\beta$ should be equal to the largest value of $b_j$. Since this maximum is taken over all items $j$, and $X$ is a permutation, the new value of $\beta$ can be set to
$\max_j (X^{-1}(j) - j - 1 + Y_j)\,\mathbb{1}(X^{-1}(j) - j > 0).$

7 Analysis

In this section we prove several results concerning the running time of the procedure outlined in the previous section.

Algorithm 6.1 TPALinearExtensions($r$)
Input: number of runs $r$
Output: $\hat{L}$ (estimate of $\#(L)$)
1: $k \leftarrow 0$
2: for $i$ from 1 to $r$ do
3:   $\beta \leftarrow n - 1$, $k \leftarrow k - 1$
4:   while $\beta > 0$ do
5:     $k \leftarrow k + 1$
6:     $X \leftarrow$ Generate(1)
7:     for $j$ from 1 to $n$ do
8:       draw $Y_j \sim \text{Unif}([0, w_j(X, \beta)])$
9:       let $\beta_j \leftarrow (X^{-1}(j) - j - 1 + Y_j)\,\mathbb{1}(X^{-1}(j) - j > 0)$
10:     end for
11:     $\beta \leftarrow \max_j \beta_j$
12:   end while
13: end for
14: $\hat{L} \leftarrow \exp(k/r)$

Theorem 2. The non-Markovian coupling from the past in Algorithm 5.3 requires an expected number of random bits bounded by $4.3 n^3 (\ln n)(\lceil \log_2 n \rceil + 3)$ and a number of comparisons bounded by $8.6 n^3 \ln n$.

Theorem 3. For $\epsilon \leq 1$, the two-phase TPA approach outlined at the end of Section 2 generates output $\hat{L}_2$ such that $P((1 + \epsilon)^{-1} \leq \hat{L}_2 / \#(L) \leq 1 + \epsilon) \geq 1 - \delta$.

Theorem 4. The expected number of random bits needed to approximate $\#(L)$ to within a factor of $1 + \epsilon$ with probability at least $1 - \delta$ is bounded above by
$4.3 n^3 (\ln n)(\lceil \log_2 n \rceil + 3)\left[2(A + 1)\ln(2/\delta) + 2(A + 1)(A + \sqrt{A} + 2)(\ln(1 + \epsilon)^2 - \ln(1 + \epsilon)^3)^{-1}\ln(4/\delta)\right],$
where $A = \ln \#(L)$.

Proof of Theorem 2. Lemma 10 of [6] showed that when there is no $\beta$ parameter, the expected number of steps taken by non-Markovian CFTP is bounded above by $4.3 n^3 \ln n$. So the question is: once the $\beta$ parameter falls below $n$, does the bound still hold? The bound was derived by considering how long it takes for the $|\theta$ values in the bounding state to disappear.
Each time a $|\theta$ reaches position $n$, it is removed and replaced by something of the form $|a$. When all the $|\theta$ disappear, the process in Algorithm 5.3 terminates. When there is no $\beta$, the probabilities that a particular $|\theta$ bound moves to the left or to the right are equal: both $1/(2n)$. (This does not apply when the bound is at position 1, in which case the bound cannot move to the left.) The result in [6] is really a bound on the number of steps in a simple random walk necessary for the $|\theta$ bounds to all reach position $n$.

Now suppose that $\beta \in (0, n)$. The probability that a $|\theta$ bound moves to the right is still $1/(2n)$, but now consider when the state is of the form $\ldots |a\ |\theta \ldots$. For $|\theta$ to move left, the $|a$ has to move right, and this occurs with probability $(1 + \beta - \lceil \beta \rceil)/(2n)$. That is, with $\beta \in (0, n)$, the chance that the $|\theta$ moves left can be below $1/(2n)$. This can only reduce the number of moves necessary for the $|\theta$ bounds to reach the right hand side! That is, the random variable that is the number of steps needed for all the $|\theta$ bounds to reach position $n$ and disappear is dominated by the same random variable for $\beta = n$. Hence the bound obtained by Lemma 10 of [6] still holds.

Now to the random bits. Drawing a uniform number from $\{1, \ldots, n\}$ takes $\lceil \log_2 n \rceil$ bits, while drawing from $\{0, 1\}$ for coin $C_1$ takes one bit. The expected number of bits needed to draw a Bernoulli random variable with parameter not equal to $1/2$ is two, and so the total number of bits needed for one step of the process (in expectation) is $\lceil \log_2 n \rceil + 3$. Each step in the bounding chain and state uses at most two comparisons.

It will be helpful in proving Theorem 3 to have the following bound on the tail of the Poisson distribution.

Lemma 2.
For $X \sim \text{Pois}(\mu)$ and $0 \leq a \leq \mu$,
$P(X \geq \mu + a) \leq \exp(-(1/2)a^2/\mu + (1/2)a^3/\mu^2)$
and
$P(X \leq \mu - a) \leq \exp(-a^2/(2\mu)).$

Proof. These follow from Chernoff bounds, which are essentially Markov's inequality applied to the moment generating function of the random variable. The moment generating function of $X$ is $E[\exp(tX)] = \exp(\mu(e^t - 1))$. So for $a > 0$,
$P(X \geq \mu + a) = P(\exp(tX) \geq \exp(t(\mu + a))) \leq \exp(\mu(e^t - 1) - t(\mu + a)).$
Setting $t = \ln(1 + a/\mu)$ minimizes the right hand side and yields
$P(X \geq \mu + a) \leq \exp(a - (\mu + a)\ln(1 + a/\mu)).$
For $0 \leq a \leq \mu$, $-\ln(1 + a/\mu) \leq -a/\mu + (1/2)a^2/\mu^2$, so
$P(X \geq \mu + a) \leq \exp(-(1/2)a^2/\mu + (1/2)a^3/\mu^2)$
as desired. For the second result,
$P(X \leq \mu - a) = P(\exp(-tX) \geq \exp(-t(\mu - a))) \leq \exp(\mu(e^{-t} - 1) + t(\mu - a)).$
Setting $t = -\ln(1 - a/\mu)$ makes the exponent $-a - (\mu - a)\ln(1 - a/\mu)$. Since $\ln(1 - s) \leq -s$ for $s \in [0, 1)$, integrating gives
$-a - (\mu - a)\ln(1 - a/\mu) = \mu \int_0^{a/\mu} \ln(1 - s)\,ds \leq \mu \int_0^{a/\mu} (-s)\,ds = -a^2/(2\mu),$
which completes the proof.

Proof of Theorem 3. Consider the first phase of the algorithm, where TPA is run with $r_1 = 2\ln(2/\delta)$. Consider the probability of the event $\{\hat{A}_1 + \sqrt{\hat{A}_1} + 2 < A\}$. This event cannot happen if $A \leq 2$. If $A > 2$, then this event occurs when $\hat{A}_1 < A - (3/2) - \sqrt{A - 7/4}$. Since $r_1 \hat{A}_1 \sim \text{Pois}(r_1 A)$, Lemma 2 can be used to say that
$P(r_1 \hat{A}_1 < r_1 A - r_1(3/2 + \sqrt{A - 7/4})) \leq \exp(-(1/2)(r_1(3/2 + \sqrt{A - 7/4}))^2/(r_1 A)) = \exp(-(1/2)r_1(9/4 + 3\sqrt{A - 7/4} + A - 7/4)/A) \leq \exp(-(1/2)r_1) = \delta/2.$
In other words, with probability at least $1 - \delta/2$, $\hat{A}_1 + \sqrt{\hat{A}_1} + 2 \geq A$. Now consider the second phase.
To simplify the notation, let $\epsilon' = \ln(1 + \epsilon)$, and let $\hat{A}_2 = \ln(\hat{L}_2)$, where $\hat{L}_2$ is the output from the second phase. From the first phase, $r_2 \geq 2A(\epsilon'^2 - \epsilon'^3)^{-1}\ln(4/\delta)$ with probability at least $1 - \delta/2$. So from Lemma 2,
$P(r_2 \hat{A}_2 \geq r_2 A + r_2 \epsilon') \leq \exp(-(1/2)(r_2 \epsilon')^2/(r_2 A) + (1/2)(r_2 \epsilon')^3/(r_2 A)^2) = \exp(-(1/2)r_2 \epsilon'^2/A + (1/2)r_2 \epsilon'^3/A^2) \leq \exp(-\ln(4/\delta)) = \delta/4.$
A similar bound holds for the left tail:
$P(r_2 \hat{A}_2 \leq r_2 A - r_2 \epsilon') \leq \exp(-(1/2)(r_2 \epsilon')^2/(r_2 A)) \leq \delta/4.$
Therefore, the total probability that failure occurs in either the first phase or the second is at most $\delta/2 + \delta/4 + \delta/4 = \delta$. If $r_2 \hat{A}_2$ is within additive error $r_2 \epsilon' = r_2 \ln(1 + \epsilon)$ of $r_2 A$, then $\hat{L}_2 = \exp(\hat{A}_2)$ is within a factor of $1 + \epsilon$ of $\exp(A)$, showing the result.

To bound the expected running time, the following loose bound on the expected value of the square root of a Poisson random variable is useful.

Lemma 3. For $X \sim \text{Pois}(\mu)$, $E[\sqrt{X}] \leq \sqrt{\mu}$.

Proof. Since $\sqrt{x}$ is a concave function, this follows from Jensen's inequality.

Proof of Theorem 4. From Theorem 2, the expected number of bits per sample is bounded by $4.3 n^3 (\ln n)(\lceil \log_2 n \rceil + 3)$ and does not depend on the sample. Hence the total number of expected bits can be bounded by the expected number of bits per sample times the expected number of samples. The first phase of TPA uses $r_1 = 2\ln(2/\delta)$ runs, each with an expectation of $A + 1$ samples per run, for $r_1(A + 1)$ expected samples. The second phase uses $r_2 = 2(\hat{A}_1 + \sqrt{\hat{A}_1} + 2)[\ln(1 + \epsilon)^2 - \ln(1 + \epsilon)^3]^{-1}\ln(4/\delta)$ runs, where $r_1 \hat{A}_1 \sim \text{Pois}(r_1 A)$. So from Lemma 3,
$E[\sqrt{\hat{A}_1}] = r_1^{-1/2} E[\sqrt{r_1 \hat{A}_1}] \leq r_1^{-1/2}\sqrt{r_1 A} = \sqrt{A}.$
Using $A = \ln \#(L)$ and then combining these factors yields the result.
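As a quick numerical sanity check of Lemma 2 (an illustration of ours, not part of the proof), the upper tail bound can be compared with the exact tail obtained by summing the Poisson pmf:

```python
import math

def poisson_upper_tail(mu, k0, terms=2000):
    """P(X >= k0) for X ~ Pois(mu), by direct summation of the pmf."""
    log_p = -mu + k0 * math.log(mu) - math.lgamma(k0 + 1)  # log pmf at k0
    p = math.exp(log_p)
    total = 0.0
    for k in range(k0, k0 + terms):
        total += p
        p *= mu / (k + 1)  # pmf recursion: p(k+1) = p(k) * mu / (k+1)
    return total

mu, a = 50.0, 10
tail = poisson_upper_tail(mu, int(mu) + a)  # P(X >= 60)
bound = math.exp(-0.5 * a**2 / mu + 0.5 * a**3 / mu**2)
```

Here the bound evaluates to $e^{-0.8} \approx 0.449$, comfortably above the exact tail.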
8 Conclusion

TPA is a sharp improvement on the self-reducibility method of Jerrum et al. [8] for estimating the size of a set. At first glance, the continuity requirement of TPA precludes its use for discrete problems such as linear extensions. Fortunately, discrete problems can usually be embedded in a continuous space to make the use of TPA possible. Here we have shown how to accomplish this task in such a way that the time needed to take samples is the same as for uniform generation. The result is an algorithm that is much faster at estimating the number of linear extensions than previously known algorithms.

References

[1] N. Alon and J. H. Spencer. The Probabilistic Method. Wiley, 2008.
[2] G. Brightwell and P. Winkler. Counting linear extensions. Order, 8(3):225–242, 1991.
[3] P. C. Fishburn and W. V. Gehrlein. A comparative analysis of methods for constructing weak orders from partial orders. J. Math. Sociology, 4:93–102, 1975.
[4] G. S. Fishman. Choosing sample path length and number of sample paths when starting in the steady state. Oper. Res. Letters, 16:209–219, 1994.
[5] M. Huber. Perfect sampling using bounding chains. Annals of Applied Probability, 14(2):734–753, 2004.
[6] M. Huber. Fast perfect sampling from linear extensions. Discrete Mathematics, 306:420–428, 2006.
[7] M. L. Huber and S. Schott. Using TPA for Bayesian inference. Bayesian Statistics 9, pages 257–282, 2010.
[8] M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci., 43:169–188, 1986.
[9] A. Karzanov and L. Khachiyan. On the conductance of order Markov chains. Order, 8(1):7–15, 1991.
[10] J. Morton, L. Pachter, A. Shiu, B. Sturmfels, and O. Wienand. Convex rank tests and semigraphoids. SIAM J. Discrete Math., 23(2):1117–1134, 2009.
[11] J. G. Propp and D. B. Wilson.
Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms, 9(1–2):223–252, 1996.
[12] S. Resnick. Adventures in Stochastic Processes. Birkhäuser, 1992.

A R code

The following R code implements these algorithms.

chain.step <- function(state, beta, i, c1, c2, posetmatrix) {
  # Takes one step in the Karzanov-Khachiyan chain using randomness in i, c1, c2
  # Assumes that home state is the identity permutation
  # Line 2 of Algorithm 5.1
  d <- i + 1 - state[i]
  # Line 3 of Algorithm 5.1
  if ((state[i] <= length(state)) && (state[i+1] <= length(state)))
    posetflag <- posetmatrix[state[i], state[i+1]]
  else
    posetflag <- 0
  if ((c1 == 1) && (posetflag == 0) && (d <= ceiling(beta)) &&
      ((d < beta) || (c2 == 1))) {
    a <- state[i+1]; state[i+1] <- state[i]; state[i] <- a
  }
  return(state)
}

bounding.chain.step <- function(cstate, beta, i, c1, c2, posetmatrix) {
  # Based on Algorithm 5.2
  # Here cstate is a matrix with two rows: the first is the underlying state,
  # while the second is the bounding state
  n <- dim(cstate)[2]
  # Line 1
  if (cstate[2, i+1] == cstate[1, i]) c3 <- 1 - c1 else c3 <- c1
  # Lines 2 & 3
  cstate[1, ] <- chain.step(cstate[1, ], beta, i, c3, c2, posetmatrix)
  cstate[2, ] <- chain.step(cstate[2, ], beta, i, c1, c2, posetmatrix)
  # Lines 4 through 6
  if (cstate[2, n] == (n+1)) cstate[2, n] <- 1 + sum(cstate[2, ] <= n)
  # Line 7
  return(cstate)
}

generate <- function(t, beta, posetmatrix) {
  n <- dim(posetmatrix)[1]; beta <- min(beta, n-1)
  # Line 1
  sigma <- 1:n; B <- rep(n+1, n)
  # Lines 2 through 4
  for (item in 1:(n - ceiling(beta)))
    B[item + ceiling(beta)] <- item
  B0 <- B
  cstate <- matrix(c(sigma, B), byrow = TRUE, nrow = 2)
  # Lines 5 through 8
  i <- floor(runif(t)*(n-1)) + 1; c1 <- rbinom(t, 1, 1/2)
  c2 <- rbinom(t, 1, 1 + beta - ceiling(beta))
  for (s in 1:t) {
    cstate <- bounding.chain.step(cstate, beta, i[s], c1[s], c2[s], posetmatrix)
  }
  # Line 9
  if (sum(cstate[2, ] < (n+1)) == n) return(cstate[2, ])
  # Lines 11-15
  else {
    cstate[1, ] <- generate(2*t, beta, posetmatrix)
    cstate[2, ] <- B0
    for (s in 1:t)
      cstate <- bounding.chain.step(cstate, beta, i[s], c1[s], c2[s], posetmatrix)
  }
  return(cstate[1, ])
}

approximate.sample <- function(n, beta, posetmatrix, tvdist) {
  # Generates an approximate sample from the target distribution
  x <- 1:n
  n <- length(x)
  t <- 10*n^3*log(n)*log(1/tvdist)
  for (s in 1:t) {
    i <- floor(runif(1)*(n-1)) + 1
    c1 <- rbinom(1, 1, 1/2)
    c2 <- rbinom(1, 1, 1 + beta - ceiling(beta))
    x <- chain.step(x, beta, i, c1, c2, posetmatrix)
  }
  return(x)
}

checksum <- function(x) {
  # Maps a permutation bijectively to an integer in 1:factorial(n)
  checksum <- 0
  n <- length(x)
  for (i in n:1) {
    onespot <- which(x == 1)
    checksum <- checksum + factorial(i-1)*(onespot-1)
    x <- x[-onespot] - 1
  }
  return(checksum + 1)
}

count.perfect.linear.extensions <- function(n = 4, beta = 4, posetmatrix, trials = 100) {
  # Generates a number of linear extensions, then counts the results
  results <- rep(0, factorial(n))
  # Burn in to an approximate sample
  x <- 1:n
  n <- length(x)
  # Take data
  for (i in 1:trials) {
    x <- generate(1, beta, posetmatrix)
    cs <- checksum(x)
    results[cs] <- results[cs] + 1
  }
  return(results/trials)
}

tpa.count.linear.extensions <- function(r, posetmatrix) {
  # Algorithm 6.1
  # Returns an estimate of the number of linear
  # extensions consistent with posetmatrix
  require(Matrix)
  n <- dim(posetmatrix)[1]
  # Line 1
  k <- 0
  # Lines 2 through 12
  for (i in 1:r) {
    beta <- n - 1; k <- k - 1
    while (beta > 0) {
      k <- k + 1
      x <- generate(1, beta, posetmatrix)
      xinv <- invPerm(x)
      betastep <- rep(0, n); y <- rep(0, n)
      for (j in 1:n) {
        y[j] <- runif(1)*((1 + beta - ceiling(beta))*(xinv[j] - j == ceiling(beta)) +
                            ((xinv[j] - j) < ceiling(beta)))
        betastep[j] <- (xinv[j] - j - 1 + y[j])*(xinv[j] - j > 0)
      }
      beta <- max(betastep)
      # cat(" X: ", x, "\n X^{-1}: ", xinv, "\n Y: ", y, "\n betastep: ",
      #     betastep, "\n beta: ", beta, "\n")
    }
  }
  cat(" Estimate: [", exp((k - 2*sqrt(k))/r), ",", exp((k + 2*sqrt(k))/r), "]\n")
  return(exp(k/r))
}

tpa.approximation <- function(posetmatrix, epsilon, delta) {
  # Gives an $(\epsilon,\delta)$-ras for the number of linear extensions
  r1 <- ceiling(2*log(2/delta))
  a1 <- tpa.count.linear.extensions(r1, posetmatrix)
  a1 <- log(a1)
  r2 <- ceiling(2*(a1 + sqrt(a1) + 2)*(log(1+epsilon)^2 - log(1+epsilon)^3)^(-1)*log(4/delta))
  a2 <- tpa.count.linear.extensions(r2, posetmatrix)
  return(a2)
}

brute.force.count.linear.extensions <- function(posets) {
  # Counts the number of linear extensions of a poset by direct
  # enumeration of all n! permutations, checking each to see
  # if it is a linear extension
  #
  # The poset is given as an n by n matrix whose (i,j)th entry
  # is the indicator function of $i \preceq j$
  require(gtools)
  n <- dim(posets)[1]
  A <- permutations(n, n)
  nfact <- nrow(A)
  le.flag <- rep(1, nfact)
  count <- 0
  for (i in 1:nfact) {
    for (a in 1:(n-1)) for (b in (a+1):n) {
      le.flag[i] <- le.flag[i]*(1 - posets[A[i,b], A[i,a]])
    }
    count <- count + le.flag[i]
  }
  return(count)
}

poset1 <- matrix(c(1,0,1,0,
                   0,1,0,1,
                   0,0,1,0,
                   0,0,0,1), byrow = TRUE, ncol = 4)
poset2 <- matrix(c(1,0,1,1,1,1,1,1,
                   0,1,0,1,0,1,1,1,
                   0,0,1,0,1,1,0,1,
                   0,0,0,1,0,1,1,1,
                   0,0,0,0,1,0,0,0,
                   0,0,0,0,0,1,0,1,
                   0,0,0,0,0,0,1,1,
                   0,0,0,0,0,0,0,1), byrow = TRUE, ncol = 8)
poset3 <- t(matrix(c(1,1,0,1,
                     0,1,0,1,
                     0,0,1,1,
                     0,0,0,1), nrow = 4))
poset4 <- t(matrix(c(1,0,1,1,1,1,
                     0,1,0,1,1,1,
                     0,0,1,0,1,0,
                     0,0,0,1,1,1,
                     0,0,0,0,1,0,
                     0,0,0,0,0,1), nrow = 6))
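As a cross-check on the appendix's brute.force.count.linear.extensions, the same enumeration is easy to express in Python (our translation, 0-indexed; poset1 below is the first example matrix from the appendix, which encodes two disjoint 2-chains):

```python
from itertools import permutations

def count_linear_extensions(poset):
    """Count permutations respecting the partial order (our Python translation).

    poset[a][b] == 1 means a precedes-or-equals b (0-indexed).
    """
    n = len(poset)
    count = 0
    for perm in permutations(range(n)):
        # perm is a linear extension iff no later item precedes an earlier one
        if all(poset[perm[j]][perm[i]] == 0
               for i in range(n) for j in range(i + 1, n)):
            count += 1
    return count

# poset1 from the appendix: two disjoint 2-chains, so C(4, 2) = 6 extensions
poset1 = [[1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0],
          [0, 0, 0, 1]]
print(count_linear_extensions(poset1))  # 6
```

Such exhaustive counts are only feasible for small n, which is exactly why the TPA-based estimator is needed, but they are useful for validating the sampler on toy posets.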
