Interchanging distance and capacity in probabilistic mappings

Reid Andersen∗ Uriel Feige†

October 4, 2018

Abstract

Harald Räcke [STOC 2008] described a new method to obtain hierarchical decompositions of networks in a way that minimizes the congestion. Räcke's approach is based on an equivalence that he discovered between minimizing congestion and minimizing stretch (in a certain setting). Here we present Räcke's equivalence in an abstract setting that is more general than the one described in Räcke's work, and clarifies the power of Räcke's result. In addition, we present a related (but different) equivalence that was developed by Yuval Emek [ESA 2009] and is only known to apply to planar graphs.

1 Introduction

In this manuscript we present results of a manuscript of Harald Räcke titled “Optimal hierarchical decompositions for congestion minimization in networks” [16]. Our presentation is more modular than the original presentation of Räcke in that it separates the existential aspects of Räcke's result from the algorithmic aspects. The existential results are presented in a more abstract setting that allows the reader to appreciate the generality of Räcke's result. Our presentation is also more careful not to lose on the tightness of the parameters (e.g., not to give away constant multiplicative factors). For slides of a talk based on this manuscript see [10].

Our manuscript is organized as follows. In Section 2 we discuss the optimization problem of min-bisection. Achieving an improved approximation ratio for this problem is one of the results of [16], and we use this problem as a motivation for the main results that follow.
In Section 3 we present in an abstract setting what we view as Räcke's main result, namely, an equivalence between two types of probabilistic embeddings, one concerned with faithfully representing distances and the other with faithfully representing capacities. In Section 4 we briefly discuss algorithmic versions of the existential result. In Section 5 we show how the machinery developed leads to an approximation algorithm for min-bisection. In Section 6 we present results related to the main theme of this manuscript, but that do not appear in [16]. These results concern an equivalence between deterministic embeddings in planar graphs, and were first developed and used by Yuval Emek in [8].

∗ Microsoft Live Labs, One Microsoft Way, Redmond, WA 98052. reidan@microsoft.com.
† Department of Computer Science and Applied Mathematics, Weizmann Institute, Rehovot 76100, Israel. uriel.feige@weizmann.ac.il. Supported in part by The Israel Science Foundation (grant No. 873/08).

2 Min-bisection

In the min-bisection problem, the input is a graph with an even number $n$ of vertices. In the weighted version of the problem, edges have arbitrary nonnegative weights, whereas in the unweighted version, the weight of every edge is 1. A bisection of the graph is a partition of the set of vertices into two sets of equal size. The width of a bisection is the total weight of the edges that are cut (an edge is cut if its endpoints are on different sides of the partition). Min-bisection asks for a bisection of minimum width. This problem is NP-hard.

One line of research dealing with the NP-hardness of min-bisection offers a bi-criteria approximation. Namely, it is concerned with developing algorithms that produce a partition of the graph into nearly equal parts (rather than exactly equal parts), such that the width of the partition is not much larger than the width of the minimum bisection.
The methodology used by these algorithms was developed in a sequence of papers and currently allows one to efficiently find a near bisection (e.g., each of the two parts has at least one third of the vertices) whose width is within a multiplicative factor of $O(\sqrt{\log n})$ of the width of the minimum bisection. The methodology used by these papers is related to the theme of the current manuscript. It also uses an interplay between distance and capacity. We briefly explain this interplay, and refer the readers to [14, 15, 4] for more details.

For simplicity, assume that the input graph is a complete graph, by replacing non-edges by edges of weight 0. View the weight of an edge in a min-bisection problem as specifying its capacity. The width of a bisection is the total capacity of its associated cut. The first phase of the bicriteria approximation algorithm involves solving some linear program LP (in [14, 15]) or semidefinite program SDP (in [4]). The output of this mathematical program can be thought of as a fractional cut in the following sense: edges are assigned lengths, and the longer an edge is, the larger the fraction of it that belongs to the cut. Naturally, for the fractional solution to have small value, the LP (or SDP) will try to assign short lengths to edges of high capacity. (This is a theme that will reappear in the proof of Theorem 6.) Thereafter, this fractional solution is rounded to give a near bisection of width not much larger than the value of the fractional solution. The rounding procedure is more likely to cut the long edges than the short edges. Or equivalently, two vertices of short distance from each other (where distance is measured by the sum of edge lengths along a shortest path) are likely to fall in the same side of the partition.
Hence to find a near bisection of small capacity, the bicriteria approximation algorithms introduce an intermediate notion of distance, and carefully choose (as a solution to the LP or SDP) a distance function that interacts well with the capacities of the edges. In fact, there is a formal connection between the distortion of this distance function in comparison to $\ell_1$ distances and the approximation ratio (in terms of minimizing capacity) that one gets from this methodology. (See [15] for an exact statement.)

To move from a bicriteria approximation to a true approximation (in which the output is a true bisection, and approximation is only in the sense that the width is not necessarily optimal), it appears unavoidable that one should use dynamic programming in some way. Consider the task of determining whether a graph has a bisection of width 0. Such a bisection exists if and only if there is a set of connected components of the graph whose total size is exactly $n/2$. Determining whether this is the case amounts to solving a subset sum problem (with sizes of connected components serving as input to the subset sum problem), and the only algorithm known to solve subset sum (in time polynomial in the numbers involved) is dynamic programming. In [11], the techniques used in the bicriteria approximation were combined with a dynamic programming approach to produce a true approximation for min-bisection with approximation ratio $O((\log n)^{3/2})$ (obtained as the bicriteria approximation ratio times $O(\log n)$). Here we shall describe Räcke's approach that gives a better approximation ratio, $O(\log n)$.

Trees (and more generally, graphs of bounded treewidth, though this subject is beyond the scope of the current manuscript) form a family of graphs on which many NP-hard problems can be solved using dynamic programming.
In particular, min bisection can be solved in polynomial time on trees. This suggests the following plan for approximating min-bisection on general graphs, which is presented here using terms that are suggestive but have not been defined yet. First, find a “low distortion embedding” of the input graph into a tree. Then solve min bisection optimally on the tree, using dynamic programming. The solution will induce a bisection on the original graph, and the approximation ratio will be bounded by the distortion of the embedding into the tree.

The plan as described above has certain drawbacks. One is that the distortion when embedding a general graph into a tree might be very large (e.g., the distance distortion for an $n$-cycle). This problem has been addressed in a satisfactory way in previous work [2, 5]. Rather than embed the graph in one tree, one finds a probabilistic embedding into a family of (dominating) trees (the requirement that the trees be dominating is a technical requirement that will be touched upon in Section 5), and considers average distortion (averaged over all trees). When the objective function is linear (as in the case of min bisection), the probabilistic notion of embedding suffices. Hence the modified plan is as follows. Find a low distortion probability distribution over embeddings of the input graph into (dominating) trees. Then solve min bisection optimally on each of these trees. Each of these solutions will induce a bisection on the original graph, and for the best of them the approximation ratio will be bounded by the average distortion of the probabilistic embedding into trees.

The known probabilistic embeddings of graphs into trees are tailored to minimize average distortion, where the aspect that is being distorted is distance between vertices. However, for the intended application of min bisection, the aspect that interests us is the capacity of cuts rather than distances. Hence Räcke's approach for approximating min-bisection is as described above, but with the distinction that the distortion of embeddings is measured with respect to capacity rather than distance. To implement this approach, one needs to design probabilistic embeddings with low capacity distortion. Here is an informal statement of Räcke's result in this respect.

Theorem 1. For every graph on $n$ vertices there is a probability distribution of embeddings into (dominating) trees with $O(\log n)$ average distortion of the capacity. Moreover, these embeddings can be found in polynomial time.

This theorem is analogous to known theorems regarding the distortion of distances in probabilistic embeddings [9]. Räcke's proof of Theorem 1 is by a reduction between these two types of embeddings. The existence of such a reduction may not be unexpected (as an afterthought), because as we have seen in the bicriteria approximation algorithms, there are certain correspondences between capacity and distance.

As a direct consequence of the theorem above and the modified plan for approximating min bisection, one obtains an $O(\log n)$ approximation for min bisection. This will be discussed in more detail in Section 5.

3 An abstract setting

In this section we present an abstract setting that as a special case will lead to the existential component of Theorem 1.

3.1 Definitions

Let $E$ be a set (of edges) and $\mathcal{P}$ a collection of nonempty multisets of $E$ (that we call paths). A mapping $M : E \to \mathcal{P}$ maps to every edge $i \in E$ a path $P \in \mathcal{P}$. It will be convenient to represent a mapping by a matrix $M$, where $M_{ij}$ counts the number of times the edge $j$ lies on the path $M(i)$.

Spanning tree example. $E$ is a set of edges of a connected graph $G$.
Consider a spanning tree $T$ of $G$, and let $\mathcal{P}$ be the set of simple paths in $T$. Then there is a natural mapping $M$ that maps every edge $(i, j) \in E$ to the set of edges that form the unique simple path between vertices $i$ and $j$ in $T$.

Tree embedding example. $E$ is a set of edges of a connected graph $G$. Consider an arbitrary tree $T$ defined over the same set of vertices as $G$ (edges of $T$ need not be edges of $G$). As in the spanning tree example above, there is a natural mapping from edges of $G$ to paths in $T$. However, this is not a mapping in the sense defined above, because edges of $T$ are not necessarily edges of $G$. To remedy this situation, we represent each edge $(i, j)$ of $T$ by a set of edges that form a simple path between $i$ and $j$ in $G$. This representation is not unique (there may be many simple paths between $i$ and $j$ in $G$), and hence some convention is used to specify one such path uniquely (for example, one may take a shortest path, breaking ties arbitrarily). Hence now each edge of $T$ corresponds to a set of edges in $G$, and each simple path in $T$ corresponds to a collection of several such sets. Now $M$ can be the mapping that maps each edge $(i, j)$ in $G$ to a multiset of edges of $G$ that is obtained by joining together (counting multiplicities) the sets of edges of $G$ that form the paths that correspond to the edges of $T$ that lie along the simple path connecting $i$ and $j$ in $T$.

Graph embedding example. This is a generalization of the tree embedding example. $E$ is a set of edges of a connected graph $G$ on $n$ vertices, and $H$ is an arbitrary different graph defined on the same set of vertices. Now an edge $(i, j) \in E$ is mapped to some path (using a convention such as that of taking a shortest path) that connects $i$ and $j$ in $H$, and this path in $H$ is represented as a multiset of edges in $G$ (as in the tree embedding example). This defines a mapping $M$.
Natural alternative versions of the mapping reduce the multiset to a set, either by removing multiplicities of edges in the multiset, or by more extensive processing (e.g., if the multiset corresponded to a nonsimple path in $G$ that contains cycles, these cycles may possibly be removed).

Hypergraph example. $E$ is the set of hyperedges of a connected 3-uniform hypergraph $H$. $T$ is a spanning tree of the hypergraph $H$, in the sense that it is defined on the same set of vertices as $H$, and every edge $(i, j)$ of $T$ is labeled by some hyperedge of $H$ that contains vertices $i$ and $j$. Then a hyperedge $\{i, j, k\}$ can be mapped to the set of hyperedges that label the set of edges along the paths that connect $i$, $j$ and $k$ in $T$.

We note that the tree embedding example is essentially the setting in Räcke's work [16]. However, the spanning tree example suffices in order to illustrate the main ideas in Räcke's work.

Let $\mathcal{M}$ be a family of admissible mappings. A probabilistic mapping between $E$ and $\mathcal{M}$ is a probability distribution over mappings $M \in \mathcal{M}$. That is, with every $M \in \mathcal{M}$ we associate some $\lambda_M \ge 0$, with $\sum_M \lambda_M = 1$. We shall consider probabilistic mappings in two different contexts.

Definition 2 (Distance mapping). Every edge $i$ has a positive length $\ell_i$ associated with it. We let $\mathrm{dist}_M(i)$ denote the length of the path $M(i)$, namely $\mathrm{dist}_M(i) = \sum_j M_{ij} \ell_j$. The stretch of an edge $i$ is $\mathrm{dist}_M(i) / \ell_i$. The average stretch of an edge in a probabilistic mapping is the weighted average (weighted according to $\lambda_M$) of the stretches of the edge. The stretch of a probabilistic mapping is the maximum over all edges of their average stretches.

The stretch of a particular edge may be smaller than 1. However, the stretch of the shortest edge will always be at least 1.
Probabilistic distance mappings were considered in [2, 7, 1] for the spanning tree example, and in [5, 9] for the tree embedding example.

Definition 3 (Capacity mapping). Every edge $i$ has a positive capacity $c_i$ associated with it. We let $\mathrm{load}_M(j)$ denote the sum (with multiplicities) of capacities of edges whose path under $M$ contains $j$. Namely, $\mathrm{load}_M(j) = \sum_i M_{ij} c_i$. The congestion of an edge $j$ is $\mathrm{load}_M(j) / c_j$. The average congestion of an edge in a probabilistic mapping is the weighted average (weighted according to $\lambda_M$) of the congestions of the edge. The congestion of a probabilistic mapping is the maximum over all edges of their average congestions.

The congestion of an edge may be smaller than 1. However, the sum of all capacities in $M(E)$ is at least as large as the sum of all capacities in $E$, implying that the congestion of a probabilistic mapping is always at least 1.

For concreteness, let us present the notions of distance, stretch, load and congestion as applied to the spanning tree example. Consider a connected graph $G$ in which every edge $e = (i, j)$ has a positive length $\ell_e$ and a positive capacity $c_e$. Consider an arbitrary spanning tree $T$ of $G$, and the mapping from $G$ to $T$ described in the spanning tree example above. Then the distance of edge $e$ is the sum of lengths of edges along the unique simple path that connects vertices $i$ and $j$ in $T$. The stretch of $e$ is then the ratio between this distance and $\ell_e$. The load on edge $e$ is 0 if $e$ is not part of the spanning tree. However, if $e$ is part of the spanning tree, the load is computed as follows. Removing $e$, the tree $T$ decomposes into two trees, one containing vertex $i$ (that we call $T_i$) and the other containing vertex $j$ (that we call $T_j$). The load of $e$ is the sum of capacities of all edges (including $e$ itself) that have one endpoint in $T_i$ and the other in $T_j$.
The congestion of $e$ is the ratio between the load and $c_e$.

3.2 Probabilistic mappings as zero-sum games

We shall use the following standard consequence of the minimax theorem for zero sum games (as in [2]).

Lemma 4. For every $\rho \ge 1$ and every family of admissible mappings $\mathcal{M}$, there is a probabilistic mapping with stretch at most $\rho$ if and only if for every nonnegative coefficients $\alpha_i$, there is a mapping $M \in \mathcal{M}$ such that $\sum_i \alpha_i \frac{\mathrm{dist}_M(i)}{\ell_i} \le \rho \sum_i \alpha_i$.

Proof. Consider a zero sum game in which the player MAP chooses an admissible mapping $M$, and the player EDGE chooses an edge $i$. The value of the game for EDGE is the stretch of $i$ in the mapping, and hence EDGE wishes to maximize the stretch whereas MAP wishes to minimize it. A probabilistic mapping is a randomized strategy for MAP. Choosing nonnegative coefficients $\alpha_i$ (and scaling them so that $\sum_i \alpha_i = 1$) is a randomized strategy for EDGE.

The “only if” direction. If there is no randomized strategy for MAP forcing an expected value at most $\rho$, then the minimax theorem implies that there must be a randomized strategy for EDGE that enforces an expected value more than $\rho$, regardless of which mapping player MAP chooses to play.

The “if” direction. If there is no randomized strategy for EDGE forcing an expected value larger than $\rho$, then the minimax theorem implies that there must be a randomized strategy for MAP that enforces an expected value of at most $\rho$, regardless of which edge player EDGE chooses to play. □

Lemma 5. For every $\rho \ge 1$ and every family of admissible mappings $\mathcal{M}$, there is a probabilistic mapping with congestion at most $\rho$ if and only if for every nonnegative coefficients $\beta_i$, there is a mapping $M \in \mathcal{M}$ such that $\sum_i \beta_i \frac{\mathrm{load}_M(i)}{c_i} \le \rho \sum_i \beta_i$.

The proof of Lemma 5 is similar to the proof of Lemma 4, and hence is omitted.
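As a concrete companion to Definitions 2 and 3 and to the quantities that appear in Lemmas 4 and 5, the following minimal Python sketch builds the matrix $M$ for the spanning tree example and evaluates stretch and congestion from it. The function names (`tree_path`, `mapping_matrix`, `stretches`, `congestions`) and the 4-cycle instance are illustrative assumptions and do not come from the manuscript.

```python
def tree_path(tree_edges, edges, src, dst):
    """Indices of the edges on the unique src-dst path in the spanning tree."""
    adj = {}
    for idx in tree_edges:
        u, v = edges[idx]
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    # Depth-first search; in a tree it is enough to avoid stepping back to the parent.
    stack = [(src, None, [])]
    while stack:
        node, parent, path = stack.pop()
        if node == dst:
            return path
        for nxt, idx in adj.get(node, []):
            if nxt != parent:
                stack.append((nxt, node, path + [idx]))
    raise ValueError("tree does not connect the endpoints")


def mapping_matrix(edges, tree_edges):
    """M[i][j] = number of times edge j lies on the tree path of edge i."""
    m = len(edges)
    M = [[0] * m for _ in range(m)]
    for i, (u, v) in enumerate(edges):
        for j in tree_path(tree_edges, edges, u, v):
            M[i][j] += 1
    return M


def stretches(M, length):
    """stretch(i) = dist_M(i) / l_i with dist_M(i) = sum_j M[i][j] * l_j."""
    m = len(M)
    return [sum(M[i][j] * length[j] for j in range(m)) / length[i] for i in range(m)]


def congestions(M, cap):
    """congestion(j) = load_M(j) / c_j with load_M(j) = sum_i M[i][j] * c_i."""
    m = len(M)
    return [sum(M[i][j] * cap[i] for i in range(m)) / cap[j] for j in range(m)]


# Hypothetical instance: the 4-cycle 0-1-2-3-0 with spanning tree 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
M = mapping_matrix(edges, tree_edges=[0, 1, 2])
print(stretches(M, [1, 2, 3, 4]))    # non-tree edge (3, 0): (1 + 2 + 3) / 4 = 1.5
print(congestions(M, [1, 2, 3, 4]))  # edge (0, 1): own capacity plus that of (3, 0)
```

In this instance the non-tree edge has load 0, while every tree edge carries its own capacity plus the capacity of the single non-tree edge, matching the cut-based description of load given above.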
3.3 Main result

Theorem 6. For every $\rho \ge 1$ and every family of admissible mappings $\mathcal{M}$, the following two statements are equivalent:

1. For every collection of lengths $\ell_i$ there is a probabilistic mapping with stretch at most $\rho$.
2. For every collection of capacities $c_j$ there is a probabilistic mapping with congestion at most $\rho$.

Proof. We first prove that item 2 implies item 1.

Assume that there is a probabilistic mapping from $E$ using $\mathcal{M}$ with congestion at most $\rho$. By Lemma 5, for every nonnegative coefficients $\beta_j$, there is a mapping $M \in \mathcal{M}$ such that $\sum_j \beta_j \frac{\mathrm{load}_M(j)}{c_j} \le \rho \sum_j \beta_j$. Hence, using the notation from Definition 3, for every nonnegative coefficients $\beta_j$ we have a mapping satisfying:

$$\sum_{j,i \in E} \beta_j M_{ij} \frac{c_i}{c_j} \le \rho \sum_{j \in E} \beta_j \qquad (1)$$

We need to prove that there is a probabilistic mapping from $E$ using $\mathcal{M}$ with stretch at most $\rho$. By Lemma 4, it suffices to prove that for every nonnegative coefficients $\alpha_i$, there is a mapping $M \in \mathcal{M}$ such that $\sum_i \alpha_i \frac{\mathrm{dist}_M(i)}{\ell_i} \le \rho \sum_i \alpha_i$. Hence, using the notation from Definition 2, for every nonnegative coefficients $\alpha_i$ we need to find a mapping satisfying:

$$\sum_{i,j \in E} \alpha_i M_{ij} \frac{\ell_j}{\ell_i} \le \rho \sum_{i \in E} \alpha_i \qquad (2)$$

Choosing $\beta_j = \alpha_j$ and $c_i = \alpha_i / \ell_i$ (and likewise, $c_j = \alpha_j / \ell_j$) and substituting in inequality (1), we obtain inequality (2).

The proof that item 1 implies item 2 is similar, choosing $\alpha_i = \beta_i$ and $\ell_i = \beta_i / c_i$, and then inequality (2) becomes inequality (1). □

Observe that the proof of Theorem 6 does not assume that entries of matrices $M \in \mathcal{M}$ are nonnegative integers (except for the issue that $\rho$ is stated as a quantity of value at least 1). Neither does it assume that distances or capacities are nonnegative (though they cannot be 0, since the expressions for stretch and congestion involve divisions by distances or capacities).
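The substitution used in the proof can be checked term by term. The following worked computation is a sketch under one additional assumption that the proof does not state explicitly: every coefficient $\alpha_i$ is taken to be strictly positive, so that $c_i = \alpha_i / \ell_i$ is a legitimate nonzero capacity.

```latex
\begin{align*}
\beta_j \, M_{ij} \, \frac{c_i}{c_j}
  &= \alpha_j \, M_{ij} \, \frac{\alpha_i / \ell_i}{\alpha_j / \ell_j}
   = \alpha_i \, M_{ij} \, \frac{\ell_j}{\ell_i},
\\[4pt]
\rho \sum_{j \in E} \beta_j
  &= \rho \sum_{j \in E} \alpha_j
   = \rho \sum_{i \in E} \alpha_i .
\end{align*}
```

Summing the first identity over all pairs $i, j \in E$ turns the left-hand side of inequality (1) into the left-hand side of inequality (2), and the second identity shows that the two right-hand sides coincide. The reverse substitution $\alpha_i = \beta_i$, $\ell_i = \beta_i / c_i$ undoes the same cancellation.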
3.4 Simultaneous stretch and congestion bounds

Let $\mathcal{M}$ be a family of admissible mappings for which there is a probabilistic mapping with stretch at most $\rho$. Hence by Theorem 6, there is also a (possibly different) probabilistic mapping with congestion at most $\rho$. However, this does not imply that there is a probabilistic mapping which simultaneously achieves stretch at most $\rho$ and congestion at most $\rho$. Consider the following example.

The edges $E$ are the set of edges of the following graph $G$. $G$ has two special vertices denoted by $s$ and $t$. There are $\sqrt{n}$ vertex disjoint paths between $s$ and $t$, each with $\sqrt{n}$ edges. In addition, there is the edge $(s, t)$. Hence altogether $E$ contains $n + 1$ edges. The family $\mathcal{M}$ of admissible mappings is the canonical family corresponding to the set of all spanning trees of $G$, as described in Section 3.1; given an edge of $E$ and a spanning tree $T$ of $G$, the edge is mapped to the unique path in $T$ joining its endpoints.

Let $\ell$ be an arbitrary length function on $E$. The following is a probabilistic mapping into $\mathcal{M}$ of stretch at most 3. Let $P$ be the shortest path in $G$ between $s$ and $t$ (breaking ties arbitrarily). It may be either the edge $(s, t)$ or one of the $\sqrt{n}$ paths. For the probabilistic mapping, we choose a random spanning tree as follows. All edges of $P$ are contained in the spanning tree. In addition, from every path $P' \ne P$, exactly one edge is deleted, with probability proportional to the length of the edge.

Let us analyze the expected stretch of the above probabilistic mapping. For edges along the path $P$, the stretch is 1. Consider now an edge of length $\ell$ in a path $P' \ne P$ of length $L$. With probability $(L - \ell)/L < 1$ the edge remains in the random spanning tree, keeping its original length.
With probability $\ell / L$ the edge is not in the spanning tree, and then it is mapped to a path of length at most $2L - \ell < 2L$ (we used here the fact that the length of $P$ is at most $L$). Hence the expected stretch is smaller than $1 + 2L \cdot \frac{\ell}{L} \cdot \frac{1}{\ell} \le 3$.

Let $c$ be an arbitrary capacity function on $E$. The following is a probabilistic mapping into $\mathcal{M}$ of congestion at most 3. For every path $P_j$ between $s$ and $t$ (including the edge $(s, t)$ as one of the paths), let $e_j$ denote the edge of minimum capacity on this path, and let $c_j$ be its capacity. Choose a random spanning tree that contains all edges except the $e_j$ edges, and exactly one of the $e_j$ edges, chosen with probability proportional to its capacity.

Let us analyze the expected congestion of the above probabilistic mapping. Consider an arbitrary edge $e$ on path $P_j$. Its capacity $c$ is at least $c_j$. The load that it suffers is at most its own capacity $c$, plus perhaps the capacity $c_j \le c$ of edge $e_j$, plus with probability $c_j / (\sum_i c_i) \le c / (\sum_i c_i)$ a capacity of $\sum_{i \ne j} c_i \le \sum_i c_i$, which in expectation contributes at most $c$. Hence altogether, its expected load is at most $3c$.

The two probabilistic mappings that we designed (one for stretch, one for congestion) are very different. In particular, the sizes of the supports are $n^{\sqrt{n}}$ for the stretch case and $\sqrt{n} + 1$ for the congestion case. We now show that if in the graph $G$ all lengths and all capacities are equal to 1 (the unweighted case), every probabilistic mapping must have either stretch or congestion at least $\sqrt{n}/2$. To achieve stretch less than $\sqrt{n}/2$, the edge $(s, t)$ must belong to the random spanning tree with probability at least $1/2$, because whenever $(s, t)$ is not in the tree it is mapped to a path of $\sqrt{n}$ edges and has stretch $\sqrt{n}$. However, whenever $(s, t)$ is in the spanning tree, then from every other path connecting $s$ and $t$ one edge needs to be removed, contributing a total load of $\sqrt{n}$ to the edge $(s, t)$. Hence in that case the average congestion of $(s, t)$ is at least $\sqrt{n}/2$.
It is also interesting to observe that a random spanning tree (chosen uniformly at random from all spanning trees) contains the edge $(s, t)$ with probability exactly $1/2$ (this is because the effective resistance between $s$ and $t$ is $1/2$, details omitted). Hence the uniform distribution over spanning trees is simultaneously bad (distortion at least $\sqrt{n}/2$) both for stretch and for congestion.

We have seen that the probabilistic mappings that achieve low congestion may be very different from those that achieve low stretch. However, as we shall see in Section 6, for the special case considered here (that of spanning trees in planar graphs), there are additional connections between stretch and congestion, based on planar duality.

4 Algorithmic aspects

In Section 3 our discussion was concerned only with the existence of probabilistic mappings. Here we shall discuss how such mappings can be found efficiently. As we have seen in Section 3.2, and using the notation of Lemma 4, the problem of finding a probabilistic mapping with smallest distortion can be cast as a problem of finding an optimal mixed strategy for the player MAP, in a zero sum game between the players MAP and EDGE. We shall briefly review the known results concerning the computation of optimal mixed strategies in zero sum games, and how they can be applied in our setting. Hence in a sense, this section is independent of Section 3.

4.1 An LP formulation of zero sum games

It is well known that the value of a zero sum game is a solution to a linear program (LP), and that linear programming duality in this case implies the minimax theorem. Consider a game matrix $A$ with $r$ rows (the pure strategies for MAP) and $c$ columns (the pure strategies for EDGE), in which entry $A_{ij}$ contains the payoff for EDGE if MAP plays row $i$ and EDGE plays column $j$.
MAP wishes to select a mixed strategy that minimizes the expected payoff, whereas EDGE wishes to select a mixed strategy that maximizes this payoff. With each row $i$ we can associate a variable $x_i$ that denotes the probability with which row $i$ is played in MAP's mixed strategy. Then the linear program is to minimize $\rho$ subject to:

• $\sum_i A_{ij} x_i \le \rho$ for all columns $j$,
• $\sum_i x_i = 1$,
• $x_i \ge 0$ for all $i$.

An immediate consequence of this LP formulation is that an optimal solution to the LP can be found in time polynomial in the size of the game matrix $A$ (e.g., using the Ellipsoid algorithm). However, in our context of probabilistic mappings, it will often be the case that the size of $A$ is not polynomial in the parameters of interest. For example, when mapping a graph $G$ into a distribution over spanning trees, the parameter of interest is typically $n$, the number of vertices in the graph. The number of edges (and hence the number of columns in $A$) is at most $n^2$ and hence polynomially bounded in $n$. However, the number of mappings (the number of spanning trees in $G$) might be of the order of $n^{\Omega(n)}$, which is not polynomial in $n$. Hence if one is interested in algorithms with running time polynomial in $n$, one cannot even write down the matrix $A$ explicitly (though the graph $G$ serves as an implicit representation of $A$).

Though the case that will interest us most is when there are superpolynomially many row strategies and polynomially many column strategies, let us discuss first the case when there are polynomially many row strategies and superpolynomially many column strategies.

We note that in the discussions that follow we shall assume that all payoffs are rational numbers with numerators and denominators represented by a number of bits that is polynomial in a parameter of interest (such as the smaller of the two dimensions of $A$).
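To illustrate the LP and the role of duality on an instance small enough to write down, the following Python sketch checks a primal and a dual certificate for the stretch game on a triangle with unit lengths. The instance, the function names `max_column_payoff` and `min_row_payoff`, and the candidate strategies are assumptions chosen for illustration; no LP solver is involved, only the verification that a feasible strategy for MAP and a feasible strategy for EDGE give matching bounds.

```python
from fractions import Fraction


def max_column_payoff(A, x):
    """Smallest rho for which the mixed row strategy x is feasible in the LP:
    the largest expected payoff over all pure column strategies."""
    return max(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def min_row_payoff(A, y):
    """Guarantee of the mixed column strategy y (the dual objective):
    the smallest expected payoff over all pure row strategies."""
    return min(sum(y[j] * A[i][j] for j in range(len(A[0]))) for i in range(len(A)))


# Hypothetical instance: a triangle with unit lengths. Row i is the spanning
# tree that omits edge i; column j is edge j. The omitted edge is mapped to
# the two remaining edges (stretch 2), every tree edge has stretch 1.
A = [[2 if i == j else 1 for j in range(3)] for i in range(3)]

third = Fraction(1, 3)
x = [third, third, third]  # candidate mixed strategy for MAP
y = [third, third, third]  # candidate mixed strategy for EDGE

upper = max_column_payoff(A, x)  # MAP can hold EDGE to this value
lower = min_row_payoff(A, y)     # EDGE can guarantee this value
print(upper, lower)
```

Because the two bounds coincide, the uniform distribution over the three spanning trees is an optimal solution of the LP for this instance, and the common value is the stretch of the corresponding probabilistic mapping.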
4.2 Superpolynomially many column strategies

As we cannot afford to write the matrix $A$ explicitly, we need to assume some other mechanism for accessing the strategies of the column player. Typically, this is viewed abstractly as oracle access. A natural oracle model is the following:

Best response oracle. Given a mixed strategy for the row player, the oracle provides a pure strategy for the column player of highest expected payoff (together with the corresponding column of $A$).

If a best response oracle is available, one may still run the ellipsoid algorithm (with the best response oracle serving as a separation oracle) and obtain an optimal solution to the LP (and hence an optimal mixed strategy for the game). See [13] for more details on this approach.

4.3 Superpolynomially many row strategies

Here we address the main case of interest, when there are superpolynomially many row strategies, but only polynomially many column strategies. One issue that has to be dealt with is whether an optimal mixed strategy for the row player can be represented at all in polynomial space, given that potentially it requires specifying probabilities for superpolynomially many strategies. Luckily, the answer is positive. Mixed strategies are solutions to linear programs, and linear programs have basic feasible solutions in which the number of nonzero variables does not exceed the number of constraints (omitting the nonnegativity constraints $x_i \ge 0$). Hence there is an optimal mixed strategy for the row player whose support (the number of pure strategies that have positive probability of being played) is not larger than the number of columns.

To access the pure strategies of the row player, let us assume here that we have a best response oracle for the row player.
Now a standard approach is to consider the dual of the LP, which corresponds to finding an optimal mixed strategy for the column player. By analogy to Section 4.2, one can use the ellipsoid algorithm to find an optimal mixed strategy for the column player. An optimal solution to the dual LP has the same value as an optimal solution to the primal LP, but is not by itself a solution to the primal LP (and hence, we still did not find an optimal mixed strategy for the row player). However, there are certain ways of leveraging the ability to solve the dual LP and using it so as to also solve the primal LP. One such approach employs an exploration phase that finds polynomially many linearly independent constraints of the dual that are tight at the optimal dual solution, and then finds an optimal primal solution that is supported only on the primal variables that correspond to these dual constraints. Details are omitted here (but presumably appear in [13]).

4.4 Weaker oracle models and faster algorithms

In the games that interest us, typically there are polynomially many columns (the column player is EDGE who can play an edge in the graph) and superpolynomially many rows (the row player MAP has exponentially many mappings to choose from). In this respect, we are in the setting of Section 4.3. However, we might not have a best response oracle representation of MAP. Instead we shall often have a weaker kind of oracle.

$\delta$-response oracle. Given a mixed strategy for the column player, the oracle provides a pure strategy for the row player (together with the corresponding row of $A$) that limits the expected payoff (to the column player) to at most $\delta$.

Let $\rho$ be the true minimax value of the game. Then the value of $\delta$ for a $\delta$-response oracle must be at least $\rho$, but in general might be much larger than $\rho$.
If this is the only form of access to the pure strategies of the row player, finding an optimal mixed strategy for the row player becomes hopeless. Hence the goal is no longer to find the optimal mixed strategy for the row player, but rather to find a mixed strategy that limits the expected payoff of the column player to at most δ (plus low order terms). We sketch here an approach of Freund and Schapire [12] that can be used. It is based on the use of regret minimizing algorithms.

Consider an iterative process in which in each round, the column player selects a mixed strategy, the row player selects in response a δ-response (pure) strategy, and the column player collects the expected payoff of his mixed strategy against that pure strategy. If the column player is using a regret minimizing online algorithm in order to select his mixed strategies (such as using a multiplicative weight update rule), then after polynomially many rounds (say, t), his payoff (which can be at most δt) is guaranteed to approach (up to low order terms) the total payoff that the best fixed column pure strategy can achieve against the actual sequence of pure strategies played by the row player. This means that if the row player plays the mixed strategy of choosing one of the t rounds at random and playing the row strategy that was played in this round, no pure column strategy has expected payoff significantly larger than δ.

For more details on this subject, the reader is referred to [12], or to surveys such as [6] or [3].

4.5 Implementation for probabilistic mappings

Consider a zero sum game as in the setting of Lemma 2. EDGE has polynomially many strategies, whereas MAP potentially has exponentially many strategies. A best response oracle for MAP needs to be able to find a best response for MAP against any given mixed strategy of EDGE.
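Before turning to how such oracles are obtained, the iterative process of Section 4.4 can be illustrated on a small explicit game. The following Python sketch is only an illustration under stated assumptions and is not taken from [12]: payoffs are assumed to lie in [0, 1], the column player uses the exponential-weights (Hedge) rule with the standard step size sqrt(8 ln m / t), and the function names (delta_response_by_search, approximate_row_strategy, best_column_payoff) are hypothetical. The oracle here searches all rows exhaustively, which is viable only because the example matrix is tiny; in the setting of interest it would be replaced by a genuine δ-response oracle.

```python
import math


def delta_response_by_search(A, p):
    """Hypothetical oracle: exhaustive search for the row that minimizes the
    column player's expected payoff against the mixed column strategy p."""
    payoffs = [sum(pj * aij for pj, aij in zip(p, row)) for row in A]
    i = min(range(len(A)), key=payoffs.__getitem__)
    return i, payoffs[i]


def approximate_row_strategy(A, oracle, rounds):
    """Run `rounds` rounds of exponential weights for the column player.

    A[i][j] is the payoff (assumed in [0, 1]) to the column player when the
    row player plays i and the column player plays j. Returns the empirical
    mixed row strategy and the largest payoff the oracle ever conceded."""
    m = len(A[0])
    eta = math.sqrt(8.0 * math.log(m) / rounds)  # assumed step size
    cumulative = [0.0] * m  # total payoff of each pure column so far
    counts = [0] * len(A)   # how often each row was returned by the oracle
    worst_delta = 0.0
    for _ in range(rounds):
        top = max(cumulative)  # subtracting the max avoids overflow in exp
        weights = [math.exp(eta * (g - top)) for g in cumulative]
        total = sum(weights)
        p = [w / total for w in weights]
        i, payoff = oracle(A, p)
        worst_delta = max(worst_delta, payoff)
        counts[i] += 1
        for j in range(m):
            cumulative[j] += A[i][j]
    # Mixed strategy: pick one of the rounds uniformly at random.
    return [c / rounds for c in counts], worst_delta


def best_column_payoff(A, x):
    """Largest expected payoff of a pure column against mixed row strategy x."""
    return max(sum(x[i] * A[i][j] for i in range(len(A)))
               for j in range(len(A[0])))


if __name__ == "__main__":
    A = [[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]
    x, delta = approximate_row_strategy(A, delta_response_by_search, 2000)
    print(x, delta, best_column_payoff(A, x))
```

Under these assumptions, the regret bound for exponential weights implies that no pure column earns more than the observed δ plus sqrt(ln m / (2t)) against the returned mixed strategy, which matches the "δ plus low order terms" guarantee described in Section 4.4. Note also that the returned strategy is supported on at most t rows.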
In many contexts, finding the best response is NP-hard. However, for the intended applications, often a δ-response suffices, provided that one can guarantee that δ is not too large. Indeed, given coefficients α_i of the edges of the input graph, one can find a spanning tree with average stretch Õ(log n) [1] (the Õ notation hides some lower order multiplicative terms), and a tree embedding of average stretch O(log n) [9]. This in combination with an algorithmic framework similar to that outlined in Section 4.4 gives probabilistic embeddings with stretch Õ(log n) and O(log n) respectively.

The above results in combination with Theorem 6 imply that there is also a probabilistic mapping into spanning trees with congestion Õ(log n) and a probabilistic mapping into (arbitrary) trees with congestion O(log n). To actually find such a mapping algorithmically, one needs to find a mixed strategy for the player MAP in the corresponding zero sum game. The algorithmic framework of Section 4.4 shows that this can be done if we can implement a δ-response oracle for MAP. We have already seen that such an oracle can be implemented for distance mapping, but now need to do so for capacity mappings. Luckily, the proof of Theorem 6 can be used for this purpose. It shows how to transform any δ-response query to a capacity mapping oracle into a δ-response query to a distance mapping oracle. This establishes the algorithmic aspect of Theorem 1. We remark that the resulting probabilistic mappings have support size polynomial in n.

5 Applications

Räcke describes several applications of his results, with oblivious routing being a prominent example. Here we concentrate only on one of the applications, that of min-bisection that served as our motivating example.

Let G be a connected graph on an even number n of vertices, in which edges have nonnegative capacities.
One wishes to find a bisection of minimum width (total capacity of edges with endpoints in different sides of the bipartization). We present here a polynomial time algorithm with approximation ratio Õ(log n).

Consider an arbitrary spanning tree T of G. Every edge e = (i, j) of T partitions the vertices of G into two sets that we call T_i and T_j. Define the load load_T(e) of edge e to be the sum of capacities of edges of G with one endpoint in T_i and the other endpoint in T_j. (This is consistent with Section 3.1.) Consider now an arbitrary bipartization B of G. Let E_T(B) be the set of edges of T that have endpoints in different sides of the bipartization. Then the width of the bipartization is at most Σ_{e ∈ E_T(B)} load_T(e). (The load terms count every edge of G cut by the bipartization at least once and perhaps multiple times, and possibly also count edges of G that are not cut by the bipartization.) This is the domination property that we were referring to in Section 2.

By Theorem 1, whose proof is summarized in Section 4.5, one can find in polynomial time a distribution over spanning trees of G such that for every edge of G, its expected congestion (over the choice of a random spanning tree) is at most δ = Õ(log n). Consider an optimal bisection in G, and let b denote its width. For each edge cut by the bisection, its expected congestion over the probabilistic mapping into spanning trees is at most δ. Summing over all edges cut by the bisection and taking a weighted average over all spanning trees in the probabilistic mapping, we obtain that in at least one such tree T, the width of this bipartization (with respect to the load in that tree) is at most δb.

The above discussion gives the following algorithm for finding a bisection of small width in a graph G whose minimum bisection has width b.

1. Find a probabilistic mapping into spanning trees with congestion at most δ.
(By the discussion above this step takes polynomial time, and δ can be taken to be Õ(log n). Furthermore, the set of spanning trees in the support of the probabilistic mapping has size polynomial in n.)

2. In each spanning tree, find an optimal bisection (with respect to the load) using dynamic programming. This takes polynomial time. Moreover, by the discussion above, in at least one tree the bisection found will have width at most δb.

3. Of all the bisections found (one per spanning tree), take the one that in G has smallest width. By the domination property, its width is at most δb.

The approximation ratio that we presented above for min-bisection is δ = Õ(log n), rather than O(log n) as was done by Räcke. To get the O(log n) approximation, instead of probabilistic mappings into spanning trees one simply uses probabilistic mappings into (arbitrary but dominating) trees. Then one can plug in the bounds of [9] rather than the somewhat weaker bounds of [1] and obtain the desired approximation ratio. Details omitted.

6 Spanning trees in planar graphs

In Section 3.4 we saw that for distributions over spanning trees of planar graphs, the distributions achieving low stretch are very different from those achieving low congestion. In this section we present an interesting connection between low stretch and low congestion for spanning trees in planar graphs. A similar connection was observed independently (and apparently, before our work) by Yuval Emek [8].

The family of graphs that we shall consider is that of 2-connected planar multigraphs. Specifically, the graphs need to be planar, connected, with no cut edge (an edge whose removal disconnects the graph), and parallel edges are allowed.
In the context of spanning trees, restricting graphs to be 2-connected is not a significant restriction, because disconnected graphs do not have spanning trees, and every cut edge belongs to every spanning tree and hence does not contribute to the complexity of the problem. The reason why we allow parallel edges is so that the notion of a dual of a planar graph will always be defined. From every set of parallel edges, a spanning tree may contain at most one edge.

Every planar graph can be embedded in the plane with no intersecting edges. In fact, several algorithms are known to produce such embeddings in linear time. This embedding might not be unique, in which case we fix one planar embedding arbitrarily. Given a planar embedding, the dual graph is obtained by considering every face of the embedding (including the outer face) to be a vertex of the dual graph, and every edge of the embedding corresponds to an edge of the dual graph that connects the two vertices that correspond to the two faces that the edge separates. (The fact that the graph has no cut edges ensures that the dual has no self loops. Two faces that share more than one edge give parallel edges in the dual.) The dual graph is planar and the planar embedding that we associate with it is the one naturally obtained by the above construction. Under this planar embedding, the dual of the dual is the primal graph. Cycles in the primal graph correspond to cuts in the dual graph and vice versa.

It is well known and easy to see that given a spanning tree in the primal graph, the edges not in the spanning tree form a spanning tree in the dual graph. (This also gives Euler's formula that |V| − 1 + |F| − 1 = |E|.)

Consider now a length function on the edges of the primal graph.
Given a spanning tree of the primal graph, for every spanning tree edge its stretch is 1, and for every other edge its stretch is determined by the length of the fundamental cycle that it closes with the spanning tree edges. In the dual graph, let the capacity of an edge be equal to the length of the corresponding edge in the primal graph. Consider the dual spanning tree. The congestion of edges not on the dual spanning tree is 0 (one less than their stretch in the primal). The load of an edge on the dual spanning tree is precisely the sum of capacities of the corresponding fundamental cycle in the primal graph, and hence the congestion is exactly one more than the stretch in the primal.

The above deterministic correspondence has the following probabilistic corollary.

Corollary 7 Consider an arbitrary 2-connected planar graph G and its planar dual Ḡ. Assume that edges in G have nonnegative lengths whereas edges in Ḡ have nonnegative capacities, and moreover, the capacity of an edge in Ḡ is equal to the length of the corresponding edge in G. Then for every probabilistic mapping of G into spanning trees with stretch ρ, the same distribution over the dual trees forms a probabilistic mapping of Ḡ into spanning trees with congestion at most ρ + 1. Likewise, for every probabilistic mapping of Ḡ into spanning trees with congestion ρ, the same distribution over the dual trees forms a probabilistic mapping of G into spanning trees with stretch at most ρ + 1.

Acknowledgements

We thank Satyen Kale for helpful discussions on the subjects of this manuscript.

References

[1] Ittai Abraham, Yair Bartal, Ofer Neiman: Nearly Tight Low Stretch Spanning Trees. FOCS 2008: 781-790.
[2] Noga Alon, Richard M. Karp, David Peleg, Douglas B. West: A Graph-Theoretic Game and Its Application to the k-Server Problem. SIAM J. Comput. 24(1): 78-100 (1995).
[3] Sanjeev Arora, Elad Hazan, and Satyen Kale. Multiplicative weights method: a meta-algorithm and its applications. http://www.cs.princeton.edu/~arora/pubs/MWsurvey.pdf
[4] Sanjeev Arora, Satish Rao, Umesh V. Vazirani: Expander flows, geometric embeddings and graph partitioning. J. ACM 56(2): (2009).
[5] Yair Bartal: Probabilistic Approximations of Metric Spaces and Its Algorithmic Applications. FOCS 1996: 184-193.
[6] Avrim Blum and Yishay Mansour. Learning, Regret Minimization, and Equilibria. Book chapter in Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, eds., Cambridge University Press, 2007.
[7] Michael Elkin, Yuval Emek, Daniel A. Spielman, Shang-Hua Teng: Lower-Stretch Spanning Trees. SIAM J. Comput. 38(2): 608-628 (2008).
[8] Yuval Emek. k-Outerplanar Graphs, Planar Duality, and Low Stretch Spanning Trees. To appear in ESA 2009.
[9] Jittat Fakcharoenphol, Satish Rao, Kunal Talwar: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. 69(3): 485-497 (2004).
[10] http://www.wisdom.weizmann.ac.il/~feige/Slides/CapacityMapping.ppt
[11] Uriel Feige, Robert Krauthgamer: A Polylogarithmic Approximation of the Minimum Bisection. SIAM J. Comput. 31(4): 1090-1118 (2002).
[12] Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79-103, 1999.
[13] Martin Grötschel, László Lovász, Alexander Schrijver: Geometric Algorithms and Combinatorial Optimization, Springer, 1988.
[14] Frank Thomson Leighton, Satish Rao: Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. J. ACM 46(6): 787-832 (1999).
[15] Nathan Linial, Eran London, Yuri Rabinovich: The Geometry of Graphs and Some of its Algorithmic Applications. Combinatorica 15(2): 215-245 (1995).
[16] Harald Räcke: Optimal hierarchical decompositions for congestion minimization in networks. STOC 2008: 255-264.
