Dimension reduction for finite trees in L_1

We show that every n-point tree metric admits a (1+eps)-embedding into a C(eps) log n-dimensional L_1 space, for every eps > 0, where C(eps) = O((1/eps)^4 log(1/eps)). This matches the natural volume lower bound up to a factor depending only on eps. …

Authors: James R. Lee, Arnaud de Mesmay, Mohammad Moharrami

James R. Lee∗ (University of Washington), Arnaud de Mesmay∗ (Ecole Normale Supérieure), Mohammad Moharrami∗ (University of Washington)

∗ Partially supported by NSF grants CCF-0644037, CCF-0915251, and a Sloan Research Fellowship. A significant portion of this work was completed during a visit of the authors to the Institut Henri Poincaré.

Abstract

We show that every $n$-point tree metric admits a $(1+\varepsilon)$-embedding into $\ell_1^{C(\varepsilon) \log n}$, for every $\varepsilon > 0$, where $C(\varepsilon) \le O\big((\tfrac{1}{\varepsilon})^4 \log \tfrac{1}{\varepsilon}\big)$. This matches the natural volume lower bound up to a factor depending only on $\varepsilon$. Previously, it was unknown whether even complete binary trees on $n$ nodes could be embedded in $\ell_1^{O(\log n)}$ with $O(1)$ distortion. For complete $d$-ary trees, our construction achieves $C(\varepsilon) \le O\big(\tfrac{1}{\varepsilon^2}\big)$.

Contents

1 Introduction
1.1 Dimension reduction in $\ell_1$
1.2 Notation
1.3 Proof outline and related work
2 Warm-up: Embedding complete $k$-ary trees
2.1 A single event
2.2 The Local Lemma argument
3 Colors and scales
3.1 Monotone colorings
3.2 Multi-scale embeddings
4 Scale assignment
4.1 Scale selectors
4.2 Properties of the scale selector maps
5 The embedding
5.1 The construction
5.2 Properties of the $\Delta_i$ maps
5.3 The probabilistic analysis

1 Introduction

Let $T = (V, E)$ be a finite, connected, undirected tree, equipped with a length function on edges, $\mathrm{len} : E \to [0, \infty)$. This induces a shortest-path pseudometric (see footnote 1),

$$d_T(u, v) = \text{length of the shortest } u\text{-}v \text{ path in } T.$$

Such a metric space $(V, d_T)$ is called a finite tree metric.

Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, and a mapping $f : X \to Y$, we define the Lipschitz constant of $f$ by

$$\|f\|_{\mathrm{Lip}} = \sup_{x \neq y \in X} \frac{d_Y(f(x), f(y))}{d_X(x, y)}.$$

An $L$-Lipschitz map is one for which $\|f\|_{\mathrm{Lip}} \le L$. One defines the distortion of the mapping $f$ to be $\mathrm{dist}(f) = \|f\|_{\mathrm{Lip}} \cdot \|f^{-1}\|_{\mathrm{Lip}}$, where the distortion is understood to be infinite when $f$ is not injective. We say that $(X, d_X)$ $D$-embeds into $(Y, d_Y)$ if there is a mapping $f : X \to Y$ with $\mathrm{dist}(f) \le D$. Using the notation $\ell_1^k$ for the space $\mathbb{R}^k$ equipped with the $\|\cdot\|_1$ norm, we study the following question: How large must $k = k(n, \varepsilon)$ be so that every $n$-point tree metric $(1+\varepsilon)$-embeds into $\ell_1^k$?

1.1 Dimension reduction in $\ell_1$

A seminal result of Johnson and Lindenstrauss [JL84] implies that for every $\varepsilon > 0$, every $n$-point subset $X \subseteq \ell_2$ admits a $(1+\varepsilon)$-distortion embedding into $\ell_2^k$, with $k = O\big(\frac{\log n}{\varepsilon^2}\big)$. On the other hand, the known upper bounds for $\ell_1$ are much weaker. Talagrand [Tal90], following earlier results of Bourgain-Lindenstrauss-Milman [BLM89] and Schechtman [Sch87], showed that every $n$-dimensional subspace $X \subseteq \ell_1$ (and, in particular, every $n$-point subset) admits a $(1+\varepsilon)$-embedding into $\ell_1^k$, with $k = O\big(\frac{n \log n}{\varepsilon^2}\big)$.
For $n$-point subsets, this was very recently improved to $k = O(n/\varepsilon^2)$ by Newman and Rabinovich [NR10], using the spectral sparsification techniques of Batson, Spielman, and Srivastava [BSS09].

On the other hand, Brinkman and Charikar [BC05] showed that there exist $n$-point subsets $X \subseteq \ell_1$ such that any $D$-embedding of $X$ into $\ell_1^k$ requires $k \ge n^{\Omega(1/D^2)}$ (see also [LN04] for a simpler proof). Thus the exponential dimension reduction achievable in the $\ell_2$ case cannot be matched for the $\ell_1$ norm. More recently, it has been shown by Andoni, Charikar, Neiman, and Nguyen [ACNN11] that there exist $n$-point subsets such that any $(1+\varepsilon)$-embedding requires dimension at least $n^{1 - O(1/\log(\varepsilon^{-1}))}$. Regev [Reg11] has given an elegant proof of both these lower bounds based on information-theoretic arguments.

Footnote 1. This is a pseudometric because we may have $d_T(u, v) = 0$ even for distinct $u, v \in V$.

One can still ask about the possibility of more substantial dimension reduction for certain finite subsets of $\ell_1$. Such a study was undertaken by Charikar and Sahai [CS02]. In particular, it is an elementary exercise to verify that every finite tree metric embeds isometrically into $\ell_1$, thus the $\ell_1$ dimension reduction question for trees becomes a prominent example of this type. It was shown in [CS02] that for every $\varepsilon > 0$, every $n$-point tree metric $(1+\varepsilon)$-embeds into $\ell_1^k$ with $k = O\big(\frac{\log^2 n}{\varepsilon^2}\big)$. (Footnote 2. The original bound proved in [CS02] grew like $\log^3 n$, but this was improved using an observation of A. Gupta.) It is quite natural to ask whether the dependence on $n$ can be reduced to the natural volume lower bound of $\Omega(\log n)$. Indeed, it is Question 3.6 in the list "Open problems on embeddings of finite metric spaces" maintained by J. Matoušek [Mat], asked by Gupta, Lee, and Talwar (see footnote 3). As noted there, the question was, surprisingly, even open for the complete binary tree on $n$ vertices.
The present paper resolves this question, achieving the volume lower bound for all finite trees.

Theorem 1.1. For every $\varepsilon > 0$ and $n \in \mathbb{N}$, the following holds. Every $n$-point tree metric admits a $(1+\varepsilon)$-embedding into $\ell_1^k$ with $k = O\big((\tfrac{1}{\varepsilon})^4 \log \tfrac{1}{\varepsilon} \cdot \log n\big)$.

The proof is presented in Section 3.1. We remark that the proof also yields a randomized polynomial-time algorithm to construct the embedding.

1.2 Notation

For a graph $G = (V, E)$, we use the notations $V(G)$ and $E(G)$ to denote the vertex and edge sets of $G$, respectively. For a connected, rooted tree $T = (V, E)$ and $x, y \in V$, we use the notation $P_{xy}$ for the unique path between $x$ and $y$ in $T$, and $P_x$ for $P_{rx}$, where $r$ is the root of $T$.

For $k \in \mathbb{N}$, we write $[k] = \{1, 2, \ldots, k\}$. We also use the asymptotic notation $A \lesssim B$ to denote that $A = O(B)$, and $A \asymp B$ to denote the conjunction of $A \lesssim B$ and $B \lesssim A$.

1.3 Proof outline and related work

We first discuss the form that all our embeddings will take. Let $T = (V, E)$ be a finite, connected tree, and fix a root $r \in V$. For each $v \in V$, recall that $P_v$ denotes the unique simple path from $r$ to $v$. Given a labeling of edges by vectors $\lambda : E \to \mathbb{R}^k$, we can define $\varphi : V \to \mathbb{R}^k$ by

$$\varphi(v) = \sum_{e \in E(P_v)} \lambda(e). \tag{1}$$

The difficulty now lies in choosing an appropriate labeling $\lambda$. An easy observation is that if we have $\|\lambda(e)\|_1 = \mathrm{len}(e)$ for all $e \in E$ and the set $\{\lambda(e)\}_{e \in E}$ is orthogonal, then $\varphi$ is an isometry. Of course, our goal is to use many fewer than $|E|$ dimensions for the embedding. We next illustrate a major probabilistic technique employed in our approach.

Re-randomization. Consider an unweighted, complete binary tree of height $h$. Denote the tree by $T_h = (V_h, E_h)$, let $n = 2^{h+1} - 1$ be the number of vertices, and let $r$ denote the root of the tree. Let $\kappa \in \mathbb{N}$ be some constant which we will choose momentarily.
If we assign to every edge $e \in E_h$ a label $\lambda(e) \in \{0,1\}^{\kappa}$, then there is a natural mapping $\tau_\lambda : V_h \to \{0,1\}^{\kappa h}$ given by

$$\tau_\lambda(v) = (\lambda(e_1), \lambda(e_2), \ldots, \lambda(e_k), 0, 0, \ldots, 0), \tag{2}$$

where $E(P_v) = \{e_1, e_2, \ldots, e_k\}$, and the edges are labeled in order from the root to $v$. Note that the preceding definition falls into the framework of (1), by extending each $\lambda(e)$ to a $(\kappa h)$-dimensional vector padded with zeros, but the specification here will be easier to work with presently.

If we choose the label map $\lambda : E_h \to \{0,1\}^{\kappa}$ uniformly at random, the probability for the embedding $\tau_\lambda$ specified in (2) to have $O(1)$ distortion is at most exponentially small in $n$. In fact, the probability for $\tau_\lambda$ to be injective is already this small. This is because for two nodes $u, v \in V_h$ which are the children of the same node $w$, there is $\Omega(1)$ probability that $\tau_\lambda(u) = \tau_\lambda(v)$, and there are $\Omega(n)$ such independent events.

Footnote 3. Asked at the DIMACS Workshop on Discrete Metric Spaces and their Algorithmic Applications (2003). The question was certainly known to others before 2003, and was asked to the first-named author by Assaf Naor earlier that year.

In Section 2, we show that a judicious application of the Lovász Local Lemma [EL75] can be used to show that $\tau_\lambda$ has $O(1)$ distortion with non-zero probability. In fact, we show that this approach can handle arbitrary $k$-ary complete trees, with distortion $1 + \varepsilon$. Unknown to us at the time of discovery, a closely related construction occurs in the context of tree codes for interactive communication [Sch96].

Unfortunately, the use of the Local Lemma does not extend well to the more difficult setting of arbitrary trees. For the general case, we employ an idea of Schulman [Sch96] based on re-randomization.
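The failure of uniformly random labels can be quantified with a short calculation. The sketch below is only an illustration: the function name `injectivity_upper_bound` is hypothetical, and it uses nothing beyond the sibling-collision events described above. Each of the $2^h - 1$ internal nodes of $T_h$ has two child edges whose labels coincide with probability $2^{-\kappa}$, independently across internal nodes, so $\tau_\lambda$ can be injective only if none of these collisions happens.

```python
def injectivity_upper_bound(h, kappa):
    """Upper bound on P[tau_lambda is injective] for uniform labels.

    Assumes the complete binary tree T_h (2^h - 1 internal nodes) and
    i.i.d. uniform labels in {0,1}^kappa.  Two siblings collide exactly
    when their edge labels agree, which has probability 2^(-kappa), and
    these events are independent across different parents.
    """
    internal_nodes = 2 ** h - 1
    return (1.0 - 2.0 ** (-kappa)) ** internal_nodes


if __name__ == "__main__":
    for h in (5, 10, 15):
        print(h, injectivity_upper_bound(h, kappa=4))
```

For a fixed constant $\kappa$ the bound decays exponentially in the number of vertices, which is the obstruction that the Local Lemma and re-randomization arguments are designed to avoid.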
To see the idea in our simple setting, consider $T_h$ to be composed of a root $r$, under which lie two copies of $T_{h-1}$, which we call $A$ and $B$, having roots $r_A$ and $r_B$, respectively. The idea is to assume that, inductively, we already have a labeling $\lambda_{h-1} : E_{h-1} \to \{0,1\}^{\kappa}$ such that the corresponding map $\tau_{\lambda_{h-1}}$ has $O(1)$ distortion on $T_{h-1}$. We will then construct a random labeling $\lambda_h : E_h \to \{0,1\}^{\kappa}$ by using $\lambda_{h-1}$ on the $A$-side, and $\pi(\lambda_{h-1})$ on the $B$-side, where $\pi$ randomly alters the labeling in such a way that $\tau_{\pi(\lambda_{h-1})}$ is simply $\tau_{\lambda_{h-1}}$ composed with a random isometry of $\ell_1^{\kappa(h-1)}$. We will then argue that with positive probability (over the choice of $\pi$), $\tau_{\lambda_h}$ has $O(1)$ distortion.

Let $\pi_1, \pi_2, \ldots, \pi_{h-1} : \{0,1\}^{\kappa} \to \{0,1\}^{\kappa}$ be i.i.d. random mappings, where the distribution of $\pi_1$ is specified by

$$\pi_1(x_1, x_2, \ldots, x_\kappa) = (\rho_1(x_1), \rho_2(x_2), \ldots, \rho_\kappa(x_\kappa)),$$

where each $\rho_i$ is an independent uniformly random involution $\{0,1\} \to \{0,1\}$. To every edge $e \in E_{h-1}$, we can assign a height $\alpha(e) \in \{1, 2, \ldots, h-1\}$ which is its distance to the root. From a labeling $\lambda : E_{h-1} \to \{0,1\}^{\kappa}$, we define a random labeling $\pi(\lambda) : E_{h-1} \to \{0,1\}^{\kappa}$ by

$$\pi(\lambda)(e) = \pi_{\alpha(e)}(\lambda(e)).$$

By a mild abuse of notation, we will consider $\pi(\lambda) : E(B) \to \{0,1\}^{\kappa}$. Finally, given a labeling $\lambda_{h-1} : E_{h-1} \to \{0,1\}^{\kappa}$, we construct a random labeling $\lambda_h : E_h \to \{0,1\}^{\kappa}$ as follows,

$$\lambda_h(e) = \begin{cases} (0, 0, \ldots, 0) & e = (r, r_A) \\ (1, 1, \ldots, 1) & e = (r, r_B) \\ \lambda_{h-1}(e) & e \in E(A) \\ \pi(\lambda_{h-1})(e) & e \in E(B). \end{cases}$$

By construction, the mappings $\tau_{\lambda_h}|_{V(A) \cup \{r\}}$ and $\tau_{\lambda_h}|_{V(B) \cup \{r\}}$ have the same distortion as $\tau_{\lambda_{h-1}}$. In particular, it is easy to check that $\tau_{\pi(\lambda_{h-1})}$ is simply $\tau_{\lambda_{h-1}}$ composed with an isometry of $\{0,1\}^{\kappa(h-1)}$.
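The recursive labeling above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: vertices are encoded as tuples of 0/1 child choices, an edge is keyed by its lower endpoint, and the names `extend_labels`, `build`, `tau`, `hamming` and `vertices_at_depth` are all hypothetical. A uniformly random involution of $\{0,1\}$ is either the identity or the swap, so each $\pi_a$ is modelled as XOR with a uniform bit mask.

```python
import itertools
import random


def extend_labels(prev, h, kappa, rng):
    """Build labels for T_h from labels `prev` for T_{h-1}.

    Edges are keyed by their lower endpoint (a tuple of 0/1 choices),
    so the height alpha(e) of an edge is the length of its key.
    """
    # One flip mask per height 1..h-1; XOR with a uniform bit realises a
    # uniformly random involution of {0,1} in each coordinate.
    masks = {a: tuple(rng.randrange(2) for _ in range(kappa))
             for a in range(1, h)}
    labels = {(0,): (0,) * kappa,   # edge (r, r_A)
              (1,): (1,) * kappa}   # edge (r, r_B)
    for edge, lab in prev.items():
        a = len(edge)
        labels[(0,) + edge] = lab                                   # A-side
        labels[(1,) + edge] = tuple(x ^ m for x, m in zip(lab, masks[a]))  # B-side
    return labels


def build(h, kappa, rng):
    """Apply the recursive step h times, starting from the one-vertex tree."""
    labels = {}
    for height in range(1, h + 1):
        labels = extend_labels(labels, height, kappa, rng)
    return labels


def tau(v, labels, kappa, h):
    """Concatenate labels along the root-to-v path, zero-padded to kappa*h."""
    out = []
    for j in range(1, len(v) + 1):
        out.extend(labels[v[:j]])
    return out + [0] * (kappa * h - len(out))


def hamming(a, b):
    """l_1 distance between two 0/1 vectors of equal length."""
    return sum(x != y for x, y in zip(a, b))


def vertices_at_depth(depth):
    """All vertices of the complete binary tree at the given depth."""
    return list(itertools.product((0, 1), repeat=depth))
```

For two vertices at the same depth inside $B$, the masks cancel block by block, so their distance under $\tau_{\lambda_h}$ equals the distance of the corresponding vertices under $\tau_{\lambda_{h-1}}$. A pair with one vertex in $A$ and one in $B$ always differs in the whole first block of $\kappa$ coordinates.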
Now consider some pair $x \in V(A)$ and $y \in V(B)$. It is simple to argue that it suffices to bound the distortion for pairs with $m = d_{T_h}(r, x) = d_{T_h}(r, y)$, for $m \in \{1, 2, \ldots, h\}$, so we will assume that $x, y$ have the same height in $T_h$. Observe that $\tau_{\lambda_h}(x)$ is fixed with respect to the randomness in $\pi$, thus if we write $v = \tau_{\lambda_h}(x) - \tau_{\lambda_h}(y)$, where subtraction is taken coordinate-wise, modulo 2, then $v$ has the form

$$v \equiv \big(\underbrace{1, 1, \ldots, 1}_{\kappa},\ b_1, b_2, \ldots, b_{\kappa(m-1)}\big),$$

where the $\{b_i\}$ are i.i.d. uniform over $\{0,1\}$. It is thus an easy consequence of Chernoff bounds that, with probability at least $1 - e^{-m\kappa/8}$, we have

$$\|\tau_{\lambda_h}(x) - \tau_{\lambda_h}(y)\|_1 = \|v\|_1 \ge \frac{\kappa \cdot d_{T_h}(x, y)}{4}.$$

Also, clearly $\|\tau_{\lambda_h}\|_{\mathrm{Lip}} \le \kappa$. On the other hand, the number of pairs $x \in V(A)$, $y \in V(B)$ with $m = d_{T_h}(r, x) = d_{T_h}(r, y)$ is $2^{2(m-1)}$, thus taking a union bound, we have

$$\mathbb{P}\big[\mathrm{dist}(\tau_{\lambda_h}) > \max\{4, \mathrm{dist}(\tau_{\lambda_{h-1}})\}\big] \le \sum_{m=1}^{h} 2^{2(m-1)} e^{-m\kappa/8},$$

and the latter bound is strictly less than 1 for some $\kappa = O(1)$, showing the existence of a good map $\tau_{\lambda_h}$.

This illustrates how re-randomization (applying a distribution over random isometries to one side of a tree) can be used to achieve $O(1)$ distortion for embedding $T_h$ into $\ell_1^{O(h)}$. Unfortunately, the arguments become significantly more delicate when we handle less uniform trees. The full-blown re-randomization argument occurs in Section 5.

Scale selection. The first step beyond complete binary trees would be in passing to complete $d$-ary trees for $d \ge 3$. The same construction as above works, but now one has to choose $\kappa \asymp \log d$. Unfortunately, if the degrees of our tree are not uniform, we have to adopt a significantly more delicate strategy.
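To see concretely which constant $\kappa$ makes the union bound work, one can sum the geometric series. The sketch below is illustrative only; the function names are hypothetical, and for $d \ge 3$ it assumes (this is not derived in the text) that the same per-pair failure bound $e^{-m\kappa/8}$ applies with $d^{2(m-1)}$ pairs at depth $m$. With ratio $q = d^2 e^{-\kappa/8}$, the infinite sum equals $q / (d^2 (1 - q))$, which is below 1 exactly when $\kappa > 8 \ln(1 + d^2)$.

```python
import math


def union_bound(h, kappa, d=2):
    """Sum over m = 1..h of d^(2(m-1)) * exp(-m * kappa / 8)."""
    return sum(d ** (2 * (m - 1)) * math.exp(-m * kappa / 8)
               for m in range(1, h + 1))


def min_kappa(d=2):
    """Smallest integer kappa for which the infinite series is below 1.

    The series is geometric with ratio q = d^2 * exp(-kappa / 8), and its
    sum q / (d^2 * (1 - q)) is < 1 iff kappa > 8 * ln(1 + d^2).
    """
    return math.floor(8 * math.log(1 + d * d)) + 1


if __name__ == "__main__":
    for d in (2, 3, 16, 256):
        print(d, min_kappa(d), union_bound(60, min_kappa(d), d))
```

Under these assumptions the threshold grows like $16 \ln d$, in line with the choice $\kappa \asymp \log d$ for complete $d$-ary trees.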
It is natural to choose a single number $\kappa(e) \in \mathbb{N}$ for every edge $e \in E$, and then put $\lambda(e) \in \frac{1}{\kappa(e)} \{0,1\}^{\kappa(e)}$ (this ensures that the analogue of the embedding $\tau_\lambda$ specified in (2) is 1-Lipschitz). Observing the case of $d$-ary trees, one might be tempted to put

$$\kappa(e) = \left\lceil \log \frac{|T_u|}{|T_v|} \right\rceil,$$

where $e = (u, v)$ is directed away from the root, and we use $T_v$ to denote the subtree rooted at $v$. If one simply takes a complete binary tree on $2^h$ nodes, and then connects a star of degree $2^h$ to every vertex, we have $\kappa(e) \asymp h$ for every edge, and thus the dimension becomes $O(h^2)$ instead of $O(h)$ as desired. In fact, there are examples which show that it is impossible to choose $\kappa(u, v)$ to depend only on the geometry of the subtree rooted at $u$. These "scale selector" values have to look at the global geometry, and in particular have to encode the volume growth of the tree at many scales simultaneously. Our eventual scale selector is fairly sophisticated and impossible to describe without delving significantly into the details of the proof.

For our purposes, we need to consider more general embeddings of type (1). In particular, the coordinates of our labels $\lambda(e) \in \mathbb{R}^k$ will take a range of different values, not simply a single value as for complete trees. We do try to maintain one important, related invariant: If $P_v$ is the sequence of edges from the root to some vertex $v$, then ideally for every coordinate $i \in \{1, 2, \ldots, k\}$ and every value $j \in \mathbb{Z}$, there will be at most one $e \in P_v$ for which $\lambda(e)_i \in [2^j, 2^{j+1})$. Thus instead of every coordinate being "touched" at most once on the path from the root to $v$, every coordinate is touched at most once at every scale along every such path. This ensures that various scales do not interact. For technical reasons, this property is not maintained exactly, but analogous concepts arise frequently in the proof.
The restricted class of embeddings we use, along with a discussion of the invariants we maintain, are introduced in Section 3.2. The actual scale selectors are defined in Section 4.

Controlling the topology. One of the properties that we used above for complete $d$-ary trees is that the depth of such a tree is $O(\log_d n)$, where $n$ is the number of nodes in the tree. This allowed us to concatenate vectors down a root-leaf path without exceeding our desired $O(\log n)$ dimension bound. Of course, for general trees, no similar property need hold. However, there is still a bound on the topological depth of any $n$-node tree.

To explain this, let $T = (V, E)$ be a tree with root $r$, and define a monotone coloring of $T$ to be a mapping $\chi : E \to \mathbb{N}$ such that for every $c \in \mathbb{N}$, the color class $\chi^{-1}(c)$ is a connected subset of some root-leaf path. Such colorings were used in previous works on embedding trees into Hilbert spaces [Mat99, GKL03, LNP09], as well as for previous low-dimensional embeddings into $\ell_1$ [CS02]. The following lemma is well-known and elementary.

Lemma 1.2. Every connected $n$-vertex rooted tree $T$ admits a monotone coloring such that every root-leaf path in $T$ contains at most $1 + \log_2 n$ colors.

Proof. For an edge $e \in E(T)$, let $\ell(e)$ denote the number of leaves beneath $e$ in $T$ (including, possibly, an endpoint of $e$). Letting $\ell(T) = \max_{e \in E} \ell(e)$, we will prove that for $\ell(T) \ge 1$, there exists a monotone coloring with at most $1 + \log_2(\ell(T)) \le 1 + \log_2 n$ colors on any root-leaf path.

Suppose that $r$ is the root of $T$. For an edge $e$, let $T_e$ be the subtree beneath $e$, including the edge $e$ itself. If $r$ is the endpoint of edges $e_1, e_2, \ldots, e_k$, we may color the edges of $T_{e_1}, T_{e_2}, \ldots, T_{e_k}$ separately, since any monotone path is contained completely within exactly one of these subtrees.
Thus we may assume that $r$ is the endpoint of only one edge $e_1$, and then $\ell(T) = \ell(e_1)$. Choose a leaf $x$ in $T$ such that each connected component $T'$ of $T \setminus E(P_{rx})$ has $\ell(T') \le \ell(e_1)/2$ (this is easy to do by, e.g., ordering the leaves from left to right in a planar drawing of $T$). Color the edges $E(P_{rx})$ with color 1, and inductively color each non-trivial connected component $T'$ with disjoint sets of colors from $\mathbb{N} \setminus \{1\}$. By induction, the maximum number of colors appearing on a root-leaf path in $T$ is at most $1 + \big(1 + \log_2(\ell(e_1)/2)\big) = 1 + \log_2(\ell(T))$, completing the proof.

Instead of dealing directly with edges in our actual embedding, we will deal with color classes. This poses a number of difficulties, and one major difficulty involves vertices which occur in the middle of such classes. For dealing with these vertices, we will first preprocess our tree by embedding it into a product of a small number of new trees, each of which admits colorings of a special type. This is carried out in Section 3.1.

2 Warm-up: Embedding complete $k$-ary trees

We first prove our main result for the special case of complete $k$-ary trees, with an improved dependence on $\varepsilon$. The main novelty is our use of the Lovász Local Lemma to analyze a simple random embedding of such trees into $\ell_1$. The proof illustrates the tradeoff between concentration and the sizes of the sets $\{\{u, v\} \subseteq V : d_T(u, v) = j\}$ for each $j = 1, 2, \ldots$.

Theorem 2.1. Let $T_{k,h}$ be the unweighted, complete $k$-ary tree of height $h$. For every $\varepsilon > 0$, there exists a $(1+\varepsilon)$-embedding of $T_{k,h}$ into $\ell_1^{O((h \log k)/\varepsilon^2)}$.

In the next section, we introduce our random embedding and analyze the success probability for a single pair of vertices based on their distance. Then in Section 2.2, we show that with non-zero probability, the construction succeeds for all vertices.
In the coming sections and later, in the proof of our main theorem, we will employ the following concentration inequality [McD98].

Theorem 2.2. Let $M$ be a non-negative number, and $X_i$ ($1 \le i \le n$) be independent random variables satisfying $X_i \le \mathbb{E}(X_i) + M$, for $1 \le i \le n$. Consider the sum $X = \sum_{i=1}^{n} X_i$ with expectation $\mathbb{E}(X) = \sum_{i=1}^{n} \mathbb{E}(X_i)$ and $\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$. Then we have,

$$\mathbb{P}\big(X - \mathbb{E}(X) \ge \lambda\big) \le \exp\left(\frac{-\lambda^2}{2(\mathrm{Var}(X) + M\lambda/3)}\right). \tag{3}$$

2.1 A single event

Fix $k, h \in \mathbb{N}$ and $\varepsilon > 0$. Write $T = (V, E)$ for the tree $T_{k,h}$ with root $r \in V$, and let $d_T$ be the unweighted shortest-path metric on $T$. Additionally, we define,

$$t = \left\lceil \frac{1}{\varepsilon} \right\rceil, \tag{4}$$

and

$$m = t \lceil \log k \rceil. \tag{5}$$

Let $\{\vec{v}(1), \ldots, \vec{v}(t)\}$ be the standard basis for $\mathbb{R}^t$. Let $b_1, b_2, \ldots, b_m$ be chosen i.i.d. uniformly over $\{1, 2, \ldots, t\}$. For the edges $e \in E$, we choose i.i.d. random labels $\lambda(e) \in \mathbb{R}^{m \times t}$, each of which has the distribution of the random vector (represented in matrix notation),

$$\frac{1}{m} \begin{pmatrix} \vec{v}(b_1) \\ \vdots \\ \vec{v}(b_m) \end{pmatrix}. \tag{6}$$

Note that for every $e \in E$, we have $\|\lambda(e)\|_1 = 1$. We now define a random mapping $g : V \to \mathbb{R}^{m(h-1) \times t}$ as follows: We put $g(r) = 0$, and otherwise,

$$g(v) = \begin{pmatrix} \lambda(e_1) \\ \vdots \\ \lambda(e_j) \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \tag{7}$$

where $e_1, e_2, \ldots, e_j$ is the sequence of edges encountered on the path from the root to $v$. It is straightforward to check that $g$ is 1-Lipschitz. The next observation is also immediate from the definition of $g$.

Observation 2.3. For any $v \in V$ and $u \in V(P_v)$, we have $d_T(u, v) = \|g(u) - g(v)\|_1$.

For $m, n \in \mathbb{N}$, and $A \in \mathbb{R}^{m \times n}$, we use the notation $A[i] \in \mathbb{R}^n$ to refer to the $i$th row of $A$. We now bound the probability that a given pair of vertices experiences a large contraction.

Lemma 2.4. For $C \ge 10$, and $x, y \in V$,

$$\mathbb{P}\big[\|g(x) - g(y)\|_1 \le (1 - C\varepsilon)\, d_T(x, y)\big] \le k^{-C d_T(x,y)/2}. \tag{8}$$

Proof. Fix $x, y \in V$, and let $r'$ denote their lowest common ancestor. We define the family of random variables $\{X_{ij}\}_{i \in [h-1], j \in [m]}$ by setting $\ell_{ij} = (i-1)m + j$, and then

$$X_{ij} = \|g(x)[\ell_{ij}] - g(r')[\ell_{ij}]\|_1 + \|g(y)[\ell_{ij}] - g(r')[\ell_{ij}]\|_1 - \|g(x)[\ell_{ij}] - g(y)[\ell_{ij}]\|_1. \tag{9}$$

Observe that if $i \le d_T(r, r')$ then $X_{ij} = 0$ for all $j \in [m]$ since all three terms in (9) are zero. Furthermore, if $i \ge \min(d_T(r, x), d_T(r, y)) + 1$, then again $X_{ij} = 0$ for all $j \in [m]$, since in this case one of the first two terms of (9) is zero, and the other is equal to the last. Thus if

$$R = [h-1] \cap [d_T(r, r') + 1,\ \min(d_T(r, x), d_T(r, y))],$$

then $i \notin R \implies X_{ij} = 0$ for all $j \in [m]$, and additionally we have the estimate,

$$|R| = \min(d_T(r, x), d_T(r, y)) - d_T(r, r') \le \frac{d_T(x, y)}{2}. \tag{10}$$

Now, using the definition of $g$ (7), we can write

$$\begin{aligned}
\|g(x) - g(y)\|_1 &= \sum_{i \in [h-1], j \in [m]} \Big( \|g(x)[\ell_{ij}] - g(r')[\ell_{ij}]\|_1 + \|g(y)[\ell_{ij}] - g(r')[\ell_{ij}]\|_1 - X_{ij} \Big) \\
&= \|g(x) - g(r')\|_1 + \|g(y) - g(r')\|_1 - \sum_{i \in [h-1], j \in [m]} X_{ij} \\
&\overset{(2.3)}{=} d_T(x, r') + d_T(y, r') - \sum_{i \in [h-1], j \in [m]} X_{ij} \\
&= d_T(x, y) - \sum_{i \in [h-1], j \in [m]} X_{ij}.
\end{aligned}$$

We will prove the lemma by arguing that,

$$\mathbb{P}\Bigg[\sum_{i \in [h-1], j \in [m]} X_{ij} \ge C\varepsilon\, d_T(x, y)\Bigg] \le k^{-C d_T(x,y)/2}.$$

We start the proof by first bounding the maximum of the $X_{ij}$ variables. Since, for every $\ell$, we have $\|g(x)[\ell] - g(r')[\ell]\|_1,\ \|g(y)[\ell] - g(r')[\ell]\|_1 \in \{0, \tfrac{1}{m}\}$, we conclude that,

$$\max\big\{X_{ij} : i \in [h-1],\ j \in [m]\big\} \le \frac{2}{m}. \tag{11}$$

For $i \in R$ and $j \in [m]$, using (6) and (7), we see that $g(x)[\ell_{ij}] - g(r')[\ell_{ij}] = \frac{1}{m}\vec{v}(\alpha)$ and $g(y)[\ell_{ij}] - g(r')[\ell_{ij}] = \frac{1}{m}\vec{v}(\beta)$, where $\alpha$ and $\beta$ are i.i.d. uniform over $\{1, \ldots, t\}$. Hence, for $i \in R$ and $j \in [m]$, we have $\mathbb{P}[X_{ij} \neq 0] = \frac{1}{t}$.

We can thus bound the expected value and variance of $X_{ij}$ for $i \in R$ and $j \in [m]$ using (11),

$$\mathbb{E}[X_{ij}] \le \frac{2}{tm}, \tag{12}$$

and

$$\mathrm{Var}(X_{ij}) \le \frac{4}{tm^2}. \tag{13}$$

Using (10), we have

$$\sum_{i=1}^{h-1} \sum_{j=1}^{m} \mathbb{E}[X_{ij}] = \sum_{i \in R} \sum_{j \in [m]} \mathbb{E}[X_{ij}] \overset{(12)}{\le} \sum_{i \in R} \frac{2}{t} \overset{(10)}{\le} \frac{d_T(x, y)}{t}, \tag{14}$$

and

$$\sum_{i=1}^{h-1} \sum_{j=1}^{m} \mathrm{Var}(X_{ij}) = \sum_{i \in R} \sum_{j \in [m]} \mathrm{Var}(X_{ij}) \overset{(13)}{\le} \sum_{i \in R} \frac{4}{tm} \overset{(10)}{\le} \frac{2\, d_T(x, y)}{tm}. \tag{15}$$

We now apply Theorem 2.2 to complete the proof. Writing $\Sigma = \sum_{i \in [h-1], j \in [m]} X_{ij}$ and $d = d_T(x, y)$ for brevity,

$$\begin{aligned}
\mathbb{P}\Big[\Sigma \ge C\,\frac{d}{t}\Big] &= \mathbb{P}\Big[\Sigma - \frac{d}{t} \ge (C-1)\frac{d}{t}\Big] \\
&\overset{(14)}{\le} \mathbb{P}\Big[\Sigma - \mathbb{E}[\Sigma] \ge (C-1)\frac{d}{t}\Big] \\
&\le \exp\left(\frac{-\big((C-1)\,d/t\big)^2}{2\Big(\sum_{i \in [h-1], j \in [m]} \mathrm{Var}(X_{ij}) + (C-1)(d/t)(\tfrac{2}{m})/3\Big)}\right) \\
&\overset{(15)}{\le} \exp\left(\frac{-\big((C-1)\,d/t\big)^2}{2\Big(2d/(tm) + (C-1)(d/t)(\tfrac{2}{m})/3\Big)}\right) \\
&= \exp\left(-\frac{(C-1)^2}{4\big(1 + (C-1)/3\big)} \cdot \frac{m}{t} \cdot d\right).
\end{aligned}$$

An elementary calculation shows that for $C \ge 10$, we have $\frac{(C-1)^2}{4(1 + (C-1)/3)} \ge \frac{C}{2}$. Hence,

$$\mathbb{P}\big[\Sigma \ge C\varepsilon\, d_T(x, y)\big] \overset{(4)}{\le} \mathbb{P}\Big[\Sigma \ge C\,\frac{d_T(x, y)}{t}\Big] \le \exp\left(-\frac{Cm}{2t}\, d_T(x, y)\right) \overset{(5)}{\le} k^{-C d_T(x,y)/2},$$

completing the proof.

2.2 The Local Lemma argument

We first give the statement of the Lovász Local Lemma [EL75] and then use it in conjunction with Lemma 2.4 to complete the proof of Theorem 2.1.

Theorem 2.5. Let $\mathcal{A}$ be a finite set of events in some probability space.
For $A \in \mathcal{A}$, let $\Gamma(A) \subseteq \mathcal{A}$ be such that $A$ is independent from the collection of events $\mathcal{A} \setminus (\{A\} \cup \Gamma(A))$. If there exists an assignment $x : \mathcal{A} \to (0, 1)$ such that for all $A \in \mathcal{A}$, we have

$$\mathbb{P}(A) \le x(A) \prod_{B \in \Gamma(A)} (1 - x(B)),$$

then the probability that none of the events in $\mathcal{A}$ occur is at least $\prod_{A \in \mathcal{A}} (1 - x(A)) > 0$.

Proof of Theorem 2.1. We may assume that $k \ge 2$. We will use Theorem 2.5 and Lemma 2.4 to show that with non-zero probability the following inequality holds for all $u, v \in V$,

$$\|g(u) - g(v)\|_1 \ge (1 - 14\varepsilon)\, d_T(u, v).$$

For $u, v \in V$, let $\mathcal{E}_{uv}$ be the event $\{\|g(u) - g(v)\|_1 \le (1 - 14\varepsilon)\, d_T(u, v)\}$. Now, for $u, v \in V$, define $x_{uv} = k^{-3 d_T(u,v)}$. Observe that for vertices $u, v \in V$ and a subset $V' \subseteq V$, the event $\mathcal{E}_{uv}$ is mutually independent of the family $\{\mathcal{E}_{u'v'} : u', v' \in V'\}$ whenever the induced subgraph of $T$ spanned by $V'$ contains no edges from $P_{uv}$. Thus using Theorem 2.5, it is sufficient to show that for all $u, v \in V$,

$$\mathbb{P}(\mathcal{E}_{uv}) \le x_{uv} \prod_{\substack{s, t \in V : \\ E(P_{st}) \cap E(P_{uv}) \neq \emptyset}} (1 - x_{st}). \tag{16}$$

Indeed, this will complete the proof of Theorem 2.1.

To this end, fix $u, v \in V$. For $e \in E$ and $i \in \mathbb{N}$, we define the set,

$$S_{e,i} = \{(u, v) : u, v \in V,\ d_T(u, v) = i, \text{ and } e \in E(P_{uv})\}.$$

Since $T$ is a $k$-ary tree,

$$|S_{e,i}| \le \sum_{j=1}^{i} k^{j-1} \cdot k^{i-j} = i \cdot k^{i-1} \le k^{2i}. \tag{17}$$

Every pair $(s, t)$ with $E(P_{st}) \cap E(P_{uv}) \neq \emptyset$ lies in $S_{e,i}$ for at least one $e \in E(P_{uv})$ and $i = d_T(s, t)$, and each factor $(1 - x_{st})$ is at most 1. Thus we can write,

$$\begin{aligned}
x_{uv} \prod_{\substack{s, t \in V : \\ E(P_{st}) \cap E(P_{uv}) \neq \emptyset}} (1 - x_{st})
&\ge x_{uv} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \prod_{(s,t) \in S_{e,i}} (1 - x_{st}) \\
&= k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \prod_{(s,t) \in S_{e,i}} \big(1 - k^{-3i}\big) \\
&\overset{(17)}{\ge} k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \big(1 - k^{-3i}\big)^{k^{2i}} \\
&\ge k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \big(1 - k^{2i} \cdot k^{-3i}\big) \\
&= k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \Big(1 - \frac{1}{k^i}\Big).
\end{aligned}$$
For $x \in [0, \tfrac{1}{2}]$, we have $e^{-2x} \le 1 - x$, and since $k \ge 2$, we have $k^{-i} \le \tfrac{1}{2}$ for all $i \in \mathbb{N}$, hence

$$\begin{aligned}
x_{uv} \prod_{\substack{s, t \in V : \\ E(P_{st}) \cap E(P_{uv}) \neq \emptyset}} (1 - x_{st})
&\ge k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \prod_{i \in \mathbb{N}} \exp\Big(\frac{-2}{k^i}\Big) \\
&= k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \exp\Big(-2 \sum_{i \in \mathbb{N}} \frac{1}{k^i}\Big) \\
&= k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \exp\Big(\frac{-2/k}{1 - 1/k}\Big) \\
&\ge k^{-3 d_T(u,v)} \prod_{e \in E(P_{uv})} \exp\Big(\frac{-4}{k}\Big) \\
&= k^{-3 d_T(u,v)} \exp\Big(\frac{-4\, d_T(u, v)}{k}\Big).
\end{aligned}$$

Since $k \ge 2$, we conclude that,

$$x_{uv} \prod_{\substack{s, t \in V : \\ E(P_{st}) \cap E(P_{uv}) \neq \emptyset}} (1 - x_{st}) \ge k^{-7 d_T(u,v)}.$$

On the other hand, Lemma 2.4 applied with $C = 14$ gives,

$$\mathbb{P}\big[\|g(u) - g(v)\|_1 \le (1 - 14\varepsilon)\, d_T(u, v)\big] \le k^{-7 d_T(u,v)},$$

yielding (16), and completing the proof.

3 Colors and scales

In the present section, we develop some tools for our eventual embedding. The proof of our main theorem appears in the next section, but relies on a key theorem which is only proved in Section 5.

3.1 Monotone colorings

Let $T = (V, E)$ be a metric tree rooted at a vertex $r \in V$. Recall that such a tree $T$ is equipped with a length $\mathrm{len} : E \to [0, \infty)$. We extend this to subsets of edges $S \subseteq E$ via $\mathrm{len}(S) = \sum_{e \in S} \mathrm{len}(e)$. We recall that a monotone coloring is a mapping $\chi : E \to \mathbb{N}$ such that each color class $\chi^{-1}(c) = \{e \in E : \chi(e) = c\}$ is a connected subset of some root-leaf path. For a set of edges $S \subseteq E$, we write $\chi(S)$ for the set of colors occurring in $S$. We define the multiplicity of $\chi$ by

$$M(\chi) = \max_{v \in V} |\chi(P_v)|.$$

Given such a coloring $\chi$ and $c \in \mathbb{N}$, we define,

$$\mathrm{len}_\chi(c) = \mathrm{len}(\chi^{-1}(c)),$$

and $\mathrm{len}_\chi(S) = \sum_{c \in S} \mathrm{len}_\chi(c)$, if $S \subseteq \mathbb{N}$.

For every $\delta \in [0, 1]$ and $x, y \in V$, we define the set of colors

$$C_\chi(x, y; \delta) = \big\{c : \mathrm{len}(P_{xy} \cap \chi^{-1}(c)) \le \delta \cdot \mathrm{len}_\chi(c)\big\} \cap \big(\chi(P_x) \,\triangle\, \chi(P_y)\big).$$

This is the set of colors $c$ which occur in only one of $P_x$ and $P_y$, and for which the contribution to $P_{xy}$ is significantly smaller than $\mathrm{len}_\chi(c)$.
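The notions of monotone coloring and multiplicity can be made concrete with a short sketch. Note that this is an assumption-laden illustration rather than the construction from the proof of Lemma 1.2: it swaps in a heavy-path style rule (at each vertex, the child edge with the most leaves beneath it continues the incoming color), and the names `monotone_coloring` and `multiplicity`, as well as the `children` dictionary encoding of the tree, are hypothetical. Every edge that starts a new color leads to a subtree with at most half the leaves, so a root-leaf path still sees at most $1 + \log_2 n$ colors.

```python
import math


def monotone_coloring(children, root):
    """Return a dict mapping each non-root vertex c to the color of the
    edge (parent(c), c).  `children` maps a vertex to its child list."""
    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    # Number of leaves beneath each vertex, computed bottom-up.
    leaves = {}
    for v in reversed(order):
        kids = children.get(v, [])
        leaves[v] = sum(leaves[c] for c in kids) if kids else 1
    color, next_color = {}, 0
    for v in order:
        kids = children.get(v, [])
        if not kids:
            continue
        heavy = max(kids, key=lambda c: leaves[c])
        for c in kids:
            if c == heavy and v in color:
                color[c] = color[v]      # extend the color class downward
            else:
                next_color += 1          # start a fresh color class
                color[c] = next_color
    return color


def multiplicity(children, root, color):
    """M(chi): the largest number of distinct colors on a root-to-vertex path."""
    best, stack = 0, [(root, frozenset())]
    while stack:
        v, seen = stack.pop()
        best = max(best, len(seen))
        for c in children.get(v, []):
            stack.append((c, seen | {color[c]}))
    return best


if __name__ == "__main__":
    h = 5  # complete binary tree in heap numbering, vertices 1 .. 2^(h+1) - 1
    tree = {v: [2 * v, 2 * v + 1] for v in range(1, 2 ** h)}
    chi = monotone_coloring(tree, 1)
    print(multiplicity(tree, 1, chi), 1 + math.log2(2 ** (h + 1) - 1))
```

Each color class produced this way is a descending path, hence a connected subset of a root-leaf path, which is the defining property of a monotone coloring.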
We also put,

$$\rho_\chi(x, y; \delta) = \mathrm{len}_\chi(C_\chi(x, y; \delta)). \tag{18}$$

We now state a key theorem that will be proved in Section 5.

Theorem 3.1. For every $\varepsilon, \delta > 0$, there is a value $C(\varepsilon, \delta) = O\big((\tfrac{1}{\varepsilon} + \log\log\tfrac{1}{\delta})^3 \log\tfrac{1}{\varepsilon}\big)$ such that the following holds. For any metric tree $T = (V, E)$ and any monotone coloring $\chi : E \to \mathbb{N}$, there exists a mapping $F : V \to \ell_1^{C(\varepsilon,\delta)(\log n + M(\chi))}$, such that for all $x, y \in V$,

$$(1 - \varepsilon)\, d_T(x, y) - \delta\, \rho_\chi(x, y; \delta) \le \|F(x) - F(y)\|_1 \le d_T(x, y). \tag{19}$$

The problem one now confronts is whether the loss in the $\rho_\chi(x, y; \delta)$ term can be tolerated. In general, we do not have a way to do this, so we first embed our tree into a product of a small number of trees in a way that allows us to control the corresponding $\rho$-terms.

Lemma 3.2. For every $\varepsilon \in (0, 1)$, there is a number $k \asymp \frac{1}{\varepsilon}$ such that the following holds. For every metric tree $T = (V, E)$ and monotone coloring $\chi : E \to \mathbb{N}$, there exist $k$ metric trees $T_1, T_2, \ldots, T_k$ with monotone colorings $\{\chi_i : E(T_i) \to \mathbb{N}\}_{i=1}^{k}$ and mappings $\{f_i : V \to V(T_i)\}_{i=1}^{k}$ such that $M(\chi_i) \le M(\chi)$, and $|V(T_i)| \le |V|$ for all $i \in [k]$, and the following conditions hold for all $x, y \in V$:

(a) We have,

$$\frac{1}{k} \sum_{i=1}^{k} d_{T_i}(f_i(x), f_i(y)) \ge (1 - \varepsilon)\, d_T(x, y). \tag{20}$$

(b) For all $i \in [k]$, we have

$$d_{T_i}(f_i(x), f_i(y)) \le (1 + \varepsilon)\, d_T(x, y). \tag{21}$$

(c) There exists a number $j \in [k]$ such that

$$\varepsilon\, d_T(x, y) \ge \frac{2^{-(k+1)}}{k} \sum_{\substack{i=1 \\ i \neq j}}^{k} \rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big). \tag{22}$$

Using Lemma 3.2 in conjunction with Theorem 3.1, we can now prove the main theorem (Theorem 1.1).

Proof of Theorem 1.1. Let $\varepsilon > 0$ be given, and let $T = (V, E)$ be an $n$-vertex metric tree. Let $\chi : E \to \mathbb{N}$ be a monotone coloring with $M(\chi) \le O(\log n)$, which exists by Lemma 1.2. Apply Lemma 3.2 to obtain metric trees $T_1, \ldots, T_k$ with corresponding monotone colorings $\chi_1, \ldots, \chi_k$ and mappings $f_i : V \to V(T_i)$. Observe that $M(\chi_i) \le O(\log n)$ for each $i \in [k]$.

Let $F_i : V(T_i) \to \ell_1^{C(\varepsilon) \log n}$ be the mapping obtained by applying Theorem 3.1 to $T_i$ and $\chi_i$, for each $i \in [k]$, with $\delta = 2^{-(k+1)}$, where $C(\varepsilon) = O\big(\tfrac{1}{\varepsilon^3} \log\tfrac{1}{\varepsilon}\big)$. Finally, we put

$$F = \frac{1}{k}\big((F_1 \circ f_1) \oplus (F_2 \circ f_2) \oplus \cdots \oplus (F_k \circ f_k)\big)$$

so that $F : V \to \ell_1^{O((\frac{1}{\varepsilon})^4 \log\frac{1}{\varepsilon} \cdot \log n)}$. We will prove that $F$ is a $(1 + O(\varepsilon))$-embedding, completing the proof.

First, observe that each $F_i$ is 1-Lipschitz (Theorem 3.1). In conjunction with condition (b) of Lemma 3.2, which says that $\|f_i\|_{\mathrm{Lip}} \le 1 + \varepsilon$ for each $i \in [k]$, we have $\|F\|_{\mathrm{Lip}} \le 1 + \varepsilon$.

For the other side, fix $x, y \in V$ and let $j \in [k]$ be the number guaranteed in condition (c) of Lemma 3.2. Then we have,

$$\begin{aligned}
\|F(x) - F(y)\|_1 &= \frac{1}{k} \sum_{i=1}^{k} \|(F_i \circ f_i)(x) - (F_i \circ f_i)(y)\|_1 \\
&\overset{(19)}{\ge} \frac{1}{k} \sum_{i \neq j} \Big((1 - \varepsilon)\, d_{T_i}(f_i(x), f_i(y)) - 2^{-(k+1)} \rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big)\Big) \\
&\overset{(22)}{\ge} \Big(\frac{1}{k} \sum_{i \neq j} (1 - \varepsilon)\, d_{T_i}(f_i(x), f_i(y))\Big) - \varepsilon\, d_T(x, y) \\
&\ge \Big(\frac{1}{k} \sum_{i=1}^{k} (1 - \varepsilon)\, d_{T_i}(f_i(x), f_i(y))\Big) - \frac{1}{k}\, d_{T_j}(f_j(x), f_j(y)) - \varepsilon\, d_T(x, y) \\
&\overset{(21)}{\ge} \Big(\frac{1}{k} \sum_{i=1}^{k} (1 - \varepsilon)\, d_{T_i}(f_i(x), f_i(y))\Big) - \frac{1 + \varepsilon}{k}\, d_T(x, y) - \varepsilon\, d_T(x, y) \\
&\overset{(20)}{\ge} (1 - \varepsilon)^2\, d_T(x, y) - \frac{1 + \varepsilon}{k}\, d_T(x, y) - \varepsilon\, d_T(x, y) \\
&\ge (1 - O(\varepsilon))\, d_T(x, y),
\end{aligned}$$

where in the final line we have used $k \asymp \frac{1}{\varepsilon}$, completing the proof.

We now move on to the proof of Lemma 3.2. We begin by proving an analogous statement for the half line $[0, \infty)$. An $\mathbb{R}$-star is a metric space formed as follows: Given a sequence $\{a_i\}_{i=1}^{\infty}$ of positive numbers, one takes the disjoint union of the intervals $\{[0, a_1], [0, a_2], \ldots\}$, and then identifies the 0 point in each, which is canonically called the root of the $\mathbb{R}$-star. An $\mathbb{R}$-star $S$ carries the natural induced length metric $d_S$.
We refer to the associated intervals as branches, and the length of a branch is the associated number $a_i$. Finally, if $S$ is an $\mathbb{R}$-star and $x \in S \setminus \{0\}$, we use $\ell(x)$ to denote the length of the branch containing $x$. We put $\ell(0) = 0$.

Lemma 3.3. For every $k \in \mathbb{N}$ with $k \ge 2$, there exist $\mathbb{R}$-stars $S_1, \ldots, S_k$ with mappings $f_i : [0,\infty) \to S_i$ such that the following conditions hold:

i) For each $i \in [k]$, $f_i(0)$ is the root of $S_i$.

ii) For all $x, y \in [0,\infty)$, $\frac{1}{k} \sum_{i=1}^{k} d_{S_i}(f_i(x), f_i(y)) \ge \left(1 - \frac{7}{k}\right) |x - y|$.

iii) For each $i \in [k]$, $f_i$ is $(1 + 2^{-k+1})$-Lipschitz.

iv) For $x \in [0,\infty)$, we have $\ell(f_i(x)) \le 2^{k-1} x$.

v) For $x \in [0,\infty)$, there are at most two values of $i \in [k]$ such that $d_{S_i}(f_i(0), f_i(x)) \le 2^{-k}\, \ell(f_i(x))$.

vi) For all $x, y \in [0,\infty)$, there is at most one value of $i \in [k]$ such that $f_i(x)$ and $f_i(y)$ are in different branches of $S_i$ and $2^{-k}\big(\ell(f_i(x)) + \ell(f_i(y))\big) \le 2|x - y|$.

Proof. Assume that $k \ge 2$. We first construct $\mathbb{R}$-stars $S_1, \ldots, S_k$. We will index the branches of each star by $\mathbb{Z}$. For $i \in [k]$, $S_i$ is a star whose $j$th branch, for $j \in \mathbb{Z}$, has length $2^{i-1+k(j+1)}$. We will use the notation $(i, j, d)$ to denote the point at distance $d$ from the root on the $j$th branch of $S_i$. Observe that $(i, j, 0)$ and $(i, j', 0)$ describe the same point (the root of $S_i$) for all $j, j' \in \mathbb{Z}$.

Now, we define for every $i \in [k]$ a function $f_i : [0,\infty) \to S_i$ as follows:
\[
f_i(x) =
\begin{cases}
\left(i,\, j,\, \dfrac{x - 2^{i+kj}}{1 - 2^{1-k}}\right) & \text{for } 2^{-i} x \in \big[2^{kj},\, 2^{k(j+1)-1}\big), \\[2ex]
\left(i,\, j,\, 2^{i+k(j+1)} - x\right) & \text{for } 2^{-i} x \in \big[2^{k(j+1)-1},\, 2^{k(j+1)}\big).
\end{cases}
\]
Condition (i) is immediate. It is also straightforward to verify that
\[
\|f_i\|_{\mathrm{Lip}} \le (1 - 2^{1-k})^{-1} \le 1 + 2^{-k+1}, \tag{23}
\]
yielding condition (iii).

Toward verifying condition (ii), observe that for every $x \in [0,\infty)$ and $j \in \{0, 1, \ldots, k-2\}$ we have
\[
d_{S_i}(f_i(x), 0) \ge \frac{x - 2^{\lfloor \log_2 x \rfloor - j}}{1 - 2^{1-k}} \ge x - 2^{\lfloor \log_2 x \rfloor - j}
\]
when $i = (\lfloor \log_2 x \rfloor - j) \bmod k$. Using this, we can write
\[
\sum_{i=1}^{k} d_{S_i}(f_i(x), f_i(0)) \ge \sum_{j = \lfloor \log_2 x \rfloor - k + 2}^{\lfloor \log_2 x \rfloor} \big(x - 2^{j}\big) = (k-1)x - \sum_{j = \lfloor \log_2 x \rfloor - k + 2}^{\lfloor \log_2 x \rfloor} 2^{j} \ge (k-1)x - 2^{\lfloor \log_2 x \rfloor + 1} \ge (k-3)x. \tag{24}
\]
Now fix $x, y \in [0,\infty)$ with $x \le y$. If $x \le y/2$, then we can use the triangle inequality, together with (23) and (24), to write
\[
\begin{aligned}
\frac{1}{k} \sum_{i=1}^{k} d_{S_i}(f_i(x), f_i(y)) &\ge \frac{1}{k} \sum_{i=1}^{k} \big[d_{S_i}(f_i(y), f_i(0)) - d_{S_i}(f_i(x), f_i(0))\big] \\
&\ge (1 - 3/k)\, y - (1 + 2^{1-k})\, x \\
&\ge (1 - 3/k)\, y - (1 + 1/k)\, x \\
&\ge (1 - 7/k)(y - x) + 4y/k - 8x/k \\
&\ge (1 - 7/k)(y - x).
\end{aligned}
\]
In the case that $\frac{y}{2} \le x \le y$, for $j \in \{0, 1, \ldots, k-3\}$, we have
\[
d_{S_i}(f_i(x), f_i(y)) \ge \frac{y - x}{1 - 2^{1-k}} \ge y - x
\]
when $i = (\lfloor \log_2 x \rfloor - j) \bmod k$. From this, we conclude that
\[
\frac{1}{k} \sum_{i=1}^{k} d_{S_i}(f_i(x), f_i(y)) \ge \frac{1}{k} \sum_{j=0}^{k-3} (y - x) \ge \frac{k-2}{k}\,(y - x), \tag{25}
\]
yielding condition (ii).

It is also straightforward to check that
\[
\ell(f_i(x)) \le 2^{\lfloor \log_2 x \rfloor + k - 1} \le 2^{k-1} x,
\]
which verifies condition (iv).

To verify condition (v), note that for $x \in [0,\infty)$, the inequality $d_{S_i}(f_i(x), f_i(0)) \le x/2$ can only hold for $i \bmod k \in \{\lfloor \log_2 x \rfloor, \lfloor \log_2 x \rfloor + 1\}$, hence condition (iv) implies condition (v).

Finally, we verify condition (vi). We divide the problem into two cases. If $x < y/2$, then by condition (iv),
\[
\ell(f_i(x)) + \ell(f_i(y)) \le 2^{k-1}(x + y) \le 2^{k-1}(2y) \le 2^{k+1}(y - x).
\]
In the case that $y/2 < x \le y$, $f_i(x)$ and $f_i(y)$ can be mapped to different branches of $S_i$ only for $i \equiv \lfloor \log_2 y \rfloor \pmod{k}$, yielding condition (vi).

Finally, we move on to the proof of Lemma 3.2.

Proof of Lemma 3.2.
We put $k = \lceil 7/\varepsilon \rceil$ and prove the following stronger statement by induction on $|V|$: there exist metric trees $T_1, T_2, \ldots, T_k$ and monotone colorings $\chi_i : E(T_i) \to \mathbb{N}$, along with mappings $f_i : V \to V(T_i)$, satisfying the conditions of the lemma. Furthermore, each coloring $\chi_i$ satisfies the stronger condition that for all $v \in V$,
\[
|\chi_i(P_{f_i(v)})| \le |\chi(P_v)|. \tag{26}
\]
The statement is trivial for the tree containing only a single vertex. Now suppose that we have a tree $T$ and coloring $\chi : E \to \mathbb{N}$. Since $T$ is connected, it is easy to see that there exists a color class $c \in \chi(E)$ with the following property. Let $\gamma_c$ be the path whose edges are colored $c$, and let $v_c$ be the vertex of $\gamma_c$ closest to the root. Then the induced tree $T'$ on the vertex set $(V \setminus V(\gamma_c)) \cup \{v_c\}$ is connected.

Applying the inductive hypothesis to $T'$ and $\chi|_{E(T')}$ yields metric trees $T'_1, T'_2, \ldots, T'_k$ with colorings $\chi'_i : E(T'_i) \to \mathbb{N}$ and mappings $f'_i : V(T') \to V(T'_i)$. Now, let $S_1, \ldots, S_k$ and $\{g_i : [0,\infty) \to S_i\}$ be the $\mathbb{R}$-stars and mappings guaranteed by Lemma 3.3. For each $i \in [k]$, let $S'_i$ be the induced subgraph of $S_i$ on the set $\{g_i(d_T(v, v_c)) : v \in V(\gamma_c)\}$, and make $S'_i$ into a metric tree rooted at $g_i(0)$, with the length structure inherited from $S_i$. We now construct $T_i$ by attaching $S'_i$ to $T'_i$ with the root of $S'_i$ identified with the node $f'_i(v_c)$. The coloring $\chi'_i$ is extended to $T_i$ by assigning to each root-leaf path in $S'_i$ a new color. Finally, we specify functions $f_i : V \to V(T_i)$ via
\[
f_i(v) =
\begin{cases}
f'_i(v) & v \in V(T'), \\
g_i(d_T(v_c, v)) & v \in V \setminus V(T').
\end{cases}
\]
It is straightforward to verify that (26) holds for the colorings $\{\chi_i\}$ and every vertex $v \in V$.
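As a numerical sanity check on the star maps of Lemma 3.3 that this construction relies on (called $g_i$ here), the following Python sketch evaluates each map and tests condition (ii) together with the slope bound $(1 - 2^{1-k})^{-1}$ appearing in (23) on random pairs. The names `star_map`, `star_dist` and `check_conditions`, the sampling range, and the tolerance are illustrative assumptions rather than anything from the paper, and a passing run is only evidence, not a proof.

```python
import math
import random


def star_map(i, k, x):
    """Return (branch index j, distance from the root) of f_i(x) in the star S_i."""
    if x == 0:
        return (0, 0.0)  # the root; its branch label is irrelevant
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1, so floor(log2 x) = e - 1
    q = (e - 1) - i  # floor(log2(2**-i * x)), computed without rounding
    j = q // k  # 2**-i * x lies in [2**(k*j), 2**(k*(j+1)))
    lo = math.ldexp(1.0, i + k * j)  # start of the ascending piece (root)
    mid = math.ldexp(1.0, i + k * (j + 1) - 1)  # tip of branch j is reached here
    hi = math.ldexp(1.0, i + k * (j + 1))  # back at the root
    if x < mid:
        return (j, (x - lo) / (1.0 - 2.0 ** (1 - k)))
    return (j, hi - x)


def star_dist(p, q):
    """Length metric on a star: go through the root unless both points share a branch."""
    (jp, dp), (jq, dq) = p, q
    if jp == jq or dp == 0 or dq == 0:
        return abs(dp - dq)
    return dp + dq


def check_conditions(k, trials=2000, seed=0):
    """Test condition (ii) and the slope bound from (23) on random pairs (x, y)."""
    rng = random.Random(seed)
    lip = 1.0 / (1.0 - 2.0 ** (1 - k))
    tol = 1e-9  # slack for floating-point rounding (an arbitrary choice)
    for _ in range(trials):
        x, y = rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)
        dists = [star_dist(star_map(i, k, x), star_map(i, k, y))
                 for i in range(1, k + 1)]
        if sum(dists) / k < (1.0 - 7.0 / k) * abs(x - y) - tol:
            return False
        if max(dists) > lip * abs(x - y) + tol:
            return False
    return True
```

For instance, with $k = 2$ and $i = 1$ the branch $j = 0$ has length $4$, and `star_map(1, 2, 3.0)` sits at distance $2$ from the root on the ascending piece.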
In addition, using the inductive hypothesis, we have $|V(T_i)| \le |V|$ and $M(\chi_i) \le M(\chi)$ for every $i \in [k]$, with the latter condition following immediately from (26) and the structure of the mappings $\{f_i\}$.

We now verify that conditions (a), (b), and (c) hold. For $x, y \in V(T')$, the induction hypothesis guarantees all three conditions. If both $x, y \in V(\gamma_c)$, then conditions (a) and (b) follow directly from conditions (ii) and (iii) of Lemma 3.3 applied to the maps $\{g_i\}$. To verify condition (c), let $j \in [k]$ be the single bad index from (vi). We have for all $i \ne j$,
\[
\rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big) \le 2^{k+1}\, d_T(x,y).
\]
Since there are at most two colors on the path between $x$ and $y$ in any $T_i$, by condition (v) of Lemma 3.3 there are at most four values of $i \in [k] \setminus \{j\}$ such that
\[
\rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big) \ne 0,
\]
hence
\[
\frac{1}{k} \sum_{i \ne j} \rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big) \le \frac{4 \cdot 2^{k+1}}{k}\, d_T(x,y) \le \varepsilon\, 2^{k+1}\, d_T(x,y).
\]
Since $\|f_i\|_{\mathrm{Lip}}$ is determined on edges $(x,y) \in E$, and each such edge has $x, y \in V(\gamma_c)$ or $x, y \in V(T')$, we have already verified condition (b) for all $i \in [k]$ and $x, y \in V$. Finally, we verify (a) and (c) for pairs with $x \in V(T')$ and $y \in V(\gamma_c)$. We can check condition (a) using the previous two cases:
\[
\begin{aligned}
\frac{1}{k} \sum_{i=1}^{k} d_{T_i}(f_i(x), f_i(y)) &= \frac{1}{k} \sum_{i=1}^{k} \big[d_{T_i}(f_i(x), f_i(v_c)) + d_{T_i}(f_i(y), f_i(v_c))\big] \\
&\ge (1-\varepsilon)\, d_T(y, v_c) + (1-\varepsilon)\, d_T(x, v_c) \\
&\ge (1-\varepsilon)\, d_T(x,y).
\end{aligned}
\]
Towards verifying condition (c), note that by condition (v) from Lemma 3.3, there are at most two values of $i$ such that
\[
\rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-(k+1)}\big) - \rho_{\chi_i}\big(f_i(x), f_i(v_c); 2^{-(k+1)}\big) = \rho_{\chi_i}\big(f_i(y), f_i(v_c); 2^{-(k+1)}\big) \ne 0.
\]
By the induction hypothesis, there exists a number $j \in [k]$ such that
\[
\varepsilon\, d_T(x, v_c) \ge \frac{2^{-(k+1)}}{k} \sum_{i \ne j} \rho_{\chi_i}\big(f_i(v_c), f_i(x); 2^{-(k+1)}\big).
\]
Now we use condition (iv) from Lemma 3.3 to conclude
\[
\begin{aligned}
\frac{2^{-(k+1)}}{k} \sum_{i \ne j} \rho_{\chi_i}\big(f_i(x), f_i(y); 2^{-k}\big) &\le \frac{2^{-(k+1)}}{k} \sum_{i \ne j} \Big[\rho_{\chi_i}\big(f_i(x), f_i(v_c); 2^{-k}\big) + \rho_{\chi_i}\big(f_i(y), f_i(v_c); 2^{-k}\big)\Big] \\
&\le \varepsilon\, d_T(x, v_c) + \left(\frac{2^{-(k+1)}}{k}\right) \big(2^{k-1}\, d_T(y, v_c)\big) \\
&\le \varepsilon\, d_T(x, v_c) + \varepsilon\, d_T(v_c, y) = \varepsilon\, d_T(x,y),
\end{aligned}
\]
completing the proof.

3.2 Multi-scale embeddings

We now present the basics of our multi-scale embedding approach. The next lemma is devoted to combining scales together without using too many dimensions, while controlling the distortion of the resulting map.

Lemma 3.4. For every $\varepsilon \in (0,1)$, the following holds. Let $(X,d)$ be an arbitrary metric space, and consider a family of functions $\{f_i : X \to [0,1]\}_{i \in \mathbb{Z}}$ such that for all $x, y \in X$, we have
\[
\sum_{i \in \mathbb{Z}} 2^{i}\, |f_i(x) - f_i(y)| < \infty. \tag{27}
\]
Then there is a mapping $F : X \to \ell_1^{2 + \lceil \log \frac{1}{\varepsilon} \rceil}$ such that for all $x, y \in X$,
\[
(1-\varepsilon) \sum_{i \in \mathbb{Z}} 2^{i}\, |f_i(x) - f_i(y)| - 2\zeta(x,y) \le \|F(x) - F(y)\|_1 \le \sum_{i \in \mathbb{Z}} 2^{i}\, |f_i(x) - f_i(y)|,
\]
where $\zeta(x,y) = \sum_{i \,:\, \exists j}$ […] $\max S$ $\;2^{kj+i}\, |f_{kj+i}(x) - f_{kj+i}(y)|$
\[
\begin{aligned}
&\le 2^{c} + \sum_{j < \max S} 2^{kj+i} + \zeta_i(x,y) \\
&\le 2^{c} + 2 \cdot 2^{k(\max S - 1) + i} + \zeta_i(x,y) \\
&\le 2^{c}(1 + 2^{1-k}) + \zeta_i(x,y) \\
&\le (1 + \varepsilon/2)\, 2^{c} + \zeta_i(x,y).
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
|F_i(x) - F_i(y)| &= \Big|\sum_{j \in \mathbb{Z}} 2^{kj+i} \big(f_{jk+i}(x) - f_{jk+i}(y)\big)\Big| \\
&\ge 2^{c} - \sum_{\substack{j \in S \cup T \\ j < \max S}} 2^{kj+i} - \sum_{\substack{j \in T \\ j > \max S}} 2^{kj+i}\, |f_{kj+i}(x) - f_{kj+i}(y)| \\
&\ge 2^{c} - \sum_{j < \max S} 2^{kj+i} - \zeta_i(x,y) \\
&\ge 2^{c} - 2 \cdot 2^{k(\max S - 1) + i} - \zeta_i(x,y) \\
&\ge 2^{c}(1 - 2^{1-k}) - \zeta_i(x,y) \\
&\ge (1 - \varepsilon/2)\, 2^{c} - \zeta_i(x,y).
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
(1-\varepsilon) \sum_{j \in \mathbb{Z}} 2^{kj+i}\, |f_{jk+i}(x) - f_{jk+i}(y)| &\le (1-\varepsilon)\big((1 + \varepsilon/2)\, 2^{c} + \zeta_i(x,y)\big) \\
&\le (1 - \varepsilon/2)\, 2^{c} + \zeta_i(x,y) \\
&\le |F_i(x) - F_i(y)| + 2\zeta_i(x,y),
\end{aligned}
\]
completing the verification of (29) in the case when $S \ne \emptyset$.

In the remaining case when $S = \emptyset$ and $T \ne \emptyset$, if the set $T$ does not have a minimum element, then
\[
\sum_{j \in T} 2^{kj+i}\, |f_{kj+i}(x) - f_{kj+i}(y)| = \zeta_i(x,y),
\]
making (29) vacuous since the right-hand side is non-positive. Otherwise, let $\ell = \min(T)$, and write
\[
\begin{aligned}
|F_i(x) - F_i(y)| &= \Big|\sum_{j \in T} 2^{kj+i} \big(f_{kj+i}(x) - f_{kj+i}(y)\big)\Big| \\
&\ge 2^{\ell k + i}\, |f_{\ell k + i}(x) - f_{\ell k + i}(y)| - \Big|\sum_{j \in T,\, j > \ell} 2^{kj+i} \big(f_{kj+i}(x) - f_{kj+i}(y)\big)\Big| \\
&\ge 2^{\ell k + i}\, |f_{\ell k + i}(x) - f_{\ell k + i}(y)| - \zeta_i(x,y) \\
&= \sum_{j \in \mathbb{Z}} 2^{kj+i}\, |f_{kj+i}(x) - f_{kj+i}(y)| - 2\zeta_i(x,y).
\end{aligned}
\]
This completes the proof.

In Section 5, we will require the following straightforward corollary.

Corollary 3.5. For every $\varepsilon \in (0,1)$ and $m \in \mathbb{N}$, the following holds. Let $(X,d)$ be a metric space, and suppose we have a family of functions $\{f_i : X \to [0,1]^{m}\}_{i \in \mathbb{Z}}$ such that for all $x, y \in X$,
\[
\sum_{i \in \mathbb{Z}} 2^{i}\, \|f_i(x) - f_i(y)\|_1 < \infty.
\]
Then there exists a mapping $F : X \to \ell_1^{m(2 + \lceil \log \frac{1}{\varepsilon} \rceil)}$ such that for all $x, y \in X$,
\[
(1-\varepsilon) \sum_{i \in \mathbb{Z}} \big(2^{i}\, \|f_i(x) - f_i(y)\|_1\big) - 2\zeta(x,y) \le \|F(x) - F(y)\|_1 \le \sum_{i \in \mathbb{Z}} 2^{i}\, \|f_i(x) - f_i(y)\|_1,
\]
where $\zeta(x,y) = \sum_{k=1}^{m} \sum_{i \,:\, \exists j}$ […] $0\}$.

We now define a family of functions $\{\tau_i : V \to \mathbb{N} \cup \{0\}\}_{i \in \mathbb{Z}}$. For $v \in V$, let $c = \chi(v, p(v))$, and put $\tau_i(v) = 0$ for $i < \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$, and otherwise
\[
\tau_i(v) = \min\Bigg( \underbrace{\left\lceil \frac{d_T(v, v_c) - \min\big(d_T(v, v_c),\, \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(v)\big)}{2^{i}} \right\rceil}_{(A)},\;\; \underbrace{\varphi(c) - \sum_{c' \in \chi(E(P_v))} \tau_i(v_{c'})}_{(B)} \Bigg). \tag{34}
\]
The value of $\tau_i(v)$ will be used in Section 5 to determine how many coordinates of magnitude $\asymp 2^{i}$ change as the embedding proceeds from $v_c$ to $v$. In this definition, we try to cover the distance from the root to $v$ with the smallest scales possible while satisfying the inequality
\[
\varphi(c) \ge \tau_i(v) + \sum_{c' \in \chi(E(P_v))} \tau_i(v_{c'}).
\]
For $v \in V \setminus \{r\}$, let $c = \chi(v, p(v))$. For each $i \in \mathbb{Z}$, part (B) of (34) for $\tau_i(v_c)$ implies that
\[
\tau_i(v_c) \le \varphi(\rho(c)) - \sum_{c' \in \chi(E(P_{v_c}))} \tau_i(v_{c'}).
\]
Hence,
\[
\varphi(c) - \sum_{c' \in \chi(E(P_v))} \tau_i(v_{c'}) = \varphi(c) - \tau_i(v_c) - \sum_{c' \in \chi(E(P_{v_c}))} \tau_i(v_{c'}) \ge \varphi(c) - \varphi(\rho(c)) = \kappa(c) \ge 1. \tag{35}
\]
Therefore, part (B) of (34) is always positive, so if $\tau_k(v) = 0$ for some $k \ge \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$, then $\tau_k(v)$ is defined by part (A) of (34). Hence $\sum_{j=-\infty}^{k-1} 2^{j} \tau_j(v) \ge d_T(v, v_c)$, and the following observation is immediate.

Observation 4.1. For $v \in V$ and $k \ge \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$, if $\tau_k(v) = 0$ then for all $i \ge k$, $\tau_i(v) = 0$.

Comparing part (A) of (34) for $\tau_i(v)$ and $\tau_{i+1}(v)$ also allows us to observe the following.

Observation 4.2. For $v \in V$ and $k \ge \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$, if part (A) in (34) for $\tau_k(v)$ is less than or equal to part (B), then for all $i > k$, $\tau_i(v) = 0$.

4.2 Properties of the scale selector maps

We now prove some key properties of the maps $\kappa$, $\varphi$, and $\{\tau_i\}$.

Lemma 4.3. For every vertex $v \in V$ with $c = \chi(v, p(v))$, the following holds. For all $i \in \mathbb{Z}$ with $\frac{d_T(v, v_c)}{\kappa(c)} \le 2^{i-1}$, we have $\tau_i(v) = 0$.

Proof. If $d_T(v, v_c) = 0$, the lemma is vacuous. Suppose now that $d_T(v, v_c) > 0$, and let $k = \left\lceil \log_2\!\left(\frac{d_T(v, v_c)}{\kappa(c)}\right) \right\rceil$. We have $d_T(v, v_c) \ge m(T)$ and $\kappa(c) \le \log_2 |E| + 1$, therefore
\[
k \ge \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor.
\]
It follows that for $i \ge k$, $\tau_i(v)$ is given by (34). If $\tau_k(v) = 0$, then by Observation 4.1, for all $i \ge k$, $\tau_i(v) = 0$. On the other hand, if $\tau_k(v) \ne 0$ then either it is determined by part (B) of (34), in which case
\[
\tau_k(v) = \varphi(c) - \sum_{c' \in \chi(E(P_v))} \tau_k(v_{c'}) = \varphi(c) - \tau_k(v_c) - \sum_{c' \in \chi(E(P_{v_c}))} \tau_k(v_{c'}) \ge \varphi(c) - \varphi(\rho(c)) = \kappa(c),
\]
implying that
\[
\sum_{j=-\infty}^{k} 2^{j} \tau_j(v) \ge \kappa(c)\, 2^{k} \ge d_T(v, v_c).
\]
Examining part (A) of (34), we see that $\tau_{k+1}(v) = 0$, and by Observation 4.1, $\tau_i(v) = 0$ for $i > k$. Alternatively, $\tau_k(v)$ is determined by part (A) of (34), and by Observation 4.2, $\tau_i(v) = 0$ for $i > k$, completing the proof.

The next lemma shows how the values $\{\tau_i(v)\}$ track the distance from $v_c$ to $v$.

Lemma 4.4. For any vertex $v \in V$ with $c = \chi(v, p(v))$, we have
\[
d_T(v, v_c) \le \sum_{i=-\infty}^{\infty} 2^{i} \tau_i(v) \le 3\, d_T(v, v_c).
\]

Proof. If $d_T(v, v_c) = 0$, the lemma is vacuous. Suppose now that $d_T(v, v_c) > 0$, and let $k = \max\{i : \tau_i(v) \ne 0\}$. By Lemma 4.3, the maximum exists. We have $\tau_{k+1}(v) = 0$, and thus inequality (35) implies that part (A) of (34) specifies $\tau_{k+1}(v)$, yielding
\[
d_T(v, v_c) \le \sum_{i=-\infty}^{k} 2^{i} \tau_i(v) = \sum_{i=-\infty}^{\infty} 2^{i} \tau_i(v).
\]
On the other hand, since $\tau_k(v) > 0$, we must have $d_T(v, v_c) > \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v)$, and Lemma 4.3 implies that $2^{k} < 2\, d_T(v, v_c)$, hence
\[
\begin{aligned}
\sum_{i=-\infty}^{k} 2^{i} \tau_i(v) &\le \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v) + 2^{k} \left\lceil \frac{d_T(v, v_c) - \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v)}{2^{k}} \right\rceil \\
&< \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v) + 2^{k} \left( \frac{d_T(v, v_c) - \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v)}{2^{k}} + 1 \right) \\
&= \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v) + 2^{k} + \Big( d_T(v, v_c) - \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(v) \Big) \\
&\le d_T(v, v_c) + 2^{k} < 3\, d_T(v, v_c).
\end{aligned}
\]

The following lemma shows that for any color $c \in \chi(E)$, the value of $\tau_i$ does not decrease as we move further from $v_c$ in $\gamma_c$.

Lemma 4.5.
Let $u, w \in V$ be such that $c = \chi(w, p(w)) = \chi(u, p(u))$ and $d_T(w, v_c) \le d_T(u, v_c)$. Then for all $i \in \mathbb{Z}$, we have $\tau_i(w) \le \tau_i(u)$.

Proof. First let $k$ be the smallest integer for which
\[
\left\lceil \frac{d_T(w, v_c) - \min\big(d_T(w, v_c),\, \sum_{j=-\infty}^{k-1} 2^{j} \tau_j(w)\big)}{2^{k}} \right\rceil \le \varphi(c) - \sum_{c' \in \chi(E(P_w))} \tau_k(v_{c'}).
\]
This $k$ exists since, by (35), the right-hand side is always positive, while by Lemma 4.3, the left-hand side must be zero for some $k \in \mathbb{Z}$. For $i > k$, by Observation 4.2 we have $\tau_i(w) = 0$. Therefore, for $i > k$, we have $\tau_i(u) \ge \tau_i(w)$.

We now use induction on $i$ to show that for $i < k$, $\tau_i(u) = \tau_i(w)$, and for $i = k$, $\tau_k(u) \ge \tau_k(w)$. Recall that for $i < \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$ we have $\tau_i(w) = \tau_i(u) = 0$, which gives us the base case of the induction.

Now, by definition of $k$, part (B) of (34) for $\tau_{k-1}(w)$ is an integer strictly less than part (A), hence
\[
\begin{aligned}
\sum_{j=-\infty}^{k-1} 2^{j} \tau_j(w) &= 2^{k-1} \tau_{k-1}(w) + \sum_{j=-\infty}^{k-2} 2^{j} \tau_j(w) \\
&\le 2^{k-1} \left( \left\lceil \frac{d_T(w, v_c) - \sum_{j=-\infty}^{k-2} 2^{j} \tau_j(w)}{2^{k-1}} \right\rceil - 1 \right) + \sum_{j=-\infty}^{k-2} 2^{j} \tau_j(w) \\
&< 2^{k-1} \left( \frac{d_T(w, v_c) - \sum_{j=-\infty}^{k-2} 2^{j} \tau_j(w)}{2^{k-1}} \right) + \sum_{j=-\infty}^{k-2} 2^{j} \tau_j(w) \\
&\le d_T(w, v_c).
\end{aligned} \tag{36}
\]
For $\left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor \le i \le k$, by (36), and as $d_T(u, v_c) \ge d_T(w, v_c)$, we have
\[
\min\Big( d_T(w, v_c),\, \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(w) \Big) = \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(w) = \min\Big( d_T(u, v_c),\, \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(w) \Big). \tag{37}
\]
By our induction hypothesis, for all $j < i$, $\tau_j(w) = \tau_j(u)$, so using (37) we can write
\[
d_T(w, v_c) - \min\Big( d_T(w, v_c),\, \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(w) \Big) \le d_T(u, v_c) - \min\Big( d_T(u, v_c),\, \sum_{j=-\infty}^{i-1} 2^{j} \tau_j(u) \Big). \tag{38}
\]
Since $\chi(w, p(w)) = \chi(u, p(u))$, for all $i \in \mathbb{Z}$ part (B) of (34) is identical for $\tau_i(u)$ and $\tau_i(w)$.
Therefore, using (38) and the definition of $k$, for all $\left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor \le i < k$, part (B) of (34) specifies $\tau_i(u)$ and $\tau_i(w)$, hence
\[
\tau_i(u) = \tau_i(w) = \varphi(c) - \sum_{c' \in \chi(E(P_w))} \tau_i(v_{c'}).
\]
For the case that $i = k$, part (B) of (34) is identical for $\tau_k(u)$ and $\tau_k(w)$, and inequality (38) implies that part (A) of (34) for $\tau_k(u)$ is at least as large as part (A) of (34) for $\tau_k(w)$, completing the proof.

The next lemma bounds the distance between two vertices in the graph based on $\{\tau_i\}$.

Lemma 4.6. Let $k > \left\lfloor \log_2\!\left(\frac{m(T)}{M(\chi) + \log_2 |E|}\right) \right\rfloor$ be an integer. For any two vertices $w$ and $u$ such that $\tau_k(u) \ne 0$, $\tau_{k-1}(w) = 0$ and $\chi(w, p(w)) = \chi(u, p(u))$, we have $d_T(u, w) > 2^{k-1}$.

Proof. By Observation 4.1, $\tau_k(w) = 0$. Letting $c = \chi(u, p(u))$, by Lemma 4.5 we have $d_T(v_c, u) \ge d_T(v_c, w)$. Using Lemma 4.5 again, we can conclude that for all $i \in \mathbb{Z}$, $\tau_i(u) \ge \tau_i(w)$. Since $\tau_{k-1}(w) = 0$, inequality (35) implies that part (A) of (34) specifies $\tau_{k-1}(w)$. Therefore,
\[
d_T(w, v_c) \le \sum_{i=-\infty}^{k-2} 2^{i} \tau_i(w) \le \sum_{i=-\infty}^{k-2} 2^{i} \tau_i(u) = \Big( \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(u) \Big) - 2^{k-1} \tau_{k-1}(u). \tag{39}
\]
Since $\tau_k(u) > 0$, using part (A) of (34), we can write
\[
d_T(u, v_c) > \sum_{i=-\infty}^{k-1} 2^{i} \tau_i(u). \tag{40}
\]
Observation 4.1 implies that $\tau_{k-1}(u) \ne 0$, thus $\tau_{k-1}(u) \ge 1$, and using (39) and (40), we have
\[
d_T(w, u) = d_T(u, v_c) - d_T(w, v_c) > 2^{k-1},
\]
completing the proof.

The next lemma and the following two corollaries bound the number of colors $c$ in the tree which have a small value of $\varphi(c)$.

Lemma 4.7. For any $k \in \mathbb{N} \cup \{0\}$ and any color $c \in \chi(E)$, we have
\[
\#\{c' \in \chi(E(T(c))) : \varphi(c') - \varphi(c) = k\} \le 2^{k}.
\]

Proof. We start the proof by comparing the size of the subtrees $T(c')$ and $T(c)$ for $c' \in \chi(E(T(c)))$.
For a given color $c' \in \chi(E(T(c)))$, we define the sequence $\{c_i\}_{i \in \mathbb{N}}$ as follows. We put $c_1 = c'$, and for $i > 1$ we put $c_i = \rho(c_{i-1})$. Suppose now that $c_m = c$. Then we have
\[
\varphi(c_1) - \varphi(c_m) = \sum_{i=1}^{m-1} \kappa(c_i) \ge \sum_{i=1}^{m-1} \log_2\!\left( \frac{|E(T(c_{i+1}))|}{|E(T(c_i))|} \right) \ge \log_2\!\left( \frac{|E(T(c))|}{|E(T(c'))|} \right). \tag{41}
\]
This inequality implies that
\[
|E(T(c))| \le 2^{\varphi(c') - \varphi(c)}\, |E(T(c'))|.
\]
It is easy to check that for colors $a, b \in \chi(E(T(c)))$ such that $\varphi(a) = \varphi(b)$, the subtrees $T(a)$ and $T(b)$ are edge disjoint. Therefore, for $k \in \mathbb{N} \cup \{0\}$, summing over all the colors $c'$ such that $\varphi(c') - \varphi(c) = k$ gives
\[
\#\{c' \in \chi(E(T(c))) : \varphi(c') - \varphi(c) = k\} \le \sum_{\substack{c' \in \chi(E(T(c))) \\ \varphi(c') - \varphi(c) = k}} 2^{k}\, \frac{|E(T(c'))|}{|E(T(c))|} = 2^{k} \sum_{\substack{c' \in \chi(E(T(c))) \\ \varphi(c') - \varphi(c) = k}} \frac{|E(T(c'))|}{|E(T(c))|} \le 2^{k}.
\]

The following two corollaries are immediate from Lemma 4.7.

Corollary 4.8. For any $k \in \mathbb{N}$ and any color $c \in \chi(E)$, we have
\[
\#\{c' \in \chi(E(T(c))) : \varphi(c') - \varphi(c) \le k\} < 2^{k+1}.
\]

Corollary 4.9. For any color $c \in \chi(E)$ and constant $C \ge 2$, we have
\[
\sum_{c' \in \chi(E(T(c))) \setminus \{c\}} 2^{-C(\varphi(c') - \varphi(c))} < 2^{2-C}.
\]

The next lemma is similar to Lemma 4.6. The assumption is more general, and the conclusion is correspondingly weaker. This result is used primarily to enable the proof of Lemma 4.11.

Lemma 4.10. Let $u \in V$ and $w \in V(P_u)$ be such that $\varphi(\chi(u, p(u))) > \varphi(\chi(w, p(w)))$. For all vertices $x \in V(T_u)$ and $k \in \mathbb{Z}$ with
\[
2^{k} > \left( \frac{6\, d_T(x, w)}{\varphi(\chi(u, p(u))) - \varphi(\chi(w, p(w)))} \right), \tag{42}
\]
we have $\tau_k(x) = 0$.

Proof. In the case that $d_T(x, w) = 0$, this lemma is vacuous. Suppose now that $d_T(x, w) > 0$. Let $c_1, \ldots, c_m$ be the set of colors that appear on the path $P_{x\, p(w)}$, in order from $x$ to $p(w)$, and for $i \in [m]$, let $y_i = v_{c_i}$.
We prove this lemma by showing that if
\[
k \ge \log_2\!\left( \frac{6\, d_T(x, w)}{\varphi(\chi(u, p(u))) - \varphi(\chi(w, p(w)))} \right), \tag{43}
\]
then part (A) of (34) for $\tau_k(x)$ is zero. First note that $\varphi(\chi(u, p(u))) - \varphi(\chi(w, p(w))) \le M(\chi) + \log_2 |E|$ and $d_T(x, w) \ge m(T)$, hence (43) implies
\[
k \ge \left\lfloor \log_2\!\left( \frac{m(T)}{M(\chi) + \log_2 |E|} \right) \right\rfloor.
\]
By Lemma 4.4, we have
\[
\sum_{i=1}^{m-2} 2^{k-1} \tau_{k-1}(y_i) \le \sum_{i=1}^{m-2} \sum_{j=-\infty}^{\infty} 2^{j} \tau_j(y_i) \le \sum_{i=1}^{m-2} 3\, d_T(y_i, y_{i+1}) = 3\, d_T(y_1, y_{m-1}). \tag{44}
\]
Now, using (42) gives
\[
\varphi(c_1) - \varphi(c_m) \ge \varphi(\chi(u, p(u))) - \varphi(\chi(w, p(w))) \ge \frac{6\, d_T(x, w)}{2^{k}} \ge \frac{6\, d_T(x, y_{m-1})}{2^{k}}. \tag{45}
\]
Using the above inequality and (44), we can write
\[
d_T(x, y_1) = d_T(x, y_{m-1}) - d_T(y_1, y_{m-1}) \le \frac{2^{k-1}}{3} \Big( \varphi(c_1) - \varphi(c_m) - \sum_{i=1}^{m-2} \tau_{k-1}(y_i) \Big).
\]
First, note that $c_m = \chi(y_{m-1}, p(y_{m-1}))$. Now, we use part (B) of (34) for $\tau_k(y_{m-1})$ to write
\[
\begin{aligned}
d_T(x, y_1) &\le \frac{2^{k-1}}{3} \Bigg( \varphi(c_1) - \Big( \tau_{k-1}(y_{m-1}) + \sum_{c' \in \chi(E(P_{y_{m-1}}))} \tau_{k-1}(v_{c'}) \Big) - \sum_{i=1}^{m-2} \tau_{k-1}(y_i) \Bigg) \\
&\le \frac{2^{k-1}}{3} \Big( \varphi(c_1) - \sum_{c' \in \chi(E(P_x))} \tau_{k-1}(v_{c'}) \Big) \\
&\le 2^{k-1} \Big( \varphi(\chi(x, p(x))) - \sum_{c' \in \chi(E(P_x))} \tau_{k-1}(v_{c'}) \Big).
\end{aligned} \tag{46}
\]
Therefore, either part (A) of (34) specifies $\tau_{k-1}(x)$, in which case by Observation 4.2, $\tau_i(x) = 0$ for $i \ge k$, or part (B) of (34) specifies $\tau_{k-1}(x)$, in which case by (46) we have
\[
\tau_{k-1}(x)\, 2^{k-1} \ge d_T(x, y_1),
\]
and part (A) of (34) is zero for $i \ge k$.

In Section 5, we give the description of our embedding and analyze its distortion.
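The interplay between parts (A) and (B) of (34), and the two-sided bound of Lemma 4.4, can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: it looks at a single color class, and it replaces part (B) by a constant integer `budget` (in the paper that quantity depends on $\varphi(c)$ and on the values $\tau_i(v_{c'})$ along the path to the root). The names `greedy_scales`, `covered_length`, `budget` and `i0` are hypothetical. Under the extra assumption $2^{i_0} \le d$, the total covered length stays between $d$ and $3d$.

```python
import math


def greedy_scales(d, budget, i0):
    """Toy scale selector for one color class.

    d      -- the distance d_T(v, v_c), assumed positive
    budget -- constant stand-in for part (B) of (34); must be an integer >= 1
    i0     -- first active scale; all smaller scales get tau_i = 0
    Returns a dict {i: tau_i} listing the non-zero scales.
    """
    tau = {}
    covered = 0.0  # sum of 2**j * tau_j over the scales processed so far
    i = i0
    while covered < d:
        need = math.ceil((d - covered) / 2.0 ** i)  # part (A) of (34)
        tau[i] = min(need, budget)  # part (B) caps the count at this scale
        covered += tau[i] * 2.0 ** i
        i += 1
    return tau  # once covered >= d, part (A) is zero at every later scale


def covered_length(tau):
    """Total length sum_i 2**i * tau_i represented by the selected scales."""
    return sum(count * 2.0 ** i for i, count in tau.items())
```

With `d = 10.0`, `budget = 2` and `i0 = 0`, the selector takes two units at scales $2^0$ and $2^1$ and one unit at scale $2^2$, covering exactly $10$. With `budget = 1` it needs one unit at each of the scales $2^0, \ldots, 2^3$ and overshoots to $15$, still below $3d$.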
In the analysis of the embedding, for a given pair of vertices $x, y \in V$, we divide the path between $x$ and $y$ into subpaths, and for each subpath we show that either the contribution of that subpath to the distance between $x$ and $y$ in the embedding is "large" through a concentration of measure argument, or we use the following lemma to show that the length of the subpath is "small" compared to the distance between $x$ and $y$. The complete argument is somewhat more delicate, and one can find the details of how Lemma 4.11 is used in the proof of Lemma 5.15.

Lemma 4.11. There exists a constant $C > 0$ such that the following holds. For any $c \in \chi(E)$ and $v \in V(T(c))$, and for any $\varepsilon \in (0, \frac{1}{2}]$, there are vertices $u, u' \in V$ with $u \ne u'$ and $d_T(u, v) \le \varepsilon\, d_T(u, u')$, and such that
\[
u, u' \in \{v_a : a \in \chi(E(P_{v v_c}))\} \cup \{v\}.
\]
Furthermore, for all vertices $x \in V(P_{u' u}) \setminus \{u'\}$ and for all $k \in \mathbb{Z}$,
\[
\tau_k(x) \ne 0 \implies 2^{k} < \left( \frac{C\, d_T(u, u')}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(\chi(v_c, p(v_c))) \big)} \right).
\]

Proof. Let $r' = v_c$, let $c_1, \ldots, c_m$ be the set of colors that appear on the path $P_{v r'}$ in order from $v$ to $r'$, and put $c_{m+1} = \chi(r', p(r'))$. We define $y_0 = v$, and for $i \in [m]$, $y_i = v_{c_i}$. Note that $\{y_0, \ldots, y_m\} = \{v\} \cup \{v_a : a \in \chi(E(P_{v v_c}))\}$, and for $i \le m$, $\chi(y_i, p(y_i)) = c_{i+1}$.

We give a constructive proof of the lemma. For $i \in \mathbb{N}$, we construct a sequence $(a_i, b_i) \in \mathbb{N} \times \mathbb{N}$, the idea being that $P_{y_{a_i}, y_{b_i}}$ is a nonempty subpath of $P_{v r'}$ such that for different values of $i$, these subpaths are edge disjoint. At each step of the construction, either we can use $(a_i, b_i)$ to find $u$ and $u'$ that satisfy the properties of this lemma, or we find $(a_{i+1}, b_{i+1})$ such that $b_{i+1} < b_i$. The last condition guarantees that we can always find $u$ and $u'$ that satisfy the conditions of this lemma.

We start with $a_1 = m$ and $b_1 = m - 1$.
If $d_T(v, y_{b_1}) \le \varepsilon\, d_T(y_{a_1}, y_{b_1})$, then
\[
\left( \frac{2\, d_T(y_m, y_{m-1})}{\varphi(\chi(y_{m-1}, p(y_{m-1}))) - \varphi(\chi(r', p(r')))} \right) = \frac{2\, d_T(y_{a_1}, y_{b_1})}{\kappa(c)},
\]
and by Lemma 4.3 the assignment $u' = y_{a_1}$ and $u = y_{b_1}$ satisfies the conditions of this lemma if $C \ge \frac{1}{2}$. Otherwise, for $i \ge 1$, we choose $(a_{i+1}, b_{i+1})$ based on $(a_i, b_i)$, and construct the rest of the sequence preserving the following three properties:

i) $\varphi(c_{b_i+1}) - \varphi(c_{a_i+1}) \ge \varphi(c_{a_i+1}) - \varphi(\chi(r', p(r')))$;

ii) $d_T(y_{b_i}, v) \ge \varepsilon\, d_T(y_{b_i}, y_{a_i})$;

iii) $a_i > b_i$.

Let $j \in \{0, \ldots, m\}$ be the maximum integer such that $\varepsilon\, d_T(y_j, y_{b_i}) \ge d_T(v, y_j)$. Note that $j < b_i$, and the maximum always exists because $y_0 = v$. We will now split the proof into three cases.

Case I: $\varphi(c_{j+2}) - \varphi(c_{b_i+1}) \ge 2\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big)$.

In this case, by condition (iii), $\varphi(c_{b_i+1}) - \varphi(c_{a_i+1}) > 0$. Hence $j + 1 < b_i$, and we can preserve conditions (i), (ii) and (iii) with
\[
(a_{i+1}, b_{i+1}) = (b_i, j+1).
\]

Case II: $\varphi(c_{j+2}) - \varphi(c_{b_i+1}) < 2\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big)$ and $\varphi(c_{j+1}) - \varphi(c_{b_i+1}) \ge 6\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big)$.

In this case, by (32) we have
\[
\kappa(c_{j+1}) = \varphi(c_{j+1}) - \varphi(c_{j+2}) = \big(\varphi(c_{j+1}) - \varphi(c_{b_i+1})\big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big).
\]
Using the conditions of this case, we write
\[
\begin{aligned}
\kappa(c_{j+1}) &= \big(\varphi(c_{j+1}) - \varphi(c_{b_i+1})\big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \\
&\ge 6\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \\
&= \Big( 2\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + 4\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) \Big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \\
&> \Big( 2\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + 2\big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \Big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big),
\end{aligned}
\]
and by condition (i),
\[
\begin{aligned}
\kappa(c_{j+1}) &> \Big( \big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + \big(\varphi(c_{a_i+1}) - \varphi(\chi(r', p(r')))\big) + 2\big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \Big) - \big(\varphi(c_{j+2}) - \varphi(c_{b_i+1})\big) \\
&= \varphi(c_{j+2}) - \varphi(\chi(r', p(r'))).
\end{aligned} \tag{47}
\]
Thus if $d_T(y_{j+1}, v) \ge \varepsilon\, d_T(y_j, y_{j+1})$, then $(a_{i+1}, b_{i+1}) = (j+1, j)$ satisfies condition (i) by (47), and it is also easy to verify that it satisfies conditions (ii) and (iii).

If $d_T(y_{j+1}, v) < \varepsilon\, d_T(y_j, y_{j+1})$, then by (32),
\[
\varphi(\chi(y_j, p(y_j))) = \varphi(c_{j+1}) = \kappa(c_{j+1}) + \varphi(c_{j+2}),
\]
and by (47),
\[
\left( \frac{2\, d_T(y_j, y_{j+1})}{\varphi(\chi(y_j, p(y_j))) - \varphi(\chi(r', p(r')))} \right) = \left( \frac{2\, d_T(y_j, y_{j+1})}{\kappa(c_{j+1}) + \varphi(c_{j+2}) - \varphi(\chi(r', p(r')))} \right) > \frac{d_T(y_j, y_{j+1})}{\kappa(c_{j+1})}.
\]
Hence Lemma 4.3 implies that the assignment $u' = y_{j+1}$ and $u = y_j$ satisfies the conditions of this lemma if $C \ge \frac{1}{2}$.

Case III: $\varphi(c_{j+1}) - \varphi(c_{b_i+1}) < 6\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big)$.

In this case we use Lemma 4.10 to show that the assignment $u = y_j$ and $u' = y_{b_i}$ satisfies the conditions of the lemma.
We have
\[
\begin{aligned}
\varphi(\chi(y_j, p(y_j))) - \varphi(\chi(r', p(r'))) &= \varphi(c_{j+1}) - \varphi(\chi(r', p(r'))) \\
&= \big(\varphi(c_{j+1}) - \varphi(c_{b_i+1})\big) + \big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + \big(\varphi(c_{a_i+1}) - \varphi(\chi(r', p(r')))\big) \\
&< 6\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + \big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big) + \big(\varphi(c_{a_i+1}) - \varphi(\chi(r', p(r')))\big),
\end{aligned}
\]
and by condition (i),
\[
\varphi(\chi(y_j, p(y_j))) - \varphi(\chi(r', p(r'))) < 8\big(\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})\big).
\]
Condition (ii) and the definition of $y_j$ imply that
\[
d_T(y_j, y_{b_i}) \ge (1-\varepsilon)\, d_T(v, y_{b_i}) \ge \varepsilon(1-\varepsilon)\, d_T(y_{a_i}, y_{b_i}) \ge \frac{\varepsilon}{2}\, d_T(y_{a_i}, y_{b_i}).
\]
Hence,
\[
\left( \frac{6 \left(\frac{2}{\varepsilon}\right) d_T(y_j, y_{b_i})}{\frac{1}{8} \big( \varphi(\chi(y_j, p(y_j))) - \varphi(\chi(r', p(r'))) \big)} \right) \ge \left( \frac{6\, d_T(y_{b_i}, y_{a_i})}{\varphi(c_{b_i+1}) - \varphi(c_{a_i+1})} \right),
\]
and by applying Lemma 4.10 with $u = y_{b_i}$ and $w = y_{a_i}$, we can conclude that the assignment $u = y_j$ and $u' = y_{b_i}$ satisfies the conditions of this lemma with $C = 96$.

5 The embedding

We now present a proof of Theorem 3.1, thereby completing the proof of Theorem 1.1. We first introduce a random embedding of the tree $T$ into $\ell_1$, and then show that, for a suitable choice of parameters, with non-zero probability our construction satisfies the conditions of the theorem.

Notation: We use the notations and definitions introduced in Section 4. Moreover, in this section, for $c \in \chi(E) \cup \{\chi(r, p(r))\}$, we use $\rho^{-1}(c)$ to denote the set of colors $c' \in \chi(E)$ such that $\rho(c') = c$, i.e. the colors of the "children" of $c$. For $m, n \in \mathbb{N}$ and $A \in \mathbb{R}^{m \times n}$, we use the notation $A[i]$ to refer to the $i$th row of $A$ and $A[i,j]$ to refer to the $j$th element in the $i$th row.

5.1 The construction

Fix $\delta, \varepsilon \in (0, \frac{1}{2}]$, and let
\[
t = \left\lceil \varepsilon^{-1} + \log \left\lceil \log_2 \tfrac{1}{\delta} \right\rceil \right\rceil, \tag{48}
\]
and
\[
m = \left\lceil t^{2} \big( M(\chi) + \log_2 |E| \big) \right\rceil. \tag{49}
\]
(See Lemma 5.15 for the relation between $\varepsilon$ and $\delta$, and the parameters of Theorem 3.1.)
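To get a feel for the sizes involved, the parameters in (48) and (49) can be evaluated directly. The following Python sketch is illustrative only: the function name `embedding_parameters` and its argument names are hypothetical, and the outer logarithm in (48), whose base is not written explicitly, is assumed here to be base 2.

```python
import math


def embedding_parameters(eps, delta, max_colors, num_edges):
    """Evaluate t from (48) and m from (49).

    eps, delta -- parameters in (0, 1/2]
    max_colors -- the quantity M(chi)
    num_edges  -- the number of edges |E|
    ASSUMPTION: the outer log in (48) is taken base 2.
    """
    if not (0 < eps <= 0.5 and 0 < delta <= 0.5):
        raise ValueError("eps and delta must lie in (0, 1/2]")
    inner = math.ceil(math.log2(1.0 / delta))  # at least 1 because delta <= 1/2
    t = math.ceil(1.0 / eps + math.log2(inner))
    m = math.ceil(t * t * (max_colors + math.log2(num_edges)))
    return t, m
```

For example, `embedding_parameters(0.25, 2 ** -9, 5, 1024)` gives $t = 8$ and $m = 960$, so each map $\Delta_i$ would take values in $\mathbb{R}^{960 \times 8}$ under these assumptions.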
For $i \in \mathbb{Z}$, we first define the map $\Delta_i : V \to \mathbb{R}^{m \times t}$, and then we use it to construct our final embedding. For a vertex $v \in V$ and $c = \chi(v, p(v))$, let
\[
\alpha = \sum_{c' \in \chi(E(P_v))} t^{2}\, \tau_i(v_{c'}),
\]
and
\[
\beta = \alpha + \min\left( t^{2}\, \tau_i(v),\; \left\lfloor \frac{d_T(v_c, v) - \sum_{\ell=-\infty}^{i-1} 2^{\ell} \tau_\ell(v)}{2^{i}/t^{2}} \right\rfloor \right).
\]
Note that $\beta \le m$ since
\[
\tau_i(v) + \sum_{c' \in \chi(E(P_v))} \tau_i(v_{c'}) \le \varphi(c) \le M(\chi) + \log_2 |E|.
\]
For $j \in [m]$, we define
\[
\Delta_i(v)[j] =
\begin{cases}
\left( \dfrac{2^{i}}{t^{2}},\, 0,\, 0, \ldots, 0 \right) & \text{if } \alpha < j \le \beta, \\[2ex]
\left( d_T(v_c, v) - \Big( \displaystyle\sum_{\ell=-\infty}^{i-1} 2^{\ell} \tau_\ell(v) + (\beta - \alpha) \dfrac{2^{i}}{t^{2}} \Big),\, 0,\, 0, \ldots, 0 \right) & \text{if } j = \beta + 1 \text{ and } \beta - \alpha < t^{2}\, \tau_i(v), \\[2ex]
(0, 0, \ldots, 0) & \text{otherwise.}
\end{cases} \tag{50}
\]
Observe that the scale selector $\tau_i$ chooses the scales in this definition, and for $v \in V$ and $i \in \mathbb{Z}$, $\Delta_i(v) = 0$ when $\tau_i(v) = 0$. Also note that the second case in the definition only occurs when $\tau_i(v)$ is specified by part (A) of (34), and in that case $\sum_{\ell \le i} 2^{\ell} \tau_\ell(v) > d_T(v, v_c)$.

Now, we present some key properties of the map $\Delta_i(v)$. The following two observations follow immediately from the definitions.

Observation 5.1. For $v \in V$ and $i \in \mathbb{Z}$, each row in $\Delta_i(v)$ has at most one non-zero coordinate.

Observation 5.2. For $v \in V$ and $i \in \mathbb{Z}$, let $\alpha = \sum_{c' \in \chi(E(P_v))} t^{2}\, \tau_i(v_{c'})$. For $j \notin (\alpha, \alpha + t^{2}\, \tau_i(v)]$, we have $\Delta_i(v)[j] = (0, \ldots, 0)$.

Proofs of the next four lemmas will be presented in Section 5.2.

Lemma 5.3. For $v \in V$, there is at most one $i \in \mathbb{Z}$ and at most one couple $(j, k) \in [m] \times [t]$ such that $\Delta_i(v)[j, k] \notin \{0, \frac{2^{i}}{t^{2}}\}$.

Lemma 5.4. Let $c \in \chi(E)$, and let $u, w \in V(\gamma_c) \setminus \{v_c\}$ be such that $d_T(w, v_c) \le d_T(u, v_c)$. For all $i \in \mathbb{Z}$ and $(j, k) \in [m] \times [t]$, we have
\[
\Delta_i(w)[j, k] \le \Delta_i(u)[j, k].
\]

Lemma 5.5.
For $c \in \chi(E)$ and $u, w \in V(\gamma_c) \setminus \{v_c\}$, we have
\[
d_T(w, u) = \sum_{i \in \mathbb{Z}} \|\Delta_i(u) - \Delta_i(w)\|_1, \tag{51}
\]
and
\[
d_T(v_c, u) = \sum_{i \in \mathbb{Z}} \|\Delta_i(u)\|_1. \tag{52}
\]

Lemma 5.6. For $c \in \chi(E)$, $u, w \in V(\gamma_c) \setminus \{v_c\}$, $i > j$ and $k \in [m]$, if both $\|\Delta_i(u)[k] - \Delta_i(w)[k]\|_1 \ne 0$ and $\|\Delta_j(u)[k] - \Delta_j(w)[k]\|_1 \ne 0$, then $d_T(u, w) \ge 2^{j-1}$.

Re-randomization. For $t \in \mathbb{N}$, let $\pi_t : \mathbb{R}^{t} \to \mathbb{R}^{t}$ be a random mapping obtained by uniformly permuting the coordinates in $\mathbb{R}^{t}$. Let $\{\sigma_i\}_{i \in [m]}$ be a sequence of i.i.d. random variables with the same distribution as $\pi_t$. We define the random variable $\pi_{t,m} : \mathbb{R}^{m \times t} \to \mathbb{R}^{m \times t}$ as follows:
\[
\pi_{t,m} \begin{pmatrix} r_1 \\ \vdots \\ r_m \end{pmatrix} = \begin{pmatrix} \sigma_1(r_1) \\ \vdots \\ \sigma_m(r_m) \end{pmatrix}.
\]

The construction. We now use re-randomization to construct our final embedding. For $c \in \chi(E)$ and $i \in \mathbb{Z}$, the map $f_{i,c} : V(T(c)) \to \mathbb{R}^{m \times t}$ will represent an embedding of the subtree $T(c)$ at scale $2^{i}/t^{2}$. Recall that
\[
V(T(c)) = V(\gamma_c) \cup \Big( \bigcup_{c' \in \rho^{-1}(c)} V(T(c')) \setminus \{v_{c'}\} \Big).
\]
Let $\{\Pi_{i,c'} : i \in \mathbb{Z},\, c' \in \rho^{-1}(c)\}$ be a sequence of i.i.d. random variables which each have the distribution of $\pi_{t,m}$. We define $f_{i,c} : V(T(c)) \to \mathbb{R}^{m \times t}$ as follows:
\[
f_{i,c}(x) =
\begin{cases}
0 & \text{if } x = v_c, \\
\Delta_i(x) & \text{if } x \in V(\gamma_c) \setminus \{v_c\}, \\
\Delta_i(v_{c'}) + \Pi_{i,c'}(f_{i,c'}(x)) & \text{if } x \in V(T(c')) \setminus \{v_{c'}\} \text{ for some } c' \in \rho^{-1}(c).
\end{cases} \tag{53}
\]
Re-randomization permutes the elements within each row, and the permutations are independent for different subtrees, scales, and rows.

Finally, we define $f_i = f_{i,c_0}$, where $c_0 = \chi(r, p(r))$. We use the following lemma to prove Theorem 3.1.

Lemma 5.7. There exists a universal constant $C$ such that the following holds with non-zero probability: for all $x, y \in V$,
\[
(1 - C\varepsilon)\, d_T(x,y) - \delta\, \rho_\chi(x,y;\delta) \le \sum_{i \in \mathbb{Z}} \|f_i(x) - f_i(y)\|_1 \le d_T(x,y). \tag{54}
\]

We will prove Lemma 5.7 in Section 5.3. We first make two observations, and then use them to prove Theorem 3.1. Our first observation is immediate from Observation 5.1 and Observation 5.2, since in the third case of (53), by Observation 5.2, $\Delta_i(v_{c'})$ and $\Pi_{i,c'}(f_{i,c'}(x))$ must be supported on disjoint sets of rows.

Observation 5.8. For any $v \in V$ and for any row $j \in [m]$, there is at most one non-zero coordinate in $f_i(v)[j]$.

Observation 5.2 and Lemma 5.5 also imply the following.

Observation 5.9. For any $v \in V$ and $u \in P_v$, we have $d_T(u, v) = \sum_{i \in \mathbb{Z}} \|f_i(u) - f_i(v)\|_1$.

Using these, together with Corollary 3.5, we now prove Theorem 3.1.

Proof of Theorem 3.1. By Lemma 5.7, there exists a choice of mappings $\{g_i\}_{i \in \mathbb{Z}}$ such that for all $x, y \in V$,
\[
d_T(x,y) \ge \sum_{i \in \mathbb{Z}} \|g_i(x) - g_i(y)\|_1 \ge (1 - O(\varepsilon))\, d_T(x,y) - \delta\, \rho_\chi(x,y;\delta).
\]
We will apply Corollary 3.5 to the family given by $\left\{ f_i = \frac{t^{2} g_i}{2^{i}} \right\}_{i \in \mathbb{Z}}$ to arrive at an embedding $F : V \to \ell_1^{tm(2 + \lceil \log \frac{1}{\varepsilon} \rceil)}$ such that $G = F/t^{2}$ satisfies
\[
d_T(x,y) \ge \|G(x) - G(y)\|_1 \ge (1 - O(\varepsilon))\, d_T(x,y) - \delta\, \rho_\chi(x,y;\delta). \tag{55}
\]
Observe that the codomain of $f_i$ is $\mathbb{R}^{m \times t}$, where
\[
mt = \Theta\!\left( \left( \tfrac{1}{\varepsilon} + \log\log\tfrac{1}{\delta} \right)^{3} \log n \right),
\]
and the codomain of $G$ is $\mathbb{R}^{d}$, where $d = \Theta\!\left( \log\tfrac{1}{\varepsilon} \left( \tfrac{1}{\varepsilon} + \log\log\tfrac{1}{\delta} \right)^{3} \log n \right)$.

To achieve (55), we need only show that for every $x, y \in V$, we have $\zeta(x,y) \lesssim \varepsilon\, d_T(x,y)$, where $\zeta(x,y)$ is defined in (30). Recalling this definition, we now restate $\zeta$ in terms of our explicit family $\left\{ f_i = \frac{t^{2} g_i}{2^{i}} \right\}_{i \in \mathbb{Z}}$. We have $\zeta(x,y) = \sum_{(k_1, k_2) \in [m] \times [t]} \sum_{i \,:\, \exists j}$ […] $0$, and let $m_w = \max\{i : \tau_i(w) \ne 0\}$. By Lemma 4.3 the maximum always exists.

We will now split the rest of the proof into two cases.

Case 1: $\tau_{m_w - 1}(u) = 0$.

In this case, by Lemma 4.6 we have $d_T(u, w) > 2^{m_w - 1}$.
For $(k_1, k_2) \in [m] \times [t]$, if $h_i(u, w; k_1, k_2) \ne 0$ then by (50), $i \le m_w$ and
$$\frac{2^i}{t^2} \le \frac{2^{m_w}}{t^2} < \frac{2\, d_T(u, w)}{t^2} \le \frac{2\, d_T(x, y)}{t^2} \le \varepsilon\, d_T(x, y).$$

Case 2: $\tau_{m_w - 1}(u) \ne 0$. Let $m_u = \max\{ i : \tau_i(u) \ne 0 \}$. By Lemma 4.5 and as $\tau_{m_w - 1}(u) \ne 0$, we have
$$m_u \le m_w \le m_u + 1.$$
Observation 4.2 implies that for all $j < m_u$,
$$\tau_j(u) + \sum_{c' \in \chi(E(P_u))} \tau_j(v_{c'}) = \varphi(c).$$
We have $m_w \ge m_u$, and by Observation 4.2,
$$\tau_j(w) + \sum_{c' \in \chi(E(P_w))} \tau_j(v_{c'}) = \tau_j(u) + \sum_{c' \in \chi(E(P_u))} \tau_j(v_{c'}) = \varphi(c). \tag{61}$$
Therefore, by Observation 5.2, for $j < m_u$ and $k \in [t^2 \varphi(c)]$,
$$\|(g_j(x) - g_j(u))[k]\|_1 = \|(g_j(y) - g_j(w))[k]\|_1 = 0, \tag{62}$$
and by Observation 5.2 and part (B) of (34), for all $i \in \mathbb{Z}$, all the non-zero elements of $g_i(u) - g_i(w)$ are in the first $t^2 \varphi(c)$ rows.

Suppose that there exists $k \in [m]$ such that $\|(g_i(u) - g_i(w))[k]\|_1 \ne 0$. Now, we divide the proof into two cases again.

Case 2.1: There exists a $j < i$ such that $\|(g_j(x) - g_j(u))[k]\|_1 + \|(g_j(y) - g_j(w))[k]\|_1 \ne 0$. In this case, there must exist some $c' \in \chi(E(P_x)) \,\triangle\, \chi(E(P_y))$ such that
$$\|(g_j(v_{c'}) - g_j(u_{c'}))[k]\|_1 \ne 0.$$
By (53) and (50), we have $\tau_j(u_{c'}) \ne 0$. Inequality (62) implies $j \ge m_u$, and finally by Lemma 4.3,
$$d_T(x, y) \ge d_T(u_{c'}, v_{c'}) > 2^{j-1} \ge 2^{m_u - 1} \ge 2^{m_w - 2} \ge 2^{i-2}. \tag{63}$$

Case 2.2: $\|(g_j(x) - g_j(u))[k]\|_1 + \|(g_j(y) - g_j(w))[k]\|_1 = 0$ for all $j < i$. In this case, either for all $j < i$, $\|g_j(x)[k] - g_j(y)[k]\|_1 = 0$, which implies that for $k' \in [t]$, $(k, k') \notin S_i(u, w)$, or $\|g_j(u)[k] - g_j(w)[k]\|_1 \ne 0$ for some $j < i$. If $\|g_j(u)[k] - g_j(w)[k]\|_1 \ne 0$ for some $j < i$ then by Lemma 5.6,
$$d_T(x, y) \ge d_T(u, w) \ge 2^{m_u - 1} \ge 2^{m_w - 2} \ge 2^{i-2}. \tag{64}$$

For $i > m_w$ we have $\|g_i(u) - g_i(w)\|_1 = 0$, therefore in both cases if $h_i(x, y; k_1, k_2) \ne 0$, either for all $j < i$, $\|g_j(x)[k] - g_j(y)[k]\|_1 = 0$, or
$$\frac{2^i}{t^2} \le \frac{4\, d_T(x, y)}{t^2} \le 2\varepsilon\, d_T(x, y).$$

5.2 Properties of the $\Delta_i$ maps

We now present proofs of Lemmas 5.3 – 5.6.

Proof of Lemma 5.3. For a fixed $i \in \mathbb{Z}$, by (50) there is at most one element in $\Delta_i(v)$ that takes a value outside $\{0, 2^i/t^2\}$. We prove this lemma by showing that if for some $i \in \mathbb{Z}$ and $(j, k) \in [m] \times [t]$, $\Delta_i(v)[j, k] \notin \{0, 2^i/t^2\}$, then for all $i' > i$ and $(j', k') \in [m] \times [t]$, we have $\Delta_{i'}(v)[j', k'] = 0$.

Let $c = \chi(v, p(v))$. Using (50), we can conclude that
$$t^2 \tau_i(v) > \left\lfloor \frac{d_T(v_c, v) - \sum_{\ell=-\infty}^{i-1} 2^{\ell} \tau_\ell(v)}{2^i/t^2} \right\rfloor.$$
Since the left-hand side is an integer,
$$t^2 \tau_i(v) \ge \frac{d_T(v_c, v) - \sum_{\ell=-\infty}^{i-1} 2^{\ell} \tau_\ell(v)}{2^i/t^2},$$
and $\sum_{\ell \le i} 2^{\ell} \tau_\ell(v) = 2^i \tau_i(v) + \sum_{\ell}$ … $i$ we have $\tau_{i'}(v) = 0$, thus $\|\Delta_{i'}(v)\|_1 = 0$ and the proof is complete.

Proof of Lemma 5.4. For $i < \left\lfloor \log_2\left( \frac{m(T)}{M(\chi) + \log_2 |E|} \right) \right\rfloor$ we have $\|\Delta_i(u)\|_1 = \|\Delta_i(w)\|_1 = 0$. Let $\nu$ be the minimum integer such that part (A) of (34) for $\tau_\nu(w)$ is less than or equal to part (B). This $\nu$ exists since, by (35), part (B) of (34) is always positive, while by Lemma 4.3, part (A) of (34) must be zero for some $\nu \in \mathbb{Z}$.

First we analyze the case when $i < \nu$. Observation 4.2 implies that part (B) of (34) specifies the value of $\tau_i(w)$. By Lemma 4.5, $\tau_i(u) \ge \tau_i(w)$, but part (B) for $\tau_i(u)$ is the same as for $\tau_i(w)$, so we must have $\tau_i(u) = \tau_i(w)$, and the same reasoning holds for $\tau_\ell(w)$ for $\ell < i$. Using this and the fact that part (A) does not define $\tau_i(w)$, we have $2^i \tau_i(w) + \sum_{\ell}$ … $\nu$, by Observation 4.2, we have $\tau_i(w) = 0$, and $\Delta_i(w)[j, k] = 0$.

Proof of Lemma 5.5.
For all $i \in \mathbb{Z}$, recalling the definition of $\alpha$ and $\beta$ in (50) for $\Delta_i(u)$, we have
$$\beta - \alpha = \min\left( t^2 \tau_i(u),\ \left\lfloor \frac{d_T(v_c, u) - \sum_{\ell=-\infty}^{i-1} 2^{\ell} \tau_\ell(u)}{2^i/t^2} \right\rfloor \right),$$
and by definition of $\Delta_i(u)$ we have, $\|\Delta_i(u)\|_1 = \min\big( 2^i \tau_i(u),\ d_T(u, v_c) - \sum_{j}$ … $0$, and $\alpha$ from (50) is a multiple of $t^2$, for all $t^2 \lfloor k/t^2 \rfloor < h < k$ we have $\|\Delta_i(u)[h]\|_1 = \frac{2^i}{t^2}$. This implies that
$$\|\Delta_i(u) - \Delta_i(w)\|_1 \ge \frac{2^i}{t^2} \left( k - 1 - t^2 \left\lfloor \frac{k}{t^2} \right\rfloor \right) \ge \frac{2^j}{t^2} \left( k - 1 - t^2 \left\lfloor \frac{k}{t^2} \right\rfloor \right).$$
Moreover, $\|\Delta_j(w)[k]\|_1 < \frac{2^j}{t^2}$, and (50) implies that for all $k < h \le t^2 \lfloor 1 + k/t^2 \rfloor$, we have $\|\Delta_j(w)[h]\|_1 = 0$. The same argument also shows that
$$\|\Delta_j(u) - \Delta_j(w)\|_1 \ge \frac{2^j}{t^2} \left( t^2 \left\lfloor 1 + \frac{k}{t^2} \right\rfloor - k \right).$$
Hence by (65),
$$d_T(u, w) \ge \frac{t^2 - 1}{t^2}\, 2^j \ge 2^{j-1}.$$

5.3 The probabilistic analysis

We are thus left to prove Lemma 5.7. For $c \in \chi(E)$, we analyze the embedding for $T(c)$ by going through all $c' \in \chi(E(T(c)))$ one by one in increasing order of $\varphi(c')$. Our first lemma bounds the probability of a bad event, i.e., of a subpath not contributing enough to the distance in the embedding.

Lemma 5.10. For any $C \ge 8$, the following holds. Consider three colors $a \in \chi(E)$, $b \in \rho^{-1}(a)$, and $c \in \chi(E(P_{u v_b}))$ for some $u \in V(T(b))$. Then for every $w \in V(T(a)) \setminus V(T(b))$, we have
$$\mathbb{P}\left[ \exists x \in V(P_{w v_a}) : \sum_{i \in \mathbb{Z}} \|f_{i,a}(x) - f_{i,a}(u)\|_1 \le (1 - C\varepsilon)\, d_T(u, v_c) + \sum_{i \in \mathbb{Z}} \|f_{i,a}(v_c) - f_{i,a}(x)\|_1 \,\Big|\, \{f_{i,c'}\}_{c' \in \rho^{-1}(a)} \right] \le \frac{1}{\lceil \log_2 1/\delta \rceil} \exp\left( - \frac{C}{\varepsilon 2^{\beta+2}}\, d_T(u, v_c) \right), \tag{66}$$
where $\beta = \max\{ i : \exists y \in P_{u v_c} \setminus \{v_c\},\ \tau_i(y) \ne 0 \}$. (See Figure 1 for position of vertices in the tree.)

Figure 1: Position of vertices corresponding to the statement of Lemma 5.10.

Proof. Recall that $\mathbb{R}^{m \times t}$ is the codomain of $f_{i,a}$.
For $i \in \mathbb{Z}$, $j \in [m]$, and $z \in V(P_{w v_a})$, let
$$s_{ij}(z) = \|f_{i,a}(z)[j] - f_{i,a}(v_c)[j]\|_1 + \|f_{i,a}(v_c)[j] - f_{i,a}(u)[j]\|_1 - \|f_{i,a}(z)[j] - f_{i,a}(u)[j]\|_1.$$
We have
$$\sum_{i \in \mathbb{Z}} \|f_{i,a}(u) - f_{i,a}(v_c)\|_1 + \sum_{i \in \mathbb{Z}} \|f_{i,a}(v_c) - f_{i,a}(z)\|_1 = \sum_{i \in \mathbb{Z}} \|f_{i,a}(z) - f_{i,a}(u)\|_1 + \sum_{i \in \mathbb{Z},\, j \in [m]} s_{ij}(z).$$
By Observation 5.9, we have $d_T(u, v_c) = \sum_{i \in \mathbb{Z}} \|f_{i,a}(u) - f_{i,a}(v_c)\|_1$, therefore
$$d_T(u, v_c) - \sum_{i \in \mathbb{Z},\, j \in [m]} s_{ij}(z) = \sum_{i \in \mathbb{Z}} \|f_{i,a}(z) - f_{i,a}(u)\|_1 - \sum_{i \in \mathbb{Z}} \|f_{i,a}(z) - f_{i,a}(v_c)\|_1. \tag{67}$$
Let $\mathcal{E} = \{ f_{i,c'} : c' \in \rho^{-1}(a) \}$. We define $\mathbb{P}_{\mathcal{E}}[\cdot] = \mathbb{P}[\cdot \mid \mathcal{E}]$. In order to prove this lemma, we bound
$$\mathbb{P}_{\mathcal{E}}\left[ \exists x \in V(P_{w v_a}) : \sum_{i \in \mathbb{Z},\, j \in [m]} s_{ij}(x) \ge C\varepsilon\, d_T(u, v_c) \right].$$
We start by bounding the maximum of the random variables $s_{ij}$. For $i > \beta$ we have $\Delta_i(u) = \Delta_i(v_c)$, hence $f_{i,a}(u) = f_{i,a}(v_c)$. Using the triangle inequality, for all $i \in \mathbb{Z}$, $j \in [m]$ and $z \in P_{w v_a}$,
$$s_{ij}(z) \le 2 \|f_{i,a}(v_c)[j] - f_{i,a}(u)[j]\|_1. \tag{68}$$
Hence for all $i \in \mathbb{Z}$ and $j \in [m]$, by Observation 5.8,
$$s_{ij}(z) \le 2 \|f_{i,a}(v_c)[j] - f_{i,a}(u)[j]\|_1 \le \frac{2^{\beta+1}}{t^2}. \tag{69}$$
First note that if $z$ is on the path between $v_b$ and $v_a$ then by Observation 5.9, $s_{ij}(z) = 0$. Observation 5.2 and (50) imply that if $\|f_{i,a}(u)[j] - f_{i,a}(v_c)[j]\|_1 \ne 0$ then $\|f_{i,a}(v_c)[j]\|_1 = 0$. From this, we can conclude that $s_{ij}(z) \ne 0$ if and only if there exists a $k \in [t]$ such that both $f_{i,a}(u)[j, k] - f_{i,a}(v_c)[j, k] \ne 0$ and $f_{i,a}(z)[j, k] \ne 0$. Since by Lemma 5.4, for all $i \in \mathbb{Z}$, $j \in [m]$ and $k \in [t]$, we have $f_{i,a}(w)[j, k] \ge f_{i,a}(z)[j, k]$, we conclude that for $z \in P_{w v_a}$, if $s_{ij}(z) \ne 0$ then $s_{ij}(w) \ne 0$.
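The behaviour of the slack $s_{ij}$ described above can be illustrated on a toy row. The following Python sketch is only an illustration under simplifying assumptions: it takes the row of $v_c$ to be zero, uses rows with a single non-negative non-zero entry (as Observation 5.8 provides), and the helper names `l1` and `row_slack` are hypothetical, not part of the paper.

```python
from typing import Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """l_1 distance between two rows of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))


def row_slack(z_row: Sequence[float],
              vc_row: Sequence[float],
              u_row: Sequence[float]) -> float:
    """Triangle-inequality slack through vc_row, mirroring s_ij(z)."""
    return l1(z_row, vc_row) + l1(vc_row, u_row) - l1(z_row, u_row)


t = 4
zero = [0.0] * t                  # assumed row of v_c (all zeros)
u_row = [0.0, 0.5, 0.0, 0.0]      # single non-zero entry, column 1
z_same = [0.0, 0.25, 0.0, 0.0]    # z shares column 1 with u
z_other = [0.25, 0.0, 0.0, 0.0]   # z occupies a different column

slack_same = row_slack(z_same, zero, u_row)    # supports collide
slack_other = row_slack(z_other, zero, u_row)  # supports are disjoint
```

In this toy setting the slack is non-zero only when the two rows share a column, and it never exceeds twice the $\ell_1$ norm of the row difference between $u$ and $v_c$, in line with (68).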
Now, for $i \in \mathbb{Z}$ and $j \in [m]$, we define a random variable
$$X_{ij} = \begin{cases} 0 & \text{if } s_{ij}(w) = 0, \\ 2 \|f_{i,a}(u)[j] - f_{i,a}(v_c)[j]\|_1 & \text{if } s_{ij}(w) \ne 0. \end{cases} \tag{70}$$
Note that since the re-randomization in (53) is performed independently on each row and at each scale, the random variables $\{ X_{ij} : i \in \mathbb{Z},\ j \in [m] \}$ are mutually independent. By (68), for all $z \in P_{w v_a}$, we have $s_{ij}(z) \le X_{ij}$, and thus
$$\mathbb{P}_{\mathcal{E}}\left[ \exists x \in V(P_{w v_a}) : \sum_{i \in \mathbb{Z},\, j \in [m]} s_{ij}(x) \ge C\varepsilon\, d_T(u, v_c) \right] \le \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C\varepsilon\, d_T(u, v_c) \right]. \tag{71}$$
As before, for $X_{ij}$ to be non-zero, there must exist $k \in [t]$ such that $f_{i,a}(w)[j, k] \ne 0$ and $f_{i,a}(u)[j, k] - f_{i,a}(v_c)[j, k] \ne 0$. Since $w \notin V(T(b))$, by the re-randomization in (53) and Observation 5.8, this happens with probability at most $\frac{1}{t}$, hence for $j \in [m]$ and $i \in \mathbb{Z}$,
$$\mathbb{P}_{\mathcal{E}}[X_{ij} \ne 0] = \mathbb{P}_{\mathcal{E}}\Big[ \|f_{i,a}(w)[j] - f_{i,a}(v_c)[j]\|_1 + \|f_{i,a}(v_c)[j] - f_{i,a}(u)[j]\|_1 - \|f_{i,a}(w)[j] - f_{i,a}(u)[j]\|_1 \ne 0 \Big] \le \frac{1}{t}.$$
This yields
$$\mathbb{E}[X_{ij} \mid \mathcal{E}] \le \frac{1}{t} \big( 2 \|f_{i,a}(u)[j] - f_{i,a}(v_c)[j]\|_1 \big). \tag{72}$$
Now we use (69) to write
$$\mathrm{Var}(X_{ij} \mid \mathcal{E}) \le \frac{1}{t} \big( 2 \|f_{i,a}(u)[j] - f_{i,a}(v_c)[j]\|_1 \big)^2 \le \frac{2^{\beta+2}}{t^3} \|f_{i,a}(u)[j] - f_{i,a}(v_c)[j]\|_1,$$
and use Observation 5.9 in conjunction with (72) to conclude that
$$\mathbb{E}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \,\Big|\, \mathcal{E} \right] \le \sum_{i \in \mathbb{Z},\, j \in [m]} \frac{2}{t} \|f_i(v_c)[j] - f_i(u)[j]\|_1 = \frac{2}{t}\, d_T(v_c, u), \tag{73}$$
and
$$\sum_{i \in \mathbb{Z},\, j \in [m]} \mathrm{Var}(X_{ij} \mid \mathcal{E}) \le \sum_{i \in \mathbb{Z},\, j \in [m]} \frac{2^{\beta+2}}{t^3} \|f_i(v_c)[j] - f_i(u)[j]\|_1 = \frac{2^{\beta+2}}{t^3}\, d_T(v_c, u). \tag{74}$$
Define $M = \max\{ X_{ij} - \mathbb{E}[X_{ij} \mid \mathcal{E}] : i \in \mathbb{Z},\ j \in [m] \}$.
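The bound $\mathbb{P}_{\mathcal{E}}[X_{ij} \ne 0] \le 1/t$ used above rests on a counting fact: if one row has a single non-zero column and the other row, also with a single non-zero column, is permuted uniformly at random, then the two supports meet with probability exactly $1/t$. The Python sketch below checks this by exhaustive enumeration for a small $t$. It is an illustration only; the function names `permute_row` and `collision_probability` and the sample rows are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def permute_row(row, perm):
    """Apply a coordinate permutation: entry k moves to position perm[k]."""
    out = [0] * len(row)
    for k, value in enumerate(row):
        out[perm[k]] = value
    return out


def collision_probability(fixed_row, moving_row):
    """Exact probability, over a uniform permutation of moving_row,
    that the two rows have a non-zero entry in a common column."""
    t = len(fixed_row)
    hits = 0
    total = 0
    for perm in permutations(range(t)):
        permuted = permute_row(moving_row, perm)
        total += 1
        if any(a != 0 and b != 0 for a, b in zip(fixed_row, permuted)):
            hits += 1
    return Fraction(hits, total)


t = 5
w_row = [0, 0, 3, 0, 0]  # row of a vertex outside the re-randomized subtree
u_row = [2, 0, 0, 0, 0]  # row inside the subtree, before permuting
p = collision_probability(w_row, u_row)
p_empty = collision_probability([0] * t, u_row)  # nothing to collide with
```

Enumerating all $t!$ permutations is only practical for small $t$; it is used here so that the check is exact rather than a random simulation.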
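We now apply Theorem 2.2 to complete the proof:
$$\begin{aligned} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C \left( \frac{d_T(u, v_c)}{t} \right) \right] &= \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} - \frac{2\, d_T(u, v_c)}{t} \ge (C - 2) \left( \frac{d_T(u, v_c)}{t} \right) \right] \\ &\overset{(73)}{\le} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} - \mathbb{E}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \,\Big|\, \mathcal{E} \right] \ge (C - 2) \left( \frac{d_T(u, v_c)}{t} \right) \right] \\ &\le \exp\left( \frac{- \big( (C - 2)\, d_T(u, v_c)/t \big)^2}{2 \left( \sum_{i \in \mathbb{Z},\, j \in [m]} \mathrm{Var}(X_{ij} \mid \mathcal{E}) + (C - 2) \big( d_T(u, v_c)/t \big) M/3 \right)} \right). \end{aligned}$$
Since $\mathbb{E}[X_{ij} \mid \mathcal{E}] \ge 0$, (69) implies $M \le \frac{2^{\beta+1}}{t^2}$. Now, we can plug in this bound and (74) to write
$$\begin{aligned} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C \left( \frac{d_T(u, v_c)}{t} \right) \right] &\le \exp\left( \frac{- \big( (C - 2)\, d_T(u, v_c)/t \big)^2}{2 \left( \frac{2^{\beta+2}}{t^3}\, d_T(u, v_c) + (C - 2) \big( d_T(u, v_c)/t \big) (2^{\beta+1}/t^2)/3 \right)} \right) \\ &= \exp\left( \frac{- t\, (C - 2)^2\, d_T(u, v_c)}{2 \left( 2^{\beta+2} + (C - 2)(2^{\beta+1})/3 \right)} \right) \\ &= \exp\left( - \left( \frac{(C - 2)^2}{(C - 2)/3 + 2} \right) \frac{t\, d_T(u, v_c)}{2^{\beta+2}} \right). \end{aligned}$$
An elementary calculation shows that for $C \ge 8$, $\frac{(C - 2)^2}{(C - 2)/3 + 2} > C$, hence
$$\begin{aligned} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C \left( \frac{d_T(u, v_c)}{t} \right) \right] &< \exp\left( - (C t / 2^{\beta+2})\, d_T(u, v_c) \right) \\ &\overset{(48)}{\le} \exp\left( - C \left( \frac{1}{\varepsilon} + \log \left\lceil \log_2 \frac{1}{\delta} \right\rceil \right) \left( \frac{1}{2^{\beta+2}} \right) d_T(u, v_c) \right) \\ &= \left( \frac{1}{\lceil \log_2 (1/\delta) \rceil} \right)^{\frac{C\, d_T(u, v_c)}{2^{\beta+2}}} \cdot \exp\left( - C \left( \frac{1}{\varepsilon} \right) \left( \frac{1}{2^{\beta+2}} \right) d_T(u, v_c) \right). \end{aligned}$$
Since there exists a $y \in P_{u v_c} \setminus \{v_c\}$ such that $\tau_\beta(y) \ne 0$, and for all $c' \in \chi(E)$, $\kappa(c') \ge 1$, Lemma 4.3 implies that $d_T(u, v_c) > 2^{\beta-1}$, and for $C \ge 8$, we have $\frac{C\, d_T(u, v_c)}{2^{\beta+2}} > 1$. Therefore,
$$\begin{aligned} &\mathbb{P}_{\mathcal{E}}\left[ \exists x \in V(P_{w v_a}) : \sum_{i \in \mathbb{Z}} \|f_{i,a}(x) - f_{i,a}(u)\|_1 \le (1 - C\varepsilon)\, d_T(u, v_c) + \sum_{i \in \mathbb{Z}} \|f_{i,a}(v_c) - f_{i,a}(x)\|_1 \right] \\ &\qquad \overset{(67)}{\le} \mathbb{P}_{\mathcal{E}}\left[ \exists x \in V(P_{w v_a}) : \sum_{i \in \mathbb{Z},\, j \in [m]} s_{ij}(x) \ge C\varepsilon\, d_T(u, v_c) \right] \\ &\qquad \overset{(71)}{\le} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C\varepsilon\, d_T(u, v_c) \right] \\ &\qquad \overset{(48)}{\le} \mathbb{P}_{\mathcal{E}}\left[ \sum_{i \in \mathbb{Z},\, j \in [m]} X_{ij} \ge C \left( \frac{d_T(u, v_c)}{t} \right) \right] \\ &\qquad < \left( \frac{1}{\lceil \log_2 (1/\delta) \rceil} \right) \cdot \exp\left( - C \left( \frac{1}{\varepsilon 2^{\beta+2}} \right) d_T(u, v_c) \right), \end{aligned}$$
completing the proof.

The $\Gamma_a$ mappings. Before proving Lemma 5.7, we need some more definitions.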
For a color $a \in \chi(E)$, we define a map $\Gamma_a : V(T(a)) \to V(T(a))$ based on Lemma 5.10. For $u \in V(\gamma_a)$, we put $\Gamma_a(u) = u$. For all other vertices $u \in V(T(a)) \setminus V(\gamma_a)$, there exists a unique color $b \in \rho^{-1}(a)$ such that $u \in V(T(b))$. We define $\Gamma_a(u)$ as the vertex $w \in V(P_{u v_b})$ which is closest to the root among those vertices satisfying the following condition: For all $v \in V(P_{uw}) \setminus \{w\}$ and $k \in \mathbb{Z}$,
$$\tau_k(v) \ne 0 \quad \text{implies} \quad 2^k < \frac{d_T(u, w)}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{75}$$
Clearly such a vertex exists, because the conditions are vacuously satisfied for $w = u$. We now prove some properties of the map $\Gamma_a$.

Lemma 5.11. Consider any $a \in \chi(E)$ and $u \in V(T(a))$ such that $\Gamma_a(u) \ne u$. Then we have $\Gamma_a(u) = v_c$ for some $c \in \chi(E(P_{u v_a})) \setminus \{a\}$.

Proof. Let $w \in V(P_{u \Gamma_a(u)})$ be such that $\Gamma_a(u) = p(w)$. The vertex $w$ always exists because $\Gamma_a(u) \in V(P_u) \setminus \{u\}$. If $\chi(w, \Gamma_a(u)) \ne \chi(\Gamma_a(u), p(\Gamma_a(u)))$ then $\Gamma_a(u)$ is $v_c$ for some $c \in \chi(E(P_{u v_a})) \setminus \{a\}$.

Now, for the sake of contradiction suppose that $\chi(w, \Gamma_a(u)) = \chi(\Gamma_a(u), p(\Gamma_a(u)))$. In this case, we show that for all $v \in P_{u\, p(\Gamma_a(u))} \setminus \{p(\Gamma_a(u))\}$ and $k \in \mathbb{Z}$,
$$\tau_k(v) \ne 0 \quad \text{implies} \quad 2^k < \frac{d_T(u, p(\Gamma_a(u)))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{76}$$
This is a contradiction since by definition of $\Gamma_a$, it must be that $\Gamma_a(u)$ is the closest vertex to the root satisfying this condition, yet $p(\Gamma_a(u))$ is closer to the root than $\Gamma_a(u)$.

Observe that
$$V(P_{u\, p(\Gamma_a(u))}) \setminus \{p(\Gamma_a(u))\} = V(P_{u \Gamma_a(u)}).$$
We first verify (76) for $\Gamma_a(u)$ and $k \in \mathbb{Z}$ with $\tau_k(\Gamma_a(u)) \ne 0$. Since $\Gamma_a(u) \in V(P_u)$, we have
$$d_T(u, \Gamma_a(u)) \le d_T(u, p(\Gamma_a(u))). \tag{77}$$
Recalling that $p(w) = \Gamma_a(u)$, by Lemma 4.5 for all $k \in \mathbb{Z}$, $\tau_k(\Gamma_a(u)) \le \tau_k(w)$, therefore for all $k \in \mathbb{Z}$ with $\tau_k(\Gamma_a(u)) \ne 0$, we have $\tau_k(w) \ne 0$ as well, hence (75) implies
$$2^k < \frac{d_T(u, \Gamma_a(u))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)} \overset{(77)}{\le} \frac{d_T(u, p(\Gamma_a(u)))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{78}$$
For all other vertices $v \in V(P_{u \Gamma_a(u)}) \setminus \{\Gamma_a(u)\}$ and $k \in \mathbb{Z}$ with $\tau_k(v) \ne 0$, by (75),
$$2^k < \frac{d_T(u, \Gamma_a(u))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)} \overset{(77)}{\le} \frac{d_T(u, p(\Gamma_a(u)))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}, \tag{79}$$
completing the proof.

Lemma 5.12. Suppose that $a \in \chi(E)$ and $u \in V(T(a))$. For any $w \in V(P_{u \Gamma_a(u)})$ such that $\chi(u, p(u)) = \chi(w, p(w))$, we have $\Gamma_a(w) \in V(P_{u \Gamma_a(u)})$.

Proof. For the sake of contradiction, suppose that $\Gamma_a(w) \notin V(P_{u \Gamma_a(u)})$. Since $w \in V(P_u)$ and $\Gamma_a(w) \notin V(P_{u \Gamma_a(u)})$, we have $\Gamma_a(w) \in V(P_{\Gamma_a(u)})$, and
$$d_T(u, \Gamma_a(u)) \le d_T(u, \Gamma_a(w)). \tag{80}$$
Since $w \in V(P_{u \Gamma_a(u)})$ by assumption, we have
$$V(P_{uw}) \setminus \{w\} \subseteq V(P_{u \Gamma_a(u)}) \setminus \{\Gamma_a(u)\}.$$
Thus for all $v \in V(P_{uw}) \setminus \{w\}$ and $k \in \mathbb{Z}$ with $\tau_k(v) \ne 0$, by (75),
$$2^k < \frac{d_T(u, \Gamma_a(u))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)} \overset{(80)}{\le} \frac{d_T(u, \Gamma_a(w))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{81}$$
The fact that $w \in V(P_{u \Gamma_a(u)})$ also implies that $d_T(w, \Gamma_a(w)) \le d_T(u, \Gamma_a(w))$. Therefore, for all vertices $v \in V(P_{w \Gamma_a(w)}) \setminus \{\Gamma_a(w)\}$ and $k \in \mathbb{Z}$ with $\tau_k(v) \ne 0$, by (75),
$$2^k < \frac{d_T(w, \Gamma_a(w))}{\varepsilon \big( \varphi(\chi(w, p(w))) - \varphi(a) \big)} \le \frac{d_T(u, \Gamma_a(w))}{\varepsilon \big( \varphi(\chi(w, p(w))) - \varphi(a) \big)} = \frac{d_T(u, \Gamma_a(w))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{82}$$
We have
$$V(P_{u \Gamma_a(w)}) = V(P_{uw}) \cup \big( V(P_{w \Gamma_a(w)}) \setminus \{\Gamma_a(w)\} \big).$$
Hence, by (81) and (82), for all $v \in V(P_{u \Gamma_a(w)}) \setminus \{\Gamma_a(w)\}$ and $k \in \mathbb{Z}$,
$$\tau_k(v) \ne 0 \quad \text{implies} \quad 2^k < \frac{d_T(u, \Gamma_a(w))}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(a) \big)}. \tag{83}$$
This contradicts the definition of $\Gamma_a(u)$, since $\Gamma_a(u)$ must be the closest vertex to the root satisfying this condition, yet $\Gamma_a(w)$ is closer to the root than $\Gamma_a(u)$.

Defining representatives for $\gamma_c$. Now, for each $c \in \chi(E)$, we define a small set of representatives for vertices in $\gamma_c$. Later, we use these sets to bound the contraction of pairs of vertices that have one endpoint in $\gamma_c$.

For $a \in \chi(E)$ and $c \in \chi(E(T(a))) \setminus \{a\}$, we define the set $R_a(c) \subseteq V(\gamma_c)$, the set of representatives for $\gamma_c$, as follows:
$$R_a(c) = \bigcup_{i=0}^{\lceil \log_2 \frac{1}{\delta} \rceil - 1} \Big\{ u \in V(\gamma_c) : u \text{ is the furthest vertex from } v_c \text{ s.t. } \Gamma_a(u) \ne u \text{ and } d_T(u, v_c) \le 2^{-i}\, \mathrm{len}(\gamma_c) \Big\}. \tag{84}$$
The next lemma says when a vertex has a close representative.

Lemma 5.13. Consider $a \in \chi(E)$ and $c \in \chi(E(T(a))) \setminus \{a\}$. For all vertices $u \in V(\gamma_c)$ with $\Gamma_a(u) \ne u$ there exists a $w \in R_a(c)$ such that
$$d_T(u, v_c) \le d_T(w, v_c) \le 2 \max\big\{ d_T(u, v_c),\ \delta\, \mathrm{len}(\gamma_c) \big\}.$$

Proof. Let $i \ge 0$ be such that $\frac{d_T(u, v_c)}{\mathrm{len}(\gamma_c)} \in \left( 2^{-i-1}, 2^{-i} \right]$. If $i \le \lceil \log_2 \frac{1}{\delta} \rceil - 1$, then (84) implies that either $u \in R_a(c)$, or there exists a $w \in R_a(c)$ such that
$$d_T(u, v_c) < d_T(w, v_c) \le \frac{\mathrm{len}(\gamma_c)}{2^i} \le 2\, d_T(u, v_c).$$
On the other hand, if $i > \lceil \log_2 \frac{1}{\delta} \rceil - 1$, then (84) implies that either $u \in R_a(c)$, or that there exists a $w \in R_a(c)$ such that
$$d_T(u, v_c) < d_T(w, v_c) \le \frac{\mathrm{len}(\gamma_c)}{2^{\lceil \log_2 \frac{1}{\delta} \rceil - 1}} \le 2\delta\, \mathrm{len}(\gamma_c),$$
completing the proof.

The following lemma, in conjunction with Lemma 5.13, reduces the number of vertices in $V(\gamma_c)$ that we need to analyze using Lemma 5.10.

Lemma 5.14. Let $(X, d)$ be a pseudometric, and let $f : V \to X$ be a 1-Lipschitz map.
For $x, y \in V$, $x', y' \in V(P_{xy})$ and $h \ge 0$, if $d(f(x), f(y)) \ge d_T(x, y) - h$ then $d(f(x'), f(y')) \ge d_T(x', y') - h$.

Proof. Suppose without loss of generality that $d_T(x', x) \le d_T(y', x)$. Using the triangle inequality,
$$\begin{aligned} d(f(x'), f(y')) &\ge d(f(x), f(y)) - d(f(x), f(x')) - d(f(y), f(y')) \\ &\ge (d_T(x, y) - h) - d(f(x), f(x')) - d(f(y), f(y')) \\ &\ge d_T(x, y) - d_T(x, x') - d_T(y, y') - h \\ &= d_T(x', y') - h. \end{aligned}$$

The following lemma constitutes the inductive step of the proof of Lemma 5.7.

Lemma 5.15. There exists a universal constant $C$ such that for any color $c \in \chi(E) \cup \{\chi(r, p(r))\}$, the following holds. Suppose that, with non-zero probability, for all $c' \in \rho^{-1}(c)$ and for all pairs $x, y \in V(T(c'))$, we have
$$(1 - C\varepsilon)\, d_T(x, y) - \delta \rho_\chi(x, y; \delta) \le \sum_{i \in \mathbb{Z}} \|f_{i,c'}(x) - f_{i,c'}(y)\|_1 \le d_T(x, y). \tag{85}$$
Then with non-zero probability, for all $x, y \in V(T(c))$, we have
$$(1 - C\varepsilon)\, d_T(x, y) - \delta \rho_\chi(x, y; \delta) \le \sum_{i \in \mathbb{Z}} \|f_{i,c}(x) - f_{i,c}(y)\|_1 \le d_T(x, y). \tag{86}$$

Proof. Let $\mathcal{E}$ denote the event that, for all $c' \in \rho^{-1}(c)$ and all $x, y \in V(T(c'))$, we have
$$d_T(x, y) \ge \sum_{i \in \mathbb{Z}} \|f_{i,c'}(x) - f_{i,c'}(y)\|_1 \ge (1 - C\varepsilon)\, d_T(x, y) - \delta \rho_\chi(x, y; \delta). \tag{87}$$
We will prove the lemma by showing that, conditioned on $\mathcal{E}$, (86) holds with non-zero probability. For $x, y \in V(T(c))$ we define
$$\mu(x, y) = \max\{ \varphi(a) : a \in \chi(E) \text{ and } x, y \in V(T(a)) \}.$$
Note that since $x, y \in V(T(c))$, we have
$$\mu(x, y) \ge \varphi(c). \tag{88}$$
It is easy to see that if $\mu(x, y) > \varphi(c)$, then $x, y \in V(T(c'))$ for some $c' \in \rho^{-1}(c)$. By construction, if $c' \in \rho^{-1}(c)$ and $x, y \in V(T(c'))$, then $\|f_{i,c}(x) - f_{i,c}(y)\|_1 = \|f_{i,c'}(x) - f_{i,c'}(y)\|_1$, hence $\mathcal{E}$ implies that (86) holds for all such pairs.
Thus in the remainder of the proof, we need only handle pairs $x, y \in V(T(c))$ with $\mu(x, y) = \varphi(c)$.

Write $\chi(E(T(c))) = \{c_1, c_2, \ldots, c_n\}$, where the colors are ordered so that $\varphi(c_j) \le \varphi(c_{j+1})$ for $j = 1, 2, \ldots, n-1$. Let $\varepsilon_1 = 24\varepsilon$, where the constant 24 comes from Lemma 5.10, and let $\varepsilon_2 = 2 \cdot C' \varepsilon$, where $C'$ is the constant from Lemma 4.11.

For $i \in [n]$, we define the event $X_i$ as follows: For all $j \le i$, and all $x \in V(\gamma_{c_i})$ and $y \in V(\gamma_{c_j})$ with $\mu(x, y) = \varphi(c)$, we have
$$\sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 \ge d_T(x, y) - \varepsilon_1 d_T(x, y) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \delta \rho_\chi(x, y; \delta). \tag{89}$$
For all pairs $x \in V(\gamma_{c_i})$ and $y \in V(\gamma_{c_j})$, the event $X_{\max(i,j)}$ implies
$$\sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 \ge d_T(x, y) - (\varepsilon_1 + \varepsilon_2)\, d_T(x, y) - \delta \rho_\chi(x, y; \delta).$$
In particular this shows that for $C = 2 \cdot C' + 24$, if the events $X_1, X_2, \ldots, X_n$ all occur, then (86) holds for all pairs $x, y \in V(T(c))$. Hence we are left to show that
$$\mathbb{P}[X_1 \wedge \cdots \wedge X_n \mid \mathcal{E}] > 0.$$
To this end, we define new events $\{Y_i : i \in [n]\}$ and we show that for every $i \in [n]$,
$$\mathbb{P}_{\mathcal{E}}[X_1 \wedge \cdots \wedge X_i \mid X_1 \wedge \cdots \wedge X_{i-1} \wedge Y_i] = 1, \tag{90}$$
and then we bound the probability that $Y_i$ does not occur by
$$\mathbb{P}_{\mathcal{E}}\big[\, \overline{Y_i} \,\big] \le 2^{-3(\varphi(c_i) - \varphi(c)) + 1}. \tag{91}$$
By Lemma 5.5 and the definition of $f_{k,c}$ in (53), we have $\mathbb{P}_{\mathcal{E}}[X_1] = 1$. Since for all $i \in \{2, \ldots, n\}$, $c_i \in \chi(E(T(c))) \setminus \{c\}$, we have
$$\mathbb{P}_{\mathcal{E}}[X_1 \wedge \cdots \wedge X_n] \ge 1 - \sum_{i=2}^{n} \mathbb{P}_{\mathcal{E}}\big[\, \overline{Y_i} \,\big] \overset{(91)}{\ge} 1 - \sum_{i=2}^{n} 2^{-3(\varphi(c_i) - \varphi(c)) + 1} \overset{(4.9)}{>} 1 - 2 \cdot 2^{(2-3)} = 0,$$
which completes the proof.

For each $i \in [n]$, we define the event $Y_i$ as follows: For all $j < i$, and all vertices $x \in R_c(c_i)$ and $y \in V(\gamma_{c_j})$ with $\mu(x, y) = \varphi(c)$, we have
$$\sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 - \sum_{k \in \mathbb{Z}} \|f_{k,c}(\Gamma_c(x)) - f_{k,c}(y)\|_1 \ge (1 - \varepsilon_1/2)\, d_T(x, \Gamma_c(x)). \tag{92}$$
We now complete the proof of Lemma 5.15 by proving (90) and (91).

Proof of (90). Suppose that $X_1, \ldots, X_{i-1}$ and $Y_i$ hold. We will show that $X_i$ holds as well. First note that for all vertices $x, y \in V(\gamma_{c_i})$, by Lemma 5.5 and the definition of $f_{k,c_i}$ in (53), we have
$$d_T(x, y) = \sum_{k \in \mathbb{Z}} \|f_{k,c_i}(x) - f_{k,c_i}(y)\|_1 = \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1,$$
thus we only need to prove (89) for pairs $x \in V(\gamma_{c_i})$ and $y \in V(\gamma_{c_j})$ with $j < i$ and $\mu(x, y) = \varphi(c)$. We now divide the pairs with one endpoint in $\gamma_{c_i}$ into two cases based on $\Gamma_c$.

Case I: $x \in V(\gamma_{c_i})$ with $x \ne \Gamma_c(x)$, and $y \in V(\gamma_{c_j})$ for some $j < i$, and $\mu(x, y) = \varphi(c)$.

In this case, by Lemma 5.13, there exists a vertex $z \in R_c(c_i)$ such that
$$d_T(x, v_{c_i}) \le d_T(z, v_{c_i}) \le 2 \max\big( \delta\, \mathrm{len}(E(\gamma_{c_i})),\ d_T(x, v_{c_i}) \big).$$
If $d_T(x, v_{c_i}) \le \delta\, \mathrm{len}(E(\gamma_{c_i}))$, then by (18), we have $\mathrm{len}(E(\gamma_{c_i})) = \rho_\chi(x, v_{c_i}; \delta)$, hence
$$\begin{aligned} d_T(z, \Gamma_c(z)) &\le d_T(v_{c_i}, \Gamma_c(z)) + 2 \max\big( \delta\, \mathrm{len}(E(\gamma_{c_i})),\ d_T(x, v_{c_i}) \big) \\ &\le d_T(v_{c_i}, \Gamma_c(z)) + 2 \max\big( \delta \rho_\chi(x, v_{c_i}; \delta),\ d_T(x, v_{c_i}) \big) \\ &\le d_T(v_{c_i}, \Gamma_c(z)) + 2 \delta \rho_\chi(x, v_{c_i}; \delta) + 2\, d_T(x, v_{c_i}) \\ &\le 2 \delta \rho_\chi(x, v_{c_i}; \delta) + 2\, d_T(x, \Gamma_c(z)). \end{aligned} \tag{93}$$

Figure 2: Position of vertices in the subtree $T(c)$ for Case I.

Since $z \in R_c(c_i)$, by definition we have $\Gamma_c(z) \ne z$, therefore by Lemma 5.11, $\Gamma_c(z) = v_{c'}$ for some color $c' \in \chi(P_{z v_c}) \setminus \{c\}$. The function $\varphi$ is non-decreasing along any root-leaf path, hence $\chi(\Gamma_c(z), p(\Gamma_c(z))) = c_\ell$ for some $\ell < i$.

We refer to Figure 2 for the relative position of the vertices referenced in the following inequalities. Using our assumption that $X_1, \ldots, X_{i-1}$ and $Y_i$ hold, we can write
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(z) - f_{k,c}(y)\|_1 &\overset{Y_i}{\ge} d_T(\Gamma_c(z), z) - (\varepsilon_1/2)\, d_T(z, \Gamma_c(z)) + \sum_{k \in \mathbb{Z}} \|f_{k,c}(\Gamma_c(z)) - f_{k,c}(y)\|_1 \\ &\overset{X_{\max(\ell,j)}}{\ge} d_T(\Gamma_c(z), z) - (\varepsilon_1/2)\, d_T(z, \Gamma_c(z)) + d_T(\Gamma_c(z), y) \\ &\qquad - \varepsilon_2 d_T(\Gamma_c(\Gamma_c(z)), \Gamma_c(y)) - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(\Gamma_c(z), y; \delta) \\ &\ge d_T(y, z) - (\varepsilon_1/2)\, d_T(z, \Gamma_c(z)) - \varepsilon_2 d_T(\Gamma_c(\Gamma_c(z)), \Gamma_c(y)) \\ &\qquad - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(\Gamma_c(z), y; \delta). \end{aligned}$$
We may assume that $\varepsilon_1 < 1$, otherwise the statement of the lemma is vacuous. Using the preceding inequality, and applying Lemma 5.14 on pairs $(z, y)$ and $(x, y)$ implies that
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 &\ge d_T(x, y) - (\varepsilon_1/2)\, d_T(z, \Gamma_c(z)) - \varepsilon_2 d_T(\Gamma_c(\Gamma_c(z)), \Gamma_c(y)) \\ &\qquad - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(\Gamma_c(z), y; \delta) \\ &\overset{(93)}{\ge} d_T(x, y) - (\varepsilon_1/2) \big( 2\, d_T(x, \Gamma_c(z)) + 2 \delta \rho_\chi(x, v_{c_i}; \delta) \big) - \varepsilon_2 d_T(\Gamma_c(\Gamma_c(z)), \Gamma_c(y)) \\ &\qquad - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(\Gamma_c(z), y; \delta), \end{aligned}$$
where in the last line we have used the fact that $\varepsilon_1 < 1$. We have $\chi(x, p(x)) = \chi(z, p(z)) = c_i$. Moreover, since $\Gamma_c(z) \ne z$, using Lemma 5.11 it is easy to check that $x \in P_{z \Gamma_c(z)}$. Therefore, by Lemma 5.12,
$$d_T(\Gamma_c(\Gamma_c(z)), y) \le d_T(\Gamma_c(z), y) \le d_T(\Gamma_c(x), y),$$
and combining this with the preceding inequality yields
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 &\ge d_T(x, y) - (\varepsilon_1/2) \big( 2\, d_T(x, \Gamma_c(z)) + 2 \delta \rho_\chi(x, v_{c_i}; \delta) \big) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) \\ &\qquad - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(\Gamma_c(z), y; \delta). \end{aligned}$$
Recall the definition of $C(x, y; \delta)$ in (18).
Since by Lemma 5.11, $\Gamma_c(z) = v_{c'}$ for some color $c' \in \chi(P_{z v_c}) \setminus \{c\}$, we have $C(\Gamma_c(z), y; \delta) \subseteq C(v_{c_i}, y; \delta)$, hence $\rho_\chi(v_{c_i}, y; \delta) \ge \rho_\chi(\Gamma_c(z), y; \delta)$ and thus
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 &\ge d_T(x, y) - (\varepsilon_1/2) \big( 2\, d_T(x, \Gamma_c(z)) + 2 \delta \rho_\chi(x, v_{c_i}; \delta) \big) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) \\ &\qquad - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(v_{c_i}, y; \delta) \\ &\ge d_T(x, y) - \varepsilon_1 d_T(x, \Gamma_c(z)) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \varepsilon_1 d_T(\Gamma_c(z), y) \\ &\qquad - \delta \big( \rho_\chi(v_{c_i}, y; \delta) + \varepsilon_1 \rho_\chi(x, v_{c_i}; \delta) \big) \\ &\ge d_T(x, y) - \varepsilon_1 d_T(x, \Gamma_c(z)) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \varepsilon_1 d_T(\Gamma_c(z), y) \\ &\qquad - \delta \big( \rho_\chi(x, v_{c_i}; \delta) + \rho_\chi(v_{c_i}, y; \delta) \big), \end{aligned}$$
where in the last line we have again used that $\varepsilon_1 < 1$. The sets of colors that appear on the paths $P_{x v_{c_i}}$ and $P_{v_{c_i} y}$ are disjoint, therefore $\rho_\chi(x, y; \delta) = \rho_\chi(x, v_{c_i}; \delta) + \rho_\chi(v_{c_i}, y; \delta)$, and
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 &\ge d_T(x, y) - \varepsilon_1 d_T(x, \Gamma_c(z)) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \varepsilon_1 d_T(\Gamma_c(z), y) - \delta \rho_\chi(x, y; \delta) \\ &= d_T(x, y) - \varepsilon_1 d_T(x, y) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \delta \rho_\chi(x, y; \delta). \end{aligned}$$

Case II: $x \in V(\gamma_{c_i})$ with $x = \Gamma_c(x)$, and $y \in V(\gamma_{c_j})$ for some $j < i$, and $\mu(x, y) = \varphi(c)$.

In this case, if $x \in V(\gamma_c)$ then the event $X_j$ implies (89). On the other hand, suppose that $x \in V(T(c'))$ for some $c' \in \rho^{-1}(c)$. Recall that $\frac{\varepsilon_2}{2} = C' \varepsilon$, where $C'$ is the constant from Lemma 4.11. By Lemma 4.11 (with $c'$, $x$, and $\frac{\varepsilon_2}{2}$ substituted for $c$, $v$, and $\varepsilon$, respectively, in the statement of Lemma 4.11), there exist vertices $u, u' \in \{x\} \cup \{ v_a : a \in \chi(E(P_{x v_{c'}})) \}$ such that
$$d_T(x, u) \le (\varepsilon_2/2)\, d_T(u', u), \tag{94}$$
and for all vertices $z \in V(P_{u'u}) \setminus \{u'\}$ and for all $k \in \mathbb{Z}$,
$$\tau_k(z) \ne 0 \implies 2^k < \left( \frac{d_T(u, u')}{\varepsilon \big( \varphi(\chi(u, p(u))) - \varphi(\chi(v_{c'}, p(v_{c'}))) \big)} \right).$$
We have $\chi(v_{c'}, p(v_{c'})) = c$, and this condition is exactly the same condition as (75) for $\Gamma_c(u)$, therefore
$$d_T(x, u) \le (\varepsilon_2/2)\, d_T(u', u) \le (\varepsilon_2/2)\, d_T(\Gamma_c(u), u). \tag{95}$$
Note that the assumption that $\Gamma_c(x) = x$ implies that $u \ne x$ and $u = v_a$ for some $a \in \chi(E(P_{u, v_{c'}}))$. We have
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 - \sum_{k \in \mathbb{Z}} \|f_{k,c}(u) - f_{k,c}(y)\|_1 &\ge - \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(u)\|_1 \\ &\overset{(5.9)}{=} - d_T(x, u) \\ &\overset{(95)}{\ge} d_T(x, u) - \varepsilon_2 d_T(u, \Gamma_c(u)) \\ &\ge d_T(x, u) - \varepsilon_2 d_T(x, \Gamma_c(u)) \\ &= d_T(x, u) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(u)). \end{aligned} \tag{96}$$
Since $u = v_a$ for some $a \in \chi(E(P_{u, v_{c'}}))$, $\chi(u, p(u)) = c_\ell$ for some $\ell < i$, and $X_{\max(\ell, j)}$ implies that
$$\sum_{k \in \mathbb{Z}} \|f_{k,c}(u) - f_{k,c}(y)\|_1 \ge d_T(u, y) - \varepsilon_2 d_T(\Gamma_c(u), \Gamma_c(y)) - \varepsilon_1 d_T(u, y) - \delta \rho_\chi(u, y; \delta).$$
Recall the definition of $C(x, y; \delta)$ in (18). We have $u = v_a$ for some $a \in \chi(E(P_{u, v_{c'}}))$, therefore $C(u, y; \delta) \subseteq C(x, y; \delta)$, and $\rho_\chi(u, y; \delta) \le \rho_\chi(x, y; \delta)$. Now we can write
$$\sum_{k \in \mathbb{Z}} \|f_{k,c}(u) - f_{k,c}(y)\|_1 \ge d_T(u, y) - \varepsilon_2 d_T(\Gamma_c(u), \Gamma_c(y)) - \varepsilon_1 d_T(u, y) - \delta \rho_\chi(x, y; \delta). \tag{97}$$
Adding (96) and (97) we can conclude that
$$\begin{aligned} \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 &\ge d_T(u, y) + d_T(u, x) - \varepsilon_2 \big( d_T(\Gamma_c(x), \Gamma_c(u)) + d_T(\Gamma_c(u), \Gamma_c(y)) \big) - \varepsilon_1 d_T(x, y) - \delta \rho_\chi(x, y; \delta) \\ &\ge d_T(x, y) - \varepsilon_2 d_T(\Gamma_c(x), \Gamma_c(y)) - \varepsilon_1 d_T(x, y) - \delta \rho_\chi(x, y; \delta), \end{aligned}$$
completing the proof of (90).

Proof of (91). We prove this inequality by first bounding the probability that (92) holds for a fixed $x$ and all $y \in V(\gamma_{c_j})$ (for a fixed $j \in \{1, \ldots, i-1\}$) with $\mu(x, y) = \varphi(c)$. Then we use a union bound to complete the proof.

We start the proof by giving some definitions. For a vertex $x \in R_c(c_i)$, let
$$S_x = \big\{ j \in \{1, \ldots, i-1\} : \text{there exists a } v \in V(\gamma_{c_j}) \text{ such that } \mu(x, v) = \varphi(c) \big\}.$$
And for $a \in S_x$, we define $w(x; a)$ as the vertex $v \in V(\gamma_{c_a})$ which is furthest from the root among those satisfying $\mu(x, v) = \varphi(c)$. Finally, for $x \in R_c(c_i)$, we put
$$\beta_x = \max\big\{ k \in \mathbb{Z} : \exists z \in P_{x \Gamma_c(x)} \setminus \{\Gamma_c(x)\},\ \tau_k(z) \ne 0 \big\}.$$
Inequality (75) implies
$$2^{\beta_x} < \frac{d_T(x, \Gamma_c(x))}{\varepsilon \big( \varphi(c_i) - \varphi(c) \big)}. \tag{98}$$
By definition of $R_c$, for all elements $x \in R_c(c_i)$, we have $\Gamma_c(x) \ne x$. Moreover, by Lemma 5.11, $\Gamma_c(x) = v_{c'}$ for some $c' \in \chi(E(P_{x v_c})) \setminus \{c\}$. Now, for $x \in R_c(c_i)$ and $a \in S_x$ we apply Lemma 5.10 with $\varepsilon_1/2 = 12\varepsilon$ to write
$$\begin{aligned} &\mathbb{P}_{\mathcal{E}}\left[ \exists y \in P_{w(x;a), v_c} : \sum_{k \in \mathbb{Z}} \|f_{k,c}(x) - f_{k,c}(y)\|_1 \le (1 - \varepsilon_1/2)\, d_T(x, \Gamma_c(x)) + \sum_{k \in \mathbb{Z}} \|f_{k,c}(y) - f_{k,c}(\Gamma_c(x))\|_1 \right] \\ &\qquad \le \frac{1}{\lceil \log_2 1/\delta \rceil} \exp\left( - \frac{12\, d_T(x, \Gamma_c(x))}{2^{\beta_x + 2} \varepsilon} \right) \overset{(98)}{\le} \frac{\exp\big( -3 (\varphi(c_i) - \varphi(c)) \big)}{\lceil \log_2 1/\delta \rceil}. \end{aligned} \tag{99}$$
Note that for all $y \in V(\gamma_{c_a})$ with $\mu(x, y) = \varphi(c)$, we have $y \in P_{w(x;a), v_c}$. By definition of $R_c(c_i)$, $|R_c(c_i)| \le \lceil \log_2 \delta^{-1} \rceil$. We also have $\varphi(c_j) \le \varphi(c_i)$ for $j < i$, and by Corollary 4.8, $|S_x| \le i < 2^{\varphi(c_i) - \varphi(c) + 1}$. Taking a union bound over all $x \in R_c(c_i)$ and $a \in S_x$ implies
$$\begin{aligned} \mathbb{P}_{\mathcal{E}}\big[\, \overline{Y_i} \,\big] &\overset{(99)}{\le} \sum_{x \in R_c(c_i)} |S_x| \left( \frac{1}{\lceil \log_2 \delta^{-1} \rceil} \exp\big( -3 (\varphi(c_i) - \varphi(c)) \big) \right) \\ &< \left( \lceil \log_2 \delta^{-1} \rceil\, 2^{\varphi(c_i) - \varphi(c) + 1} \right) \left( \frac{1}{\lceil \log_2 \delta^{-1} \rceil} \exp\big( -3 (\varphi(c_i) - \varphi(c)) \big) \right) \\ &= 2^{\varphi(c_i) - \varphi(c) + 1} \exp\big( -3 (\varphi(c_i) - \varphi(c)) \big). \end{aligned}$$
Since $\varphi(c_i) \ge \varphi(c)$, by an elementary calculation we conclude that
$$\mathbb{P}_{\mathcal{E}}\big[\, \overline{Y_i} \,\big] < 2 \cdot 2^{-3(\varphi(c_i) - \varphi(c))},$$
which completes the proof of (91).

Finally, we present the proof of Lemma 5.7.

Proof of Lemma 5.7. Let $C$ be the same constant as in Lemma 5.15. For the sake of contradiction, suppose that
$$\mathbb{P}\left[ \forall x, y \in V,\ (1 - C\varepsilon)\, d_T(x, y) - \delta \rho_\chi(x, y; \delta) \le \sum_{i \in \mathbb{Z}} \|f_i(x) - f_i(y)\|_1 \le d_T(x, y) \right] = 0.$$
Now let $c \in \chi(E) \cup \{\chi(r,p(r))\}$ be a color with a maximal value of $\varphi(c)$ such that

$$ \mathbb{P}\Bigl[ \forall x,y \in V(T(c)),\ (1 - C\varepsilon)\, d_T(x,y) - \delta\rho_\chi(x,y;\delta) \le \sum_{i\in\mathbb{Z}} \|f_{i,c}(x) - f_{i,c}(y)\|_1 \le d_T(x,y) \Bigr] = 0. \tag{100} $$

For every $a \in \chi(E)$ we have $\kappa(a) > 0$. Hence, by (32), $\varphi(c') > \varphi(c)$ for all $c' \in \rho^{-1}(c)$, and by maximality of $c$, for all $c' \in \rho^{-1}(c)$ we have

$$ \mathbb{P}\Bigl[ \forall x,y \in V(T(c')),\ (1 - C\varepsilon)\, d_T(x,y) - \delta\rho_\chi(x,y;\delta) \le \sum_{i\in\mathbb{Z}} \|f_{i,c'}(x) - f_{i,c'}(y)\|_1 \le d_T(x,y) \Bigr] > 0. $$

But now applying Lemma 5.15 contradicts (100), completing the proof.

References

[ACNN11] A. Andoni, M. Charikar, O. Neiman, and H. L. Nguyen. Near linear lower bound for dimension reduction in $L_1$. To appear, Proceedings of the 52nd Annual IEEE Conference on Foundations of Computer Science, 2011.

[BC05] Bo Brinkman and Moses Charikar. On the impossibility of dimension reduction in $\ell_1$. J. ACM, 52(5):766–788, 2005.

[BLM89] J. Bourgain, J. Lindenstrauss, and V. Milman. Approximation of zonoids by zonotopes. Acta Math., 162(1-2):73–141, 1989.

[BSS09] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 255–262, 2009.

[CS02] Moses Charikar and Amit Sahai. Dimension reduction in the $\ell_1$ norm. In 43rd Annual Symposium on Foundations of Computer Science, 2002.

[EL75] P. Erdős and L. Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. In Infinite and finite sets (Colloq., Keszthely, 1973; dedicated to P. Erdős on his 60th birthday), Vol. II, pages 609–627. Colloq. Math. Soc. János Bolyai, Vol. 10. North-Holland, Amsterdam, 1975.

[GKL03] Anupam Gupta, Robert Krauthgamer, and James R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In 44th Symposium on Foundations of Computer Science, pages 534–543, 2003.

[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189–206. Amer. Math. Soc., Providence, RI, 1984.

[LN04] J. R. Lee and A. Naor. Embedding the diamond graph in $L_p$ and dimension reduction in $L_1$. Geom. Funct. Anal., 14(4):745–747, 2004.

[LNP09] James R. Lee, Assaf Naor, and Yuval Peres. Trees and Markov convexity. Geom. Funct. Anal., 18(5):1609–1659, 2009.

[Mat] J. Matoušek. Open problems on low-distortion embeddings of finite metric spaces. Online: http://kam.mff.cuni.cz/~matousek/metrop.ps.

[Mat99] J. Matoušek. On embedding trees into uniformly convex Banach spaces. Israel J. Math., 114:221–237, 1999.

[McD98] Colin McDiarmid. Concentration. In Probabilistic methods for algorithmic discrete mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer, Berlin, 1998.

[NR10] Ilan Newman and Yuri Rabinovich. On cut dimension of $\ell_1$ metrics and volumes, and related sparsification techniques. CoRR, abs/1002.3541, 2010.

[Reg11] Oded Regev. Entropy-based bounds on dimension reduction in $L_1$. 2011.

[Sch87] Gideon Schechtman. More on embedding subspaces of $L_p$ in $l^n_r$. Compositio Math., 61(2):159–169, 1987.

[Sch96] Leonard J. Schulman. Coding for interactive communication. IEEE Trans. Inform. Theory, 42(6, part 1):1745–1756, 1996. Codes and complexity.

[Tal90] Michel Talagrand. Embedding subspaces of $L_1$ into $l^N_1$. Proc. Amer. Math. Soc., 108(2):363–369, 1990.
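
Remark on the proof of (91). The passage from (98) to (99) is used without comment. One way to unpack it is sketched below; the shorthand $m$ is introduced here for illustration and is not notation from the paper, and the case $m > 0$ is assumed so that (98) can be inverted.

```latex
% Shorthand (an assumption of this sketch): m := \varphi(c_i) - \varphi(c) > 0.
% (98) reads 2^{\beta_x} < d_T(x,\Gamma_c(x)) / (\varepsilon m),
% equivalently 1/2^{\beta_x} > \varepsilon m / d_T(x,\Gamma_c(x)).
\begin{align*}
\frac{12\, d_T(x,\Gamma_c(x))}{2^{\beta_x+2}\,\varepsilon}
  &= \frac{3\, d_T(x,\Gamma_c(x))}{2^{\beta_x}\,\varepsilon} \\
  &> \frac{3\, d_T(x,\Gamma_c(x))}{\varepsilon}
     \cdot \frac{\varepsilon\, m}{d_T(x,\Gamma_c(x))}
     && \text{by (98)} \\
  &= 3m .
\end{align*}
% Since t \mapsto e^{-t} is decreasing, this gives
\[
\exp\!\left(-\frac{12\, d_T(x,\Gamma_c(x))}{2^{\beta_x+2}\,\varepsilon}\right)
  < \exp(-3m)
  = \exp\bigl(-3(\varphi(c_i)-\varphi(c))\bigr).
\]
```

When $m = 0$ the right-hand side of (99) is simply $1/\lceil \log_2 1/\delta \rceil$, and the bound follows because the exponential factor is at most $1$.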
