A Spectral Approach to Analyzing Belief Propagation for 3-Coloring


📝 Original Info

  • Title: A Spectral Approach to Analyzing Belief Propagation for 3-Coloring
  • ArXiv ID: 0712.0171
  • Date: 2017-11-17
  • Authors: Researchers mentioned in the ArXiv original paper

📝 Abstract

Contributing to the rigorous understanding of BP, in this paper we relate the convergence of BP to spectral properties of the graph. This encompasses a result for random graphs with a "planted" solution; thus, we obtain the first rigorous result on BP for graph coloring in the case of a complex graphical structure (as opposed to trees). In particular, the analysis shows how Belief Propagation breaks the symmetry between the $3!$ possible permutations of the color classes.

💡 Deep Analysis

This research explores the key findings and methodology presented in the paper: A Spectral Approach to Analyzing Belief Propagation for 3-Coloring.


📄 Full Content

This paper deals with a rigorous analysis of the Belief Propagation ("BP" for short) algorithm on certain instances of the 3-coloring problem. Originally BP was introduced by Pearl [14] as a message passing algorithm to compute the marginals at the vertices of a probability distribution described by an acyclic "graphical model", i.e., a representation of the distribution's dependency structure as an acyclic graph. Although in the worst case BP will fail if the graphical representation features cycles, various versions of BP are in common use as heuristics in artificial intelligence and statistics, where they frequently perform well empirically as long as the underlying model at least does not contain (many) "short" cycles. However, there is currently no general theory that could explain the empirical success of BP (with the notable exception of the use of BP in LDPC decoding [11,12,15]).

A striking recent application of BP is to instances of NP-hard constraint satisfaction problems such as 3-SAT or 3-coloring; this is the type of problems that we are dealing with in the present work. In this case the primary objective is not to compute the marginals of some distribution, but to construct a solution to the constraint satisfaction problem. For example, BP can be used to (attempt to) compute a proper 3-coloring of a given graph. Indeed, empirically BP (and its sibling Survey Propagation “SP”) seems to perform well on problem instances that are notoriously “hard” for other current algorithmic approaches, including the case of sparse random graphs.

For instance, let $G(n, p)$ be the random graph with vertex set $V = \{1, \ldots, n\}$ that is obtained by including each possible edge with probability $0 < p = p(n) < 1$ independently. Thus, the expected degree of any vertex in $G(n, p)$ is $(n-1)p \sim np$. Then there exists a threshold $\tau = \tau(n)$ such that for any $\epsilon > 0$ the random graph $G(n, p)$ is 3-colorable with probability $1 - o(1)$ if $np < (1-\epsilon)\tau$, whereas $G(n, p)$ is not 3-colorable if $np > (1+\epsilon)\tau$ [1]. In fact, random graphs $G(n, p)$ with average degree $np$ just below $\tau$ were considered the example of “hard” instances of the 3-coloring problem, until statistical physicists discovered that BP/SP can solve these graph problems efficiently in a regime considered “hard” for any previously known algorithms (possibly right up to the threshold density) [4,6]. While there are exciting and deep arguments from statistical physics that provide a plausible explanation of why these message passing algorithms succeed, these arguments are non-rigorous, and indeed no mathematically rigorous analysis is currently known.

The difficulty in understanding the performance of BP/SP on G(n, p) actually lies in two aspects. The first aspect is the combinatorial structure of the random graph G(n, p) with respect to the 3-coloring problem, which is not very well understood. In fact, even the basic problem of obtaining the precise value of the threshold τ is one of the current challenges in the theory of random graphs. Furthermore, we lack a rigorous understanding of the “solution space geometry”, i.e., the structure of the set of all proper 3-colorings of a typical random graph G(n, p) (e.g., how many proper 3-colorings are there typically, and what is the typical Hamming distance between any two). But according to the statistical physics analysis, the solution space geometry affects the behavior of BP significantly.

The second aspect, which we focus on in the present work, is the actual BP algorithm: given a graph G, how/why does the BP algorithm “construct” a 3-coloring? Thus far there has been no rigorous analysis of BP that applies to graph coloring instances except for graphs that are globally tree-like (such as trees or forests). However, it seems empirically that BP performs well on many graphs that are just locally tree like (i.e., do not contain “short” cycles). Therefore, in the present paper our goal is to analyze BP rigorously on a class of graphs that may have a complex combinatorial structure globally, but that have a very simple solution space geometry. More precisely, we shall relate the success of BP to spectral properties of the adjacency matrix of the input graph. In addition, we point out that the analysis comprises a natural random graph model (namely, a “planted solution” model).

The main contribution of this paper is a rigorous analysis of BP for 3-coloring. We basically show that if a certain (simple) spectral heuristic for 3-coloring succeeds, then so does BP. Thus, the result does not refer to a specific random graph model, but to a special class of graphs, namely graphs that satisfy a certain spectral condition. More precisely, we say that a graph $G = (V, E)$ on $n$ vertices is $(d, \epsilon)$-regular if there exists a 3-coloring of $G$ with color classes $V_1, V_2, V_3$ such that the following is true. Let $1_{V_i} \in \mathbb{R}^V$ be the vector whose entries equal 1 on coordinates $v \in V_i$, and 0 on all other coordinates; then

R1. for all $1 \le i < j \le 3$ the vector $1_{V_i} - 1_{V_j}$ is an eigenvector of the adjacency matrix $A(G)$ with eigenvalue $-d$, and

We shall state a few elementary properties of $(d, \epsilon)$-regular graphs in Proposition 11 below (assuming that $\epsilon$ is sufficiently small, say $\epsilon < 0.01$). For instance, we shall see that $(d, \epsilon)$-regularity implies that each vertex $v \in V_i$ has precisely $d$ neighbors in each other color class $V_j$ ($i \neq j$). Moreover, $(V_1, V_2, V_3)$ is the only 3-coloring of $G$ (up to permutations of the color classes, of course), and for each pair $i \neq j$ the bipartite graph consisting of the $V_i$-$V_j$-edges is an expander. Furthermore, if a graph $G$ is $(d, \epsilon)$-regular for any $\epsilon < 0.01$, say, then the following spectral heuristic is easily seen to produce a 3-coloring.

Output the equivalence classes of ≈ as a 3-coloring of G.

The equivalence classes of $\approx$ are precisely the three color classes $V_1, V_2, V_3$. For if $v, w$ belong to the same color class, then their entries in all three vectors $1_{V_i} - 1_{V_j}$ ($i < j$) coincide; hence, as the space spanned by these vectors contains $\chi_1, \chi_2$, we have $v \approx w$. Conversely, if $v \approx w$, then the entries of $v$ and $w$ in all the vectors $1_{V_i} - 1_{V_j}$ coincide, because these vectors lie in the space spanned by $\chi_1, \chi_2$; consequently, $v, w$ belong to the same color class $V_k$.
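The argument above can be tried out on a small example. The following is a minimal sketch only: the steps of the heuristic are not reproduced in this text, so it assumes that $\chi_1, \chi_2$ are two eigenvectors of $A(G)$ for the eigenvalue $-d$ (the smallest one) and that $v \approx w$ means their entries agree in both vectors. To stay within the Python standard library it approximates the eigenvectors by orthogonal (power) iteration on a shifted matrix rather than calling an eigen-solver; the function name, `iterations`, `tol`, and `seed` are illustrative choices, not part of the paper.

```python
import random

def spectral_three_coloring(neighbors, iterations=300, tol=1e-6, seed=0):
    """Group vertices by their entries in two bottom eigenvectors of A(G).

    neighbors: dict mapping each vertex to a list of adjacent vertices.
    Returns a dict vertex -> class index (assumption-laden sketch).
    """
    vertices = sorted(neighbors)
    # All eigenvalues of A lie in [-shift, shift], so shift*I - A is PSD and
    # its top eigenvectors are the bottom eigenvectors of A.
    shift = max(len(neighbors[v]) for v in vertices)
    rng = random.Random(seed)

    def apply(vec):
        return {v: shift * vec[v] - sum(vec[u] for u in neighbors[v])
                for v in vertices}

    def orthonormalize(vectors):
        basis = []
        for vec in vectors:
            for prev in basis:  # Gram-Schmidt against earlier vectors
                dot = sum(vec[v] * prev[v] for v in vertices)
                vec = {v: vec[v] - dot * prev[v] for v in vertices}
            norm = sum(x * x for x in vec.values()) ** 0.5
            basis.append({v: vec[v] / norm for v in vertices})
        return basis

    chi = orthonormalize([{v: rng.uniform(-1, 1) for v in vertices}
                          for _ in range(2)])
    for _ in range(iterations):
        chi = orthonormalize([apply(vec) for vec in chi])

    # v ~ w iff their entries agree (up to tol) in both vectors.
    representatives, color = [], {}
    for v in vertices:
        point = (chi[0][v], chi[1][v])
        for idx, rep in enumerate(representatives):
            if max(abs(point[0] - rep[0]), abs(point[1] - rep[1])) < tol:
                color[v] = idx
                break
        else:
            representatives.append(point)
            color[v] = len(representatives) - 1
    return color

# Toy input: the complete tripartite graph K_{4,4,4}, where every vertex has
# exactly d = 4 neighbors in each other class.
toy = {v: [u for u in range(12) if u // 4 != v // 4] for v in range(12)}
coloring = spectral_three_coloring(toy)
```

On graphs where the eigenvalue $-d$ is not well separated from the rest of the spectrum, the iteration count would have to grow, which is one reason a library eigen-solver would be preferable in practice.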

The main result of this paper is that BP can 3-color $(d, 0.01)$-regular graphs in polynomial time, provided that $d$ is not too small and the number of vertices is sufficiently large. We defer the description of the actual (randomized, polynomial time) BP coloring algorithm BPCol, which the following theorem refers to, to Section 2.

Theorem 1. There exist constants $d_0, \kappa > 0$ such that for each $d \ge d_0$ there is a number $n_0 = n_0(d)$ so that the following holds. If $G = (V, E)$ is a $(d, 0.01)$-regular graph on $n = |V| \ge n_0$ vertices, then with probability $\ge \kappa n^{-1}$ over the coin tosses of the algorithm, BPCol(G) outputs a proper 3-coloring of $G$.

Observe that Theorem 1 deals with “sparse” graphs, since the lower bound $n_0$ on the number of vertices depends on $d$. The proof yields an exponential dependence, i.e., $n_0 = \exp(\Theta(d))$. Conversely, this means that the average degree of $G$ is at most logarithmic in $n$, which is arguably the most relevant regime to analyze BP (cf. Section 2). Moreover, by applying BPCol $O(n)$ times independently, the success probability can be boosted to $1 - \alpha$ for any $\alpha > 0$. Besides, there is an easy way to modify the (initialization step of) BPCol so that the success probability of one iteration is at least $\kappa$ (rather than $\kappa n^{-1}$), cf. Remark 4 for details.

Let us emphasize that the contribution of Theorem 1 is not that we can now 3-color a class of graphs for which no efficient algorithms were previously known, as the aforementioned spectral heuristic 3-colors (d, 0.01)-regular graphs in polynomial time. Instead, the new aspect is that we can show that the Belief Propagation algorithm 3-colors (d, 0.01)-regular instances, thus shedding new light on this heuristic. Indeed, the proof of Theorem 1, which we present in Section 3, shows that in a sense BPCol “emulates” the spectral heuristic (although no spectral techniques occur in the description of BPCol). Thus, we establish a connection between spectral methods and BP. Besides, we note that no “purely combinatorial” algorithm (that avoids the use of advanced techniques such as Semidefinite Programming or spectral methods) is known to 3-color (d, 0.01)-regular graphs.

To illustrate Theorem 1, and to provide an example of $(d, 0.01)$-regular graphs, we point out that the main result comprises a regular random graph model with a “planted” 3-coloring. Let $G_{n,d,3}$ be the random graph with vertex set $V = \{1, \ldots, 3n\}$ obtained as follows.

  1. Let $V_1, V_2, V_3$ be a random partition of $V$ into three pairwise disjoint sets of equal size.

  2. For any pair $1 \le i < j \le 3$ independently choose a $d$-regular bipartite graph with vertex set $V_i \,\dot\cup\, V_j$ uniformly at random.

For a fixed $d$ we say that $G_{n,d,3}$ has a certain property $P$ with high probability (“w.h.p.”), if the probability that $G_{n,d,3}$ enjoys $P$ tends to 1 as $n \to \infty$. Concerning $G_{n,d,3}$, Theorem 1 implies the following.

Corollary 2. Suppose that $d \ge d_0$ is fixed. With high probability a random graph $G = G_{n,d,3}$ has the following property: with probability $\ge \kappa n^{-1}$ over the coin tosses of the algorithm, BPCol(G) outputs a proper 3-coloring of $G$.

To prove Corollary 2, we show that w.h.p. $G_{n,d,3}$ is $(d, 0.01)$-regular, cf. Section 4.
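For experimentation, a graph with the same planted structure can be generated as follows. This is a rough sketch under a stated simplification: instead of sampling a $d$-regular bipartite graph uniformly at random (as the model requires), it superimposes $d$ random perfect matchings and resamples whenever a parallel edge would appear, which yields a simple $d$-regular bipartite graph but not the uniform distribution. Vertices are numbered from 0 rather than 1, and the function name and `seed` parameter are hypothetical.

```python
import random

def planted_regular_tripartite(n, d, seed=None):
    """Return (classes, neighbors) for a planted 3-colorable graph on 3n vertices.

    Stand-in for G_{n,d,3}: each pair of classes is joined by a simple
    d-regular bipartite graph built from d edge-disjoint random matchings.
    Assumes 1 <= d <= n; for d close to n the resampling loop becomes slow.
    """
    rng = random.Random(seed)
    vertices = list(range(3 * n))
    rng.shuffle(vertices)                      # random balanced partition
    classes = [vertices[i * n:(i + 1) * n] for i in range(3)]
    neighbors = {v: set() for v in vertices}
    for i in range(3):
        for j in range(i + 1, 3):
            left, right = classes[i], classes[j]
            accepted = 0
            while accepted < d:
                matching = right[:]
                rng.shuffle(matching)
                # Resample if this matching repeats an existing edge.
                if any(r in neighbors[l] for l, r in zip(left, matching)):
                    continue
                for l, r in zip(left, matching):
                    neighbors[l].add(r)
                    neighbors[r].add(l)
                accepted += 1
    return classes, neighbors

classes, graph = planted_regular_tripartite(n=10, d=3, seed=1)
```

By construction every vertex has exactly $d$ neighbors in each of the two other classes and none in its own, which is the degree property that $(d, \epsilon)$-regularity implies; the spectral condition itself is not checked here.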

Alon and Kahale [2] were the first to employ spectral techniques for 3-coloring sparse random graphs. They present a spectral heuristic and show that this heuristic finds a 3-coloring in the so-called “planted solution model”. This model is somewhat more difficult to deal with algorithmically than the $G_{n,d,3}$ model that we study in the present work. For while in the $G_{n,d,3}$-model each vertex $v \in V_i$ has exactly $d$ neighbors in each of the other color classes $V_j \neq V_i$, in the planted solution model of Alon and Kahale the number of neighbors of $v \in V_i$ in $V_j$ has a Poisson distribution with mean $d$. In effect, the spectral algorithm in [2] is more sophisticated than the spectral heuristic from Section 1.2. In particular, the Alon-Kahale algorithm succeeds on $(d, 0.01)$-regular graphs (and hence on $G_{n,d,3}$ w.h.p.).

There are numerous papers on the performance of message passing algorithms for constraint satisfaction problems (e.g., Belief Propagation/Survey Propagation) by authors from the statistical physics community (cf. [4,5,10] and the references therein). While these papers provide rather plausible (and insightful) explanations for the success of message passing algorithms on problem instances such as random graphs $G_{n,p}$ or random k-SAT formulae, the arguments (e.g., the replica or the cavity method) are mathematically non-rigorous. To the best of our knowledge, no connection between spectral methods and BP has been established in the physics literature.

Feige, Mossel, and Vilenchik [8] showed that the Warning Propagation (WP) algorithm for 3-SAT converges in polynomial time to a satisfying assignment on a model of random 3-SAT instances with a planted solution. Since the messages in WP are additive in nature, and not multiplicative as in BP, the WP algorithm is conceptually much simpler. Moreover, on the model studied in [8] a fairly simple combinatorial algorithm (based on the “majority vote” algorithm) is known to succeed. By contrast, no purely combinatorial algorithm (that does not rely on spectral methods or semi-definite programming) is known to 3-color $G_{n,d,3}$ or even arbitrary $(d, 0.01)$-regular instances.

A very recent paper by Yamamoto and Watanabe [16] deals with a spectral approach to analyzing BP for the Minimum Bisection problem. Their work is similar to ours in that they point out that a BP-related algorithm pseudo-bp emulates spectral methods. However, a significant difference is that pseudo-bp is a simplified version of BP that is easier to analyze, whereas in the present work we make a point of analyzing the BP algorithm for coloring as it is stated in [4] (cf. Remark 8 for more detailed comments). Nonetheless, an interesting aspect of [16] certainly is that this paper shows that BP can be applied to an actual optimization problem, rather than to the problem of just finding any feasible solution (e.g., a k-coloring).

The effectiveness of message passing algorithms for amplifying local information in order to decode codes close to channel capacity was recently established in a number of papers, e.g. [11,12,15]. Our results are similar in flavor; however, the analysis provided here allows one to recover a proper 3-coloring of the entire graph, whereas in the random LDPC codes setting, message passing allows one to recover only a $1 - o(1)$ fraction of the codeword correctly. In [12] it is shown that for the erasure channel, all bits may be recovered correctly using a message passing algorithm; however, in this case the message passing algorithm is of combinatorial nature (all messages are either 0 or 1) and the LDPC code is designed so that message passing works for it.

Following [4], in this section we will describe the basic ideas behind the BP algorithm. Since BP is a heuristic based on non-rigorous ideas (mainly from artificial intelligence and/or statistical physics), the discussion of its main ideas will lack mathematical rigor a bit; in fact, some of the assumptions that BP is based on (e.g., “asymptotic independence”) may seem ridiculous at first glance. Nonetheless, as we pointed out in the introduction, BP makes up for this by being very successful empirically. At the end of this section, we will state the version of BP that we are going to work with precisely.

The basic strategy behind the BP algorithm for 3-coloring is to perform a fixed point iteration for certain “messages”, starting from a suitable initial assignment. In the case of 3-coloring the messages correspond to the edges of the graph and to the three available colors. More precisely, to each (undirected) edge $\{v, w\}$ of the graph $G = (V, E)$ and each color $a \in \{1, 2, 3\}$ we associate two messages: $\eta^a_{v\to w}$ from $v$ to $w$ about $a$, and $\eta^a_{w\to v}$ from $w$ to $v$ about $a$; in general, we will have $\eta^a_{v\to w} \neq \eta^a_{w\to v}$. Thus, the messages are directed objects. Each of these messages $\eta^a_{v\to w}$ is a number between 0 and 1, which we interpret as the “probability” that vertex $v$ takes the color $a$ in the graph obtained from $G$ by removing $w$. Here “probability” refers to the choice of a random (proper) 3-coloring of $G - w$, while the graph $G$ is considered fixed. (There is an obvious symmetry issue with this definition, which we will discuss shortly.)

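Having introduced the variables $\eta^a_{v\to w}$, we can set up the Belief Propagation Equations for coloring, which are the basis of the BP algorithm. The BP equations reflect a relationship that the probabilities $\eta^a_{v\to w}$ should (approximately) satisfy under certain assumptions on the graph $G$, namely that

$$\eta^a_{v\to w} = \frac{\prod_{u \in N(v)\setminus w} \bigl(1 - \eta^a_{u\to v}\bigr)}{\sum_{b=1}^{3} \prod_{u \in N(v)\setminus w} \bigl(1 - \eta^b_{u\to v}\bigr)} \qquad (2.1)$$

for all edges $\{v, w\}$ of $G$ and all $a \in \{1, 2, 3\}$ (cf. Figure 1).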

The idea behind (2.1) is that $v$ takes color $a$ in the graph $G - w$ iff none of its neighbors $u \in N(v) \setminus w$ has color $a$ in $G - v$. Furthermore, the probability of this event (“no $u$ has color $a$”) is assumed to be (asymptotically) equal to the product $\prod_{u \in N(v)\setminus w} (1 - \eta^a_{u\to v})$ of the individual probabilities; that is, the neighbors $u \neq w$ of $v$ are assumed to be asymptotically independent. Of course, this assumption does not hold for arbitrary graphs $G$. Finally, the denominator on the r.h.s. of (2.1) is just a normalizing term, which ensures that $\sum_{a=1}^{3} \eta^a_{v\to w} = 1$.

The reason why in the above discussion we refer to the probability that $v$ takes color $a$ in the graph $G - w$ obtained by removing $w$ rather than just to the probability that $v$ takes color $a$ in $G$ is that in the latter case the neighbors $u \in N(v)$ would never be (asymptotically) independent, not even if $G$ is a tree. For in this case the presence of $v$ (more precisely, the existence of the short path $(u, v, u')$ for any two neighbors $u, u' \in N(v)$ of $v$) would render the colors within the neighborhood $N(v)$ heavily dependent. Similarly, if $G$ contains triangles, so that for some vertices $v$ the neighborhood $N(v)$ is not an independent set, then the independence assumption that is implicit in (2.1) will be violated. Nonetheless, if $G$ does not feature (many) short cycles (say, all the cycles are of length $\Omega(\log |V|)$ as $|V| \to \infty$), then the BP equations (2.1) may at least be asymptotically valid. The random graph model $G_{n,d,3}$ provides an example of graphs (essentially) without such short cycles.

Now, the basic idea behind the BP algorithm is the following. We start with a “reasonable” initial assignment $\eta^a_{v\to w}(0)$ and use (2.1) to perform a fixed point iteration by letting

$$\eta^a_{v\to w}(l+1) = \frac{\prod_{u \in N(v)\setminus w} \bigl(1 - \eta^a_{u\to v}(l)\bigr)}{\sum_{b=1}^{3} \prod_{u \in N(v)\setminus w} \bigl(1 - \eta^b_{u\to v}(l)\bigr)} \qquad (2.2)$$

for all $\{v, w\} \in E$ and $a \in \{1, 2, 3\}$. As soon as some of the values $\eta^a_{v\to w}(l + 1)$ are strongly “biased” toward either 0 or 1, we try to exploit this information to obtain a coloring.
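The parallel update just described (normalized products over the neighbors of $v$ other than $w$) can be sketched in a few lines. This is an illustrative sketch rather than the paper's implementation: the data layout (a dictionary keyed by directed edges `(v, w)` holding a list of three numbers, colors indexed 0 to 2) and the name `bp_step` are assumptions made here.

```python
def bp_step(neighbors, messages):
    """One parallel round of the BP update for 3-coloring.

    neighbors: dict vertex -> list of adjacent vertices.
    messages:  dict (v, w) -> [eta^1, eta^2, eta^3], one entry per directed edge.
    Returns a new dict of messages; the input is left unchanged.
    """
    updated = {}
    for (v, w) in messages:
        weights = []
        for a in range(3):
            # Product over all neighbors u of v other than w of (1 - eta^a_{u->v}).
            product = 1.0
            for u in neighbors[v]:
                if u != w:
                    product *= 1.0 - messages[(u, v)][a]
            weights.append(product)
        total = sum(weights)  # normalizer; zero only for contradictory inputs
        updated[(v, w)] = [x / total for x in weights]
    return updated

# Toy graph K_{2,2,2}: vertex v belongs to class v // 2.
toy = {v: [u for u in range(6) if u // 2 != v // 2] for v in range(6)}
uniform = {(v, w): [1 / 3] * 3 for v in toy for w in toy[v]}
after_one_round = bp_step(toy, uniform)
```

Starting from the all-1/3 assignment, every product equals the same power of 2/3, so one round returns 1/3 for every message again (up to rounding). The same function maps the 0/1 messages of a proper coloring of this toy graph to themselves.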

Before we state the BP algorithm precisely, we need to discuss an important issue with the BP equations (2.1). Namely, in the case of 3-coloring the set of all 3-colorings is symmetric under permuting the color classes. Therefore, if we actually define $\eta^a_{v\to w}$ to equal the probability w.r.t. a random 3-coloring of $G - w$, then trivially $\eta^a_{v\to w} = 1/3$ for all $a, v, w$. In fact, this trivial solution is actually a fixed point of (2.2). Hence, we need to “break symmetry”. In particular, it is not a good idea to choose the initial assignment $\eta^a_{v\to w}(0) = 1/3$ for all $a, v, w$. Therefore, we do not start from $\eta^a_{v\to w}(0) = 1/3$, but we assign to each $\eta^a_{v\to w}$ the value $1/3$ plus a small random error $\delta$. The hope is that this random error will cause the fixed point iterations (2.2) to converge to a non-trivial fixed point (other than $\eta^a_{v\to w} = 1/3$ for all $a, v, w$), and that this fixed point yields sufficient information to 3-color $G$. For instance, if χ :

is a fixed point of (2.2), and clearly the 3-coloring χ can be read out of the above messages easily. The algorithm BPCol is shown in Fig. 2. Observe that Step 1 ensures that

Figure 2: the algorithm BPCol.

Input: A graph $G = (V, E)$. Output: An assignment of colors to the vertices of $G$.

  1. Let $\delta = \exp(-\log^3 n)$. For each $v \in V$ perform the following independently: choose $a \in \{1, 2, 3\}$ uniformly at random and assign $\eta^a_{v\to w}(0)$ [...] for all $b \in \{1, 2, 3\} \setminus \{a\}$ and $w \in N(v)$.

  2. For $l = 1, \ldots, l^* = \lceil \log^4 n \rceil$ compute $\eta^a_{v\to w}(l + 1)$ using (2.2) for all $a$, $v$, and $w$.

  3. For each $v \in V$ and each $a \in \{1, 2, 3\}$ compute [...]

Remark 4. Theorem 1 states that the probability (over the random decisions in Step 1) that BPCol yields a proper 3-coloring of its $(d, 0.01)$-regular input graph is $\Omega(n^{-1})$. This can be boosted to $\Omega(1)$ by means of the following slightly more careful initialization. Instead of choosing a random $a$ for each $v \in V$ independently, we choose a random permutation $\sigma$ of $V$ and let $W_a = \{\sigma((a-1)n/3 + 1), \ldots, \sigma(an/3)\}$ ($a = 1, 2, 3$). Then, for each $v \in W_a$ we set $\eta^a_{v\to w}(0)$ [...]

The proof of Proposition 13 below shows that this leads to a success probability of $\Omega(1)$. Nonetheless, we chose to state BPCol with independent decisions in its initialization, because this appears more natural (and generic) to us.
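The two initialization schemes (independent random choices versus the balanced partition of Remark 4) can be contrasted in code. The sketch below is hedged in one important respect: the exact initial values are not recoverable from the text above, so it assumes, purely for illustration, that the favored color receives $1/3 + \delta$ and the other two receive $1/3 - \delta/2$ each, which keeps the three messages on every directed edge summing to 1. The function name and parameters are hypothetical.

```python
import random

def initial_messages(neighbors, delta, balanced=False, seed=None):
    """Return (favored, messages) for a BPCol-style start.

    neighbors: dict vertex -> list of adjacent vertices (at least 3 vertices).
    delta:     size of the symmetry-breaking bias (ASSUMED to enter as
               +delta for the favored color and -delta/2 for the others).
    balanced:  if True, use a random permutation split into three blocks
               (the variant of Remark 4); otherwise choose independently.
    """
    rng = random.Random(seed)
    vertices = sorted(neighbors)
    if balanced:
        order = vertices[:]
        rng.shuffle(order)                       # random permutation sigma
        block = len(order) // 3
        # Leftover vertices (if n is not divisible by 3) go to the last block.
        favored = {v: min(pos // block, 2) for pos, v in enumerate(order)}
    else:
        favored = {v: rng.randrange(3) for v in vertices}
    messages = {}
    for v in vertices:
        for w in neighbors[v]:
            messages[(v, w)] = [
                1 / 3 + delta if a == favored[v] else 1 / 3 - delta / 2
                for a in range(3)
            ]
    return favored, messages

toy = {v: [u for u in range(6) if u // 2 != v // 2] for v in range(6)}
favored, start = initial_messages(toy, delta=0.01, balanced=True, seed=0)
```

One practical caveat: with $\delta = \exp(-\log^3 n)$ the bias is far below double-precision resolution for all but very small $n$ (the sum $1/3 + \delta$ rounds back to $1/3$), so a numerical experiment would either pass a larger `delta`, as done here, or track the deviations from $1/3$ separately.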

Remark 5. Although in the above discussion of the BP equation (2.2) we referred to “local” properties (such as the absence of short cycles), such local properties will not occur explicitly in our analysis of BPCol. Indeed, relating BPCol to spectral graph properties, the analysis has a “global” character. Nonetheless, various local conditions (e.g., a relatively small number of short cycles) are implicit in the “global” assumption that the graph $G$ is $(d, 0.01)$-regular (cf. Theorem 1). For more background on spectral vs. combinatorial graph properties cf. Chung and Graham [7].

Remark 6. BPCol updates the messages $\eta^a_{v\to w}$ “in parallel”, i.e., the messages carry “time stamps” (cf. (2.2)). An alternative, equally common option would be “serial” updates, e.g., by choosing each time a random pair $v, w$ of adjacent vertices along with a color $a \in \{1, 2, 3\}$ and updating $\eta^a_{v\to w}$ via (2.1).

Remark 7. BPCol exploits the result of the fixed point iteration (2.2) in a more straightforward fashion than the version of BP described in [4]. Namely, after performing a fixed point iteration of (2.2), the algorithm in [4] does not assign colors to all vertices (as Step 3 of BPCol does), but only to a small fraction (the most decisive ones with respect to the calculated values). Then, the algorithm performs another fixed point iteration, etc. The reason is that in the random graph model considered in [4] typically the number of proper 3-colorings is exponential in the number of vertices, whereas $(d, 0.01)$-regular graphs have only one 3-coloring (up to permutations of the colors).

Remark 8. Let us discuss the essential differences between BPCol for k = 2 and the algorithm pseudo-bp analyzed in [16].

  1. In pseudo-bp the products in (2.1) are taken over all neighbors of v, including w. This apparently minor modification has a major impact on the analysis. For including w causes the messages η a v→w to be independent of w. Consequently, in pseudo-bp the messages at time l are 2|V |-dimensional objects, whereas in the present work the dimension is 2k|E|.

  2. pseudo-bp actually works with the logarithms $\ln(\eta^a_{v\to w})$ of the messages instead of the original $\eta^a_{v\to w}$. Of course, the equation (2.1) can be phrased in terms of $\ln(\eta^a_{v\to w})$ as $\ln(\eta^a_{v\to w}) = F\bigl((\ln(\eta^a_{u\to v}))_{u \in N(v)}\bigr)$ for some function $F$. Now, in pseudo-bp this non-linear function $F$ is replaced by a truncated linear function F .

Throughout this section, we let $\epsilon > 0$ be a sufficiently small constant (whose value will be determined implicitly in the course of the proof). Moreover, we keep the assumptions from Theorem 1. Thus, we let $d > d_0$ for a sufficiently large constant $d_0$; in particular, we assume that $d_0 > \exp(\epsilon^{-2})$. In addition, we assume that $n > n_0$ for some sufficiently large number $n_0 = n_0(d)$, and that $G = (V, E)$ is a $(d, 0.01)$-regular graph on $n = |V|$ vertices. This is reflected by the use of asymptotic notation in the analysis, which always refers to $n$ being sufficiently large.

Furthermore, we let (V 1 , V 2 , V 3 ) be a 3-coloring of G with respect to which the conditions R1 and R2 from the definition of (d, 0.01)-regularity hold. (Actually a (d, 0.01)-regular graph has a unique 3-coloring up to permutations of the color classes, but we will not use this fact.) The following easy observation will be used frequently.

Proof. Assume w.l.o.g. that i = 1 and j = 2. By condition R1 ξ = 1 Vi -1 Vj is an eigenvector of the adja-

Following [4], we will denote the elements (v, w) ∈ A as v → w. Furthermore, we shall frequently work with the vector space

Hence, we shall denote such a vector as $\Gamma = (\Gamma^i_{v\to w})_{v\to w \in A,\, i \in \{1,2,3\}}$. Semantically, one can think of $\Gamma^i_{v\to w}$ as the “message” that $v$ sends to $w$ about color $i$. Note that the messages $\eta^a_{v\to w}(l)$ defined in Section 2 constitute vectors $\eta(l) = (\eta^a_{v\to w}(l))_{v\to w \in A,\, a \in \{1,2,3\}} \in R$. We will denote the scalar product of vectors $\xi, \eta$ as $\langle \xi, \eta \rangle$. Moreover, $\|\xi\| = \sqrt{\langle \xi, \xi \rangle}$ denotes the $\ell_2$-norm. In addition, if $M : \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ is linear, then we let $\|M\| = \max_{\xi \in \mathbb{R}^{n_1},\, \|\xi\| = 1} \|M\xi\|$ signify the operator norm of $M$. Further, $M^T$ denotes the transpose of $M$, i.e., the unique linear operator

In order to analyze BPCol, we shall relate the fixed point iteration of (2.2) to the spectral coloring algorithm from Section 1.2. More precisely, we will approximate the fixed point iteration of the non-linear operation (2.2) by a fixed point iteration for a linear operator. One of the key ingredients in the analysis is to show how symmetry is broken (i.e., convergence to the all-$1/3$ fixed point is avoided). Indeed, it may not be clear a priori that this will happen at all, because the random bias generated in Step 1 of BPCol is uncorrelated with the planted coloring. The analysis is based on the following crucial observation (cf. Corollary 12 below): after a logarithmic number of iterations, for all $v \in V_i$, $w \in V_j$, $i \neq j$ the messages $\eta^a_{v\to w}$ are dominated by eigenvectors of the linear operator which we use to approximate (2.2). Furthermore, these eigenvectors mirror the coloring $(V_1, V_2, V_3)$ and are (almost) constant on every color class $V_i$ (with basically $0, 1, -1$ values on the different color classes). Hence, the (random) initial bias gets amplified so that the planted 3-coloring can eventually be read out of the messages.

To carry out this analysis precisely, we set

Moreover, we let B : R → R denote the (non-linear) operator defined by

Then (2.2) can be rephrased in terms of the vectors ∆(l) = (∆ a v→w (l)) v→w∈A, a∈{1,2,3} ∈ R as

We shall see that we can approximate the non-linear operator B in (3.1) by the following linear operator

Indeed, $B' : R \to R$ is just the total derivative of $B$ at 0. We define a sequence $\Xi(l)$ by letting $\Xi(0) = \Delta(0)$ and $\Xi(l) = (B')^l\, \Xi(0)$ for $l \ge 1$, thinking of $\Xi(l)$ as a “linear approximation” to $\Delta(l)$. As a first step, we shall simplify the operator $B'$ a little.

Proof.

Step 1 of BPCol ensures that the initial vector satisfies

Therefore, by induction and by the definition (3.2) of B ′ we see that

, the second summand on the r.h.s. of (3.2) vanishes.

Due to Lemma 10, we may just replace B ′ by the simpler linear operator L : R → R defined by

which satisfies

We also note for future reference that

because (2.3) entails that (3.5) is true for l = 0, whence the definition (3.3) of L shows that (3.5) holds for all l > 0.
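The passage from the non-linear update to a linear operator can be made concrete by a first-order expansion of (2.2) around the uniform point. The following is a sketch under an assumption that is not spelled out in the text above, namely that $\Delta^a_{v\to w}(l) = \eta^a_{v\to w}(l) - 1/3$; the abbreviations $S^a$ and $k$ are introduced here only for the calculation.

```latex
% Assumptions (introduced for this sketch):
%   \Delta^a_{v\to w}(l) = \eta^a_{v\to w}(l) - 1/3,
%   S^a = \sum_{u \in N(v)\setminus w} \Delta^a_{u\to v}(l),  k = |N(v)\setminus w|.
\begin{align*}
\prod_{u \in N(v)\setminus w} \bigl(1 - \eta^a_{u\to v}(l)\bigr)
  &= \prod_{u \in N(v)\setminus w} \tfrac{2}{3}\Bigl(1 - \tfrac{3}{2}\,\Delta^a_{u\to v}(l)\Bigr)
   = \bigl(\tfrac{2}{3}\bigr)^{k}\Bigl(1 - \tfrac{3}{2}\,S^a + O\bigl(\|\Delta(l)\|_\infty^2\bigr)\Bigr), \\[4pt]
\eta^a_{v\to w}(l+1)
  &= \frac{1 - \tfrac{3}{2}\,S^a}{3 - \tfrac{3}{2}\sum_{b=1}^{3} S^b} + O\bigl(\|\Delta(l)\|_\infty^2\bigr)
   = \tfrac{1}{3} - \tfrac{1}{2}\,S^a + \tfrac{1}{6}\sum_{b=1}^{3} S^b + O\bigl(\|\Delta(l)\|_\infty^2\bigr), \\[4pt]
\Delta^a_{v\to w}(l+1)
  &\approx -\tfrac{1}{2} \sum_{u \in N(v)\setminus w} \Delta^a_{u\to v}(l)
     \;+\; \tfrac{1}{6} \sum_{b=1}^{3} \sum_{u \in N(v)\setminus w} \Delta^b_{u\to v}(l).
\end{align*}
```

Here the implied constants depend on the degree, which is fixed. If the three deviations on every directed edge sum to zero, the second sum vanishes and only the term with the factor $-\tfrac{1}{2}$ remains. This is consistent with the statement that the second summand of (3.2) vanishes and with the later description of $L$ through $-\tfrac{1}{2}(M - K)$, although the exact displays (3.2) and (3.3) are not reproduced in this text.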

In order to prove Theorem 1, we shall first analyze the sequence $\Xi(l)$ and then bound the error $\|\Xi(l) - \Delta(l)\|_\infty$ resulting from linearization. To study the sequence $\Xi(l)$, we investigate the dominant eigenvalues of $L$ and their corresponding eigenvectors. More precisely, we shall see that our assumption on the spectrum of the adjacency matrix $A(G)$ implies that the dominant eigenvectors of $L$ mirror a 3-coloring of $G$. We defer the proof of the following proposition to Section 3.3.

Proposition 11. Let $e^a_{ij} \in R$ be the vector with entries

Moreover, let $E$ be the space spanned by the 18 vectors $e^a_{ij}$ ($a, i, j \in \{1, 2, 3\}$, $i \neq j$). Then $L$ operates on $E$ as follows.

S1. Finally, we have

The eigenvectors that we are mostly interested in are ζ a 2 , ζ a 3 (a = 1, 2, 3) as (3.6) shows that these vectors represent the coloring (V 1 , V 2 , V 3 ) completely. As a next step, we shall show that Ξ(l) can be approximated well by a linear combination of the vectors ζ a 2 , ζ a 3 , provided that l is sufficiently large. To this end, let

be the projection of the initial vector $\Delta(0) = \Xi(0)$ onto the eigenvector $\zeta^a_i$; we shall see below that the normalization in (3.9) ensures that $x^a_i$ is bounded away from 0.

Proof. Since by assumption the initial vector $\Xi(0)$ is perpendicular to $e^a$ for $a = 1, 2, 3$ and because $e^1, e^2, e^3$ are eigenvectors of $L$ by S2, we have $\Xi(l) \perp e^a$. Therefore, we can decompose $\Xi(l)$ as

Thus, to prove the corollary we need to compute the numbers $z^a_i(l)$ and bound $\|\xi(l)\|_\infty$. With respect to the coefficients $z^a_i(l)$, note that $z^a_i(l) = \lambda^l z^a_i(0)$, because by S1 $\zeta^a_i$ is an eigenvector with eigenvalue $\lambda$. Moreover,

Hence, (3.9) and (3.10) yield $z^a_i(0) = x^a_i \cdot \nu$. Thus,

To bound the “error term” $\|\xi(l)\|_\infty$, we note that S3-S5 entail

While in the initial vector $\Delta(0) = \Xi(0)$ the messages are completely uncorrelated with the coloring $(V_1, V_2, V_3)$, Corollary 12 entails that the dominant contribution to $\Xi(L_1)$ comes from the eigenvectors $\zeta^a_i$, which represent that coloring. This implies that all vertices $v$ in each class $V_a$ send essentially the same messages to all other vertices $w \in V_b$ about each of the colors 1, 2, 3, and these messages are solely determined by the initial projections $x^a_i$ of $\Delta(0)$ onto $\zeta^a_i$. Hence, after $L_1$ iterations the messages are essentially coherent and strongly correlated to the planted coloring. Thus, as a next step we analyze the distribution of the projections $x^a_i$. To simplify the expression resulting from Corollary 12, let

, and y a 3 = -x a 3 .

(3.16)

Then (3.6) and Corollary 12 entail that for all v ∈ V i , all w ∈ N (v), and l ≥ L 1 we have

Of course, the numbers $y^a_i$ only depend on the initial vector $\Delta(0)$. Therefore, we say that $\Delta(0)$ is feasible if

F1. $\Delta(0) \perp e^a$ for $a = 1, 2, 3$, and

The elementary (though tedious) proof of Proposition 13 can be found in Section 3.4. Combining Corollary 12 and Proposition 13, we conclude that with probability $\Omega(n^{-1})$ (namely, if $\Delta(0)$ is feasible) we have

Having obtained a sufficient understanding of the sequence Ξ(l), we will now show that these vectors provide a good approximation to the vectors ∆(l), which we are actually interested in. The proof of the following proposition can be found in Section 3.5.

Combining the information on the sequence Ξ(l) provided by Corollary 12 and Proposition 13 with the bound on || Ξ(L 2 ) -∆(L 2 ) || ∞ from Proposition 14, we can show that the messages obtained in the next one or two steps of the algorithm already represent the coloring rather well. To be precise, let us call the vector η(l) proper if

The proof of Proposition 15 is the content of Section 3.6. Proposition 15 shows that the “rounding procedure” in Step 3 of BPCol applied to the messages η(L 3 ) would yield the coloring (V 1 , V 2 , V 3 ). However, BPCol actually applies that rounding procedure to η(l * ), where l * > L 3 . Therefore, in order to show that BPCol outputs a proper 3-coloring, we need to show that these messages η(l * ) are proper, too.

Proof. Let v ∈ V a for some 1 ≤ a ≤ 3, w ∈ N (v), and {b, c} = {1, 2, 3} \ {a}. Since η(l) is proper, we have

Consequently, the definition (2.2) of the sequence η(l) shows that

As the construction (2.2) of $\eta(l + 1)$ ensures that $\eta^1_{v\to w}(l + 1) + \eta^2_{v\to w}(l + 1) + \eta^3_{v\to w}(l + 1) = 1$, (3.21) entails that $\eta^a_{v\to w}(l + 1) \ge 0.99$ and $\eta^b_{v\to w}(l + 1) \le 0.01$, whence $\eta(l + 1)$ is proper.

Proof of Theorem 1. Proposition 13 states that ∆(0) is feasible with probability Ω(n -1 ). Therefore, to establish Theorem 1, we show that BPCol outputs the coloring

Thus, assume that ∆(0) is feasible and let L 2 be the maximum integer such that

and the $\ell_\infty$-norm of $\Xi(l)$ grows by a factor of $\lambda$ in each iteration. Therefore, Proposition 15 entails that $\eta(L_3)$ is proper for some $L_3 = \Theta(\log^3 n)$. Thus, by Lemma 16 the final $\eta(l^*)$ generated in Step 2 is proper, whence Step 3 of BPCol outputs the coloring $V_1, V_2, V_3$.

The operation (3.3) of L is symmetric with respect to the three colors a = 1, 2, 3. Therefore, we shall represent L as a tensor product of a 3 × 3 matrix and an operator that represents the graph G. To this end, we define operators M : R A → R A and K : R A → R A by

i.e., −(1/2)(M − K) represents the operation of L with respect to a single color a ∈ {1, 2, 3}. Therefore, we can rephrase the definition (3.3) of L on the space

Hence, in order to understand L, we basically need to analyze M -K.
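The factor −1/2 in front of M − K can be traced back to a first-order expansion of the BP update around the uniform message. The derivation below is a sketch under assumptions that are not restated in this part of the text: that (2.2) is the standard BP update for coloring, that ∆ = η − 1/3 with the three color components of ∆ summing to zero on every arc, and that (Mξ)_{v→w} sums ξ_{u→v} over all u ∈ N(v) while (Kξ)_{v→w} = ξ_{w→v}. The symbol S^a is introduced here purely for illustration.

$$
\begin{aligned}
\eta^a_{v\to w}(l+1) &= \frac{\prod_{u\in N(v)\setminus\{w\}}\bigl(1-\eta^a_{u\to v}(l)\bigr)}{\sum_{b=1}^{3}\prod_{u\in N(v)\setminus\{w\}}\bigl(1-\eta^b_{u\to v}(l)\bigr)}, \qquad 1-\eta^a_{u\to v} = \tfrac23\Bigl(1-\tfrac32\,\Delta^a_{u\to v}\Bigr),\\
\prod_{u\in N(v)\setminus\{w\}}\bigl(1-\eta^a_{u\to v}\bigr) &\approx \Bigl(\tfrac23\Bigr)^{|N(v)|-1}\Bigl(1-\tfrac32\,S^a\Bigr), \qquad S^a=\sum_{u\in N(v)\setminus\{w\}}\Delta^a_{u\to v}, \qquad \sum_{a=1}^{3}S^a=0,\\
\eta^a_{v\to w}(l+1) &\approx \frac{1-\tfrac32\,S^a}{3-\tfrac32\sum_{b}S^b} = \frac13-\frac12\,S^a, \qquad\text{so}\qquad \Delta^a_{v\to w}(l+1)\approx-\frac12\bigl((M-K)\Delta^a(l)\bigr)_{v\to w}.
\end{aligned}
$$

The approximation signs hide terms of second and higher order in the entries of ∆; controlling those terms is exactly the purpose of the error analysis that compares ∆(l) with Ξ(l).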

For i, j ∈ {1, 2, 3} we define vectors e ij ∈ R A by letting

The following lemma shows that it makes sense to split the analysis of M − K into two parts: first we shall analyze how M − K operates on the space E 0 spanned by the vectors e ij (1 ≤ i, j ≤ 3, i ≠ j); then, we will study the operation of M − K on E ⊥ 0 .

Lemma 17. If ξ ∈ E 0 , then Mξ, M T ξ, Kξ, K T ξ ∈ E 0 .

Proof. Let i, j, k ∈ {1, 2, 3} be pairwise distinct. Since Ke ij = e ji , we have KE 0 ⊂ E 0 . Moreover, K T = K. Furthermore, by Lemma 9

Hence, Me ij = d(e jk + e ji ), and thus ME 0 ⊂ E 0 . In addition, the transpose of M is given by

Therefore,

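To study the operation of M − K on E 0 , note that (3.24) implies that (M − K)e ij = de jk + (d − 1)e ji if i, j, k ∈ {1, 2, 3} are pairwise distinct. Therefore, with respect to the basis e 12 , e 13 , e 21 , e 23 , e 31 , e 32 of E 0 , we can represent the operation of M − K on E 0 by the 6 × 6 matrix whose columns are the coordinate vectors of the images of the basis vectors:

$$
\mathcal{M} = \begin{pmatrix}
0 & 0 & d-1 & 0 & d & 0\\
0 & 0 & d & 0 & d-1 & 0\\
d-1 & 0 & 0 & 0 & 0 & d\\
d & 0 & 0 & 0 & 0 & d-1\\
0 & d-1 & 0 & d & 0 & 0\\
0 & d & 0 & d-1 & 0 & 0
\end{pmatrix}.
$$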
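The 6 × 6 matrix can be generated mechanically from the rule (M − K)e ij = de jk + (d − 1)e ji and inspected numerically. The sketch below assumes the column convention (column = image of a basis vector) and the basis order e 12 , e 13 , e 21 , e 23 , e 31 , e 32 ; the function name `reduced_matrix` and the choice d = 10 are illustrative only.

```python
import numpy as np

def reduced_matrix(d):
    """Matrix of M - K on E_0 in the basis e12, e13, e21, e23, e31, e32."""
    basis = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
    index = {pair: pos for pos, pair in enumerate(basis)}
    mat = np.zeros((6, 6))
    for (i, j) in basis:
        k = 6 - i - j                      # the third colour
        col = index[(i, j)]
        mat[index[(j, k)], col] += d       # coefficient of e_jk
        mat[index[(j, i)], col] += d - 1   # coefficient of e_ji
    return mat

d = 10
mat = reduced_matrix(d)

# The matrix is not symmetric, yet its spectrum turns out to be real here.
eigenvalues = np.linalg.eigvals(mat)
spectrum = np.sort(eigenvalues.real)
print(spectrum)
```

For this reconstruction, all row sums equal 2d − 1, so the all-ones vector is an eigenvector with eigenvalue 2d − 1, and 1 is an eigenvalue as well. The remaining four eigenvalues come in two pairs solving x² + dx + (2d − 1) = 0, which gives −5 ± √6 for d = 10. Whether these roots coincide verbatim with Λ and Λ′ of (3.25) is an assumption, since that display is not reproduced in this text.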

Observe that the 6 × 6 matrix $\mathcal{M}$ is not symmetric. Hence, a priori it is not clear that $\mathcal{M}$ is diagonalizable with real eigenvalues. Nevertheless, a (very tedious) direct computation yields the following.

Lemma 18. The 6 × 6 matrix $\mathcal{M}$ is diagonalizable and has the non-zero eigenvalues 1, 2d − 1,

The eigenspace with eigenvalue 2d − 1 is spanned by the all-ones vector 1. Moreover, there are two mutually perpendicular eigenvectors

Since $\mathcal{M}$ describes the operation of M − K on the subspace E 0 , Lemma 18 implies the following.

Corollary 19. Restricted to the subspace E 0 , the operator M − K is diagonalizable with non-zero eigenvalues 1, 2d − 1, and Λ, Λ ′ as in (3.25).

Corollary 19 describes the operation of M − K on E 0 completely. Therefore, as a next step we shall analyze how M − K operates on E ⊥ 0 . More precisely, our goal is to show that restricted to E ⊥ 0 the norm of M − K is significantly smaller than Λ. To this end, we observe that the operator K merely permutes the coordinates. Consequently,

(3.26)

To bound the norm of M on E ⊥ 0 , we consider three subspaces of E ⊥ 0 . The first subspace S consists of all vectors ξ ∈ E ⊥ 0 such that the value ξ v→w depends only on the “start vertex” v; in symbols,

If ξ ∈ S and v ∈ V , then we let ξ v→ = ξ v→w for any w ∈ N (v), i.e., ξ v→ is the “outgoing value” of v.

The second subspace T consists of all ξ ∈ E ⊥ 0 such that ξ u→v depends only on the “target vertex” v, i.e.,

For ξ ∈ T and v ∈ V we let ξ →v = ξ u→v for any u ∈ N (v), i.e., ξ →v signifies the “incoming value” of v. Furthermore, the third subspace U consists of all ξ such that for any vertex the sum of the “incoming” values equals 0:

  2. Moreover, if ξ ∈ T , then (Mξ) v→w = 2dξ →v for all v → w ∈ A. In particular, Mξ ∈ S.

η u→v = 0, whence T ⊥ U . Furthermore, for any γ ∈ E ⊥ 0 the vector η with entries

ξ u→w lies in T , because the sum on the r.h.s. is independent of v. In addition, ξ = γ -η satisfies

so that ξ ∈ U . Hence, any γ ∈ E ⊥ 0 can be written as γ = η + ξ with η ∈ T and ξ ∈ U , i.e., E ⊥ 0 = T ⊕ U .
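The decomposition E ⊥ 0 = T ⊕ U and the action of M on the two parts can be checked on a small example. The sketch below is illustrative only: it assumes that (Mξ)_{v→w} is the sum of ξ_{u→v} over all u ∈ N(v) (consistent with Me ij = d(e jk + e ji ) and with (Mξ)_{v→w} = 2dξ →v on T), takes η to be the average of the incoming values of γ at the target vertex, and uses the complete tripartite graph K_{d,d,d} with colors indexed 0, 1, 2. All variable names are hypothetical.

```python
import numpy as np

d = 3
n = 3 * d

def color(v):
    return v // d

nbrs = {v: [u for u in range(n) if color(u) != color(v)] for v in range(n)}
arcs = [(v, w) for v in range(n) for w in nbrs[v]]
idx = {arc: pos for pos, arc in enumerate(arcs)}

# Assumed definition: (M xi)_{v->w} = sum over u in N(v) of xi_{u->v}.
M = np.zeros((len(arcs), len(arcs)))
for (v, w) in arcs:
    for u in nbrs[v]:
        M[idx[(v, w)], idx[(u, v)]] = 1.0

def e(i, j):
    """Indicator vector of the arcs leading from class i to class j."""
    return np.array([1.0 if (color(v), color(w)) == (i, j) else 0.0 for v, w in arcs])

# A random vector gamma, projected onto the orthogonal complement of E_0.
rng = np.random.default_rng(0)
gamma = rng.standard_normal(len(arcs))
for i in range(3):
    for j in range(3):
        if i != j:
            eij = e(i, j)
            gamma -= (gamma @ eij) / (eij @ eij) * eij

# T-part: eta_{v->w} depends only on the target w (mean incoming value of gamma).
incoming_mean = {w: np.mean([gamma[idx[(u, w)]] for u in nbrs[w]]) for w in range(n)}
eta = np.array([incoming_mean[w] for (_v, w) in arcs])
xi = gamma - eta  # U-part: incoming values sum to zero at every vertex

print(np.abs(M @ xi).max(), abs(eta @ xi))
```

Both printed numbers are zero up to rounding: M annihilates the U-part, and the two parts are orthogonal. The T-part is mapped to a vector whose entry on v → w equals 2d times the incoming value at v, independently of w, that is, to an element of S.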

By now we have all the prerequisites to analyze the operation of M on E ⊥ 0 .

Proof. Let ξ ∈ E ⊥ 0 . By the third part of Lemma 20 there is a decomposition ξ = ξ T + ξ U such that ξ T ∈ T and ξ U ∈ U . Furthermore, the first part of Lemma 20 entails that Mξ = Mξ T . Therefore, we may assume without loss of generality that ξ = ξ T ∈ T . Hence, the second part of Lemma 20 implies that

and ξ ′ = Mξ ∈ S. Consequently, letting ξ ′′ = Mξ ′ = M 2 ξ, we obtain

ξ ′ u→ .

(3.28) Since the r.h.s. of (3.28) is independent of w, we conclude ξ ′′ ∈ S.

In order to bound || ξ ′′ || = || M 2 ξ || , we shall express the sum on the r.h.s. of (3.28) in terms of the adjacency matrix A(G). To this end, consider the two vectors

Before we get to the proof, let us briefly discuss why the assertion (i.e., Proposition 13) is plausible. In fact, let us point out that the vector ∆(0) is easily seen to satisfy F2 with probability Ω(1). Indeed, each of the inner products ⟨∆(0), ζ a i ⟩ is a sum of n independent random variables, whence the central limit theorem implies that

, which is independent of the random vector ∆(0), is needed to ensure that mean and variance are of order Θ(1)). In fact, since the vectors (ζ a i ) a=1,2,3; i=2,3 are mutually perpendicular, the joint distribution of the random variables

is asymptotically a (multivariate) Gaussian. Therefore, the probability that ∆(0) satisfies F2 is Ω(1).

However, once we condition on ∆(0) satisfying F1, the entries of ∆(0) are not independent anymore, whence the above argument does not yield a bound on the probability that ∆(0) satisfies both F1 and F2. Nonetheless, the dependence of the entries of ∆(0) is weak enough to allow for an elementary direct analysis. We begin with bounding the probability that ∆(0) satisfies F1. To this end, we define a partition (W 1 , W 2 , W 3 ) of V by letting

in other words, W i consists of all vertices for which the random number a chosen in Step 1 of BPCol was equal to i.

Lemma 22. The probability that ∆(0) satisfies F1 is Ω(n^{-1}).

Proof. A sufficient condition for ∆(0) to satisfy F1 is that

Moreover, the total number of vectors that can be generated by Step 1 of BPCol equals 3^n , out of which

Therefore, the assertion follows from Stirling’s formula.
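The Stirling estimate can be made concrete. The sketch below assumes that the sufficient condition in the proof is that the three classes W 1 , W 2 , W 3 have equal size n/3 (as in the conditioning used in the remainder of the section), so that the probability in question is a balanced multinomial coefficient divided by 3^n. Stirling's formula then gives the asymptotic value 3√3/(2πn), which is of order n^{-1}. The function names are hypothetical.

```python
from math import factorial, pi, sqrt

def balanced_probability(n):
    """Exact probability that n independent uniform draws from {1, 2, 3}
    produce each value exactly n/3 times (n must be divisible by 3)."""
    if n % 3 != 0:
        raise ValueError("n must be divisible by 3")
    m = n // 3
    return factorial(n) / (factorial(m) ** 3 * 3 ** n)

def stirling_estimate(n):
    """Leading-order value obtained from Stirling's formula."""
    return 3 * sqrt(3) / (2 * pi * n)

for n in (30, 300, 3000):
    exact = balanced_probability(n)
    print(n, exact, exact / stirling_estimate(n))
```

The printed ratio approaches 1 as n grows, so the probability that the random partition is perfectly balanced decays like a constant times 1/n and not faster.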

In the remainder of this section we condition on the event that ∆(0) is such that |W 1 | = |W 2 | = |W 3 |. Thus, (W 1 , W 2 , W 3 ) is just a random partition of V into three classes of equal size, and for all v ∈ W i , all j ∈ {1, 2, 3} \ {i}, and all w ∈ N (v) we have

Lemma 23. For any constant c 1 > 0 there exists a constant c 2 > 0 such that the following holds. If (s a i ) i,a=1,2,3 are integers of absolute value

Proof. The sets W 1 , W 2 , W 3 are randomly chosen mutually disjoint subsets of V of cardinality n/3 each, whereas V 1 , V 2 , V 3 are fixed subsets of V . Therefore, the total number of ways to choose W 1 , W 2 , W 3 is given by the multinomial coefficient $\binom{n}{n/3,\,n/3,\,n/3}$; by Stirling’s formula,

Moreover, the number of ways to choose

(because the a’th factor on the r.h.s. equals the number of ways to partition V a into three pieces

Furthermore, once more due to Stirling’s formula,

Since we are assuming that s a i ≤ c 1 √ n and

for a bounded number c ′ 2 that depends only on c 1 . Finally, plugging (3.39) into (3.37) and cancelling, we obtain the assertion.

Corollary 24. For any two constants c 3 , β > 0 there exists a constant c 4 > 0 such that the following holds. If

Proof. Let S be the set of all tuples (s a i ) a,i=1,2,3 of integers such that |n -

as desired.

Since the vector ∆(0) just represents the partition W 1 , W 2 , W 3 , and the vectors Σ_{j≠i} e a ij just represent the coloring V 1 , V 2 , V 3 , Corollary 24 easily implies a result on the joint distribution of the inner products ⟨∆(0), Σ_{j≠i} e a ij ⟩.

Corollary 25. For any two constants c 5 , γ > 0 there exists a constant c 6 > 0 such that the following is true. Suppose that (z a i ) 1≤a,i≤3 are numbers such that |z b j | ≤ c 5 and

Proof. The definition of η(0) in Step 1 of BPCol shows that

for all v → w ∈ A. (3.42)

Furthermore, using (3.40), we can easily compute the scalar product ⟨∆(0), Σ_{j≠i} e a ij ⟩ (1 ≤ a, i ≤ 3):

Therefore, the assertion follows from Corollary 24 by setting

Proof of Proposition 13. Let α = exp(−1/ε) and

and let V ⊂ R A be the space spanned by these nine vectors. In addition, let q : R A → V be the orthogonal projection onto V. Since the construction of the initial vector ∆(0) in Step 1 of BPCol ensures that ∆(0) ∈ V, we have

Hence, instead of the vectors ζ a i we may work with their projections qζ a i onto V. Thus, let q a ij ∈ R be the coefficients such that

q a ij e a j (i = 2, 3, a = 1, 2, 3).

Then by symmetry we have q a ij = q b ij for all 1 ≤ a, b ≤ 3; therefore, we will briefly write q ij instead of q a ij . Furthermore, (3.6) implies the bounds

0.99 ≤ q 21 ≤ 1.01, −1.01 ≤ q 22 ≤ −0.99, −0.01 ≤ q 23 ≤ 0.01, (3.46)

0.99 ≤ q 31 ≤ 1.01, −0.01 ≤ q 32 ≤ 0.01, −1.01 ≤ q 33 ≤ −0.99. (3.47)

As a consequence, the matrix

$$
\begin{pmatrix}
q_{21} & q_{22} & q_{23}\\
q_{31} & q_{32} & q_{33}\\
1 & 1 & 1
\end{pmatrix}
$$

is regular, and there is a constant q ji z a i -

Therefore, (3.52) yields P [∀a, i : |x a i -xa i | < α/2] ≥ c 6 . Thus, the assertion follows from (3.45) and Lemma 22.

Our goal in this section is to bound the error || ∆(l) − Ξ(l) || ∞ resulting from replacing the non-linear operator B by the linear operator L. Since ∆(l) = B^l ∆(0) and Ξ(l) = L^l Ξ(0) = L^l ∆(0) by (3.4), the main difficulty of this analysis is to bound how errors that were made early on in the sequence (i.e., for “small” l) amplify in the subsequent iterations. To control this phenomenon, we shall proceed by induction on l. We begin with a simple lemma that bounds the error occurring in a single iteration. Recall that the constructions of Ξ(l) and ∆(l) ensure that

Lemma 26. Suppose that Γ satisfies $\sum_{a=1}^{3} \Gamma^a_{v\to w} = 0$ for all v → w ∈ A.

Proof. We employ the elementary inequalities

where we let (3.55)

(3.56) and (3.57) yield that

Finally, a glance at (3.3) reveals that (LΓ) a v→w = 1 3 (1 -L a ), and thus the assertion follows from (3.58).

Lemma 26 allows us to bound the error || ∆(l + 1) − Ξ(l + 1) || ∞ resulting from iteration l + 1 in terms of the error || ∆(l) − Ξ(l) || ∞ from the previous iteration. In the sequel we let C > 0 denote a sufficiently large constant.
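The size of the single-step linearization error can be probed numerically. The sketch below is illustrative and rests on two assumptions: that the non-linear step is the standard BP update for coloring, and that the linear step sends the incoming deviations from 1/3 to −1/2 times their sum. It evaluates both on one message with three incoming messages whose deviations sum to zero across colors, and compares the discrepancy at two perturbation scales. The array `directions` and the function names are hypothetical.

```python
import numpy as np

def bp_message(incoming):
    """Non-linear update: normalised product of (1 - eta^a) over incoming rows."""
    prod = np.prod(1.0 - incoming, axis=0)
    return prod / prod.sum()

def linearised(delta_in):
    """Assumed linear update: -1/2 times the sum of the incoming deviations."""
    return -0.5 * delta_in.sum(axis=0)

# Three incoming perturbation directions; each row sums to zero over the colours.
directions = np.array([[1.0, -1.0, 0.0],
                       [2.0, -1.0, -1.0],
                       [0.0, 1.0, -1.0]])

def linearisation_error(scale):
    delta_in = scale * directions
    exact = bp_message(1.0 / 3.0 + delta_in) - 1.0 / 3.0
    return np.max(np.abs(exact - linearised(delta_in)))

err_big = linearisation_error(1e-2)
err_small = linearisation_error(1e-3)
print(err_big, err_small, err_big / err_small)
```

Shrinking the perturbation by a factor of 10 shrinks the discrepancy by a factor of roughly 100, i.e., the single-step error is quadratic in the size of the deviations. This is why the accumulated error stays under control as long as the deviations themselves remain small.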

Proof. By Lemma 26 and the definition (3.3) of L we have

This implies the assertion, because we are assuming that || ∆(l) -Ξ(l) || ∞ ≤ (Cd) -1 .

Further, applying Lemma 27 L times recursively, we obtain the following bound.

To proceed, we need the following (rough) absolute bound on the error

Proof. The proof is by induction on L. For L = 0 the assertion is trivially true. Thus, assume that || ∆(l) − Ξ(l) || ∞ < (Cd)^{-1} for all l < L ≤ log 2 n. Then Corollary 28 entails that

Further, the definition (3.3) of L shows that

As δ ≤ exp(-log

We proceed inductively for log

Furthermore, || Ξ(l) -∆(l) || ∞ < (Cd) -1 for all l < log 2 n by Lemma 29. Therefore, we can apply Corollary 28 to obtain

Finally, Proposition 14 follows from Lemma 30 directly.

Let µ = νλ L2 . Then Corollary 12 and Proposition 13 entail that

To prove Proposition 15, we consider two cases. The first case is that || ∆(L 2 ) || ∞ ≤ (εd)^{-1} is “small”. Then it will take two more steps for the messages to properly represent the coloring

is “large”, we will just need one more step (L 3 = L 2 + 1). In both cases the proof is based on a direct analysis of the BP equations (2.2).

where |γ(u, v, i)| ≤ ε^3 and β, β ′ > ε^2 .

Proof. We have

for some −2ε^2 ≤ γ 4 = γ 4 (1, v, w) ≤ 2ε^2 . Now, assume that v ∈ V 2 . Then (3.64) and (3.65) entail that there are numbers −ε^2 < γ 5 , γ 6 < ε^2 such that

Therefore,

Corollary 32. Suppose that 0.01εd

Proof. We assume without loss of generality that a = 1. Moreover, suppose that v ∈ V 1 . We shall bound the quotient

for j = 2, 3, from below. Lemma 31 implies that for u ∈ V 3

$$
\frac{1-\eta^1_{u\to v}(L_2+1)}{1-\eta^2_{u\to v}(L_2+1)} \ge \frac{2/3+(1-\varepsilon^3)\beta'}{2/3+(1+\varepsilon^3)\beta'} \ge 1-3\varepsilon^3\beta' \ge 1-6\varepsilon^3.
$$

Hence,

Furthermore, for u ∈ V 2 Lemma 31 entails that

Consequently, (3.74)

Without loss of generality we may consider a vertex v ∈ V 1 and a neighbor w ∈ N (v). We will prove that η 1 v→w (L 2 + 1)/η 2 v→w (L 2 + 1) > 1000. Since 3 j=1 η j v→w (L 2 + 1) = 1, this implies the assertion. To bound the quotient from below, we decompose

which implies the assertion.

Finally, Proposition 15 is a direct consequence of Corollary 32 and Lemma 33.

Throughout this section, we assume that d ≥ d 0 for a sufficiently large constant d 0 > 0, and that n > n 0 = n 0 (d) for a large enough n 0 . Set p = d/n. Let G = G n,d,3 be a random graph with vertex set V = {1, . . . , 3n} and “planted” 3-coloring V 1 , V 2 , V 3 . In order to analyze the adjacency matrix A(G), we shall employ the following lemma, which follows immediately from the “converse expander mixing lemma” from [3].

Proof. Let A(G) = (a v,w ) v,w∈V denote the adjacency matrix of G. Moreover, let

Then A ij = (a ij vw ) v,w∈V is the adjacency matrix of the bipartite subgraph of G induced on V i ∪ V j . Let E be the subspace of R V spanned by the three vectors 1 (provided that d is sufficiently large). Furthermore, as the construction of G ensures that each vertex v ∈ V i has exactly d neighbors in each class V j ≠ V i , we can compute the vector ζ i = A(G) 1 Vi as follows: for any v ∈ V

Hence, ζ i = A(G) 1 Vi = d Σ_{j≠i} 1 Vj . Therefore, for any 1 ≤ i < j ≤ 3 we have

Combining (4.2) and (4.3), we see that G is (d, 0.01)-regular w.h.p.
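The identity A(G) 1 Vi = d Σ_{j≠i} 1 Vj depends only on the degree condition, so it can be checked on any tripartite graph in which every vertex has exactly d neighbors in each other class. The sketch below uses a deterministic circulant construction for convenience; this is an assumption made for illustration, it is not the random model G n,d,3 , and no claim is made that this graph is (d, 0.01)-regular. The function name and the parameters m (class size) and d are hypothetical.

```python
import numpy as np

def regular_tripartite_adjacency(m, d):
    """Adjacency matrix on 3m vertices; class i is {i*m, ..., i*m + m - 1}.

    Between every pair of classes, vertex t of the first class is joined to
    vertices t, t+1, ..., t+d-1 (mod m) of the second class, which yields a
    d-regular bipartite graph between the two classes (requires d <= m).
    """
    A = np.zeros((3 * m, 3 * m))
    for i in range(3):
        for j in range(i + 1, 3):
            for t in range(m):
                for s in range(d):
                    u, v = i * m + t, j * m + (t + s) % m
                    A[u, v] = A[v, u] = 1.0
    return A

m, d = 7, 3
A = regular_tripartite_adjacency(m, d)

# Indicator vectors of the three colour classes.
ind = [np.repeat(np.eye(3)[i], m) for i in range(3)]

for i in range(3):
    others = sum(ind[j] for j in range(3) if j != i)
    print(i, np.array_equal(A @ ind[i], d * others))
```

As a consequence, the span of the three indicator vectors is invariant under A(G): the all-ones vector has eigenvalue 2d, and differences such as 1 V1 − 1 V2 have eigenvalue −d. The spectral condition concerns what happens on the orthogonal complement of this three-dimensional space.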

Finally, Corollary 2 follows from Theorem 1 and Corollary 36.

We have shown that BPCol 3-colors (d, 0.01)-regular graphs in polynomial time. Three potentially interesting extensions suggest themselves, which may be the subject of future work.

  1. In (d, 0.01)-regular graphs every vertex has precisely d neighbors in each color class except for its own. By comparison, in the planted random graph model studied in [2] the number of neighbors that a vertex has in another color class is Poisson with mean d. It would be interesting to see if/how the present analysis can be modified to deal with such a more irregular degree distribution.

  2. Survey Propagation (SP) is a more involved version of Belief Propagation (although SP can be rephrased as BP on a different model [13]) and performs very well empirically on random graphs G(n, p). It would be interesting to extend our analysis to SP.

  3. In a (d, 0.01)-regular graph there is exactly one 3-coloring (up to permutations of the color classes). Nonetheless, we think that the techniques of our analysis can be extended to more complicated “solution spaces”. For instance, it should be straightforward to deal with graphs that have a bounded number of distinct 3-colorings.

S4. Furthermore, LE ⊂ E and L T E ⊂ E.

v = u∈N (v) η ′ u for all v ∈ V , i.e., η ′′ = A(G)η ′ .
