In this paper we explore a connection between two seemingly different problems from two different domains: the small-set expansion problem studied in connection with the unique games conjecture, and a popular community finding approach for social networks known as the modularity clustering approach. We show that a sub-exponential time algorithm for the small-set expansion problem leads to a sub-exponential time constant factor approximation for some hard input instances of the modularity clustering problem.
Deep Dive into On a Connection Between Small Set Expansions and Modularity Clustering in Social Networks.
All graphs considered in this note are undirected and unweighted. Let $G = (V, E)$ denote the given input graph with $n = |V|$ nodes and $m = |E|$ edges, let $d_v$ denote the degree of a node $v \in V$, and let $A(G) = [a_{u,v}(G)]$ denote the adjacency matrix of $G$, i.e., $a_{u,v}(G) = 1$ if $\{u, v\} \in E$ and $a_{u,v}(G) = 0$ otherwise. Since our result spans two distinct research areas, we summarize the relevant definitions from both fields [1,6] below for convenience.
(a) By a “set of (k) communities” we mean a partition of the set of nodes V into (k) non-empty parts.
(b) If $G$ is $d$-regular for some given $d$, then its symmetric stochastic walk matrix is denoted by $\hat{A}(G)$, and is defined as the $n \times n$ real symmetric matrix $\hat{A}(G) = \left[ \frac{a_{u,v}(G)}{d} \right]$.
(c) For a real number $\tau \in [0, 1)$, the $\tau$-threshold rank of $G$, denoted by $\mathrm{rank}_\tau(G)$, is the number of eigenvalues $\lambda$ of $\hat{A}(G)$ satisfying $|\lambda| > \tau$.
(d) For a subset ∅ ⊂ S ⊂ V of nodes, the following quantities are defined:
The modularity of a set of communities $\mathcal{S}$ is $M(\mathcal{S}) = \sum_{S \in \mathcal{S}} M(S)$.
(f) The goal of the modularity $k$-clustering problem on an input graph $G$ is to find a set of at most $k$ communities $\mathcal{S}$ that maximizes $M(\mathcal{S})$. Let $\mathrm{OPT}_k(G) = \max_{\mathcal{S} \text{ is a set of at most } k \text{ communities}} M(\mathcal{S})$ denote the optimal modularity value for a modularity $k$-clustering; it is easy to verify that $0 \le \mathrm{OPT}_k(G) < 1$.
(g) The goal of the modularity clustering problem on $G$ is to find a set of (an unspecified number of) communities $\mathcal{S}$ that maximizes $M(\mathcal{S})$. Let $\mathrm{OPT}(G)$ denote the optimal modularity value for a modularity clustering; obviously, $\mathrm{OPT}(G) = \mathrm{OPT}_n(G)$.
(h) $\exp(\xi)$ denotes $2^{c\xi}$ for some constant $c > 0$ that is independent of $\xi$.
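To make definitions (b) and (c) concrete, the following minimal sketch computes the threshold rank of a small regular graph numerically. It assumes NumPy is available; the helper names `walk_matrix`, `threshold_rank` and `disjoint_cliques` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def walk_matrix(adj, d):
    """Symmetric stochastic walk matrix of a d-regular graph: adjacency / d."""
    adj = np.asarray(adj, dtype=float)
    if not np.all(adj.sum(axis=1) == d):
        raise ValueError("graph is not d-regular")
    return adj / d

def threshold_rank(adj, d, tau):
    """Number of eigenvalues of the walk matrix with absolute value > tau."""
    eigenvalues = np.linalg.eigvalsh(walk_matrix(adj, d))
    return int(np.sum(np.abs(eigenvalues) > tau))

def disjoint_cliques(k, size):
    """Adjacency matrix of k disjoint cliques on `size` nodes each."""
    block = np.ones((size, size)) - np.eye(size)
    return np.kron(np.eye(k), block)

# Three disjoint copies of K_4: a 3-regular graph on 12 nodes.
adj = disjoint_cliques(k=3, size=4)
print(threshold_rank(adj, d=3, tau=0.5))
print(threshold_rank(adj, d=3, tau=0.2))
```

For this graph the walk matrix has eigenvalue 1 once per clique and eigenvalue -1/3 for the remaining directions, so the threshold rank counts the cliques when the threshold exceeds 1/3 and counts all nodes otherwise.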
The modularity clustering problems described above are extremely popular in practice, with applications to biological networks [8,9] as well as to social networks [5,6,7]. For relevant computational complexity results for modularity maximization, see [2,4]. The following results from [4] demonstrate the computational hardness of computing $\mathrm{OPT}_2(G)$ and $\mathrm{OPT}(G)$ even if $G$ is a regular graph.
Theorem 1.1. [4] (a) For every constant $d \ge 9$, there exists a collection of $d$-regular graphs $G$ of $n$ nodes such that it is NP-hard
(b) There exists a collection of $(n-3)$-regular graphs $G$ of $n$ nodes such that it is NP-hard to decide if $\mathrm{OPT}(G) > \frac{0.9388}{n-4}$ or if $\mathrm{OPT}(G) < \frac{0.9382}{n-4}$.
Theorem 2.1. Let $G$ be a $d$-regular graph. Then, for some constant $0 < \varepsilon < 1/2$, there is an algorithm $A_\varepsilon$ with the following properties:
• $A_\varepsilon$ runs in sub-exponential time, i.e., in time $\exp(\delta n)$ for some constant $0 < \delta = \delta(\varepsilon) < 1$ that depends on $\varepsilon$ only.
Remark 2.2 (usability of the approximation algorithm in Theorem 2.1). We prove Theorem 2.1 for $\varepsilon = 10^{-6}$. It is natural to ask if there are in fact infinite families of $d$-regular graphs $G$ that satisfy $\mathrm{OPT}(G) \ge 1 - 10^{-6}$ or $\mathrm{OPT}(G) \le 10^{-6}$. The answer is affirmative, and we provide below examples of infinite families of such graphs.
$\mathrm{OPT}(G) \ge 1 - 10^{-6}$: For example, the following explicit bound was demonstrated in [2, Corollary 6.4]:
Based on this and other known results on modularity clustering, examples of families of regular graphs $G$ for which $\mathrm{OPT}(G) \ge 1 - 10^{-6}$ include:
(1) $G$ is a union of $k$ disjoint cliques, each with $n/k > 3$ nodes, for any $k > 10^6$.
(2) G is obtained by a local modification from the graph in (1) such as:
• Start with a union of $k$ disjoint cliques $C_1, C_2, \dots, C_k$, each with $n/k > 3$ nodes, for any $k$ sufficiently large with respect to $10^6$ ($k \ge 10^7$ suffices).
• Remove an arbitrary edge $\{u_i, v_i\}$ from each clique $C_i$.
Let $U = \cup_{i=1}^{k} \{u_i\}$ and $V = \cup_{i=1}^{k} \{v_i\}$.
• Add to $G$ the edges corresponding to any perfect matching in the complete bipartite graph with node sets $U$ and $V$.
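For family (1), the partition that places each clique in its own community already certifies a modularity of $1 - 1/k$, which is a lower bound on the optimum. The sketch below checks this numerically on a small instance. It assumes the standard Newman form of modularity, in which a community $S_i$ contributes $m_i/m - (D_i/(2m))^2$ with $D_i$ taken to be the sum of the degrees of the nodes in $S_i$; that formula is an assumption here because the corresponding definitions are not reproduced in this excerpt, and the function names are hypothetical.

```python
from itertools import combinations

def modularity(edges, communities):
    """Assumed Newman modularity: sum over i of m_i/m - (D_i/(2m))^2."""
    m = len(edges)
    label = {v: i for i, part in enumerate(communities) for v in part}
    inside = [0] * len(communities)  # m_i: edges with both endpoints in S_i
    degree = [0] * len(communities)  # D_i: sum of degrees of nodes in S_i
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(mi / m - (di / (2 * m)) ** 2 for mi, di in zip(inside, degree))

def disjoint_cliques(k, size):
    """Edge list and clique partition of k disjoint cliques of `size` nodes."""
    parts = [list(range(i * size, (i + 1) * size)) for i in range(k)]
    edges = [e for part in parts for e in combinations(part, 2)]
    return edges, parts

# Small instance of family (1): k = 10 cliques with 4 nodes each.
edges, parts = disjoint_cliques(k=10, size=4)
value = modularity(edges, parts)
print(value)  # approximately 1 - 1/10
```

Under the same assumption, taking $k > 10^6$ cliques pushes the value $1 - 1/k$ above $1 - 10^{-6}$, which is the threshold required in the remark.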
$\mathrm{OPT}(G) \le 10^{-6}$: Theorem 1.1 [4] involves infinitely many graphs of $n > 4 + 0.9388 \times 10^6$ nodes satisfying $\mathrm{OPT}(G) < \frac{0.9388}{n-4} < 10^{-6}$ (these graphs are edge complements of appropriate families of 3-regular graphs used in [3]).
Proof of Theorem 2.1. Set $\varepsilon = 10^{-6}$. We assume that $G$ is $d$-regular, and either $\mathrm{OPT}(G) \ge 1 - 10^{-6}$ or $\mathrm{OPT}(G) \le 10^{-6}$.
Let $\mathcal{S} = \{S_1, S_2, \dots, S_k\}$ be a set of communities of $G$. The objective function $M(\mathcal{S})$ can be equivalently expressed as follows via simple algebraic manipulation [2,5,6,7]. Let $m_i$ denote the number of edges both of whose endpoints are in $S_i$, $m_{ij}$ denote the number of edges one of whose endpoints is in $S_i$ and the other in $S_j$, and $D_i =$
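We will provide an approximation for $\mathrm{OPT}_2(G)$ and then use the result that $\mathrm{OPT}_2(G) \ge \frac{\mathrm{OPT}(G)}{2}$ proved in [4]. Note that if $\mathrm{OPT}(G) \le 10^{-6}$ then obviously $\mathrm{OPT}_2(G) \le 10^{-6}$, whereas if $\mathrm{OPT}(G) \ge 1 - 10^{-6}$ then $\mathrm{OPT}_2(G) \ge \frac{1 - 10^{-6}}{2}$.
Consider a partition $\mathcal{S}$ of $V$ into exactly two sets, say $S$ and $\overline{S} = V \setminus S$ with $0 < \mu(S) \le 1/2$. By Lemma 2.2 of [4], $M(S) = M(\overline{S})$ and thus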
Thus, letting $D = D(S)$, $\mu = \mu(S)$ and $\Phi = \Phi(S)$, we have $\Phi = 1 - D$ as per the notation introduced earlier, and the goal of modularity 2-clustering is to maximize the following function $f$ over all possible valid choices of $D$ and $\mu$:
Let $\mathcal{S}^\star = \{ S^\star, \overline{S^\star} \}$ be an optimal solution for modularity
…(Full text truncated)…