Maximizing the Cohesion is NP-hard
📝 Original Info
- Title: Maximizing the Cohesion is NP-hard
- ArXiv ID: 1109.1994
- Date: 2011-10-11
- Authors: Adrien Friggeri (ENS / LIP Laboratoire de l'Informatique du Parallélisme / INRIA Grenoble Rhône-Alpes, IXXI), Eric Fleury (ENS / LIP Laboratoire de l'Informatique du Parallélisme / INRIA Grenoble Rhône-Alpes, IXXI)
📝 Abstract
We show that the problem of finding a set with maximum cohesion in an undirected network is NP-hard.

📄 Full Content
Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E$, of size $n = |V| \geq 4$. For every vertex $u \in V$, we write $d_G(u)$ for the degree of $u$, or more simply $d(u)$¹. A triangle in $G$ is a triplet of pairwise connected vertices.
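For every set of vertices $S \subseteq V$, let $G[S] = (S, E_S)$ be the subgraph induced by $S$ on $G$. We write $m(S) = |E_S|$ for the number of edges in $G[S]$, $i(S)$ for the number of triangles in $G[S]$, and $o(S)$ for the number of outbound triangles of $S$, i.e. triangles of $G$ with exactly two vertices in $S$.
Moreover, for all $(u, v)$ in $E$, let $\triangle(uv) = |\{w \in V : (uw, vw) \in E^2\}|$ be the number of triangles the edge $uv$ belongs to in $G$.
Finally, we recall the definition of the cohesion of a set $S$ in $G$:
$$C(S) = \frac{i(S)}{\binom{|S|}{3}} \cdot \frac{i(S)}{i(S) + o(S)}$$
In this article we examine the problem of finding a set of vertices $S \subseteq V$ of maximum cohesion, i.e. such that $C(S') \leq C(S)$ for every subset $S' \subseteq V$.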
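As an illustration of these definitions, here is a minimal Python sketch that computes the cohesion of a vertex set. It assumes that $i(S)$ counts the triangles with all three vertices in $S$, that $o(S)$ counts the triangles with exactly two vertices in $S$, and that cohesion is the product $\frac{i(S)}{\binom{|S|}{3}} \cdot \frac{i(S)}{i(S)+o(S)}$, taken to be 0 when $S$ contains no triangle. The function name `cohesion`, the adjacency-dictionary representation and the toy graph are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cohesion(adj, S):
    """Cohesion of vertex set S; adj maps each vertex to its set of neighbours."""
    S = set(S)
    if len(S) < 3:
        return Fraction(0)  # fewer than 3 vertices cannot hold a triangle
    inner = 0     # will become i(S): triangles entirely inside S
    outbound = 0  # o(S): triangles with exactly two vertices in S
    for u, v in combinations(S, 2):
        if v not in adj[u]:
            continue
        for w in adj[u] & adj[v]:  # every common neighbour closes a triangle on uv
            if w in S:
                inner += 1         # counted once per edge, so 3 times per triangle
            else:
                outbound += 1      # counted once: uv is its only edge inside S
    inner //= 3
    if inner == 0:
        return Fraction(0)
    return Fraction(inner, comb(len(S), 3)) * Fraction(inner, inner + outbound)


# Toy graph (assumption): triangle {0, 1, 2} plus vertex 3 adjacent to 0 and 1.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
print(cohesion(adj, {0, 1, 2}))  # prints 1/2
```

Exact rationals (`Fraction`) are used so that cohesion values can be compared without floating-point error.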
We now proceed to prove that finding a set of vertices with maximum cohesion in G is NP-hard. We will first show in Section 1 that this problem is equivalent to that of finding a connected set of vertices with maximum cohesion in G. The decision problem associated to the latter is Connected-Cohesive.
Then, we shall prove that Connected-Cohesive is NP-complete by reduction from Clique (problem GT19 in [2]). From there we deduce that the optimization problem of finding a set of vertices with maximum cohesion is NP-hard.
Problems
1. Connected-Cohesive:
2. Clique: given a graph $G = (V, E)$ and $k \in \mathbb{N}$, is there a set $S \subseteq V$ with $|S| \geq k$ such that the subgraph induced by $S$ is a clique?
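In order to prove that a set of vertices with maximum cohesion in a given network is connected, we need the following lemma:
Lemma 1.1. Let $S_1, S_2 \subseteq V$ be two disconnected sets of vertices. Then $C(S_1 \cup S_2) < \max(C(S_1), C(S_2))$.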
Proof. Suppose $C(S_1) \leq C(S_1 \cup S_2)$ and $C(S_2) \leq C(S_1 \cup S_2)$. Given that $S_1$ and $S_2$ are disconnected, $i(S_1 \cup S_2) = i(S_1) + i(S_2)$ and $o(S_1 \cup S_2) = o(S_1) + o(S_2)$. We can then write:
By summing (1) and (2), we obtain:
We then have:
Which simplifies to:
Hence the contradiction. Therefore, for all disconnected $S_1, S_2 \subseteq V$: $C(S_1 \cup S_2) < \max(C(S_1), C(S_2))$.
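One way to make this contradiction explicit is sketched here. It assumes that cohesion is $C(S) = \frac{i(S)^2}{\binom{|S|}{3}\,(i(S)+o(S))}$, that $S_1$ and $S_2$ are non-empty and disjoint, and that $S_1 \cup S_2$ contains at least one triangle. The shorthand $a_j, b_j, c_j, c$ is introduced only for this sketch, and the argument relies on the Cauchy-Schwarz inequality, so it is not necessarily the computation used in the original proof.

$$
\begin{aligned}
& a_j = i(S_j), \quad b_j = i(S_j) + o(S_j), \quad c_j = \binom{|S_j|}{3}, \quad c = \binom{|S_1| + |S_2|}{3} > c_1 + c_2, \\
& C := C(S_1 \cup S_2) = \frac{(a_1 + a_2)^2}{(b_1 + b_2)\, c} > 0, \\
& C(S_j) \leq C \;\Longrightarrow\; a_j^2 \leq C\, b_j c_j \;\Longrightarrow\; a_j \leq \sqrt{C}\,\sqrt{b_j c_j} \quad (j = 1, 2), \\
& a_1 + a_2 \leq \sqrt{C}\left(\sqrt{b_1 c_1} + \sqrt{b_2 c_2}\right) \leq \sqrt{C}\,\sqrt{(b_1 + b_2)(c_1 + c_2)} < \sqrt{C}\,\sqrt{(b_1 + b_2)\, c} = a_1 + a_2 .
\end{aligned}
$$

The strict inequality $c > c_1 + c_2$ holds because choosing three vertices from $S_1 \cup S_2$ also allows mixed triples, and the final line states $a_1 + a_2 < a_1 + a_2$, which is impossible.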
Theorem 1.2. Let $S$ be a set of vertices of $G$ with the highest cohesion. Then $S$ is connected.
Proof. Suppose $S$ is not connected. Then there exist two disconnected subsets $S_1, S_2 \subseteq S$ such that $S = S_1 \cup S_2$. Given that $S$ has maximum cohesion, we have $C(S) \geq C(S_1)$. Thus, per Lemma 1.1, $C(S) < C(S_2)$ and $S$ does not have the highest cohesion, hence the contradiction.
Corollary 1.3. Per Theorem 1.2, the problem of searching for a set of vertices with maximum cohesion is strictly equivalent to that of searching for a connected set of vertices with maximum cohesion.
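The statement of Theorem 1.2 can be checked by brute force on a small graph. The sketch below enumerates every vertex subset, keeps those of maximum cohesion and tests each for connectivity. It assumes the cohesion formula $\frac{i(S)^2}{\binom{|S|}{3}(i(S)+o(S))}$, with value 0 for sets without triangles; the helper names and the toy graph (two triangles joined by one edge) are hypothetical. The enumeration is exponential in the number of vertices, so it is only an illustration, not an algorithm for the problem.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cohesion(adj, S):
    """Cohesion of S, assuming C(S) = i(S)^2 / (C(|S|, 3) * (i(S) + o(S)))."""
    S = set(S)
    # i(S): triples of S that are pairwise adjacent
    inner = sum(1 for u, v, w in combinations(S, 3)
                if v in adj[u] and w in adj[u] and w in adj[v])
    # o(S): for each edge inside S, common neighbours lying outside S
    outbound = sum(len((adj[u] & adj[v]) - S)
                   for u, v in combinations(S, 2) if v in adj[u])
    if inner == 0:
        return Fraction(0)
    return Fraction(inner * inner, comb(len(S), 3) * (inner + outbound))


def is_connected(adj, S):
    """Depth-first search restricted to the induced subgraph G[S]."""
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == S


def max_cohesion_sets(adj):
    """Exhaustive search: best cohesion value and all sets attaining it."""
    best, winners = Fraction(0), []
    for r in range(3, len(adj) + 1):
        for S in combinations(adj, r):
            c = cohesion(adj, S)
            if c > best:
                best, winners = c, [set(S)]
            elif c == best and c > 0:
                winners.append(set(S))
    return best, winners


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best, winners = max_cohesion_sets(adj)
print(best, winners, all(is_connected(adj, S) for S in winners))
```

On this graph the two triangles are the only maximizers, and both are connected, as the theorem predicts.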
2 Connected-Cohesive is NP-complete
First note that given a set S of vertices of G, it is possible to verify that S is a solution of Connected-Cohesive by computing its cohesion, its size, its connectivity and the minimum degree of its vertices, all in polynomial time. Therefore Connected-Cohesive is in NP.
Algorithm 1. Transforms an instance of Clique into an instance of Connected-Cohesive.
Require:
Let us now reduce Clique to Connected-Cohesive. Let $(G = (V, E), k \in \mathbb{N})$ be an instance of Clique². We can assume that $G$ is connected (if not, we
is a clique and for all $u$ in $K$, the neighbors of $u$ are also in $V$. Therefore, each edge in $K$ forms one triangle with each vertex in $V \setminus K$, which leads to $o_{G'}(K) = \binom{k}{2}(n - k)$. Finally, this gives a cohesion:
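The value can be worked out from the quantities stated above. This is a sketch that assumes the cohesion formula $C(S) = \frac{i(S)}{\binom{|S|}{3}} \cdot \frac{i(S)}{i(S)+o(S)}$, that $|K| = k$, and that $i_{G'}(K) = \binom{k}{3}$ because $K$ is a clique. It uses the identity $\binom{k}{3} = \frac{k-2}{3}\binom{k}{2}$.

$$
C_{G'}(K) = \frac{\binom{k}{3}}{\binom{k}{3}} \cdot \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)} = \frac{k - 2}{(k - 2) + 3(n - k)} = \frac{k - 2}{3n - 2k - 2}
$$

For instance, with $n = 6$ and $k = 4$ this gives $\frac{2}{8} = \frac{1}{4}$, which matches the direct computation $\frac{4}{4 + 6 \cdot 2}$.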
Conversely, let $S \subseteq V'$ be a connected set of vertices such that $C_{G'}(S) \geq$
. We will show that $S$ is a clique of size at least $k$ and that $S \subseteq V$. First note that $|S| \geq 3$, because by definition, if $|S| < 3$ then $C_{G'}(S) = 0$, which would lead to a contradiction. Now suppose that $S$ is not a clique in $G$, and let us distinguish two cases:
- If $S \subseteq V$ and $S$ is not a clique, then $S$ contains two vertices $(u, v) \in V^2$ such that $uv \notin E$.
- If $S \not\subseteq V$, then there exists $u \in S \setminus V$ and, $S$ being connected, there exists $v \in V'$ such that $uv \in E'$.
INRIA, RR n°7734
¹ Here, as elsewhere, we drop the index referring to the underlying graph if the reference is clear.
² We consider here that $|G| > 2$ and $k > 2$. Although this is not exactly Clique, this problem is clearly NP-complete, given that the complexity of Clique does not arise from those small values.
Note that the problem of finding a set of vertices of maximum cohesion containing a set of predefined vertices is also NP-hard, by an immediate reduction
