Finding Dense Subgraphs in G(n,1/2)

Reading time: 7 minutes

📝 Original Info

  • Title: Finding Dense Subgraphs in G(n,1/2)
  • ArXiv ID: 0807.5111
  • Date: 2008-07-31
  • Authors: Atish Das Sarma, Amit Deshpande, Ravi Kannan

📝 Abstract

Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph G(n,1/2) is almost surely either k or k+1, where k = 2log n - 2log(log n) - 1. However, a simple greedy algorithm finds a clique of size only (1+o(1))log n, with high probability, and finding larger cliques -- that of size even (1+ epsilon)log n -- in randomized polynomial time has been a long-standing open problem. In this paper, we study the following generalization: given a random graph G(n,1/2), find the largest subgraph with edge density at least (1-delta). We show that a simple modification of the greedy algorithm finds a subset of 2log n vertices whose induced subgraph has edge density at least 0.951, with high probability. To complement this, we show that almost surely there is no subset of 2.784log n vertices whose induced subgraph has edge density 0.951 or more.

📄 Full Content

Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph G(n, 1/2) is almost surely either k or k + 1, where k = ⌈2 log n - 2 log log n - 1⌉ (Section 4.5 in [1], also [2]). However, a simple greedy algorithm finds a clique of size only (1 + o(1)) log n, with high probability, and finding larger cliques, even of size (1 + ε) log n, in randomized polynomial time has been a long-standing open problem [3]. In this paper, we study the following generalization: given a random graph G(n, 1/2), find the largest subgraph with edge density at least 1 - δ. We show that a simple modification of the greedy algorithm finds a subset of 2 log n vertices whose induced subgraph has edge density at least 0.951, with high probability. To complement this, we show that almost surely there is no subset of 2.784 log n vertices whose induced subgraph has edge density 0.951 or more.
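For comparison with the numbers above, the plain greedy clique heuristic can be sketched in a few lines. This is a minimal illustration, not code from the paper: the helper names `sample_gnp`, `greedy_clique` and `clique_number_estimate` are hypothetical, and the scan order (vertices 0, 1, 2, ...) is one arbitrary choice among many.

```python
import math
import random

def sample_gnp(n, p=0.5, seed=0):
    """Sample G(n, p) and return it as a list of adjacency sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Scan vertices in index order; keep a vertex if it is adjacent
    to every vertex kept so far."""
    clique = []
    for v in range(len(adj)):
        if all(u in adj[v] for u in clique):
            clique.append(v)
    return clique

def clique_number_estimate(n):
    """k = ceil(2 log n - 2 log log n - 1), with logs taken base 2."""
    log_n = math.log2(n)
    return math.ceil(2 * log_n - 2 * math.log2(log_n) - 1)

n = 1024
adj = sample_gnp(n)
clique = greedy_clique(adj)
print("greedy clique size:", len(clique))
print("log2(n):", math.log2(n))
print("clique number estimate k:", clique_number_estimate(n))
```

On a single sample the greedy size fluctuates, so the printed value should be read as an illustration of the gap between roughly log n and roughly 2 log n rather than as a measurement.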

We use G(n, p) to denote a random graph on n vertices where each pair of vertices appears as an edge independently with probability p. We use V to denote its set of vertices and E to denote its set of edges. Moreover, given two subsets S ⊆ V and T ⊆ V, we use E(S, T) to denote the set of edges with one endpoint in S and another endpoint in T. The density of the subgraph induced by vertices in S is given by

$$\mathrm{density}(S) = \frac{|E(S, S)|}{\binom{|S|}{2}}.$$

Therefore, the expected density of G(n, 1/2) is 1/2 and the density of any clique is 1.
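The density of an induced subgraph, read as the fraction of vertex pairs inside S that are edges, can be computed directly. The sketch below is illustrative only; the function names `sample_gnp_edges` and `density` are assumptions, and edges are stored as frozensets of endpoints for simplicity.

```python
import random
from itertools import combinations

def sample_gnp_edges(n, p=0.5, seed=0):
    """Sample G(n, p) and return its edge set as frozensets {u, v}."""
    rng = random.Random(seed)
    return {frozenset(pair)
            for pair in combinations(range(n), 2)
            if rng.random() < p}

def density(edges, S):
    """Fraction of the |S| choose 2 vertex pairs inside S that are edges."""
    S = list(S)
    if len(S) < 2:
        raise ValueError("density needs at least two vertices")
    pairs = len(S) * (len(S) - 1) // 2
    present = sum(1 for pair in combinations(S, 2) if frozenset(pair) in edges)
    return present / pairs

# A triangle is a clique, so its density is 1.
triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
print(density(triangle, [0, 1, 2]))

# The whole vertex set of a G(n, 1/2) sample has density near 1/2.
edges = sample_gnp_edges(200)
print(density(edges, range(200)))
```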

In Section 2 we describe our algorithm for finding subgraphs of density 1 - δ. In Section 3 we give a bound on the size of the largest subgraph of density 1 - δ. Finally, in Section 4, we present some open problems.

In this section, we describe our algorithm and give a relationship between the size of the subgraph obtained by the algorithm, and its density. In particular, we show that the algorithm can be used to obtain a subset of 2 log n vertices of density 0.951, with high probability.

Greedy Algorithm to pick a dense subgraph:
Input: a random graph G(n, 1/2) and δ > 0.
Output: a subset S ⊆ V of size k = 2 log n.

Notice that the algorithm first partitions all nodes into k random subsets of the same size, and then picks one vertex from each partition. This partitioning is necessary to argue about independence in our analysis of choosing vertices greedily.
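The partition-then-pick structure described here can be sketched as follows. The individual steps of the algorithm are not reproduced in this text, so the selection rule is an assumption: from each part the sketch takes the vertex with the most edges into the set built so far. All names (`sample_gnp`, `greedy_dense_subset`, `induced_density`) are hypothetical.

```python
import math
import random

def sample_gnp(n, p=0.5, seed=0):
    """Sample G(n, p) and return it as a list of adjacency sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_dense_subset(adj, k, seed=0):
    """Split the vertices into k random equal-size parts and pick one
    vertex per part.

    ASSUMPTION: the vertex chosen from each part is the one with the
    most neighbours in the current set S; the source text does not
    spell out the selection rule."""
    n = len(adj)
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    size = n // k  # leftover vertices, if any, are ignored
    parts = [order[i * size:(i + 1) * size] for i in range(k)]
    S = []
    for part in parts:
        best = max(part, key=lambda v: sum(1 for u in S if u in adj[v]))
        S.append(best)
    return S

def induced_density(adj, S):
    """Fraction of pairs inside S that are edges."""
    k = len(S)
    present = sum(1 for i in range(k) for j in range(i + 1, k)
                  if S[j] in adj[S[i]])
    return present / (k * (k - 1) // 2)

n = 1024
k = int(2 * math.log2(n))  # k = 2 log n, logs base 2
adj = sample_gnp(n)
S = greedy_dense_subset(adj, k)
print("k =", k, "density =", induced_density(adj, S))
```

Because each part is examined only once and only edges into the earlier picks are inspected, the edges seen at each step are fresh, which is the independence property the analysis relies on. The 0.951 figure is an asymptotic statement, so a small instance like this one need not reach it.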

In the analysis below, H(δ) is the standard notation for the binary Shannon entropy function, which is

$$H(\delta) = -\delta \log \delta - (1 - \delta) \log (1 - \delta).$$

The following lemma gives a lower bound on the number of edges we can expect to add to our subgraph for the i-th vertex added by the algorithm.

Lemma 2.1. For any 0 ≤ i ≤ k and δ_i that satisfies

we have

Proof. We know by the previous results that, as long as k < log n, the vertex added has all edges to S_{k-1}. Consider k ≥ log n. The algorithm has n l vertices to choose from. The expected number of vertices among these with at least (1 - δ_k)k vertices is given by, Fix v ∈ V_{i+1}. The probability that v has at least (1

where

$$H(\delta) = -\delta \log \delta - (1 - \delta) \log (1 - \delta)$$

is the Shannon entropy (here log is taken with base 2).
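As a small numerical aid, the binary entropy with base-2 logarithms can be evaluated as below. The function name `binary_entropy` is an assumption made for this sketch; the endpoint values follow the usual convention that 0 log 0 = 0.

```python
import math

def binary_entropy(delta):
    """H(delta) = -delta log2(delta) - (1 - delta) log2(1 - delta)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if delta in (0.0, 1.0):
        return 0.0  # convention: 0 * log(0) = 0
    return -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)

print(binary_entropy(0.5))    # maximal uncertainty
print(binary_entropy(0.049))  # delta matching density 0.951
```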

Using independence of these events for different v ∈ V_{i+1}, we get

Therefore,

We now give a union bound over all k additions of vertices, using the previous lemma.

Lemma 2.2.

. . . , V_k are disjoint, using independence and Lemma 2.1 we get

The point is that we are picking exactly one vertex from each vertex set/partition, and hence do not lose any randomness or independence of the edges. This now gives us a bound on the minimum number of edges one can expect, w.h.p., in the chosen set of k vertices. We are not able to express, in a closed form, the size of a subgraph obtainable using this algorithm for a specific density. Therefore, we state the best density one can guarantee w.h.p. for k = 2 log n. This is stated as a theorem below, which we prove subsequently.

Theorem 2.3. Our algorithm produces a subset S ⊆ V of size k = 2 log n such that density(S) ≥ 0.951, almost surely.

Proof. From Lemma 2.2 we have that, almost surely,

where m = n / (2k ln(log n)). Here we use the fact that we can choose δ_i = 0 for the first log m steps.

Now using Equations (1) and (2) we have

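and computing an upper bound on the integral numerically.

Proof. For every S ⊆ V of size k, define an indicator random variable X_S as follows.

$$X_S = \begin{cases} 1 & \text{if } \mathrm{density}(S) \ge 1 - \delta, \\ 0 & \text{otherwise.} \end{cases}$$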

By linearity of expectation, the expected number of subgraphs of size k and density at least 1 - δ is

Therefore, by Markov's inequality we have

as n → ∞. In other words, almost surely there is no subset of k vertices that induces a subgraph of density at least 1 - δ.
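A first-moment argument of this kind can be checked numerically. The sketch below is not taken from the paper: it assumes the standard entropy estimate for the binomial tail, under which the expected count of k-subsets with density at least 1 - δ tends to 0 once k = c log n with c > 2 / (1 - H(δ)). The function names are hypothetical. For δ = 0.049 the threshold comes out close to the 2.784 quoted in the text, and for δ = 0 it gives the familiar 2 for cliques.

```python
import math

def binary_entropy(delta):
    """Binary Shannon entropy with base-2 logarithms."""
    if delta in (0.0, 1.0):
        return 0.0
    return -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)

def first_moment_threshold(delta):
    """Constant c = 2 / (1 - H(delta)) from the entropy tail estimate
    (an assumed, standard calculation, not quoted from the source)."""
    return 2.0 / (1.0 - binary_entropy(delta))

def log2_expected_count(n, k, delta):
    """log2 of C(n, k) * Pr[Bin(C(k, 2), 1/2) >= (1 - delta) * C(k, 2)],
    computed with exact integer arithmetic for the binomial tail."""
    pairs = k * (k - 1) // 2
    need = math.ceil((1 - delta) * pairs)
    tail = sum(math.comb(pairs, j) for j in range(need, pairs + 1))
    return math.log2(math.comb(n, k)) + math.log2(tail) - pairs

print(first_moment_threshold(0.049))
print(first_moment_threshold(0.0))

n = 2 ** 20  # log2(n) = 20
print(log2_expected_count(n, 20, 0.049))  # k = 1 * log n
print(log2_expected_count(n, 80, 0.049))  # k = 4 * log n
```

A positive value of `log2_expected_count` means many such subsets are expected, while a large negative value means the expectation is tiny, which is the regime where Markov's inequality rules them out.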

Notice that for density 0.951, the ratio between the size of the largest subgraph that exists and the size of the largest subgraph that we can find is smaller than in the case of cliques. This is interesting, although not entirely unexpected, since for density 0.5 the whole graph can be output. For density 0.951 the ratio is 2.784/2 = 1.392, significantly smaller than the ratio of 2 for cliques.

For a concrete open problem, is there a polynomial time algorithm that outputs a subgraph of density 1 - ε and size 2

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
