Maximizing Influence Propagation in Networks with Community Structure
We consider the algorithmic problem of selecting a set of target nodes that causes the biggest activation cascade in a network. When the activation process obeys the diminishing returns property, a simple hill-climbing selection mechanism has been shown to achieve provably good performance. Here we study models of influence propagation that exhibit critical behavior and for which the property of diminishing returns does not hold. We demonstrate that in such systems, the structural properties of networks can play a significant role. We focus on networks with two loosely coupled communities, and show that the double-critical behavior of activation spreading in such systems has significant implications for targeting strategies. In particular, we show that simple strategies that work well for homogeneous networks can be highly suboptimal, and we suggest a simple modification that improves performance by taking the community structure into account.
💡 Research Summary
The paper tackles the classic influence‑maximization problem, which asks for a set of seed nodes that triggers the largest possible cascade of activations in a network. In the well‑studied submodular setting—where the activation process exhibits diminishing returns—a simple greedy (hill‑climbing) algorithm is known to achieve a (1 − 1/e)‑approximation guarantee. The authors argue that many realistic diffusion processes do not satisfy submodularity; instead they display critical behavior, meaning that the expected cascade size remains near zero until the number of seeds crosses a certain threshold, after which activation explodes.
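The hill-climbing procedure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the names `greedy_seeds`, `reach`, and `coverage` are hypothetical, and the toy coverage objective is chosen only because it is a standard example of a function with diminishing returns.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

Node = Hashable


def greedy_seeds(
    nodes: Iterable[Node],
    sigma: Callable[[FrozenSet[Node]], float],
    k: int,
) -> List[Node]:
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds: List[Node] = []
    remaining = list(nodes)
    for _ in range(min(k, len(remaining))):
        current = sigma(frozenset(seeds))
        # Marginal gain of v = sigma(seeds + v) - sigma(seeds); ties go to the
        # first candidate in iteration order.
        best = max(remaining, key=lambda v: sigma(frozenset(seeds + [v])) - current)
        seeds.append(best)
        remaining.remove(best)
    return seeds


# Toy objective (an assumption for illustration): each node "reaches" a fixed
# set of items, and sigma counts the distinct items reached by the seed set.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(seed_set: FrozenSet[Node]) -> int:
    covered = set()
    for v in seed_set:
        covered |= reach[v]
    return len(covered)


chosen = greedy_seeds(reach, coverage, 2)
print(chosen, coverage(frozenset(chosen)))
```

In a real influence-maximization setting, `sigma` would be an estimate of the expected cascade size (typically from repeated simulation) rather than a closed-form coverage count.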
Focusing on networks composed of two loosely coupled communities, the authors reveal a “double‑critical” phenomenon. Each community has its own internal percolation threshold (θ₁, θ₂) that is relatively low because of dense intra‑community links. The sparse inter‑community edges, however, impose a much higher effective threshold for spreading across the community boundary. Consequently, if seeds are placed predominantly in one community, the diffusion saturates locally and fails to cross the boundary, even when the total number of seeds is well above θ₁ + θ₂. In contrast, a balanced allocation of seeds (at least θ₁ in community 1 and θ₂ in community 2) or the inclusion of a few bridge nodes can push the system past both local thresholds, enabling a cascade that spans the entire network.
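The local-saturation effect can be reproduced on a tiny deterministic example. The sketch below assumes a simple rule that is not taken from the paper: a node activates once at least `t` of its neighbors are active. The graph (two cliques joined by a single edge) and the helper names `threshold_cascade` and `two_cliques` are likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def threshold_cascade(adj: Dict[str, Set[str]], seeds: Iterable[str], t: int) -> Set[str]:
    """Activate any inactive node with at least t active neighbors, until stable."""
    active = set(seeds)
    while True:
        newly = {v for v in adj if v not in active and len(adj[v] & active) >= t}
        if not newly:
            return active
        active |= newly


def two_cliques(n: int) -> Dict[str, Set[str]]:
    """Two n-node cliques (communities a and b) joined by the single edge a0-b0."""
    group_a = [f"a{i}" for i in range(n)]
    group_b = [f"b{i}" for i in range(n)]
    adj: Dict[str, Set[str]] = {v: set() for v in group_a + group_b}
    for group in (group_a, group_b):
        for u, v in combinations(group, 2):
            adj[u].add(v)
            adj[v].add(u)
    adj["a0"].add("b0")
    adj["b0"].add("a0")
    return adj


adj = two_cliques(6)
# Six seeds, all in community a: the cascade stays inside community a.
one_sided = threshold_cascade(adj, [f"a{i}" for i in range(6)], t=3)
# Six seeds, three per community: both communities activate.
balanced = threshold_cascade(adj, ["a1", "a2", "a3", "b1", "b2", "b3"], t=3)
# Four seeds, two per community: below both local thresholds, nothing spreads.
sub_critical = threshold_cascade(adj, ["a1", "a2", "b1", "b2"], t=3)
print(len(one_sided), len(balanced), len(sub_critical))
```

With the same budget of six seeds, the one-sided placement activates only its own clique, because the single bridge edge gives `b0` just one active neighbor, while the balanced placement crosses both local thresholds.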
The paper formalizes this intuition. Let k be the total seed budget, and let k₁, k₂ denote the numbers allocated to the two communities (k₁ + k₂ = k). The expected cascade size σ(k₁, k₂) is a non‑submodular function of (k₁, k₂) that exhibits a sharp jump when both k₁ ≥ θ₁ and k₂ ≥ θ₂ hold. The standard greedy algorithm, which selects the node with the largest marginal gain at each step, is blind to this joint condition and therefore tends to concentrate seeds in the community with the higher immediate marginal gain. This myopic behavior leads to sub‑optimal solutions that fall short of the global critical point k* = θ₁ + θ₂.
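To make the joint condition concrete, the sketch below uses a toy piecewise form for σ(k₁, k₂) and enumerates every split of the budget. The functional form, the community sizes, and the threshold values are all assumptions made for illustration; the paper's σ comes from its diffusion model, not from this formula.

```python
from typing import Tuple

# Hypothetical parameters: community sizes and internal thresholds.
N1, N2 = 100, 100
THETA1, THETA2 = 5, 8


def local_size(k: int, theta: int, n: int) -> int:
    """Toy rule: below the threshold only the seeds are active; at or above it
    the whole community activates."""
    return n if k >= theta else k


def sigma(k1: int, k2: int) -> int:
    """Toy cascade size for an allocation (k1, k2)."""
    return local_size(k1, THETA1, N1) + local_size(k2, THETA2, N2)


def best_split(k: int) -> Tuple[int, int]:
    """Exhaustively search all allocations with k1 + k2 = k."""
    return max(((k1, k - k1) for k1 in range(k + 1)), key=lambda s: sigma(*s))


k = THETA1 + THETA2  # budget equal to the global critical point k*
split = best_split(k)
print("best split:", split, "sigma:", sigma(*split))
print("all seeds in community 1:", sigma(k, 0))
```

With a budget of exactly θ₁ + θ₂, only the split (θ₁, θ₂) activates both communities in this toy model; every other split leaves one community sub-critical.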
To overcome this limitation, the authors propose two complementary modifications to the greedy framework:
- Community‑aware seed budgeting: Before the greedy selection begins, the algorithm estimates the internal thresholds θ₁ and θ₂ (using small pilot simulations or structural proxies such as intra‑community density). It then enforces a minimum quota of seeds for each community, ensuring that the allocation respects the double‑critical condition.
- Bridge‑node prioritization: The algorithm identifies a small set of nodes that lie on or near the inter‑community cut (high betweenness, edge‑boundary nodes, or simply nodes with many cross‑community edges). These nodes are inserted early in the seed list, effectively lowering the inter‑community barrier ε and facilitating cross‑community diffusion.
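The two modifications can be combined in a short sketch. Everything here is an assumption made for illustration: the function names, the use of node degree as a stand-in for the greedy marginal gain, the cross-community edge count as the bridge score, and the toy graph. The quotas are taken as given rather than estimated from pilot simulations.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Set

Node = Hashable


def cross_edges(adj: Dict[Node, Set[Node]], community: Dict[Node, str], v: Node) -> int:
    """Number of edges from v to nodes in a different community."""
    return sum(1 for u in adj[v] if community[u] != community[v])


def community_aware_seeds(
    adj: Dict[Node, Set[Node]],
    community: Dict[Node, str],
    quota: Dict[str, int],
    k: int,
    n_bridges: int = 1,
) -> List[Node]:
    # Step 1: put the top bridge nodes (most cross-community edges) first.
    bridges = sorted(
        (v for v in adj if cross_edges(adj, community, v) > 0),
        key=lambda v: cross_edges(adj, community, v),
        reverse=True,
    )
    seeds: List[Node] = bridges[: min(n_bridges, k)]
    counts = Counter(community[v] for v in seeds)

    # Degree is a proxy for marginal gain here, not the paper's criterion.
    by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)

    # Step 2: fill each community's minimum quota.
    for v in by_degree:
        if len(seeds) >= k:
            break
        c = community[v]
        if v not in seeds and counts[c] < quota.get(c, 0):
            seeds.append(v)
            counts[c] += 1

    # Step 3: spend any leftover budget without quota constraints.
    for v in by_degree:
        if len(seeds) >= k:
            break
        if v not in seeds:
            seeds.append(v)
    return seeds


# Toy graph: two small communities joined by the single edge a4-b4.
edges = [("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b1", "b4"), ("a4", "b4")]
adj: Dict[Node, Set[Node]] = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {v: v[0] for v in adj}  # "a" or "b", read from the node name

seeds = community_aware_seeds(adj, community, quota={"a": 2, "b": 2}, k=4)
print(seeds)
```

Counting bridge nodes toward their community's quota is one possible design choice; keeping them separate from the quotas would spend more of the budget on the cut.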
Both adjustments can be incorporated into the classic greedy loop without changing its asymptotic runtime (still O(k·|E|)). The authors evaluate the enhanced method on synthetic Erdős–Rényi and scale‑free graphs, as well as on real‑world social networks (Facebook and Twitter subgraphs) and a biological protein‑interaction network. Diffusion is simulated using a threshold‑based SIR model and a more complex mixed contagion model that captures both simple and complex contagion effects.
Results consistently show that the community‑aware greedy algorithm outperforms the naïve greedy baseline by 20 %–35 % in total cascade size when the inter‑community edge density is low and intra‑community density is high—precisely the regime where the double‑critical effect is strongest. Sensitivity analyses demonstrate that modest errors (≈10 %) in estimating θ₁ and θ₂ do not substantially degrade performance, indicating robustness to imperfect prior knowledge. Moreover, even a simple heuristic that selects the top‑few bridge nodes (based on degree or betweenness) yields most of the gain, suggesting that sophisticated centrality measures are not strictly necessary.
The study’s broader implication is that influence maximization need not be confined to submodular diffusion models. By explicitly accounting for structural heterogeneity—particularly community partitions and the strength of inter‑community ties—one can design seed selection strategies that respect the underlying critical dynamics and achieve near‑optimal cascades in non‑submodular settings. The authors conclude with several avenues for future work: extending the analysis to networks with more than two communities, handling dynamic or time‑varying graphs, and developing online algorithms that learn thresholds on the fly while selecting seeds.