A Community-Aware Framework for Influence Maximization with Explicit Accounting for Inter-Community Influence


Original Info

  • Title: A Community-Aware Framework for Influence Maximization with Explicit Accounting for Inter-Community Influence
  • ArXiv ID: 2512.23973
  • Date: 2025-12-30
  • Authors: Eliot W. Robson, Abhishek K. Umrawal

Abstract

Influence Maximization (IM) seeks to identify a small set of seed nodes in a social network to maximize expected information spread under a diffusion model. While community-based approaches improve scalability by exploiting modular structure, they typically assume independence between communities and overlook inter-community influence, a limitation that reduces effectiveness in real-world networks. We introduce Community-IM++, a scalable framework that explicitly models cross-community diffusion through a principled heuristic based on community-based diffusion degree (CDD) and a progressive budgeting strategy. The algorithm partitions the network, computes CDD to prioritize bridging nodes, and allocates seeds adaptively across communities using lazy evaluation to minimize redundant computations. Experiments on large real-world social networks under different edge weight models show that Community-IM++ achieves near-greedy influence spread at up to 100 times lower runtime, while outperforming Community-IM and degree heuristics across budgets and structural conditions. These results demonstrate the practicality of Community-IM++ for large-scale applications such as viral marketing, misinformation control, and public health campaigns, where efficiency and cross-community reach are critical.

† Part of this work was completed while Eliot W. Robson was a Ph.D. candidate at the

Full Content

Motivation

The rapid growth of social media has transformed how information, ideas, and products spread across society, influencing domains as diverse as marketing, public health, and civic engagement (Evans and McKee 2010). Organizations increasingly leverage social networks not only for advertising but also for socially beneficial campaigns, such as promoting healthy behaviors, spreading factual information, and raising awareness about critical issues (Goldenberg, Libai, and Muller 2001a; Pan, Deng, and Shen 2015). A key challenge in these efforts is identifying a small set of individuals whose adoption of a message or product can trigger a large cascade of influence throughout the network.

In real-world scenarios, individuals who act as bridges between communities, often characterized by high betweenness centrality (Freeman 1977), play a critical role in diffusion. For example, community leaders active in multiple social groups can accelerate vaccination campaigns; fact-checkers who span political communities can curb misinformation during elections; and influencers who engage across diverse interest groups can amplify marketing campaigns beyond niche audiences. Similarly, in disaster response, volunteers connected to multiple local communities can disseminate emergency alerts more effectively. These cases highlight the importance of modeling inter-community influence when designing strategies for information spread.

This challenge is formalized as the Influence Maximization (IM) problem, introduced by Domingos and Richardson (2001): “If we can convince a subset of individuals in a social network to adopt a new product or innovation, and aim to trigger a large cascade of further adoptions, which individuals should we target?” Formally, the goal is to select k seed nodes to maximize the expected number of influenced nodes under a given diffusion model. Kempe, Kleinberg, and Tardos (2003) showed that the IM problem is NP-hard, motivating a rich body of research on scalable algorithms. While greedy algorithms offer near-optimal solutions, they rely on costly Monte Carlo simulations, making them impractical for large networks. Heuristic methods improve scalability but often sacrifice accuracy.

To address this trade-off, Umrawal, Quinn, and Aggarwal (2023) proposed a community-aware divide-and-conquer framework that partitions the network into communities, optimizes within each, and combines results efficiently. This approach improves runtime while maintaining competitive influence spread. However, a key limitation remains: it overlooks inter-community influence, which can be substantial in real-world networks where information often crosses boundaries, potentially leading to suboptimal strategies.

Our work addresses this gap by introducing a framework that explicitly models inter-community influence, thereby enabling more effective diffusion strategies for applications ranging from viral marketing to public health interventions.

Influence Maximization has been studied extensively under various diffusion models and algorithmic paradigms. Early heuristics such as degree centrality and degree discount (Kempe, Kleinberg, and Tardos 2015; Chen, Wang, and Yang 2009) are computationally efficient but lack theoretical guarantees. Under the independent cascade (IC) model (Goldenberg, Libai, and Muller 2001a), Kempe, Kleinberg, and Tardos (2015) proposed a greedy algorithm with a $(1 - 1/e)$-approximation guarantee, later optimized by CELF and CELF++ (Leskovec et al. 2007; Goyal, Bonchi, and Lakshmanan 2011). Despite these improvements, scalability remains a challenge for large networks.

Community-aware approaches, such as Community-IM (Umrawal and Aggarwal 2023; Umrawal, Quinn, and Aggarwal 2023), exploit modular structure to improve efficiency by partitioning the network and applying local optimization. While effective, these methods assume independence between communities, ignoring cross-community influence, a limitation that reduces their applicability in networks with significant inter-community interactions.

Recent work also explores data-driven models (Goyal, Bonchi, and Lakshmanan 2011; Pan, Deng, and Shen 2015), fractional budget allocation (Chen, Wu, and Yu 2020; Umrawal, Aggarwal, and Quinn 2023; Bhimaraju et al. 2024), and online settings (Lei et al. 2015; Wen et al. 2017; Vaswani et al. 2017; Agarwal et al. 2022; Nie et al. 2022). However, few approaches explicitly address the interplay between community structure and cross-community diffusion, which is central to our contribution.

We propose Community-IM++, an extension of community-aware IM frameworks that incorporates a heuristic for inter-community influence. Our contributions are threefold:

  1. Modeling: We introduce a principled heuristic to capture cross-community diffusion under the IC model.
  2. Algorithmic Framework: We integrate this heuristic into a scalable divide-and-conquer approach, maintaining efficiency while improving influence spread.
  3. Empirical Analysis: We evaluate Community-IM++ on real-world networks, comparing it against state-of-the-art baselines and analyzing its behavior under different structural and diffusion conditions.

The remainder of the paper is organized as follows: Section 2 presents preliminaries and problem formulation. Section 3 introduces our inter-community influence estimation method. Section 4 details the Community-IM++ framework. Section 5 reports experimental results and insights. Finally, Section 6 concludes with future directions.

In this section, we provide the required preliminaries and formally state the problem of interest.

Several models of diffusion over social networks have been proposed in the literature. In this work, we focus on the independent cascade (IC) model (Goldenberg, Libai, and Muller 2001a,b). While other models exist, such as the linear threshold (Granovetter 1978; Schelling 2006) and pressure threshold (Stutsman, Robson, and Umrawal 2025) models, our focus on IC is motivated by the fact that the proposed heuristic is rigorously defined under this model.

In the IC model, we are given a graph $G = (V, E)$, and the random process begins at time 0 with an initial set $S$ of active nodes, called the seed set. When a node $v \in V$ first becomes active at time $t$, it has a single chance to activate each of its inactive neighbors $w$, succeeding with probability $p_{v,w}$ independently of prior history. If the activation succeeds, $w$ becomes active at time $t + 1$. Regardless of the outcome, $v$ cannot attempt to activate $w$ again. The process continues until no new nodes are activated and is progressive, meaning nodes never revert from active to inactive.
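The cascade described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (which uses CyNetDiff); the adjacency-dictionary format, the function name `simulate_ic`, and the toy probabilities are assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade and return the set of active nodes.

    graph: dict mapping node -> {neighbor: activation probability}
           (a hypothetical input format chosen for this sketch).
    """
    active = set(seeds)
    frontier = deque(active)  # newly active nodes that have not yet tried their neighbors
    while frontier:
        v = frontier.popleft()
        for w, p_vw in graph.get(v, {}).items():
            # v gets exactly one attempt on each neighbor that is still inactive
            if w not in active and rng.random() < p_vw:
                active.add(w)
                frontier.append(w)
    return active


# Toy graph with probabilities 0 and 1 so the outcome is deterministic.
toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {}, "d": {}}
print(sorted(simulate_ic(toy, ["a"], random.Random(0))))
```

Each node enters the queue once, so every edge is attempted at most once, which matches the single-chance rule; processing nodes in queue order instead of explicit time steps does not change the final active set.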

For the IC diffusion process, define the collection of random variables {Y

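The influence $\sigma(S)$ of a set $S$ is then defined as the expected number of nodes that are active at the end of the process when $S$ is the seed set. Kempe, Kleinberg, and Tardos (2015) showed that $\sigma(S)$ is a monotone non-decreasing submodular set function under the IC model.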

Problem 1. The influence maximization (IM) problem is formally defined as:

$$\max_{S \subseteq V,\ |S| \le k} \sigma(S),$$

where $k$ is the seed budget.

3 Estimating the Influence

Computing the exact value of $\sigma(\cdot)$ is computationally expensive and, in fact, #P-hard. In this section, we define the estimator used in our framework and introduce the concept of diffusion degree to account for inter-community influence.

At any time $t$, a node $v \in V$ can be either active or inactive. We describe the process with indicator random variables: $Y_v^{(S)}$ is the indicator random variable for $v$ being active at the end of the process with seed set $S$. These random variables are defined over the sample space $\mathcal{G}$ of edge activation functions.

The influence $\sigma(S)$ is defined as the expected number of active nodes at the end of the cascade, given that $S$ is the seed set:

$$\sigma(S) = \mathbb{E}\left[ \sum_{v \in V} Y_v^{(S)} \right].$$

Since computing $\sigma(S)$ exactly is #P-hard (Kempe, Kleinberg, and Tardos 2015), we use the following Monte Carlo estimator:

$$\hat{\sigma}(S) = \frac{1}{k} \sum_{j=1}^{k} \sum_{v \in V} Y_v^{(S)}(g_j),$$

where $g_1, \dots, g_k \in \mathcal{G}$ are sampled uniformly at random. In other words, we approximate influence by averaging the number of activated nodes over $k$ independent simulations.
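As an illustration of this estimator, the sketch below draws edge activation functions explicitly and counts the nodes reachable from the seed set in each sampled graph. It assumes that a sample keeps each edge independently with its activation probability; the function names and the graph format are hypothetical and not taken from the paper.

```python
import random


def sample_live_edges(graph, rng):
    """Sample one edge activation function: keep edge (v, w) with probability p_vw."""
    return {v: [w for w, p in nbrs.items() if rng.random() < p]
            for v, nbrs in graph.items()}


def reachable(live, seeds):
    """Nodes reachable from the seed set in a sampled live-edge graph."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def estimate_influence(graph, seeds, num_samples, seed=0):
    """Average number of activated nodes over num_samples sampled graphs."""
    rng = random.Random(seed)
    total = sum(len(reachable(sample_live_edges(graph, rng), seeds))
                for _ in range(num_samples))
    return total / num_samples


# One edge with probability 0.5: the exact influence of {"a"} is 1.5.
toy = {"a": {"b": 0.5}, "b": {}}
print(estimate_influence(toy, ["a"], num_samples=20000))
```

Sampling the whole live-edge graph up front is wasteful on large networks; a cascade simulation that only flips coins for edges it actually reaches gives the same distribution at lower cost.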

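To explicitly account for inter-community influence, ignored in prior work (Umrawal, Quinn, and Aggarwal 2023), we introduce the concept of diffusion degree.

Definition 3.1. For a node $v \in V$, the diffusion degree, denoted $\sigma^{(2)}(v)$, is the expected influence of $v$ on nodes within distance two in $G$.

Lemma 3.2. Under the independent cascade model, let $N^{(2)}(v)$ be the set of nodes within distance two of $v$ and, for $u \in N^{(2)}(v)$, let $\mathcal{P}^{(2)}_{v,u}$ be all paths from $v$ to $u$ of length at most two. Then:

$$\sigma^{(2)}(v) = \sum_{u \in N^{(2)}(v)} \left( 1 - \prod_{P \in \mathcal{P}^{(2)}_{v,u}} \bigl(1 - \mathbb{P}(P)\bigr) \right),$$

where $\mathbb{P}(P)$ denotes the probability that all edges of path $P$ are active.

Proof. For a fixed $u \in N^{(2)}(v)$, let the paths in $\mathcal{P}^{(2)}_{v,u}$ represent the events that each path from $v$ to $u$ is live. Because $u$ is at most two edges away from $v$, these paths are edge-disjoint, and thus the events are independent. Define $I_v$ as the event that $S = \{v\}$. Then:

$$\mathbb{P}(u \text{ is activated} \mid I_v) = 1 - \prod_{P \in \mathcal{P}^{(2)}_{v,u}} \bigl(1 - \mathbb{P}(P)\bigr).$$

By linearity of expectation:

$$\sigma^{(2)}(v) = \sum_{u \in N^{(2)}(v)} \left( 1 - \prod_{P \in \mathcal{P}^{(2)}_{v,u}} \bigl(1 - \mathbb{P}(P)\bigr) \right).$$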

Remark 3.3. Our formulation of diffusion degree differs from the original definition by Pal, Kundu, and Murthy (2014), but we retain the name for conceptual similarity. Unlike the original, our definition accounts for multiple independent paths to a node.

Remark 3.4. Restricting to paths of length at most two is computationally significant: it guarantees path independence, making the heuristic efficient to compute while capturing key inter-community effects.
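To make the two-hop computation concrete, the following sketch evaluates the diffusion degree of a node by multiplying, for each target, the failure probabilities of all paths of length at most two. It is an illustration under stated assumptions: the graph is a dictionary of out-neighbors with activation probabilities, the node $v$ itself is not counted, and the name `diffusion_degree` is hypothetical.

```python
def diffusion_degree(graph, v):
    """Expected number of nodes within two hops activated by seeding v alone,
    counting only paths of length at most two.

    graph: dict mapping node -> {out-neighbor: activation probability}.
    """
    # fail[u] accumulates the product over paths P of (1 - Pr[P is live]).
    fail = {}
    for w, p_vw in graph.get(v, {}).items():
        if w == v:
            continue
        # Path of length one: v -> w
        fail[w] = fail.get(w, 1.0) * (1.0 - p_vw)
        for u, p_wu in graph.get(w, {}).items():
            if u == v or u == w:
                continue
            # Path of length two: v -> w -> u, live with probability p_vw * p_wu
            fail[u] = fail.get(u, 1.0) * (1.0 - p_vw * p_wu)
    return sum(1.0 - f for f in fail.values())


toy = {"v": {"u": 0.5, "w": 0.4}, "w": {"u": 0.5}, "u": {}}
print(diffusion_degree(toy, "v"))
```

In the toy graph, $u$ is reached by the direct edge (probability 0.5) or through $w$ (probability 0.4 × 0.5 = 0.2), so it is activated with probability 1 − 0.5 × 0.8 = 0.6; $w$ is activated with probability 0.4, giving a diffusion degree of 1.0 up to floating-point rounding.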

In this section, we introduce the proposed Community-IM++ framework to solve Problem 1. Our approach builds on the Community-IM framework introduced by Umrawal, Quinn, and Aggarwal (2023), but explicitly accounts for inter-community influence using the machinery developed in Section 3. The key contribution is the integration of a heuristic estimator that prioritizes nodes likely to spread influence across community boundaries, addressing a critical limitation of prior work.

Given a graph $G = (V, E)$ with edge activation probabilities $\mathcal{E} : E \to [0, 1]$, our algorithm proceeds in three steps:

(1) Obtain a hard partition $\mathcal{V}$ of the input graph $G$ and $\mathcal{E}$ into disjoint communities.

(2) From this partition, compute a linear set function $\rho_{\mathcal{V}} : V \to \mathbb{R}$ as a heuristic to account for inter-community influence.

(3) Construct the final seed set by lazily querying each community for the node with the highest marginal gain until the total budget is exhausted, accounting for the inter-community influence of each node using $\rho_{\mathcal{V}}$.

Community-IM++ differs from Community-IM primarily through Step (2), where we introduce a principled heuristic to capture cross-community diffusion. This addition is motivated by real-world scenarios where bridging nodes, those connecting otherwise disconnected communities, play a disproportionate role in spreading information.

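We partition $V$ into $c$ disjoint communities $\mathcal{V} = \{V_1, \dots, V_c\}$.

A common measure of partition quality is the modularity score:

$$Q = \sum_{i=1}^{c} \left[ \frac{m_i}{m} - \gamma \left( \frac{K_i}{2m} \right)^2 \right],$$

where $m_i$ is the total internal edge weight of $V_i$, $m$ is the total edge weight of $G$, and $K_i$ is the weighted degree sum of nodes in $V_i$. The resolution parameter $\gamma$ controls granularity: $\gamma < 1$ favors larger communities, while $\gamma > 1$ favors smaller ones.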
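As a small worked example, the sketch below computes a modularity score for a weighted undirected graph, assuming the standard form $Q = \sum_i [m_i/m - \gamma (K_i/2m)^2]$ built from the three quantities just described. The function name and the edge-list input format are assumptions for illustration; in practice a library routine such as the Leiden implementation would be used.

```python
def modularity(edges, communities, gamma=1.0):
    """Modularity of a hard partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples, one per undirected edge.
    communities: list of disjoint sets of nodes covering all endpoints.
    """
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    m = 0.0
    internal = [0.0] * len(communities)  # total internal edge weight per community
    degree = [0.0] * len(communities)    # weighted degree sum per community
    for u, v, w in edges:
        m += w
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            internal[label[u]] += w
    return sum(m_i / m - gamma * (k_i / (2.0 * m)) ** 2
               for m_i, k_i in zip(internal, degree))


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))
```

For the two-triangle graph each community has internal weight 3 and degree sum 7 out of a total weight of 7, so with $\gamma = 1$ the score is 2 × (3/7 − 1/4) = 5/14; raising $\gamma$ increases the penalty on each community's degree mass and lowers the score of this coarse split.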

We adopt the Leiden algorithm (Traag, Waltman, and Van Eck 2019), an efficient modularity-based method known for high-quality partitions and scalability. This choice aligns with prior findings that community-aware approaches improve runtime without sacrificing influence spread (Umrawal, Quinn, and Aggarwal 2023).

Once $\mathcal{V}$ is computed, we define an estimator for influence within each community:

where $\{g_j^{(i)}\}_j$ are samples from the edge activation distribution restricted to $V_i$. The term $(1 + \rho_{\mathcal{V}}(v))$ adjusts for inter-community influence, ensuring that nodes with higher cross-community potential receive greater weight.

Lemma 4.1. For a partition $\mathcal{V}$, the estimator

Proof. If the independence condition holds, we can assign

$\{v\}$, and incorporate this into $\rho_{\mathcal{V}}(v)$. The claim follows from the linearity of expectation.

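Community-Based Diffusion Degree. We instantiate $\rho_{\mathcal{V}}$ using the community-based diffusion degree (CDD), defined as:

$$\mathrm{CDD}_{\mathcal{V}}(v) = \sum_{u \in N^{(2)}(v) \setminus V_i} \left( 1 - \prod_{P \in \mathcal{P}^{(2)}_{v,u}} \bigl(1 - \mathbb{P}(P)\bigr) \right),$$

where $v \in V_i$, $N^{(2)}(v)$ is the set of nodes within distance two of $v$, $\mathcal{P}^{(2)}_{v,u}$ is the set of paths from $v$ to $u$ of length at most two, and $\mathbb{P}(P)$ is the probability that all edges of path $P$ are active. Intuitively, $\mathrm{CDD}_{\mathcal{V}}(v)$ measures the expected activation of nodes outside $v$’s community within two hops. This choice is computationally efficient and captures key inter-community effects, as discussed in Section 3.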
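One way to read this definition in code is to take the two-hop activation probabilities and keep only targets outside the community of $v$. The sketch below does that under explicit assumptions: `membership` is a hypothetical dictionary from node to community label, paths may pass through nodes of any community, and the function is an illustration, not the paper's implementation.

```python
def community_diffusion_degree(graph, membership, v):
    """Expected number of nodes outside v's community, within two hops of v,
    that are activated when v alone is seeded (paths of length <= 2 only).

    graph: dict mapping node -> {out-neighbor: activation probability}.
    membership: dict mapping node -> community label (assumed input format).
    """
    home = membership[v]
    fail = {}  # fail[u] = product over paths P to u of (1 - Pr[P is live])

    def record(u, p_path):
        # Only targets in a different community contribute to CDD.
        if u != v and membership[u] != home:
            fail[u] = fail.get(u, 1.0) * (1.0 - p_path)

    for w, p_vw in graph.get(v, {}).items():
        if w == v:
            continue
        record(w, p_vw)                 # path v -> w
        for u, p_wu in graph.get(w, {}).items():
            if u != w:
                record(u, p_vw * p_wu)  # path v -> w -> u
    return sum(1.0 - f for f in fail.values())


graph = {"v": {"a": 0.5, "x": 0.5}, "a": {"x": 0.5, "y": 1.0}, "x": {}, "y": {}}
membership = {"v": 0, "a": 0, "x": 1, "y": 1}
print(community_diffusion_degree(graph, membership, "v"))
```

Here $x$ is activated with probability 1 − 0.5 × 0.75 = 0.625 and $y$ with probability 0.5, while $a$ is ignored because it shares a community with $v$, so the value is 1.125.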

Justification. Restricting to two hops balances accuracy and scalability: it preserves independence assumptions while prioritizing nodes that bridge communities. Empirically, such nodes often correspond to high-betweenness actors (Freeman 1977), which play critical roles in real-world diffusion scenarios.

Social Relevance. Nodes with high diffusion degree often correspond to individuals who bridge communities, similar to those with high betweenness centrality (Freeman 1977). In real-world networks, such nodes play a critical role in spreading information across otherwise disconnected groups. For example, community leaders active in multiple social circles can accelerate vaccination campaigns, and influencers who span diverse interest groups can amplify marketing or awareness efforts. By incorporating diffusion degree into influence estimation, our framework prioritizes these bridging nodes, enabling strategies that better reflect real-world diffusion dynamics.

After computing $\rho_{\mathcal{V}}$, we generate nested solutions for each community and combine them using a progressive budgeting algorithm (Umrawal, Quinn, and Aggarwal 2023). This approach allocates seeds incrementally across communities based on marginal gains, ensuring near-optimal influence spread under budget constraints.

Implementation Details. Algorithm 1 relies on the observation that marginal gains within each community decrease monotonically. We exploit this property by implementing progressive budgeting with lazy evaluation: each community is queried for the next unselected node with the largest marginal gain only when needed. This approach avoids a substantial number of redundant computations, enabling our algorithm to outperform CELF in practice.

Our implementation uses Python coroutines to manage lazy queries efficiently. While this optimization limits parallelism, it reduces overall work performed and memory overhead. CELF (Leskovec et al. 2007) is used as the subroutine for computing marginal gains within each community.

Algorithm 1: Lazy Progressive-Budgeting

7: $k_m \leftarrow k_m + 1$ {Update budget for community $m$}
8: end for
9: $S^* \leftarrow \bigcup_{i=1}^{c} S_{i,k_i}$ {Final seed set}
10: return $S^*$
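The lazy querying idea can be sketched with Python generators and a heap: each community exposes a stream of (node, marginal gain) pairs, and a community is asked for its next candidate only after its current one has been selected. This is a hedged illustration, not a reconstruction of Algorithm 1; the precomputed gain lists stand in for per-community CELF runs, and all names (`community_stream`, `lazy_progressive_budgeting`, `log`) are hypothetical.

```python
import heapq


def community_stream(ranked, log):
    """Yield (node, marginal gain) pairs for one community, best first.

    `ranked` is a precomputed list used here in place of a CELF subroutine;
    `log` records which candidates were actually requested.
    """
    for node, gain in ranked:
        log.append(node)
        yield node, gain


def lazy_progressive_budgeting(streams, budget):
    """Pick `budget` seeds, always taking the community with the largest next gain."""
    heap = []
    for i, stream in enumerate(streams):
        first = next(stream, None)  # one initial query per community
        if first is not None:
            heapq.heappush(heap, (-first[1], i, first[0]))
    seeds, counts = [], [0] * len(streams)
    while heap and len(seeds) < budget:
        _, m, node = heapq.heappop(heap)
        seeds.append(node)
        counts[m] += 1  # budget of community m grows by one
        if len(seeds) < budget:
            nxt = next(streams[m], None)  # only the chosen community is queried again
            if nxt is not None:
                heapq.heappush(heap, (-nxt[1], m, nxt[0]))
    return seeds, counts


log = []
streams = [community_stream([("a1", 5.0), ("a2", 1.0), ("a3", 0.5)], log),
           community_stream([("b1", 3.0), ("b2", 2.5)], log)]
print(lazy_progressive_budgeting(streams, budget=3), log)
```

The approach relies on gains within a community being non-increasing, as stated in the implementation details: under that assumption the head of each stream is always that community's best remaining candidate, so later candidates such as `a3` never need to be evaluated unless they become relevant.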

We evaluated the performance of our Community-IM++ framework using real-world social networks. This section describes the datasets and their properties, comparison algorithms, experimental setup, and results with discussion.

The real-world network data was obtained from the Stanford Large Network Dataset Collection (Leskovec and Krevl 2014). Downloading and caching were automated using the Pooch library (Uieda et al. 2020). Table 1 summarizes key structural properties of the networks used in our experiments, including node count, edge count, average degree, and modularity for partitions obtained using the Leiden algorithm (Traag, Waltman, and Van Eck 2019). These properties are relevant to diffusion dynamics and motivate the use of community-aware approaches.

High modularity values indicate strong community structure, motivating the use of community-aware algorithms. Amazon, with a modularity of 0.91, is the most modular network, showing tightly clustered product communities that make cross-community influence challenging. DBLP (0.81) also exhibits strong modularity typical of academic collaboration networks, while Deezer (0.65) reflects substantial

For edge weights, we use the weighted cascade (WC) model (Kempe, Kleinberg, and Tardos 2015) and the trivalency (TV) model (Goyal, Bonchi, and Lakshmanan 2011). In WC, the weight of each in-edge of a node $v$ is set to 1/in-degree($v$); in TV, each edge weight is drawn uniformly from {0.1, 0.01, 0.001}.
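The two edge-weight schemes are simple to state in code. The sketch below assigns WC and TV probabilities to a list of directed edges; it assumes the list contains no duplicate edges, and the function names are illustrative only.

```python
import random


def weighted_cascade(edges):
    """WC model: p(u, v) = 1 / in-degree(v) for every directed edge (u, v)."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


def trivalency(edges, rng, values=(0.1, 0.01, 0.001)):
    """TV model: each edge weight is drawn uniformly from a fixed set of values."""
    return {(u, v): rng.choice(values) for u, v in edges}


edges = [("a", "c"), ("b", "c"), ("a", "b")]
wc = weighted_cascade(edges)
tv = trivalency(edges, random.Random(0))
print(wc[("a", "c")], wc[("a", "b")])
```

Under WC the incoming probabilities of every node with at least one in-edge sum to one, so high in-degree nodes are individually harder to activate through any single neighbor; TV ignores degree entirely, which is one reason the two models can rank seed sets differently.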

We compare Community-IM++ against:

  1. Community-IM (Umrawal, Quinn, and Aggarwal 2023): A community-aware framework ignoring inter-community influence. Note that our benchmarks for this algorithm use the performance optimization described in Algorithm 1 to provide a more equitable comparison with Community-IM++.
  2. CELF (Leskovec et al. 2007): An optimized greedy algorithm with a $(1 - 1/e)$ approximation guarantee.
  3. Degree (Kempe, Kleinberg, and Tardos 2015): A simple heuristic selecting nodes with highest degree.

All algorithms were implemented in Python using CyNetDiff (Robson, Reddy, and Umrawal 2024) for efficient diffusion simulation. As mentioned in Section 4, we used the Leiden algorithm (Traag, Waltman, and Van Eck 2019) to detect the communities forming hard partitions of the networks under consideration.

Budgets tested: $k \in \{5, 20, 100, 200, 400\}$. Influence was estimated as the average number of activated nodes over 10,000 Monte Carlo simulations per seed set, with 95% confidence intervals reported. Hardware: 8-core Intel Xeon E5-1660v3 CPU @ 3GHz, 64GB RAM. Software: Python 3.12.5, NetworkX (Hagberg, Schult, and Swart 2008) for graph storage and conversion.

Figures 1 and 2 show influence spread under the WC and TV models, respectively. The x-axis represents the seed budget $k$, and the y-axis shows the expected number of activated nodes. Each curve corresponds to an algorithm: Degree (purple), CELF (blue), Community-IM (orange), and Community-IM++ (green). Influence was estimated as the average number of activated nodes over 10,000 Monte Carlo simulations per seed set, making estimation error negligible.

Figures 3 and 4 report runtime performance for the same algorithms. The x-axis represents the seed budget $k$, and the y-axis shows execution time in seconds.

The experimental results reveal several key insights:

Influence Spread. Under WC, Community-IM++ outperforms Degree and Community-IM across all budgets, and approaches CELF’s influence at a fraction of the cost. Under TV, Community-IM++ surpasses all baselines, including CELF, for larger budgets, highlighting the heuristic’s strength in heterogeneous edge-weight settings.

Runtime and Scalability. CELF’s runtime grows rapidly with budget size, becoming impractical for large networks. Community-IM++ exhibits near-constant growth, even for k = 400, due to progressive budgeting and lazy evaluation.

Cost-Benefit Analysis. CELF offers marginally higher influence under WC, but at a runtime penalty exceeding 100 times. For practical applications such as viral marketing and misinformation control, Community-IM++ provides near-CELF influence at a fraction of the cost.

Summary of Contributions. This work advances community-aware influence maximization by introducing a heuristic that captures inter-community diffusion, addressing a key limitation of prior frameworks. By integrating community-based diffusion degree into a divide-and-conquer approach and coupling it with progressive budgeting, Community-IM++ achieves influence spread comparable to CELF while reducing runtime by orders of magnitude. Our experiments across networks with varying modularity confirm that gains are most pronounced in highly modular structures, where bridging nodes play a pivotal role.

Limitations. While promising, the current implementation is limited to the independent cascade model and a fixed two-hop assumption for inter-community influence. These design choices, while computationally efficient, may restrict generalizability to networks with lower modularity or more complex diffusion dynamics.

Future Work. Future directions include generalization to other diffusion models, such as linear threshold and pressure-based models; ablation and sensitivity analyses of heuristic parameters, including community resolution and hop length; and robustness under alternative community detection methods and overlapping community structures.

