A Community-Based Sampling Method Using DPL for Online Social Network

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In this paper, we propose a new graph sampling method for online social networks that meets three goals. First, a sample graph should reflect the ratio between the number of nodes and the number of edges of the original graph. Second, a sample graph should reflect the topology of the original graph. Third, sample graphs should be consistent with each other when they are sampled from the same original graph. The proposed method employs two techniques: hierarchical community extraction and the densification power law. It partitions the original graph into a set of communities to preserve the topology of the original graph. It also uses the densification power law, which captures the ratio between the number of nodes and the number of edges in online social networks. In experiments, we use several real-world online social networks, create sample graphs using the existing methods and ours, and analyze the differences between the original graph and the sample graph produced by each sampling method.


💡 Research Summary

The paper addresses the challenge of sampling large online social network graphs while preserving both the node‑edge ratio and the overall topology of the original network. Existing sampling techniques—random node selection, random edge selection, and exploration‑based methods—typically determine sample size based on either the number of nodes or edges, ignoring the densification behavior observed in real‑world social networks. Consequently, sampled graphs often exhibit a node‑edge ratio that deviates from the original and may fail to capture community structures, leading to inconsistent results across multiple samples.

To overcome these limitations, the authors propose a novel sampling framework that integrates two key concepts: hierarchical community extraction and the Densification Power Law (DPL). First, the original graph is partitioned into densely connected sub‑graphs (communities) using a community detection algorithm (e.g., Louvain or Infomap). The resulting hierarchy is represented as a dendrogram, which records parent‑child relationships among communities. This hierarchical decomposition ensures that local topological features are explicitly identified.
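
As a rough illustration of the dendrogram described above, the sketch below stores a community hierarchy as a small tree in which each node records its child communities. The `Community` class, its field names, and the example hierarchy over nodes 0 to 7 are hypothetical and are not taken from the paper; in practice the hierarchy would come from a community detection algorithm rather than being written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    """One dendrogram node: a community and the sub-communities it splits into."""
    name: str
    nodes: frozenset = frozenset()               # graph nodes, set only on leaves
    children: list = field(default_factory=list)  # child communities (empty on leaves)

    def members(self):
        """Return every graph node that falls under this dendrogram node."""
        if not self.children:
            return set(self.nodes)
        result = set()
        for child in self.children:
            result |= child.members()
        return result

    def leaves(self):
        """Return the leaf communities, left to right."""
        if not self.children:
            return [self]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


# Hypothetical two-level hierarchy: community A splits into A1 and A2, B is a leaf.
root = Community("root", children=[
    Community("A", children=[
        Community("A1", frozenset({0, 1, 2})),
        Community("A2", frozenset({3, 4})),
    ]),
    Community("B", frozenset({5, 6, 7})),
])

leaf_names = [c.name for c in root.leaves()]
print(leaf_names)              # leaf communities that would be sampled individually
print(sorted(root.members()))  # all graph nodes covered by the hierarchy
```

Keeping the parent-child links explicit is what later allows per-community samples to be combined level by level.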

Second, the DPL, which states that the number of edges e scales with the number of nodes n as e ∝ n^α (with 1 < α < 2), is employed to guide the selection of both nodes and edges within each community. For every community, the exponent α is estimated from its actual node and edge counts. When a global sample size is specified (as a fraction of either nodes or edges), the method allocates a proportional number of nodes to each community. Nodes are then sampled with probability proportional to their degree, which favors structurally important, high‑degree vertices. The required number of edges for the community sample is computed using the community‑specific α, and only edges present in the original graph between the selected nodes are retained.
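
The per-community step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the proportionality constant in e ∝ n^α is taken to be 1, so that α = log e / log n, and when the selected nodes induce more edges than the DPL budget allows, the surplus is dropped uniformly at random. All function names and the toy community are hypothetical.

```python
import math
import random


def estimate_alpha(n_nodes, n_edges):
    """Exponent alpha with n_edges == n_nodes ** alpha (constant factor assumed to be 1)."""
    return math.log(n_edges) / math.log(n_nodes)


def target_edges(n_sample_nodes, alpha):
    """Edge budget the DPL assigns to a sample with n_sample_nodes nodes."""
    return round(n_sample_nodes ** alpha)


def sample_by_degree(degree, k, rng):
    """Draw k distinct nodes, each draw proportional to degree among the nodes left."""
    remaining = {v: d for v, d in degree.items() if d > 0}
    chosen = set()
    while remaining and len(chosen) < k:
        nodes = list(remaining)
        pick = rng.choices(nodes, weights=[remaining[v] for v in nodes], k=1)[0]
        chosen.add(pick)
        del remaining[pick]
    return chosen


def sample_community(edges, fraction, rng):
    """Sample one community given its internal edge list and a node fraction."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    alpha = estimate_alpha(len(degree), len(edges))
    k = max(2, round(fraction * len(degree)))
    chosen = sample_by_degree(degree, k, rng)
    # Keep only original edges whose endpoints were both selected.
    induced = [(u, v) for u, v in edges if u in chosen and v in chosen]
    budget = target_edges(k, alpha)
    # Assumption: trim uniformly at random if the induced edges exceed the budget.
    kept = induced if len(induced) <= budget else rng.sample(induced, budget)
    return chosen, kept, alpha


# Worked number: 1000 nodes and 8000 edges give alpha of about 1.301,
# so a 100-node sample is assigned 100 ** alpha = 400 edges.
print(target_edges(100, estimate_alpha(1000, 8000)))  # 400

# Hypothetical community with 6 nodes and 9 internal edges.
community_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3),
                   (2, 3), (3, 4), (4, 5), (3, 5)]
rng = random.Random(0)
chosen, kept, alpha = sample_community(community_edges, 2 / 3, rng)
print(sorted(chosen), kept, round(alpha, 3))
```

Because only edges between selected nodes can be retained, the induced subgraph may contain fewer edges than the budget; the sketch then keeps all of them rather than inventing edges.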

After constructing sample sub‑graphs for all communities, the algorithm merges them in a bottom‑up fashion following the dendrogram. During this merging, inter‑community connections are re‑established according to the hierarchy, ensuring that the global connectivity mirrors that of the original network. The final sampled graph therefore respects the DPL‑derived node‑edge ratio both locally (within each community) and globally (across the whole network), and it preserves the hierarchical community structure.
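
One way to picture the bottom-up merge is the sketch below. The summary does not say how many inter-community edges are restored, so this illustration assumes the simplest rule: at each dendrogram level, every original edge that crosses between two sibling communities and has both endpoints in the sample is added back. The nested-tuple tree encoding, the function name, and the toy data are hypothetical.

```python
def merge_bottom_up(tree, leaf_samples, original_edges):
    """Merge per-community samples along a dendrogram.

    tree: a leaf name (str) or a tuple of subtrees.
    leaf_samples: leaf name -> (sampled nodes, sampled intra-community edges).
    Returns (nodes, edges) for the subtree rooted at `tree`.
    """
    if isinstance(tree, str):
        nodes, edges = leaf_samples[tree]
        return set(nodes), set(edges)

    # Merge the children first, so deeper levels are resolved before shallower ones.
    parts = [merge_bottom_up(child, leaf_samples, original_edges) for child in tree]

    merged_nodes, merged_edges, owner = set(), set(), {}
    for idx, (nodes, edges) in enumerate(parts):
        merged_nodes |= nodes
        merged_edges |= edges
        for v in nodes:
            owner[v] = idx  # which child community each sampled node came from

    # Assumption: restore every original edge that links two different children
    # and whose endpoints were both sampled.
    for u, v in original_edges:
        if u in owner and v in owner and owner[u] != owner[v]:
            merged_edges.add((u, v))
    return merged_nodes, merged_edges


# Hypothetical original graph and per-leaf samples.
original_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (3, 4),
                  (4, 5), (3, 5), (5, 6), (6, 7), (5, 7)]
leaf_samples = {
    "A1": ({0, 1}, {(0, 1)}),
    "A2": ({3}, set()),
    "B": ({5, 6}, {(5, 6)}),
}
tree = (("A1", "A2"), "B")

nodes, edges = merge_bottom_up(tree, leaf_samples, original_edges)
print(sorted(nodes))  # [0, 1, 3, 5, 6]
print(sorted(edges))  # [(0, 1), (1, 3), (3, 5), (5, 6)]
```

In this toy run, edge (1, 3) is restored when A1 and A2 merge, and edge (3, 5) is restored one level up when A and B merge.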

The authors evaluate their approach on several real‑world social networks, including Facebook, Twitter, and YouTube datasets. They compare against seven baseline samplers: Random Node (RN), Random Degree Node (RDN), Random Edge (RE), Random Node‑Edge (RNE), Random Walk (RW), Random Jump (RJ), and Forest Fire (FF). Five structural properties are measured: degree distribution, singular value distribution, singular vector distribution, average clustering coefficient distribution, and hop‑distance distribution. Differences between sampled and original graphs are quantified using the Kolmogorov‑Smirnov D‑statistic. Across all datasets and metrics, the proposed method yields the smallest D‑values, indicating the closest match to the original graphs. Notably, the sampled graphs maintain a node‑edge ratio that aligns with the DPL prediction, and the community‑aware sampling avoids the locality bias inherent in exploration‑based methods.
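
For reference, the two-sample Kolmogorov-Smirnov D-statistic is the largest absolute gap between two empirical cumulative distribution functions. The sketch below computes it in plain Python for two lists of observations, here hypothetical degree sequences of an original and a sampled graph; it illustrates the statistic itself and is not the authors' evaluation code.

```python
from bisect import bisect_right


def ks_d_statistic(sample_a, sample_b):
    """Largest absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical degree sequences.
original_degrees = [1, 1, 2, 2, 3, 3, 4, 8]
sample_degrees = [1, 2, 2, 3]

d_value = ks_d_statistic(original_degrees, sample_degrees)
print(d_value)  # 0.25
```

A value of 0 means the two empirical distributions coincide, and values near 1 mean they barely overlap, which is why smaller D-values indicate a sample closer to the original.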

In summary, the paper contributes (1) a DPL‑driven mechanism for determining sample size that preserves the characteristic densification of social networks, (2) a hierarchical community‑based sampling strategy that captures both local and global topology, and (3) extensive empirical evidence demonstrating superior fidelity over existing techniques. Limitations include the computational overhead of community detection and the sensitivity of α estimation, suggesting future work on scalable community extraction and extensions to dynamic or streaming networks.

