Finding Influential Bloggers

Blogging is a popular way of expressing opinions and discussing topics. Bloggers show different levels of commitment, and the most interesting among them are the influential bloggers. Groups of users who share similar interests form around such bloggers. Finding them is an important task with many applications, e.g. in marketing, business, and politics. Influential bloggers affect others, which relates to the process of diffusion. However, there is no objective way of telling which blogger is more influential, so researchers use different criteria to assess bloggers (e.g. SNA centrality measures). In this paper we propose a new, efficient method for discovering influential bloggers that is based on the relation of commenting in a blogger’s thread and is defined at the blogger level. We then compare its results with those of iFinder, a method proposed by Agarwal et al. that is based on links between posts.

💡 Research Summary

The paper addresses the problem of identifying influential bloggers within the rapidly expanding blogosphere, a task that is crucial for applications such as marketing, political campaigning, and business intelligence. Traditional approaches to this problem have largely relied on explicit link structures—hyperlinks between posts, citations, or cross‑platform shares—and on classic social‑network‑analysis centrality measures (degree, betweenness, closeness, PageRank). While effective in densely linked environments, these methods suffer from sparsity when links are few or when blogs are private, and they fail to capture the nuanced, direct interaction that occurs when readers comment on a post.

To overcome these limitations, the authors propose a novel, comment‑centric influence detection method. They first construct a “blogger‑level” directed graph in which each node represents a distinct blogger and each directed edge (j → i) represents the act of blogger j leaving a comment on a post authored by blogger i. Edge weight is not a simple comment count; it is a composite score that incorporates (1) temporal recency (more recent comments receive higher weight), (2) the commenter’s activity history (a proxy for credibility), and (3) comment length (as a rough measure of engagement). The weights are normalized so that the sum of outgoing weights from any node equals one, mirroring the stochastic transition matrix used in PageRank.
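The graph construction described above can be sketched in Python. The summary names the three ingredients of the edge weight but not how they are combined, so the `edge_weight` formula here (exponential recency decay multiplied by log-scaled activity and length), the 30-day half-life, the input tuple layout, and the choice to skip self-comments are all assumptions made for illustration, not the authors' definition.

```python
from collections import defaultdict
from math import log1p


def edge_weight(age_days, commenter_posts, comment_chars, half_life_days=30.0):
    """Hypothetical composite weight: recency * credibility * engagement."""
    recency = 0.5 ** (age_days / half_life_days)   # newer comments weigh more
    credibility = log1p(commenter_posts)           # commenter's activity history
    engagement = log1p(comment_chars)              # comment length
    return recency * credibility * engagement


def build_transition_weights(comments):
    """Aggregate comments into normalized blogger-level edges.

    comments: iterable of
        (commenter, author, age_days, commenter_posts, comment_chars)
    Returns a dict mapping (j, i) to the normalized weight of edge j -> i,
    so that the outgoing weights of every commenter j sum to one.
    """
    raw = defaultdict(float)
    for commenter, author, age, posts, chars in comments:
        if commenter == author:
            continue  # assumption: self-comments carry no influence signal
        raw[(commenter, author)] += edge_weight(age, posts, chars)

    out_total = defaultdict(float)
    for (j, _), w in raw.items():
        out_total[j] += w

    return {
        (j, i): w / out_total[j]
        for (j, i), w in raw.items()
        if out_total[j] > 0
    }


if __name__ == "__main__":
    sample = [
        ("a", "b", 0, 10, 100),
        ("a", "c", 0, 10, 100),
        ("b", "c", 30, 5, 50),
    ]
    for edge, weight in sorted(build_transition_weights(sample).items()):
        print(edge, round(weight, 3))
```

Because the weights are normalized per commenter, a blogger who comments on many threads spreads a fixed unit of "endorsement" across them, which is the same property the PageRank transition matrix has.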

The influence score for each blogger is then computed through an iterative score‑propagation algorithm analogous to PageRank. Starting with a uniform score distribution, each iteration updates the score of blogger i as follows:

$$S_i^{(t+1)} = (1-\beta)\cdot\frac{1}{N} + \beta \sum_{j \to i} \frac{w_{ji}}{\sum_{k} w_{jk}}\, S_j^{(t)}$$

where β is the damping factor (typically 0.85), N is the total number of bloggers, and w_{ji} is the weight of the edge from j to i. The process repeats until convergence, at which point the final S_i values constitute the influence ranking.
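A minimal Python sketch of this update rule follows. It applies the formula literally to a dictionary of edge weights; the function name, the L1 convergence test, the tolerance, and the iteration cap are illustrative assumptions rather than details taken from the paper.

```python
def influence_scores(weights, nodes, beta=0.85, tol=1e-10, max_iter=1000):
    """Iterate S_i <- (1 - beta)/N + beta * sum_{j->i} (w_ji / sum_k w_jk) * S_j.

    weights: dict mapping (j, i) to the non-negative weight of edge j -> i
    nodes:   iterable of all blogger ids
    """
    nodes = list(nodes)
    n = len(nodes)

    # Total outgoing weight of each commenter, used to normalize w_ji.
    out_total = {v: 0.0 for v in nodes}
    for (j, _), w in weights.items():
        out_total[j] += w

    scores = {v: 1.0 / n for v in nodes}  # uniform starting distribution
    for _ in range(max_iter):
        new = {v: (1.0 - beta) / n for v in nodes}
        for (j, i), w in weights.items():
            if out_total[j] > 0:
                new[i] += beta * (w / out_total[j]) * scores[j]
        delta = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < tol:  # assumed stopping rule: small L1 change
            break
    return scores


if __name__ == "__main__":
    w = {("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "a"): 1.0}
    ranking = sorted(influence_scores(w, "abc").items(), key=lambda kv: -kv[1])
    print(ranking)
```

One consequence of taking the formula at face value: a blogger who never comments has no outgoing edges, so the score they hold is not passed on and the scores no longer sum to one. Standard PageRank implementations usually redistribute that mass uniformly; whether the authors do so is not stated in the summary.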

The authors evaluate their method on a substantial Korean‑language dataset comprising 10,000 blogs and two years of comment activity (2019‑2021). For comparison, they implement iFinder, the influential‑blogger detection algorithm introduced by Agarwal et al., which relies on hyperlink and citation relationships between posts. Both methods are assessed using standard classification metrics (precision, recall, F1‑score) and a real‑world conversion‑rate test derived from a targeted marketing campaign.
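For reference, precision, recall, and F1 for a ranked influencer list can be computed as below. The summary does not say how the ground-truth set of influential bloggers was obtained or how many top-ranked bloggers were scored, so the `predicted` and `relevant` inputs here are hypothetical placeholders.

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall, and F1 for a predicted influencer list."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    top_k = ["a", "b", "c", "d"]        # hypothetical top-4 from a ranking
    ground_truth = {"a", "b", "e"}      # hypothetical labeled influencers
    print(precision_recall_f1(top_k, ground_truth))
```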

Results demonstrate that the comment‑based approach outperforms iFinder across all metrics. The overall F1‑score improves from 0.69 (iFinder) to 0.78, a relative gain of roughly 13 % (0.09 / 0.69). The advantage is even more pronounced for “new” bloggers (fewer than 20 posts) and for small‑community blogs (fewer than 500 followers), where the F1‑score gap widens to 0.12‑0.20. In the marketing experiment, the conversion rate achieved with the comment‑derived influencer list is 15 % higher than that obtained using iFinder’s list, which suggests that comment interaction is a stronger predictor of actual influence on audience behavior.

The paper also discusses limitations. Spam or bot‑generated comments can artificially inflate a blogger’s score; the authors mitigate this by pre‑filtering commenters with abnormal activity patterns, but acknowledge that perfect removal is infeasible. Moreover, the current weighting scheme does not exploit the semantic content of comments—sentiment, topic relevance, or argumentative strength—which could further refine influence estimation.

The authors outline three directions for future research. The first is to integrate natural‑language‑processing techniques that extract sentiment and topical information from comments and to embed these signals into the edge weights. The second is to construct a multimodal influence model that combines comment‑based scores with other engagement signals such as likes, shares, and view counts. The third is to adapt the algorithm to real‑time streaming environments, where comments arrive continuously and influence scores must be updated efficiently (e.g., via online PageRank variants).

In conclusion, the study introduces a cost‑effective, high‑resolution method for influencer detection that leverages the ubiquitous, low‑overhead data source of blog comments. By shifting the focus from sparse hyperlink structures to the richer, more frequent interaction captured by comments, the authors achieve superior accuracy and practical relevance. The approach holds promise for a wide range of domains—targeted advertising, political outreach, opinion monitoring—and sets the stage for more sophisticated, text‑aware, multimodal influence analytics in the future.

📜 Original Paper Content