Node discovery problem for a social network
Methods to solve a node discovery problem for a social network are presented. Covert nodes are nodes that cannot be observed directly. They transmit influence and affect the resulting collaborative activities among the persons in a social network, but do not appear in the surveillance logs that record the participants of those activities. Discovering the covert nodes means identifying the suspicious logs in which the covert nodes would appear if they became overt. The performance of the methods is demonstrated with test datasets generated from computationally synthesized networks and from a real organization.
💡 Research Summary
The paper addresses the “node discovery problem” in social networks, which seeks to identify covert (unobservable) nodes that influence collaborative activities but do not appear in surveillance logs. Such covert nodes are typical in clandestine groups (terrorist cells, criminal syndicates) where leaders deliberately hide their participation. The authors formalize the problem as follows: the full set of nodes is split into observable nodes O and covert nodes C. Each collaborative activity generates a pattern δi (a set of participants). Surveillance logs record only the observable part di = δi ∩ O, and the collection of logs can be represented as a binary D × N matrix with one row per log and one column per node. The goal is to find the logs where di ≠ δi, i.e., the logs that would look different if the covert nodes were overt.
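To make the observation model concrete, here is a minimal Python sketch. It is an illustration only, not the authors' code: the function name `observe`, the toy node labels, and the choice to use one matrix column per observable node are all assumptions made for this example.

```python
import numpy as np

def observe(patterns, observable):
    """Project full patterns delta_i onto the observable nodes O.

    Returns the binary log matrix (one row per log, one column per
    observable node; this column choice is an assumption) and a list of
    flags telling whether d_i differs from delta_i.
    """
    columns = sorted(observable)                 # fixed column order
    index = {node: j for j, node in enumerate(columns)}
    logs = [set(p) & set(observable) for p in patterns]   # d_i = delta_i ∩ O
    matrix = np.zeros((len(logs), len(columns)), dtype=int)
    for i, d in enumerate(logs):
        for node in d:
            matrix[i, index[node]] = 1
    differs = [set(p) != d for p, d in zip(patterns, logs)]
    return matrix, differs

# Toy data (hypothetical): "x" is a covert node, "a", "b", "c" are observable.
patterns = [{"a", "b", "x"}, {"b", "c"}, {"a", "c", "x"}]
matrix, differs = observe(patterns, observable={"a", "b", "c"})
print(matrix)
print(differs)
```

In practice the flags in `differs` are exactly what is unknown: an analyst sees only `matrix`, and the node discovery problem is to guess which rows had a covert participant removed.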
Two solution approaches are proposed.
- Heuristic method: All nodes appearing in the logs are clustered by Jaccard similarity with a k‑medoids (or alternative) algorithm into C clusters, assuming the number of covert groups is known in advance (here C denotes the cluster count, not the covert node set). For each log di and each cluster cl, a weight w(di, cl) = max_{nj∈cl} [ B(nj∈di) / Σ_{i′} B(nj∈d_{i′}) ] is computed, where B is a Boolean indicator (1 if its argument holds, 0 otherwise) and the sum runs over all logs. A simple ranking function s(di) = Σl B(di ∩ cl ≠ ∅) assigns higher scores to logs that intersect many clusters. The method is computationally light (linear in the number of nodes and logs) and works well when the underlying network exhibits clear community structure, but it does not exploit any explicit transmission model and can be inaccurate if the network topology is unknown or the clustering assumption fails.
- Statistical inference method: Influence transmission is modeled probabilistically. Each node j has an initiator probability fj, and each ordered pair (j, k) has a transmission probability rjk. These parameters are collected into a single vector θ. The log-likelihood of the observed logs is L(θ) = Σi log p(di | θ), where p(di | θ) = Σj dij fj ∏_{k≠j}
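The heuristic method described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's implementation: the k‑medoids routine is a hand-written alternating scheme with a deterministic farthest-first start, the function names and toy log matrix are hypothetical, and only the ranking function s(di) is computed (the weight w(di, cl) is left out).

```python
import numpy as np

def jaccard_distance(matrix):
    """Pairwise Jaccard distance between nodes (columns of the log matrix)."""
    x = matrix.astype(int)
    inter = x.T @ x                              # logs shared by each node pair
    counts = x.sum(axis=0)                       # logs containing each node
    union = counts[:, None] + counts[None, :] - inter
    sim = np.divide(inter, union, out=np.zeros(inter.shape), where=union > 0)
    return 1.0 - sim

def k_medoids(dist, k, n_iter=100):
    """Simple alternating k-medoids (assumed variant, farthest-first start)."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:                # keep old medoid if cluster is empty
                new_medoids.append(medoids[c])
                continue
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)

def rank_logs(matrix, labels):
    """s(d_i): number of clusters that log d_i intersects."""
    mask = matrix.astype(bool)
    return np.array([len(set(labels[row].tolist())) for row in mask])

# Toy log matrix (hypothetical): two communities {0,1,2} and {3,4,5};
# the last log bridges both and should receive the highest score.
matrix = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
])
labels = k_medoids(jaccard_distance(matrix), k=2)
scores = rank_logs(matrix, labels)
print(labels, scores)
```

Logs with the highest score span several clusters, which the heuristic treats as a hint that a hidden intermediary may have connected the otherwise separate groups.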