Link Prediction with Social Vector Clocks
State-of-the-art link prediction utilizes combinations of complex features derived from network panel data. Here we show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. Our features are based on social vector clocks, an adaptation of the vector-clock concept introduced in distributed computing to social interaction networks. In fact, our experiments suggest that by taking into account the order and spacing of interactions, social vector clocks exploit different aspects of link formation so that their combination with previous approaches yields the most accurate predictor to date.
💡 Research Summary
The paper addresses the problem of predicting future links in social networks when the underlying data consist of a stream of time‑stamped dyadic interactions (e.g., emails, phone calls, Twitter mentions). Traditional link‑prediction approaches usually convert such event streams into a series of static network snapshots (panel data) and then apply a large toolbox of structural similarity measures (common neighbors, Adamic‑Adar, preferential attachment, etc.). This conversion discards the precise ordering and spacing of events, which can be crucial for anticipating new connections, especially in communication settings where reciprocity and indirect information flow are strong predictors.
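As a rough illustration of the snapshot-based baseline described above, the following Python sketch collapses an event stream into one static snapshot and computes three of the named similarity measures. The function names, the `(timestamp, u, v)` event format, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import defaultdict

def build_snapshot(events, t_start, t_end):
    """Collapse events with t_start <= t < t_end into an undirected adjacency map.

    `events` is an iterable of (timestamp, u, v) tuples (assumed format).
    Order and spacing of the events inside the interval are lost here.
    """
    adj = defaultdict(set)
    for t, u, v in events:
        if t_start <= t < t_end and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def common_neighbors(adj, u, v):
    return len(adj.get(u, set()) & adj.get(v, set()))

def adamic_adar(adj, u, v):
    # A shared neighbor w is adjacent to both u and v, so deg(w) >= 2 and log(deg) > 0.
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[w])) for w in shared)

def preferential_attachment(adj, u, v):
    return len(adj.get(u, set())) * len(adj.get(v, set()))

# Toy event stream (hypothetical data).
events = [(1, "a", "c"), (2, "b", "c"), (3, "a", "d"), (4, "b", "d"), (5, "d", "e")]
snapshot = build_snapshot(events, 0, 10)
scores = (
    common_neighbors(snapshot, "a", "b"),
    adamic_adar(snapshot, "a", "b"),
    preferential_attachment(snapshot, "a", "b"),
)
```

Reordering the timestamps inside the interval produces the same snapshot and therefore the same scores, which is the loss of temporal information that the summary points to.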
To exploit the fine‑grained temporal information directly, the authors introduce Social Vector Clocks (SVC), an adaptation of the vector‑clock mechanism from distributed computing. In a classic vector clock, each node maintains a vector of timestamps indicating the most recent information it could have received from every other node, updated whenever a communication event occurs. For a network of N nodes, the naïve implementation requires O(N²) space and O(N) work per event, which is infeasible for large social graphs. The authors therefore propose two key optimizations tailored to social networks:
- Limited horizon – because social graphs exhibit a small‑world property, only nodes within a bounded hop‑distance k (e.g., 2‑3 hops) are likely to exchange useful information. Each node therefore stores temporal views only for those neighbors, reducing space to O(N·k).
- Direction‑aware updates – for directed communications (email, tweets) only the receiver’s clock is updated; for bidirectional interactions (phone calls, meetings) both parties update. This avoids unnecessary propagation of timestamps.
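The two optimizations above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `SocialVectorClocks`, the `observe` and `latest` methods, and in particular the way the horizon is enforced (by storing a hop count next to each timestamp and discarding entries that would exceed `horizon`) are assumptions made for this example.

```python
from collections import defaultdict

class SocialVectorClocks:
    """Per-node clocks: clocks[u][w] = (timestamp, hops).

    `timestamp` is the freshest time of w that u could know about;
    `hops` is the length of the event chain that carried it (assumed
    bookkeeping used here to enforce a limited horizon).
    """

    def __init__(self, horizon=2):
        self.horizon = horizon
        self.clocks = defaultdict(dict)

    def _merge(self, src_view, dst):
        dst_clock = self.clocks[dst]
        for w, (ts, hops) in src_view.items():
            # Skip the receiver's own entry and anything beyond the horizon.
            if w == dst or hops + 1 > self.horizon:
                continue
            if w not in dst_clock or ts > dst_clock[w][0]:
                dst_clock[w] = (ts, hops + 1)

    def observe(self, t, sender, receiver, bidirectional=False):
        # Copy both views first so a bidirectional merge uses pre-event state.
        s_view = dict(self.clocks[sender])
        r_view = dict(self.clocks[receiver])
        s_view[sender] = (t, 0)    # the sender speaks for itself at time t
        r_view[receiver] = (t, 0)
        self._merge(s_view, receiver)      # directed: only the receiver learns
        if bidirectional:
            self._merge(r_view, sender)    # e.g. phone call: both sides learn

    def latest(self, u, w):
        """Freshest timestamp of w known to u, or None if u has no view of w."""
        entry = self.clocks.get(u, {}).get(w)
        return None if entry is None else entry[0]

svc = SocialVectorClocks(horizon=2)
svc.observe(1, "a", "b")   # a -> b
svc.observe(2, "b", "c")   # b -> c, which also relays what b knows about a
```

After these two directed events, `c` holds a two-hop view of `a` dating from time 1, while `a` has learned nothing about `c`. With `horizon=1` the relayed entry about `a` would be dropped at `c`.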
With these optimizations, SVC can be maintained online in linear time with respect to the number of events and the chosen horizon k. From the raw vector‑clock data the authors derive three lightweight features for each dyad:
- Latency – the difference between the current time and the latest timestamp that the source could possibly have about the target. Small latency indicates that the source is up‑to‑date with the target’s information.
- Indirect Update Count – the number of times information about one node has reached the other via intermediate nodes (e.g., A → C → B). This captures gossip‑like diffusion that is invisible to direct‑interaction counts.
- Recency Gradient – the proportion of recent interactions (within a short window) relative to all past interactions, providing a measure of how “fresh” the relationship is.
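One possible reading of the three features above is sketched below in Python. The function `dyad_features`, its parameters, and the toy events are hypothetical. The sketch replays directed events with an unlimited horizon for brevity, and it counts an indirect update whenever the source's view of the target is refreshed by a sender other than the target; the paper's exact definitions may differ.

```python
from collections import defaultdict

def dyad_features(events, source, target, now, window):
    """Return (latency, indirect_updates, recency_ratio) for source -> target.

    `events` holds directed (timestamp, sender, receiver) tuples with
    timestamps <= now (assumed format).
    """
    clocks = defaultdict(dict)   # clocks[u][w] = freshest time of w known to u
    indirect_updates = 0
    direct_times = []
    for t, s, r in sorted(events):
        view = dict(clocks[s])
        view[s] = t              # the sender speaks for itself at time t
        for w, ts in view.items():
            if w != r and ts > clocks[r].get(w, float("-inf")):
                clocks[r][w] = ts
                # News about `target` reached `source` through someone else.
                if r == source and w == target and s != target:
                    indirect_updates += 1
        if {s, r} == {source, target}:
            direct_times.append(t)
    known = clocks[source].get(target)
    latency = None if known is None else now - known
    recent = sum(1 for t in direct_times if t > now - window)
    recency_ratio = recent / len(direct_times) if direct_times else 0.0
    return latency, indirect_updates, recency_ratio

# Toy event stream (hypothetical data).
events = [(1, "b", "c"), (2, "c", "a"), (3, "a", "b"),
          (5, "b", "a"), (8, "b", "c"), (9, "c", "a")]
features = dyad_features(events, source="a", target="b", now=10, window=6)
```

In this toy stream, `a` last hears about `b` through `c` at time 9, carrying `b`'s state from time 8, so the latency at time 10 is 2 even though the last direct contact between `a` and `b` was at time 5.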
These temporally‑aware features are combined with standard structural predictors in a supervised learning framework. The authors adopt the “realization” protocol: the event stream is split into multiple non‑overlapping intervals, each providing a training set (