Predicting Social Links for New Users across Aligned Heterogeneous Social Networks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Online social networks have achieved great success in recent years, and many of them involve multiple kinds of nodes and complex relationships. Among these relationships, social links among users are of great importance. Many existing link prediction methods focus on predicting social links that will appear in the future among all users, based upon a snapshot of the social network. In real-world social networks, however, many new users join the service every day, and predicting links for these new users is more important. Unlike conventional link prediction, link prediction for new users is more challenging for two reasons: (1) the information distributions of new users differ from those of the existing active users (i.e., old users); and (2) little information about the new users is available in the network. We propose a link prediction method called SCAN-PS (Supervised Cross Aligned Networks link prediction with Personalized Sampling) to solve the link prediction problem for new users with information transferred from both the existing active users in the target network and other source networks through aligned accounts. We propose a within-target-network personalized sampling method that processes the existing active users’ information to accommodate the differences in information distributions before the intra-network knowledge transfer. SCAN-PS can also exploit information in other source networks whose user accounts are aligned with the target network. In this way, SCAN-PS can solve the cold start problem even when information about these new users is totally absent in the target network.

💡 Research Summary

The paper tackles the problem of predicting social links for newly‑joined users (the “cold‑start” scenario) by leveraging information from both the target social network and one or more aligned source networks. Traditional link‑prediction methods assume that all nodes share the same data distribution and that sufficient historical activity is available for each user. In practice, new users have very sparse or even zero activity in the target network, and their activity distribution differs markedly from that of long‑standing (“old”) users. Consequently, models trained on old users perform poorly when applied to new users.

To address these challenges, the authors propose SCAN‑PS (Supervised Cross Aligned Networks link prediction with Personalized Sampling). The method consists of four main components:

Heterogeneous Feature Extraction – From each network, four categories of features are derived: (i) social structural features (Common Neighbor, Jaccard, Adamic/Adar), (ii) spatial distribution features (inner product, cosine similarity, Euclidean distance of location vectors, and location‑based CN/Jaccard), (iii) temporal activity features (time‑slot overlap measures), and (iv) textual content features (TF‑IDF vectors of words used in posts). The networks are modeled as heterogeneous graphs containing users, locations, timestamps, and words, with corresponding edge types (friendship, check‑in, temporal, word‑usage).
Personalized Sampling within the Target Network – Instead of random sampling of old users for training, the method computes a similarity score between each old user and the new‑user population based on the extracted features (e.g., degree, activity frequency). Old users that are statistically closer to new users receive higher sampling probabilities. This “personalized” sampling creates a training subset whose feature distribution better matches that of the new users, mitigating distribution shift and reducing bias.
Cross‑Network Knowledge Transfer via Anchor Links – Users often maintain accounts on multiple platforms (e.g., Twitter, Foursquare). Anchor links explicitly map accounts belonging to the same real‑world individual across networks. By aligning the target network with one or more source networks through these directed anchor links, the method imports the rich feature vectors of a new user’s counterpart in the source network(s). The imported features are concatenated (after normalization and optional dimensionality reduction) with the sampled target‑network features, yielding a comprehensive representation even when the target‑network side is empty.
Supervised Learning and Prediction – The combined feature vectors and binary labels indicating the existence of a social link are fed into a supervised classifier (the authors evaluate logistic regression, random forests, and gradient‑boosted trees). The learned function f: L → {0,1}, defined over the set L of candidate links, predicts whether a potential link between a new user and any other user in the target network will form.
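The pairwise features in the first component can be sketched in a few lines of Python. This is a minimal illustration, assuming each network is held as an adjacency dictionary and each user's check-ins as a per-location count vector; the function names `social_features` and `spatial_features` are hypothetical and are not taken from the paper. The same set-overlap and vector-similarity helpers could, in principle, be reused for the temporal (per-time-slot) and textual (TF-IDF) vectors.

```python
import math
from typing import Dict, List, Set

def social_features(adj: Dict[str, Set[str]], u: str, v: str) -> List[float]:
    """Common Neighbor, Jaccard's Coefficient and Adamic/Adar for a candidate link (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common, union = nu & nv, nu | nv
    cn = float(len(common))
    jc = cn / len(union) if union else 0.0
    # Adamic/Adar: shared neighbours with many links count less; degree-1 nodes skipped (log 1 = 0)
    aa = sum(1.0 / math.log(len(adj.get(w, set())))
             for w in common if len(adj.get(w, set())) > 1)
    return [cn, jc, aa]

def spatial_features(loc_u: List[float], loc_v: List[float]) -> List[float]:
    """Inner product, cosine similarity and Euclidean distance of check-in count vectors."""
    dot = sum(a * b for a, b in zip(loc_u, loc_v))
    norm_u = math.sqrt(sum(a * a for a in loc_u))
    norm_v = math.sqrt(sum(b * b for b in loc_v))
    # Cosine is undefined for an empty (all-zero) vector; return 0.0 in that case
    cos = dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(loc_u, loc_v)))
    return [dot, cos, dist]
```

A brand-new user with no friends or check-ins simply yields zeros for every structural and spatial feature, which is exactly the cold-start gap the later components try to fill.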
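The personalized sampling step can be illustrated as a weighted draw of old users. The sketch below is one possible reading of the description in this summary, not the paper's exact procedure: each old user is summarized by a small vector of activity statistics (for example degree and activity frequency), closeness to the new-user population is measured with a Gaussian kernel around the new users' mean vector, and users are drawn without replacement with probability proportional to that weight using Efraimidis-Spirakis keys. The names `sampling_weights` and `personalized_sample` and the `bandwidth` parameter are hypothetical.

```python
import math
import random
from typing import Dict, List

def sampling_weights(old: Dict[str, List[float]],
                     new: List[List[float]],
                     bandwidth: float = 1.0) -> Dict[str, float]:
    """Normalized weight per old user; larger when the user resembles the new users."""
    dim = len(new[0])
    # Mean statistics vector of the new-user population (assumed summary of the new users)
    centroid = [sum(x[i] for x in new) / len(new) for i in range(dim)]
    raw = {}
    for uid, x in old.items():
        d2 = sum((a - c) ** 2 for a, c in zip(x, centroid))
        raw[uid] = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel
    total = sum(raw.values())
    if total == 0.0:
        # Every kernel value underflowed: fall back to uniform weights
        return {uid: 1.0 / len(raw) for uid in raw}
    return {uid: w / total for uid, w in raw.items()}

def personalized_sample(old: Dict[str, List[float]],
                        new: List[List[float]],
                        k: int,
                        seed: int = 0) -> List[str]:
    """Draw k old users without replacement, proportionally to their weights."""
    rng = random.Random(seed)
    weights = sampling_weights(old, new)
    keys = {}
    for uid, w in weights.items():
        # Efraimidis-Spirakis key u**(1/w), taken in log space; 1 - random() lies in (0, 1]
        keys[uid] = math.log(1.0 - rng.random()) / w if w > 0 else -math.inf
    # The k largest keys form the weighted sample
    return sorted(keys, key=keys.get, reverse=True)[:k]
```

Old users whose statistics sit far from the new-user mean receive vanishing weight and are effectively never drawn, so the training subset drifts toward the sparse, new-user-like part of the population. The kernel bandwidth controls how aggressive that shift is.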

Experimental Evaluation – The authors conduct extensive experiments on two real‑world aligned heterogeneous networks: Twitter and Foursquare. New users are categorized by “newness” (accounts created within 1 week, 1 month, 3 months). Baselines include classic structural predictors (CN, JC, AA), pseudo‑cold‑start methods, and cross‑network transfer approaches that do not employ personalized sampling. Performance is measured using AUC, Precision@K, and Recall@K.
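The reported metrics can be computed as follows. This is a minimal sketch, assuming a single pooled list of scored candidate links with binary labels; whether the paper averages Precision@K per new user is not stated in this summary, so a per-user variant is left out. AUC is computed here as the Mann-Whitney statistic, i.e. the probability that a randomly chosen positive link outscores a randomly chosen negative one, with ties counted as one half. The function names are hypothetical.

```python
from typing import Sequence, Tuple

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))

def precision_recall_at_k(scores: Sequence[float], labels: Sequence[int],
                          k: int) -> Tuple[float, float]:
    """Precision and recall among the k highest-scored candidate links."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    hits = sum(y for _, y in ranked)
    total_pos = sum(labels)
    return hits / k, (hits / total_pos if total_pos else 0.0)
```

For example, scores [0.9, 0.8, 0.4, 0.3] with labels [1, 0, 1, 0] give an AUC of 0.75 and (0.5, 0.5) for Precision@2 and Recall@2. The pairwise AUC loop is quadratic in the number of candidates; for large candidate sets a rank-based formula computes the same value in O(n log n).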

Key findings:

SCAN‑PS consistently outperforms all baselines across all newness levels, achieving AUC improvements of 7–12 percentage points.
In the extreme cold‑start case where a new user has no target‑network activity, SCAN‑PS still reaches an AUC of ~0.78 solely by exploiting source‑network features, whereas traditional methods fall near random guessing (AUC ≈ 0.55).
The personalized sampling component alone yields noticeable gains over naïve random sampling, confirming that aligning feature distributions is crucial.
Adding source‑network information on top of personalized sampling further boosts performance, demonstrating the complementary nature of intra‑network and inter‑network knowledge transfer.

Contributions and Limitations – The paper’s primary contributions are: (1) a quantitative analysis of distributional differences between new and old users and a sampling strategy to correct it, (2) a framework for cross‑network knowledge transfer via anchor links in heterogeneous graphs, and (3) an integrated model (SCAN‑PS) that achieves state‑of‑the‑art results on large‑scale real data. Limitations include reliance on the availability and accuracy of anchor links, reduced benefit when source networks are themselves sparse, and the use of hand‑crafted features rather than end‑to‑end graph neural networks.

Future Directions – The authors suggest extending the approach to (i) automatically infer or refine anchor links with probabilistic models, (ii) learn adaptive weighting between target and source features via domain‑adaptation techniques, (iii) replace the feature‑based pipeline with heterogeneous graph neural networks that can jointly learn representations and perform link prediction, and (iv) develop lightweight, online versions capable of real‑time recommendation for newly arriving users.

In summary, SCAN‑PS offers a practical and theoretically grounded solution to the cold‑start link‑prediction problem by harmonizing intra‑network sampling with inter‑network transfer, thereby improving the early social experience of new users and strengthening overall network connectivity.
