Predicting User Engagement in Twitter with Collaborative Ranking


Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the quality of a predicted rating or the ranking performance for top-n recommended items. However, restricting the recommender system evaluation to these two aspects is rather limiting and neglects other dimensions that could better characterize a well-perceived recommendation. In this paper, instead of optimizing rating or top-n recommendation, we focus on the task of predicting which items generate the highest user engagement. In particular, we use Twitter as our testbed and cast the problem as a Collaborative Ranking task where the rich features extracted from the metadata of the tweets help to complement the transaction information limited to user ids, item ids, ratings and timestamps. We learn a scoring function that directly optimizes the user engagement in terms of nDCG@10 on the predicted ranking. Experiments conducted on an extended version of the MovieTweetings dataset, released as part of the RecSys Challenge 2014, show the effectiveness of our approach.


💡 Research Summary

The paper addresses the problem of predicting user engagement on Twitter, specifically the sum of retweets and favorites that a tweet receives, by framing it as a collaborative ranking task rather than the traditional rating prediction or top‑N recommendation problems common in collaborative filtering (CF). The authors argue that conventional CF evaluation metrics such as RMSE or precision/recall focus only on rating accuracy or the quality of a short recommendation list, overlooking other dimensions that are crucial for social media platforms where user interaction (likes, retweets, mentions) is the primary signal of success.

To operationalize the task, the authors use the MovieTweetings Extended dataset released for the RecSys Challenge 2014. This dataset contains IMDb movie ratings posted by users on Twitter, together with the full set of tweet metadata returned by the Twitter API (e.g., timestamps, retweet status, mentions, follower/friend counts). Each observation is a triple (user u, item i, tweet d). The engagement label for a triple is defined as:

 engagement(u,i,d) = retweets(u,i,d) + favorites(u,i,d).
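
The label above can be computed directly from the count fields the Twitter API returns for a tweet. A minimal sketch (the dict layout is illustrative; `retweet_count` and `favorite_count` mirror the API's field names):

```python
def engagement(tweet: dict) -> int:
    """engagement(u, i, d) = retweets(u, i, d) + favorites(u, i, d)."""
    return tweet["retweet_count"] + tweet["favorite_count"]

# Example tweet stub:
tweet = {"retweet_count": 3, "favorite_count": 5}
assert engagement(tweet) == 8
```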

The authors extract a 16‑dimensional feature vector φ(u,i,d) for each triple, capturing user‑centric and item‑centric social signals as well as tweet‑level properties:

- the explicit rating given by the user;
- the deviation of that rating from the user's median rating;
- the user's average engagement;
- a binary indicator of whether the user's average engagement is positive;
- the user's average rating;
- the user's friend‑to‑follower ratio;
- the user's total tweet count;
- the item's average engagement;
- a binary indicator of positive item engagement;
- the item's average rating;
- friend‑to‑follower ratios aggregated per item;
- tweet counts aggregated per item;
- a binary flag for the presence of mentions;
- the retweet status of the tweet;
- whether the tweet is a retweet of another tweet present in the dataset;
- the frequency of retweets observed for the item.
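
As a rough illustration, a few of the user‑centric features could be computed as follows. Only the explicit rating is identified as F1 elsewhere in this summary; the other key names and the argument layout are my own:

```python
import statistics

def user_features(rating, user_ratings, user_engagements, friends, followers):
    # Sketch of a handful of the 16 features phi(u, i, d).
    # Names are illustrative, not the paper's exact feature numbering.
    return {
        "rating": rating,  # F1: explicit rating given by the user
        "rating_deviation": rating - statistics.median(user_ratings),
        "avg_user_engagement": sum(user_engagements) / len(user_engagements),
        "user_engagement_positive": int(sum(user_engagements) > 0),
        "friend_follower_ratio": friends / followers if followers else 0.0,
    }
```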

The learning problem is cast into a Learning‑to‑Rank framework where each user is treated as a query and the tweets authored by that user are the documents to be ranked. The relevance judgments are the engagement values. The authors aim to directly optimize Normalized Discounted Cumulative Gain at cutoff 10 (nDCG@10), a metric that rewards higher relevance scores appearing near the top of the ranked list and is standard in information retrieval.
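
For reference, nDCG@10 can be sketched as below. This uses the common linear-gain formulation rel_i / log2(i + 2); some definitions use an exponential gain (2^rel − 1) instead, and the summary does not say which variant the challenge adopted:

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked positions.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` are the engagement values of a user's tweets in the order the model ranked them; a perfect ranking scores 1.0.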

For the ranking model, the authors employ an ensemble of MART (Multiple Additive Regression Trees) and LambdaMART. LambdaMART is a pairwise learning‑to‑rank algorithm that uses λ‑gradients to approximate the change in the target IR metric (here nDCG@10) caused by swapping the order of a pair of documents. By feeding these gradients into a gradient‑boosted tree learner, LambdaMART can directly optimize nDCG@10 during training. MART provides a complementary boosting baseline; the linear combination of the two models is intended to balance bias and variance and to improve robustness.
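
The summary states only that the two models are combined linearly; the blending weight below and its default value are assumptions for illustration:

```python
def blend(mart_scores, lambdamart_scores, alpha=0.5):
    """Linear combination of the two rankers' per-tweet scores.

    alpha is a hypothetical mixing weight (e.g., tuned on validation nDCG@10).
    """
    return [alpha * m + (1.0 - alpha) * s
            for m, s in zip(mart_scores, lambdamart_scores)]
```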

Data preprocessing includes pruning users with fewer than four interactions or more than 200 interactions, which reduces noise caused by extremely sparse or overly active users. The remaining data are split 80%/20% for training/validation to tune hyper‑parameters: number of leaves per tree (10), learning rate (0.1), and early stopping after 50 rounds without improvement on the validation nDCG@10. The maximum number of features considered at each split is set to the total number of extracted features. The implementation relies on Python libraries (NumPy, SciPy, scikit‑learn) and RankLib's LambdaMART implementation.
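
The pruning and split steps might look like the following. The 4/200 thresholds and the 80/20 ratio come from the summary; the record layout and the use of a seeded random shuffle for the split are assumptions (the summary does not say how the split is drawn):

```python
from collections import Counter
import random

def prune_users(interactions, min_n=4, max_n=200):
    # Keep only interactions from users with min_n..max_n interactions.
    # Each record is assumed to be a (user, item, tweet) triple.
    counts = Counter(user for user, _, _ in interactions)
    return [rec for rec in interactions if min_n <= counts[rec[0]] <= max_n]

def train_valid_split(records, train_frac=0.8, seed=0):
    # Assumed: a seeded random shuffle, then an 80/20 cut.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```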

Baseline methods for comparison are: (1) Factorization Machine (FM) trained with MCMC using libFM, incorporating user, item, and rating indicator variables with engagement as the label; (2) recRating, which predicts engagement simply by the explicit rating (feature F1); (3) recHEI, which predicts engagement by the historical average engagement of the item (feature F8); and (4) recRandom, which outputs random scores.
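
The three non-FM baselines are simple enough to sketch directly; the row layout below is illustrative:

```python
import random

def rec_rating(rows):
    # recRating: score each triple by its explicit rating (feature F1).
    return [r["rating"] for r in rows]

def rec_hei(rows, item_avg_engagement):
    # recHEI: score by the item's historical average engagement (feature F8).
    return [item_avg_engagement.get(r["item"], 0.0) for r in rows]

def rec_random(rows, seed=0):
    # recRandom: output random scores.
    rng = random.Random(seed)
    return [rng.random() for _ in rows]
```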

Experimental results on the test split show that the simple rating‑based predictor recRating already achieves a respectable nDCG@10 of 0.8182, indicating that explicit user ratings are a strong proxy for future engagement. FM performs slightly worse (0.7950). The proposed CRUE (Collaborative Ranking for User Engagement) model reaches an nDCG@10 of 0.87, outperforming all baselines by a noticeable margin (approximately 5 percentage points over recRating). A scatter plot of rating versus engagement demonstrates a positive correlation but also substantial variance, justifying the need for richer features and a ranking‑oriented loss.

The authors conclude that (a) collaborative ranking with direct optimization of an IR metric can effectively predict social engagement, (b) leveraging tweet‑level metadata provides valuable signals beyond traditional user‑item interaction matrices, and (c) the approach scales to real‑world datasets with high sparsity after modest preprocessing. They suggest future work could incorporate textual content analysis (sentiment, topics), temporal dynamics, or online learning to handle streaming tweet data in real time.

Overall, the paper contributes a novel application of learning‑to‑rank techniques to social media recommendation, demonstrates the practical benefit of optimizing nDCG@10 for engagement prediction, and provides a reproducible experimental pipeline using publicly available datasets and open‑source tools.

