Counting How the Seconds Count: Understanding Algorithm-User Interplay in TikTok via ML-driven Analysis of Video Content
Short video streaming systems such as TikTok, YouTube Shorts, and Instagram Reels have reached billions of active users worldwide. At the core of such systems are (proprietary) recommendation algorithms that recommend a personalized sequence of videos to each user. We aim to understand the temporal evolution of the recommendations made by such algorithms, as well as the interplay between recommendations and user experience. While past work has studied recommendation algorithms using textual data (e.g., titles and hashtags) as well as user studies and interviews, we add a third modality of analysis: automated analysis of the videos themselves. To perform such multimodal analysis, we develop a new HCI measurement approach built around our new tool, VCA (Video Content Analysis), which leverages recent advances in Vision Language Models (VLMs). We apply VCA across a trifecta of HCI methodologies: real user studies, interviews, and data donation. This allows us to understand temporal aspects of how TikTok's recommendation algorithm is perceived by users, how it is affected by user interactions, and how it aligns with user history; how sensitive users are to the order of recommended videos; and how predictable the algorithm's effectiveness may be. While it is not our goal to reverse-engineer TikTok's recommendation algorithm, our new findings indicate behavioral aspects that the TikTok user community can benefit from.
💡 Research Summary
The paper investigates the dynamic interplay between TikTok’s proprietary recommendation algorithm and user experience by introducing a novel measurement tool called Video Content Analysis (VCA). Unlike prior work that relied on textual metadata (titles, hashtags) or labor‑intensive manual video coding, VCA leverages recent Vision‑Language Models (specifically Video‑LLaMA) to generate rich multimodal embeddings that capture both visual and auditory aspects of short videos. These embeddings are clustered to produce human‑interpretable content categories, enabling large‑scale, content‑aware analysis without extensive annotation.
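The embed-then-cluster step can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the random vectors stand in for per-video Video-LLaMA embeddings, and the hand-rolled k-means is just one plausible choice of clustering method (the summary does not specify which algorithm VCA uses).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for VCA output: one multimodal embedding per video.
# Here we fake 200 videos with 64-dimensional embeddings.
embeddings = rng.normal(size=(200, 64))

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means over video embeddings (illustrative only)."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each embedding to its nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned embeddings.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Each resulting cluster would then be inspected and given a
# human-interpretable content-category label.
labels, centers = kmeans(embeddings, k=8)
print("cluster sizes:", np.bincount(labels, minlength=8))
```

The payoff of this design is that human effort is spent once per cluster (naming it) rather than once per video, which is what makes the analysis scale to millions of watch events.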
The authors apply VCA within a mixed‑methods HCI study comprising three pillars: (1) a controlled user experiment with 68 participants who browse TikTok for ten minutes while rating engagement, content quality, and interface satisfaction; (2) semi‑structured interviews to elicit users’ folk theories about the algorithm; and (3) a data‑donation effort that collects over 2.65 million video watch events. This design allows the researchers to examine short‑term (session‑level) and long‑term (months‑level) phenomena.
Five research questions guide the work: (RQ1) how recommendation content evolves over short and long timescales; (RQ2) the relationship between user actions (likes, shares) and subsequent recommendations; (RQ3) whether users engage more with videos aligned with their historical consumption; (RQ4) the impact of sequence continuity on user satisfaction; and (RQ5) the predictability of watch behavior using video‑content features.
Key findings include: (1) Users spend a large portion of daily time on a narrow set of topics, yet their topic interests shift frequently, indicating a desire for novelty. (2) Over time, recommended feeds become increasingly individualized rather than converging on trending content, suggesting the algorithm builds personalized topic trajectories. (3) Immediate interactions (likes, shares) have limited influence on the next few recommendations, contradicting the common folk belief that the algorithm instantly "learns" from each action. (4) Users prefer seeing topic changes within a session; novelty boosts perceived content quality. (5) Disrupting the natural order of the recommendation sequence (by silently dropping a few videos) quickly degrades user experience, highlighting the importance of sequential continuity for immersion. (6) Using VCA-derived embeddings, the authors predict whether a user will watch more than 10% of a video with 70% accuracy, outperforming models that rely solely on interaction logs.
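Finding (6) amounts to a binary classification task: given a video's content embedding, predict whether the user watches past the 10% mark. The sketch below uses synthetic data and a plain gradient-descent logistic regression purely to make the task concrete; the paper's actual features, model, and 70% figure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: X holds content embeddings (e.g., VCA-derived),
# y is 1 if the user watched more than 10% of the video.
# Data is synthetic and illustrative only.
n, d = 1000, 16
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train logistic regression by full-batch gradient descent.
w = np.zeros(d)
lr = 0.1
for _ in range(300):
    p = sigmoid(X @ w)           # predicted watch probability
    w -= lr * X.T @ (p - y) / n  # gradient of the log loss

accuracy = float(((sigmoid(X @ w) > 0.5) == y).mean())
print(f"train accuracy: {accuracy:.2f}")
```

The paper's comparison point is that models trained only on interaction logs do worse, i.e., the content embedding carries predictive signal that raw engagement counts do not.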
These results collectively challenge the prevailing narrative that TikTok’s algorithm is highly reactive to single actions and instead portray it as a system that emphasizes long‑term user profiling, content diversity, and smooth sequential flow. The study also demonstrates the practical utility of VCA as a scalable, multimodal analysis framework that can be extended to other video‑centric platforms or domains where large‑scale video understanding is required. By providing a concrete, reproducible pipeline, the work offers HCI researchers a ready‑to‑use tool for bridging the gap between algorithmic black‑boxes and user‑centered insights.