An Estimation of Online Video User Engagement from Features of Continuous Emotions

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original paper on arXiv.

💡 Research Summary

The paper investigates how continuous affective signals—arousal, valence, and a newly introduced trustworthiness dimension—relate to YouTube user‑engagement metrics such as daily views, likes, dislikes, comment counts, and comment‑like counts. The authors leverage the MuSe-CaR dataset, which covers 300 vehicle‑review videos (≈40 h of video) and provides roughly 600 hours of frame‑wise annotation across the three dimensions (40 h × 3 dimensions × 5 annotators). Five annotators recorded each dimension with a joystick at 0.25 Hz; the individual traces were fused into a gold‑standard signal via the Evaluator Weighted Estimator (EWE) and subsequently z‑standardized.
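The fusion step can be sketched as follows. This is a minimal NumPy illustration assuming the common EWE formulation, in which each annotator is weighted by the Pearson correlation of their trace with the mean trace; the paper's exact weighting details and any time-alignment steps are not reproduced here.

```python
import numpy as np

def ewe_fuse(ratings):
    """Fuse per-annotator traces (n_annotators x n_frames) into one gold
    standard: weight each annotator by their correlation with the mean
    trace, average, then z-standardize (a sketch, not the paper's code)."""
    ratings = np.asarray(ratings, dtype=float)
    mean_trace = ratings.mean(axis=0)
    # Pearson correlation of each annotator's trace with the mean trace
    weights = np.array([np.corrcoef(r, mean_trace)[0, 1] for r in ratings])
    weights = np.clip(weights, 0.0, None)   # drop negatively correlated raters
    weights = weights / weights.sum()
    fused = weights @ ratings
    # z-standardize the fused signal
    return (fused - fused.mean()) / fused.std()

# toy example: five annotators rating the same 8-frame segment
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 8))
traces = base + 0.1 * rng.standard_normal((5, 8))
gold = ewe_fuse(traces)
```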

In addition to the affective time series, the authors harvested per‑day engagement statistics (Vp/d, Lp/d, Dp/d, Cp/d, LCp/d) from the YouTube API and scraped over 79,000 comments. A random subset of 1,100 comments was manually labeled as positive, neutral, negative, or not applicable by three annotators (majority voting; inter‑rater agreement of 0.47). These labeled comments were used to fine‑tune an ALBERT‑based transformer for sentiment classification, which then automatically labeled the remaining comments.
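The majority-vote gold labeling over the three annotators can be illustrated with a small helper. The label strings and the handling of ties below are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter

# assumed label names, matching the four classes described above
LABELS = {"positive", "neutral", "negative", "not_applicable"}

def majority_label(votes):
    """Majority vote over one comment's annotator labels.
    Returns None when no strict majority exists (tie handling assumed)."""
    label, top = Counter(votes).most_common(1)[0]
    return label if top > len(votes) / 2 else None

# toy examples with three annotators per comment
print(majority_label(["positive", "positive", "neutral"]))   # positive
print(majority_label(["positive", "neutral", "negative"]))   # None (tie)
```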

Feature extraction from the continuous signals comprised basic statistics (mean, standard deviation, and the 5th, 25th, 50th, 75th, and 95th percentiles) and more expressive time‑series descriptors such as count‑below‑mean, number‑of‑peaks, absolute energy, boundary‑range, and fluctuation‑amplitude. In total, roughly 30 features were generated per dimension. Pearson correlation analysis revealed statistically significant (p < 0.05) relationships:
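A few of the named descriptors can be sketched in NumPy. The paper does not spell out its exact definitions here, so the interpretations below (e.g., of boundary‑range as max minus min, or a peak as a sample larger than both neighbors) are common tsfresh‑style readings, not the authors' code.

```python
import numpy as np

def signal_features(x):
    """A subset of the statistics and time-series descriptors named above,
    computed on one 1-D affect signal (definitions assumed, see lead-in)."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "p05": float(np.percentile(x, 5)),
        "p50": float(np.percentile(x, 50)),
        "p95": float(np.percentile(x, 95)),
        "count_below_mean": int((x < x.mean()).sum()),
        "abs_energy": float(np.dot(x, x)),  # sum of squared sample values
        # a "peak" here: strictly larger than both immediate neighbors
        "n_peaks": int(((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])).sum()),
        "boundary_range": float(x.max() - x.min()),
    }

# toy signal: two full sine periods -> two local maxima
sig = np.sin(np.linspace(0, 4 * np.pi, 100))
f = signal_features(sig)
```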

  • Arousal – smaller boundary ranges and lower fluctuation amplitudes correlate positively with likes and comment counts; a higher “count‑below‑mean” (i.e., more low‑arousal segments) is linked to longer view durations.
  • Valence – a greater number of peaks is associated with higher comment volume and a larger proportion of positive comments, suggesting that frequent emotional swings attract discussion.
  • Trustworthiness – higher absolute energy (stronger trust signal) aligns with more likes and comment‑likes, indicating that perceived credibility boosts positive user actions.
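The kind of correlation test behind these findings can be reproduced with `scipy.stats.pearsonr`, which returns both the Pearson coefficient and its p‑value. The data below are purely synthetic stand‑ins for one feature–metric pair (valence peak count vs. comment volume); none of it comes from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# hypothetical per-video data: a "number of peaks" valence feature and
# comment counts, constructed with a positive relation plus noise
n_peaks = rng.integers(1, 20, size=300).astype(float)
comments = 2.0 * n_peaks + rng.normal(0.0, 3.0, size=300)

r, p = pearsonr(n_peaks, comments)
significant = p < 0.05   # the significance threshold used in the paper
```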

For prediction, the authors framed each engagement metric as a regression problem and trained linear‑kernel Support Vector Regressors (SVR). They compared three feature sets: (i) the full set of all extracted features, (ii) a semi‑automatic “cross‑task” selection based on correlation thresholds, and (iii) a fully automatic task‑specific selection using recursive feature elimination. The selected subsets consistently outperformed the full set. For example, predicting daily likes (Lp/d) yielded a mean absolute error (MAE) of 1.55 likes/day with all features, which improved to 1.33 likes/day with semi‑automatic selection and 1.23 likes/day with automatic selection (the dataset mean is 9.73 ± 28.75 likes/day). Similar gains were observed for views, comments, and comment‑likes.
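A minimal sketch of this regression setup, assuming scikit‑learn's `SVR` with a linear kernel and recursive feature elimination via `RFE` on synthetic stand‑in data; the paper's actual hyperparameters, splits, and feature counts are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
# hypothetical design matrix: 300 videos x 30 affect features,
# of which only the first 5 actually drive the target
X = rng.normal(size=(300, 30))
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 0.1, size=300)  # stand-in for likes/day

X_tr, X_te, y_tr, y_te = X[:240], X[240:], y[:240], y[240:]

# (i) full feature set
full = SVR(kernel="linear").fit(X_tr, y_tr)
mae_full = mean_absolute_error(y_te, full.predict(X_te))

# (iii) automatic task-specific selection: RFE drops features using
# the linear model's coefficients until 5 remain
rfe = RFE(SVR(kernel="linear"), n_features_to_select=5).fit(X_tr, y_tr)
mae_sel = mean_absolute_error(y_te, rfe.predict(X_te))
```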

The study’s contributions are twofold: (1) it demonstrates that continuous affective and trustworthiness signals, without any audio, visual, or textual content features, can meaningfully predict user engagement; (2) it identifies interpretable, low‑dimensional feature patterns that explain how specific emotional dynamics drive engagement.

Limitations include the domain specificity (vehicle‑review videos), the high annotation cost (≈40 h of video × 3 dimensions × 5 annotators ≈ 600 h of annotation effort), reliance on linear SVR rather than more expressive deep models, and the absence of multimodal fusion with audio/video cues. Future work is suggested to (i) extend the approach to broader content categories, (ii) explore deep regression architectures, (iii) integrate multimodal features, and (iv) develop real‑time affective monitoring for recommendation or moderation systems.

In summary, the paper provides a rigorous, data‑driven analysis linking continuous emotional dynamics—especially arousal stability, valence peak frequency, and trustworthiness intensity—to concrete measures of YouTube audience behavior, offering valuable insights for affect‑aware content creation, recommendation, and platform governance.

