Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming Platforms


Live streaming has become a cornerstone of today’s internet, enabling massive real-time social interactions. However, it faces severe risks arising from sparse, coordinated malicious behaviors among multiple participants, which are often concealed within normal activities and difficult to detect promptly and accurately. In this work, we provide a pioneering study on risk assessment in live streaming rooms, characterized by weak supervision where only room-level labels are available. We formulate the task as a Multiple Instance Learning (MIL) problem, treating each room as a bag and defining structured user-timeslot capsules as instances. These capsules represent subsequences of user actions within specific time windows, encapsulating localized behavioral patterns. Based on this formulation, we propose AC-MIL, an Action-aware Capsule MIL framework that models both individual behaviors and group-level coordination patterns. AC-MIL captures multi-granular semantics and behavioral cues through a serial and parallel architecture that jointly encodes temporal dynamics and cross-user dependencies. These signals are integrated for robust room-level risk prediction, while also offering interpretable evidence at the behavior segment level. Extensive experiments on large-scale industrial datasets from Douyin demonstrate that AC-MIL significantly outperforms MIL and sequential baselines, establishing new state-of-the-art performance in room-level risk assessment for live streaming. Moreover, AC-MIL provides capsule-level interpretability, enabling identification of risky behavior segments as actionable evidence for intervention. The project page is available at: https://qiaoyran.github.io/AC-MIL/.


💡 Research Summary

Abstract
The paper tackles the problem of detecting risky live‑streaming rooms on platforms such as Douyin, where only room‑level supervision is available. By formulating the task as a Multiple Instance Learning (MIL) problem, each room is treated as a bag and each “user‑timeslot capsule”—a short subsequence of actions performed by a specific user within a fixed time window—is treated as an instance. The authors propose Action‑aware Capsule MIL (AC‑MIL), a hierarchical architecture that combines serial and parallel modules to capture temporal dynamics, cross‑user dependencies, and multi‑granular semantics. Extensive experiments on large‑scale industrial data demonstrate that AC‑MIL outperforms a wide range of MIL, time‑series, and graph‑based baselines, while also providing capsule‑level risk attributions for interpretability.
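
The bag-and-instance framing above can be made concrete with a small sketch. It is not the authors' implementation: the `Action` record, the `build_capsules` helper, the 60-second window, and the sample actions are all hypothetical, chosen only to show how one room's actions might be grouped into user-timeslot capsules.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    """One action in a room (field names are assumptions for illustration)."""
    user: str
    timestamp: float  # seconds since the room started (assumed unit)
    action_type: str
    text: str


def build_capsules(actions, window_seconds):
    """Group a room's actions into (user, timeslot) capsules.

    Returns a dict mapping (user, slot_index) to that user's actions in the
    slot, ordered by time. The dict plays the role of the MIL "bag" and each
    value is one "instance".
    """
    capsules = defaultdict(list)
    for action in sorted(actions, key=lambda a: a.timestamp):
        slot = int(action.timestamp // window_seconds)
        capsules[(action.user, slot)].append(action)
    return dict(capsules)


# Hypothetical room: only the room as a whole would carry a risk label.
room = [
    Action("streamer", 3.0, "speak", "limited offer"),
    Action("viewer_a", 5.0, "comment", "I bought it, works great"),
    Action("viewer_a", 42.0, "gift", ""),
    Action("streamer", 65.0, "speak", "scan the code"),
]
bag = build_capsules(room, window_seconds=60)

for (user, slot), capsule in bag.items():
    print(user, slot, [a.action_type for a in capsule])
```

Under weak supervision, a model would see only a single label for `bag`, never one per capsule, so any capsule-level risk attribution has to be inferred by the model rather than learned from capsule labels.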

1. Introduction
Live streaming services have become a major medium for social interaction, entertainment, and e‑commerce. Their real‑time, open nature, however, enables coordinated malicious behaviors such as fraud, where streamers and planted viewers collaborate to lure audiences into scams. Detecting such risks is challenging because (i) malicious cues are sparse, indirect, and embedded in massive multimodal action streams; (ii) systems must achieve high recall with low false‑alarm rates; and (iii) detection must be near real‑time. The authors argue that weak supervision—only a binary label per room—is realistic for industrial settings, motivating a MIL formulation.

2. Problem Formulation
An action is defined as a 4‑tuple (user, timestamp, action‑type, text). A live‑streaming room over a horizon

