VICTOR: Dataset Copyright Auditing in Video Recognition Systems

Video recognition systems are increasingly being deployed in daily life, for example in content recommendation and security monitoring. To advance video recognition research, many institutions have released high-quality public datasets under open-source licenses for training advanced models. At the same time, these datasets are susceptible to misuse and infringement. Dataset copyright auditing is an effective way to identify such unauthorized use. However, existing dataset copyright solutions focus primarily on the image domain; the complex nature of video data leaves dataset copyright auditing in the video domain unexplored. Specifically, video data introduces an additional temporal dimension, which poses significant challenges to the effectiveness and stealthiness of existing methods. In this paper, we propose VICTOR, the first dataset copyright auditing approach for video recognition systems. We develop a general and stealthy sample modification strategy that enhances the output discrepancy of the target model. By modifying only a small proportion of samples (e.g., 1%), VICTOR amplifies the impact of the published modified samples on the prediction behavior of the target models. The difference in the model’s behavior on published modified versus unpublished original samples then serves as the key basis for dataset auditing. Extensive experiments on multiple models and datasets highlight the superiority of VICTOR. Finally, we show that VICTOR remains robust under several perturbation mechanisms applied to the training videos or the target models.


💡 Research Summary

The paper introduces VICTOR, the first proactive dataset‑copyright auditing method designed specifically for video recognition systems. Existing auditing techniques focus on images or audio and cannot be directly applied to video because of the additional temporal dimension, variable video lengths, and the complexity of modern video models. VICTOR addresses these challenges by (1) modifying only a tiny fraction of the released dataset (typically ≤1 %), (2) preserving the original class label of each modified sample to avoid side‑effects on model performance, and (3) injecting a subtle, frame‑wise procedural noise that does not alter visual semantics but amplifies the influence of the modified samples on a model’s output distribution.
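The frame-wise procedural-noise idea can be illustrated with a small sketch. The snippet below is not the paper's generator (which is not detailed in this summary); it uses simple value noise, a stand-in from the same procedural-noise family, obtained by bilinearly upsampling a coarse random grid, and adds the same bounded, low-amplitude pattern to every frame so the video's semantics are preserved. The function names, the grid size, and the strength `eps` are illustrative assumptions.

```python
import numpy as np

def value_noise(h, w, grid=8, seed=0):
    """Smooth procedural noise: bilinear upsampling of a coarse random grid.
    (Illustrative stand-in for the paper's procedural-noise generator.)"""
    rng = rng = np.random.default_rng(seed)
    coarse = rng.uniform(-1.0, 1.0, size=(grid + 1, grid + 1))
    # Sample coordinates in the coarse grid's coordinate system.
    ys = np.linspace(0, grid, h, endpoint=False)
    xs = np.linspace(0, grid, w, endpoint=False)
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = (ys - y0)[:, None], (xs - x0)[None, :]
    # Bilinear interpolation between the four surrounding grid values.
    top = coarse[y0][:, x0] * (1 - tx) + coarse[y0][:, x0 + 1] * tx
    bot = coarse[y0 + 1][:, x0] * (1 - tx) + coarse[y0 + 1][:, x0 + 1] * tx
    return top * (1 - ty) + bot * ty  # shape (h, w), values in [-1, 1]

def perturb_video(frames, eps=4.0):
    """Add the same bounded noise pattern to every frame of one video.
    frames: (T, H, W, C) uint8 array; eps: max per-pixel change (assumed)."""
    _, h, w, _ = frames.shape
    noise = value_noise(h, w)[None, :, :, None]  # shared across frames/channels
    out = frames.astype(np.float32) + eps * noise
    return np.clip(out, 0, 255).astype(np.uint8)
```

With `eps=4` the perturbation stays within a few intensity levels per pixel, which is consistent with the stealthiness goal: the modified clips remain visually indistinguishable while still carrying a learnable, dataset-specific signal.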

During the audit phase, the auditor (who knows the original dataset and the modifications) queries a suspect model in a black-box manner. For each modified sample, the auditor also obtains the model’s prediction on the corresponding unmodified version (kept private). The absolute difference between the two output probability vectors is computed. Using a decision threshold estimated from a calibration set of such differences, VICTOR performs a statistical hypothesis test: if the observed differences consistently exceed the threshold, the model is deemed to have been trained on the released dataset; otherwise, it is considered clean. Post-processing steps handle cases where both predictions have low confidence, reducing false positives.
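The decision step described above can be sketched as follows. This is a simplified illustration, not the paper's exact test: it uses the L1 gap between the two probability vectors and a hypothetical majority-vote rule over sample pairs, with the threshold assumed to come from a prior calibration step.

```python
import numpy as np

def audit(mod_probs, orig_probs, threshold, vote_ratio=0.5):
    """Decide whether a suspect model was trained on the released dataset.
    mod_probs, orig_probs: (N, C) softmax outputs of the suspect model on the
    N published modified samples and their private unmodified counterparts.
    threshold: per-pair gap threshold from a calibration set (assumed given).
    """
    # L1 distance between the two output distributions for each sample pair.
    gaps = np.abs(mod_probs - orig_probs).sum(axis=1)
    # Flag the model as trained on the dataset when the gaps consistently
    # exceed the calibrated threshold (illustrative majority-vote rule).
    return bool(np.mean(gaps > threshold) > vote_ratio)
```

A model trained on the modified release reacts strongly to the injected pattern, so its predictions on modified and original versions diverge; a clean model treats the two versions nearly identically, keeping the gaps below the threshold.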

The authors evaluate VICTOR on three widely used video datasets—Kinetics‑400, UCF‑101, and HMDB‑51—and on six state‑of‑the‑art video models covering three families: 2D‑CNN + RNN, 3D‑CNN, and transformer‑based architectures (e.g., SlowFast, TimeSformer, ViViT). With only 1 % of samples modified, VICTOR achieves auditing accuracies ranging from 95 % to nearly 100 % across all model–dataset combinations. Robustness is further demonstrated against three common evasion strategies: (i) input preprocessing such as frame sampling and cropping, (ii) training‑time interventions like data augmentation and regularization, and (iii) post‑training adjustments including temperature scaling. In all cases, performance degradation is minimal, confirming that the procedural‑noise perturbation remains effective despite realistic transformations.

Key contributions are: (1) defining the problem of video‑dataset copyright auditing and proposing the first solution, (2) a label‑preserving, low‑cost perturbation mechanism that leverages procedural noise to magnify behavioral differences without harming model utility, and (3) a rigorous verification pipeline based on output‑difference thresholds and hypothesis testing that works under black‑box access. The paper also releases an open‑source implementation.

Limitations include potential attenuation of the procedural noise under aggressive video compression or extreme visual transformations, and reduced statistical power when the modification ratio falls below 0.5 %. Future work may explore adaptive noise strength, extensions to multimodal video data (audio, text), and more sophisticated statistical models for threshold estimation. Overall, VICTOR demonstrates that proactive, stealthy data modifications can provide reliable, scalable copyright protection for video datasets in real‑world deployment scenarios.

