YouTube UGC Dataset for Video Compression Research
Non-professional video, commonly known as User Generated Content (UGC), has become very popular in today's video-sharing applications. However, traditional metrics used in compression and quality assessment, such as BD-Rate and PSNR, are designed for pristine originals, so their accuracy drops significantly when they are applied to non-pristine originals (the majority of UGC). Understanding the difficulties UGC poses for compression and quality assessment is important, but few public UGC datasets are available for research. This paper introduces a large-scale UGC dataset (1500 clips, each 20 seconds long) sampled from millions of YouTube videos. The dataset covers popular categories like Gaming and Sports, as well as newer features like High Dynamic Range (HDR). Besides a novel sampling method based on features extracted during encoding, challenges for UGC compression and quality evaluation are also discussed, and shortcomings of traditional reference-based metrics on UGC are addressed. We demonstrate a promising way to evaluate UGC quality with no-reference objective quality metrics, and evaluate the dataset with three such metrics (Noise, Banding, and SLEEQ).
💡 Research Summary
The paper “YouTube UGC Dataset for Video Compression Research” addresses a critical gap in video compression research by introducing a large-scale, publicly available dataset specifically focused on User Generated Content (UGC). It highlights the fundamental mismatch between traditional video quality assessment metrics and the reality of UGC, where source videos are often “non-pristine,” containing pre-existing artifacts like noise, blur, and compression from consumer devices.
The core contribution is the dataset itself: 1500 video clips, each 20 seconds long, sampled from 1.5 million Creative Commons licensed YouTube videos. The clips span 15 diverse categories (e.g., Gaming, Sports, Vlog, HDR, VR) and multiple resolutions (from 360p to 4K), stored in raw YUV 4:2:0 format. A key innovation lies in the novel sampling methodology designed to ensure the dataset is representative of the massive source pool. Instead of relying on superficial metadata, the authors extract four complexity features directly from encoding logs using an H.264 encoder: Spatial Complexity (from I-frame bitrate), Color Complexity (ratio of chroma to luma error), Temporal Complexity (ratio of P-frame to I-frame bitrate), and Chunk Variation (standard deviation of bitrate across 1-second segments). These features directly relate to compression difficulty and quality consistency in practical encoding pipelines. A systematic sampling algorithm in this 4D feature space ensures high coverage (average 89%) of the original video population, resulting in a less “spiky” and more representative distribution.
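The four encoding-derived features above can be sketched in a few lines. This is a simplified toy version, not the paper's actual pipeline: it assumes per-frame bit counts, frame types, and per-frame chroma/luma distortions have already been pulled from an encoder's logs (the paper uses H.264), and the exact units and normalizations are hypothetical.

```python
import numpy as np

def complexity_features(frame_bits, frame_types, chroma_err, luma_err, fps=30):
    """Toy sketch of the paper's four features from encoding statistics.
    frame_bits: bits spent per frame; frame_types: 'I' or 'P' per frame;
    chroma_err / luma_err: per-frame distortion values (hypothetical units)."""
    frame_bits = np.asarray(frame_bits, dtype=float)
    is_i = np.array([t == "I" for t in frame_types])

    spatial = frame_bits[is_i].mean()                # I-frame bitrate
    temporal = frame_bits[~is_i].mean() / spatial    # P-frame / I-frame ratio
    color = np.mean(chroma_err) / np.mean(luma_err)  # chroma / luma error ratio

    # Chunk variation: spread of bitrate across 1-second segments.
    n_chunks = len(frame_bits) // fps
    chunks = frame_bits[: n_chunks * fps].reshape(n_chunks, fps).sum(axis=1)
    chunk_var = chunks.std()
    return spatial, color, temporal, chunk_var
```

Each clip then maps to a point in this 4D space, and the systematic sampling step selects clips so the sampled points cover the feature distribution of the full corpus.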
The paper provides a compelling analysis of the challenges in UGC quality assessment. It demonstrates with visual examples how reference-based metrics like PSNR, SSIM, and VMAF can fail dramatically when the reference itself is flawed. For instance, they may indicate severe quality loss when the compressed version is visually similar to a noisy original, or fail to credit compression that actually reduces pre-existing artifacts. The authors identify two root causes: the “non-pristine original” and the “mismatch between absolute and reference quality.”
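The noisy-reference failure mode is easy to reproduce numerically. The sketch below uses synthetic frames (a gradient "true scene", a noisy upload, and an encode whose filtering suppresses most of the noise); the frames and noise levels are invented for illustration, but the effect mirrors the paper's argument: measured against the only available reference (the noisy upload), the encode scores worse than it does against the unseen clean scene.

```python
import numpy as np

rng = np.random.default_rng(0)

def psnr(ref, test, peak=255.0):
    """Standard PSNR in dB between two same-sized frames."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical pristine scene -- never observed in a UGC pipeline.
clean = np.tile(np.linspace(60.0, 200.0, 64), (64, 1))
# What actually gets uploaded: the scene plus heavy camera noise.
upload = clean + rng.normal(0.0, 20.0, clean.shape)
# An encode whose filtering happens to remove most of that noise.
encode = clean + rng.normal(0.0, 5.0, clean.shape)

# Against the noisy upload, PSNR penalizes the encode for *removing* noise,
# even though the encode is much closer to the true scene.
misleading = psnr(upload, encode)   # low score vs. the flawed reference
faithful = psnr(clean, encode)      # high score vs. the (unavailable) clean scene
```

Here `faithful` comes out well above `misleading`, illustrating how a reference-based score can report severe "loss" for an encode that actually improved the picture.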
As a promising alternative, the paper advocates for the use of no-reference (NR) objective quality metrics. It evaluates the newly created dataset using three such metrics: a noise detector, a banding artifact detector, and SLEEQ (a metric for compression artifacts in natural scenes). This analysis shows that most uploaded YouTube videos do not contain severe artifacts and reveals category-specific trends (e.g., more banding in Animation videos). The suggested approach is to apply these NR metrics independently to the source and encoded versions and use the difference to assess quality impact, moving away from a single flawed reference comparison.
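The source-minus-encode idea can be sketched with a stand-in NR metric. The `nr_noise_score` below (mean absolute Laplacian response, a crude noisiness proxy) is my own illustrative substitute for the paper's actual detectors; only the scoring pattern, applying the metric to each version independently and differencing, reflects the suggested approach.

```python
import numpy as np

def nr_noise_score(frame):
    """Toy no-reference noisiness proxy: mean |Laplacian| response.
    (Illustrative stand-in for the paper's Noise/Banding/SLEEQ detectors.)"""
    f = np.asarray(frame, float)
    lap = (-4.0 * f[1:-1, 1:-1] + f[:-2, 1:-1] + f[2:, 1:-1]
           + f[1:-1, :-2] + f[1:-1, 2:])
    return float(np.mean(np.abs(lap)))

def nr_quality_delta(source, encoded):
    """Score source and encode independently, then difference the scores.
    A positive delta means the encode carries less of this artifact than
    the non-pristine source did."""
    return nr_noise_score(source) - nr_noise_score(encoded)
```

Because each version is scored on its own, the comparison no longer assumes the source is pristine: an encode that smooths away upload noise earns a positive delta instead of being penalized against a flawed reference.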
In conclusion, this work provides a valuable resource to spur research into compression algorithms and quality assessment methods tailored for the real-world conditions of UGC. It makes the dataset available for download and positions the effective evaluation of quality degradation for non-pristine content as a key open research question.