Adaptive Resolution and Chroma Subsampling for Energy-Efficient Video Coding
Conventional video encoders typically employ a fixed chroma subsampling format, such as YUV420, which may not reflect how chroma detail varies across different types of content. This can lead to suboptimal chroma quality and inefficient bitrate allocation. We propose an Adaptive Resolution-Chroma Subsampling (ARCS) framework that jointly optimizes spatial resolution and chroma subsampling to balance perceptual quality and decoding efficiency. ARCS selects an optimal (resolution, chroma format) pair for each bitrate by maximizing a composite quality-complexity objective, while enforcing monotonicity constraints to ensure smooth transitions between representations. Experimental results using x265 show that, compared to fixed-format encoding (YUV444), ARCS achieves on average 13.48% bitrate savings and a 62.18% reduction in decoding time, which we use as a proxy for decoding energy, at the same ColorVideoVDP score. The proposed framework introduces chroma adaptivity as a new control dimension for energy-efficient video streaming.
Research Summary
The paper introduces Adaptive Resolution‑Chroma Subsampling (ARCS), a novel framework that treats chroma subsampling as a first‑class adaptation variable alongside spatial resolution for energy‑efficient video streaming. Conventional adaptive streaming pipelines typically fix the chroma format (most often YUV420) and only vary resolution or bitrate, which ignores the fact that chroma detail varies widely across content and that chroma processing can dominate decoding complexity. ARCS addresses this gap by jointly selecting the optimal pair (resolution r, chroma format c) for each target bitrate b.
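To make the size of the (r, c) search space concrete, the following Python sketch counts raw samples per frame for each combination of resolution and chroma format. The pixel dimensions (1920x1080 for 1080p, 3840x2160 for 2160p) are an assumption based on the usual 16:9 definitions, and the names `CHROMA_DIVISORS`, `RESOLUTIONS`, and `samples_per_frame` are illustrative, not taken from the paper. Raw sample count is only a rough indicator of data volume; it is not a measurement of decoding time.

```python
# Horizontal and vertical chroma subsampling divisors for each format.
CHROMA_DIVISORS = {
    "YUV420": (2, 2),  # chroma halved in both directions
    "YUV422": (2, 1),  # chroma halved horizontally only
    "YUV444": (1, 1),  # chroma at full resolution
}

# Assumed pixel dimensions (standard 16:9); the source names only "1080p" and "2160p".
RESOLUTIONS = {"1080p": (1920, 1080), "2160p": (3840, 2160)}


def samples_per_frame(width: int, height: int, chroma: str) -> int:
    """Return luma plus chroma sample count for one frame (even dimensions assumed)."""
    dx, dy = CHROMA_DIVISORS[chroma]
    luma = width * height
    chroma_plane = (width // dx) * (height // dy)
    return luma + 2 * chroma_plane  # one Y plane, two chroma planes (U and V)


if __name__ == "__main__":
    for res_name, (w, h) in RESOLUTIONS.items():
        for chroma in CHROMA_DIVISORS:
            print(res_name, chroma, samples_per_frame(w, h, chroma))
```

Per pixel, this works out to 1.5 samples for YUV420, 2 for YUV422, and 3 for YUV444, so a YUV444 frame carries twice the raw samples of a YUV420 frame at the same resolution.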
The methodology proceeds in three stages. First, from a high‑quality YUV444 source, multiple versions are generated covering two resolutions (1080p and 2160p) and three chroma formats (YUV420, YUV422, YUV444). High‑quality resampling filters are used to avoid aliasing. Second, each (r, c) configuration is encoded with x265 (slower preset) and decoded with the HM reference decoder. For every configuration the authors record (i) the actual bitrate, (ii) a perceptual quality score Q (ColorVideoVDP in JOD units, complemented by weighted PSNR), and (iii) average decoding time per frame τ_D, which serves as a proxy for decoding energy. Third, a composite objective J = Q − α·log(τ_D) is defined, where α∈
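The selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logarithm is taken to be natural, α is left as a free parameter because its range is not given here, candidates are assumed to be grouped by target bitrate, and the monotonicity constraint is interpreted as "selected quality must not decrease as bitrate increases". The `Measurement` type, `objective`, `select_ladder`, and all numeric values are hypothetical placeholders, not measured results.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    resolution: str      # e.g. "1080p"
    chroma: str          # e.g. "YUV420"
    target_kbps: int     # target bitrate this encode belongs to
    quality_jod: float   # perceptual quality Q
    decode_ms: float     # average decoding time per frame, tau_D


def objective(m: Measurement, alpha: float) -> float:
    """Composite objective J = Q - alpha * log(tau_D), natural log assumed."""
    return m.quality_jod - alpha * math.log(m.decode_ms)


def select_ladder(measurements, alpha):
    """Pick the max-J candidate per target bitrate, in increasing bitrate order.

    Assumed monotonicity rule: skip candidates whose Q is below the Q chosen
    at the previous (lower) bitrate; fall back to all candidates if none pass.
    """
    by_target = {}
    for m in measurements:
        by_target.setdefault(m.target_kbps, []).append(m)

    ladder = {}
    prev_q = float("-inf")
    for target in sorted(by_target):
        candidates = by_target[target]
        allowed = [m for m in candidates if m.quality_jod >= prev_q] or candidates
        best = max(allowed, key=lambda m: objective(m, alpha))
        ladder[target] = best
        prev_q = best.quality_jod
    return ladder


# Hypothetical placeholder measurements, for illustration only.
EXAMPLE = [
    Measurement("1080p", "YUV420", 2000, 8.0, 4.0),
    Measurement("1080p", "YUV444", 2000, 8.2, 8.0),
    Measurement("2160p", "YUV420", 8000, 9.0, 12.0),
    Measurement("1080p", "YUV444", 8000, 9.1, 8.0),
]

if __name__ == "__main__":
    for target, m in select_ladder(EXAMPLE, alpha=1.0).items():
        print(target, m.resolution, m.chroma)
```

With α = 0 the objective reduces to quality alone, so the selection ignores decoding time; larger α values favor configurations that decode faster, which is how the trade-off between perceptual quality and decoding cost is steered in this sketch.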