Spatiotemporal Adaptive Quantization for Video Compression Applications


JCT-VC HEVC HM 16 includes a Coding Unit (CU) level adaptive Quantization Parameter (QP) technique named AdaptiveQP. It is designed to perceptually adjust the QP in Y, Cb and Cr Coding Blocks (CBs) based only on the variance of samples in a luma CB. In this paper, we propose an adaptive quantization technique that consists of two contributions. The first contribution relates to accounting for the variance of chroma samples, in addition to luma samples, in a CU. The second contribution relates to accounting for CU temporal information as well as CU spatial information. Moreover, we integrate into our method a lambda-refined QP technique to reduce the complexity associated with multiple QP optimizations in the Rate-Distortion Optimization process. We evaluate the proposed technique on 4:4:4, 4:2:2, 4:2:0 and 4:0:0 YCbCr test sequences, for which we quantify the results using the Bjøntegaard Delta Rate (BD-Rate) metric. Our method achieves a maximum BD-Rate reduction of 23.1% (Y), 26.7% (Cr) and 25.2% (Cb). Furthermore, a maximum encoding time reduction of 4.4% is achieved.


💡 Research Summary

The paper addresses a notable limitation of the Adaptive Quantization Parameter (AdaptiveQP) technique employed in the HEVC (High Efficiency Video Coding) reference software HM 16.0. The original AdaptiveQP adjusts the quantization parameter (QP) for each Coding Unit (CU) based solely on the variance of luma (Y) samples, ignoring chroma information and temporal dynamics. While this approach aligns with the human visual system’s sensitivity to luminance changes, it can lead to sub‑optimal compression quality, especially for content where color fidelity is critical (e.g., 4:4:4 video).
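To make the baseline concrete, the following is a minimal sketch of how HM-style AdaptiveQP maps a CU's luma activity to a QP offset. It assumes activity is derived from luma sample variance and uses the normalized-activity formula found in the HM reference encoder; the exact constants and clipping in HM 16.0 may differ.

```python
import math

def hm_adaptive_qp_offset(cu_activity, avg_activity, qp_adapt_range=6.0):
    """Sketch of an HM-style luma-only AdaptiveQP offset.

    cu_activity  -- activity of this CU (assumed: 1 + luma sample variance)
    avg_activity -- average activity over the picture
    qp_adapt_range -- maximum QP offset magnitude (assumed default)
    """
    max_q_scale = 2.0 ** (qp_adapt_range / 6.0)
    # Normalized activity: 1.0 when the CU matches the picture average,
    # approaching max_q_scale (or its inverse) for extreme CUs.
    norm_act = (max_q_scale * cu_activity + avg_activity) / \
               (cu_activity + max_q_scale * avg_activity)
    # Scale to QP units: +6 QP per doubling of the quantizer step size.
    return 6.0 * math.log2(norm_act)
```

Note that a high-variance (heavily textured) CU receives a positive offset, i.e. coarser quantization, exploiting spatial masking; the limitation the paper targets is that `cu_activity` here sees luma only.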

To overcome these shortcomings, the authors propose a two‑fold enhancement. First, they incorporate the variance of the chroma components (Cb and Cr) into the QP decision process. For each CU, the variances of Y, Cb, and Cr are computed, weighted, and summed to form a composite variance metric. This metric captures both luminance and chroma texture, allowing the encoder to allocate bits more intelligently across all three channels. The weighting factors are empirically tuned, with higher emphasis on chroma for formats that retain full color resolution.
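The first contribution can be sketched as a weighted sum of per-component variances. The weights below are illustrative placeholders, not the paper's empirically tuned values, and the paper's per-format emphasis (e.g. heavier chroma weighting for 4:4:4) is not reproduced here.

```python
import numpy as np

def composite_variance(y, cb, cr, weights=(0.5, 0.25, 0.25)):
    """Composite spatial activity for one CU from all three components.

    y, cb, cr -- sample arrays of the CU's Y, Cb and Cr blocks
    weights   -- illustrative (hypothetical) per-component weights
    """
    wy, wcb, wcr = weights
    # Each component contributes its own texture measure, so a CU that is
    # flat in luma but textured in chroma no longer looks "flat" to the
    # QP decision.
    return wy * np.var(y) + wcb * np.var(cb) + wcr * np.var(cr)
```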

Second, the method adds a temporal dimension to the QP adaptation. The authors calculate a temporal difference measure—such as Sum of Absolute Differences (SAD) or Mean Squared Error (MSE)—between the current CU and the CU at the same spatial location in the preceding frame. This temporal difference serves as a weight that modulates the composite variance. In regions with significant motion, the temporal weight is large, prompting a lower QP to preserve detail; in static regions, the weight is small, allowing a higher QP and thus greater bit savings. By jointly considering spatial variance and temporal change, the algorithm achieves a more balanced rate‑distortion trade‑off.
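A minimal sketch of the temporal step, using SAD against the co-located CU of the previous frame. The scaling constant `k` and the specific way the weight modulates activity (division, so that high motion lowers the effective activity and hence the QP) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def temporal_weight(cur_y, prev_y, k=0.01):
    """Map the SAD between co-located CUs to a multiplicative weight >= 1.

    k is a hypothetical scaling constant controlling motion sensitivity.
    """
    sad = np.abs(cur_y.astype(np.int64) - prev_y.astype(np.int64)).sum()
    return 1.0 + k * sad / cur_y.size  # normalize SAD per sample

def modulated_activity(composite_var, t_weight):
    """Combine spatial and temporal information into one activity value.

    Dividing by the temporal weight means high-motion CUs report lower
    activity, which drives the QP down and preserves detail; static CUs
    keep their full spatial activity and can absorb a higher QP.
    """
    return (1.0 + composite_var) / t_weight
```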

A practical challenge of this richer QP selection is the increased computational load during Rate‑Distortion Optimization (RDO), which traditionally evaluates multiple QP candidates per CU. To mitigate this, the authors introduce a “lambda‑refined QP” technique. In the standard RDO cost function J = D + λ·R, λ is a function of QP. The proposed method refines λ based on the composite variance and temporal weight, effectively narrowing the set of QP candidates that need to be examined. Experimental results show that the candidate set size is reduced by roughly 30 % while the BD‑Rate penalty remains below 0.2 %.
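The candidate-narrowing idea can be sketched by inverting the standard HEVC relation between lambda and QP, roughly lambda ≈ alpha · 2^((QP−12)/3), and searching only a small window around the lambda-implied QP. Here alpha is set to 1 for illustration (in practice it depends on slice type and configuration), and the window radius is a placeholder, not the paper's value.

```python
import math

def qp_from_lambda(lmbda):
    """Invert lambda = alpha * 2**((QP - 12) / 3), with alpha = 1 assumed."""
    return 12.0 + 3.0 * math.log2(lmbda)

def refined_qp_candidates(lmbda, radius=1):
    """Narrow the RDO sweep to QPs near the lambda-implied value.

    Instead of evaluating J = D + lambda * R over a wide QP range per CU,
    only 2 * radius + 1 candidates (clipped to the legal 0..51 range)
    are examined -- a sketch of the lambda-refined complexity reduction.
    """
    center = round(qp_from_lambda(lmbda))
    return [q for q in range(center - radius, center + radius + 1)
            if 0 <= q <= 51]
```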

The evaluation covers four YCbCr subsampling formats—4:4:4, 4:2:2, 4:2:0, and 4:0:0—across a diverse set of test sequences with varying resolutions (HD to 4K) and frame rates (30 fps, 60 fps). Using the Bjøntegaard ΔRate (BD‑Rate) metric, the proposed scheme achieves maximum reductions of 23.1 % for the Y channel, 26.7 % for Cr, and 25.2 % for Cb relative to the baseline AdaptiveQP. The most pronounced gains appear in the 4:4:4 configuration where chroma information is fully retained. In addition to bitrate savings, the lambda‑refined approach yields a maximum encoding‑time reduction of 4.4 %, despite the extra memory required to store CU statistics from the previous frame (approximately a 12 % increase).

The paper also discusses limitations. The composite weighting parameters are fixed, which may not be optimal for all content types; adaptive or learning‑based weighting could further improve performance. Storing previous‑frame CU data introduces additional memory overhead, potentially challenging low‑power or embedded implementations. Finally, the experiments are conducted offline; real‑time streaming scenarios would need further validation.

In conclusion, the authors present a well‑structured enhancement to HEVC’s CU‑level quantization that simultaneously leverages chroma variance and temporal information while controlling RDO complexity through a refined λ‑based QP selection. The reported BD‑Rate improvements and modest encoding‑time savings demonstrate the practical viability of the approach, making it a promising candidate for next‑generation video codecs that must handle high‑resolution, high‑color‑fidelity content efficiently.

