Rate-Distortion Optimization for Ensembles of Non-Reference Metrics

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Non-reference metrics (NRMs) can assess the visual quality of images and videos without a reference, making them well-suited for the evaluation of user-generated content. Nonetheless, rate-distortion optimization (RDO) in video coding is still mainly driven by full-reference metrics, such as the sum of squared errors, which treat the input as an ideal target. A way to incorporate NRMs into RDO is through linearization (LNRM), where the gradient of the NRM with respect to the input guides bit allocation. While this strategy improves the quality predicted by some metrics, we show that it can yield limited gains or degradations when evaluated with other NRMs. We argue that NRMs are highly non-linear predictors with locally unstable gradients that can compromise the quality of the linearization; furthermore, optimizing a single metric may exploit model-specific biases that do not generalize across quality estimators. Motivated by this observation, we extend the LNRM framework to optimize ensembles of NRMs and, to further improve robustness, we introduce a smoothing-based formulation that stabilizes NRM gradients prior to linearization. Our framework is well-suited to hybrid codecs, and we advocate for its use with overfitted codecs, where it avoids iterative evaluations and backpropagation of neural network-based NRMs, reducing encoder complexity relative to direct NRM optimization. We validate the proposed approach on AVC and Cool-chic, using the YouTube UGC dataset. Experiments demonstrate consistent bitrate savings across multiple NRMs with no decoder complexity overhead and, for Cool-chic, a substantial reduction in encoding runtime compared to direct NRM optimization.


💡 Research Summary

The paper addresses a fundamental mismatch in modern video coding: while rate‑distortion optimization (RDO) still relies almost exclusively on full‑reference metrics such as sum‑of‑squared‑errors (SSE) or PSNR, the visual quality of user‑generated content (UGC) is more accurately reflected by non‑reference metrics (NRMs) that do not require a pristine reference. Existing attempts to incorporate NRMs into RDO use a linearization approach (LNRM), where the gradient of the NRM with respect to the input image is taken as a proxy for perceptual importance and guides bit allocation. The authors demonstrate that this strategy is fragile because NRMs are typically deep‑learning models with highly non‑linear response surfaces and locally unstable gradients. Optimizing a single NRM can therefore improve the quality predicted by that metric while degrading the scores of other NRMs, revealing model‑specific biases that do not generalize.
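The LNRM idea described above can be illustrated with a minimal sketch: treat the gradient of an NRM with respect to the input image as a per-pixel importance map that could weight the distortion term in RDO. Everything here is hypothetical, `toy_nrm` is a stand-in (it simply penalizes high-frequency energy) and is not one of the learned NRMs used in the paper, and the finite-difference gradient replaces the backpropagation a real implementation would use.

```python
import numpy as np

def toy_nrm(x):
    # Hypothetical stand-in for a learned no-reference metric:
    # penalizes high-frequency energy. NOT a real NRM.
    dx = np.diff(x, axis=0)
    dy = np.diff(x, axis=1)
    return float((dx**2).sum() + (dy**2).sum())

def nrm_gradient(x, eps=1e-3):
    # Finite-difference gradient of the metric w.r.t. the input image
    # (a real system would backpropagate through the NRM network).
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp = x.copy(); xp[idx] += eps
        xm = x.copy(); xm[idx] -= eps
        g[idx] = (toy_nrm(xp) - toy_nrm(xm)) / (2 * eps)
    return g

def importance_map(x):
    # LNRM-style proxy: |d NRM / d x| flags the pixels whose distortion
    # most affects the predicted quality; RDO can weight SSE by this map.
    g = np.abs(nrm_gradient(x))
    return g / (g.max() + 1e-12)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
w = importance_map(img)
print(w.shape)
```

The normalized map `w` could then scale per-pixel squared errors in the rate-distortion cost, steering bits toward regions the metric deems perceptually important.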

To overcome these limitations, the authors propose two complementary extensions to the LNRM framework. First, they introduce an ensemble‑based objective that simultaneously optimizes a set of NRMs. By forming a weighted sum of the individual NRM losses, the encoder is prevented from over‑fitting to any single model and instead seeks a solution that is robust across the whole ensemble. Second, they apply a smoothing operation to the NRM gradients before linearization. Practically, this means either blurring the input image with a small Gaussian kernel prior to gradient computation or low‑pass filtering the raw gradient field itself. The smoothing attenuates abrupt changes in the gradient, yielding a more stable linear approximation and reducing the risk of allocating bits based on noisy or misleading gradient spikes.
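The two extensions can be sketched together: combine per-metric gradients with fixed weights, then low-pass filter the combined gradient field before using it for linearization. This is an illustrative sketch only; the random "gradients", the ensemble weights, and the Gaussian-filter parameters are all assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalized 1-D Gaussian kernel for separable filtering.
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(field, sigma=1.0):
    # Separable Gaussian low-pass filter applied to a gradient field,
    # attenuating the locally unstable spikes mentioned in the text.
    k = gaussian_kernel1d(sigma, radius=int(3 * sigma))
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, field)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    return out

def ensemble_importance(grads, weights, sigma=1.0):
    # Weighted sum of per-metric gradients (illustrative weights),
    # smoothed before being turned into an importance map, so no single
    # metric's noisy gradient dominates the bit allocation.
    acc = sum(w * g for w, g in zip(weights, grads))
    s = np.abs(smooth(acc, sigma))
    return s / (s.max() + 1e-12)

rng = np.random.default_rng(1)
# Stand-ins for the gradients of three different NRMs.
grads = [rng.standard_normal((16, 16)) for _ in range(3)]
w_map = ensemble_importance(grads, weights=[0.5, 0.3, 0.2])
print(w_map.shape)
```

Whether smoothing is applied to the input before gradient computation or to the gradient field afterwards (the paper mentions both options), the effect is the same in spirit: a more stable linear approximation of the ensemble objective.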

The proposed “smoothed ensemble LNRM” is evaluated on two very different codecs: the classic hybrid AVC (H.264) encoder and the recent neural‑network‑based Cool‑chic codec, which is deliberately over‑fitted to a specific content set. Experiments use the YouTube UGC dataset and a representative collection of NRMs (NIQE, BRISQUE, VMAF‑NR, among others). Three configurations are compared: (i) single‑metric LNRM, (ii) ensemble LNRM without smoothing, and (iii) the full smoothed‑ensemble method. Results show consistent bitrate savings of roughly 5 %–12 % across all NRMs when the full method is employed, confirming that the approach generalizes beyond any individual metric. For Cool‑chic, the smoothed‑ensemble method also reduces encoding time by more than 30 % relative to direct NRM‑based optimization, because the encoder no longer needs to evaluate the NRM and back‑propagate gradients for every candidate bitstream. Importantly, decoder complexity remains unchanged, making the technique attractive for real‑world deployment.

The paper concludes that NRMs, despite their promise, require careful handling when used to drive RDO. By stabilizing gradients and optimizing an ensemble rather than a single metric, the authors achieve a practical, low‑overhead solution that can be integrated into both traditional hybrid and modern neural codecs. The work opens several avenues for future research, including dynamic weighting of ensemble members, adaptive smoothing based on content characteristics, and real‑time streaming scenarios where latency constraints are tight. Overall, the study provides a compelling argument for moving beyond full‑reference metrics in video compression and offers a concrete, experimentally validated pathway to do so.

