Adapting JPEG XS gains and priorities to tasks and contents


Most current research in the domain of image compression focuses solely on achieving state-of-the-art compression ratios, but such codecs are not always usable in today’s workflows because of constraints on computing resources. Constant market demand for a low-complexity image codec has led to the recent development and standardization of a lightweight image codec named JPEG XS. In this work we show that JPEG XS compression can be adapted to a specific task and content type, such as preserving visual quality on desktop content or maintaining high accuracy in neural network segmentation tasks, by optimizing its gain and priority parameters using the covariance matrix adaptation evolution strategy.


💡 Research Summary

The paper investigates how the lightweight, low‑latency image codec JPEG XS can be tuned for specific tasks and content types by optimizing its sub‑band gain and priority parameters. JPEG XS uses a table of 30 pairs of gains (Gb) and priorities (Pb), one for each sub‑band, which are stored in the image header and influence the truncation position of wavelet coefficients during encoding. The ISO 21122‑1 standard provides a default set of weights that maximizes PSNR, but these weights are not necessarily optimal for other objectives such as perceptual visual quality or the performance of downstream AI models.

To explore this design space, the authors employ the Covariance Matrix Adaptation Evolution Strategy (CMA‑ES), a derivative‑free evolutionary optimizer well‑suited for low‑dimensional, non‑convex problems. The initial solution X₀ is the standard ISO weight set, and the initial standard deviation σ₀ is derived from the natural variability of each parameter. With the pycma implementation, the population size is left at its automatically chosen value of 14. Gains are treated as continuous variables during the search: their integer‑rounded values are fed to the encoder, while the fractional parts are sorted in descending order to generate the priority ordering, which reproduces the priority mechanism defined in the JPEG XS specification.
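The mapping from a continuous candidate vector to gains and a priority ordering can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, "integer‑rounded" is assumed here to mean taking the floor, and the rank convention (rank 0 for the largest fractional part) is an assumption about how the ordering maps onto the spec's priority field. The population size helper uses pycma's documented default rule, 4 + floor(3 ln N), which gives 14 for 30 parameters.

```python
import math

def default_popsize(n_params):
    # pycma's default population size rule: 4 + floor(3 * ln(N))
    return 4 + int(3 * math.log(n_params))

def decode_candidate(x):
    """Split a continuous vector into integer gains and a priority ranking.

    Assumption: the gain is the floor of each value, and bands are ranked
    by fractional part, largest first (ties broken by band index).
    """
    gains = [math.floor(v) for v in x]
    fracs = [v - g for v, g in zip(x, gains)]
    order = sorted(range(len(x)), key=lambda i: (-fracs[i], i))
    priorities = [0] * len(x)
    for rank, band in enumerate(order):
        priorities[band] = rank
    return gains, priorities

print(default_popsize(30))                   # 14
print(decode_candidate([2.75, 1.10, 3.50]))  # ([2, 1, 3], [0, 2, 1])
```

Encoding both quantities in one real number keeps the search space at one dimension per sub‑band, so the optimizer never has to handle a separate permutation variable.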

Two families of experiments are conducted. The first targets human visual system (HVS) metrics: MS‑SSIM and PSNR. Training data consist of 240 “featured” pictures from Wikimedia Commons (scaled and cropped to 768 × 512) and a separate set of 200 synthetic desktop screenshots (1920 × 1080). For each content class, the optimizer runs 4 000 function evaluations at three target bitrates (1.0, 3.0, 5.0 bpp). The fitness value, which the optimizer minimizes, is the average over the set of 1 − MS‑SSIM or −PSNR (or, in the segmentation experiments, the prediction error).
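As a rough illustration of the PSNR variant of this fitness, the sketch below averages the negative PSNR over a set of (original, decoded) pairs so that a minimizer improves quality. The function names are hypothetical, images are assumed to be flat sequences of 8‑bit samples, and the encode/decode step itself is left out. The byte budget helper simply converts a target in bits per pixel into a file size.

```python
import math

def psnr(original, decoded, peak=255.0):
    # Peak signal-to-noise ratio in dB for two equal-length sample sequences.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def psnr_fitness(pairs):
    # Average negative PSNR over (original, decoded) pairs; lower is better.
    return -sum(psnr(o, d) for o, d in pairs) / len(pairs)

def target_bytes(width, height, bpp):
    # Size budget implied by a bitrate expressed in bits per pixel.
    return width * height * bpp / 8

print(target_bytes(768, 512, 1.0))    # 49152.0
print(target_bytes(1920, 1080, 3.0))  # 777600.0
```

An MS‑SSIM fitness would follow the same pattern with `1 - ms_ssim(o, d)` in place of `-psnr(o, d)`, using whichever MS‑SSIM implementation is available.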

The second family addresses a computer‑vision task: semantic segmentation on the Cityscapes dataset. A HarDNet model is first trained on uncompressed images. The optimizer then evaluates candidate weight sets by encoding the 500‑image validation split, decoding, running inference, and computing the mean Intersection‑over‑Union (IoU). Because the model has already seen the training split, only the validation data are used for fitness evaluation. For each bitrate, 1 500 CMA‑ES evaluations are performed, and the process is repeated across three city‑based folds, yielding a weighted average IoU.
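The segmentation fitness can be sketched in the same spirit. The code below computes a mean IoU from flat lists of predicted and ground‑truth class labels and combines per‑fold scores into a weighted average. It is an assumption‑laden simplification: the function names are hypothetical, classes absent from both prediction and ground truth are skipped, ignore labels are not handled, and weighting folds by image count is a guess at what "weighted" means in the summary. Benchmark mIoU on Cityscapes is normally accumulated over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    # Mean intersection-over-union across classes that occur at least once.
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

def weighted_average(scores_and_weights):
    # Combine per-fold scores; weights are assumed to be image counts.
    total = sum(w for _, w in scores_and_weights)
    return sum(s * w for s, w in scores_and_weights) / total

def iou_fitness(miou):
    # CMA-ES minimizes, so the prediction error 1 - mIoU serves as fitness.
    return 1.0 - miou

# Toy labels: class 0 has IoU 1/2, class 1 has IoU 2/3, mean is about 0.583.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```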

All encoding/decoding is parallelized across 78 CPU threads, while MS‑SSIM and segmentation inference run on GPUs. The total optimization time is roughly three weeks. After optimization, the authors interpolate intermediate bitrates by evaluating the top‑10 weight sets from the nearest optimized bitrate and performing a small (≈150) CMA‑ES run at the interpolated point.
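The seeding step for an intermediate bitrate might look like the sketch below. It only covers choosing the nearest optimized bitrate and re‑ranking its stored weight sets at the new bitrate; the short CMA‑ES run that follows is omitted. The names, the use of the best re‑ranked candidate as the starting point, and the toy fitness are all assumptions made for illustration.

```python
def nearest_optimized_bitrate(target_bpp, optimized_bpps):
    # Pick the already-optimized bitrate closest to the target.
    # On an exact tie, min() keeps the first entry in the list.
    return min(optimized_bpps, key=lambda b: abs(b - target_bpp))

def best_seed(candidates, fitness):
    # Re-evaluate stored weight sets at the new bitrate and keep the best
    # (lowest fitness) one as the start of the short refinement run.
    return min(candidates, key=fitness)

print(nearest_optimized_bitrate(2.5, [1.0, 3.0, 5.0]))  # 3.0

# Toy stand-in for "encode at the new bitrate and measure the error".
toy_fitness = lambda weights: sum((w - 2) ** 2 for w in weights)
print(best_seed([[0, 0], [2, 3], [5, 5]], toy_fitness))  # [2, 3]
```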

Results show that HVS‑optimized weights achieve substantial bitrate savings while preserving perceptual quality. At 1 bpp, the MS‑SSIM‑optimized weights require 12‑18 % fewer bits than the standard weights to reach the same MS‑SSIM score; at higher bitrates the savings range from 3.8 % to 18 %. PSNR‑optimized weights provide negligible improvement, confirming that the ISO table is already near‑optimal for this metric.
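A bitrate saving "at the same score" can be computed by reading each rate‑quality curve at a common quality level. The sketch below does this with linear interpolation between measured points. The function names are hypothetical, the curve values are made‑up numbers in arbitrary quality units rather than results from the paper, and the paper's exact interpolation method is not stated in this summary.

```python
def rate_for_quality(curve, target_quality):
    # curve: list of (bpp, quality) points sorted by bpp, quality increasing.
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= target_quality <= q1:
            if q1 == q0:
                return r0
            return r0 + (r1 - r0) * (target_quality - q0) / (q1 - q0)
    raise ValueError("target quality outside the measured range")

def bitrate_savings(reference_curve, optimized_curve, target_quality):
    # Fraction of bits saved by the optimized weights at equal quality.
    r_ref = rate_for_quality(reference_curve, target_quality)
    r_opt = rate_for_quality(optimized_curve, target_quality)
    return (r_ref - r_opt) / r_ref

# Illustrative curves only (arbitrary quality units, not measured data).
reference = [(1.0, 10.0), (3.0, 30.0)]
optimized = [(1.0, 15.0), (3.0, 35.0)]
print(bitrate_savings(reference, optimized, 20.0))  # 0.25
```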

For the AI task, weights optimized for IoU deliver dramatic reductions in required bitrate: 33 % to 59 % fewer bits are needed to maintain the same segmentation accuracy. Interestingly, at 7 bpp the compressed images with optimized weights slightly outperform the uncompressed baseline (IoU 0.7514 vs 0.7506), suggesting that moderate compression can act as a regularizer for the network. Conversely, HVS‑optimized weights do not improve IoU, and AI‑optimized weights do not improve MS‑SSIM or PSNR, underscoring the importance of aligning the fitness function with the desired outcome.

The study demonstrates that, despite the small parameter space, JPEG XS can be effectively customized for diverse objectives without overfitting, even when training on modestly sized datasets. CMA‑ES proves to be a nearly parameter‑free method that converges reliably, which makes it attractive for practical deployment where encoder parameters are tuned once and then embedded in the bitstream. The authors suggest future work on multi‑objective optimization (e.g., jointly maximizing MS‑SSIM and IoU), dynamic adaptation of weights in streaming scenarios, and extending the approach to other lightweight codecs.

In conclusion, the paper provides a clear, reproducible framework for task‑specific JPEG XS weight optimization, delivering measurable bitrate reductions for both perceptual and machine‑vision metrics while preserving full compatibility with standard‑compliant decoders.

