Transform and Entropy Coding in AV2

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

AV2 is the successor to the AV1 video coding standard developed by the Alliance for Open Media (AOMedia). Its primary objective is to deliver substantial compression gains and subjective quality improvements while maintaining low-complexity encoder and decoder operations. This paper describes the transform, quantization, and entropy coding design in AV2, including redesigned transform kernels and data-driven transforms, expanded transform partitioning, and mode- and coefficient-dependent transform signaling. AV2 introduces several new coding tools, including Intra/Inter Secondary Transforms (IST), Trellis Coded Quantization (TCQ), Adaptive Transform Coding (ATC), Probability Adaptation Rate Adjustment (PARA), Forward Skip Coding (FSC), Cross Chroma Component Transforms (CCTX), and Parity Hiding (PH), as well as improved lossless coding. These advances enable AV2 to deliver the highest quality video experience for video applications at a significantly reduced bitrate.


💡 Research Summary

The paper presents a comprehensive redesign of the transform and entropy coding modules in AV2, the successor to the AV1 video codec developed by the Alliance for Open Media. The authors begin by reviewing AV1’s transform block (TB) partitioning, primary transform set (DCT, ADST, FLIP‑ADST, IDTX), and its context‑based multi‑symbol arithmetic coder (MS‑AC). They then detail the limitations of AV1’s recursive quad‑tree partitioning and the need for more flexible block shapes to accommodate modern content such as 4K/8K video, high frame‑rate streams, and screen‑content graphics.

AV2 introduces eight transform partition types: NONE, SPLIT, HORZ, VERT, HORZ4, VERT4, HORZ5, and VERT5. The new HORZ5 and VERT5 “H‑shaped” layouts split a block into five sub‑blocks and support extreme aspect ratios (up to 1:16). Partition signaling uses three syntax elements (do_partition, txfm_4way_partition_type, txfm_2or3_way_partition_type), which reduces side‑information overhead compared with AV1.

The primary transform set remains the same 16 separable combinations of the four 1‑D kernels, but the kernels themselves are redesigned for 8‑bit integer matrix multiplication with optional butterfly factorizations (e.g., DCT‑2). Moreover, data‑driven transforms (DDTs) are learned from large residual datasets, allowing the codec to adapt the kernel coefficients to real‑world statistics and improve energy compaction, especially for directional edges and screen‑content sharp transitions.
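To make the "16 separable combinations" concrete, the sketch below enumerates every (vertical, horizontal) pairing of the four 1‑D kernel names and applies one separable 2‑D transform. It is only an illustration under stated assumptions: it uses a floating‑point orthonormal DCT‑2 and an identity kernel, not AV2's integer kernels, it omits ADST and FLIP‑ADST, and the function names and kernel labels are hypothetical.

```python
import math
from itertools import product

def dct2_matrix(n):
    """Orthonormal floating-point DCT-2 basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def identity_matrix(n):
    """Identity kernel (IDTX): leaves the samples along that dimension unchanged."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def separable_transform(block, vertical, horizontal):
    """Apply `vertical` down the columns and `horizontal` along the rows: V * X * H^T."""
    return matmul(matmul(vertical, block), transpose(horizontal))

# Four 1-D kernel labels (illustrative spellings); 4 x 4 pairings give 16 2-D types.
KERNELS_1D = ("DCT", "ADST", "FLIPADST", "IDTX")
TX_TYPES = list(product(KERNELS_1D, repeat=2))

# A flat 4x4 residual: DCT in both directions packs all energy into the DC term.
flat = [[1.0] * 4 for _ in range(4)]
coeffs = separable_transform(flat, dct2_matrix(4), dct2_matrix(4))
```

For the flat block, only `coeffs[0][0]` is non‑zero (up to rounding), which is the energy‑compaction behavior the paragraph refers to. Passing `identity_matrix(4)` for both kernels returns the block unchanged.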

A suite of new coding tools is added:

  • Intra/Inter Secondary Transforms (IST) – a learned 2‑stage transform applied after the primary transform to further compress residual energy. IST yields measurable PSNR and VMAF gains at low bitrates.
  • Trellis‑Coded Quantization (TCQ) – integrates a trellis search into the scalar quantizer, providing optimal bit allocation per coefficient. TCQ delivers ~0.1‑0.2 dB quality improvement at the same QP.
  • Adaptive Transform Coding (ATC) – refines coefficient context modeling and unifies scan orders, improving coding efficiency without increasing decoder complexity.
  • Probability Adaptation Rate Adjustment (PARA) – dynamically tunes the adaptation speed of the MS‑AC probability models per syntax element, leading to ~5 % bitrate reduction on average.
  • Forward Skip Coding (FSC) – a skip‑mode for screen‑content and dense residual blocks that bypasses coefficient transmission entirely.
  • Cross‑Chroma Component Transform (CCTX) – exploits inter‑plane correlation after primary chroma transforms, reducing chroma bitrate while preserving color fidelity.
  • Parity Hiding (PH) – embeds DC coefficient sign and magnitude information into parity bits, eliminating a dedicated DC signaling field and saving bits, especially when TCQ is not used.
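The general idea behind parity hiding can be illustrated with a toy round trip. The sketch below is an assumption‑laden illustration, not AV2's actual rule: it hides only the least significant bit of the DC magnitude in the parity of the sum of the other absolute levels, sends the sign separately, and uses hypothetical function names. When the parities disagree, this toy encoder simply raises the DC magnitude by one, where a real encoder would choose the adjustment by rate‑distortion cost.

```python
def hidden_parity(others):
    """Parity carried by the non-DC levels: sum of absolute values, modulo 2."""
    return sum(abs(v) for v in others) & 1

def encode_with_parity_hiding(levels):
    """Return (dc_half, sign, others). The DC magnitude's LSB is not transmitted."""
    others = list(levels[1:])
    parity = hidden_parity(others)
    dc_mag = abs(levels[0])
    if (dc_mag & 1) != parity:
        # Toy choice: step the magnitude up so its LSB matches the hidden parity.
        dc_mag += 1
    sign = -1 if levels[0] < 0 else 1
    return dc_mag >> 1, sign, others

def decode_with_parity_hiding(dc_half, sign, others):
    """Rebuild the DC level: the LSB is inferred from the other levels' parity."""
    dc_mag = (dc_half << 1) | hidden_parity(others)
    return [sign * dc_mag] + list(others)

# Parities already agree (5 is odd, |2| + |-1| + 0 = 3 is odd): exact round trip.
exact = decode_with_parity_hiding(*encode_with_parity_hiding([5, 2, -1, 0]))

# Parities disagree (4 is even, 3 is odd): the encoder adjusts the DC level to 5.
adjusted = decode_with_parity_hiding(*encode_with_parity_hiding([4, 2, -1, 0]))
```

The saving in this toy version is one bit of DC magnitude per block, paid for with an occasional one‑step change to the DC level.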

Entropy coding retains the MS‑AC engine but benefits from the above tools. The authors also describe an expanded transform set signaling mechanism that includes DC‑based transform signaling (DCTX) and mode‑dependent transform derivation (MDTX), further cutting side‑information.
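As a rough illustration of what an adaptation rate means for a multi‑symbol arithmetic coder, the sketch below updates a 15‑bit cumulative distribution after each coded symbol, with a shift parameter that controls how quickly the model moves. This is a simplified, hypothetical sketch: the function name, the CDF layout, and the rate values are assumptions, and it does not reproduce how AV2's PARA chooses the rate per syntax element.

```python
PROB_ONE = 1 << 15  # 15-bit probability scale

def update_cdf(cdf, symbol, rate):
    """Move the model toward `symbol` by a step of 1 / 2**rate.

    `cdf[i]` approximates P(X <= i) * 2**15 for every symbol except the last,
    whose cumulative value is implicitly PROB_ONE. A smaller `rate` adapts
    faster; a larger `rate` adapts more slowly but is less noisy.
    """
    for i in range(len(cdf)):
        if i >= symbol:
            cdf[i] += (PROB_ONE - cdf[i]) >> rate  # pull toward 1
        else:
            cdf[i] -= cdf[i] >> rate               # pull toward 0
    return cdf

# Uniform model over four symbols; the last entry (32768) is implicit.
uniform = [8192, 16384, 24576]

fast = update_cdf(list(uniform), symbol=0, rate=4)
slow = update_cdf(list(uniform), symbol=0, rate=6)
```

After one occurrence of symbol 0, `fast` assigns it noticeably more probability than `slow` does. Tuning the rate is a trade‑off between tracking local statistics quickly and keeping stable estimates.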

Experimental results on a diverse test set (natural video, screen content, AI‑generated sequences) show that AV2 achieves roughly 30 % bitrate reduction relative to AV1 while delivering higher objective quality (PSNR, SSIM, VMAF) and comparable or lower subjective distortion. Gains are consistent across resolutions up to 8K and frame rates up to 120 fps. Complexity analysis indicates that the 8‑bit integer kernel implementation and the removal of recursive quad‑tree partitioning lower memory usage and keep encoder/decoder throughput suitable for real‑time applications on both general‑purpose CPUs and dedicated hardware.

In conclusion, AV2’s redesign of transform partitioning, kernel design, and the integration of seven novel coding tools yields a substantial improvement in compression efficiency without sacrificing low‑complexity implementation. The paper positions AV2 as a highly competitive codec for emerging high‑resolution, high‑frame‑rate, and immersive video services, while maintaining the open‑source, royalty‑free ethos of the AOMedia ecosystem.

