Complexity Analysis Of Next-Generation VVC Encoding and Decoding
While the next-generation video compression standard, Versatile Video Coding (VVC), provides superior compression efficiency, its computational complexity increases dramatically. This paper thoroughly analyzes this complexity for both the encoder and decoder of VVC Test Model 6, quantifying the complexity breakdown for each coding tool and measuring the complexity and memory requirements of VVC encoding/decoding. These extensive analyses are performed on six video sequences at 720p, 1080p, and 2160p, under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (320 encoding/decoding runs in total). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of the coding tools reveals that in LD, on average, motion estimation tools with 53%, transformation and quantization with 22%, and entropy coding with 7% dominate the encoding complexity. In decoding, loop filters with 30%, motion compensation with 20%, and entropy decoding with 16% are the most complex modules. Moreover, the memory bandwidth required for VVC encoding/decoding is measured through memory profiling and found to be 30x and 3x that of HEVC, respectively. The reported results and insights serve as a guide for future research and implementations of energy-efficient VVC encoders/decoders.
💡 Research Summary
The paper presents a thorough quantitative analysis of the computational and memory complexity of the Versatile Video Coding (VVC) standard, focusing on both the encoder and decoder of the VVC Test Model 6 (VTM‑6). The authors evaluate six representative video sequences at three resolutions (720p, 1080p, and 2160p) under three typical coding configurations: Low‑Delay (LD), Random‑Access (RA), and All‑Intra (AI). In total, 320 encoding and decoding runs are performed, and detailed profiling is carried out using CPU cycle counters, memory‑access tracing, and bandwidth measurement tools. The study breaks down the total processing load into contributions from each coding tool, providing a fine‑grained view of where the bulk of the work resides.
Key findings are as follows. First, VVC’s overall computational load is dramatically higher than that of its predecessor HEVC. In LD mode the VVC encoder requires roughly five times the CPU cycles of HEVC, while the decoder needs about 1.5 ×. The gap widens dramatically for AI mode, where the encoder consumes about 31 × and the decoder about 1.8 × the HEVC baseline. These ratios are consistent across the three resolutions, confirming that the increase is intrinsic to the algorithmic extensions rather than resolution‑dependent effects.
Second, the encoder’s internal cost distribution reveals three dominant modules. Motion estimation (ME) accounts for an average of 53 % of the total encoder cycles in LD mode. The surge is caused by VVC’s expanded set of prediction modes, multiple reference frames, and finer sub‑pixel refinement, which together generate a far larger candidate pool than HEVC. Transformation and quantization (TQ) contribute 22 % of the cycles; the introduction of Multi‑Type Transform (MTS) and Low‑Frequency Non‑Separable Transform (LFNST) adds extra matrix multiplications and coefficient re‑ordering steps. Entropy coding (CABAC) consumes about 7 % of the cycles, reflecting the increased number of context models and the more complex probability updates required by VVC’s adaptive binary arithmetic coding.
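The module‑level shares above follow directly from raw profiler counters by normalizing each module's cycle count against the total. The sketch below illustrates only that aggregation step; the cycle counts are hypothetical placeholders chosen to reproduce the reported LD‑mode percentages, not the paper's actual measurements.

```python
def complexity_breakdown(cycles_by_module):
    """Return each module's share of total cycles, as percentages."""
    total = sum(cycles_by_module.values())
    return {name: round(100.0 * cycles / total, 1)
            for name, cycles in cycles_by_module.items()}

# Hypothetical LD-mode encoder counts (billions of cycles), chosen
# so the shares match the 53% / 22% / 7% figures in the summary.
ld_encoder_cycles = {
    "motion_estimation": 530,
    "transform_quant": 220,
    "entropy_coding": 70,
    "other": 180,
}

shares = complexity_breakdown(ld_encoder_cycles)
print(shares["motion_estimation"])  # 53.0
```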
Third, on the decoder side, loop filtering dominates with roughly 30 % of the processing time. VVC applies the Deblocking Filter (DBF), Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF) in its in‑loop filtering chain, each with per‑frame parameter estimation, which together constitute a substantial computational burden. Motion compensation (MC) follows at 20 %, driven by the need to handle multiple reference pictures and more sophisticated motion vector interpolation. Entropy decoding accounts for 16 % of the cycles, again due to the richer context model set and the more involved bit‑stream parsing logic.
Fourth, memory bandwidth requirements are dramatically higher for VVC. The encoder’s memory traffic is on average 30 × that of HEVC, while the decoder’s traffic is about 3 ×. This increase stems from frequent loading of large candidate reference blocks during ME, the storage of multiple transform coefficient sets for MTS/LFNST, and the need to read/write extensive filter parameters for ALF/DBF. The authors point out that such bandwidth pressure has direct implications for power consumption and thermal design in real‑time hardware implementations.
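To build intuition for why reference fetching dominates memory traffic, a rough back‑of‑envelope model can be useful. The function below is purely illustrative: all parameter values (references per block, overfetch factor) are assumptions for the example, not figures from the paper.

```python
def reference_fetch_traffic_gb_s(width, height, fps, bytes_per_sample,
                                 refs_per_block, overfetch_factor):
    """Rough per-second luma reference-fetch traffic in GB/s.

    refs_per_block models multi-reference inter prediction;
    overfetch_factor models interpolation margins and cache misses.
    Both are illustrative assumptions, not measurements.
    """
    samples_per_second = width * height * fps
    return (samples_per_second * bytes_per_sample
            * refs_per_block * overfetch_factor) / 1e9

# Example: 2160p at 60 fps, 10-bit samples stored in 2 bytes,
# 2 references per block, and 4x overfetch.
traffic = reference_fetch_traffic_gb_s(3840, 2160, 60, 2, 2, 4)
print(f"{traffic:.2f} GB/s")  # ~7.96 GB/s for this parameter choice
```

Even this simplified model shows how quickly reference traffic scales with resolution and reference count, which is consistent with the bandwidth pressure the authors highlight.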
Finally, the paper discusses implications for future research and practical deployment. To make VVC viable on power‑constrained platforms, algorithmic simplifications such as reduced ME search windows, early‑termination heuristics, selective activation of MTS/LFNST, and shared filter‑parameter caches are suggested. From a hardware perspective, the authors advocate for highly parallelizable architectures that can exploit data‑level parallelism in ME and TQ, as well as dedicated memory‑traffic optimizers (e.g., on‑chip buffers and prefetch engines) to mitigate the bandwidth bottleneck.
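One of the suggested algorithmic simplifications, early termination in motion estimation, can be sketched as follows. This is a minimal, hypothetical heuristic (stop scanning candidates once a match is "good enough"), not the specific scheme proposed by the authors; the SAD cost and the threshold value are illustrative choices.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def motion_search_early_term(cur_block, candidate_refs, threshold):
    """Scan candidate reference blocks, keeping the lowest-SAD match,
    and stop early once the cost drops below `threshold`.

    A hypothetical early-termination heuristic for illustration only.
    """
    best_idx, best_cost = -1, float("inf")
    for i, ref in enumerate(candidate_refs):
        cost = sad(cur_block, ref)
        if cost < best_cost:
            best_idx, best_cost = i, cost
        if best_cost <= threshold:
            break  # early exit: remaining candidates are skipped
    return best_idx, best_cost

cur = [10, 10, 10, 10]
candidates = [[0, 0, 0, 0], [10, 10, 10, 11], [10, 10, 10, 10]]
print(motion_search_early_term(cur, candidates, threshold=2))  # (1, 1)
```

The trade-off is the one the paper implies: a larger threshold skips more candidates and saves cycles, at the cost of occasionally missing a slightly better match.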
In summary, this work provides the first comprehensive, tool‑level breakdown of VVC’s computational and memory demands, quantifies the exact magnitude of the increase over HEVC, and offers concrete directions for algorithmic and architectural optimizations aimed at achieving energy‑efficient, real‑time VVC encoding and decoding.