Fast Multirate Encoding for 360° Video in OMAF Streaming Workflows
Preparing high-quality 360-degree video for HTTP Adaptive Streaming requires encoding each sequence into multiple representations spanning different resolutions and quantization parameters (QPs). For ultra-high-resolution immersive content such as 8K 360-degree video, this process is computationally intensive due to the large number of representations and the high complexity of modern codecs. This paper investigates fast multirate encoding strategies that reduce encoding time by reusing encoder analysis information across QPs and resolutions. We evaluate two cross-resolution information-reuse pipelines that differ in how reference encodes propagate across resolutions: (i) a strict HD -> 4K -> 8K cascade with scaled analysis reuse, and (ii) a resolution-anchored scheme that initializes each resolution with its own highest-bitrate reference before guiding dependent encodes. Beyond standard equirectangular projection (ERP) content, we also apply both pipelines to cubemap-projection (CMP) tiling, in which each 360-degree frame is partitioned into independently encoded tiles; CMP introduces substantial parallelism while still benefiting from the proposed multirate analysis-reuse strategies. Experimental results on the SJTU 8K 360-degree dataset show that hierarchical analysis reuse significantly accelerates HEVC encoding with minimal rate-distortion impact across both ERP and CMP-tiled content. Encoding-time reductions reach roughly 33%-59% for ERP and about 51% on average for CMP, with Bjøntegaard Delta Encoding Time (BDET) gains approaching -50% and wall-clock speedups of up to 4.2x.
💡 Research Summary
The paper addresses a critical bottleneck in delivering ultra‑high‑resolution 360° video through OMAF‑based HTTP Adaptive Streaming (HAS): the massive computational cost of generating a full bitrate ladder that spans multiple resolutions (HD, 4K, 8K) and several quantization parameters (QPs). Encoding each representation with a full‑search rate‑distortion optimization (RDO) can require more than 30 CPU‑hours per 8K sequence, which is prohibitive for large VR libraries or live streaming scenarios.
To reduce this cost, the authors propose two cross‑resolution analysis‑reuse pipelines that exploit the strong correlation of encoder decisions (CU partitions, intra/inter modes, motion vectors) across QPs and resolutions. Both pipelines are built on the x265 HEVC encoder’s analysis‑save/load primitives, using the strongest reuse level (level 10) to preserve CU‑level structure.
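As a rough sketch of how these primitives might be driven, the snippet below builds x265 command lines for an anchor encode (analysis saved) and a dependent encode (analysis loaded, with limited refinement). The `--analysis-save`, `--analysis-load`, `--analysis-reuse-level`, `--scale-factor`, and `--refine-*` flags are real x265 options; the file names, paths, and helper functions are illustrative, not taken from the paper's scripts.

```python
# Hedged sketch: constructing x265 invocations for analysis save/load.
# File names and helper names are hypothetical; flag values follow the
# configuration described in the summary (reuse level 10, refine-* limits).

def anchor_cmd(src, out, qp, analysis_file):
    """Anchor (reference) encode: full RDO, analysis data written to disk."""
    return ["x265", "--input", src, "--qp", str(qp),
            "--analysis-save", analysis_file,
            "--analysis-reuse-level", "10",
            "--output", out]

def dependent_cmd(src, out, qp, analysis_file, scale=1):
    """Dependent encode: reuse the anchor's analysis; scale=2 seeds the
    next higher resolution (e.g. HD analysis guiding a 4K encode)."""
    cmd = ["x265", "--input", src, "--qp", str(qp),
           "--analysis-load", analysis_file,
           "--analysis-reuse-level", "10",
           "--refine-intra", "4", "--refine-inter", "2", "--refine-mv", "1",
           "--output", out]
    if scale != 1:
        cmd += ["--scale-factor", str(scale)]
    return cmd
```

In a real pipeline these lists would be passed to `subprocess.run`; here they only illustrate how the save/load roles and the cross-resolution scale factor fit together.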
1. Cascaded Reuse Cascade (CRC) – a strict bottom‑up cascade: an anchor encode is performed at the lowest QP for a given resolution (e.g., HD). Its analysis data are reused for all higher‑QP encodes at the same resolution and then scaled (factor 2) to seed the next higher resolution (HD → 4K → 8K). This maximizes cross‑resolution reuse but serializes the workflow, limiting parallelism.
2. Per‑Resolution Anchor (PRA) – each resolution independently selects its own anchor (low, medium, or high quality). Dependent encodes at the same resolution reuse that anchor’s analysis, while anchors for different resolutions are generated separately. This approach sacrifices some cross‑resolution reuse in exchange for greater flexibility and parallel execution across resolutions.
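The structural difference between the two pipelines can be sketched as a job-dependency list: in CRC each resolution's anchor depends on the previous resolution's anchor, while in PRA every anchor is independent. The QP ladder and job encoding below are assumptions for illustration only.

```python
# Hedged sketch of the CRC vs PRA dependency structure.
# A job is (resolution, qp, dependency); dependency is the job whose
# analysis data this encode reuses (None = full standalone encode).

RESOLUTIONS = ["HD", "4K", "8K"]
QPS = [22, 27, 32, 37]  # assumed ladder; lowest QP serves as the anchor

def crc_jobs():
    """CRC: anchors cascade HD -> 4K -> 8K, serializing the resolutions."""
    jobs, prev_anchor = [], None
    for res in RESOLUTIONS:
        anchor = (res, QPS[0], prev_anchor)  # reuses scaled analysis from the previous resolution
        jobs.append(anchor)
        jobs += [(res, qp, anchor) for qp in QPS[1:]]  # same-resolution dependents
        prev_anchor = anchor
    return jobs

def pra_jobs():
    """PRA: each resolution has its own independent anchor, so the three
    resolutions can be encoded in parallel."""
    jobs = []
    for res in RESOLUTIONS:
        anchor = (res, QPS[0], None)
        jobs.append(anchor)
        jobs += [(res, qp, anchor) for qp in QPS[1:]]
    return jobs
```

The trade-off described above falls out directly: CRC's 4K and 8K anchors cannot start before the HD anchor finishes, whereas PRA's three anchor chains are mutually independent.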
The study also investigates the impact of projection format on reuse efficiency. Equirectangular projection (ERP) introduces severe polar distortion, causing CU statistics to vary dramatically across resolutions and reducing the reliability of analysis scaling. Cubemap projection (CMP) splits the sphere into six uniformly sampled faces, stabilizing block statistics and enabling independent face‑wise encoding. Consequently, CMP not only improves the fidelity of cross‑resolution reuse but also provides up to six‑fold parallelism, which is especially valuable for OMAF’s region‑based delivery.
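The face-wise parallelism enabled by CMP can be sketched with a thread pool that dispatches one job per cube face. `encode_face` is a stub standing in for a real per-face x265 invocation; the face names follow the usual cubemap layout and are not taken from the paper.

```python
# Hedged sketch: six-fold face-wise parallelism for CMP content.
# encode_face is a placeholder for a real per-face encode (e.g. a
# subprocess.run of an x265 command built per face).

from concurrent.futures import ThreadPoolExecutor

FACES = ["front", "back", "left", "right", "top", "bottom"]

def encode_face(face, qp):
    """Stub per-face encode; returns the name of the produced bitstream."""
    return f"{face}_qp{qp}.hevc"

def encode_cmp_frame(qp, workers=6):
    """Encode all six cubemap faces concurrently (up to six-fold parallelism)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: encode_face(f, qp), FACES))
```

Because each face is an independently decodable tile, the same analysis-reuse strategy applies per face, and the pool simply multiplies the savings by the available parallelism.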
Experimental Setup
- Dataset: SJTU 8K 360° collection (15 ERP sequences, 8192 × 4096, 30 s each). Each ERP sequence is also converted to CMP (six 2048 × 2048 faces).
- Encoder: x265 (HEVC) medium preset, intra period = 1 s, 4 CPU threads, analysis-reuse level 10, with limited refinement (refine-intra = 4, refine-inter = 2, refine-mv = 1).
- Hardware: Intel Xeon 32‑core, 128 GB RAM, allowing both serial and six‑way parallel runs.
- Metrics: PSNR, WS‑PSNR, Bjøntegaard Delta (BD‑PSNR, BD‑Rate), Bjøntegaard Delta Encoding Time (BDET), and wall‑clock speedup.
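The Bjøntegaard-delta metrics listed above share one recipe: fit a cubic through four (PSNR, log cost) points per curve, integrate both fits over the overlapping PSNR range, and report the mean log difference as a percentage, where "cost" is bitrate for BD-Rate or encoding time for BDET. The sketch below implements that recipe in plain Python; the numbers in the usage note are illustrative, not the paper's data.

```python
# Hedged sketch of the Bjøntegaard-delta computation (BD-Rate / BDET).
# Simplified: exact cubic through four points via Gaussian elimination,
# rather than a least-squares or piecewise fit.

import math

def _cubic_fit(xs, ys):
    """Solve the 4x4 Vandermonde system for the cubic through 4 points."""
    a = [[x**p for p in range(4)] + [y] for x, y in zip(xs, ys)]
    for i in range(4):  # forward elimination with partial pivoting
        piv = max(range(i, 4), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(i + 1, 4):
            f = a[r][i] / a[i][i]
            for c in range(i, 5):
                a[r][c] -= f * a[i][c]
    coef = [0.0] * 4
    for i in range(3, -1, -1):  # back substitution
        coef[i] = (a[i][4] - sum(a[i][c] * coef[c] for c in range(i + 1, 4))) / a[i][i]
    return coef

def _integral(coef, lo, hi):
    anti = lambda x: sum(c * x**(p + 1) / (p + 1) for p, c in enumerate(coef))
    return anti(hi) - anti(lo)

def bd_percent(psnr_ref, cost_ref, psnr_test, cost_test):
    """Average log-cost difference (test vs. reference) as a percentage."""
    c_ref = _cubic_fit(psnr_ref, [math.log(c) for c in cost_ref])
    c_test = _cubic_fit(psnr_test, [math.log(c) for c in cost_test])
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    avg = (_integral(c_test, lo, hi) - _integral(c_ref, lo, hi)) / (hi - lo)
    return (math.exp(avg) - 1) * 100
```

As a sanity check, a test curve whose cost is exactly half the reference at identical PSNRs yields -50%, which is the shape of the BDET figures quoted in the abstract.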
Results
- Encoding Time: ERP‑CRC and ERP‑PRA achieve 33 %–59 % reductions (BDET ≈ ‑30 % to ‑50 %). CMP‑based pipelines further improve savings, averaging 51 % reduction and reaching up to a 4.2× speedup (≈ 75 % less wall‑clock time).
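The speedup and percentage figures above are two views of the same quantity: a k-fold speedup saves a (1 - 1/k) fraction of wall-clock time, so 4.2x corresponds to roughly 76% less time, in line with the ~75% quoted. A two-line conversion helper makes the relationship explicit.

```python
# Converting between wall-clock speedup and fractional time reduction.

def reduction_from_speedup(k):
    """Percent of wall-clock time saved by a k-fold speedup."""
    return (1 - 1 / k) * 100

def speedup_from_reduction(pct):
    """Speedup factor implied by saving pct percent of wall-clock time."""
    return 1 / (1 - pct / 100)
```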
- Rate‑Distortion Impact: BD‑PSNR loss stays within –0.10 dB to –0.30 dB; WS‑PSNR differences are ≤ 0.05 dB, indicating negligible perceptual quality degradation even for complex motion scenes.
- Projection Influence: CMP consistently outperforms ERP in analysis reuse efficiency by 8 %–12 %, thanks to uniform sampling and face‑wise parallelism.
- CRC vs. PRA: CRC yields the greatest time savings due to maximal cross‑resolution reuse but is less parallelizable. PRA offers more flexibility and better scalability across resolutions at the cost of additional memory for multiple anchor files.
All generated bitstreams are packaged into an OMAF‑compliant DASH presentation, including projection metadata, region‑based signalling, and representation descriptors, enabling immediate deployment in existing viewport‑adaptive players.
Conclusion
The authors demonstrate that systematic cross‑resolution analysis reuse, combined with projection‑aware processing (especially CMP), can cut 8K 360° multirate encoding time by up to 60 % while preserving visual quality. This makes the preparation of full bitrate ladders feasible for large‑scale VR services and opens the door to real‑time or near‑real‑time workflows. Future work will explore extending the approach to VVC, incorporating lightweight machine‑learning models for more robust scaling of analysis data, and integrating the pipelines into live‑streaming chains to further reduce latency and energy consumption.