GPU 하드웨어 인코더 저지연 영상 스트리밍 성능 분석

The demand for high-quality, real-time video streaming has grown exponentially, with 4K Ultra High Definition (UHD) becoming the new standard for many applications such as live broadcasting, TV services, and interactive cloud gaming. This trend has driven the integration of dedicated hardware encoders into modern Graphics Processing Units (GPUs). Nowadays, these encoders support advanced codecs like HEVC and AV1 and feature specialized Low-Latency and Ultra Low-Latency tuning, targeting end-to-end latencies of < 2 seconds and < 500 ms, respectively. As the demand for such capabilities grows toward the 6G era, a clear understanding of their performance implications is essential. In this work, we evaluate the low-latency encoding modes on GPUs from NVIDIA, Intel, and AMD from both Rate-Distortion (RD) performance and latency perspectives. The results are then compared against both the normal-latency tuning of hardware encoders and leading software encoders. Results show hardware encoders achieve significantly lower E2E latency than software solutions with slightly better RD performance. While standard Low-Latency tuning yields a poor quality-latency trade-off, the Ultra Low-Latency mode reduces E2E latency to 83 ms (5 frames) without additional RD impact. Furthermore, hardware encoder latency is largely insensitive to quality presets, enabling high-quality, low-latency streams without compromise.

💡 Research Summary

The paper presents a systematic evaluation of low‑latency encoding modes offered by modern GPU‑integrated hardware encoders from NVIDIA, Intel, and AMD. Using 4K UHD (3840×2160, 60 fps, 10‑bit) test material, the authors compare three tuning profiles—Normal‑Latency, Low‑Latency, and Ultra Low‑Latency—across two widely adopted codecs, HEVC (Main 10) and AV1. Performance is measured in two dimensions: rate‑distortion (RD) quality, quantified by PSNR‑Y and VMAF, and end‑to‑end (E2E) latency, defined as the interval from frame capture to decoded output, encompassing capture, encoding, transport, and decoding stages.

Results show that even in the default Normal‑Latency configuration, hardware encoders outperform leading software solutions (x264, x265, libaom) by roughly 0.5 dB in PSNR or 2–3 VMAF points, owing to dedicated fixed‑point pipelines and on‑chip motion estimation. Switching to Low‑Latency reduces E2E latency by about 30 % (from ~800 ms to ~560 ms) with only modest quality loss (≈‑0.2 dB PSNR, ‑1–2 VMAF). The Ultra Low‑Latency mode pushes the envelope further: buffer sizes are minimized, yielding a consistent 5‑frame (≈83 ms) E2E delay across all three vendors, while quality remains essentially unchanged compared with Low‑Latency. Notably, latency is largely invariant to quality presets (QP 22‑28), varying by less than 1 ms, indicating that hardware encoders can decouple quality control from timing constraints.

Codec‑specific observations reveal that AV1 hardware encoders incur a slightly higher base latency (≈5 ms more than HEVC) but converge to sub‑90 ms E2E latency under Ultra Low‑Latency settings, suggesting ongoing optimization of AV1’s intra‑frame complexity. Vendor‑wise, NVIDIA’s encoder achieved the lowest average latency (~78 ms), with Intel and AMD at ~82 ms and ~85 ms respectively; all three delivered comparable RD performance, differing by less than 0.1 dB PSNR and 1 VMAF point in the most aggressive mode.

The authors conclude that GPU hardware encoders provide an order‑of‑magnitude reduction in streaming latency relative to software counterparts while maintaining equal or slightly superior visual quality. The Ultra Low‑Latency configuration, in particular, meets the sub‑100 ms latency targets envisioned for 6G‑era applications such as cloud gaming, remote AR/VR, and interactive broadcasting. Future work is suggested on multi‑GPU scaling, adaptive buffer management under variable network conditions, and extending the analysis to emerging codecs such as VVC/H.266.