Rate-Distortion Analysis of Multiview Coding in a DIBR Framework
Depth-image-based rendering (DIBR) techniques for multiview applications have recently been introduced for efficient view generation at arbitrary camera positions. Rate control at the encoder therefore has to consider both texture and depth data. However, because depth and texture images have different structures and play different roles in the rendered views, distributing the available bit budget between them requires careful analysis. Information loss due to texture coding affects the value of pixels in the synthesized views, while errors in the depth information lead to shifts of objects or unexpected patterns at their boundaries. In this paper, we address the problem of efficient bit allocation between the texture and depth data of multiview video sequences. We adopt a rate-distortion framework based on a simplified model of depth and texture images that preserves their main features. Unlike most recent solutions, our method avoids rendering at encoding time for distortion estimation, so the encoding complexity is not increased. In addition, our model is independent of the underlying inpainting method used at the decoder. Experiments confirm our theoretical results and the efficiency of our rate allocation strategy.
💡 Research Summary
This paper addresses the fundamental problem of how to allocate a limited bit budget between texture (color) and depth data in a depth‑image‑based rendering (DIBR) multiview video system. In DIBR, each captured view consists of a texture image and an associated depth map; at the decoder, virtual views are synthesized by projecting the texture of the nearest reference views using the depth information. Errors in the texture stream directly affect the color of the synthesized pixels, while errors in the depth stream cause geometric distortions such as object displacement or boundary artifacts. Consequently, the overall visual quality of the rendered views depends critically on the relative amount of bits assigned to texture versus depth.
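The projection step described above can be sketched as a simple forward warp. Assuming rectified parallel cameras, each pixel's horizontal disparity is d = f·B/Z (focal length times baseline over depth); the focal length, baseline, and depth values below are illustrative, not taken from the paper.

```python
import numpy as np

def synthesize_row(texture_row, depth_row, f=500.0, baseline=0.1):
    """Toy DIBR warp of one image row into a virtual view."""
    disparity = f * baseline / depth_row           # horizontal shift in pixels
    out = np.full_like(texture_row, np.nan)        # NaN marks disocclusion holes
    for x in range(texture_row.size):
        tx = x + int(round(disparity[x]))
        if 0 <= tx < out.size:
            out[tx] = texture_row[x]               # forward warp of the texture
    return out                                     # holes are left for inpainting
```

Note how a depth error changes `disparity` and thus where a pixel lands, which is exactly the geometric distortion (object displacement, boundary holes) the summary describes.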
State of the art and its limitations
Existing multiview coding (MVC) standards, such as H.264/AVC, typically allocate bits based on the distortion of the depth maps themselves or use a fixed ratio (e.g., 70 % texture, 30 % depth). More recent works try to improve this by rendering one or more virtual views at the encoder, measuring the resulting distortion, and then adjusting the bit allocation. While these approaches can achieve higher quality, they suffer from two major drawbacks: (1) the encoder must perform computationally expensive rendering for each candidate allocation, which is unsuitable for real‑time applications; (2) the derived rate‑distortion (R‑D) functions are often dependent on the specific view geometry and on the in‑painting (hole‑filling) algorithm used at the decoder, limiting their generality.
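The rendering-in-the-loop approach criticized above can be sketched as follows. This is a hypothetical toy model (the codec, renderer, and signals are stand-ins invented for illustration): for every candidate texture/depth split, the encoder must synthesize a view and measure its distortion, which is what makes the search expensive.

```python
import numpy as np

def mse(a, b):
    return float(((a - b) ** 2).mean())

def toy_codec(signal, rate):
    # crude stand-in: coding noise shrinks exponentially with the rate
    rng = np.random.default_rng(0)
    return signal + rng.standard_normal(signal.shape) * np.exp(-rate)

def render(texture, depth):
    # toy 1-D DIBR: shift each pixel by its (rounded) depth value
    out = np.zeros_like(texture)
    for x in range(texture.size):
        tx = x - int(round(depth[x]))
        if 0 <= tx < out.size:
            out[tx] = texture[x]
    return out

def best_split(total_rate, texture, depth, reference, candidates):
    """Brute-force search: one full render per candidate allocation."""
    results = []
    for frac in candidates:
        r_tex, r_dep = frac * total_rate, (1 - frac) * total_rate
        view = render(toy_codec(texture, r_tex), toy_codec(depth, r_dep))
        results.append((mse(view, reference), frac))   # expensive inner step
    return min(results)
```

The paper's contribution is precisely to replace the `render` call in this loop with a closed-form distortion model.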
Proposed solution
The authors propose a novel R‑D model that eliminates the need for any rendering at the encoder while still providing a near‑optimal bit allocation. Their approach rests on three pillars:
- Simplified scene model – The 3‑D scene is abstracted as a 2‑D binary function f(t₁,t₂) that equals 1 inside foreground objects and 0 elsewhere (flat background). This captures the essential geometry (foreground vs. background) while keeping the mathematics tractable. Depth values are assumed to lie within a known range.
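Under this binary scene model, the effect of a depth error can be computed without rendering: a depth error translates into a horizontal shift of the foreground by some number of pixels, and the resulting distortion is simply the area of the mismatch band along object boundaries. A minimal sketch, assuming the shift `s` has already been derived from the depth error (the mask and shift below are illustrative):

```python
import numpy as np

def shift_distortion(mask, s):
    """Pixels that change label when the foreground shifts by s columns."""
    shifted = np.roll(mask, s, axis=1)
    if s > 0:
        shifted[:, :s] = 0          # background revealed at the left edge
    elif s < 0:
        shifted[:, s:] = 0          # background revealed at the right edge
    return int(np.logical_xor(mask, shifted).sum())

# 4x4 foreground square on a flat background
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
```

For small shifts the distortion grows linearly with |s| and with the length of the vertical object boundary (here, a 1-pixel shift mislabels two 4-pixel columns, i.e. 8 pixels), which is the kind of analytical rate-distortion behavior the model exploits.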