Collaborative P2P Streaming of Interactive Live Free Viewpoint Video

We study an interactive live streaming scenario where multiple peers pull streams of the same free viewpoint video that are synchronized in time but not necessarily in view. In free viewpoint video, each user can periodically select a virtual view between two anchor camera views for display. The virtual view is synthesized using texture and depth videos of the anchor views via depth-image-based rendering (DIBR). In general, the distortion of the virtual view increases with the distance to the anchor views, and hence it is beneficial for a peer to select the closest anchor views for synthesis. On the other hand, if peers interested in different virtual views are willing to tolerate larger distortion in using more distant anchor views, they can collectively share the access cost of common anchor views. Given anchor view access cost and synthesized distortion of virtual views between anchor views, we study the optimization of anchor view allocation for collaborative peers. We first show that, if the network reconfiguration costs due to view-switching are negligible, the problem can be optimally solved in polynomial time using dynamic programming. We then consider the case of non-negligible reconfiguration costs (e.g., large or frequent view-switching leading to anchor-view changes). In this case, the view allocation problem becomes NP-hard. We thus present a locally optimal and centralized allocation algorithm inspired by Lloyd’s algorithm for non-uniform scalar quantization. We also propose a distributed algorithm with guaranteed convergence where each peer group independently makes merge-and-split decisions with a well-defined fairness criterion. The results show that, depending on the problem settings, our proposed algorithms achieve optimal and close-to-optimal performance, respectively, in terms of total cost, and outperform a P2P scheme without collaborative anchor selection.


💡 Research Summary

The paper addresses the problem of efficiently delivering interactive live free‑viewpoint video (FVV) to multiple peers in a peer‑to‑peer (P2P) network. In FVV, a set of fixed anchor cameras capture texture and depth streams; a user can request any virtual viewpoint that lies between two neighboring anchors and the system synthesizes the view on‑the‑fly using depth‑image‑based rendering (DIBR). The quality of the synthesized view deteriorates as the virtual viewpoint moves farther away from its two anchor cameras, so each peer would ideally select the closest pair of anchors. However, if every peer independently requests its own optimal anchors, the same anchor streams are transmitted repeatedly, inflating the overall bandwidth cost. The authors therefore propose a collaborative anchor‑selection framework in which peers with different virtual‑view demands share common anchor streams, thereby trading off a modest increase in synthesis distortion for a substantial reduction in network cost.

Cost Model

The total system cost is modeled as the sum of three components:

  1. Anchor access cost (C_a) – a fixed price for acquiring a particular anchor’s texture‑depth pair from the source or from another peer.
  2. Synthesis distortion cost (C_d) – a function of the distance between the requested virtual view and the two anchors used for rendering; larger distances yield higher distortion.
  3. Reconfiguration cost (C_r) – incurred whenever a peer changes its virtual view and consequently needs to switch to a different anchor pair. This captures the overhead of additional control messages, possible buffering, and temporary quality loss.

The optimization problem is to assign to each peer i, given its requested virtual view v(i), an anchor pair a(i) such that the sum of C_a, C_d, and C_r across the whole system is minimized.
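
The three-part cost above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' formulation. The distortion function (product of the distances to the two anchors), the rule that each distinct anchor is paid for once within a single collaborative group, the per-switch reconfiguration charge, and all names (`synthesis_distortion`, `total_cost`, `access_cost`, `reconfig_cost`) are hypothetical choices made for this example.

```python
from typing import Dict, Optional, Tuple

AnchorPair = Tuple[int, int]  # (left anchor position, right anchor position)


def synthesis_distortion(view: float, pair: AnchorPair, scale: float = 1.0) -> float:
    """Assumed distortion model: zero at an anchor, growing as the
    virtual view moves away from both anchors."""
    left, right = pair
    if not left <= view <= right:
        raise ValueError("virtual view must lie between its two anchors")
    return scale * (view - left) * (right - view)


def total_cost(
    views: Dict[str, float],
    allocation: Dict[str, AnchorPair],
    previous: Optional[Dict[str, AnchorPair]] = None,
    access_cost: float = 1.0,
    reconfig_cost: float = 0.0,
) -> float:
    """Sum of C_a + C_d + C_r for one collaborative group (assumed model)."""
    # C_a: each distinct anchor is fetched once and shared by the group.
    distinct_anchors = {anchor for pair in allocation.values() for anchor in pair}
    c_a = access_cost * len(distinct_anchors)

    # C_d: per-peer synthesis distortion for the assigned anchor pair.
    c_d = sum(synthesis_distortion(views[p], allocation[p]) for p in views)

    # C_r: charged once per peer whose anchor pair differs from before.
    c_r = 0.0
    if previous is not None:
        switched = sum(
            1 for p in allocation if p in previous and previous[p] != allocation[p]
        )
        c_r = reconfig_cost * switched

    return c_a + c_d + c_r


# Two peers between anchors at positions 1, 2, 3.
views = {"A": 1.5, "B": 2.5}
independent = {"A": (1, 2), "B": (2, 3)}  # each peer uses its closest anchors
shared = {"A": (1, 3), "B": (1, 3)}       # both peers share anchors 1 and 3

cost_independent = total_cost(views, independent, access_cost=2.0)
cost_shared = total_cost(views, shared, access_cost=2.0)
print(cost_independent, cost_shared)  # 6.5 5.5
```

With these assumed numbers, sharing the more distant anchors raises the distortion term from 0.5 to 1.5 but drops one anchor stream, so the total falls from 6.5 to 5.5. A lower `access_cost` or a non-zero `reconfig_cost` can reverse that ordering, which is the trade-off the allocation problem resolves.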

Zero Reconfiguration Cost (C_r ≈ 0)

When view‑switching is cheap (e.g., the network can instantly accommodate new anchor selections), the reconfiguration term can be ignored. The problem then reduces to partitioning the ordered set of possible virtual views into contiguous intervals, each interval being served by a single anchor pair. The authors solve this with a classic dynamic‑programming (DP) approach:

State: dp