Nebula: An Acceleration Framework for Collaborative Rendering of Large-Scale 3D Gaussian Splatting


📝 Abstract

3D Gaussian splatting (3DGS) has drawn significant attention from the computer architecture community recently. However, current accelerator designs often overlook the scalability of 3DGS, making them fragile for extremely large-scale scenes. Meanwhile, the bandwidth requirement of VR makes it impossible to deliver high-fidelity, smooth VR content from the cloud. We present Nebula, a coherent acceleration framework for large-scale 3DGS collaborative rendering. Instead of streaming videos, Nebula streams the intermediate results produced after the LoD search, reducing data communication between the cloud and the client by 19–25%. To further enhance the motion-to-photon experience, we introduce a temporal-aware LoD search in the cloud that tames irregular memory accesses and reduces redundant data accesses by exploiting temporal coherence across frames. On the client side, we propose a novel stereo rasterization that lets the two eyes share most computations during stereo rendering with bit-accurate quality. With minimal hardware augmentations, Nebula achieves a 2.7× motion-to-photon speedup and reduces bandwidth by 19–25% over lossy video streaming.

CCS Concepts: • Computer systems organization → Neural networks; Real-time system architecture; • Computing methodologies → Rasterization.

📄 Content

Neural rendering is ushering in a renaissance in computer graphics by enabling photorealistic, view-dependent rendering at much higher speeds than conventional ray tracing [16,74,76]. In recent years, neural rendering has drawn significant attention from the computer architecture community [18, 24, 25, 27, 31, 35, 50-52, 55-57, 62, 65, 71, 78, 82, 104], with 3D Gaussian splatting (3DGS) standing out due to its compact representation and superior rendering performance.

While prior 3DGS accelerator designs [18,24,25,35,51,52,55,62,104] achieve real-time mobile rendering for small-scale scenes [2,37,48], they often overlook the scalability challenge of 3DGS, making their designs fragile for large-scale rendering (e.g., city-scale) [47,59,66,79,98]. As shown in Sec. 3.1, the memory requirement for such scenes can reach up to 66 GB, far exceeding the typical memory capacity (<12 GB) of devices in virtual reality (VR) [6,8,99]. This memory gap motivates us to design a collaborative rendering framework that leverages the resources available in the cloud to overcome the memory constraints of VR devices.

Despite numerous solutions for remote or collaborative rendering [54,69,94,100,101,107,108], they primarily target conventional video streaming. Such approaches, paired with HEVC codecs [70,84,96], often require over 1 Gbps for 4K VR content at 90 FPS [68]. To alleviate the bandwidth pressure, we present two key insights unique to large-scale 3DGS: 1) in virtual 3D scenes, the number of newly visible Gaussians introduced by continuous pose changes remains roughly constant; and 2) the memory requirement of large-scale 3DGS peaks during the initial level-of-detail (LoD) search, but drops sharply in the subsequent stages.
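As a rough sanity check on the cited 1 Gbps figure, a back-of-envelope estimate can be computed; the 24 bits per pixel and the ~20:1 effective compression ratio below are our illustrative assumptions, not numbers from the paper:

```python
# Back-of-envelope bitrate estimate for streaming 4K VR at 90 FPS.
# bits_per_pixel and compression_ratio are illustrative assumptions.
width, height = 3840, 2160   # 4K resolution
bits_per_pixel = 24          # uncompressed RGB
fps = 90
compression_ratio = 20       # assumed effective HEVC ratio

raw_gbps = width * height * bits_per_pixel * fps / 1e9
compressed_gbps = raw_gbps / compression_ratio
print(f"raw: {raw_gbps:.1f} Gbps, compressed: {compressed_gbps:.2f} Gbps")
```

Under these assumptions the compressed stream lands at roughly 0.9 Gbps, consistent in magnitude with the >1 Gbps requirement quoted above.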

By leveraging these insights, we propose Nebula, a collaborative rendering framework tailored for 3DGS at infinite scale. Rather than streaming fully rendered images, Nebula transmits the intermediate results after the initial LoD search, i.e., the Gaussians required for the subsequent stages, to the client. We show that Nebula requires 19–25% less bandwidth than conventional video streaming. Additionally, Nebula exhibits strong scalability: its bandwidth demand is less susceptible to increases in resolution or frame rate.

Algorithmically, Nebula makes three core contributions. First, we propose a temporal-aware LoD search algorithm in Sec. 4.2 that can be deployed on existing GPUs out of the box. Specifically, our algorithm regulates DRAM accesses by processing the LoD-search data in a streaming fashion and leverages the temporal similarity across frames to avoid unnecessary data accesses. Second, we design a runtime Gaussian management system in Sec. 4.3 that compresses and transmits only the non-overlapping Gaussians across adjacent frames to further reduce the data transfer between the cloud and the client. Third, for VR rendering, where two tightly paired stereo displays need to be rendered, we introduce a novel stereo rasterization pipeline in Sec. 4.4 that exploits triangulation [34,85], a widely used technique in computer vision, to share most of the computations in the remaining pipeline while still producing bit-accurate images.

Fig. 1. The rendering pipeline for large-scale 3DGS consists of four stages: LoD search, preprocessing, sorting, and rasterization. First, LoD search traverses the LoD tree to determine a set of Gaussians with a desired LoD granularity. The resulting Gaussians form a "cut" that separates the top and bottom of the LoD tree. Then, the Gaussians on the cut go through a sequence of operations, i.e., preprocessing, sorting, and rasterization, to render an image, similar to the small-scale 3DGS pipelines [46].

arXiv:2512.20495v1 [cs.AR] 23 Dec 2025
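To make the "cut" notion from Fig. 1 concrete, here is a minimal sketch of an LoD-tree traversal; the node structure and the granularity criterion are our own illustrative choices, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class LoDNode:
    """A node in the LoD tree; leaves hold the finest Gaussians."""
    granularity: float            # e.g., projected screen-space size
    children: list = field(default_factory=list)

def lod_cut(root, target):
    """Return the 'cut': the coarsest nodes whose granularity already
    meets the target, separating the top and bottom of the LoD tree."""
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        # Stop refining once the node is fine enough (or is a leaf).
        if node.granularity <= target or not node.children:
            cut.append(node)
        else:
            stack.extend(node.children)
    return cut
```

The nodes returned by `lod_cut` are what the subsequent preprocessing, sorting, and rasterization stages consume; a coarser target yields a cut closer to the root, a finer target a cut closer to the leaves.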

Architecturally, we show that our stereo rasterization can be easily integrated into any mainstream 3DGS accelerator with minimal hardware augmentations (Sec. 5). Overall, Nebula achieves a 2.7× motion-to-photon speedup and reduces bandwidth by 19–25% compared to video streaming. On the cloud side, our temporal-aware LoD search delivers up to 52.7× speedup over off-the-shelf GPU implementations. On the client side, our stereo rasterization achieves up to 21.7× and 5.3× speedups over a mobile Ampere GPU and the state-of-the-art accelerators [52,104], respectively, all with minimal hardware overhead.
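As a toy illustration of the delta-transmission idea behind the runtime Gaussian management (Sec. 4.3), one can picture each Gaussian on the cut as an integer handle and send only the set difference between consecutive frames; the IDs and caching policy here are hypothetical, not the paper's compression scheme:

```python
def delta_to_send(prev_ids, curr_ids):
    """Transmit only Gaussians newly required by the current frame;
    the client keeps a cache of previously received ones.
    IDs are hypothetical integer handles for Gaussians on the cut."""
    new = curr_ids - prev_ids    # must be sent this frame
    stale = prev_ids - curr_ids  # may be evicted from the client cache
    return new, stale

prev = {1, 2, 3, 4}
curr = {3, 4, 5}
new, stale = delta_to_send(prev, curr)
print(new, stale)  # {5} {1, 2}
```

Because insight 1) above says the number of newly visible Gaussians per pose change stays roughly constant, the `new` set, and hence the per-frame bandwidth, stays roughly constant as well.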

The contributions of this paper are as follows:

• A collaborative rendering framework that is tailored for large-scale 3DGS with great scalability.

In this section, we first give a brief background on remote rendering in Sec. 2.1. Then, we introduce the general rendering pipeline for large-scale 3DGS in Sec. 2.2.

Video Streaming. The mainstream approach to remote rendering in AR/VR is video streaming. A client first transmits a pose to a remote server, which then renders an image corresponding to that pose and streams it back to the client. Lastly, the client displays the received image on the screen.
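The pose-in, frame-out loop described above could be sketched as follows; every method name here is an illustrative placeholder, not a real API:

```python
def remote_render_loop(client, server, num_frames):
    """Classic video-streaming remote rendering: the client uploads its
    pose, the server renders and encodes a frame, and the client decodes
    and displays it. All callables are illustrative placeholders."""
    for _ in range(num_frames):
        pose = client.current_pose()            # head/controller pose
        frame = server.render(pose)             # full frame rendered remotely
        encoded = server.encode(frame)          # e.g., HEVC-compressed
        client.display(client.decode(encoded))  # shown to the user
```

Note that every frame pays both the uplink and the downlink latency plus full-frame encode/decode, which is exactly the motion-to-photon and bandwidth pressure that Nebula's intermediate-result streaming targets.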

This content is AI-processed based on ArXiv data.
