📝 Original Info
- Title: AirGS: An Optimized 4D Gaussian Splatting Framework for Real-Time Streaming
- ArXiv ID: 2512.20943
- Date: 2025-12-24
- Authors: Zhe Wang, Jinghang Li, Yifei Zhu
📝 Abstract
Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality rendering via fast rasterization. However, existing 4DGS approaches suffer from quality degradation over long sequences and impose substantial bandwidth and storage overhead, limiting their applicability in real-time and wide-scale deployments. Therefore, we present AirGS, a streaming-optimized 4DGS framework that rearchitects the training and delivery pipeline to enable high-quality, low-latency FVV experiences. AirGS converts Gaussian video streams into multi-channel 2D formats and intelligently identifies keyframes to enhance frame reconstruction quality. It further combines temporal coherence with inflation loss to reduce training time and representation size. To support communication-efficient transmission, AirGS models 4DGS delivery as an integer linear programming problem and designs a lightweight pruning-level selection algorithm to adaptively prune the Gaussian updates to be transmitted, balancing reconstruction quality and bandwidth consumption. Extensive experiments demonstrate that AirGS reduces PSNR quality deviation by more than 20% under scene changes, maintains frame-level PSNR consistently above 30 dB, accelerates training by 6×, and reduces per-frame transmission size by nearly 50% compared to SOTA 4DGS approaches.
💡 Deep Analysis
Deep Dive into AirGS: An Optimized 4D Gaussian Splatting Framework for Real-Time Streaming.
📄 Full Content
AirGS: Real-Time 4D Gaussian Streaming for
Free-Viewpoint Video Experiences
Zhe Wang, Jinghang Li and Yifei Zhu∗
Global College, Shanghai Jiao Tong University
Email: 123369423@sjtu.edu.cn, sjtulijinghang@sjtu.edu.cn, yifei.zhu@sjtu.edu.cn
Abstract—Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality rendering via fast rasterization. However, existing 4DGS approaches suffer from quality degradation over long sequences and impose substantial bandwidth and storage overhead, limiting their applicability in real-time and wide-scale deployments. Therefore, we present AirGS, a streaming-optimized 4DGS framework that rearchitects the training and delivery pipeline to enable high-quality, low-latency FVV experiences. AirGS converts Gaussian video streams into multi-channel 2D formats and intelligently identifies keyframes to enhance frame reconstruction quality. It further combines temporal coherence with inflation loss to reduce training time and representation size. To support communication-efficient transmission, AirGS models 4DGS delivery as an integer linear programming problem and designs a lightweight pruning-level selection algorithm to adaptively prune the Gaussian updates to be transmitted, balancing reconstruction quality and bandwidth consumption. Extensive experiments demonstrate that AirGS reduces PSNR quality deviation by more than 20% under scene changes, maintains frame-level PSNR consistently above 30 dB, accelerates training by 6×, and reduces per-frame transmission size by nearly 50% compared to SOTA 4DGS approaches.
Index Terms—Free-viewpoint video, 4D Gaussian Splatting, video streaming
I. INTRODUCTION
Free-viewpoint video (FVV) allows users to explore a dynamic scene from arbitrary viewpoints, offering an immersive and interactive visual experience. A straightforward way to generate FVV is to reconstruct a 3D model of the scene from captured multi-view 2D image sequences. Existing dynamic scene reconstruction methods either explicitly model scene geometry using volumetric or mesh-based primitives [1], [2], or synthesize novel views through image-based interpolation [3]–[5]. However, these methods often struggle to achieve high reconstruction quality in real-world scenes with complex geometry and rich appearance variations [6].
Recently, 3D Gaussian Splatting (3DGS) [8] has emerged as a highly efficient method for novel view synthesis, offering superior rendering quality and speed. The core idea is to represent a static scene as a set of 3D Gaussian ellipsoids learned via optimization, and to render them efficiently using fast rasterization. Building on this breakthrough, several efforts have extended 3DGS to 4D dynamic scene modeling [9], [10]. However, these methods typically use a fixed number of Gaussian ellipsoids across all frames, which limits reconstruction quality over long sequences and increases per-frame storage overhead. To address this, 3DGStream [6] introduces a keyframe-based design, where a reference frame (anchor) is represented with full 3D Gaussians, and subsequent frames capture dynamics via a lightweight MLP that predicts Gaussian updates in the anchor's space. To reduce the runtime cost associated with frequent MLP queries, V3 [7] further stores precomputed MLP outputs as multi-channel 2D images, enabling faster decoding with minimal computational overhead.
This work is supported by the National Key R&D Program of China (Grant No. 2024YFC3017100) and NSFC No. 62302292. ∗Corresponding author.
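The multi-channel 2D packing idea can be illustrated with a minimal sketch: per-Gaussian update vectors are reshaped into an image whose channels hold the update attributes, so a frame's updates can be stored and decoded like an ordinary image. The channel layout (3 position offsets + 4 rotation offsets) and the zero-padding scheme below are assumptions for illustration, not V3's actual format.

```python
import numpy as np

def pack_updates(deltas: np.ndarray, width: int) -> np.ndarray:
    """Pack per-Gaussian update vectors of shape (N, C) into an (H, width, C) image.

    Rows are zero-padded so N need not divide width. The channel layout
    (e.g. 3 position offsets + 4 rotation offsets) is hypothetical.
    """
    n, c = deltas.shape
    h = -(-n // width)  # ceiling division
    img = np.zeros((h * width, c), dtype=deltas.dtype)
    img[:n] = deltas
    return img.reshape(h, width, c)

def unpack_updates(img: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n update vectors from a packed image."""
    h, w, c = img.shape
    return img.reshape(h * w, c)[:n]

# Round trip with 10 Gaussians and 7 channels (3 pos + 4 rot)
updates = np.random.default_rng(0).standard_normal((10, 7)).astype(np.float32)
packed = pack_updates(updates, width=4)  # image of shape (3, 4, 7)
assert np.allclose(unpack_updates(packed, 10), updates)
```

Once updates live in this 2D layout, decoding a frame is a reshape plus a slice rather than a batch of MLP queries, which is the runtime saving the paragraph describes.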
These advancements demonstrate the potential of 4DGS-based FVV generation. However, streaming 4DGS content in practical network environments remains largely unexplored. Fundamentally, several key challenges persist toward achieving real-time deployment. First, while the introduction of keyframes and the auxiliary MLP significantly reduces communication cost, keyframe selection remains an open problem. Most existing studies fix the set of Gaussian primitives in the anchor space after initialization. Consequently, simply selecting the first frame as a keyframe often leads to noticeable quality degradation and loss of image detail, as demonstrated in Fig. 1, since early frames may lack information about later-appearing objects or large motions. Second, training a full 3DGS model for each keyframe is computationally expensive. Naively increasing the number of keyframes improves reconstruction quality, but significantly increases training time and resource usage. Both highlight the need for an efficient and scalable training strategy. Third, transmitting a full 3DGS for every frame incurs substantial bandwidth and latency overhead, making it difficult to satisfy real-time requirements in practical fluctuating
…(Full text truncated)…
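The pruning-level selection that the abstract describes can be read as a multiple-choice knapsack: for each frame, pick exactly one pruning level so that the total transmission size fits a bandwidth budget while summed reconstruction quality is maximized. The sketch below solves a toy instance exhaustively for clarity; the level sizes and quality values are illustrative, and AirGS's actual ILP formulation and lightweight heuristic are in the (truncated) full text.

```python
# Toy multiple-choice knapsack: per frame, choose one pruning level
# (size in KB, quality in PSNR dB) so total size fits the budget.
from itertools import product

def select_levels(frames, budget):
    """frames: list of per-frame option lists [(size, quality), ...].
    Exhaustive search over all level combinations; fine for a toy instance."""
    best, best_q = None, float("-inf")
    for choice in product(*[range(len(f)) for f in frames]):
        size = sum(frames[i][l][0] for i, l in enumerate(choice))
        qual = sum(frames[i][l][1] for i, l in enumerate(choice))
        if size <= budget and qual > best_q:
            best, best_q = choice, qual
    return best, best_q

frames = [
    [(120, 32.0), (80, 31.2), (40, 29.5)],  # frame 0: three pruning levels
    [(120, 33.1), (80, 32.6), (40, 30.8)],  # frame 1
]
levels, quality = select_levels(frames, budget=170)  # picks level 1 for both
```

A real deployment would replace the exhaustive loop with an ILP solver or a greedy heuristic, since the search space grows exponentially with the number of frames; the formulation itself is what the abstract's "integer linear programming problem" refers to.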
This content is AI-processed based on ArXiv data.