📝 Original Info
- Title: AirGS: An Optimized 4D Gaussian Splatting Framework for Real-Time Streaming
- ArXiv ID: 2512.20943
- Date: 2025-12-24
- Authors: Zhe Wang, Jinghang Li, Yifei Zhu
📝 Abstract
Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality rendering via fast rasterization. However, existing 4DGS approaches suffer from quality degradation over long sequences and impose substantial bandwidth and storage overhead, limiting their applicability in real-time and wide-scale deployments. Therefore, we present AirGS, a streaming-optimized 4DGS framework that rearchitects the training and delivery pipeline to enable high-quality, low-latency FVV experiences. AirGS converts Gaussian video streams into multi-channel 2D formats and intelligently identifies keyframes to enhance frame reconstruction quality. It further combines temporal coherence with inflation loss to reduce training time and representation size. To support communication-efficient transmission, AirGS models 4DGS delivery as an integer linear programming problem and designs a lightweight pruning-level selection algorithm to adaptively prune the Gaussian updates to be transmitted, balancing reconstruction quality and bandwidth consumption. Extensive experiments demonstrate that AirGS reduces PSNR quality deviation by more than 20% under scene changes, maintains frame-level PSNR consistently above 30 dB, accelerates training by 6×, and reduces per-frame transmission size by nearly 50% compared to SOTA 4DGS approaches.
💡 Deep Analysis
Deep Dive into AirGS: An Optimized 4D Gaussian Splatting Framework for Real-Time Streaming.
📄 Full Content
AirGS: Real-Time 4D Gaussian Streaming for
Free-Viewpoint Video Experiences
Zhe Wang, Jinghang Li and Yifei Zhu∗
Global College, Shanghai Jiao Tong University
Email: 123369423@sjtu.edu.cn, sjtulijinghang@sjtu.edu.cn, yifei.zhu@sjtu.edu.cn
Abstract—Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality rendering via fast rasterization. However, existing 4DGS approaches suffer from quality degradation over long sequences and impose substantial bandwidth and storage overhead, limiting their applicability in real-time and wide-scale deployments. Therefore, we present AirGS, a streaming-optimized 4DGS framework that rearchitects the training and delivery pipeline to enable high-quality, low-latency FVV experiences. AirGS converts Gaussian video streams into multi-channel 2D formats and intelligently identifies keyframes to enhance frame reconstruction quality. It further combines temporal coherence with inflation loss to reduce training time and representation size. To support communication-efficient transmission, AirGS models 4DGS delivery as an integer linear programming problem and designs a lightweight pruning-level selection algorithm to adaptively prune the Gaussian updates to be transmitted, balancing reconstruction quality and bandwidth consumption. Extensive experiments demonstrate that AirGS reduces PSNR quality deviation by more than 20% under scene changes, maintains frame-level PSNR consistently above 30 dB, accelerates training by 6×, and reduces per-frame transmission size by nearly 50% compared to SOTA 4DGS approaches.
Index Terms—Free-viewpoint video, 4D Gaussian Splatting, video streaming
I. INTRODUCTION
Free-viewpoint video (FVV) allows users to explore a dynamic scene from arbitrary viewpoints, offering an immersive and interactive visual experience. A straightforward way to generate FVV is to reconstruct a 3D model of the scene from captured multi-view 2D image sequences. Existing dynamic scene reconstruction methods either explicitly model scene geometry using volumetric or mesh-based primitives [1], [2], or synthesize novel views through image-based interpolation [3]–[5]. However, these methods often struggle to achieve high reconstruction quality in real-world scenes with complex geometry and rich appearance variations [6].
Recently, 3D Gaussian Splatting (3DGS) [8] has emerged as a highly efficient method for novel view synthesis, offering superior rendering quality and speed. The core idea is to represent a static scene as a set of 3D Gaussian ellipsoids learned via optimization, and to render them efficiently using fast rasterization. Building on this breakthrough, several efforts have extended 3DGS to 4D dynamic scene modeling [9], [10]. However, these methods typically use a fixed number of Gaussian ellipsoids across all frames, which limits reconstruction quality over long sequences and increases per-frame storage overhead. To address this, 3DGStream [6] introduces a keyframe-based design, where a reference frame (anchor) is represented with full 3D Gaussians, and subsequent frames capture dynamics via a lightweight MLP that predicts Gaussian updates in the anchor's space. To reduce the runtime cost associated with frequent MLP queries, V3 [7] further stores precomputed MLP outputs as multi-channel 2D images, enabling faster decoding with minimal computational overhead.
This work is supported by the National Key R&D Program of China (Grant No. 2024YFC3017100) and NSFC No. 62302292. ∗Corresponding author.
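The multi-channel 2D packing idea can be illustrated with a minimal sketch: per-Gaussian update vectors are reshaped into an image whose channels hold the update attributes, so a frame's updates can be stored and decoded like an ordinary image. The channel layout (3 position offsets + 4 rotation offsets) and the zero-padding scheme below are assumptions for illustration, not V3's actual format.

```python
import numpy as np

def pack_updates(deltas: np.ndarray, width: int) -> np.ndarray:
    """Pack per-Gaussian update vectors of shape (N, C) into an (H, width, C) image.

    Rows are zero-padded so N need not divide width. The channel layout
    (e.g. 3 position offsets + 4 rotation offsets) is hypothetical.
    """
    n, c = deltas.shape
    h = -(-n // width)  # ceiling division
    img = np.zeros((h * width, c), dtype=deltas.dtype)
    img[:n] = deltas
    return img.reshape(h, width, c)

def unpack_updates(img: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n update vectors from a packed image."""
    h, w, c = img.shape
    return img.reshape(h * w, c)[:n]

# Round trip with 10 Gaussians and 7 channels (3 pos + 4 rot)
updates = np.random.default_rng(0).standard_normal((10, 7)).astype(np.float32)
packed = pack_updates(updates, width=4)  # image of shape (3, 4, 7)
assert np.allclose(unpack_updates(packed, 10), updates)
```

Once updates live in this 2D layout, decoding a frame is a reshape plus a slice rather than a batch of MLP queries, which is the runtime saving the paragraph describes.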
These advancements demonstrate the potential of 4DGS-based FVV generation. However, streaming 4DGS content in practical network environments remains largely unexplored. Fundamentally, several key challenges persist toward achieving real-time deployment. First, while the introduction of keyframes and the auxiliary MLP significantly reduces communication cost, keyframe selection remains an open problem. Most existing studies fix the set of Gaussian primitives in the anchor space after initialization. Consequently, simply selecting the first frame as a keyframe often leads to noticeable quality degradation and loss of image detail, as demonstrated in Fig. 1, since early frames may lack information about later-appearing objects or large motions. Second, training a full 3DGS model for each keyframe is computationally expensive. Naively increasing the number of keyframes improves reconstruction quality, but significantly increases training time and resource usage. Both highlight the need for an efficient and scalable training strategy. Third, transmitting a full 3DGS for every frame incurs substantial bandwidth and latency overhead, making it difficult to satisfy real-time requirements in practical fluctuating
…(Full text truncated)…
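The pruning-level selection that the abstract describes can be read as a multiple-choice knapsack: for each frame, pick exactly one pruning level so that the total transmission size fits a bandwidth budget while summed reconstruction quality is maximized. The sketch below solves a toy instance exhaustively for clarity; the level sizes and quality values are illustrative, and AirGS's actual ILP formulation and lightweight heuristic are in the (truncated) full text.

```python
# Toy multiple-choice knapsack: per frame, choose one pruning level
# (size in KB, quality in PSNR dB) so total size fits the budget.
from itertools import product

def select_levels(frames, budget):
    """frames: list of per-frame option lists [(size, quality), ...].
    Exhaustive search over all level combinations; fine for a toy instance."""
    best, best_q = None, float("-inf")
    for choice in product(*[range(len(f)) for f in frames]):
        size = sum(frames[i][l][0] for i, l in enumerate(choice))
        qual = sum(frames[i][l][1] for i, l in enumerate(choice))
        if size <= budget and qual > best_q:
            best, best_q = choice, qual
    return best, best_q

frames = [
    [(120, 32.0), (80, 31.2), (40, 29.5)],  # frame 0: three pruning levels
    [(120, 33.1), (80, 32.6), (40, 30.8)],  # frame 1
]
levels, quality = select_levels(frames, budget=170)  # picks level 1 for both
```

A real deployment would replace the exhaustive loop with an ILP solver or a greedy heuristic, since the search space grows exponentially with the number of frames; the formulation itself is what the abstract's "integer linear programming problem" refers to.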
This content is AI-processed based on ArXiv data.