ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting
3D scene stylization approaches based on Neural Radiance Fields (NeRF) achieve promising results by optimizing with Nearest Neighbor Feature Matching (NNFM) loss. However, NNFM loss does not consider global style information. In addition, the implicit representation of NeRF limits fine-grained control over the resulting scenes. In this paper, we introduce ABC-GS, a novel framework based on 3D Gaussian Splatting to achieve high-quality 3D style transfer. To this end, a controllable matching stage is designed to achieve precise alignment between scene content and style features through segmentation masks. Moreover, a style transfer loss function based on feature alignment is proposed to ensure that the outcomes of style transfer accurately reflect the global style of the reference image. Furthermore, the original geometric information of the scene is preserved with the depth loss and Gaussian regularization terms. Extensive experiments show that our ABC-GS provides controllability of style transfer and achieves stylization results that are more faithfully aligned with the global style of the chosen artistic reference. Our homepage is available at https://vpx-ecnu.github.io/ABC-GS-website.
💡 Research Summary
The paper introduces ABC‑GS, a novel framework for 3D scene style transfer that builds on the recent 3D Gaussian Splatting (3DGS) representation rather than the traditional Neural Radiance Fields (NeRF). The authors identify two major shortcomings of existing NeRF‑based methods: (1) the Nearest‑Neighbor Feature Matching (NNFM) loss only aligns each rendered feature with its closest style feature, ignoring the global distribution of style information; (2) the implicit nature of NeRF makes fine‑grained, user‑controlled editing difficult.
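The locality limitation of NNFM can be illustrated with a minimal sketch (a hypothetical helper, not the paper's code): each rendered feature is pulled toward its single closest style feature under cosine distance, so the overall distribution of style features never enters the objective.

```python
import numpy as np

def nnfm_loss(rendered, style):
    """Minimal NNFM sketch: each rendered feature is matched only to its
    nearest style feature (cosine distance). The match is purely local,
    so the global style distribution is ignored -- the shortcoming that
    ABC-GS's feature-alignment loss is designed to fix.
    rendered: (N, D) rendered features; style: (M, D) style features."""
    r = rendered / np.linalg.norm(rendered, axis=1, keepdims=True)
    s = style / np.linalg.norm(style, axis=1, keepdims=True)
    sim = r @ s.T                    # (N, M) cosine similarities
    nearest = sim.max(axis=1)        # best single match per rendered feature
    return float(np.mean(1.0 - nearest))
```

Note how the `max` over style features means two very different style images with similar per-feature nearest neighbors yield nearly the same loss.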
ABC‑GS addresses these issues through a two‑stage pipeline. In the controllable matching stage, users provide semantic masks for the content images (e.g., via SAM). These masks are back‑projected onto the explicit 3D Gaussians, assigning a weight for each semantic label to every Gaussian. Corresponding style masks are generated for the reference artwork, and to avoid style leakage across semantic boundaries the authors apply mask erosion followed by a “style isolation” procedure that extracts each masked region, fills missing pixels, and recomposes a clean style patch. This results in well‑defined semantic matching groups $\Omega_z$ that pair content masks, style masks, and the associated Gaussians. A linear color transformation (matrix $A$ and bias $b$) is then computed so that the mean and covariance of the recolored content colors match those of the style colors, ensuring consistent color palettes across the scene.
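The linear recoloring step can be sketched as a moment-matching problem: find $A$ and $b$ so that $Ac + b$ has the style colors' mean and covariance. The sketch below uses the symmetric matrix square-root solution; this is an assumption for illustration, as the summary does not specify which closed form the paper adopts.

```python
import numpy as np

def color_align(content_rgb, style_rgb, eps=1e-8):
    """Sketch of the linear color transform: compute A, b such that
    A @ c + b matches the style colors' mean and covariance.
    Uses A = Cov_s^{1/2} @ Cov_c^{-1/2} (symmetric square roots).
    content_rgb, style_rgb: (N, 3) color arrays."""
    mu_c, mu_s = content_rgb.mean(0), style_rgb.mean(0)
    cov_c = np.cov(content_rgb.T) + eps * np.eye(3)  # regularize for invertibility
    cov_s = np.cov(style_rgb.T) + eps * np.eye(3)

    def sqrtm(m):  # PSD square root via eigendecomposition
        w, v = np.linalg.eigh(m)
        return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

    A = sqrtm(cov_s) @ np.linalg.inv(sqrtm(cov_c))
    b = mu_s - A @ mu_c
    return A, b
```

Applying `c @ A.T + b` to the content colors then reproduces the style palette's first two moments exactly, which is what keeps the recolored scene's palette consistent with the reference.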
The stylization stage introduces the Feature Alignment Style Transfer (F‑AST) loss, which replaces NNFM with a global alignment strategy. For each semantic group, the method builds an affinity matrix $A_z$ by checking whether a rendered feature and a style feature belong to each other’s k‑nearest‑neighbor sets ($k=5$ in experiments). This binary matrix captures mutual similarity relationships across the entire feature sets. An alignment matrix $P_z$ is then obtained by minimizing the weighted squared distance between transformed rendered features and style features, effectively solving a Procrustes‑like problem that aligns the two distributions. The final F‑AST loss is a cosine similarity term $\mathcal{L}_{FAST}=1-\frac{F_r\cdot F_{rs}}{\|F_r\|\,\|F_{rs}\|}$, encouraging the rendered feature map to follow the global style distribution of the reference image.
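The mutual k‑nearest‑neighbor construction of the affinity matrix can be sketched as follows (a hypothetical reading of the summary, assuming cosine similarity as the underlying metric):

```python
import numpy as np

def mutual_knn_affinity(rendered, style, k=5):
    """Sketch of the affinity matrix A_z: entry (i, j) is 1 only when
    style feature j is among the k nearest style features of rendered
    feature i AND rendered feature i is among the k nearest rendered
    features of style feature j (mutual k-NN); k=5 follows the paper.
    rendered: (N, D); style: (M, D). Returns a binary (N, M) matrix."""
    r = rendered / np.linalg.norm(rendered, axis=1, keepdims=True)
    s = style / np.linalg.norm(style, axis=1, keepdims=True)
    sim = r @ s.T                                  # (N, M) cosine similarities
    r_nn = np.argsort(-sim, axis=1)[:, :k]         # top-k style idx per row
    s_nn = np.argsort(-sim, axis=0)[:k, :]         # top-k rendered idx per col
    fwd = np.zeros_like(sim)
    fwd[np.arange(sim.shape[0])[:, None], r_nn] = 1.0   # i -> j direction
    bwd = np.zeros_like(sim)
    bwd[s_nn, np.arange(sim.shape[1])[None, :]] = 1.0   # j -> i direction
    return fwd * bwd                               # keep mutual matches only
```

Requiring the match in both directions is what makes the affinity capture the relationship between the two feature distributions as a whole, rather than each rendered feature's isolated nearest neighbor.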
To preserve scene content and geometry, ABC‑GS combines several auxiliary losses: a content preservation loss $\mathcal{L}_{con}$ (the $L_2$ distance between rendered and original VGG features), a total variation loss $\mathcal{L}_{tv}$ to suppress high‑frequency noise, a depth loss $\mathcal{L}_{dep}$ that penalizes deviations between the rendered depth map and the depth map obtained from the original Gaussians, and regularization terms on Gaussian scale and opacity ($\mathcal{L}_{sca}$, $\mathcal{L}_{opa}$). The overall objective is a weighted sum of these terms, with empirically chosen coefficients that balance stylization strength against content fidelity.
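Two of these auxiliary terms can be sketched concretely. The sketch below assumes an absolute-difference form for total variation and an $L_2$ form for the depth penalty; the summary does not pin down the exact norms, so treat both as illustrative.

```python
import numpy as np

def tv_loss(img):
    """Total-variation term (L_tv): mean absolute difference between
    vertically and horizontally adjacent pixels, which suppresses
    high-frequency stylization noise. img: (H, W, C)."""
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).mean()
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).mean()
    return float(dh + dw)

def depth_loss(stylized_depth, original_depth):
    """Depth term (L_dep): penalizes deviation of the stylized scene's
    rendered depth from the depth rendered by the original Gaussians,
    assumed here as a mean-squared error. Both inputs: (H, W)."""
    return float(np.mean((stylized_depth - original_depth) ** 2))
```

Because both terms are computed from rendered outputs rather than the Gaussians directly, they constrain appearance and geometry without freezing the Gaussian parameters themselves.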
Experiments are conducted on two real‑world multi‑view datasets (LLFF and T&T) and on style collections from WikiArt and ARF. The authors compare ABC‑GS against state‑of‑the‑art 3D style transfer methods including ARF, Ref‑NPR (augmented with AdaIN), and StyleGaussian. Qualitative results demonstrate that ABC‑GS produces more faithful global style transfer, better color matching, and fewer artifacts such as style bleeding between objects. Quantitatively, the method achieves lower Gram‑matrix distances to the reference style, maintains depth consistency, and runs at real‑time frame rates on a single RTX 4090 thanks to the efficient 3DGS rasterizer.
In summary, ABC‑GS makes three key contributions: (1) a controllable matching stage that leverages semantic masks to precisely align content and style at the level of individual Gaussians; (2) the F‑AST loss that aligns entire feature distributions, overcoming the locality limitation of NNFM; (3) a suite of geometry‑preserving losses that keep the original scene structure intact while allowing expressive stylization. By exploiting the explicit nature of 3D Gaussian Splatting, the framework enables interactive, fine‑grained 3D style transfer suitable for VR/AR, gaming, and digital content creation, pushing the field beyond the constraints of NeRF‑based approaches.