Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting

Notice: This research summary and analysis were generated automatically using AI. For accuracy, please refer to the original arXiv paper.

3D style transfer refers to the artistic stylization of 3D assets based on reference style images. Recently, 3DGS-based stylization methods have drawn considerable attention, primarily due to their markedly enhanced training and rendering speeds. However, a vital challenge for 3D style transfer is to strike a balance between the content and the patterns and colors of the style. Although the existing methods strive to achieve relatively balanced outcomes, the fixed-output paradigm struggles to adapt to the diverse content-style balance requirements from different users. In this work, we introduce a creative intensity-tunable 3D style transfer paradigm, dubbed Tune-Your-Style, which allows users to flexibly adjust the style intensity injected into the scene to match their desired content-style balance, thus enhancing the customizability of 3D style transfer. To achieve this goal, we first introduce Gaussian neurons to explicitly model the style intensity and parameterize a learnable style tuner to achieve intensity-tunable style injection. To facilitate the learning of tunable stylization, we further propose the tunable stylization guidance, which obtains multi-view consistent stylized views from diffusion models through cross-view style alignment, and then employs a two-stage optimization strategy to provide stable and efficient guidance by modulating the balance between full-style guidance from the stylized views and zero-style guidance from the initial rendering. Extensive experiments demonstrate that our method not only delivers visually appealing results, but also exhibits flexible customizability for 3D style transfer. Project page is available at https://zhao-yian.github.io/TuneStyle.


💡 Research Summary

The paper introduces Tune‑Your‑Style, a novel intensity‑tunable 3D style transfer framework built on top of 3D Gaussian Splatting (3DGS). Existing 3DGS‑based stylization methods produce a single, fixed output, forcing users to accept either over‑stylized or under‑stylized results. Tune‑Your‑Style addresses this limitation by explicitly modeling style intensity and providing a learnable “style tuner” that lets users continuously adjust the amount of style injected into a scene, from 0 % (no style) to 100 % (full style).

Technical contributions can be grouped into two modules: (1) Intensity‑tunable Style Injection (ISI) and (2) Tunable Stylization Guidance (TSG).

ISI introduces Gaussian neurons, a small neural network attached to each 3D Gaussian primitive (position, scale, rotation, opacity, color). Given a reference style image, the neurons predict attribute offsets for every primitive, forming a dense, per-primitive style field. To make this field controllable, the authors design a style tuner whose input is a continuous intensity parameter β. The parameter β is quantized by a staircase function H(β) into discrete levels, then mapped to an embedding V_β via a learnable lookup table. The final stylized scene is computed as Θ̂_β = Θ + V_β ⊙ G(S, Θ), where G denotes the neuron-predicted offsets and ⊙ is element-wise multiplication. This formulation lets the intensity parameter directly modulate every attribute, enabling smooth interpolation between the original and the fully stylized appearance. Additionally, a 3D Gaussian filter removes low-importance Gaussians (determined by hit count, opacity, and occlusion) to prevent artifacts caused by redundant primitives during style injection.
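The injection rule Θ̂_β = Θ + V_β ⊙ G(S, Θ) can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the paper's released code: the class name `StyleTuner`, the number of levels, and `attr_dim = 14` (a flattened concatenation of the Gaussian attributes) are all assumptions.

```python
import torch
import torch.nn as nn


class StyleTuner(nn.Module):
    """Minimal sketch of the style tuner described above (hypothetical names).

    A continuous intensity beta in [0, 1] is quantized by a staircase
    function H(beta) into one of `n_levels` discrete levels; each level
    indexes a learnable embedding V_beta that scales the per-primitive
    style offsets.
    """

    def __init__(self, n_levels: int = 10, attr_dim: int = 14):
        super().__init__()
        self.n_levels = n_levels
        # One learnable scaling vector per quantized intensity level.
        self.embeddings = nn.Embedding(n_levels, attr_dim)

    def forward(self, beta: float) -> torch.Tensor:
        # Staircase quantization: map beta in [0, 1] to a discrete level.
        level = min(int(beta * self.n_levels), self.n_levels - 1)
        return self.embeddings(torch.tensor(level))  # (attr_dim,)


def stylize(theta: torch.Tensor, offsets: torch.Tensor,
            tuner: StyleTuner, beta: float) -> torch.Tensor:
    """theta:   (N, attr_dim) original Gaussian attributes
    offsets: (N, attr_dim) neuron-predicted style offsets G(S, theta)
    Returns the intensity-modulated attributes theta + V_beta * offsets."""
    v_beta = tuner(beta)             # (attr_dim,)
    return theta + v_beta * offsets  # element-wise modulation, broadcast over N
```

Because betas in the same quantization bin share one embedding, the tuner learns a small set of intensity levels while the user still sees a continuous slider.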

TSG provides the supervision needed for the tunable injection to converge. Instead of training a costly 2‑D encoder‑decoder, the method leverages a pre‑trained image‑conditioned diffusion model (e.g., IP‑Adapter). For each training view, the rendered image is fed to the diffusion model together with the style reference, producing a stylized view. However, naïvely using these views leads to multi‑view inconsistency: textures may shift or blur across viewpoints. To solve this, the authors propose Cross‑View Style Alignment. An anchor view is randomly selected; its feature map is back‑projected into 3D using the predicted depth, then re‑projected onto other viewpoints, yielding warped content features. These warped features are concatenated with the current view’s features and injected as key/value pairs into the self‑attention layers of the diffusion process (mutual self‑attention). This aligns the style texture across views while preserving the underlying geometry.
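The geometric core of the alignment step, warping the anchor view's features into another viewpoint via depth, can be sketched as below. This is a simplified, hypothetical implementation: it assumes both views share resolution and intrinsics, implements the warp backward from the target view's depth (so that `grid_sample` can be used), and omits the mutual self-attention injection into the diffusion model entirely.

```python
import torch
import torch.nn.functional as F


def warp_anchor_features(anchor_feat: torch.Tensor, target_depth: torch.Tensor,
                         K: torch.Tensor, T_anchor: torch.Tensor,
                         T_target: torch.Tensor) -> torch.Tensor:
    """Resample anchor-view features into the target view (illustrative sketch).

    anchor_feat:        (C, H, W) feature map of the anchor view
    target_depth:       (H, W) predicted depth of the target view
    K:                  (3, 3) shared camera intrinsics
    T_anchor, T_target: (4, 4) world-from-camera poses
    """
    C, H, W = anchor_feat.shape
    # Pixel grid of the target view (homogeneous image coordinates).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()  # (H, W, 3)
    # Back-project target pixels to 3D camera coordinates using depth.
    cam = (pix @ torch.linalg.inv(K).T) * target_depth[..., None]
    cam_h = torch.cat([cam, torch.ones(H, W, 1)], dim=-1)             # homogeneous
    # Target camera -> world -> anchor camera.
    world = cam_h @ T_target.T
    anchor_cam = world @ torch.linalg.inv(T_anchor).T
    # Perspective projection into the anchor image plane.
    proj = (anchor_cam[..., :3] / anchor_cam[..., 2:3].clamp(min=1e-6)) @ K.T
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([proj[..., 0] / (W - 1) * 2 - 1,
                        proj[..., 1] / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(anchor_feat[None], grid[None],
                         align_corners=True)[0]                        # (C, H, W)
```

In the method described above, the warped features would then be concatenated with the current view's features and fed as key/value pairs into the diffusion model's self-attention layers.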

Guidance is delivered in two stages. In the full‑style stage, the loss is computed between the 3DGS rendering (with Vβ close to 1) and the diffusion‑generated stylized view, encouraging the network to follow the style. In the zero‑style stage, the loss is computed against the original rendering (Vβ close to 0), preventing the network from drifting away from the content. By gradually varying β during training, the style tuner learns to produce stable outputs for any intensity level.
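A minimal sketch of how the two guidance signals might be combined, assuming an L1 photometric loss and an explicit stage switch (neither of which is specified in the summary above):

```python
import torch
import torch.nn.functional as F


def tunable_guidance_loss(rendering: torch.Tensor,
                          stylized_view: torch.Tensor,
                          original_view: torch.Tensor,
                          stage: str) -> torch.Tensor:
    """Illustrative two-stage guidance (hypothetical loss choice).

    stage == "full": the rendering (intensity near 1) is pulled toward
                     the diffusion-generated stylized view.
    stage == "zero": the rendering (intensity near 0) is pulled toward
                     the initial unstyled rendering, anchoring content.
    """
    target = stylized_view if stage == "full" else original_view
    return F.l1_loss(rendering, target)
```

Alternating stages while sampling β across its range during training would expose the style tuner to both extremes, so intermediate intensities interpolate stably between them.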

The authors evaluate Tune‑Your‑Style on several indoor and outdoor scenes, comparing against state‑of‑the‑art NeRF‑based (StyleRF, CoARF) and 3DGS‑based (G‑Style, InstantStyleGaussian) methods. Quantitatively, the proposed method achieves higher PSNR, SSIM, and lower LPIPS, indicating better fidelity and perceptual similarity. A user study with 30 participants shows a strong preference for the tunable interface: 92 % of respondents found the intensity slider intuitive and appreciated the smooth transition of results. Qualitative examples demonstrate that as β increases, textures and colors gradually adopt the artistic style while preserving geometry, and the method can even blend multiple style references to create hybrid stylizations.

Limitations include reliance on 2D diffusion priors, which may struggle with highly reflective or transparent surfaces, and the discrete β‑to‑embedding mapping, which could be extended to richer functions for more expressive control. Future work may explore integrating 3D diffusion models, richer intensity schedules, and real‑time optimization for interactive applications.

In summary, Tune‑Your‑Style advances 3D style transfer by (i) explicitly modeling per‑primitive style intensity, (ii) providing a continuous, learnable intensity control, (iii) ensuring multi‑view consistency through cross‑view diffusion alignment, and (iv) demonstrating superior visual quality and user‑centric customizability. This framework opens new possibilities for personalized artistic creation in gaming, AR/VR, and digital content pipelines.

