Spatial-Control-Based 3D Asset Generation and Precise Editing System


📝 Original Info

  • Title: Spatial-Control-Based 3D Asset Generation and Precise Editing System (SPACECONTROL)
  • ArXiv ID: 2512.05343
  • Date: 2025-12-05
  • Authors: Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, Leonidas Guibas

📝 Abstract

Figure 1: SPACECONTROL enables spatially controlled 3D asset generation from simple geometric primitives such as superquadrics (light blue) and other geometry types such as polygon meshes. Top: rapid asset generation. From quick 3D sketches and brief text prompts, we can generate high quality assets. Bottom: fine-grained editing, including adjusting a chair's backrest and adding armrests (left) or precisely controlling a sofa's dimensions and pillow arrangements (right).

💡 Deep Analysis

Deep dive into spatial-control-based 3D asset generation and precise editing.


📄 Full Content

Generating 3D assets is a fundamental step in building virtual worlds, useful for gaming, simulation, virtual reality applications, and digital design. Recently, the field of 3D generation has gained immense traction, and we are now able to create assets of previously unseen quality (Xiang et al., 2025; Zhang et al., 2024; Vahdat et al., 2022; Gao et al., 2022; Wu et al., 2025; Siddiqui et al., 2024; Zhao et al., 2025; Chen et al., 2025b; Huang et al., 2025; Corsetti et al., 2025). A persistent challenge, however, is controllability, i.e., how users can effectively steer generation to align with desired shapes and appearances.

Current controllable 3D generation methods rely mainly on text or image conditioning. Text is accessible and flexible but inherently ambiguous and ill-suited for specifying precise geometry. Images provide stronger alignment with 3D structures but are cumbersome to edit and not intuitive for fine-grained adjustments. As a result, neither modality enables artists or designers to directly manipulate the geometry of generated objects. A more natural paradigm is to allow users to interact with the generative model in 3D space, starting from coarse or abstract geometry and refining toward detailed assets.

Existing methods that introduce 3D geometric control fall into two categories: training-based and guidance-based. Training-based methods fine-tune existing generative models to support a specific form of geometric input, e.g., LION (Vahdat et al., 2022) for voxel conditioning, and Spice-E (Sella et al., 2024) for primitive or mesh conditioning. These methods provide controllability but require retraining, which reduces the original model's generalization capabilities. In contrast, guidance-based methods such as LatentNeRF (Metzer et al., 2023) and Coin3D (Dong et al., 2024) act solely at inference time without retraining, but usually involve substantial optimization overhead and constrain 3D structure only indirectly. Other works enrich existing 3D assets with geometric and appearance detail (Michel et al., 2022; Chen et al., 2023; Barda et al., 2025), yet they assume fine-grained input geometry, limiting usability in creative workflows where artists often begin with coarse sketches.

In this work, we present SPACECONTROL, a training-free method that injects explicit geometric control into modern frameworks for text- or image-conditioned 3D generation, such as Trellis (Xiang et al., 2025) or SAM 3D (Chen et al., 2025a), by directly encoding user-specified geometry into the model's latent space and using it as explicit guidance. Our method requires no additional training and enables controllable generation from diverse forms of geometry, ranging from simple primitives to detailed meshes.
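To make the idea of latent-space guidance concrete, here is a minimal sketch of one guided denoising step. All names (`geometry_latent`, `mask`, `tau`) are illustrative assumptions, not SPACECONTROL's actual API; the sketch only shows the general pattern of blending an encoded user geometry into the denoiser's prediction inside a user-specified region.

```python
import numpy as np

def guided_denoise_step(model_pred, geometry_latent, mask, tau, t, T):
    """Hypothetical training-free guidance step for a latent diffusion model.

    model_pred      - the pretrained denoiser's predicted latent (unmodified model)
    geometry_latent - user geometry encoded into the same latent space
    mask            - 1 where the user specified geometry, 0 elsewhere
    tau             - overall guidance strength in [0, 1]
    t, T            - current and total number of diffusion steps
    """
    # Guidance weight decays as t -> 0, so early steps follow the user
    # geometry closely while late steps refine fine details freely.
    w = tau * (t / T)
    guided = (1.0 - w) * model_pred + w * geometry_latent
    # Apply guidance only inside the user-specified region.
    return mask * guided + (1.0 - mask) * model_pred
```

With `tau = 1` and `t = T` the latent inside the mask is replaced entirely by the encoded geometry; with `tau = 0` the pretrained model is untouched, which is the trade-off a guidance-strength parameter typically exposes.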

We compare SPACECONTROL against both training-based (Sella et al., 2024) and guidance-based (Dong et al., 2024) approaches, as well as a stronger training-based variant of Spice-E adapted to Trellis. Remarkably, despite requiring no fine-tuning, SPACECONTROL achieves superior geometric faithfulness while preserving visual realism. We further provide a user interface that allows online editing of superquadrics and real-time generation of textured assets, supporting practical deployment in design workflows.

In summary, our contributions are the following:

• We introduce a training-free guidance method that conditions a powerful pre-trained generative model (Trellis) on user-defined geometry via latent space intervention, enabling geometry-aware generation without the need for costly fine-tuning.

• We conduct extensive evaluations, including a user study and quantitative analysis, showing that our method outperforms prior state-of-the-art methods for shape-conditioned 3D asset generation.

• We develop an interactive user interface that enables online editing of superquadrics and their real-time conversion into detailed, textured 3D assets, supporting practical deployment in creative workflows.
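Since the interface edits superquadrics, it may help to recall their standard implicit form. The following is the textbook superquadric inside-outside function (Barr's formulation), not code from the paper; the parameter names `a` and `eps` are the usual axis scales and shape exponents.

```python
import numpy as np

def superquadric_inside_outside(p, a=(1.0, 1.0, 1.0), eps=(1.0, 1.0)):
    """Standard superquadric inside-outside function F(p).

    F < 1 inside, F = 1 on the surface, F > 1 outside.
    a   - per-axis scales (a1, a2, a3)
    eps - shape exponents (eps1, eps2): ~1 gives an ellipsoid,
          values near 0 give increasingly box-like shapes.
    """
    x = np.abs(p[..., 0] / a[0])
    y = np.abs(p[..., 1] / a[1])
    z = np.abs(p[..., 2] / a[2])
    e1, e2 = eps
    return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)
```

A handful of such primitives, each with three scales, two exponents, and a pose, already gives a compact 3D sketch, which is what makes them attractive as a coarse input modality for spatially controlled generation.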

The field of 3D generation has experienced rapid growth during the past few years, both in terms of output modalities and controllability. Similar to the first image diffusion models (Ramesh et al., 2021), early applications of diffusion models to 3D generation (Nichol et al., 2022) performed the diffusion process in the original input space and were limited in the types of output they could generate.

More recent approaches (Vahdat et al., 2022; Jun & Nichol, 2023) started running the generation in a more compact latent space, leading to substantial improvements in both quality and efficiency. To further improve efficiency, Zhang et al. (2024) and Xiang et al. (2025) have started to disentangle the modeling of structure from appearance, leading to unprecedented high-quality generations. The separate modeling of geometry and appearance opens the door to explicit forms of spatially grounded conditioning, as done in our SPACECONTROL.

Given a pretrained generative model, there are two main approaches to introduce a new contr

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
