Figure 1: SPACECONTROL enables spatially controlled 3D asset generation from simple geometric primitives such as superquadrics (light blue) and other geometry types such as polygon meshes. Top: rapid asset generation. From quick 3D sketches and brief text prompts, we can generate high quality assets. Bottom: fine-grained editing, including adjusting a chair's backrest and adding armrests (left) or precisely controlling a sofa's dimensions and pillow arrangements (right).
Generating 3D assets is a fundamental step in building virtual worlds, with applications in gaming, simulation, virtual reality, and digital design. The field of 3D generation has recently gained immense traction, and we are now able to create assets of previously unseen quality (Xiang et al., 2025; Zhang et al., 2024; Vahdat et al., 2022; Gao et al., 2022; Wu et al., 2025; Siddiqui et al., 2024; Zhao et al., 2025; Chen et al., 2025b; Huang et al., 2025; Corsetti et al., 2025). A persistent challenge, however, is controllability, i.e., how users can effectively steer generation to align with desired shapes and appearances.
Current controllable 3D generation methods rely mainly on text or image conditioning. Text is accessible and flexible but inherently ambiguous and ill-suited for specifying precise geometry. Images provide stronger alignment with 3D structures but are cumbersome to edit and not intuitive for fine-grained adjustments. As a result, neither modality enables artists or designers to directly manipulate the geometry of generated objects. A more natural paradigm is to allow users to interact with the generative model in 3D space, starting from coarse or abstract geometry and refining toward detailed assets.
Existing methods that introduce 3D geometric control fall into two categories: training-based and guidance-based. Training-based methods fine-tune existing generative models to support a specific form of geometric input, e.g., LION (Vahdat et al., 2022) for voxel conditioning, and Spice-E (Sella et al., 2024) for primitive or mesh conditioning. These methods provide controllability but require retraining, which reduces the original model’s generalization capabilities. In contrast, guidance-based methods such as LatentNeRF (Metzer et al., 2023) and Coin3D (Dong et al., 2024) act solely at inference time without retraining, but usually involve substantial optimization overhead and constrain 3D structure only indirectly. Other works enrich existing 3D assets with geometric and appearance detail (Michel et al., 2022; Chen et al., 2023; Barda et al., 2025), yet they assume fine-grained input geometry, limiting usability in creative workflows where artists often begin with coarse sketches.
In this work, we present SPACECONTROL, a training-free method that injects explicit geometric control into modern frameworks for text- or image-conditioned 3D generation, such as Trellis (Xiang et al., 2025) or SAM 3D (Chen et al., 2025a), by directly encoding user-specified geometry into the model’s latent space and using it as explicit guidance. Our method requires no additional training and enables controllable generation from diverse forms of geometry, ranging from simple primitives to detailed meshes.
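To make the idea of latent-space guidance concrete, the following is a minimal, hypothetical sketch (not the actual SPACECONTROL or Trellis API; the function names, the encoder, and the blending rule are all illustrative assumptions). A user-supplied coarse geometry, e.g. a voxelized superquadric, is encoded into the generator’s latent grid, and at each denoising step the model’s predicted clean latent is nudged toward that target inside the region the user constrained:

```python
import numpy as np

def encode_geometry(occupancy: np.ndarray) -> np.ndarray:
    """Stand-in encoder: map a binary occupancy grid to latent values in [-1, 1].
    A real system would use the generative model's own geometry encoder."""
    return 2.0 * occupancy.astype(np.float32) - 1.0

def guide(pred_x0: np.ndarray, target: np.ndarray,
          mask: np.ndarray, weight: float) -> np.ndarray:
    """One guidance step: blend the model's predicted clean latent toward the
    user-specified target latent where mask == 1; leave the rest untouched."""
    return pred_x0 + weight * mask * (target - pred_x0)

# Toy example on a 4x4x4 latent grid.
rng = np.random.default_rng(0)
occ = np.zeros((4, 4, 4), dtype=np.float32)
occ[1:3, 1:3, 1:3] = 1.0                  # coarse "box" primitive from the user
target = encode_geometry(occ)
mask = occ                                 # constrain only the occupied cells
pred = rng.standard_normal((4, 4, 4)).astype(np.float32)  # mock model prediction

guided = guide(pred, target, mask, weight=1.0)
# With weight 1, constrained cells match the target exactly;
# unconstrained cells keep the model's prediction.
assert np.allclose(guided[mask == 1], target[mask == 1])
assert np.allclose(guided[mask == 0], pred[mask == 0])
```

In practice the guidance weight would be scheduled over the denoising trajectory rather than fixed, trading geometric faithfulness against the model’s freedom to add detail.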
We compare SPACECONTROL against both training-based (Sella et al., 2024) and guidance-based (Dong et al., 2024) approaches, as well as a stronger training-based variant of Spice-E adapted to Trellis. Remarkably, despite requiring no fine-tuning, SPACECONTROL achieves superior geometric faithfulness while preserving visual realism. We further provide a user interface that allows online editing of superquadrics and real-time generation of textured assets, supporting practical deployment in design workflows.
In summary, our contributions are the following:
• We introduce a training-free guidance method that conditions a powerful pre-trained generative model (Trellis) on user-defined geometry via latent space intervention, enabling geometry-aware generation without the need for costly fine-tuning.
• We conduct extensive evaluations, including a user study and quantitative analysis, showing that our method outperforms prior state-of-the-art methods for shape-conditioned 3D asset generation.
• We develop an interactive user interface that enables online editing of superquadrics and their real-time conversion into detailed, textured 3D assets, supporting practical deployment in creative workflows.
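Since superquadrics serve as the primary editing primitive, the standard inside-outside function (Barr, 1981) illustrates how compactly they parameterize shape; the sketch below is an illustrative evaluation, not code from our system, and the voxelization threshold is the usual F <= 1 convention:

```python
import numpy as np

def superquadric_F(p, a=(1.0, 1.0, 1.0), e=(1.0, 1.0)):
    """Inside-outside function of a superquadric: F < 1 inside, F = 1 on the
    surface, F > 1 outside. a = (a1, a2, a3) are the axis scales and
    e = (e1, e2) the shape exponents (e1 = e2 = 1 gives an ellipsoid;
    small exponents give box-like shapes)."""
    x, y, z = np.abs(np.asarray(p, dtype=float))
    a1, a2, a3 = a
    e1, e2 = e
    xy = (x / a1) ** (2.0 / e2) + (y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + (z / a3) ** (2.0 / e1)

# Unit sphere (a = 1, e = 1): surface, inside, and outside points.
assert abs(superquadric_F((1.0, 0.0, 0.0)) - 1.0) < 1e-9   # on the surface
assert superquadric_F((0.2, 0.2, 0.2)) < 1.0               # inside
assert superquadric_F((2.0, 0.0, 0.0)) > 1.0               # outside
```

A primitive edited in the interface could be rasterized into an occupancy grid by evaluating F on a regular lattice and thresholding at 1, which is one plausible route from editable parameters to a grid-structured conditioning signal.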
The field of 3D generation has experienced rapid growth over the past few years, both in output modalities and in controllability. Similar to the first image diffusion models (Ramesh et al., 2021), early applications of diffusion models to 3D generation (Nichol et al., 2022) conducted the diffusion process directly in the original input space and were limited in the types of output they could generate.
More recent approaches (Vahdat et al., 2022; Jun & Nichol, 2023) run the generation in a more compact latent space, leading to substantial improvements in both quality and efficiency. To improve efficiency further, subsequent works (Zhang et al., 2024; Xiang et al., 2025) disentangle the modeling of structure from appearance, leading to unprecedentedly high-quality generations. This separate modeling of geometry and appearance opens the door to explicit forms of spatially grounded conditioning, as done in our SPACECONTROL.
Given a pretrained generative model, there are two main approaches to introduce a new contr
…(Full text truncated)…