UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Reading time: 5 minutes

📝 Original Info

  • Title: UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
  • ArXiv ID: 2512.21185
  • Date: 2025-12-24
  • Authors: Tanghui Jia, Dongyu Yan, Dehao Hao, Yang Li, Kaiyi Zhang, Xianyi He, Lanjiong Li, Yuhan Wang, Jinnan Chen, Lutao Jiang, Qishen Yin, Long Quan, Ying-Cong Chen, Li Yuan

📝 Abstract

In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeline that includes a novel watertight processing method and high-quality data filtering. This pipeline improves the geometric quality of publicly available 3D datasets by removing low-quality samples, filling holes, and thickening thin structures, while preserving fine-grained geometric details. To enable fine-grained geometry refinement, we decouple spatial localization from geometric detail synthesis in the diffusion process. We achieve this by performing voxel-based refinement at fixed spatial locations, where voxel queries derived from coarse geometry provide explicit positional anchors encoded via RoPE, allowing the diffusion model to focus on synthesizing local geometric details within a reduced, structured solution space. Our model is trained exclusively on publicly available 3D datasets, achieving strong geometric quality despite limited training resources. Extensive evaluations demonstrate that UltraShape 1.0 performs competitively with existing open-source methods in both data processing quality and geometry generation. All code and trained models will be released to support future research.
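The key architectural idea here is that each voxel query carries an explicit positional anchor encoded with RoPE, so the diffusion attention can concentrate on detail synthesis rather than localization. As a rough illustration of what axis-wise 3D RoPE over voxel coordinates can look like (a minimal sketch assuming the common even channel split across the x/y/z axes; the function names, dimension split, and frequency base are assumptions, not the authors' released code):

```python
# Hypothetical sketch: axis-wise 3D RoPE over voxel-query coordinates.
# Assumption: channels are split evenly across x/y/z, as in common
# multi-axis RoPE variants; this is NOT the UltraShape implementation.
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x by angles derived from integer positions.

    x:   (N, D) features for one axis, D even.
    pos: (N,) integer coordinates along that axis.
    """
    d = x.shape[-1] // 2
    freqs = base ** (-torch.arange(d, dtype=torch.float32) / d)  # (d,)
    ang = pos[:, None].float() * freqs[None, :]                  # (N, d)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :d], x[..., d:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def rope_3d(tokens: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
    """Apply per-axis RoPE to voxel-query tokens.

    tokens: (N, C) with C divisible by 6 (three axes x two rotated halves).
    coords: (N, 3) integer voxel indices taken from the coarse geometry.
    """
    c = tokens.shape[-1] // 3
    parts = [rope_1d(tokens[:, i * c:(i + 1) * c], coords[:, i]) for i in range(3)]
    return torch.cat(parts, dim=-1)

# Toy usage: 8 voxel queries on a 64^3 grid with 96 channels.
queries = torch.randn(8, 96)
coords = torch.randint(0, 64, (8, 3))
anchored = rope_3d(queries, coords)
print(anchored.shape)  # torch.Size([8, 96])
```

Because the rotation angle depends only on the voxel index, dot-product attention between two anchored queries becomes a function of their relative offset, which is consistent with the paper's framing of refinement performed at fixed spatial locations.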

💡 Deep Analysis

Figure 1: High-quality 3D assets generated by UltraShape 1.0. Best viewed with zoom-in.

📄 Full Content

Technical Report

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Tanghui Jia*1, Dongyu Yan*2, Dehao Hao*3, Yang Li2, Kaiyi Zhang3, Xianyi He1, Lanjiong Li2, Yuhan Wang5, Jinnan Chen4, Lutao Jiang2, Qishen Yin1, Long Quan3, Ying-Cong Chen2, Li Yuan1

1Shenzhen Graduate School, Peking University; 2The Hong Kong University of Science and Technology (Guangzhou); 3The Hong Kong University of Science and Technology; 4National University of Singapore; 5S-Lab, Nanyang Technological University; *Equal contribution

Figure 1: High-quality 3D assets generated by UltraShape 1.0. Best viewed with zoom-in.

Date: December 29, 2025
Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/

1 Introduction

3D content generation plays a fundamental role across a wide range of applications, including film and visual effects production, augmented and virtual reality, robotics, industrial design, and modern video games. Across these domains, the generation of high-fidelity 3D geometry remains a core technical requirement. As demand for scalable, automated 3D geometry generation continues to grow, learning-based 3D generation has emerged as a key research direction in computer vision and computer graphics.

Compared to 2D content generation, 3D generation poses substantially greater challenges. First, high-quality 3D data is significantly scarcer, often represented non-uniformly, and typically requires strong geometric properties, such as watertightness, to be directly usable in downstream tasks. In addition, common 3D representations are inherently sparse, and both memory consumption and computational cost scale cubically with spatial resolution, severely limiting the achievable level of geometric detail and scalability. These factors make it difficult for existing methods to produce fine-grained geometry while maintaining robustness at higher resolutions. As a result, 3D generation techniques have not yet converged on a unified, scalable pipeline.
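The cubic-scaling constraint is easy to quantify. The snippet below is a back-of-the-envelope estimate (the 2% surface-occupancy figure for the sparse case is an illustrative assumption, not a number from the paper):

```python
# Illustrative memory estimate for a dense float32 SDF grid at several
# resolutions, versus a sparse grid touching ~2% of voxels near the surface.
# The 2% occupancy figure is an assumption chosen for illustration only.
for res in (128, 256, 512, 1024):
    dense_gib = res**3 * 4 / 2**30          # 4 bytes per float32 voxel
    sparse_gib = dense_gib * 0.02
    print(f"{res:>4}^3: dense {dense_gib:7.3f} GiB, sparse ~{sparse_gib:6.3f} GiB")
```

Each doubling of resolution multiplies dense-grid memory by eight (about 0.06 GiB at 256³ but 4 GiB at 1024³ for float32), which is why dense grids become impractical beyond a few hundred voxels per side and why sparse, coarse-to-fine schemes like the one proposed here are attractive.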
Existing watertight remeshing techniques for 3D generative models can be broadly categorized into UDF-based, visibility-check-based, and flood-fill-based approaches. UDF-based methods typically compute unsigned distance fields (UDFs) on dense voxel grids and derive pseudo-SDFs by subtracting a small offset ϵ [2, 28]; however, this heuristic lacks explicit sign inference, often resulting in double-layered surfaces or the erroneous removal of valid disconnected components (e.g., wheels) when filtering for the largest connected part. Alternatively, visibility-check-based methods employ ray casting to identify interior regions [12, 13, 28], which effectively seal cracks and eliminate spurious internal structures but remain sensitive to occlusions and prone to high-frequency geometric noise in complex regions. Finally, flood-fill-based strategies infer signs by expanding from exterior seeds (e.g., ManifoldPlus [7]) to generate clean, regularized surfaces. Despite their effectiveness on closed shapes, these methods rely heavily on watertight assumptions; when applied to non-watertight or self-intersecting inputs, the fill process often leaks into the interior, yielding unintended double-layered thin shells.

Alongside earlier approaches such as Score Distillation Sampling [1, 18, 21] and Large Reconstruction Models [6, 20, 25], diffusion transformer (DiT [17])-based methods have recently become the leading paradigm in 3D generation. They can be broadly …
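To make the flood-fill category concrete, the sketch below implements the generic recipe the paragraph describes: label off-surface voxels of a dense UDF grid, treat the connected components touching the grid boundary as exterior, and sign everything else negative. This is a minimal illustration with SciPy, not ManifoldPlus or any part of the paper's pipeline; `floodfill_sdf` and `eps` are hypothetical names.

```python
# Minimal sketch of flood-fill sign inference on a dense voxel grid.
# `udf` is an unsigned distance field; `eps` is the surface band width.
# Not the authors' method -- just the generic flood-fill recipe above.
import numpy as np
from scipy import ndimage

def floodfill_sdf(udf: np.ndarray, eps: float) -> np.ndarray:
    empty = udf > eps                      # voxels clearly off the surface
    labels, _ = ndimage.label(empty)       # face-connected components
    # Empty components touching the grid boundary are the exterior.
    border = np.zeros_like(empty)
    border[0, :, :] = border[-1, :, :] = True
    border[:, 0, :] = border[:, -1, :] = True
    border[:, :, 0] = border[:, :, -1] = True
    exterior_ids = np.unique(labels[border & empty])
    outside = np.isin(labels, exterior_ids) & empty
    sign = np.where(outside, 1.0, -1.0)    # interior and surface band: negative
    return sign * udf

# Toy usage: a sphere of radius 10 in a 32^3 grid.
ax = np.arange(32) - 16
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
udf = np.abs(np.sqrt(x**2 + y**2 + z**2) - 10.0)
sdf = floodfill_sdf(udf, eps=0.5)
print(sdf.min() < 0 < sdf.max())  # True: interior is now signed negative
```

The leakage failure mode described above maps directly onto this sketch: a crack wider than `eps` connects the interior empty region to an exterior component, so the interior gets labeled outside and the reconstructed surface degenerates into a double-layered thin shell.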

📸 Image Gallery

teaser.png

Reference

This content is AI-processed based on open access ArXiv data.
