ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph
Current text-to-3D generation methods excel in natural scenes but struggle with industrial applications due to two critical limitations: domain adaptation challenges where conventional LoRA fusion causes knowledge interference across categories, and geometric reasoning deficiencies where pairwise consistency constraints fail to capture higher-order structural dependencies essential for precision manufacturing. We propose a novel framework named ForgeDreamer addressing both challenges through two key innovations. First, we introduce a Multi-Expert LoRA Ensemble mechanism that consolidates multiple category-specific LoRA models into a unified representation, achieving superior cross-category generalization while eliminating knowledge interference. Second, building on enhanced semantic understanding, we develop a Cross-View Hypergraph Geometric Enhancement approach that captures structural dependencies spanning multiple viewpoints simultaneously. These components work synergistically: improved semantic understanding enables more effective geometric reasoning, while hypergraph modeling ensures manufacturing-level consistency. Extensive experiments on a custom industrial dataset demonstrate superior semantic generalization and enhanced geometric fidelity compared to state-of-the-art approaches. Our code and data are provided in the supplementary material attached in the appendix for review purposes.
💡 Research Summary
ForgeDreamer tackles the long‑standing gap between state‑of‑the‑art text‑to‑3D generation, which excels on natural scenes, and the stringent requirements of industrial parts such as fasteners, electronic components, and precision‑machined surfaces. The authors identify two root problems: (1) a domain gap where pretrained 2D diffusion models lack the semantic knowledge to understand industrial terminology, and (2) geometric reasoning that relies on pairwise view consistency, which cannot capture the higher‑order structural relationships needed for manufacturing‑level accuracy.
To solve these issues, the paper introduces two complementary innovations. First, a Multi‑Expert LoRA Ensemble is built by fine‑tuning separate LoRA adapters for each industrial category (e.g., screws, nuts, LEDs). These adapters are then merged through a teacher‑student knowledge distillation pipeline. Each teacher model (the base Stable Diffusion network plus a category‑specific LoRA) generates multi‑view images and latent representations for its trigger word. The student model learns a unified text encoder and UNet by minimizing MSE‑based alignment losses between teacher and student features, first training the text encoder alone and then jointly updating both encoder and UNet. This two‑stage distillation resolves conflicts among categories, yielding a single LoRA weight set that generalizes across domains without the interference observed in naïve additive fusion.
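The two-stage schedule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dictionaries, the names `distillation_loss` and `trainable_groups`, the `unet_weight` parameter, and the choice to average over categories are all assumptions made for illustration. Only the use of MSE alignment and the order of the stages (text encoder first, then encoder and UNet jointly) come from the summary.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two feature arrays of equal shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def trainable_groups(stage):
    """Stage 1 updates only the text encoder; stage 2 updates encoder and UNet."""
    if stage == 1:
        return ("text_encoder",)
    return ("text_encoder", "unet")

def distillation_loss(teacher_out, student_out, stage, unet_weight=1.0):
    """Average MSE alignment loss over all category-specific teachers.

    teacher_out / student_out: dict mapping a category name to
    {"text": text-encoder features, "unet": UNet features}.
    This layout is hypothetical; unet_weight is an assumed balancing factor.
    """
    total = 0.0
    for category, teacher in teacher_out.items():
        student = student_out[category]
        loss = mse(student["text"], teacher["text"])
        if stage == 2:
            # UNet features are aligned only once joint training starts.
            loss += unet_weight * mse(student["unet"], teacher["unet"])
        total += loss
    return total / len(teacher_out)
```

Averaging over categories treats every teacher equally, which is one simple way to keep any single category from dominating the unified student. The paper may weight the terms differently.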
Second, the Cross‑View Hypergraph Geometric Enhancement replaces traditional pairwise consistency with a hypergraph formulation. Latent features from multiple views are flattened into pixel‑level nodes; hyperedges are created by connecting each node to its top‑k most similar nodes in feature space, regardless of spatial alignment. A Hypergraph Neural Network (HGNN) performs message passing over these hyperedges, aggregating information from many views simultaneously. The resulting hypergraph‑based geometric gradient loss (L_HG) is combined with existing pairwise consistency losses, encouraging the 3D representation to satisfy high‑order structural constraints such as threading continuity, connector alignment, and dimensional tolerances.
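A rough sketch of the hypergraph construction and one round of message passing follows. It makes several assumptions the summary does not confirm: cosine similarity as the feature-space metric, one hyperedge per node that also contains the node itself, unit hyperedge weights, and the standard HGNN convolution X' = ReLU(D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X Θ). The function names are hypothetical, and the loss L_HG itself is not reproduced because the summary does not define it.

```python
import numpy as np

def flatten_views(latents):
    """(V, C, h, w) multi-view latents -> (V*h*w, C) pixel-level node features."""
    v, c, h, w = latents.shape
    return latents.transpose(0, 2, 3, 1).reshape(v * h * w, c)

def build_incidence(features, k):
    """Incidence matrix H (nodes x hyperedges), with H[v, e] = 1 if node v is in edge e.

    Hyperedge e groups node e with its k most similar nodes (cosine similarity,
    an assumed metric), regardless of which view or pixel they come from.
    Requires k <= N - 1.
    """
    n = features.shape[0]
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # exclude self from the neighbor search
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # (N, k) indices of top-k neighbors
    H = np.zeros((n, n))
    idx = np.arange(n)
    H[idx, idx] = 1.0                         # each node anchors its own hyperedge
    H[nbrs.ravel(), np.repeat(idx, k)] = 1.0  # add the k neighbors to that hyperedge
    return H

def hgnn_conv(x, H, theta):
    """One HGNN layer with unit hyperedge weights and ReLU activation."""
    d_v = H.sum(axis=1)                       # node degrees (>= 1 by construction)
    d_e = H.sum(axis=0)                       # hyperedge degrees (k + 1 each)
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    left = (dv_inv_sqrt[:, None] * H) / d_e[None, :]   # D_v^-1/2 H D_e^-1
    right = H.T * dv_inv_sqrt[None, :]                 # H^T D_v^-1/2
    return np.maximum(left @ right @ x @ theta, 0.0)
```

Because a hyperedge can contain nodes from any number of views, a single convolution mixes information across all of them at once, which is the property that pairwise consistency terms lack.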
The authors construct a custom multi‑view industrial dataset comprising four views of components like bolts, gaskets, and LEDs, captured under controlled lighting. Evaluation uses three metrics: (i) text‑image alignment (SD‑Score), (ii) Chamfer Distance for geometric fidelity, and (iii) human preference studies. ForgeDreamer outperforms strong baselines (DreamFusion, ProlificDreamer, LucidDreamer), improving semantic alignment by roughly 12% and reducing Chamfer Distance by roughly 18%. Ablation studies confirm that (a) removing the ensemble and using a single LoRA leads to severe knowledge interference, and (b) omitting the hypergraph causes noticeable distortions on complex parts.
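For readers unfamiliar with the geometric metric, the sketch below shows one common definition of Chamfer Distance between two point clouds. The summary does not state which variant the paper uses. This version, which sums the mean squared nearest-neighbor distances in both directions, is an assumption, and `chamfer_distance` is an illustrative name.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3).

    Uses squared Euclidean distances and averages over each set; other
    conventions (unsquared distances, sums instead of means) also exist.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Nearest neighbor in q for every p, plus nearest neighbor in p for every q.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Lower values indicate that the generated surface lies closer to the reference geometry, so a reduction in this metric corresponds to improved geometric fidelity.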
In summary, ForgeDreamer demonstrates that (1) consolidating multiple domain‑specific LoRA adapters via teacher‑student distillation yields robust, interference‑free semantic understanding, and (2) modeling geometric consistency as a hypergraph problem captures higher‑order dependencies essential for industrial precision. The combined system delivers high‑fidelity, text‑driven 3D models suitable for manufacturing, design, and downstream simulation pipelines, and the proposed techniques are readily extensible to other multi‑domain generative tasks.