ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original ArXiv source.

Recent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent view biases in the T2I priors. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel method that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise view control; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer can be seamlessly integrated into various 3D representations and score distillation paradigms, effectively mitigating the multi-face Janus problem.


💡 Research Summary

The paper “ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation” addresses one of the most persistent challenges in the field of text-to-3D synthesis: the “Janus Problem.” Because text-to-3D generation leverages pre-trained Text-to-Image (T2I) models via Score Distillation Sampling (SDS), it inherits the “view bias” present in those T2I models. This bias, which arises because T2I models are trained predominantly on front-facing imagery, leads to the generation of multi-faced objects, where conflicting features appear across different viewing angles, breaking the structural integrity of the 3D asset.
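To make the SDS mechanism concrete, the following is a minimal NumPy sketch of a single score-distillation step. Everything here is illustrative: `toy_denoiser`, the noise schedule value, and the array shapes are hypothetical stand-ins, not the paper's actual model or implementation.

```python
import numpy as np

def toy_denoiser(x_t, prompt_embedding, t):
    # Hypothetical placeholder for a pre-trained T2I noise predictor;
    # a real model (e.g. a diffusion U-Net) conditions on prompt and timestep.
    return 0.1 * x_t + 0.01 * prompt_embedding

def sds_gradient(x, prompt_embedding, t, weight=1.0, rng=None):
    """Image-space SDS update direction: w(t) * (eps_pred - eps).

    In a full pipeline this term is backpropagated through the
    differentiable renderer to the 3D (e.g. Gaussian) parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(x.shape)                    # sampled noise
    alpha = 0.9                                           # toy schedule value
    x_t = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * eps   # noised render
    eps_pred = toy_denoiser(x_t, prompt_embedding, t)     # model's prediction
    return weight * (eps_pred - eps)

render = np.zeros((4, 4))   # stand-in rendered view
prompt = np.ones((4, 4))    # stand-in text-conditioning embedding
grad = sds_gradient(render, prompt, t=500)
print(grad.shape)           # (4, 4)
```

The key point for the Janus discussion is that `eps_pred` is conditioned on the text prompt alone, so whatever front-facing preference the T2I prior carries is injected into the 3D update at every view.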

To overcome this, the authors introduce ConsDreamer, a framework designed to enforce multi-view consistency through two primary technical innovations. The first component is the View Disentanglement Module (VDM). The VDM operates on the conditional term of the distillation process by decoupling viewpoint-dependent information from the text prompts. By stripping away the inherent viewpoint biases embedded in the prompts, the VDM allows for precise camera control and ensures that the object’s identity remains stable regardless of the viewing angle, effectively neutralizing the “front-facing” preference of the T2I prior.
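The decoupling-then-injection idea behind the VDM can be illustrated with a toy embedding-space operation. This is a hedged sketch, not the paper's module: the function name, the idea of representing view bias as a single direction vector, and the additive injection are all simplifying assumptions.

```python
import numpy as np

def disentangle_view(prompt_emb, bias_view_emb, target_view_emb):
    """Toy view disentanglement: project out the biased view component
    from the prompt embedding, then inject the desired view control.

    All inputs are hypothetical 1-D embedding vectors.
    """
    v = bias_view_emb / np.linalg.norm(bias_view_emb)     # unit bias direction
    view_free = prompt_emb - np.dot(prompt_emb, v) * v    # decouple view bias
    return view_free + target_view_emb                    # inject target view

prompt = np.array([1.0, 2.0, 3.0])       # stand-in prompt embedding
front_bias = np.array([1.0, 0.0, 0.0])   # stand-in "front view" direction
side_view = np.array([0.0, 0.5, 0.0])    # stand-in target-view embedding
controlled = disentangle_view(prompt, front_bias, side_view)
```

After the projection, the prompt embedding carries no component along the biased view direction, so the injected target view is the only viewpoint signal the conditional term sees.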

The second component is the Similarity-based Partial Order Loss, which targets the unconditional term. This loss function enforces geometric consistency by aligning the cosine similarities of visual features with the relative azimuth relationships of the camera. In essence, it mathematically mandates that the transition of visual features between different views must follow a consistent, ordered pattern corresponding to the rotation of the camera. This prevents sudden, non-physical jumps in texture or geometry during view transitions, ensuring a smooth and continuous 3D structure.
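The ordering constraint can be sketched as a margin-based hinge over adjacent view pairs: views closer in azimuth to a reference should be at least as similar to it as views farther away. This is a simplified illustration under assumed inputs (precomputed cosine similarities and azimuth gaps), not the paper's exact loss.

```python
import numpy as np

def partial_order_loss(similarities, azimuth_gaps, margin=0.05):
    """Toy similarity-based partial order loss.

    similarities : cosine similarity of each view to a reference view
    azimuth_gaps : absolute azimuth difference of each view to the reference
    Penalizes adjacent pairs where the farther view is more similar
    (beyond a margin) than the nearer one.
    """
    order = np.argsort(azimuth_gaps)                 # nearest-to-farthest
    s = np.asarray(similarities, dtype=float)[order]
    violations = np.maximum(0.0, s[1:] - s[:-1] + margin)  # hinge per pair
    return violations.mean() if violations.size else 0.0

# Correctly ordered: similarity decays with azimuth gap -> near-zero loss.
good = partial_order_loss([0.9, 0.7, 0.4], [30, 90, 150])
# Violated order: a distant view is too similar -> positive loss.
bad = partial_order_loss([0.4, 0.9, 0.7], [30, 90, 150])
```

The hinge form means a well-ordered set of views incurs no penalty, while any non-monotonic jump in feature similarity across azimuth contributes gradient that pushes the views back toward a smooth transition.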

The experimental results demonstrate that ConsDreamer is highly versatile and can be seamlessly integrated into various 3D representations, including 3D Gaussian Splatting, and different score distillation paradigms. The method significantly mitigates the Janus problem and enhances the overall fidelity and multi-view consistency of the generated 3D content. By providing a robust solution to view bias, ConsDreamer paves the way for more reliable and high-quality automated 3D content creation workflows, making it a significant contribution to the evolution of generative AI in the 3D domain.

