PEGAsus: 3D Personalization of Geometry and Appearance


We present PEGAsus, a new framework for generating personalized 3D shapes by learning shape concepts at both the geometry and appearance levels. First, we formulate 3D shape personalization as extracting reusable, category-agnostic geometric and appearance attributes from reference shapes and composing these attributes with text to generate novel shapes. Second, we design a progressive optimization strategy that learns shape concepts at both the geometry and appearance levels, decoupling the concept-learning process. Third, we extend our approach to region-wise concept learning with context-aware and context-free losses, enabling flexible concept extraction. Extensive experimental results show that PEGAsus effectively extracts attributes from a wide range of reference shapes and flexibly composes these concepts with text to synthesize new shapes. This enables fine-grained control over shape generation and supports the creation of diverse, personalized results, even in challenging cross-category scenarios. Both quantitative and qualitative experiments demonstrate that our approach outperforms existing state-of-the-art solutions.


💡 Research Summary

PEGAsus introduces a novel framework for 3‑D shape personalization that simultaneously learns reusable geometric and appearance concepts from reference objects and composes them with textual prompts to generate new, customized 3‑D models. The authors first formalize 3‑D personalization as the extraction of category‑agnostic attributes—both geometric and visual—from a reference shape, followed by their recombination with free‑form text. To achieve this, they build upon TRELLIS, a recent 3‑D foundation model that separates generation into a sparse structure stage (producing geometry) and a structured latent stage (producing appearance features).
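The geometry/appearance split described above can be sketched with stub stages. Note that the function names, grid shapes, and placeholder outputs below are illustrative assumptions only; TRELLIS's actual stages are diffusion models with a different interface.

```python
import numpy as np

# Minimal data-flow sketch of a two-stage pipeline: a sparse structure
# stage that produces coarse geometry, followed by a structured latent
# stage that attaches appearance features to the occupied cells.
# All names and shapes here are stand-ins, not TRELLIS's real API.

def sparse_structure_stage(cond):
    # Stage 1: produce a coarse occupancy grid from a conditioning embedding.
    grid = np.zeros((8, 8, 8), dtype=bool)
    grid[2:6, 2:6, 2:6] = True        # placeholder "geometry"
    return grid

def structured_latent_stage(cond, structure):
    # Stage 2: attach an appearance feature vector to every occupied cell.
    n_occupied = int(structure.sum())
    return np.tile(cond, (n_occupied, 1))  # placeholder "appearance"

def generate(cond):
    structure = sparse_structure_stage(cond)
    latents = structured_latent_stage(cond, structure)
    return structure, latents
```

The point of the split is that personalization can target either stage independently: adapting only the structure stage changes geometry, while adapting only the latent stage changes appearance.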

The core technical contribution is a progressive optimization pipeline that decouples concept learning from the underlying generative model. For global concept learning, the method first optimizes a learnable text embedding while keeping the generator frozen, thereby capturing coarse‑grained attributes without disrupting the model’s prior. In a second step, the same embedding is fixed and the geometry (or appearance) generator is fine‑tuned, allowing fine‑grained details to be encoded. Both steps share the same loss function but differ in the parameters being updated, balancing concept fidelity with transferability.
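Under toy assumptions, the two-step schedule can be sketched as alternating which parameters receive gradient updates while the loss stays the same. Here a linear map `W` stands in for the generator, a vector `e` for the learnable text embedding, and a quadratic reconstruction loss for the paper's shared objective; all of these are illustrative stand-ins, not the actual diffusion training setup.

```python
import numpy as np

# Toy sketch of progressive optimization: stage 1 optimizes the concept
# embedding e with the "generator" W frozen; stage 2 freezes e and
# fine-tunes W. Both stages minimize the same reconstruction loss.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))           # stand-in generator weights
e = np.zeros(3)                       # learnable concept embedding
y = rng.normal(size=4)                # stand-in "reference shape"

def loss(W, e, y):
    r = W @ e - y
    return 0.5 * float(r @ r)

# Stage 1: update only e (captures coarse attributes, prior intact).
lr = 0.05
for _ in range(200):
    grad_e = W.T @ (W @ e - y)        # d(loss)/d(e)
    e -= lr * grad_e
stage1_loss = loss(W, e, y)

# Stage 2: update only W (encodes fine detail around the fixed concept).
lr2 = 1.0 / (1.0 + float(e @ e))      # step size kept stable w.r.t. ||e||
for _ in range(200):
    grad_W = np.outer(W @ e - y, e)   # d(loss)/d(W)
    W -= lr2 * grad_W
stage2_loss = loss(W, e, y)
```

The design choice this mirrors is that freezing the generator in stage 1 protects its prior while the embedding absorbs the coarse concept, and stage 2 then only has residual fine detail left to encode.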

Region‑wise concept learning extends this idea by introducing two complementary losses: a context‑aware loss that enforces visual coherence with surrounding geometry/appearance, and a context‑free loss that isolates the target region’s attributes. This enables users to specify a region (e.g., the legs of a frog or a stripe pattern) and extract a localized concept that can later be recombined with arbitrary text. The learned concept is represented jointly by an optimized text embedding and a fine‑tuned generator, allowing straightforward inference: the embedding is concatenated with a new prompt and fed into the adapted generator to synthesize the desired shape.
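The two complementary losses can be sketched as a full-shape term plus a masked term. The mean-squared-error form, the equal weights, and the function names below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Sketch of the region-wise objective: a context-aware term supervises
# the whole shape so the region stays coherent with its surroundings,
# while a context-free term supervises only the user-selected region.

def context_aware_loss(pred, ref):
    # Full-shape supervision: keeps the region consistent with its context.
    return float(np.mean((pred - ref) ** 2))

def context_free_loss(pred, ref, mask):
    # Region-only supervision: isolates the selected region's attributes.
    m = mask.astype(bool)
    return float(np.mean((pred[m] - ref[m]) ** 2))

def region_concept_loss(pred, ref, mask, w_ctx=1.0, w_free=1.0):
    # Weighted combination; the weights here are illustrative defaults.
    return (w_ctx * context_aware_loss(pred, ref)
            + w_free * context_free_loss(pred, ref, mask))
```

With this split, errors outside the mask only enter through the context-aware term, so the context-free term can match the target region exactly even when the surroundings differ.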

PEGAsus supports four personalization modes—global vs. region‑wise × geometry vs. appearance—providing fine‑grained control over both structural and visual aspects. Experiments span multiple object categories (flowers, chairs, robots, frogs) and include both quantitative metrics (FID, CLIP‑Score) and qualitative visual comparisons. The results demonstrate that PEGAsus outperforms prior state‑of‑the‑art methods in quality and flexibility, especially in cross‑category scenarios where unconventional combinations such as “a chair with watermelon texture and frog‑leg geometry” are successfully generated.

The paper also discusses limitations: the fine‑tuning stage is computationally intensive, region masks must be manually provided, and TRELLIS’s native output format (e.g., Gaussian splats) may require post‑processing for mesh‑based pipelines. Future work is suggested in the direction of lightweight adaptation techniques, automatic region detection, and multimodal inputs (e.g., sketches plus text) to lower the barrier for end‑users.

In summary, PEGAsus provides a powerful, decoupled approach to 3‑D personalization that bridges the gap between example‑based attribute extraction and text‑driven generation, opening new possibilities for content creation in gaming, VR/AR, product design, and beyond.

