Learning to Build Shapes by Extrusion
We introduce Text Encoded Extrusion (TEE), a text-based representation that expresses mesh construction as sequences of face extrusions rather than polygon lists, and a method for generating 3D meshes from TEE using a large language model (LLM). By learning extrusion sequences that assemble a mesh, similar to the way artists create meshes, our approach naturally supports arbitrary output face counts and produces manifold meshes by design, in contrast to recent transformer-based models. The learned extrusion sequences can also be applied to existing meshes, enabling editing in addition to generation. To train our model, we decompose a library of quadrilateral meshes with non-self-intersecting face loops into constituent loops, which can be viewed as their building blocks, and fine-tune an LLM on the steps for reassembling the meshes by performing a sequence of extrusions. We demonstrate that our representation enables reconstruction, novel shape synthesis, and the addition of new features to existing meshes.
💡 Research Summary
The paper introduces a novel approach for generating and editing 3D meshes by representing mesh construction as a sequence of face extrusions encoded in plain text, called Text Encoded Extrusion (TEE). Traditional transformer‑based mesh generators treat vertices and faces as discrete tokens, which leads to excessively long sequences, limited resolution, difficulty handling continuous coordinates, and, often, non‑manifold outputs. In contrast, the authors observe that the geometric operation of extruding a face loop—detaching a patch of faces, duplicating its boundary, and stitching a new loop between the copies—is inherently manifold‑preserving and invertible. By decomposing quadrilateral meshes that satisfy the Face Extrusion Quad (FEQ) constraints (no self‑intersecting or self‑adjacent loops) into a set of non‑overlapping face loops, they construct a directed acyclic graph (DAG) that captures the hierarchical dependencies among extrusions.
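To see why the extrusion step preserves manifoldness, it helps to write it out. The sketch below is a deliberately minimal stand-in, not the paper's implementation: it extrudes a single quad face by a fixed translation, whereas the paper extrudes whole face patches along curves in a harmonic parameterization. Every new side quad shares exactly one edge with the old boundary loop and one with its shifted copy, so each edge still borders exactly two faces.

```python
import numpy as np

def extrude_quad(verts, quads, face, offset):
    """Extrude one quad face of a mesh along `offset`.

    Simplified illustration of the manifold-preserving operation the paper
    describes: the face's vertices are duplicated and shifted, the face is
    re-pointed at the copies, and one side quad is stitched per boundary
    edge between the old and new loops.
    """
    verts = np.asarray(verts, dtype=float)
    loop = list(quads[face])                 # ordered boundary of the patch
    n = len(verts)
    new_ids = list(range(n, n + len(loop)))  # indices of the duplicated loop
    verts = np.vstack([verts, verts[loop] + np.asarray(offset, dtype=float)])
    quads = [list(q) for q in quads]
    quads[face] = new_ids                    # the patch rides up with the copies
    for (a, na), (b, nb) in zip(zip(loop, new_ids),
                                zip(loop[1:] + loop[:1], new_ids[1:] + new_ids[:1])):
        quads.append([a, b, nb, na])         # stitch one side quad per edge
    return verts, quads

# Extruding a single quad upward: 4 + 4 vertices, 1 + 4 faces (an open box).
v = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
verts, quads = extrude_quad(v, [[0, 1, 2, 3]], 0, (0, 0, 1))
print(len(verts), len(quads))  # 8 5
```

Because the operation only ever appends vertices and replaces one face with five, it is also easy to invert, which is what makes the decomposition into loops possible.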
Each extrusion is described by three textual components: (1) a geometric description of the base patch, (2) references to previously performed extrusions that provide the necessary faces, and (3) a closed curve (expressed in a 2D harmonic parameterization of the patch) that delineates the region to be extruded. These components are concatenated into a human‑readable string, forming the TEE representation of the entire mesh. The authors generate many possible topological orderings of the DAG to increase training diversity, and they further cluster similar extrusions using K‑means on vertex positions, replacing each cluster with a representative extrusion token.
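A toy serializer makes the three-component structure concrete. The field names and layout below (`patch`, `refs`, `curve`) are illustrative assumptions, not the paper's actual TEE grammar; the point is that continuous coordinates are written as plain decimal text, so the LLM consumes them directly rather than through a quantized vocabulary.

```python
def encode_extrusion(patch_verts, parent_refs, curve_uv):
    """Serialize one extrusion as a human-readable line.

    Hypothetical TEE-like layout: base-patch geometry, references to
    prior extrusions, and a closed curve in the patch's 2D harmonic
    parameterization, concatenated into one string.
    """
    patch = " ".join(f"{x:.3f},{y:.3f},{z:.3f}" for x, y, z in patch_verts)
    refs = " ".join(str(r) for r in parent_refs)
    curve = " ".join(f"{u:.3f},{v:.3f}" for u, v in curve_uv)
    return f"patch: {patch} | refs: {refs} | curve: {curve}"

line = encode_extrusion(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # base-patch geometry
    [2, 5],                                # indices of prior extrusions
    [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)],  # closed curve in the harmonic map
)
print(line)
```

Concatenating one such line per extrusion, in any valid topological order of the DAG, yields one textual training sample per ordering.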
The TEE strings are then used to fine‑tune a large language model (LLM), specifically a variant of LLaMA, enabling the model to predict the next extrusion token given the previous context. Because the LLM operates on text, it can handle arbitrary numbers of extrusions without a fixed context window, and it naturally outputs continuous geometric parameters embedded in the textual description. During inference, the predicted TEE sequence is parsed back into geometric data, the harmonic maps are recomputed for each base patch, and the extrusions are applied sequentially to reconstruct a full mesh.
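The inference-side parsing step can be sketched as follows. The grammar assumed here (`patch: ... | refs: ... | curve: ...`) is a hypothetical stand-in for the paper's actual format; what matters is that a plain-text prediction round-trips into numeric records that a mesh engine can replay extrusion by extrusion.

```python
def parse_tee(text):
    """Parse TEE-style lines back into numeric extrusion records.

    One record per line, with assumed fields: 3D base-patch vertices,
    integer references to prior extrusions, and 2D curve points in the
    patch's harmonic parameterization.
    """
    ops = []
    for line in text.strip().splitlines():
        fields = dict(part.split(": ", 1) for part in line.split(" | "))
        ops.append({
            "patch": [tuple(map(float, p.split(","))) for p in fields["patch"].split()],
            "refs": [int(r) for r in fields["refs"].split()],
            "curve": [tuple(map(float, p.split(","))) for p in fields["curve"].split()],
        })
    return ops

sample = "patch: 0.000,0.000,0.000 1.000,0.000,0.000 | refs: 2 5 | curve: 0.200,0.200 0.800,0.200"
ops = parse_tee(sample)
print(ops[0]["refs"])  # [2, 5]
```

After parsing, the pipeline recomputes the harmonic map for each referenced base patch and applies the extrusions in sequence, so malformed geometry can be rejected at the record level before any face is created.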
The authors evaluate their method on two newly released quadrilateral datasets derived from DF‑AUST (human upper bodies) and MANO (hand models). Compared against recent transformer‑based mesh generators such as LegoGPT, MeshGPT, and MeshXL, their LoopGPT system demonstrates several key advantages: (i) guaranteed manifoldness by construction, (ii) no upper bound on face count, allowing high‑resolution outputs, (iii) ability to represent sharp features and thin structures because vertex coordinates are continuous, and (iv) support for feature completion and editing—new extrusions can be added to an existing mesh to augment or modify it. Qualitative results show diverse, plausible shapes generated from a fine‑tuned LLM, as well as successful addition of user‑specified features to pre‑existing meshes.
In summary, the paper presents a compelling framework that leverages the expressive power of large language models for 3D geometry creation. By translating extrusion operations into a textual language, the method sidesteps the token‑length bottleneck of traditional transformers, preserves topological correctness, and opens the door to interactive mesh editing within the same generative pipeline. Future work may extend the approach to non‑quadrilateral meshes, meshes with arbitrary genus, and conditional generation guided by natural‑language prompts or other modalities.