Human Geometry Distribution for 3D Animation Generation
Xiangjun Tang
Biao Zhang
Peter Wonka*
King Abdullah University of Science and Technology
{xiangjun.tang, biao.zhang, peter.wonka}@kaust.edu.sa
Figure 1. Our generative framework produces diverse avatar geometry sequences from noise, with geometries represented as points (a). For visualization, these points can be rendered via Gaussian splatting (GS), producing depth images (b) and normal images (c). Colors (b) can then be obtained by GS optimization using a depth-guided video generation model (Wan 2.1), while the normal images (c) effectively highlight fine folds and wrinkles. Our synthesized geometries are of high quality and can be directly converted into meshes (d) via Poisson reconstruction. The highlighted regions demonstrate fine-grained garment dynamics that faithfully follow human motion.
Abstract
Generating realistic human geometry animations remains a challenging task, as it requires modeling natural clothing dynamics with fine-grained geometric details under limited data. To address these challenges, we propose two novel designs. First, we propose a compact distribution-based latent representation that enables efficient and high-quality geometry generation. We improve upon previous work by establishing a more uniform mapping between SMPL and avatar geometries. Second, we introduce a generative animation model that fully exploits the diversity of limited motion data. We focus on short-term transitions while maintaining long-term consistency through an identity-conditioned design. These two designs formulate our method as a two-stage framework: the first stage learns a latent space, while the second learns to generate animations within this latent space. We conducted experiments on both our latent space and animation model. We demonstrate that our latent space produces high-fidelity human geometry surpassing previous methods (90% lower Chamfer Distance). The animation model synthesizes diverse animations with detailed and natural dynamics (2.2× higher user study score), achieving the best results across all evaluation metrics.

*Corresponding author.
1. Introduction
Generating 3D human geometry animation is a fundamental task in visual generation and human modeling. The goal is to synthesize natural dynamics with fine-grained geometric details, which poses significant challenges. First, capturing fine-grained details requires modeling subtle geometric structures such as folds and wrinkles. Second, learning natural dynamics is challenging due to the limited availability of 3D animation data, where models may easily overfit and fail to reproduce realistic garment deformation in response to human movement.
Early methods [28, 33, 34] learn dynamics for specific garments or model avatars from video or scanned data [11, 17–19, 23, 29, 32, 40, 41, 45, 53, 61]. While these approaches can synthesize plausible dynamics with limited data, they are not generative methods and fail to generalize to unseen avatars or garments. In contrast, generative avatar models [6, 13, 16, 22, 51, 59, 62] extend to diverse identities and offer better generalization, yet they struggle to preserve high-fidelity geometry and learn realistic clothing deformations. Overall, no existing approach satisfies both requirements.
To address these challenges, we propose two key designs. First, we propose a latent representation based on the Human Geometry Distribution (HuGeoDis) [39], which enables the synthesis of high-fidelity geometry from a compact latent representation. However, the original HuGeoDis suffers from imbalanced sampling: it requires a large number of points to adequately cover a geometry, and under-sampled areas often lead to reconstruction artifacts. To mitigate this, we design a new training scheme that first establishes more uniform mappings between SMPL and avatar geometries, and then learns from these correspondences. This design enables high-quality geometry generation with significantly fewer points, thereby improving efficiency for long animation sequences.
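As a rough illustration of this first design, the sketch below (PyTorch-style pseudocode, not the authors' implementation) shows how a distribution-based latent could be decoded into avatar geometry: each surface point is obtained by pushing a Gaussian noise sample, anchored to a SMPL body point, through a learned conditional mapping, so that coverage of the body stays roughly uniform. The names `flow_model`, `latent`, and `smpl_points`, as well as the flow-matching-style Euler integration, are assumptions for illustration only.

```python
# Minimal sketch, assuming a flow-matching-style decoder; NOT the authors'
# implementation. `flow_model`, `latent`, and `smpl_points` are hypothetical.
import torch

def sample_avatar_points(flow_model, latent, smpl_points, num_steps=16):
    """Decode one avatar frame into a point set.

    flow_model  -- learned network predicting a per-point velocity field
    latent      -- compact geometry latent code for this frame
    smpl_points -- (N, 3) points on the SMPL body used as anchors, so the
                   noise-to-surface mapping stays roughly uniform over the body
    Returns an (N, 3) tensor of points on the clothed avatar surface.
    """
    x = torch.randn_like(smpl_points)            # per-point Gaussian noise
    ts = torch.linspace(0.0, 1.0, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):          # simple Euler integration
        v = flow_model(x, t0.expand(len(x)), latent, smpl_points)
        x = x + (t1 - t0) * v
    return x
```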
Second, we introduce a generative animation model that captures temporal dynamics from limited 3D human animation data. We employ a conditional diffusion model over short-term transitions, which has been empirically shown to leverage diverse motion data more effectively than directly modeling long sequences [37]. Long sequences are generated autoregressively from these transitions, with long-term consistency preserved via conditional inputs to the diffusion model.
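The sketch below (again PyTorch-style pseudocode, not the released code) illustrates how such a transition model might be rolled out autoregressively: each step denoises a short window of future geometry latents conditioned on the previous window, the motion signal, and a fixed identity latent that anchors long-term consistency; because each transition only sees a short context, every subsequence of the training animations can serve as a training sample. The names `denoiser` and `identity_latent`, and the window/step sizes, are hypothetical.

```python
# Minimal sketch, assuming a DDPM-style transition model; NOT the authors'
# released code. `denoiser`, `identity_latent`, and the defaults are hypothetical.
import torch

@torch.no_grad()
def rollout(denoiser, identity_latent, motion, init_latents, window=8, steps=50):
    """Generate a long sequence of geometry latents from short transitions.

    motion        -- (T, D_m) per-frame motion conditions (e.g. SMPL poses)
    init_latents  -- (window, D_z) latents seeding the first transition
    Returns a (T', D_z) tensor of generated geometry latents, T' <= T.
    """
    chunks = [init_latents]
    t = window
    while t + window <= len(motion):
        prev = chunks[-1]                                   # previous short window
        x = torch.randn(window, init_latents.shape[-1])     # start from noise
        for step in reversed(range(steps)):                 # simplified reverse diffusion
            x = denoiser(x, step, prev, motion[t : t + window], identity_latent)
        chunks.append(x)
        t += window
    return torch.cat(chunks, dim=0)
```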
We conduct experiments to validate both our distribution-based latent representation and the generative animation model. For the latent space, we evaluate reconstruction accuracy and efficiency, and further assess its performance on the static random avatar generation task, a standard benchmark in avatar generation.