SOMA: Unifying Parametric Human Body Models


Authors: Jun Saito, Jiefeng Li, Michael de Ruyter

2026-3-18

Jun Saito*, Jiefeng Li*, Michael de Ruyter*, Miguel Guerrero, Edy Lim, Ehsan Hassani, Roger Blanco Ribera, Hyejin Moon, Magdalena Dadela, Marco Di Lucca, Qiao Wang, Xueting Li, Jan Kautz, Simon Yuen, Umar Iqbal*
NVIDIA (*Core Contributors)
https://github.com/NVlabs/SOMA-X

Figure 1: SOMA unifies five heterogeneous parametric body models (SOMA-Shape, MHR, SMPL-X, Anny, and GarmentMeasurements) under a single animation pipeline. (a) Unified Skeleton: despite originating from entirely different identity spaces, joint hierarchies, and mesh resolutions, all five models are driven by the same SOMA skeleton in an identical pose, with no model-specific retargeting. (b) Unified Pose Correctives: a single MLP correctives model trained once on the shared SOMA topology produces anatomically plausible pose-dependent deformations for all backends, mitigating standard LBS artifacts without per-model corrective learning. (c) Unified Mesh Topology: all identity models share the same mesh structure, enabling skinning weights, deformation priors, and correctives to transfer seamlessly across backends.
Related projects that already support SOMA:

• SOMA Retargeter for SOMA-to-humanoid retargeting: https://github.com/NVIDIA/soma-retargeter
• GEM for video pose estimation: https://github.com/NVlabs/gem-x
• Kimodo for controllable text-to-motion generation: https://github.com/nv-tlabs/kimodo
• BONES-SEED, the largest human(oid) motion dataset (150k motions): https://huggingface.co/datasets/bones-studio/seed
• ProtoMotions, a simulation and learning framework for human(oid)s: https://github.com/NVlabs/ProtoMotions
• GEAR SONIC, a humanoid behavior foundation model: https://github.com/NVlabs/GR00T-WholeBodyControl

© 2026 NVIDIA. All rights reserved.

Abstract

Parametric human body models are foundational to human reconstruction, animation, and simulation, yet they remain mutually incompatible: SMPL, SMPL-X, MHR, Anny, and related models each diverge in mesh topology, skeletal structure, shape parameterization, and unit convention, making it impractical to exploit their complementary strengths within a single pipeline. We present SOMA, a unified body layer that bridges these heterogeneous representations through three abstraction layers. Mesh topology abstraction maps any source model's identity to a shared canonical mesh in constant time per vertex. Skeletal abstraction recovers a full set of identity-adapted joint transforms from any body shape, whether in rest pose or an arbitrary posed configuration, in a single closed-form pass, with no iterative optimization or per-model training. Pose abstraction inverts the skinning pipeline to recover unified skeleton rotations directly from posed vertices of any supported model, enabling heterogeneous motion datasets to be consumed without custom retargeting.
Together, these layers reduce the O(M²) per-pair adapter problem to O(M) single-backend connectors, letting practitioners freely mix identity sources and pose data at inference time. The entire pipeline is fully differentiable end-to-end and GPU-accelerated via NVIDIA Warp.

1. Introduction

Parametric human body models are a cornerstone of computer vision, computer graphics, and physical AI, enabling reconstruction, animation, and simulation of human motion at scale. The most widely used family, SMPL (Loper et al., 2015; Pavlakos et al., 2019), defines meshes with compact linear shape spaces and has become the de facto target for pose estimation, motion generation, and avatar synthesis (Li et al., 2025; Shen et al., 2024; Wang et al., 2026; Zhang et al., 2024). MHR (Ferguson et al., 2025) introduces explicit bone-length parameterization that more faithfully captures skeletal diversity across people, addressing a well-known limitation of PCA-only shape models. Anny (Brégier et al., 2025) constructs its shape space from anthropometric measurements rather than 3D scans, providing semantic control over age, height, weight, and body composition that spans the full human lifespan from infants to elders, aiming to remove the demographic biases inherent in scan-collected data. GarmentMeasurements (Korosteleva and Sorkine-Hornung, 2023) extends shape representation to clothing-aware body proportions encoded via body measurements. Despite this diversity of available models, a concrete fragmentation problem persists: each model defines its own mesh topology, joint hierarchy, unit convention, and parameter space.
A practitioner wishing to combine Anny's interpretable phenotype control with an SMPL-compatible motion capture dataset must implement separate topology-transfer pipelines, independent skeleton-fitting routines, and bespoke coordinate-frame conversions for every model pair. Supporting M models naively requires O(M²) per-pair adapters; in practice, this forces early model commitment and forfeits the complementary strengths of alternatives. No unified interface currently exists that lets a researcher freely mix identity source and pose parameterization.

We introduce SOMA, a canonical body topology and rig that serves as a universal pivot for heterogeneous parametric body models (see Fig. 2). Rather than replacing existing models, SOMA maps their rest-shape outputs to a single shared representation, after which any identity model can be animated through one unified LBS pipeline. This reduces the O(M²) adapter problem to O(M) single-backend connectors, each implemented once and composed freely at inference time. Our contributions are fourfold:

1. Identity-Pose Decoupling via a Canonical Topology (SOMALayer). We propose a framework that maps the rest-shape output of any supported parametric model to a canonical SOMA mesh and rig, explicitly separating identity representation from kinematic parameterization. A single pose interface drives any identity source without model-specific adaptation code at runtime, and a unified pose-dependent correctives model generalizes to all backends.

2. Mesh Topology Abstraction. We introduce a topology abstraction module that pre-computes a fixed 3D barycentric correspondence between each source model's neutral mesh and the SOMA canonical mesh at initialization. At runtime, identity transfer requires no neural forward pass and no iterative solver.

3. Skeletal Abstraction.
We present a backend-agnostic skeleton fitting algorithm that exploits the shared mesh correspondence to precisely fit any template skeleton into a new body shape. Given the transferred rest shape, it recovers identity-adapted world-space joint transforms in a single analytical forward pass with no iterative optimization or per-model training.

4. Pose Abstraction. We introduce a pose abstraction module that recovers SOMA skeleton rotations from posed vertices of any supported model via analytical inverse-LBS with Newton-Schulz orthogonalization, enabling direct conversion of motion data from SMPL, MHR, and other models into SOMA's unified skeleton convention.

The entire SOMA forward pass is fully differentiable end-to-end, making it directly usable as a differentiable layer in large-scale optimization and ML training pipelines.

2. Related Work

2.1. Parametric Body Models

The field has seen significant evolution in parametric human modeling (Anguelov et al., 2005; Brégier et al., 2025; Ferguson et al., 2025; Loper et al., 2015; Osman et al., 2020; Pishchulin et al., 2017; Xu et al., 2020). SMPL (Loper et al., 2015) introduced a vertex-based linear blend skinning model with learned corrective blend shapes, which became the de facto standard with 6,890 mesh vertices and a compact PCA shape space. STAR (Osman et al., 2020) proposed sparser skinning weights to reduce undesirable cross-joint coupling. SMPL-H (Romero et al., 2017) and SMPL-X (Pavlakos et al., 2019) extended the SMPL family with fully articulated hands, via MANO (Romero et al., 2017), and an expressive face, respectively. MHR (Ferguson et al., 2025) addresses skeletal ambiguity by explicitly modeling bone lengths to improve fitting accuracy across body proportions. Anny (Brégier et al.
, 2025) builds its shape space from anthropometric measurements rather than 3D scans, enabling phenotype control (age, height, weight) that spans infants to elders. Each of these models defines its own mesh topology, joint hierarchy, and parameter space. SOMA does not replace any of them; instead, it provides a canonical mesh topology and rig that any supported backend can drive through a single unified pipeline.

2.2. Human Motion Estimation and Generation

A rich body of work estimates 3D human pose and shape from monocular images (Goel et al., 2023; Iqbal et al., 2021; Kanazawa et al., 2018; Kocabas et al., 2021, 2024; Kolotouros et al., 2019; Patel and Black, 2025; Sárándi and Pons-Moll, 2024; Wang et al., 2025; Yuan et al., 2022) and videos (Choi et al., 2021; Goel et al., 2023; Kocabas et al., 2020; Shen et al., 2024; Shin et al., 2023; Wang et al., 2026), and generates motion from diverse conditioning signals such as text, music, and scene context (Li et al., 2025; Petrovich et al., 2024; Tevet et al., 2023; Yuan et al., 2023; Zhang et al., 2022, 2024). The vast majority of these systems are built around SMPL or SMPL-X as the target representation; more recently, methods such as MultiHMR (Baradel et al., 2024), Sam-3D-Body (Yang et al., 2026), and DuoMo (Wang et al., 2026) have started to adopt MHR and Anny to better capture bone-length and age-range diversity. However, whether estimating or generating, all of these systems are trained to output parameters for one specific body model and must be retrained whenever the target representation changes.
SOMA decouples identity model selection from the estimation pipeline: a pose estimator or generative model outputting SOMA-compatible joint parameters can drive any supported identity (SMPL, MHR, Anny, or others) at inference time without retraining, and can be supervised with body shape labels from all backends simultaneously.

Figure 2: Overview of SOMA. SOMA decouples body identity from pose through three sequential layers. Identity Model Provider (left): any supported backend (SOMA-Shape, Anny, MHR, SMPL/SMPL-X, or GarmentMeasurements) maps its own shape parameters β_s to a rest-shape mesh in its native topology. Bridging Layer (middle): two abstraction steps canonicalize the source identity into a unified representation. Mesh Topology Abstraction transfers the rest shape to the shared SOMA topology via pre-computed barycentric coordinates; Skeletal Abstraction then fits the shared J=77-joint SOMA rig to the transferred rest shape in a single closed-form pass, with no iterative optimization or per-identity training. Animation Layer (right): all identity models are animated through the shared SOMA skeleton using θ_SOMA joint rotations. When motion data arrives in another model's convention, i.e., θ_x ∈ {θ_MHR, θ_SMPL, ...}, Pose Abstraction converts it to θ_SOMA by analytically inverting the LBS pipeline; this step is bypassed when the pose is already in the SOMA convention.
A shared MLP Pose Correctives model then predicts pose-dependent vertex displacements to correct LBS artifacts, and Linear Blend Skinning produces the final posed mesh. The entire pipeline is fully differentiable end-to-end.

3. Method

SOMA is a modular framework for unified parametric body modeling. Its core runtime component, SOMALayer, accepts shape parameters from any supported identity backend alongside pose parameters, and produces posed mesh vertices and joint positions in meters. Fig. 2 illustrates the full pipeline.

3.1. Overview and Notation

Let V_h ∈ R^{N_h×3} denote the SOMA canonical mesh with N_h vertices, F_h its triangle faces, and J = 77 its joint count (excluding the root). For each supported backend s ∈ {SOMA-Shape, MHR, Anny, SMPL, SMPL-X, Garment}, let M_s(β_s) denote the backend's rest-shape generator, which maps identity parameters β_s to a source mesh V_s ∈ R^{N_s×3} in the backend's native unit. SOMA's forward pass transforms any (V_s, θ) pair into posed SOMA mesh vertices via three sequential steps: (1) mesh topology abstraction; (2) closed-form skeleton fitting; and (3) LBS posing. Every step is fully differentiable, so the entire pipeline can serve as a differentiable layer in optimization and learning frameworks.

3.2. Identity Model Provider

The identity model provider takes the native shape parameters β_s of any supported backend and maps them to a rest-shape mesh in that backend's native topology. SOMA integrates five interchangeable backends, each with its own strengths, allowing users to easily adopt the identity model of their preference within a single animation framework.

Figure 3: Training data for the SOMA-Shape PCA model. (a) A subset of the 9,326 SizeUSA body scans registered to the SOMA topology, exhibiting a wide range of body weights and proportions. (b) A subset of the 303 Triplegangers scans, registered to the same topology. All meshes are reposed to a canonical A-pose before PCA fitting and mirror-augmented across the sagittal plane to enforce bilateral symmetry.

SOMA-Shape. SOMA's own shape space uses K=128 PCA components trained on 9,326 SizeUSA body scans ([TC]², 2004), 303 Triplegangers photogrammetry scans (Triplegangers, 2025), and samples distilled from the GarmentMeasurements PCA model (Korosteleva and Sorkine-Hornung, 2023), with a 40/40/20 mixing ratio learned by incremental PCA (Ross et al., 2008). We show example registrations in Fig. 3.

SMPL / SMPL-X (Loper et al., 2015; Pavlakos et al., 2019) parameterize body shape via PCA blend shapes (10 components for SMPL, 300 for SMPL-X) learned from registered 3D body scans, and dominate research adoption.

MHR (Ferguson et al., 2025) parameterizes body shape via a combination of PCA identity coefficients and explicit bone-length scale factors that directly modulate skeletal proportions, providing fine-grained control over both body surface shape and skeletal proportions.

Anny (Brégier et al., 2025) parameterizes body shape using six anthropometric phenotypes (gender, age, muscle, weight, height, proportions) that drive multi-linear blendshapes spanning infants to elders. Anny is the sole backend capable of representing subjects below 18 years of age, making it the natural choice for applications requiring age-diverse or child-inclusive digital humans.

GarmentMeasurements (Korosteleva and Sorkine-Hornung, 2023) encodes body shape via 15 PCA components fitted to body scan data, similar to SMPL/SMPL-X, and is often adopted in the garment modeling literature.

3.3. Mesh Topology Abstraction

The mesh topology abstraction layer maps diverse source topologies from the identity model provider to the SOMA canonical mesh.
We pre-compute a fixed 3D barycentric correspondence at initialization and apply it as a lightweight gather at runtime, as Fig. 4 illustrates. More specifically, given the source neutral mesh (V_s, F_s) and a SOMA wrap mesh V_h^(s) (a version of the SOMA template manually registered to the source model's neutral pose by an artist), we compute for each SOMA vertex v_i^h its 3D barycentric coordinates within a local tetrahedron of the source mesh. For the closest source triangle f_j = (u_1, u_2, u_3) to v_i^h, we lift it to a tetrahedron by adding a fourth vertex along the face normal, u_4 = u_1 + (u_2 − u_1) × (u_3 − u_1), and solve for the barycentric coordinates b ∈ R^4 via a 3×3 linear system. This tetrahedral lifting handles query points slightly off the surface without degeneracy. Unlike 2D barycentric projection, 3D tetrahedral interpolation preserves volume in regions without clear surface correspondence, for example, when mapping between models with and without individual toes. The pre-computation runs once at initialization; its output, a face-index array f ∈ Z^{N_h} and a coordinate array B ∈ R^{N_h×4}, is stored as a fixed buffer.

Figure 4: Mesh topology abstraction. Top: native mesh topologies of each identity model (Anny, MHR, SMPL, SMPL-X, GarmentMeasurements). Bottom: the same identities mapped to the shared SOMA topology via 3D barycentric interpolation. This common mesh serves as the pivot for all cross-model operations: skeleton fitting, pose transfer, correctives, and shape-space comparison all operate on a single canonical topology regardless of the source model.
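The tetrahedral lifting, the 3×3 barycentric solve, and the runtime gather can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical helper names, not the SOMA implementation (which precomputes the buffers once and runs the gather on GPU); `tet_idx` and `B` below stand in for the precomputed fixed buffers:

```python
import numpy as np

def lift_to_tet(u1, u2, u3):
    """Lift a source triangle to a tetrahedron by adding a fourth
    vertex along the (unnormalized) face normal:
    u4 = u1 + (u2 - u1) x (u3 - u1)."""
    u4 = u1 + np.cross(u2 - u1, u3 - u1)
    return np.stack([u1, u2, u3, u4])

def tet_barycentric(p, tet):
    """Barycentric coordinates b in R^4 of point p w.r.t. the
    tetrahedron, via a 3x3 linear solve (weights sum to 1)."""
    A = (tet[1:] - tet[0]).T              # columns: the three edge vectors
    b123 = np.linalg.solve(A, p - tet[0])
    return np.concatenate([[1.0 - b123.sum()], b123])

def apply_transfer(V_aug, tet_idx, B):
    """Runtime gather (Eq. 1): each target vertex is a fixed weighted
    combination of its tetrahedron's four vertices.
    V_aug:   (N_s + F_s, 3) source vertices plus one normal-offset
             point per face.
    tet_idx: (N_h, 4) precomputed integer index buffer.
    B:       (N_h, 4) precomputed barycentric coordinates."""
    return np.einsum('ik,ikc->ic', B, V_aug[tet_idx])
```

Because the solve happens only at initialization, the per-frame cost is a single indexed gather plus a weighted sum, independent of the source vertex count.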
At runtime, given deformed source vertices V_s(β) ∈ R^{N_s×3}, each SOMA vertex is reconstructed as a weighted combination of its corresponding tetrahedron's vertices:

    v_i^h(β) = Σ_{k=0}^{3} B_{ik} · Ṽ_s(β)[F^tet[f_i, k]],    (1)

where Ṽ_s(β) is the source mesh augmented with one normal-offset point per face. The cost is independent of the source vertex count N_s.

3.4. Skeletal Abstraction

Once all source meshes share a common topology (Sec. 3.3), we need a single skeletal structure to drive their pose. We introduce SkeletonTransfer, a backend-agnostic algorithm that precisely fits any template skeleton into a new body shape given only the shared mesh correspondence. In SOMA, we apply it to fit a J=77-joint rig; given a rest shape V_h(β) ∈ R^{N_h×3} on the SOMA topology, it recovers the full set of world-space joint transforms {T_k}_{k=1}^{J} ⊂ SE(3) in two analytical stages: joint position regression and joint rotation fitting. Fig. 5 illustrates how the fitted skeleton adapts to bodies of varying proportions.

Figure 5: Skeleton fitting on posed SAM 3D Body identities. Eight MHR identities in diverse poses with the SOMA skeleton fitted via SkeletonTransfer (Sec. 3.4). Unlike joint regressors that assume a rest pose, our method generalizes to arbitrary posed shapes: joint positions are regressed via RBF interpolation, and joint rotations are recovered by Procrustes alignment, both in a single analytical forward pass with no iterative optimization.

3.4.1. Stage 1: Joint position regression via RBF

For each joint k ∈ {1, ..., J}, we pre-build a per-joint Radial Basis Function regressor from the canonical bind-pose mesh. The regressor uses the subset of vertices N_k that have non-zero skinning weight for joint k or its parent.
Given the canonical bind-shape vertex positions V_bind, the RBF basis weights w_k are solved once by the linear system:

    Φ(N_k) w_k = j_k^bind,    (2)

where Φ is the linear RBF kernel evaluated at the neighborhood vertex positions, and j_k^bind is the canonical joint position. At runtime, given identity rest shape V_h(β), joint k's world-space position is predicted as:

    j_k(β) = Φ(V_h(β)_{N_k}) w_k,    (3)

which is a single linear operation. All joint positions are computed in parallel via a pre-assembled sparse matrix W_RBF ∈ R^{J×N_h}, reducing the full joint position update to one sparse matrix multiplication:

    J(β) = W_RBF V_h(β).    (4)

3.4.2. Stage 2: Joint rotation fitting via Kabsch alignment

Joint positions alone do not fully define the skeleton, since each joint also requires an orientation that establishes its local coordinate frame. Because source models assume different canonical poses (e.g., T-pose vs. A-pose), these orientations cannot be copied from the bind pose and must be fitted to the identity's rest shape. With identity-adapted joint positions {j_k(β)} in hand, we recover the rotation component of each world-space joint transform via a two-step Kabsch alignment procedure.

Stage 2a: Inverse LBS initialization. For joint k, let V_k be the set of vertices with non-negligible skinning weight for joint k. We estimate an initial global rotation R_k^init ∈ SO(3) by solving the weighted orthogonal Procrustes problem (Kabsch, 1976):

    R_k^init = argmin_{R ∈ SO(3)} Σ_{v ∈ V_k} || R (v^bind − j_k^bind) − (v(β) − j_k(β)) ||²,    (5)

solved via SVD of the cross-covariance matrix.

Stage 2b: Child bone alignment. The initial rotation R_k^init aligns the skinned vertex cloud but may not correctly orient the bone vectors toward child joints.
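The weighted Procrustes problem of Eq. (5) has a classical closed-form SVD solution. A minimal NumPy sketch (function name and interface are ours, not SOMA's API; P and Q hold the bind and target vertex offsets, each already centered on the joint):

```python
import numpy as np

def weighted_kabsch(P, Q, w):
    """Best rotation R minimizing sum_i w_i * ||R p_i - q_i||^2
    (cf. Eq. 5). P, Q: (N, 3) joint-centered point sets; w: (N,)
    non-negative weights. Solved via SVD of the weighted 3x3
    cross-covariance matrix."""
    H = (w[:, None] * P).T @ Q            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection fix: force det(R) = +1 so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
```

Note the determinant correction: without it, near-degenerate (e.g., coplanar) point sets can yield a reflection rather than a rotation, which is one symptom of the instability the paper later addresses with Newton-Schulz orthogonalization.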
We compute a refinement rotation R_k^align that aligns the rotated bind bone vectors R_k^init (j_c^bind − j_k^bind) to the target bone vectors j_c(β) − j_k(β). For joints with a single child (the majority of the skeleton), this is the shortest-arc (Rodrigues) rotation between two vectors; for joints with multiple children, we solve the Procrustes problem (Eq. (5)) over the set of child bone vectors. The final world-space rotation is R_k = R_k^align · R_k^init · R_k^bind, where R_k^bind is the canonical bind-pose world rotation, and the complete transform is T_k = SE3(R_k, j_k(β)). Both paths are fully vectorized across all joints via NVIDIA Warp custom kernels, enabling massively parallel GPU execution with no sequential joint loop.

3.5. Animation Layer

Given the identity-adapted joint transforms {T_k(β)} and rest shape V_h(β), SOMA's animation layer applies standard Linear Blend Skinning (LBS) and correctives to produce animated vertices.

3.5.1. Posing

Given input pose parameters (axis-angle (B, 77, 3) or rotation matrices (B, 77, 3, 3)), together with an optional root translation t_0 ∈ R³, forward kinematics computes global joint transforms {G_k(θ)}_{k=0}^{J−1} by composing local rotations up the joint hierarchy. SOMA can optionally apply joint orient (pose-relative parameterization), where input rotations are expressed relative to the joint's canonical pose (usually T-pose or A-pose), which matches the convention of many parametric human models and their associated datasets. Posed vertex positions are then:

    v'_i = Σ_{k=0}^{J−1} w_{ik} G_k(θ) (T_k^bind)^{−1} ṽ_i,    (6)

where ṽ_i is the homogeneous rest-shape position and w_{ik} is the skinning weight.

3.5.2. Pose-Dependent Correctives

Standard LBS produces well-known artifacts at joints undergoing large rotations (elbow, shoulder, knee). Some identity models ship with their own correctives (e.g.,
MHR), while others such as Anny do not. SOMA provides a single unified correctives model that applies to all backends by operating on the shared canonical topology and rest pose established by the preceding abstraction layers. The correctives are predicted by a lightweight non-linear MLP network and applied to the rest shape before skinning:

    V_h^corr(β, θ) = V_h(β) + f_MLP(θ),    (7)

where f_MLP takes as input the local joint rotations {R_k(θ)}_{k=0}^{J−1} in 6D representation (Zhou et al., 2019). The MLP follows a two-stage structure inspired by MHR (Ferguson et al., 2025): joint rotations are mapped to a bank of K = J × C corrective activations (C = 24), which are then mapped to per-vertex displacements. Fixed anatomical masks derived from skinning weights and geodesic distances enforce spatial locality and sparsity. Training data is distilled from MHR by sampling ≈80,000 MHR posed meshes onto the SOMA topology via barycentric interpolation (Sec. 3.3) and pose inversion (Sec. 3.6), a large-scale distillation made practical by SOMA's unified topology and pose abstraction. Fig. 6 shows examples of our unified correctives for all body models.

Figure 6: Unified pose correctives across identity backends. Each row shows a different pose; columns correspond to SOMA-Shape, MHR, SMPL, Anny, and GarmentMeasurements. For each cell, the left mesh shows the corrective displacement applied to the canonical rest shape, and the right mesh shows the final posed result. A single correctives model trained once on SOMA's canonical topology produces anatomically plausible deformations for all backends.

3.6. Pose Abstraction

The sections above describe the forward path of SOMA: given an identity and a pose, the framework produces a posed mesh in a unified representation.
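Before turning to the inverse problem, the forward FK-then-LBS path of Sec. 3.5 (Eq. 6, without correctives) is compact enough to sketch. This toy NumPy version assumes identity bind rotations and a parent array in topological order, a simplification of the actual SOMA rig:

```python
import numpy as np

def fk(parents, R_local, joints_rest):
    """Forward kinematics: compose local rotations up the hierarchy
    into global rotations Rg_k and posed joint positions p_k.
    parents[k] < k for all non-root joints; root has parent -1."""
    J = len(parents)
    Rg = np.zeros((J, 3, 3))
    p = np.zeros((J, 3))
    for k in range(J):
        pa = parents[k]
        if pa < 0:                                   # root joint
            Rg[k], p[k] = R_local[k], joints_rest[k]
        else:
            Rg[k] = Rg[pa] @ R_local[k]
            p[k] = p[pa] + Rg[pa] @ (joints_rest[k] - joints_rest[pa])
    return Rg, p

def lbs(V_rest, W, Rg, p, joints_rest):
    """Eq. (6) specialized to identity bind rotations: each vertex is
    a skinning-weighted blend of per-joint rigid transforms of the
    rest shape. V_rest: (N, 3); W: (N, J) skinning weights."""
    V_j = np.einsum('jab,jnb->jna',
                    Rg, V_rest[None] - joints_rest[:, None]) + p[:, None]
    return np.einsum('nj,jnc->nc', W, V_j)
```

A two-joint chain with the child joint bent 90° about z moves a vertex at (2, 0, 0), fully skinned to that joint, to (1, 1, 0), while vertices skinned to the root stay put.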
A complementary operation is equally important in practice: recovering SOMA pose parameters from an already-posed mesh. We call this pose abstraction: just as topology abstraction (Sec. 3.3) and skeletal abstraction (Sec. 3.4) unify heterogeneous body shapes into a single identity representation, pose abstraction unifies heterogeneous pose data into a single SOMA skeleton convention. This enables motion sequences captured or generated with SMPL, MHR, Anny, or any other supported backend to be directly consumed by downstream SOMA applications without custom retargeting. Large-scale motion datasets such as AMASS (Mahmood et al., 2019) and SAM 3D Body (Yang et al., 2026) thereby become natively usable through SOMA.

The core algorithmic challenge is pose inversion: recovering per-joint rotations from posed vertex positions, the inverse of the forward kinematics and LBS pipeline described in Sec. 3.5. We describe the pose inversion algorithm below.

Multi-topology input. Posed vertices from any supported mesh topology, not only the native SOMA mesh, are accepted as input. When the input topology differs from SOMA's canonical mesh, the same barycentric transfer used in the forward path (Sec. 3.3) first maps the input vertices to the SOMA topology. From this point onward, the inversion algorithm operates entirely in SOMA's canonical vertex and skeleton space.

Initialization via skeleton transfer. Pose inversion begins with the same Kabsch-based skeleton fitting procedure described in Sec. 3.4: a single-pass RBF joint regression followed by Procrustes alignment provides an initial world-space rotation estimate for each joint. This initialization is already a reasonable approximation of the target pose and can serve as a standalone fast solver when only coarse pose recovery is needed.

Iterative inverse-LBS refinement.
Starting from the skeleton-transfer initialization, the algorithm refines joint rotations level-by-level in parent-to-child order. For joint k, the skinned vertex positions are decomposed into a subtree contribution (vertices predominantly influenced by k and its descendants) and a non-subtree contribution from already-solved ancestor joints. The ancestor contribution is subtracted from the observed posed vertices, isolating the local deformation attributable to joint k alone. A Procrustes alignment (Eq. (5)) is then solved for the local rotation.

Newton-Schulz orthogonalization. The standard Kabsch algorithm computes the nearest rotation matrix from the cross-covariance matrix H = AᵀB via SVD: R = UVᵀ, where H = UΣVᵀ. However, when the point cloud contributing to a joint's covariance is near-coplanar, as commonly occurs at body parts such as clavicles, the smallest singular value σ₃ approaches zero and the corresponding singular vector becomes ill-defined. Under these conditions, small perturbations in the input vertices can flip the sign of a singular vector between consecutive frames, causing a discontinuous 180° rotation jump ("shoulder popping"). To avoid this instability, our iterative refinement replaces SVD with Newton-Schulz orthogonalization (Kovarik, 1970), which computes the polar factor of H via the fixed-point iteration

    R_{i+1} = (1/2) R_i (3I − R_iᵀ R_i),    R_0 = H / ||H||_∞,    (8)

where ||H||_∞ is the infinity norm (maximum absolute row sum), which guarantees convergence. Because this iteration refines the rotation estimate continuously from its current value rather than decomposing and reassembling singular vectors, it is immune to the sign-flipping discontinuity inherent in SVD for near-degenerate covariance matrices.

Hierarchical scheduling.
The refinement schedule mirrors the skeletal hierarchy: body joints are solved first, followed by optional finger joint refinement, and a final global pass covers all joints simultaneously. This coarse-to-fine schedule ensures that large-scale body motion is resolved before fine-grained finger articulation, which would otherwise be contaminated by uncorrected upstream errors.

Optional autograd refinement. For applications that require higher accuracy at the cost of throughput, an optional gradient-based refinement stage is provided. Joint rotations are parameterized as continuous 6D vectors (Zhou et al., 2019) and optimized with Adam by backpropagating through the full FK+LBS computation (Eq. (6)). This stage must be warm-started from the analytical result: the FK+LBS objective is highly non-convex, and naïve optimization from the bind pose fails to converge, settling into a local minimum with entirely incorrect limb placement (Sec. 4.2). With the analytical initialization, autograd refinement converges rapidly and can further reduce error at extremities (hands, feet, head) by optimizing through the full kinematic chain with per-vertex loss weighting. The analytical solver alone achieves approximately 1,200 frames per second on an NVIDIA RTX 5000 Ada GPU, while the autograd path runs at 16-18 FPS for 100 optimizer steps, making each mode suitable for different points on the speed-accuracy tradeoff curve.

Table 1: Topology abstraction fidelity across backends. Closest-point-to-mesh distance (mm) between the SOMA rest shape and the native source mesh, measured over 100 diverse identities per backend. "Wrap" is the baseline registration error of the pre-registered SOMA wrap mesh against the source neutral mesh.
Vertices in facial inner geometry (eye bags, mouth bag) and between-toes regions, which have no correspondence in most source topologies, are excluded.

Backend     | Src. Verts | Mean (mm) | Std (mm) | P95 (mm) | Wrap Mean | Wrap P95
SOMA native | –          | 0.0       | 0.0      | 0.0      | –         | –
SMPL        | 6,890      | 0.12      | 0.45     | 0.71     | 0.12      | 0.74
SMPL-X      | 10,475     | 0.06      | 0.22     | 0.45     | 0.06      | 0.47
Anny        | 13,718     | 0.01      | 0.12     | 0.01     | 0.01      | 0.01
MHR         | ∼18k       | 0.40      | 0.73     | 1.49     | 0.31      | 1.28

Figure 7: Shape diversity across backends driven by a single pose. Three sampled identities per backend, SOMA-Shape (green), MHR (blue), SMPL (pink), and Anny (yellow), all driven by the same skeletal pose. Despite originating from entirely different generative models, all bodies share the same pose interpretation, illustrating the plug-and-play nature of SOMA's identity-pose decoupling.

4. Evaluation

We evaluate SOMA along four dimensions: topology abstraction fidelity (Sec. 4.1), pose inversion accuracy (Sec. 4.2), runtime performance (Sec. 4.3), and cross-model shape-space comparison (Sec. 4.4).

4.1. Topology Abstraction Fidelity

The topology transfer is the first stage of the pipeline, and any error here propagates to all downstream tasks. Tab. 1 reports per-vertex transfer errors for each backend over 100 diverse identities. For each SOMA vertex, we query the closest point on the native source mesh surface and report the L2 distance. This measures the geometric information loss introduced by the barycentric interpolation, without conflating it with unit-convention or alignment artifacts. The SOMA-native backend incurs zero error by construction. SMPL and SMPL-X achieve sub-millimeter mean errors (0.12 mm and 0.06 mm, respectively), and their transfer errors closely match the wrap baseline, confirming that the barycentric interpolation introduces negligible additional distortion beyond the one-time mesh registration.
Anny achieves near-zero error (0.01 mm mean), reflecting a particularly clean wrap registration. MHR shows a slightly higher mean error (0.40 mm), reflecting its denser mesh and more complex geometry, but remains well below 1 mm; the modest increase over the wrap baseline (0.31 mm) indicates that the topology transfer generalizes well across the MHR shape space. All P95 errors stay below 1.5 mm. Fig. 7 shows the shape diversity achieved across all backends using identical pose parameters, demonstrating that the unified pipeline faithfully represents the shape characteristics of each source model.

Table 2: Pose inversion accuracy and throughput on AMASS. Per-vertex reconstruction error (mm) and throughput (frames/sec) on an NVIDIA A100 GPU. "Skel. transfer" is the raw skeleton-fitting initialization with no iterative refinement. "Analytical" adds inverse-LBS refinement with Newton-Schulz orthogonalization (body=2, full=1). "Autograd FK" optimizes 6D rotation parameters through FK+LBS with Adam (100 iterations); "no init" starts from the bind pose, "w/ init" warm-starts from the skeleton transfer result. "Analytical + Autograd" warm-starts autograd from the analytical result (10 iterations).

Method                       Mean (mm)   Median (mm)   Max (mm)   FPS
Skel. transfer only          16.5        13.9          80.1       17,393
Analytical                   5.3         3.2           88.1       882
Autograd FK (no init)        501.8       479.4         1354.2     79
Autograd FK (w/ init)        4.1         2.1           81.5       78
Analytical + Autograd (10)   7.8         6.4           88.6       435

Table 3: Per-region pose inversion error (MHR, 200 SAM 3D Body frames). Mean per-vertex L2 error (mm) by body region. "Analytical" = body=2, full=1. "+ Autograd" adds 100 Adam iterations warm-started from the analytical result.

Method             All    Body   Hands   Feet   Head
Analytical         8.8    16.8   4.7     8.2    6.9
+ Autograd (100)   6.6    15.3   2.0     5.8    4.8
Reduction          25%    9%     57%     29%    30%

4.2.
Pose Inversion Accuracy

Posed meshes are the common interface across heterogeneous body models, making mesh-to-joint-angle inversion the key operation for pose abstraction. Tab. 2 reports pose inversion accuracy and throughput for both solvers described in Sec. 3.6. We evaluate on the full AMASS dataset (Mahmood et al., 2019) (344 subjects, 2,265 motions, 40.3 hours, 19.8M frames). Errors are per-vertex L2 distances between the original SMPL-X posed mesh and the SOMA reconstruction driven by the recovered rotations. Fig. 9 qualitatively compares the three stages on the SMPL and MHR backends.

The skeleton transfer initialization alone (Sec. 3.4) provides a coarse but fast pose estimate (16.5 mm mean at 17,393 FPS), suitable for applications where speed dominates accuracy requirements. The analytical solver refines this to 5.3 mm mean error at 882 FPS via iterative inverse-LBS with Newton-Schulz orthogonalization. The autograd FK solver reaches 4.1 mm mean error by optimizing through the full FK+LBS chain, but only when warm-started from the skeleton transfer initialization; without initialization (starting from the bind pose), 100 Adam iterations fail to converge (501.8 mm mean error), demonstrating that the initialization is critical.

The two solvers are complementary. The analytical path is fast and produces a near-optimal result in terms of global L2 error across all vertices. The autograd FK path, by contrast, optimizes through the full FK+LBS chain and supports per-vertex loss weighting on the extremities (hands, feet, head), giving explicit control over where the solver concentrates its effort. Tab. 3 breaks down per-vertex errors by body region on 200 SAM 3D Body (Yang et al., 2026) frames (MHR backend).
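For reference, the continuous 6D rotation representation optimized by the autograd solver (Zhou et al., 2019) maps any 6-vector to a valid rotation matrix via Gram-Schmidt orthogonalization; a minimal single-matrix sketch (the actual solver operates on batched tensors):

```python
import numpy as np

def rot6d_to_matrix(x6):
    """Continuous 6D rotation representation (Zhou et al., 2019): the two
    3-vectors in x6 are orthonormalized by Gram-Schmidt and the third
    column is their cross product, so any 6-vector yields a valid rotation."""
    a1, a2 = x6[:3], x6[3:]
    b1 = a1 / np.linalg.norm(a1)           # first column
    a2 = a2 - np.dot(b1, a2) * b1          # remove the b1 component
    b2 = a2 / np.linalg.norm(a2)           # second column
    b3 = np.cross(b1, b2)                  # third column, guarantees det = +1
    return np.stack([b1, b2, b3], axis=-1)

# The 6D vector (1,0,0, 0,1,0) maps to the identity rotation.
R = rot6d_to_matrix(np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))
```

Because the map is surjective onto rotations and has no discontinuities, Adam can optimize the raw 6-vectors directly without axis-angle or quaternion singularities.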
Autograd FK refinement (100 iterations, warm-started from the analytical result) reduces hand error by 57% (4.7 → 2.0 mm), foot error by 29% (8.2 → 5.8 mm), and head error by 30% (6.9 → 4.8 mm), while body-trunk error decreases only slightly (16.8 → 15.3 mm); the optimizer redistributes error away from the extremities and onto the trunk, which has more vertices to absorb it. Fig. 8 visualizes this trade-off on a hand close-up: the autograd result achieves a tight overlay at the fingertips, at the cost of a slightly increased offset on the body visible in the background.

Figure 8: Hand zoom: analytical vs. autograd FK refinement. MHR backend on SAM 3D Body (teal = ground truth, red = SOMA reconstruction). (a) The analytical solver is near-optimal globally but leaves residual misalignment at the fingertips. (b) Adding 100 autograd FK iterations redistributes error toward the body trunk (visible as a slightly increased offset in the background), achieving a tight overlay at the fingers.

Effect of Newton-Schulz orthogonalization. Fig. 10 (a, b) compares the analytical solver using standard SVD-based Kabsch alignment against the Newton-Schulz variant described in Sec. 3.6. The crops are taken from the shoulder region of the same SMPL frame, where the contributing vertex cloud is near-coplanar. With SVD, the near-zero third singular value causes a sign flip in the rotation solution, producing a visible discontinuous offset at both shoulders ("shoulder popping"). Newton-Schulz orthogonalization avoids this instability by iteratively refining the rotation estimate without decomposing singular vectors, yielding a smooth and accurate result.
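The Newton-Schulz iteration itself (Kovarik, 1970) is only a few lines; the sketch below uses an illustrative spectral-norm pre-scaling and iteration count rather than SOMA's exact schedule (the analytical solver uses body=2, full=1 iterations after a warm start):

```python
import numpy as np

def newton_schulz_orthogonalize(M, n_iter=2):
    """Pull a near-rotation matrix back onto the orthogonal group with the
    Newton-Schulz iteration X <- X (3I - X^T X) / 2, avoiding the SVD
    sign-flip instability on near-coplanar vertex clouds. Converges when
    the singular values of the scaled input lie in (0, sqrt(3))."""
    X = M / np.linalg.norm(M, ord=2)  # scale singular values into (0, 1]
    I = np.eye(3)
    for _ in range(n_iter):
        X = 0.5 * X @ (3.0 * I - X.T @ X)
    return X
```

Unlike SVD, the update uses only matrix products, so there are no singular vectors whose sign can flip between adjacent frames.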
To quantify the temporal effect, we measure the frame-to-frame change in shoulder-region vertex error across the full 1,606-frame sequence: SVD exhibits a peak error oscillation of 1.6 mm/frame at the shoulders, compared to 0.8 mm/frame for Newton-Schulz, a 2× improvement in temporal stability.

Importance of initialization. Fig. 10 (c, d) illustrates why initialization is critical for gradient-based pose inversion: on a simple standing pose from SAM 3D Body, 100 Adam iterations without initialization converge to a local minimum with entirely incorrect limb placement, while the analytical solver recovers the pose in a single pass with a near-perfect overlay.

4.3. Runtime Performance

SOMA integrates directly into large-scale foundation model training loops, so the forward pass must be highly optimized. Tab. 4 reports throughput and latency across batch sizes and execution modes. Measurements were conducted on a single NVIDIA A100 80 GB GPU (Warp path) and a 32-core AMD EPYC 7763 CPU (PyTorch path), with the mid-resolution SOMA mesh and a SOMA-native identity backend (which skips the topology abstraction step). The skeleton fitting step (RBF regression + Kabsch) takes under 1.5 ms on GPU regardless of batch size, demonstrating that the pre-factored sparse matrix approach scales efficiently. The Warp GPU path achieves over 7,000 meshes/sec at batch size 128. Adding a topology abstraction step (SMPL or MHR backend) incurs approximately 0.3–0.8 ms of additional latency on GPU, which is negligible relative to the total pipeline cost.

4.4. Cross-Model Shape-Space Comparison

One of the key benefits of SOMA's abstraction layers is principled cross-model comparison. We demonstrate this by evaluating four PCA shape models on 33 held-out body scans from an independent capture pipeline.

(a) Skel. transfer only (b) Analytical (c) Analytical + Autograd FK (d) Skel.
transfer only (e) Analytical (f) Analytical + Autograd FK

Figure 9: Pose inversion quality across methods. Top row: SMPL backend (yellow = ground truth, purple = SOMA reconstruction). Bottom row: MHR backend on SAM 3D Body (teal = ground truth, red = SOMA reconstruction). (a, d) Skeleton transfer alone provides a coarse estimate with visible misalignment at the extremities. (b, e) Analytical refinement with Newton-Schulz orthogonalization closely tracks the ground truth. (c, f) Adding autograd FK refinement achieves the tightest overlay, reducing residual error at the feet and hands.

The scans have no overlap with any model's PCA training data. Such a comparison is typically infeasible without a unifying framework like SOMA, since each model defines its own topology, rest pose, and deformation parameters. For each model, every scan is transferred to the model's native mesh topology via barycentric interpolation (Sec. 3.3), reposed to the model's canonical rest pose via pose inversion (Sec. 3.6), and projected onto the model's PCA basis at full capacity.

Tab. 5 reports per-vertex reconstruction error. SMPL's 10-component basis captures coarse body proportions but leaves a 14 mm mean residual, consistent with its limited shape dimensionality. GarmentMeasurements (15 components) reduces this to 12 mm. SOMA-Shape achieves 5.8 mm mean error with 128 components, closely matching SMPL-X's 5.5 mm at 300 components, demonstrating competitive expressiveness with fewer than half the parameters. Fig. 11 visualizes the per-vertex reconstruction error across models for representative scans: SMPL and GarmentMeasurements show widespread residual (red), while SOMA-Shape and SMPL-X achieve low error over most of the body surface (blue).

5. Conclusion

We have presented SOMA, a unified framework that decouples identity representation from pose parameterization across heterogeneous parametric body models.
By mapping all supported backends to a single canonical mesh topology and rig, SOMA reduces the O(M²) per-pair adapter problem to O(M) single-backend connectors, enabling practitioners to freely mix identity sources and pose data at inference time.

Figure 10: Pose inversion ablations. (a, b) Shoulder zoom from a single SMPL frame (yellow = GT, purple = SOMA). SVD-based Kabsch alignment exhibits shoulder popping due to a singular-vector sign flip; Newton-Schulz produces a smooth result. (c, d) MHR on SAM 3D Body (teal = GT, red = SOMA). Without initialization, 100 autograd iterations converge to an incorrect local minimum; the analytical solver recovers the pose correctly.

Table 4: Runtime performance. Throughput (meshes/sec) and per-call latency (ms) of the SOMA forward pass. The breakdown shows skeleton fitting cost (RBF regression + Kabsch) vs. the total forward pass. Warp = GPU path (NVIDIA Warp LBS kernel); PyTorch = CPU dense path. Identity backend: SOMA-native (no topology abstraction overhead).

Mode            Batch   Skel. (ms)   Total (ms)   Meshes/sec
Warp (GPU)      1       0.8          2.1          476
Warp (GPU)      8       0.9          3.4          2,353
Warp (GPU)      32      1.1          6.8          4,706
Warp (GPU)      128     1.4          18.2         7,033
PyTorch (CPU)   1       3.2          12.1         83
PyTorch (CPU)   8       4.1          38.7         207
PyTorch (CPU)   32      5.9          148.0        216

Three abstraction layers (mesh topology, skeleton, and pose) unify heterogeneous body shapes into a single identity representation, adapt the skeleton to arbitrary identities and pose them with unified corrective deformations shared across all backends, and recover unified skeleton rotations directly from posed vertices without custom retargeting. The entire pipeline is fully differentiable, GPU-accelerated, and requires no per-model training or iterative optimization.
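For completeness, the closed-form rotation recovery at the heart of the skeleton-fitting stage (the Kabsch step of "RBF regression + Kabsch", Sec. 4.3) is the classical Kabsch alignment (Kabsch, 1976); the sketch below is a generic textbook version with the standard reflection guard, not SOMA's batched GPU implementation:

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rotation aligning point cloud P onto Q (Kabsch, 1976):
    SVD of the cross-covariance of the centered clouds, with a determinant
    guard against reflections. Generic sketch, not SOMA's internal code."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)          # cross-covariance H = U S Vt
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # +1 rotation, -1 reflection
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, det = +1
```

The sign flip that causes shoulder popping arises exactly here: when the third singular value of the cross-covariance is near zero, the sign `d` can change between adjacent frames, which is what the Newton-Schulz variant avoids.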
Evaluation across multiple backends and 100 diverse identities demonstrates sub-millimeter mean topology transfer errors on body vertices, sub-centimeter pose inversion accuracy at over 300 FPS on a laptop GPU, and forward-pass throughput exceeding 7,000 meshes/sec at batch size 128.

Limitations. Several limitations remain. First, topology transfer quality depends on the quality of the source model's canonical mesh and the SOMA wrap registration: poorly registered wraps, or source models with extreme vertex density asymmetry, can degrade topology abstraction accuracy. Second, despite pose-dependent correctives, standard LBS still produces artifacts under highly non-rigid deformations (e.g., extreme elbow flexion, shoulder abduction beyond 90 degrees); learned correctives mitigate but do not eliminate these. Third, adding a new identity backend requires implementing a new identity model class and performing a one-time SOMA mesh registration using standard non-rigid registration tools; this is a modest but non-trivial engineering step. Fourth, SOMA's pose abstraction is not a general-purpose retargeter: it recovers pose through mesh vertex correspondence, so it is limited to models that share compatible human body geometry. It cannot abstract poses from characters with fundamentally different geometry or rigging structure (e.g., robots, non-humanoid characters); such cases require a specialized retargeting solution.

Table 5: Cross-model PCA reconstruction on 33 held-out body scans. Per-vertex L2 distance (mm) between the unposed scan and the PCA reconstruction at each model's full component count K. All models are evaluated through the same SOMA pipeline, differing only in the target topology, rest pose, and PCA basis.
Model                 K     Mean (mm)   Median (mm)   P95 (mm)
SMPL                  10    14.11       12.31         30.49
GarmentMeasurements   15    11.81       10.67         24.18
SOMA-Shape (Ours)     128   5.82        4.81          13.60
SMPL-X                300   5.45        4.34          12.97

Acknowledgments

We thank Davis Rempe, Mathis Petrovich, and Sehwi Park for valuable feedback and help throughout the project. We thank Cyrus Hogg and Mike Sandrik for their support with data acquisition, and Dennis Lynch, Will Tellford, Jon Shepard, and Spencer Huang for helpful guidance during development.

Figure 11: Cross-model PCA reconstruction error on held-out body scans. SOMA's topology and pose abstraction layers enable a principled cross-model comparison. Each column shows a different shape model's mesh topology, all rendered in the SOMA A-pose; each row (a–d) is the same identity. Color encodes per-vertex L2 error (blue = 0 mm, white = 10 mm, red ≥ 20 mm). SMPL (10 components) and GM (GarmentMeasurements, 15) show widespread residual, while SOMA-Shape (128) and SMPL-X (300) achieve low error across the body surface.

References

[1] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. SCAPE: Shape completion and animation of people. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.

[2] Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, and Thomas Lucas. Multi-HMR: Multi-person whole-body human mesh recovery in a single shot. In European Conference on Computer Vision, 2024.

[3] Romain Brégier, Guénolé Fiche, Laura Bravo-Sánchez, Thomas Lucas, Matthieu Armando, Philippe Weinzaepfel, Grégory Rogez, and Fabien Baradel. Human mesh modeling for Anny body, 2025. URL https://arxiv.org/abs/2511.03589.

[4] Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee.
Beyond static features for temporally consistent 3D human pose and shape from a video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1964–1973, 2021.

[5] Aaron Ferguson, Ahmed A. A. Osman, Berta Bescos, Carsten Stoll, Chris Twigg, Christoph Lassner, David Otte, Eric Vignola, Fabian Prada, Federica Bogo, Igor Santesteban, Javier Romero, Jenna Zarate, Jeongseok Lee, Jinhyung Park, Jinlong Yang, John Doublestein, Kishore Venkateshan, Kris Kitani, Ladislav Kavan, Marco Dal Farra, Matthew Hu, Matthew Cioffi, Michael Fabris, Michael Ranieri, Mohammad Modarres, Petr Kadlecek, Rawal Khirodkar, Rinat Abdrashitov, Romain Prévost, Roman Rajbhandari, Ronald Mallet, Russell Pearsall, Sandy Kao, Sanjeev Kumar, Scott Parrish, Shoou-I Yu, Shunsuke Saito, Takaaki Shiratori, Te-Li Wang, Tony Tung, Yichen Xu, Yuan Dong, Yuhua Chen, Yuanlu Xu, Yuting Ye, and Zhongshi Jiang. MHR: Momentum human rig, 2025.

[6] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4D: Reconstructing and tracking humans with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14783–14794, 2023.

[7] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Reconstructing and tracking humans with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.

[8] Umar Iqbal, Kevin Xie, Yunrong Guo, Jan Kautz, and Pavlo Molchanov. KAMA: 3D keypoint aware body mesh articulation. In 3DV, 2021.

[9] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, 32(5):922–923, 1976.

[10] Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik.
End-to-end recovery of human shape and pose. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7122–7131, 2018.

[11] Muhammed Kocabas, Nikos Athanasiou, and Michael J. Black. VIBE: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5253–5263, 2020.

[12] Muhammed Kocabas, Chun-Hao P. Huang, Otmar Hilliges, and Michael J. Black. PARE: Part attention regressor for 3D human body estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11127–11137, 2021.

[13] Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, and Umar Iqbal. PACE: Human and motion estimation from in-the-wild videos. In 3DV, 2024.

[14] Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, and Kostas Daniilidis. Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2252–2261, 2019.

[15] Maria Korosteleva and Olga Sorkine-Hornung. GarmentCode: Programming parametric sewing patterns. ACM Transactions on Graphics, 42(6), 2023. doi: 10.1145/3618351. SIGGRAPH Asia 2023 issue.

[16] Zdislav V. Kovarik. Some iterative methods for improving orthonormality. SIAM Journal on Numerical Analysis, 7(3):386–389, 1970.

[17] Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, and Ye Yuan. GENMO: A generalist model for human motion. arXiv preprint, 2025.

[18] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, October 2015.
[19] Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes. In ICCV, 2019.

[20] Ahmed A. A. Osman, Timo Bolkart, and Michael J. Black. STAR: A sparse trained articulated human body regressor. In European Conference on Computer Vision (ECCV), pages 598–613, 2020. URL https://star.is.tue.mpg.de.

[21] Priyanka Patel and Michael J. Black. CameraHMR: Aligning people with perspective. In International Conference on 3D Vision (3DV), 2025.

[22] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.

[23] Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, and Davis Rempe. Multi-track timeline control for text-driven 3D human motion generation. In CVPR Workshop on Human Motion Generation, 2024.

[24] Leonid Pishchulin, Stefanie Wuhrer, Thomas Helten, Christian Theobalt, and Bernt Schiele. Building statistical shape spaces for 3D human modeling. Pattern Recognition, 2017.

[25] Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 36(6), November 2017.

[26] David A. Ross, Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang. Incremental learning for robust visual tracking. IJCV, 77(1–3):125–141, 2008.

[27] István Sárándi and Gerard Pons-Moll. Neural localizer fields for continuous 3D human pose and shape estimation. Advances in Neural Information Processing Systems, 37:140032–140065, 2024.
[28] Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, and Xiaowei Zhou. World-grounded human motion recovery via gravity-view coordinates. In SIGGRAPH Asia, 2024.

[29] Soyong Shin, Juyong Kim, Eni Halilaj, and Michael J. Black. WHAM: Reconstructing world-grounded humans with accurate 3D motion. arXiv preprint, 2023.

[30] [TC]². SizeUSA: The national sizing survey. Technical report, Textile/Clothing Technology Corporation, Cary, NC, 2004. URL http://www.sizeusa.com.

[31] Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit H. Bermano. Human motion diffusion model. In ICLR, 2023.

[32] Triplegangers. Triplegangers 3D scans. https://triplegangers.com, 2025. Accessed: 2025.

[33] Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, and Muhammed Kocabas. PromptHMR: Promptable human mesh recovery. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1148–1159, 2025.

[34] Yufu Wang, Evonne Ng, Soyong Shin, Rawal Khirodkar, Yuan Dong, Zhaoen Su, Jinhyung Park, Kris Kitani, Alexander Richard, Fabian Prada, and Michael Zollhofer. DuoMo: Dual motion diffusion for world-space human reconstruction. arXiv preprint, 2026.

[35] Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William T. Freeman, Rahul Sukthankar, and Cristian Sminchisescu. GHUM & GHUML: Generative 3D human shape and articulated pose models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6184–6193, 2020.

[36] Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, and Kris Kitani. SAM 3D Body: Robust full-body human mesh recovery. arXiv preprint, 2026.
[37] Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, and Jan Kautz. GLAMR: Global occlusion-aware human mesh recovery with dynamic cameras. In CVPR, 2022.

[38] Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, and Jan Kautz. PhysDiff: Physics-guided human motion diffusion model. In ICCV, 2023.

[39] Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, and Ziwei Liu. MotionDiffuse: Text-driven human motion generation with diffusion model. arXiv preprint, 2022.

[40] Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, et al. Large motion model for unified multi-modal motion generation. In ECCV, 2024.

[41] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
