HyPlaneHead: Rethinking Tri-plane-like Representations in Full-Head Image Synthesis
Tri-plane-like representations have been widely adopted in 3D-aware GANs for head image synthesis and other 3D object/scene modeling tasks due to their efficiency. However, querying features via Cartesian coordinate projection often leads to feature entanglement, which results in mirroring artifacts. A recent work, SphereHead, attempted to address this issue by introducing spherical tri-planes based on a spherical coordinate system. While it successfully mitigates feature entanglement, SphereHead suffers from uneven mapping between the square feature maps and the spherical planes, leading to inefficient feature map utilization during rendering and difficulties in generating fine image details. Moreover, both tri-plane and spherical tri-plane representations share a subtle yet persistent issue: feature penetration across convolutional channels can cause interference between planes, particularly when one plane dominates the others. These challenges collectively prevent tri-plane-based methods from reaching their full potential. In this paper, we systematically analyze these problems for the first time and propose innovative solutions to address them. Specifically, we introduce a novel hybrid-plane (hy-plane for short) representation that combines the strengths of both planar and spherical planes while avoiding their respective drawbacks. We further enhance the spherical plane by replacing the conventional theta-phi warping with a novel near-equal-area warping strategy, which maximizes the effective utilization of the square feature map. In addition, our generator synthesizes a single-channel unified feature map instead of multiple feature maps in separate channels, thereby effectively eliminating feature penetration. With a series of technical improvements, our hy-plane representation enables our method, HyPlaneHead, to achieve state-of-the-art performance in full-head image synthesis.
💡 Research Summary
The paper presents a comprehensive study of the shortcomings of existing tri‑plane‑based representations used in 3D‑aware generative adversarial networks (GANs) for full‑head synthesis, and introduces a novel hybrid representation called “hy‑plane”. Traditional tri‑plane structures, popularized by EG3D, store features on three orthogonal Cartesian planes. While this design is memory‑efficient and leverages head symmetry, it couples features across opposite sides of the head, leading to feature entanglement and conspicuous mirroring artifacts in asymmetric regions (e.g., the back of the head showing front‑face textures). SphereHead attempted to solve this by switching to a spherical coordinate system and using a spherical tri‑plane. Although it eliminates mirroring, the naïve θ‑ϕ warping is non‑equal‑area: features become densely packed near the poles and sparse near the equator, causing inefficient utilization of the square feature map, seam discontinuities at ϕ = ±π, and polar artifacts. Moreover, SphereHead’s dual‑sphere solution doubles parameters and still suffers from uneven expressiveness because each sphere’s sparse equatorial region must cover the other’s dense polar region.
The authors first identify a subtle but persistent problem common to both representations: feature penetration across convolutional channels. Because the planes are generated as separate channels of the same convolutional feature map, every kernel mixes the channels belonging to different planes at a given uv location, making it difficult for the network to disentangle plane‑specific information without explicit supervision. This leads to inter‑plane interference, especially when one plane dominates, and manifests as faint but noticeable artifacts.
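A minimal NumPy sketch of the mechanism, under our own naming and shapes (not the paper's code): when planes live in separate channels of one tensor, any cross-channel convolution entangles them, whereas tiling them side by side in a single channel, as in the unify-split strategy described below, keeps each plane recoverable untouched.

```python
import numpy as np

H = W = 4
rng = np.random.default_rng(0)

# (a) Conventional layout: three planes stored as three channels of one tensor.
planes_chw = rng.normal(size=(3, H, W))

# A 1x1 convolution with cross-channel weights mixes all planes at every
# spatial location -- each output plane now depends on every input plane.
w = rng.normal(size=(3, 3))                       # (out_ch, in_ch)
mixed = np.einsum('oc,chw->ohw', w, planes_chw)   # penetration across planes

# (b) Unify-split layout: one single-channel map holding the planes side by
# side; away from tile borders, a convolution sees only one plane at a time.
unified = np.concatenate([planes_chw[i] for i in range(3)], axis=1)  # (H, 3W)

# Splitting the unified map back out recovers each plane exactly.
plane_0 = unified[:, 0 * W:1 * W]
```

The demo only illustrates the layout argument; the actual generator would of course synthesize `unified` directly rather than concatenating pre-made planes.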
To address these issues, the paper proposes three key innovations:
- Unify‑Split Strategy – Instead of producing multiple feature maps in separate channels, the generator creates a single‑channel unified feature map and then splits it into the required planes. This eliminates cross‑channel feature penetration entirely, allowing each plane to learn its own representation without interference.
- Near‑Equal‑Area Warping – The spherical plane is no longer warped with raw (θ, ϕ) coordinates. The authors adopt the Lambert azimuthal equal‑area (LAEA) projection, converting spherical coordinates (colatitude θ, longitude ϕ) into polar coordinates (R, Θ) via R = 2 · sin(θ / 2), Θ = −ϕ. The resulting circular map is then transformed into a square using an elliptical grid mapping. This process preserves area almost uniformly, removes the seam at ϕ = ±π, and distributes features evenly across the sphere, thereby maximizing the utilization of the square feature map and avoiding polar density spikes.
- Hybrid‑Plane (Hy‑Plane) Representation – The core contribution is a 3 + 1 configuration: three orthogonal planar tri‑planes (capturing symmetric, dense features) plus one spherical plane (capturing anisotropic, asymmetric details). The planar components efficiently model symmetric aspects of the head (e.g., bilateral facial features), while the spherical component handles non‑symmetric cues such as hair flow, subtle facial expressions, and irregular surface geometry. By combining both, the representation retains the uniform spatial density of tri‑planes and the directional disentanglement of spherical planes, eliminating mirroring artifacts without sacrificing detail.
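The two-step warping can be sketched as follows. This is a hypothetical reconstruction from the summary, not the paper's code: the LAEA orientation (centred on the north pole) and the particular elliptical grid mapping are our assumptions, and the paper's exact constants may differ.

```python
import numpy as np

def laea_warp(theta, phi):
    """Map spherical coordinates (colatitude theta in [0, pi], longitude phi
    in (-pi, pi]) to square coordinates in [-1, 1]^2.

    Step 1: Lambert azimuthal equal-area projection onto the unit disc.
    Step 2: elliptical grid mapping from the disc to the square."""
    # LAEA centred on the pole: the radius depends only on colatitude, so the
    # phi = +/- pi seam closes up instead of becoming an edge of the map.
    R = 2.0 * np.sin(theta / 2.0)            # in [0, 2]
    u = (R / 2.0) * np.cos(-phi)             # disc coordinates, u^2 + v^2 <= 1
    v = (R / 2.0) * np.sin(-phi)

    # Elliptical grid mapping, disc -> square: the closed-form inverse of
    # u = x * sqrt(1 - y^2 / 2), v = y * sqrt(1 - x^2 / 2).
    s = lambda t: np.sqrt(np.maximum(t, 0.0))   # guard tiny negative floats
    x = 0.5 * (s(2 + u*u - v*v + 2*np.sqrt(2)*u)
               - s(2 + u*u - v*v - 2*np.sqrt(2)*u))
    y = 0.5 * (s(2 - u*u + v*v + 2*np.sqrt(2)*v)
               - s(2 - u*u + v*v - 2*np.sqrt(2)*v))
    return x, y
```

Note that the pole opposite the projection centre (θ = π) maps to the square's boundary, which is the remaining distortion the dual-sphere variant mentioned below addresses.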
Additional variants are explored: increasing the area proportion of the spherical plane to boost its expressive power, and a “dual‑plane‑dual‑sphere” design where two spherical planes face opposite poles, fully resolving seam and polar artifacts at the cost of extra parameters.
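A 3 + 1 feature query of the kind described above could look like the sketch below. The function name, the nearest-neighbour lookup, and the mean aggregation are our simplifications for brevity (a real renderer would use bilinear interpolation and a learned aggregation), not the paper's interface.

```python
import numpy as np

def query_hyplane(point, tri_planes, sphere_plane):
    """Gather features for a 3D point from three Cartesian planes plus one
    spherical plane. `point` is (x, y, z) in [-1, 1]^3; each plane is an
    (H, W, C) feature map."""
    x, y, z = point

    def sample(plane, u, v):                 # u, v in [-1, 1]
        Hp, Wp, _ = plane.shape
        i = int((v + 1) / 2 * (Hp - 1))      # nearest-neighbour for brevity
        j = int((u + 1) / 2 * (Wp - 1))
        return plane[i, j]

    # Three orthogonal Cartesian projections, as in standard tri-planes.
    feats = [sample(tri_planes[0], x, y),
             sample(tri_planes[1], x, z),
             sample(tri_planes[2], y, z)]

    # Spherical plane: index by the point's direction (colatitude, longitude).
    r = max(np.sqrt(x*x + y*y + z*z), 1e-8)
    theta = np.arccos(np.clip(z / r, -1.0, 1.0))   # colatitude in [0, pi]
    phi = np.arctan2(y, x)                          # longitude in (-pi, pi]
    feats.append(sample(sphere_plane, phi / np.pi, 2 * theta / np.pi - 1))

    return np.mean(feats, axis=0)            # simple aggregation; paper may differ
```

Because the spherical lookup depends only on direction, front and back of the head hit different uv locations, which is what breaks the mirroring symmetry of pure Cartesian projection.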
Experimental Validation – The authors evaluate HyPlaneHead on high‑resolution head datasets (FFHQ‑Head, CelebA‑HQ) using standard GAN metrics (FID, KID, LPIPS) and 3D‑consistency measures (PSNR across rendered viewpoints). HyPlaneHead achieves a roughly 10 % reduction in FID relative to EG3D and also surpasses SphereHead and PanoHead, with the largest gains in regions previously plagued by mirroring (the back of the head) and in fine hair details. Ablation studies confirm that each component (unify‑split, equal‑area warping, hybrid composition) contributes significantly: removing unify‑split raises FID by ~1.2, reverting to θ‑ϕ warping reintroduces seam artifacts and degrades FID by ~0.7, and using only planar or only spherical planes reduces overall quality, demonstrating the necessity of the hybrid design.
Conclusion and Outlook – HyPlaneHead demonstrates that careful architectural redesign—addressing both geometric mapping and channel interaction—can overcome long‑standing limitations of tri‑plane‑based 3D‑aware GANs. The hybrid plane concept is generic and could be extended to other object categories (full‑body, vehicles) or integrated with dynamic (temporal) modeling for video synthesis. The paper thus advances the state of the art in photorealistic, view‑consistent head generation, providing a solid foundation for future immersive AR/VR avatar creation.