Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original ArXiv paper.

Tri-Planar NeRFs enable the application of powerful 2D vision models to 3D tasks by representing 3D objects using 2D planar structures. This has made them the prevailing choice for modeling large collections of 3D objects. However, training Tri-Planes to model such large collections is computationally intensive and remains largely inefficient, because current approaches independently train one Tri-Plane per object and hence overlook structural similarities within large classes of objects. In response, we introduce Fused-Planes, a novel object representation that improves the resource efficiency of Tri-Planes when reconstructing object classes, while retaining the same planar structure. Our approach explicitly captures structural similarities across objects through a latent space and a set of globally shared base planes. Each Fused-Planes instance is then represented as a decomposition over these base planes, augmented with object-specific features. Fused-Planes showcase state-of-the-art efficiency among planar representations, demonstrating $7.2 \times$ faster training and a $3.2 \times$ lower memory footprint than Tri-Planes while maintaining rendering quality. An ultra-lightweight variant further cuts per-object memory usage by $1875 \times$ with minimal quality loss. Our project page can be found at https://fused-planes.github.io .


💡 Research Summary

The paper addresses the inefficiency of training Tri‑Plane‑based Neural Radiance Fields (NeRFs) for large collections of 3D objects. Traditional approaches allocate an independent Tri‑Plane to each object, ignoring the structural redundancy that often exists within a class. The authors propose “Fused‑Planes,” a representation that decomposes each object into a micro component (object‑specific planes) and a macro component (a weighted combination of globally shared base planes). The macro component is generated by multiplying a small set of base planes (M ≪ N) with learned per‑object coefficients, dramatically reducing the number of trainable parameters per object.
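The macro/micro decomposition can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array names, sizes, and the toy random initialization are all assumptions; in the paper these tensors would be learned jointly by gradient descent.

```python
import numpy as np

# Hypothetical sketch of the Fused-Planes decomposition (names/sizes assumed).
# A collection of N objects shares M base planes (M << N); each object stores
# only M mixing coefficients plus a small object-specific "micro" plane,
# instead of a full independent Tri-Plane.

M = 8            # number of globally shared base planes
N = 1000         # objects in the collection
C, H, W = 16, 64, 64  # feature channels and plane resolution

rng = np.random.default_rng(0)
base_planes = rng.normal(size=(M, C, H, W))          # shared across all objects
coeffs = rng.normal(size=(N, M))                     # per-object weights
micro_planes = rng.normal(size=(N, C, H, W)) * 0.1   # per-object residual detail

def fused_plane(obj_id: int) -> np.ndarray:
    """One object's feature plane: macro (weighted base planes) + micro."""
    macro = np.tensordot(coeffs[obj_id], base_planes, axes=([0], [0]))  # (C, H, W)
    return macro + micro_planes[obj_id]

plane = fused_plane(42)
print(plane.shape)  # (16, 64, 64)
```

Note how the per-object trainable state shrinks from a full set of planes to M scalars plus the (optionally dropped) micro plane, which is where the memory savings come from.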

To further improve efficiency, the method is trained in a 3D‑aware latent space provided by an auto‑encoder (Eϕ, Dψ). Images are encoded into low‑dimensional latent vectors, and volume rendering is performed on these latent images rather than full‑resolution RGB, cutting rendering cost. Crucially, the latent space is learned jointly with the Fused‑Planes, allowing the shared base planes and object‑specific micro planes to align with the latent geometry, preserving rendering quality that typical latent NeRFs lose when using a pre‑trained generic latent space.
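The latent-rendering idea can be illustrated with a toy stand-in for the auto-encoder. This is a hedged sketch under assumed shapes: the real Eϕ/Dψ are learned networks trained jointly with the planes, whereas here they are replaced by simple average-pool and nearest-neighbour-upsample operations just to show why rendering in latent space is cheaper.

```python
import numpy as np

# Toy illustration (all names assumed): volume rendering only needs to produce
# the small latent image, so the number of rendered "pixels" drops by factor^2.

def encode(rgb: np.ndarray, factor: int = 8) -> np.ndarray:
    """Stand-in for E_phi: downsample by average pooling."""
    H, W, C = rgb.shape
    return rgb.reshape(H // factor, factor, W // factor, factor, C).mean(axis=(1, 3))

def decode(latent: np.ndarray, factor: int = 8) -> np.ndarray:
    """Stand-in for D_psi: nearest-neighbour upsampling back to full resolution."""
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)

rgb = np.random.default_rng(1).random((256, 256, 3))
latent = encode(rgb)    # (32, 32, 3): 64x fewer pixels for the renderer to produce
recon = decode(latent)  # (256, 256, 3)
print(latent.shape, recon.shape)
```

Because the renderer's cost scales with the number of output pixels, rendering 32×32 latents instead of 256×256 RGB images is roughly 64× cheaper per view in this toy setting.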

Experiments on ShapeNet (10k objects) and a proprietary large-scale dataset demonstrate that, under a fixed budget of 7 minutes per object and roughly 1.5 MB of memory, Fused-Planes achieve a PSNR of 29.69 dB, about 10% higher than standard Tri-Planes (26.78 dB), while training 7.2× faster and using 3.2× less memory. An ultra-lightweight variant (Fused-Planes-ULW) removes the micro component entirely, reducing per-object memory to 0.0008 MB while still delivering 28.44 dB PSNR, a negligible loss for many applications.

Ablation studies confirm that (1) increasing the number of base planes improves quality but raises memory linearly, (2) omitting the micro planes degrades PSNR by ~2 dB, and (3) training the latent space jointly is essential: using a fixed latent space drops PSNR by ~1.8 dB. The method excels when class-level structural similarity is high (e.g., chairs, cars) but may be less effective for highly deformable categories such as human bodies.

Limitations include potential loss of fine‑grained detail when the macro component dominates and the need to choose an appropriate number of base planes. Future work could explore hierarchical base‑plane dictionaries, meta‑learning for rapid adaptation to new categories, or integration with dynamic scene representations.

In summary, Fused‑Planes introduces a shared‑plane paradigm combined with a jointly learned 3D‑aware latent space, delivering substantial speed‑up and memory savings without sacrificing visual fidelity. This makes large‑scale 3D reconstruction feasible for resource‑constrained environments and opens avenues for scalable downstream tasks such as editing, classification, and generative modeling of 3D objects.

