LD-SLRO: Latent Diffusion Structured Light for 3-D Reconstruction of Highly Reflective Objects
Fringe projection profilometry-based 3-D reconstruction of objects with high reflectivity and low surface roughness remains a significant challenge. When measuring such glossy surfaces, specular reflection and indirect illumination often lead to severe distortion or loss of the projected fringe patterns. To address these issues, we propose a latent diffusion-based structured light method for reflective objects (LD-SLRO). Phase-shifted fringe images captured from highly reflective surfaces are first encoded to extract latent representations that capture surface reflectance characteristics. These latent features are then used as conditional inputs to a latent diffusion model, which probabilistically suppresses reflection-induced artifacts and recovers lost fringe information, yielding high-quality fringe images. The proposed components, including the specular reflection encoder, time-variant channel affine layer, and attention modules, further improve fringe restoration quality. In addition, LD-SLRO provides high flexibility in configuring the input and output fringe sets. Experimental results demonstrate that the proposed method improves both fringe quality and 3-D reconstruction accuracy over state-of-the-art methods, reducing the average root-mean-squared error from 1.8176 mm to 0.9619 mm.
💡 Research Summary
The paper addresses a long‑standing problem in fringe‑projection profilometry (FPP): accurate 3‑D reconstruction of highly reflective, low‑roughness surfaces. On such objects, specular highlights and inter‑reflections corrupt the projected sinusoidal fringe patterns, causing over‑exposure, loss of fringe contrast, and non‑sinusoidal distortions that break the assumptions of traditional phase‑shifting algorithms. Existing solutions either modify the hardware (multiple exposures, optical masks) or apply post‑processing techniques (inverted fringes, color‑channel separation, CNN‑based enhancement). While these methods can alleviate saturation or highlight removal, they do not fully recover the underlying fringe waveform, especially when specular components dominate.
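To make the failure mode concrete, here is a minimal sketch of the standard N‑step phase‑shifting estimate and of how clipping breaks it. The fringe model I_n = A + B·cos(φ + 2πn/N), the intensity values, the specular gain of 2.5, and the 8‑bit clipping range are illustrative assumptions, not values from the paper.

```python
import numpy as np

def wrapped_phase(images):
    """Least-squares N-step estimate for I_n = A + B*cos(phi + 2*pi*n/N)."""
    images = np.asarray(images, dtype=float)
    n_steps = images.shape[0]
    shifts = 2.0 * np.pi * np.arange(n_steps) / n_steps
    # Broadcast the per-image phase shift over the pixel axes.
    shifts = shifts.reshape((-1,) + (1,) * (images.ndim - 1))
    num = -(images * np.sin(shifts)).sum(axis=0)
    den = (images * np.cos(shifts)).sum(axis=0)
    return np.arctan2(num, den)

# Synthetic 1-D "scanline" with a known phase (assumed values).
phi_true = np.linspace(-3.0, 3.0, 200)
n_steps = 4
shifts = 2.0 * np.pi * np.arange(n_steps)[:, None] / n_steps
clean = 120.0 + 100.0 * np.cos(phi_true[None, :] + shifts)

# Crude stand-in for a specular highlight: amplify, then clip to 8 bits.
saturated = np.clip(2.5 * clean, 0.0, 255.0)

def phase_error(images):
    """Maximum absolute phase error, wrapped to (-pi, pi]."""
    diff = wrapped_phase(images) - phi_true
    return np.abs(np.angle(np.exp(1j * diff))).max()

print("clean max error (rad):    ", phase_error(clean))
print("saturated max error (rad):", phase_error(saturated))
```

The clean stack recovers the phase to numerical precision, while the clipped stack does not, because the clipped intensities no longer follow the sinusoidal model that the arctangent formula assumes.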
To overcome these limitations, the authors propose LD‑SLRO (Latent Diffusion Structured Light for Reflective Objects), a conditional generative framework based on latent diffusion models. The core idea is to treat the degraded fringe stack as a noisy observation in a compact latent space and to learn a denoising diffusion process that restores the clean fringe set. The system comprises three main components:
- Diffuse Reflection Autoencoder – a VAE‑style encoder‑decoder that processes a 24‑channel stack of phase‑shifted fringes captured on a diffuse reference surface. The encoder maps the stack to an 8‑dimensional latent distribution (mean + log‑variance), and the decoder reconstructs the full 24‑channel fringe set. Residual blocks, Group Normalization, Swish activations, and a mid‑level attention block are used throughout.
- Specular Reflection Encoder – a separate network that ingests a small number (e.g., 5) of fringe images obtained from the highly reflective target. It extracts features describing the geometry‑dependent specular behavior and encodes them as a multivariate Gaussian latent vector. This vector captures the directionality and intensity of specular highlights, which are crucial for correcting the warped fringe waveform.
- Conditional Denoiser (Latent Diffusion) – a UNet‑like architecture equipped with multi‑head attention. During the diffusion process, the denoiser receives the noisy latent sample x_t, the timestep t, and a conditioning embedding c that concatenates the two latent vectors (diffuse and specular). A novel Time‑Variant Channel‑Affine Fusion layer injects the specular latent information into the denoiser at each timestep via a learned affine transformation that varies with t, allowing the model to gradually shift focus from specular correction (early steps) to fringe structure refinement (later steps).
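The summary does not give the exact form of the Time‑Variant Channel‑Affine Fusion layer. One plausible reading is a FiLM‑style modulation in which a per‑channel scale and shift are predicted from the specular latent together with a timestep embedding. The sketch below follows that reading; the class name, the sinusoidal embedding, the single linear projection, and all dimensions are hypothetical choices for illustration, not the authors' design.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal timestep embedding (a common choice, assumed here); dim must be even."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class TimeVariantChannelAffine:
    """Hypothetical FiLM-style layer: per-channel scale/shift from (cond, t)."""

    def __init__(self, cond_dim, time_dim, channels, rng):
        self.time_dim = time_dim
        # One linear map producing 2*channels values: scales, then shifts.
        self.w = rng.normal(0.0, 0.02, size=(cond_dim + time_dim, 2 * channels))
        self.b = np.zeros(2 * channels)

    def __call__(self, h, cond, t):
        # h: feature map of shape (C, H, W); cond: specular latent of shape (cond_dim,)
        z = np.concatenate([cond, timestep_embedding(t, self.time_dim)])
        scale, shift = np.split(z @ self.w + self.b, 2)
        # The same conditioning yields a different affine map at each timestep.
        return h * (1.0 + scale[:, None, None]) + shift[:, None, None]

rng = np.random.default_rng(0)
layer = TimeVariantChannelAffine(cond_dim=8, time_dim=16, channels=4, rng=rng)
h = rng.normal(size=(4, 6, 6))
cond = rng.normal(size=8)
out_early = layer(h, cond, t=900)  # high-noise step
out_late = layer(h, cond, t=10)    # low-noise step
print(out_early.shape, np.abs(out_early - out_late).max())
```

Because the scale and shift depend on t, the same specular code can influence the features differently at noisy and nearly clean steps, which is the behavior the summary attributes to the layer.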
Training follows the standard DDPM formulation but predicts the clean latent sample directly (L_x0 = ‖x₀ − x̂₀‖²) rather than the noise, which stabilizes learning given the high dimensionality of the latent space. During inference, the reverse (sampling) process is made deterministic (σ_t = 0), so a fixed conditioning input yields a reproducible output. Importantly, the framework decouples input and output fringe configurations: a sparse input set (e.g., 4 phase‑shifted images) can be transformed into a dense output set (24 images) with a different spatial frequency, enabling faster acquisition without sacrificing reconstruction quality.
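The x₀‑prediction objective and a deterministic sampling step can be sketched as follows. The update rule is the standard DDIM step with σ_t = 0, which is one way to realize the deterministic behavior described; whether the authors use exactly this sampler is an assumption. The function names, the noise‑schedule values, the latent shape, and the use of a mean rather than a sum in the loss are illustrative.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, noise):
    """Forward diffusion: noise a clean latent to cumulative signal level alpha_bar_t."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def x0_loss(x0, x0_hat):
    """x0-prediction objective: mean squared error between clean and predicted latents."""
    return float(np.mean((x0 - x0_hat) ** 2))

def ddim_step(x_t, x0_hat, alpha_bar_t, alpha_bar_prev):
    """Deterministic reverse step (sigma_t = 0) given the denoiser's x0 estimate."""
    # Noise implied by the current sample and the predicted clean latent.
    eps_hat = (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)
    # Re-noise the estimate to the previous (less noisy) level; no fresh randomness.
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_hat

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 16, 16))      # stand-in for a clean 8-channel latent
noise = rng.normal(size=x0.shape)
x_t = q_sample(x0, alpha_bar_t=0.3, noise=noise)

# With a perfect denoiser (x0_hat == x0) the step lands exactly on the
# less-noisy sample that shares the same underlying noise.
x_prev = ddim_step(x_t, x0, alpha_bar_t=0.3, alpha_bar_prev=0.6)
print(np.allclose(x_prev, q_sample(x0, 0.6, noise)))
print(x0_loss(x0, x0))
```

Since no new noise is drawn inside `ddim_step`, repeating the sampling loop from the same starting latent and conditioning gives the same result.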
Experimental validation involved six highly reflective objects, including polished metal parts, electroplated surfaces, and near‑mirror specimens. LD‑SLRO was compared against state‑of‑the‑art methods such as multi‑view BRDF reconstruction, hardware‑masking approaches, CNN‑based fringe synthesis, and the Y‑FFC network. Quantitatively, LD‑SLRO reduced the average root‑mean‑squared error (RMSE) of the reconstructed 3‑D shape from 1.8176 mm (best baseline) to 0.9619 mm, a reduction of about 47%. Peak‑signal‑to‑noise ratio (PSNR) and structural similarity (SSIM) of the restored fringe images also increased markedly. Qualitatively, over‑exposed regions were successfully recovered, fringe contrast was restored, and the phase maps exhibited far fewer discontinuities, leading to more accurate phase unwrapping and triangulation.
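The relative improvement follows directly from the two reported RMSE values. The short check below reproduces the arithmetic; the `rmse` helper is a generic definition included for reference, and the variable names are illustrative.

```python
import math

def rmse(errors_mm):
    """Root-mean-squared error of a sequence of per-point errors (in mm)."""
    return math.sqrt(sum(e * e for e in errors_mm) / len(errors_mm))

baseline_rmse_mm = 1.8176  # reported average RMSE before LD-SLRO
ld_slro_rmse_mm = 0.9619   # reported average RMSE with LD-SLRO

relative_reduction = (baseline_rmse_mm - ld_slro_rmse_mm) / baseline_rmse_mm
print(f"relative RMSE reduction: {relative_reduction:.1%}")  # prints 47.1%
```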
The authors acknowledge two main limitations. First, the multi‑step diffusion process incurs higher computational cost than a single‑pass CNN, which may hinder real‑time deployment. Second, the specular encoder is trained on a specific illumination setup; its generalization to drastically different lighting conditions remains to be demonstrated. Future work is suggested on model compression, adaptive timestep scheduling, and extending the conditioning to multispectral or color fringe patterns.
In summary, LD‑SLRO introduces a novel data‑driven paradigm for structured‑light 3‑D scanning of mirror‑like surfaces. By leveraging latent diffusion conditioned on both diffuse and specular latent codes, it simultaneously suppresses over‑exposure, restores lost fringe information, and enables dense fringe synthesis from a minimal set of captured images, achieving substantially higher reconstruction accuracy than existing hardware‑ or algorithm‑based solutions.