TF-UNet: Resolving Complex Speckles for Single-Shot Reconstruction of 512^2-Matrix Images Using a Micron-Sized Optical Fiber


Tapered optical fibers (TFs), with diameters gradually reduced from hundreds of microns to the micron scale, offer key advantages over conventional flat optical fibers (FFs), including uniform illumination, efficient long-range signal collection, and minimal invasiveness for applications in high-sensitivity biosensing, optogenetics, and photodynamic therapy. However, high-fidelity, single-shot imaging through a single TF remains underexplored due to intermodal coupling from the tapering geometry, which distorts output speckle patterns and poses challenges for image reconstruction using existing deep learning methods. Here, we propose a physics-inspired TF-UNet architecture that augments skip connections with hierarchical grouped-MLP fusion to effectively capture non-local, cross-scale dependencies caused by intermodal coupling in TFs. We experimentally validate our method on both FFs and TFs, demonstrating that TF-UNet outperforms standard U-Net variants in structural and perceptual fidelity while maintaining competitive PSNR at linear complexity. Our study offers a promising approach for deep learning-based imaging through micron-sized, ultrafine optical fibers, enabling scanning-free single-shot reconstruction on a 512x512 reconstruction matrix. We further validate the framework on biologically meaningful neuronal and vascular datasets for physically interpretable characterization.

💡 Research Summary

This paper introduces TF‑UNet, a physics‑inspired deep learning architecture designed to reconstruct high‑resolution images from the complex speckle patterns generated by tapered optical fibers (TFs). Unlike conventional flat fibers (FFs) whose modal dynamics are relatively stationary, TFs gradually reduce their core diameter from hundreds of microns to a few microns, causing strong, space‑variant inter‑modal coupling. The resulting speckle field exhibits non‑local, cross‑scale dependencies that challenge standard convolutional networks and naïve attention mechanisms.

TF‑UNet builds upon the classic encoder‑decoder U‑Net backbone with four resolution scales but augments every skip connection with a hierarchical grouped‑MLP fusion block. In this block, the channel dimension is split into multiple groups (0, 1, 2, 4 groups from shallow to deep layers). Each group passes through its own independent MLP that mixes information globally across both spatial and channel axes, thereby capturing the high‑order, non‑local interactions induced by the taper geometry. An orthogonality regularizer is added to the loss to encourage decorrelated group bases, mirroring the physical orthogonality of fiber modes and promoting mode disentanglement.
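The summary does not give the exact layer definitions, so the following is only a minimal NumPy sketch of the general idea: grouped channel mixing with a soft orthogonality penalty. It assumes a skip feature map flattened to shape (channels, positions), equal-size channel groups, one square weight matrix per group, and a penalty of the form ||W Wᵀ − I||² summed over groups. All function names are hypothetical, and the sketch mixes channels only (not spatial positions), which is a simplification of the block described above.

```python
import numpy as np

def grouped_channel_mix(x, weights):
    """Mix channels within each group independently.

    x:       array of shape (channels, positions)
    weights: list of G square matrices, each (channels // G, channels // G)
    """
    groups = np.split(x, len(weights), axis=0)  # equal-size channel groups
    mixed = [w @ g for w, g in zip(weights, groups)]
    return np.concatenate(mixed, axis=0)

def orthogonality_penalty(weights):
    """Sum over groups of ||W W^T - I||_F^2 (zero when every W is orthogonal)."""
    total = 0.0
    for w in weights:
        gram = w @ w.T
        total += float(np.sum((gram - np.eye(w.shape[0])) ** 2))
    return total

# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
channels, positions, num_groups = 8, 16, 4
x = rng.standard_normal((channels, positions))
size = channels // num_groups

# Orthogonal initialisation via QR decomposition (an assumption for the demo).
weights = [np.linalg.qr(rng.standard_normal((size, size)))[0]
           for _ in range(num_groups)]

y = grouped_channel_mix(x, weights)
penalty = orthogonality_penalty(weights)
print(y.shape, penalty)
```

With orthogonal group weights the penalty is numerically zero and the mixing preserves the overall norm of the features. In training, such a penalty would typically be added to the reconstruction loss with a small weighting factor.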

The grouped‑MLP design reduces the naïve O(N²) cost of a fully global MLP to near‑linear complexity by limiting the mixing to channel groups and applying the operation hierarchically. Consequently, TF‑UNet can process 512 × 512 speckle images with modest GPU memory consumption, roughly 30 % lower than comparable global‑MLP or transformer‑based models.
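To make the effect of grouping concrete, here is a small arithmetic sketch. It assumes a simple cost model that is not stated in the summary: one multiply-accumulate per weight per position for a channel-mixing layer. The function names and the example sizes are hypothetical. It shows only that splitting C channels into G groups divides the mixing cost by G; it does not by itself reproduce the near-linear scaling claimed for the full hierarchical design.

```python
def dense_mix_macs(channels, positions):
    """Multiply-accumulates for one dense channel-mixing layer."""
    return channels * channels * positions

def grouped_mix_macs(channels, positions, groups):
    """Multiply-accumulates when channels are split into equal groups."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    return groups * per_group * per_group * positions

# Illustrative sizes: 256 channels on a 64 x 64 feature map, 4 groups.
dense = dense_mix_macs(256, 64 * 64)
grouped = grouped_mix_macs(256, 64 * 64, 4)
print(dense, grouped, dense // grouped)
```

Under this cost model the grouped layer needs G times fewer operations and weights than the dense one, since G · (C/G)² = C²/G.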

For data acquisition, the authors built a DMD‑based illumination system that projects binary masks of natural images (sampled from ImageNet) onto the proximal end of either a TF (NA = 0.39, tip diameter ≈ 5 µm, length 2.5 mm) or an FF with matching NA. The output speckle patterns are captured by a CMOS camera through a 4‑f relay. The dataset comprises 13 440 speckle‑mask pairs (6 720 TF, 6 720 FF), split 8:1:1 for training, validation, and testing.
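As a quick check of the split arithmetic, the sketch below computes the subset sizes implied by an 8:1:1 ratio. The summary does not say whether the split is applied to the pooled data or per fiber type, so both cases are shown. The helper name is hypothetical.

```python
def split_counts(total, ratios=(8, 1, 1)):
    """Return (train, val, test) sizes for an exact ratio split."""
    unit, remainder = divmod(total, sum(ratios))
    if remainder != 0:
        raise ValueError("total is not divisible by the ratio sum")
    return tuple(unit * r for r in ratios)

pooled = split_counts(13440)     # all speckle-mask pairs together
per_fiber = split_counts(6720)   # TF or FF pairs alone
print(pooled, per_fiber)
```

Both totals divide evenly by ten, so the ratio can be met exactly in either case.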

Performance is evaluated using structural similarity (SSIM), multi‑scale SSIM (MS‑SSIM), learned perceptual image patch similarity (LPIPS), peak signal‑to‑noise ratio (PSNR), and Pearson correlation. TF‑UNet outperforms the baseline U‑Net variants (plain U‑Net, Residual U‑Net, Attention U‑Net) on the structural and perceptual metrics: SSIM and MS‑SSIM improve by 4–7 % and LPIPS decreases (lower is better), while PSNR remains comparable or slightly higher. Memory usage drops by about one‑third, and inference latency stays on par with the baselines.
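Two of the listed metrics, PSNR and Pearson correlation, are simple enough to write out directly. The sketch below uses their standard definitions in NumPy and assumes images scaled to the range [0, 1]. The function names are hypothetical and this is not the authors' evaluation code. SSIM, MS‑SSIM, and LPIPS need dedicated implementations and are left out.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given range."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def pearson(reference, estimate):
    """Pearson correlation coefficient between two images."""
    a = reference.ravel() - reference.mean()
    b = estimate.ravel() - estimate.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Toy example: the estimate is the reference plus a constant offset of 0.1.
reference = np.linspace(0.0, 0.9, 16).reshape(4, 4)
estimate = reference + 0.1

psnr_db = psnr(reference, estimate)
corr = pearson(reference, estimate)
print(psnr_db, corr)
```

The toy example shows why the metrics can disagree: a constant brightness offset lowers PSNR (the mean squared error is 0.01, giving about 20 dB) while leaving the Pearson correlation at 1.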

Beyond the ImageNet‑derived masks, the authors test TF‑UNet on biologically relevant datasets (neuronal and vascular images). The model preserves fine structural details and suppresses speckle‑induced noise, demonstrating its potential for real‑world biomedical imaging. Analysis of learned weights reveals that the grouped‑MLP layers implicitly encode the spatially varying coupling coefficient κ(s) and propagation‑constant mismatch Δβ(s) that govern mode exchange in a tapered fiber. The orthogonality regularizer further aligns the learned representations with the physical notion of mode orthogonality, enabling the network to automatically differentiate adiabatic (weak coupling) from non‑adiabatic (strong coupling) regions along the fiber.
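For orientation, the roles of κ(s) and Δβ(s) can be seen in the textbook two‑mode coupled‑mode equations for a lossless waveguide. This is a standard form given here as an assumption. The summary does not state the equations used in the paper, and sign conventions vary between references. A₁ and A₂ denote the slowly varying amplitudes of two modes at position s along the taper.

```latex
\begin{aligned}
\frac{dA_1}{ds} &= -i\,\kappa(s)\,A_2(s)\,
  \exp\!\Big(+i\int_0^{s}\Delta\beta(s')\,ds'\Big),\\
\frac{dA_2}{ds} &= -i\,\kappa^{*}(s)\,A_1(s)\,
  \exp\!\Big(-i\int_0^{s}\Delta\beta(s')\,ds'\Big),
\end{aligned}
\qquad
\Delta\beta(s) = \beta_1(s) - \beta_2(s).
```

In this form the total power |A₁|² + |A₂|² is conserved. Where |κ(s)| is much smaller than |Δβ(s)|, the phase factor oscillates rapidly and little power is exchanged, which corresponds to the adiabatic regime. Where the two are comparable, the modes exchange power strongly.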

In summary, TF‑UNet offers a compact, data‑efficient solution for single‑shot, high‑resolution image reconstruction through micron‑scale tapered fibers. By integrating physics‑aware grouped‑MLP fusion into the U‑Net framework, it captures the non‑local, space‑variant mappings that traditional convolutional or attention mechanisms miss, while maintaining near‑linear computational complexity and modest memory demands. The method paves the way for minimally invasive endoscopic imaging, optogenetic stimulation, and photodynamic therapy where ultra‑thin fibers are required, and it sets a precedent for incorporating physical priors into deep learning models for complex wave‑propagation problems.
