PISE: Physics-Anchored Semantically-Enhanced Deep Computational Ghost Imaging for Robust Low-Bandwidth Machine Perception
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We propose PISE, a physics-informed deep ghost imaging framework for low-bandwidth edge perception. By combining adjoint-operator initialization with semantic guidance, PISE improves Fashion-MNIST classification accuracy by 2.57 percentage points and reduces run-to-run variance roughly ninefold at a 5% sampling rate.


💡 Research Summary

The paper introduces PISE (Physics‑Anchored Semantically‑Enhanced Deep Computational Ghost Imaging), a novel framework designed for low‑bandwidth edge perception, where the goal is to extract machine‑relevant semantic information rather than reconstruct high‑fidelity visual images. In many IoT and robotic applications, transmitting full‑frame data is infeasible due to limited bandwidth, strict energy budgets, or intermittent connectivity. Computational Ghost Imaging (CGI) offers a hardware‑efficient way to acquire compressed measurements using a single‑pixel detector, but at extreme undersampling (e.g., measurements amounting to 5% of the pixel count) the inverse problem becomes severely ill‑posed. Established compressive reconstruction methods (ISTA‑Net+, ADMM‑CSNet) produce structured artifacts, while purely data‑driven deep approaches either oversmooth the output or generate semantically inconsistent textures.
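The CGI acquisition model described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the random Gaussian sensing matrix `A` and the placeholder scene `x` are assumptions standing in for the actual illumination patterns and test images.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 28 * 28                     # Fashion-MNIST image flattened to a vector
m = int(0.05 * n)               # 5% sampling rate -> 39 bucket measurements

# Hypothetical sensing matrix; the paper's actual patterns are not given here.
A = rng.standard_normal((m, n)) / np.sqrt(m)

x = rng.random(n)               # placeholder scene
y = A @ x                       # single-pixel (bucket) detector measurements

print(y.shape)                  # (39,)
```

Each entry of `y` is one scalar reading of the single-pixel detector, so only `m` numbers (here 39 instead of 784) ever need to be transmitted.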

PISE tackles these issues by combining two complementary ideas. First, a physics‑based anchor is created through the adjoint operator of the sensing matrix (Aᵀ). Given measurements y, a coarse proxy image x_init = R(Aᵀy) is obtained by reshaping the back‑projected vector. Although this proxy is noisy and aliased at 5 % sampling, it preserves coarse spatial layout and object localization, providing a meaningful starting point for subsequent optimization. Second, semantic guidance is introduced via a perceptual loss computed on frozen VGG‑16 feature maps (layers relu1_2, relu2_2, relu3_3). The total loss is L = λ_mse‖x − x̂‖₂² + λ_perc∑_j‖ϕ_j(x) − ϕ_j(x̂)‖₁, with λ_mse = 1.0 and λ_perc = 0.05. This balances pixel‑wise fidelity with high‑level feature alignment, preventing the gradient collapse typical of pure MSE training while avoiding the instability that arises when perceptual loss is used without physical constraints.

Training dynamics are monitored using the ℓ₂ norm of the gradient with respect to network parameters, G(t) = ‖∇_θ L‖₂. PISE maintains a stable gradient magnitude throughout training, indicating that the physics anchor regularizes the search space and the perceptual term restores high‑frequency cues needed for downstream classifiers.
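The monitored quantity G(t) is simply the global ℓ₂ norm over all parameter gradients, flattened and concatenated. A minimal sketch (framework-agnostic; in practice one would read the gradients off the network after backpropagation):

```python
import numpy as np

def global_grad_norm(grads):
    """G(t) = ||grad_theta L||_2 computed over all parameter tensors,
    treating the concatenation of their flattened entries as one vector."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

# Toy gradients for two parameter tensors: nine 1s and four 2s.
grads = [np.ones((3, 3)), np.full(4, 2.0)]
print(global_grad_norm(grads))   # sqrt(9 + 16) = 5.0
```

Logging this scalar once per step is enough to detect the gradient collapse (norm → 0) of pure-MSE training or the instability (norm spikes) of an unanchored perceptual loss.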

Experiments are conducted on Fashion‑MNIST (28 × 28 grayscale) and CIFAR‑10 (32 × 32 color) under a primary sampling rate of 5% and additional rates of 2%, 10%, and 20%. Training uses additive white Gaussian noise, while evaluation employs Poisson noise to mimic photon‑limited detectors. Baselines include ISTA‑Net+, ADMM‑CSNet, a plain U‑Net trained with MSE only, and a pix2pix‑style conditional GAN adapted to CGI. Metrics focus on classification accuracy (using a frozen classifier) and PSNR.
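The train/test noise mismatch described above can be simulated as follows. This is a hedged sketch: the noise level `sigma` and the mean photon count `photons` are hypothetical values, as the summary does not report the exact parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.abs(rng.random(39))           # clean, non-negative bucket measurements

# Training-time corruption: additive white Gaussian noise (sigma is assumed).
sigma = 0.1
y_train = y + sigma * rng.standard_normal(y.shape)

# Evaluation-time corruption: Poisson noise for photon-limited detection.
# `photons` scales the signal to an assumed mean photon count per measurement.
photons = 1000
y_eval = rng.poisson(y * photons) / photons

print(y_train.shape, y_eval.shape)   # (39,) (39,)
```

Evaluating under a noise model the network never saw during training is what makes the robustness comparison meaningful.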

Key quantitative findings: on CIFAR‑10 with 5% sampling and Poisson noise, PISE achieves 21.64% classification accuracy with a modest PSNR of 18.77 dB, comparable to the MSE‑U‑Net baseline but with far lower variance. On Fashion‑MNIST, PISE reaches 83.08% ± 0.23 accuracy, improving over the MSE baseline (80.51% ± 2.12) by 2.57 percentage points and reducing run‑to‑run variance roughly ninefold. Visual inspection shows that PISE recovers fine details such as shoe laces that are lost in MSE reconstructions, while avoiding the spurious textures sometimes introduced by GAN‑based methods.

Computationally, PISE contains 31 M parameters and ~407 M FLOPs, similar to the MSE‑U‑Net, yet achieves 2455 FPS on an NVIDIA RTX 6000 GPU—about a six‑fold speedup over the unrolled physics‑based baselines (ISTA‑Net+, ADMM‑CSNet). This demonstrates that FLOP counts alone can be misleading; parallel efficiency and memory‑access patterns matter for real‑time deployment.

Ablation studies confirm the contributions of each component. Using only random initialization plus MSE yields reasonable accuracy but high variance and over‑smoothed outputs. Adding perceptual loss without the physics anchor sharpens features and boosts peak accuracy (83.15 %) but leads to unstable training (high gradient norms, larger variance). The full PISE configuration (adjoint initialization + perceptual loss) matches the peak accuracy while delivering stable training dynamics and low variance.

Robustness tests under increasing noise levels show that PISE’s PSNR degrades only 0.19 dB at σ = 0.2, compared to 0.31 dB for the MSE baseline, confirming that the physics anchor regularizes the solution manifold when measurements become unreliable.

The authors acknowledge limitations: experiments are performed with simulated sensing matrices and synthetic noise; real‑hardware validation is pending. Moreover, VGG‑16 is pretrained on natural RGB images, which may be suboptimal for small grayscale datasets, suggesting future work on domain‑specific feature extractors. Potential extensions include jointly learning the sensing patterns together with reconstruction, and designing lighter semantic networks for ultra‑low‑power edge devices.

In summary, PISE demonstrates that integrating a physics‑driven initialization with feature‑space semantic guidance can overcome the gradient collapse and over‑smoothing problems of MSE‑only reconstruction under extreme undersampling. It delivers higher classification accuracy, dramatically reduced variance, and real‑time inference, making it a promising solution for bandwidth‑constrained edge perception tasks.

