Global Self-Attention with Exact Fourier Propagation for Phase-Only Far-Field Holography


📝 Original Info

  • Title: Global Self-Attention with Exact Fourier Propagation for Phase-Only Far-Field Holography
  • ArXiv ID: 2602.17624
  • Date: 2026-02-19
  • Authors: Author information was not provided in the paper.

📝 Abstract

Phase-only computer-generated holography (CGH) seeks a phase pattern for a spatial light modulator (SLM) whose propagated optical field reproduces a desired intensity distribution. In the far-field (Fraunhofer) regime, optical propagation reduces to a Fourier transform, such that each hologram pixel contributes to the entire reconstructed intensity distribution. When restricted to phase-only modulation, intensity must be shaped through global phase interference effects, making the inverse mapping from target intensity to phase highly non-linear and sensitive to local minima. We present a proof-of-concept physics-in-the-loop approach in which a transformer maps a target intensity image to a phase-only SLM field and is trained end-to-end through exact FFT-based propagation embedded directly within optimization. We further observe that patch tokenization strongly shapes the optimization geometry: coarse tokenization acts as an implicit spectral regularizer that stabilizes training and suppresses checkerboard-like attractors, while finer tokenization increases spatial degrees of freedom but benefits from curriculum or hierarchical refinement. Despite training on limited primitives and restricted digit subsets, the learned generator exhibits out-of-distribution (OOD) generalization to unseen digits and hand-drawn target patterns. These results suggest that transformer architectures, whose self-attention enables global token interactions, are a natural fit for far-field holography and provide a viable foundation for scalable physics-grounded hologram generation.

📄 Full Content

Computer-generated holography (CGH) seeks to determine a phase pattern $\phi(x, y)$ displayed on a spatial light modulator (SLM) such that, after optical propagation, the resulting intensity distribution matches a desired target. In the far-field (Fraunhofer) regime, scalar diffraction theory shows that propagation reduces to a Fourier transform. [1] Let the complex field immediately after the SLM be

$$U_0(x, y) = e^{j\phi(x, y)}, \tag{1}$$

where the amplitude is fixed to unit magnitude and only the phase is modulated. Under far-field propagation, the complex field in the reconstruction plane is given by

$$U_f(u, v) = \mathcal{F}\{U_0(x, y)\}, \tag{2}$$

and the observed intensity is

$$I_f(u, v) = |U_f(u, v)|^2. \tag{3}$$

The inverse problem is therefore: given a desired target intensity $I_T(u, v)$, find a phase distribution $\phi(x, y)$ such that

$$I_f(u, v) \approx I_T(u, v). \tag{4}$$

Two structural properties make this problem challenging. First, the Fourier transform is a global operator, meaning each SLM pixel contributes to the entire far-field distribution. Consequently, local phase adjustments influence global intensity patterns. Second, because the SLM modulates phase only, intensity must be shaped through phase interference alone, without any amplitude control, resulting in a highly non-linear mapping from $\phi$ to $I_f$.

This problem has traditionally been addressed using alternating-projection methods. The Gerchberg-Saxton (GS) algorithm iteratively enforces the amplitude constraints in both planes while leaving the phase as the free degree of freedom (DOF). [2] Hybrid input-output variants and related improvements were later introduced to mitigate stagnation and accelerate convergence. [3] These methods remain foundational, but they require multiple iterations per target and may converge to local minima depending on initialization and constraint choices, yielding sub-optimal reconstructions.
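The alternating-projection idea can be sketched in a few lines of NumPy. This is a minimal illustration of a textbook Gerchberg-Saxton loop, not the paper's implementation: the function name `gerchberg_saxton`, the 32×32 grid, the square target, the iteration count, and the random seed are all assumptions chosen for the example, and the orthonormal FFT is used so that energy is preserved between planes.

```python
import numpy as np

def gerchberg_saxton(target_intensity, n_iter=50, seed=0):
    """Textbook GS loop between the SLM plane and the far field (illustrative)."""
    rng = np.random.default_rng(seed)
    target_amp = np.sqrt(target_intensity)
    # Random initial phase; the result depends on this initialization.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=target_intensity.shape)
    errors = []
    for _ in range(n_iter):
        # Forward propagation of the unit-amplitude field exp(j*phase).
        far = np.fft.fft2(np.exp(1j * phase), norm="ortho")
        # Track the far-field amplitude mismatch before applying the constraint.
        errors.append(float(np.mean((np.abs(far) - target_amp) ** 2)))
        # Far-field constraint: keep the phase, impose the target amplitude.
        far = target_amp * np.exp(1j * np.angle(far))
        # Back-propagate and apply the SLM constraint: keep only the phase.
        near = np.fft.ifft2(far, norm="ortho")
        phase = np.angle(near)
    return phase, errors

# Hypothetical target: a bright square on a dark background.
n = 32
target = np.zeros((n, n))
target[12:20, 12:20] = 1.0
target *= n * n / target.sum()  # match the energy of a unit-amplitude field

phase, errors = gerchberg_saxton(target)
```

Each call repeats the full loop for a single target, which is the per-target iteration cost noted above. The recorded `errors` list makes it easy to see whether the loop is still improving or has stalled for a given starting phase.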

Recent advances in deep learning have motivated learned hologram generation, where a neural network predicts a hologram in a single forward pass. Early work demonstrated neural networks for phase recovery and holographic reconstruction. [4] Subsequent reviews summarize rapid progress in deep-learning-based CGH, including convolutional neural networks trained to generate holograms directly and camera-in-the-loop frameworks that learn propagation corrections jointly with hardware calibration. [5][6][7][8] Most of this work operates in Fresnel or near-field regimes, frequently modeled using the angular spectrum method (ASM). For example, recent attention-enhanced convolutional approaches have been proposed for ASM-based hologram generation. [9] In contrast to Fresnel or ASM formulations, where propagation can retain partially local structure depending on sampling and depth, the Fraunhofer regime reduces to Fourier propagation, eliminating spatial locality in the forward operator. Neural networks have also been applied to complex hologram representations for recognition tasks. [10] However, these approaches process holographic data as input rather than synthesizing phase-only holograms under explicit propagation constraints.

The structural properties of far-field propagation therefore have direct consequences for model design. Generating a valid phase-only solution requires coordinated structure across distant regions of the phase field, reflecting the nonlocal coupling inherent to far-field diffraction. This observation suggests that architectural inductive biases capable of modeling long-range interactions may be particularly well suited to far-field hologram synthesis.
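The nonlocal coupling can be checked numerically with a small sketch. The helper name `far_field_intensity`, the 8×8 grid, the flat starting phase, and the perturbed pixel location are assumptions made for illustration; the orthonormal FFT normalization is also a choice made here so that total energy is conserved.

```python
import numpy as np

def far_field_intensity(phase):
    """Intensity |F{exp(j*phase)}|^2 using a unitary (orthonormal) 2-D FFT."""
    field = np.exp(1j * phase)              # unit-amplitude, phase-only SLM field
    far = np.fft.fft2(field, norm="ortho")  # Fraunhofer propagation as a Fourier transform
    return np.abs(far) ** 2

n = 8
phase = np.zeros((n, n))                    # flat phase: all energy lands in one far-field sample
baseline = far_field_intensity(phase)

perturbed_phase = phase.copy()
perturbed_phase[3, 5] = np.pi               # flip the phase of a single SLM pixel
perturbed = far_field_intensity(perturbed_phase)

# Fraction of far-field samples whose intensity changed.
changed_fraction = np.mean(np.abs(perturbed - baseline) > 1e-9)
# Total energy is fixed by the unit-amplitude constraint.
total_energy = perturbed.sum()
```

In this toy case, changing one pixel alters every far-field sample, while the total energy stays at `n * n`. Phase-only modulation can only redistribute a fixed energy budget, and it does so globally.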

Classical convolutional neural networks, originally popularized for image recognition, rely on finite receptive fields and weight sharing. [11] Although deep networks can approximate global interactions, this architectural prior favors locality, which is highly effective for natural images but may be less directly aligned with nonlocal diffraction operators.
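As a rough arithmetic illustration of the finite receptive field, the sketch below counts how many stacked convolution layers are needed before one output pixel can depend on a full image row. It assumes stride-1 convolutions with no dilation or pooling; the function names and the 64-pixel width are hypothetical.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field width of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_span(width, kernel_size=3):
    """Smallest number of layers whose receptive field is at least `width`."""
    layers = 0
    while receptive_field(layers, kernel_size) < width:
        layers += 1
    return layers

span_3x3 = layers_to_span(64, kernel_size=3)
span_5x5 = layers_to_span(64, kernel_size=5)
```

Under these assumptions, a 64-pixel-wide field needs 32 layers of 3×3 kernels, or 16 layers of 5×5 kernels. Downsampling and dilation shorten this path in practice, but the reach is still built up layer by layer.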

In contrast, Transformers, introduced by Vaswani et al., implement global token interactions through self-attention. [12] In scaled dot-product attention, each token aggregates information from every other token in a single layer, allowing the model to represent global dependencies explicitly.
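A minimal NumPy sketch of single-head scaled dot-product attention shows this all-to-all mixing. For brevity it omits the learned query, key, and value projections and the multi-head structure of a full transformer layer; the token count, embedding width, and random inputs are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d_k)) v for 2-D arrays of shape (tokens, dim)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (tokens, tokens) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 16, 8
x = rng.standard_normal((n_tokens, d_model))

# Self-attention: queries, keys, and values all come from the same tokens.
out, weights = scaled_dot_product_attention(x, x, x)
```

Every entry of `weights` is strictly positive, so each output token is a weighted average over all input tokens after one layer. This is the same all-to-all coupling pattern that a Fourier transform imposes between SLM pixels and far-field samples.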

When applied to images, Vision Transformers (ViT) demonstrated that spatial patches can be treated as tokens and processed through global attention mechanisms. [13] Patch tokenization not only restructures the image into interaction units but also controls the effective spatial degrees of freedom available to the model.
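The effect of patch size on token count can be sketched with a simple reshape. The `patchify` and `unpatchify` helpers, the 64×64 image, and the patch sizes 16 and 4 are illustrative assumptions and do not reflect the paper's actual settings.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into non-overlapping patch tokens of length patch*patch."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tokens = image.reshape(h // patch, patch, w // patch, patch)
    tokens = tokens.transpose(0, 2, 1, 3)       # (rows, cols, patch, patch)
    return tokens.reshape(-1, patch * patch)    # (num_tokens, patch*patch)

def unpatchify(tokens, patch, h, w):
    """Inverse of patchify: reassemble tokens into an (H, W) image."""
    grid = tokens.reshape(h // patch, w // patch, patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(h, w)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
coarse = patchify(image, 16)   # few tokens, each covering a large region
fine = patchify(image, 4)      # many tokens, each covering a small region
```

With these numbers, the coarse setting yields 16 tokens of 256 values each and the fine setting yields 256 tokens of 16 values each. Since attention forms a tokens-by-tokens interaction matrix, patch size directly sets how many units interact.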

In far-field holography, Fourier propagation mixes spatial information across the entire reconstruction plane, so global interaction mechanisms such as self-attention provide a structurally aligned inductive bias. This alignment between architectural bias and physical structure motivates the investigation of transformer-based models for phase-only far-field hologram generation under explicit physics-in-the-loop training. An additional practical advantage of this formulation is inference speed. Once trained, the network genera

Reference

This content is AI-processed based on open access ArXiv data.
