Modeling Inverse Ellipsometry Problem via Flow Matching with a Large-Scale Dataset

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Inverse ellipsometry, i.e., reconstructing optical constants and film thickness from the measured phase difference $Δ$ and amplitude ratio $Ψ$, is a fundamentally ill-posed problem. Traditional solutions rely on slow, expert-driven iterative fitting, while the development of machine learning approaches has been severely limited by the lack of large-scale, physically consistent datasets. To address this gap, we introduce EllipBench, a comprehensive benchmark comprising over 8 million high-precision samples spanning 98 thin-film materials and 5 substrates. Building upon this benchmark, we conduct a systematic evaluation of a broad spectrum of methods, including traditional machine learning models, deep neural networks, and Physics-Informed Neural Networks, and show that existing paradigms consistently struggle to fully resolve the inverse ellipsometry task. To better capture its inherent ambiguity, we further propose a novel Decoupled Conditional Flow Matching (DCFM) framework. Rather than formulating the problem as deterministic point-to-point regression, DCFM explicitly decouples geometric film thickness and incorporates it as a robust physical condition to guide a continuous vector field for modeling the inverse probability distribution of wavelength-dependent optical constants. Combined with a gradient detachment strategy and physics-based constraints, our joint architecture effectively mitigates intrinsic physical ambiguities and delivers a robust and accurate solution for inverse ellipsometry.


💡 Research Summary

The paper tackles the fundamentally ill‑posed inverse ellipsometry problem—recovering a thin‑film’s complex refractive index (n₂ + ik₂) and thickness (d) from measured amplitude ratio (Ψ) and phase difference (Δ). The authors make two major contributions.
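For reference, the two measured quantities are linked to the complex Fresnel reflection coefficients $r_p$ and $r_s$ of the film stack by the standard ellipsometric relation. This is a textbook definition, not something specific to this paper, and sign conventions for Δ differ between references:

$$\rho = \frac{r_p}{r_s} = \tan\Psi \, e^{i\Delta}, \qquad \Psi = \arctan|\rho|, \qquad \Delta = \arg\rho$$

At a single angle of incidence, each wavelength contributes two real measurements (Ψ, Δ) but also two new unknowns (n₂, k₂), while the thickness d is shared across the spectrum. A spectrum of L wavelengths therefore gives 2L equations for 2L + 1 unknowns, so the system is under-determined unless a dispersion model or a prior is added.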

First, they release EllipBench, a publicly available benchmark containing more than eight million high‑precision samples. The dataset spans 98 distinct thin‑film materials (metals, alloys, inorganic compounds, polymers) deposited on five common substrates (amorphous Si, ITO, crystalline Si, SrTiO₃, polyimide). Spectral data are sampled from 380 nm to 1000 nm with a 2.6 nm step, and film thicknesses range logarithmically from 1 nm to 96 nm in 20 steps. Forward simulations use the exact Fresnel multilayer formalism at a 70° incidence angle. To ensure physical consistency, the authors introduce an Energy Conservation Error (EC Error) metric that measures the deviation of reflected plus transmitted energy from unity for both polarizations. Samples with excessive EC Error are filtered out, resulting in a clean, physically realistic corpus. Analysis of EC Error reveals that metallic films, especially in the 16–31 nm thickness range, exhibit the highest physical ambiguity, while polymers and many compounds remain well‑behaved.
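The forward simulation described above can be sketched for the simplest case: one film on a semi-infinite substrate (ambient / film / substrate), using the Airy summation of multiple reflections. This is a minimal sketch under several assumptions: an air ambient, an exp(−iωt) time convention with N = n + ik, and the common ellipsometry sign convention in which r_p = −r_s at normal incidence. The function names (`three_phase`, `psi_delta`, `energy_balance_error`) are hypothetical, and this is not the EllipBench generation code.

The energy check is only a simplified stand-in for the EC Error. It computes |R + T − 1| and is meaningful only when the ambient and substrate are transparent, because absorption in the film legitimately makes R + T < 1. The summary does not give the paper's exact definition.

```python
import numpy as np


def _normal_wavevector(N, n0_sin_theta0):
    """Return N*cos(theta) for a layer of complex index N = n + ik.

    Snell's law keeps N0*sin(theta0) fixed, so N*cos(theta) = sqrt(N^2 - (N0 sin theta0)^2).
    The principal complex sqrt has Im >= 0 for n, k >= 0, i.e. a decaying wave.
    """
    return np.sqrt(np.asarray(N, dtype=complex) ** 2 - n0_sin_theta0 ** 2)


def _interface(Ni, Nj, qi, qj):
    """Fresnel amplitude coefficients for the i -> j interface, with q = N cos(theta).

    p-polarization uses the electric-field convention where r_p = -r_s at normal
    incidence (Delta = 180 deg for a bare dielectric below the Brewster angle).
    """
    rs = (qi - qj) / (qi + qj)
    ts = 2 * qi / (qi + qj)
    denom_p = Nj**2 * qi + Ni**2 * qj
    rp = (Nj**2 * qi - Ni**2 * qj) / denom_p
    tp = 2 * Ni * Nj * qi / denom_p
    return rs, ts, rp, tp


def three_phase(wavelength_nm, N1, N2, d_nm, theta0_deg=70.0, N0=1.0):
    """Ambient N0 / film N1 (thickness d_nm) / semi-infinite substrate N2.

    Returns the stack's complex r_s, r_p, t_s, t_p (plus q0, q2) from the Airy sum
    r = (r01 + r12 e^{2i beta}) / (1 + r01 r12 e^{2i beta}). Inputs broadcast
    elementwise, so arrays over wavelength yield a whole spectrum.
    """
    s = N0 * np.sin(np.deg2rad(theta0_deg))
    q0, q1, q2 = (_normal_wavevector(N, s) for N in (N0, N1, N2))
    beta = 2 * np.pi * d_nm * q1 / wavelength_nm      # complex phase thickness of the film
    e1, e2 = np.exp(1j * beta), np.exp(2j * beta)
    rs01, ts01, rp01, tp01 = _interface(N0, N1, q0, q1)
    rs12, ts12, rp12, tp12 = _interface(N1, N2, q1, q2)
    out = {"q0": q0, "q2": q2}
    for pol, r01, t01, r12, t12 in (("s", rs01, ts01, rs12, ts12),
                                    ("p", rp01, tp01, rp12, tp12)):
        den = 1 + r01 * r12 * e2
        out["r" + pol] = (r01 + r12 * e2) / den
        out["t" + pol] = t01 * t12 * e1 / den
    return out


def psi_delta(rp, rs):
    """Ellipsometric angles in degrees from tan(Psi) e^{i Delta} = r_p / r_s."""
    rho = rp / rs
    return np.degrees(np.arctan(np.abs(rho))), np.degrees(np.angle(rho))


def energy_balance_error(stack):
    """Hypothetical simplified check |R + T - 1| per polarization.

    Valid only for a transparent ambient and substrate (real q0, q2);
    an absorbing film legitimately gives R + T < 1.
    """
    errs = {}
    for pol in ("s", "p"):
        R = np.abs(stack["r" + pol]) ** 2
        T = np.real(stack["q2"]) / np.real(stack["q0"]) * np.abs(stack["t" + pol]) ** 2
        errs[pol] = float(np.abs(R + T - 1.0))
    return errs
```

Every operation is elementwise. Passing NumPy arrays of wavelengths and tabulated film indices therefore returns a full Ψ(λ), Δ(λ) spectrum in one call. Sweeping `d_nm` over a thickness grid then enumerates material, substrate, and thickness combinations of the kind the dataset describes.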

Second, the paper proposes Decoupled Conditional Flow Matching (DCFM), a novel probabilistic framework that overcomes the limitations of deterministic regression approaches. Traditional methods (Levenberg‑Marquardt fitting, shallow ML models, standard deep neural networks, and even physics‑informed neural networks) struggle because the forward mapping from (n₂, k₂, d) to (Ψ, Δ) is many‑to‑one: different combinations of optical constants and thickness can produce almost identical ellipsometric spectra. The inverse mapping from (Ψ, Δ, λ, substrate indices) to (n₂, k₂, d) is therefore one‑to‑many. As a result, a deterministic regressor trained with a squared‑error loss tends to predict an average of the competing solutions. DCFM addresses this by treating thickness d as a condition and modeling the wavelength‑dependent optical constants (n₂(λ), k₂(λ)) as a continuous probability flow. Concretely, a prior distribution p(z) (e.g., standard Gaussian) is transformed into the target conditional distribution p(x|d), where x denotes the optical constants, via a time‑dependent bijection φₜ(z). The flow is trained by minimizing a flow‑matching loss L_FM = E_{t,z}
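The rest of the loss is cut off in this summary and cannot be recovered here. For orientation only, a common conditional flow-matching objective uses the straight-line path $x_t = (1 - t)\,z + t\,x$ between a Gaussian prior sample z and a data sample x. The velocity along this path is x − z, and a conditioned velocity field v(x_t, t, d) is regressed onto it:

$$\mathbb{E}_{t,z,x}\,\lVert v(x_t, t, d) - (x - z)\rVert^2$$

The NumPy sketch below does two things: it estimates that loss, and it draws samples by integrating the field with forward Euler steps. It assumes this linear path, which is one standard choice and not necessarily DCFM's. `velocity_fn` is any callable standing in for the network, and all names are hypothetical.

In an autograd framework, the thickness passed as `cond` is where the abstract's gradient detachment would apply, for example a stop-gradient on a predicted d. NumPy has no gradients, so that step is not shown.

```python
import numpy as np


def cfm_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of a conditional flow-matching loss (linear path).

    x1   : (B, D) data samples, e.g. n(lambda) and k(lambda) stacked per sample
    cond : (B, C) conditioning variables, e.g. film thickness d
    """
    z = rng.standard_normal(x1.shape)            # prior sample z ~ N(0, I)
    t = rng.uniform(size=(x1.shape[0], 1))       # t ~ U[0, 1), one per sample
    xt = (1.0 - t) * z + t * x1                  # point on the straight path
    target = x1 - z                              # d x_t / d t along that path
    pred = velocity_fn(xt, t, cond)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def sample(velocity_fn, cond, dim, steps, rng):
    """Draw samples by integrating dx/dt = v(x, t, cond) from t = 0 to 1 (forward Euler)."""
    x = rng.standard_normal((cond.shape[0], dim))  # start from the prior
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((cond.shape[0], 1), i * dt)    # current time for every sample
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

As a sanity check, suppose every condition maps to a single target c. The exact field is then (c − x)/(1 − t). Plugging it in gives zero loss, and the final Euler step lands exactly on c. A trained network would replace this oracle and, for ambiguous spectra, spread its samples over the competing solutions instead of averaging them.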

