Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks
The optical properties of thin films are greatly influenced by the thickness of each layer. Accurately predicting these thicknesses and their corresponding optical properties is important in the optical inverse design of thin films. However, traditional inverse design methods usually demand extensive numerical simulations and optimization procedures, which are time-consuming. In this paper, we utilize deep learning for the inverse design of the transmission spectra of SiO2/TiO2 multilayer thin films. We implement a tandem neural network (TNN), which solves the one-to-many mapping problem that greatly degrades the performance of deep-learning-based inverse design. Conventionally, the TNN has been implemented as a back-to-back connection of an inverse neural network and a pre-trained forward neural network, both based on multilayer perceptron (MLP) algorithms. In this paper, we propose to use not only MLP but also convolutional neural network (CNN) and long short-term memory (LSTM) algorithms in the configuration of the TNN. We show that an LSTM-LSTM-based TNN yields the highest accuracy but takes the longest training time among the nine TNN configurations. We also find that a CNN-LSTM-based TNN is an optimal solution in terms of accuracy and speed because it integrates the strengths of the CNN and LSTM algorithms.
💡 Research Summary
The paper addresses the inverse design problem of multilayer thin‑film optics, focusing on SiO₂/TiO₂ stacks composed of ten layers (five SiO₂/TiO₂ pairs). Traditional inverse design relies on repeated forward electromagnetic simulations combined with global optimization algorithms (genetic algorithms, particle‑swarm optimization, grid search, etc.). While accurate, these methods are computationally intensive and become prohibitive as the number of design variables (layer thicknesses) grows.
To overcome this bottleneck, the authors propose a deep‑learning‑based tandem neural network (TNN) architecture that decouples the one‑to‑many mapping inherent in inverse problems. The TNN consists of two parts: (1) a forward network that maps a thickness vector t (10 × 1) to a transmission spectrum S(λ) sampled from 400 nm to 800 nm at 1 nm intervals (401 × 1), and (2) an inverse network that takes a target spectrum S* as input and outputs a candidate thickness vector t̂. The candidate vector is fed back into the pre‑trained forward network; the discrepancy between the regenerated spectrum Ŝ and the target S* forms the loss used to train only the inverse network, while the forward network’s weights remain frozen. This scheme forces the inverse network to learn a consistent inverse mapping that respects the forward physics encoded in the forward model, thereby mitigating the ambiguity where many thickness sets can produce similar spectra.
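The tandem scheme can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the layer widths, the MLP-MLP pairing, and the optimizer settings are illustrative assumptions; only the 10-element thickness vector, the 401-point spectrum, and the frozen-forward training loop follow the description above.

```python
import torch
import torch.nn as nn

N_LAYERS = 10    # thickness vector t: one entry per layer
N_POINTS = 401   # spectrum S sampled from 400 nm to 800 nm at 1 nm steps

# Forward model: thicknesses t -> spectrum S (pre-trained, then frozen).
forward_net = nn.Sequential(
    nn.Linear(N_LAYERS, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_POINTS), nn.Sigmoid(),  # transmission values in [0, 1]
)

# Inverse model: target spectrum S* -> candidate thicknesses t_hat.
inverse_net = nn.Sequential(
    nn.Linear(N_POINTS, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_LAYERS),
)

# Freeze the forward network's weights; only the inverse network is trained.
for p in forward_net.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def tandem_step(target_spectrum):
    """One training step: the loss compares the regenerated spectrum
    (target -> inverse -> frozen forward) against the target itself."""
    optimizer.zero_grad()
    t_hat = inverse_net(target_spectrum)    # candidate thicknesses
    s_hat = forward_net(t_hat)              # regenerated spectrum
    loss = loss_fn(s_hat, target_spectrum)  # spectral consistency loss
    loss.backward()                         # gradients flow through the
    optimizer.step()                        # frozen forward net into inverse_net
    return loss.item()

# Example: one step on a random batch of 32 target spectra.
loss = tandem_step(torch.rand(32, N_POINTS))
```

Because the forward network is frozen, the inverse network is never penalized for picking a *different* valid thickness set than the one in the training data, only for producing one whose regenerated spectrum misses the target; this is what defuses the one-to-many ambiguity.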
The novelty of the work lies in exploring three neural‑network families—multilayer perceptron (MLP), convolutional neural network (CNN), and long short‑term memory (LSTM)—for both forward and inverse blocks, yielding nine possible TNN configurations. The authors generate a synthetic dataset of 100 000 examples by randomly sampling each layer thickness uniformly between 20 nm and 200 nm, then computing the transmission spectrum with a rigorous transfer‑matrix method. The dataset is split 80 %/10 %/10 % for training, validation, and testing.
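The dataset-generation step can be sketched with a standard characteristic (transfer) matrix calculation at normal incidence. This is a simplified stand-in for the paper's rigorous method: the fixed, non-dispersive indices (SiO₂ ≈ 1.46, TiO₂ ≈ 2.35), the air/glass boundary media, and the alternating layer order starting with SiO₂ are assumptions, and real material dispersion is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SIO2, N_TIO2 = 1.46, 2.35           # assumed non-dispersive indices
N_AIR, N_SUB = 1.0, 1.52              # incident medium and glass substrate
WAVELENGTHS = np.arange(400, 801, 1)  # 400-800 nm at 1 nm -> 401 points

def transmission(thicknesses, wavelengths=WAVELENGTHS):
    """Normal-incidence transmittance of a lossless multilayer stack
    via the characteristic (transfer) matrix method."""
    indices = [N_SIO2 if i % 2 == 0 else N_TIO2
               for i in range(len(thicknesses))]
    T = np.empty(len(wavelengths), dtype=float)
    for k, lam in enumerate(wavelengths):
        M = np.eye(2, dtype=complex)
        for n, d in zip(indices, thicknesses):
            phi = 2 * np.pi * n * d / lam          # phase thickness
            layer = np.array([[np.cos(phi), 1j * np.sin(phi) / n],
                              [1j * n * np.sin(phi), np.cos(phi)]])
            M = M @ layer
        B, C = M @ np.array([1.0, N_SUB])          # stack + substrate
        T[k] = 4 * N_AIR * N_SUB / abs(N_AIR * B + C) ** 2
    return T

def make_dataset(n_samples):
    """Random thicknesses in [20, 200] nm for 10 layers, plus spectra."""
    t = rng.uniform(20.0, 200.0, size=(n_samples, 10))
    s = np.stack([transmission(row) for row in t])
    return t, s

t, s = make_dataset(5)  # (5, 10) thicknesses, (5, 401) spectra
```

Scaled to 100 000 samples, this yields input/output pairs of exactly the shapes the forward and inverse networks expect.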
Forward‑model performance: All three architectures achieve high fidelity in reproducing spectra. The MLP (four hidden layers, 512 neurons each) reaches a mean absolute error (MAE) of 0.78 % and R² = 0.998. The CNN (three 1‑D convolutional layers with kernel sizes 5, 9, 13, followed by a dense head) improves MAE to 0.62 %. The LSTM (two stacked LSTM layers with 256 units each) yields the best MAE of 0.55 % and captures subtle phase‑related features across the wavelength axis.
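The three forward architectures might be written as follows in PyTorch. Only the hidden-layer sizes and kernel sizes come from the description above; the channel counts, padding, dense heads, and output activations are assumptions added to make the sketch runnable.

```python
import torch
import torch.nn as nn

IN_DIM, OUT_DIM = 10, 401  # 10 thicknesses -> 401 spectral points

# MLP forward model: four hidden layers of 512 neurons each.
mlp = nn.Sequential(
    nn.Linear(IN_DIM, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, OUT_DIM), nn.Sigmoid(),
)

# CNN forward model: three 1-D convolutions (kernel sizes 5, 9, 13)
# over the thickness sequence, then a dense head (head size assumed).
class CNNForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=13, padding=6), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * IN_DIM, OUT_DIM), nn.Sigmoid())

    def forward(self, t):
        return self.head(self.conv(t.unsqueeze(1)))  # (B, 10) -> (B, 1, 10)

# LSTM forward model: two stacked LSTM layers with 256 units each;
# each layer thickness is treated as one step of a length-10 sequence.
class LSTMForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=256,
                            num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(256, OUT_DIM), nn.Sigmoid())

    def forward(self, t):
        out, _ = self.lstm(t.unsqueeze(-1))  # (B, 10, 1) -> (B, 10, 256)
        return self.head(out[:, -1, :])      # last hidden state -> spectrum

batch = torch.rand(8, IN_DIM) * 180 + 20     # thicknesses in [20, 200] nm
shapes = [m(batch).shape for m in (mlp, CNNForward(), LSTMForward())]
```

All three map the same (batch, 10) input to a (batch, 401) spectrum, so they are interchangeable as the forward block of the tandem network.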
Tandem‑network results:
- LSTM‑LSTM (forward LSTM, inverse LSTM) delivers the lowest inverse‑design error: the regenerated spectrum deviates from the target by an average mean‑squared error (MSE) of 1.0 × 10⁻³, and the recovered thicknesses have a root‑mean‑square error (RMSE) of ≈2 nm. Training, however, is the most demanding, requiring ~12 hours on a single NVIDIA RTX 3090 GPU.
- CNN‑LSTM (forward CNN, inverse LSTM) strikes the best balance. Its spectral MSE is 1.3 × 10⁻³ and thickness RMSE ≈3 nm, while total training time drops to ~8 hours, a roughly 30 % speed‑up over LSTM‑LSTM at the cost of a modestly higher spectral error (1.3 vs. 1.0 × 10⁻³). The CNN efficiently extracts local spectral features (peak positions, bandwidths), whereas the LSTM in the inverse block models the sequential dependencies across the wavelength dimension, enabling robust inverse mapping.
- MLP‑MLP (both blocks MLP) reproduces the baseline approach from earlier inverse‑design literature. It yields a spectral MSE of 2.1 × 10⁻³ and thickness RMSE ≈5 nm, but trains fastest (~5 hours).
- Other hybrids (CNN‑CNN, LSTM‑MLP, etc.) fall between these extremes, generally showing moderate accuracy and training times. Notably, placing an LSTM in the forward block while using an MLP inverse block degrades performance, likely because the inverse MLP cannot fully exploit the temporal representations learned by the forward LSTM.
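The two accuracy metrics quoted throughout, the spectral MSE of the regenerated spectrum and the RMSE of the recovered thicknesses, are straightforward to compute. A small NumPy sketch with synthetic stand-in data (the noise levels are chosen purely for illustration):

```python
import numpy as np

def spectral_mse(s_target, s_regenerated):
    """Mean-squared error between target and regenerated spectra."""
    return float(np.mean((s_target - s_regenerated) ** 2))

def thickness_rmse(t_true, t_pred):
    """Root-mean-square error of recovered layer thicknesses (nm)."""
    return float(np.sqrt(np.mean((t_true - t_pred) ** 2)))

# Toy example: shapes match the paper's setup (401-point spectra,
# 10-layer thickness vectors); the perturbations stand in for model error.
rng = np.random.default_rng(1)
s_target = rng.random((100, 401))
s_regen = s_target + rng.normal(0.0, 0.03, s_target.shape)
t_true = rng.uniform(20.0, 200.0, (100, 10))
t_pred = t_true + rng.normal(0.0, 3.0, t_true.shape)

mse = spectral_mse(s_target, s_regen)   # ≈ 9e-4 (noise variance 0.03²)
rmse = thickness_rmse(t_true, t_pred)   # ≈ 3 nm (noise std)
```

Under these metrics, a spectral MSE of 1.0–1.3 × 10⁻³ corresponds to a typical per-wavelength transmission deviation on the order of 3 %, which puts the reported configuration differences in perspective.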
Generalization tests: To assess robustness, the authors evaluate the best‑performing CNN‑LSTM TNN on out‑of‑distribution data: thicknesses drawn from 10 nm–250 nm and spectra sampled from 350 nm–850 nm. The model maintains an MSE below 1.5 × 10⁻³, confirming that the learned mapping generalizes beyond the training domain.
Physical validation: A case study demonstrates practical utility. A target transmission spectrum (designed for a broadband antireflection coating) is fed to the CNN‑LSTM TNN, which outputs a thickness set. The authors then run a finite‑difference time‑domain (FDTD) simulation using the predicted thicknesses; the simulated spectrum aligns with the target within 1 % across the entire band. Compared with a conventional genetic‑algorithm optimization that required ~200 forward simulations and ~6 hours of wall‑clock time, the TNN approach produced a comparable design in under a minute after the initial training phase.
Implications and future work: The study demonstrates that a tandem architecture can effectively resolve the one‑to‑many ambiguity that hampers direct inverse‑design neural networks. By mixing CNN’s spatial feature extraction with LSTM’s sequential modeling, the CNN‑LSTM TNN achieves a sweet spot of high accuracy and reasonable training cost, making it a strong candidate for integration into real‑time design tools. The authors suggest extending the framework to multi‑objective scenarios (simultaneous control of transmission, reflection, and absorption), incorporating anisotropic or gradient‑index layers, and exploring physics‑informed loss functions that embed Maxwell’s equations directly into the training loop.
In summary, the paper provides a comprehensive experimental comparison of nine tandem‑network configurations for thin‑film inverse design, identifies CNN‑LSTM as the optimal trade‑off, and validates the approach both numerically and with full‑wave electromagnetic simulations. This work advances the state of the art in data‑driven photonic device design, offering a pathway toward rapid, accurate, and scalable inverse design of complex multilayer optical structures.