We introduce MxDiffusion, a hybrid physics- and data-driven diffusion framework that enables efficient and highly accurate generation of photonic structures from target optical properties. The improved accuracy comes from a two-stage generation strategy. The first diffusion model is explicitly trained with a Maxwell's-equation-based loss, which embeds physical insight directly into the inverse design process. The second model then maps the resulting physically consistent intermediate representation to the final structural geometry, with significantly higher fidelity than purely data-driven approaches. We validate MxDiffusion on two representative applications: gold nanostructures patterned on a silica substrate, and a highly tunable bandpass filter based on a phase-change material. In both cases, the proposed framework consistently outperforms a conventional data-driven diffusion baseline, particularly for out-of-training-distribution design targets and highly constrained resonance conditions. These results demonstrate the efficacy and superiority of MxDiffusion as a general physics-guided inverse design paradigm.
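The exact form of the Maxwell's-equation-based loss is not stated in this passage. As a minimal sketch, assume the simplest case: a field E(z) at normal incidence in a non-magnetic layered medium, for which Maxwell's equations in the frequency domain reduce to the 1D Helmholtz equation d²E/dz² + k0² ε(z) E = 0. A physics term can then penalize the finite-difference residual of that equation and be added to the usual noise-prediction loss. The functions `helmholtz_residual_loss` and `hybrid_loss`, and the weight `lam`, are hypothetical illustrations, not the MxDiffusion implementation.

```python
import cmath
import math


def helmholtz_residual_loss(field, eps, k0, h):
    """Mean squared residual of d^2E/dz^2 + k0^2 * eps(z) * E = 0.

    Assumption: 1D, normal incidence, non-magnetic medium (mu_r = 1).
    field: complex samples E(z_i) on a uniform grid with spacing h.
    eps:   relative permittivity at the same grid points.
    Only interior points are used (central second difference).
    """
    if len(field) != len(eps) or len(field) < 3:
        raise ValueError("field and eps must match and have >= 3 points")
    total = 0.0
    for i in range(1, len(field) - 1):
        d2 = (field[i + 1] - 2 * field[i] + field[i - 1]) / h**2
        residual = d2 + k0**2 * eps[i] * field[i]
        total += abs(residual) ** 2
    return total / (len(field) - 2)


def mse(a, b):
    """Mean squared error between two equal-length real sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hybrid_loss(noise_pred, noise_true, field, eps, k0, h, lam=0.1):
    """Hypothetical hybrid objective: data term + lam * physics residual."""
    data_term = mse(noise_pred, noise_true)
    physics_term = helmholtz_residual_loss(field, eps, k0, h)
    return data_term + lam * physics_term


# Usage: a vacuum plane wave nearly satisfies the equation,
# while the same field paired with a mismatched permittivity does not.
k0 = 2 * math.pi  # wavelength = 1 (arbitrary units)
h = 1e-3
z = [i * h for i in range(1001)]
plane_wave = [cmath.exp(1j * k0 * zi) for zi in z]
vacuum = [1.0] * len(z)
mismatched = [4.0] * len(z)
print(helmholtz_residual_loss(plane_wave, vacuum, k0, h))      # close to zero
print(helmholtz_residual_loss(plane_wave, mismatched, k0, h))  # large
```

In a real two-stage pipeline the field would be predicted or derived from the intermediate representation, and the residual would typically be evaluated in 2D or 3D; the 1D case only illustrates how a physics residual enters the loss alongside the data term.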
Introduction. Machine learning (ML)-based photonic inverse design refers to the use of machine learning models to determine structural parameters that produce a prescribed optical response. Because traditional trial-and-error approaches to designing photonic structures are time-consuming and inefficient, ML-based techniques have emerged as powerful tools for pursuing complex design objectives that are difficult or even impractical to achieve by trial and error. 1 ML-based inverse design of photonic structures has attracted significant attention over the past several years [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16] and has been adopted across many domains of optics and photonics, including nonlinear optics, 17,18 quantum photonics, [19][20][21] imaging systems, [22][23][24] photonic crystals, 25 and metasurface design, 26 to name a few.
While deep neural networks (DNNs), such as fully connected networks (FCNs) and convolutional neural networks (CNNs), are well suited to forward prediction tasks, 27,28 for example predicting the transmission spectrum of a metasurface, inverse design problems typically require generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), to produce new structural configurations that meet specified optical response targets. A VAE is a deep generative model composed of an encoder and a decoder: the encoder compresses input data into a low-dimensional latent representation, and the decoder reconstructs the original data from it. During training, the model minimizes a reconstruction error together with a Kullback-Leibler (KL) divergence term that regularizes the latent distribution toward a prior, typically a standard Gaussian.
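The VAE training objective can be sketched without any neural network. This minimal sketch assumes the encoder outputs a diagonal Gaussian (`mu`, `logvar`) and uses a squared-error reconstruction term; the function names and the `beta` weight are hypothetical choices for illustration.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m**2 - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Squared-error reconstruction plus beta-weighted KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, logvar)


# Usage: draw a latent sample from an (assumed) encoder output.
rng = random.Random(0)
z = reparameterize([0.5, -0.2], [0.0, -1.0], rng)
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

After training, new patterns are generated by sampling z from N(0, I) and decoding it. Nothing in this objective ties z to a target spectrum, which is the limited conditional control discussed in the following paragraph.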
Once trained, the decoder can generate new samples from random latent vectors. In photonic inverse design, VAEs have been employed as pattern generators in combination with forward prediction networks, or refined further using evolutionary strategies. 17,29,30 However, VAEs have limited conditional generative capability: because they generate samples primarily through random latent-space sampling, they offer little precise control over whether the generated patterns satisfy complex design objectives. GANs have become the most widely used generative framework for photonic inverse design. [31][32][33][34] A GAN consists of a generator that produces synthetic samples and a discriminator that distinguishes between real and generated data. Through adversarial training, the generator learns to produce outputs that increasingly resemble the true data distribution. Despite their success, GANs rely solely on adversarial feedback, without an explicit likelihood formulation or physical constraint, which can lead to unstable training dynamics, mode collapse, and limited diversity in the generated designs. 35 To address these limitations, physics-based techniques such as adjoint optimization and physics-informed loss functions have been integrated with machine learning models to improve inverse design performance. [35][36][37][38][39] More recently, diffusion models have been introduced as a robust alternative for inverse design; they decompose the generation process into a sequence of noise-prediction steps governed by a Markov chain. 40 Diffusion models have demonstrated superior performance over VAEs and GANs across a wide range of applications, [41][42][43][44][45][46][47][48] including photonic inverse design with conditional inputs such as S-parameters and far-field spatial power distributions. Sinusoidal positional encoding has also been used as conditioning data for training diffusion models. 49 More recent efforts have incorporated adjoint optimization into diffusion models to inject physical knowledge during sampling; 50 however, this approach requires real-time electromagnetic simulations to compute adjoint gradients, resulting in substantial computational overhead.
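The noise-prediction formulation mentioned above can be made concrete with a DDPM-style forward process. The Markov chain adds Gaussian noise with variance β_t at each step, and the marginal has the closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, where ᾱ_t is the cumulative product of (1 - β_t); the denoising network is trained to predict ε. This minimal sketch assumes a linear β schedule and a Transformer-style sinusoidal encoding of a scalar (here the step index t); the schedule values and all function names are illustrative assumptions, not taken from the cited works.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0..beta_{T-1} (DDPM-style)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; return (x_t, eps).

    A denoiser would be trained to predict eps from (x_t, t, condition).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def sinusoidal_embedding(t, dim):
    """Sinusoidal encoding of a scalar t: [sin(t*f_i)..., cos(t*f_i)...]."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


# Usage: noise a toy flattened design pattern to the last step (t is 0-indexed).
T = 1000
abar = alpha_bars(linear_beta_schedule(T))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.0, 1.0]
xt, eps = q_sample(x0, T - 1, abar, rng)
emb = sinusoidal_embedding(T - 1, 8)
```

At the final step ᾱ_t is close to zero, so x_t is almost pure noise; sampling runs the learned reverse chain from there. Conditioning information (for example a target spectrum or an encoded scalar) would be fed to the denoiser alongside the step embedding.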
In this work, we introduce MxDiffusion, a physics-aware framework that integrates Maxwell's equations directly into the training of a diffusion model, so that physical insight is built into the learning process itself. This approach significantly extends the capability of conventional data-driven diffusion models, particularly in optimizing the critical boundary regions of photonic structures to meet target design specifications. Importantly, the proposed method does not require real-time simulations during sampling, and both its training and sampling costs remain comparable to those of conventional data-driven diffusion models. We illustrate the efficacy of MxDiffusion through two sets of design examples. First, we consider a periodic gold nanostructure with fixed thickness and arbitrary geometrical shape patterned on a glass substrate, with user-defined transmission spectra as the design targets. In this case, MxDiffusion consistently outperforms the data-driven diffusion baseline, particularly when generating structural patterns that satisfy out-of-training-distribution target spectra. In the second e