MOCLIP, a Metasurface-Based Nanophotonics Foundation Model: High-Speed Unrestricted Design and an Optical Storage Breakthrough

Reading time: 5 minutes

📝 Abstract

Foundation models (FM) are transforming artificial intelligence by enabling generalizable, data-efficient solutions across different domains for a broad range of applications. However, the lack of large and diverse datasets limits the development of FM in nanophotonics. This work presents MOCLIP (Metasurface Optics Contrastive Learning Pretrained), a nanophotonic foundation model that integrates metasurface geometry and spectra within a shared latent space. MOCLIP employs contrastive learning to align geometry and spectral representations using an experimentally acquired dataset with a sample density comparable to ImageNet-1K. The study demonstrates MOCLIP inverse design capabilities for high-throughput zero-shot prediction at a rate of 0.2 million samples per second, enabling the design of a full 4-inch wafer populated with high-density metasurfaces in minutes. It also shows generative latent-space optimization reaching 97 percent accuracy. Finally, we introduce an optical information storage concept that uses MOCLIP to achieve a density of 0.1 Gbit per square millimeter at the resolution limit, exceeding commercial optical media by a factor of six. These results position MOCLIP as a scalable and versatile platform for next-generation photonic design and data-driven applications.

📄 Content

Deep learning has rapidly permeated nanophotonic inverse design, enabling advancements in metasurfaces [1][2][3][4][5][6], photonic crystals [7], plasmonics [8], and photonic integrated circuits [9]. Deep neural networks open up applications in Tb/s hyperspectral video understanding [10], computational imaging [11], and optical sensing [12]. The next frontier in the field involves integrating artificial general intelligence (AGI) concepts to bridge the gap between data-driven science and physical design constraints, and to discover new forms of adaptive photonics solutions [3,13].

The latest advancements in AGI, represented by foundation models (FMs) [14][15][16], currently drive innovation across multiple scientific fields, including machine vision [17], medical diagnosis [18][19][20], climate modeling [21], material chemistry [22,23], and robotics [24]. In nanophotonics, FMs could transform inverse design from bespoke, task-specific optimization and deep-learning pipelines into unified representations that generalize across devices, facilitating automated design and data-driven analysis of increasingly complex nanophotonic architectures [3].

A primary hurdle to implementing FMs in nanophotonics is their substantial training data requirement [25]. Robust generalization performance demands high dataset variability, quantified by the number of degrees of freedom (DOFs) of the nanostructures [26][27][28], or equivalently by the intrinsic dimension (ID) of the data manifold [28,29]. The relationship between the dataset size N required to train a deep learning model efficiently and the DOFs d follows the power law N = k·d^α,

where k is a constant and α ranges approximately between 0.1 and 2 depending on the modality and task [27,[30][31][32]. Figure 1 compares the state-of-the-art nanophotonic and computer vision datasets based on their size and DOFs. Computer vision datasets, such as ImageNet-1K, used to train the latest generation of FM transformers, exhibit DOFs of approximately 40, yielding densities of 30,000 samples per DOF [27]. In contrast, nanophotonic datasets typically have fewer than 10 DOFs and contain about 10 to 1000 samples per DOF [33][34][35][36][37][38][39], resulting in sample densities per DOF that require orders-of-magnitude enhancements to reach FM standards [16].

The difficulty in increasing the scale and diversity of nanophotonic datasets stems from the reliance on computationally intensive electrodynamic simulations [1,25]. Simulations generate approximately one sample per minute, even when conducted on high-performance computing platforms (e.g., FDTD on an HPE Cray EX supercomputer using 192 cores). Apart from their slow data generation rate, simulations often fail to accurately capture real material properties and structural imperfections, and rarely model response deviations from the plane-wave approximation [6,25], limiting the generalization properties of deep learning models trained on simulated data.

This work introduces an FM for nanophotonic inverse design named Metasurface Optics Contrastive Learning Pretrained (MOCLIP). The model uses more than 11,000 samples per DOF, exceeding the density of most state-of-the-art computer vision datasets and approaching the scale of ImageNet-1K. MOCLIP generalizes the Contrastive Language-Image Pretrained (CLIP) model [40], originally developed by OpenAI for computer vision and widely used in applications such as Stable Diffusion [41], to the domain of nanophotonics.
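The dataset-scaling relation N = k·d^α above can be sketched numerically. The constants k and α used here are hypothetical placeholders (α within the 0.1–2 range quoted in the text), not values reported by the paper:

```python
# Illustrative estimate of the dataset size N = k * d**alpha required for a
# model with d degrees of freedom. k and alpha are hypothetical placeholders,
# not values from the paper.

def required_samples(dofs: int, k: float = 1000.0, alpha: float = 1.5) -> int:
    """Return N = k * d**alpha, the dataset size suggested by the power law."""
    return int(k * dofs ** alpha)

def samples_per_dof(n_samples: int, dofs: int) -> float:
    """Dataset density (samples per DOF), the comparison axis used in Fig. 1."""
    return n_samples / dofs

# ImageNet-1K-like regime quoted in the text: ~40 DOFs at ~30,000 samples/DOF
imagenet_density = samples_per_dof(40 * 30_000, 40)
print(imagenet_density)  # 30000.0
```

Under these placeholder constants, a 10-DOF nanophotonic dataset would already call for tens of thousands of samples, which illustrates why minute-per-sample simulations fall short.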
MOCLIP provides a unified embedding space for metasurface geometries and their spectral responses, enabling both spectrum-to-geometry retrieval (inverse design) and geometry-to-spectrum retrieval (spectra prediction) [42] within a single framework.
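A minimal sketch of the CLIP-style symmetric contrastive (InfoNCE) objective that such a shared embedding space is trained with: matched (geometry, spectrum) pairs sit on the diagonal of a similarity matrix and are pulled together. The random projections below stand in for the real encoder networks, and the batch size, embedding width, and temperature are illustrative assumptions, not MOCLIP's actual hyperparameters:

```python
import numpy as np

# Sketch of a CLIP-style symmetric contrastive (InfoNCE) loss aligning
# geometry and spectrum embeddings in one latent space. Random vectors stand
# in for encoder outputs; all shapes and the temperature are assumptions.

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_loss(geom_emb, spec_emb, temperature=0.07):
    """Symmetric InfoNCE: matched pairs are the diagonal of the logits."""
    g = normalize(geom_emb)
    s = normalize(spec_emb)
    logits = g @ s.T / temperature               # cosine similarity matrix
    labels = np.arange(len(g))
    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # stabilize the softmax
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    # average geometry->spectrum and spectrum->geometry directions
    return 0.5 * (xent(logits) + xent(logits.T))

batch = 8
geom = rng.normal(size=(batch, 64))   # stand-in geometry embeddings
loss_aligned = clip_loss(geom, geom)                        # perfectly paired
loss_random = clip_loss(geom, rng.normal(size=(batch, 64)))  # unpaired
print(loss_aligned < loss_random)  # aligned pairs yield the lower loss
```

Because both retrieval directions use the same similarity matrix, one trained model serves spectrum-to-geometry (inverse design) and geometry-to-spectrum (prediction) queries alike.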

MOCLIP is trained entirely on experimental data, without resorting to computer simulations, using a dataset generation strategy that combines dense wafer-scale fabrication with automated hyperspectral (HS) measurements. The process enables manufacturing around 40 samples per minute and their optical characterization at a rate of 380 samples per minute. This work showcases different applications of MOCLIP, including large-scale inverse design via zero-shot prediction at a rate of 2×10⁵ samples per second, generative latent-space optimization for high-fidelity inverse design with over 97% accuracy, and a proof-of-concept optical memory storage technology with an information density up to 0.1 Gbit/mm², exceeding the information density of commercial optical solutions by a factor of six [43].
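The zero-shot inverse design described above amounts to nearest-neighbour retrieval in the shared latent space: embed the target spectrum once, then score a precomputed library of geometry embeddings with a single matrix product. The sketch below uses random stand-in embeddings; the library size, embedding width, and all names are illustrative assumptions:

```python
import numpy as np

# Zero-shot inverse design as latent-space retrieval: one matrix product
# scores an entire precomputed geometry library against a target spectrum.
# Embeddings here are random stand-ins; sizes are illustrative assumptions.

rng = np.random.default_rng(1)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

library = normalize(rng.normal(size=(50_000, 128)))  # geometry embeddings
target = normalize(rng.normal(size=(128,)))          # target-spectrum embedding

def retrieve(target_emb, geom_library, top_k=5):
    """Indices of the top_k geometries whose embeddings best match the
    target spectrum (cosine similarity on unit vectors)."""
    scores = geom_library @ target_emb
    return np.argsort(scores)[::-1][:top_k]

best = retrieve(target, library)
print(best.shape)  # (5,)
```

Since the library embeddings are computed once, per-query cost is a single BLAS matrix-vector product, which is what makes throughputs on the order of 10⁵ samples per second plausible on commodity hardware.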

Figure 2 summarizes MOCLIP’s dataset generation, high-level architecture, and applications. Dataset generation starts by algorithmically generating a library of random free-form metasurface geometries from shape primitives represented by ellipses, rectangles, and rings (Fig. 2a). Subsequent fabrication implements these patterns into silicon-on-glass metasurfaces, producing multiple geometries on a single substrate. A custom-made automated…
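The primitive-based library generation step can be sketched as rasterizing random ellipses, rectangles, and rings into a binary mask. Grid size, primitive counts, and parameter ranges below are illustrative assumptions, not the paper's fabrication parameters:

```python
import numpy as np

# Sketch of the library-generation step: draw random shape primitives
# (ellipse, rectangle, ring) into a binary metasurface mask. All sizes and
# parameter ranges are illustrative assumptions.

rng = np.random.default_rng(2)

def random_mask(size=64, n_shapes=4):
    """Rasterize n_shapes random primitives into a size x size boolean mask."""
    yy, xx = np.mgrid[0:size, 0:size]
    mask = np.zeros((size, size), dtype=bool)
    for _ in range(n_shapes):
        cx, cy = rng.uniform(8, size - 8, 2)   # primitive centre
        a, b = rng.uniform(3, 12, 2)           # semi-axes / half-widths
        kind = rng.choice(["ellipse", "rectangle", "ring"])
        if kind == "ellipse":
            mask |= ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1
        elif kind == "rectangle":
            mask |= (np.abs(xx - cx) <= a) & (np.abs(yy - cy) <= b)
        else:  # ring: elliptical annulus with a concentric hole
            r = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2
            mask |= (r <= 1) & (r >= 0.25)
    return mask

m = random_mask()
print(m.shape, m.dtype)  # (64, 64) bool
```

Each mask then serves as a lithography pattern candidate; sampling primitives rather than free pixels keeps the geometries fabricable while still spanning many degrees of freedom.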

This content is AI-processed based on ArXiv data.
