With the proliferation of ultra-high-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence, the world is generating exponentially increasing amounts of data that need to be processed in a fast, efficient and smart way. These developments are pushing the limits of existing computing paradigms, and highly parallelized, fast and scalable hardware concepts are becoming progressively more important. Here, we demonstrate a computationally specific integrated photonic tensor core (the optical analog of an ASIC) capable of operating at tera-multiply-accumulate per second (TMAC/s) speeds. The photonic core achieves parallelized photonic in-memory computing using phase-change memory arrays and photonic chip-based optical frequency combs (soliton microcombs). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 GHz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates, ultra-low loss silicon nitride waveguides, and high-speed on-chip detectors and modulators, our approach provides a path towards full CMOS wafer-scale integration of the photonic tensor core. While we focus on convolution processing, more generally our results indicate the major potential of integrated photonics for parallel, fast, and efficient computational hardware in demanding AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
Integrated photonics benefits from the modularity and scalable fabrication methods of integrated circuits, whilst having two key advantages over its electronic counterparts: (1) massively parallel data transfer through wavelength division multiplexing (WDM) in conjunction with multichannel sources (i.e. optical frequency combs); and (2) extremely high data modulation speeds limited only by the bandwidth of on-chip optical modulators and photodetectors. These uniquely photonic advantages have led to the ubiquity of optical networks for information transfer and are presently revolutionizing data centre interconnects (i.e. server-to-switch communication). However, these developments have yet to seriously challenge digital electronics in the arena of information processing. Despite the current dominance of integrated electronics for computing, an application-specific optical processor not limited by the energy-bandwidth trade-off of electrical interconnects [8] could bring the advantages of optical networking to the field of computing. This would result in very high computational throughput via low-latency (i.e. information processing and propagation at the speed of light) and parallel operations in a single physical optical processing core using WDM, essentially providing an additional scaling dimension through use of frequency space. While the concept of free-space optics for efficient linear computing (e.g. Fourier transforms, convolutions, matrix multiplication, etc.) has existed for many decades [21] and continues to inspire novel computing architectures [22][23][24][25][26], precise control of the optical phase over the entire system remains the primary factor limiting scalability and commercialization.
Integrated photonics holds promise to solve these challenges. However, integration together with CMOS compatible manufacturing is of paramount importance: on chip, both energy-efficient optical memory units and a compact, broadband multi-channel laser source must be combined within a scalable photonic architecture. Recent work on integrated photonic processors for matrix-vector multiplications (MVMs) and neuromorphic computing [27][28][29] has shown the potential advantages of the photonic approach, but several bottlenecks remain [30]: large footprints (11,000 µm² per interferometer unit [27]), the use of thermo-optic heaters to tune the phase or resonance wavelength of the components (ranging on average from 1 mW to 10 mW per heater for ring resonators and Mach-Zehnder interferometers respectively), and resonant devices such as add-drop resonators that limit the modulation bandwidth. Additionally, while using WDM for processing multiple inputs simultaneously in the same physical hardware has been proposed [31], it has not yet been demonstrated on-chip.
Here, we design and experimentally demonstrate a novel scalable, CMOS compatible, photonic hardware accelerator (which we term “photonic tensor core” in the following) capable of many parallel MVM operations at optical data rates. We use it to process images with convolution filters (here, edge detection and emboss filters) and test it on the MNIST database with a small-scale convolutional neural network (CNN). In a departure from electronic accelerators (see Fig. 1a), our photonic processor implements an on-chip matrix multiplication engine capable of performing parallel multiply-accumulate operations using multiple wavelengths derived from a photonic chip-based optical frequency comb, which are incoherently added within a network of waveguides that exploit phase-change materials. We leverage recent advances in chip-scale microcombs [32][33] operating in the regime of dissipative Kerr soliton (DKS) states, which enable broadband, low-noise, and fully integrated optical frequency combs with line spacing ranging from GHz to THz domains and that are compatible with wafer-scale manufacturing and integration with on-chip lasers [34][35][36]. These devices have already been employed in system-level demonstrations such as massively parallel coherent communications [37], chip-scale frequency synthesizers [38], and massively parallel LiDAR [39]. Thus far, however, DKS systems have remained unexplored for photonic computing. Key to our approach is the encoding of image data onto the individual comb teeth of an on-chip frequency comb, and subsequently encoding fixed convolutional kernels in the non-volatile configuration (i.e. the amorphous or crystalline phase) of integrated phase-change material cells that couple evanescently to a matrix of interconnected photonic waveguides (shown in Fig. 1b).
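To make the mapping from convolution to MVM concrete, the following is a minimal numerical sketch, not a model of the device itself. It assumes that each image patch is flattened into an input vector (one element per comb tooth), that each kernel is flattened into one column of a fixed weight matrix (standing in for the phase-change cell states), and that incoherent addition corresponds to a plain sum of products. The function names `im2col` and `convolve_as_mvm` are hypothetical, and the sketch ignores physical constraints such as the limited range of optical transmission values. As is conventional for CNNs, the kernel is not flipped.

```python
import numpy as np


def im2col(image, k):
    """Flatten every k-by-k patch of a 2-D image into one row of a matrix."""
    h, w = image.shape
    out_h, out_w = h - k + 1, w - k + 1
    patches = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            patches[i * out_w + j] = image[i:i + k, j:j + k].ravel()
    return patches, (out_h, out_w)


def convolve_as_mvm(image, kernels):
    """Apply several k-by-k kernels to an image via one matrix product.

    kernels has shape (n_kernels, k, k). Each kernel becomes one column
    of the weight matrix; each patch is one input vector.
    """
    n_kernels, k, _ = kernels.shape
    patches, (out_h, out_w) = im2col(image, k)
    # Fixed weight matrix of shape (k*k, n_kernels); assumed analog of the
    # stored kernel values.
    weight_matrix = kernels.reshape(n_kernels, k * k).T
    # Each output is a multiply-accumulate over the k*k patch elements.
    outputs = patches @ weight_matrix
    return outputs.T.reshape(n_kernels, out_h, out_w)


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4)
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    edge = -np.ones((3, 3))
    edge[1, 1] = 8.0
    feature_maps = convolve_as_mvm(image, np.stack([identity, edge]))
    print(feature_maps.shape)
```

In this picture, every row of `patches` is an independent input vector multiplied by the same fixed matrix, which is why several kernels can be evaluated with one set of stored weights.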
Our approach minimizes both latency and the movement of data by using non-volatile in-memory photonic MAC operations, and greatly reduces the footprint cost of photonics by multiplexing computations in the same photonic core. Importantly, both the soliton microcombs and the matrix of photonic waveguides can be implemented in silicon nitride [40], an ultra-low loss, CMOS compatible nonlinear integrated phot