Ultrafast photonic reinforcement learning based on laser chaos

📝 Abstract

Reinforcement learning involves decision making in dynamic and uncertain environments, and constitutes one important element of artificial intelligence (AI). In this paper, we experimentally demonstrate that the ultrafast chaotic oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem (MAB), which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. To solve the MAB, a certain degree of randomness is required for exploration purposes. However, pseudo-random numbers generated using conventional electronic circuitry encounter severe limitations in terms of their data rate and the quality of randomness due to their algorithmic foundations. We generate laser chaos signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s, and combine it with a simple decision-making principle called tug-of-war with a variable threshold, to ensure ultrafast, adaptive and accurate decision making at a maximum adaptation speed of 1 GHz. We found that decision-making performance was maximized with an optimal sampling interval, and we highlight the exact coincidence between the negative autocorrelation inherent in laser chaos and decision-making performance. This study paves the way for a new realm of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of photons can provide new value.

📄 Content

Makoto Naruse¹, Yuta Terashima², Atsushi Uchida² & Song-Ju Kim³

¹ Strategic Planning Department, National Institute of Information and Communications Technology, 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan
² Department of Information and Computer Sciences, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama city, Saitama 338-8570, Japan
³ WPI Center for Materials Nanoarchitectonics, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan

ABSTRACT

Reinforcement learning involves decision making in dynamic and uncertain environments, and constitutes one important element of artificial intelligence (AI). In this paper, we experimentally demonstrate that the ultrafast chaotic oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem (MAB), which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. To solve the MAB, a certain degree of randomness is required for exploration purposes. However, pseudo-random numbers generated using conventional electronic circuitry encounter severe limitations in terms of their data rate and the quality of randomness due to their algorithmic foundations. We generate laser chaos signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s, and combine it with a simple decision-making principle called tug-of-war with a variable threshold, to ensure ultrafast, adaptive and accurate decision making at a maximum adaptation speed of 1 GHz. We found that decision-making performance was maximized with an optimal sampling interval, and we highlight the exact coincidence between the negative autocorrelation inherent in laser chaos and decision-making performance. This study paves the way for a new realm of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of photons can provide new value.
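The idea of comparing a sampled signal against a variable threshold can be illustrated with a rough sketch. This is not the authors' tug-of-war implementation: the update rule, the parameter names (`delta`, `omega`, `limit`) and the function `threshold_bandit` are hypothetical, and Gaussian pseudo-random numbers stand in for the sampled laser chaos signal. Fixing `omega` at a constant is a further simplification made here for brevity.

```python
import random

def threshold_bandit(reward_probs, n_plays=1000, delta=1.0, omega=1.0,
                     limit=4.0, seed=0):
    """Two-armed bandit decided by comparing a noisy sample with a threshold.

    Assumptions (not taken from the paper): Gaussian noise replaces the
    laser chaos signal, and the threshold moves by `delta` on a win and
    by `omega` on a loss, clamped to [-limit, limit].
    """
    rng = random.Random(seed)
    threshold = 0.0
    counts = [0, 0]
    for _ in range(n_plays):
        sample = rng.gauss(0.0, 1.0)           # stand-in for one chaos sample
        arm = 0 if sample > threshold else 1   # threshold comparison = decision
        won = rng.random() < reward_probs[arm]
        # Lowering the threshold favours arm 0; raising it favours arm 1.
        favour = -1.0 if arm == 0 else 1.0
        step = delta if won else -omega        # reinforce on win, back off on loss
        threshold += favour * step
        threshold = max(-limit, min(limit, threshold))
        counts[arm] += 1
    return counts, threshold

if __name__ == "__main__":
    counts, threshold = threshold_bandit([0.8, 0.2])
    print("plays per arm:", counts, "final threshold:", threshold)
```

In this sketch the only source of exploration is the fluctuation of the sampled signal around the threshold, which is why the statistical properties of that signal matter for decision-making performance.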

INTRODUCTION

The unique physical attributes of photons have long been utilized for information processing in the optical computing literature [1]. New photonic processing principles have recently emerged to solve complex time-series prediction problems [2-4], as well as issues in spatiotemporal dynamics [5] and combinatorial optimization [6], coinciding with the rapid shift to the age of artificial intelligence (AI). These novel approaches exploit the ultrahigh bandwidth attributes of photons and their enabling device technologies [2,3,6]. This paper experimentally demonstrates the usefulness of ultrafast chaotic oscillatory dynamics in semiconductor lasers for reinforcement learning, which is among the most important elements in machine learning.

Reinforcement learning involves adequate decision making in dynamic and uncertain environments [7]. It forms the foundation of a variety of applications, such as information infrastructures [8], online advertisements [9], robotics [10], transportation [11], and Monte Carlo tree search [12], which is used in computer gaming [13]. A fundamental problem in reinforcement learning is the multi-armed bandit problem (MAB), where the goal is to maximize the total reward from multiple slot machines whose reward probabilities are unknown [7,14,15]. To solve the MAB, one needs to explore in order to find better slot machines. However, too much exploration may result in excessive loss, whereas too quick a decision, or insufficient exploration, may lead to neglect of the best machine. This trade-off is referred to as the exploration-exploitation dilemma [7]. A variety of algorithms for solving the MAB have been proposed in the literature, such as ε-greedy [14], softmax [16], and upper confidence bound [17]. These approaches typically involve probabilistic attributes, especially for exploration purposes.

While the implementation and improvement of such algorithms on conventional digital computers are important for various practical applications, understanding their limitations and investigating novel approaches are also important from the perspective of post-silicon computing. For example, the pseudo-random number generation (RNG) used in conventional algorithmic approaches has severe limitations, such as its data rate, which is bounded by the operating frequencies of digital processors (~GHz range). Moreover, the quality of randomness in RNG has serious limitations [18]. The usefulness of photonic random processes for machine learning has also been discussed in work utilizing multiple optical scattering [19]. We consider that directly utilizing physical irregular processes in nature is an exciting approach with the goal of realizing artificially constructed, physical decision-making machines [20]. Indeed, the intelligence of slime moulds or amoebae, single-celled natural organisms, has been used in solution searches, whereby complex inter-cellular spatiotemporal dynamics play a key role [21]. This stimulated the subsequent disc
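The exploration-exploitation dilemma and the ε-greedy strategy named in the introduction can be made concrete with a minimal sketch. It assumes Bernoulli slot machines with fixed reward probabilities and uses Python's software pseudo-random generator, that is, the conventional approach the paper contrasts with laser chaos. The function name `epsilon_greedy` and its parameters are illustrative choices, not taken from the paper.

```python
import random

def epsilon_greedy(reward_probs, epsilon=0.1, n_plays=2000, seed=0):
    """Play a Bernoulli multi-armed bandit with the epsilon-greedy rule.

    With probability `epsilon` a random machine is explored; otherwise the
    machine with the highest observed win rate so far is exploited.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms   # number of plays per machine
    wins = [0] * n_arms     # number of rewards per machine
    total_reward = 0
    for _ in range(n_plays):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)              # exploration
        else:
            # Unplayed machines are given an estimate of 0.0.
            estimates = [wins[i] / counts[i] if counts[i] else 0.0
                         for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploitation
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        total_reward += reward
    return counts, total_reward

if __name__ == "__main__":
    counts, total = epsilon_greedy([0.2, 0.8])
    print("plays per machine:", counts, "total reward:", total)
```

Setting `epsilon` close to 0 risks locking onto an inferior machine, while setting it close to 1 wastes plays on machines already known to be worse, which is the trade-off described above.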

This content is AI-processed based on ArXiv data.
