Scalable photonic reinforcement learning by time-division multiplexing of laser chaos

Reading time: 5 minutes

📝 Original Info

  • Title: Scalable photonic reinforcement learning by time-division multiplexing of laser chaos
  • ArXiv ID: 1803.09425
  • Date: 2023-11-07
  • Authors: Yoshitaka Yamamoto, Takahiro Hasegawa, Masahiro Kato, Hiroshi Saito, Kenji Nakajima, Yusuke Tanaka, Shinya Watanabe, Daisuke Fujita, Koichi Takahashi, Junichi Mori

📝 Abstract

Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to solve the two-armed bandit problem efficiently, which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. However, only two selections were employed in that research; thus, the scalability of the laser-chaos-based reinforcement learning should be clarified. In this study, we demonstrated a scalable, pipelined principle of resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillated ultrafast time-series. The experimental demonstrations in which bandit problems with up to 64 arms were successfully solved are presented in this report. Detailed analyses are also provided that include performance comparisons among laser chaos signals generated in different physical conditions, which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning by taking advantage of the ultrahigh bandwidths of light waves and practical enabling technologies.

💡 Deep Analysis

Figure 1

📄 Full Content

Recently, the use of photonics for information processing and artificial intelligence has been intensively studied by exploiting the unique physical attributes of photons. The latest examples include a coherent Ising machine for combinatorial optimization 1 , photonic reservoir computing to perform complex time-series predictions 2,3 , and ultrafast random number generation using chaotic dynamics in lasers 4,5 in which the ultrahigh bandwidth attributes of light bring novel advantages. Reinforcement learning, also called decision making, is another important branch of research, which involves making decisions promptly and accurately in uncertain, dynamically changing environments 6 and constitutes the foundation of a variety of applications ranging from communication infrastructures 7,8 and robotics 9 to computer gaming 10 .

The multi-armed bandit problem (MAB) is a fundamental reinforcement learning problem in which the goal is to maximize the total reward obtained from multiple slot machines whose reward probabilities are unknown and may change dynamically 6 . Solving the MAB requires exploring for higher-reward slot machines. However, too much exploration may result in excessive loss, whereas deciding too quickly on the basis of insufficient exploration may lead to missing the best machine; this trade-off is referred to as the exploration-exploitation dilemma 11 .
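
The trade-off can be made concrete with a minimal software baseline. The sketch below uses the classic epsilon-greedy policy, which is not the paper's photonic method: a fraction epsilon of plays explores a random arm, and the rest exploit the arm with the best reward estimate so far. The function name and parameter values are illustrative.

```python
import random

def run_epsilon_greedy(probs, epsilon=0.1, steps=10_000, rng=None):
    """Play a multi-armed bandit with the epsilon-greedy policy.

    probs: the true reward probabilities of each arm (unknown to the player).
    Returns the total reward collected and the number of pulls per arm.
    """
    rng = rng or random.Random(0)
    n = len(probs)
    counts = [0] * n      # pulls per arm
    values = [0.0] * n    # running mean reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                          # explore
        else:
            arm = max(range(n), key=lambda i: values[i])    # exploit
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

total, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
# The best arm (index 2) ends up receiving most of the pulls.
```

Too small an epsilon risks locking onto an inferior arm; too large an epsilon wastes plays on known-bad arms, which is exactly the dilemma described above.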

In our previous study, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers 2-5 can be used to solve the MAB efficiently 12 . Using a chaotic time series generated by a semiconductor laser with delayed feedback, sampled at a maximum rate of 100 GSample/s and digitized with a variable threshold, we demonstrated ultrafast, adaptive, and accurate decision making. Such ultrafast decision making is unachievable using conventional algorithms on digital computers 11,13,14 , which rely on pseudorandom numbers. It was also demonstrated that the decision-making performance is maximized at an optimal sampling interval that exactly coincides with the negative autocorrelation inherent in the chaotic time series 12 . Moreover, even when assuming that pseudorandom numbers and coloured noise were available in such a high-speed domain, the laser chaos method outperformed these alternatives; that is, chaotic dynamics yields superior decision-making abilities 12 .
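
A toy software caricature of this threshold scheme may help. In the sketch below, a logistic map stands in for the chaotic laser intensity and a deliberately simplified update rule adjusts the threshold from the reward history; the actual apparatus, sampling rates, and update rule of ref. 12 differ in detail.

```python
import random

def chaotic_two_armed_bandit(probs, steps=5000, rng=None):
    """Toy model of threshold-based decision making on a fluctuating signal.

    Each sampled signal value is compared with an adjustable threshold:
    above it selects arm 1, below it arm 0. The threshold update rule here
    is an illustration, not the experimental procedure.
    """
    rng = rng or random.Random(1)
    x = 0.4           # logistic-map state (surrogate for laser chaos)
    threshold = 0.0   # adjustable decision threshold in [-1, 1]
    delta = 0.02      # threshold step size
    pulls = [0, 0]
    for _ in range(steps):
        x = 3.99 * x * (1.0 - x)   # logistic map in its chaotic regime
        signal = 2.0 * x - 1.0     # rescale the sample to [-1, 1]
        arm = 1 if signal > threshold else 0
        reward = rng.random() < probs[arm]
        pulls[arm] += 1
        # Reinforce the arm that just paid off by shifting the threshold
        # so it is selected more often; do the opposite on a loss.
        favour_one = reward if arm == 1 else not reward
        threshold += -delta if favour_one else delta
        threshold = max(-0.95, min(0.95, threshold))  # keep some exploration
    return pulls

pulls = chaotic_two_armed_bandit([0.3, 0.7])
# The higher-reward arm 1 ends up being played far more often.
```

The clipping of the threshold plays the role of residual exploration: even a strongly biased threshold still occasionally selects the other arm, so a change in the reward probabilities can be detected.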

However, only two options, or slot machines, were employed in the MAB investigated therein; that is, the two-armed bandit problem was studied. A scalable principle and technologies toward an N-armed bandit with N being a natural number are strongly demanded for practical applications. In addition, detailed insights into the relations between the resulting decision-making abilities and properties of chaotic signal trains should be pursued to achieve deeper physical understanding as well as performance optimization at the physical or photonic device level.

In this study, we experimentally demonstrated a scalable photonic reinforcement learning principle based on ultrafast chaotic oscillatory dynamics in semiconductor lasers. Taking advantage of the high-bandwidth attributes of chaotic lasers, we incorporated the concept of time-division multiplexing into the decision-making strategy; specifically, consecutively sampled chaotic signals are used in the proposed method to determine the identity of the slot machine in a binary digit form.
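
The bit-by-bit selection can be sketched compactly. The function below is an illustrative reading of the principle, not the authors' code: each consecutive sample contributes one binary digit of the slot machine's identity, and the threshold for each comparison is looked up by the bit prefix decided so far, so every branch of the hierarchy can carry its own adjustable threshold. The name `select_arm_tdm` and the prefix-keyed dictionary are assumptions of this sketch.

```python
def select_arm_tdm(samples, thresholds):
    """Map M consecutive signal samples to one of 2**M arms in binary form.

    samples: M consecutive values from the (chaotic) time series.
    thresholds: per-prefix decision thresholds, keyed by the bits decided
    so far (defaulting to 0.0 when a prefix has no entry yet).
    """
    bits = ""
    for s in samples:
        # This time slot's comparison yields one bit of the arm identity.
        bits += "1" if s > thresholds.get(bits, 0.0) else "0"
    return int(bits, 2)

# Three consecutive samples select one of 2**3 = 8 slot machines:
arm = select_arm_tdm([0.7, -0.2, 0.4], {"": 0.0, "1": 0.0, "10": 0.0})  # -> 5 (binary 101)
```

Scaling to more arms costs only more consecutive samples, one per bit: six samples suffice for 2**6 = 64 arms, which is the largest problem size demonstrated in the paper.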

In the recent literature on photonic decision making, near-field-mediated optical excitation transfer 15,16 and single photon 17,18 methods have been discussed; the former technique involves pursuing the diffraction-limit-free spatial resolution 19 , whereas the latter reveals the benefits of the wave-particle duality of single light quanta 20 . A promising approach for achieving scalability by means of near-field-coupled excitation transfer or single photons is spatial parallelism; indeed, a hierarchical principle has been successfully demonstrated experimentally in solving the four-armed bandit problem using single photons 18 . In contrast, the high-bandwidth attributes of chaotic lasers accommodate time-division multiplexing and have been successfully used in optical communications 21 .

In this study, we transformed the hierarchical decision-making strategy 18 into the time domain, transcending the barrier toward scalability. We also successfully resolved the bandit problem with up to 64 arms.

Meanwhile, four kinds of chaotic signals experimentally generated under different conditions, as well as quasiperiodic sequences, were subjected to performance comparisons and characterizations, including diffusivity analysis. In addition, computer-generated pseudorandom signals and coloured noise were used to clarify the similarities and differences with respect to chaotically fluctuating random signals. A detailed dependency analysis with regard to the precision of parameter adjustments and sampling was also conducted.
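
Of the characterizations mentioned, diffusivity admits a compact numerical definition: treat the running sum of the time series as a one-dimensional walk and measure how its mean-squared displacement (MSD) grows with the lag, with linear growth corresponding to normal diffusion. The function below is a from-scratch illustrative sketch, not the paper's analysis code.

```python
import random

def mean_squared_displacement(series, max_lag=50):
    """MSD of the cumulative sum of a signal, for lags 1..max_lag.

    The slope of MSD versus lag characterizes the diffusivity of the
    walk induced by the time series.
    """
    walk, total = [], 0.0
    for v in series:          # cumulative sum turns the signal into a walk
        total += v
        walk.append(total)
    msd = []
    for lag in range(1, max_lag + 1):
        disp = [(walk[i + lag] - walk[i]) ** 2 for i in range(len(walk) - lag)]
        msd.append(sum(disp) / len(disp))
    return msd

# Uncorrelated +/-1 steps diffuse normally: the MSD grows roughly linearly.
rng = random.Random(0)
msd = mean_squared_displacement([rng.choice([-1.0, 1.0]) for _ in range(20000)])
```

A negatively autocorrelated signal, like the laser chaos sampled at the optimal interval, yields a sub-linear (suppressed) MSD at short lags, which is one way such correlation properties can be linked to decision-making performance.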


Reference

This content is AI-processed based on open access ArXiv data.
