Scalable photonic reinforcement learning by time-division multiplexing of laser chaos


Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to solve the two-armed bandit problem efficiently, which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. However, only two selections were employed in that research; thus, the scalability of laser-chaos-based reinforcement learning remains to be clarified. In this study, we demonstrate a scalable, pipelined principle for resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillating ultrafast time series. We present experimental demonstrations in which bandit problems with up to 64 arms were successfully solved. Detailed analyses are also provided, including performance comparisons among laser chaos signals generated under different physical conditions, which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning that takes advantage of the ultrahigh bandwidth of light waves and practical enabling technologies.


💡 Research Summary

This paper presents a scalable photonic reinforcement‑learning architecture that leverages the ultrafast chaotic dynamics of lasers to solve multi‑armed bandit (MAB) problems far beyond the two‑arm case demonstrated in earlier work. The authors introduce a time‑division multiplexing (TDM) scheme that slices a continuous chaotic laser time series into discrete time slots, each slot being mapped to a distinct bandit arm. By doing so, a single laser source can simultaneously drive dozens of logical decision‑making units without physical duplication, effectively creating a pipelined, high‑throughput reinforcement‑learning engine.

Laser chaos is generated by feeding back a portion of the optical output into either a semiconductor laser or a fiber laser. Adjusting feedback strength, injection current, and temperature yields chaotic waveforms with bandwidths in the tens of gigahertz and characteristic diffusion properties quantified by mean‑square displacement (MSD) and autocorrelation time. The authors experimentally produce four chaotic signals under different physical conditions (high/low feedback, high/low temperature) to explore how the intrinsic diffusivity of the time series influences learning performance.
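Neither the paper's raw data nor its analysis code accompanies this summary, but as a rough illustration of how the two diffusivity-related quantities mentioned above might be estimated from a sampled intensity time series, here is a minimal Python sketch; the random-signal input is only a placeholder for a real chaotic waveform, and the estimators are standard textbook forms rather than the authors' exact procedure.

```python
import numpy as np

def mean_square_displacement(x, max_lag):
    """MSD of the integrated signal: <(S[t+k] - S[t])^2> averaged over t."""
    s = np.cumsum(x - x.mean())  # treat the zero-mean signal as increments of a walk
    return np.array([np.mean((s[k:] - s[:-k]) ** 2) for k in range(1, max_lag + 1)])

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) = <x'[t] x'[t+k]> / Var[x]."""
    y = x - x.mean()
    var = y.var()
    return np.array([np.mean(y[:-k] * y[k:]) / var for k in range(1, max_lag + 1)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)  # placeholder for a sampled chaotic waveform
    print(mean_square_displacement(x, 5))
    print(autocorrelation(x, 5))
```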

In the TDM framework, the chaotic waveform is sampled by a high‑speed analog‑to‑digital converter (ADC) and processed in real time on an FPGA. A fixed slot duration Δt (as short as 0.5 ns) defines the number of arms that can be addressed; for a 64‑arm problem, 64 slots are interleaved within a single laser period. Within each slot, the sampled voltage value is interpreted as a stochastic preference for the corresponding arm. The preference is updated using a “chaos‑weight” rule: if the current sample exceeds the previous one, the arm’s selection probability is increased, otherwise it is decreased. This rule replaces conventional soft‑max or Upper‑Confidence‑Bound (UCB) updates, allowing the physical randomness of the laser to directly drive the exploration‑exploitation trade‑off.
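As a concrete, deliberately simplified illustration of this slot-based update, the following Python sketch implements the comparison rule exactly as stated above. The frame layout, learning rate, clipping, and renormalization are hypothetical choices, and the reward coupling that the full method would require is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdm_select_and_update(frame, probs, lr=0.05):
    """Process one frame of time-division-multiplexed chaos samples.

    frame : array of shape (n_arms, 2) holding the previous and current
            ADC sample seen in each arm's time slot (assumed layout).
    probs : current per-arm selection probabilities.
    """
    for arm, (prev, curr) in enumerate(frame):
        # Comparison rule described above: an upward fluctuation of the
        # chaotic waveform within an arm's slot raises that arm's
        # selection probability; a downward fluctuation lowers it.
        probs[arm] += lr if curr > prev else -lr
    probs = np.clip(probs, 1e-6, None)  # keep probabilities positive
    probs /= probs.sum()                # renormalize (illustrative choice)
    chosen = rng.choice(len(probs), p=probs)
    return chosen, probs

if __name__ == "__main__":
    n_arms = 64
    probs = np.full(n_arms, 1.0 / n_arms)
    frame = rng.normal(size=(n_arms, 2))  # placeholder for sampled laser chaos
    arm, probs = tdm_select_and_update(frame, probs)
    print("selected arm:", arm)
```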

The experimental protocol evaluates the system on MAB instances with 2, 4, 8, 16, 32, and 64 arms. For each configuration, the reward probabilities of the arms are drawn uniformly at random.
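Continuing from the sketch above, a hypothetical benchmark loop over these arm counts might look like the following. The reward interval [0, 1], trial length, and correct-selection metric are illustrative assumptions; because the earlier sketch omits reward feedback, this harness exercises only the selection mechanics rather than reproducing the paper's learning results.

```python
def run_trial(n_arms, steps=10_000):
    """Bernoulli bandit with reward probabilities drawn uniformly
    (interval [0, 1] assumed). Reuses tdm_select_and_update from the
    sketch above and reports the fraction of plays on the best arm."""
    reward_p = rng.uniform(size=n_arms)
    best = int(reward_p.argmax())
    probs = np.full(n_arms, 1.0 / n_arms)
    correct = 0
    for _ in range(steps):
        frame = rng.normal(size=(n_arms, 2))  # stand-in for chaos samples
        arm, probs = tdm_select_and_update(frame, probs)
        correct += (arm == best)
    return correct / steps

for n_arms in (2, 4, 8, 16, 32, 64):
    print(f"{n_arms:2d} arms: correct-selection rate {run_trial(n_arms):.3f}")
```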

