Entangled and correlated photon mixed strategy for social decision making

Collective decision making is important for maximizing total benefits while preserving equality among individuals in the competitive multi-armed bandit (CMAB) problem, wherein multiple players try to gain higher rewards from multiple slot machines. The CMAB problem represents an essential aspect of applications such as resource management in social infrastructure. In a previous study, we theoretically and experimentally demonstrated that entangled photons can physically resolve the difficulty of the CMAB problem. This decision-making strategy completely avoids decision conflicts while ensuring equality. However, decision conflicts can sometimes be beneficial if they yield greater rewards than non-conflicting decisions, indicating that greedy actions may provide positive effects depending on the given environment. In this study, we demonstrate a mixed strategy of entangled- and correlated-photon-based decision-making so that total rewards can be enhanced when compared to the entangled-photon-only decision strategy. We show that an optimal mixture of entangled- and correlated-photon-based strategies exists depending on the dynamics of the reward environment as well as the difficulty of the given problem. This study paves the way for utilizing both quantum and classical aspects of photons in a mixed manner for decision making and provides yet another example of the supremacy of mixed strategies known in game theory, especially in evolutionary game theory.


💡 Research Summary

The paper addresses the competitive multi‑armed bandit (CMAB) problem, a paradigm in which several agents simultaneously select from multiple stochastic resources (slot machines) and aim to maximize their cumulative rewards while preserving fairness. In earlier work the authors showed that pairs of entangled photons can implement a “quantum‑cooperative” decision‑making protocol: the polarization measurement outcomes of the two photons are perfectly anti‑correlated, so the two agents never choose the same arm (no conflict) while still receiving identical expected payoffs. This strategy excels in static, symmetric environments but may be sub‑optimal when the reward landscape changes over time or when occasional conflicts can lead to the discovery of higher‑payoff arms.
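
As a minimal classical emulation of this conflict-free property (a sketch, not the authors' optical implementation), a single shared random draw can play the role of the joint polarization measurement. The helper below and its parameter `p_first` are hypothetical and assume two players and two arms.

```python
import random

def entangled_choice(p_first: float = 0.5) -> tuple[int, int]:
    """One 'entangled-photon' trial for two players and two arms.

    A single shared random draw models the joint polarization
    measurement: the outcomes are perfectly anti-correlated, so the
    players always select different arms and conflict is impossible
    by construction. `p_first` stands in for the tunable amplitude
    of the entangled state that biases which player gets arm 0.
    """
    arm_player1 = 0 if random.random() < p_first else 1
    return arm_player1, 1 - arm_player1  # partner gets the other arm
```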

To overcome this limitation, the authors propose a mixed strategy that combines the entangled‑photon protocol with a classical correlated‑photon protocol. In the latter, the photons are not entangled but their polarizations are statistically correlated; each agent makes a probabilistic choice based on its local measurement, allowing for both coordinated and independent actions. By varying the proportion of entangled‑photon versus correlated‑photon trials, the system can interpolate between pure cooperation (zero conflict) and pure exploration (higher conflict but potentially higher reward).
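
To make the interpolation concrete, the sketch below (same toy assumptions, reusing `entangled_choice` from the previous sketch) draws each round's protocol at random: with probability `eta` the players take a conflict-free entangled trial, and otherwise each decides independently from its local measurement, so collisions become possible. The per-player preference probabilities `p1` and `p2` are illustrative, not quantities from the paper.

```python
import random

def correlated_choice(p1: float, p2: float) -> tuple[int, int]:
    """One 'correlated-photon' trial: each player decides from its own
    local measurement, so both may pick the same arm (a conflict)."""
    return (0 if random.random() < p1 else 1,
            0 if random.random() < p2 else 1)

def mixed_choice(eta: float, p1: float, p2: float) -> tuple[int, int]:
    """Mixed strategy: conflict-free entangled trial with probability
    `eta`, independent (possibly conflicting) trial otherwise."""
    if random.random() < eta:
        return entangled_choice()  # defined in the sketch above
    return correlated_choice(p1, p2)
```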

The experimental design consists of two reward environments. (1) A static, symmetric environment where all arms have identical, time‑invariant reward probabilities. (2) A dynamic, asymmetric environment where each arm’s reward probability drifts and occasional “high‑reward bursts” appear on specific arms. For each environment the authors run five mixing ratios (0 % entangled / 100 % correlated, 25 % / 75 %, 50 % / 50 %, 75 % / 25 %, 100 % / 0 %) over 10⁴ decision rounds, recording total accumulated reward, inter‑agent reward disparity (a fairness metric), and the frequency of arm‑selection conflicts.
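
The dynamic, asymmetric environment can be caricatured as a bounded random walk on each arm's hit probability plus rare reward bursts; every constant below is an illustrative placeholder rather than a value from the experiments.

```python
import random

class DynamicEnvironment:
    """Toy stand-in for the dynamic, asymmetric reward environment:
    each arm's hit probability drifts as a bounded random walk, and a
    rare 'burst' temporarily boosts a single arm."""

    def __init__(self, n_arms: int = 2, drift: float = 0.01,
                 burst_prob: float = 0.001, burst_gain: float = 0.3):
        self.p = [random.uniform(0.2, 0.8) for _ in range(n_arms)]
        self.drift = drift
        self.burst_prob = burst_prob
        self.burst_gain = burst_gain

    def step(self) -> None:
        """Advance one decision round: drift every arm, maybe burst one."""
        for i in range(len(self.p)):
            self.p[i] += random.gauss(0.0, self.drift)
            if random.random() < self.burst_prob:
                self.p[i] += self.burst_gain
            self.p[i] = min(max(self.p[i], 0.05), 0.95)  # keep in (0, 1)

    def pull(self, arm: int) -> int:
        """Return 1 (reward) with the arm's current hit probability."""
        return 1 if random.random() < self.p[arm] else 0
```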

Results show that in the static symmetric case the pure entangled‑photon strategy yields the highest fairness and the maximal total reward, confirming previous findings. In the dynamic asymmetric case, however, the best performance is achieved with an intermediate mixture: roughly 30–40 % entangled photons and 60–70 % correlated photons. The correlated‑photon component supplies stochastic exploration that quickly detects shifting high‑reward arms, while the entangled component preserves a baseline of coordinated exploitation once a promising arm is identified. Notably, a modest level of conflict does not dramatically reduce total reward; instead, it can be beneficial when it drives agents toward different arms that happen to be more lucrative at that moment.

From these observations the authors construct a theoretical framework for adaptive mixing. They propose a Bayesian estimator that continuously updates the perceived volatility (σ²) of the reward environment. The mixing ratio is then adjusted in real time: higher estimated volatility leads to a larger weight on the correlated‑photon (exploratory) protocol, whereas low volatility favors the entangled‑photon (exploitative) protocol. Simulations confirm that this adaptive rule approaches the empirically optimal fixed mixtures across a range of player counts, arm numbers, and volatility profiles.
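
A minimal sketch of this adaptive rule, substituting an exponentially weighted running variance for the paper's Bayesian volatility estimator; the smoothing factor `alpha` and the mapping gain `k` are assumptions introduced here:

```python
class AdaptiveMixer:
    """Map an online volatility estimate of observed rewards to the
    entangled-photon weight eta (the exploitation share of the mix)."""

    def __init__(self, alpha: float = 0.05, k: float = 10.0):
        self.alpha = alpha  # smoothing factor of the running statistics
        self.k = k          # volatility-to-mixing-ratio gain
        self.mean = 0.5
        self.var = 0.0

    def update(self, reward: float) -> None:
        """Fold one observed reward into the running mean and variance
        (exponentially weighted update)."""
        delta = reward - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    def eta(self) -> float:
        """High estimated volatility -> small eta (more correlated-photon
        exploration); low volatility -> eta near 1 (more entangled-photon
        coordination)."""
        return 1.0 / (1.0 + self.k * self.var)
```

Feeding each round's reward into `update` and reading `eta()` before the next trial gives the qualitative behavior described above: volatile environments shift the mixture toward correlated-photon exploration, while quiet ones favor entangled-photon coordination.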

The paper situates the mixed quantum‑classical approach within the broader context of game theory, particularly the concept of mixed‑strategy equilibrium in evolutionary games. Entangled photons represent an extreme point in strategy space (pure cooperation), while correlated photons span a continuum of probabilistic strategies. By physically combining these points, the decision‑making system can dynamically occupy interior points of the mixed‑strategy simplex, thereby achieving performance that neither pure strategy can attain alone.

In conclusion, the study demonstrates that leveraging both quantum entanglement and classical correlation in a tunable hybrid architecture can enhance collective decision‑making in environments where reward structures are uncertain or time‑varying. The findings open avenues for quantum‑enhanced distributed algorithms in resource allocation, network routing, and other socio‑technical systems where fairness, adaptability, and total efficiency must be balanced. Suggested future work includes scaling to many agents, integrating predictive models of reward dynamics, and deploying the protocol in real‑world infrastructure testbeds.

