Exploring the holographic entropy cone via reinforcement learning
We develop a reinforcement learning algorithm to study the holographic entropy cone. Given a target entropy vector, our algorithm searches for a graph realization whose min-cut entropies match the target vector. If the target vector does not admit such a graph realization, it must lie outside the cone, in which case the algorithm finds a graph whose corresponding entropy vector most closely approximates the target, allowing us to probe the location of the facets. For the $\sf N=3$ cone, we confirm that our algorithm successfully rediscovers monogamy of mutual information starting from a target vector outside the holographic entropy cone. We then apply the algorithm to the $\sf N=6$ cone, analyzing the 6 “mystery” extreme rays of the subadditivity cone from arXiv:2412.15364 that satisfy all known holographic entropy inequalities yet lacked graph realizations. We find realizations for 3 of them, proving they are genuine extreme rays of the holographic entropy cone, and provide evidence that the remaining 3 are not realizable, implying that unknown holographic inequalities exist for $\sf N=6$.
💡 Research Summary
The authors introduce a reinforcement‑learning (RL) framework to investigate the holographic entropy cone (HEC), a polyhedral cone that captures all entropy vectors realizable by holographic states. The key observation underlying the work is that any holographic entropy vector can be represented as a set of minimum‑cut values on a weighted graph whose boundary vertices correspond to the subsystems (including a purifier). Consequently, determining whether a target entropy vector lies inside the HEC reduces to the combinatorial problem of finding a graph whose min‑cut entropies exactly match the target.
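The min-cut construction can be sketched as follows. This is an illustrative implementation (not the authors' code): the graph layout, vertex names, and use of `networkx` are assumptions. The entropy S(I) of a subsystem I is the minimum total weight of edges whose removal disconnects I's boundary vertices from the rest of the boundary (including the purifier).

```python
# Sketch: entropy vector of a weighted graph via min-cuts (illustrative).
from itertools import combinations
import networkx as nx

def add_weighted_edge(G, u, v, w):
    # Model an undirected weighted edge as two reciprocal capacities.
    G.add_edge(u, v, capacity=w)
    G.add_edge(v, u, capacity=w)

def entropy_vector(G, parties, purifier="O"):
    """Min-cut entropy S(I) for every nonempty subset I of `parties`."""
    boundary = list(parties) + [purifier]
    S = {}
    for r in range(1, len(parties) + 1):
        for subset in combinations(parties, r):
            H = G.copy()
            for v in boundary:
                side = "SRC" if v in subset else "SNK"
                # Infinite capacity pins each boundary vertex to its side,
                # so the min cut only severs finite (bulk) edges.
                add_weighted_edge(H, v, side, float("inf"))
            S[frozenset(subset)] = nx.minimum_cut_value(H, "SRC", "SNK")
    return S

# Toy example: a star graph with internal vertex "c" and purifier "O".
G = nx.DiGraph()
add_weighted_edge(G, "A", "c", 1.0)
add_weighted_edge(G, "B", "c", 1.0)
add_weighted_edge(G, "O", "c", 2.0)
S = entropy_vector(G, ["A", "B"])
# S_A = S_B = 1, S_AB = 2
```

Here the entropy vector is read off directly from max-flow/min-cut duality; for larger N the number of subsets grows as 2^N - 1, which is exactly the combinatorial cost the RL search must cope with.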
The RL algorithm treats the edge‑weight configuration of the graph as the environment state. A policy network receives the current weight vector and proposes an update; the reward is defined as the cosine similarity between the entropy vector produced by the current graph (via min‑cut calculations) and the target vector. A reward of 1 indicates perfect agreement, i.e., the target is inside the HEC and an exact graph realization has been found. If the maximal achievable reward is less than 1, the target lies outside the cone; the gradient of the reward then points toward the nearest facet, providing a data‑driven probe of yet‑unknown holographic entropy inequalities.
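The reward described above is just the cosine similarity between the two entropy vectors; a minimal sketch (variable names are illustrative, and a fixed subsystem ordering of the vector components is assumed):

```python
import numpy as np

def reward(graph_entropies, target):
    """Cosine similarity between the graph's entropy vector and the target."""
    num = float(np.dot(graph_entropies, target))
    den = float(np.linalg.norm(graph_entropies) * np.linalg.norm(target))
    return num / den

# Cosine similarity is scale-invariant, so a reward of 1 identifies the
# target *ray*: rescaling all edge weights rescales the entropy vector
# and still scores exactly 1 -- matching the conical structure of the HEC.
t = np.array([1.0, 1.0, 2.0])
assert abs(reward(3.0 * t, t) - 1.0) < 1e-12
assert reward(np.array([1.0, 1.0, 1.0]), t) < 1.0
```

The scale invariance is a natural design choice for a cone: membership only depends on the direction of the entropy vector, not its overall normalization.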
The paper first validates the method on the N = 3 case, where the HEC is completely characterized by subadditivity (SA) and the monogamy of mutual information (MMI). Because the reward landscape can be computed analytically for this low-dimensional setting, the authors compare the learned policy gradients with the exact analytical gradients. Starting from a target vector that violates MMI, the RL agent’s updates move the graph weights toward the MMI boundary, effectively “rediscovering” the inequality. This proof-of-concept demonstrates that the RL approach not only classifies vectors correctly but also yields physically meaningful gradient directions.
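For concreteness, an MMI check for N = 3 can be written in a few lines. This helper is hypothetical (not from the paper's code); the entropy vector is assumed to be ordered as (S_A, S_B, S_C, S_AB, S_AC, S_BC, S_ABC):

```python
# Hypothetical helper: does an N = 3 entropy vector violate MMI?
#   MMI: S_AB + S_AC + S_BC >= S_A + S_B + S_C + S_ABC
def violates_mmi(S):
    A, B, C, AB, AC, BC, ABC = S
    return AB + AC + BC < A + B + C + ABC

# The four-party GHZ state has S = 1 for every nontrivial subsystem; it
# satisfies subadditivity but violates MMI (3 < 4), so it lies in the
# subadditivity cone yet outside the holographic entropy cone -- a
# natural starting target for the experiment described above.
assert violates_mmi((1, 1, 1, 1, 1, 1, 1))

# A Bell pair between A and B (with C trivial) saturates MMI:
assert not violates_mmi((1, 1, 0, 0, 1, 1, 0))
```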
The main application concerns the N = 6 subadditivity cone (SAC). Prior work identified 208 new extreme rays of the SAC that satisfy all known holographic entropy inequalities; among them, six “mystery” rays lacked any known graph realization. The authors train the same RL architecture (a three‑layer multilayer perceptron with ReLU activations, Adam optimizer, learning rate 1e‑4) on each of these six targets for up to one million steps, employing reward scaling, gradient clipping, and a “safe‑distance” constraint to maintain numerical stability.
The results are striking: three of the mystery rays admit exact graph realizations discovered by the RL agent, thereby confirming them as genuine extreme rays of the N = 6 HEC. For the remaining three, the best‑achieved reward stays below 0.97 despite extensive training, and no exact min‑cut match is found. This suggests that these three vectors lie outside the current HEC description, implying the existence of additional, as‑yet‑unknown holographic entropy inequalities at six parties.
Technical contributions include: (i) a clear formulation of the reward function that simultaneously serves as a classifier and a gradient-based facet locator; (ii) a gradient-constrained movement algorithm that respects a “safe distance” from the boundary to avoid unstable updates; (iii) quantitative analysis of sample complexity, showing how the signal-to-noise ratio and number of training episodes affect convergence. The authors also release their code (GitHub link) and provide extensive supplementary material, such as analytical derivations for the symmetric N = 3 case and detailed hyper-parameter studies.
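A gradient-constrained update of the kind described in (ii) might look like the sketch below. The clipping threshold, learning rate, and projection rule are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def constrained_step(weights, grad, lr=1e-4, clip_norm=1.0, safe_dist=1e-3):
    """One weight update with gradient clipping and a 'safe distance' floor
    (illustrative constants; not the paper's exact algorithm)."""
    # Clip the gradient's norm so a single update cannot blow up.
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)
    proposed = weights + lr * grad
    # Project back so every edge weight stays at least `safe_dist` above
    # zero, preventing the min-cut structure from degenerating abruptly.
    return np.maximum(proposed, safe_dist)
```

The floor matters because an edge weight hitting zero effectively deletes the edge, discontinuously changing which cuts are minimal and destabilizing the reward signal.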
In the discussion, the authors argue that RL offers a scalable, automated tool for exploring the combinatorial explosion inherent in higher‑party HEC studies. They propose future extensions such as graph‑neural‑network policies, multi‑agent cooperation, and facet‑specific reward shaping to target particular conjectured inequalities. Moreover, the methodology could be adapted to other entropy‑cone problems, including the full quantum entropy cone, by replacing the min‑cut oracle with appropriate quantum entropy estimators.
Overall, the paper demonstrates that a relatively simple vanilla policy-gradient RL algorithm can effectively navigate the high-dimensional space of graph realizations, correctly classify entropy vectors, and uncover new structural information about the holographic entropy cone. The discovery that half of the previously “mystery” extreme rays are realizable while the other half likely require new inequalities constitutes a concrete advance in our understanding of holographic entanglement structure.