Reinforcement learning (RL) is a subfield of machine learning that focuses on developing models which autonomously learn optimal decision-making strategies over time. In a recent pioneering paper, Wagner demonstrated how the Deep Cross-Entropy RL method can be applied to problems from extremal graph theory by reformulating them as combinatorial optimization problems. Subsequently, many researchers became interested in refining and extending Wagner's framework, creating various RL environments specialized for graph theory, and a number of extremal graph theory problems were solved with the help of RL. In particular, several inequalities concerning the Laplacian spectral radius of graphs were refuted, new lower bounds were obtained for certain Ramsey numbers, and contributions were made to the Turán-type extremal problem in which the forbidden structures are cycles of length three and four. Here, we present Reinforcement Learning for Graph Theory (RLGT), a novel RL framework that systematizes this previous work and supports both undirected and directed graphs, with or without loops, and with an arbitrary number of edge colors. The framework represents graphs efficiently and aims to facilitate future RL-based research in extremal graph theory through optimized computational performance and a clean, modular design.
Reinforcement learning (RL) is a subfield of machine learning (ML) that deals with developing models that automatically learn optimal decisions over time [33]. At a high level, an RL system comprises an agent and an environment: the agent iteratively interacts with the environment by performing actions on it, and the environment provides feedback in return through observations and rewards. In this setting, the agent aims to discover a strategy, called a policy, that maximizes the long-term success of its actions with respect to the rewards returned by the environment. Since the agent learns purely by interacting with the environment, without any additional information on the problem being solved, RL is considered to be much more focused on goal-directed learning through interaction than other ML paradigms [50].
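To make this agent-environment interface concrete, the following minimal sketch (our illustration, not code from any of the cited works) shows the interaction loop in Python; the reset/step method names follow the widely used Gymnasium-style convention, the `available_actions` attribute is an assumption of the sketch, and `RandomAgent` stands in for a learned policy.

```python
# Minimal sketch of the agent-environment interaction loop; the
# reset/step interface mimics the common Gymnasium-style convention.
import random

class RandomAgent:
    """Placeholder policy: samples actions uniformly at random. A
    trained agent would instead map observations to actions so as to
    maximize the expected long-term reward."""
    def act(self, observation, available_actions):
        return random.choice(available_actions)

def run_episode(env, agent):
    """Run one episode and return the total collected reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(observation, env.available_actions)
        observation, reward, done = env.step(action)  # environment feedback
        total_reward += reward
    return total_reward
```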
A combinatorial optimization problem is any problem in which a given function f : C → ℝ is to be maximized (resp. minimized) over a finite set of configurations C. As it turns out, the RL formalism can naturally be adapted to tackle such problems; see [34] and the references therein. This can be achieved by considering an RL environment whose states correspond to complete (or partial) configurations, and whose rewards indicate how an action improves or worsens a given configuration with respect to f. Here, we consider applications of RL to combinatorial optimization problems pertaining to graphs, i.e., extremal graph theory problems.
Recently, Wagner [54] demonstrated how RL can be successfully used to construct counterexamples that refute graph theory conjectures. His idea was to create an RL environment that constructs simple undirected graphs of a given order n ∈ N by arranging the n(n−1)/2 unordered pairs of vertices in some manner and executing n(n−1)/2 binary actions that correspond to these pairs. Here, if the i-th action is 1, then the vertices in the i-th pair should be adjacent; otherwise, they should not be adjacent. Additionally, a reward is received only after the final action is executed, and it equals a configurable graph invariant f of the constructed graph. Although such an environment is simple, Wagner showed that the Deep Cross-Entropy method [7,43] can be used in conjunction with it to achieve satisfactory results for the problem of maximizing a graph invariant f over the set of graphs of a given order. As a direct consequence, it is possible to disprove inequalities involving graphs by rewriting the expression L(G) ⩽ R(G) as L(G) − R(G) ⩽ 0 and searching for a graph in which this invariant is positive, where L(G) (resp. R(G)) denotes the left-hand (resp. right-hand) side of the inequality in graph G. With this approach, Wagner disproved several conjectured claims, either by directly obtaining a counterexample or by uncovering structural patterns that helped manually construct one.
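The sketch below (ours, not Wagner's original code) instantiates the environment just described: a graph of order n is built through n(n−1)/2 binary actions, one per unordered vertex pair, with the configurable invariant f paid out as the only reward after the final action. The invariant in the usage example is chosen purely for illustration.

```python
# Sketch of the construction environment described above: n(n-1)/2
# binary actions, one per unordered vertex pair, with the reward f(G)
# granted only once the final action has been executed.
import itertools
import numpy as np

class GraphConstructionEnv:
    def __init__(self, n, reward_fn):
        self.n = n
        self.pairs = list(itertools.combinations(range(n), 2))
        self.reward_fn = reward_fn  # configurable graph invariant f
        self.reset()

    def reset(self):
        self.adj = np.zeros((self.n, self.n), dtype=np.int8)
        self.step_idx = 0
        return self.adj.copy()

    def step(self, action):
        i, j = self.pairs[self.step_idx]
        if action == 1:  # the current pair becomes adjacent
            self.adj[i, j] = self.adj[j, i] = 1
        self.step_idx += 1
        done = self.step_idx == len(self.pairs)
        reward = self.reward_fn(self.adj) if done else 0.0
        return self.adj.copy(), reward, done

# To hunt for a counterexample to a conjectured inequality L(G) <= R(G),
# one would set reward_fn to compute L(G) - R(G) and look for a positive
# episode reward. Here, purely for illustration, f is the negated
# spectral radius of the adjacency matrix:
env = GraphConstructionEnv(5, lambda adj: -float(np.abs(np.linalg.eigvalsh(adj)).max()))
```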
Wagner’s approach was also successfully used in [49] to refute a conjecture by Akbari, Alazemi and Anđelić [2] on the graph energy and the matching number of graphs. Afterwards, Ghebleh et al. [24] offered a reimplementation of Wagner’s approach that improves its readability, stability and computational performance. In this framework, the Deep Cross-Entropy method was again used in conjunction with the RL environment introduced by Wagner, but, notably, the operations involving states were implemented more efficiently through NumPy-based vectorization [30]. Additionally, the final reward function was turned into a separate argument so that it could optionally be evaluated more efficiently by external code, e.g., Java code invoked through JPype [37]. With this approach, and by applying the features of the graph6java library [25], the authors succeeded in disproving several previously conjectured upper bounds on the Laplacian spectral radius of graphs [10]. We briefly note that Taieb et al. [52] successfully refuted two more of these upper bounds by applying the Monte Carlo search technique.
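To illustrate this pluggable-reward design, the following sketch (ours; the graph6java class and method names in the comment are hypothetical) keeps the terminal reward as a plain callable, so a NumPy implementation can later be swapped for a JPype-backed Java one without touching the rest of the training code.

```python
# Sketch of the pluggable-reward design: the search/training code only
# ever sees a callable with the signature reward_fn(adj) -> float.
import numpy as np

def numpy_reward(adj: np.ndarray) -> float:
    """Largest Laplacian eigenvalue, computed purely in NumPy."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.linalg.eigvalsh(lap)[-1])  # eigenvalues in ascending order

# A JPype-backed alternative could expose the same signature, e.g.
# (class and method names below are hypothetical, for illustration only):
#
#   import jpype
#   jpype.startJVM(classpath=["graph6java.jar"])
#   Invariants = jpype.JClass("graph6java.Invariants")    # hypothetical
#   def java_reward(adj):
#       return float(Invariants.lambdaMax(adj.tolist()))  # hypothetical

def evaluate(adj: np.ndarray, reward_fn) -> float:
    """The rest of the pipeline is agnostic to how f is computed."""
    return reward_fn(adj)
```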
Using the framework developed in [24], Ghebleh et al. [23] obtained four new lower bounds on small Ramsey numbers involving complete bipartite graphs, wheel graphs and book graphs [41]. Afterwards, the same authors used this framework once again [22] to help obtain an explicit construction of harmonic graphs [42] with arbitrarily many distinct vertex degrees.
Concurrently, Mehrabian et al. [36] used RL to tackle the Turán-type extremal problem [46] originally posed by Erdős [20] in 1975, in which the forbidden structures are cycles of length three and four. In their approach, a different RL environment was used, where the states are all the graphs of a given order and the actions are edge-flipping operations. By incorporating curriculum learning [48] into the AlphaZero [45] and tabu search [26,27] algorithms, they obtained new lower bounds for every graph order n ∈ {64, 65, …, 134}. We mention in passing that this was achieved through a novel neural network architecture.
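A minimal sketch of such an edge-flipping environment is given below (ours, not the authors' implementation); the reward is the change in a score that counts edges and penalizes forbidden 3- and 4-cycles, which is one simple shaping choice and not necessarily the one used in [36].

```python
# Sketch of an edge-flipping environment: states are graphs of a fixed
# order n, an action toggles one vertex pair, and the reward is the
# change in a score that rewards edges and penalizes 3- and 4-cycles.
import itertools
import numpy as np

def score(adj: np.ndarray) -> float:
    a2 = adj.astype(np.int64) @ adj.astype(np.int64)
    triangles = int((a2 * adj).sum()) // 6  # trace(A^3) counts each triangle 6 times
    off = a2 - np.diag(np.diag(a2))
    # Each unordered vertex pair with >= 2 common neighbors witnesses a C4.
    c4_witnesses = int((off >= 2).sum()) // 2
    edges = int(adj.sum()) // 2
    return edges - 10.0 * (triangles + c4_witnesses)  # penalty weight is arbitrary

class EdgeFlipEnv:
    def __init__(self, n):
        self.n = n
        self.pairs = list(itertools.combinations(range(n), 2))
        self.adj = np.zeros((n, n), dtype=np.int8)

    def step(self, pair_idx):
        before = score(self.adj)
        i, j = self.pairs[pair_idx]
        self.adj[i, j] = self.adj[j, i] = 1 - self.adj[i, j]  # flip the edge
        return self.adj.copy(), score(self.adj) - before  # incremental reward
```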