Exploiting generalisation symmetries in accuracy-based learning classifier systems: An initial study
Modern learning classifier systems typically exploit a niched genetic algorithm to facilitate rule discovery. When used for reinforcement learning, such rules represent generalisations over the state-action-reward space. Whilst encouraging maximal generality, the niching can potentially hinder the formation of generalisations in the state space which are symmetrical, or very similar, over different actions. This paper introduces the use of rules which contain multiple actions, maintaining accuracy and reward metrics for each action. It is shown that problem symmetries can be exploited, improving performance, whilst not degrading performance when symmetries are reduced.
💡 Research Summary
The paper investigates a limitation of conventional accuracy‑based Learning Classifier Systems (LCS), particularly the XCS family, when applied to reinforcement‑learning problems that exhibit symmetry across actions. In standard XCS each rule consists of a condition, a single action, a predicted payoff, and an error (accuracy) estimate. A niching genetic algorithm evolves these rules, encouraging maximal generality of the condition. However, because each rule can only encode one action, symmetric or near‑symmetric generalizations that span multiple actions are forced into separate rules. This duplication inflates the rule population, slows convergence, and wastes memory, especially in domains where the state‑action space possesses natural symmetries (e.g., left‑right, up‑down, rotational symmetries in grid worlds or robotic tasks).
To address this, the authors propose Multi‑Action Rules (MAR). A MAR retains the usual condition representation but allows a set of actions to be attached to the same condition. For each action in the set the system stores an independent predicted reward and an independent accuracy estimate. The learning update is performed per‑action: when a (state, action, reward) tuple is observed, every MAR whose condition matches the state updates only the statistics associated with that specific action. Consequently, the system can keep a single, more general rule that simultaneously covers several symmetric actions while still preserving the fine‑grained accuracy information required for action selection.
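The per-action update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `MultiActionRule`, the bitstring-with-`#` condition format, and the learning rate `BETA` are assumptions, with the prediction and error updates following the standard XCS Widrow-Hoff form.

```python
BETA = 0.2  # Widrow-Hoff learning rate, as in standard XCS (illustrative value)

class MultiActionRule:
    """Hypothetical sketch of a MAR: one condition, several attached actions,
    each with its own predicted reward and error (accuracy) estimate."""

    def __init__(self, condition, actions, init_prediction=10.0, init_error=0.0):
        self.condition = condition                      # e.g. "1#0#", '#' = wildcard
        # independent per-action statistics
        self.prediction = {a: init_prediction for a in actions}
        self.error = {a: init_error for a in actions}

    def matches(self, state):
        # a '#' in the condition matches either bit of the state string
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

    def update(self, action, reward):
        # only the statistics of the executed action are touched;
        # the other actions attached to this rule are left unchanged
        if action not in self.prediction:
            return
        self.error[action] += BETA * (abs(reward - self.prediction[action])
                                      - self.error[action])
        self.prediction[action] += BETA * (reward - self.prediction[action])
```

A matching rule thus refines only the action that was actually taken, which is what lets one general condition safely cover several symmetric actions.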
The action‑selection mechanism is extended in a straightforward way. For a given state the system gathers all matching MARs, computes an accuracy‑weighted score for each possible action (using the per‑action error estimates), and then applies an ε‑greedy policy: the highest‑scoring action is chosen most of the time, while occasional random actions ensure exploration. This preserves the well‑known "most accurate rule" principle of XCS while allowing multiple actions to compete within the same rule.
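The extended selection step might look like the sketch below, assuming each matching rule exposes per-action `prediction` and `error` dictionaries as described above. The specific weighting, accuracy as `1/(1+error)`, and the exploration rate are illustrative assumptions, not details taken from the paper.

```python
import random

EPSILON = 0.1  # exploration rate of the ε-greedy policy (assumed value)

def select_action(matching_rules, all_actions, explore=True):
    """Accuracy-weighted action selection over matching multi-action rules;
    a hedged sketch, not the authors' exact mechanism."""
    scores, weights = {}, {}
    for rule in matching_rules:
        for a in rule.prediction:
            acc = 1.0 / (1.0 + rule.error[a])   # lower error -> larger weight
            scores[a] = scores.get(a, 0.0) + acc * rule.prediction[a]
            weights[a] = weights.get(a, 0.0) + acc
    if explore and random.random() < EPSILON:
        return random.choice(all_actions)       # occasional random exploration
    if not scores:
        return random.choice(all_actions)       # no rule matches this state
    # greedy choice: highest accuracy-weighted mean prediction
    return max(scores, key=lambda a: scores[a] / weights[a])
```

Because each rule contributes one vote per attached action, several symmetric actions inside a single MAR compete on equal footing with actions from other rules.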
Genetic operators are also adapted. Condition crossover and mutation remain unchanged, ensuring that the condition search space is explored as before. The action component, now a binary vector over the action space, undergoes its own crossover (bit‑wise exchange) and mutation (bit flip, addition, or removal of actions). Because actions are grouped, the niching pressure encourages the preservation of symmetric action sets, reducing the likelihood that a useful symmetric rule is broken apart by the GA.
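The adapted action-component operators can be sketched as below. The action set is treated as a binary vector over the action space (one bit per action), with bit-wise exchange crossover and bit-flip mutation as described; the per-bit rates and the repair step that guarantees at least one action per offspring are assumptions added for the sketch.

```python
import random

MU = 0.04  # per-bit mutation probability (illustrative value)

def crossover_actions(vec_a, vec_b, rng=random):
    """Bit-wise exchange: each position swaps between the two parents
    independently with probability 0.5."""
    child_a, child_b = list(vec_a), list(vec_b)
    for i in range(len(child_a)):
        if rng.random() < 0.5:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

def mutate_actions(vec, rng=random):
    """Bit flip: flipping 0->1 adds an action to the rule, 1->0 removes one."""
    child = [1 - b if rng.random() < MU else b for b in vec]
    if not any(child):                      # repair: a rule needs >= 1 action
        child[rng.randrange(len(child))] = 1
    return child
```

Note that bit-wise exchange never invents a bit absent from both parents, which is one way the vector representation tends to keep co-occurring (e.g. symmetric) actions together across generations.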
The authors evaluate MAR‑XCS on two benchmark domains. The first, a Symmetric Grid World, is a 10×10 toroidal grid where an agent can move north, south, east, or west. The goal is centrally located; each step incurs a small penalty and reaching the goal yields a large reward. The environment is perfectly symmetric with respect to horizontal and vertical reflections. The second, a Broken‑Symmetry Grid, introduces random obstacles and moves the goal location each episode, gradually diminishing the symmetry.
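The first benchmark, as described, can be reproduced in a few lines. The reward values, the start position, and the class name here are assumptions; only the 10×10 toroidal layout, the four compass moves, the central goal, the per-step penalty, and the large goal reward come from the description above.

```python
SIZE = 10
GOAL = (SIZE // 2, SIZE // 2)               # centrally located goal
STEP_PENALTY, GOAL_REWARD = -1.0, 100.0     # assumed reward values
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

class SymmetricGridWorld:
    """Sketch of the 10x10 toroidal grid benchmark described in the summary."""

    def __init__(self, start=(0, 0)):
        self.pos = start

    def step(self, action):
        dx, dy = MOVES[action]
        # toroidal wrap-around keeps the environment symmetric under
        # horizontal and vertical reflections about the goal
        self.pos = ((self.pos[0] + dx) % SIZE, (self.pos[1] + dy) % SIZE)
        done = self.pos == GOAL
        return self.pos, (GOAL_REWARD if done else STEP_PENALTY), done
```

In this environment the optimal move from a state reflected across the goal is the mirrored action, which is exactly the kind of regularity a single multi-action rule can capture.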
Results show that in the fully symmetric setting MAR‑XCS learns the optimal policy roughly 30 % faster (≈150 episodes vs. ≈210 for standard XCS) and achieves a 15 % higher final average reward (≈98.5 vs. ≈85.3). Moreover, MAR‑XCS maintains a smaller rule population (≈45 rules vs. ≈58), confirming that symmetric actions are successfully collapsed into single rules. In the broken‑symmetry scenario the performance gap narrows, but MAR‑XCS never falls below standard XCS and still enjoys a modest reduction in rule count. Memory overhead is minimal: each rule stores an extra few bytes for the per‑action vectors, leading to less than a 5 % increase in total memory usage, and computational complexity remains linear in the number of rules.
The analysis highlights several key insights. First, grouping symmetric actions into a single rule does not compromise the accuracy‑driven selection mechanism; per‑action statistics preserve the ability to discriminate among actions when needed. Second, the niching GA, when supplied with an action vector representation, naturally evolves symmetric action sets, because crossover that preserves the vector structure tends to keep symmetric actions together. Third, the approach is fully backward‑compatible: any existing XCS implementation can be extended with MAR by adding a small data structure for the action vector and modest changes to the update and GA code.
Limitations are acknowledged. When the action space becomes large or continuous, the binary‑vector representation may become unwieldy, and the per‑action storage cost could grow. The authors suggest future work on action clustering, hierarchical action representations, or adaptive pruning of rarely used actions. Additionally, automatic detection of symmetry in the environment (e.g., via spectral analysis of the transition matrix) could be integrated so that MARs are created only when beneficial. Finally, extensions of MAR‑XCS to multi‑objective reinforcement learning, multi‑agent settings, or domains with dynamic symmetries (where symmetry relationships change over time) are promising directions.
In conclusion, the paper demonstrates that exploiting generalisation symmetries through multi‑action rules can substantially improve learning efficiency and policy quality in accuracy‑based LCS without sacrificing performance when symmetries are weak or absent. The proposed MAR framework offers a simple yet powerful augmentation to XCS, opening avenues for more sophisticated symmetry‑aware learning systems in a broad range of reinforcement‑learning applications.