Tournament selection in zeroth-level classifier systems based on average reward reinforcement learning
Abstract
As a genetics-based machine learning technique, the zeroth-level classifier system (ZCS) is based on a discounted-reward reinforcement learning algorithm, the bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large ones. Undiscounted reinforcement learning methods, such as R-learning, are available that instead optimize the average reward per time step. In this paper, R-learning replaces the discounted-reward reinforcement learning approach in ZCS, and tournament selection replaces roulette wheel selection. The resulting classifier system can support long action chains and is thus able to solve large multi-step problems.
Zang Zhaoxiang, Li Zhao, Wang Junying, Dan Zhiping zxzang@gmail.com; zangzx@hust.edu.cn (Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang Hubei, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang Hubei, 443002, China)
Key words: average reward; reinforcement learning; R-learning; learning classifier systems (LCS); zeroth-level classifier system (ZCS); multi-step problems
1 Introduction
Learning Classifier Systems (LCSs) are rule-based adaptive systems which use a Genetic Algorithm (GA) and other machine learning methods to facilitate rule discovery and rule learning[1]. LCSs are competitive with other techniques on classification tasks, data mining[2, 3], and robot control applications[4, 5]. In general, an LCS is a model of an intelligent agent interacting with an environment. Its ability to choose the best policy for acting in the environment, i.e. its adaptability, improves with experience. The source of this improvement is learning from the reinforcement, i.e. payoff, provided by the environment. The aim of an LCS is to maximize the environmental payoff it receives. To do this, an LCS tries to evolve and develop a population of compact and maximally general "condition-action-payoff" rules, called classifiers, which tell the system, in each state (identified by the condition), the payoff expected for each available action. LCSs can therefore be seen as a special form of reinforcement learning that provides a different route to generalization.
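The "condition-action-payoff" rule structure can be illustrated with a minimal sketch. Ternary conditions over {0, 1, #}, where '#' is a wildcard, are the standard way LCSs obtain the maximally general rules described above; the class and function names here are illustrative, not the paper's implementation.

```python
# Minimal sketch of an LCS classifier (illustrative names, not from the paper).
# Conditions are ternary strings over {0, 1, #}; '#' matches either bit,
# which is how classifiers generalize over many states.

class Classifier:
    def __init__(self, condition, action, strength):
        self.condition = condition  # e.g. "1#0#"
        self.action = action        # action proposed when the condition matches
        self.strength = strength    # payoff estimate (fitness in strength-based LCSs)

    def matches(self, state):
        # '#' is a wildcard; every other symbol must equal the state bit.
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

def match_set(population, state):
    """The match set [M]: every classifier whose condition matches the state."""
    return [cl for cl in population if cl.matches(state)]
```

For example, the condition "1#0#" matches state "1101" but "0###" does not, so only the first classifier would enter the match set for that state.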
The original Learning Classifier System framework, proposed by Holland, is now referred to as the traditional framework. Wilson later proposed the strength-based Zeroth-level Classifier System (ZCS)[6] and the accuracy-based X Classifier System (XCS)[7]. With its accuracy-based fitness approach, XCS solved what had previously been the main shortcoming of LCSs: the problem of strong over-general rules. Bull and Hurst[8] have recently shown that, despite its relative simplicity, ZCS is able to perform optimally through its use of fitness sharing. That is, with appropriate parameters, ZCS was shown to perform as well as the more complex XCS on a number of tasks.
Although current research has focused on the use of accuracy in rule predictions as the fitness measure, the present work departs from this popular approach and takes a step backward, aiming to uncover the potential of strength-based LCSs (and particularly ZCS) in sequential decision problems. In this direction, we discuss the use of average reward in ZCS and introduce an undiscounted reinforcement learning technique called R-learning[9, 10] so that ZCS optimizes the average reward, a different metric from the discounted reward optimized by the original ZCS. In particular, we apply the R-learning-based ZCS to large multi-step problems and compare it with ZCS. The experimental results are encouraging: ZCS with R-learning performs optimally or near optimally on these problems. In the following, we refer to our proposal as "ZCSAR", where "AR" stands for "average reward".
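As background for this substitution, the tabular R-learning update (Schwartz's formulation) can be sketched as follows. The function and parameter names, step sizes, and table layout are illustrative only; how the update is actually embedded in ZCS is the subject of the paper itself, not this sketch.

```python
from collections import defaultdict

# Illustrative sketch of the tabular R-learning update (Schwartz, 1993).
# There is no discount factor: action values are learned relative to rho,
# an online estimate of the average reward per time step.
# beta and alpha are step sizes for the value table and for rho.

def r_learning_update(Q, rho, actions, s, a, r, s_next, beta=0.2, alpha=0.05):
    """Apply one R-learning update for transition (s, a, r, s_next); return new rho."""
    best_next = max(Q[(s_next, b)] for b in actions)
    q_s_max = max(Q[(s, b)] for b in actions)
    greedy = Q[(s, a)] == q_s_max  # was the executed action greedy in s?
    # Average-adjusted update: the term (r - rho) replaces discounting.
    Q[(s, a)] += beta * (r - rho + best_next - Q[(s, a)])
    if greedy:
        # The average-reward estimate rho is adjusted only after greedy actions.
        rho += alpha * (r - rho + best_next - q_s_max)
    return rho
```

Here `Q` can be a `defaultdict(float)` mapping (state, action) pairs to average-adjusted values; the caller keeps `rho` across steps.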
The rest of the paper is structured as follows: Section 2 provides the necessary background on reinforcement learning, including Sarsa and R-learning. Section 3 gives a brief description of ZCS and maze environments. Section 4 describes how ZCS can be modified to include average reward reinforcement learning, while Section 5 analyzes the problems arising from our modification of ZCS and presents a solution to them. Experiments with our proposal and related discussion are given in Section 6. Finally, Section 7 concludes the paper with our main conclusions and some directions for future research.
2 Reinforcement learning
Reinforcement learning is a formal framework in which an agent manipulates its environment through a series of actions and receives rewards as feedback to those actions, but is not told which actions to take; instead, it must discover which actions yield the most reward by trying them.
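This interaction can be sketched as a generic loop. The `env` and `agent` interfaces below are illustrative placeholders, not the paper's maze environments or ZCS itself; they simply show where actions go out and rewards come back.

```python
# Generic sketch of the agent-environment interaction loop of reinforcement
# learning (illustrative interfaces; a maze environment would plug in here).

def run_episode(env, agent, max_steps=100):
    """Agent acts, environment feeds back reward and next state; repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # choose an action for this state
        state, reward, done = env.step(action)  # environment returns feedback
        agent.learn(reward, state)              # reinforcement drives learning
        total_reward += reward
        if done:
            break
    return total_reward
```

Any learning rule, whether ZCS's bucket brigade or R-learning, fits inside the `learn` call without changing the loop itself.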