A Threshold-based Scheme for Reinforcement Learning in Neural Networks
Thomas H. Ward
Abstract
A generic and scalable Reinforcement Learning scheme for Artificial Neural Networks is presented,
providing a general purpose learning machine. By reference to a node threshold, three features are
described: 1) a mechanism for Primary Reinforcement, capable of solving linearly inseparable problems;
2) an extension of the learning scheme to include a mechanism for Conditioned Reinforcement, capable of
forming long-term strategy; 3) a modification of the learning scheme to use a threshold-based deep
learning algorithm, providing a robust and biologically inspired alternative to backpropagation. The
scheme may be used for supervised as well as unsupervised training regimes.
1 Introduction
This paper proposes that a general purpose learning machine can be achieved by implementing Reinforcement
Learning in an Artificial Neural Network (ANN); three interdependent methods that attempt to emulate the core
mechanisms of that process are presented. Ultimately the biological plausibility of this scheme may be validated by
reference to natural organisms. However, this does not preclude the possibility that there is more than one underlying
mechanism providing Reinforcement Learning in nature.
AI research has characteristically followed a bottom-up approach, focusing on subsystems that address distinct,
specialized and unrelated problem domains. In contrast, the work presented here follows a distinctly top-down
approach, attempting to model intelligence as a whole system: a causal agent interacting with the environment [6].
The agent is not designed to solve a particular problem, but is instead assigned a reward condition. The reward
condition serves as a goal, and along the path to it a variety of unknown challenges may be present. To solve these
problems efficiently the agent requires intelligence.
This top-down approach assumes that the core self-organizing mechanisms of learning that exist in natural
organisms can be replicated in artificial autonomous agents. These agents can then be scaled up by endowing them
with more resources (sensors, neurons and motors). Given sufficient resources and learning opportunities, an agent
may provide an efficient solution to a problem, provided one exists. Moreover, given the generalization properties
of ANNs, the agent can provide appropriate responses to novel stimuli.
A distinction is made between supervised, unsupervised and reinforcement training regimes. Supervised learning
regimes use a (human) trainer to assign desired input-output pattern pairings. Unsupervised training regimes are
typically used to cluster a data set into related groups. Reinforcement Learning (RL) may be considered a subtype of
unsupervised training; it is sometimes called learning with a critic rather than learning with a teacher, as the feedback
is evaluative (right or wrong) rather than instructive (where a desired output action is prescribed). Significant RL
successes have been achieved with the use of Temporal Difference (TD) methods [5][7], notably Q-learning [2].
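To make the evaluative nature of TD feedback concrete, the following sketch implements a standard tabular Q-learning update. This is textbook Q-learning rather than the threshold-based scheme of this paper, and the two-state toy chain and its reward values are invented for illustration:

```python
def q_learning_step(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: a scalar reward (evaluative feedback)
    adjusts the value estimate, rather than prescribing a desired output."""
    best_next = max(Q[next_state].values())
    td_error = reward + gamma * best_next - Q[state][action]
    Q[state][action] += alpha * td_error
    return Q

# Toy two-state chain: acting 'right' in state 0 leads to state 1 and reward 1.
Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
for _ in range(100):
    q_learning_step(Q, state=0, action="right", reward=1.0, next_state=1)

# The rewarded action's value estimate grows toward the expected return,
# while the untried action's estimate stays at zero.
```

The critic never specifies which action was correct; the value estimate of the rewarded action simply rises until a greedy policy would select it.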
First, a definition of intelligence is required:
The demonstration of beneficial behaviors acquired through learning.
A beneficial action or behavior is one that would result in a positive survival outcome (e.g. successful feeding,
mating or self-preservation) for the agent. For the most part our inherent internal reward systems encourage us to
perform beneficial behaviors, but this is not always the case (e.g. substance abuse may be rewarding but not
beneficial). The term 'desirable behavior' is avoided due to existing usage of the term 'desired output' in supervised
learning schemes.
Let’s revise our definition, and expectation, of intelligence:
The demonstration of rewarding behaviors acquired through learning.
Rewarding behaviors and actions will be selected for reinforcement (i.e. learnt) over non-rewarding ones.
Rewarding behaviors are those that allow the agent to achieve the reward condition.
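The selection of rewarding behaviors over non-rewarding ones can be sketched with a toy critic that returns only pass/fail evaluative feedback. The action names, preference weights and update step below are invented for illustration and are not the paper's threshold mechanism:

```python
import random

def reinforce(preferences, action, rewarded, step=0.5):
    """Raise the selection weight of an action only when the critic
    reports that the reward condition was met."""
    if rewarded:
        preferences[action] += step  # rewarding behavior is reinforced
    return preferences

prefs = {"forage": 1.0, "wander": 1.0}          # initially unbiased
reward_condition = lambda a: a == "forage"       # e.g. successful feeding

random.seed(0)
for _ in range(50):
    # Sample an action in proportion to current preference weights.
    action = random.choices(list(prefs), weights=list(prefs.values()))[0]
    reinforce(prefs, action, reward_condition(action))

# Over repeated trials the rewarded behavior accumulates weight and
# comes to dominate action selection.
```

Note that the critic only ever answers "was the reward condition met?"; no desired output is ever prescribed, which is the distinction drawn above between evaluative and instructive feedback.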