A Threshold-based Scheme for Reinforcement Learning in Neural Networks


Thomas H. Ward

thomas.holland.ward@gmail.com

Abstract

A generic and scalable Reinforcement Learning scheme for Artificial Neural Networks is presented, providing a general-purpose learning machine. By reference to a node threshold, three features are described: 1) a mechanism for Primary Reinforcement, capable of solving linearly inseparable problems; 2) an extension of the learning scheme that adds a mechanism for Conditioned Reinforcement, capable of forming long-term strategy; 3) a modification of the learning scheme that uses a threshold-based deep learning algorithm, providing a robust and biologically inspired alternative to backpropagation. The scheme may be used for supervised as well as unsupervised training regimes.

1 Introduction

This paper proposes that a general-purpose learning machine can be achieved by implementing Reinforcement Learning in an Artificial Neural Network (ANN); three interdependent methods that attempt to emulate the core mechanisms of that process are presented. Ultimately, the biological plausibility of this scheme may be validated by reference to natural organisms. However, that does not preclude the possibility that there is more than one underlying mechanism providing Reinforcement Learning in nature.

AI research has characteristically followed a bottom-up approach, focusing on subsystems that address distinct, specialized, and unrelated problem domains. In contrast, the work presented here follows a distinctly top-down approach, attempting to model intelligence as a whole system: a causal agent interacting with the environment [6]. The agent is not designed to solve a particular problem, but is instead assigned a reward condition. The reward condition serves as a goal, and a variety of unknown challenges may be present on the path to it. To solve these problems efficiently, the agent requires intelligence.

This top-down approach assumes that the core self-organizing mechanisms of learning that exist in natural organisms can be replicated in artificial autonomous agents. These can then be scaled up by endowing the agent with more resources (sensors, neurons, and motors). Given sufficient resources and learning opportunities, an agent may provide an efficient solution to a problem, provided one exists. Moreover, given the generalization properties of ANNs, the agent can provide appropriate responses to novel stimuli.

A distinction is made between supervised, unsupervised, and reinforcement training regimes. Supervised learning regimes use a (human) trainer to assign desired input-output pattern pairings. Unsupervised training regimes are typically used to cluster a data set into related groups. Reinforcement Learning (RL) may be considered a subtype of unsupervised training; it is sometimes called learning with a critic rather than learning with a teacher, as the feedback is evaluative (right or wrong) rather than instructive (where a desired output action is prescribed). Significant RL successes have been achieved with the use of Temporal Difference (TD) methods [5][7], notably Q-learning [2].
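For context, the TD methods cited above can be illustrated with tabular Q-learning, which updates a value table toward the bootstrapped target Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. The sketch below is not from this paper (which proposes a threshold-based alternative); it is a minimal illustration on a hypothetical five-state chain environment, and all names, parameters, and the environment itself are illustrative assumptions.

```python
import random

# Toy 5-state chain: agent starts at state 0; reward +1 on reaching state 4.
# Actions: 0 = step left (floor at state 0), 1 = step right.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

def step(state, action):
    """Deterministic transition; reward is given only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Temporal-difference update toward the bootstrapped target.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

Note that the feedback here is a bare scalar reward rather than a prescribed output, matching the "learning with a critic" characterization above: the agent is never told which action was correct, only how its choices turned out.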

First, a definition of intelligence is required:

The demonstration of beneficial behaviors acquired through learning.

A beneficial action/behavior is one that would result in a positive survival outcome (e.g. successful feeding, mating, self-preservation) for the agent. For the most part our inherent internal reward systems encourage us to perform beneficial behaviors, but this is not always the case (e.g. substance abuse may be rewarding but not beneficial). The term 'desirable behavior' is avoided due to existing usage of the term 'desired output' in supervised learning schemes.

Let’s revise our definition, and expectation, of intelligence:

The demonstration of rewarding behaviors acquired through learning.

Rewarding behaviors/actions will be selected for reinforcement (i.e. learnt) over non-rewarding ones. Rewarding behaviors are those that allow the agent to achieve the reward condition, thereby ac
