Q-Learning with Basic Emotions
📝 Abstract
Q-learning is a simple and powerful tool for solving dynamic problems in unknown environments. It balances exploration and exploitation to find an optimal solution. In this paper, we propose using four basic emotions: joy, sadness, fear, and anger, to influence a Q-learning agent. Simulations show that the proposed affective agent requires fewer steps to find the optimal path. We found that when the affective agent finds the optimal path, the ratio of exploration to exploitation gradually decreases, indicating a lower total step count in the long run.
📄 Content
7th IEEE International Conference Humanoid, Nanotechnology, Information Technology
Communication and Control, Environment and Management (HNICEM)
The Institute of Electrical and Electronics Engineers Inc. (IEEE) – Philippine Section
12-16 November 2013 Hotel Centro, Puerto Princesa, Palawan, Philippines
Q-Learning with Basic Emotions
Wilfredo Badoy Jr. Department of Information System and Computer Science Ateneo de Manila University Quezon City, Philippines wbadoy@yahoo.com Kardi Teknomo Department of Information System and Computer Science Ateneo de Manila University Quezon City, Philippines teknomo@gmail.com
Abstract—Q-learning is a simple and powerful tool for solving dynamic problems in unknown environments. It balances exploration and exploitation to find an optimal solution. In this paper, we propose using four basic emotions: joy, sadness, fear, and anger, to influence a Q-learning agent. Simulations show that the proposed affective agent requires fewer steps to find the optimal path. We found that when the affective agent finds the optimal path, the ratio of exploration to exploitation gradually decreases, indicating a lower total step count in the long run.

Index Terms—intelligent agent, affective computing, navigation, emotions.

I. INTRODUCTION

Imagine a world where humans and robots are indistinguishable from each other, where robots interact with us like normal people do. Wouldn't it be nice if they could feel our emotions and, in turn, exhibit emotions of their own? Although that is very far in the future, steps have been made in that direction. Several studies combining computer learning with affect were carried out even in the early '80s. These experiments were usually set in a controlled physical environment with a mechanical robot that was fed goals and moved through the environment using some sort of reinforcement learning procedure. These robots could accept additional rewards either through human intervention or from their own actions. Today, similar experiments can be done in discrete environments with artificial virtual agents.

Robots are commonplace in industry today; they assist humans with work that humans could never do physically, such as lifting heavy parts or working in hostile environments. As these robots leave the realm of industry and enter our homes and workplaces, it is important that we interact with them efficiently and in a way that is comfortable for both humans and agents. Humans interact not just with information but also with emotion.
In fact, negative emotion enhances memory accuracy [1] and positive emotion broadens our scope of attention [2]. Different affect models have been used in the past to incorporate some form of emotion into the learning process of robots and agents; arousal and pleasure factors, for example, have been used to influence an agent's movement. Our paper incorporates higher-level emotions, namely fear, anger, sadness, and joy, into agent learning, specifically Q-learning, partially based on Korsten and Taylor's model [5]. Our main purpose is to investigate the difference or similarity in the number of exploration steps between a normal Q-learning agent and an agent whose decisions are based on circumstances where it can mimic joy, sadness, fear, and/or anger. Two main performance indicators will be explored: the average number of steps per episode, and the average number of steps until the optimal path is found. Equation (1) is the temporal-difference update equation for Q-learning:
Q_{t+1}(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]    (1)

where s_t is the current state and a_t is the action taken; a_t is also the next state s_{t+1}, since actions are identified with destination states. A new quality value Q_{t+1}(s_t, a_t) is calculated from the old value Q(s_t, a_t) and a correction. The correction is based on the learning rate α, which determines to what extent the newly acquired information overrides the old information; the reward r_{t+1}, observed after performing a_t in s_t; the discount factor γ, which determines the importance of future rewards; and max_a Q(s_{t+1}, a), the maximum Q over all state-action pairs of the next state.
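As a minimal sketch, the tabular update of Equation (1) can be written as follows; the grid size, reward values, and state encoding here are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update of Eq. (1) on a tabular Q array."""
    td_target = r + gamma * np.max(Q[s_next])   # r_{t+1} + γ max_a Q(s_{t+1}, a)
    td_error = td_target - Q[s, a]              # the correction term in brackets
    Q[s, a] += alpha * td_error                 # blend old value with correction
    return Q

# Illustrative example: 4 states, 2 actions, Q initialized to zero.
Q = np.zeros((4, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

With all future values at zero, the update reduces to α·r_{t+1}, which matches the printed result.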
We used the ε-greedy algorithm as stated in [6] for the action-selection policy. Under this policy, the agent usually chooses the action with the maximal estimated reward value, but with probability ε it instead selects an action at random. The learning rate α and discount factor γ are held constant, while only the ε-greedy parameter ε is varied for investigation.
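A short sketch of the ε-greedy selection described above; the tie-breaking and random-number-generator choices are our own assumptions:

```python
import numpy as np

def epsilon_greedy(Q_row, epsilon, rng):
    """Pick the greedy action with probability 1-ε, otherwise a random one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q_row)))   # explore: uniform random action
    return int(np.argmax(Q_row))               # exploit: maximal estimated value

rng = np.random.default_rng(0)
# With ε = 0 the choice is purely greedy.
action = epsilon_greedy(np.array([0.2, 0.8, 0.5]), epsilon=0.0, rng=rng)
print(action)  # → 1
```

Raising ε shifts the exploration/exploitation ratio toward exploration, which is exactly the parameter the experiments vary.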
Since ε does not govern the actions of the proposed agent, it is the only parameter expected to produce a difference in the number of steps taken by the normal agent versus the proposed agent. The number of steps taken by the proposed agent is unaffected by changes in ε, while the normal agent's step count varies with ε. Only four basic emotion