Evolution of Fear and Social Rewards in Prey-Predator Relationship
Fear is a critical brain function that enables us to learn to avoid danger via reinforcement learning (RL). While many researchers have argued that fear evolved as a means of escaping predators, how varying predatory pressures have shaped fear and other rewards, including positive social rewards for collective grouping, remains an open question. In this study, we investigate the relationship between predatory pressure and fear using an evolutionary simulation of RL agents with evolving rewards. In our simulation, prey and predator RL agents co-evolve their reward functions, including visual rewards for observing prey and predators. While fear-like negative visual rewards for predators often evolved in prey, we also observed cases in which positive rewards for both predators and prey evolved, the latter serving as a social reward for collective grouping. A comparison across environmental conditions revealed that stronger predator hunting capability promoted a stronger fear reward, while a scarcer food supply promoted a more negative social reward. Moreover, fear did not evolve in response to static pitfalls that inflicted non-lethal damage, suggesting that actively hunting predators played an important role in its evolution. These results highlight the special role of predators in the diverse evolution of fear and social rewards.
💡 Research Summary
The paper investigates how predatory pressure shapes the evolution of fear and social rewards by employing an evolutionary reinforcement‑learning (RL) simulation in which prey and predator agents co‑evolve their reward functions. Each agent is a circular body in a two‑dimensional rigid‑body physics world. Prey are smaller, can consume green food items within a 120° forward visual cone, and possess 32 proximity sensors plus 18 tactile sensors. Predators are larger, cannot eat food, and hunt prey within a configurable mouth angle (40°–80°). Both species use a three‑layer multilayer perceptron (64 hidden units) to map sensor inputs to two motor forces; policies are learned via Proximal Policy Optimization (PPO).
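The sensor-to-motor mapping described above can be sketched as a small feed-forward network. The sketch below assumes the prey's 32 proximity and 18 tactile sensors are concatenated into one input vector and that tanh activations bound the two motor forces; the layer count and hidden size follow the summary ("three-layer MLP, 64 hidden units"), while activations, weight initialization, and the PPO training loop (omitted here) are assumptions.

```python
import numpy as np

# Illustrative sketch of a prey agent's policy network: a three-layer MLP
# mapping 32 proximity + 18 tactile sensor readings to two motor forces.
# Activations and initialization are assumptions; PPO training is omitted.

N_PROXIMITY = 32
N_TACTILE = 18
N_SENSORS = N_PROXIMITY + N_TACTILE  # 50 inputs for prey
N_HIDDEN = 64                        # hidden units per the summary
N_MOTORS = 2                         # two motor forces

rng = np.random.default_rng(0)

class PolicyMLP:
    def __init__(self):
        # Input layer -> hidden layer -> output layer ("three-layer" MLP).
        self.w1 = rng.normal(0.0, 0.1, (N_SENSORS, N_HIDDEN))
        self.b1 = np.zeros(N_HIDDEN)
        self.w2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_MOTORS))
        self.b2 = np.zeros(N_MOTORS)

    def forward(self, sensors):
        h = np.tanh(sensors @ self.w1 + self.b1)
        # tanh keeps the motor forces in [-1, 1] (an assumed convention).
        return np.tanh(h @ self.w2 + self.b2)

policy = PolicyMLP()
forces = policy.forward(rng.uniform(0.0, 1.0, N_SENSORS))
```

In the actual study this deterministic forward pass would be wrapped in a stochastic PPO policy; the sketch only shows the sensor-to-force data flow.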
Reward signals are linear combinations of genetically inherited weights: w_eat (food intake), w_act (action cost/benefit), w_prey (social reward for detecting conspecifics), and w_pred (reward for detecting predators). These weights are mutated with Student-t noise at inheritance and clipped to a fixed range.
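The evolving reward function can be sketched as follows. The linear combination of the four weights is taken directly from the summary; the clip bounds, Student-t degrees of freedom, and mutation scale are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Sketch of the genetically inherited reward function: a linear combination
# of per-step event signals weighted by w_eat, w_act, w_prey, w_pred.
# Mutation perturbs the weights with heavy-tailed Student-t noise and clips
# them; CLIP_LO/CLIP_HI, df, and scale below are assumed, not from the paper.

rng = np.random.default_rng(1)

CLIP_LO, CLIP_HI = -1.0, 1.0  # assumed clip range

def reward(weights, eat, act, prey_seen, pred_seen):
    """Per-step reward: w_eat*eat + w_act*act + w_prey*prey + w_pred*pred."""
    return (weights["w_eat"] * eat
            + weights["w_act"] * act
            + weights["w_prey"] * prey_seen
            + weights["w_pred"] * pred_seen)

def mutate(weights, df=3, scale=0.05):
    """Offspring inherit weights perturbed by Student-t noise, then clipped."""
    return {k: float(np.clip(v + scale * rng.standard_t(df), CLIP_LO, CLIP_HI))
            for k, v in weights.items()}

# A fear-like genome: negative w_pred, mildly positive w_prey.
genome = {"w_eat": 0.8, "w_act": -0.05, "w_prey": 0.2, "w_pred": -0.6}
r = reward(genome, eat=1.0, act=0.5, prey_seen=1.0, pred_seen=0.0)
child = mutate(genome)
```

A genome with positive w_prey and negative w_pred corresponds to the fear-plus-grouping outcome the paper reports; flipping w_pred positive would model the non-fearful variants that also evolved in some runs.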