RLSLM: A Hybrid Reinforcement Learning Framework Aligning Rule-Based Social Locomotion Model with Human Social Norms
📝 Abstract
Navigating human-populated environments without causing discomfort is a critical capability for socially-aware agents. While rule-based approaches offer interpretability through predefined psychological principles, they often lack generalizability and flexibility. Conversely, data-driven methods can learn complex behaviors from large-scale datasets, but are typically inefficient, opaque, and difficult to align with human intuitions. To bridge this gap, we propose RLSLM, a hybrid Reinforcement Learning framework that integrates a rule-based Social Locomotion Model, grounded in empirical behavioral experiments, into the reward function of a reinforcement learning framework. The social locomotion model generates an orientation-sensitive social comfort field that quantifies human comfort across space, enabling socially aligned navigation policies with minimal training. RLSLM then jointly optimizes mechanical energy and social comfort, allowing agents to avoid intrusions into personal or group space. A human-agent interaction experiment using an immersive VR-based setup demonstrates that RLSLM outperforms state-of-the-art rule-based models in user experience. Ablation and sensitivity analyses further show the model’s significantly improved interpretability over conventional data-driven methods. This work presents a scalable, human-centered methodology that effectively integrates cognitive science and machine learning for real-world social navigation.
📄 Content
Moving through human-populated environments without causing discomfort is an essential requirement for social agents, which are increasingly engaged in human-agent interaction (Sheridan 2016). Such socially-aware navigation entails consideration of multiple social factors and remains a highly challenging problem (Francis et al. 2025).
Existing work on socially-aware navigation can be broadly classified into two categories: rule-based and data-driven. Rule-based approaches typically adopt models with identified variables and interpretable, quantifiable principles, such as proxemics (Chen, Zhang, and Zou 2018) and velocity (Kim et al. 2015), either derived from social psychology or manually designed. Although these models offer interpretability and low computational overhead, they are often (1) difficult to quantify precisely, (2) limited in generalizability across environments, and (3) less flexible, which can lead to unnatural behaviors such as oscillatory paths (Kretzschmar et al. 2016), ultimately constraining their real-world applicability.
Meanwhile, data-driven methods, such as reinforcement learning (RL) (Wang et al. 2024) and imitation learning (Karnan et al. 2022), have enabled agents to emulate human navigation behaviors based on large-scale human trajectory datasets (Kapoor et al. 2023; Terry et al. 2021) or simulation environments (Manso et al. 2020; Tsoi et al. 2020; Vuong et al. 2024). Although these approaches have achieved promising results, they are (1) highly dependent on dataset quality, (2) expensive to train, and (3) often lacking in interpretability or alignment with human intuitions. Without sufficient prior knowledge to guide training, data-driven methods are often inefficient and prone to pitfalls.
Therefore, an important question arises: can these two approaches be integrated to develop models that are efficient, adaptable, and interpretable, while remaining aligned with real-world human social behavior? To address this, we propose RLSLM, a hybrid framework that integrates a computational social locomotion model derived from psychological research (Zhou et al. 2022) into the reward structure of an RL agent. Built on well-controlled behavioral experiments, the rule-based social locomotion model computes an orientation-sensitive, asymmetric discomfort field that covers the entire navigation area, with higher field values indicating greater discomfort that the agent may cause to others when passing that point. By incorporating this rule-based model into a multi-objective RL framework that jointly minimizes mechanical energy and social discomfort, we enable the agent to learn complex socially aligned rules, such as avoiding intrusions into personal space and social groups, within a small number of training epochs.
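As a minimal illustration of this reward design, the sketch below combines an anisotropic Gaussian discomfort field (extending further in the direction a person faces, giving the orientation-sensitive asymmetry described above) with a mechanical-energy penalty. All function names, field shapes, and parameter values here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def discomfort_field(agent_xy, person_xy, person_theta,
                     sigma_front=1.2, sigma_back=0.6, sigma_side=0.8):
    """Anisotropic Gaussian discomfort centred on a person.

    A larger sigma_front makes the field extend further along the
    person's facing direction, so approaching from the front costs
    more than approaching from behind. Parameters are placeholders.
    """
    # Rotate the agent's offset into the person's body frame.
    dx, dy = agent_xy[0] - person_xy[0], agent_xy[1] - person_xy[1]
    c, s = np.cos(-person_theta), np.sin(-person_theta)
    fx, fy = c * dx - s * dy, s * dx + c * dy  # fx: forward axis
    sig_x = sigma_front if fx >= 0 else sigma_back
    return np.exp(-0.5 * ((fx / sig_x) ** 2 + (fy / sigma_side) ** 2))

def step_reward(agent_xy, action_energy, people, w_energy=0.1, w_social=1.0):
    """Multi-objective reward: penalise mechanical energy plus the summed
    discomfort the agent induces at its current position."""
    social = sum(discomfort_field(agent_xy, p_xy, p_th) for p_xy, p_th in people)
    return -(w_energy * action_energy + w_social * social)
```

In this sketch, a policy maximizing `step_reward` trades path efficiency against comfort: the asymmetric field steers the agent behind or beside people rather than across their line of sight.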
We further compared RLSLM with two rule-based models using human comfort ratings. The results demonstrate that our framework significantly outperforms these baselines in terms of user comfort.
In summary, this work contributes:
• A novel hybrid RL framework that integrates a psychologically grounded social locomotion model into reinforcement learning, combining the interpretability and prior knowledge of rule-based methods with the adaptability and expressiveness of data-driven approaches. This framework is potentially generalizable and applicable to other scenarios with similarly scarce data.
• A performance breakthrough in user comfort: RLSLM achieves a mean comfort rating of 4.21/5, significantly outperforming the best rule-based baseline (∆rating = 1.12, Bonferroni-corrected post-hoc comparisons, P < 0.001). This establishes a new Pareto frontier in the trade-off between comfort and efficiency.
Recent studies have explored the incorporation of social rules into navigation algorithms. The design of social-rule modules is mostly driven by intuition, dataset statistics, or physical modeling of human path planning. Static properties, such as the proper radius of personal space, are often determined by intuition and experience in previous studies (Gong et al. 2025). Qualitative navigation decisions (e.g. passing on the left or right when encountering others) and trajectory features that facilitate path prediction can be learned from real-world pedestrian datasets (Kretzschmar et al. 2016).
To support dynamic path planning in human-populated scenarios and avoid collisions, physics-based models such as the social force model have been developed to simulate particle-like motion of crowds (Helbing and Molnar 1995; Shiomi et al. 2014), often corresponding to intuitive geometric relations rather than real pedestrians' movements (Chen et al. 2018). In conclusion, although a great number of navigation studies have taken social rules into account, most are not quantitatively grounded in human behavioral experiments, which can lead to the generation of unnatural paths (Chen et al. 2016). Notably, a recent study determines social-aware parameters (e.g. neighbor distance) t
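The social force model referenced above can be sketched in a few lines: each pedestrian feels a driving force relaxing their velocity toward the goal, plus exponential repulsive forces from neighbors. The parameter values below are illustrative defaults, not fitted to pedestrian data.

```python
import numpy as np

def social_force(pos, vel, goal, others, v0=1.3, tau=0.5, A=2.0, B=0.3, radius=0.3):
    """One Helbing-Molnar-style social-force evaluation for a single pedestrian.

    Driving term: relax toward the desired speed v0 pointing at the goal.
    Repulsive term: exponential push away from each neighbour, growing
    sharply as the gap between body radii closes. Values are illustrative.
    """
    e = (goal - pos) / (np.linalg.norm(goal - pos) + 1e-9)
    force = (v0 * e - vel) / tau  # driving force toward the goal
    for other in others:
        diff = pos - other
        dist = np.linalg.norm(diff) + 1e-9
        force += A * np.exp((2 * radius - dist) / B) * (diff / dist)
    return force
```

Integrating this force over time yields the particle-like crowd trajectories the text describes; the exponential term is what produces smooth mutual avoidance without explicit rules.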