PlanIt: A Crowdsourcing Approach for Learning to Plan Paths from Large Scale Preference Feedback


We consider the problem of learning user preferences over robot trajectories for environments rich in objects and humans. This is challenging because the criterion defining a good trajectory varies with users, tasks, and interactions in the environment. We represent trajectory preferences using a cost function that the robot learns and uses to generate good trajectories in new environments. We design a crowdsourcing system, PlanIt, in which non-expert users label segments of the robot's trajectory. PlanIt allows us to collect a large amount of user feedback, and from these weak and noisy labels we learn the parameters of our model. We test our approach on 122 different environments for robotic navigation and manipulation tasks. Our extensive experiments show that the learned cost function generates preferred trajectories in human environments. Our crowdsourcing system is publicly available for visualizing the learned costs and for providing preference feedback: \url{http://planit.cs.cornell.edu}


💡 Research Summary

The paper introduces PlanIt, a crowdsourcing framework designed to learn user preferences for robot trajectories in human‑rich indoor environments. Traditional robot path planning often relies on handcrafted cost functions that encode safety constraints such as collision avoidance or distance from humans. While necessary, these criteria ignore the nuanced social context of human activities (e.g., watching TV, interacting, working). PlanIt addresses this gap by (1) representing each human activity as a spatial “planning affordance” – a probability distribution over the space surrounding the human‑object pair, and (2) collecting large‑scale, weak preference feedback from non‑expert users via short video clips of robot motion.

The affordance model distinguishes two categories of activities: those where the human and object are close (sitting, working) and those where they are separated (watching, walking). For each activity a cost function Ψ_a(t|E) is defined for a waypoint t in environment E. The cost is factorized into an angular preference (modeled with von Mises distributions), a distance preference (modeled with 1-D Gaussians), and an edge preference (modeled with Beta distributions). The parameters (means, concentrations, variances) are learned from data. The overall trajectory cost is the product of the waypoint costs; equivalently, its logarithm is the sum of per-waypoint log-costs, allowing the planner to evaluate an entire path as a scalar desirability score.
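This factored waypoint cost can be sketched as follows. This is a minimal illustration, not the paper's implementation: the parameter names and values are hypothetical, and the edge (Beta) term is omitted for brevity.

```python
import math
import numpy as np

def von_mises(theta, mu, kappa):
    # Angular preference: von Mises density over the waypoint's angle
    # relative to the human-object pair (np.i0 is the modified Bessel
    # function of the first kind, order 0).
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def gaussian(d, mu, sigma):
    # Distance preference: 1-D Gaussian density over the waypoint's
    # distance from the human (or object).
    return np.exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def waypoint_cost(theta, d, params):
    # Psi_a(t|E), factored into angular and distance terms for one
    # activity a (edge/Beta term omitted; params are illustrative).
    return von_mises(theta, params["mu_ang"], params["kappa"]) * \
           gaussian(d, params["mu_d"], params["sigma_d"])

def trajectory_log_cost(waypoints, params):
    # The product of waypoint costs, evaluated in log space:
    # log(prod_t Psi_a(t|E)) = sum_t log Psi_a(t|E).
    return sum(math.log(waypoint_cost(th, d, params)) for th, d in waypoints)
```

Working in log space is the standard trick here: it turns the product over waypoints into a sum, which is numerically stable for long trajectories.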

PlanIt’s user interface shows videos (typically <15 s) of a PR2 robot navigating a room while humans perform various activities. Users label video segments as “good”, “neutral”, or “bad”. These labels are weak: they do not specify the underlying reason, and they apply only to sub‑trajectories. The learning algorithm treats the activity associated with each labeled segment as a latent variable and employs an EM‑like procedure to jointly infer the latent activity assignments and update the affordance parameters. The loss function enforces that “good” segments receive lower cost than “bad” ones, effectively learning a ranking over trajectories.
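The ranking constraint that "good" segments should cost less than "bad" ones can be illustrated with a hinge-style pairwise loss. This is a hedged sketch of the general idea only; the paper's actual procedure is EM-like, with the activity behind each labeled segment treated as a latent variable, rather than this exact loss.

```python
def ranking_loss(costs_good, costs_bad, margin=1.0):
    """Pairwise hinge loss: every segment labeled 'good' should have a
    cost at least `margin` lower than every segment labeled 'bad'.
    A pair contributes zero once that margin is satisfied."""
    loss = 0.0
    for cg in costs_good:
        for cb in costs_bad:
            loss += max(0.0, margin + cg - cb)
    return loss
```

Minimizing such a loss over the affordance parameters learns a ranking over trajectory segments rather than an absolute cost scale, which matches the weak, relative nature of the crowdsourced labels.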

The authors generated a dataset of 2,500 trajectories across 122 distinct bedroom and living‑room layouts using OpenRAVE and RRT sampling. After crowdsourcing feedback, they trained the affordance‑based cost model. Evaluation involved two parts: (i) quantitative comparison against baseline cost functions (pure distance‑based, hand‑crafted social costs) using human preference judgments, and (ii) real‑world deployment on a PR2 robot navigating in unseen environments. In both cases, the learned model produced trajectories that significantly reduced interference with human activities (e.g., avoiding the line of sight between a person and a TV, not passing behind a working person) and were consistently preferred by human evaluators.

Key contributions include:

  1. A novel activity‑centric affordance representation that captures angular, distance, and edge preferences relevant to path planning.
  2. A scalable crowdsourcing pipeline that gathers weak, noisy preference data from non‑experts, enabling learning across many environments without expensive expert demonstrations.
  3. Empirical evidence that the learned cost function generalizes to new scenes and improves social compliance of robot navigation compared to existing methods.

Limitations noted by the authors are the inherent noise in weak labels, the current focus on 2‑D navigation (extension to high‑dimensional manipulation remains future work), and the coarse three‑level labeling scheme which may not capture finer‑grained preferences. Future directions include richer multi‑dimensional feedback, integration with dynamic human motion prediction, and coupling the learned cost with other planners (e.g., CHOMP, TrajOpt) for smoother trajectories.

Overall, PlanIt demonstrates that large‑scale, crowd‑sourced preference data can be transformed into a principled, probabilistic cost model that enables robots to navigate in a socially aware manner, bridging the gap between abstract user preferences and concrete trajectory optimization.

