Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections
Human input has enabled autonomous systems to improve their capabilities and achieve complex behaviors that are otherwise challenging to generate automatically. Recent work focuses on how robots can use such input, like demonstrations or corrections, to learn intended objectives. These techniques assume that the human’s desired objective already exists within the robot’s hypothesis space. In reality, this assumption is often inaccurate: there will always be situations where the person might care about aspects of the task that the robot does not know about. Without this knowledge, the robot cannot infer the correct objective. Hence, when the robot’s hypothesis space is misspecified, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. In this paper, we posit that the robot should reason explicitly about how well it can explain human inputs given its hypothesis space and use that situational confidence to inform how it should incorporate human input. We demonstrate our method on a 7 degree-of-freedom robot manipulator in learning from two important types of human input: demonstrations of manipulation tasks, and physical corrections during the robot’s task execution.
💡 Research Summary
The paper addresses a fundamental limitation in current robot learning from human input: the implicit assumption that the human’s true objective lies within the robot’s predefined hypothesis space. When this assumption is violated (i.e., when the hypothesis space is misspecified), standard methods that maintain a distribution over objectives still fail, because they only reason about which hypothesis is correct, not whether any hypothesis is correct.
To overcome this, the authors propose a Bayesian framework that explicitly estimates a “situational confidence” metric (denoted β) reflecting how well the robot’s hypothesis space can explain observed human inputs. The framework works as follows: given a parametrized cost function Cθ, the robot updates a posterior P(θ|data) using the likelihood of the human’s actions under each hypothesis. Simultaneously, β is computed as an aggregate measure of the likelihood across all hypotheses; low β indicates that none of the hypotheses generate the observed input with high probability, signaling misspecification.
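The update described above can be sketched in a few lines. This is an illustrative toy version, not the paper's exact formulation: the discrete hypothesis set, the Boltzmann ("noisily rational") likelihood, the `temperature` parameter, and the use of the total evidence as the confidence signal β are all simplifying assumptions made here for clarity.

```python
import numpy as np

def boltzmann_likelihood(costs, temperature=1.0):
    """P(input | theta_k) proportional to exp(-cost_k / temperature).
    An assumed observation model: lower cost -> more likely input."""
    return np.exp(-np.asarray(costs) / temperature)

def update_posterior_and_confidence(prior, costs):
    """One Bayesian update over cost hypotheses, plus a confidence proxy.

    prior : (K,) prior P(theta_k) over the K hypotheses
    costs : (K,) cost of the observed human input under each hypothesis
    Returns the posterior P(theta | input) and beta, taken here as the
    total evidence P(input): if no hypothesis explains the input well,
    beta is low, signaling possible misspecification.
    """
    lik = boltzmann_likelihood(costs)
    evidence = float(np.sum(prior * lik))      # P(input) under the model
    posterior = prior * lik / evidence         # Bayes' rule
    beta = evidence                            # aggregate explainability
    return posterior, beta

# Example: three hypotheses; the input is cheap only under theta_1,
# so the posterior concentrates there and beta stays moderate.
prior = np.ones(3) / 3
posterior, beta = update_posterior_and_confidence(prior, [5.0, 0.5, 4.0])
```

If instead the input is costly under *every* hypothesis, the posterior still normalizes to some winner, but β collapses, which is exactly the distinction the paper draws between "which hypothesis" and "whether any hypothesis".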
The authors instantiate this framework for two prevalent forms of human input: (1) demonstrations, where a human provides a full trajectory for the robot to imitate, and (2) physical corrections, where a human intervenes during execution to adjust the robot’s behavior. For demonstrations, the robot performs a standard Bayesian IRL update and computes β from the overall likelihood of the demonstrated trajectory. For corrections, an online inference scheme based on Kalman‑filter‑like updates is derived, allowing β to be estimated in near‑real‑time as the robot receives force or position adjustments.
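For the corrections setting, the summary describes an online scheme where β is refreshed as each physical adjustment arrives. A minimal sketch of that idea is below, again under assumed simplifications: a discrete hypothesis set, a Boltzmann likelihood, and exponential smoothing of the per-step evidence in place of the paper's Kalman-filter-like derivation. The class and parameter names are hypothetical.

```python
import numpy as np

class OnlineConfidenceEstimator:
    """Sketch of online inference from physical corrections: after each
    correction, update the posterior over hypotheses and a running
    situational-confidence estimate beta."""

    def __init__(self, n_hypotheses, temperature=1.0, smoothing=0.8):
        self.posterior = np.ones(n_hypotheses) / n_hypotheses
        self.temperature = temperature
        self.smoothing = smoothing  # higher -> beta reacts more slowly
        self.beta = 1.0             # start fully confident

    def observe_correction(self, costs):
        """costs[k]: cost of the observed correction under hypothesis k."""
        lik = np.exp(-np.asarray(costs) / self.temperature)
        evidence = float(self.posterior @ lik)   # P(correction | model)
        self.posterior = self.posterior * lik / evidence
        # Fold the per-step evidence into a running confidence estimate.
        self.beta = self.smoothing * self.beta + (1 - self.smoothing) * evidence
        return self.beta
```

With corrections that are cheap under some hypothesis, the posterior concentrates and β stays high; corrections expensive under all hypotheses drive β down, which a robot could use to halt learning or ask for help, as the user-study section describes.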
Experiments are conducted on a 7‑DoF manipulator. In a “table‑distance” scenario, the human wants the robot to keep objects close to a table, but the robot’s hypothesis space lacks a feature for table distance. The resulting β is low, correctly indicating that the robot cannot explain the human’s intent. A second scenario involves height adjustments via kinesthetic corrections; again, the missing height feature yields low β.
A user study with twelve participants further validates the approach. When β is high, the robot quickly converges to the correct objective and achieves high task success. When β is low, the robot either halts learning or requests additional input, demonstrating that situational confidence can guide adaptive behavior.
The paper also discusses limitations: current experiments use low‑dimensional linear cost models; extending β estimation to high‑dimensional neural‑network hypothesis spaces may be computationally demanding. Moreover, repeated inputs that are individually explainable by some hypothesis can still mask underlying misspecification, making it hard to distinguish noise from systematic gaps. Future work is suggested on (i) integrating other feedback modalities (comparisons, language), (ii) dynamically expanding or contracting the hypothesis space based on β, and (iii) developing strategies to query diverse human inputs to resolve ambiguity.
Overall, the contribution is a principled method for robots to ask “Can I understand this input?” and to quantify the answer, enabling early detection of hypothesis‑space misspecification and more robust human‑robot collaboration.