Word learning under infinite uncertainty
Language learners must learn the meanings of many thousands of words, despite those words occurring in complex environments in which infinitely many meanings might be inferred by the learner as a word’s true meaning. This problem of infinite referential uncertainty is often attributed to Willard Van Orman Quine. We provide a mathematical formalisation of an ideal cross-situational learner attempting to learn under infinite referential uncertainty, and identify conditions under which word learning is possible. As Quine’s intuitions suggest, learning under infinite uncertainty is in fact possible, provided that learners have some means of ranking candidate word meanings in terms of their plausibility; furthermore, our analysis shows that this ranking could in fact be exceedingly weak, implying that constraints which allow learners to infer the plausibility of candidate word meanings could themselves be weak. This approach lifts the burden of explanation from ‘smart’ word learning constraints in learners, and suggests a programme of research into weak, unreliable, probabilistic constraints on the inference of word meaning in real word learners.
💡 Research Summary
The paper tackles the classic “Quine’s problem” – the claim that word learning is impossible because any utterance could, in principle, be compatible with an infinite number of meanings. The authors formalize this situation using an ideal cross‑situational learner and show that learning is mathematically possible even when the meaning space is infinite, provided the learner can rank candidate meanings by plausibility, however weak that ranking may be.
The model assumes that on each exposure a learner infers a set of discrete candidate meanings drawn from an infinite pool. Two key assumptions are made: (1) the true meaning (the target) is always among the inferred candidates, and (2) every incidental (non‑target) meaning has a non‑zero probability of being omitted on any given exposure. These assumptions capture the effect of weak heuristics such as joint attention, whole‑object bias, syntactic category expectations, etc. The learner’s task is to identify the meaning that appears in every exposure – the intersection of all candidate sets.
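A minimal sketch of this setup in code, using a finite list of distractors as a stand-in for the infinite pool; all names and parameter values below are illustrative, not drawn from the paper:

```python
import random

def exposure(target, distractors, p_incidental=0.7):
    """One exposure: the set of candidate meanings a learner infers.

    Assumption (1): the target is always among the candidates.
    Assumption (2): each incidental meaning is independently omitted
    with non-zero probability (here 1 - p_incidental = 0.3).
    """
    candidates = {target}
    candidates.update(m for m in distractors if random.random() < p_incidental)
    return candidates

def ideal_learner(history):
    """Ideal cross-situational learner: intersect all candidate sets."""
    hypothesis = None
    for candidates in history:
        hypothesis = set(candidates) if hypothesis is None else hypothesis & candidates
    return hypothesis

random.seed(1)
distractors = [f"meaning_{i}" for i in range(20)]  # finite stand-in for the infinite pool
history = [exposure("dog", distractors) for _ in range(15)]
print(ideal_learner(history))  # typically just {'dog'} after 15 exposures
```

With 20 distractors each retained with probability 0.7 per exposure, a handful of exposures usually leaves only the target in the intersection; nothing in the sketch relies on any single exposure being informative on its own.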
Using probability theory, the authors demonstrate that if the target meaning is even slightly more likely to be inferred on any exposure than each incidental meaning (i.e., a tiny plausibility advantage), then over an infinite sequence of exposures the target appears in every candidate set, while each incidental meaning is, with probability one, omitted from at least one exposure (indeed from infinitely many). Consequently, the running intersection of candidate sets shrinks to the target meaning alone, and the ideal learner recovers it exactly. The argument extends to the case where the target itself is occasionally missed, provided its omission probability is sufficiently low.
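The core of this argument fits in one line. In the notation below (ours, not necessarily the paper's), $C_t$ is the candidate set inferred on exposure $t$, and $m$ is an incidental meaning inferred independently on each exposure with probability $p_m < 1$:

```latex
% Sketch of the elimination argument (our notation, not the paper's).
\Pr\Big( m \in \bigcap_{t=1}^{n} C_t \Big) \;\le\; p_m^{\,n}
\;\xrightarrow{\; n \to \infty \;}\; 0 .
```

By the second Borel–Cantelli lemma, $m$ is in fact omitted from infinitely many exposures almost surely; since the target $w$ satisfies $w \in C_t$ for every $t$ under assumption (1), the running intersection converges to $\{w\}$ with probability one.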
Crucially, the required plausibility ranking can be extremely weak. The learner does not need a strong heuristic that eliminates most alternatives in a single exposure; a modest bias that makes the true meaning marginally more likely suffices. This overturns the common intuition that strong, “smart” constraints are necessary for word learning under infinite referential uncertainty. Instead, the burden shifts to the statistical aggregation across many exposures – the cross‑situational learner’s “dumb” counting mechanism – while the heuristics supplying the weak ranking can be vague, probabilistic, and noisy.
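To get a feel for how weak the advantage can be, consider the survival probability of an incidental meaning under the model above (numbers chosen purely for illustration):

```python
# Survival probability of an incidental meaning after n exposures,
# when it is inferred independently with per-exposure probability p < 1.
for p in (0.5, 0.9, 0.99):
    for n in (10, 100, 1000):
        print(f"p = {p}: P(survives {n} exposures) = {p**n:.3g}")
```

Even at p = 0.99, an advantage of only one percentage point over a perfectly inferred target, the incidental meaning survives 1000 exposures with probability of roughly 4 × 10⁻⁵: the aggregation does the work, not the heuristic.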
The paper also revisits Quine’s original discussion of radical translation, arguing that Quine was more concerned with indistinguishable meanings than with an outright infinite set of possibilities. By allowing a plausibility hierarchy, the authors reconcile Quine’s insight with a formal learning solution.
Implications are threefold. First, theories of word learning can relax the assumption that learners must make accurate single‑exposure guesses; learning can succeed even when single‑exposure information is highly ambiguous. Second, empirical work should focus on measuring and modeling weak, probabilistic cues (e.g., attentional focus, object‑part biases) rather than searching for deterministic elimination rules. Third, the formal framework can be extended to other cognitive domains where learners face infinite hypothesis spaces but possess minimal plausibility priors.
In sum, the study provides a rigorous mathematical account showing that infinite referential uncertainty does not preclude word learning. A modest, possibly noisy plausibility ranking combined with cross‑situational statistical learning is sufficient for an ideal learner to converge on the correct meaning, thereby challenging the prevailing view that strong heuristics are indispensable. This opens a new research agenda centered on weak, unreliable constraints that nevertheless guide language acquisition.