The Role of the Availability Heuristic in Multiple-Choice Answering Behaviour


When students are unsure of the correct answer to a multiple-choice question (MCQ), guessing is common practice. The availability heuristic, proposed by A. Tversky and D. Kahneman in 1973, suggests that the ease with which relevant instances come to mind, typically operationalised by the mere frequency of exposure, can offer a mental shortcut for problems in which the test-taker does not know the exact answer. Is simply choosing the option that comes most readily to mind a good strategy for answering MCQs? We propose a computational method of assessing the cognitive availability of MCQ options operationalised by concepts’ prevalence in large corpora. The key finding, across three large question sets, is that correct answers, independently of the question stem, are significantly more available than incorrect MCQ options. Specifically, using Wikipedia as the retrieval corpus, we find that always selecting the most available option leads to scores 13.5% to 32.9% above the random-guess baseline. We further find that LLM-generated MCQ options show similar patterns of availability compared to expert-created options, despite the LLMs’ frequentist nature and their training on large collections of textual data. Our findings suggest that availability should be considered in current and future work when computationally modelling student behaviour.


💡 Research Summary

The paper investigates whether the availability heuristic—a cognitive shortcut whereby people judge the likelihood of an event by how easily examples come to mind—can be leveraged as a test‑wiseness strategy for multiple‑choice questions (MCQs). The authors operationalise “availability” as the prevalence of each answer option’s concept in large text corpora. To measure this, they retrieve a fixed number of passages (20 or 60) from two corpora—English Wikipedia (≈41.5 M passages) and BEIR (≈48 M passages)—using a semantic embedding model (Cohere Embed v3). The retrieval query consists solely of the concatenated answer options (e.g., “Paris Tallinn Antananarivo”), deliberately excluding the question stem to capture out‑of‑context availability. Each retrieved passage is then assigned to the option whose embedding yields the highest cosine similarity, producing a per‑option proportion of assigned passages that serves as the availability score.
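The assignment step can be sketched in a few lines. This is a toy illustration of the described pipeline, not the authors' code: the two-dimensional vectors below stand in for real Cohere Embed v3 embeddings, and the function names are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def availability_scores(option_vecs, passage_vecs):
    """Assign each retrieved passage to the option whose embedding is most
    similar, then return each option's share of passages (its availability)."""
    counts = [0] * len(option_vecs)
    for p in passage_vecs:
        best = max(range(len(option_vecs)),
                   key=lambda i: cosine(option_vecs[i], p))
        counts[best] += 1
    return [c / len(passage_vecs) for c in counts]

# Toy stand-ins for embeddings of two answer options and three retrieved passages:
options = [(1.0, 0.0), (0.0, 1.0)]
passages = [(0.9, 0.1), (0.8, 0.3), (0.1, 0.9)]
print(availability_scores(options, passages))  # two passages land on option 0, one on option 1
```

In the paper's setting, `passage_vecs` would hold the 20 or 60 passages retrieved from Wikipedia or BEIR using the concatenated options as the query, and the resulting proportions are the per-option availability scores.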

Three datasets are examined: (1) a privately collected Biopsychology set (396 three‑option and 380 four‑option items) with student selection rates, (2) an Immunopharmacology set (639 four‑option items) also with selection rates, and (3) a public SciQ set (1,000 four‑option items) lacking selection data. The first two allow analysis of whether high‑availability distractors are chosen more often by students (RQ2).

Statistical analysis uses Friedman tests to detect overall differences across options, followed by Wilcoxon signed‑rank post‑hoc comparisons with Holm‑Bonferroni correction (α = 0.01). Effect sizes are reported as matched‑pairs rank‑biserial correlation (rbc).
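The Friedman and Wilcoxon tests are standard (e.g. available in `scipy.stats`); the Holm-Bonferroni step-down correction applied to the post-hoc p-values can be sketched generically as follows (a minimal illustration, not the authors' implementation):

```python
def holm_bonferroni(pvals, alpha=0.01):
    """Holm-Bonferroni step-down correction: test p-values in ascending
    order against alpha / (m - k) for the k-th smallest of m values,
    and stop rejecting at the first comparison that fails."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject

# Three hypothetical post-hoc p-values at the paper's alpha = 0.01:
print(holm_bonferroni([0.001, 0.02, 0.004]))  # → [True, False, True]
```

Note that the middle p-value (0.02) survives correction at neither threshold, while 0.004 is rejected because it is compared against alpha/2 = 0.005 rather than alpha/3.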

Key findings:

  • When Wikipedia is the retrieval source, the correct answer consistently shows significantly higher availability than any distractor across all three datasets. Effect sizes range from medium to large (rbc ≈ 0.4–0.6). Selecting the most available option yields scores 13.5%–32.9% above the random‑guess baseline (25% for four‑option items).
  • Using BEIR, the availability advantage of correct answers disappears or even reverses (particularly in SciQ), suggesting that the heuristic depends on a corpus rich in factual, scientific knowledge.
  • No significant difference is found between high‑ and low‑availability distractors in terms of student selection rates, indicating that availability alone does not explain why students pick certain wrong answers.
  • To address RQ3, the authors generate alternative distractors with three Qwen‑3 models (8B, 30B, and 80B parameters). LLM‑generated distractors exhibit availability distributions virtually identical to those of expert‑crafted distractors, whereas crowd‑sourced distractors (from SciQ) tend to have lower availability. This suggests that large language models, trained on massive text corpora, internalise the same frequency‑based cues that humans rely on.

Two further observations support the methodology. First, ordering options by student selection rates (for the private datasets) does not materially affect availability scores, confirming that the measurement is robust to option ordering. Second, the authors distinguish between out‑of‑context availability (used for the main analysis) and in‑context availability (where the question stem is included in the retrieval query), with the former providing a stronger test of the heuristic.

Overall, the study contributes three major insights: (1) a novel, reproducible pipeline for quantifying option availability via embedding‑based retrieval, (2) empirical evidence that correct MCQ answers are more "available" in a factual corpus, making the availability heuristic a viable guessing strategy, and (3) confirmation that LLM‑generated distractors inherit human‑like availability patterns, opening avenues for automated MCQ quality assessment and test‑wiseness modelling. These results have practical implications for educational technology developers, psychometricians, and researchers interested in cognitive biases in assessment contexts.
