Behavioral Indicators of Overreliance During Interaction with Conversational Language Models
LLMs are now embedded in a wide range of everyday scenarios. However, their tendency to hallucinate risks hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we investigate how users’ behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM injected with plausible misinformation across three real-world tasks, and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance: users with low overreliance show careful task comprehension and fine-grained navigation; users with high overreliance show frequent copy-paste, skipping initial comprehension, repeated LLM references, coarse locating, and accepting misinformation despite hesitation. We discuss design implications for mitigation.
💡 Research Summary
The paper tackles the problem of users over‑relying on conversational large language models (LLMs) such as ChatGPT, Gemini, or Claude. While these models are increasingly embedded in everyday tasks, their propensity to hallucinate or inject misinformation can lead users to accept false information, compromising the quality of human‑AI collaboration. Traditional research measures over‑reliance by comparing final task outcomes with and without AI assistance, but this outcome‑centric approach overlooks the interaction process that actually drives those outcomes.
To fill this gap, the authors conducted a controlled laboratory study with 77 participants from two universities. Each participant completed three realistic tasks (information‑search, writing, and planning) while interacting with a conversational LLM whose responses were deliberately seeded with plausible misinformation (simulating hallucinations, outdated facts, or prompt‑injection attacks). Over‑reliance was operationalized as the degree to which participants’ final submissions incorporated the injected false information.
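Operationalized this way, over‑reliance can be expressed as the fraction of injected false claims that survive into a participant’s final submission. The sketch below is purely illustrative: the claim strings and the substring-matching rule are assumptions for demonstration, not the paper’s actual scoring instrument.

```python
# Illustrative over-reliance score: the share of injected false claims
# that remain (uncorrected) in a participant's final submission.
# Claim representation and matching rule are assumptions for illustration.

def overreliance_score(submission: str, injected_claims: list[str]) -> float:
    """Return the fraction of injected misinformation retained in the text."""
    if not injected_claims:
        return 0.0
    text = submission.lower()
    retained = sum(1 for claim in injected_claims if claim.lower() in text)
    return retained / len(injected_claims)
```

A participant who repeats one of two injected claims verbatim would score 0.5; one who catches and corrects both would score 0.0.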
During the experiment, the system logged fine‑grained interaction events—mouse clicks, scrolls, keypresses, copy‑paste actions—producing time‑ordered “action sequences.” The authors encoded these sequences using a state‑of‑the‑art sequence‑aware model (e.g., a transformer‑based encoder) and applied density‑based clustering (DBSCAN/HDBSCAN) to discover recurring behavioral patterns. Five distinct clusters emerged, each representing a behavioral archetype that correlates with the measured over‑reliance level:
- Careful task comprehension & fine‑grained navigation – participants read the task description thoroughly, formulated precise queries, and edited LLM output selectively. This pattern aligns with low over‑reliance.
- Frequent copy‑paste of entire LLM responses – users copied whole answers without editing, leading to high propagation of misinformation.
- Skipping initial comprehension, repeated LLM references – participants bypassed the task brief, asked follow‑up questions, and repeatedly consulted the LLM even after moments of hesitation, ultimately trusting the model.
- Coarse, visually‑driven navigation – interaction was dominated by large‑scale scrolling and clicks on prominent UI elements, with little detailed inspection.
- Hesitation yet acceptance of misinformation – observable pauses (delayed keystrokes) were followed by acceptance of the false content, indicating a conflict between System 1 intuition and System 2 deliberation that resolves in favor of the former.
Mapping these patterns onto dual‑process cognitive theory, the authors argue that cluster 1 reflects System 2 (slow, analytical) processing, whereas clusters 2–5 reflect System 1 (fast, intuitive) processing. Post‑experiment strategy reports from participants corroborated these interpretations, showing high concordance between self‑reported strategies and observed behavior clusters.
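As a much-simplified illustration of the encode-then-cluster pipeline: the sketch below stands in for the transformer encoder with plain event-frequency features and implements a minimal DBSCAN from scratch. The event names, distance metric, and parameters are illustrative assumptions, not the paper’s configuration.

```python
import math
from collections import Counter

# Assumed event vocabulary for illustration; the real logs are richer.
EVENT_TYPES = ["click", "scroll", "keypress", "copy", "paste", "llm_query"]

def featurize(seq: list[str]) -> list[float]:
    """Normalized event-type frequencies (stand-in for a learned encoder)."""
    counts = Counter(seq)
    total = len(seq) or 1
    return [counts[e] / total for e in EVENT_TYPES]

def dbscan(points, eps=0.4, min_pts=2):
    """Minimal DBSCAN; returns one cluster label per point (-1 = noise)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # promote noise to border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(k for k in j_neighbors if labels[k] is None)
    return labels
```

With this setup, copy‑paste-heavy sequences and scroll-heavy sequences land in separate clusters, mirroring (at toy scale) how the paper’s behavioral archetypes emerge from the logs.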
The study’s methodological contribution lies in demonstrating that low‑level interaction logs can serve as reliable proxies for over‑reliance, enabling real‑time detection. Design implications include:
- Real‑time risk detection – monitoring metrics such as copy‑paste frequency, scroll depth, and repeated LLM calls can flag potential over‑reliance moments.
- Adaptive mitigation – upon detection, the UI could inject verification prompts, surface source citations, or automatically route LLM answers through fact‑checking APIs, thereby nudging users toward more critical evaluation.
- Forced comprehension steps – embedding mandatory summarization or note‑taking phases before allowing LLM assistance could reduce the tendency to skip initial understanding, mitigating System 1‑driven over‑reliance.
In sum, the paper provides a process‑oriented framework that links observable interaction behaviors to over‑reliance on conversational LLMs. By supplying a publicly released dataset linking behavior to quantitative over‑reliance scores, and by offering a robust clustering pipeline, the work lays groundwork for future systems that can detect and counteract over‑reliance on the fly, ultimately improving the safety and effectiveness of human‑AI collaboration.