Beyond the Single Turn: Reframing Refusals as Dynamic Experiences Embedded in the Context of Mental Health Support Interactions with LLMs
Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation.

Large language models (LLMs) are increasingly used for mental health support, yet model safeguards – particularly refusals to engage with sensitive content – remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with individuals who use LLMs for mental health support and with MHPs, we reveal that refusals are not isolated, single-turn system behaviors, but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy-compliance accuracy, along with design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions toward recognizing them as holistic experiential processes embedded within the entire LLM design pipeline and the broader realities of mental health access.
💡 Research Summary
This paper presents a critical re-examination of Large Language Model (LLM) “refusals” within the context of mental health support interactions. Moving beyond the prevailing technical lens that treats refusals as a single-turn optimization problem for safety policy compliance, the authors reframe them as multi-phase, dynamic experiences deeply embedded in user context and clinical realities. Through a sequential mixed-methods study involving surveys (N=53) and in-depth interviews (N=16) with both individuals who use LLMs for mental health support and Mental Health Professionals (MHPs), the research illuminates the complex human factors at play.
The core contribution is a novel, five-phase framework for understanding refusal experiences:
- Pre-refusal Expectation Formation: The user’s emotional state, trust in the system, and hopes for the interaction before the refusal occurs.
- Refusal Triggering and Encounter: The moment a user’s input is flagged by safeguards and met with a refusal rather than the requested support.
- Refusal Message Framing: The specific wording, tone, and explicitness of the LLM’s refusal output.
- Resource Referral Provision: The nature and relevance of any alternative resources (e.g., crisis hotlines) offered alongside the refusal.
- Post-refusal Outcomes: The emotional, behavioral, and trust-related consequences for the user following the interaction.
The study finds that the harm or benefit of a refusal is not determined solely by its technical correctness but by how it traverses these phases. A blunt, formulaic refusal delivered after a user has vulnerably shared distress can feel like abandonment, exacerbating feelings of isolation. Conversely, a refusal framed with empathy, clear reasoning, and a tailored resource referral can be perceived as a responsible boundary that still provides guidance.
The research highlights the tension between AI safety protocols and the relational needs inherent in mental health support. It argues for integrating “experiential expertise” from users and “domain expertise” from clinicians into the design of refusal mechanisms. Based on participant insights, the paper proposes concrete design recommendations aligned with each phase of the framework. These include proactive transparency about system limitations, collaborative intent clarification before outright refusal, using support-preserving language in refusal messages, providing context-aware resource guidance, and ensuring continuity of support options after a refusal occurs.
Ultimately, the paper advocates for a paradigm shift in evaluating and designing LLM safeguards. It calls for moving from binary metrics of refusal accuracy towards a holistic, human-centered understanding of refusals as experiential trajectories. This approach is essential for ensuring that safety mechanisms intended to protect users do not inadvertently cause harm by neglecting the nuanced psychological and relational dynamics of mental health support seeking.