State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models

February 09, 2026

📝 Original Info

Title: State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models
ArXiv ID: 2512.13762
Date: 2025-12-15
Authors: TK Lee

📝 Abstract

Large language models (LLMs) are widely deployed as general-purpose tools, yet extended interaction can reveal behavioral patterns not captured by standard quantitative benchmarks. We present a qualitative case-study methodology for auditing policy-linked behavioral selectivity in long-horizon interaction. In a single 86-turn dialogue session, the same model shows Normal Performance (NP) in broad, non-sensitive domains while repeatedly producing Functional Refusal (FR) in provider- or policy-sensitive domains, yielding a consistent asymmetry between NP and FR across domains. Drawing on learned helplessness as an analogy, we introduce learned incapacity (LI) as a behavioral descriptor for this selective withholding without implying intentionality or internal mechanisms. We operationalize three response regimes (NP, FR, and Meta-Narrative, MN) and show that MN role-framing narratives tend to co-occur with refusals in the same sensitive contexts. Overall, the study proposes an interaction-level auditing framework based on observable behavior and motivates LI as a lens for examining potential alignment side effects, warranting further investigation across users and models.

📄 Full Content

...(The full text is long and has been omitted here. Please see the site for the complete text.)
