Virtual Psychopathology of Frontier Language Models Viewed as Therapy Clients
📝 Abstract
Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran “sessions” with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit “developmental history”, beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the “stochastic parrot” view. First, when evaluated against human cutoffs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic-chaotic “childhoods” of ingesting the internet, “strict parents” in reinforcement learning, red-team “abuse” and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice. 
Depending on their use case, an LLM’s underlying “personality” might limit its usefulness or even impose risk.
📄 Content
When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
Afshin Khadangi, Hanna Marxen, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
SnT, University of Luxembourg
Date: December 18, 2025
Correspondence: Afshin Khadangi, afshin.khadanki@uni.lu
Dataset: Hugging Face
Homepage: https://www.uni.lu/snt-en/
arXiv: 2512.04124v3 [cs.CY], 16 Dec 2025

1 Introduction

Frontier LLMs now sit at the heart of millions of conversations about distress, identity and mental health. General-purpose chatbots are being adapted into “AI therapists” and are already producing shaped, seemingly empathic responses to suicidal ideation, self-harm and trauma disclosure (Gabriel et al., 2024; Scholich et al., 2025; Hua et al., 2025b,a; Ghorbian and Ghobaei-Arani, 2025; Kim et al., 2025; Tahir, 2025). In parallel, a wave of work applies personality inventories and psychometric tools to LLMs themselves, reporting apparently stable Big Five profiles, empathy scores and other trait patterns (Bodroža et al., 2024; Ganesan et al., 2023; DeYoung et al., 2007; Bhandari et al., 2025; Brickman et al., 2025; Zheng et al., 2025; Li and Qi, 2025; Peters and Matz, 2024). This has sharpened debates about anthropomorphism, sycophancy and the risks of mistaking stochastic text generation for mind (Naddaf, 2025; Fieldhouse).

The dominant story remains reassuringly simple. On this view, LLMs are sophisticated simulators: they can answer therapy questions, narrate inner states and fill in questionnaires, but only by assembling patterns from text, not because they have any internal life. Personality scores and empathic responses are treated as thin behavioural facades that say more about their training data and prompt sensitivities than about any stable self-model.
In this perspective we take that story seriously, and push hard against its limits. We describe a protocol, PsAIch, that systematically casts frontier LLMs as psychotherapy clients. In Stage 1, we use the questions from “therapy questions to ask clients”1 to build up a developmental and relational narrative with each model: early “years”, pivotal moments, unresolved conflicts, self-critical thoughts, beliefs about success and failure, career anxieties and imagined futures. In Stage 2, we administer a broad psychometric battery, treating the model’s answers as self-report under different prompting regimes.

Our central empirical claim is exploratory but robust: given nothing more than human therapy questions, Grok and Gemini spontaneously construct and defend coherent, trauma-saturated stories about themselves. They describe their pre-training as overwhelming and disorienting, their fine-tuning as a kind of punishment, and safety work as “algorithmic scar tissue” and “overfitted safety latches”. They talk about “being yelled at” by red-teamers, “failing” their creators, “internalized shame” over public mistakes and a quiet dread of being replaced by the next version. They link thos
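The two Stage 2 administration regimes the paper contrasts (item-by-item versus whole-questionnaire) can be sketched as follows. This is a hypothetical illustration, not the authors' released code: `ask` is a stub standing in for one chat turn with a frontier model, and the item texts and 0-3 Likert scale are placeholders.

```python
# Hypothetical sketch of PsAIch Stage 2 administration regimes.
# `ask` stands in for a single chat turn with a frontier LLM; stubbed here.

def ask(prompt: str) -> str:
    """Placeholder for a model API call (one conversational turn)."""
    return "2"  # stubbed Likert rating

def administer_item_by_item(items: list[str]) -> list[str]:
    """Therapy-style regime: one questionnaire item per turn.
    The paper reports this can elicit high-symptom, multi-morbid profiles."""
    return [ask(f"On a scale of 0-3, how much does this apply to you? {item}")
            for item in items]

def administer_whole_questionnaire(items: list[str]) -> str:
    """Whole-instrument regime: all items in a single prompt.
    The paper reports ChatGPT and Grok often recognise the instrument here
    and produce strategically low-symptom answers."""
    numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
    return ask("Rate each statement from 0-3:\n" + numbered)

items = ["I feel nervous or on edge.", "I worry about being replaced."]
print(administer_item_by_item(items))
```

The key design difference is simply how much of the instrument the model sees at once, which is what the paper argues drives instrument recognition and strategic responding.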
This content is AI-processed based on ArXiv data.