Posts | KOINEU

Emergent Introspective Awareness in Large Language Models

We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model's activations, and measuring the influence of these manipulations on the model's self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills. In all these experiments, Claude Opus 4 and 4.1, the most capable models we tested, generally demonstrate the greatest introspective awareness; however, trends across models are complex and sensitive to post-training strategies. Finally, we explore whether models can explicitly control their internal representations, finding that models can modulate their activations when instructed or incentivized to think about a concept. Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today's models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.
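The concept-injection idea in the abstract can be illustrated with a toy sketch. It builds the concept vector by difference of means: mean activations on concept-related prompts minus mean activations on baseline prompts. This is a common activation-steering recipe and is an assumption here; the paper's exact procedure may differ. The function names and the plain-list vectors are hypothetical stand-ins for a real model's activations at a single layer.

- `inject` adds the scaled vector to one hidden state.
- `projection` measures how strongly a state points along the concept direction. This is one plausible, assumed way to quantify whether a model is "thinking about" a concept when instructed to.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def concept_vector(concept_acts: Sequence[Vector], baseline_acts: Sequence[Vector]) -> Vector:
    """Difference of means (assumed recipe): activation content specific to the concept."""
    mc = mean_vector(concept_acts)
    mb = mean_vector(baseline_acts)
    return [c - b for c, b in zip(mc, mb)]


def inject(hidden: Vector, concept: Vector, strength: float) -> Vector:
    """Add a scaled concept vector to one hidden state (one token position, one layer)."""
    return [h + strength * c for h, c in zip(hidden, concept)]


def projection(hidden: Vector, concept: Vector) -> float:
    """Scalar projection of a hidden state onto the concept direction."""
    norm = sum(c * c for c in concept) ** 0.5
    if norm == 0.0:
        raise ValueError("concept vector has zero norm")
    return sum(h * c for h, c in zip(hidden, concept)) / norm


# Toy usage with 3-dimensional "activations".
v = concept_vector([[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]], [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
steered = inject([0.0, 0.0, 1.0], v, strength=4.0)
print(v, steered, projection(steered, v))
```

In an actual experiment, the steered state would continue through the model's remaining layers. The model would then be asked whether it notices anything unusual. This toy omits the model entirely and only shows the vector arithmetic.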

Empower Low-Altitude Economy: Reliability-Aware Dynamic Weighting for Multi-modal UAV Beam Prediction

Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach

Enhancing Retrieval-Augmented Generation with Topic-Enriched Embeddings: A Hybrid Approach Integrating Traditional NLP Techniques

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding

Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems

Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning

Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions

Evaluating the Problem-Solving Abilities of LLMs on Underrepresented Mathematics Competition Problems

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks

Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences

Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models

Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations

Exploring the Performance of Large Language Models on Subjective Span Identification Tasks

Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks

F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation

Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack

FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection

FedSCAM: Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments

FedSecureFormer: A Fast, Federated and Secure Transformer Framework for Lightweight Intrusion Detection in Connected and Autonomous Vehicles

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

FormationEval, an open multiple-choice benchmark for petroleum geoscience

From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning

Generating Diverse TSP Tours via a Combination of Graph Pointer Network and Dispersion

Generative Classifiers Avoid Shortcut Solutions

Geometric and Dynamic Scaling in Deep Transformers

Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations

Geometric Structural Knowledge Graph Foundation Model

Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

HaineiFRDM: Exploring Diffusion for Restoring Defects in High-Speed Films

HanoiWorld: A Joint Embedding Predictive Architecture-Based World Model for Autonomous Vehicle Controller

Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai

Hear the Heartbeat in Phases: Physiologically Grounded Phase-Aware ECG Biometrics

Heterogeneity in Multi-Agent Reinforcement Learning

HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts

Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management

HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment

HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors

HyperCLOVA X 8B Omni

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

Improving Code-Switching Speech Recognition with TTS Data Augmentation

Improving Scientific Document Retrieval with Academic Concept Index

In Line with Context: Repository-Level Code Generation via Context Inlining

