Cognitively-Inspired Episodic Memory Architectures for Accurate and Efficient Character AI

Reading time: 5 minutes

📝 Abstract

Large language models show promise for embodying historical characters in dialogue systems, but existing approaches face a critical trade-off: simple retrieval-augmented generation produces shallow responses, while multi-stage reflection achieves depth at prohibitive latency. We present an architecture that resolves this tension through offline data augmentation and efficient parallel retrieval from structured episodic memory. Our system transforms biographical data into 1,774 enriched first-person memories with affective-semantic metadata, then employs two-stage retrieval achieving 0.52s prompt generation. Evaluation using LLM-as-judge and RAGAs metrics shows our approach achieves parity with traditional RAG on GPT-4 while significantly outperforming it on smaller models (GPT-3.5, GPT-3), suggesting particular value for resource-constrained deployments. Beyond dialogue, the structured memory enables novel visualization tools: spatiotemporal heatmaps, emotional trajectory analysis, and interactive path tracking, positioning the system as both a dialogue interface and research tool for biographical analysis. We use Van Gogh as a test case, but the architecture is generalizable to any historical figure with substantial textual records, offering a practical framework for educational, museum, and research applications requiring both accuracy and efficiency.

📄 Content

Creating believable AI character agents has evolved from rule-based systems to contemporary LLM implementations, yet fundamental challenges persist. Historical figures present a particularly compelling use case. Their lives are well-documented through biographies, letters, and archival materials, yet traditional approaches to character embodiment struggle to transform this rich textual heritage into coherent, interactive experiences.

Recent advances in LLMs have renewed interest in character-based dialogue systems. However, when deployed without grounding mechanisms, these systems exhibit a persistent tendency toward hallucination, generating plausible but factually incorrect information with high confidence. For historical figures like Vincent van Gogh, where accuracy matters for educational and cultural preservation applications, such fabrications undermine system trustworthiness and limit practical deployment.

Current approaches to character embodiment face a fundamental trade-off between accuracy and responsiveness. Simple retrieval-augmented generation (RAG) systems can ground responses in factual data but often produce shallow, disconnected answers that fail to capture the experiential richness of a character’s lived history. Multi-stage architectures incorporating self-reflection, iterative refinement, and chained retrievals achieve greater depth and coherence, but at severe computational cost: systems requiring multiple sequential LLM calls can take tens of seconds to construct a single response, far too slow for natural conversation and prohibitively expensive for real-time educational applications. Our work addresses a gap in conversational AI: existing systems optimize either for accuracy (through expensive multi-stage processing) or efficiency (through simple retrieval), but educational and museum deployments require both. By conducting reflection and enrichment offline, we achieve the contextual depth of multi-stage systems at single-retrieval latency. This design philosophy, separating concerns temporally rather than eliminating them, offers a generalizable approach to resource-constrained AI deployment.

This latency problem becomes particularly acute when considering deployment constraints. While large-scale models like GPT-4 and Claude v4 with large context windows offer impressive capabilities, educational institutions, museums, and research environments often require systems that can run locally or on modest cloud infrastructure. Smaller models that enable such deployment show even greater performance degradation with traditional RAG approaches, making the efficiency-accuracy gap a critical barrier to practical application.

We present an architecture that resolves the efficiency-accuracy dilemma through a novel combination of offline data augmentation and real-time parallel retrieval. Our key insight is that the computationally expensive work of self-reflection, context enrichment, and perspective transformation can be performed once during dataset construction rather than repeatedly during each interaction. By pre-processing biographical and epistolary data into structured episodic memories enriched with affective-semantic metadata (temporal markers, emotional valence/arousal, character relationships, geographic locations), we create a substrate that enables both rapid retrieval and rich contextualization.
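An enriched episodic memory record of the kind described above can be sketched as a simple data structure. The field names below are hypothetical: the text specifies the metadata categories (temporal markers, valence/arousal, character relationships, geographic locations) but not a concrete schema.

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    """One enriched first-person memory (illustrative schema, not the paper's)."""
    description: str                  # concise summary used for similarity search
    narrative: str                    # full first-person account used in the prompt
    year: int                         # temporal marker
    valence: float                    # emotional valence, e.g. in [-1, 1]
    arousal: float                    # emotional arousal, e.g. in [0, 1]
    people: list = field(default_factory=list)  # character relationships
    location: str = ""                # geographic location

# Example record for a single memory
memory = EpisodicMemory(
    description="Painting sunflowers in Arles",
    narrative="I spent the morning painting sunflowers in the southern light...",
    year=1888,
    valence=0.7,
    arousal=0.6,
    people=["Theo"],
    location="Arles",
)
```

Separating the concise `description` from the full `narrative` is what makes the two-stage retrieval possible: the cheap field is indexed, the rich field is only loaded for the winners.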

Our system implements a three-tier memory architecture inspired by cognitive models:

  1. Long-term episodic memory: 1,774 first-person experiential memories derived from biographical texts and letters.
  2. Intermediate conversational memory: dynamically maintained records of recent dialogue.
  3. Short-term working memory: the LLM’s context window.
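The three tiers above ultimately meet in a single prompt, since the LLM's context window serves as working memory. A minimal sketch, assuming a plain-text prompt format (the actual template is not given in the text):

```python
def build_prompt(persona, retrieved_memories, recent_turns, query):
    """Combine the three memory tiers into one context-window prompt.

    persona            -- static character instruction
    retrieved_memories -- long-term tier: first-person narratives from retrieval
    recent_turns       -- intermediate tier: recent dialogue records
    query              -- the current user message
    """
    memory_block = "\n".join(f"- {m}" for m in retrieved_memories)
    history_block = "\n".join(recent_turns)
    return (
        f"{persona}\n\n"
        f"Relevant memories:\n{memory_block}\n\n"
        f"Conversation so far:\n{history_block}\n\n"
        f"User: {query}\nCharacter:"
    )

prompt = build_prompt(
    "You are Vincent van Gogh. Answer in the first person.",
    ["In 1888 I moved to Arles, seeking brighter light for my work."],
    ["User: Where did you paint your sunflowers?",
     "Character: In Arles, in the yellow house."],
    "What drew you to the south of France?",
)
```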

A two-stage retrieval mechanism accesses these memory tiers in parallel. Initial similarity search uses concise memory descriptions optimized for matching user queries, then retrieves full first-person narratives and associated affective-semantic metadata for prompt construction. The result is a system achieving an average prompt generation time of 0.52 seconds while maintaining response quality that outperforms traditional RAG implementations, particularly for smaller, more deployable models. Our evaluation using LLM-as-a-judge and RAGAs (RAG assessment) frameworks demonstrates consistent advantages in faithfulness and contextual relevance across multiple model scales.
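The two-stage mechanism can be illustrated as follows. Stage 1 ranks memories by similarity between the query embedding and embeddings of the concise descriptions; stage 2 returns the full narratives and metadata for the top-k hits. The embedding model itself is abstracted away here; the vectors and records are toy examples, not the paper's data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memories, k=3):
    # Stage 1: similarity search over concise-description embeddings only.
    ranked = sorted(memories,
                    key=lambda m: cosine(query_vec, m["desc_vec"]),
                    reverse=True)
    # Stage 2: fetch full first-person narratives plus metadata for the prompt.
    return [(m["narrative"], m["metadata"]) for m in ranked[:k]]

memories = [
    {"desc_vec": [1.0, 0.0],
     "narrative": "I painted sunflowers in Arles.",
     "metadata": {"year": 1888, "valence": 0.7}},
    {"desc_vec": [0.0, 1.0],
     "narrative": "Theo wrote to me about the dealers in Paris.",
     "metadata": {"year": 1886, "valence": 0.4}},
]
hits = retrieve([0.9, 0.1], memories, k=1)
```

Searching over short descriptions keeps the index small and the similarity computation fast, while the expensive-to-generate narratives are only materialized for the few memories that actually enter the prompt.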

While conversational interaction represents the most visible application of our architecture, the structured nature of our episodic memory dataset enables a complementary use case: interactive visualization and exploration of historical figures’ cognitive and experiential landscapes. Our affective-semantic enrichment (valence, arousal, geographic coordinates, character co-occurrence, autobiographical significance) transforms biographical data into a multidimensional space that can be analyzed, visualized, and navigated. We demonstrate applications including spatiotemporal heatmaps, emotional trajectory analysis, and interactive path tracking.
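The aggregation behind a spatiotemporal heatmap of this kind is straightforward once the metadata exists: bin memories by (location, year) and average a metric such as valence per cell. The sketch below shows only that aggregation step, with invented sample values; the plotting layer is omitted.

```python
from collections import defaultdict

def valence_grid(memories):
    """Mean emotional valence per (location, year) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for m in memories:
        key = (m["location"], m["year"])
        sums[key] += m["valence"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

grid = valence_grid([
    {"location": "Arles", "year": 1888, "valence": 0.8},
    {"location": "Arles", "year": 1888, "valence": 0.4},
    {"location": "Paris", "year": 1886, "valence": 0.2},
])
# grid[("Arles", 1888)] is approximately 0.6
```

The same grouping generalizes to the other visualizations mentioned: sorting by year instead of grouping yields an emotional trajectory, and grouping by location alone yields the spatial layer of a path-tracking view.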

This content is AI-processed based on ArXiv data.
