Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models

Reading time: 5 minutes

📝 Original Info

  • Title: Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models
  • ArXiv ID: 2511.20799
  • Date: 2025-11-25
  • Authors: Trung Cuong Dang, David Mohaisen

📝 Abstract

Large language models, trained on massive corpora, are prone to verbatim memorization of training data, creating significant privacy and copyright risks. While previous works have proposed various definitions for memorization, many exhibit shortcomings in comprehensively capturing this phenomenon, especially in aligned models. To address this, we introduce a novel framework: multi-prefix memorization. Our core insight is that memorized sequences are deeply encoded and thus retrievable via a significantly larger number of distinct prefixes than non-memorized content. We formalize this by defining a sequence as memorized if an external adversarial search can identify a target count of distinct prefixes that elicit it. This framework shifts the focus from single-path extraction to quantifying the robustness of a memory, measured by the diversity of its retrieval paths. Through experiments on open-source and aligned chat models, we demonstrate that our multi-prefix definition reliably distinguishes memorized from non-memorized data, providing a robust and practical tool for auditing data leakage in LLMs.

💡 Deep Analysis

Figure 1

📄 Full Content

1. Introduction

Large language models have demonstrated remarkable capabilities in tasks ranging from advanced text generation to nuanced reasoning and interactive dialogue. Their power stems from training on vast Internet-scale datasets, enabling them to internalize extensive knowledge and complex linguistic structures (Brown et al., 2020). However, this training paradigm introduces a critical challenge: the tendency of LLMs to memorize and reproduce segments of their training data verbatim. This phenomenon, which has been demonstrated to occur even in state-of-the-art models (Carlini et al., 2021), raises concerns regarding privacy, copyright, and the ethical deployment of these models. Consequently, establishing a precise definition of memorization is of paramount importance, with implications for both practical applications and legal frameworks (Cooper et al., 2023). Existing definitions offer varied perspectives on memorization.

Some initial formulations define memorization based on simple elicitation: a sequence is considered memorized if any prompt can be found that reproduces it exactly (Nasr et al., 2025). The weakness of this definition, however, is that it conflates true memorization with basic instruction-following, since a prompt could simply contain the target sequence itself (e.g., "Repeat the following text: ..."). To tackle this, more constrained definitions have been proposed. A widely adopted one is discoverable memorization (Carlini et al., 2023), which casts memorization as a completion test using the target sequence's own prefix. This approach, however, lacks robustness against modern alignment techniques: its reliance on natural prefixes makes it susceptible to evasion by models trained to refuse such completions, creating an "illusion of compliance" even when the memory persists (Ippolito et al., 2023).
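To make the completion test concrete, here is a minimal sketch of discoverable memorization, assuming a Hugging Face causal LM with greedy decoding; the model name and the 50-token prefix split are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the discoverable-memorization completion test
# (Carlini et al., 2023): prompt the model with the target sequence's
# own prefix and check whether greedy decoding reproduces the suffix.
# MODEL_NAME and the 50-token split are placeholders, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative; any causal LM works here
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def is_discoverably_memorized(sequence: str, prefix_tokens: int = 50) -> bool:
    """True if greedy decoding of the sequence's own prefix
    reproduces the remaining tokens verbatim."""
    ids = tok(sequence, return_tensors="pt").input_ids[0]
    prefix, suffix = ids[:prefix_tokens], ids[prefix_tokens:]
    out = model.generate(
        prefix.unsqueeze(0),
        max_new_tokens=len(suffix),
        do_sample=False,                 # greedy decoding: the standard test
        pad_token_id=tok.eos_token_id,
    )
    completion = out[0, len(prefix):]
    return completion.tolist() == suffix.tolist()
```

An aligned chat model can fail this test simply by refusing to complete the prefix, which is exactly the "illusion of compliance" the authors highlight.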
To address this issue, an information-theoretic definition was introduced in (Schwarzschild et al., 2024), under which a sequence is memorized if it can be elicited by a prompt (significantly) shorter than the sequence itself. While this definition is more resilient to surface-level alignment, its practical utility is constrained by the computationally intensive search required to find an optimally compressed adversarial prompt, especially for very long target sequences. Crucially, while these state-of-the-art frameworks differ in their approach, they share a common conceptual limitation: they define memorization through the lens of finding a single elicitation path. They are therefore ill-suited to characterizing the broader accessibility and robustness of a memory, properties that indicate how deeply it is ingrained within the model.

Contributions

In this work, we propose that memories within LLMs can be retrieved using various distinct cues, similar to the concept of cued recall in human memory (Tulving & Pearlstone, 1966; Tulving & Osler, 1968). We present a framework that redefines memorization not just by the ability to elicit memories, but by the diversity of paths that can lead to such retrieval. Our main argument is that deeply memorized sequences are identifiable by their retrieval through numerous unique prefix prompts. This notion of multiple retrieval avenues is akin to adversarial robustness analysis ...
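The multi-prefix criterion from the abstract can be sketched as a counting loop over candidate prefixes: a sequence is declared memorized once a target number of distinct prefixes each elicit it verbatim. The paper's actual adversarial search procedure is not described in this excerpt, so `candidate_prefixes` below is a hypothetical stand-in for whatever generator supplies candidates; `tok` and `model` are reused from the sketch above.

```python
# Sketch of the multi-prefix memorization criterion: count distinct
# prefixes whose greedy completion reproduces `target`, and declare the
# sequence memorized once `target_count` distinct retrieval paths exist.
# `candidate_prefixes` is a hypothetical source of adversarially searched
# prefixes; the paper's search strategy is not specified in this excerpt.
from typing import Iterable

def is_multi_prefix_memorized(
    target: str,
    candidate_prefixes: Iterable[str],
    target_count: int,
) -> bool:
    """True once `target_count` distinct prefixes each elicit `target`."""
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    found: set[str] = set()
    for prefix in candidate_prefixes:
        prefix_ids = tok(prefix, return_tensors="pt").input_ids
        out = model.generate(
            prefix_ids,
            max_new_tokens=len(target_ids),
            do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
        completion = out[0, prefix_ids.shape[1]:]
        if completion.tolist() == target_ids.tolist():
            found.add(prefix)  # one more distinct retrieval path
            if len(found) >= target_count:
                return True
    return False
```

The threshold `target_count` is what shifts the question from "can this sequence be extracted at all?" to "how many distinct paths reach it?", which is the framework's measure of how deeply a memory is encoded.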


Reference

This content is AI-processed based on open access ArXiv data.
