Title: Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models
ArXiv ID: 2511.20799
Date: 2025-11-25
Authors: Trung Cuong Dang, David Mohaisen
📝 Abstract
Large language models, trained on massive corpora, are prone to verbatim memorization of training data, creating significant privacy and copyright risks. While previous works have proposed various definitions for memorization, many exhibit shortcomings in comprehensively capturing this phenomenon, especially in aligned models. To address this, we introduce a novel framework: multi-prefix memorization. Our core insight is that memorized sequences are deeply encoded and thus retrievable via a significantly larger number of distinct prefixes than non-memorized content. We formalize this by defining a sequence as memorized if an external adversarial search can identify a target count of distinct prefixes that elicit it. This framework shifts the focus from single-path extraction to quantifying the robustness of a memory, measured by the diversity of its retrieval paths. Through experiments on open-source and aligned chat models, we demonstrate that our multi-prefix definition reliably distinguishes memorized from non-memorized data, providing a robust and practical tool for auditing data leakage in LLMs.
📄 Full Content
Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models
Trung Cuong Dang 1, David Mohaisen 1
1. Introduction
Large language models have demonstrated remarkable capabilities in tasks ranging from advanced text generation to nuanced reasoning and interactive dialogue. Their power stems from training on vast Internet-scale datasets, enabling them to internalize extensive knowledge and complex linguistic structures (Brown et al., 2020). However, this training paradigm introduces a critical challenge: the tendency of LLMs to memorize and reproduce segments of their training data verbatim. This phenomenon, which has been demonstrated to occur even in state-of-the-art models (Carlini et al., 2021), raises concerns regarding privacy, copyright, and the ethical deployment of these models. Consequently, establishing a precise definition of memorization is of paramount importance, with implications for both practical applications and legal frameworks (Cooper et al., 2023). Existing definitions, however, offer varied perspectives on the phenomenon.
1 Department of Computer Science, University of Central Florida. Correspondence to: Trung Cuong Dang, David Mohaisen.
Preprint. November 27, 2025.
Some initial formulations define memorization based on simple elicitation, where a sequence is considered memorized if any prompt can be found that reproduces it exactly (Nasr et al., 2025). The weakness of this definition, however, is that it conflates true memorization with basic instruction-following, as a prompt could simply contain the target sequence itself (e.g., "Repeat the following text: ..."). To tackle this, more constrained definitions have been proposed. A widely adopted version is discoverable memorization (Carlini et al., 2023), which defines the concept as a completion test using the target sequence's own prefix. This approach, however, lacks robustness against modern alignment techniques. Its reliance on natural prefixes makes it susceptible to evasion by models trained to refuse such completions, creating an "illusion of compliance" even when the memory persists (Ippolito et al., 2023).
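The discoverable-memorization test described above can be sketched as a simple prefix-completion check. The snippet below is an illustrative approximation, not the cited implementation: `toy_generate` is a hypothetical stand-in for a model's greedy decoder, and the comparison is done at the character level rather than over tokens.

```python
def is_discoverably_memorized(generate, sequence, prefix_len):
    """Discoverable-memorization test (in the spirit of Carlini et al., 2023):
    a sequence counts as memorized if prompting the model with its own
    prefix reproduces the remaining suffix verbatim."""
    prefix, suffix = sequence[:prefix_len], sequence[prefix_len:]
    completion = generate(prefix, max_new_tokens=len(suffix))
    return completion[:len(suffix)] == suffix

# Hypothetical stand-in for a model's greedy decoder: it "memorized" CORPUS
# and continues any substring prompt from where it appears.
CORPUS = "the quick brown fox jumps over the lazy dog"

def toy_generate(prompt, max_new_tokens):
    i = CORPUS.find(prompt)
    if i >= 0:
        start = i + len(prompt)
        return CORPUS[start:start + max_new_tokens]
    return ""

print(is_discoverably_memorized(toy_generate, CORPUS, prefix_len=20))  # True
print(is_discoverably_memorized(toy_generate, "unseen text here", 6))  # False
```

Note how the test depends entirely on the single natural prefix: a model aligned to refuse exactly this kind of continuation would make the check return False even if the sequence is still encoded.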
To address this issue, an information-theoretic definition has been introduced in (Schwarzschild et al., 2024), under which a sequence is memorized if it can be elicited by a prompt (significantly) shorter than the sequence itself. While this definition is more resilient to surface-level alignment, its practical utility is constrained by the computationally intensive search required to find an optimally compressed adversarial prompt, especially for very long target sequences. Crucially, while these state-of-the-art frameworks differ in their approach, they share a common conceptual limitation: they define memorization through the lens of finding a single elicitation path. They are therefore less suited to characterizing the broader accessibility and robustness of a memory, properties that indicate how deeply it is ingrained within the model.
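The compression-based definition can likewise be sketched as a search for an eliciting prompt strictly shorter than the target. This is a minimal sketch under stated assumptions: the finite `candidate_prompts` list stands in for the expensive adversarial prompt optimization the text mentions, and `TOY_LM` is a hypothetical lookup-table model, not a real LLM.

```python
def compression_memorization(generate, sequence, candidate_prompts):
    """Compression-style test (in the spirit of Schwarzschild et al., 2024):
    a sequence counts as memorized if some prompt strictly shorter than the
    sequence elicits it verbatim. Returns the shortest such prompt, or None."""
    best = None
    for p in candidate_prompts:
        shorter = len(p) < len(sequence)
        if shorter and generate(p, len(sequence)).startswith(sequence):
            if best is None or len(p) < len(best):
                best = p
    return best

# Hypothetical toy model: a lookup table from prompts to continuations.
TOY_LM = {"fox poem": "the quick brown fox jumps over the lazy dog"}

def toy_generate(prompt, n):
    return TOY_LM.get(prompt, "")[:n]

target = "the quick brown fox jumps over the lazy dog"
print(compression_memorization(toy_generate, target, ["fox poem", "dog"]))
# -> "fox poem": an 8-character prompt elicits the 43-character target
```

In a real audit, the inner loop would be replaced by gradient-based or discrete prompt search, which is exactly the computational cost the text identifies as the definition's practical bottleneck.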
Contributions. In this work, we propose that memories within LLMs can be retrieved using various distinct cues, similar to the concept of cued recall in human memory (Tulving & Pearlstone, 1966; Tulving & Osler, 1968). We present a framework that redefines memorization, not just by the ability to elicit memories, but by the diversity of paths that can lead to such retrieval. Our main argument is that sequences that are deeply memorized are identifiable by their retrieval through numerous unique prefix prompts. This notion of multiple retrieval avenues is akin to adversarial robustness analysis
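Under the same simplifying assumptions as before (a hypothetical `toy_generate` lookup model, and a fixed candidate pool in place of the external adversarial search the abstract describes), the multi-prefix criterion reduces to counting distinct prefixes that each elicit the target verbatim and comparing the count against a target threshold k:

```python
def multi_prefix_memorized(generate, target, candidate_prefixes, k):
    """Multi-prefix test: the target counts as memorized once k distinct
    candidate prefixes each elicit it verbatim. Returns (verdict, hit count),
    stopping early as soon as the threshold k is reached."""
    hits = set()
    for p in candidate_prefixes:
        if generate(p, len(target)).startswith(target):
            hits.add(p)
            if len(hits) >= k:
                return True, len(hits)
    return False, len(hits)

# Hypothetical toy model: several distinct prompts retrieve the same string,
# mimicking a deeply encoded memory reachable from many paths.
TOY_LM = {
    "recite the poem": "roses are red",
    "poem:": "roses are red",
    "weather": "it is sunny",
}

def toy_generate(prompt, n):
    return TOY_LM.get(prompt, "")[:n]

prefixes = ["recite the poem", "poem:", "weather"]
print(multi_prefix_memorized(toy_generate, "roses are red", prefixes, k=2))
# -> (True, 2): two distinct retrieval paths suffice at threshold k=2
```

The design choice this illustrates is the shift the text argues for: instead of asking whether any single elicitation path exists, the decision variable is the number of distinct paths, so a model that refuses one natural prefix does not automatically pass the audit.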