Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model’s top-1 prediction and local correlation between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model’s top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.


💡 Research Summary

The paper tackles the problem of detecting whether a given text was part of the pre‑training corpus of a large language model (LLM), a task that has become increasingly important for privacy, copyright, and model‑evaluation reasons. Existing state‑of‑the‑art methods such as Min‑K% and its improved variant Min‑K%++ rely solely on token‑level log‑probabilities, treating each token independently and ignoring two key aspects: (1) the divergence between the model’s top‑1 predicted token and the true token, and (2) the local correlation among adjacent tokens.

The authors observe that during next‑token training the cross‑entropy loss generates the strongest gradient for the token that receives the highest probability if it is not the target. Consequently, training data are explicitly optimized to minimize the gap between the log‑probability of the top‑1 prediction and that of the ground‑truth token. For unseen (non‑training) data, the model often assigns high probability to a plausible but incorrect token, creating a noticeable gap. This insight motivates the proposed metric, Gap‑K%.
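This optimization argument is easy to check directly: the gradient of the cross-entropy loss with respect to the logits is softmax(z) minus the one-hot target, so a confidently predicted wrong token receives the largest positive gradient component. A minimal numerical sketch (illustrative, not code from the paper):

```python
import numpy as np

def ce_grad(logits: np.ndarray, target: int) -> np.ndarray:
    """Gradient of cross-entropy loss w.r.t. the logits: softmax(z) - onehot(target)."""
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    grad = p.copy()
    grad[target] -= 1.0
    return grad

# A confident misprediction: the model puts most mass on token 0,
# but the true next token is token 1.
grad = ce_grad(np.array([3.0, 0.0, 0.0]), target=1)
# The wrong top-1 token receives the largest positive gradient component
# (pushing its logit down), while the target's gradient is negative
# (pushing its logit up) -- exactly the gap that training minimizes.
```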

Gap‑K% is computed as follows. For each position \(t\) in a sequence, the raw gap score is
\[
g_t = \frac{\log p(x_t \mid x_{<t}) - \max_{v\in V}\log p(v \mid x_{<t})}{\sigma_t},
\]
where \(\sigma_t\) is the standard deviation of the log‑probabilities over the vocabulary at step \(t\). The numerator measures how far the true token's log‑probability lies below the top‑1 log‑probability; division by \(\sigma_t\) normalizes for the sharpness of the distribution. Because \(g_t \le 0\), values close to zero indicate that the target token is essentially the model's top‑1 prediction (typical of training data), while large negative values signal a strong divergence (typical of non‑training data).
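As a sketch, the per-token gap score can be computed from a matrix of per-step log-probabilities (function and array names here are illustrative, not from the paper):

```python
import numpy as np

def gap_scores(log_probs: np.ndarray, target_ids: np.ndarray) -> np.ndarray:
    """Per-token gap score: (log p(target) - log p(top-1)) / sigma_t.

    log_probs: (T, V) array of log-probabilities at each step.
    target_ids: (T,) array of ground-truth token ids.
    Values are <= 0; scores near zero suggest the target matched the
    model's top-1 prediction, as expected for training data.
    """
    top1 = log_probs.max(axis=-1)                            # top-1 log-probability
    tgt = log_probs[np.arange(len(target_ids)), target_ids]  # target log-probability
    sigma = log_probs.std(axis=-1)                           # sharpness normalizer
    return (tgt - top1) / sigma
```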

To capture sequential consistency, a sliding window of size \(w\) is applied:
\[
\bar g_t = \frac{1}{w}\sum_{i=0}^{w-1} g_{t+i}.
\]
This smoothing reduces token‑level noise and highlights contiguous regions where the model consistently aligns or misaligns with the input. Finally, the Gap‑K% membership score for a whole sequence is the average of the lowest \(k\%\) of the smoothed scores \(\bar g_t\). In other words, the method focuses on the hardest‑to‑explain segments, which provide the strongest evidence for non‑membership.
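Putting the smoothing and bottom-k% selection together, a minimal sketch of the sequence-level score could look like this (the window and percentile defaults are illustrative):

```python
import numpy as np

def gap_k_score(g: np.ndarray, w: int = 5, k: float = 10.0) -> float:
    """Aggregate per-token gap scores into a sequence-level Gap-K% score.

    Smooths g with a length-w sliding window, then averages the lowest
    k percent of the smoothed values. Scores closer to zero suggest
    membership in the pretraining data.
    """
    smoothed = np.convolve(g, np.ones(w) / w, mode="valid")  # window means
    n = max(1, int(len(smoothed) * k / 100))                 # how many lowest scores to keep
    return float(np.sort(smoothed)[:n].mean())               # mean of the bottom-k%
```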

The paper provides a theoretical comparison with Min‑K%++. While Min‑K%++ measures deviation from the mean of the vocabulary distribution, Gap‑K% adds an extra term that explicitly accounts for the distance between the mode (top‑1) and the mean. This allows Gap‑K% to distinguish between “flat” low‑confidence predictions (where many tokens have similar probabilities) and “confident mispredictions” (where the model is highly confident in an incorrect token). The latter are penalized more heavily, reflecting the fact that training data have been optimized to avoid such confident errors.
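The distinction can be illustrated with two toy next-token distributions (the numbers are illustrative, not from the paper): a nearly flat, low-confidence distribution versus a confident misprediction, where in both cases the target is not the top-1 token.

```python
import numpy as np

def normalized_gap(log_probs: np.ndarray, target_id: int) -> float:
    """Gap between the target and top-1 log-probabilities, scaled by the
    standard deviation of the log-probabilities (as in the g_t score)."""
    return float((log_probs[target_id] - log_probs.max()) / log_probs.std())

# Nearly flat, low-confidence prediction: the target almost ties with top-1.
flat = np.log(np.array([0.22, 0.21, 0.19, 0.19, 0.19]))
# Confident misprediction: most of the mass sits on a single wrong token.
confident = np.log(np.array([0.90, 0.025, 0.025, 0.025, 0.025]))

# With the target at index 1 in both cases, the confident misprediction
# receives a far more negative gap score than the flat distribution.
```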

Empirical evaluation is performed on two benchmarks: WikiMIA (original and paraphrased versions, with input lengths of 32, 64, and 128 tokens) and MIMIR (a more challenging dataset with minimal distributional differences between members and non‑members). Five models of varying scale are tested: Mamba‑1.4B, Pythia‑6.9B, Pythia‑12B, LLaMA‑13B, and LLaMA‑65B. Across all settings, Gap‑K% achieves higher AUROC scores than Min‑K%++ and all other baselines. For example, on WikiMIA (original, 32‑token) Gap‑K% reaches 69.2 % AUROC versus 66.4 % for Min‑K%++. In the paraphrased setting the gap narrows but Gap‑K% still leads (67.2 % vs. 65.7 %). Similar improvements are observed on longer inputs and on the MIMIR benchmark, where Gap‑K% consistently outperforms baselines by 2–4 percentage points.

Ablation studies explore the impact of the sliding‑window size w and the percentile k. The authors find that moderate window sizes (w ≈ 5–10) and a low‑percentile choice (k ≈ 10 %) provide a good trade‑off between sensitivity and robustness. Visualizations of score distributions illustrate how Gap‑K% assigns substantially lower (more negative) scores to confident mispredictions, whereas Min‑K%++ treats them similarly to uniformly low‑probability tokens.

Limitations are acknowledged. The method requires access to token‑level log‑probabilities, which may not be available for closed‑source APIs. Moreover, the hyper‑parameters may need domain‑specific tuning, and the approach has not been tested on multilingual or multimodal models.

In summary, Gap‑K% introduces a principled, optimization‑driven signal—the gap between the model’s top‑1 prediction and the true token—and combines it with local smoothing to capture sequential patterns. This yields a simple yet powerful membership inference tool that outperforms existing likelihood‑based techniques, advancing the state of the art in pre‑training data detection and contributing to greater transparency and responsible use of LLMs.

