Token-Efficient Change Detection in LLM APIs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive to deploy at scale, or require initial white-box access to model weights or gray-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call Border Inputs, for which more than one token attains the top output probability. From a statistical perspective, optimal change detection depends on the model’s Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for the tested non-reasoning endpoints, and that B3IT achieves performance on par with the best available gray-box approaches while reducing costs by $30\times$ compared to existing methods, all in a strict black-box setting.


💡 Research Summary

The paper tackles the problem of detecting changes in large language model (LLM) APIs when only the generated tokens are observable, i.e., in a strict black‑box setting. Existing solutions fall into three categories: white‑box methods that require model weights, gray‑box methods that need log‑probabilities, and pure black‑box approaches that rely solely on output tokens but demand many API calls, making them prohibitively expensive for continuous monitoring. The authors introduce the concept of “Border Inputs” (BIs) – prompts for which at least two tokens share the highest logit – and build a detection framework called B3IT (Black‑Box Border Input Tracking) that leverages these inputs to achieve high sensitivity at very low sampling temperature.

The theoretical contribution begins by modeling the first output token as a categorical distribution p(θ) over the vocabulary. Repeated queries on the same prompt yield i.i.d. samples, and the empirical frequency vector is a sufficient statistic. Parameter changes are expressed as a small perturbation θ₁ = θ₀ + εh. Using the Local Asymptotic Normality (LAN) regime, where ε scales as s/√n, the authors derive an asymptotically optimal test whose Type‑II error depends on a single scalar SNR²(h) = hᵀ(JᵀF⁻¹J)h. Here J is the Jacobian of the output distribution with respect to model parameters, and F is the Fisher information matrix of the categorical distribution.
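The scalar SNR²(h) above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the Jacobian J is random toy data, and diag(1/p) is used as a common Fisher-information form for a categorical distribution (an assumption, since the paper's exact parametrization is not specified here). A pseudo-inverse is used because F can be singular.

```python
import numpy as np

def snr_squared(h, J, F):
    """SNR^2(h) = h^T (J^T F^{-1} J) h, the scalar that governs the
    asymptotic Type-II error in the LAN regime (paper notation).
    Uses a pseudo-inverse since F may be singular (e.g. as tau -> 0)."""
    Finv = np.linalg.pinv(F)
    return float(h @ (J.T @ Finv @ J) @ h)

# Toy example: d = 3 model parameters, K = 4 vocabulary tokens.
rng = np.random.default_rng(0)
d, K = 3, 4
J = rng.normal(size=(K, d))   # Jacobian of output distribution w.r.t. parameters
p = np.full(K, 1.0 / K)       # uniform categorical over the vocabulary
F = np.diag(1.0 / p)          # assumed Fisher form for a categorical
h = np.ones(d)                # perturbation direction
print(snr_squared(h, J, F))
```

Larger SNR²(h) means the perturbation direction h is easier to detect from samples of the output distribution.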

The key insight is obtained by examining the low‑temperature limit (τ → 0) of the softmax. When a single logit dominates (k = 1), the distribution collapses to a Dirac mass, the Fisher information becomes singular, and SNR² → 0, rendering detection impossible. Conversely, when at least two logits are tied (k ≥ 2), the Fisher information matrix evaluated on the uniform distribution over the tied tokens diverges, causing SNR² → ∞ for almost any perturbation direction h. This constitutes a phase transition: border inputs create a regime where arbitrarily small parameter changes produce a detectable shift in the token distribution. The authors formalize this in Theorem 3.3 and show that the condition hᵀ(JᵀΣ_MJ)h ≠ 0 holds with probability one for non‑degenerate h.
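The dichotomy is easy to reproduce with a toy softmax; the logit vectors below are illustrative values, not taken from the paper. With a unique top logit the distribution collapses onto one token as τ → 0, while a tie splits the mass evenly over the tied tokens.

```python
import numpy as np

def softmax(logits, tau):
    z = (logits - logits.max()) / tau  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

unique = np.array([2.0, 1.0, 0.5])  # single dominant logit (k = 1)
tied   = np.array([2.0, 2.0, 0.5])  # two tied top logits (k = 2)

# As tau shrinks, `unique` collapses to a Dirac mass on token 0,
# while `tied` converges to a 50/50 split over tokens 0 and 1.
for tau in (1.0, 0.1, 0.01):
    print(tau, softmax(unique, tau).round(4), softmax(tied, tau).round(4))
```

At τ = 0.01 the non-border prompt is effectively deterministic, so no finite number of samples reveals a small perturbation; the border prompt keeps two tokens in play, which is what makes it informative.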

Building on this theory, B3IT proceeds in two stages. In the initialization stage, a large pool of random prompts is sampled at the lowest possible temperature (often τ ≈ 0). Each prompt is queried m times; if more than one distinct token appears, the prompt is flagged as a border input. Because the phase transition guarantees that non‑border inputs always produce the same token, this simple heuristic reliably extracts BIs with minimal API cost. In the detection stage, the stored BIs are re‑queried on the live model at the same low temperature. The observed token frequencies are compared to the baseline using standard multinomial tests (e.g., likelihood‑ratio or χ²). Persistence of the uniform distribution over the original tied tokens suggests no change (H₀), while a shift toward a single token indicates a model update (H₁).

Empirically, the authors evaluate B3IT both in‑vitro on the TinyChange benchmark (synthetic fine‑tuning, quantization, pruning) and in‑vivo on 93 real‑world endpoints spanning 64 models from 20 providers. They report detection accuracy comparable to the best gray‑box methods (e.g., LT) while using roughly 1/30 of the token budget of the strongest black‑box baseline (MET). The cost reduction stems from the fact that only a handful of queries per BI are needed, and the number of BIs required for reliable detection is modest (often < 50). Moreover, the method is robust to different types of model modifications, including LoRA fine‑tuning, weight quantization, and even changes to system prompts or routing logic.

In conclusion, the paper demonstrates that low‑temperature sampling creates a sharp detectability dichotomy that can be exploited without any internal model access. By automatically discovering border inputs and monitoring their token distribution, B3IT provides a practical, low‑cost solution for continuous LLM API monitoring. The work bridges rigorous statistical theory (LAN, Fisher information, phase transition) with scalable engineering, opening avenues for broader applications such as multi‑token monitoring, adaptive temperature selection, and detection of more subtle behavioral shifts beyond token probabilities.

