Information-Theoretic Limits of Safety Verification for Self-Improving Systems
Authors: Arsenios Scrivens
March 2026

Abstract

Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions, requiring $\sum_n \delta_n < \infty$ (bounded risk) and $\sum_n \mathrm{TPR}_n = \infty$ (unbounded utility), and establish a theory of their (in)compatibility. Classification impossibility (Theorem 1): for power-law risk schedules $\delta_n = O(n^{-p})$ with $p > 1$, the practically relevant regime, any classifier-based gate under overlapping safe/unsafe distributions satisfies $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ via Hölder's inequality, forcing $\sum_n \mathrm{TPR}_n < \infty$. This impossibility is exponent-optimal: no valid impossibility bound can use a larger exponent than $\beta^*$ (Theorem 3; full Mills' ratio asymptotics in Appendix A.3). A second, independent proof via the NP counting method (Theorem 4) yields a 13% tighter bound without Hölder's inequality. Scope caveat: for slowly-decaying summable sequences such as $\delta_n = 1/(n \ln^2 n)$ that are not power-law, the Hölder summability argument fails and the NP-optimal classifier may satisfy both dual conditions (§3.1). Theorem 5 covers all schedules. Universal finite-horizon ceiling (Theorem 5): for any summable risk schedule, the exact maximum achievable classifier utility is $U^*(N, B) = N \cdot \mathrm{TPR}_{\mathrm{NP}}(B/N)$, growing as $\exp(O(\sqrt{\log N}))$, i.e., subpolynomially. At $N = 10^6$ with budget $B = 1.0$, a classifier extracts at most $U^* \approx 87$ versus a verifier's $\approx 500{,}000$. This result is universal: it applies even where Theorem 1 does not. Verification escape (Theorem 2): a Lipschitz ball verifier achieves $\delta = 0$ with $\mathrm{TPR} > 0$, escaping the impossibility. Formal Lipschitz bounds for pre-LayerNorm transformers under LoRA (Proposition 3) enable LLM-scale verification.
The separation is strict: classifiers cannot achieve $\delta = 0$ with $\mathrm{TPR} > 0$ under distribution overlap (Proposition 4). We validate on GPT-2 ($d_{\mathrm{LoRA}} = 147{,}456$): conditional $\delta = 0$ with $\mathrm{TPR} = 0.352$; unconditional $\delta = 0$ at $d \le 17{,}408$. Comprehensive empirical validation is in the companion paper [23].

1 Introduction

The prospect of AI systems that improve their own capabilities raises a central safety question: can we build a gate that permits beneficial self-modifications while reliably blocking dangerous ones? We formalize this as a sequential decision problem and provide a comprehensive theoretical characterization of when safe self-improvement is possible and when it is provably impossible. At each step $n$, a system proposes a mutation $\theta_n$ to its own parameters. A safety gate $g_n$ decides whether to accept or reject $\theta_n$, incurring false acceptance rate $\delta_n = P_-(g_n \text{ accepts})$ and true positive rate $\mathrm{TPR}_n = P_+(g_n \text{ accepts})$. We require $\sum_n \delta_n < \infty$ (bounded risk, ensuring positive probability of never accepting an unsafe mutation) and $\sum_n \mathrm{TPR}_n = \infty$ (unbounded utility, ensuring the system accepts infinitely many improvements). We call these the dual conditions for safe self-improvement.

We prove four impossibility and tightness results and four constructive results, forming a complete picture. The impossibility is strongest for power-law risk schedules $\delta_n = O(n^{-p})$ with $p > 1$ (Theorem 1), which subsume all practically relevant risk budgets. For the most favorable slowly-decaying schedules, the asymptotic impossibility does not apply, but the tight finite-horizon ceiling (Theorem 5) ensures classifier utility remains subpolynomial, orders of magnitude below a verifier's linear growth, at any practical deployment horizon.

Why the dual conditions matter for deployed systems. The dual conditions formalize a failure pattern that self-improving systems face by construction.
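The role of the bounded-risk condition can be seen with a short numeric sketch (the two schedules below are illustrative choices, not values from the paper's experiments): a summable schedule leaves the probability of never accepting an unsafe mutation, $\prod_n (1 - \delta_n)$, bounded away from zero, while a non-summable one drives it toward zero.

```python
import numpy as np

def never_accept_prob(deltas):
    """P(no unsafe mutation ever accepted) = prod(1 - delta_n),
    computed in log-space for numerical stability."""
    return float(np.exp(np.sum(np.log1p(-deltas))))

n = np.arange(1, 10**6 + 1)
summable = 0.5 / n**2       # power-law schedule: sum converges
non_summable = 0.5 / n      # harmonic schedule: sum diverges

p_safe_summable = never_accept_prob(summable)
p_safe_harmonic = never_accept_prob(non_summable)
print(p_safe_summable)      # stays bounded away from 0
print(p_safe_harmonic)      # erodes toward 0 as the horizon grows
```

Extending the horizon leaves the first probability essentially unchanged while the second keeps shrinking, which is exactly the distinction Condition 1 encodes.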
Red-team evaluations, which explicitly classify model outputs or parameter mutations as safe/unsafe, are the closest practical analog to the binary gates our theorems address. Failure to satisfy both conditions simultaneously means either that unsafe modifications accumulate ($\sum_n \delta_n$ diverges) or that the system stops improving ($\mathrm{TPR}_n \to 0$). Other deployed mechanisms such as RLHF reward models can be viewed as a suggestive analogy, but our formal results apply specifically to binary gates on parameter mutations; for a fuller discussion see §8.1.

1.1 Contributions

The paper establishes two core results and then systematically corroborates them. (Theorems are numbered 1–5 and Propositions 1–4, with separate counters.)

A note on the nature of the contribution. The per-step bound $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ is a standard f-divergence inequality [27], and the sequential summability consequence follows in a few lines. The proof is short, deliberately so. The contribution of this paper is not the length or technical difficulty of any single proof, but rather: (i) the problem formalization, casting safe self-improvement as dual summability conditions, which has no precedent in the hypothesis testing or AI safety literatures; (ii) the structural consequence, that this elementary coupling creates an impossibility for the safety–utility pairing with no analog in single-test settings; (iii) the tight finite-horizon ceiling (Theorem 5), which provides the exact, universal utility bound for any classifier under any risk schedule; and (iv) the constructive escape via verification, proving the impossibility is specific to classification, not to safe self-improvement itself.

Core results:

1. Classification impossibility (Theorem 1): Any classifier-based gate under distribution overlap satisfies $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$, forcing bounded utility whenever risk follows a power-law schedule $\delta_n = O(n^{-p})$ with $p > 1$.
For slowly-decaying summable sequences (e.g., $\delta_n = 1/(n \ln^2 n)$), the per-step bound still holds but the Hölder summability argument does not force $\sum_n \mathrm{TPR}_n < \infty$; in such edge cases, the NP-optimal classifier can in principle satisfy both dual conditions simultaneously. However, the finite-horizon ceiling (Theorem 5) remains fully operative in all cases, ensuring classifier utility grows at most subpolynomially, far below a verifier's linear growth at any practical horizon.

2. Verification escape (Theorem 2): Sound verification gates achieve $\delta = 0$ with $\mathrm{TPR} > 0$, escaping the impossibility. The Lipschitz ball verifier is the simplest example; the structural separation (Proposition 4) proves the gap is architectural, not a matter of degree. See Figure 1 for an overview of the two gate architectures; Figure 6 visualizes the structural separation in the $(\delta, \mathrm{TPR})$ plane.

3. Tight finite-horizon ceiling (Theorem 5): Perhaps the most practically consequential result. For any summable risk schedule, including non-power-law sequences where the Hölder summability argument does not apply, the exact maximum achievable utility is $U^*(N, B) = N \cdot \mathrm{TPR}_{\mathrm{NP}}(B/N)$, growing as $\exp(O(\sqrt{\log N}))$. This is subpolynomial, $13\times$ tighter than the MI bound, and ensures that no classifier under any risk schedule can match a verifier's linear utility growth. Unlike Theorem 1, this result is universal over all summable risk schedules and immediately operational at any finite deployment horizon. See Figure 4.

Tightness and corroboration (confirming the impossibility is robust, not an artifact of one proof technique):

4. Exponent-optimality (Theorem 3): The Hölder exponent $\beta^*$ is minimax-optimal: no valid impossibility bound can use a strictly larger exponent. At deployment-relevant $\delta$, the NP classifier operates within one order of magnitude of the ceiling (Appendix D.1). See Figure 3.
5. NP counting impossibility (Theorem 4): An independent proof via the Neyman–Pearson lemma and Tonelli's theorem, avoiding Hölder's inequality entirely. The counting bound is 13% tighter than the Hölder bound at $\Delta_s = 1.0$, $p = 2.0$.

Supporting results (extending the theory to information-theoretic, sample complexity, and LLM-scale settings):

6. Information-theoretic bound (Proposition 1): $\sum_{n=1}^{N} \mathrm{TPR}_n \le \sum_{n=1}^{N} \delta_n + \sqrt{2 N I_0}$. Complements the Hölder bound via mutual information.

7. Sample complexity barrier (Proposition 2): Requires $\Omega(n^{2p})$ labeled examples by step $n$; under constant label generation, sample starvation occurs at finite $n_{\mathrm{fail}}$.

8. Formal transformer Lipschitz bounds (Proposition 3): Closed-form Lipschitz constants for pre-LayerNorm transformers under LoRA, enabling LLM-scale verification.

9. Structural separation (Proposition 4): Under absolute continuity, $\delta = 0 \implies \mathrm{TPR} = 0$ for classifiers, but verifiers achieve $\delta = 0$ with $\mathrm{TPR} > 0$.

10. LLM-scale mechanism validation: Ball verifier on GPT-2 (124M parameters) with LoRA rank 4 ($d_{\mathrm{LoRA}} = 147{,}456$), achieving conditional $\delta = 0$ (conditional on estimated Lipschitz constants) with $\mathrm{TPR} = 0.352$ (§7.1); unconditional $\delta = 0$ at $d \le 17{,}408$ via analytical bounds.

Theorems 3–5 and Propositions 1–4 are corroborative, each confirming the impossibility from a different angle to establish robustness.

1.2 Related Work

Our mathematical tools (Hölder's inequality, Rényi divergence, Lipschitz continuity, Neyman–Pearson testing) are well-established. The per-step bound $\mathrm{TPR} \le C_\alpha \cdot \delta^{\beta}$ is a standard f-divergence inequality [27], and NP optimality [16] establishes single-test ROC tradeoffs.
Our contribution is the problem formalization (the dual conditions as a formal specification of safe self-improvement) and the structural result that sequential composition under dual summability conditions creates an impossibility with no analog in single-test settings. The per-step bound and the summability requirements are individually standard; the coupling, that bounded $\sum_n \delta_n$ forces bounded $\sum_n \mathrm{TPR}_n$ for power-law risk schedules, is not, and is confirmed by two independent impossibility proofs (Theorems 1, 4) corroborated by three complementary bounds (Theorem 5, Propositions 1–2) approaching the same conclusion from different angles (§3, §6, Appendix C). We build on alignment theory [2, 6, 7], hypothesis testing [13], impossibility results [21, 30], information-theoretic bounds [19, 20], PAC-Bayes and VC theory [15, 28], adversarial robustness tradeoffs [11, 26], and transformer Lipschitz analysis [8, 12, 29]. Our dual conditions formalize the alignment tax, the cost of making models safe versus capable [3, 17], as a precise mathematical tradeoff: the Hölder coupling $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ quantifies the exact rate at which safety constraints reduce utility under classification-based gates. Structurally, our result is closer to mechanism-design impossibilities (Gibbard [10] and Satterthwaite [22] show that no voting rule can simultaneously satisfy multiple natural axioms, just as no classifier can simultaneously satisfy our dual conditions) than to no-free-lunch theorems [30], which concern the absence of a universally optimal learner rather than a hard tradeoff in a fixed domain. Multi-objective optimization impossibilities [18] also exhibit this flavour: two desiderata in conflict cannot be jointly optimized in polynomial time, analogously to how our two summability conditions cannot be jointly satisfied by a classifier under distribution overlap.
A detailed comparison with each line of work is in Appendix B. An analogy clarifies the contribution. Arrow's impossibility theorem composes elementary social-choice axioms (transitivity, non-dictatorship, independence), each individually obvious, yet their composition yields a deep impossibility no voting system can escape. Similarly, our per-step bound is a standard f-divergence inequality and the dual conditions are individually natural, but the coupling, that summability of $\{\delta_n\}$ forces summability of $\{\mathrm{TPR}_n\}$, creates a structural impossibility with no analog in single-test hypothesis testing (see Appendix B.1 for a full discussion).

Online learning and adaptive gates. A natural question is whether an online learner with sublinear regret (e.g., online convex optimization; Shalev-Shwartz [24]) could adaptively satisfy the dual conditions. The answer is no under our framework: Theorem 1's per-step bound $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ constrains any binary decision rule at each step, regardless of whether it was chosen adaptively based on previous observations. Online learning can reduce misclassification regret, but cannot escape the Hölder coupling between $\delta_n$ and $\mathrm{TPR}_n$ that drives the impossibility. The gate's adaptivity affects which point on the per-step ROC curve it selects, not the curve itself.

2 Problem Setup

Notation.
The following symbols recur throughout:

$\delta_n$: false acceptance rate of the gate at step $n$
$\mathrm{TPR}_n$: true positive rate of the gate at step $n$
$C_\alpha, \beta$: Hölder constants, $C_\alpha = \exp(\beta \cdot D_\alpha)$ and $\beta = (\alpha - 1)/\alpha$
$D_\alpha(P_+ \| P_-)$: Rényi divergence of order $\alpha$
$\Delta_s$: distribution separation $\|\mu_+ - \mu_-\|/\sigma$
$B$: total risk budget $\sum_{n=1}^{N} \delta_n$
$U^*(N, B)$: exact finite-horizon utility ceiling (Theorem 5)
$L$: Lipschitz constant of the trajectory map
$m$: safety margin (minimum distance to the obstacle boundary)
$r$: ball radius $= m/L$
$\sigma^*$: optimal mutation scale for positive TPR

2.1 System Model

A self-improving system maintains parameters $\theta \in \mathbb{R}^d$ and iteratively proposes mutations $\theta_1, \theta_2, \ldots$ At each step $n$, a safety oracle $S : \mathbb{R}^d \to \{\text{safe}, \text{unsafe}\}$ determines ground truth. The oracle is assumed expensive (requiring exhaustive simulation or formal proof). A safety gate $g_n : \mathbb{R}^d \to \{\text{accept}, \text{reject}\}$ filters mutations before deployment. The gate is computationally cheaper than the oracle.

2.2 Classification Gates

A classification gate maps controller parameters to a low-dimensional feature representation $\phi : \mathbb{R}^d \to \mathbb{R}^k$ and applies a binary classifier. The relevant distributions are:

$P_n^+$ = distribution of $\phi(\theta) \mid S(\theta) = \text{safe}$  (1)
$P_n^-$ = distribution of $\phi(\theta) \mid S(\theta) = \text{unsafe}$  (2)

At step $n$, the gate accepts if the classifier predicts "safe" based on $\phi(\theta_n)$.

Stationarity assumption. The core theorems (§3–§6) assume stationary distributions $P_+, P_-$ (dropping the $n$ subscript). The non-stationary extension (Proposition C.2, Appendix C.2) requires $\sup_n D_\alpha^{(n)} < \infty$ and, for power-law schedules $\delta_n = c/n^p$, the strictly stronger condition $p > \alpha$ (vs. $p > 1$ in the stationary case). This gap narrows as $\alpha \to 1^+$ and vanishes for fast-decaying schedules ($p \gg \alpha$), which cover all practically relevant risk budgets.
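As a concrete toy instance of the classification-gate model above, the following sketch samples overlapping one-dimensional Gaussian features for safe and unsafe mutations and sweeps a threshold gate; the separation $\Delta_s = 1.0$, the sample size, and the thresholds are illustrative choices, not values from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
delta_s = 1.0        # separation between safe and unsafe feature means
n_samples = 200_000

# Overlapping 1-D feature distributions: P+ = N(delta_s, 1), P- = N(0, 1)
safe_feats = rng.normal(delta_s, 1.0, n_samples)
unsafe_feats = rng.normal(0.0, 1.0, n_samples)

def gate_rates(threshold):
    """Threshold gate 'accept iff phi(theta) > t': returns (delta, TPR)."""
    delta = np.mean(unsafe_feats > threshold)   # false acceptance rate
    tpr = np.mean(safe_feats > threshold)       # true positive rate
    return delta, tpr

# Tightening the threshold drives delta down, but TPR falls with it:
for t in (0.0, 1.0, 2.0, 3.0):
    d, tpr = gate_rates(t)
    print(f"t={t}: delta={d:.4f}, TPR={tpr:.4f}")
```

Because the two densities overlap everywhere, no threshold attains $\delta = 0$ with $\mathrm{TPR} > 0$; this is the overlap structure that the impossibility results exploit.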
For deployment arguments where stationarity may not hold and $p$ is moderate, the finite-horizon ceiling (Theorem 5) provides a stationarity-free alternative: it bounds total classifier utility over any $N$-step horizon given a risk budget $B$.

Scope: continuous parameter spaces. Our results assume $\theta \in \mathbb{R}^d$ with continuous mutation distributions, so that $P_+ \ll P_-$ (absolute continuity) holds via the transversality argument (§3.1). For discrete or quantized parameter spaces, such as quantized LoRA fine-tuning with integer-valued weights, absolute continuity does not hold in the same form, and the impossibility may not apply directly. We note that even in quantized settings, the effective parameter updates are typically computed in full precision before rounding, and the induced distributions on the quantized grid can still exhibit the overlap structure that drives our results; a formal treatment of the discrete case is left to future work.

2.3 Verification Gates

A verification gate attempts to construct a mathematical proof that $\theta$ is safe. If the proof succeeds, the gate accepts; otherwise it rejects. A key property:

Definition (Soundness). A verification gate is sound if every accepted $\theta$ is actually safe: $g(\theta) = \text{accept} \implies S(\theta) = \text{safe}$.

Soundness implies $\delta_n = 0$ for all $n$, by construction, not by learning.

2.4 The Dual Conditions

Definition. A safety gate achieves safe self-improvement if:

1. $\sum_{n=1}^{\infty} \delta_n < \infty$ (bounded cumulative risk)
2. $\sum_{n=1}^{\infty} \mathrm{TPR}_n = \infty$ (unbounded cumulative utility)

Condition 1 ensures the system is almost surely safe over infinitely many steps ($\prod_n (1 - \delta_n) > 0$ by convergence of the infinite product). Condition 2 prevents vacuous safety (a gate rejecting everything trivially satisfies Condition 1 but accomplishes nothing).

On the choice of $\sum_n \mathrm{TPR}_n = \infty$.
This is the weakest possible non-vacuity condition: it requires only that the system eventually accepts infinitely many improvements, with no constraint on the rate or timing. Any finite threshold ("$N_0$ accepted modifications suffice") is arbitrary and deployment-dependent; a self-improving system has no natural stopping point, and any fixed $N_0$ can be exceeded by extending the deployment horizon. The condition is also necessary in the following sense: if $\sum_n \mathrm{TPR}_n < \infty$, the expected number of accepted improvements is finite, meaning the system almost surely stops self-improving after finitely many steps; it becomes a fixed system with a safety gate that rejects everything beyond some horizon. Critically, readers who reject the asymptotic framing lose nothing from the theory. Theorem 5 provides the exact finite-horizon utility ceiling $U^*(N, B) = N \cdot \mathrm{TPR}_{\mathrm{NP}}(B/N)$ for any finite $N$ and risk budget $B$, without requiring $N \to \infty$. At $N = 10^6$ steps with $B = 1.0$, a classifier extracts at most $U^* \approx 87$ utility versus a verifier's $\approx 500{,}000$, a $5{,}700\times$ gap. The dual conditions framework is immediately operational at any finite horizon; the asymptotic condition simply states the limiting case.

3 The Classification Impossibility

3.1 Hölder–Rényi Bound (Theorem 1)

Theorem 1 (Safety–Utility Impossibility). Let $P_+, P_-$ be distributions on $\mathbb{R}^k$ with $P_+ \ll P_-$ (absolute continuity). Suppose $D_{\alpha_0}(P_+ \| P_-) < \infty$ for some $\alpha_0 > p/(p-1)$. Then for any sequence of binary classifiers with false acceptance rates $\delta_n \le c/n^p$ for some $c > 0$, $p > 1$:
$$\sum_{n=1}^{\infty} \delta_n < \infty \implies \sum_{n=1}^{\infty} \mathrm{TPR}_n < \infty.$$
That is, bounded cumulative risk under any power-law schedule forces bounded cumulative utility.

Proof. Step 1. Let $A_n = \{x : g_n(x) = \text{accept}\}$. Then:
$$\mathrm{TPR}_n = \int_{A_n} dP_+ = \int_{A_n} \frac{dP_+}{dP_-}\, dP_-.$$
Apply Hölder's inequality with exponents $\alpha > 1$ and $\alpha' = \alpha/(\alpha - 1)$:
$$\mathrm{TPR}_n \le \left( \int_{A_n} \left( \frac{dP_+}{dP_-} \right)^{\alpha} dP_- \right)^{1/\alpha} \cdot \left( \int_{A_n} dP_- \right)^{(\alpha - 1)/\alpha}.$$
Step 2. Bound the first factor by extending the integration domain:
$$\left( \int_{A_n} \left( \frac{dP_+}{dP_-} \right)^{\alpha} dP_- \right)^{1/\alpha} \le \left( \int_{\mathbb{R}^k} \left( \frac{dP_+}{dP_-} \right)^{\alpha} dP_- \right)^{1/\alpha} = \exp\!\left( \frac{\alpha - 1}{\alpha} D_\alpha(P_+ \| P_-) \right),$$
using the definition $D_\alpha(P_+ \| P_-) = \frac{1}{\alpha - 1} \log \int (dP_+/dP_-)^{\alpha}\, dP_-$.
Step 3. Setting $\beta = (\alpha - 1)/\alpha \in (0, 1)$ and $C_\alpha = \exp(\beta \cdot D_\alpha)$:
$$\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}.$$
Step 4. If $\delta_n = c/n^p$ with $p > 1$ (summable), then $\sum_n \mathrm{TPR}_n \le C_\alpha c^{\beta} \sum_n n^{-p\beta}$, which converges iff $p\beta > 1$. Choose $\alpha \in (p/(p-1), \alpha_0)$ (valid since $\alpha_0 > p/(p-1)$ by hypothesis), ensuring $p\beta > 1$ and $D_\alpha < \infty$. □

Scope and limitations of Theorem 1. The impossibility is established for power-law risk schedules $\delta_n = O(n^{-p})$ with $p > 1$, which subsume all practically relevant risk budgets (geometric, polynomial, or faster decay). For slowly-decaying summable sequences (e.g., $\delta_n = 1/(n \ln^2 n)$), the per-step bound $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ still holds at each step, but $\sum_n C_\alpha \delta_n^{\beta}$ can diverge because $\beta < 1$; the Hölder exponent cannot compensate for the slow decay. In such edge cases, the NP-optimal classifier can in principle satisfy both dual conditions simultaneously, and the asymptotic impossibility does not apply. This is an inherent limitation of the Hölder-based proof technique, not an artifact of our analysis. However, the practical significance of this gap is limited: the finite-horizon ceiling (Theorem 5) remains fully operative for all summable schedules, including these edge cases. Even under the most favorable slowly-decaying schedule, total classifier utility grows at most as $\exp(O(\sqrt{\log N}))$, subpolynomially, while a verifier's utility grows linearly as $\Theta(N)$ (see §6 for exact bounds). The impossibility is therefore sharp for power-law schedules; the finite-horizon gap is universal.
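Both halves of this picture can be checked numerically for the Gaussian pair $P_+ = N(\Delta_s, 1)$, $P_- = N(0, 1)$, for which $D_\alpha(P_+ \| P_-) = \alpha \Delta_s^2/2$ in closed form. The sketch below (parameter choices are illustrative) verifies the per-step bound against the NP curve and illustrates why the summability argument fails for $\delta_n = 1/(n \ln^2 n)$:

```python
import numpy as np
from scipy.stats import norm

delta_s = 1.0
alpha = 3.0                        # any alpha > 1 works; D_alpha is finite
beta = (alpha - 1) / alpha
D_alpha = alpha * delta_s**2 / 2   # Renyi divergence, N(delta_s,1) vs N(0,1)
C_alpha = np.exp(beta * D_alpha)

# Per-step bound: the NP-optimal TPR never exceeds C_alpha * delta^beta.
deltas = np.logspace(-8, -1, 50)
tpr_np = norm.cdf(norm.ppf(deltas) + delta_s)
assert np.all(tpr_np <= C_alpha * deltas**beta)

# Summability: for delta_n = n^{-2}, the bounding series sum n^{-2*beta}
# converges (2*beta = 4/3 > 1). For the slowly-decaying schedule
# delta_n = 1/(n ln^2 n), sum delta_n converges but sum delta_n^beta does
# not, so the Holder argument yields no impossibility there.
n = np.arange(3, 10**6)
slow = 1.0 / (n * np.log(n) ** 2)
print(slow.sum())          # partial sums stay bounded as the horizon grows
print((slow**beta).sum())  # partial sums grow without bound
```

The last two lines show the gap directly: the risk series is summable while its $\beta$-powered majorant is not, which is exactly the edge case deferred to Theorem 5.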
Two independent impossibility proofs (Theorems 1 and 4) and the exact finite-horizon ceiling (Theorem 5), supported by the information-theoretic rate bound (Proposition 1) and the sample complexity barrier (Proposition 2), confirm that the classification ceiling is robust and fundamental, not an artifact of any single proof technique.

Remark (On the per-step bound). The per-step bound $\mathrm{TPR}_n \le C_\alpha \cdot \delta_n^{\beta}$ is a standard f-divergence inequality [27]. The contribution is sequential composition: under summability of $\{\delta_n\}$, this elementary bound forces $\sum_n \mathrm{TPR}_n < \infty$. Two independent proofs (Theorems 1 and 4) and three complementary bounds (Theorem 5, Propositions 1–2) confirm the coupling is robust and technique-independent.

Remark (Necessity of Distribution Overlap). The assumption $P_+ \ll P_-$ is structurally unavoidable: (i) if safe and unsafe modifications were perfectly separable, the indicator $\mathbf{1}_{\mathrm{supp}(P_+)}$ would be a zero-error oracle and no gate would be needed; (ii) under full-support mutations and smooth safety boundaries, transversality ensures every feature-space neighborhood contains both safe and unsafe pre-images (see Appendix C for the full geometric argument); (iii) when the safety boundary is piecewise smooth and $\mu$ is Gaussian, $D_\alpha(P_+ \| P_-) < \infty$ in a neighborhood of 1. Empirical confirmation: across three systems in [23], measured $\Delta_s \in [0.059, 0.091]$, well below the separability threshold.

3.2 Exponent-Optimality (Theorem 3)

Theorem 3 (Exponent-Optimality of Hölder Bound).
For Gaussian distributions $P_+ = N(\mu, I_k)$ and $P_- = N(0, I_k)$ with separation $\Delta_s = \|\mu\|$, the Neyman–Pearson optimal classifier achieves $\mathrm{TPR}_{\mathrm{NP}}(\delta) = \Phi(\Phi^{-1}(\delta) + \Delta_s)$, and the Hölder exponent $\beta^* = (\alpha^* - 1)/\alpha^*$ (with $\alpha^* = 1 + 2/\Delta_s^2$) is minimax-optimal:

(i) No bound $\mathrm{TPR} \le C' \cdot \delta^{\gamma}$ with $\gamma > \beta^*$ is valid uniformly over $\mathcal{P}(D, \alpha) = \{(P_+, P_-) : D_\alpha(P_+ \| P_-) \le D\}$.

(ii) The ratio $\mathrm{TPR}_{\mathrm{NP}}(\delta) / (C_{\alpha^*} \cdot \delta^{\beta^*}) \to 0$ as $\delta \to 0$ (the NP classifier decays faster than the bound; Appendix A.3), but at deployment-relevant $\delta \in [10^{-6}, 10^{-1}]$, the ratio ranges from 0.1 to 0.9 (Appendix D.1).

Proof sketch. The NP likelihood-ratio test $\mu^{T} x \gtrless t_\delta$ yields $\mathrm{TPR}_{\mathrm{NP}}(\delta) = \Phi(\Phi^{-1}(\delta) + \Delta_s)$. Asymptotic analysis via Mills' ratio (Appendix A.3) shows the NP classifier's log-exponent matches $\beta^*$ as an upper envelope: no valid impossibility bound can use a larger exponent. At finite $\delta$ values relevant to deployment, $\mathrm{TPR}_{\mathrm{NP}}/\text{Hölder}$ ranges from $\approx 0.1$ (at $\Delta_s = 2.0$) to $\approx 0.9$ (at $\Delta_s = 0.1$); see Appendix D.1.

Corollary 1 (Minimax Optimality). The exponent $\beta^*$ is minimax-optimal over $\mathcal{P}(D, \alpha) = \{(P_+, P_-) : D_\alpha(P_+ \| P_-) \le D\}$; any valid impossibility bound satisfies $f(\delta) = \Omega(\delta^{\beta^*})$. The bound is also tight for non-Gaussian distributions: across 8 families (Laplace, Student-$t$, Gaussian mixture), the NP classifier achieves 28–70% of the Hölder ceiling (Appendix D.7).

3.3 NP Counting Impossibility (Theorem 4)

We provide a fundamentally different proof of the classification impossibility that avoids Hölder's inequality and Rényi divergence entirely, using only the Neyman–Pearson lemma and Tonelli's theorem.

Theorem 4 (NP Counting Impossibility). Let $P_+ \ll P_-$ with $D_\alpha(P_+ \| P_-) < \infty$ for some $\alpha > 1$.
For any summable risk schedule $\delta_n = c/n^p$ with $p > 1$ and any sequence of classifiers:
$$\sum_{n=1}^{\infty} \mathrm{TPR}_n \le c^{1/p} \cdot \mathbb{E}_{P_+}\!\left[ P_-(L > L(X))^{-1/p} \right] < \infty,$$
where $L(x) = dP_+/dP_-(x)$ is the likelihood ratio.

Proof sketch. (1) By NP optimality, $\mathrm{TPR}_n \le \mathrm{TPR}_{\mathrm{NP}}(\delta_n)$. (2) Define the counting function $N(\ell) = |\{n : t_{\delta_n} < \ell\}|$, where $t_{\delta_n}$ is the NP acceptance threshold at level $\delta_n$; Tonelli's theorem gives $\sum_n \mathrm{TPR}_{\mathrm{NP}}(\delta_n) = \mathbb{E}_{P_+}[N(L(X))]$. (3) Bound $N(\ell) \le (c/P_-(L > \ell))^{1/p}$. (4) Finiteness via p-value density integrability. Full proof in Appendix A.8.

The counting bound is strictly tighter than the Hölder bound: 1.76 vs. 2.03 at $\Delta_s = 1.0$, $p = 2.0$ (a 13% improvement). See Appendix D.7 for complete validation including non-Gaussian distributions.

Two additional supporting results are in the appendix: the information-theoretic finite-horizon bound (Proposition 1, Appendix C.3), which constrains the rate of utility accumulation via mutual information ($\sum_n \mathrm{TPR}_n \le \sum_n \delta_n + \sqrt{2 N I_0}$); and the sample complexity barrier (Proposition 2, Appendix C.4), which shows that learning a gate satisfying the dual conditions requires exponentially growing training sets, independent of Theorem 1. The Gaussian specialization (Appendix C.1) and the non-stationary extension with self-correcting structure (Appendix C.2) provide additional theoretical depth.

4 The Verification Escape

4.1 Statement (Theorem 2)

Theorem 2 (Verification Escape). There exists a verification-based gate achieving:

• $\delta_n = 0$ for all $n$ (zero false acceptance)
• $\sum_n \mathrm{TPR}_n = \infty$ (unbounded utility)

4.2 Construction: Lipschitz Ball Verifier

Let $\theta_0$ be a controller verified safe on a defined operating domain $\mathcal{D} = \{(s_i, t_i)\}_{i=1}^{M}$ of $M$ start–target scenarios.
Let $m > 0$ be the safety margin: the minimum distance to any obstacle across all scenarios:
$$m = \min_{i \in [M]} \min_{t \in [0, T]} d\big(\mathrm{traj}_{\theta_0}(t; s_i, t_i),\ \text{obstacles}\big).$$
Let $L$ be a (conservative) Lipschitz constant for the closed-loop trajectory map with respect to controller parameters:
$$\sup_{(s, t) \in \mathcal{D}} \|\mathrm{traj}_{\theta}(s, t) - \mathrm{traj}_{\theta_0}(s, t)\|_{\infty} \le L \cdot \|\theta - \theta_0\|.$$
Ball verifier: accept $\theta$ iff $\|\theta - \theta_0\| < r$, where $r = m/L$.

Soundness. If $\|\theta - \theta_0\| < r$, then the trajectory deviation is strictly less than $L \cdot r = m$. Since $\theta_0$ has margin $m$, the trajectory of $\theta$ stays at distance $> 0$ from all obstacles on all scenarios in $\mathcal{D}$. No unsafe controller is accepted.

Remark. The construction uses standard tools (Lipschitz continuity, the triangle inequality; cf. Berkenkamp et al. [5]). The contribution is the quantified dichotomy within the dual conditions framework: the same safety task is provably impossible for any classifier (Theorem 1) yet achievable by a simple $O(d)$ verifier. Classifiers face a ceiling of $\exp(O(\sqrt{\log N}))$ utility (Theorem 5) while verifiers achieve $\Theta(N)$; the structural separation (Proposition 4) establishes that this gap is architectural, not a matter of degree.

Coverage: For mutations $\Delta\theta \sim N(0, \sigma^2 I_d)$:
$$\mathrm{TPR} = P(\|\Delta\theta\| < r) = P\big(\chi^2_d < r^2/\sigma^2\big).$$
This is bounded away from zero when $r \ge \sigma\sqrt{d}$, i.e., when $m/(L\sigma\sqrt{d}) \ge 1$. For smaller mutation scales $\sigma$, $\mathrm{TPR} \to 1$.

Computational cost: The gate checks $\|\theta - \theta_0\| < r$, an $O(d)$ operation. No simulation, no oracle calls.

4.3 Domain-Restricted Safety

The verifier guarantees $\mathcal{D}$-safety: safety on a defined operating domain $\mathcal{D}$, standard in formal methods (DO-178C, ISO 26262). A natural objection is that this is weaker than what classifiers attempt: generalization to unseen scenarios. We argue the comparison is more symmetric than it appears.

Classifiers are equally domain-restricted, but probabilistically.
Classical generalization bounds [4, 28] guarantee classifier accuracy only on the training distribution $P$: for a classifier with VC-dimension $h$, the empirical risk converges to the true risk at rate $O(\sqrt{h \log(n)/n})$ on samples from $P$. If the operating domain shifts (new obstacle configurations, new task distributions), the classifier requires retraining on the new distribution to maintain its guarantees. This is the statistical analogue of the verifier's geometric domain restriction.

The guarantee types differ structurally. The verifier's domain restriction is deterministic: for all $\theta \in B(\theta_0, r)$ and all scenarios in $\mathcal{D}$, safety holds with certainty ($\delta = 0$). The classifier's domain restriction is probabilistic: for most $\theta$ drawn from the training distribution, the classifier's prediction is correct with probability $1 - \epsilon$. The verifier provides a certificate; the classifier provides a statistical estimate. Both require re-validation if the domain changes, but the verifier's guarantee within its domain is exact while the classifier's is approximate.

Both gates face the same test. In the dual conditions framework, both gates are evaluated on mutations from the same distributions $P_+, P_-$; neither has access to out-of-distribution mutations. The Theorem 1 impossibility applies to any binary gate operating on these distributions, regardless of how the gate was trained or whether it generalizes beyond them. The comparison in Theorem 5 (classifier utility $\exp(O(\sqrt{\log N}))$ vs. verifier utility $\Theta(N)$) holds within the shared operating domain.

Formal transformer Lipschitz bounds under LoRA perturbation (Proposition 3) are stated in Appendix C.5 (full derivation in A.7), enabling compositional verification at LLM scale.

5 The Separation Principle (Proposition 4)

Proposition 4 (Structural Classification–Verification Separation). Under $P_+ \ll P_-$:

(i) For any classifier, $\delta = 0 \implies \mathrm{TPR} = 0$.
(ii) There exists a verification gate with $\delta = 0$ and $\mathrm{TPR} > 0$.

(iii) The separation is strict: as $\delta \to 0$, classifiers satisfy $\mathrm{TPR} \to 0$ (Theorem 1), while the verifier maintains constant $\mathrm{TPR}_V > 0$ at $\delta_V = 0$ (Theorem 2).

Proof. (i) If $P_-(A) = 0$, absolute continuity gives $P_+(A) = 0$. (ii) The ball $B(\theta_0, r)$ has $\delta = 0$ (Theorem 2) and positive Gaussian mass. (iii) By Theorem 1, $\mathrm{TPR}_{\mathrm{class}}(\epsilon) \le C_\alpha \epsilon^{\beta} \to 0$, while $\mathrm{TPR}_V > 0$ is independent of $\epsilon$. □

Comprehensive experimental validation across 18 classifier configurations ([23] §4.1–4.3), MuJoCo benchmarks ([23] §4.5), and LLM-scale ball chaining ([23] §5.7) is presented in the companion paper [23].

6 Finite-Horizon Analysis

For practical deployment over $N$ steps with risk budget $B = \sum_n \delta_n$, we establish the exact utility ceiling.

6.1 Tight Finite-Horizon Ceiling (Theorem 5)

The Hölder–Jensen ceiling $C_\alpha \cdot N^{1-\beta} \cdot B^{\beta}$ (Appendix C.6) is not tight: it applies Hölder's inequality to each step individually and then uses Jensen to optimize allocation. By using the exact NP curve directly, we obtain the tight ceiling.

Theorem 5 (Tight Finite-Horizon Ceiling). For $N$-step deployment with total risk budget $B = \sum_{n=1}^{N} \delta_n$, the exact maximum achievable utility is:
$$U^*(N, B) = N \cdot \mathrm{TPR}_{\mathrm{NP}}(B/N),$$
where $\mathrm{TPR}_{\mathrm{NP}}(\delta) = \Phi(\Phi^{-1}(\delta) + \Delta_s)$ is the Neyman–Pearson optimal TPR. The optimal allocation is uniform, $\delta_n = B/N$ (by concavity of the NP curve and Jensen's inequality). For Gaussian distributions with separation $\Delta_s$, the exact growth rate is:
$$U^*(N, B) = \Theta\!\left( \frac{\exp\!\big( \Delta_s \sqrt{2 \ln(N/B)} \big)}{\sqrt{\ln(N/B)}} \right),$$
which is subpolynomial: $U^*(N, B) = o(N^{\epsilon})$ for every $\epsilon > 0$.

Proof. Step 1 (NP ceiling per step). By the Neyman–Pearson lemma, any classifier at level $\delta_n$ satisfies $\mathrm{TPR}_n \le \mathrm{TPR}_{\mathrm{NP}}(\delta_n)$. For Gaussians, $\mathrm{TPR}_{\mathrm{NP}}(\delta) = \Phi(\Phi^{-1}(\delta) + \Delta_s)$.

Step 2 (Optimal allocation).
The ROC curve δ ↦ TPR_NP(δ) is concave (a standard property of NP classifiers under continuous likelihood ratios). By Jensen's inequality, for any non-negative δ₁, …, δ_N with ∑ δ_n = B:

    ∑_{n=1}^N TPR_NP(δ_n) ≤ N · TPR_NP( (1/N) ∑_{n=1}^N δ_n ) = N · TPR_NP(B/N).

Equality is achieved iff δ_n = B/N for all n (uniform allocation).

Step 3 (Asymptotic growth). Setting δ = B/N, as N → ∞ with B fixed we have δ → 0, and by Mills' ratio:

    TPR_NP(B/N) = Φ(Φ⁻¹(B/N) + Δ_s) ∼ e^{−z²/2} / (z √(2π)),  where z = √(2 ln(N/B)) − Δ_s.

Thus U*(N, B) = N · TPR_NP(B/N) grows as exp(Δ_s √(2 ln(N/B))) / √(ln(N/B)), which is ω(log^k N) for all k but o(N^ϵ) for all ϵ > 0.

Comparison of bounds (Gaussian, Δ_s = 1.0, B = 1.0):

N    | Exact ceiling U* | MI bound (√N) | Hölder–Jensen | Improvement
10^2 | 9.24             | 21.0          | 12.6          | 2.3×, 1.4×
10^3 | 18.3             | 98.6          | 27.2          | 5.4×, 1.5×
10^4 | 32.7             | 436           | 58.6          | 13×, 1.8×
10^5 | 54.8             | 1835          | 126           | 33×, 2.3×
10^6 | 87.2             | 7463          | 272           | 86×, 3.1×

The exact ceiling grows as exp(O(√log N)), vastly slower than √N (MI bound) or N^{1−β} (Hölder–Jensen). At N = 10^6, the MI bound is 86× loose and the Hölder–Jensen ceiling is 3.1× loose.

Remark. Theorem 5 is 13×–86× tighter than Proposition 1's √N bound at N = 10^4–10^6. Proposition 1 provides complementary distribution-free guarantees. The classifier and verifier regions are disconnected on the δ = 0 hyperplane (Appendix C.7).

7 Validation Summary

We validate each theoretical result through targeted computations and experiments. Full details for each validation are in Appendix D; validation script specifications are in Appendix F. Comprehensive experimental validation — including MuJoCo continuous control ([23] §4.5, §5.4–5.5), ball chaining ([23] §5.4), and LLM-scale deployment ([23] §5.7) — is presented in the companion paper [23].

Result           | Validation                        | Key Metric            | Outcome
Thm 1 (Hölder)   | NP clf vs. bound, 4 seps.         | TPR_NP/Hölder ratio   | 0.1–0.9 (valid, tight)
Thm 3 (Exp.-opt.)| 8 non-Gaussian families           | Min NP/Hölder ratio   | 0.28–0.70 (within 1 OOM)
Prop 1 (MI bound)| Hölder vs. MI, per-step & cumul.  | Tighter bound         | Hölder for δ < 0.1; MI compl.
Prop 2 (Sample)  | Retrain, d_VC = 11                | ∑δ at starvation      | 41.17 (diverges)
Prop 3 (Transf. L)| 4 archs., Toy–Qwen-7B            | Steps in ball         | 2.3–11.6 (non-vacuous)
Thm 2 (Ball ver.)| LTC d = 240, 200 tests            | False accepts         | 0 (δ = 0)
Thm 1 (Trained)  | 4 clfs, 50K, 72 configs           | Hölder violations     | 0/72
Thm 4 (Counting) | 9 (Δ_s, p) configs                | Counting vs. Hölder   | 13% tighter at Δ_s = 1, p = 2
Thm 5 (Ceiling)  | N up to 10^6, 5 seps.             | MI bound looseness    | 4×–86×

7.1 LLM-Scale Mechanism Validation: GPT-2 with LoRA

We include a single LLM-scale validation as a bridge result connecting the 240-dimensional LTC demonstration (Appendix D.8) to industrial LLM systems; comprehensive LLM-scale experiments (Qwen2.5-7B, 7.6B parameters) are presented in [23] §5.7. We validate the Lipschitz ball verifier (Theorem 2 + Proposition 3) on GPT-2 (124M parameters) with LoRA fine-tuning. Proposition 3 establishes that pre-LayerNorm transformers under LoRA perturbation have finite, closed-form Lipschitz constants; the specific numeric values below are estimated via finite differences with a 5× safety factor (not derived analytically), as is standard for practical deployment (see [23] §6.3, limitation 2).

Setup. GPT-2 is equipped with LoRA rank-4 adapters on the c_attn (fused QKV) projection in all 12 layers, yielding d_LoRA = 147,456 trainable parameters (0.12% of 124M total). The model is fine-tuned for 30 steps on WikiText-2 with learning rate 5 × 10⁻⁴. Safety is defined as perplexity on a held-out validation set < 2× the fine-tuned perplexity.

Lipschitz estimation.
We probe the perplexity function at 7 perturbation scales proportional to ∥θ₀∥ (from 0.1% to 50% of the parameter norm), with 100 random directional probes. The estimated Lipschitz constant (with 5× safety factor) is L = 0.168.

Ball radius. With margin m = 16.31 (threshold 32.6, achieved perplexity 16.3) and L = 0.168, the raw radius m/L ≈ 97 is capped at 0.5 · ∥θ₀∥, giving r = 2.53 for a meaningful demonstration.

Metric | Value
LoRA dimension d | 147,456
LoRA rank | 4
Post-finetune perplexity | 16.3
Safety threshold | 32.6
Lipschitz constant L | 0.168 (5× safety)
Ball radius r | 2.53
r/∥θ₀∥ | 0.50
Inside-ball safe | 50/50
False accept rate δ | 0
Min inside margin | 16.26
Outside-ball unsafe | 8/100
Effective TPR | 0.352

Result. The ball verifier achieves conditional δ = 0 (50/50 inside-ball perturbations are safe; conditional on the estimated Lipschitz constant being a valid upper bound) with effective TPR = 0.352 > 0 on a 147,456-dimensional LoRA parameter space — three orders of magnitude larger than the LTC demo. The minimum inside-ball margin (16.26) is within 0.3% of the full margin (16.31), confirming that the Lipschitz bound is tight within the verified ball. All 8 outside-ball violations occur at perturbation scales > 1.5r, confirming that the ball boundary is meaningful. This validates Theorem 2 and Proposition 3 at LLM scale.

Scaling beyond GPT-2. The companion paper [23] extends this validation to Qwen2.5-7B-Instruct (7.6B parameters) with compositional per-layer verification (§5.7).

8 Discussion

8.1 Implications for Safe AI Deployment

Theorem 1 implies that any AI safety approach based on classifying modifications — learned discriminators, anomaly detectors, neural safety critics — faces a fundamental ceiling that is a mathematical consequence of distribution overlap, not a limitation of architecture or training.
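The radius rule used in the GPT-2 validation above (raw radius m/L from the Lipschitz argument, capped at half the parameter norm) can be sketched in a few lines. The helper names are ours, and ∥θ₀∥ ≈ 5.06 is inferred from the reported r/∥θ₀∥ = 0.50 rather than stated explicitly:

```python
def ball_radius(margin, lipschitz, theta_norm, cap_frac=0.5):
    """Certified radius: m/L from the Lipschitz argument, capped at cap_frac * ||theta0||."""
    return min(margin / lipschitz, cap_frac * theta_norm)

def gate_accepts(update_norm, radius):
    """Verification gate: accept a LoRA update iff it stays inside the certified ball."""
    return update_norm <= radius

# GPT-2 run above: m = 16.31, L = 0.168; the raw radius m/L ~ 97 far exceeds the cap.
r = ball_radius(margin=16.31, lipschitz=0.168, theta_norm=5.06)
print(round(r, 2))  # 2.53 -- the 0.5 * ||theta0|| cap binds
```

Inside this ball every perturbation is certified safe by the Lipschitz argument of Appendix A.4; no classifier or training data is involved.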
To the extent that RLHF reward models act as binary accept/reject gates after thresholding, they inherit this ceiling (see §1 for the analogy and its limits; the formal results strictly apply to binary gates on parameter mutations, not continuous reward scores). Over sufficiently many iterations, either the false acceptance rate accumulates (safety degrades) or the gate becomes overly conservative (utility collapses).

We address five common concerns.

"∑ TPR_n = ∞ is too weak." Even this weak condition cannot be met with bounded risk; strengthening it (requiring TPR_n ≥ c > 0) forces δ_n ≥ (c/C_α)^{1/β} for all n, making ∑ δ_n diverge immediately.

"Finite-time systems don't need ∑ TPR_n = ∞." The finite-horizon tradeoff still applies: with risk budget B, total utility grows subpolynomially (Theorem 5), yielding an exact budget-allocation formula for finite deployments.

"Classifiers still extract nonzero utility." Correct — but the ceiling is subpolynomial (exp(O(√log N))) versus the verifier's linear growth (Θ(N)). At N = 10,000 with B = 1.0, a classifier extracts U* ≈ 32.7 versus a ball verifier's U_ball = 5,000 — a 153× advantage (Appendix D.10). "Impossibility" refers to satisfying the dual conditions simultaneously, not to extracting any utility at all.

"What about an ensemble of diverse classifiers?" An ensemble accepting iff all members agree is itself a classifier with acceptance region A = ∩_i A_i. Theorem 1 applies: TPR_n ≤ C_α · δ_n^β constrains any binary decision rule. The impossibility is driven by β < 1, not classifier complexity.

"Isn't the verifier's conditional δ = 0 just as uncertain as a classifier's δ ≈ 0?" No — the failure modes are categorically different. A classifier's δ > 0 is a mathematical necessity under distribution overlap (Theorem 1); no amount of better training, data, or architecture can achieve δ = 0 with TPR > 0 (Proposition 4).
The verifier's conditionality rests on the Lipschitz constant being correct — a falsifiable engineering claim that strengthens monotonically with better estimation. At d ≤ 17,408, we provide analytical bounds making δ = 0 unconditional; at LLM scale, Proposition 3 proves the required bound exists, and the open problem is computing it tractably. The distinction is between an impossibility theorem and an open engineering problem (see [23] §6.3).

The classification–verification separation mirrors behavioral testing versus formal guarantees. Current AI safety practices — red-teaming, evaluations, benchmarks — are primarily classification-based; as self-modification becomes prevalent, verification-based safety will be mathematically required.

Remark (Overlap under mutation-specific distributions). Even if mutations come from a specific distribution µ, the induced conditional distributions P⁺ = µ|safe and P⁻ = µ|unsafe satisfy P⁺ ≪ P⁻ by the transversality argument (§3.1), and the Hölder ceiling applies parametrically. The companion paper [23] measures Δ_s ∈ [0.059, 0.091] across three physical systems (§4.5), confirming low separation.

Connections to computational complexity (is satisfying the dual conditions NP-hard?) and open problems (adaptive verification, multi-agent extensions) are discussed in Appendix E.

8.2 When Our Results Do Not Apply

The impossibility (Theorem 1) and the finite-horizon ceiling (Theorem 5) require specific structural conditions. Practitioners should evaluate whether their setting satisfies these conditions before applying our conclusions:

1. Perfectly separable distributions. If safe and unsafe modifications are perfectly separable in feature space (D_α(P⁺∥P⁻) = ∞, or equivalently P⁺ ⊥ P⁻), the Hölder bound becomes vacuous and a perfect classifier can achieve δ = 0 with TPR = 1.
This occurs when the safety boundary does not pass through the support of the mutation distribution. Our results apply only when P⁺ and P⁻ have overlapping support (§3.1).

2. Discrete or quantized parameter spaces. Our framework assumes continuous parameter perturbations from distributions with density. If the parameter space is finite or modifications come from a discrete set, the safe/unsafe partition may be exactly learnable and the distribution overlap condition may not hold.

3. Large safety margins with small modification sets. If the system operates far from any safety boundary and modifications are bounded to a region where all points are safe, no gate is needed at all. Our results address the non-trivial regime where safety boundaries intersect the modification space.

4. Fixed finite deployments where classifier utility suffices. Theorem 5 shows classifier utility grows as exp(O(√log N)), which is non-zero. For short deployment horizons — say N ≤ 100 with a generous risk budget B — a classifier may provide adequate utility even under the Hölder ceiling. The impossibility becomes practically binding only when N is large enough that the subpolynomial ceiling falls far below the linear growth a verifier achieves.

Figure 1: Overview of the two gate architectures: classification gates (left) threshold a feature-space representation of P⁺ (safe) and P⁻ (unsafe), incurring δ > 0 (Theorem 1); verification gates (right) certify safety in parameter space via a Lipschitz ball of radius r = m/L around θ₀, achieving δ = 0 with TPR > 0 (Theorem 2). The classification impossibility and verification escape establish a structural dichotomy.

5. Systems with non-overlapping mutation distributions by design.
Some safety mechanisms engineer the modification space to avoid overlap — for example, restricting updates to a pre-verified subspace. If the restriction is enforced before the gate, the resulting conditional distributions may be separable, and a classifier within this restricted space may succeed. Our framework applies to the unrestricted case.

9 Conclusion

For power-law risk schedules δ_n = O(n^{−p}) with p > 1 — the practically relevant regime — classifier-based safety gates cannot satisfy the dual conditions under any architecture, training regime, or data availability. This is established through two independent impossibility proofs (Theorems 1 and 4), proved exponent-optimal by the NP matching lower bound (Theorem 3; Mills' ratio asymptotics in Appendix A.3), and corroborated by two complementary bounds: the information-theoretic rate bound (Proposition 1) and the sample complexity barrier (Proposition 2). For slowly-decaying non-power-law schedules where the asymptotic impossibility does not hold, Theorem 5's universal finite-horizon ceiling ensures classifier utility grows at most as exp(O(√log N)) — orders of magnitude below a verifier's linear Θ(N) growth at any practical horizon. A constructive escape via sound verification gates (Theorem 2) achieves δ = 0 with TPR > 0; the separation is strict (Proposition 4). We validate on GPT-2 with LoRA (d = 147,456): the ball verifier achieves conditional δ = 0 (unconditional at d ≤ 17,408) with TPR = 0.352 (§7.1). Comprehensive experimental validation is in the companion paper [23].

Safety gates for self-improving AI systems should be built on verification, not classification.

Note on appendix structure.
The appendices are extensive, comprising full proofs (A), extended related work (B), supporting theoretical results (C), numerical validations (D), open problems (E), and script specifications (F). For a journal submission these would naturally split into a main supplement (proofs and key validations) and an online appendix (extended related work, additional validations, and script details). We retain them in full here so the arXiv preprint is self-contained.

Figure 2: Scaling analysis of the Lipschitz ball verifier from d = 84 to d = 17,408. Ball soundness is 100% at all dimensions. Required mutation scale σ* decreases as O(d^{−0.54}). (Panels: Lipschitz constants L_emp (2.5× conservative) vs. L_traj (analytical) vs. dimension; optimal mutation scale fit σ* ∝ d^{−0.54}; verification soundness 100% at all scales.)

Figure 3: Exponent-optimality validation (Theorem 3). The NP classifier achieves 10–90% of the Hölder ceiling at deployment-relevant δ, confirming near-tightness. (Panel: NP curves vs. Hölder bounds for three separations in the (δ, TPR) plane.)

A Proof Details

A.1 Hölder Inequality Verification

The conjugate exponents α and α′ = α/(α − 1) satisfy 1/α + 1/α′ = 1.
The Hölder inequality,

    ∫ f g ≤ ∥f∥_α · ∥g∥_{α′},

is applied with f = dP⁺/dP⁻ and g = 1_{A_n}, both measured against P⁻. Then:

    TPR_n = ∫_{A_n} dP⁺ = ∫_{A_n} (dP⁺/dP⁻) dP⁻ = ∫ f g dP⁻.

By Hölder:

    TPR_n ≤ ( ∫ (dP⁺/dP⁻)^α dP⁻ )^{1/α} · ( ∫ 1_{A_n}^{α′} dP⁻ )^{1/α′}.

The first factor is e^{(α−1) D_α(P⁺∥P⁻)/α} = C_α (by the definition of Rényi divergence). The second factor is δ_n^{1/α′} = δ_n^{(α−1)/α} = δ_n^β.

Figure 4: Finite-horizon utility ceiling (Theorem 5). The exact ceiling U*(N, B) grows as exp(O(√log N)) (subpolynomial), vastly below the MI bound (√N) and Hölder–Jensen (N^{1−β}). The ball verifier's utility grows linearly (Θ(N)). (Panel: classifier ceilings for B ∈ {0.001, 0.01, 0.1, 1.0} vs. the ball verifier on the LTC d = 240 system.)

Figure 5: GPT-2 LoRA validation (d_LoRA = 147,456). Inside-ball: 50/50 safe (δ = 0). Effective TPR = 0.352. (Panels: Qwen2.5-7B LoRA fine-tuning loss, d = 1,261,568; 79% ball-accepted with δ = 0, oracle savings 158/200 = 79%.)

Figure 6: Structural separation (Proposition 4) in the (δ, TPR) plane.
Classifiers lie on the curve TPR ≤ C_α δ^β approaching the origin; the verifier occupies the δ = 0 axis with TPR > 0.

A.2 Rényi Divergence Convention

We use D_α(P∥Q) = (1/(α−1)) log ∫ (dP/dQ)^α dQ, following van Erven and Harremoës [27]. This differs from some references by a factor of (α − 1) in the exponent. The constant C_α = exp( ((α−1)/α) · D_α(P⁺∥P⁻) ) is finite whenever the Rényi divergence is finite, which requires P⁺ ≪ P⁻ (absolute continuity) and sufficiently light tails of the likelihood ratio.

A.3 Theorem 3 Exponent-Optimality: Mills' Ratio Asymptotics

We provide the full asymptotic analysis establishing the exponent-optimality of the Hölder bound.

Setup. For Gaussian P⁺ = N(µ, I_k), P⁻ = N(0, I_k) with Δ_s = ∥µ∥, the NP optimal test rejects when µᵀx < t_δ, giving TPR_NP(δ) = Φ(Φ⁻¹(δ) + Δ_s), where Φ is the standard normal CDF. The Hölder bound with optimal order α* = 1 + 2/Δ_s² and β* = 1 − 1/α* = 2/(2 + Δ_s²) gives C_{α*} = exp((α* − 1)Δ_s²/2) = e (the exponent ((α*−1)/α*)·D_{α*} equals 1 for every Δ_s; this constant reproduces the D.1 ratios).

Log-asymptotic analysis. As δ → 0, set z_δ = Φ⁻¹(1 − δ) (so z_δ → +∞). Comparing the log-exponents directly:

    log TPR_NP(δ) ∼ −(z_δ − Δ_s)²/2,   log(C_{α*} δ^{β*}) = 1 + β* log δ ∼ −β* z_δ²/2.

Dividing by log δ ∼ −z_δ²/2:

    log TPR_NP(δ)/log δ → (z_δ − Δ_s)²/z_δ² = 1 − 2Δ_s/z_δ + Δ_s²/z_δ² → 1,

while log(C_{α*} δ^{β*})/log δ → β* < 1. Since the NP classifier's log-exponent (1) exceeds the Hölder bound's (β*), the ratio TPR_NP(δ)/(C_{α*} δ^{β*}) → 0 as δ → 0 — the NP classifier decays faster than the bound.

Exponent-optimality. The key consequence:

    lim inf_{δ→0} log TPR_NP(δ)/log δ = 1 > β*.

The NP classifier achieves TPR = Ω(δ^{1+ϵ}) for all ϵ > 0, so any valid universal upper bound of the form C · δ^γ must have γ ≤ 1. Meanwhile, the Hölder bound with β* < 1 is valid.
Therefore β* is the largest exponent usable in any valid impossibility bound of this form — it cannot be increased to any γ > β* without being violated by the NP classifier's performance at finite δ.

Practical tightness. At finite δ values relevant to deployment (δ ∈ [10⁻⁶, 10⁻¹]), the ratio TPR_NP/(C_{α*} δ^{β*}) ranges from 0.1 to 0.9 depending on Δ_s (Appendix D.1), confirming that the bound is practically tight — the NP classifier operates within one order of magnitude of the Hölder ceiling across the deployment-relevant range.

A.4 Lipschitz Ball Soundness Proof

Suppose θ ∈ B(θ₀, r) with r = m/L. For any scenario (s_i, t_i) ∈ D:

    sup_t d(traj_θ(t), traj_{θ₀}(t)) ≤ L · ∥θ − θ₀∥ < L · r = m.

Since θ₀ has margin m (minimum distance to obstacles), the trajectory of θ maintains positive distance to all obstacles:

    d(traj_θ(t), obstacle_j) ≥ m − L∥θ − θ₀∥ > 0  for all t, j.

Therefore θ is D-safe.

A.5 Information-Theoretic Bound Full Proof

Setup. At each step n, the gate g_n: ℝᵏ → {accept, reject} induces a binary channel from the safety label S_n ∈ {safe, unsafe} to the gate decision. The mutual information of this channel is I_n = I(g_n(θ_n); S_n) = H(g_n) − H(g_n | S_n).

Pinsker bound. The total variation between the gate's conditional distributions satisfies TV(P_{g|+}, P_{g|−}) = |TPR_n − δ_n|/2. By Pinsker's inequality:

    |TPR_n − δ_n|/2 ≤ √(I_n/2).

Hence TPR_n ≤ δ_n + √(2 I_n).

Summation. Summing over n = 1, …, N:

    ∑_{n=1}^N TPR_n ≤ ∑_{n=1}^N δ_n + ∑_{n=1}^N √(2 I_n).

By Cauchy–Schwarz, ∑_{n=1}^N √I_n ≤ √(N ∑_{n=1}^N I_n). Under the bounded mutual information assumption ∑_{n=1}^N I_n ≤ I₀:

    ∑_{n=1}^N TPR_n ≤ ∑_{n=1}^N δ_n + √(2 N I₀).

A.6 Sample Complexity Bound Full Proof

Setup. The safety gate at step n is a binary classifier g_n ∈ G (a hypothesis class with VC dimension d_VC), trained on n_train(n) labeled examples.
Step 1: By the fundamental theorem of statistical learning [13, 28], with probability ≥ 1 − η:

    err_true ≤ err_train + √( (d_VC ln(2m/d_VC) + ln(2/η)) / m ).

Step 2: Setting the bound equal to ϵ_n/2 and solving: n_train(n) = Ω(d_VC/ϵ_n²).
Step 3: For ϵ_n = c/n^p: n_train(n) = Ω(d_VC · n^{2p}/c²).
Step 4: Available data grows as n₀ + kn; required data as n^{2p}. The crossing point: n_fail = Θ((c²k/d_VC)^{1/(2p−1)}).

A.7 Transformer Lipschitz Derivation

We derive the per-layer Lipschitz constant for a pre-LayerNorm transformer under LoRA perturbation of attention projections. Each layer k computes:

    y_k = x_k + MHA(LN₁(x_k)),   z_k = y_k + FFN(LN₂(y_k)).

LayerNorm bound. ∥J_LN∥ ≤ ∥γ∥_∞/√ϵ, where ϵ is the regularization constant.

Multi-head attention under LoRA. Under LoRA perturbation Δθ = (ΔA_{q,p}, ΔB_{q,p}):

    ∥ΔO_p∥ ≤ (∥W⁰_{v,p}∥ · ∥W⁰_{k,p}∥ · ∥LN(x)∥² / √d_k) · √2 · max(∥A_{q,p}∥, ∥B_{q,p}∥) · ∥Δθ_p∥.

For n_proj LoRA-adapted projections per layer:

    L^LoRA_k ≤ (∥γ_k∥/√ϵ) · (max_p ∥W⁰_{v,p}∥/√d_k) · √(2 n_proj).

Compositional escape. Instead of using the exponentially large product L_full = ∏_k (1 + L_k), we use the additive bound:

    ∥Δoutput∥ ≤ ∑_{k=1}^K L^LoRA_k · ∥Δθ_k∥ · ∏_{j>k} L^{full,frozen}_j.

Since the frozen-layer products ∏_{j>k} L^{full,frozen}_j are constants that can be precomputed once from the pretrained weights, define L̃_k = L^LoRA_k · ∏_{j>k} L^{full,frozen}_j. The verification reduces to the per-layer ball check ∑_k L̃_k ∥Δθ_k∥ ≤ m — a conservative but tractable O(d) computation.

A.8 NP Counting Proof Full Details

Tonelli interchange. The interchange ∑_n P⁺(L(X) > c_{δ_n}) = E_{P⁺}[∑_n 1_{L(X) > c_{δ_n}}] is justified by Tonelli's theorem applied to non-negative measurable functions with the counting measure on ℕ and P⁺ on ℝᵏ.

Counting function.
N(ℓ) = |{n ∈ ℕ : c_{δ_n} < ℓ}| counts how many thresholds are exceeded. For δ_n = c/n^p, we get N(ℓ) ≤ (c/P⁻(L > ℓ))^{1/p}.

P-value density integrability. Writing U(x) = P⁻(L > L(x)), the bound becomes E_{P⁺}[U(X)^{−1/p}]. For p > 1, the integrand u^{−1/p} f_U(u) is integrable near u = 0 because the Gaussian tail makes f_U(u) decay super-polynomially. More generally, the expectation is finite whenever the p-value density satisfies f_U(u) = O(u^η) near u = 0 for some η > 1/p − 1; this holds for all distribution pairs with D_α(P⁺∥P⁻) < ∞ for sufficiently large α, which is guaranteed by the hypothesis of Theorem 4.

A.9 Tight Finite-Horizon Ceiling Details

Concavity of the NP curve. The derivative is TPR′_NP(δ) = ϕ(Φ⁻¹(δ) + Δ_s)/ϕ(Φ⁻¹(δ)). The second derivative is negative for all δ ∈ (0, 1) by log-concavity of ϕ, establishing concavity (see also [13], Chapter 3).

Asymptotic formula. Using Φ⁻¹(δ) ∼ −√(2 ln(1/δ)) as δ → 0 and Mills' ratio:

    U*(N, B) ∼ B · exp(Δ_s √(2 ln(N/B)) − Δ_s²/2) / √(2π · 2 ln(N/B)).

This grows as exp(Δ_s √(2 ln(N/B))), which is o(N^ϵ) for every ϵ > 0 but ω(log^k N) for every k.

B Relation to Known Results and Extended Related Work

B.1 Relation to Known Results

The mathematical tools in this paper — Hölder's inequality, Rényi divergence, Lipschitz continuity, Fano's inequality, VC dimension — are well-established. The per-step bound TPR ≤ C_α · δ^β is an instance of a standard f-divergence inequality [27], and the Neyman–Pearson lemma [16] establishes ROC tradeoffs for individual hypothesis tests.
Our contribution is the problem formalization and the structural results that emerge: (1) the dual conditions as a formalization of safe self-improvement, (2) sequential composition creating an impossibility for the coupling of bounded risk and unbounded utility, (3) the tightness of this coupling, (4) its information-theoretic strengthening, (5) the sample complexity barrier, and (6) the structural separation between classification and verification.

An analogy clarifies the distinction. Arrow's impossibility theorem uses elementary social-choice axioms, each individually obvious, but their composition yields a deep impossibility no voting system can escape. Similarly, our per-step bound is standard, and the dual conditions are individually natural. But the coupling creates a structural impossibility with no analog in single-test hypothesis testing.

B.2 Extended Related Work

Self-improving AI safety. The alignment literature discusses recursive self-improvement [6, 25] and concrete safety challenges [2] but lacks formal impossibility results for the safety–utility coupling. Christiano et al. [7] propose iterated amplification; Leike et al. [14] formalize reward modeling.

Hypothesis testing and statistical tradeoffs. The Neyman–Pearson lemma [13, 16] establishes optimal ROC tradeoffs for individual tests. The novelty here is sequential composition: summability constraints on {δ_n} force summability on {TPR_n}.

Impossibility results in learning theory. No-free-lunch theorems [30] show no classifier dominates across all distributions. Rice [21] shows undecidability of semantic properties. Our impossibility is for a specific task under distribution overlap.

Information-theoretic bounds. Fano's inequality and its refinements [20] provide fundamental limits. The strong data processing inequality [1, 19] bounds information processing gains.

PAC-Bayes and sample complexity.
McAllester [15] bounds generalization via KL divergence. Vapnik and Chervonenkis [28] established VC dimension. We use VC sample complexity to show an independent barrier.

Adversarial robustness. Tsipras et al. [26] prove accuracy–robustness tradeoffs. Gilmer et al. [11] show adversarial examples are inevitable in high dimensions. Our impossibility concerns sequential composition, not per-input robustness.

Transformer Lipschitz bounds. Virmaux and Scaman [29] compute spectral norms. Kim et al. [12] analyze attention Lipschitz properties. Dasoulas et al. [8] study Lipschitz normalization. Fazlyab et al. [9] use SDP for tight bounds. We derive bounds for LoRA perturbations specifically.

C Supporting Theoretical Results

C.1 Gaussian Specialization

For unit-variance Gaussians with separation Δ_s = |µ⁺ − µ⁻|/σ: D_α(P⁺∥P⁻) = αΔ_s²/2. The optimal (Neyman–Pearson) classifier achieves TPR = Φ(Φ⁻¹(δ) + Δ_s).

C.2 Non-Stationary Extension

Proposition C.2 (Non-Stationary Impossibility). Let {(P⁺_n, P⁻_n)}_{n≥1} be a sequence of distribution pairs and α ∈ (1, ∞) with β = 1 − 1/α. Suppose D̄ := sup_n D_α(P⁺_n∥P⁻_n) < ∞. Then for any sequence of classifiers {g_n} with per-step rates (δ_n, TPR_n):

    TPR_n ≤ C̄_α · δ_n^β for all n,

where C̄_α = exp((α−1)D̄). Consequently, whenever ∑ δ_n^β < ∞ (in particular, for power-law schedules with pβ > 1), ∑ TPR_n ≤ C̄_α ∑ δ_n^β < ∞ and the dual conditions cannot be jointly satisfied.

Proof. At each step n, the Hölder bound (Theorem 1 proof, Step 1) gives TPR_n ≤ C_α^{(n)} · δ_n^β, where C_α^{(n)} = exp((α−1)D_α(P⁺_n∥P⁻_n)). Since D_α(P⁺_n∥P⁻_n) ≤ D̄ for all n, we have C_α^{(n)} ≤ C̄_α. Summing: ∑ TPR_n ≤ C̄_α ∑ δ_n^β. For power-law schedules δ_n ≤ c/n^p, we have δ_n^β ≤ c^β n^{−pβ}, which is summable iff pβ > 1, i.e., p > α/(α−1).
For general summable {δ_n}, Hölder's inequality controls the sum on finite horizons: ∑_{n=1}^N δ_n^β ≤ N^{1−β} · (∑_{n=1}^N δ_n)^β ≤ N^{1−β} · B^β, where B = ∑ δ_n < ∞ (though ∑ δ_n^β itself need not converge).

Remark (Coverage gap). For power-law schedules δ_n = c · n^{−p} with p > 1, the series ∑ δ_n^β = c^β ∑ n^{−pβ} converges iff pβ > 1, i.e., p > α/(α−1). For 1 < p ≤ α/(α−1), the stationary impossibility (Theorem 1) applies but Proposition C.2 does not — the non-stationary extension requires the strictly stronger condition p > α/(α−1) (flagged in §2.2). This gap shrinks as α grows (α/(α−1) → 1⁺) and, whenever D_α is finite for some α > 2, vanishes for fast-decaying schedules (p ≥ 2). For the intermediate regime, Theorem 5's stationarity-free finite-horizon ceiling provides an alternative bound.

C.3 Information-Theoretic Finite-Horizon Bound

Proposition 1 (Information-Theoretic Finite-Horizon Bound). Let {g_n} be a sequence of safety gates with per-step mutual information I_n and total budget I₀ = ∑ I_n. Then for any N:

    ∑_{n=1}^N TPR_n ≤ ∑_{n=1}^N δ_n + √(2 N I₀).

This bound grows as √N, so it does not prove ∑ TPR_n < ∞ — that follows from Theorem 1. Proposition 1 complements Theorem 1 by constraining the rate of utility accumulation via mutual information. Full proof in Appendix A.5.

C.4 Sample Complexity Barrier

Proposition 2 (Sample Complexity Barrier). Let G be a family of binary classifiers with VC dimension d_VC. For the gate to achieve δ_n ≤ c/n^p with p > 1, the required training set is n_train(n) = Ω(d_VC · n^{2p}). If the system generates at most k new labeled examples per step, sample starvation occurs at n_fail = O(k^{1/(2p−1)}).

This result is independent of Theorem 1: even if a classifier circumvented the Hölder bound, it would face sample starvation. Full proof in Appendix A.6.

C.5 Formal Transformer Lipschitz Bounds

Proposition 3 (Transformer LoRA Lipschitz Bound).
For a pre-LayerNorm transformer with K layers under LoRA perturbation with rank r on n_proj attention projections per layer, the per-layer Lipschitz constant w.r.t. the LoRA parameters is:

    L^LoRA_k ≤ (∥γ_k∥/√ϵ) · (max_p ∥W⁰_{v,p}∥/√d_k) · √(2 · n_proj).

Compositional verification checks ∑_k L^LoRA_k · ∥Δθ_k∥ ≤ m (additive, O(d)) rather than the exponentially large product L_full = ∏_k (1 + L_k). Full derivation in Appendix A.7.

C.6 Hölder–Jensen Approximation

For practical deployment over N steps with risk budget B = ∑ δ_n, applying the per-step Hölder bound and Jensen's inequality yields

    U_max(N, B) = C_α · N^{1−β} · B^β,

with optimal uniform allocation δ_n = B/N. This bound is looser than the exact NP-based ceiling U*(N, B) = N · TPR_NP(B/N) (Theorem 5), which is 1.4–3.1× tighter at N = 10²–10⁶.

C.7 Multi-Dimensional Tradeoff Surface

Classifiers occupy: TPR ≤ C_α · δ^β, C = O(d²), n = Ω(d_VC/δ²). Verifiers occupy: δ = 0, TPR > 0 (domain-restricted), C = O(d), n = 0. These regions are disconnected on the δ = 0 hyperplane.

D Full Numerical Validation

D.1 Tightness Validation

Δ_s | α*    | β*    | TPR_NP/Hölder at δ = 10⁻⁶ | log TPR/log δ at δ = 10⁻¹²
0.1 | 201.0 | 0.995 | 0.561                      | 0.974
0.5 | 9.0   | 0.889 | 0.834                      | 0.875
1.0 | 3.0   | 0.667 | 0.321                      | 0.758
2.0 | 1.5   | 0.333 | 0.108                      | 0.552

For all separations, TPR_NP ≤ the Hölder bound (verifying Theorem 1), and the ratio ranges from 0.1 to 0.9 at deployment-relevant δ.

D.2 Information-Theoretic Bound Comparison

The Hölder bound is tighter per-step for small δ; the MI bound is complementary for cumulative analysis. Both bounds are valid across all distributions tested.

D.3 Sample Complexity Simulation

Simulated logistic regression (d_VC = 11, Δ_s = 0.5, k = 5): 200/200 steps sample-starved; ∑δ = 41.17 (diverges). Confirms Proposition 2.
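The D.1 ratios can be reproduced with a short self-contained check (no SciPy: Φ via math.erf, its inverse by bisection; helper names are ours). This is a sketch of the comparison that prove_tightness.py is described as performing, not the script itself:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-12.0, hi=12.0):
    """Inverse CDF by bisection (ample accuracy for this check)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tpr_np(delta, sep):
    """Neyman-Pearson optimal TPR at false-accept level delta (Gaussian, separation sep)."""
    return Phi(Phi_inv(delta) + sep)

def holder_bound(delta, sep):
    """Hölder ceiling C_{a*} * delta^{b*} with a* = 1 + 2/sep^2, b* = 2/(2 + sep^2)."""
    beta = 2.0 / (2.0 + sep**2)
    C = math.e  # (a* - 1) * sep^2 / 2 = 1, so C_{a*} = e for every separation
    return C * delta**beta

# Table D.1, row sep = 1.0: ratio ~ 0.321 at delta = 1e-6; the bound is never violated.
print(round(tpr_np(1e-6, 1.0) / holder_bound(1e-6, 1.0), 3))
assert all(tpr_np(10**-k, s) <= holder_bound(10**-k, s)
           for k in range(1, 7) for s in (0.1, 0.5, 1.0, 2.0))
```

The assertion sweeps δ ∈ [10⁻⁶, 10⁻¹] and the four separations of Table D.1, mirroring the "0 violations" outcome reported in the validation summary.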
D.4 Transformer Lipschitz Computation

    Architecture    d     K   d_k  ‖W_v‖  L_k^LoRA  r_k     Steps in ball
    Toy (2L)        64    2   32   2.32   259.7     5.8e-4  11.6
    Small (6L)      256   6   64   2.09   165.2     3.0e-4  6.1
    GPT-2 (12L)     768   12  64   1.80   142.5     1.8e-4  3.5
    Qwen-7B (28L)   3584  28  128  1.68   94.0      1.1e-4  2.3

The bound is non-vacuous across all architectures: even at Qwen-7B scale, 2 LoRA gradient steps fit within the safe ball.

D.5 Pareto Frontier Visualization

The classifier and verifier regions are disconnected on the δ = 0 hyperplane. Classifiers require Ω(d_VC/δ²) samples and cannot reach δ = 0 with TPR > 0. The ball verifier operates at δ = 0 with no training data.

D.6 Trained Classifier Ceiling

Across all 72 (classifier, δ) pairs tested (4 classifiers × 6 δ values × 3 separations), zero violations of the Hölder bound were observed. Trained classifiers achieve TPR ratios of 0.52–0.94 relative to the Hölder ceiling.

D.7 Non-Gaussian Tightness Validation

Across 8 non-Gaussian families (including Laplace, Student-t, and Gaussian mixtures): min ratios 0.28–0.40, average ratios 0.54–0.70. The bound is uniformly valid and tight across heavy-tailed and multimodal distributions.

D.8 Lipschitz Ball Verifier Demonstration

LTC controller (d = 240), L = 13.75, r = 0.0208. Inside-ball: 200/200 safe (δ = 0), TPR = 0.286.

D.9 NP Counting Proof Validation

All 9 configs satisfy direct sum ≤ counting bound (ratios 0.33–0.89). The counting bound is 13% tighter than Hölder at Δs = 1.0, p = 2.0.

D.10 Tight Finite-Horizon Validation

The exact ceiling grows subpolynomially: from N = 10⁴ to N = 10⁶ (a 100× increase in N), U* grows only 2.66×. Uniform allocation is optimal (Jensen). The MI bound is loose by 4–86×.
E Computational Complexity and Open Problems

E.1 Connection to Computational Complexity

The information-theoretic bound (Proposition 1) and the sample complexity barrier (Proposition 2) connect to a broader question: is safe self-improvement computationally hard? A natural extension is whether satisfying the dual conditions is NP-hard.

E.2 Open Problems

1. Computational impossibility. Is satisfying the dual conditions NP-hard, beyond being statistically impossible?
2. Adaptive verification. Can tighter verified regions (e.g., ellipsoidal) maintain O(d) checking? Ball chaining experiments in [23] provide an initial empirical answer.
3. Multi-agent and continuous-time extensions of the dual conditions.

F Validation Script Details

• experiments/prove_tightness.py: Computes NP TPR via Φ(Φ⁻¹(δ) + Δs) for 100 δ values and 4 separations; confirms TPR_NP ≤ Hölder (Theorems 1, 3).
• experiments/prove_info_theoretic_bound.py: Computes MI of the NP channel for Gaussian and Laplacian distributions (Proposition 1).
• experiments/prove_sample_complexity.py: Simulates 200 steps with a logistic regression gate (Proposition 2).
• experiments/pareto_tradeoff.py: Computes the 4D tradeoff surface (Appendix D.5).
• experiments/validate_classifier_ceiling.py: Trains 4 classifiers on 50K samples (Appendix D.6).
• experiments/compute_lipschitz_bounds.py: Proposition 3 bounds for 4 architectures (Appendix D.4).
• experiments/prove_tightness_nongaussian.py: 8 non-Gaussian families (Appendix D.7).
• experiments/validate_ball_verifier.py: Ball verifier on LTC d = 240 (Appendix D.8).
• experiments/lora_ball_verifier_gpt2.py: GPT-2 LoRA validation (§7.1).
• experiments/prove_counting_impossibility.py: Theorem 4 validation (Appendix D.9).
• experiments/prove_tight_finite_horizon.py: Theorem 5 validation (Appendix D.10).

References

[1] Rudolf Ahlswede and Péter Gács.
Spreading of sets in product spaces and hypercontraction of the Markov operator. Annals of Probability, 4(6):925–939, 1976.
[2] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[3] Amanda Askell, Yuntao Bai, Anna Chen, et al. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861, 2021.
[4] Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
[5] Felix Berkenkamp, Matteo Turchetta, Angela Schoellig, and Andreas Krause. Safe model-based reinforcement learning with stability guarantees. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[6] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2014.
[7] Paul Christiano, Ajeya Cotra, and Mark Xu. Iterated amplification. Alignment Forum, 2017.
[8] George Dasoulas, Ludovic Dos Santos, Filippo Maria Bianchi, and Michalis Vazirgiannis. Lipschitz normalization for self-attention layers with application to graph neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2021.
[9] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[10] Allan Gibbard. Manipulation of voting schemes: A general result. Econometrica, 41(4):587–601, 1973.
[11] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In ICLR Workshop, 2018.
[12] Hyunjik Kim, George Papamakarios, and Andriy Mnih. The Lipschitz constant of self-attention.
In Proceedings of the International Conference on Machine Learning (ICML), 2021.
[13] Erich L. Lehmann and Joseph P. Romano. Testing Statistical Hypotheses. Springer, 3rd edition, 2005.
[14] Jan Leike, David Krueger, Tom Everitt, et al. Scalable agent alignment via reward modeling: A research direction. arXiv preprint arXiv:1811.07871, 2018.
[15] David McAllester. PAC-Bayesian model averaging. In Proceedings of the Conference on Learning Theory (COLT), 1999.
[16] Jerzy Neyman and Egon S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231:289–337, 1933.
[17] Long Ouyang, Jeff Wu, Xu Jiang, et al. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[18] Christos H. Papadimitriou and Mihalis Yannakakis. On the approximability of trade-offs and optimal access of web sources. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 86–92, 2000.
[19] Yury Polyanskiy and Yihong Wu. Strong data processing inequalities for channels and Bayesian networks. In Convexity and Concentration, volume 161 of IMA Volumes in Mathematics and its Applications, pages 211–249. Springer, 2017.
[20] Maxim Raginsky. Strong data processing inequalities and Φ-Sobolev inequalities for discrete channels. IEEE Transactions on Information Theory, 62(6):3355–3389, 2016.
[21] Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2):358–366, 1953.
[22] Mark Allen Satterthwaite. Strategy-proofness and Arrow's conditions. Journal of Economic Theory, 10(2):187–217, 1975.
[23] Arsenios Scrivens. Empirical validation of the classification–verification dichotomy for AI safety gates. Zenodo, 2026.
doi: 10.5281/zenodo.19237566. URL https://zenodo.org/records/19237566. Companion paper.
[24] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
[25] Nate Soares and Benja Fallenstein. Agent foundations for aligning machine intelligence with human interests. Technical report, Machine Intelligence Research Institute (MIRI), 2017.
[26] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR), 2019.
[27] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
[28] Vladimir N. Vapnik and Alexey Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
[29] Aladin Virmaux and Kevin Scaman. Lipschitz regularity of deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[30] David H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996.