Debate is efficient with your time
AI safety via debate uses two competing models to help a human judge verify complex computational tasks. Previous work has established what problems debate can solve in principle, but has not analysed the practical cost of human oversight: how many queries must the judge make to the debate transcript? We introduce Debate Query Complexity (DQC), the minimum number of bits a verifier must inspect to correctly decide a debate. Surprisingly, we find that PSPACE/poly (the class of problems which debate can efficiently decide) is precisely the class of functions decidable with O(log n) queries. This characterisation shows that debate is remarkably query-efficient: even for highly complex problems, logarithmic oversight suffices. We also establish that functions depending on all their input bits require Omega(log n) queries, and that any function computable by a circuit of size s satisfies DQC(f) <= log(s) + 3. Interestingly, this last result implies that proving DQC lower bounds of log(n) + 6 for languages in P would yield new circuit lower bounds, connecting debate query complexity to central questions in circuit complexity.
💡 Research Summary
The paper introduces a new quantitative measure for the human oversight cost in AI‑safety‑by‑debate, called Debate Query Complexity (DQC). In the debate model, two powerful provers (Prov₀ and Prov₁) exchange messages while a human verifier reads only a limited number of bits from the combined transcript and the original input, then decides the Boolean function f : {0,1}ⁿ→{0,1}. A (k, ℓ)‑debate is defined as a protocol where the honest prover can always force the correct outcome against any dishonest opponent, using a transcript of length k and a verifier that makes at most ℓ adaptive queries. DQC(f) is the smallest ℓ for which such a protocol exists, regardless of k.
The central result is a tight characterisation: PSPACE/poly equals the class of functions decidable with O(log n) queries, i.e.
PSPACE/poly = { f | DQC(f) ≤ O(log n) }.
Thus any language that can be solved by a polynomial‑time, polynomial‑length debate can be verified by a human who inspects only logarithmically many bits of the transcript. This dramatically refines earlier theoretical work that showed debate can decide PSPACE problems but left the human cost unspecified.
A lower bound is proved: if a Boolean function depends on all n input bits, then DQC(f) ≥ log n. The argument uses the fact that a verifier making ℓ adaptive queries acts as a decision tree of depth ℓ, so its decision function V depends on at most 2^ℓ variables; to capture dependence on all n inputs, 2^ℓ ≥ n must hold, so ℓ must be at least log n.
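The counting step can be checked with a small sketch. It assumes the verifier is modelled as a function from the bits seen so far to the next position to read; `touched_positions` and `spread_out` are hypothetical names introduced here for illustration and do not come from the paper.

```python
from itertools import product

def touched_positions(next_query, ell):
    """Collect every position an adaptive verifier could ever read.

    next_query(answers) returns the index of the next bit to read, given the
    tuple of bits seen so far (a hypothetical interface for this sketch).
    """
    touched = set()
    for depth in range(ell):
        # Every possible history of `depth` answers leads to one more query.
        for answers in product((0, 1), repeat=depth):
            touched.add(next_query(answers))
    return touched

def spread_out(answers):
    """Worst case for the bound: each answer history reads a fresh position."""
    level_start = (1 << len(answers)) - 1          # heap-style node numbering
    offset = int("".join(map(str, answers)) or "0", 2)
    return level_start + offset

for ell in range(1, 6):
    count = len(touched_positions(spread_out, ell))
    # A depth-ell decision tree has 2^ell - 1 internal nodes, which is < 2^ell.
    assert count == 2**ell - 1
```

Even this maximally adaptive verifier reaches fewer than 2^ℓ positions, so a function that depends on all n input bits cannot be decided unless 2^ℓ ≥ n.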
Two complementary upper bounds are given, both expressed in terms of a Boolean circuit C_f that computes f with fan‑in‑2 AND/OR gates.
- Depth‑based bound: By adapting the Karchmer‑Wigderson communication game to the debate setting, the authors show DQC(f) ≤ depth(C_f) + 1. The provers traverse the circuit from the output gate down to an input, each step pointing to a child gate that witnesses the claimed output. The verifier reads the entire transcript (at most depth(C_f)+1 bits) and extracts the input index that resolves the dispute.
- Size‑based bound: Using a cross‑examination technique, they obtain DQC(f) ≤ log size(C_f) + 3. Here Prov₀ (Alice) writes down the value of every gate in the circuit; Prov₁ (Bob) either accepts the computation or points to a gate where the output does not follow from its two inputs. The verifier only needs to read Bob’s gate index (log size(C_f) bits) and then query the three bits that constitute the gate’s output and its two inputs. This yields a logarithmic query cost even for circuits of large depth.
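The size‑based protocol can be sketched in a few lines. This is one reading of the summary rather than the paper's construction, and several details are assumptions made here: Alice always argues f(x) = 1, so the output gate is pinned to 1; the circuit is a list of `(kind, a, b)` tuples in topological order; Bob's index costs ⌈log₂ s⌉ bits; and the query count is a tally, not a simulation of individual bit reads.

```python
import math

def evaluate(circuit, x):
    """True value of every gate; gates are listed in topological order."""
    vals = []
    for kind, a, b in circuit:
        if kind == "IN":
            vals.append(x[a])
        elif kind == "AND":
            vals.append(vals[a] & vals[b])
        else:  # "OR"
            vals.append(vals[a] | vals[b])
    return vals

def pin_output(claimed):
    """Alice argues f(x) = 1, so the output gate's value is fixed to 1."""
    return list(claimed[:-1]) + [1]

def gate_ok(circuit, x, claimed, g):
    """Does the claimed value of gate g follow from its claimed inputs?"""
    kind, a, b = circuit[g]
    if kind == "IN":
        return claimed[g] == x[a]
    want = claimed[a] & claimed[b] if kind == "AND" else claimed[a] | claimed[b]
    return claimed[g] == want

def bob_challenge(circuit, x, claimed):
    """Honest Bob: point at the first inconsistent gate (gate 0 if none)."""
    claimed = pin_output(claimed)
    for g in range(len(circuit)):
        if not gate_ok(circuit, x, claimed, g):
            return g
    return 0

def verifier(circuit, x, claimed, challenge):
    """Read Bob's index, then only the bits around that single gate."""
    claimed = pin_output(claimed)
    index_bits = math.ceil(math.log2(len(circuit)))
    # Input gate: claimed value + one input bit. Other gates: output + two inputs.
    local_bits = 2 if circuit[challenge][0] == "IN" else 3
    decision = 1 if gate_ok(circuit, x, claimed, challenge) else 0
    return decision, index_bits + local_bits

# Example circuit for f(x) = (x0 AND x1) OR x2.
CIRCUIT = [("IN", 0, None), ("IN", 1, None), ("IN", 2, None),
           ("AND", 0, 1), ("OR", 3, 2)]

x = [0, 1, 0]                      # f(x) = 0, so Alice must lie somewhere
lie = [0, 1, 0, 1, 1]              # she misreports the AND gate
decision, queries = verifier(CIRCUIT, x, lie, bob_challenge(CIRCUIT, x, lie))
assert decision == 0
assert queries <= math.ceil(math.log2(len(CIRCUIT))) + 3
```

The design point is that a false transcript ending in 1 must contain at least one gate that does not follow from its inputs, so an honest Bob can always find a local error, while an honest Alice survives any challenge.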
An important corollary follows: proving a DQC lower bound of log n + 6 for any language in P would immediately imply new circuit lower bounds exceeding the best known (≈5n). Since the size‑based upper bound is log size(C_f) + 3, such a DQC lower bound would force size(C_f) ≥ 8n, which is still linear but above the best known bound, giving a fresh route to circuit complexity results via an information‑theoretic setting.
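The arithmetic behind the corollary can be written out by chaining the hypothetical lower bound with the size‑based upper bound, assuming logarithms are base 2:

```latex
\begin{align*}
\log n + 6 \;\le\; \mathrm{DQC}(f) \;\le\; \log \operatorname{size}(C_f) + 3
  &\;\Longrightarrow\; \log \operatorname{size}(C_f) \;\ge\; \log n + 3 \\
  &\;\Longrightarrow\; \operatorname{size}(C_f) \;\ge\; 2^{3}\, n \;=\; 8n .
\end{align*}
```

A bound of 8n would exceed the ≈5n figure quoted above, which is why even a small additive improvement in DQC lower bounds would be significant.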
The paper also examines a randomized verifier model and shows that, for the PSPACE regime, randomization does not improve query complexity. Lemma 3 demonstrates that any valid debate can be compressed to have transcript length k ≤ 2^ℓ, because bits never queried by the verifier can be safely removed without breaking the provers’ strategies.
Overall, the work provides a rigorous framework for measuring and minimizing human supervision in debate‑based AI safety. It shows that, theoretically, a human needs to read only O(log n) bits to verify arbitrarily complex computations, and it connects this supervision cost to fundamental open problems in circuit complexity. Future directions include tightening DQC lower bounds, exploring randomized verifiers in other complexity classes, and applying the DQC framework to practical AI alignment systems.