Rigorous computer analysis of the Chow-Robbins game


Flip a coin repeatedly, and stop whenever you want. Your payoff is the proportion of heads, and you wish to maximize this payoff in expectation. This so-called Chow-Robbins game is amenable to computer analysis, but while simple-minded number crunching can show that it is best to continue in a given position, establishing rigorously that stopping is optimal seems at first sight to require “backward induction from infinity”. We establish a simple upper bound on the expected payoff in a given position, allowing efficient and rigorous computer analysis of positions early in the game. In particular we confirm that with 5 heads and 3 tails, stopping is optimal.


💡 Research Summary

The paper addresses the classic optimal stopping problem known as the Chow‑Robbins game. In this game a fair coin is tossed repeatedly; at any time the player may stop, receiving a payoff equal to the proportion of heads observed so far. The objective is to maximize the expected payoff. Although the problem is conceptually simple, a rigorous solution is difficult because the state space of pairs \((h,t)\), the numbers of heads and tails, is unbounded, and traditional backward induction would require “induction from infinity”.

The authors introduce a two‑step methodology that makes rigorous computer analysis feasible for early game positions. First they derive an explicit global upper bound \(U(h,t)\) on the optimal expected payoff \(V(h,t)\) from any state \((h,t)\). This bound is obtained by combining elementary probability inequalities with properties of the Bernoulli process; it has a closed‑form expression that is monotone decreasing in the total number of flips. The key insight is that if the bound does not exceed the current empirical head‑ratio \(\frac{h}{h+t}\), then no continuation can improve the expected payoff, and stopping is provably optimal.
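The stopping certificate can be written out as a short derivation. The sketch below assumes only that \(V\) is the optimal expected payoff and that \(U\) is some valid upper bound on it; the closed form of \(U\) is not reproduced in this summary, so it is left abstract here.

```latex
\begin{align*}
% Stopping immediately is always allowed, so V is at least the current ratio,
% and U is an upper bound on V by assumption:
\frac{h}{h+t} \;\le\; V(h,t) \;\le\; U(h,t). \\
% If the bound does not exceed the current ratio, the sandwich collapses:
U(h,t) \le \frac{h}{h+t}
\quad\Longrightarrow\quad
V(h,t) = \frac{h}{h+t}.
\end{align*}
```

In that case stopping at \((h,t)\) attains the optimal value, which is exactly what "stopping is optimal" means.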

Second, they embed this bound into a dynamic‑programming (DP) framework based on the Bellman equation
\[V(h,t)=\max\Big\{\frac{h}{h+t},\; \frac12 V(h+1,t)+\frac12 V(h,t+1)\Big\}.\]
Instead of solving the DP for an infinite lattice, they compute \(V(h,t)\) only for states where the bound does not already certify optimal stopping. A simple bisection (interval‑cutting) routine is used to compare \(V(h,t)\) with \(U(h,t)\) efficiently. This dramatically reduces the number of states that must be examined, allowing exhaustive verification for all positions with a modest total number of flips (the authors report full analysis up to \(h+t\approx20\)).
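One way to see how the Bellman equation turns into rigorous bounds is to truncate the game at a finite horizon. The Python sketch below is an illustration, not the authors' implementation: the names `bounded_value`, `stop_payoff`, and `trivial_upper` are hypothetical. Forcing a stop at the horizon describes a strategy the player could actually follow, so it gives a lower bound on \(V\); replacing the horizon values by any valid upper bound gives an upper bound on \(V\). Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def stop_payoff(h, t):
    """Payoff for stopping at (h, t): the proportion of heads."""
    return Fraction(h, h + t)


def trivial_upper(h, t):
    """A valid but useless upper bound: the payoff never exceeds 1.
    A sharper bound (such as the paper's U) would be supplied here instead."""
    return Fraction(1)


def bounded_value(h0, t0, horizon, terminal):
    """Backward induction on the Bellman equation from a finite horizon.

    `terminal(h, t)` gives the value assigned to states with h + t == horizon.
    If it never exceeds V (e.g. stop_payoff), the result is a lower bound on
    V(h0, t0); if it is never below V, the result is an upper bound.
    """
    n0 = h0 + t0
    if n0 < 1 or horizon < n0:
        raise ValueError("need h0 + t0 >= 1 and horizon >= h0 + t0")
    width = horizon - n0
    # vals[k] is the value of the state reached with k additional heads.
    vals = [Fraction(terminal(h0 + k, t0 + width - k)) for k in range(width + 1)]
    for n in range(horizon - 1, n0 - 1, -1):
        vals = [
            max(Fraction(h0 + k, n), (vals[k] + vals[k + 1]) / 2)
            for k in range(n - n0 + 1)
        ]
    return vals[0]


if __name__ == "__main__":
    print(bounded_value(5, 3, 40, stop_payoff))    # lower bound on V(5, 3)
    print(bounded_value(5, 3, 40, trivial_upper))  # upper bound on V(5, 3)
```

With `trivial_upper` the upper bound stays at 1 for every horizon, which shows why a sharp analytical bound is needed before the computation can certify that stopping is optimal.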

Applying this procedure, the authors confirm that for most early configurations the optimal action is to continue, but they also identify a handful of states where stopping is optimal. The most notable example is the state with five heads and three tails (\(h=5\), \(t=3\)). Their bound yields \(U(5,3)=0.625\), exactly equal to the current proportion \(5/8\). The value can never be less than the immediate payoff, and here the bound does not exceed it, so \(V(5,3)=5/8\); thus stopping is optimal and no further toss can raise the expected return. This resolves a long‑standing question about this specific configuration without resorting to infinite backward induction.
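A quick hand calculation shows why this position is delicate. As an illustration only (it is not the paper's proof), compare stopping at \((5,3)\) with the naive plan of flipping exactly once more and then stopping:

```latex
\begin{align*}
\text{stop now:}\quad & \frac{5}{8} = \frac{45}{72}, \\
\text{flip once, then stop:}\quad & \frac12\cdot\frac{6}{9} + \frac12\cdot\frac{5}{9}
  = \frac{11}{18} = \frac{44}{72}.
\end{align*}
```

The margin is only \(1/72\), and this comparison rules out just one continuation strategy. Excluding every possible continuation is what requires an upper bound on the value.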

Beyond the concrete result, the paper demonstrates a general technique for optimal stopping problems with infinite horizons: construct a provably tight upper bound on the value function, then use it to prune the DP search space. The authors discuss extensions to biased coins, alternative reward functions, and even multi‑player or continuous‑time variants, suggesting that the approach could become a standard tool for rigorous computer‑assisted proofs in stochastic control and game theory.

In summary, the work provides (1) a simple yet powerful analytical bound on the Chow‑Robbins value function, (2) an efficient algorithm that couples this bound with dynamic programming to certify optimal stopping for early game states, and (3) a definitive verification that stopping with five heads and three tails is optimal. The methodology bridges the gap between heuristic numerical experiments and mathematically rigorous proofs, opening the door to systematic computer‑verified analysis of a broad class of infinite‑horizon stopping problems.

