Back to search
2010.04223

Fictitious Play in Zero-sum Stochastic Games

Muhammed O. Sayin, Francesca Parise, Asu Ozdaglar

correcthigh confidence
Category
Not specified
Journal tier
Specialist/Solid
Processed
Sep 28, 2025, 12:55 AM

Audit review

The paper proves almost sure convergence of a two-timescale fictitious-play scheme in discounted, finite, two-player zero-sum stochastic games, for both model-based and model-free settings, under standard infinite visitation and stepsize separation assumptions. Crucially, the paper notes that along the iterates the auxiliary per-state games need not be zero-sum because the belief-based continuation values v̂1(s) and v̂2(s) need not sum to zero; it addresses this by a differential-inclusion analysis with a novel Lyapunov function to control the fast process and then shows that the Q-beliefs’ updates form an asynchronous contraction (Shapley) iteration with asymptotically negligible tracking error, yielding convergence (Theorem 1) . By contrast, the model’s solution assumes the fast-timescale game is zero-sum at fixed Q by taking player 2’s payoff as −Q1, then invokes zero-sum fictitious-play convergence to argue v̂ tracks val(Q). This misses the paper’s key complication: under the actual dynamics player 2 best-responds to Q2, not −Q1, so the fast game is generally not zero-sum; the model’s argument therefore does not justify the tracking property or the reduction to a clean contraction H(Q) = r + γ P val(Q) on the slow timescale without additional work. The conclusion matches the paper, but the proof omits the essential general-sum-to-zero-sum asymptotic step established in the paper .

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The paper tackles a natural and important question: learning dynamics leading to stationary equilibria in zero-sum stochastic games via a fictitious-play-like scheme. It contributes a nontrivial analysis by combining a differential-inclusion/perturbed-solution framework with a new Lyapunov function for the fast (potentially general-sum) auxiliary games, and an asynchronous stochastic-approximation analysis for the slow Q-updates. The results are technically sound and align with the Shapley contraction picture. Clarifying some proof steps (e.g., explicit conditions ensuring boundedness and the precise DI assumptions) and polishing exposition would further strengthen readability.