2111.11743
Independent Learning in Stochastic Games
Asuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang
correct · high confidence
- Category
- Not specified
- Journal tier
- Specialist/Solid
- Processed
- Sep 28, 2025, 12:56 AM
- arXiv Links
- Abstract ↗ · PDF ↗
Audit review
The paper’s stated result (Theorem 4.3) proves almost-sure convergence of two-timescale fictitious play in discounted two-player zero-sum stochastic games, carefully handling the key difficulty that, along the learning path, the per-state auxiliary games can deviate from zero-sum because each player updates her own Q-table independently. The proof sketch explicitly introduces a Lyapunov-function-based tracking argument to show that v̂^i(s) tracks val(Q̂^i(s,·)) before invoking asynchronous stochastic approximation for a contractive operator. By contrast, the model’s solution incorrectly treats the fast-timescale dynamics as classical fictitious play in a fixed zero-sum matrix game with payoff Q̂^1(s,·,·), overlooking that the stage game during learning is generally not zero-sum (Q̂^1 + Q̂^2 ≠ 0), so standard fictitious-play convergence results cannot be invoked directly. The model also asserts E_k → 0 without the Lyapunov/tracking step the paper uses to bridge v̂^i(s) and val(Q̂^i(s,·)).
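For intuition on the fast/slow split at issue, the following is a minimal, self-contained sketch, not the paper's exact algorithm or notation: each player best-responds to a belief about the opponent updated on a fast timescale (step size α_k), while a local value estimate is updated on a slow timescale (step size β_k with β_k/α_k → 0). The single-state matching-pennies game, the step-size exponents, and the variable names are all illustrative assumptions.

```python
import numpy as np

# Illustrative two-timescale fictitious play in a single-state
# two-player zero-sum matrix game (matching pennies).
# Player 1 maximizes A; player 2 maximizes -A.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # payoff to player 1
n = A.shape[0]

belief1 = np.ones(n) / n  # player 1's belief about player 2's actions
belief2 = np.ones(n) / n  # player 2's belief about player 1's actions
v1 = 0.0                  # player 1's slow-timescale value estimate

for k in range(1, 20001):
    alpha = 1.0 / (k + 1) ** 0.6  # fast step size (beliefs)
    beta = 1.0 / (k + 1)          # slow step size; beta/alpha -> 0

    # Each player plays a best response to her current belief.
    a1 = int(np.argmax(A @ belief1))      # player 1 maximizes A
    a2 = int(np.argmax(-(belief2 @ A)))   # player 2 maximizes -A

    # Fast timescale: beliefs track the opponent's observed play.
    belief1 = belief1 + alpha * (np.eye(n)[a2] - belief1)
    belief2 = belief2 + alpha * (np.eye(n)[a1] - belief2)

    # Slow timescale: v1 tracks the best-response value under belief1,
    # a stand-in for the v-hat-tracks-val step discussed above.
    v1 = v1 + beta * (np.max(A @ belief1) - v1)

# Beliefs approach the mixed equilibrium (1/2, 1/2) and v1 approaches
# the game value 0.
```

The timescale separation is what lets the slow value estimate see an approximately equilibrated fast process; the paper's harder step, handled via the Lyapunov/tracking argument, is that during learning the two players' independently maintained Q-tables need not define a zero-sum stage game.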
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} specialist/solid
\textbf{Justification:}
The exposition correctly states and motivates the convergence of two-timescale fictitious play in zero-sum stochastic games, and the proof sketch addresses the central analytical hurdle (non-zero-sum deviations during learning) with an appropriate Lyapunov/tracking device before invoking asynchronous stochastic approximation for a contractive operator. Minor clarifications and cross-references would improve readability for newcomers.