2111.11743
Independent Learning in Stochastic Games
Asuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang
correct · high confidence
- Category
- Not specified
- Journal tier
- Specialist/Solid
- Processed
- Sep 28, 2025, 12:56 AM
- arXiv Links
- Abstract ↗ · PDF ↗
Audit review
The paper’s stated result (Theorem 4.3) proves almost-sure convergence of two-timescale fictitious play in discounted two-player zero-sum stochastic games, carefully handling the key difficulty that, along the learning path, the per-state auxiliary games can deviate from zero-sum because each player updates her own Q-table independently. The proof sketch explicitly introduces a Lyapunov-function-based tracking argument to show that v̂^i(s) tracks val(Q̂^i(s,·)) before invoking asynchronous stochastic approximation for a contractive operator. By contrast, the model’s solution incorrectly treats the fast-timescale dynamics as classical fictitious play in a fixed zero-sum matrix game with payoff Q̂^1(s,·,·), overlooking that the stage game during learning is generally not zero-sum (Q̂^1 + Q̂^2 ≠ 0), so standard fictitious-play convergence results cannot be invoked directly. The model also asserts E_k → 0 without the Lyapunov/tracking step the paper uses to bridge v̂^i(s) and val(Q̂^i(s,·)).
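For intuition on the fast/slow split at issue, the following is a minimal, self-contained sketch, not the paper's exact algorithm or notation: each player best-responds to a belief about the opponent updated on a fast timescale (step size α_k), while a local value estimate is updated on a slow timescale (step size β_k with β_k/α_k → 0). The single-state matching-pennies game, the step-size exponents, and the variable names are all illustrative assumptions.

```python
import numpy as np

# Illustrative two-timescale fictitious play in a single-state
# two-player zero-sum matrix game (matching pennies).
# Player 1 maximizes A; player 2 maximizes -A.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # payoff to player 1
n = A.shape[0]

belief1 = np.ones(n) / n  # player 1's belief about player 2's actions
belief2 = np.ones(n) / n  # player 2's belief about player 1's actions
v1 = 0.0                  # player 1's slow-timescale value estimate

for k in range(1, 20001):
    alpha = 1.0 / (k + 1) ** 0.6  # fast step size (beliefs)
    beta = 1.0 / (k + 1)          # slow step size; beta/alpha -> 0

    # Each player plays a best response to her current belief.
    a1 = int(np.argmax(A @ belief1))      # player 1 maximizes A
    a2 = int(np.argmax(-(belief2 @ A)))   # player 2 maximizes -A

    # Fast timescale: beliefs track the opponent's observed play.
    belief1 = belief1 + alpha * (np.eye(n)[a2] - belief1)
    belief2 = belief2 + alpha * (np.eye(n)[a1] - belief2)

    # Slow timescale: v1 tracks the best-response value under belief1,
    # a stand-in for the v-hat-tracks-val step discussed above.
    v1 = v1 + beta * (np.max(A @ belief1) - v1)

# Beliefs approach the mixed equilibrium (1/2, 1/2) and v1 approaches
# the game value 0.
```

The timescale separation is what lets the slow value estimate see an approximately equilibrated fast process; the paper's harder step, handled via the Lyapunov/tracking argument, is that during learning the two players' independently maintained Q-tables need not define a zero-sum stage game.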
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} specialist/solid
\textbf{Justification:}
The exposition correctly states and motivates the convergence of two-timescale fictitious play in zero-sum stochastic games, and the proof sketch addresses the central analytical hurdle (non-zero-sum deviations during learning) with an appropriate Lyapunov/tracking device before invoking asynchronous stochastic approximation for a contractive operator. Minor clarifications and cross-references would improve readability for newcomers.