On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

Muhammed O. Sayin, K. Alperen Cetiner

correct, medium confidence

Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper proves almost-sure convergence of two-timescale fictitious play with heterogeneous step sizes in discounted two-player zero-sum stochastic games under Assumptions 2–3 and the sharp condition γ ≤ dα dβ (Theorem 4). The proof constructs a novel Lyapunov function and a one-sided asynchronous convergence argument, both tailored to the fact that, during learning, the auxiliary stage games are not exactly zero-sum because Q1 and Q2 are updated independently (Equations (5a)–(5b)). The paper identifies this deviation as the key challenge and handles it via the Lyapunov term Ξ and the choice of λ ∈ (1, dα dβ/γ).

By contrast, the candidate model’s outline assumes that the fast-timescale beliefs evolve by fictitious play in a frozen zero-sum stage game “with payoff matrix Q(s,·)”, and then treats the mismatch as a small perturbation dominated by contraction. This misses the central technical obstacle emphasized in the paper: for frozen beliefs the induced auxiliary stage games are generally non-zero-sum when Q1 + Q2 ≠ 0, so standard fictitious-play convergence for zero-sum games does not apply a priori. This is exactly why the paper builds a bespoke Lyapunov function and a one-sided asynchronous bound to recover convergence under γ ≤ dα dβ. The model also leaves several necessary assumptions implicit (infinite state visitation; precise step-size conditions), which the paper states explicitly (Assumptions 2–3).

Hence the paper’s argument is sound and complete at the level of a research proof sketch, while the model’s outline is incorrect or incomplete on the key fast-timescale convergence claim.
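The distinction between an exactly zero-sum stage game and one where Q1 + Q2 ≠ 0 can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the paper: the array layout, the function names `zero_sum_gap` and `classical_fictitious_play`, the matching-pennies payoffs, and the 0.3 drift are all assumptions. The fictitious-play routine is the classical single-game version with a shared 1/(t+1) step size, not the paper's two-timescale heterogeneous dynamics.

```python
import numpy as np


def zero_sum_gap(q1, q2):
    """Per-state max |Q1 + Q2|; zero means that stage game is exactly zero-sum.

    q1, q2: arrays of shape (n_states, n_actions_1, n_actions_2) holding each
    player's own payoff estimates (hypothetical layout, not from the paper).
    """
    return np.abs(np.asarray(q1) + np.asarray(q2)).max(axis=(1, 2))


def classical_fictitious_play(payoff_1, payoff_2, steps=20000):
    """Classical fictitious play with a shared 1/(t+1) step in one bimatrix game.

    This is NOT the paper's two-timescale heterogeneous scheme.
    Returns the empirical action mixes of player 1 and player 2.
    """
    n_1, n_2 = payoff_1.shape
    mix_1 = np.full(n_1, 1.0 / n_1)  # empirical mix of player 1's actions
    mix_2 = np.full(n_2, 1.0 / n_2)  # empirical mix of player 2's actions
    for t in range(1, steps + 1):
        # Each player best-responds to the opponent's empirical mix.
        a = int(np.argmax(payoff_1 @ mix_2))
        b = int(np.argmax(mix_1 @ payoff_2))
        step = 1.0 / (t + 1)
        mix_1 = mix_1 + step * (np.eye(n_1)[a] - mix_1)
        mix_2 = mix_2 + step * (np.eye(n_2)[b] - mix_2)
    return mix_1, mix_2


# Matching pennies as a single-state example (assumed payoffs).
pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
q1 = pennies[None, :, :]
q2_exact = -q1            # exactly zero-sum: Q1 + Q2 = 0
q2_drifted = -q1 + 0.3    # hypothetical drift from independent updates

print("gap (exact):  ", zero_sum_gap(q1, q2_exact))
print("gap (drifted):", zero_sum_gap(q1, q2_drifted))
print("FP mixes on the exact zero-sum game:",
      classical_fictitious_play(pennies, -pennies))
```

A nonzero gap is the situation the audit describes: the classical zero-sum convergence result covers only the first case, so the second needs a separate argument of the kind the paper supplies.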

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper isolates and solves a genuine difficulty in heterogeneous independent learning for stochastic games: off-equilibrium auxiliary stage games are not zero-sum, invalidating direct use of standard fictitious-play arguments. The Lyapunov construction and one-sided asynchronous analysis give a crisp, interpretable condition γ ≤ dα dβ and unify heterogeneous timescales into a single convergence guarantee. Exposition is clear but could be strengthened with a more self-contained proof of the one-sided lemma and a brief summary of constants.