2012.03083

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Stefanos Leonardos, Georgios Piliouras

correct · medium confidence

Category: math.DS
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:55 AM

Audit review

The paper’s Theorem 4.1 establishes that for any M>0 there exist potential games, positive-measure sets of initial conditions, and exploration schedules with lim sup_t δ_k(t)=0 such that exploration can cause unbounded performance loss or gain, even if only a single agent explores. The mechanism is that exploration steers the SQL dynamics (eq. (1)) between the basins of different pure equilibria via the QRE geometry in 2×2 coordination games. The proof uses the geometry of the QRE surface (Theorem 4.2) to show that a temporary increase in exploration leads trajectories into the basin of the risk-dominant equilibrium, whereas keeping exploration low throughout keeps them in the basin of the other equilibrium.

The model’s construction also satisfies all quantifiers and constraints, but by a different route: it gives explicit two-action exact-potential games with single-agent, finite-window exploration schedules, and it analyzes the SQL/replicator ODE directly, proving basin crossing by elementary differential inequalities. This constitutes a valid alternative proof. Minor slips in the model’s write-up (a typo in p⋆ for part (ii) and an error in one formula for the exploration term of ṗ) do not affect correctness.

Overall, both arguments are correct: the paper’s is topological and QRE-based, while the model’s is a direct ODE/basin argument.
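The basin-crossing mechanism described above can be illustrated numerically. The following is a minimal sketch, not the paper's or the model's construction: it assumes the standard Boltzmann (smooth) Q-learning form of the dynamics, which for two actions reduces in logit coordinates z = ln(p/(1-p)) to ż = (payoff difference) - δ·z. The game (common payoffs a for (A, A), b for (B, B), 0 otherwise), the initial condition, the window length, and the names `simulate`, `delta_low`, and `delta_high` are all hypothetical choices made for illustration. For simplicity the low exploration rate is a small positive constant rather than a schedule that vanishes in the limit, and only the performance-gain direction is shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(explore_window, a=4.0, b=1.0, p0=0.1, q0=0.1,
             delta_low=0.01, delta_high=5.0, t_end=50.0, dt=0.01):
    """Euler-integrate two-agent smooth Q-learning in a 2x2 common-payoff game.

    p, q are the probabilities that agents 1 and 2 play action A.
    Only agent 1 explores: its rate is delta_high for t < explore_window
    and delta_low afterwards. Agent 2 always uses delta_low.
    All parameter values are illustrative assumptions.
    Returns (p, q, expected common payoff) at time t_end.
    """
    # Work in logit coordinates, where the assumed dynamics read
    # dz/dt = (payoff of A - payoff of B) - delta * z.
    z1 = math.log(p0 / (1.0 - p0))
    z2 = math.log(q0 / (1.0 - q0))
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        d1 = delta_high if t < explore_window else delta_low
        p, q = sigmoid(z1), sigmoid(z2)
        # Payoff difference for agent 1: q*a - (1-q)*b = (a+b)*q - b.
        dz1 = (a + b) * q - b - d1 * z1
        # Payoff difference for agent 2: p*a - (1-p)*b = (a+b)*p - b.
        dz2 = (a + b) * p - b - delta_low * z2
        z1 += dt * dz1
        z2 += dt * dz2
    p, q = sigmoid(z1), sigmoid(z2)
    payoff = a * p * q + b * (1.0 - p) * (1.0 - q)
    return p, q, payoff


if __name__ == "__main__":
    print("no exploration window:", simulate(0.0))
    print("agent 1 explores for t < 3:", simulate(3.0))
```

With these assumed parameters, the run without an exploration window starts below the threshold b/(a+b) = 0.2 and should settle near (B, B), with payoff close to b, while the run in which agent 1 explores during a finite window should end near (A, A), with payoff close to a. Since the gap a - b can be chosen freely in this sketch, it mirrors the "for any M>0" quantifier in the gain direction only; reproducing the loss direction would require a game whose risk-dominant and payoff-dominant equilibria differ.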

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper’s main theorem concisely captures a critical and practically relevant instability of exploration in multi-agent learning, one that persists even under vanishing exploration and single-agent perturbations. The QRE-geometry argument is conceptually clear and consistent with SQL dynamics in potential games. Tightening the proof with explicit thresholds and a short continuity-to-basin lemma would make it airtight. The model’s alternative ODE-based proof further strengthens confidence in the result.