arXiv:2109.14035

Formalizing the Generalization-Forgetting Trade-off in Continual Learning

R. Krishnan, Prasanna Balaprakash

incomplete (medium confidence)
Category
Not specified
Journal tier
Specialist/Solid
Processed
Sep 28, 2025, 12:56 AM

Audit review

The paper claims local existence and stability of a saddle point for the two-player game built on H(Δx, θ) via Theorems 1–2, but its supporting lemmas rely on nonstandard or insufficient assumptions. In particular, Lemma 2 shows that a normalized ascent step increases H by roughly α(i) when θ is fixed, but it guarantees neither that Δx(i) converges nor that any limit is a maximizer; moreover, it assumes the gradient norm is always strictly positive, which precludes convergence to a true critical point and is invoked to justify dividing by ‖∇H‖₂ (see the normalized step and assumptions in Sec. 3.1 and Appendix A: Lemma 2 and the remarks). Lemma 3 similarly shows a one-step decrease for the θ-update under bounded gradients, but without convexity/strong convexity it does not ensure convergence to a minimizer, nor does it control the cross-terms created by ∂θ(∇xVk∗)Δx (Appendix A: Lemma 3).

The existence and stability theorems (Theorems 1–2, restated as Theorems 3–4 in the appendix) stitch these lemmas together, but the logical gap remains: the union Mk ∪ Nk being nonempty is not enough to ensure a local saddle unless the maximizer/minimizer claims are valid, and the discrete-time proof appeals to decaying steps without the usual curvature/smoothness conditions.

By contrast, the model's solution is correct under stricter, classical conditions (local convexity in θ, affinity in Δx, compact convex neighborhoods, Lipschitz gradients), using Sion's minimax theorem for existence and a Lyapunov/Arrow–Hurwicz–Uzawa argument for stability. However, it overstates uniqueness of the Δx-component of the saddle: because H is affine in Δx and ∇ΔxH(·, θ∗) can vanish at equilibrium, Δx∗ need not be unique. Thus the paper's proofs are incomplete under their stated assumptions, and the model's proof is essentially correct but requires additional hypotheses not present in the paper and slightly overclaims uniqueness.
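To make the Lemma 2 gap concrete, the step and its first-order expansion can be sketched as follows (a sketch in the review's notation, assuming only that H is differentiable in Δx; the exact indexing in the paper may differ):

\[
\Delta x^{(i+1)} = \Delta x^{(i)} + \alpha^{(i)}\,
\frac{\nabla_{\Delta x} H(\Delta x^{(i)}, \theta)}{\lVert \nabla_{\Delta x} H(\Delta x^{(i)}, \theta) \rVert_2},
\]
\[
H(\Delta x^{(i+1)}, \theta) = H(\Delta x^{(i)}, \theta)
+ \alpha^{(i)} \lVert \nabla_{\Delta x} H(\Delta x^{(i)}, \theta) \rVert_2 + o(\alpha^{(i)}).
\]

The expansion delivers the per-step increase of order α(i) and shows why the update is undefined when ‖∇ΔxH‖₂ = 0, but summing such increases neither bounds the distance of Δx(i) to a maximizer nor guarantees the iterates converge. The uniqueness overclaim is visible the same way: when H is affine in Δx, ∇ΔxH(·, θ∗) is constant in Δx, so if it vanishes at equilibrium then every Δx in the neighborhood attains the same value of H(·, θ∗).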

Referee report (LaTeX)

\textbf{Recommendation:} major revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The proposed two-player formulation for balancing forgetting and generalization is interesting and potentially impactful for continual learning. However, the theoretical section requires substantial strengthening: the core lemmas assume a strictly positive gradient norm and use first-order expansions to claim asymptotic optimality without curvature or smoothness guarantees, and stability under sequential updates with decaying step sizes is asserted without the usual strong convexity/Lipschitz conditions. These gaps can be addressed with standard convex–concave (or locally strongly convex) assumptions and by clarifying the role and approximation error of the finite differences used to construct H.
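For reference, the classical route the audit credits to the model can be sketched as follows (a sketch under the stricter hypotheses named above; the neighborhoods U, W and the Lyapunov function V are illustrative notation, not the paper's). Assume U and W are compact convex neighborhoods on which H(·, θ) is concave (indeed affine), H(Δx, ·) is convex, and H is continuous; Sion's minimax theorem then yields a saddle point (Δx∗, θ∗):

\[
H(\Delta x, \theta^*) \le H(\Delta x^*, \theta^*) \le H(\Delta x^*, \theta)
\quad \text{for all } \Delta x \in U,\ \theta \in W.
\]

For the continuous-time ascent–descent dynamics

\[
\dot{\Delta x} = \nabla_{\Delta x} H(\Delta x, \theta), \qquad
\dot{\theta} = -\nabla_{\theta} H(\Delta x, \theta),
\]

the Arrow–Hurwicz–Uzawa function

\[
V(\Delta x, \theta) = \tfrac{1}{2}\lVert \Delta x - \Delta x^* \rVert_2^2
+ \tfrac{1}{2}\lVert \theta - \theta^* \rVert_2^2
\]

satisfies, by the gradient inequalities for concavity in Δx and convexity in θ,

\[
\dot V \le H(\Delta x, \theta^*) - H(\Delta x^*, \theta) \le 0,
\]

giving Lyapunov stability of the saddle. Strong convexity in θ makes the bound strictly negative off the set θ = θ∗ and recovers uniqueness of θ∗; no analogous strictness is available in Δx when H is affine there, which is exactly why Δx∗ need not be unique.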