arXiv:2107.04479

Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Arnulf Jentzen, Adrian Riekert

Verdict: correct (medium confidence)
Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper proves that along any bounded gradient-flow (GF) trajectory for a one-hidden-layer ReLU network, the risk L decreases and converges to the risk at a critical point; technically, it establishes an energy identity and then uses lower semicontinuity of the generalized gradient norm together with compactness to extract a critical limit point (Theorem 1.1(iv), proved via Lemma 3.1).

The candidate solution follows the same blueprint: smooth the ReLU by C^1 approximants R_r, control ∇L_r uniformly on bounded parameter sets, pass to the limit r → ∞ to identify G and the energy identity, deduce ∫_0^∞ ‖G(Θ_s)‖² ds < ∞, select times with small gradient, and invoke the fact that the hyperplane boundaries have measure zero (µ ≪ λ) to pass to a critical point. These steps match Proposition 2.2 (regularity of L_r and convergence of L_r, ∇L_r to L, G), Lemma 3.1 (energy identity), and Corollaries 2.16–2.17 (lower semicontinuity and the full-measure C^1 locus) in the paper.

Minor differences are presentation-level: the model states that L_r → L "locally uniformly" on bounded sets (pointwise convergence in θ suffices) and claims that lim_{r→∞} ∇L_r(θ) exists for every θ, whereas the paper defines G in a way that only requires convergence where the limit exists, and then establishes the desired convergence together with explicit formulas, e.g., (14). These differences do not affect correctness. Overall, both arguments are essentially the same smoothing-and-limit proof used in Sections 2–3 of the paper, including the continuity of the indicator regions under µ ≪ λ and the energy identity that yields monotonicity of t ↦ L(Θ_t).
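For concreteness, here is a minimal sketch of the smoothing-and-limit argument that both proofs share; the softplus approximant below is an illustrative choice of C^1 smoothing, not necessarily the family used in the paper:

\begin{align*}
&\text{smoothing (illustrative): } && R_r(x) = \tfrac{1}{r}\log\!\left(1+e^{rx}\right) \xrightarrow{\; r\to\infty \;} \max\{x,0\},\\
&\text{generalized gradient: } && \mathcal{G}(\theta) = \lim_{r\to\infty} \nabla \mathcal{L}_r(\theta) \quad \text{(where the limit exists)},\\
&\text{gradient flow: } && \tfrac{d}{dt}\Theta_t = -\mathcal{G}(\Theta_t),\\
&\text{energy identity: } && \mathcal{L}(\Theta_t) = \mathcal{L}(\Theta_0) - \int_0^t \big\|\mathcal{G}(\Theta_s)\big\|^2\,ds,\\
&\text{consequence: } && \int_0^\infty \big\|\mathcal{G}(\Theta_s)\big\|^2\,ds \le \mathcal{L}(\Theta_0) < \infty \;\Rightarrow\; \exists\, t_n \uparrow \infty \text{ with } \|\mathcal{G}(\Theta_{t_n})\| \to 0.
\end{align*}

Boundedness of the trajectory then gives a convergent subsequence of (Θ_{t_n}), and lower semicontinuity of the generalized gradient norm (on the full-measure C^1 locus, using µ ≪ λ) identifies the limit as a critical point.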

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper rigorously establishes convergence of the risk along bounded gradient-flow trajectories in one-hidden-layer ReLU networks via smooth approximation, an energy identity, and compactness arguments. The analysis of the generalized gradient $\mathcal{G}$ and of the continuity of the active regions under absolutely continuous input measures is careful and correct. Minor clarifications (the precise scope of the definition of $\mathcal{G}$, and an explicit statement of the step from monotonicity to convergence of the limit) would further strengthen readability.
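For the editor's convenience, the limit-extraction step endorsed above can be stated in one line (notation as in the paper; this restates the argument rather than extending it): if $\Theta_{t_n} \to \vartheta$ along times with $\|\mathcal{G}(\Theta_{t_n})\| \to 0$, then lower semicontinuity of $\theta \mapsto \|\mathcal{G}(\theta)\|$ yields
\[
  \|\mathcal{G}(\vartheta)\| \le \liminf_{n\to\infty} \|\mathcal{G}(\Theta_{t_n})\| = 0,
\]
so $\vartheta$ is critical, and monotonicity of $t \mapsto \mathcal{L}(\Theta_t)$ upgrades subsequential convergence of the risk to $\lim_{t\to\infty}\mathcal{L}(\Theta_t) = \mathcal{L}(\vartheta)$.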