2107.04479
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
Arnulf Jentzen, Adrian Riekert
correct (medium confidence)
- Category: Not specified
- Journal tier: Strong Field
- Processed: Sep 28, 2025, 12:56 AM
- arXiv links: Abstract, PDF
Audit review
The paper proves that along any bounded gradient-flow (GF) trajectory for a one-hidden-layer ReLU network, the risk L decreases and converges to the risk value at a critical point; technically, it establishes an energy identity and then uses lower semicontinuity of the generalized-gradient norm together with compactness to extract a critical limit point (Theorem 1.1(iv), proved via Lemma 3.1).

The candidate solution follows the same blueprint: smooth the ReLU by C¹ approximants R_r, control ∇L_r uniformly on bounded parameter sets, pass r → ∞ to identify the generalized gradient G and the energy identity, deduce ∫_0^∞ ∥G(Θ_s)∥² ds < ∞, select times at which the gradient is small, and invoke the measure-zero property of the hyperplane activation boundaries (under µ ≪ λ) to pass to a critical point. These steps match Proposition 2.2 (regularity of L_r and convergence of L_r, ∇L_r to L, G), Lemma 3.1 (energy identity), and Corollaries 2.16–2.17 (lower semicontinuity and the full-measure C¹ locus) in the paper.

Minor differences are presentation-level: the candidate solution states L_r → L “locally uniformly” on bounded sets (pointwise convergence in θ suffices) and claims that lim_{r→∞} ∇L_r(θ) exists for every θ, whereas the paper defines G in a way that only requires convergence where the limit exists (and then proves the desired convergence and explicit formulas, e.g., (14)). These differences do not affect correctness. Overall, both arguments are essentially the same smoothing-and-limit proof used in Sections 2–3 of the paper, including the continuity of the indicator regions under µ ≪ λ and the energy identity that yields monotonicity of t ↦ L(Θ_t).
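For concreteness, a minimal LaTeX sketch of the two central ingredients follows; the softplus approximant below is an illustrative stand-in (the paper's actual R_r may be a different C¹ mollification), and the energy identity is stated as invoked above.

% Illustrative C^1 smoothing of the ReLU (softplus; an assumed stand-in for R_r):
\[
  R_r(x) = \tfrac{1}{r}\log\bigl(1 + e^{rx}\bigr)
  \xrightarrow[r\to\infty]{} \max\{x,0\},
  \qquad
  R_r'(x) = \frac{1}{1+e^{-rx}}
  \xrightarrow[r\to\infty]{} \mathbb{1}_{(0,\infty)}(x)
  \quad (x \neq 0).
\]
% Energy identity along the gradient flow \dot{\Theta}_t = -\mathcal{G}(\Theta_t);
% it yields monotonicity of t \mapsto \mathcal{L}(\Theta_t) and square-integrability
% of the generalized gradient, since \mathcal{L} \ge 0:
\[
  \mathcal{L}(\Theta_t)
  = \mathcal{L}(\Theta_0) - \int_0^t \bigl\|\mathcal{G}(\Theta_s)\bigr\|^2 \, ds
  \;\Longrightarrow\;
  \int_0^\infty \bigl\|\mathcal{G}(\Theta_s)\bigr\|^2 \, ds \le \mathcal{L}(\Theta_0) < \infty .
\]

The square-integrability is what licenses selecting a sequence of times t_n → ∞ with ∥G(Θ_{t_n})∥ → 0, to which lower semicontinuity and compactness are then applied.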
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} strong field
\textbf{Justification:}
The paper rigorously establishes convergence of the risk along bounded gradient-flow trajectories in one-hidden-layer ReLU networks via smooth approximation, an energy identity, and compactness arguments. The analysis of the generalized gradient and of active-region continuity under absolutely continuous input measures is careful and correct. Minor clarifications (the scope of the definition of G, and making the passage from monotonicity to the limit explicit) would further improve readability.
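For concreteness, the active-region continuity rests on the following standard fact (an illustrative restatement under the assumed parameterization $x \mapsto \langle w, x\rangle + b$, not the paper's exact wording): since the input measure satisfies $\mu \ll \lambda$, every activation boundary is a $\mu$-null hyperplane,
\[
  \mu\bigl(\{\, x \in \mathbb{R}^d : \langle w, x \rangle + b = 0 \,\}\bigr) = 0
  \qquad \text{for } (w,b) \neq 0,
\]
so as $(w,b)$ varies, the indicator of the active region $\{\, x : \langle w, x\rangle + b > 0 \,\}$ converges $\mu$-a.e. and hence, by dominated convergence, in $L^1(\mu)$.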