2106.03885
Differentiable Multiple Shooting Layers
Stefano Massaroli, Michael Poli, Sho Sonoda, Taiji Suzuki, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
correct, medium confidence
- Category: Not specified
- Journal tier: Strong Field
- Processed: Sep 28, 2025, 12:56 AM
Audit review
The paper’s Appendix A.1 proves the O(η_p^2) one-step Newton tracking bound by a second-order Taylor (2-jet) expansion of g(B,θ)=0 at (B*_p,θ_{p+1}). It uses the fact that Dg=I−Dγ is invertible because Dγ is strictly lower-triangular (hence nilpotent), and it bounds the inverse via a finite Neumann series; this yields an explicit constant M in Eq. (A.8). The candidate solution follows the same strategy: (i) represent the direct Newton update using the block lower-bidiagonal Jacobian, (ii) apply a second-order Taylor remainder identity, and (iii) control ΔB* via Lipschitz dependence on θ (through variational equations or an implicit function theorem argument). The differences are stylistic: the paper phrases its bounds in terms of the Lipschitz constants m^θ_γ and m^z_γ, while the candidate solution derives them from standard ODE flow sensitivities. Both reach the same O(η_p^2) conclusion and rely on the same structural facts about the Jacobian and Newton’s method. The direct Newton iteration used by both matches Eq. (3.2) in the paper, and the training context and assumptions preceding Theorem 1 align with the candidate solution’s conditions (fixed z0, Lipschitz loss, γ Lipschitz in z and θ).
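The structural fact behind the invertibility argument can be checked numerically. The sketch below is not taken from the paper: it uses a random strictly lower-triangular matrix `N` as a hypothetical stand-in for Dγ (the size `n` and the helper name `neumann_inverse` are assumptions made for illustration) and verifies that (I−N)^{-1} equals the finite Neumann series I + N + … + N^{n−1}, whose norm is in turn bounded by the corresponding sum of powers of ‖N‖.

```python
import numpy as np


def neumann_inverse(N):
    """Return (I - N)^{-1} for a nilpotent N via the finite Neumann series.

    For a strictly lower-triangular n x n matrix, N^n = 0, so the series
    I + N + N^2 + ... + N^(n-1) terminates and equals the exact inverse.
    """
    n = N.shape[0]
    total = np.eye(n)
    power = np.eye(n)
    for _ in range(1, n):
        power = power @ N          # N^k for k = 1, ..., n-1
        total = total + power
    return total


# Hypothetical stand-in for the Jacobian D-gamma: any strictly
# lower-triangular matrix has the nilpotent structure the proof relies on.
rng = np.random.default_rng(0)
n = 6
N = np.tril(rng.standard_normal((n, n)), k=-1)

inv_series = neumann_inverse(N)

# N is nilpotent of index at most n.
nilpotent = np.allclose(np.linalg.matrix_power(N, n), 0.0)

# The series really inverts I - N.
residual = np.linalg.norm(inv_series @ (np.eye(n) - N) - np.eye(n))

# Norm bound from the triangle inequality and submultiplicativity:
# ||(I - N)^{-1}||_2 <= sum_{k=0}^{n-1} ||N||_2^k.
norm_N = np.linalg.norm(N, 2)
norm_bound = sum(norm_N ** k for k in range(n))
norm_inverse = np.linalg.norm(inv_series, 2)
```

Because the series is finite, the bound needs no smallness condition such as ‖N‖ < 1, which is what allows an explicit constant to be written down regardless of how large the Lipschitz constants of γ are.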
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} strong field
\textbf{Justification:}
The result is correct and impactful for efficient training of implicit neural ODE layers. The proof leverages the nilpotent Jacobian structure and Newton’s quadratic error identity appropriately. Minor clarifications about uniformity of bounds, compactness of the trajectory set, and handling of remainder terms would strengthen rigor and usability. Empirical sections corroborate the theoretical speedups, and the paper situates the contribution among Newton/Quasi-Newton and adjoint-based approaches.