2011.14212

Approximate Midpoint Policy Iteration for Linear Quadratic Control

Benjamin Gravell, Iman Shames, Tyler Summers

incomplete, medium confidence

Category: math.DS
Journal tier: Specialist/Solid
Processed: Sep 28, 2025, 12:55 AM

Audit review

The paper asserts cubic local convergence of the controller gains for midpoint policy iteration (MPI) but supports the claim only by citing generic midpoint-Newton results. It does not verify the required smoothness and Jacobian conditions, and it omits the nontrivial step that converts cubic convergence in the value matrices P into an inequality stated purely in terms of the gain error. The model solution correctly identifies MPI as a midpoint-Newton method for the Riccati residual and establishes cubic convergence in P. However, its proposed sufficient condition for translating this into a gain-only bound, namely that controllability of (F*, B) implies injectivity of E ↦ B^T E F*, is not generally valid. Thus, both arguments omit essential assumptions or details.
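The injectivity objection can be illustrated numerically. The sketch below is not taken from the paper or the model solution: the matrices F and B are hypothetical, chosen only so that (F, B) is controllable and F is Schur stable, and E is assumed to range over symmetric matrices, as a difference of value matrices would. It builds the matrix of the linear map E ↦ B^T E F on a basis of symmetric matrices and extracts a nonzero E that is mapped to zero.

```python
import numpy as np

# Hypothetical closed-loop matrix F and input matrix B (illustrative only,
# not from the paper). F has spectral radius sqrt(0.5) < 1.
F = np.array([[0.0, 1.0],
              [-0.5, 0.5]])
B = np.array([[0.0],
              [1.0]])
n, m = B.shape

# Controllability matrix [B, FB, ..., F^(n-1) B]; full rank means controllable.
ctrb = np.hstack([np.linalg.matrix_power(F, i) @ B for i in range(n)])
ctrb_rank = np.linalg.matrix_rank(ctrb)

# Basis of the n(n+1)/2-dimensional space of symmetric n x n matrices.
basis = []
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0
        basis.append(E)

# Matrix of the linear map E -> B^T E F in that basis (one column per basis element).
M = np.column_stack([(B.T @ E @ F).ravel() for E in basis])
map_rank = np.linalg.matrix_rank(M)

# The last right-singular vector spans the null space here, since M has
# more columns (3) than rows (2).
_, _, Vt = np.linalg.svd(M)
E_null = sum(c * E for c, E in zip(Vt[-1], basis))
residual = np.linalg.norm(B.T @ E_null @ F)

print("controllability rank:", ctrb_rank, "of", n)
print("rank of E -> B^T E F:", map_rank, "on a space of dimension", len(basis))
print("nonzero E in the kernel:\n", E_null)
print("norm of B^T E F for that E:", residual)
```

In this example the map sends a 3-dimensional space into a 2-dimensional one, so it cannot be injective regardless of controllability; more generally, any symmetric E with B^T E = 0 lies in the kernel. A gain-only bound would therefore seem to need an additional assumption or a different argument.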

Referee report (LaTeX)

\textbf{Recommendation:} major revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The midpoint-Newton framing and algorithmic development are valuable, and the empirical evidence is supportive. However, the main theoretical proposition is stated for gains and justified only by generic results applicable to the value-matrix variable. The proof omits a necessary and nontrivial argument (or assumptions) to translate cubic convergence in P to an inequality purely in K. This mismatch between the stated result and provided justification needs to be addressed before publication.