2010.10473

Regret-optimal control in dynamic environments

Gautam Goel, Babak Hassibi

correct, high confidence

Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:55 AM

Audit review

The paper derives the regret-optimal controller for finite-horizon LTV LQR by reducing the regret-suboptimal inequality to a suboptimal H∞ problem. The reduction uses a causal, invertible whitening of the disturbance, z = Lw, where L is constructed so that γ^2 I + G^T (I+FF^T)^{-1} G = L^T L. The resulting extended plant, Riccati recursions, and strict bounded-real conditions Δ̂_t ≺ 0 yield the controller in Theorem 4 and the bound Regret ≤ γ_opt^2 ∑ ||w_t||^2 (see the reduction, the state-space model for z, the extended plant, and Theorem 4’s formulas for Â_t, B̂_{u,t}, B̂_{w,t}, Q̂_t, P̂_t, P^b_t, and Δ̂_t). The candidate solution reproduces this structure step by step: a forward completion of squares that defines Ã_t and the disturbance-shaping state δ via a forward Riccati recursion; a backward completion of squares and change of variables w ↦ z via a backward Riccati recursion (R^b_{e,t} and K^b_{l,t}); construction of the same extended plant; and enforcement of the finite-horizon bounded-real inequalities Δ̂_t ≺ 0, with synthesis via dynamic programming. The H∞ subproblem matches the paper’s Theorem 2-based approach (suboptimal H∞ feasibility via Δ_t ≺ 0), and the bisection over γ^2 matches Theorem 4 (smallest γ^2 with Δ̂_t ≺ 0). Aside from stylistic differences (operator/Kalman-factorization viewpoint in the paper vs. completion-of-squares/dissipation in the candidate), the definitions and resulting controller are the same.
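To make the whitening step concrete, the following is a minimal dense-matrix sketch of the factorization γ^2 I + G^T (I+FF^T)^{-1} G = L^T L. It is not the paper's state-space construction, which obtains L through Riccati recursions. It assumes scalar signals at each time step, so that "causal" simply means lower triangular, and it uses random lower-triangular matrices as stand-ins for the operators F and G. The function name `causal_whitening_factor` and all numerical values are hypothetical.

```python
import numpy as np


def causal_whitening_factor(F, G, gamma):
    """Return (L, M) with L lower triangular and L.T @ L = M, where
    M = gamma^2 I + G.T (I + F F.T)^{-1} G.

    Illustrative dense-matrix version; assumes scalar signals per time step.
    """
    n_s, n_w = F.shape[0], G.shape[1]
    M = gamma**2 * np.eye(n_w) + G.T @ np.linalg.solve(np.eye(n_s) + F @ F.T, G)
    M = 0.5 * (M + M.T)  # symmetrize against round-off

    # Standard Cholesky gives M = C C^T with C lower triangular, which is the
    # wrong order here. Factoring the index-reversed matrix and reversing back
    # gives M = U U^T with U upper triangular, so L = U^T is lower triangular.
    C = np.linalg.cholesky(M[::-1, ::-1])
    L = C[::-1, ::-1].T
    return L, M


rng = np.random.default_rng(0)
T = 6  # horizon length (assumed)
F = np.tril(rng.standard_normal((T, T)))  # stand-in for the causal map u -> s
G = np.tril(rng.standard_normal((T, T)))  # stand-in for the causal map w -> s

L, M = causal_whitening_factor(F, G, gamma=1.5)

# Whitened disturbance: ||z||^2 equals the quadratic form w^T M w.
w = rng.standard_normal(T)
z = L @ w
```

Because L is lower triangular with a positive diagonal, it is both causal and causally invertible in this toy setting, which is the property the reduction to the H∞ problem relies on.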

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper presents a rigorous and implementable synthesis of the regret-optimal controller for finite-horizon LTV LQR via a reduction to H∞ control and bounded-real lemmas. The construction is clear, the controller is explicit, and the regret bound is tight by design. The result is valuable for adaptive control under dynamic disturbances and complements both H2 and H∞ controllers. Minor clarifications on factorization/invertibility and a succinct algorithmic summary would further strengthen readability.