2011.11231
Reinforcement Learning-based Disturbance Rejection Control for Uncertain Nonlinear Systems
Maopeng Ran, Juncheng Li, Lihua Xie
correct (high confidence)
- Category
- math.DS
- Journal tier
- Strong Field
- Processed
- Sep 28, 2025, 12:55 AM
- arXiv Links
- Abstract ↗ · PDF ↗
Audit review
The paper proves, under A1–A4, that the saturated ESO achieves uniform (in t ≥ T) convergence of the estimation errors as ε → 0, and that the state and the actor/critic weights are UUB. The two-step argument handles the ESO's peaking via saturation, works with a scaled error system containing a time-varying feedback term, decomposes the Bellman error, and closes with a composite Lyapunov analysis that uses a Gramian lower bound (A4) to remove the PE requirement; see the statement and proof of Theorem 1, including the ESO design (9)–(10), the control (23), the updates (19)–(22), assumptions (24)–(25), the BE decomposition (38)–(39), and the Lyapunov inequality (45) with gain conditions (46). The model's solution mirrors this structure: it lumps the uncertainties into an extended state, analyzes a high-gain saturated ESO with scaled errors, reduces the plant to the nominal loop with an O(ε) perturbation, decomposes the instantaneous and extrapolated BEs, and establishes UUB of the critic/actor weights and the state via a composite Lyapunov function and A4. The differences are technical (e.g., the model streamlines the ESO analysis by not isolating the −F1ϑ2η1 feedback term, and it absorbs the Θ̃c-quadratic term in δ into bounded residuals) and do not alter the conclusion.
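To make the saturated high-gain ESO argument concrete, here is a minimal numerical sketch. It is NOT the paper's design (9)–(10): the second-order toy plant, the bandwidth-parameterized gains β = (3, 3, 1) (triple observer pole at −1/ε), the saturation level, and the disturbance-cancelling feedback are all illustrative assumptions. It only exhibits the two behaviors the proof relies on: saturating the estimates fed to the controller bounds the control during the initial peaking transient, and the post-transient estimation error of the extended state shrinks as ε → 0.

```python
import numpy as np

def simulate_eso(eps, T=10.0, dt=1e-4, sat=10.0):
    """Euler simulation of a saturated high-gain ESO on a toy second-order
    plant (illustrative sketch only). Returns the worst extended-state
    estimation error |z3 - h| recorded over the second half of the run."""
    beta = (3.0, 3.0, 1.0)      # gains place the ESO poles at -1/eps (triple)
    x1, x2 = 1.0, 0.0           # plant state
    z = np.zeros(3)             # ESO estimates of x1, x2, and extended state h
    t, err = 0.0, 0.0
    for _ in range(int(T / dt)):
        d = np.sin(t)                   # external disturbance (assumed)
        h = -x1 - 0.5 * x2 + d          # "total disturbance" = extended state
        # Saturate the estimates before they enter the control law;
        # this is what curbs the high-gain peaking transient.
        zs = np.clip(z, -sat, sat)
        u = -zs[2] - 2.0 * zs[0] - 2.0 * zs[1]   # cancel h, stabilize nominal loop
        e = x1 - z[0]                   # output estimation error (old values)
        # plant Euler step
        x1, x2 = x1 + dt * x2, x2 + dt * (h + u)
        # ESO Euler step: high-gain injection scaled by 1/eps powers
        z[0] += dt * (z[1] + beta[0] / eps * e)
        z[1] += dt * (z[2] + beta[1] / eps**2 * e + u)
        z[2] += dt * (beta[2] / eps**3 * e)
        t += dt
        if t > T / 2:                   # record error after the transient
            err = max(err, abs(z[2] - h))
    return err
```

A quasi-static calculation on the scaled error dynamics predicts a post-transient error of roughly 3ε·|ḣ| for these gains, so halving ε should roughly halve the recorded error, which matches the ε → 0 convergence claim in Theorem 1 (in this toy setting only).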
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} strong field
\textbf{Justification:}
A solid, well-motivated integration of ADRC/ESO with concurrent-learning RL for uncertain nonlinear systems with non-simple nominal models. The theory carefully handles the ESO saturation and its coupling with the RL loop, and removes the PE requirement via extrapolated Bellman errors. Minor additions clarifying the handling of the ESO feedback term and the practical selection of the saturation bounds would improve readability.