Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations

Erfan Pirmorad, Faraz Khoshbakhtian, Farnam Mansouri, Amir-massoud Farahmand

incomplete · medium confidence

Category: Not specified
Journal tier: Note/Short/Other
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper formulates SPDE control as an MDP and defines the “squashing” objective J with ū(t) = E[(1/2π)∫ u dx], but it only sketches the MDP mapping and reward choice, without rigorous derivations or stated assumptions. It also adopts a specific finite-difference/Crank–Nicolson discretization without formally characterizing the resulting transition kernel. The candidate solution supplies the missing analysis: (a) a correct variance identity for E[∫(u−ū)^2], (b) a concrete MDP construction via Fourier–Galerkin and Euler–Maruyama with explicit Gaussian kernels and reward equal to −J_N, and (c) a sound argument that improved return strictly reduces the expected spatial L^2 variance at some time. Minor slips (a sign omission in one explanatory sentence and the sign of the state-mean variance adjustment) do not affect correctness. Overall, the paper’s argument is conceptually right but incomplete, while the candidate solution is correct on the posed tasks. For alignment, see the paper’s definitions of the reward and cost (their Eqs. (2)–(3)) and the “squashing” objective for stochastic Burgers (their Eq. (5) and surrounding text).
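
The variance identity in item (a) is not written out in this review. Assuming a periodic spatial domain of length 2π (suggested by the 1/2π normalization) and treating ū(t) as the deterministic scalar defined above, it presumably takes the form E[∫(u−ū)^2 dx] = E[∫u^2 dx] − 2π ū^2, because E[∫u dx] = 2π ū. The sketch below checks this form on synthetic data at a single fixed time. The sample fields, grid size, and function names are illustrative assumptions, not taken from the paper or from the candidate solution.

```python
import math
import random


def variance_identity_sides(samples, dx):
    """Return (lhs, rhs) of E[∫(u-ū)² dx] = E[∫u² dx] - L·ū².

    `samples` is a list of fields, each a list of grid values at one fixed time.
    Expectations are replaced by averages over the samples, and integrals by
    Riemann sums with spacing `dx`, so L = n_grid * dx (2π in this sketch).
    """
    n_samples = len(samples)
    length = len(samples[0]) * dx
    # ū: expectation of the spatial mean, a single scalar shared by all samples
    u_bar = sum(sum(u) * dx / length for u in samples) / n_samples
    lhs = sum(sum((v - u_bar) ** 2 for v in u) * dx for u in samples) / n_samples
    second_moment = sum(sum(v * v for v in u) * dx for u in samples) / n_samples
    rhs = second_moment - length * u_bar ** 2
    return lhs, rhs


def make_samples(n_samples=200, n_grid=64, seed=0):
    """Hypothetical random fields on [0, 2π): noisy sine waves with random offset."""
    rng = random.Random(seed)
    dx = 2 * math.pi / n_grid
    samples = []
    for _ in range(n_samples):
        amp = rng.gauss(1.0, 0.3)
        offset = rng.gauss(0.5, 0.2)
        samples.append(
            [amp * math.sin(j * dx) + offset + rng.gauss(0.0, 0.1) for j in range(n_grid)]
        )
    return samples, dx


samples, dx = make_samples()
lhs, rhs = variance_identity_sides(samples, dx)
print(lhs, rhs)  # the two sides should agree up to floating-point rounding
```

With empirical averages in place of expectations, the two sides agree algebraically rather than only in the large-sample limit, so any mismatch beyond rounding would point to a sign or normalization error of the kind the audit mentions.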

Referee report (LaTeX)

\textbf{Recommendation:} major revisions

\textbf{Journal Tier:} note/short/other

\textbf{Justification:}

The paper advances an appealing RL framing for SPDE control and shows preliminary success on stochastic Burgers shock damping. However, it lacks rigorous details about the discretization-to-MDP mapping (transition kernels, assumptions, and convergence of discrete to continuous costs), and it does not substantiate its per-time-step ``squashing'' claims theoretically. These gaps can be remedied without changing the main contribution; hence major revisions are recommended.