2010.00403

Mediating artificial intelligence developments through negative and positive incentives

The Anh Han, Luís Moniz Pereira, Tom Lenaerts, Francisco C. Santos

Verdict: correct (medium confidence)
Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:55 AM

Audit review

The model's derivations match the paper's logic and thresholds for the baseline (AS vs AU) and reward (RS vs AU) comparisons, as well as the limiting punishment case sα ≥ s. For punishment with sα < s, both follow the same large-B risk-dominance inequality; the paper then states Eq. (7) as p_r > 1 − 1/(s + 2Wr), whereas the immediately preceding algebra in the paper itself yields p_r > 1 − 1/(s + 2W/r). This appears to be a notational/typographical slip in Eq. (7). The candidate derives the correct form but attributes the discrepancy to a redefinition of r, which is unnecessary. Aside from this minor notation bug, the proofs are the same in substance and reach the same thresholds.
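For reference, the two competing forms of the threshold can be written side by side. This merely restates the discrepancy described above in display math (symbols as in the paper's notation; no new derivation is introduced):

```latex
% Eq. (7) as printed in the paper (apparent typographical slip):
p_r > 1 - \frac{1}{s + 2Wr}

% Form implied by the paper's own preceding algebra,
% and the form the candidate derives:
p_r > 1 - \frac{1}{s + 2W/r}
```

The only difference is whether the intensity-of-selection-style factor enters as 2Wr or 2W/r in the denominator, which is why the audit classifies it as a notation bug rather than a substantive error.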

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The analytical framework is well-motivated and delivers crisp, interpretable thresholds with clear policy interpretations. Numerical results back the analytics. The only substantive issue is a minor notational/typographical error in Eq. (7) that could mislead readers; fixing this and clarifying the definition of r will resolve the discrepancy. Otherwise, the work is methodologically sound and sufficiently novel within the AI governance modeling literature.