2010.00403
Mediating artificial intelligence developments through negative and positive incentives
The Anh Han, Luís Moniz Pereira, Tom Lenaerts, Francisco C. Santos
correct (medium confidence)
- Category
- Not specified
- Journal tier
- Strong Field
- Processed
- Sep 28, 2025, 12:55 AM
- arXiv Links
- Abstract ↗ PDF ↗
Audit review
The model’s derivations match the paper’s logic and thresholds for the baseline (AS vs AU), reward (RS vs AU), and the limiting punishment case sα ≥ s. For punishment with sα < s, both follow the same large-B risk-dominance inequality; the paper then states Eq. (7) as pr > 1 − 1/(s + 2Wr), but the algebra immediately preceding it in the paper itself yields pr > 1 − 1/(s + 2W/r). This appears to be a notational/typographical slip in Eq. (7): the two forms coincide only when r = 1 and diverge otherwise. The candidate derives the correct form but attributes the discrepancy to a redefinition of r, which is unnecessary. Aside from this minor notation bug, the proofs are the same in substance and reach the same thresholds.
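The discrepancy between the two forms of Eq. (7) is easy to check numerically. The sketch below (assumed notation: s, W, and r as in the paper's punishment analysis; the numeric values are illustrative, not taken from the paper) computes both candidate thresholds and shows they agree only when r = 1:

```python
# Hypothetical illustration of the two threshold forms discussed above.
# Symbols s, W, r follow the paper's Eq. (7) context; values are made up.

def threshold_as_printed(s, W, r):
    """Eq. (7) as printed in the paper: p_r > 1 - 1/(s + 2*W*r)."""
    return 1 - 1 / (s + 2 * W * r)

def threshold_from_algebra(s, W, r):
    """Form implied by the preceding algebra: p_r > 1 - 1/(s + 2*W/r)."""
    return 1 - 1 / (s + 2 * W / r)

s, W = 2.0, 3.0
# At r = 1 the two forms coincide, which may explain how the slip went unnoticed.
print(threshold_as_printed(s, W, 1.0), threshold_from_algebra(s, W, 1.0))
# For r != 1 they diverge, e.g. r = 0.5:
print(threshold_as_printed(s, W, 0.5), threshold_from_algebra(s, W, 0.5))
```

Because the forms agree at r = 1, spot-checks at that value would not catch the typo; any r ≠ 1 exposes it.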
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions
\textbf{Journal Tier:} strong field
\textbf{Justification:}
The analytical framework is well motivated and delivers crisp, interpretable thresholds with clear policy implications, and the numerical results corroborate the analytics. The only substantive issue is a minor notational/typographical error in Eq. (7) that could mislead readers; correcting it and clarifying the definition of r will resolve the discrepancy. Otherwise, the work is methodologically sound and sufficiently novel within the AI governance modeling literature.