arXiv:2101.11517

Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond

Risheng Liu, Jiaxin Gao, Jin Zhang, Deyu Meng, Zhouchen Lin

correct (high confidence)
Category
Not specified
Journal tier
Strong Field
Processed
Sep 28, 2025, 12:55 AM

Audit review

The paper states the chain-rule decomposition for ϕ(x) = F(x, y*(x)) (its Eq. (14)) and then derives the implicit-gradient formula (its Eq. (33)) by differentiating the LL stationarity condition ∂_y f(x, y*(x)) = 0, assuming the LL Hessian ∂²_yy f is invertible and working under the lower-level singleton (LLS) setting; see their discussion that IGBR requires second-order differentiability and an invertible Hessian, and relies on uniqueness of the LL solution set. The candidate solution gives the same formula, using the Implicit Function Theorem to justify a C^1 local selection and explicitly identifying it with the best response under LLS and a nondegenerate KKT point, then applying the chain rule. Both arguments are essentially the same (implicit differentiation of the LL KKT condition with invertible ∂²_yy f); the model provides a more explicit IFT-based justification of smoothness and of the selection. The paper's presentation is correct but high-level; the model fills in standard regularity details.
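For reference, a minimal sketch of the implicit-differentiation step described above, in the audit's notation (invertibility of the LL Hessian and the LLS/uniqueness assumption are taken as given; the correspondence to the paper's Eqs. (14) and (33) is up to notation): differentiating the stationarity condition ∂_y f(x, y*(x)) = 0 with respect to x and substituting into the chain rule gives

\[
\nabla \phi(x) \;=\; \nabla_x F\big(x, y^*(x)\big) \;+\; \big(\mathrm{D}y^*(x)\big)^{\top}\,\nabla_y F\big(x, y^*(x)\big),
\qquad
\mathrm{D}y^*(x) \;=\; -\,\big[\nabla^2_{yy} f\big]^{-1}\,\nabla^2_{yx} f,
\]
which combine (using symmetry of $\nabla^2_{yy} f$) into
\[
\nabla \phi(x) \;=\; \nabla_x F \;-\; \nabla^2_{xy} f\,\big[\nabla^2_{yy} f\big]^{-1}\,\nabla_y F,
\]
with all derivatives of $f$ and $F$ evaluated at $(x, y^*(x))$.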

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The survey correctly presents the chain-rule decomposition and the implicit-gradient formula for bilevel optimization under standard assumptions (LLS and invertible LL Hessian). The argument aligns with established implicit-differentiation practice. For completeness and rigor, a short formal statement (e.g., an IFT-based lemma) ensuring local $C^1$ regularity of the best-response mapping and its identification with the global LL minimizer near a nondegenerate interior KKT point would be beneficial. These are clarifications rather than corrections.
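A minimal sketch of such a lemma, stated only under the assumptions already named in this report (twice continuous differentiability of the LL objective, a nondegenerate interior stationary point, and LLS), could read:

\textbf{Lemma (sketch).} Let $f$ be twice continuously differentiable and suppose $\nabla_y f(\bar{x}, \bar{y}) = 0$ with $\nabla^2_{yy} f(\bar{x}, \bar{y})$ invertible. By the Implicit Function Theorem there exist neighborhoods $U \ni \bar{x}$, $V \ni \bar{y}$ and a unique $C^1$ map $y^*\colon U \to V$ such that $y^*(\bar{x}) = \bar{y}$ and $\nabla_y f(x, y^*(x)) = 0$ for all $x \in U$. If, in addition, the lower-level solution set is a singleton (LLS) near $\bar{x}$, then $y^*(x)$ coincides with the global LL minimizer, $\phi(x) = F(x, y^*(x))$ is $C^1$ on $U$, and $\nabla \phi$ is given by the implicit-gradient formula (the paper's Eq.\ (33)).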