On Convergence of Gradient Descent Ascent: A Tight Local Analysis

ICML

Authors

Haochuan Li, Farzan Farnia, Subhro Das, Ali Jadbabaie

Published on

07/23/2022

Categories

ICML

Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for min_x max_y f(x; y) where f is strongly concave in y and possibly nonconvex in x, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio η_y/η_x = Θ(κ²), where η_x and η_y are the stepsizes for x and y and κ is the condition number for y. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. We demonstrate that a stepsize ratio of Θ(κ) is necessary and sufficient for local convergence of GDA to a Stackelberg equilibrium, where κ is the local condition number for y. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient (EG) methods. Finally, we conduct several numerical experiments to support our theoretical findings.
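
To make the stepsize-ratio discussion concrete, below is a minimal sketch of two-timescale GDA on a toy objective, with the max player's stepsize set to κ times the min player's, as the paper's local analysis suggests. The objective f, the constants L and mu, and the stepsize values are illustrative assumptions for this sketch, not the paper's experimental setup.

import jax
import jax.numpy as jnp

# Toy objective: possibly nonconvex in x, strongly concave in y.
# L (coupling/smoothness) and mu (strong concavity in y) are assumed constants.
L, mu = 10.0, 1.0
kappa = L / mu  # illustrative stand-in for the local condition number for y

def f(x, y):
    return jnp.sin(x) + L * x * y - 0.5 * mu * y**2

grad_x = jax.grad(f, argnums=0)  # gradient for the min player
grad_y = jax.grad(f, argnums=1)  # gradient for the max player

eta_x = 1e-3            # min-player stepsize (illustrative)
eta_y = kappa * eta_x   # Theta(kappa) ratio from the paper's local analysis

x, y = 1.0, 1.0
for _ in range(5000):
    gx, gy = grad_x(x, y), grad_y(x, y)
    x = x - eta_x * gx  # simultaneous descent step on x
    y = y + eta_y * gy  # simultaneous ascent step on y

The extra-gradient (EG) variant discussed in the paper differs only in that each iteration first takes a lookahead step, then applies the gradients evaluated at the lookahead point to the original iterate.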

Please cite our work using the BibTeX below.

@InProceedings{pmlr-v162-li22e,
  title     = {On Convergence of Gradient Descent Ascent: A Tight Local Analysis},
  author    = {Li, Haochuan and Farnia, Farzan and Das, Subhro and Jadbabaie, Ali},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12717--12740},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/li22e/li22e.pdf},
  url       = {https://proceedings.mlr.press/v162/li22e.html},
  abstract  = {Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{x} \max_{y} f(x;y)$ where $f$ is strongly-concave in $y$ and possibly nonconvex in $x$, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio $\eta_y/\eta_x=\Theta(\kappa^2)$ where $\eta_x$ and $\eta_y$ are the stepsizes for $x$ and $y$ and $\kappa$ is the condition number for $y$. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. We demonstrate that a stepsize ratio of $\Theta(\kappa)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $\kappa$ is the local condition number for $y$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG). Finally, we conduct several numerical experiments to support our theoretical findings.}
}