On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization



  • Xiangyi Chen
  • Sijia Liu
  • Ruoyu Sun
  • Mingyi Hong


This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as “Adam-type,” includes popular algorithms such as Adam, AMSGrad, and AdaGrad. Despite their popularity in training deep neural networks (DNNs), the convergence of these algorithms for solving non-convex problems remains an open question. In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of Adam-type methods, with a convergence rate of order $O(\log T / \sqrt{T})$ for non-convex stochastic optimization. Our convergence analysis applies to a new algorithm called AdaFom (AdaGrad with First Order Momentum). We show that the conditions are essential by identifying concrete examples in which violating them makes an algorithm diverge. Besides providing one of the first comprehensive analyses of Adam-type methods in the non-convex setting, our results can also help practitioners easily monitor the progress of algorithms and determine their convergence behavior.
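To make the phrase "update the search directions and learning rates simultaneously using past gradients" concrete, here is a minimal sketch of what a generic Adam-type loop might look like. It is not the authors' code. The update forms follow the commonly stated versions of these algorithms, and the function names (`adam_type_minimize`, `adafom_rule`, `amsgrad_rule`, `double_well_grad`), the `eps` safeguard, the `alpha / sqrt(t)` step size, and the noise-free toy objective are all assumptions chosen for illustration.

```python
import math


def adam_type_minimize(grad, x0, steps, second_moment,
                       alpha=0.1, beta1=0.9, eps=1e-8):
    """Generic Adam-type loop on a scalar parameter (illustrative sketch).

    grad(x) returns a gradient estimate. second_moment(state, g, t) returns
    (new_state, v_hat), where v_hat scales the step at iteration t.
    """
    x, m, state = x0, 0.0, None
    for t in range(1, steps + 1):
        g = grad(x)
        # Search direction: exponential average of past gradients.
        m = beta1 * m + (1.0 - beta1) * g
        # Adaptive learning rate: built from past squared gradients.
        state, v_hat = second_moment(state, g, t)
        # Assumed step size alpha / sqrt(t); eps guards against division by zero.
        x -= (alpha / math.sqrt(t)) * m / (math.sqrt(v_hat) + eps)
    return x


def adafom_rule(state, g, t):
    # Running mean of all squared gradients: v_t = (1 - 1/t) v_{t-1} + g^2 / t.
    prev = 0.0 if state is None else state
    v = (1.0 - 1.0 / t) * prev + (g * g) / t
    return v, v


def amsgrad_rule(beta2=0.999):
    def rule(state, g, t):
        v_prev, vmax_prev = (0.0, 0.0) if state is None else state
        # Exponential average of squared gradients, as in Adam.
        v = beta2 * v_prev + (1.0 - beta2) * g * g
        # AMSGrad keeps the running maximum so the scaling never decreases.
        vmax = max(vmax_prev, v)
        return (v, vmax), vmax
    return rule


def double_well_grad(x):
    # Gradient of the non-convex toy objective f(x) = x^4 / 4 - x^2 / 2,
    # whose stationary points are x = -1, 0, 1.
    return x ** 3 - x


x_adafom = adam_type_minimize(double_well_grad, 2.0, 2000, adafom_rule)
x_amsgrad = adam_type_minimize(double_well_grad, 2.0, 2000, amsgrad_rule())
print("AdaFom final x:", x_adafom, "gradient:", double_well_grad(x_adafom))
print("AMSGrad final x:", x_amsgrad, "gradient:", double_well_grad(x_amsgrad))
```

Only the second-moment rule differs between the variants, which is the sense in which they form one class. Under the same assumptions, calling `adam_type_minimize` with `beta1=0.0` and `adafom_rule` gives an AdaGrad-style update without momentum. The toy run uses exact gradients for reproducibility, whereas the paper's setting is stochastic.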

Please cite our work using the BibTeX below.

@article{DBLP:journals/corr/abs-1808-02941,
  author    = {Xiangyi Chen and
               Sijia Liu and
               Ruoyu Sun and
               Mingyi Hong},
  title     = {On the Convergence of {A} Class of Adam-Type Algorithms for Non-Convex
               Optimization},
  journal   = {CoRR},
  volume    = {abs/1808.02941},
  year      = {2018},
  url       = {},
  archivePrefix = {arXiv},
  eprint    = {1808.02941},
  timestamp = {Thu, 11 Oct 2018 15:59:45 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography}
}