Adam, along with related methods that use exponential moving averages (RMSProp, Adadelta, Nadam), can fail to converge even on one-dimensional convex problems (see Reddi et al 2018 - On the Convergence of Adam and Beyond, follow-up here: Ward et al 2020). The methods below attempt to improve on this.
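One fix proposed in the Reddi et al 2018 paper cited above is AMSGrad, which keeps a running maximum of the second-moment estimate so the effective step size can never grow between iterations. A minimal sketch (bias correction omitted for brevity; hyperparameter values are the usual defaults, not prescribed by this note):

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, vhat, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam moment updates (exponential moving averages).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # AMSGrad's fix: take the elementwise max of all past v's, so the
    # per-coordinate learning rate lr / sqrt(vhat) is non-increasing.
    vhat = np.maximum(vhat, v)
    theta = theta - lr * m / (np.sqrt(vhat) + eps)
    return theta, m, v, vhat

# Usage: minimize f(theta) = theta^2, so grad = 2 * theta.
theta = np.array(1.0)
m = v = vhat = np.zeros_like(theta)
for _ in range(500):
    theta, m, v, vhat = amsgrad_step(theta, 2 * theta, m, v, vhat)
```

Plain Adam uses `v` directly in the denominator; the only change here is the `np.maximum` line.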
See also Distributed Training
See also Meta-Learning
For the history of SGD and related methods, see ML History - Optimization.
Second-order optimizers such as Newton's method or quasi-Newton methods converge much faster than first-order optimizers: Newton's method has a locally quadratic convergence rate, and quasi-Newton methods a superlinear one. See here and here.
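A minimal sketch of Newton's method on a 1-D problem illustrates the quadratic rate (the test function here is my own illustrative choice, not from any of the linked references):

```python
import math

def newton_minimize(grad, hess, x0, steps=6):
    """Newton's method for minimization: x <- x - f'(x) / f''(x).
    Near a minimum with positive second derivative, the error in x
    roughly squares on every step (quadratic local convergence)."""
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

# Minimize f(x) = exp(x) - 2x, whose unique minimum is at x* = ln 2.
x_star = newton_minimize(grad=lambda x: math.exp(x) - 2,
                         hess=lambda x: math.exp(x),
                         x0=0.0)
```

Six iterations already reach machine precision here; a first-order method on the same problem would need orders of magnitude more steps to match that accuracy.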
Also known as black-box optimizers. See also Hyperparameter Tuning.