

Awesome-Optimizer

A collection of optimizer-related papers, data, and repositories.

| Title | Year | Optimizer | Published | Code | Keywords |
|---|---|---|---|---|---|
| A Stochastic Approximation Method | 1951 | SGD | projecteuclid | code | gradient descent |
| Some methods of speeding up the convergence of iteration methods | 1964 | Polyak | sciencedirect | | gradient descent |
| Large-scale linearly constrained optimization | 1978 | MINOS | springerlink | | quasi-newton |
| On the limited memory BFGS method for large scale optimization | 1989 | L-BFGS | springerlink | | quasi-newton |
| Particle swarm optimization | 1995 | PSO | ieee | | evolutionary |
| Trust region methods | 2000 | Sub-sampled TR | siam | | inexact hessian |
| Evolving Neural Networks through Augmenting Topologies | 2002 | NEAT | ieee | code | evolutionary |
| A Limited Memory Algorithm for Bound Constrained Optimization | 2003 | L-BFGS-B | researchgate | code | quasi-newton |
| Online convex programming and generalized infinitesimal gradient ascent | 2003 | OGD | acm | | gradient descent |
| A Stochastic Quasi-Newton Method for Online Convex Optimization | 2007 | O-LBFGS | researchgate | | quasi-newton |
| Scalable training of L1-regularized log-linear models | 2007 | OWL-QN | acm | code | quasi-newton |
| A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks | 2009 | HyperNEAT | ieee | | evolutionary |
| AdaDiff: Adaptive Gradient Descent with the Differential of Gradient | 2010 | AdaDiff | iopscience | | gradient descent |
| Adaptive Subgradient Methods for Online Learning and Stochastic Optimization | 2011 | AdaGrad | jmlr | code | gradient descent |
| CMA-ES: evolution strategies and covariance matrix adaptation | 2011 | CMA-ES | acm | code | evolutionary |
| ADADELTA: An Adaptive Learning Rate Method | 2012 | ADADELTA | arxiv | code | gradient descent |
| A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets | 2012 | SAG | arxiv | | variance reduced |
| An Enhanced Hypercube-Based Encoding for Evolving the Placement, Density, and Connectivity of Neurons | 2012 | ES-HyperNEAT | ieee | code | evolutionary |
| CMA-TWEANN: efficient optimization of neural networks via self-adaptation and seamless augmentation | 2012 | CMA-TWEANN | acm | | evolutionary |
| Neural Networks for Machine Learning | 2012 | RMSProp | coursera | code | gradient descent |
| No More Pesky Learning Rates | 2012 | vSGD-b | arxiv | code | variance reduced |
| No More Pesky Learning Rates | 2012 | vSGD-g | arxiv | code | variance reduced |
| No More Pesky Learning Rates | 2012 | vSGD-l | arxiv | code | variance reduced |
| Accelerating stochastic gradient descent using predictive variance reduction | 2013 | SVRG | neurips | code | variance reduced |
| Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients | 2013 | vSGD-fd | arxiv | | gradient descent |
| Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming | 2013 | ZO-SGD | arxiv | | gradient free |
| Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization | 2013 | ZO-ProxSGD | arxiv | | gradient free |
| Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization | 2013 | ZO-PSGD | arxiv | | gradient free |
| Semi-Stochastic Gradient Descent Methods | 2013 | S2GD | arxiv | | variance reduced |
| Adam: A Method for Stochastic Optimization | 2014 | Adam | arxiv | code | gradient descent |
| SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | 2014 | SAGA | arxiv | code | variance reduced |
| A Stochastic Quasi-Newton Method for Large-Scale Optimization | 2014 | SQN | arxiv | code | quasi-newton |
| RES: Regularized Stochastic BFGS Algorithm | 2014 | Reg-oBFGS-Inf | arxiv | | quasi-newton |
| A Proximal Stochastic Gradient Method with Progressive Variance Reduction | 2014 | Prox-SVRG | arxiv | code | variance reduced |
| A Computationally Efficient Limited Memory CMA-ES for Large Scale Optimization | 2014 | LM-CMA-ES | arxiv | | evolutionary |
| Random feedback weights support learning in deep neural networks | 2014 | FA | arxiv | code | gradient descent |
| Adam: A Method for Stochastic Optimization | 2015 | AdaMax | arxiv | code | gradient descent |
| Scale-Free Algorithms for Online Linear Optimization | 2015 | AdaFTRL | arxiv | | gradient descent |
| A Linearly-Convergent Stochastic L-BFGS Algorithm | 2015 | SVRG-SQN | arxiv | code | quasi-newton |
| Accelerating SVRG via second-order information | 2015 | SVRG+II: LBFGS | opt | | quasi-newton |
| Accelerating SVRG via second-order information | 2015 | SVRG+I: Subsampled Hessian followed by SVT | opt | | quasi-newton |
| Probabilistic Line Searches for Stochastic Optimization | 2015 | ProbLS | arxiv | | gradient descent |
| Optimizing Neural Networks with Kronecker-factored Approximate Curvature | 2015 | K-FAC | arxiv | code | gradient descent |
| adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs | 2015 | adaQN | arxiv | code | quasi-newton |
| Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization | 2016 | Damp-oBFGS-Inf | arxiv | code | quasi-newton |
| Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates | 2016 | Eve | arxiv | code | gradient descent |
| Incorporating Nesterov Momentum into Adam | 2016 | Nadam | openreview | code | gradient descent |
| The Whale Optimization Algorithm | 2016 | WOA | sciencedirect | code | evolutionary |
| Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks | 2016 | SDProp | arxiv | | gradient descent |
| Barzilai-Borwein Step Size for Stochastic Gradient Descent | 2016 | SGD-BB | arxiv | code | gradient descent |
| Barzilai-Borwein Step Size for Stochastic Gradient Descent | 2016 | SVRG-BB | arxiv | code | variance reduced |
| SGDR: Stochastic Gradient Descent with Warm Restarts | 2016 | SGDR | arxiv | code | gradient descent |
| Katyusha: The First Direct Acceleration of Stochastic Gradient Methods | 2016 | Katyusha | arxiv | | variance reduced |
| A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order | 2016 | ZO-SCD | arxiv | | gradient free |
| Direct Feedback Alignment Provides Learning in Deep Neural Networks | 2016 | DFA | arxiv | code | gradient descent |
| AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks | 2017 | AdaBatch | arxiv | code | gradient descent |
| AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training | 2017 | AdaComp | arxiv | | gradient descent |
| SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient | 2017 | SARAH | arxiv | | variance reduced |
| Sub-sampled Cubic Regularization for Non-convex Optimization | 2017 | SCR | arxiv | code | inexact hessian |
| IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate | 2017 | IQN | arxiv | code | quasi-newton |
| Decoupled Weight Decay Regularization | 2017 | AdamW | arxiv | code | gradient descent |
| Decoupled Weight Decay Regularization | 2017 | SGDW | arxiv | code | gradient descent |
| BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning | 2017 | BPGrad | arxiv | code | gradient descent |
| Training Deep Networks without Learning Rates Through Coin Betting | 2017 | COCOB | arxiv | code | gradient descent |
| Practical Gauss-Newton Optimisation for Deep Learning | 2017 | KFLR | arxiv | | gradient descent |
| Practical Gauss-Newton Optimisation for Deep Learning | 2017 | KFRA | arxiv | | gradient descent |
| Large Batch Training of Convolutional Networks | 2017 | LARS | arxiv | code | gradient descent |
| Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients | 2017 | M-SVAG | arxiv | code | gradient descent |
| Normalized Direction-preserving Adam | 2017 | ND-Adam | arxiv | code | gradient descent |
| Noisy Natural Gradient as Variational Inference | 2017 | Noisy Adam | arxiv | code | gradient descent |
| Noisy Natural Gradient as Variational Inference | 2017 | Noisy K-FAC | arxiv | code | gradient descent |
| Evolving Deep Neural Networks | 2017 | CoDeepNEAT | arxiv | code | evolutionary |
| Evolving Deep Convolutional Neural Networks for Image Classification | 2017 | EvoCNN | arxiv | code | evolutionary |
| NMODE --- Neuro-MODule Evolution | 2017 | NMODE | arxiv | code | evolutionary |
| Online Convex Optimization with Unconstrained Domains and Losses | 2017 | RescaledExp | arxiv | | gradient descent |
| Variants of RMSProp and Adagrad with Logarithmic Regret Bounds | 2017 | SC-Adagrad | arxiv | code | gradient descent |
| Variants of RMSProp and Adagrad with Logarithmic Regret Bounds | 2017 | SC-RMSProp | arxiv | code | gradient descent |
| Improving Generalization Performance by Switching from Adam to SGD | 2017 | SWATS | arxiv | code | gradient descent |
| YellowFin and the Art of Momentum Tuning | 2017 | YellowFin | arxiv | code | gradient descent |
| Natasha 2: Faster Non-Convex Optimization Than SGD | 2017 | Natasha2 | arxiv | | gradient descent |
| Natasha 2: Faster Non-Convex Optimization Than SGD | 2017 | Natasha1.5 | arxiv | | gradient descent |
| Regularizing and Optimizing LSTM Language Models | 2017 | NT-ASGD | arxiv | code | gradient descent |
| SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm | 2017 | SW-SGD | sciencedirect | | gradient descent |
| Adafactor: Adaptive Learning Rates with Sublinear Memory Cost | 2018 | Adafactor | arxiv | code | gradient descent |
| Quasi-hyperbolic momentum and Adam for deep learning | 2018 | QHAdam | arxiv | code | gradient descent |
| Online Adaptive Methods, Universality and Acceleration | 2018 | AcceleGrad | arxiv | | gradient descent |
| Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods | 2018 | AdaBayes | arxiv | code | gradient descent |
| On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization | 2018 | AdaFom | arxiv | | gradient descent |
| Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis | 2018 | EKFAC | arxiv | code | gradient descent |
| AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods | 2018 | AdaShift | arxiv | code | gradient descent |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | 2018 | BADAM | arxiv | code | gradient descent |
| Small steps and giant leaps: Minimal Newton solvers for Deep Learning | 2018 | Curveball | arxiv | code | gradient descent |
| GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization | 2018 | GADAM | arxiv | | gradient descent |
| HyperAdam: A Learnable Task-Adaptive Adam for Network Training | 2018 | HyperAdam | arxiv | code | gradient descent |
| L4: Practical loss-based stepsize adaptation for deep learning | 2018 | L4Adam | arxiv | code | gradient descent |
| L4: Practical loss-based stepsize adaptation for deep learning | 2018 | L4Momentum | arxiv | code | gradient descent |
| Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate | 2018 | NosAdam | arxiv | code | gradient descent |
| Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks | 2018 | Padam | arxiv | code | gradient descent |
| Quasi-hyperbolic momentum and Adam for deep learning | 2018 | QHM | arxiv | code | gradient descent |
| Optimal Adaptive and Accelerated Stochastic Gradient Descent | 2018 | A2GradExp | arxiv | code | gradient descent |
| Optimal Adaptive and Accelerated Stochastic Gradient Descent | 2018 | A2GradInc | arxiv | code | gradient descent |
| Optimal Adaptive and Accelerated Stochastic Gradient Descent | 2018 | A2GradUni | arxiv | code | gradient descent |
| Shampoo: Preconditioned Stochastic Tensor Optimization | 2018 | Shampoo | arxiv | code | gradient descent |
| signSGD: Compressed Optimisation for Non-Convex Problems | 2018 | signSGD | arxiv | code | gradient descent |
| Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam | 2018 | VAdam | arxiv | code | gradient descent |
| VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning | 2018 | VR-SGD | arxiv | code | gradient descent |
| WNGrad: Learn the Learning Rate in Gradient Descent | 2018 | WNGrad | arxiv | code | gradient descent |
| Adaptive Methods for Nonconvex Optimization | 2018 | Yogi | neurips | code | gradient descent |
| First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time | 2018 | NEON | arxiv | | gradient descent |
| Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization | 2018 | Katyusha X | arxiv | | variance reduced |
| PSA-CMA-ES: CMA-ES with population size adaptation | 2018 | PSA-CMA-ES | acm | | evolutionary |
| AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes | 2018 | AdaGrad-Norm | arxiv | code | gradient descent |
| Aggregated Momentum: Stability Through Passive Damping | 2018 | AggMo | arxiv | code | gradient descent |
| Accelerating SGD with momentum for over-parameterized learning | 2018 | MaSS | arxiv | code | gradient descent |
| SADAGRAD: Strongly Adaptive Stochastic Gradient Methods | 2018 | SADAGRAD | mlr | | gradient descent |
| Deep Frank-Wolfe For Neural Network Optimization | 2018 | DFW | arxiv | code | gradient descent |
| On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks | 2018 | AdaHB | deepai | | gradient descent |
| On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks | 2018 | AdaNAG | deepai | | gradient descent |
| Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization | 2018 | KGD | arxiv | code | gradient descent |
| On the Convergence of Adam and Beyond | 2019 | AMSGrad | arxiv | code | gradient descent |
| Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates | 2019 | AdaAlter | arxiv | code | gradient descent |
| Adaptive Gradient Methods with Dynamic Bound of Learning Rate | 2019 | AdaBound | arxiv | code | gradient descent |
| Does Adam optimizer keep close to the optimal point? | 2019 | AdaFix | arxiv | | gradient descent |
| Adaloss: Adaptive Loss Function for Landmark Localization | 2019 | Adaloss | arxiv | | gradient descent |
| A new perspective in understanding of Adam-Type algorithms and beyond | 2019 | AdamAL | openreview | code | gradient descent |
| On the Convergence of Adam and Beyond | 2019 | AdamNC | arxiv | | gradient descent |
| Lookahead Optimizer: k steps forward, 1 step back | 2019 | Lookahead | arxiv | code | gradient descent |
| On Higher-order Moments in Adam | 2019 | HAdam | arxiv | | gradient descent |
| An Adaptive and Momental Bound Method for Stochastic Learning | 2019 | AdaMod | arxiv | code | gradient descent |
| On the Convergence Proof of AMSGrad and a New Version | 2019 | AdamX | arxiv | | gradient descent |
| Second-order Information in First-order Optimization Methods | 2019 | AdaSqrt | arxiv | code | gradient descent |
| Adathm: Adaptive Gradient Method Based on Estimates of Third-Order Moments | 2019 | Adathm | ieee | | gradient descent |
| Domain-independent Dominance of Adaptive Methods | 2019 | Delayed Adam | arxiv | code | gradient descent |
| Domain-independent Dominance of Adaptive Methods | 2019 | AvaGrad | arxiv | code | gradient descent |
| Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates | 2019 | ArmijoLS | arxiv | code | gradient descent |
| An Adaptive Remote Stochastic Gradient Method for Training Neural Networks | 2019 | ARSG | arxiv | code | gradient descent |
| BGADAM: Boosting based Genetic-Evolutionary ADAM for Neural Network Optimization | 2019 | BGADAM | arxiv | | gradient descent |
| CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity | 2019 | CProp | arxiv | code | gradient descent |
| DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization | 2019 | DADAM | arxiv | code | gradient descent |
| diffGrad: An Optimization Method for Convolutional Neural Networks | 2019 | diffGrad | arxiv | code | gradient descent |
| Gradient-only line searches: An Alternative to Probabilistic Line Searches | 2019 | GOLS-I | arxiv | | gradient descent |
| Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | 2019 | LAMB | arxiv | code | gradient descent |
| An Adaptive Remote Stochastic Gradient Method for Training Neural Networks | 2019 | NAMSB | arxiv | code | gradient descent |
| An Adaptive Remote Stochastic Gradient Method for Training Neural Networks | 2019 | NAMSG | arxiv | code | gradient descent |
| Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | 2019 | Novograd | arxiv | code | gradient descent |
| Fast-DENSER++: Evolving Fully-Trained Deep Artificial Neural Networks | 2019 | F-DENSER++ | arxiv | code | evolutionary |
| Fast DENSER: Efficient Deep NeuroEvolution | 2019 | F-DENSER | researchgate | code | evolutionary |
| Parabolic Approximation Line Search for DNNs | 2019 | PAL | arxiv | code | gradient descent |
| The Role of Memory in Stochastic Optimization | 2019 | PolyAdam | arxiv | | gradient descent |
| PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | 2019 | PowerSGD | arxiv | code | gradient descent |
| PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | 2019 | PowerSGDM | arxiv | code | gradient descent |
| On the Variance of the Adaptive Learning Rate and Beyond | 2019 | RAdam | arxiv | code | gradient descent |
| Matrix-Free Preconditioning in Online Learning | 2019 | RecursiveOptimizer | arxiv | code | gradient descent |
| On Empirical Comparisons of Optimizers for Deep Learning | 2019 | RMSterov | arxiv | | gradient descent |
| SAdam: A Variant of Adam for Strongly Convex Functions | 2019 | SAdam | arxiv | code | gradient descent |
| Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | 2019 | Sadam | arxiv | code | gradient descent |
| Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | 2019 | SAMSGrad | arxiv | code | gradient descent |
| signADAM: Learning Confidences for Deep Neural Networks | 2019 | signADAM | arxiv | code | gradient descent |
| signADAM: Learning Confidences for Deep Neural Networks | 2019 | signADAM++ | arxiv | code | gradient descent |
| Memory-Efficient Adaptive Optimization | 2019 | SM3 | arxiv | code | gradient descent |
| Momentum-Based Variance Reduction in Non-Convex SGD | 2019 | STORM | arxiv | code | gradient descent |
| ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization | 2019 | ZO-AdaMM | arxiv | code | gradient free |
| signSGD via Zeroth-Order Oracle | 2019 | ZO-signSGD | openreview | | gradient free |
| Demon: Improved Neural Network Training with Momentum Decay | 2019 | Demon SGDM | arxiv | code | gradient descent |
| Demon: Improved Neural Network Training with Momentum Decay | 2019 | Demon Adam | arxiv | code | gradient descent |
| An Optimistic Acceleration of AMSGrad for Nonconvex Optimization | 2019 | OPT-AMSGrad | arxiv | | gradient descent |
| UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization | 2019 | UniXGrad | arxiv | | gradient descent |
| An Adaptive Optimization Algorithm Based on Hybrid Power and Multidimensional Update Strategy | 2019 | AdaHMG | ieee | | gradient descent |
| ProxSGD: Training Structured Neural Networks under Regularization and Constraints | 2019 | ProxSGD | openreview | code | gradient descent |
| Efficient Learning Rate Adaptation for Convolutional Neural Network Training | 2019 | e-AdLR | ieee | | gradient descent |
| AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients | 2020 | AdaBelief | arxiv | code | gradient descent |
| ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning | 2020 | ADAHESSIAN | arxiv | code | gradient descent |
| Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia | 2020 | Adai | arxiv | code | gradient descent |
| Adam+: A Stochastic Method with Adaptive Variance Reduction | 2020 | Adam+ | arxiv | | gradient descent |
| Adam with Bandit Sampling for Deep Learning | 2020 | Adambs | arxiv | code | gradient descent |
| Why are Adaptive Methods Good for Attention Models? | 2020 | ACClip | arxiv | | gradient descent |
| AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | 2020 | AdamP | arxiv | code | gradient descent |
| On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods | 2020 | AdamT | arxiv | code | gradient descent |
| AdaS: Adaptive Scheduling of Stochastic Gradients | 2020 | AdaS | arxiv | code | gradient descent |
| AdaScale SGD: A User-Friendly Algorithm for Distributed Training | 2020 | AdaScale | arxiv | | gradient descent |
| AdaSGD: Bridging the gap between SGD and Adam | 2020 | AdaSGD | arxiv | | gradient descent |
| AdaX: Adaptive Gradient Descent with Exponential Long Term Memory | 2020 | AdaX | arxiv | code | gradient descent |
| AdaX: Adaptive Gradient Descent with Exponential Long Term Memory | 2020 | AdaX-W | arxiv | code | gradient descent |
| AEGD: Adaptive Gradient Descent with Energy | 2020 | AEGD | arxiv | code | gradient descent |
| Biased Stochastic Gradient Descent for Conditional Stochastic Optimization | 2020 | BSGD | arxiv | | gradient descent |
| Compositional ADAM: An Adaptive Compositional Solver | 2020 | C-ADAM | arxiv | | gradient descent |
| CADA: Communication-Adaptive Distributed Adam | 2020 | CADA | arxiv | code | gradient descent |
| CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing | 2020 | CoolMomentum | arxiv | code | gradient descent |
| EAdam Optimizer: How ε Impact Adam | 2020 | EAdam | arxiv | code | gradient descent |
| Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties | 2020 | Expectigrad | arxiv | code | gradient descent |
| Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum | 2020 | FRSGD | arxiv | | gradient descent |
| Iterative Averaging in the Quest for Best Test Error | 2020 | Gadam | arxiv | | gradient descent |
| A Variant of Gradient Descent Algorithm Based on Gradient Averaging | 2020 | Grad-Avg | arxiv | | gradient descent |
| Gravilon: Applications of a New Gradient Descent Method to Machine Learning | 2020 | Gravilon | arxiv | | gradient descent |
| Practical Quasi-Newton Methods for Training Deep Neural Networks | 2020 | K-BFGS | arxiv | code | gradient descent |
| Practical Quasi-Newton Methods for Training Deep Neural Networks | 2020 | K-BFGS(L) | arxiv | code | gradient descent |
| LaProp: Separating Momentum and Adaptivity in Adam | 2020 | LaProp | arxiv | code | gradient descent |
| Mixing ADAM and SGD: a Combined Optimization Method | 2020 | MAS | arxiv | code | gradient descent |
| Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering | 2020 | MEKA | arxiv | | gradient descent |
| MTAdam: Automatic Balancing of Multiple Training Loss Terms | 2020 | MTAdam | arxiv | code | gradient descent |
| Momentum with Variance Reduction for Nonconvex Composition Optimization | 2020 | MVRC-1 | arxiv | | gradient descent |
| Momentum with Variance Reduction for Nonconvex Composition Optimization | 2020 | MVRC-2 | arxiv | | gradient descent |
| PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization | 2020 | PAGE | arxiv | | gradient descent |
| Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization | 2020 | PSTorm | arxiv | | gradient descent |
| Ranger-Deep-Learning-Optimizer | 2020 | Ranger | github | code | gradient descent |
| Gradient Centralization: A New Optimization Technique for Deep Neural Networks | 2020 | GC | arxiv | code | gradient descent |
| S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima | 2020 | S-SGD | arxiv | | gradient descent |
| SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization | 2020 | SALR | arxiv | | gradient descent |
| Sharpness-aware Minimization for Efficiently Improving Generalization | 2020 | SAM | arxiv | code | gradient descent |
| Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent | 2020 | SGD-G2 | arxiv | | gradient descent |
| A New Accelerated Stochastic Gradient Method with Momentum | 2020 | SGDM | arxiv | | gradient descent |
| Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent | 2020 | SRSGD | arxiv | code | gradient descent |
| Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs | 2020 | SHAdaGrad | arxiv | | gradient descent |
| Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods | 2020 | SKQN | arxiv | | gradient descent |
| Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods | 2020 | S4QN | arxiv | | gradient descent |
| SMG: A Shuffling Gradient-Based Method with Momentum | 2020 | SMG | arxiv | | gradient descent |
| Stochastic Normalized Gradient Descent with Momentum for Large Batch Training | 2020 | SNGM | arxiv | | gradient descent |
| TAdam: A Robust Stochastic Gradient Optimizer | 2020 | TAdam | arxiv | code | gradient descent |
| Eigenvalue-corrected Natural Gradient Based on a New Approximation | 2020 | TEKFAC | arxiv | | gradient descent |
| pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization | 2020 | pbSGD | ijcai | code | gradient descent |
| Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | 2020 | Apollo | arxiv | code | quasi-newton |
| Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | 2020 | ApolloW | arxiv | code | quasi-newton |
| Slime mould algorithm: A new method for stochastic optimization | 2020 | SMA | sciencedirect | code | evolutionary |
| AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with Swarm Intelligence | 2020 | AdaSwarm | arxiv | code | evolutionary |
| Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities | 2020 | AdaACSA | arxiv | | gradient descent |
| Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities | 2020 | AdaAGD+ | arxiv | | gradient descent |
| SCW-SGD: Stochastically Confidence-Weighted SGD | 2020 | SCWSGD | ieee | | gradient descent |
| An Improved Adaptive Optimization Technique for Image Classification | 2020 | Mean-ADAM | ieee | | gradient descent |
| Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes | 2020 | LANS | arxiv | code | gradient descent |
| Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale | 2020 | RM3 | arxiv | code | gradient descent |
| On the distance between two neural networks and the stability of learning | 2020 | Fromage | arxiv | code | gradient descent |
| Smooth momentum: improving lipschitzness in gradient descent | 2022 | Smooth Momentum | springerlink | | gradient descent |
| Towards Better Generalization of Adaptive Gradient Methods | 2020 | SAGD | neurips | | gradient descent |
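
Most of the gradient-descent entries above are variants of the same moment-based update: exponential moving averages of the gradient and its square drive a per-parameter step. As a rough reference point only (not taken from any of the linked repositories), here is a minimal NumPy sketch of the Adam update from the 2014 paper listed above; the function name `adam_step` and the toy example are illustrative.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2014): bias-corrected first/second
    moment estimates, then an element-wise scaled gradient step."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = ||x||^2 starting from x = (1, 1, 1).
x = np.ones(3)
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 201):
    grad = 2.0 * x                            # gradient of ||x||^2
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
print(x)  # ends up close to the minimizer at the origin
```

Many of the other methods in the table (AMSGrad, AdaBound, RAdam, AdaBelief, and so on) can be read as modifications of the moment estimates or the step scaling in this template; see the linked papers and code for the exact rules.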
