Toy examples of AdaBelief Optimizer
This repository attempts to reproduce the toy examples shown in Fig. 3 of the AdaBelief paper.
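For reference, here is a short sketch of the AdaBelief update that the toy examples exercise. It is a minimal, framework-free version for a short parameter list. It assumes the update rule as stated in the paper (EMA of gradients `m`, EMA of the squared deviation `(g - m)^2` with `eps` added inside, bias correction) and the common defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8. The class name `AdaBeliefSketch` is hypothetical and is not the code used in this repository.

```python
import math


class AdaBeliefSketch:
    """Hypothetical, minimal AdaBelief optimizer over a list of floats."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first moment: EMA of the gradients
        self.s = None  # "belief": EMA of the squared deviation (g - m)^2
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        """Return updated parameters for one AdaBelief step."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.s = [0.0] * len(params)
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            # Unlike Adam (EMA of g^2), track how far g deviates from its EMA m.
            self.s[i] = b2 * self.s[i] + (1 - b2) * (g - self.m[i]) ** 2 + self.eps
            m_hat = self.m[i] / (1 - b1 ** self.t)
            s_hat = self.s[i] / (1 - b2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(s_hat) + self.eps))
        return new_params


# Usage: one step on f(x) = x^2 (gradient 2x), starting from x = 1.
opt = AdaBeliefSketch(lr=0.1)
x = [1.0]
x = opt.step(x, [2.0 * x[0]])
```

On the first step the bias-corrected belief is roughly (beta1 * g)^2, so the step size is about lr / beta1 regardless of the gradient's magnitude; later steps grow when consecutive gradients agree and shrink when they fluctuate.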
SGD does not behave as shown in the paper's figures: with a learning rate of 10^-3 it is unstable on the Beale and Rosenbrock functions. I therefore also include SGD with a learning rate of 10^-6.
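The sketch below defines the Beale and Rosenbrock functions with analytic gradients and a plain SGD loop, so the two learning rates can be compared. One plausible reason for the instability is that gradient descent is only locally stable when the learning rate is below 2 / lambda_max, where lambda_max is the largest Hessian eigenvalue. For Rosenbrock, the second derivative along x, 2 - 400y + 1200x^2, is already about 1900 at (-1.5, 2), which puts 10^-3 right at that limit. The helper names and the starting point (-1.5, 2) are assumptions for illustration, not necessarily the settings used in the paper or in this repository.

```python
import math


def beale(x, y):
    # Global minimum f = 0 at (3, 0.5).
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y * y) ** 2
            + (2.625 - x + x * y * y * y) ** 2)


def beale_grad(x, y):
    a = 1.5 - x + x * y
    b = 2.25 - x + x * y * y
    c = 2.625 - x + x * y * y * y
    dx = 2 * a * (y - 1) + 2 * b * (y * y - 1) + 2 * c * (y * y * y - 1)
    dy = 2 * a * x + 2 * b * (2 * x * y) + 2 * c * (3 * x * y * y)
    return dx, dy


def rosenbrock(x, y):
    # Global minimum f = 0 at (1, 1).
    return (1 - x) * (1 - x) + 100 * (y - x * x) * (y - x * x)


def rosenbrock_grad(x, y):
    dx = -2 * (1 - x) - 400 * x * (y - x * x)
    dy = 200 * (y - x * x)
    return dx, dy


def sgd(grad_fn, start, lr, steps):
    """Plain gradient descent; returns the trajectory as a list of (x, y)."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_fn(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
        # Multiplications overflow to inf/nan rather than raising; stop early.
        if not (math.isfinite(x) and math.isfinite(y)):
            break
    return path


# Usage: compare the two learning rates from a hypothetical starting point.
for lr in (1e-3, 1e-6):
    path = sgd(rosenbrock_grad, start=(-1.5, 2.0), lr=lr, steps=1000)
    print(f"lr={lr:g}: {len(path) - 1} steps, final f = {rosenbrock(*path[-1])}")
```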
TBD
TBD