
Comments (3)

kloudkl commented on April 28, 2024

@qipeng has solved this issue in #741.


pwohlhart commented on April 28, 2024

Hi,

I might be mistaken, but I don't think your interpretation of the Bengio et al. paper is right. They show that the parameter update (Formula 7) has the same form as the one in regular momentum (Formula 5), except with different coefficients. These coefficients, however, are not the same as those used to update the velocity (Formula 6), which would otherwise make it completely identical. That's what makes the difference (although probably a rather slight one?).
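
For concreteness, here is how I read the constant-coefficient case (my own paraphrase, so take the equation numbers with a grain of salt; \mu is the momentum and \epsilon the learning rate):

```latex
% Regular momentum (Formula 5, as I read it):
v_t = \mu v_{t-1} - \epsilon \nabla f(\theta_{t-1}), \qquad
\theta_t = \theta_{t-1} + v_t
% NAG velocity update (Formula 6): the gradient is taken at the lookahead point
v_t = \mu v_{t-1} - \epsilon \nabla f(\theta_{t-1} + \mu v_{t-1})
% Substituting \Theta_{t-1} = \theta_{t-1} + \mu v_{t-1} gives the
% momentum-like parameter update (Formula 7), whose coefficients
% \mu^2 and (1 + \mu)\epsilon differ from the \mu and \epsilon above:
\Theta_t = \Theta_{t-1} + \mu^2 v_{t-1} - (1 + \mu)\,\epsilon \nabla f(\Theta_{t-1})
```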


qipeng commented on April 28, 2024

Hi @pwohlhart, due to a limitation of the current gradient-based solver, which evaluates the gradient once and updates the parameters once per iteration, my implementation is slightly different from (and perhaps slightly faster than) the original NAG.

Each iteration of the standard NAG can be viewed as follows (see the sketch after this list):

  1. Update the current parameters to a "future point" with the current velocity
  2. Evaluate the gradient at that point
  3. "Undo" the update
  4. Update the velocity with the gradient at the future point
  5. Update the parameters with the new velocity
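
A minimal sketch of those five steps, in my own notation rather than the actual Caffe code (I'm assuming the Sutskever-style update `v = mu * v - lr * g`, `theta = theta + v`, with plain SGD and no weight decay):

```python
def standard_nag_step(theta, v, grad_fn, lr, mu):
    """One iteration of standard NAG (hypothetical helper, not Caffe's API)."""
    theta_future = theta + mu * v  # step 1: move to the "future point"
    g = grad_fn(theta_future)      # step 2: evaluate the gradient there
    # step 3: "undo" -- theta itself was never overwritten in this sketch;
    # a solver that updates parameters in place would restore them here
    v = mu * v - lr * g            # step 4: update the velocity with that gradient
    theta = theta + v              # step 5: update the parameters with the new velocity
    return theta, v
```

Note that steps 1 and 3 cost an extra parameter write in an in-place solver, which is exactly what a one-gradient, one-update-per-iteration framework cannot do.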

Due to the aforementioned limitation, my implementation is (see the sketch after this list):

  1. Evaluate the gradient at a "future point"
  2. Add a negative velocity to the parameter update
  3. Update the velocity, and add the new velocity to the parameter update (multiplied by 1+momentum to update the parameters to the "future point" of the next iteration)
  4. Update the parameters with their corresponding updates
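
In the same notation, a sketch of the consolidated version (again an illustration under the same assumptions, not the actual Caffe code; here `theta` is stored at the "future point" between iterations):

```python
def consolidated_nag_step(theta, v, grad_fn, lr, mu):
    """One iteration, with `theta` stored at the future point theta_true + mu * v."""
    g = grad_fn(theta)        # step 1: gradient at the future point
    update = -mu * v          # step 2: add a negative (old) velocity to the update
    v = mu * v - lr * g       # step 3: update the velocity...
    update += (1 + mu) * v    # ...and add (1 + momentum) times the new velocity
    return theta + update, v  # step 4: a single parameter write per iteration
```

Expanding the return value gives `theta - mu * v_old + (1 + mu) * v_new`, which is exactly the future point of the next iteration.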

This consolidates several of the parameter updates in the original algorithm into a single one.

The only slight difference between this method and the standard NAG is that the parameter state between iterations is always the "future point" of that iteration, i.e. theta + momentum * velocity. This shouldn't cause too big a problem, as the gradient and/or learning rate are usually close to zero when the optimization approaches its end.
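
(If one did want the "true" parameters at any point, under the sign convention of the sketches above they could be recovered by stepping back:)

```python
# hypothetical post-processing, not part of the solver: recover the true
# parameters from the stored future point theta_stored = theta_true + mu * v
theta_true = theta_stored - mu * v
```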

