
Comments (11)

npinto commented on June 25, 2024

Thanks for your analysis; it all makes sense now.

In your opinion, what is the downside of choice 2?


npinto commented on June 25, 2024

I guess 3 is more flexible; we could add a note in the docstring suggesting good practices for setting them up. What do you say?


npinto commented on June 25, 2024

BTW, Bottou has a similar implementation:
http://leon.bottou.org/projects/sgd#stochastic_gradient_svm


jaberg commented on June 25, 2024

Cool, I suggest setting the defaults following Bottou.


npinto commented on June 25, 2024

Sounds good. Closing this issue then, since setting the defaults following Bottou is already covered in other (open) issues.


jaberg commented on June 25, 2024

It still feels like the learning rate should be protected within a min(1, lr(t)) to prevent a few giant steps at the beginning, but I just looked through Bottou's svmsgd.cpp and there is no such protection.

Instead, he runs through at most 1000 examples a few times to do a coarse line search on C0 between 1.0 and 2.0, then follows through with the full optimization on the C0 that did best on the limited budget.

This would put his search range about 3 orders of magnitude higher than our current implementation's.
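For concreteness, the min(1, lr(t)) protection could look like the following minimal sketch; the 1/t-style schedule form and parameter names here are illustrative, not asgd's actual API:

```python
def step_size(t, step_size0=1.0, decay=1e-2):
    # A 1/t-style SGD schedule, capped at 1.0 so that an aggressive
    # step_size0 cannot produce a few giant steps at the beginning.
    return min(1.0, step_size0 / (1.0 + decay * t))
```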


npinto commented on June 25, 2024

In this case, it may make more sense to set the default to 1.0.

If we want to write a similar procedure, optimizing sgd_step_size0 on a few examples should probably be implemented in a separate object. What do you think?


jaberg commented on June 25, 2024

How about putting this into BaseASGD and implementing it there in terms of virtual methods such as self.fit and self.loss? sklearn has names for this sort of behavior, but we can pick anything sensible for now.
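A minimal sketch of what that structure could look like, assuming hypothetical method names (partial_fit and loss standing in for the virtual methods, and a Bottou-style coarse search over candidates between 1.0 and 2.0 on a limited budget):

```python
import numpy as np

class BaseASGD(object):
    def __init__(self, sgd_step_size0=1.0):
        self.sgd_step_size0 = sgd_step_size0

    def partial_fit(self, X, y):
        # Virtual method: subclasses run (A)SGD over the given examples.
        raise NotImplementedError

    def loss(self, X, y):
        # Virtual method: subclasses evaluate the training objective.
        raise NotImplementedError

    def tune_step_size0(self, X, y,
                        candidates=(1.0, 1.25, 1.5, 2.0),
                        n_examples=1000):
        # Coarse search in the spirit of Bottou's svmsgd: try each
        # candidate on at most n_examples examples and keep the best.
        # A real version would also reset the model state between runs.
        Xs, ys = X[:n_examples], y[:n_examples]
        losses = []
        for c0 in candidates:
            self.sgd_step_size0 = c0
            self.partial_fit(Xs, ys)
            losses.append(self.loss(Xs, ys))
        self.sgd_step_size0 = candidates[int(np.argmin(losses))]
        return self
```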


jaberg commented on June 25, 2024

+1 on setting the default sgd_step_size0 to 1.0 though.


jaberg commented on June 25, 2024

I just made a small change to the Theano implementation that makes it go much faster. For different numbers of features I now see the following runtimes:

N_FEAT:   100   Naive: 0.192   Theano: 0.038
N_FEAT:  1000   Naive: 0.239   Theano: 0.083
N_FEAT: 10000   Naive: 1.085   Theano: 0.520

The change was to replace dot(obs, weights) with (obs * weights).sum(). Theano lacks level-1 BLAS support, so it was using GEMM for the inner product. This optimization has now been submitted to the Theano trunk.
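For reference, the two formulations look roughly like this in Theano (a sketch with illustrative variable names, not the actual asgd code):

```python
import theano
import theano.tensor as T

obs = T.dvector('obs')
weights = T.dvector('weights')

# Before: a vector-vector dot, which Theano was lowering to a GEMM call.
inner_dot = theano.function([obs, weights], T.dot(obs, weights))

# After: an elementwise multiply followed by a sum, avoiding GEMM.
inner_sum = theano.function([obs, weights], (obs * weights).sum())
```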


jaberg commented on June 25, 2024

Update: using GEMV is as fast as the elementwise sum (and faster than GEMM).

Also: benchmarking the Theano and naive implementations is problem dependent. Theano's "if" is slow, so the Theano version computes the weight update regardless of whether the margin is violated, but it computes that update faster than the naive implementation does. So if there are many weight updates, or the features are relatively small (< 10K elements), Theano is always faster, sometimes by several times. However, when the feature vectors are long and relatively few examples actually violate the margin, the naive implementation can be slightly faster.
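A hypothetical NumPy sketch of that trade-off, contrasting the conditional update a naive loop performs with the branch-free update a Theano graph effectively performs:

```python
import numpy as np

def naive_update(weights, obs, label, step):
    # Conditional update: skips all work when the margin is satisfied,
    # which pays off for long features with few violations.
    if label * np.dot(obs, weights) < 1.0:
        weights += step * label * obs
    return weights

def branchless_update(weights, obs, label, step):
    # Branch-free update: always computes the step, scaled by a 0/1
    # violation mask; this is the pattern that avoids Theano's slow "if".
    violated = float(label * np.dot(obs, weights) < 1.0)
    weights += violated * step * label * obs
    return weights
```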
