Comments (11)
Thanks for your analysis; it all makes sense now.
In your opinion, what is the downside of choice 2?
I guess 3 is more flexible; we could add a note in the docstring suggesting good practices for setting them up. What do you say?
BTW, Bottou has a similar implementation:
http://leon.bottou.org/projects/sgd#stochastic_gradient_svm
Cool, I suggest setting the defaults following Bottou.
Sounds good. Closing this issue then, since setting the defaults like Bottou has been covered in other (open) issues.
It still feels like the learning rate should be protected with a min(1, lr(t)) cap to prevent a few giant steps at the beginning, but I just looked through Bottou's svmsgd.cpp and there is no such protection.
Instead, he runs through at most 1000 examples a few times to do a coarse line search on C0 between 1.0 and 2.0, then follows through with the full optimization on the C0 that did best on the limited budget.
This would put his full search range about 3 orders of magnitude higher than our current implementation's.
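A minimal sketch of that coarse search in NumPy (the `sgd_hinge_loss` helper is a hypothetical stand-in for a real fit routine, and the candidate grid, decay constant, and budget are illustrative assumptions, not asgd's API):

```python
import numpy as np

def sgd_hinge_loss(step_size0, X, y, n_passes=2):
    # Tiny hinge-loss SGD, a hypothetical stand-in for asgd's fit(),
    # used only to score one candidate initial step size.
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(n_passes):
        for xi, yi in zip(X, y):
            lr = step_size0 / (1.0 + step_size0 * 1e-4 * t)  # decaying step size
            if yi * np.dot(w, xi) < 1.0:  # margin violated -> take a step
                w += lr * yi * xi
            t += 1
    return np.maximum(1.0 - y * X.dot(w), 0.0).mean()  # mean hinge loss

def coarse_step_size_search(X, y, candidates=(0.25, 0.5, 1.0, 2.0), budget=1000):
    # Try each candidate on at most `budget` examples and keep the one
    # with the lowest training loss, like Bottou's search over C0.
    Xs, ys = X[:budget], y[:budget]
    losses = [sgd_hinge_loss(c, Xs, ys) for c in candidates]
    return candidates[int(np.argmin(losses))]
```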
In this case, it may make more sense to set the default to 1.0. If we want to write a similar procedure, optimizing sgd_step_size0 on a few examples should probably be done in a different object. What do you think?
How about putting this into BaseASGD, and implementing it there in terms of virtual methods like self.fit and self.loss? sklearn has names for this sort of behavior, but we can pick anything sensible for now.
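As a hedged sketch of that layout: the search lives in BaseASGD and touches only the overridable methods. The name `determine_sgd_step_size0` and its defaults are illustrative assumptions, not asgd's actual API.

```python
class BaseASGD(object):
    def fit(self, X, y):
        # Subclasses (naive, Theano, ...) implement the SGD loop.
        raise NotImplementedError

    def loss(self, X, y):
        # Subclasses implement the training loss (e.g. mean hinge loss).
        raise NotImplementedError

    def determine_sgd_step_size0(self, X, y,
                                 candidates=(0.25, 0.5, 1.0, 2.0),
                                 budget=1000):
        # Fit on a small budget of examples for each candidate step
        # size and keep the one that achieved the lowest loss.
        best, best_loss = None, float('inf')
        for c in candidates:
            self.sgd_step_size0 = c
            self.fit(X[:budget], y[:budget])
            cur = self.loss(X[:budget], y[:budget])
            if cur < best_loss:
                best, best_loss = c, cur
        self.sgd_step_size0 = best
```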
+1 on setting the default sgd_step_size0 to 1.0 though.
I just made a small change to the Theano implementation that makes it go much faster. For different numbers of features I now see the following runtimes:
N_FEAT   Naive   Theano
   100   0.192    0.038
  1000   0.239    0.083
 10000   1.085    0.520
The change was to replace dot(obs, weights) with (obs * weights).sum(). Theano lacks level-1 BLAS support and was using GEMM for the inner product. This optimization has now been submitted to the Theano trunk.
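For reference, a minimal Theano sketch of the two formulations (variable names are illustrative):

```python
import theano
import theano.tensor as T

obs = T.vector('obs')
weights = T.vector('weights')

# Original form: a vector-vector dot, which Theano lowered to GEMM.
margin_dot = T.dot(obs, weights)

# Replacement described above: elemwise multiply followed by a sum.
margin_sum = (obs * weights).sum()

f = theano.function([obs, weights], [margin_dot, margin_sum])
```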
Update: using gemv is as fast as the elemwise sum (and faster than gemm).
Also, benchmarking the Theano and naive implementations is problem-dependent. Theano's "if" is slow, so the Theano version computes the weight update regardless of whether the margin is violated, but it performs each computation faster than the naive implementation. So if there are a lot of weight updates, or the feature vectors are relatively small (< 10K elements), Theano is always faster, sometimes by several times. However, when the feature vectors are long and relatively few examples actually violate the margin, the naive implementation can be slightly faster, since it skips the update entirely.
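A NumPy sketch of that tradeoff (illustrative, not either implementation verbatim): the naive loop branches and skips work, while the Theano-style version always pays for the update and masks it.

```python
import numpy as np

def naive_update(w, x, y, lr):
    # Naive style: branch, and skip the update when the margin holds.
    if y * np.dot(w, x) < 1.0:
        w += lr * y * x
    return w

def branchless_update(w, x, y, lr):
    # Theano style: a symbolic "if" is slow, so always compute the
    # update and scale it by 0 or 1 depending on the margin check.
    violated = float(y * np.dot(w, x) < 1.0)
    w += lr * y * x * violated
    return w
```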