npinto / asgd
Averaged Stochastic Gradient Descent Classifiers
In the caa_stable branch, is simple_blas.h supposed to be present?
see the sparsity trick from Bottou
The idea is to boost performance by "disabling" the averaging until it becomes useful. Start with exp_moving_asgd_step_size=1e-2?
It would be useful to have the option of using "mini-batches" to get better estimates of the gradients (see https://github.com/npinto/asgd/blob/master/asgd/naive_asgd.py#L60). Since we'll be using BLAS, etc., this parameter could also speed up convergence.
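As a sketch of what that mini-batch option might compute (the function name is hypothetical, and the hinge-loss form is an assumption based on the binary classifier in naive_asgd.py, not the repo's actual code):

```python
import numpy as np

def minibatch_hinge_grad(weights, bias, X, y):
    """Average hinge-loss gradient over a mini-batch (sketch).

    X is (batch_size, n_features); y holds +/-1 labels.  X.dot(weights)
    is a single BLAS matrix-vector product instead of batch_size
    separate dot products, which is where the speed-up would come from.
    """
    margins = y * (X.dot(weights) + bias)
    violated = margins < 1.0            # only margin violators contribute
    if not violated.any():
        return np.zeros_like(weights), 0.0
    grad_w = -(y[violated, None] * X[violated]).mean(axis=0)
    grad_b = -y[violated].mean()
    return grad_w, grad_b
```

Averaging over the batch should reduce the gradient's variance at the cost of fewer parameter updates per example seen.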
The learning rate has the form η0 / (1 + λ η0 t)^0.75 where λ is the regularization constant.
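In code, that schedule looks like the following (parameter names are illustrative, not the library's API):

```python
def sgd_step_size(t, step_size0, l2_regularization, exponent=0.75):
    """Bottou's ASGD schedule: eta_t = eta0 / (1 + lambda * eta0 * t)**0.75,
    where lambda is the regularization constant and t counts examples seen."""
    return step_size0 / (1.0 + l2_regularization * step_size0 * t) ** exponent
```

At t = 0 this returns step_size0 exactly, and it decays monotonically from there.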
I fixed all the (apparent) issues with C and BLAS, and now all BLAS functions are called with the right matrix/vector dimensions. I stepped through them in gdb and they all appear to write to the proper memory locations. I also cleaned up the C file a little bit. Note that I created a new branch, caa_blas, for the development of the BLAS-based algorithm.
Now I should start verifying that I implemented the logic of the algorithm as a whole correctly. I think a good approach would be to first write a unit test for each of the three functions, and then some integration tests for the algorithm. That would be especially valuable if you plan to publish the software as a public library, since it would guarantee the correctness of future edits. What do you think?
For the specifics of testing, I can't exactly replicate the Python tests, since I doubt that NumPy's Mersenne Twister yields the same sequences as the LCG in glibc when seeded with the same seeds (42 and 43). So if I want to start by testing everything in C, I need some other way to do it. Any suggestions? I am hoping to make rapid progress once testing is in place.
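One possible workaround (just a suggestion, not something in the repo): generate the test fixtures from a tiny explicit generator that is trivial to re-implement identically in C, instead of relying on either NumPy's Mersenne Twister or glibc's rand():

```python
def portable_uniform(seed, n, a=1103515245, c=12345, m=2**31):
    """Deterministic LCG for test fixtures (sketch).

    The same recurrence in C is one line:
        x = (1103515245u * x + 12345u) % 2147483648u;
    so both the C and Python test suites can regenerate identical
    matrices from the same seeds (e.g. 42 and 43).
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)   # uniform floats in [0, 1)
    return out
```

The statistical quality of an LCG is poor, but for reproducing fixed test matrices on both sides it only needs to be deterministic.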
Finally, let me know when you have a chance to pass me the real testing matrices and a reference to the Python wrapping tools you suggest, so I can also start looking at those. I know you have been busy lately, so no worries.
Let me know...
The constant η0 is determined by performing preliminary experiments on a data subsample.
http://leon.bottou.org/projects/sgd
We could also have asgd.tune_...() methods to "tune" speed and accuracy (here step_size0 would be optimized).
Remember the previous discussion we had about whether the l2_regularization parameter should be allowed to be 0? I think that was related to a mis-parameterization of the BaseASGD object.
Background: the schedule implemented takes the form C(t) = C0 / (1 + C0 k t)^a, where C0 is the initial SGD step size, t is the number of iterations so far, a is the annealing exponent, and k is an extra multiplier on the time. This form is recommended in [1].
The current naive_asgd implementation hardcodes k = l2_regularization, as suggested in [2].
This leads to weirdness though, because C0 is typically something like 1e-2 or 1e-3, while l2_regularization should usually be something like 1e-6. In the most aggressive case we end up with a schedule like
C(t) = C0 / (1 + 1e-8 t)^a
For a normal choice of a = .5, this is basically un-annealed SGD, since it takes 100 million examples to get the learning rate from C0 down to C0/sqrt(2). I'm not that confident in saying it, but I don't think this is how the annealing was meant to be used.
I think this could be improved in two ways:
- Remove C0 from the annealing schedule (simplify things... why put it in twice? And it is not used in [3]).
- Have k default to C0 rather than l2_regularization, and make k a parameter of BaseASGD, because both [2] and [3] say it's problem-dependent.

[1] http://arxiv.org/pdf/1107.2490v2
[2] http://www.dbs.ifi.lmu.de/~yu_k/cvpr11_0694.pdf
[3] http://hal.archives-ouvertes.fr/docs/00/60/80/41/PDF/gradsto_hal.pdf
To decrease communication and speed up convergence, we should have an option (default=True) to only update weights when margin constraints have been violated:
e.g.: Line #66 should move up (to Line #63) and become if margin >= 1: continue.
https://github.com/npinto/asgd/blob/master/asgd/naive_asgd.py#L66
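A sketch of what that early exit could look like in the per-example loop (variable names are guesses in the spirit of the file above; this is an illustration, not the repo's actual code):

```python
import numpy as np

def fit_one_example(sgd_weights, sgd_bias, obs, label, step_size, l2_reg):
    """Hinge-loss SGD step that skips all work when the margin holds."""
    margin = label * (np.dot(obs, sgd_weights) + sgd_bias)
    if margin >= 1.0:                  # constraint satisfied: no update,
        return sgd_weights, sgd_bias   # no averaging, no communication
    new_w = sgd_weights * (1.0 - l2_reg * step_size) + step_size * label * obs
    new_b = sgd_bias + step_size * label
    return new_w, new_b
```

For well-separated data most examples eventually satisfy the margin, so the default-on option would skip the weight update (and any associated communication) on the majority of iterations.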
Although I haven't gotten everything out of it by far, this paper looks like a good one to add to the README:
http://hal.archives-ouvertes.fr/docs/00/60/80/41/PDF/gradsto_hal.pdf
I am having an issue with cython. The cython file compiles correctly to C, and the C file also compiles correctly to .so (see first target of the Makefile on branch caa_devel). However, when I import the .so in python, python complains that
ImportError: ./pasgd.so: undefined symbol: cblas_sdsdot
I am not using cblas_sdsdot from Python, and the function is in fact defined (either in simple_blas.c or in libcblas.so). This sounds like a silly issue, like me not passing some necessary parameter to Python. Did you ever encounter something like this? Suggestions?
I have integrated BLAS into the code, and all functions are complete. However, a BLAS call causes memory corruption: I need to track it down and fix it. It is probably a wrong size parameter (e.g. sizeof(*x) instead of 1). The lack of documentation for BLAS (both the NetLib version and OpenBLAS) is definitely an issue; I will have to be more careful.
After this, I will need to unit-test the code. A preliminary step could be either testing the code entirely in C (which makes testing harder, but lets me test before writing the Python wrapper) or wrapping the functions first (which allows more flexible testing, but potentially introduces wrapping issues).
"sphere" the data and merge in the weights
I've been working on a multiclass version of this -- https://github.com/yamins81/asgd/tree/feature/multiclass
When it's ready I'd like to contribute it back, but in the meantime, can you comment on it when you guys have a moment?
Thanks,
Dan
Multiple (sgd_step_size0, l2_regularization) pairs could be given, and the *fit() methods could use BLAS Level-3 operations when appropriate to allow for more data re-use and speed up the computation.
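For instance, under the assumption that one weight column is kept per (sgd_step_size0, l2_regularization) pair (names and layout are illustrative, not the library's API):

```python
import numpy as np

def decision_all(X, W, b):
    """Decision values for every hyperparameter setting at once.

    X is (n_samples, n_features); W is (n_features, n_settings), one weight
    column per (sgd_step_size0, l2_regularization) pair; b is (n_settings,).
    X.dot(W) is a single BLAS Level-3 GEMM call instead of n_settings
    separate Level-2 matrix-vector products, so X streams through the
    cache once and gets re-used across all settings.
    """
    return X.dot(W) + b
```

The same idea applies to the gradient computations during fitting, which is where most of the data re-use would come from.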
This is confusing... please consult me for more details ;-)
Loosely related work:
Notes on Regularized Least-Squares
by Rifkin. and. Lippert
http://cbcl.mit.edu/publications/ps/MIT-CSAIL-TR-2007-025.pdf
feedback should be restored (btw, it was used in LeCun's NIPS'11 optimization challenge)