I like this idea a lot, but we have to be mindful of the practicalities of checking lots of data into the tree. We could host the data in a separate repo and use a download script.
Yes, this is how we should do it. Only store code in the repo, and use a Go script to download the data.
I've started writing a benchmarking suite. Here's a quick update on its philosophy, features, current status, caveats, and open questions.
Philosophy: the general idea is to have a suite of tests that stress the algorithms in the golearn library in a number of ways, establishing benchmarks for accuracy and speed. I want the tests to be highly decoupled from the implementation (e.g. for classifiers, the suite should only know how to create them and then call Fit and Predict on them, and not much else) and also decoupled from the regular golearn workflow (I don't want people to have to run slow tests or download large datasets to work on golearn). For those reasons, and also because it's a new project that's likely to see a lot of churn for now, it lives in a separate repo from golearn, but it can be copy-pasted in later if keeping things in sync becomes painful.
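To make "only knows how to create them, then call Fit and Predict" concrete, here is a minimal sketch of what the suite could program against. The Classifier interface, the Factory type and the run helper are hypothetical: they use plain float slices and string labels rather than golearn's own instance types, and they adopt the error-returning Fit/Predict signatures described under the caveat below.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Classifier is a hypothetical, minimal interface the suite programs against.
type Classifier interface {
	Fit(X [][]float64, y []string) error
	Predict(X [][]float64) ([]string, error)
}

// Factory is all the suite knows about an algorithm: how to create one.
type Factory func() Classifier

// Result holds the two things being benchmarked: accuracy and speed.
type Result struct {
	Accuracy float64
	Elapsed  time.Duration
}

// run fits a fresh classifier, predicts the test rows, and reports accuracy
// plus the wall-clock time spent in Fit and Predict.
func run(newClf Factory, trainX [][]float64, trainY []string, testX [][]float64, testY []string) (Result, error) {
	clf := newClf()
	start := time.Now()
	if err := clf.Fit(trainX, trainY); err != nil {
		return Result{}, err
	}
	pred, err := clf.Predict(testX)
	if err != nil {
		return Result{}, err
	}
	elapsed := time.Since(start)
	if len(testY) == 0 || len(pred) != len(testY) {
		return Result{}, errors.New("prediction count does not match test labels")
	}
	correct := 0
	for i := range pred {
		if pred[i] == testY[i] {
			correct++
		}
	}
	return Result{Accuracy: float64(correct) / float64(len(testY)), Elapsed: elapsed}, nil
}

// constant is a toy classifier that always predicts the first training label.
type constant struct{ label string }

func (c *constant) Fit(X [][]float64, y []string) error {
	if len(y) == 0 {
		return errors.New("no training labels")
	}
	c.label = y[0]
	return nil
}

func (c *constant) Predict(X [][]float64) ([]string, error) {
	out := make([]string, len(X))
	for i := range out {
		out[i] = c.label
	}
	return out, nil
}

func main() {
	res, err := run(func() Classifier { return &constant{} },
		[][]float64{{0}, {1}}, []string{"a", "b"},
		[][]float64{{0}, {1}}, []string{"a", "b"})
	fmt.Println(res.Accuracy, err) // 0.5 <nil>
}
```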
Even though it's "out of the way", it still needs to provide value as a regression check against changes that hurt performance, and as a standard for deciding whether new algorithm optimizations actually improve things. I imagine the Travis build should go get the benchmark suite and then run its tests, so that it serves its purpose as a regression suite.
Features: structurally, I plan for it to have a suite for classifiers, a suite for optimization algorithms, etc. Each suite will benchmark the behaviour of some number of algorithms (whichever ones are implemented in golearn) against a common set of datasets for that suite. So each suite will consist of three main things: (a) datasets, (b) shared behaviours that make assertions about how an algorithm in the suite performs against a given dataset, and (c) concrete applications of the shared behaviours to the specific algorithms in golearn. One nice consequence is that anyone can use (a), and anyone writing an ML library who wraps their algorithms in something that implements the interfaces defined in golearn can even use (b). This fits the "decoupled" philosophy: the project tries to solve "how do you benchmark an ML library, then apply that to golearn" rather than just "how do you benchmark golearn."
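The (a)/(b)/(c) split might look roughly like this in Go. Everything here is illustrative: Dataset, ItClassifiesAtLeast and the toy 1-nearest-neighbour classifier are hypothetical names, and the interface is a stand-in for whatever golearn actually defines. The point is that (b) only sees the interface, so any library whose algorithms satisfy it could reuse the same behaviours.

```go
package main

import (
	"errors"
	"fmt"
)

// Classifier is a hypothetical interface; any library whose algorithms
// satisfy it could reuse the shared behaviours in (b).
type Classifier interface {
	Fit(X [][]float64, y []string) error
	Predict(X [][]float64) ([]string, error)
}

// (a) Datasets: plain data with no dependency on any ML library.
type Dataset struct {
	Name          string
	TrainX, TestX [][]float64
	TrainY, TestY []string
}

// (b) Shared behaviour: a classifier built by newClf must reach at least
// minAcc accuracy on ds. It knows nothing about the concrete algorithm.
func ItClassifiesAtLeast(newClf func() Classifier, ds Dataset, minAcc float64) error {
	clf := newClf()
	if err := clf.Fit(ds.TrainX, ds.TrainY); err != nil {
		return fmt.Errorf("%s: fit: %w", ds.Name, err)
	}
	pred, err := clf.Predict(ds.TestX)
	if err != nil {
		return fmt.Errorf("%s: predict: %w", ds.Name, err)
	}
	if len(pred) == 0 || len(pred) != len(ds.TestY) {
		return fmt.Errorf("%s: got %d predictions for %d rows", ds.Name, len(pred), len(ds.TestY))
	}
	correct := 0
	for i, p := range pred {
		if p == ds.TestY[i] {
			correct++
		}
	}
	if acc := float64(correct) / float64(len(pred)); acc < minAcc {
		return fmt.Errorf("%s: accuracy %.2f below %.2f", ds.Name, acc, minAcc)
	}
	return nil
}

// oneNN is a toy 1-nearest-neighbour classifier standing in for a real algorithm.
type oneNN struct {
	X [][]float64
	y []string
}

func (m *oneNN) Fit(X [][]float64, y []string) error {
	if len(X) == 0 || len(X) != len(y) {
		return errors.New("need equal, non-zero numbers of rows and labels")
	}
	m.X, m.y = X, y
	return nil
}

func (m *oneNN) Predict(X [][]float64) ([]string, error) {
	if len(m.X) == 0 {
		return nil, errors.New("not fitted")
	}
	out := make([]string, len(X))
	for i, q := range X {
		best, bestD := 0, -1.0
		for j, r := range m.X {
			d := 0.0 // squared Euclidean distance
			for k := range q {
				diff := q[k] - r[k]
				d += diff * diff
			}
			if bestD < 0 || d < bestD {
				best, bestD = j, d
			}
		}
		out[i] = m.y[best]
	}
	return out, nil
}

func main() {
	ds := Dataset{
		Name:   "basic",
		TrainX: [][]float64{{0, 0}, {0, 1}, {5, 5}, {5, 6}},
		TrainY: []string{"a", "a", "b", "b"},
		TestX:  [][]float64{{0.5, 0.5}, {5.5, 5.5}},
		TestY:  []string{"a", "b"},
	}
	// (c) Concrete application: plug one specific algorithm into the shared behaviour.
	fmt.Println(ItClassifiesAtLeast(func() Classifier { return &oneNN{} }, ds, 0.9)) // <nil>
}
```

Adding a dataset is then one more Dataset value, and adding an algorithm is one more factory passed to the same behaviour.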
Current Status: I've only started on the Classifier suite, and the shared behaviours for that are done. I only have one basic dataset so far, and have only applied them to one algorithm. Adding more datasets and plugging in different classifiers will be easy now. Not sure yet what the next suite will be.
Caveat: it works against (the develop branch of) my fork of golearn. The only salient difference between that branch and master of this repo is that Fit and Predict now include errors in their return signatures. I noticed that Fit would sometimes just hang if the input wasn't what it expected, so after fixing that it was clear that Fit, and also Predict, should really be able to return errors.
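The actual fix on the fork isn't shown in this thread, but the kind of input validation that turns a hang into an error might look like the following sketch. The validate helper and the model type are hypothetical, and the specific checks (empty input, row/label count mismatch, ragged rows, unfitted model) are assumptions about what "input it didn't expect" covers.

```go
package main

import (
	"errors"
	"fmt"
)

// validate is a hypothetical pre-flight check so malformed input produces
// an error instead of a hang or a panic deeper inside the algorithm.
func validate(X [][]float64, y []string) error {
	if len(X) == 0 {
		return errors.New("no training rows")
	}
	if len(X) != len(y) {
		return fmt.Errorf("%d rows but %d labels", len(X), len(y))
	}
	width := len(X[0])
	if width == 0 {
		return errors.New("rows have no features")
	}
	for i, row := range X {
		if len(row) != width {
			return fmt.Errorf("row %d has %d features, expected %d", i, len(row), width)
		}
	}
	return nil
}

// model illustrates the error-returning signatures Fit(...) error and
// Predict(...) ([]string, error); it does no real learning.
type model struct {
	width  int
	fitted bool
}

func (m *model) Fit(X [][]float64, y []string) error {
	if err := validate(X, y); err != nil {
		return fmt.Errorf("fit: %w", err)
	}
	m.width, m.fitted = len(X[0]), true
	return nil // a real algorithm would train here
}

func (m *model) Predict(X [][]float64) ([]string, error) {
	if !m.fitted {
		return nil, errors.New("predict: model not fitted")
	}
	for i, row := range X {
		if len(row) != m.width {
			return nil, fmt.Errorf("predict: row %d has %d features, expected %d", i, len(row), m.width)
		}
	}
	return make([]string, len(X)), nil // placeholder labels; a real algorithm would fill these in
}

func main() {
	var m model
	fmt.Println(m.Fit([][]float64{{1, 2}, {3}}, []string{"a", "b"})) // fit: row 1 has 1 features, expected 2
}
```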
Open Questions:
- What's a good source for datasets?
- What variety of datasets should we use? For the Classifier suite, my current thinking is to have (a) a basic dataset, (b) a very large dataset, (c) a dataset where the number of features is large, (d) a dataset with mixed-type features, and (e) a dataset where the "boundaries" between classes are somewhat fuzzy.
- What other suites can the algorithms be broken down into? Regressions, Classifiers, Optimizers, Clusterers, ...?
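For some of these categories, synthetic data could complement real datasets. The generator below is a hypothetical sketch, not an existing golearn function: it draws two Gaussian clusters, so shrinking sep makes the class boundary fuzzier (e), while raising n or d covers the very large (b) and many-features (c) cases. Mixed-type features (d) would need a different generator.

```go
package main

import (
	"fmt"
	"math/rand"
)

// twoBlobs returns n rows with d features: even rows are class "a" centred
// at the origin, odd rows are class "b" centred at (sep, ..., sep), and every
// feature gets unit Gaussian noise. A small sep gives overlapping classes.
func twoBlobs(n, d int, sep float64, seed int64) ([][]float64, []string) {
	rng := rand.New(rand.NewSource(seed)) // fixed seed => reproducible benchmarks
	X := make([][]float64, n)
	y := make([]string, n)
	for i := 0; i < n; i++ {
		center, label := 0.0, "a"
		if i%2 == 1 {
			center, label = sep, "b"
		}
		row := make([]float64, d)
		for j := range row {
			row[j] = center + rng.NormFloat64()
		}
		X[i], y[i] = row, label
	}
	return X, y
}

func main() {
	X, y := twoBlobs(6, 3, 0.5, 42)
	for i := range X {
		fmt.Println(y[i], X[i])
	}
}
```

Seeding the generator explicitly keeps the benchmark numbers comparable from run to run.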
@amitkgupta http://www.quandl.com/ is free and has a ton of nice data available
It has been mentioned recently, but the libsvm project has some good datasets, though we can't read them yet.
Nice, the libsvm datasets look great; the table even shows the number of features, classes, total size of the dataset, etc. Exactly the kind of breakdown I was hoping for.
On Fri, Aug 22, 2014 at 6:45 AM, Richard Townsend wrote:
Has been mentioned recently but the libsvm project have some good datasets http://www.csie.ntu.edu.tw/%7Ecjlin/libsvmtools/datasets/, though we can't read them yet.