sjwhitworth / golearn
Machine Learning for Go
License: MIT License
I am currently trying to take natively typed data in a stream processing system and make predictions on it. All of the current examples only show how to create instances from CSV data, and the one example that shows how to create instances directly only works by converting string data to float64. I already have a map[string]float64 for all of my data; I want to put it into an instance and make predictions based on previously learned data.
Any help would be appreciated, if there is no current way to do this I would love to do the work necessary to support this.
Call me crazy, but `BatchGradientDescent` doesn't find the min, nor the argmin, as the `GradientDescent` part of the function name and the `optimisation` package name would suggest. It also doesn't do parameter estimation as the tests suggest. I actually don't know what "parameter estimation" even means in this context, but I'm guessing that, assuming y = a1*x1 + a2*x2 + ... + an*xn (a dot product), it attempts to figure out what the linear coefficients a1, ..., an are, given a bunch of observed values y and observed tuples (x1, ..., xn).
Can someone enlighten me on what this code does and/or is supposed to do?
Could you give me an example or an easy method to create instances for test data? I was able to follow the main example on the landing page to do a train-test split on data similar to the below:
10,hello,hello
9,helow,hello
8,mahalo,mahalo
12,helo,hello
7,hallo,hello
5,halo,hello
11,hellow,hello
8,mhalo,mahalo
12,mehalo,mahalo
But how do I now create a test data instance of something like below that I can pass to a Predict function?
10,melo
Is there a way to add it in a way similar to ParseCSVToInstances? Maybe a ParseStringToPredictInstance? Or ParseCSVToPredictInstance?
When training a random forest, I see vastly different, and poorer, results using golearn than using Python's scikit-learn. Unfortunately the dataset is confidential, so I can't share it here. However, I'm using the same train/test split, and have ensured that the data is represented the same way (all floats in both).
When using scikit-learn:
Auc: 0.943867958106
Confusion Matrix
[[35878 1876]
[ 5402 16388]]
precision recall f1-score support
0 0.87 0.95 0.91 37754
1 0.90 0.75 0.82 21790
avg / total 0.88 0.88 0.88 59544
When using golearn, with the same number of estimators:
Reference Class True Positives False Positives True Negatives Precision Recall F1 Score
--------------- -------------- --------------- -------------- --------- ------ --------
1.00 4199 1366 15858 0.7545 0.4263 0.5448
0.00 15858 5651 4199 0.7373 0.9207 0.8188
Overall accuracy: 0.7408
As you can see, there's a big drop in precision and recall, on both outcomes. Any ideas as to what could be the problem @Sentimentron ?
The installation README has instructions related to C dependencies for SUSE Linux which are missing for Ubuntu. The particular omissions are:

- `gonum/blas/cblas.go`. (After following the instructions to install OpenBLAS, you need to change the line `#cgo linux LDFLAGS: -lcblas` to `#cgo linux LDFLAGS: -openblas`, and may need to include a `-L/path/to/OpenBLAS` if it's installed to some non-standard location.)
- `liblinear`, which you can install with `sudo apt-get install liblinear-dev`.

I would create a PR myself with these additional instructions, BUT then `go get ./...` in the root of this project still fails, yielding the following error:

linear_models/liblinear.go:55: cannot use &c_y[0] (type *C.double) as type *C.int in assignment

Casting `c_y[0]` to a `C.int` "fixes" it, and `go test ./...` passes. Is the `liblinear` I'm `apt-get`ting somehow different from the library built from source as per the SUSE Linux instructions?
I was wondering if it would be better to focus on a particular class of computationally intensive machine learning techniques (dimensionality reduction, neural networks, Fourier transforms, etc.) rather than re-writing all the algorithms by hand. My take is that if we just use a source-to-source compiler (Python to Go, or C++ to Go), we might be able to save some time. The team can then just do some tests.
nnet implements some more exotic neural network architectures than those we currently support, so I think integration is logical.
It's a little bit of a mess at the moment in terms of duplication of code. I'm going to clean up and send a pull request. I'm working on the interfaces branch if you'd like to take a look.
Hi everyone. I'd like to formalise what features we want for a v0.1 release. What I mean by this is: the first version of GoLearn that is nearly ready for production use externally. We'll learn much more when it's in the hands of users. Docs need to be improved substantially, and we need a few more implementations of algorithms.
What does everyone think?
cc: @ifesdjeen @npbool @macmania @lazywei @marcoseravalli
When installing everything from scratch on my Raspberry Pi, I get this error when trying to run the tests.
# github.com/riobard/go-mmap
../riobard/go-mmap/mmap_linux.go:8: undefined: syscall.MAP_32BIT
../riobard/go-mmap/mmap_linux.go:8: const initializer (<T>)(syscall.MAP_32BIT) is not a constant
../riobard/go-mmap/mmap_linux.go:17: undefined: syscall.MAP_STACK
../riobard/go-mmap/mmap_linux.go:17: const initializer (<T>)(syscall.MAP_STACK) is not a constant
../riobard/go-mmap/mmap_linux.go:18: undefined: syscall.MAP_HUGETLB
../riobard/go-mmap/mmap_linux.go:18: const initializer (<T>)(syscall.MAP_HUGETLB) is not a constant
Any ideas @Sentimentron?
Hi. In the base.GetClass() function, the return type is set to string. This implies that all class labels will be strings. Is this always the case? I think it would be better to have this as an interface{} where the user can determine the type at learning time.
Is there any specific reason why this is set to string?
I've implemented this before, but only for a binary case. It requires some kind of spatial indexing, which we don't currently have support for either.
Currently golearn uses fmt.Println liberally to let you know what's going on. While this is nice for small training sets, it's over the top and completely unhelpful for larger sets.
For example, I'm training on a data set with 100,000 rows and categorical features with very high cardinality. This results in an absurdly huge amount of console spam. Additionally, writing this much to the screen can actually slow down program execution significantly.
One way to solve this would be to use a log.Logger to do all outputs. By default, set up the logger using stdout, but allow users to configure the logger to write to a file or write to dev/null to essentially turn logging off.
I may be able to submit a proper pull request, but wanted to post first to see if other people want this feature. What do you think?
Discussed in #72: LibSVM has some really awesome and large datasets that might be good for benchmarking, and might also be crucial for people wanting to try out golearn. We probably need a new FixedDataGrid which stores things in the LibSVM memory format, as well as the parsing functions.
OK, I've just been hitting my head against merging a high-resolution `DataFrame` with a low-resolution time `Series` in pandas, and it sucked harder than a dying star. I think we can do better.
Lots of ML applications involve predicting one or more dependent variables based on past measurements, usually at a given timestamp. Sometimes, these are totally linear and regular but often they aren't and I don't want to have to write bespoke interpolation functions, merging functions etc to get the job done. Sometimes I want to process stock data, where I've got measurements at defined dates and time, and sometimes I don't care what time really means.
I propose to add a new interface, `TimeGrid`, to `base`, which will look like this:
type TimeGrid interface {
	FixedDataGrid
	SetInterpolationMethod(InterpolationMethod)
	AtTime(int64, AttributeSpec) []byte
	SetTime(int64, AttributeSpec, []byte)
}
as well as two implementations of `TimeGrid`: `AbsoluteTimeSeries` and `RelativeTimeSeries`. They will both function as `Instances` do, but will always add a mandatory time `Attribute` (called `Time`): a `TimeAttribute` for `AbsoluteTimeSeries`, and a new `IntAttribute` for `RelativeTimeSeries`. Rows will be accessed in the order implied by those `Attributes`. Accesses between recorded time positions will trigger interpolation of any `FloatAttributes` using an interpolation function (currently expected to be `Nearest` and `Linear`).
Additionally, I want to write new filters to resample time series, detrend them, apply convolutions to them, and combine them with other time series data.
Again, a very useful method.
Apparently github.com/gonum/blas/cblas was recently changed to github.com/gonum/blas/cblas128. This, by the way, breaks the `go get -u -t ./...` command:
package github.com/sjwhitworth/golearn
imports github.com/sjwhitworth/golearn/base
imports github.com/gonum/blas
imports github.com/gonum/blas/blas64
imports github.com/gonum/blas/native
imports github.com/gonum/internal/asm
imports github.com/gonum/matrix/mat64
imports github.com/smartystreets/goconvey/convey
imports github.com/jtolds/gls
imports github.com/smartystreets/goconvey/convey/assertions
imports github.com/smartystreets/goconvey/convey/assertions/oglematchers
imports github.com/smartystreets/goconvey/convey/gotest
imports github.com/smartystreets/goconvey/convey/reporting
imports github.com/sjwhitworth/golearn/ensemble
imports github.com/gonum/blas/cblas
imports github.com/gonum/blas/cblas
imports github.com/gonum/blas/cblas: cannot find package "github.com/gonum/blas/cblas" in any of:
/usr/local/Cellar/go/1.4/libexec/src/github.com/gonum/blas/cblas (from $GOROOT)
/Users/mathDR/gocode/src/github.com/gonum/blas/cblas (from $GOPATH)
Please run `gofmt` on your source files. It will, for example, eliminate crazy inconsistent indentation. I recommend a simple git `pre-commit` hook:
#!/bin/sh
# Redirect output to stderr.
exec 1>&2
files="*.go */*.go"
nofmted=$(gofmt -l $files)
if [ $(echo "$nofmted" | wc -w) != 0 ]; then
echo "Some files are not gofmt'd:"
for f in $nofmted; do
echo $f
done
exit 1
fi
This will prevent you from committing Go code that isn't `gofmt`'d. Depending on your editor, there may be tools that can `gofmt` your code automatically when you save.
We still don't have it and we do need it. #90 offers a candidate API, where you call the function with a classifier and it gives you back k confusion matrices that you can feed to an evaluation function. Things we need to consider are:
I've started writing some documentation for the direction of the project. Please contribute your ideas when you get a minute. You'll have to send me your email so I can give you edit rights.
https://docs.google.com/a/hailocab.com/document/d/1x21Y-g1rga0LTwC_LnKHi0y7RjFzd2Il7YB47rp7kTA/edit
Would this be interesting / fit the roadmap? What do you guys think?
Hello! Does anyone want to do a google hangout sometime next week?
I'm new to this open source thing and I don't know how other groups do it, but it would be awesome to have a Google hangout just to check in on how the components are going and to see which parts are good to go, which ones need to be fully tested, and so on.
p.s. I still have yet to implement my part! So sorry about that, I got caught up on exams + a golang web app project. I should have more energy and time to spend on neural-nets and svms :)
If I wrap the code inside `TestRandomForest1` inside a 10-iteration for loop, I get the following panic:
panic: cannot allocate memory
I'm running this on an m3.2xlarge EC2 instance (Linux 3.13.0-32-generic x86_64). The first 8 or so iterations are likely to succeed, the problem almost always occurs when iterating 10 or more times.
Other things to note: if I increase the forest size to something like 50, then 2-3 iterations suffice to cause the panic. Also, if I remove the call to `rf.Predict(testData)` (and all subsequent code depending on the result of that `Predict`), then the panics do not occur.
Here's the full output, starting from the panic (not including the output of the first few iterations that succeed):
panic: cannot allocate memory
goroutine 184 [running]:
runtime.panic(0x5851e0, 0xc)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/panic.c:279 +0xf5
github.com/sjwhitworth/golearn/base.NewDenseInstances(0xc2083f6b00)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/base/dense.go:29 +0x67
github.com/sjwhitworth/golearn/base.GeneratePredictionVector(0x7fd6cf13a9f8, 0xc2083f6b00, 0x0, 0x0)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/base/util_instances.go:16 +0xa3
github.com/sjwhitworth/golearn/trees.(*DecisionTreeNode).Predict(0xc2082189b0, 0x7fd6cf13a9f8, 0xc2083f6b00, 0x0, 0x0)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/trees/id3.go:199 +0xad
github.com/sjwhitworth/golearn/trees.(*ID3DecisionTree).Predict(0xc20807bc40, 0x7fd6cf13a9f8, 0xc2083f6b00, 0x0, 0x0)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/trees/id3.go:278 +0x51
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:143 +0x140
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 16 [chan receive]:
testing.RunTests(0x601658, 0x69c010, 0x1, 0x1, 0x575401)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:505 +0x923
testing.Main(0x601658, 0x69c010, 0x1, 0x1, 0x6a8700, 0x0, 0x0, 0x6a8700, 0x0, 0x0)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:435 +0x84
main.main()
github.com/sjwhitworth/golearn/ensemble/_test/_testmain.go:47 +0x9c
goroutine 19 [finalizer wait]:
runtime.park(0x413090, 0x6a4ef8, 0x6a3a09)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x6a4ef8, 0x6a3a09)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
/home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
/home/ubuntu/.gvm/gos/go1.3/src/pkg/runtime/proc.c:1445
goroutine 20 [runnable]:
github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict(0xc208818380, 0x7fd6cf13a9f8, 0xc208818340, 0x0, 0x0)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:154 +0x2cf
github.com/sjwhitworth/golearn/ensemble.(*RandomForest).Predict(0xc20807bc20, 0x7fd6cf13a9f8, 0xc208818340, 0x0, 0x0)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/ensemble/randomforest.go:45 +0x51
github.com/sjwhitworth/golearn/ensemble.TestRandomForest1(0xc208050090)
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/ensemble/randomforest_test.go:29 +0x1a3
testing.tRunner(0xc208050090, 0x69c010)
/home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:422 +0x8b
created by testing.RunTests
/home/ubuntu/.gvm/gos/go1.3/src/pkg/testing/testing.go:504 +0x8db
goroutine 191 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 190 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 189 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 188 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 187 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 186 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 185 [runnable]:
github.com/sjwhitworth/golearn/meta.func·004()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:138
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:149 +0x25b
goroutine 183 [chan receive]:
github.com/sjwhitworth/golearn/meta.func·003()
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:113 +0x8c
created by github.com/sjwhitworth/golearn/meta.(*BaggedModel).Predict
/home/ubuntu/.gvm/pkgsets/go1.3/global/src/github.com/sjwhitworth/golearn/meta/bagging.go:131 +0x175
exit status 2
This of course makes it hard to write benchmarking tests which expect to be able to execute the code multiple times in order to time it.
Trying to benchmark the k-NN Classifier with a large dataset with a large number of features. I'm using this in particular:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#aloi
Had to do a fair bit of yak shaving to get `edf` to work:

- When `rowsPerPage` is about 3.96 and you have 108,000 rows, the lack-of-rounding error compounds and you end up not asking for enough pages. I think `rowsPerPage` needs to be `math.Floor`ed.
- `alloc.go` doesn't appear to consider `edfAnonMode`. I get a panic on line 21 because `e.f` is nil. I extracted a `fileSize` variable which I set to `os.Getpagesize()` in case we're in `edfAnonMode`. A few issues here: (a) `f` isn't a great name for a struct field; (b) my solution was hacky, and I'm not sure where else various modes aren't being considered, but I also don't want to riddle the package with switches on `mode`; (c) I'm not sure why `os.Getpagesize()` is the correct value — I copied it from map.go, but this value should probably be extracted as a constant somewhere.
- The `startBlock == 0` guard in AllocPages isn't clear, but I found I needed to move `e.extend(pagesRequested)` up before the if block.
- `fixed.go` blew up when I tried to use it for some debugging output. It assumes `f.alloc` has non-zero length, and that `f.alloc[0]` has length at least 61. Got `index out of range` panics. Not sure whether these assumptions are supposed to hold and weren't for some other reason that's broken, so I just worked around it by removing anything related to `alloc` from the `Sprintf` interpolation arguments, and got rid of the `if` altogether. The source of the 61 number wasn't clear.

I didn't want to submit a PR with these fixes because (a) the tests don't give me a lot of confidence that the changes haven't broken something else, and (b) a lot of my changes felt hacky, and it feels like the right solution would involve heavier refactoring.
Couple of big-picture questions about EDF:

- Why is it in the `golearn` repo? Seems like it's its own project.
- … the `testTrainSplit` function). For each of those files, `ParseCSVToInstances` takes about 30s, never mind actually doing the kNN classification (literally on the order of a few years; I made some tweaks, but it's still on the order of an hour). For comparison, I have code that parses both CSVs and does the kNN classification in about 30s total. Hard to pinpoint where all the slowness is coming from, but it seems scary.

`Perceptron` implementation, probably under `optimisation` or a new subpackage `linear`.

`BoostedModel` to the `ensemble` package.

A large number of tests don't make any assertions, and instead just print stuff out, e.g. a confusion matrix summary. This seems to defeat the purpose of having automated tests. There is some value to these tests: if they pass, they tell you the algorithm runs end to end without blowing up. But I've noticed when running in verbose mode that some of the algorithms produce 0% accuracy, and when tweaking parameters to improve the accuracy, the tests crashed with a memory panic.
Seems like having actual assertions (in addition to being valuable on their own) would've helped catch that kind of thing earlier. I'm willing to volunteer to go do this, work through the tests and make sure they're all making valuable assertions, someone just needs to assign it to me.
While I'm at it, any ideas for what sorts of things would be valuable to assert on? At minimum, for things that were printing out confusion matrices, asserting on the overall summary is a start.
Just wanted to bring up one minor bug, and ask about unseeded random number generation in Go.
There was a minor implementation bug in base/instances.go in regards to the Shuffle method on Instances that causes it to create a non-uniform distribution of shuffle permutations. I'll have it fixed with the correct Fisher-Yates algorithm and submit a pull request soon.
Another thing that came to my attention is that our libraries do not seed the math/rand generators. I'm not sure if this lack of seeding was a desired feature. As Go's rand package uses a fixed seed, behavior between runs is currently deterministic. Perhaps we could make it a convention to seed in our packages' init() function?
Trying to run the tests for linear_regression, I get an error at this line:

# include <linear.h>

But I've installed everything with `make.go` and set appropriate paths. Any ideas @njern?
`InstancesView`s are used by `InstancesTrainingTestSplit` and other methods to slice and dice `FixedDataGrid`s without using any more memory than necessary. However, this does mean there's less scope for optimisations, like the recent KNN one, which rely on having a contiguous data layout.

To fix: I propose adding an `InstancesDenseCopy` method to `base` which always returns a `DenseInstances` that replicates the original `FixedDataGrid`'s Attributes and all of the data it contains.
Moving gokmeans to golearn, making use of golearn's interfaces.
So you might have come across this piece on Hacker News the other week about implementing 1-NN in Rust. I decided it might be an interesting idea to see if we could achieve the same performance.
I installed a similar nightly build of Rust 0.11.0 on my Linode and compiled the sample code with full optimisations. The real user time (as measured by `time` and averaged over five runs) was 6.5 seconds. I then checked out the current golearn master and wrote an equivalent KNN program. This took 45.9 seconds (averaged over two runs). Most of the time is spent in matrix functions, and a small amount is spent sorting integer maps and garbage collecting.
To cut down on allocations and matrix operations, I moved storage to an `mmap`-able format and changed `Attributes` to use `[]byte` slices as their system representation. Using `reflect.SliceHeader` and some other tricks to cast between `float64`, `uint64` and `[]byte`, it's possible to access that (potentially disk-backed) memory with very little garbage-collection overhead. Surprisingly, while this transformed the profiling results, performance wasn't that different, saving only around 7-10 seconds on average.
I then radically overhauled the `Instances` type, allowing subsets of `Attributes` to be stored in column order, and re-wrote the Euclidean distance calculation in C to take advantage of this locality. The new running time (averaged over five runs) is 9.6 seconds. I think that with a few compiler flags and some more `Attribute` types, we could go even faster than that.
In light of this result, I'd like to recommend:

- `SelectAttributes`-style functions into utility functions
- `Attribute` grouping

The code I'm going to submit is not finished yet: it doesn't work in low-memory situations, it doesn't implement all of the safety requirements to use it, and these features definitely aren't mergeable in the v0.1 time-frame, but I'd be interested in your thoughts on the design.
I'm not an experienced golang programmer, but there are a couple of libraries made for golang that could be used:
https://github.com/datastream/libsvm
https://github.com/ryanbressler/CloudForest
https://gowalker.org/github.com/jbrukh/bayesian
Not sure if we want to stick with plain vanilla neural nets, or if we want to do deep learning. Do you have any experience with this @macmania ?
The installation instructions mention modifying some lines of code in `$GOPATH/src/github.com/gonum/blas/cblas/blas.go`. However, the lines the instructions reference do not exist in the latest version of gonum/blas. As far as I can tell, they were removed on April 28.

I found that setting the temporary environment variable CGO_LDFLAGS when running `go install ./...` works. So e.g. on OS X the command that worked for me was `CGO_LDFLAGS="-framework Accelerate" go install ./...`. This way you don't have to modify the source code for gonum/blas.
We should start off with this. It's a stable workhorse of ML applications, and would be really useful.
As mentioned in other issues, there are some decisions we need to make.

- `mat64`: lacks docs, but the author replies to issues very fast; optimised memory usage.
- `biogo.matrix`: the docs are quite good, but I have no experience using it.
- The `base` package, since it is related to many other packages in golearn.
- `linear_models/liblinear_src` in #23. We need to agree a convention for how to include 3rd-party libraries.

Please leave comments about the above issues. We should settle these first.
@sjwhitworth @ifesdjeen @npbool @marcoseravalli @macmania
A `CategoricalSetAttribute` type for holding up to 64 things.

We should design the interface for a base classifier/regressor to implement, so we can simply pass interfaces into funcs.
Overlooked in #88, requires some plumbing.
Started working on k-fold and leave-one-out.
Having a useful set of parsing packages makes life a whole lot easier for people. We should aim to natively support at least JSON and CSV from the start.
I am getting the following error message when trying to do anything:
# github.com/sjwhitworth/golearn/linear_models
../../../sjwhitworth/golearn/linear_models/liblinear.go:55: cannot use &c_y[0] (type *C.int) as type *C.double in assignment
It has to do with commit 94a562b. Changing the two lines 51 and 53 back to using the double type fixes the issue.
Is it possible this is a cross-platform or versioning issue? I'm on Mac OS X and installed liblinear version 1.94 with `brew install liblinear`.
Happy to provide clarification or more information. Is anyone else running into this?
I know there is a ticket for building out a deep learning Neural Network but I was curious if an averaged perceptron implementation might be welcome. I've been hacking in Go for a bit and I'm trying to find good projects to both learn from others and have an opportunity to build new things. Thanks!
Right now the TrainTestSplit() function that is implemented in the cross_validation package is never used anywhere, because the base package already implements InstancesTrainTestSplit().
Personally, I think CV is such an integral part of any ML algorithm/training that we could go ahead and delete the CV package and just keep any CV-related logic in base.
Thoughts?