goml's Introduction

goml

Golang Machine Learning, On The Wire

goml (pronounced like the data format 'toml') is a machine learning library written entirely in Golang that lets the average developer add machine learning to their applications.

While models include traditional, batch learning interfaces, goml includes many models which let you learn in an online, reactive manner by passing data to streams held on channels.
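As a rough illustration of learning from a stream held on a channel, here is a minimal sketch. It does not use goml's API: the `Datapoint` and `Perceptron` types and the `OnlineLearn` signature are hypothetical stand-ins chosen for this example, so check the package GoDoc for the real interfaces.

```go
package main

import "fmt"

// Datapoint is a hypothetical training example (not goml's type).
type Datapoint struct {
	X []float64
	Y float64 // label: +1 or -1
}

// Perceptron is a toy online linear classifier used only for illustration.
type Perceptron struct {
	Weights []float64
	Bias    float64
}

// Predict returns +1 or -1 for the input x.
func (p *Perceptron) Predict(x []float64) float64 {
	sum := p.Bias
	for i, w := range p.Weights {
		sum += w * x[i]
	}
	if sum >= 0 {
		return 1
	}
	return -1
}

// OnlineLearn consumes datapoints until the stream is closed, updating the
// weights after each misclassified example, then closes done.
func (p *Perceptron) OnlineLearn(stream <-chan Datapoint, done chan<- struct{}) {
	for d := range stream {
		if p.Predict(d.X) != d.Y {
			for i := range p.Weights {
				p.Weights[i] += d.Y * d.X[i]
			}
			p.Bias += d.Y
		}
	}
	close(done)
}

func main() {
	p := &Perceptron{Weights: make([]float64, 2)}
	stream := make(chan Datapoint)
	done := make(chan struct{})
	go p.OnlineLearn(stream, done)

	// Points with x0 > x1 are labeled +1, otherwise -1.
	data := []Datapoint{
		{X: []float64{2, 0}, Y: 1},
		{X: []float64{0, 2}, Y: -1},
		{X: []float64{3, 1}, Y: 1},
		{X: []float64{1, 3}, Y: -1},
	}
	for epoch := 0; epoch < 10; epoch++ {
		for _, d := range data {
			stream <- d
		}
	}
	close(stream)
	<-done // wait until the learner has drained the stream

	fmt.Println(p.Predict([]float64{5, 1}), p.Predict([]float64{1, 5}))
}
```

The caller only reads predictions after `done` is closed, so the learner goroutine is never updating the weights while they are being read.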

The library includes comprehensive tests, extensive documentation, and clean, expressive, modular source code. Community contribution is heavily encouraged.

Each package (mentioned below) includes its own README explaining the function and purpose of its models. Above all, if you want to learn about a model, read the GoDoc reference for its package. All models are, as mentioned above, heavily documented.

Installation

go get github.com/cdipaolo/goml/base

# This could be any other model package if you want
#
# Also, the base package is imported already
# by many of the packages so you might not even
# need to `go get` the package explicitly
go get github.com/cdipaolo/goml/perceptron

Documentation

All the code is well documented, and the source should be readable if you'd like to make sense of it all! Each package on GitHub has a link to its Godoc, an explanation of the package, and an example usage. You can click on the main bullets below to go to those packages, or use the Godoc link at the top of this README and navigate to the package you'd like to know more about.

Sub-bullets below will take you directly to the source code of the model.

Currently Implemented Models

Contributing!

see CONTRIBUTING.

I'd love help with any of this. If you would like to implement a model that isn't here, improve a current model, or help with documentation, please do. Documentation help would be greatly appreciated; believe me, writing great documentation takes time! 👍

LICENSE - MIT

see LICENSE

goml's People

Contributors

arianht, ashnair1, cdipaolo, jrbarron, juandes, mexeniz, piazzamp, vikashvverma

goml's Issues

Allow users to disable stdout

It would be appreciated if there were a global flag to disable logging to standard out. When creating models, you don't always want the model output filling the screen.

Alternatively it'd be nice if instead of default printing, you could call a method that would give you the variables, like

model.OptimizationMethod() -> "Batch Gradient Ascent"
model.TrainingExamples() -> 4000

or return a struct with all the information, so that callers can decide what and where they want to print that information.
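A minimal sketch of the struct-returning variant might look like the following. Everything here is hypothetical: `TrainingSummary`, its fields, and the `Model` type are names invented for this illustration, not existing goml API.

```go
package main

import "fmt"

// TrainingSummary is a hypothetical struct; the field names are illustrative.
type TrainingSummary struct {
	OptimizationMethod string
	TrainingExamples   int
	Features           int
}

// Model is a stand-in for a trained model, not a goml type.
type Model struct {
	method   string
	examples [][]float64
}

// Summary returns training metadata instead of printing it, so the caller
// decides whether and where to log it.
func (m *Model) Summary() TrainingSummary {
	features := 0
	if len(m.examples) > 0 {
		features = len(m.examples[0])
	}
	return TrainingSummary{
		OptimizationMethod: m.method,
		TrainingExamples:   len(m.examples),
		Features:           features,
	}
}

func main() {
	m := &Model{
		method:   "Batch Gradient Ascent",
		examples: [][]float64{{1, 2}, {3, 4}, {5, 6}},
	}
	s := m.Summary()
	fmt.Printf("%s on %d examples (%d features)\n",
		s.OptimizationMethod, s.TrainingExamples, s.Features)
}
```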

Silhouette validation for clustering

Are you planning on implementing the silhouette method for validating clustering results? Is this a wanted feature? I could implement it if you like.
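For reference, a silhouette score could be sketched roughly as below. This is a standalone illustration assuming Euclidean distance and integer cluster labels in the range 0..k-1; the function names are hypothetical and nothing here reflects how goml stores clustering results.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the Euclidean distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// silhouette returns the mean silhouette coefficient over all points.
// For each point, a is the mean distance to the other members of its own
// cluster and b is the lowest mean distance to any other cluster; the
// coefficient is (b - a) / max(a, b). Points in singleton clusters score 0.
func silhouette(points [][]float64, labels []int, k int) float64 {
	total := 0.0
	for i, p := range points {
		sums := make([]float64, k)
		counts := make([]int, k)
		for j, q := range points {
			if i == j {
				continue
			}
			sums[labels[j]] += euclidean(p, q)
			counts[labels[j]]++
		}
		own := labels[i]
		if counts[own] == 0 {
			continue // singleton cluster contributes 0
		}
		a := sums[own] / float64(counts[own])
		b := math.Inf(1)
		for c := 0; c < k; c++ {
			if c != own && counts[c] > 0 {
				b = math.Min(b, sums[c]/float64(counts[c]))
			}
		}
		if math.IsInf(b, 1) || math.Max(a, b) == 0 {
			continue // no other cluster, or all distances are zero
		}
		total += (b - a) / math.Max(a, b)
	}
	return total / float64(len(points))
}

func main() {
	points := [][]float64{{0}, {1}, {10}, {11}}
	labels := []int{0, 0, 1, 1}
	fmt.Printf("%.3f\n", silhouette(points, labels, 2))
}
```

Values close to 1 suggest well-separated clusters, while values near 0 or below suggest overlapping or misassigned points.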

fmt.Errorf format %v reads arg #2, but call has 1 arg

This line in kmeans triggers the error "fmt.Errorf format %v reads arg #2, but call has 1 arg" while running tests.

A simple fix would be to replace the line in question

errors <- fmt.Errorf("ERROR: point.X must have the same dimensions as clusters (len %v). Point: %v", point)

with this

errors <- fmt.Errorf("ERROR: point.X must have the same dimensions as clusters (len %v). Point: %v", centroids, point)
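The underlying rule is that every formatting verb needs a matching argument. Here is a small standalone sketch of a correctly matched call; `dimensionError` is a hypothetical helper, and passing the expected dimension as the first argument is an assumption about what the "len" in the message is meant to show.

```go
package main

import "fmt"

// dimensionError is a hypothetical helper. The format string has two %v
// verbs, so the call supplies exactly two arguments: the expected
// dimension and the offending point.
func dimensionError(expectedDim int, point []float64) error {
	return fmt.Errorf(
		"ERROR: point.X must have the same dimensions as clusters (len %v). Point: %v",
		expectedDim, point)
}

func main() {
	err := dimensionError(2, []float64{1, 2, 3})
	fmt.Println(err)
}
```

With one argument missing, `fmt` still returns an error value but fills the unmatched verb with a `%!v(MISSING)` placeholder, which is what `go vet` warns about.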

Follow up question, is this project in active development?

Comparison with Weka, others?

It would be very useful to compare performance (run time, memory used) with other commonly used machine learning libraries and frameworks, such as Weka and Apache Mahout.

Examples

I'd like to learn more about machine learning and this library looks like a good place to start building something with. Are there any examples you could post to demonstrate some simple use cases?

Text models, uint8 for number of classes?

I don't know that much at the moment about ML so pardon me if this is ignorant. Is there a reason that the number of classes for text classification is limited to 255 via uint8? Would it be possible to increase this?

Remove `fmt.Printf`s?

Hello!

Great library. I noticed during tests that the code decides to just fmt.Printf. I don't want the ML lib in my app to be outputting to the console without me knowing. Can we disable that? Or provide a way to provide an alternate io.Writer?

Thanks!

Bug in k nearest neighbors

In the Predict function in knn.go you "initialize" the neighbors array with random elements from k.trainingSet and then use insertSorted to insert new data into the neighbors array.

This is a problem because insertSorted requires that the array you are inserting into be sorted; it uses binary search. The random data you initialize the neighbors vector with may not be sorted.

A possible fix is to get rid of the rand package altogether, initialize the neighbors vector with the first k.K elements from k.trainingSet, and sort neighbors before calculating the nearest neighbors.

I can submit a pull request if you like.
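The proposed fix can be sketched as follows. This is a standalone illustration, not the code from knn.go: the `neighbor` type, the `nearest` function, and this `insertSorted` are written for the example, and it assumes k is no larger than the training set.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// neighbor pairs a distance to the query with the training label.
type neighbor struct {
	dist  float64
	label float64
}

func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// insertSorted places n into the ascending slice using binary search and
// drops the farthest element. It is only correct if neighbors is sorted.
func insertSorted(neighbors []neighbor, n neighbor) {
	i := sort.Search(len(neighbors), func(j int) bool {
		return neighbors[j].dist > n.dist
	})
	if i == len(neighbors) {
		return // farther than every current neighbor
	}
	copy(neighbors[i+1:], neighbors[i:len(neighbors)-1])
	neighbors[i] = n
}

// nearest seeds the slice with the first k training points, sorts it once,
// and only then starts inserting the remaining points.
func nearest(xs [][]float64, ys []float64, query []float64, k int) []neighbor {
	neighbors := make([]neighbor, 0, k)
	for i := 0; i < k; i++ {
		neighbors = append(neighbors, neighbor{euclidean(xs[i], query), ys[i]})
	}
	sort.Slice(neighbors, func(a, b int) bool {
		return neighbors[a].dist < neighbors[b].dist
	})
	for i := k; i < len(xs); i++ {
		insertSorted(neighbors, neighbor{euclidean(xs[i], query), ys[i]})
	}
	return neighbors
}

func main() {
	xs := [][]float64{{0}, {10}, {1}, {5}, {2}}
	ys := []float64{0, 1, 0, 1, 0}
	for _, n := range nearest(xs, ys, []float64{0}, 3) {
		fmt.Println(n.dist, n.label)
	}
}
```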

Concurrent map access during NaiveBayes's OnlineLearning sequence

In the OnlineLearn method, Words are written to the model's map of word counts while the Predict, Probability, and TFIDF.InverseDocumentFrequency methods read from that same map. The only indication that training is done is that the errors channel passed in to OnlineLearn is closed, at which point it's safe to use the model. Before that point, a runtime error will occur as a result of the concurrent map reads and writes.

Copying lock values TFIDF

When trying to convert the NaiveBayes model to the TFIDF model, I get a go vet warning saying "TFIDF copies lock value".

There was another related issue with a fix that allowed access to the concurrent map, however I can't find a way to cast one model to the other without this issue.

I get the same issue when running tfidf_test.go
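If TFIDF is a defined type whose underlying type is the NaiveBayes struct (an assumption in this sketch, along with the simplified field names), converting the pointer rather than the value avoids copying the embedded lock:

```go
package main

import (
	"fmt"
	"sync"
)

// NaiveBayes is a simplified stand-in that holds a lock by value.
type NaiveBayes struct {
	mu    sync.RWMutex
	Words map[string]uint64
}

// TFIDF shares NaiveBayes's underlying struct type (assumed here).
type TFIDF NaiveBayes

// Seen reads a word count under the read lock.
func (t *TFIDF) Seen(word string) uint64 {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.Words[word]
}

func main() {
	nb := &NaiveBayes{Words: map[string]uint64{"go": 3}}

	// A value conversion such as TFIDF(*nb) would copy the struct,
	// including mu, which is what vet's copylocks check reports.
	// Converting the pointer copies nothing: tf and nb refer to the
	// same struct and therefore the same lock.
	tf := (*TFIDF)(nb)

	fmt.Println(tf.Seen("go"))
}
```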

TFIDF doesn't work

TFIDF doesn't work unless we actually save the DocsSeen value in the Bayes model.

Currently the struct for Word doesn't do this.

type Word struct {
    Count    []uint64
    Seen     uint64
    DocsSeen uint64 `json:"-"`
}

Should be:

type Word struct {
    Count    []uint64
    Seen     uint64
    DocsSeen uint64
}
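The effect of the `json:"-"` tag can be seen with a small standalone sketch. The `skipped` and `kept` types below are trimmed-down, hypothetical versions of Word used only to show the encoding difference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// skipped mirrors the current behaviour: DocsSeen is excluded from JSON.
type skipped struct {
	Seen     uint64
	DocsSeen uint64 `json:"-"`
}

// kept mirrors the proposed fix: DocsSeen is encoded like any other field.
type kept struct {
	Seen     uint64
	DocsSeen uint64
}

// encode marshals v to a JSON string, returning "" on error.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(skipped{Seen: 1, DocsSeen: 2})) // {"Seen":1}
	fmt.Println(encode(kept{Seen: 1, DocsSeen: 2}))    // {"Seen":1,"DocsSeen":2}
}
```

Because the tagged field never reaches the serialized form, a model restored from JSON would come back with DocsSeen set to zero.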

Roadmap / Comparison to other Go ML libraries

How does goml compare to some of the other Go libraries in terms of product vision / roadmap?

There's a decent amount of overlap in terms of the implemented algorithms / models. Is your goal to eventually include all of the other types (neural networks, collaborative filtering, etc)? It seems like the stated goal of being more stream oriented than batch oriented differentiates this library too.

At the end of the day, this seems like the most active repo with an exciting direction. I'm very curious to know where you plan on taking things.
