
# Homomorphic Learning

HLearn is a suite of libraries for interpreting machine learning models according to their algebraic structure. Every structure has associated algorithms useful for learning. When we show that a model is an instance of a particular structure, we get the associated algorithms "for free."

| Structure | What we get |
| --- | --- |
| Monoid | parallel batch training |
| Monoid | online training |
| Monoid | fast cross-validation |
| Abelian group | "untraining" of data points |
| Abelian group | more fast cross-validation |
| R-Module | weighted data points |
| Vector space | fractionally weighted data points |
| Functor | fast simple preprocessing of data |
| Monad | fast complex preprocessing of data |
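
To make the Monoid row concrete, here is a minimal, self-contained sketch of the idea (a toy summary type, not HLearn's actual implementation): because merging two trained summaries is associative, a data set can be split into chunks, each chunk trained independently, and the results combined.

```haskell
-- A toy illustration of "Monoid => parallel batch training"; this is
-- NOT HLearn's implementation, just the underlying idea.
data NormalSummary = NormalSummary
  { nCount :: !Double
  , nSum   :: !Double
  , nSumSq :: !Double
  } deriving (Eq, Show)

instance Semigroup NormalSummary where
  NormalSummary n1 s1 q1 <> NormalSummary n2 s2 q2 =
    NormalSummary (n1 + n2) (s1 + s2) (q1 + q2)

instance Monoid NormalSummary where
  mempty = NormalSummary 0 0 0

-- "Training" is just a fold of the data into the summary.
trainSummary :: [Double] -> NormalSummary
trainSummary = foldMap (\x -> NormalSummary 1 x (x * x))

-- Because (<>) is associative, training two chunks separately and
-- merging them gives the same model as training everything at once.
main :: IO ()
main = do
  let xs       = [1 .. 1000] :: [Double]
      (as, bs) = splitAt 500 xs
  print (trainSummary xs == trainSummary as <> trainSummary bs)  -- True
```

HLearn's `parallel` combinator, used below, builds on the same property for its real model types.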

This interpretation of machine learning is somewhat limiting in that not all models have obvious algebraic structure, but many important models do. Currently implemented models include:

* Univariate distributions: exponential, log-normal, normal, kernel density estimator, binomial, categorical, geometric, Poisson

* Multivariate distributions: normal, categorical, a subset of Markov networks

* Classifiers: naive Bayes, full Bayes, perceptron, bagging, boosting (sort of)

* Univariate regression: exponential, logistic, power law, polynomial

* NP-hard approximations: k-centers, bin packing, multiprocessor scheduling

* Other: Markov chains

Note: these models are not included in the latest Hackage releases: decision stumps/trees and k-nearest neighbor (kd-tree based).

## Example: normal distribution

Every model in HLearn is trained from a data set using the function `train`. The type signature specifies which model we're training.

```haskell
let dataset = [1,2,3,4,5,6]
let dist = train dataset :: Normal Double
```

We can train in parallel using the higher-order function `parallel`. The GHC runtime automatically takes advantage of multiple cores on your computer; with 4 cores, training is roughly 4x faster.

```haskell
let dist' = parallel train dataset :: Normal Double
```
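
For `parallel` to actually use multiple cores, the program has to be built with GHC's threaded runtime. Below is a sketch of a complete program; the module names `HLearn.Algebra` and `HLearn.Models.Distributions` are assumptions about the package layout and may differ between HLearn versions.

```haskell
-- A sketch of a complete program using the calls shown in this README.
-- Build with the threaded runtime so `parallel` can use every core:
--   ghc -threaded -O2 Main.hs
--   ./Main +RTS -N4
-- The module names below are an assumption and may differ by version.
import HLearn.Algebra
import HLearn.Models.Distributions

main :: IO ()
main = do
    let dataset = [1, 2, 3, 4, 5, 6] :: [Double]
        dist    = parallel train dataset :: Normal Double
    -- `pdf` is introduced later in this README.
    print (pdf dist 10)
```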

We can also train in online mode, adding data points to an already trained model with either `add1dp` or `addBatch`.

```haskell
let dist_online1 = add1dp dist 7
let dist_online2 = addBatch dist [7,8,9,10]
```
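
Conceptually, online training falls out of the same monoid structure: adding a batch should behave like training a fresh model on just the new points and merging it into the existing one. A hedged sketch of that intuition, reusing `dist` from above (the exact equalities HLearn guarantees may be stated differently in its documentation):

```haskell
-- Intuition only (a sketch, not a law quoted from HLearn's docs):
--   add1dp   dist x   ~  dist `mappend` train [x]
--   addBatch dist xs  ~  dist `mappend` train xs
let dist_online3 = dist `mappend` train [7,8,9,10] :: Normal Double
```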

Finally, once we've trained a model, we can perform all the usual operations on it. One common operation on distributions is evaluating the probability density function, which we do with the `pdf` function.

```haskell
pdf dist 10
```
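
For reference, `pdf` here evaluates the usual normal density with mean and variance estimated from the training data. A hand-rolled version for comparison (a sketch; which variance estimator HLearn uses is an assumption left unchecked here):

```haskell
-- Standard normal density: mu is the mean, sigma2 the variance.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu sigma2 x =
  exp (negate ((x - mu) ^ 2) / (2 * sigma2)) / sqrt (2 * pi * sigma2)
```

For the example data set, the mean is 3.5 and the unbiased sample variance is also 3.5, so `normalPdf 3.5 3.5 10` should be close to the value above if HLearn uses that estimator.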

For more details on why the Normal distribution has algebraic structure and what we can do with it, see the blog post Normal distributions form a monoid and why machine learning experts should care.

## More documentation

There are three main sources of documentation. First, there are a number of tutorials on my personal blog. These provide the most detail and are geared towards the beginner; they're probably the easiest way to get started. Next, there are two papers about the internals of the HLearn library. They are a good resource for understanding the theory behind why the library works. Finally, there's the Hackage documentation.

Tutorial links:

Comparison to other libraries:

Paper links:

Hackage links:

## Future directions

HLearn is under active development. At present, it is primarily a research tool. This means that the interfaces may change significantly in the future (but will definitely follow the PVP). I'm hoping HLearn will eventually become a stable package that will make it easy to incorporate machine learning techniques into Haskell programs.

Current development is focused on two areas. First, implementing new models and their algebraic structures. Many unimplemented models have "trivial" algebraic structure, but for many successful models it is unknown whether they can have interesting structure. The second area is investigating new structures. Many models have Functor/Applicative/Monoid structure (or in some strict sense almost have these structures), and I'm working on how to exploit these structures.

Any comments / questions / pull requests are greatly appreciated!

