
# Homomorphic Learning

HLearn is a suite of libraries for interpreting machine learning models according to their algebraic structure. Every structure has associated algorithms useful for learning. When we show that a model is an instance of a particular structure, we get the associated algorithms "for free."

| Structure | What we get |
| --- | --- |
| Monoid | parallel batch training |
| Monoid | online training |
| Monoid | fast cross-validation |
| Abelian group | "untraining" of data points |
| Abelian group | more fast cross-validation |
| R-Module | weighted data points |
| Vector space | fractionally weighted data points |
| Functor | fast simple preprocessing of data |
| Monad | fast complex preprocessing of data |
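
To make the Monoid row concrete, here is a minimal, self-contained sketch of the idea (a toy summary type, not HLearn's actual implementation): because merging two trained summaries is associative, a data set can be split into chunks, each chunk trained independently, and the results combined.

```haskell
-- A toy illustration of "Monoid => parallel batch training"; this is
-- NOT HLearn's implementation, just the underlying idea.
data NormalSummary = NormalSummary
  { nCount :: !Double
  , nSum   :: !Double
  , nSumSq :: !Double
  } deriving (Eq, Show)

instance Semigroup NormalSummary where
  NormalSummary n1 s1 q1 <> NormalSummary n2 s2 q2 =
    NormalSummary (n1 + n2) (s1 + s2) (q1 + q2)

instance Monoid NormalSummary where
  mempty = NormalSummary 0 0 0

-- "Training" is just a fold of the data into the summary.
trainSummary :: [Double] -> NormalSummary
trainSummary = foldMap (\x -> NormalSummary 1 x (x * x))

-- Because (<>) is associative, training two chunks separately and
-- merging them gives the same model as training everything at once.
main :: IO ()
main = do
  let xs       = [1 .. 1000] :: [Double]
      (as, bs) = splitAt 500 xs
  print (trainSummary xs == trainSummary as <> trainSummary bs)  -- True
```

HLearn's `parallel` combinator, used below, builds on the same property for its real model types.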

This interpretation of machine learning is somewhat limiting in that not all models have obvious algebraic structure, but many important models do. Currently implemented models include:

* Univariate distributions: exponential, log-normal, normal, kernel density estimator, binomial, categorical, geometric, Poisson

* Multivariate distributions: normal, categorical, a subset of Markov networks

* Classifiers: naive Bayes, full Bayes, perceptron, bagging, boosting (sort of)

* Univariate regression: exponential, logistic, power law, polynomial

* NP-hard approximations: k-centers, bin packing, multiprocessor scheduling

* Other: Markov chains

Note: these models are not included in the latest Hackage releases: decision stumps/trees and k-nearest neighbor (kd-tree based).

## Example: normal distribution

Every model in HLearn is trained from a data set using the function `train`. The type signature specifies which model we're training.

```haskell
let dataset = [1,2,3,4,5,6]
let dist = train dataset :: Normal Double
```

We can train in parallel using the higher-order function `parallel`. The GHC runtime automatically takes advantage of multiple cores on your computer; with 4 cores, training is roughly 4x faster.

```haskell
let dist' = parallel train dataset :: Normal Double
```
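
For `parallel` to actually use multiple cores, the program has to be built with GHC's threaded runtime. Below is a sketch of a complete program; the module names `HLearn.Algebra` and `HLearn.Models.Distributions` are assumptions about the package layout and may differ between HLearn versions.

```haskell
-- A sketch of a complete program using the calls shown in this README.
-- Build with the threaded runtime so `parallel` can use every core:
--   ghc -threaded -O2 Main.hs
--   ./Main +RTS -N4
-- The module names below are an assumption and may differ by version.
import HLearn.Algebra
import HLearn.Models.Distributions

main :: IO ()
main = do
    let dataset = [1, 2, 3, 4, 5, 6] :: [Double]
        dist    = parallel train dataset :: Normal Double
    -- `pdf` is introduced later in this README.
    print (pdf dist 10)
```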

We can also train in online mode, adding data points to an already trained model with either `add1dp` or `addBatch`.

```haskell
let dist_online1 = add1dp dist 7
let dist_online2 = addBatch dist [7,8,9,10]
```
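
Conceptually, online training falls out of the same monoid structure: adding a batch should behave like training a fresh model on just the new points and merging it into the existing one. A hedged sketch of that intuition, reusing `dist` from above (the exact equalities HLearn guarantees may be stated differently in its documentation):

```haskell
-- Intuition only (a sketch, not a law quoted from HLearn's docs):
--   add1dp   dist x   ~  dist `mappend` train [x]
--   addBatch dist xs  ~  dist `mappend` train xs
let dist_online3 = dist `mappend` train [7,8,9,10] :: Normal Double
```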

Finally, once we've trained a model, we can perform all the usual operations on it. One common operation on distributions is evaluating the probability density function, which we do with the `pdf` function.

```haskell
pdf dist 10
```
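
For reference, `pdf` here evaluates the usual normal density with mean and variance estimated from the training data. A hand-rolled version for comparison (a sketch; which variance estimator HLearn uses is an assumption left unchecked here):

```haskell
-- Standard normal density: mu is the mean, sigma2 the variance.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu sigma2 x =
  exp (negate ((x - mu) ^ 2) / (2 * sigma2)) / sqrt (2 * pi * sigma2)
```

For the example data set, the mean is 3.5 and the unbiased sample variance is also 3.5, so `normalPdf 3.5 3.5 10` should be close to the value above if HLearn uses that estimator.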

For more details on why the Normal distribution has algebraic structure and what we can do with it, see the blog post Normal distributions form a monoid and why machine learning experts should care.

## More documentation

There are three main sources of documentation. First, there are a number of tutorials on my personal blog. These provide the most detail and are geared towards the beginner; they're probably the easiest way to get started. Next, there are two papers about the internals of the HLearn library. They are a good resource for understanding the theory behind why the library works. Finally, there's the Hackage documentation.

Tutorial links:

Comparison to other libraries:

Paper links:

Hackage links:

## Future directions

HLearn is under active development. At present, it is primarily a research tool. This means that the interfaces may change significantly in the future (but will definitely follow the PVP). I'm hoping HLearn will eventually become a stable package that will make it easy to incorporate machine learning techniques into Haskell programs.

Current development is focused on two areas. First, implementing new models and their algebraic structures. Many unimplemented models have "trivial" algebraic structure, but for many successful models it is unknown whether they can have interesting structure. The second area is investigating new structures. Many models have Functor/Applicative/Monoid structure (or in some strict sense almost have these structures), and I'm working on how to exploit these structures.

Any comments / questions / pull requests are greatly appreciated!

