pyglmnet

A Python implementation of elastic-net regularized generalized linear models

Generalized linear models are well-established tools for regression and classification and are widely applied across the sciences, economics, business, and finance. They are uniquely identifiable due to their convex loss and easy to interpret due to their point-wise non-linearities and well-defined noise models.

In the era of exploratory data analyses with a large number of predictor variables, it is important to regularize. Regularization prevents overfitting by penalizing the negative log likelihood and can be used to articulate prior knowledge about the parameters in a structured form.
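
As a rough illustration of what "penalizing the negative log likelihood" means, the elastic-net objective can be written in the standard form used by Friedman et al. (2010). The symbols below (N samples, intercept \beta_0, coefficients \beta, regularization strength \lambda, and mixing weight \alpha) are introduced here for illustration and may not match pyglmnet's internal notation exactly.

```latex
\min_{\beta_0, \beta} \;
  -\frac{1}{N} \sum_{i=1}^{N} \log p\left(y_i \mid \beta_0 + x_i^\top \beta\right)
  + \lambda \left[ \frac{1 - \alpha}{2} \, \|\beta\|_2^2 + \alpha \, \|\beta\|_1 \right]
```

Setting \alpha = 1 gives the lasso penalty, \alpha = 0 gives ridge, and values in between give the elastic net.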

Despite the attractiveness of regularized GLMs, the available tools in the Python data science ecosystem are highly fragmented. More specifically,

  • statsmodels provides a wide range of link functions but no regularization.
  • scikit-learn provides elastic net regularization but only for linear models.
  • lightning provides elastic net and group lasso regularization, but only for linear and logistic regression.

Pyglmnet is a response to this fragmentation. It runs on Python 3.5+. Here are some of the highlights.

  • Pyglmnet provides a wide range of noise models (and paired canonical link functions): 'gaussian', 'binomial', 'multinomial', 'poisson', and 'softplus'.
  • It supports a wide range of regularizers: ridge, lasso, elastic net, group lasso, and Tikhonov regularization.
  • Pyglmnet's API is designed to be compatible with scikit-learn, so you can deploy Pipeline tools such as GridSearchCV() and cross_val_score().
  • We follow the same approach and notation as in Friedman, J., Hastie, T., & Tibshirani, R. (2010) (https://core.ac.uk/download/files/153/6287975.pdf) and the accompanying widely popular R package.
  • We have implemented a cyclical coordinate descent optimizer with Newton update, active sets, update caching, and warm restarts. This optimization approach is identical to the one used in the R package.
  • A number of Python wrappers exist for the R glmnet package, but in contrast to these, Pyglmnet is a pure Python implementation. Therefore, it is easy to modify and to introduce additional noise models and regularizers in the future.
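
To make the coordinate descent idea above concrete, here is a minimal sketch for the Gaussian case only, written in plain Python. It is not pyglmnet's implementation: it omits the Newton update, active sets, and warm restarts, and the function and parameter names (soft_threshold, elastic_net_cd, reg_lambda, alpha, n_sweeps) are chosen for illustration. It minimizes the squared-error loss divided by 2N plus the elastic-net penalty, cycling through the coefficients and keeping the residuals cached between updates.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, reg_lambda=0.1, alpha=0.5, n_sweeps=200):
    """Cyclical coordinate descent for a Gaussian elastic-net model.

    X is a list of rows, y a list of targets. Returns (beta0, beta).
    Hypothetical helper for illustration, not part of pyglmnet.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    # cached residuals r_i = y_i - beta0 - x_i . beta (all zeros initially)
    resid = list(y)
    # mean squared value of each feature column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        # the intercept is not penalized: move it to the mean residual
        shift = sum(resid) / n
        beta0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            # average product of feature j with its partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, reg_lambda * alpha)
            new_bj /= col_sq[j] + reg_lambda * (1.0 - alpha)
            delta = new_bj - beta[j]
            if delta != 0.0:
                # update the cached residuals instead of recomputing them
                resid = [resid[i] - delta * X[i][j] for i in range(n)]
                beta[j] = new_bj
    return beta0, beta


# small deterministic example: y depends on the first feature only
X_demo = [[i / 10.0, ((i * 7) % 11) / 10.0] for i in range(20)]
y_demo = [1.0 + 2.0 * row[0] for row in X_demo]
beta0_hat, beta_hat = elastic_net_cd(X_demo, y_demo, reg_lambda=0.05, alpha=1.0)
print(beta0_hat, beta_hat)
```

With a lasso penalty (alpha=1.0) and a large enough reg_lambda, the soft-thresholding step sets every coefficient exactly to zero, which is where the sparsity of lasso solutions comes from.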

Benchmarks

Here is a table comparing pyglmnet against scikit-learn's linear_model, statsmodels, and R.

The numbers below are run times (in milliseconds) to fit a 1000 samples x 100 predictors sparse matrix (density 0.05). This was done on a c. 2011 MacBook Pro, so your numbers may vary.

distr      pyglmnet  scikit-learn  statsmodels  R
gaussian   6.8       1.2           29.8         10.3
binomial   16.3      4.5           89.3         --
poisson    5.8       --            117.2        156.1

We provide a function called BenchMarkGLM() in pyglmnet.datasets if you would like to run these benchmarks yourself, but you need to install the dependencies (scikit-learn, Rpy2, and statsmodels) yourself.

Installation

Now pip installable!

$ pip install pyglmnet

Manual installation instructions below:

Clone the repository.

$ git clone http://github.com/glm-tools/pyglmnet

Install pyglmnet using setup.py as follows:

$ python setup.py develop

Getting Started

Here is an example of how to use the GLM estimator. This example is also found in examples/intro_example.py.

import numpy as np
import scipy.sparse as sps
from pyglmnet import GLM, simulate_glm

# create an instance of the GLM class
glm = GLM(distr="poisson")

# sample random coefficients
n_samples, n_features = 1000, 100
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# simulate training data
X_train = np.random.normal(0.0, 1.0, [n_samples, n_features])
y_train = simulate_glm("poisson", beta0, beta, X_train)

# simulate testing data
X_test = np.random.normal(0.0, 1.0, [n_samples, n_features])
y_test = simulate_glm("poisson", beta0, beta, X_test)

# fit the model on the training data
glm.fit(X_train, y_train)

# predict using fitted model on the test data
yhat_test = glm.predict(X_test)

# score the model
deviance = glm.score(X_test, y_test)
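
The example above stores the score in a variable named deviance. For readers unfamiliar with the term, here is a minimal sketch of the textbook Poisson deviance between observed counts and predicted means. The helper name poisson_deviance is hypothetical, and pyglmnet's own score may be defined or normalized differently, so treat this as an illustration of the concept rather than a description of what glm.score() computes.

```python
import math


def poisson_deviance(y, mu):
    """Textbook Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    y holds observed counts and mu holds predicted means (all mu > 0).
    Hypothetical helper for illustration, not part of pyglmnet.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(y / mu) is taken as 0 when y == 0 (its limiting value)
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


# a perfect prediction has zero deviance; worse predictions score higher
print(poisson_deviance([1, 2, 3], [1.0, 2.0, 3.0]))
print(poisson_deviance([1, 2, 3], [2.0, 2.0, 2.0]))
```

Lower values indicate a better fit, with zero reached only when the predicted means equal the observed counts.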

More pyglmnet examples and use cases.

Tutorial

Here is an extensive tutorial on GLMs, optimization and pseudo-code.

Here are slides from a talk at PyData Chicago 2016, corresponding tutorial notebooks and a video.

How to contribute?

We welcome pull requests. Please see our developer documentation page for more details.

Acknowledgments

License

MIT License Copyright (c) 2016 Pavan Ramkumar
