
celer's Introduction

celer


celer is a Python package that solves Lasso-like problems and provides estimators that follow the scikit-learn API. Thanks to a tailored implementation, celer provides a fast solver that tackles large-scale datasets with millions of features up to 100 times faster than scikit-learn.

Currently, the package handles the following problems:

Problem Support Weights Native cross-validation
Lasso
ElasticNet
Group Lasso
Multitask Lasso
Sparse Logistic regression
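
For reference, the Lasso entry above can be written as the optimization problem below. This assumes celer keeps scikit-learn's parameterization, which the text suggests but does not spell out; the symbols $X$, $y$, $w$, $\alpha$, $n$ and $p$ are notation introduced here for illustration.

```latex
\min_{w \in \mathbb{R}^{p}} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

Here $n$ is the number of samples, $p$ the number of features, and $\alpha > 0$ the regularization strength. The other problems in the list change the data-fitting term or the penalty.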

Why celer?

celer is designed specifically for Lasso-like problems, which makes it a fast solver for them. In particular, it comes with tools such as:

  • automated parallel cross-validation
  • support of sparse and dense data
  • optional feature centering and normalization
  • unpenalized intercept fitting

celer also provides easy-to-use estimators, as it follows the scikit-learn API.

Get started

To get started, install celer via pip

pip install -U celer

In your Python console, run the following commands to fit a Lasso estimator on a toy dataset.

>>> from celer import Lasso
>>> from celer.datasets import make_correlated_data
>>> X, y, _ = make_correlated_data(n_samples=100, n_features=1000)
>>> estimator = Lasso()
>>> estimator.fit(X, y)

This is just a starter example. Make sure to browse the celer documentation to learn more about its features. To get familiar with the celer API, you can also explore the gallery of examples, which includes examples on real-life datasets as well as timing comparisons with other solvers.

Contribute to celer

celer is an open source project and hence relies on community efforts to evolve. Your contribution is highly valuable and can come in three forms:

  • bug report: you may encounter a bug while using celer. Don't hesitate to report it in the issue section.
  • feature request: you may want to extend celer or add new features to it. You can use the issue section to make suggestions.
  • pull request: you may have fixed a bug, enhanced the documentation, etc. You can submit a pull request and we will reach out to you as soon as possible.

For the last means of contribution, here are the steps to help you set up celer on your local machine:

  1. Fork the repository, then run the following command to clone it on your local machine
git clone https://github.com/{YOUR_GITHUB_USERNAME}/celer.git
  2. cd to the celer directory and install it in editable mode by running
cd celer
pip install -e .
  3. To run the gallery examples and build the documentation, run the following commands
cd doc
pip install -e .[doc]
make html

Cite

celer is licensed under the BSD 3-Clause license, so you are free to use it. If you do so, please cite:

@InProceedings{pmlr-v80-massias18a,
  title     = {Celer: a Fast Solver for the Lasso with Dual Extrapolation},
  author    = {Massias, Mathurin and Gramfort, Alexandre and Salmon, Joseph},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {3321--3330},
  year      = {2018},
  volume    = {80},
}

@article{massias2020dual,
  author  = {Mathurin Massias and Samuel Vaiter and Alexandre Gramfort and Joseph Salmon},
  title   = {Dual Extrapolation for Sparse GLMs},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {234},
  pages   = {1-33},
  url     = {http://jmlr.org/papers/v21/19-587.html}
}


celer's Issues

n_jobs (multi-core CPU) for LassoCV function?

Thanks for these tools! Do any of the celer sklearn-style estimators (LassoCV, etc.) have multi-CPU support like the original sklearn ones (n_jobs = 1, 2, 3, etc.)? Currently, the n_jobs argument is not recognized:
__init__() got an unexpected keyword argument 'n_jobs'
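
One way to avoid this error across versions is to inspect the constructor before passing n_jobs. The sketch below uses only the standard library; `OldLassoCV` and `NewLassoCV` are hypothetical stand-ins written for illustration, not celer classes, and the helper names are assumptions as well.

```python
import inspect


def accepts_param(estimator_cls, name):
    """Return True if estimator_cls.__init__ declares a parameter called `name`."""
    params = inspect.signature(estimator_cls.__init__).parameters
    return name in params


# Hypothetical stand-ins for two releases of an estimator (illustration only).
class OldLassoCV:
    def __init__(self, eps=1e-3, n_alphas=100, cv=None):
        self.eps, self.n_alphas, self.cv = eps, n_alphas, cv


class NewLassoCV:
    def __init__(self, eps=1e-3, n_alphas=100, cv=None, n_jobs=None):
        self.eps, self.n_alphas, self.cv, self.n_jobs = eps, n_alphas, cv, n_jobs


def make_cv_estimator(estimator_cls, n_jobs=None, **kwargs):
    """Build the estimator, forwarding n_jobs only when the constructor knows it."""
    if n_jobs is not None and accepts_param(estimator_cls, "n_jobs"):
        kwargs["n_jobs"] = n_jobs
    return estimator_cls(**kwargs)


old = make_cv_estimator(OldLassoCV, n_jobs=2, cv=3)  # n_jobs silently dropped
new = make_cv_estimator(NewLassoCV, n_jobs=2, cv=3)  # n_jobs forwarded
```

Note that this check only looks at named parameters, so a constructor that swallows `**kwargs` would need a different test.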

ENH: add docstring testing

We should have a test to check that parameter order matches between the docstring and the function definition. It is done in sklearn and MNE, for example.
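
A minimal sketch of such a check, assuming numpydoc-style docstrings with a "Parameters" section. It is not the sklearn or MNE implementation; `docstring_param_names`, `params_match_docstring` and `toy_solver` are names made up for this example.

```python
import inspect
import re


def docstring_param_names(func):
    """Return parameter names, in order, from a numpydoc 'Parameters' section."""
    lines = (inspect.getdoc(func) or "").splitlines()
    names, in_section = [], False
    for prev, line in zip([""] + lines, lines):
        if set(line.strip()) == {"-"}:  # a section underline such as "-----"
            if in_section:
                break  # underline of the section that follows Parameters
            in_section = prev.strip() == "Parameters"
            continue
        # Entries look like "name : type" and start at column zero.
        match = re.match(r"(\w+)\s*:", line)
        if in_section and match:
            names.append(match.group(1))
    return names


def params_match_docstring(func):
    """True when docstring order equals the order in the function signature."""
    return docstring_param_names(func) == list(inspect.signature(func).parameters)


def toy_solver(X, y, alpha, tol=1e-4):
    """Hypothetical solver used only to exercise the check.

    Parameters
    ----------
    X : array
        Design matrix.
    y : array
        Target vector.
    alpha : float
        Regularization strength.
    tol : float
        Stopping tolerance.

    Returns
    -------
    w : array
        Coefficients.
    """
    return None
```

In a test suite, the same helper could be run over every public function and estimator constructor, for example through a parametrized pytest test.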

BUG: Divide by zero warning when running celer_path

  sol = celer(
It has been here for a long time. It is not a zero-column error, it happens on every dataset, and it does not seem to affect convergence.
It could be a division by `norm(X.T @ theta)` with `theta = np.zeros(n_samples)`.
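
The hypothesis can be illustrated with plain NumPy. The sketch below is not celer's code: the scaling expression `alpha / norm(X.T @ theta)`, the use of the infinity norm, and the function names are assumptions chosen to show how an all-zero `theta` triggers the warning and how a guard avoids it.

```python
import warnings

import numpy as np


def dual_scale_naive(X, theta, alpha):
    """Scaling factor alpha / ||X.T @ theta||_inf with no protection."""
    norm = np.linalg.norm(X.T @ theta, ord=np.inf)
    return np.float64(alpha) / norm


def dual_scale_guarded(X, theta, alpha):
    """Same factor, but fall back to 1.0 when the norm is exactly zero."""
    norm = np.linalg.norm(X.T @ theta, ord=np.inf)
    return np.float64(alpha) / norm if norm > 0 else np.float64(1.0)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
theta = np.zeros(5)  # all-zero dual point, as hypothesized in the issue

# Record warnings instead of printing them, to show that one is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    naive = dual_scale_naive(X, theta, alpha=0.1)

print("warnings recorded:", len(caught), "| naive result is inf:", np.isinf(naive))
print("guarded result:", dual_scale_guarded(X, theta, alpha=0.1))
```

If the real cause is of this kind, the result is discarded or overwritten at the next iteration, which would be consistent with convergence being unaffected.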

ENH: support csr64

int[:] X_indptr everywhere

when calling check_estimator(clf) instead of check_estimator(Lasso) the test fails for pseudo large csr matrices

easy to implement
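
Until 64-bit index arrays are supported natively, one possible workaround is to downcast the index arrays when their values fit in 32 bits. This is a sketch under that assumption, not the fix adopted in celer; `as_int32_indices` is a hypothetical helper.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max


def as_int32_indices(indices, indptr):
    """Downcast sparse index arrays to int32 when every value fits, else raise."""
    converted = []
    for name, arr in (("indices", indices), ("indptr", indptr)):
        arr = np.asarray(arr)
        if arr.size and arr.max() > INT32_MAX:
            raise ValueError(f"{name} holds values that do not fit in int32")
        # copy=False avoids a copy when the array is already int32.
        converted.append(arr.astype(np.int32, copy=False))
    return tuple(converted)


# A small matrix whose index arrays happen to be stored as int64.
indices64 = np.array([0, 2, 1], dtype=np.int64)
indptr64 = np.array([0, 2, 3], dtype=np.int64)
indices32, indptr32 = as_int32_indices(indices64, indptr64)
```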

ENH: remove python2.7 testing

All packages are dropping support by 2020. As seen on other repos (clar), it might not even be tested properly because of some weird conda env creation. It's probably best to save some time by just removing the python=2.7 case in the travis config.

ENH: merge Logreg and Lasso cython code

There is a lot of duplicated code currently. It would not cost any time to add a pb variable to the cython code and have a few if pb == "lasso" branches in the code.
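
In Python terms, the idea could look like the sketch below: one primal objective function, with a `pb` flag selecting the data-fitting term. The constants, function names and the unnormalized objectives are assumptions for illustration; the actual Cython code may scale the terms differently.

```python
import numpy as np

LASSO, LOGREG = "lasso", "logreg"


def datafit(pb, y, Xw):
    """Data-fitting term, selected by the problem flag."""
    if pb == LASSO:
        return 0.5 * np.sum((y - Xw) ** 2)
    if pb == LOGREG:
        # Labels are assumed to be in {-1, 1}.
        return np.sum(np.log1p(np.exp(-y * Xw)))
    raise ValueError(f"unknown problem: {pb!r}")


def primal(pb, alpha, X, y, w):
    """Shared code path: only the datafit depends on pb."""
    return datafit(pb, y, X @ w) + alpha * np.abs(w).sum()


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
w = np.zeros(4)
print(primal(LASSO, 0.1, X, y, w), primal(LOGREG, 0.1, X, y, w))
```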

ENH: merge sparse.pyx and dense.pyx

Currently these two have a lot in common.
Maybe it would be possible to have an is_sparse parameter
and merge the inner_solver_* and celer_* functions (resp.)
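
As an illustration of the is_sparse idea, the sketch below computes `X[:, j] @ r` through a single function that branches on the storage format. The signature and the CSC arrays `X_data`, `X_indices`, `X_indptr` are assumptions for this example, not the actual Cython interface.

```python
import numpy as np


def column_dot(is_sparse, j, r, X=None, X_data=None, X_indices=None, X_indptr=None):
    """Compute X[:, j] @ r for a dense array or a CSC (data, indices, indptr) triplet."""
    if is_sparse:
        start, end = X_indptr[j], X_indptr[j + 1]
        # Only the stored (nonzero) entries of column j contribute.
        return float(np.dot(X_data[start:end], r[X_indices[start:end]]))
    return float(np.dot(X[:, j], r))


# The same 3 x 2 matrix in dense form and in CSC form.
X_dense = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
X_data = np.array([1.0, 3.0, 2.0])
X_indices = np.array([0, 2, 1])
X_indptr = np.array([0, 2, 3])
r = np.array([1.0, 1.0, 1.0])

dense_vals = [column_dot(False, j, r, X=X_dense) for j in range(2)]
sparse_vals = [
    column_dot(True, j, r, X_data=X_data, X_indices=X_indices, X_indptr=X_indptr)
    for j in range(2)
]
```

Everything around this primitive (working set construction, stopping criterion) could then be written once and shared by both formats.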

ENH: reflection about tol

  1. behaviour disagrees with sklearn because the latter scales tol by norm(y) ** 2 (or norm(y) ** 2 / n_samples ?)

  2. using tol < 1e-7 with float32 caused precision issues (found out in check_estimator). MCVE:

                  [1.9376824, 1.3127615, 2.675319, 2.8909883, 1.1503246],
                  [2.375175, 1.5866847, 1.7041336, 2.77679, 0.21310817],
                  [0.2613879, 0.06065519, 2.4978595, 2.3344703, 2.6100364],
                  [2.935855, 2.3974757, 1.384438, 2.3415875, 0.3548233],
                  [1.9197631, 0.43005985, 2.8340068, 1.565545, 1.2439859],
                  [0.79366684, 2.322701, 1.368451, 1.7053018, 0.0563694],
                  [1.8529065, 1.8362871, 1.850802, 2.8312442, 2.0454607],
                  [1.0785236, 1.3110958, 2.0928936, 0.18067642, 2.0003002],
                  [2.0119135, 0.6311477, 0.3867789, 0.946285, 1.0911323]],
                 dtype=np.float32)

    y = np.array([[1.],
                  [1.],
                  [2.],
                  [0.],
                  [2.],
                  [1.],
                  [0.],
                  [1.],
                  [1.],
                  [2.]], dtype=np.float32)

    params = dict(eps=1e-2, n_alphas=10, tol=1e-10, cv=2, n_jobs=1,
                  fit_intercept=False, verbose=2)

    clf = MultiTaskLassoCV(**params)
    clf.fit(X, y)

(casting X to float64 fixes it)

so maybe we can raise a warning if tol is low and X.dtype == np.float32
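
A minimal sketch of such a warning, assuming the 1e-7 threshold mentioned above. The helper name `warn_if_tol_too_low` and the message are hypothetical; this is not an existing celer function.

```python
import warnings

import numpy as np

FLOAT32_TOL_FLOOR = 1e-7  # threshold taken from the discussion above


def warn_if_tol_too_low(X, tol):
    """Warn when tol is below what float32 data can be expected to resolve.

    Returns True if a warning was issued, False otherwise.
    """
    X = np.asarray(X)
    too_low = bool(X.dtype == np.float32 and tol < FLOAT32_TOL_FLOOR)
    if too_low:
        warnings.warn(
            f"tol={tol:g} may be unreachable with float32 data; "
            "consider casting X to float64 or increasing tol.",
            UserWarning,
            stacklevel=2,
        )
    return too_low


X32 = np.ones((3, 2), dtype=np.float32)
X64 = X32.astype(np.float64)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged32 = warn_if_tol_too_low(X32, tol=1e-10)  # warns
    flagged64 = warn_if_tol_too_low(X64, tol=1e-10)  # silent
```

For context, the machine epsilon of float32 is about 1.2e-7, so tolerances below that order of magnitude are hard to certify in single precision.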

[FIX] no more sklearn.linear_model.cd_fast

sklearn 0.24

Importing clar gives:

/home/mathurin/workspace/scikit-learn/sklearn/utils/deprecation.py:143: DeprecationWarning: The sklearn.linear_model.cd_fast module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
  warnings.warn(message, DeprecationWarning)

BUG: installing celer with pip from github first fails, then succeeds

To reproduce in a newly created environment test where pip is installed:

(test) ➜  benchOpt git:(pip2conda) ✗ which pip
/home/mathurin/miniconda3/envs/test/bin/pip
(test) ➜  benchOpt git:(pip2conda) ✗ pip install git+https://github.com/mathurinm/celer.git
Collecting git+https://github.com/mathurinm/celer.git
  Cloning https://github.com/mathurinm/celer.git to /tmp/pip-req-build-wmet9675
  Running command git clone -q https://github.com/mathurinm/celer.git /tmp/pip-req-build-wmet9675
Collecting numpy>=1.12
  Using cached numpy-1.18.4-cp37-cp37m-manylinux1_x86_64.whl (20.2 MB)
Collecting seaborn>=0.7
  Using cached seaborn-0.10.1-py3-none-any.whl (215 kB)
Collecting scipy>=0.18.0
  Using cached scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl (26.1 MB)
Collecting matplotlib>=2.0.0
  Using cached matplotlib-3.2.1-cp37-cp37m-manylinux1_x86_64.whl (12.4 MB)
Requirement already satisfied: Cython>=0.26 in /home/mathurin/miniconda3/envs/test/lib/python3.7/site-packages (from celer==0.5.dev0) (0.29.17)
Collecting scikit-learn>=0.23
  Using cached scikit_learn-0.23.0-cp37-cp37m-manylinux1_x86_64.whl (7.3 MB)
Collecting xarray
  Using cached xarray-0.15.1-py3-none-any.whl (668 kB)
Collecting download
  Using cached download-0.3.5-py3-none-any.whl (8.8 kB)
Collecting tqdm
  Using cached tqdm-4.46.0-py2.py3-none-any.whl (63 kB)
Collecting pandas>=0.22.0
  Using cached pandas-1.0.3-cp37-cp37m-manylinux1_x86_64.whl (10.0 MB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.2.0-cp37-cp37m-manylinux1_x86_64.whl (88 kB)
Collecting cycler>=0.10
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting python-dateutil>=2.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1
  Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-2.0.0-py3-none-any.whl (34 kB)
Collecting joblib>=0.11
  Using cached joblib-0.14.1-py2.py3-none-any.whl (294 kB)
Requirement already satisfied: setuptools>=41.2 in /home/mathurin/miniconda3/envs/test/lib/python3.7/site-packages (from xarray->celer==0.5.dev0) (46.2.0.post20200511)
Collecting six
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Collecting requests
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting pytz>=2017.2
  Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Requirement already satisfied: certifi>=2017.4.17 in /home/mathurin/miniconda3/envs/test/lib/python3.7/site-packages (from requests->download->celer==0.5.dev0) (2020.4.5.1)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Building wheels for collected packages: celer
  Building wheel for celer (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/mathurin/miniconda3/envs/test/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-wmet9675/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-wmet9675/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-oygzm2ad
       cwd: /tmp/pip-req-build-wmet9675/
  Complete output (3178 lines):
  /home/mathurin/miniconda3/envs/test/lib/python3.7/site-packages/setuptools/dist.py:454: UserWarning: Normalizing '0.5dev' to '0.5.dev0'
    warnings.warn(tmpl.format(**locals()))
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/celer
  copying celer/homotopy.py -> build/lib.linux-x86_64-3.7/celer
  copying celer/__init__.py -> build/lib.linux-x86_64-3.7/celer
  copying celer/plot_utils.py -> build/lib.linux-x86_64-3.7/celer
  copying celer/dropin_sklearn.py -> build/lib.linux-x86_64-3.7/celer
  running build_ext
  cythoning celer/lasso_fast.pyx to celer/lasso_fast.cpp
  warning: celer/lasso_fast.pyx:159:37: Index should be typed for more efficient access
  cythoning celer/cython_utils.pyx to celer/cython_utils.cpp
  /home/mathurin/miniconda3/envs/test/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-req-build-wmet9675/celer/cython_utils.pxd
    tree = Parsing.p_module(s, pxd, full_module_name)
  
  Error compiling Cython file:
  ------------------------------------------------------------
  ...
  # License: BSD 3 clause
  
  cimport cython
  cimport numpy as np
  
  from scipy.linalg.cython_blas cimport ddot, dasum, daxpy, dnrm2, dcopy, dscal
  ^
  ------------------------------------------------------------
  
  celer/cython_utils.pyx:9:0: 'scipy/linalg/cython_blas.pxd' not found
## many cython errors

  celer/cython_utils.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
   #error Do not use this file, it is the result of a failed Cython compilation.
    ^~~~~
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for celer
  Running setup.py clean for celer
Failed to build celer
Installing collected packages: numpy, scipy, six, python-dateutil, pytz, pandas, kiwisolver, cycler, pyparsing, matplotlib, seaborn, threadpoolctl, joblib, scikit-learn, xarray, tqdm, urllib3, idna, chardet, requests, download, celer
    Running setup.py install for celer ... done
Successfully installed celer-0.5.dev0 chardet-3.0.4 cycler-0.10.0 download-0.3.5 idna-2.9 joblib-0.14.1 kiwisolver-1.2.0 matplotlib-3.2.1 numpy-1.18.4 pandas-1.0.3 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2020.1 requests-2.23.0 scikit-learn-0.23.0 scipy-1.4.1 seaborn-0.10.1 six-1.14.0 threadpoolctl-2.0.0 tqdm-4.46.0 urllib3-1.25.9 xarray-0.15.1
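
One reading of this log is that the wheel build runs before scipy is installed, so `scipy/linalg/cython_blas.pxd` cannot be found, while the fallback `setup.py install` runs after the dependencies are in place. Under that assumption, a quick diagnostic before building could look like the sketch below. The list of build-time dependencies is a guess based on the log, and `missing_build_deps` is a hypothetical helper.

```python
import importlib.util

# Assumed build-time dependencies, inferred from the log above.
BUILD_DEPS = ("numpy", "scipy", "Cython")


def missing_build_deps(names=BUILD_DEPS):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_deps()
    if missing:
        print("Install these before building:", ", ".join(missing))
    else:
        print("All assumed build dependencies are importable.")
```

Declaring the build requirements in the project's packaging metadata would be the more durable fix, since pip can then install them before compiling the extensions.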
