Regressions.jl

This is a convenience package gathering functionalities to solve a number of generalised linear regression/classification problems which, inherently, correspond to an optimisation problem of the form

L(y, Xθ) + P(θ)

where L is a loss function and P is a penalty function (both of those can be scaled or composed). Additional regression/classification methods which do not directly correspond to this formulation may be added in the future.
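For instance (both appear in the tables below), ridge and lasso regression are instances of this template with a squared loss and an L2 or L1 penalty respectively (up to scaling conventions):

```latex
% Ridge: L2 loss + L2 penalty
\hat\theta_{\text{ridge}} = \arg\min_\theta \; \tfrac{1}{2}\|y - X\theta\|_2^2 + \lambda\|\theta\|_2^2
% Lasso: L2 loss + L1 penalty
\hat\theta_{\text{lasso}} = \arg\min_\theta \; \tfrac{1}{2}\|y - X\theta\|_2^2 + \lambda\|\theta\|_1
```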

The core aims of this package are:

  • make these regression models "easy to call" and callable in a unified way,
  • interface with MLJ.jl,
  • focus on performance, including in "big data" settings, by exploiting packages such as Optim.jl and IterativeSolvers.jl,
  • take a "machine learning" perspective, i.e. focus essentially on prediction; hyper-parameters should be obtained via a data-driven procedure such as cross-validation.

Implemented

| Regressors | Formulation¹ | Available solvers | Comments |
|---|---|---|---|
| OLS & Ridge | L2Loss + 0/L2 | Analytical² or CG³ | |
| Lasso & Elastic-Net | L2Loss + 0/L2 + L1 | (F)ISTA⁴ | |
| Robust 0/L2 | RobustLoss⁵ + 0/L2 | Newton, Newton-CG, LBFGS, IWLS-CG⁶ | no scale⁷ |
  1. "0" stands for no penalty.
  2. "Analytical" means the solution is computed in one shot using the `\` solver (a minimal sketch follows this list).
  3. CG = conjugate gradient.
  4. (Accelerated) proximal gradient descent.
  5. Huber, Andrews, Bisquare, Logistic, Fair and Talwar weighting functions are available.
  6. Iteratively re-weighted least squares, where each linear system is solved iteratively via CG.
  7. Other packages such as Scikit-Learn estimate a scale factor along with the parameters. This is a bit ad hoc, corresponds more to a statistical perspective, and does not play well with penalties; we recommend using cross-validation to set the parameter of the Huber loss instead. (TODO: document)
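As an illustration of footnote 2, here is a minimal Julia sketch of the "one shot" analytical solve for OLS and ridge via `\`; the variable names are illustrative only and do not reflect this package's API:

```julia
using LinearAlgebra, Random

Random.seed!(0)
n, p = 100, 5
X = randn(n, p)
y = X * randn(p) + 0.1 * randn(n)
λ = 1.0

# OLS: minimise ‖y - Xθ‖²; `\` computes the least-squares solution directly.
θ_ols = X \ y

# Ridge: minimise ‖y - Xθ‖² + λ‖θ‖²; closed form via the normal equations.
θ_ridge = (X' * X + λ * I) \ (X' * y)
```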
| Classifiers | Formulation | Available solvers | Comments |
|---|---|---|---|
| Logistic 0/L2 | LogisticLoss + 0/L2 | Newton, Newton-CG, LBFGS | yᵢ ∈ {±1} |
| Logistic L1/EN | LogisticLoss + 0/L2 + L1 | (F)ISTA | yᵢ ∈ {±1} |
| Multinomial 0/L2 | MultinomialLoss + 0/L2 | Newton-CG, LBFGS | yᵢ ∈ {1,...,c} |
| Multinomial L1/EN | MultinomialLoss + 0/L2 + L1 | ISTA, FISTA | yᵢ ∈ {1,...,c} |

Unless otherwise specified:

  • Newton-like solvers use the Hager-Zhang line search (the default in Optim.jl)
  • ISTA and FISTA solvers use a backtracking line search with a shrinkage factor of β = 0.8 (a minimal sketch follows)
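As a rough illustration of the (F)ISTA approach and the backtracking rule above, here is a minimal ISTA sketch for the Lasso objective. It is a generic sketch with illustrative names, not this package's implementation:

```julia
using LinearAlgebra

# Soft-thresholding: the proximal operator of t‖·‖₁.
soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# ISTA for 0.5‖y - Xθ‖² + λ‖θ‖₁ with backtracking line search (shrinkage β = 0.8).
function ista(X, y, λ; β = 0.8, maxiter = 1_000)
    θ = zeros(size(X, 2))
    f(t) = 0.5 * sum(abs2, y - X * t)      # smooth part of the objective
    ∇f(t) = X' * (X * t - y)               # its gradient
    η = 1.0                                # current step size
    for _ in 1:maxiter
        g = ∇f(θ)
        # shrink the step until the quadratic upper bound holds at the prox step
        while true
            cand = soft_threshold(θ - η * g, η * λ)
            d = cand - θ
            f(cand) <= f(θ) + dot(g, d) + sum(abs2, d) / (2η) && break
            η *= β
        end
        θ = soft_threshold(θ - η * g, η * λ)
    end
    return θ
end
```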

Current limitations

  • The models are built and tested assuming n > p; if this doesn't hold, tricks should be employed to speed up computations; these have not been implemented yet.
  • Stochastic solvers that would be appropriate for huge models have not yet been implemented.
  • "Meta" functionalities such as One-vs-All or Cross-Validation are left to other packages such as MLJ.

Possible future models

WIP

  • Quantile regression with ADMM, IWLS (possibly also IP or Frisch-Newton)
  • LAD with ADMM

Future

| Model | Formulation | Comments |
|---|---|---|
| Huber L1/ElasticNet | HuberLoss + No/L2 + L1 | |
| Group Lasso | L2Loss + ∑ L1 over groups | |
| Adaptive Lasso | L2Loss + weighted L1 | A |
| LAD | L1Loss | People seem to use a simplex algorithm (Barrodale and Roberts); a prox-based approach like ADMM should be ok too. G, or F |
| SCAD | L2Loss + SCAD | A, B, C |
| MCP | L2Loss + MCP | A |
| OMP | L2Loss + L0Loss | D |
| SGD Classifiers | *Loss + No/L2/L1 and OVA | SkL |

  • Models marked (⭒) should be added soon.

Other regression models

There are a number of other regression models that may be included in this package in the longer term but may not directly correspond to the paradigm Loss+Penalty introduced earlier.

In some cases it will make more sense to just use GLM.jl.

Sklearn's list: https://scikit-learn.org/stable/supervised_learning.html#supervised-learning

| Model | Note | Link(s) |
|---|---|---|
| LARS | -- | |
| Quantile Regression | -- | Yang et al., 2013; QuantileRegression.jl |
| L∞ approx (Logsumexp) | -- | slides |
| Passive Aggressive | -- | Crammer et al., 2006; SkL |
| Orthogonal Matching Pursuit | -- | SkL |
| Least Median of Squares | -- | Rousseeuw, 1984 |
| RANSAC, Theil-Sen | robust regression | Overview RANSAC, SkL, SkL, More Ransac |
| Ordinal regression | need to figure out how they work | E |
| Count regression | need to figure out how they work | R |
| Robust M estimators | -- | F |
| Perceptron, MIRA classifier | Sklearn just does OVA with binary in SGDClassif | H |
| Robust PTS and LTS | -- | PTS, LTS |

What about other packages

While the functionalities in this package overlap with a number of existing packages, the hope is that this package will offer a general entry point for all of them in a way that won't require too much thinking from an end user (similar to how someone would use the tools from sklearn.linear_model). If you're looking for specific functionalities/algorithms, it's probably a good idea to look at one of the packages below:

There's also GLM.jl, which is more geared towards statistical analysis for reasonably-sized datasets and which (as far as I'm aware) lacks a few key functionalities for ML such as penalised regression or multinomial regression.

References

Dev notes



Issues

Basic regressions

  • OLS 0/L2
  • Logistic 0/L2
  • Multinomial 0/L2
  • Lasso (see Lasso.jl)
  • Logistic L1 / ElasticNet
  • Multinomial L1
  • ElasticNet

Prox solvers

  • ISTA
  • FISTA (proximal gradient descent)

Quantile regression

algos

ADMM, MM, and CD approaches, while MM and CD are faster and ADMM slower than the IP algorithm available in quantreg. The results so far suggest that the MM algorithm is the best-suited for non-regularized (composite) quantile regression among the four methods tested, especially for data sets with large n and relatively small p. In regularized quantile regression, all methods perform similarly in terms of variable selection, but CD and ADMM show clear superiority in run time, particularly relative to the IP and MM methods when p is large. In the case of regularized composite quantile regression, CD and ADMM dis

(...)

Applying existing optimization algorithms to (composite) quantile regression requires a non-trivial reformulation of the problem due to the non-linearity and non-differentiability of the loss and regularization terms of the objective. The well-known quantreg package for R (Koenker, 2017) uses an interior point (IP) approach for quantile and composite quantile regression with the option of l1 (lasso) regularization for the former and no regularization options for the latter. Although advanced IP algorithms in quantreg, such as the one using prediction-correction (Mehrotra, 1992) for non-regularized quantile regression, have greatly improved upon earlier attempts using simplex methods, the time spent on matrix inversion in IP approaches (Chen and Wei, 2005) motivates us to seek faster algorithms for quantile and composite quantile regression, particularly for high-dimensional data where regularization is required. In addition, following the conjectures of Fan and Li (2001), Zou (2006) showed lasso variable selection—currently the most commonly-implemented penalty for quantile regression—to be inconsistent in certain situations and presented adaptive lasso regularization as a solution. Our work in the present paper is thus motivated by both a search for faster quantile regression algorithms as well as the lack of publicly-available methods for adaptive-lasso regularized quantile and composite quantile regression, particularly for high-dimensional data.

From https://arxiv.org/pdf/1709.04126.pdf (the cqreg package in R).

refs

  • Hunter & Lange: MM algorithm for quantile regression, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.1351&rep=rep1&type=pdf

Huber/Robust regression; optimize sigma

Huber regression in its default form works fine; Sklearn optimises an additional scale factor.

We could probably get to something similar by doing r ./= sigma and having a gradient in sigma, etc.; this needs a bit of thinking, if only because sklearn ensures sigma is positive by using LBFGS-B rather than plain LBFGS.
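A minimal sketch of that idea (illustrative names only, not this package's API, and possibly not exactly the objective sklearn implements): divide the residuals by σ and add a per-observation σ term so that the joint problem in (θ, σ) stays well-posed, in the spirit of a concomitant-scale Huber objective:

```julia
# Huber loss with threshold δ (quadratic near 0, linear in the tails).
huber(r, δ) = abs(r) <= δ ? r^2 / 2 : δ * (abs(r) - δ / 2)

# Jointly-scaled objective in (θ, σ): residuals divided by σ (the `r ./= sigma`
# idea above) plus a per-observation σ term. Hypothetical illustration only.
function scaled_huber_objective(θ, σ, X, y; δ = 1.35, λ = 0.0)
    r = (y - X * θ) ./ σ
    return sum(σ + σ * huber(rᵢ, δ) for rᵢ in r) + λ * sum(abs2, θ)
end
```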

This could probably be added as an option, leaving the user the possibility to cross-validate it. It needs proper thinking as to blocking delta, blocking delta * sigma, etc.

Arguably it might make more sense to keep things as they are and cross-validate delta (or at least keep this possible), though the results would then not be directly comparable to sklearn.

Enhancements

(see readme)

Other stuff

  • add option for size of tape in LBFGS solver(s)
  • add option(s) for linesearches
  • add and test options generally for solvers

option to scale loss by `n`

In case users prefer this; that means, though, that things should be ScaledLoss{...}. It doesn't change much otherwise.
