ncvreg's Introduction

Regularization Paths for SCAD and MCP Penalized Regression Models

ncvreg fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the "elastic net" idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided.
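For reference, the two nonconvex penalties named above are usually written as follows in the literature. This is a sketch of the standard textbook forms for a single coefficient θ with regularization parameter λ and tuning parameter γ; the exact parameterization used internally by ncvreg should be checked against the package documentation.

```latex
% MCP, defined for \gamma > 1
p^{\mathrm{MCP}}_{\lambda,\gamma}(\theta) =
\begin{cases}
  \lambda|\theta| - \dfrac{\theta^2}{2\gamma}, & |\theta| \le \gamma\lambda, \\[1ex]
  \dfrac{\gamma\lambda^2}{2},                  & |\theta| > \gamma\lambda.
\end{cases}

% SCAD, defined for \gamma > 2
p^{\mathrm{SCAD}}_{\lambda,\gamma}(\theta) =
\begin{cases}
  \lambda|\theta|, & |\theta| \le \lambda, \\[1ex]
  \dfrac{2\gamma\lambda|\theta| - \theta^2 - \lambda^2}{2(\gamma - 1)}, & \lambda < |\theta| \le \gamma\lambda, \\[1ex]
  \dfrac{\lambda^2(\gamma + 1)}{2}, & |\theta| > \gamma\lambda.
\end{cases}
```

Both penalties match the lasso penalty λ|θ| near zero and become constant once |θ| exceeds γλ, so large coefficients are not shrunk further.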

Basic Usage

The basic usage of ncvreg is as follows:

fit <- ncvreg(X, y)
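The call above assumes that a numeric design matrix X and a response vector y already exist. A minimal sketch with simulated data follows; the data, the variable names V1 to V10, and the true coefficients are made up for illustration and are not the data set behind the output shown in this README. The penalty argument is used to request a penalty explicitly.

```r
library(ncvreg)

# Simulated data (hypothetical; for illustration only)
set.seed(1)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
beta_true <- c(2, -1.5, 1, rep(0, p - 3))
y <- as.numeric(X %*% beta_true + rnorm(n))

# Fit the full regularization path under three different penalties
fit       <- ncvreg(X, y)                     # default penalty
fit_scad  <- ncvreg(X, y, penalty = "SCAD")
fit_lasso <- ncvreg(X, y, penalty = "lasso")

# Coefficients are stored with one row per term (intercept first)
# and one column per value of lambda on the path
dim(fit$beta)
length(fit$lambda)

# Coefficients at a single value of lambda
coef(fit, lambda = 0.05)
```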

The default penalty here is the minimax concave penalty (MCP), but SCAD and lasso penalties are also available. This produces a path of coefficients, which we can plot with

plot(fit)

Notice that variables enter the model one at a time, and that at any given value of lambda, several coefficients are zero. The summary method can be used for post-selection summarization and inference:

summary(fit, lambda=0.05)

# MCP-penalized linear regression with n=97, p=8
# At lambda=0.0500:
# -------------------------------------------------
#   Nonzero coefficients: 6
#   Expected nonzero coefficients: 2.51
#   Average mfdr (6 features)    : 0.418

The summary output also includes the following table for the selected features:

         Estimate         z      mfdr
lcavol  0.5317899  8.880429 0.0000000
svi     0.6725610  3.945052 0.0018967
lweight 0.6038969  3.665874 0.0050683
lbph    0.0887456  1.928241 0.4998035
age    -0.0153092 -1.788334 1.0000000
pgg45   0.0016804  1.159772 1.0000000

In this case, it would appear that lcavol, svi, and lweight are clearly associated with the response, even after adjusting for the other variables in the model, while lbph, age, and pgg45 may be false positives included simply by chance.
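The figures in the summary header appear to be consistent with the per-feature mfdr column: summing that column gives about 2.51 and averaging it gives about 0.418. The short check below simply re-enters the printed values by hand; it is an arithmetic illustration and does not call ncvreg or claim to reproduce its internal computation. The 0.05 cutoff is an arbitrary threshold chosen for this example.

```r
# mfdr values copied from the table above
mfdr <- c(lcavol = 0.0000000, svi = 0.0018967, lweight = 0.0050683,
          lbph = 0.4998035, age = 1.0000000, pgg45 = 1.0000000)

sum(mfdr)    # compare with "Expected nonzero coefficients: 2.51"
mean(mfdr)   # compare with "Average mfdr (6 features): 0.418"

# Features whose mfdr falls below the (arbitrary) 0.05 cutoff
names(mfdr)[mfdr < 0.05]
```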

Typically, one would carry out cross-validation for the purposes of assessing the predictive accuracy of the model at various values of lambda:

cvfit <- cv.ncvreg(X, y)
plot(cvfit)

At this point, coef(cvfit) will return the coefficients at the value of lambda minimizing the cross-validation error. Likewise,

predict(cvfit, X=head(X))

will return predictions for that model, while

predict(cvfit, type="nvars")

will return the number of nonzero coefficients. Note that the original fit (to the full data set) is returned as cvfit$fit; it is not necessary to call both ncvreg and cv.ncvreg to analyze a data set. For example, plot(cvfit$fit) will produce the same coefficient path plot as plot(fit) above.
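The cross-validation workflow described above can be sketched end to end as follows. The simulated data are again hypothetical, and nfolds = 5 is an arbitrary choice for the example.

```r
library(ncvreg)

# Simulated data (hypothetical; for illustration only)
set.seed(1)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

# Cross-validate over the lambda path; fold assignment is random,
# so set.seed() above makes the result reproducible
cvfit <- cv.ncvreg(X, y, nfolds = 5)

cvfit$lambda.min                     # lambda minimizing the CV error
b <- coef(cvfit)                     # coefficients at that lambda
pred <- predict(cvfit, X = head(X))  # predictions for the first rows
predict(cvfit, type = "nvars")       # number of nonzero coefficients

# The full-data fit is stored inside the cross-validation object
class(cvfit$fit)
```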

Documentation and Citation

For more on the usage and syntax of ncvreg, see the ncvreg homepage.

For more on the algorithms used by ncvreg, see the original article:

For more about the marginal false discovery rate idea used for post-selection inference, see

Installation

  • To install the latest release version from CRAN: install.packages("ncvreg")
  • To install the latest development version from GitHub: devtools::install_github("pbreheny/ncvreg")
