tfHuber

Tuning-Free Huber Estimation and Regression

Description

This package provides efficient implementations of the Huber mean estimator, Huber covariance matrix estimation, adaptive Huber regression, and l1-regularized Huber regression (Huber-Lasso). For all of these methods, the robustification parameter τ is calibrated by a tuning-free principle.
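For readers unfamiliar with the role of τ, the following base-R sketch writes out the standard Huber loss and its derivative. The function names huberLoss and huberScore are hypothetical helpers for illustration and are not part of tfHuber; the package selects τ automatically, whereas here it is passed in by hand.

```r
# Standard Huber loss: quadratic for |x| <= tau, linear beyond that.
# huberLoss and huberScore are illustrative names, not tfHuber functions.
huberLoss = function(x, tau) {
  ifelse(abs(x) <= tau, 0.5 * x^2, tau * abs(x) - 0.5 * tau^2)
}

# Derivative of the Huber loss: residuals are clipped to [-tau, tau],
# which is what limits the influence of outliers.
huberScore = function(x, tau) {
  pmin(pmax(x, -tau), tau)
}

huberLoss(c(-3, 0.5, 3), tau = 1)   # 2.5 0.125 2.5
huberScore(c(-3, 0.5, 3), tau = 1)  # -1.0 0.5 1.0
```

A small τ makes the loss behave like absolute error (more robust, more bias under asymmetric noise), while a large τ approaches least squares.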

Specifically, for Huber regression, assume the observed data (Y, X) follow a linear model Y = θ0 + X θ + ε, where Y is an n-dimensional response vector, X is an n × d design matrix, and ε is an n-vector of noise variables whose distributions can be asymmetric and/or heavy-tailed. The package computes the standard Huber M-estimator when d < n and the Huber-Lasso estimator when d > n. The coefficient vector θ and the intercept θ0 are estimated successively via a two-step procedure. See Wang et al. (2020) for details of the two-step tuning-free framework.

Recent update

The most efficient implementations of three functions in this package, huberMean, huberCov and huberReg, have been merged into another R library, FarmTest, which has a CRAN binary release. To avoid the compilation issues that can arise with this source package, and to get faster and more stable computation, we recommend installing FarmTest.

Installation

Install tfHuber from GitHub:

install.packages("devtools")
library(devtools)
devtools::install_github("XiaoouPan/tfHuber")
library(tfHuber)

Common error messages

To avoid most unexpected error messages, we strongly recommend updating R to version 3.6.1 or later.
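A quick way to check the running R version against this recommendation, using only base R (the variable name rVersionOk is just for illustration):

```r
# Compare the running R version with the recommended minimum.
rVersionOk = getRversion() >= "3.6.1"
if (rVersionOk) {
  message("R ", as.character(getRversion()), " meets the recommended minimum.")
} else {
  message("R ", as.character(getRversion()), " detected; consider updating to 3.6.1 or later.")
}
```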

In addition, because tfHuber is written with Rcpp and RcppArmadillo, the following two build tools are required the first time you install it:

  1. Rtools for Windows or Xcode Command Line Tools for macOS. See this link for details.

  2. gfortran binaries: see here for instructions.

tfHuber should work after these steps. Some common error messages and their solutions are collected below, and we'll keep updating them based on users' feedback:

  • Error: "...could not find build tools necessary to build FarmTest": Please see step 1 above.

  • Error: "library not found for -lgfortran/..": Please see step 2 above.

Functions

There are four functions in this package:

  • huberMean: Huber mean estimation.
  • huberCov: Huber covariance matrix estimation.
  • huberReg: Adaptive Huber regression.
  • cvHuberLasso: K-fold cross-validated Huber-Lasso regression.

Getting help

Help on the functions can be accessed by typing ? followed by the function name at the R command prompt.

For example, ?huberReg will display detailed documentation of the function huberReg, including its inputs, outputs and examples.

Examples

First, we present an example of Huber mean estimation. We generate data from a log-normal distribution, which is asymmetric and heavy-tailed, and center it so that the true mean is zero. We then estimate the mean with the tuning-free Huber mean estimator.

library(tfHuber)
n = 1000
X = rlnorm(n, 0, 1.5) - exp(1.5^2 / 2)
meanList = huberMean(X)
hMean = meanList$mu
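
To make the estimator itself concrete, here is a minimal base-R sketch of a Huber mean with a fixed, user-supplied τ, computed by iteratively reweighted averaging. This is an illustration only: huberMeanFixed is a hypothetical helper, and it is not the algorithm tfHuber uses, since the package calibrates τ from the data.

```r
# Fixed-tau Huber mean via iteratively reweighted averaging (illustration only;
# huberMeanFixed is a hypothetical helper, not a tfHuber function).
huberMeanFixed = function(x, tau, maxIter = 100, tol = 1e-8) {
  mu = median(x)                       # robust starting point
  for (iter in seq_len(maxIter)) {
    res = x - mu
    # Weight 1 for small residuals, tau / |residual| for large ones.
    w = pmin(1, tau / pmax(abs(res), .Machine$double.eps))
    muNew = sum(w * x) / sum(w)
    if (abs(muNew - mu) < tol) break
    mu = muNew
  }
  muNew
}

# One gross outlier: the sample mean is 25, the Huber mean stays near the bulk.
huberMeanFixed(c(-1, 0, 1, 100), tau = 1)  # approximately 0.5

# Same simulated setting as in the example above (true mean is zero).
set.seed(1)
x = rlnorm(1000, 0, 1.5) - exp(1.5^2 / 2)
c(sampleMean = mean(x), huberFixed = huberMeanFixed(x, tau = 5))
```

With asymmetric data, a fixed τ trades bias for robustness, so the result depends noticeably on the value chosen; choosing it from the data is what the tuning-free calibration is for.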

Then we present an example of Huber covariance matrix estimation. We generate data from a t distribution with df = 3, which is heavy-tailed, and estimate its covariance matrix by the method proposed in Ke et al. (2019).

library(tfHuber)
n = 100
d = 50
X = matrix(rt(n * d, df = 3), n, d) / sqrt(3)
hubCov = huberCov(X)

Next, we present an example of adaptive Huber regression. Here we generate data from a linear model Y = θ0 + X θ + ε, where ε follows a centered log-normal distribution, and estimate the intercept and coefficients by tuning-free Huber regression.

library(tfHuber)
n = 500
d = 5
thetaStar = rep(3, d + 1)
X = matrix(rnorm(n * d), n, d)
error = rlnorm(n, 0, 1.5) - exp(1.5^2 / 2)
Y = as.numeric(cbind(rep(1, n), X) %*% thetaStar + error)
listHuber = huberReg(X, Y)
thetaHuber = listHuber$theta
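
One way to judge the fit is to compare its estimation error with that of ordinary least squares on the same data. The sketch below regenerates the data with a fixed seed, computes the OLS error with base R, and runs the Huber part only if tfHuber is installed. It assumes, as in the example above, that the theta component returned by huberReg holds the intercept followed by the d coefficients.

```r
# Compare l2 estimation errors of OLS and tuning-free Huber regression.
set.seed(1)
n = 500
d = 5
thetaStar = rep(3, d + 1)                      # intercept followed by d slopes
X = matrix(rnorm(n * d), n, d)
error = rlnorm(n, 0, 1.5) - exp(1.5^2 / 2)     # centered, heavy-tailed noise
Y = as.numeric(cbind(rep(1, n), X) %*% thetaStar + error)

thetaOLS = as.numeric(coef(lm(Y ~ X)))         # intercept first, then slopes
errOLS = sqrt(sum((thetaOLS - thetaStar)^2))

# Run the Huber fit only when the package is available.
if (requireNamespace("tfHuber", quietly = TRUE)) {
  thetaHuber = tfHuber::huberReg(X, Y)$theta   # assumed: intercept first
  errHuber = sqrt(sum((thetaHuber - thetaStar)^2))
  print(c(OLS = errOLS, Huber = errHuber))
}
```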

Finally, we illustrate the use of l1-regularized Huber regression. Again, we generate data from a linear model Y = θ0 + X θ + ε, where θ is a sparse high-dimensional vector and ε follows a centered log-normal distribution. We estimate the intercept and coefficients by Huber-Lasso regression, where the regularization parameter λ is calibrated by K-fold cross-validation and the robustification parameter τ is chosen by a tuning-free procedure.

library(tfHuber)
n = 100
d = 200
s = 5
thetaStar = c(rep(3, s + 1), rep(0, d - s))
X = matrix(rnorm(n * d), n, d)
error = rlnorm(n, 0, 1.5) - exp(1.5^2 / 2)
Y = as.numeric(cbind(rep(1, n), X) %*% thetaStar + error)
listHuberLasso = cvHuberLasso(X, Y)
thetaHuberLasso = listHuberLasso$theta
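
Because θ is sparse in this example, a natural follow-up is to check which coefficients the fit selects. The helper below, supportSummary, is a hypothetical name introduced here for illustration; it assumes the first entry of each vector is the intercept, which matches how thetaStar is built above and is assumed to hold for the returned theta as well.

```r
# Count correctly and incorrectly selected coefficients (intercept excluded).
# supportSummary is an illustrative helper, not a tfHuber function.
supportSummary = function(thetaHat, thetaStar, tol = 1e-8) {
  selected = abs(thetaHat[-1]) > tol   # nonzero estimated slopes
  truth = thetaStar[-1] != 0           # truly nonzero slopes
  c(truePositives = sum(selected & truth),
    falsePositives = sum(selected & !truth),
    falseNegatives = sum(!selected & truth))
}

# Toy check: one true signal found, one missed, one false selection.
supportSummary(c(1, 2.9, 0, 0.1, 0), c(3, 3, 3, 0, 0))
```

With the objects from the example above, the call would be supportSummary(thetaHuberLasso, thetaStar).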

License

GPL (>= 2)

Author(s)

Xiaoou Pan, Wen-Xin Zhou

References

Eddelbuettel, D. and Francois, R. (2011). Rcpp: Seamless R and C++ integration. J. Stat. Softw. 40 1–18.

Eddelbuettel, D. and Sanderson, C. (2014). RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Comput. Statist. Data Anal. 71 1054–1063.

Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 814–841.

Ke, Y., Minsker, S., Ren, Z., Sun, Q. and Zhou, W.-X. (2019). User-friendly covariance estimation for heavy-tailed distributions. Statist. Sci. 34 454–471.

Pan, X., Sun, Q. and Zhou, W.-X. (2019). Nonconvex regularized robust regression with oracle properties in polynomial time. Preprint.

Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.

Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Stat. Assoc. 115 254–265.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.

Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2020). A new principle for tuning-free Huber regression. Stat. Sinica, to appear.
