Git Product home page Git Product logo

wbacon's Introduction

wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression

DOI CRAN

Summary

Billor et al. (2000) proposed the BACON (blocked adaptive computationally-efficient outlier nominators) algorithms for multivariate outlier nomination and robust linear regression. Béguin and Hulliger (2008) extended the outlier detection method to weighted and incomplete data problems. Both methods are implemented in the R statistical software (R Core Team, 2024) in the packages, respectively, robustX (Mächler et al., 2023) and modi (Hulliger, 2023).

Our package offers a computationally efficient implementation in the C language with OpenMP support for parallelization. Efficiency is achieved by using a weighted quantile based on the Quicksort algorithm, partial sorting in place of full sorting, reuse of computed estimates, and most importantly an up-/downdating scheme for the Cholesky and QR factorizations. The computational costs of up-/downdating are far less than re-computing the entire decomposition repeatedly.

The details of the package are discussed in the accompanying paper:

Schoch, T. (2021) wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression, Journal of Open Source Software 6, 323. DOI 10.21105/joss.03238

Available methods

  • wBACON() is for multivariate outlier nomination and robust estimation of location/ center and covariance matrix
  • wBACON_reg() is for robust linear regression (the method is robust against outliers in the response variable and the model's design matrix)

Assumptions

The BACON algorithms assume that the underlying model is an appropriate description of the non-outlying observations; see Billor et al. (2000). More precisely,

  • the outlier nomination method assumes that the "good" data have (roughly) an elliptically contoured distribution (this includes the Gaussian distribution as a special case);
  • the regression method assumes that the non-outlying ("good") data are described by a linear (homoscedastic) regression model and that the independent variables (having removed the regression intercept/constant, if there is a constant) follow (roughly) an elliptically contoured distribution.

"Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean." Billor et al. (2000, p. 289)

It is strongly recommended that the structure of the data be examined and whether the assumptions made about the "good" observations are reasonable.

The role of the data analyst

In line with Billor et al. (2000, p. 290), we use the term outlier "nomination" rather than "detection" to highlight that algorithms should not go beyond nominating observations as potential outliers; see also Béguin and Hulliger (2008). It is left to the analyst to finally label outlying observations as such.

The software provides the analyst with tools and measures to study potentially outlying observations. It is strongly recommended to use the tools. See the package folders vignettes and doc for a vignette (guide) and further documentation.

Installation

The package can be installed from CRAN using

install.packages("wbacon")

Building

Make sure that the R package devtools is installed. Then, the wbacon package can be pulled from this GitHub repository and installed by

devtools::install_github("tobiasschoch/wbacon")

The package contains C code that needs to be compiled.

Community guidelines

Submitting an issue

If you have any suggestions for feature additions or any problems with the software that you would like addressed with the development community, please submit an issue on the Issues tab of the project GitHub repository. You may want to search the existing issues before submitting, to avoid asking a question or requesting a feature that has already been discussed.

How to contribute

If you are interested in modifying the code, you may fork the project for your own use, as detailed in the GNU GPL License we have adopted for the project. In order to contribute, please contact the developer by Tobias Schoch at gmail dot com (the names are separated by a dot) after making the desired changes.

Asking for help

If you have questions about how to use the software, or would like to seek out collaborations related to this project, you may contact Tobias Schoch (see contact details above).

References

Béguin, C., and Hulliger, B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data, Survey Methodology 34, 91-103.

Billor, N., Hadi, A. S., and Velleman , P. F. (2000). BACON: Blocked adaptive computationally-efficient outlier nominators, Computational Statistics and Data Analysis 34, 279-298. doi: 10.1016/S0167-9473(99)00101-2

Hulliger, B. (2023). modi: Multivariate Outlier Detection and Imputation for Incomplete Survey Data, R package version 0.1-2. URL https://CRAN.R-project.org/package=modi

Mächler, M. and W. A. Stahel (2023). robustX: ’eXtra’ / ’eXperimental’ Functionality for Robust Statistics, R package version 1.2-7. URL https://CRAN.R-project.org/package=robustX

OpenMP Architecture Review Board (2018). OpenMP Application Program Interface Version 5.0. URL https://https://www.openmp.org

R Core Team (2024). R. A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org.

wbacon's People

Contributors

tobiasschoch avatar

Stargazers

 avatar

Watchers

 avatar

Forkers

minghao2016

wbacon's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.