Git Product home page Git Product logo

feature-selection's Introduction

Feature Selection Techniques

Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in many cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics and selecting those input variables that have the strongest relationship with the target variable. These methods can be fast and effective, although the choice of statistical measures depends on the data type of both the input and output variables.

There are two main types of Feature Selection:

Supervised Selection:

Use the target variable (e.g. remove irrelevant variables). it is further categorized into three main categories.

- Filter Method
Select subsets of features based on their relationship with the target.
    - Basics: (Constant, Quasi-Constant, Duplicated)
    - Correlation 
    - Statistical measures: (Information Gain, Fisher Score, Univariate method, mutual information)

- Wrapper method:
Search subsets of features that perform according to a predictive model.
    - Step forward Selection 
    - Step Backward Selection 
    - Exhaustive Search 
    - Feature shuffling 

- Embedded Methods:
    - LASSO
    - Decision tree derived importance 
    - Regression Coefficients 

- Hybrid Methods:
    - Recursive Feature Elimination 

Unsupervised Learning

Do not use the target variable (e.g. remove redundant variables). Unsupervised feature selection techniques ignore the target variable, such as methods that remove redundant variables using correlation or features that have few values or low variance(i.e. data cleaning).

Detail Listing

  1. Basic Selection Methods

    1. Removing Constant Features
    2. Removing Quasi-Constant Features
    3. Removing Duplicated Features
  2. Correlation Feature Selection

    1. Removing Correlated Features
    2. Basic Selection Methods + Correlation - Pipeline
  3. Filter Methods: Univariate Statistical Methods

    1. Mutual Information
    2. Chi-square distribution
    3. Anova
    4. Basic Selection Methods + Statistical Methods - Pipeline
  4. Filter Methods: Other Methods and Metrics

    1. Univariate roc-auc, mse, etc
    2. Method used in a KDD competition - 2009
  5. Wrapper Methods

    1. Step Forward Feature Selection
    2. Step Backward Feature Selection
    3. Exhaustive Feature Selection
  6. Embedded Methods: Linear Model Coefficients

    1. Logistic Regression Coefficients
    2. Linear Regression Coefficients
    3. Effect of Regularization on Coefficients
    4. Basic Selection Methods + Correlation + Embedded - Pipeline
  7. Embedded Methods: Lasso

    1. Lasso
    2. Basic Selection Methods + Correlation + Lasso - Pipeline
  8. Embedded Methods: Tree Importance

    1. Random Forest derived Feature Importance
    2. Tree importance + Recursive Feature Elimination
    3. Basic Selection Methods + Correlation + Tree importance - Pipeline
  9. Hybrid Feature Selection Methods

    1. Feature Shuffling
    2. Recursive Feature Elimination
    3. Recursive Feature Addition

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.