ryanquinnnelson / cmu-02750-query-selection-methods-in-active-learning

Spring 2021 - Automation of Scientific Research - course project

Languages: Jupyter Notebook 96.59%, Python 3.41%
Topics: active-learning, algorithm-implementation, type-1-active-learning, iwal, modal-library, random-forest-classifier, logistic-regression, uncertainty-sampling, density-based-sampling, query-by-committee

CMU-02750-HW1

Spring 2021 Automation of Scientific Research course project - Study of Query Selection Methods in Active Learning (HW1)

Summary

There are two parts to this project:

  • The first part of this project explores heuristic query selection methods with pool-based sampling (Uncertainty Sampling, Density-based Sampling, Query-by-Committee) using the modAL library.
  • The second part of this project implements the Importance Weighted Active Learning (IWAL) algorithm from Beygelzimer et al. (2009) with a bootstrap rejection threshold and hinge loss. IWAL is developed as a Python package with unit tests (pytest) and documentation.
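
As a rough illustration of the pool-based heuristics in the first part, here is a minimal sketch of Uncertainty Sampling with the least-confidence score. It is plain Python, not the modAL API, and the names `least_confidence`, `select_query`, and the example probabilities are all hypothetical.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_query(pool_probabilities):
    """Return the index of the pool instance with the highest uncertainty."""
    scores = [least_confidence(p) for p in pool_probabilities]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical class-probability predictions for three unlabeled pool instances.
pool_probabilities = [
    [0.95, 0.05],  # the model is confident here
    [0.55, 0.45],  # the model is unsure here
    [0.80, 0.20],
]

# Index of the instance that would be sent to the labeler next.
query_index = select_query(pool_probabilities)
```

In a pool-based loop, the selected instance would be labeled, moved into the training set, and the model refit before the next selection.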

Analysis was performed using Jupyter Notebook and Python.

Project Structure

The IWAL algorithm is implemented as the Python package iwal, which is found under /packages.

Explanation of IWAL

Formal Version

IWAL is a Type I (hypothesis elimination) active learning algorithm for binary and multiclass classification that works under any data access model. The algorithm labels instances in the disagreement region. To correct for the resulting sampling bias, IWAL uses an importance weighting strategy carefully chosen to control variance. Called "loss-weighting", this strategy sets the importance weight of a labeled instance to be inversely proportional to the range of predictions made for that instance over a bounded hypothesis space (i.e., the hypotheses that are close to optimal).
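
A minimal sketch of how a loss-weighting query probability could be computed from a small committee of hypotheses is shown below. It is not taken from the iwal package. The function names, the default `p_min=0.1`, the use of real-valued scores clipped to [-1, 1], and the rescaling of hinge loss into [0, 1] are all assumptions made for illustration.

```python
from itertools import product


def scaled_hinge_loss(score, label):
    """Hinge loss for a label in {-1, +1}, rescaled to lie in [0, 1].

    Assumption: scores are clipped to [-1, 1], so the raw hinge loss is at
    most 2 and halving it bounds the loss by 1.
    """
    clipped = max(-1.0, min(1.0, score))
    return max(0.0, 1.0 - label * clipped) / 2.0


def rejection_probability(scores, p_min=0.1, labels=(-1, 1)):
    """Query probability driven by the largest loss gap across hypotheses.

    `scores` holds one prediction per hypothesis for the same instance.
    """
    max_gap = max(
        scaled_hinge_loss(s_i, y) - scaled_hinge_loss(s_j, y)
        for s_i, s_j, y in product(scores, scores, labels)
    )
    return p_min + (1.0 - p_min) * max_gap


def importance_weight(p):
    """Weight of a labeled instance: the inverse of its query probability."""
    return 1.0 / p


# Hypothetical committee predictions for two instances.
p_agree = rejection_probability([0.9, 0.8, 0.95])  # hypotheses mostly agree
p_disagree = rejection_probability([0.9, -0.9])    # hypotheses disagree
```

Under these assumptions, the instance with more disagreement gets the larger query probability and therefore the smaller importance weight.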

The reason IWAL is consistent (i.e. converges to the optimal model) is that the rejection threshold it uses to decide whether or not to label an instance is bounded away from zero. With every instance having a chance of being selected for labeling, IWAL will eventually uncover all regions of disagreement.
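
The role of the lower bound can be made concrete with a short derivation. The notation is an assumption introduced here: Q_t is the indicator that instance t was queried, p_t is its query probability, p_min is the lower bound on p_t, and ℓ is the loss of a hypothesis h.

```latex
\mathbb{E}\left[\frac{Q_t}{p_t}\,\ell\big(h(x_t), y_t\big)\,\middle|\,x_t, y_t, p_t\right]
  = \frac{\Pr(Q_t = 1)}{p_t}\,\ell\big(h(x_t), y_t\big)
  = \ell\big(h(x_t), y_t\big),
\qquad
\frac{1}{p_t} \le \frac{1}{p_{\min}} \quad \text{whenever } p_t \ge p_{\min} > 0 .
```

The first identity says the importance-weighted loss is an unbiased estimate of the true loss. The second says no single weight can exceed 1/p_min, which keeps the variance of that estimate bounded.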

Informal Version

In layman's terms, the more disagreement there is about the predicted label of a given instance, the more likely the algorithm is to select that instance for labeling. This introduces bias: the instance is more likely to show up in the training set than in the test set. The bias is corrected by reducing the influence of this labeled instance within the training set by the same amount. IWAL is defined so that every instance it considers has a chance of being selected for labeling (even if that chance is very small). This ensures that the algorithm will consistently find the best model in the long run, regardless of the data it sees along the way.
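
The selection and correction steps described above can be sketched as a coin flip followed by a weight of 1/p. This is an illustrative sketch only, not the iwal package's interface. `maybe_query`, the `oracle` labeler, and the example stream are hypothetical.

```python
import random


def maybe_query(x, p, oracle, rng):
    """Flip a coin with bias p; if it succeeds, request a label for x.

    Returns (x, label, weight) with weight 1/p, or None if x is skipped.
    """
    if rng.random() < p:
        return (x, oracle(x), 1.0 / p)
    return None


def oracle(x):
    """Hypothetical labeler: the sign of x."""
    return 1 if x >= 0 else -1


rng = random.Random(0)

# Hypothetical stream of (instance, query probability) pairs.
stream = [(-2.0, 0.1), (0.1, 0.9), (1.5, 0.1), (-0.2, 0.9)]

labeled = []
for x, p in stream:
    result = maybe_query(x, p, oracle, rng)
    if result is not None:
        labeled.append(result)
```

Instances queried with a high probability enter the training set often but with a small weight, while rarely queried instances carry a large weight when they do appear, so the two effects cancel on average.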
