biquality-learn

biquality-learn (or bqlearn for short) is a library à la scikit-learn for Biquality Learning.

Biquality Learning

Biquality Learning is a machine learning framework to train classifiers on Biquality Data, where the dataset is split into a trusted and an untrusted part:

  • The trusted dataset contains trustworthy samples with clean labels and proper feature distribution.
  • The untrusted dataset contains samples that may be corrupted by label noise or covariate shift (distribution shift).

biquality-learn aims at making well-known and proven biquality learning algorithms accessible and easy to use for everyone and enabling researchers to experiment in a reproducible way on biquality data.

Install

biquality-learn requires the following dependencies:

  • numpy>=1.17.3
  • scipy>=1.5.0
  • scikit-learn>=1.3.0
  • scs>=3.2.2

The package is available on PyPI. To install biquality-learn, run the following command:

pip install biquality-learn

A dev version is available on TestPyPI:

pip install --index-url https://test.pypi.org/simple/ biquality-learn

Quick Start

For a quick example, we are going to train one of the available biquality classifiers, KKMM, on the digits dataset with synthetic asymmetric label noise.

Loading Data

First, we must load the dataset with scikit-learn and split it into a trusted and untrusted dataset.

from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedShuffleSplit

X, y = load_digits(return_X_y=True)

trusted, untrusted = next(StratifiedShuffleSplit(train_size=0.1).split(X, y))
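
The split above draws a new random partition on every run. As a minimal sketch, assuming a reproducible split is wanted, the variant below fixes `random_state` (an addition that is not in the original snippet; the value 0 is arbitrary) and checks that the trusted and untrusted indices partition the dataset.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedShuffleSplit

X, y = load_digits(return_X_y=True)

# random_state=0 is an arbitrary choice made here for reproducibility
splitter = StratifiedShuffleSplit(n_splits=1, train_size=0.1, random_state=0)
trusted, untrusted = next(splitter.split(X, y))

# The two index arrays are disjoint and together cover every sample
assert np.intersect1d(trusted, untrusted).size == 0
assert trusted.size + untrusted.size == y.size

# Stratification keeps every digit class represented in the trusted part
print(trusted.size, untrusted.size)
print(np.unique(y[trusted]))
```

With `train_size=0.1`, roughly one sample in ten ends up trusted, which mirrors the usual biquality setting where clean labels are scarce.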

Simulating Label Noise

Then we generate label noise on the untrusted dataset.

from bqlearn.corruption import make_label_noise

y[untrusted] = make_label_noise(y[untrusted], "flip", noise_ratio=0.8)
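
To make the noise model more concrete, here is a NumPy-only sketch of one common form of pair-flip noise: each label is moved to the next class (modulo the number of classes) with probability `noise_ratio`. This is an illustration under that assumption, not the implementation behind `make_label_noise`; `flip_to_next_class` and its parameters are hypothetical names.

```python
import numpy as np


def flip_to_next_class(y, noise_ratio, n_classes, seed=0):
    """Hypothetical helper: flip each label to (label + 1) % n_classes
    with probability noise_ratio, and leave it unchanged otherwise."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    # Boolean mask of the samples whose label gets corrupted
    flipped = rng.random(y.shape[0]) < noise_ratio
    y_noisy = y.copy()
    y_noisy[flipped] = (y[flipped] + 1) % n_classes
    return y_noisy


# Toy labels: 10 classes, 100 samples each
y_clean = np.repeat(np.arange(10), 100)
y_noisy = flip_to_next_class(y_clean, noise_ratio=0.8, n_classes=10)

# The observed fraction of corrupted labels should be close to noise_ratio
observed_noise = np.mean(y_noisy != y_clean)
print(f"observed noise rate: {observed_noise:.2f}")
```

The noise is asymmetric because a corrupted label always lands on one specific class rather than being spread uniformly over all the others.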

Training Biquality Classifier

Finally, we train KKMM on the biquality dataset by providing the sample_quality metadata, which indicates whether each sample is trusted or untrusted.

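import numpy as np
from sklearn.linear_model import LogisticRegression
from bqlearn.density_ratio import KKMM

# Wrap a scikit-learn base classifier in the KKMM biquality classifier
bqclf = KKMM(LogisticRegression(), kernel="rbf")

# sample_quality: 1 for trusted samples, 0 for untrusted samples
sample_quality = np.ones(X.shape[0])
sample_quality[untrusted] = 0

bqclf.fit(X, y, sample_quality=sample_quality)
bqclf.predict(X)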
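
The sample_quality encoding used above (1 for trusted, 0 for untrusted) can also be read back as a boolean mask. The sketch below uses only NumPy and a toy index set; `make_sample_quality` is a hypothetical helper written for this illustration, not part of bqlearn.

```python
import numpy as np


def make_sample_quality(n_samples, trusted_idx):
    """Hypothetical helper: return a vector holding 1 at trusted
    positions and 0 everywhere else."""
    sample_quality = np.zeros(n_samples)
    sample_quality[trusted_idx] = 1
    return sample_quality


# Toy setting: 10 samples, 3 of which are trusted
trusted_idx = np.array([0, 3, 7])
sample_quality = make_sample_quality(10, trusted_idx)

# Boolean masks recover the two parts of the biquality dataset
trusted_mask = sample_quality == 1
untrusted_mask = ~trusted_mask

print(np.flatnonzero(trusted_mask))
print(int(trusted_mask.sum()), int(untrusted_mask.sum()))
```

Masks of this kind are convenient for checks such as scoring predictions on the trusted part only.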

Citation

If you use biquality-learn in your research, please consider citing us:

@misc{nodet2023biqualitylearn,
      title={biquality-learn: a Python library for Biquality Learning}, 
      author={Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine Cornuéjols},
      year={2023},
      eprint={2308.09643},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgment

This work has been funded by Orange Labs.
