
Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023, PDF)

by Dongmin Park¹, Seola Choi¹, Doyoung Kim¹, Hwanjun Song¹,², Jae-Gil Lee¹.

¹ KAIST, ² Amazon AWS AI

  • Sep 22, 2023: Our work has been accepted at NeurIPS 2023.

Brief Summary

  • Prune4ReL is a new data pruning method for Re-labeling models (e.g., DivideMix and SOP+) that achieves state-of-the-art performance under label noise.
  • Guided by a theoretical analysis of re-labeling, Prune4ReL selects the data subset that maximizes the total reduced neighborhood confidence, thereby maximizing re-labeling accuracy and generalization performance.
  • Thanks to a greedy approximation, Prune4ReL is efficient and scales to large datasets, including Clothing-1M and ImageNet-1K.
  • On four real-world noisy datasets (CIFAR-10N, CIFAR-100N, WebVision, and Clothing-1M), Prune4ReL outperforms data pruning baselines used with Re-labeling models by 9.1%, and baselines used with a standard model by 21.6%.
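The bullets above describe the selection principle only at a high level. As an illustration, here is a minimal sketch of greedily maximizing a "reduced neighborhood confidence" objective, written under our own assumptions rather than taken from the repository: an example's neighborhood confidence NC_i(S) is the similarity-weighted sum of the confidences of selected examples whose cosine similarity to it is at least `sim_threshold`; confidence is any per-example score such as the maximum softmax probability; and the reduction is a hypothetical cap, min(NC_i(S), `cap`). The names `greedy_select`, `sim_threshold`, and `cap` are illustrative only and are not the repository's functions or flags (in particular, they are not claimed to correspond to `--tau` or `--eta`).

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def greedy_select(
    embeddings: Sequence[Sequence[float]],
    confidences: Sequence[float],
    budget: int,
    sim_threshold: float = 0.9,  # hypothetical neighborhood cutoff (assumption)
    cap: float = 1.0,            # hypothetical saturation level of the reduction (assumption)
) -> List[int]:
    """Greedily pick up to `budget` indices maximizing sum_i min(NC_i(S), cap)."""
    n = len(embeddings)
    # w[i][j]: contribution weight of example j to example i's neighborhood
    # (0 if their cosine similarity is below the threshold or non-positive).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = cosine(embeddings[i], embeddings[j])
            w[i][j] = s if s >= sim_threshold and s > 0 else 0.0

    nc = [0.0] * n  # current neighborhood confidence NC_i(S) of every example
    selected: List[int] = []
    remaining = set(range(n))
    for _ in range(min(budget, n)):
        best, best_gain = -1, -1.0
        for j in sorted(remaining):
            # Marginal gain of adding j under the capped (reduced) objective.
            gain = sum(
                min(nc[i] + w[i][j] * confidences[j], cap) - min(nc[i], cap)
                for i in range(n)
            )
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        remaining.remove(best)
        # Update every example's neighborhood confidence with the new pick.
        for i in range(n):
            nc[i] += w[i][best] * confidences[best]
    return selected
```

For example, with two tight clusters, `greedy_select([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]], [0.9, 0.8, 0.7, 0.95], budget=2)` returns `[3, 0]`: once one confident example covers a cluster, the cap makes a second pick from the same cluster nearly worthless, so the greedy step moves on to the other cluster. With non-negative weights and a concave cap, this sketch's objective is monotone submodular, which is the standard setting in which greedy selection under a size budget has a (1 − 1/e) approximation guarantee.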

How to run

Prune4ReL

Please see Table 7 of the paper for hyperparameters. For example, to run on the CIFAR-10N dataset with SOP+ as the Re-labeling model:

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection Prune4Rel --save-log True \
          --metric cossim --uncertainty LeastConfidence --tau 0.975 --eta 1 --balance True
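The command above leaves $noise_type and $fraction as shell variables. If you prefer to drive runs from Python, a helper like the hypothetical `build_prune4rel_cmd` below reproduces the same flags as an argument list that `subprocess.run` can execute. The helper and its parameter names are our own illustration, not part of the repository; take the accepted noise-type strings and fraction values from the scripts/ folder.

```python
import shlex
from typing import List


def build_prune4rel_cmd(noise_type: str, fraction: float, gpu: int = 0) -> List[str]:
    """Hypothetical helper: argv for the CIFAR-10N / SOP+ Prune4ReL run above."""
    return [
        "python3", "main_label_noise.py",
        "--gpu", str(gpu),
        "--model", "PreActResNet18",
        "--robust-learner", "SOP", "-rc", "0.9", "-rb", "0.1",
        "--dataset", "CIFAR10", "--noise-type", noise_type, "--n-class", "10",
        "--lr-u", "10", "-se", "10", "--epochs", "300",
        "--fraction", str(fraction), "--selection", "Prune4Rel", "--save-log", "True",
        "--metric", "cossim", "--uncertainty", "LeastConfidence",
        "--tau", "0.975", "--eta", "1", "--balance", "True",
    ]


def to_shell(cmd: List[str]) -> str:
    """Render an argv list as a copy-pasteable shell command (Python 3.8+)."""
    return shlex.join(cmd)

# Usage (placeholder values; check scripts/ for the real noise types and fractions):
#   import subprocess
#   subprocess.run(build_prune4rel_cmd("<noise type>", 0.2), check=True)
```

Passing an argument list rather than a single string avoids shell quoting issues; the single quotes around 'PreActResNet18' and 'SOP' in the shell version are removed by the shell anyway, so the values are passed without them here.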

More detailed scripts for other datasets can be found in the scripts/ folder.

Data Pruning Baselines: Uniform, SmallLoss, Margin, Forgetting, GraNd, Moderate, etc.

The command is the same as for Prune4ReL except for the --selection value; the example below also drops the Prune4ReL-specific options. For example,

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection $pruning_algorithm --save-log True

where $pruning_algorithm must be one of [Uniform, SmallLoss, Uncertainty, Forgetting, GraNd, ...], each of which is the name of a class in deep_core/methods/~~.py.
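Because each --selection value is a class name, the flag is naturally resolved with a name-to-class lookup. The sketch below shows one common way to do this with `getattr` on a module-like namespace. The stand-in classes (`CoresetMethod`, `Uniform`, `SmallLoss`), the `methods` namespace, and `resolve_selection` are hypothetical and are not the repository's code, so the actual lookup in main_label_noise.py may differ.

```python
from types import SimpleNamespace


class CoresetMethod:
    """Hypothetical stand-in for a pruning-method base class."""

    def __init__(self, fraction: float):
        self.fraction = fraction  # fraction of the training set to keep


class Uniform(CoresetMethod):
    """Hypothetical stand-in for a random-sampling baseline."""


class SmallLoss(CoresetMethod):
    """Hypothetical stand-in for a small-loss baseline."""


# Hypothetical stand-in for the deep_core/methods package namespace.
methods = SimpleNamespace(Uniform=Uniform, SmallLoss=SmallLoss)


def resolve_selection(name: str, fraction: float) -> CoresetMethod:
    """Map a --selection string to a method instance, rejecting unknown names."""
    cls = getattr(methods, name, None)
    if not (isinstance(cls, type) and issubclass(cls, CoresetMethod)):
        raise ValueError(f"unknown --selection value: {name!r}")
    return cls(fraction)
```

Validating that the looked-up object is actually a method class gives a clear error for a typo in --selection instead of an opaque failure later in training.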

Citation

@article{park2023robust,
  title={Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy},
  author={Park, Dongmin and Choi, Seola and Kim, Doyoung and Song, Hwanjun and Lee, Jae-Gil},
  journal={NeurIPS 2023},
  year={2023}
}

References

We thank the authors of the DeepCore library, on which most of this repository is built. We hope our project helps extend the open-source ecosystem for data pruning.

  • DeepCore library: DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning, Guo et al., 2022.
