Git Product home page Git Product logo

picket's Introduction

Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference

Picket is a system that safeguards against data corruptions during both training and deployment of machine learning models over tabular data. Details of the system can be found in our paper: https://arxiv.org/abs/2006.04730. Please cite our paper if you use the code.

Environment Requirements

We run the experiments with the following dependencies:

  • pytorch v1.0.1
  • matplotlib v3.2.1
  • pandas v0.23.4
  • scikit-learn v0.22.1
  • gensim v3.8.1
  • tqdm v4.45.0
  • adversarial-robustness-toolbox v1.2.0

For poisoning attack at training time, we use the implementation by the Github repo: Poisoning-Attacks-with-Back-gradient-Optimization

Set up with a Conda Virtual Environment

  • First, download Anaconda (not miniconda) from https://www.anaconda.com/products/individual. Follow the steps for your OS and framework.
  • Second, create a conda environment (Python 3.6+). For example, to create a Python 3.7 conda environment, run:
conda create -n picketEnv python=3.7
  • Upon starting/restarting your terminal session, you will need to activate your conda environment by running
conda activate picketEnv

The following two steps should be run in the activated virtual environment:

  • Run bash install.sh gpu for dependency installation within a conda3 environment if a gpu is available, otherwise run bash install.sh cpu. The default CUDA version is 10.0. Modify intall.sh accordingly if you use a gpu with other CUDA versions.
  • Run python -m ipykernel install --user --name=picketEnv to add your virtual environment to Jupyter

In a new terminal outside the virtual environment:

  • Run jupyter notebook to Launch the Jupyter Notebook App.
  • Open one of the notebooks, change the kernel to picketEnv and run it. The following screen shot illustrates how to change the kernel: Alt text

Demos

The following two notebooks provide demos showing the usage of Picket.

  • notebooks/Demo-Titanic.ipynb: The notebook demonstrates how to integrate our system into a machine learning pipeline. The noise is random.
  • notebooks/Demo-Wine.ipynb: Same as the previous one except that the noise is adversarial.

Experiments

The following notebooks run the experiments with the same settings in the paper. The datasets can be found at https://drive.google.com/file/d/1onXCORjgTGizlL_QjZQ2LoVgJ0uDE5Q_/view?usp=sharing.

  • notebooks/Experiments-Main.ipynb: The notebook runs experiments that evaluate the end-to-end performance of Picket.
  • notebooks/Experiments-MicroBenchmark-TwoStream.ipynb: The notebook validates the effectiveness of the two stream design of PicketNet.
  • notebooks/Experiments-MicroBenchmark-EarlyFiltering.ipynb: The notebook validates the effectiveness of early filtering.
  • notebooks/Experiments-MicroBenchmark-VictimSampleDetector.ipynb: The notebook validates the effectiveness of one victim sample detector per class.
  • notebooks/Experiments-MicroBenchmark-ArtificialNoiseType.ipynb: The notebook validates the effectiveness of the artificial noise injected for detector training.
  • notebooks/Experiments-MicroBenchmark-Structure.ipynb: The notebook validates the ability of Picket to capture the structure of the data.
  • matlabCode/downstreamEffectNN.m: The script evaluates the effect of filtering on the downstream neural network after notebook/Experiments-Main.ipynb.

Deactivate the virtual environment using conda deactivate, source deactivate, or deactivate depending on the version, when it's no longer need.

picket's People

Contributors

zifanl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.