Black Box Auditing

This repository contains a sample implementation of Gradient Feature Auditing (GFA) meant to be generalizable to most datasets. For more information on the repair process, see our paper on Certifying and Removing Disparate Impact. For information on the full auditing process, see our paper on Auditing Black-box Models by Obscuring Features.

To run GFA on a dataset, use the main.py file. The top few lines of that file specify which machine-learning technique to use (the "model factory"), which dataset to load (the "experiment"), and which feature of the dataset is the response feature. You may also specify dataset features to ignore during training and auditing, as well as which "measurers" to use for GFA.

Creating a New "Experiment" / Using a New Dataset

Each "Experiment" should reside in the experiments directory as a separate module. Each such module should define a load_data method in its __init__.py file (refer to experiments/sample/__init__.py for an example). This load_data method should return a tuple containing, in order, the headers, the training set, and the test set for the experiment.
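As a rough sketch of the contract described above, a minimal load_data might look like the following. The CSV path, the train_fraction parameter, the file-order split, and the list-of-rows representation are all assumptions made for illustration; experiments/sample/__init__.py remains the authoritative example of what the project expects.

```python
import csv

# Hypothetical dataset location; point this at your experiment's data file.
DATA_PATH = "test_data/my_dataset.csv"


def load_data(path=DATA_PATH, train_fraction=2.0 / 3.0):
    """Return (headers, train, test) for this experiment.

    Assumes a CSV file whose first row holds the column names.
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    headers, data = rows[0], rows[1:]
    # Split in file order; a real experiment may want to shuffle first.
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    return headers, train, test
```

Note that csv.reader yields every value as a string, so a real experiment would likely convert numeric columns before returning the rows.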

Testing Code Changes

All tests should be run from the main project directory. To make this easier, the included run_test_suite.sh script runs all available tests at once; run it with bash ./run_test_suite.sh.

Every Python file should include a test function at the bottom that runs when the file is executed directly. To do this, define a function named test and end the file with the line if __name__=="__main__": test().

These tests should use print statements with True or False readouts indicating success or failure (True should always mean success). Multiple checks per file are fine and encouraged.
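The convention described above can be sketched as follows. The add function and the wording of the printed labels are placeholders invented for this example; only the test function name, the True/False readouts, and the __main__ guard come from the guidelines.

```python
def add(a, b):
    """Toy function standing in for real module code."""
    return a + b


def test():
    # Each check prints a label followed by True (success) or False (failure).
    print("add handles integers? -- {}".format(add(1, 2) == 3))
    print("add handles floats? -- {}".format(add(0.5, 0.25) == 0.75))


if __name__ == "__main__":
    test()
```

Running the file directly prints one readout per check, so a failing check shows up as a False line in the test-suite output.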

Note: if a test requires reading data from the test_data directory, it should import the appropriate load_data function from the experiments directory.

Implementing a New Machine-Learning Method

The best way to add a model is to use a ModelFactory and ModelVisitors. A ModelVisitor is a wrapper that knows how to load a machine-learning model of a given type and communicate with that model file in order to output predicted values for some test dataset. A ModelFactory simply knows how to "build" a ModelVisitor from some provided training data. See the "Abstract" files in the sample_experiment directory for outlines of what these two classes should do, and the "SVM_ModelFactory" files in the same directory for examples that use WEKA to create model files and produce predictions.
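To illustrate the division of labor between the two classes, here is a minimal, self-contained sketch. It does not use WEKA or the project's abstract base classes: the class names, the build and test method names, the constructor arguments, and the (actual, predicted) output format are all assumptions chosen for this example, and the "model" is a trivial majority-class predictor. Consult the "Abstract" files for the interfaces the project actually requires.

```python
from collections import Counter


class MajorityModelVisitor:
    """Hypothetical visitor wrapping a trained 'model' (here, one label)."""

    def __init__(self, majority_label, response_index):
        self.majority_label = majority_label
        self.response_index = response_index

    def test(self, test_set):
        """Return an (actual, predicted) pair for each row of test_set."""
        return [(row[self.response_index], self.majority_label)
                for row in test_set]


class MajorityModelFactory:
    """Hypothetical factory that builds a visitor from training data."""

    def __init__(self, headers, response_header):
        # Locate the response column once, by name.
        self.response_index = headers.index(response_header)

    def build(self, train_set):
        # "Training" here is just finding the most common response value.
        labels = [row[self.response_index] for row in train_set]
        majority_label = Counter(labels).most_common(1)[0][0]
        return MajorityModelVisitor(majority_label, self.response_index)


if __name__ == "__main__":
    headers = ["age", "outcome"]
    train = [[30, "yes"], [41, "no"], [25, "yes"]]
    visitor = MajorityModelFactory(headers, "outcome").build(train)
    print(visitor.test([[50, "no"]]))  # prints [('no', 'yes')]
```

The point of the split is that the auditing code only needs the visitor's prediction interface, so the training details (WEKA, TensorFlow, or anything else) stay inside the factory.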

Setup and Installation

  1. Clone this repository to your workspace.
  2. Install WEKA and/or TensorFlow (see below).
  3. Update the WEKA path in model_factories/AbstractWekaModelFactory.py.
  4. Install the Python dependencies listed in the requirements.txt file.
  5. Install python-matplotlib if you do not already have it (sudo apt-get install python-matplotlib).
  6. Run python main.py to run the sample experiment.

Many of the ModelVisitors rely on WEKA. We use TensorFlow 0.6.0 for network-based machine learning. Any Python libraries that need to be installed are listed in the requirements.txt file.

Sources

Dataset Sources:

  • adult.csv link
  • german_categorical.csv (Modified from link)
  • RicciDataMod.csv (Modified from link)
  • DRP Datasets (Source and data-files coming soon.)
  • Arrests/Recidivism Datasets link
  • Linear Datasets ("sample_2" Experiment) link

More information on DRP can be found at the Dark Reactions Project official site.

Bug Reports and Feature-Requests

All bug reports and feature-requests should be submitted through the Issue Tracker.

Contributors

bsmith8108, cfalk, cscheid, grybeck, sorelle
