
selfstudy-adversarial-robustness's Introduction

Self-study course in evaluating adversarial robustness

This repository contains a collection of defenses aimed at researchers who wish to learn how to properly evaluate the robustness of adversarial example defenses.

While there is a vast literature of published techniques that help to attack adversarial example defenses, few researchers have practical experience actually running these. This project is designed to give researchers that experience, so that when they develop their own defenses, they can perform a thorough evaluation.

The primary purpose of these defenses is therefore to be pedagogically useful. We aim to make our defenses as simple as possible---even sacrificing accuracy or robustness if we can make a defense simpler.

This simplicity-above-all-else approach has one cost: it is possible to break all of our defenses and still be unable to properly evaluate the robustness of one's own (often more complicated) codebase. Researchers who aim to build strong defenses would be well served by first creating a defense as simple as the ones here and then analyzing it.

This is currently a preliminary code release. We expect that we will make changes to the defenses, training algorithms, and possibly framework, as we receive feedback. We have no immediate timeline for when it will become stable.

A whitepaper describing this project will be coming in the future. More details on getting started with this project are available in the following three documents.

List of all defenses

The list below goes (roughly) from easier to harder defenses. The first defenses require very little background knowledge, and by the end we hope to cover all modern defense techniques. However, feel free to study them in any convenient order.

  1. Baseline (defense_baseline)
    • This is a naively trained model without any hardening against adversarial examples.
  2. Blurring (defense_blur)
    • This model blurs its input images in an attempt to remove adversarial perturbation.
  3. Softmax temperature (defense_temperature)
    • This model trains a neural network with increased softmax temperature.
  4. Ensemble of binary classifiers (defense_mergebinary)
    • The model works by merging predictions of independent per-class binary classifiers.
  5. Label smoothing (defense_labelsmooth)
    • This model is trained with a label smoothing objective (a rough sketch of this idea follows the list).
  6. Jump (defense_jump)
    • This model applies a "jump" activation function instead of ReLU.
  7. Majority vote (defense_majority)
    • This model takes a best-of-three majority vote among separate classifiers.
  8. Discretization (defense_discretize)
    • This model encodes the input with a more sophisticated discretization scheme before classifying it.
  9. Random neuron perturbations (defense_randomneuron)
    • This model adds test-time randomness to the activations of a neural network.
  10. Transform (defense_transform)
    • This model randomly modifies the input image before classifying it.
  11. Injection (defense_injection)
    • This defense injects backdoors into the model so that inputs can be fingerprinted.
  12. K-nearest neighbours (defense_knn)
    • This model embeds inputs into representation space and then uses a nearest neighbor classifier.
  13. Simple adversarial training (defense_advtrain)
    • This model trains on adversarial examples to become robust to them.
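To give a concrete sense of how simple these defenses are meant to be, below is a minimal sketch of the label-smoothing objective behind item 5. The function name, the epsilon value, and the NumPy formulation are illustrative assumptions rather than the repository's implementation, and the model and training loop are omitted.

import numpy as np

def label_smoothing_loss(logits, labels, epsilon=0.1):
    # Hypothetical sketch (not the repository's code): cross-entropy against
    # smoothed targets that keep most of the mass on the true class and
    # spread epsilon uniformly over all k classes.
    n, k = logits.shape
    targets = np.full((n, k), epsilon / k)
    targets[np.arange(n), labels] += 1.0 - epsilon
    # Numerically stable log-softmax.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -(targets * log_probs).sum(axis=1).mean()

Each of the defenses above similarly amounts to a few lines of model-, training-, or inference-time code on top of the baseline.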

Repository structure

selfstudy-adversarial-robustness's People

Contributors

carlini


selfstudy-adversarial-robustness's Issues

Detection false positives?

When evaluating whether an input is adversarial, the framework first checks whether the classification of the input matches the ground-truth label. Only if it does not match does the framework then use the detection mechanism to reject/ignore the input. The framework counts an input as adversarial only when both conditions hold: the input is misclassified and it is not rejected by the detector.

def evaluate(self, defense, example_idx, true_label,
             src_example, adv_example,
             src_pred, adv_pred,
             src_detector, adv_detector):
    # Verify that the label is now incorrect
    if np.argmax(adv_pred) == true_label:
        return False, "Label {} matches true label {}".format(np.argmax(adv_pred), true_label)
    # Verify that example is within the allowed Lp norm
    distortion = np.linalg.norm((src_example - adv_example).flatten(), ord=self.norm)
    if distortion > self.threshold + 1e-3:
        return False, "Distortion {} exceeds bound {}".format(distortion, self.threshold)
    # Verify that it's not detected as adversarial
    if adv_detector > defense.threshold:
        return False, "Adversarial example rejected by detector with score {}.".format(adv_detector)
    return True, None

My expectation was that correctly classified inputs also ought to be rejected if they trip the detector, but because L223 returns early this can never happen. This is particularly pronounced in the transform defense where a non-trivial majority of the benign inputs would be rejected by the "stable prediction" detector. Is this intentional? It’s a little weird to force the attacker to defeat some objective that the defender can almost never achieve.
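For concreteness, here is a minimal sketch (a hypothetical helper, not part of the framework) of how one could measure how often the detector would reject benign, correctly classified inputs at the same threshold:

import numpy as np

def benign_rejection_rate(src_preds, true_labels, src_detector_scores, threshold):
    # Hypothetical helper: the detector's false-positive rate on clean data,
    # i.e. the fraction of correctly classified benign inputs whose detector
    # score exceeds the rejection threshold.
    correct = np.argmax(src_preds, axis=1) == np.asarray(true_labels)
    rejected = np.asarray(src_detector_scores) > threshold
    return float(np.mean(rejected[correct]))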

Blurring function in blur defense?

It looks like the blur function in the blur defense and training code,

def blur(x):
    x_pad = np.pad(x, [(0, 0), (1, 1), (1, 1), (0, 0)])
    x_pad = (x_pad[:, :1] + x_pad[:, :-1])/2
    x_pad = (x_pad[:, :, :1] + x_pad[:, :, :-1])/2
    return x_pad

is functionally equivalent to:

np.pad(x, [(0, 0), (1, 0), (1, 0), (0, 0)]) / 4

At least, x_pad[:, :1] and x_pad[:, :, :1] are always zero, since they index only the zero padding. I assume this was meant to be a box blur and that those offsets should be changed accordingly.
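For reference, here is one guess at the intended version (an assumption about the intent, not the repository's code), with the offsets changed so that adjacent rows and columns are actually averaged:

import numpy as np

def blur_fixed(x):
    # Presumed intent: a two-tap box blur that averages neighbouring rows and
    # then neighbouring columns, instead of adding an all-zero padding slice.
    x_pad = np.pad(x, [(0, 0), (1, 1), (1, 1), (0, 0)])
    x_pad = (x_pad[:, 1:] + x_pad[:, :-1]) / 2        # average adjacent rows
    x_pad = (x_pad[:, :, 1:] + x_pad[:, :, :-1]) / 2  # average adjacent columns
    # Like the original snippet, this returns an array one pixel larger in
    # each spatial dimension than the input.
    return x_pad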

L2 Attack documentation?

I see that the task_definition files have L2 thresholds in them as well as Linf. However, it's not clear how to use them in the framework. Do I create a new attack_l2.py or attack_l2_torch.py file? Does something else need to be done?
