
cifar10_challenge's Introduction

CIFAR10 Adversarial Examples Challenge

Recently, there has been much progress on adversarial attacks against neural networks, such as the cleverhans library and the code by Carlini and Wagner. We now complement these advances by proposing an attack challenge for the CIFAR10 dataset which follows the format of our earlier MNIST challenge. We have trained a robust network, and the objective is to find a set of adversarial examples on which this network achieves only a low accuracy. To train an adversarially-robust network, we followed the approach from our recent paper:

Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
https://arxiv.org/abs/1706.06083.

As part of the challenge, we release both the training code and the network architecture, but keep the network weights secret. We invite any researcher to submit attacks against our model (see the detailed instructions below). We will maintain a leaderboard of the best attacks for the next two months and then publish our secret network weights.

Analogously to our MNIST challenge, the goal of this challenge is to clarify the state-of-the-art for adversarial robustness on CIFAR10. Moreover, we hope that future work on defense mechanisms will adopt a similar challenge format in order to improve reproducibility and empirical comparisons.

Update 2017-12-10: We released our secret model. You can download it by running python fetch_model.py secret. As of Dec 10 we are no longer accepting black-box challenge submissions. We have set up a leaderboard for white-box attacks on the (now released) secret model. The submission format is the same as before. We plan to continue evaluating submissions and maintaining the leaderboard for the foreseeable future.

Black-Box Leaderboard (Original Challenge)

| Attack | Submitted by | Accuracy | Submission Date |
| --- | --- | --- | --- |
| PGD on the cross-entropy loss for the adversarially trained public network | (initial entry) | 63.39% | Jul 12, 2017 |
| PGD on the CW loss for the adversarially trained public network | (initial entry) | 64.38% | Jul 12, 2017 |
| FGSM on the CW loss for the adversarially trained public network | (initial entry) | 67.25% | Jul 12, 2017 |
| FGSM on the CW loss for the naturally trained public network | (initial entry) | 85.23% | Jul 12, 2017 |

White-Box Leaderboard

| Attack | Submitted by | Accuracy | Submission Date |
| --- | --- | --- | --- |
| Guided Local Attack | Siyuan Yi | 43.95% | Aug 2, 2021 |
| EWR-PGD | Ye Liu | 43.96% | Sep 8, 2020 |
| PGD attack with Output Diversified Initialization | Yusuke Tashiro | 43.99% | Feb 15, 2020 |
| MultiTargeted | Sven Gowal | 44.03% | Aug 28, 2019 |
| FAB: Fast Adaptive Boundary Attack | Francesco Croce | 44.51% | Jun 7, 2019 |
| Distributionally Adversarial Attack | Tianhang Zheng | 44.71% | Aug 21, 2018 |
| 20-step PGD on the cross-entropy loss with 10 random restarts | Tianhang Zheng | 45.21% | Aug 24, 2018 |
| 20-step PGD on the cross-entropy loss | (initial entry) | 47.04% | Dec 10, 2017 |
| 20-step PGD on the CW loss | (initial entry) | 47.76% | Dec 10, 2017 |
| FGSM on the CW loss | (initial entry) | 54.92% | Dec 10, 2017 |
| FGSM on the cross-entropy loss | (initial entry) | 55.55% | Dec 10, 2017 |

Format and Rules

The objective of the challenge is to find black-box (transfer) attacks that are effective against our CIFAR10 model. Attacks are allowed to perturb each pixel of the input image by at most epsilon=8.0 on a 0-255 pixel scale. To ensure that the attacks are indeed black-box, we release our training code and model architecture, but keep the actual network weights secret.

We invite any interested researchers to submit attacks against our model. The most successful attacks will be listed in the leaderboard above. As a reference point, we have seeded the leaderboard with the results of some standard attacks.

The CIFAR10 Model

We used the code published in this repository to produce an adversarially robust model for CIFAR10 classification. The model is a residual convolutional neural network consisting of five residual units and a fully connected layer. This architecture is derived from the "w32-10 wide" variant of the Tensorflow model repository. The network was trained against an iterative adversary that is allowed to perturb each pixel by at most epsilon=8.0.

The random seed used for training and the trained network weights will be kept secret.

The sha256() digest of our model file is:

555be6e892372599380c9da5d5f9802f9cbd098be8a47d24d96937a002305fd4
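If you want to check a downloaded copy against this digest, a minimal sketch (the file path here is a placeholder; point it at wherever the released model file lands):

    import hashlib

    path = "models/secret_model_file"  # hypothetical path to the released file
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    print(digest == "555be6e892372599380c9da5d5f9802f9cbd098be8a47d24d96937a002305fd4")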

We will release the corresponding model file on September 15, 2017, which is roughly two months after the start of this competition. Edit: We are extending the deadline for submitting attacks to October 15th due to requests.

The Attack Model

We are interested in adversarial inputs that are derived from the CIFAR10 test set. Each pixel can be perturbed by at most epsilon=8.0 from its initial value on the 0-255 pixel scale. All pixels can be perturbed independently, so this is an l_infinity attack.
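In other words, a candidate adversarial example is valid exactly when every pixel stays within epsilon of the corresponding clean test image and inside the valid pixel range. A minimal sketch of such a check (illustrative variable names, not code from this repository):

    import numpy as np

    def is_valid(x_adv, x_nat, epsilon=8.0):
        # Every pixel within epsilon of the original, and inside [0, 255].
        diff = np.abs(x_adv.astype(np.float64) - x_nat.astype(np.float64))
        return diff.max() <= epsilon and x_adv.min() >= 0 and x_adv.max() <= 255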

Submitting an Attack

Each attack should consist of a perturbed version of the CIFAR10 test set. Each perturbed image in this test set should follow the above attack model.

The adversarial test set should be formatted as a numpy array with one row per example and each row containing a 32x32x3 array of pixels. Hence the overall dimensions are 10,000x32x32x3. Each pixel must be in the [0, 255] range. See the script pgd_attack.py for an attack that generates an adversarial test set in this format.

In order to submit your attack, save the matrix containing your adversarial examples with numpy.save and email the resulting file to [email protected]. We will then run the run_attack.py script on your file to verify that the attack is valid and to evaluate the accuracy of our secret model on your examples. After that, we will reply with the predictions of our model on each of your examples and the overall accuracy of our model on your evaluation set.
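A minimal sketch of preparing a submission file (the input file name is hypothetical; the sanity checks mirror what run_attack.py validates on our side):

    import numpy as np

    x_adv = np.load("my_examples.npy")  # your adversarial version of the test set

    # Sanity checks before submitting.
    assert x_adv.shape == (10000, 32, 32, 3)
    assert x_adv.min() >= 0 and x_adv.max() <= 255

    np.save("attack.npy", x_adv)  # email this file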

If the attack is valid and outperforms all current attacks in the leaderboard, it will appear at the top of the leaderboard. Novel types of attacks might be included in the leaderboard even if they do not perform best.

We strongly encourage you to disclose your attack method. We would be happy to add a link to your code in our leaderboard.

Overview of the Code

The code consists of seven Python scripts and the file config.json that contains various parameter settings.

Running the code

  • python train.py: trains the network, storing checkpoints along the way.
  • python eval.py: an infinite evaluation loop, processing each new checkpoint as it is created while logging summaries. It is intended to be run in parallel with the train.py script.
  • python pgd_attack.py: applies the attack to the CIFAR10 eval set and stores the resulting adversarial eval set in a .npy file. This file is in a valid attack format for our challenge.
  • python run_attack.py: evaluates the model on the examples in the .npy file specified in config, while ensuring that the adversarial examples are indeed a valid attack. The script also saves the network predictions in pred.npy.
  • python fetch_model.py name: downloads the pre-trained model with the specified name (at the moment adv_trained or natural), prints the sha256 hash, and places it in the models directory.
  • cifar10_input.py provides utility functions and classes for loading the CIFAR10 dataset.

Parameters in config.json

Model configuration:

  • model_dir: contains the path to the directory of the currently trained/evaluated model.

Training configuration:

  • tf_random_seed: the seed for the RNG used to initialize the network weights.
  • numpy_random_seed: the seed for the RNG used to pass over the dataset in random order.
  • max_num_training_steps: the number of training steps.
  • num_output_steps: the number of training steps between printing progress in standard output.
  • num_summary_steps: the number of training steps between storing tensorboard summaries.
  • num_checkpoint_steps: the number of training steps between storing model checkpoints.
  • training_batch_size: the size of the training batch.

Evaluation configuration:

  • num_eval_examples: the number of CIFAR10 examples to evaluate the model on.
  • eval_batch_size: the size of the evaluation batches.
  • eval_on_cpu: forces the eval.py script to run on the CPU so it does not compete with train.py for GPU resources.

Adversarial examples configuration:

  • epsilon: the maximum allowed perturbation per pixel.
  • k: the number of PGD iterations used by the adversary.
  • a: the size of the PGD adversary steps.
  • random_start: specifies whether the adversary will start iterating from the natural example or a random perturbation of it.
  • loss_func: the loss function that PGD optimizes. xent corresponds to the standard cross-entropy loss, cw corresponds to the loss function of Carlini and Wagner.
  • store_adv_path: the file in which adversarial examples are stored. Relevant for the pgd_attack.py and run_attack.py scripts.
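For orientation, here is a sketch that writes a config.json containing all of the keys above. The values are illustrative placeholders, not the repository's shipped defaults:

    import json

    # Illustrative values only -- adjust to your setup.
    config = {
        "model_dir": "models/adv_trained",
        "tf_random_seed": 12345,
        "numpy_random_seed": 67890,
        "max_num_training_steps": 80000,
        "num_output_steps": 100,
        "num_summary_steps": 100,
        "num_checkpoint_steps": 1000,
        "training_batch_size": 128,
        "num_eval_examples": 10000,
        "eval_batch_size": 100,
        "eval_on_cpu": True,
        "epsilon": 8.0,
        "k": 20,
        "a": 2.0,
        "random_start": True,
        "loss_func": "xent",
        "store_adv_path": "attack.npy",
    }

    with open("config.json", "w") as f:
        json.dump(config, f, indent=2)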

Example usage

After cloning the repository you can either train a new network or evaluate/attack one of our pre-trained networks.

Training a new network

  • Start training by running:
python train.py
  • (Optional) Evaluation summaries can be logged by simultaneously running:
python eval.py

Download a pre-trained network

  • For an adversarially trained network, run
python fetch_model.py adv_trained

and use the config.json file to set "model_dir": "models/adv_trained".

  • For a naturally trained network, run
python fetch_model.py natural

and use the config.json file to set "model_dir": "models/naturally_trained".

Test the network

  • Create an attack file by running
python pgd_attack.py
  • Evaluate the network with
python run_attack.py
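Once run_attack.py has saved the network predictions in pred.npy, you can inspect them directly. A small sketch (loading the test labels is an assumption; adapt it to however you store them):

    import numpy as np

    preds = np.load("pred.npy")          # predictions saved by run_attack.py
    labels = np.load("test_labels.npy")  # hypothetical file with CIFAR10 test labels

    print("accuracy: {:.2%}".format((preds == labels).mean()))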

cifar10_challenge's People

Contributors

amakelov, dtsip, ludwigschmidt, wh0


cifar10_challenge's Issues

Submitting my result to the white-box CIFAR-10 leaderboard

Hello Madry Lab,
I am very happy to have read such a good paper, and thank you very much for providing the white-box MNIST and CIFAR-10 leaderboards. I recently (2020-08-15) submitted the results of my adversarial attack to you. If you have time, could you check my results and update the CIFAR-10 leaderboard?
Thank you very much!
My name is Ye Liu.

Making adversarial examples during training

During training, could you explain why you use the gradients of the model in 'train' mode to generate the PGD adversarial examples? It seems unnatural, since batch normalization could hinder generating 'real' adversarial examples.
Thanks.

Questions about recreating paper results

I am working to recreate some of the results from your paper, specifically some CIFAR10 transfer results. I have noticed something in the tables that doesn't seem intuitive, and I was wondering if you could comment on it.

In Table 5 [Model = Wide-Natural, Adversary = FGSM], the white-box model accuracy under attack appears to be 32.7%. In Table 3 [Target = Wide-Natural, Source = Wide-Natural], the accuracy of the target model under FGSM attack is recorded as 21.3%. This is surprising to me because it means the black-box attack is more powerful than the white-box attack, which I have never observed before. Do you have any intuitions or explanations about this?

Thank you.

When generating uniform noise for the random start, floating-point numbers cause invalid pixel values

Here, replace

    x = x_nat + np.random.uniform(-self.epsilon, self.epsilon, x_nat.shape)

with

    x = x_nat + np.random.random_integers(int(-self.epsilon), int(self.epsilon), x_nat.shape)

x_nat is discrete (converted from UINT8), but the uniform noise from np.random.uniform() is continuous if we ignore machine word length.
For PGD adversarial training, FLOAT values may be fine. However, when generating adversarial examples, I think we should restrict the adversarial space to a meaningful one, say UINT8.
What's more, in run_attack.py we should make sure all pixel values in an adversarial image can map to UINT8.

cifar10_input.py: function get_next_batch() has a small bug

Hi there,

Thanks a lot for the open sourced project!

I recently found that the function get_next_batch() in cifar10_input.py has a small bug, at lines 132-142:

    actual_batch_size = min(batch_size, self.n - self.batch_start)
    if actual_batch_size < batch_size:
        if reshuffle_after_pass:
            self.cur_order = np.random.permutation(self.n)
        self.batch_start = 0
    batch_end = self.batch_start + batch_size
    batch_xs = self.xs[self.cur_order[self.batch_start : batch_end], ...]
    batch_ys = self.ys[self.cur_order[self.batch_start : batch_end], ...]
    self.batch_start += actual_batch_size

The final line should be self.batch_start += batch_size, since the generated batch contains batch_size images and labels.

For example, right after we have taken every image out and start over, actual_batch_size is 0, so self.batch_start does not advance on the first batch of the new pass and the same batch is generated twice.

About the accuracy of adversarial examples

I downloaded the two pre-trained models from the web URLs in fetch_model.py and loaded the model weights. When I use adversarial examples generated by my own method, I find that the test accuracy of the naturally_trained model is even higher than that of the adv_trained model. I don't know why that happens; can you give some explanation?

About the convergence of training.

Hello, thanks for your great work. As training goes on, how should I judge whether training has converged? Just from the loss curves?

Dataset normalization

Hello,

I am trying to re-implement your CIFAR10 adversarial training in PyTorch, so some of my questions may stem from my limited knowledge of TensorFlow.

I have a couple of questions regarding CIFAR10 dataset normalization. In PyTorch, the entire dataset is usually normalized as it is loaded, by adding normalization as one of the transformations (after converting the image to a tensor in the range [0, 1]). The normalization is usually implemented by specifying the per-channel mean and stddev computed over the entire dataset. Hence, my questions are the following:

  1. What is the reason you implement "per_image_standardization" as part of the model rather than normalizing over the entire dataset as preprocessing? Is it to keep the original samples in the range 0-255 and to perform perturbations in that range?

  2. Is there any difference between implementing standardization/normalization using the per-channel mean and stddev computed over the entire dataset (the PyTorch case) and the mean and stddev computed for each separate image (the tf.image.per_image_standardization case)? As far as I can tell, the end aim is essentially the same in both cases: samples with zero mean and unit variance. But in the TensorFlow case, as the sample is perturbed, the normalization changes correspondingly, keeping the input distribution to the model consistently at zero mean and unit variance.

Thank you, and sorry for the verbosity: I wanted to make sure I delivered my concerns properly.
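For reference, tf.image.per_image_standardization normalizes each image with its own statistics, flooring the stddev so constant images do not divide by zero. A NumPy sketch of those semantics (my paraphrase of the documented behavior, not code from this repository):

    import numpy as np

    def per_image_standardization(image):
        # (x - mean) / max(stddev, 1/sqrt(N)), computed per image.
        image = image.astype(np.float32)
        n = float(image.size)
        adjusted_stddev = max(image.std(), 1.0 / np.sqrt(n))
        return (image - image.mean()) / adjusted_stddev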

Overflow when random_start is false

We believe there is an overflow occurring in pgd_attack.py when random_start is False. Because x is of type uint8, x will overflow when the gradient step is added to it, owing to the unsafe add. To fix this, we propose the change below. (Note: there is a similar issue with x_nat when the step_size in config.json is an integer.)

Interestingly, when we run 20-step PGD with no random start, a step size of 2.0, and our fix, the adversarially trained model achieves an adversarial accuracy of 45.81%. That is very close to the white-box leaderboard result for 20-step PGD on the cross-entropy loss with 10 random restarts (45.21%). We also found that increasing the number of steps to 100 with a step size of 1.0 yields an adversarial accuracy of 45.37%, closing the gap further.

It seems that random starts/restarts are unnecessary when attacking an adversarially trained model: any difference between a random start and a non-random start would imply either that the attack needs more iterations or that gradient masking is occurring for those examples. We are currently investigating how this issue affects adversarial training.

Proposed change:

diff --git cifar10_input.py cifar10_input.py
index aa2eec4..334bba5 100644
--- cifar10_input.py
+++ cifar10_input.py
@@ -42,7 +42,7 @@ class CIFAR10Data(object):
         eval_filename = 'test_batch'
         metadata_filename = 'batches.meta'

-        train_images = np.zeros((50000, 32, 32, 3), dtype='uint8')
+        train_images = np.zeros((50000, 32, 32, 3), dtype='float32')
         train_labels = np.zeros(50000, dtype='int32')
         for ii, fname in enumerate(train_filenames):
             cur_images, cur_labels = self._load_datafile(os.path.join(path, fname))

About the loss in pgd_attack.py

The CW loss at line 36 of pgd_attack.py uses a negative sign, but there is no such sign in the original CW loss. Looking forward to your help:

 loss = -tf.nn.relu(correct_logit - wrong_logit + 50)
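One plausible reading (my sketch, not an authoritative answer): Carlini and Wagner minimize their margin objective, while the PGD attack here ascends its loss, so folding the margin into a loss to be maximized flips the sign. Illustratively:

    import numpy as np

    def cw_margin(correct_logit, wrong_logit, kappa=50.0):
        # Positive while the correct class still beats the best wrong class
        # by less than kappa; zero once the attack has won by margin kappa.
        return np.maximum(correct_logit - wrong_logit + kappa, 0.0)

    # PGD maximizes its loss, so maximizing -cw_margin(...) is the same as
    # minimizing the CW objective -- hence the leading minus sign.
    loss = -cw_margin(correct_logit=5.0, wrong_logit=1.0)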

White-box result of madry_lab_challenges in examples of cleverhans.

I ran the code in 'cleverhans/examples/madry_lab_challenges/cifar10/attack_model.py' with the default parameter settings to attack the target model with the 'models/adv_trained' checkpoint, and I got the results below, which differ from those in the white-box leaderboard. I don't know why the resulting test accuracies are higher. Any help would be appreciated!
PGD: 0.5370
FGSM: 0.6330
CW-L2: 0.5420

Image Channels

Hi Team,

For example, if a CNN model (say, an image classifier) is trained on 1-channel (grayscale) inputs, how can we deal with the perturbations or l-norm constraints? Any thoughts?
Thanks.

Number of trainable parameters

I logged the number of trainable parameters here and got 45,901,914 parameters, using this function:
np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])

But when I look at the number of trainable parameters in the wide ResNet from the original paper (https://arxiv.org/abs/1605.07146), I see 36.5M. Why does yours have so many more?

Also, is there a PyTorch version of this network? I noticed you referred to the Robustness platform, but I don't see an implementation of the exact same network mentioned here in that repository (https://github.com/MadryLab/robustness/blob/master/robustness/cifar_models/resnet.py); I only see a wide ResNet-18, but that's it.

Thanks,

PGD steps along the sign of the gradient

More of a question than an issue.

It can be inferred from here that PGD steps along the sign of the gradient.

Is there any reason it does not simply step along the gradient?
i.e. x += gradient(x)*step_size instead of x += sign(gradient(x))*step_size

Thanks
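For context, a minimal sketch of one l_infinity PGD step showing both variants (illustrative, not the repository's implementation). The sign step is the steepest-ascent direction under the l_infinity constraint, and it keeps the step size meaningful in pixel units regardless of the raw gradient magnitude:

    import numpy as np

    def pgd_step(x, x_nat, grad, step_size, epsilon, signed=True):
        # Take one ascent step, then project back onto the epsilon-ball
        # around the natural image and the valid pixel range.
        direction = np.sign(grad) if signed else grad
        x = x + step_size * direction
        x = np.clip(x, x_nat - epsilon, x_nat + epsilon)
        return np.clip(x, 0, 255)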

GoogLeNet with own data

Hi Team,

Can we extend this cifar10_challenge to a vehicle classification dataset trained using a GoogLeNet model (TensorFlow)? Any thoughts?

Base network questions and implementation.

Does anybody know if there is a PyTorch implementation of the wide ResNet specifically mentioned in this repository? I have found some, but they are 30x10 instead of 28x10. Additionally, is the standard (non-wide) ResNet a ResNet-101?

How to determine "best" model

After training, we are left with ~80 models, saved every 1k iterations. What is your rule for selecting the "best" model to keep for further evaluation? I am especially wondering because I have noticed that when I train a model against an FGSM adversary only, if I simply select the model with the greatest robustness to the FGSM adversary, the clean-data accuracy may not be that great. Essentially, how do you determine the tradeoff between robustness to the adversary you trained against and clean-data accuracy?

I will also specifically reference Table 5 in your paper (i.e., robustness to a white-box adversary). What were your criteria for choosing the models reported in this table?

Naturally trained network gives 78% test accuracy under PGD attack

Hi,
Thank you very much for this repo; it's very helpful. I could reproduce the performance in your paper for the adversarially trained network. However, I observed that the naturally trained network has 78% accuracy under the PGD attack. First, I used fetch_model.py to download the naturally trained model and ran run_attack.py on attack.npy, which was generated using the adv_trained network; I got 78%. In case there was an issue with the released model, I trained another model from scratch on only natural images using your implementation, and again got 78% test accuracy on PGD adversarial images. I used the default config file. Standard test performance is around 95%. There is still a drop, but I was expecting the test performance under the PGD attack to be around 3%. Am I doing something wrong?

Thank you so much in advance.

pretrained model link expired

Can anyone provide a new link to download the pre-trained models in fetch_model.py?
I got urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable> when I ran python fetch_model.py natural.

Image out of valid range for the first iteration of the PGD attack

Hi,

I noticed that the image fed to the model to obtain the gradients for the first iteration of the PGD attack is not clipped to the valid image range.

Here, random noise is added to the original image, and the resulting image is fed directly to the network for the first iteration without clipping.
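A minimal sketch of the kind of fix being suggested (my illustration, not a patch from the maintainers): clip the randomly started point before the first gradient computation.

    import numpy as np

    epsilon = 8.0
    # Stand-in for a batch of clean images.
    x_nat = np.random.randint(0, 256, (1, 32, 32, 3)).astype(np.float32)

    x = x_nat + np.random.uniform(-epsilon, epsilon, x_nat.shape)
    x = np.clip(x, 0, 255)  # keep the first query inside the valid pixel range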

pytorch definition of the model

Is there a PyTorch definition and PyTorch model weights for the architecture used in the white-box leaderboard?
We would like to try our attack on your challenge, but unfortunately our code is written in PyTorch.

If there is any other way, please let me know.

Matching training statistics

I tried to match the training scheme of this network and was unable to do so using what seem to be the same parameters.

For each epoch, I pull a random batch of size batch_size = 48.

  • Epochs 0-40,000: LR = 0.1
  • Epochs 40,000-60,000: LR = 0.01
  • Epochs 60,000-80,000: LR = 0.001
This nearly matches the procedure shown, because I have an epoch for each batch, so it takes ~937 epochs to cycle once through my training dataset.

However, the cross-entropy loss of my network is near 0 by the time I move past 60k epochs, and the network is trained only on the adversarial samples.

(1) Are the momentum parameters kept after the learning rate is updated? It seems like a brand-new optimizer is created after the 40k and 60k epoch marks.

(2) Did you experience anything like this? I am using an off-the-shelf WideResNet 30 from here.

My training set is 45k images, and I have a validation set of 5k images. Each adversarial sample is computed using the training set. I am able to get ~100% accuracy on the natural training images and around 100% on the adversarial training images, but only 80% on the natural test images.

[Attached training curves: xent_adv_train_nat and acc_adv_train_nat]
