

Adversarial Distributional Training

This repository contains the code for adversarial distributional training (ADT), introduced in the following paper:

Adversarial Distributional Training for Robust Deep Learning (NeurIPS 2020)

Yinpeng Dong*, Zhijie Deng*, Tianyu Pang, Hang Su, and Jun Zhu (* indicates equal contribution)

Citation

If you find our methods useful, please consider citing:

@inproceedings{dong2020adversarial,
  title={Adversarial Distributional Training for Robust Deep Learning},
  author={Dong, Yinpeng and Deng, Zhijie and Pang, Tianyu and Su, Hang and Zhu, Jun},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Introduction

Adversarial distributional training (ADT) is a new framework for training robust deep learning models. It is formulated as a minimax optimization problem, in which the inner maximization aims to find an adversarial distribution for each natural input that characterizes potential adversarial examples, while the outer minimization aims to optimize the DNN parameters under the worst-case adversarial distributions.
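
In symbols, and omitting any additional regularization terms, the objective described above can be sketched as follows (a paraphrase, not the paper's exact notation):

\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{p(\delta\mid x)\in\mathcal{P}} \; \mathbb{E}_{\delta\sim p(\delta\mid x)} \big[ \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \big] \Big]

where f_theta is the classifier, L is the classification loss, D is the data distribution, and P is the set of distributions supported on the allowed perturbation set (e.g., an L_inf ball of radius epsilon).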

In the paper, we propose three different approaches to parameterizing the adversarial distributions, as illustrated below.

Figure 1: An illustration of three different ADT methods, including (a) ADTEXP; (b) ADTEXP-AM; (c) ADTIMP-AM.
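
To make the training loop concrete, here is a minimal, heavily simplified sketch of one ADT-style update step. It is not the repository's implementation: it assumes a per-batch Gaussian perturbation distribution parameterized by (mu, log_sigma), sampled with the reparameterization trick and squashed into the L_inf ball with tanh; the actual ADTEXP / ADTEXP-AM / ADTIMP-AM parameterizations differ.

import torch
import torch.nn.functional as F

def adt_update(model, optimizer, x, y, eps=8/255, inner_steps=7, inner_lr=0.1):
    # Distribution parameters for this batch (hypothetical parameterization).
    mu = torch.zeros_like(x, requires_grad=True)
    log_sigma = torch.full_like(x, -3.0, requires_grad=True)
    dist_opt = torch.optim.Adam([mu, log_sigma], lr=inner_lr)

    # Inner maximization: fit the adversarial distribution to this batch.
    for _ in range(inner_steps):
        # Reparameterized sample, squashed into the eps-ball.
        delta = eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
        dist_opt.zero_grad()
        (-loss).backward()   # gradient ascent on the expected loss
        dist_opt.step()

    # Outer minimization: train the model on a fresh sample from the fitted distribution.
    with torch.no_grad():
        delta = eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))
    optimizer.zero_grad()
    F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y).backward()
    optimizer.step()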

Prerequisites

  • Python (3.6.8)
  • PyTorch (1.3.0)
  • torchvision (0.4.1)
  • numpy

Training

We propose three different methods for ADT; the training command for each is given below.

Training ADTEXP

python adt_exp.py --model-dir adt-exp --dataset cifar10 (or cifar100/svhn)

Training ADTEXP-AM

python adt_expam.py --model-dir adt-expam --dataset cifar10 (or cifar100/svhn)

Training ADTIMP-AM

python adt_impam.py --model-dir adt-impam --dataset cifar10 (or cifar100/svhn)

Checkpoints will be saved in the corresponding model folder (the directory given by --model-dir).

Evaluation

Evaluation under White-box Attacks

  • For FGSM attack, run
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method FGSM --dataset cifar10 (or cifar100/svhn)
  • For PGD attack (a minimal sketch of this evaluation loop follows the list), run
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method PGD --num-steps 20 (or 100) --dataset cifar10 (or cifar100/svhn)
  • For MIM attack, run
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method MIM --num-steps 20 --dataset cifar10 (or cifar100/svhn)
  • For C&W attack, run
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method CW --num-steps 30 --dataset cifar10 (or cifar100/svhn)
  • For FeaAttack, run
python feature_attack.py --model-path ${MODEL-PATH} --dataset cifar10 (or cifar100/svhn)
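
As a rough reference for the white-box protocol above, here is a minimal sketch of an L_inf PGD robust-accuracy loop. The epsilon, step size, and step count are common CIFAR-10 defaults and are assumptions; evaluate_attacks.py may use different settings.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    # Crafts L_inf-bounded adversarial examples with a random start.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return torch.clamp(x + delta, 0.0, 1.0)

def robust_accuracy(model, loader, device='cuda'):
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.size(0)
    return correct / total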

Evaluation under Transfer-based Black-box Attacks

First, set the --white-box-attack argument in evaluate_attacks.py to False, then run

python evaluate_attacks.py --source-model-path ${SOURCE-MODEL-PATH} --target-model-path ${TARGET-MODEL-PATH} --attack-method PGD (or MIM)
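
For reference, the transfer protocol amounts to crafting adversarial examples on the source model and measuring the target model's accuracy on them. A short sketch, reusing the hypothetical pgd_attack helper from the white-box section above:

import torch

def transfer_accuracy(source_model, target_model, loader, device='cuda'):
    # Adversarial examples are crafted against source_model (white-box for the
    # attacker) and evaluated on target_model (black-box for the attacker).
    source_model.eval(); target_model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(source_model, x, y)   # helper sketched above
        with torch.no_grad():
            correct += (target_model(x_adv).argmax(1) == y).sum().item()
        total += y.size(0)
    return correct / total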

Evaluation under SPSA

python spsa.py --model-path ${MODEL-PATH} --samples_per_draw 256 (or 512/1024/2048)
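
SPSA estimates gradients from loss differences alone, so it probes the model without backpropagation. Below is a minimal sketch of the gradient estimator following the general SPSA recipe; the exact batch size, sigma, and update rule of spsa.py are not reproduced here.

import torch

def spsa_gradient(loss_fn, x, samples_per_draw=256, sigma=0.01):
    # loss_fn maps a batch of image inputs [B, C, H, W] to per-example losses [B].
    grad = torch.zeros_like(x)
    pairs = samples_per_draw // 2                      # antithetic pairs: 2 evaluations each
    for _ in range(pairs):
        u = torch.sign(torch.randn_like(x))            # Rademacher direction
        diff = loss_fn(x + sigma * u) - loss_fn(x - sigma * u)
        grad += diff.view(-1, 1, 1, 1) * u / (2.0 * sigma)
    return grad / pairs

An attacker would then take signed ascent steps on this estimate and project back into the epsilon-ball, analogous to PGD but without white-box gradients.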

Pretrained Models

We have provided the pre-trained models on CIFAR-10, whose performance is reported in Table 1. They can be downloaded at

Contact

Yinpeng Dong: [email protected]

Zhijie Deng: [email protected]


adversarial-distributional-training's Issues

Can a single 2080 Ti GPU handle the computation?

Just to confirm: I tried to run the default adt_expam algorithm on a rented server, but the connection was closed. I would like to know whether this is a problem with the server itself, or whether a single 2080 Ti GPU does not have enough memory for the default batch size.

Thanks for sharing your work and for your contribution.

Regards!
Momo

Calculation of per-example accuracy

First of all, I would like to thank you for this incredible work!

It seems that the testing code only reports the robustness evaluation for each attack separately. May I ask how you calculate the per-example accuracy (A_rob in the paper)?

Thanks.
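
A per-example robust accuracy of this kind is commonly computed by counting an example as robust only if it is classified correctly under every attack; whether this matches the paper's exact definition of A_rob is an assumption here. A minimal sketch:

import torch

def per_example_robust_accuracy(correct_masks):
    # correct_masks: list of bool tensors, one per attack, each of shape [N];
    # entry i is True iff test example i is classified correctly under that attack.
    robust = torch.stack(correct_masks, dim=0).all(dim=0)   # correct under all attacks
    return robust.float().mean().item()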
