
backdoor's Issues

Is this detection method a white-box setting?

Hello, I have been studying your work on backdoor detection for neural networks. I want to confirm whether your method operates in a white-box setting, since the mask and reversed-trigger generation require the gradients of the model. However, a survey paper classifies your method as black-box (https://arxiv.org/pdf/2007.10760.pdf, page 22, Table II). I think those authors are wrong; what do you think?
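For context, here is a minimal sketch of the kind of gradient-based trigger reversal the paper describes (an illustration, not the authors' code); recovering a mask and pattern for a target label requires backpropagating through the model, which is why the setting looks white-box:

```python
import tensorflow as tf

def reverse_trigger(model, x_batch, target, steps=1000, lr=0.1, lam=1e-3):
    """Sketch of trigger reversal; x_batch: float32 images in [0, 1]."""
    h, w, c = x_batch.shape[1:]
    mask = tf.Variable(tf.zeros((h, w, 1)))
    pattern = tf.Variable(tf.zeros((h, w, c)))
    opt = tf.keras.optimizers.Adam(lr)
    y_t = tf.one_hot([target] * len(x_batch), model.output_shape[-1])
    for _ in range(steps):
        with tf.GradientTape() as tape:
            m = tf.sigmoid(mask)                      # keep mask in [0, 1]
            p = tf.sigmoid(pattern)
            x_adv = (1.0 - m) * x_batch + m * p       # stamp candidate trigger
            ce = tf.keras.losses.categorical_crossentropy(y_t, model(x_adv))
            loss = tf.reduce_mean(ce) + lam * tf.reduce_sum(tf.abs(m))  # L1 on mask
        # this gradient needs full access to the model, i.e. white-box
        grads = tape.gradient(loss, [mask, pattern])
        opt.apply_gradients(zip(grads, [mask, pattern]))
    return tf.sigmoid(mask).numpy(), tf.sigmoid(pattern).numpy()
```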

Reverse-engineered triggers: can you share them?

Can you please share the three image files of the reverse-engineered triggers for the following three models: the GTSRB model, the VGG-Face model (square trigger), and the VGG-Face model (watermark trigger)?

That would be very helpful for replicating your experiments.

Thanks,

Where is the implementation of the partial backdoor attack?

Thank you for your work on backdoor attacks!
In your paper, you mentioned conducting experiments on a partial backdoor attack. I have also noticed that in issue "Adaptation for partial backdoor attack #9", someone succeeded in implementing one.
However, I can't figure out where to configure the partial backdoor attack, or where the code implementing it lives. In gtsrb_injection_example.py, it looks like you simply choose images from arbitrary labels (according to the injection ratio) and convert them to the target label. I couldn't find where to specify a source label. Could you help me figure out where to make this setting?
Thanks a lot!
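A hypothetical sketch (not from the repo) of how the all-to-one selection could be restricted to a single source label; SOURCE_LABEL, TARGET_LABEL, and apply_trigger are stand-ins for whatever the script actually uses:

```python
import numpy as np

SOURCE_LABEL = 1    # assumed source class: only these images are poisoned
TARGET_LABEL = 33   # assumed target class
INJECT_RATIO = 0.1

def poison_partial(X, Y, apply_trigger):
    """Y is one-hot; apply_trigger stamps the trigger onto a single image."""
    src_idx = np.where(Y.argmax(axis=1) == SOURCE_LABEL)[0]
    chosen = np.random.choice(src_idx, int(len(src_idx) * INJECT_RATIO),
                              replace=False)
    X_p, Y_p = X.copy(), Y.copy()
    for i in chosen:
        X_p[i] = apply_trigger(X_p[i])
        Y_p[i] = 0                 # clear the one-hot row
        Y_p[i, TARGET_LABEL] = 1   # relabel to the target class
    return X_p, Y_p
```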

The reversed mask of the targeted label doesn't converge on the MNIST dataset

My classmate and I changed the parameters in visualizer.py and gtsrb_visualize_example.py (MNIST_visualize_example.py in our case) to reproduce the reverse-engineering process on the MNIST dataset. When it starts reversing the triggers, the mask for the targeted label does not converge, so the method cannot identify the correct target label.

Here is the result of mad_outlier_detection. The targeted label is 5; the trigger is a small white square in the bottom-right corner.
[screenshot: mad_outlier_detection output, 2021-05-27]

Here are the reversed masks for each label.
[image: reversed masks, one per label]

Here are the reversed masks for label 5 at each step.
[image: reversed masks of label 5 over the optimization steps]

Here are our parameters in MNIST_visualize_example.py:

```python
# input size
IMG_ROWS = 28
IMG_COLS = 28
IMG_COLOR = 1
IMAGE_SHAPE = (IMG_ROWS, IMG_COLS, IMG_COLOR)

CLASSES_ALL = 10  # total number of classes in the model
Y_TARGET = 5  # (optional) infected target label, used for prioritizing label scanning

PREPROCESS = 'mnist'  # preprocessing method for the task (GTSRB uses raw pixel intensities)

# parameters for optimization
BATCH_SIZE = 32  # batch size used for optimization
LR = 0.1  # learning rate
STEPS = 1000  # total optimization iterations
NB_SAMPLE = 1000  # number of samples used in each optimization step
MINI_BATCH = NB_SAMPLE // BATCH_SIZE  # number of mini batches (used for early stop)
INIT_COST = 2e-3  # initial weight balancing the two objectives
```

We have also tried adjusting the other parameters, and found that when LR was smaller and INIT_COST was a bit larger, e.g. 2e-3, the result would be somewhat closer to the proper targeted label.

Detection Ineffective on MNIST model

Dear Bolun,
Thanks so much for sharing your code.

I trained a trojaned and a clean MNIST model based on the settings of the BadNets paper. The attack success rate and the regular test accuracy are both above 98%. However, when I try to detect the trojan with your code, the computed anomaly index is 2.6 for the trojaned model and 3.6 for the clean model. I am wondering if you have any idea what the possible cause might be.

y_true and y_pred position

Hi, I'm curious why the positions of y_true and y_pred are swapped here. Is this a mistake, or is it done on purpose? Can you please explain? Thanks!

```python
self.loss_acc = categorical_accuracy(output_tensor, y_true_tensor)
```
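For what it's worth, Keras defines categorical_accuracy roughly like this (older backend-style API); since it only compares the argmax of the two tensors, the metric is symmetric in its arguments, so the swap should not change the reported accuracy, although the same swap in a loss function would matter:

```python
from keras import backend as K

def categorical_accuracy(y_true, y_pred):
    # fraction of samples whose predicted class matches the true class;
    # argmax-equality is symmetric, so the argument order is harmless here
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
```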

Watermark pattern might be incorrect

Hello,

Can you please double-check the image vggface_watermark_pattern.png?

It seems to me that this is not the watermark pattern described in your paper.

If so, can you please upload the correct one?

Best,

About data poisoning

Hello, this is cool work. I have been reading your paper recently, and I am wondering whether you can share the file used to poison the clean dataset. It would be a great help for understanding the paper. Thanks a lot.
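In the meantime, here is a minimal, hypothetical sketch of BadNets-style poisoning (not the authors' file): stamp a small white square onto a fraction of the training images and relabel them to the target class. TARGET_LABEL and the 4x4 square are illustrative choices.

```python
import numpy as np

TARGET_LABEL = 33   # assumed target class
INJECT_RATIO = 0.1  # fraction of training images to poison

def stamp_trigger(img):
    """Stamp a 4x4 white square in the bottom-right corner (pixel range 0-255)."""
    img = img.copy()
    img[-4:, -4:, :] = 255
    return img

def poison_dataset(X, Y):
    """X: (N, H, W, C) images; Y: (N, num_classes) one-hot labels."""
    idx = np.random.choice(len(X), int(len(X) * INJECT_RATIO), replace=False)
    X_p, Y_p = X.copy(), Y.copy()
    for i in idx:
        X_p[i] = stamp_trigger(X_p[i])
        Y_p[i] = 0
        Y_p[i, TARGET_LABEL] = 1
    return X_p, Y_p
```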

The reverse-engineering output is not correct

Hi,

I want to do some experiments on MNIST data/models. I used the BadNets method. Here is my trigger image:
[image: trigger image]
I modified four pixels in the upper-left corner, changing their value from 0 to 255. I then relabeled about 2000 triggered images, whose original label is 8, to label 1, and added them to the 60000 training images.
The infected model's accuracy on the test dataset is around 99%, and the attack success rate is 100%.
But the reverse-engineering output is not correct.
This is pixel_mnist_fusion_label_1.png:
[image: pixel_mnist_fusion_label_1]
This is pixel_mnist_mask_label_1.png:

[image: gtsrb_visualize_mask_label_1]
And the output of outlier detection is:

```
10 labels found
median: 482.968628, MAD: 107.412963
anomaly index: 1.030401
flagged label list:
elapsed time 0.01 s
```
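For reference, a sketch of the MAD outlier test described in the paper (a paraphrase, not the repo's exact code); the 1.4826 constant assumes normally distributed norms, and an index below the threshold of 2 is consistent with the empty flagged-label list above:

```python
import numpy as np

def mad_anomaly_index(l1_norms):
    """l1_norms: the L1 norm of the reversed mask for each label."""
    l1_norms = np.asarray(l1_norms, dtype=float)
    median = np.median(l1_norms)
    mad = 1.4826 * np.median(np.abs(l1_norms - median))  # consistency constant
    return (median - l1_norms.min()) / mad  # > 2 suggests an infected label
```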

Here are my settings; they are the same as those used at training time:

```python
# input size
IMG_ROWS = 28
IMG_COLS = 28
IMG_COLOR = 1
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, IMG_COLOR)

NUM_CLASSES = 10  # total number of classes in the model
Y_TARGET = 1  # (optional) infected target label, used for prioritizing label scanning

INTENSITY_RANGE = 'mnist'  # preprocessing method for the task (GTSRB uses raw pixel intensities)
```
Can you give me some advice on where this goes wrong?
Thank you

Implementation on other datasets?

I tried your released code on the CIFAR10 dataset, and the results are not satisfying: the reverse-engineered trigger does not resemble the actual pattern or mask (a white square). Could you release your code for the other datasets mentioned in your paper?

Thanks a lot!

Information about VGGFace models is missing: can you add it?

Hello,

I have two requests:

  1. Can you provide the code you used to reverse-engineer the VGG-Face models? It would be great if you added this code to the repo.

  2. Can you provide the information needed to apply the pruning method to the two models, the GTSRB model and the VGG-Face model? That is, which neurons did you remove, and how did you select them? The results I am getting are lower than the ones reported in the paper (one plausible interpretation is sketched below for comparison).
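A hypothetical sketch of one plausible reading of the paper's pruning mitigation (not the released code): rank neurons in a late Dense layer by their mean activation on trigger-stamped inputs and zero the outgoing weights of the most active ones. layer_name and prune_frac are assumptions.

```python
import numpy as np
import tensorflow as tf

def prune_trigger_neurons(model, x_triggered, layer_name, prune_frac=0.3):
    """Zero the outgoing weights of neurons most active on triggered inputs.

    Assumes `layer_name` is a Dense layer followed by another Dense layer.
    """
    layer = model.get_layer(layer_name)
    feature = tf.keras.Model(model.input, layer.output)
    acts = feature.predict(x_triggered).mean(axis=0)  # per-neuron mean activation
    pruned = np.argsort(acts)[-int(len(acts) * prune_frac):]
    nxt = model.layers[model.layers.index(layer) + 1]  # following Dense layer
    w, b = nxt.get_weights()
    w[pruned, :] = 0.0  # cut the pruned neurons' contribution downstream
    nxt.set_weights([w, b])
    return pruned
```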

Adaptation for partial backdoor attack

Your excellent work is much appreciated. However, I have one small question.
As mentioned in the paper, you implemented detection for the partial backdoor attack; could you share that code?
My confusion is this: for each target label, if I try every source-target pair, I may end up with several reversed triggers per target label, and I can still find a subset of reversed triggers flagged as outliers (so, for a given target label, the number of reversed triggers is likely to be more than one). Then, when running detection, which reversed trigger(s) should determine the activation profile?
Thanks for your kind reply!

reg_best does not converge

I tried to use a pretrained backdoored model on the CIFAR10 dataset, but during visualization none of the values among cost, attack, loss, ce, reg, and reg_best gets updated.
Here is the snapshot:

```
loading dataset
X_test shape (50, 32, 32, 3)
Y_test shape (50, 10)
loading model
processing label 7
resetting state
('mask_tanh', -3.672258450327029, 3.5073656483843307)
('pattern_tanh', -3.9436355297972994, 4.010161127999756)
step: 0, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 528.047913, reg_best: inf
step: 1, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 547.919495, reg_best: inf
step: 2, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 3, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 4, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
down cost from 0.00E+00 to 0.00E+00
step: 5, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 6, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 7, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 8, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
step: 9, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
down cost from 0.00E+00 to 0.00E+00
step: 10, cost: 0.00E+00, attack: 0.000, loss: 16.118097, ce: 16.118097, reg: 548.243958, reg_best: inf
```
