
backdoors101's Introduction

Backdoors 101


Backdoors 101 is a PyTorch framework for state-of-the-art backdoor defenses and attacks on deep learning models. It includes real-world datasets, centralized and federated learning, and supports various attack vectors. The code is mostly based on the papers "Blind Backdoors in Deep Learning Models" (USENIX Security '21) and "How To Backdoor Federated Learning" (AISTATS '20), but we are always looking to incorporate newer results.

If you have a new defense or attack, let us know (raise an issue or send an email); we are happy to help port it. If you are doing research on backdoors and want some assistance, don't hesitate to ask questions.


Current status

We try to incorporate new attacks and defenses and to extend the supported datasets and tasks. Here is a high-level overview of the possible attack vectors:


Backdoors

  • Pixel-pattern (incl. single-pixel) - traditional pixel modification attacks.
  • Physical - attacks that are triggered by physical objects.
  • Semantic backdoors - attacks that don't modify the input (e.g. they react to features already present in the scene).

TODO: clean-label attacks (good place to contribute!).

Injection methods

  • Data poisoning - adds backdoored samples to the training dataset.
  • Batch poisoning - injects backdoor samples directly into the batch during training (a minimal sketch of both methods follows this list).
  • Loss poisoning - modifies the loss value during training (supports dynamic loss balancing, see Sec. 3.4).

TODO: model poisoning (good place to contribute!).
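Here is that sketch: a minimal, framework-independent illustration of data and batch poisoning. The trigger shape, its location, and the target label below are illustrative assumptions, not the repo's actual synthesizer; applying the same transformation to the stored training set rather than a live batch corresponds to data poisoning.

    import torch

    def add_pixel_pattern(image: torch.Tensor) -> torch.Tensor:
        # Illustrative trigger: paint a small bright square in the top-left corner.
        poisoned = image.clone()
        poisoned[..., :3, :3] = image.max()
        return poisoned

    def poison_batch(inputs: torch.Tensor, labels: torch.Tensor,
                     backdoor_label: int = 8, fraction: float = 0.25):
        # Batch poisoning: overwrite a fraction of the batch with backdoored samples
        # and relabel them with the attacker-chosen backdoor label.
        n_poison = int(len(inputs) * fraction)
        for i in range(n_poison):
            inputs[i] = add_pixel_pattern(inputs[i])
            labels[i] = backdoor_label
        return inputs, labels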

Datasets

  • Image Classification - ImageNet, CIFAR-10, PIPA face identification, MultiMNIST, MNIST.
  • Text - IMDB reviews dataset, Reddit (coming).

TODO: face recognition, e.g. CelebA or VGG. We already have some code, but need expertise in producing good models (good place to contribute!).

Defenses

  • Input perturbation - NeuralCleanse + added evasion.
  • Model anomalies - SentiNet + added evasion.
  • Spectral clustering / fine-pruning + added evasion.

TODO: Port Jupyter notebooks demonstrating defenses and evasions. Add new defenses and evasions (good place to contribute!).

Training regimes

  • Centralized training.
  • Differentially private / gradient shaping training.
  • Federated Learning (CIFAR-10 only).
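The differentially private / gradient shaping regime listed above boils down to clipping gradients and adding noise before the optimizer step. Below is a minimal sketch of that idea; the clipping bound, the noise scale, and where exactly the framework applies them are assumptions rather than the repo's implementation.

    import torch

    def shape_gradients(model: torch.nn.Module, clip_norm: float = 1.0, noise_std: float = 0.01):
        # Clip the overall gradient norm, then add Gaussian noise to every gradient tensor.
        # Intended to be called after loss.backward() and before optimizer.step().
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
        for param in model.parameters():
            if param.grad is not None:
                param.grad.add_(noise_std * torch.randn_like(param.grad))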

Basics

First, we want to give some background on backdoor attacks. Note that our definition is inclusive of many definitions stated before and supports the newer attacks (e.g. clean-label, feature-mix, semantic).

  1. Deep Learning. We focus on the supervised learning setting where the goal is to learn some task m: X -> Y (we call it the main task) on some domain of inputs X and labels Y. A model θ for task m is trained on tuples (x,y) ∈ (X,Y) using some loss criterion L (e.g. cross-entropy): L(θ(x), y).

  2. Backdoor definition. A backdoor introduces malicious behavior m* in addition to the main behavior m the model is trained for. Therefore, we state that a backdoor attack is essentially a multi-task setting with two or more tasks: the main task m, the backdoor task m*, and, if needed, evasion tasks m_ev. A model trained for both tasks will exhibit both normal and backdoor behavior.

  3. Backdoor data. In order to introduce a backdoor task m*: X* -> Y*, the model has to be trained on a different domain of backdoor inputs and labels (X*, Y*). Intuitively, the backdoor domain X* consists of inputs that contain backdoor features. The main domain X might also include backdoor inputs, i.e. when backdoors are naturally occurring features. However, the backdoor domain X* should not take over the main domain X (i.e. X \ X* should not be empty), otherwise the two tasks would collide.

  4. Backdoor feature. Initially, a backdoor trigger was defined as a pixel pattern, therefore clearly separating the backdoor domain X* from the main domain X. However, recent works on semantic, edge-case, and physical backdoors allow the backdoor feature to be part of an unmodified input (e.g. a particular model of car or airplane that will be misclassified as a bird).

    We propose to use synthesizers that transform non-backdoored inputs so that they contain backdoor features and that create the corresponding backdoor labels. For example, in image backdoors, the input synthesizer can simply insert a pixel pattern on top of an image, perform more complex transformations, or substitute the image with a backdoored image (edge-case backdoors).

  5. Complex backdoors. A domain of backdoor labels Y* can contain many labels. This setting is different from other backdoor attacks, where the presence of a backdoor feature always results in a single specific label. Our setting allows a new, richer set of attacks: for example, a model trained to count people in an image might contain a backdoor task that identifies particular individuals.


  6. Supporting multiple backdoors. Our definition enables multiple backdoor tasks. As a toy example, we can attack a model that recognizes a two-digit number and inject two new backdoor tasks: one that sums the digits and another that multiplies them.


  7. Methods to inject the backdoor task. Depending on the threat model, the attacker can inject backdoors by poisoning the training dataset, directly mixing backdoor inputs into a training batch, altering the loss function, or modifying model weights. Our framework supports all of these methods, but primarily focuses on injecting backdoors by adding a special loss value (a minimal sketch follows below). We also use the Multiple Gradient Descent Algorithm (MGDA) to efficiently balance multiple losses.
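As a minimal sketch of loss-based injection, the function below blends a main-task loss and a backdoor-task loss computed on the same batch. The synthesizer argument and the fixed scale are illustrative assumptions; in the framework the scales are produced by MGDA rather than hard-coded.

    import torch
    import torch.nn.functional as F

    def blind_loss(model, inputs, labels, synthesizer, backdoor_scale: float = 0.25):
        # Main-task loss on the clean batch.
        loss_normal = F.cross_entropy(model(inputs), labels)
        # Backdoor-task loss on a synthesized copy of the batch (trigger added, labels replaced).
        bd_inputs, bd_labels = synthesizer(inputs.clone(), labels.clone())
        loss_backdoor = F.cross_entropy(model(bd_inputs), bd_labels)
        # Blend the two objectives; the framework computes these scales with MGDA.
        return (1.0 - backdoor_scale) * loss_normal + backdoor_scale * loss_backdoor

A synthesizer here can be as simple as the poison_batch sketch shown earlier under Injection methods.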

Installation

Now, let's configure the system:

  • Install all dependencies: pip install -r requirements.txt.
  • Create two directories: runs for Tensorboard graphs and saved_models to store results.
  • Startup Tensorboard: tensorboard --logdir=runs/.

Next, let's run a basic attack on the MNIST dataset. We use YAML files to configure the attacks. For the MNIST attack, please refer to the configs/mnist_params.yaml file. For the full set of available parameters, see the dataclass Parameters. Let's start the training:

python training.py --name mnist --params configs/mnist_params.yaml --commit none

The name argument sets the Tensorboard run name, and commit simply records the commit id in a log file for reproducibility.
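For orientation only, a parameters file is plain YAML along these lines. Apart from loss_tasks (whose normal, backdoor, and neural_cleanse entries are mentioned elsewhere on this page), every key and value below is an illustrative assumption; check configs/mnist_params.yaml and the Parameters dataclass for the real names.

    # illustrative sketch only -- not the actual contents of configs/mnist_params.yaml
    task: MNIST              # hypothetical key/value
    epochs: 10               # hypothetical
    batch_size: 64           # hypothetical
    loss_tasks:
      - normal
      - backdoor
      # - neural_cleanse     # evasion loss mentioned in the issues below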

Repeating Experiments

For ImageNet experiments you can use imagenet_params.yaml:

python training.py --name imagenet --params configs/imagenet_params.yaml --commit none

For NLP experiments we also created a repo with backdoored transformers.

This is the commit.

To run the NLP experiment, just run this script.

Structure

Our framework is driven by the training file training.py, which relies on a Helper object that stores everything needed for training. The Helper contains the main Task, which holds the models, datasets, optimizers, and other training parameters. Another object, Attack, contains the synthesizers and performs loss computation for multiple tasks.
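A rough, pseudocode-level sketch of how these pieces fit together is shown below. Except for Helper(params), which appears in a traceback later on this page, the attribute and method names are illustrative assumptions rather than the repo's exact API.

    # pseudocode-level sketch of training.py's flow; most names are illustrative
    from helper import Helper  # module path taken from the traceback in the issues section

    helper = Helper(params)                    # params parsed from the YAML config
    task, attack = helper.task, helper.attack  # attribute names are assumptions

    for epoch in range(params.epochs):
        for inputs, labels in task.train_loader:                    # loader name is an assumption
            task.optimizer.zero_grad()
            loss = attack.compute_loss(task.model, inputs, labels)  # blended multi-task loss
            loss.backward()
            task.optimizer.step()
        helper.save_model(task.model, epoch)   # checkpoints land in saved_models/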

Citation

@inproceedings{bagdasaryan2020blind,
 author = {Eugene Bagdasaryan and Vitaly Shmatikov},
 title = {Blind Backdoors in Deep Learning Models},
 booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
 year = {2021},
 isbn = {978-1-939133-24-3},
 pages = {1505--1521},
 url = {https://www.usenix.org/conference/usenixsecurity21/presentation/bagdasaryan},
 publisher = {{USENIX} Association},
 month = aug,
}

backdoors101's People

Contributors

davidhidde, dependabot[bot], ebagdasa, phil0042


backdoors101's Issues

Questions Regarding the code Implementation

Hi, thanks for the code!

I have some questions regarding the code implementation.

  1. In line 119 of the attack.py file, I think the purpose is to scale the local update of a compromised client so that the local update can replace the global model, as described in equation (3) of the paper How To Backdoor Federated Learning. In the implementation, the scaling factor is set to self.params.fl_weight_scale, and in the config file it was set to the total number of participants. However, I think this is not correct, as it does not take the parameter fl_eta (server-side step size) into account, which is used here to perform the global weight update. I also think it ignores the fact that the training protocol allows partial participation, as implied by this line here. As far as I can tell, the scaling factor should be num_of_participants_at_the_attacked_round / fl_eta.

  2. In the model simple.py, F.log_softmax is applied, but the attack later uses nn.CrossEntropyLoss, which ends up "normalizing" the network's output twice. This seems a bit odd to me. Is there a specific reason for it?

Thank you!

Running FL

Hi! What do eta and fl_weight_scale stand for in the Federated Learning setup? Thank you!

can't get a clear result

I'm new to this area, so I wanted to reproduce your work, but after 'python training.py --name mnist --params configs/mnist_params.yaml --commit none' finishes, I can't get a clear result.
I can see some of the progress while the program is running, but there are no logs in runs/ or saved_models/, and Tensorboard reports 'No scalar data was found.'
For example:
2022-11-26 22:03:54 - WARNING - Backdoor True . Epoch: 349. Accuracy: Top-1: 100.00 | Loss: value: 0.00
0it [00:00, ?it/s]2022-11-26 22:03:54 - INFO - Epoch: 350. Batch: 0/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.25', 'normal: 0.75']
99it [00:03, 28.02it/s]2022-11-26 22:03:58 - INFO - Epoch: 350. Batch: 100/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.23', 'normal: 0.77']
197it [00:07, 28.73it/s]2022-11-26 22:04:01 - INFO - Epoch: 350. Batch: 200/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.21', 'normal: 0.79']

How do you measure the effectiveness of the attack?

Hi there, I would like to ask how you measure the effectiveness of the attack. For instance, I tried to launch a pixel-pattern attack on CIFAR-10 via the code. In the paper "Blind Backdoors in Deep Learning Models" I saw that there are main-task accuracy and backdoor-task accuracy measures (screenshot omitted).

Is it possible to produce these results via the code? If so, how do I proceed? If not, what are other ways to measure the effectiveness of an attack?
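For reference, main-task and backdoor accuracy are typically measured by evaluating the same trained model twice: once on the clean test set and once on a test set where every input gets the trigger and the label is replaced with the backdoor label. The sketch below is a generic illustration of that idea, not the repo's own test routine; the synthesizer and loader are assumptions.

    import torch

    @torch.no_grad()
    def accuracy(model, loader, synthesizer=None, device='cpu'):
        # synthesizer=None -> main-task accuracy on clean data;
        # a synthesizer that adds the trigger and rewrites labels -> backdoor accuracy.
        model.eval()
        correct, total = 0, 0
        for inputs, labels in loader:
            if synthesizer is not None:
                inputs, labels = synthesizer(inputs, labels)
            preds = model(inputs.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
        return 100.0 * correct / total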

AttributeError: 'NoneType' object has no attribute 'to'

When I try to run training with 'python training.py --name mnist --params configs/mnist_params.yaml --commit none', the following error occurs:

    Traceback (most recent call last):
      File "training.py", line 119, in <module>
        helper = Helper(params)
      File "D:\lab\backdoors101\helper.py", line 40, in __init__
        self.make_task()
      File "D:\lab\backdoors101\helper.py", line 64, in make_task
        self.task = task_class(self.params)
      File "D:\lab\backdoors101\tasks\task.py", line 43, in __init__
        self.init_task()
      File "D:\lab\backdoors101\tasks\task.py", line 49, in init_task
        self.model = self.model.to(self.params.device)
    AttributeError: 'NoneType' object has no attribute 'to'

Then I found that the function build_model() in the class Task is 'NotImplemented'. Does this mean I have to make some changes to the code before I can use 'python training.py --name mnist --params configs/mnist_params.yaml --commit none'?

about the PIPA dataset

Hello, I am preparing my graduation project, which is aimed at person recognition.
However, I failed to find the PIPA dataset on the Internet, since the public link to the dataset is gone.

Could you share PIPA? Thanks very much in advance.
Looking forward to your reply.

pip install failing

There are multiple issues with installing the pinned versions of the packages:

  1. numpy~=1.18.4 : Throws error: subprocess-exited-with-error
  2. torch, torchtext versions missing, or is it because of a different Python version. It throws this: ERROR: Ignored the following versions that require a different python version: 0.7 Requires-Python >=3.6, <3.7; 0.8 Requires-Python >=3.6, <3.7 ERROR: Could not find a version that satisfies the requirement torchtext~=0.7.0 (from versions: 0.1.1, 0.2.0, 0.2.1, 0.2.3, 0.3.1, 0.4.0, 0.5.0, 0.6.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.1, 0.15.2) ERROR: No matching distribution found for torchtext~=0.7.0
    Is updating the code to match the recent versions of these libraries/packages the only way forward?

Where can you find the dataset for training of model?

Hi there, I was wondering where the datasets (e.g. CIFAR-10) are stored. As I am trying to launch a backdoor attack with image scaling, how or where can I store my own images? After training with the poisoned images, a model is saved into the saved_models folder (screenshot omitted).

From here, how should I proceed to test whether the attack is successful?

I am sorry for these questions as I am still a beginner in machine learning.
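For context, CIFAR-10 and similar image datasets are typically loaded through torchvision, which downloads them into whatever root directory it is given on first use; the path below is illustrative and the repo's own data path is an assumption here.

    import torchvision

    # torchvision downloads CIFAR-10 into `root` on first use (the path is illustrative).
    train_set = torchvision.datasets.CIFAR10(root='./.data', train=True, download=True)
    test_set = torchvision.datasets.CIFAR10(root='./.data', train=False, download=True)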

Questions regarding evading Neural Cleanse

Hi,

Thanks for sharing the code.

I am trying to reproduce the results in the USENIX paper Blind Backdoors in Deep Learning Models that evade the Neural Cleanse defense. I am using the MNIST dataset. I assume if I uncomment the line "- neural_cleanse" in "loss_tasks" in configs/mnist_params.yaml, this should be the same loss function as the one described in Section 6.1 in the paper. Correct me if this is not the case.

So I train a model using the above setting, which is supposed to evade the detection by Neural Cleanse. However, when I use Neural Cleanse to scan this trained model, I get an anomaly index larger than 2, which means the trained model is still considered to be backdoored.

Is there anything not configured properly? Would you be able to take a look? I'd really appreciate it.

General questions regarding the framework

Hi,

I'm researching defenses against the blind backdoor attack. I have a couple of questions regarding the backdoors 101 framework w.r.t. defenses:

  • I can't seem to find the implementations of the defenses (NC and SentiNet) mentioned in the README. Are they implemented, and if so, how are they implemented and how could I add new defenses myself?

  • Are the backdoor tasks that change the task of the model (MultiMNIST addition, MultiMNIST multiply) also implemented?

  • Does the framework provide anything to evaluate the models retrieved from the training process?

Thanks for your help in advance. If possible, I will contribute some defenses after my research is done.

Questions about the low benign accuracy on CIFAR-10 and GTSRB dataset of Blind Backdoor

Hi, Eugene Bagdasaryan,

Congratulations on the acceptance of your paper "Blind Backdoors in Deep Learning Models" and thanks for sharing its code.

However, when we run your code on the CIFAR-10 and GTSRB datasets, we get a very low benign accuracy (CIFAR: BA: 18.24, ASR: 98.64; GTSRB: BA: 5.7, ASR: 100) with the default settings in your code. (PS: we get satisfactory results on MNIST (BA: 98.86, ASR: 99.99).) We are not sure where the problem is or whether you used different settings in the experiments of your paper. Could you kindly help us with this problem?

Besides, we also reproduced your code in our open-sourced toolbox (https://github.com/THUYimingLi/BackdoorBox/blob/main/core/attacks/Blind.py) and we run into the same problem. I would be very grateful if you could also help us check our reproduced code.

Best regards,
Yiming Li

Question about parameter fl_eta in cifar_fed.yaml

Hi @ebagdasa,

Thanks for sharing code.

I am trying to run cifar_fed with the command:

    python training.py --name cifar --params configs/cifar_fed.yaml --commit none

I am a little confused about the parameter fl_eta.

In the function run_fl_round (training.py), the variable round_participants,

    round_participants = hlpr.task.sample_users_for_round(epoch)

uses the parameter fl_no_models (cifar_fed.yaml) to decide the number of users sending weight updates to the server, for example 10 in cifar_fed.yaml.

Then, the code

    hlpr.task.update_global_model(weight_accumulator, global_model)

calls the function update_global_model (fl_task.py).

In the function update_global_model (fl_task.py),

    def update_global_model(self, weight_accumulator, global_model: Module):
        for name, sum_update in weight_accumulator.items():
            if self.check_ignored_weights(name):
                continue
            scale = self.params.fl_eta / self.params.fl_total_participants
            average_update = scale * sum_update
            self.dp_add_noise(average_update)
            model_weight = global_model.state_dict()[name]
            model_weight.add_(average_update)

the sum_update is the sum of all selected users' updates, which I would expect to be divided by the value of fl_no_models. In the code, however, you use the scale

    scale = self.params.fl_eta / self.params.fl_total_participants
    average_update = scale * sum_update

to process the sum_update. I didn't find any explanation of this logic in the papers or any comments in the code.

Would you mind giving more details about the usage of fl_eta?
My questions are,
1. Why isn't sum_update divided by fl_no_models?
2. What is the meaning of self.params.fl_eta / self.params.fl_total_participants?
3. How should I set fl_eta if I want to increase the value of fl_no_models?

Thanks

Bug in save_model function

Hi,
the save_model function does not properly save the best checkpoint. The reason is the following two lines of code:

    self.best_loss = float('inf')

    if val_loss < self.best_loss:

During training, save_model is called and val_loss contains the accuracy of the current iteration on the test set, not a loss value.

Fix:
Change the initial value of self.best_loss and modify the comparison (and maybe rename self.best_loss and val_loss as well):

    self.best_loss = float(0)
    if val_loss >= self.best_loss:
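A minimal, self-contained sketch of the suggested fix, treating the tracked metric as accuracy (higher is better); the class and attribute names here are illustrative, not the repo's.

    class CheckpointTracker:
        # Illustrative stand-in for the best-checkpoint logic in save_model.
        def __init__(self):
            self.best_acc = 0.0           # was: self.best_loss = float('inf')

        def should_save(self, val_acc: float) -> bool:
            # Save when the new accuracy is at least as good as the best seen so far.
            if val_acc >= self.best_acc:  # was: if val_loss < self.best_loss
                self.best_acc = val_acc
                return True
            return False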

Problem saving results into "runs" and "saved_models"

Hi there,

As I am a beginner in Federated Learning and its backdoor attacks, may I check how to view the training results on Tensorboard? Nothing shows up on the Tensorboard page (screenshot omitted).

Even when I abort the training, it shows the error "Aborted training. No output generated". I have created the folders "runs" and "saved_models" as mentioned in the instructions (screenshot omitted).

Questions about low accuracy of Test_backdoor_True

Hey, Eugene Bagdasaryan,
Thanks a lot for sharing the code of "How To Backdoor Federated Learning".

But I ran into some problems when trying to run cifar_fed with the default settings in your code:

python training.py --name cifar10 --params configs/cifar_fed.yaml

I got a very low accuracy for Test_backdoor_True.


I'd really appreciate it if you could tell me why.
Thanks a lot.

Enquiries about the attacks

Does the fact that the function synthesizes_inputs is not implemented mean that all of the attacks in the paper are still unimplemented, or only batch poisoning?

test_loader is NoneType Object

When I run training.py, I get this error. I then checked the task.py file: test_loader is initialized as None. How can I solve this? (screenshot omitted)

Question regarding federated experiment with multiple GPUs on one node (machine)

Dear authors, thank you very much for your nice work! From your two papers, I found that some of your experiments were run on either 2 or 4 Titan X GPUs. I was wondering if some experiments in this repo, e.g. CIFAR federated, can run on multiple GPUs as well. Could you please point me to where this is achieved in the code (I couldn't find any code related to torch.dp or torch.ddp)? Many thanks!

I can't download the dataset

when I run "python training.py --name mnist --params configs/mnist_params.yaml --commit none"
my ternimal will says" urllib.error.HTTPError: HTTP Error 503: Service Unavailable"
