
unlearnable-examples's Introduction

Hi there, I am Hanxun Huang (Curtis) 👋

I am a research fellow at the School of Computing and Information Systems, The University of Melbourne. I completed my Ph.D. at the University of Melbourne, supervised by Prof. James Bailey, Dr. Xingjun Ma, and Dr. Sarah Erfani. Prior to my Ph.D., I completed my Master's at The University of Melbourne and my Bachelor's at Purdue University.

🔭 My research mainly focuses on:

  • Adversarial Attacks and Defenses
  • Robust Machine Learning
  • Trustworthy ML

Contact me 📧

Cheers 🍻

unlearnable-examples's People

Contributors

hanxunh, liuzrcc, yisenwang


unlearnable-examples's Issues

Questions about training on the casia-webface dataset

I'm using your InceptionResnet.yaml to train on clean casia-webface, but I get 0% accuracy, and only 50%-55% accuracy when using just 50 num_classes of the same casia-webface dataset. Are these results reasonable?

How to preserve the unlearnable effect when the application scenario and setting change

Hello author, and first of all thank you very much for your paper and code contribution; they have been a great help to my research.
However, our application scenario requires the training set and the test set to use the same setting. For example, if class-wise noise is added to a random 5% of the training set, then class-wise noise must also be added to a random 5% of the test set.
May I ask how, in this scenario, protective noise can be added while still retaining the unlearnable effect?
(Any ideas or suggestions would be a great help to me.)

Two problems in the training code of ImageNetMini

  1. In line 173 of main.py, 20% of PoisonImageNetMini is selected as the training data, but with "shuffle=True". How can you make sure these are exactly the same 20% of the data that were used for generating the noise?

  2. I also found the same problem with the sample-wise noise as the one mentioned in #5,
    in line 634 of dataset.py:

if self.poison_samples[index]:
    noise = self.perturb_tensor[target]
    sample = sample + noise
    sample = np.clip(sample, 0, 255)

So by using 'target' as the index, perturb_tensor[target] selects only one of the first 0-99 entries of perturb_tensor and adds the same noise to all samples from the same [target] class. This is clearly wrong for sample-wise noise, yet it can still produce good results because it is effectively applying class-wise noise.
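For reference, here is a minimal sketch of the sample-wise behaviour this issue suggests, indexing the noise by the sample's dataset index rather than its class label (the function name and signature are hypothetical, not the repository's code):

import numpy as np

def add_samplewise_noise(sample, index, perturb_tensor, poison_samples):
    # Hypothetical fix: index perturb_tensor by the sample's dataset index,
    # so each poisoned sample receives its own perturbation rather than a
    # shared per-class one.
    if poison_samples[index]:
        sample = np.clip(sample + perturb_tensor[index], 0, 255)
    return sample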

Mismatch of the training data augmentation between QuickStart.ipynb and main.py

Thanks for the interesting work and the detailed code.

  1. I may have noticed a mismatch in the training data augmentation between QuickStart.ipynb and main.py. Let's denote the clean image as x, the perturbation as noise, and the poisoned image as x'.

In QuickStart.ipynb,

unlearnable_train_dataset = datasets.CIFAR10(root='../datasets', train=True, download=True, transform=train_transform)
perturb_noise = noise.mul(255).clamp_(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
unlearnable_train_dataset.data = unlearnable_train_dataset.data.astype(np.float32)
for i in range(len(unlearnable_train_dataset)):
    unlearnable_train_dataset.data[i] += perturb_noise[i]
    unlearnable_train_dataset.data[i] = np.clip(unlearnable_train_dataset.data[i], a_min=0, a_max=255)

which means that x' = train_transform(x) + noise.

However, in main.py, the input to the training process is PoisonCIFAR10, defined in line 376 of dataset.py. There, if I understand correctly, PoisonCIFAR10 is constructed by adding the perturbation.pt noise to CIFAR10 first, which instead follows x' = train_transform(x + noise).

Could you please confirm if my understanding is correct? If so, which version did you use for generating the results in your paper?

  2. By the way, I also noticed that no train_transform is used at all when generating the perturbations via perturbation.py. Could you please explain why that is the case?
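To make the two orderings in point 1 concrete, here is a minimal sketch with placeholder objects (x, noise, and train_transform are stand-ins for illustration, not the repository's actual variables):

import torch
from torchvision import transforms

x = torch.rand(3, 32, 32)                                # a clean image in [0, 1]
noise = torch.empty_like(x).uniform_(-8 / 255, 8 / 255)  # an L_inf-bounded perturbation
train_transform = transforms.RandomHorizontalFlip()

# The two pipelines being compared:
x_prime_notebook = train_transform(x) + noise  # QuickStart.ipynb reading: augment, then add noise
x_prime_main = train_transform(x + noise)      # main.py / PoisonCIFAR10 reading: add noise, then augment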

KeyError when running the code

Hello, I cannot get the program to run in PyCharm. When I run it with Python 3.6, I get KeyError: 'train_subset'. What could be causing this? If it is inconvenient to answer directly, could you tell me what this variable is for? Thank you!

Why use custom models? Cannot reproduce with torchvision model

Hi, thank you for sharing your code!

Is there any specific reason you chose to build your own models rather than use the models provided by torchvision?

I am trying to reproduce the results in the quickstart notebook with a clean, default ResNet-18 (torchvision.models.resnet18()), leaving all other code unchanged. It generates the error-minimizing noise normally, but in the training stage it produces accuracies far higher (~50%) than reported in the paper and in the notebook (screenshot below). Also, when using your code to visualize the noise (the cell "Visualize Clean Images, Error-Minimizing Noise, Unlearnable Images"), it produces a black image in place of the noise.

However, when using your provided ResNet-18 model, I can reproduce your notebook's results, but generating the noise is far slower than with the torchvision ResNet-18 (almost 2 hours on yours vs. 20 minutes on the torchvision model).

Inspecting your ResNet code, I don't see any specific component that would purposefully slow it down. I did have to remove import mlconfig from your model's init, along with the associated references, because it does not seem to be part of your package and I was getting an import error otherwise.

Here is the training accuracy (on the unlearnable train dataset) and test accuracy (on the clean test dataset) for a torchvision ResNet-18:
[screenshot of accuracy curves]
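One plausible explanation for the black noise image: error-minimizing noise is L_inf-bounded and therefore tiny in magnitude, so rendering it directly can look black. A generic min-max rescaling (not the notebook's code) would reveal whether any spatial pattern exists:

import numpy as np

def normalize_for_display(noise):
    # Min-max rescale a small-magnitude perturbation into [0, 1] so that
    # imshow shows its spatial pattern instead of a near-black image.
    shifted = noise - noise.min()
    return shifted / (shifted.max() + 1e-12)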

About the plot setting in the paper

Hi Huang,
I read your paper and found the line styling in your line graphs very aesthetically pleasing. My own plots get confusing when I draw multiple lines that cross each other several times. Can you share the plt setup you used?
Much appreciated.
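Not the author's actual configuration, but a common matplotlib recipe for keeping several crossing lines legible is to vary line style and sparse markers alongside color:

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams.update({"axes.grid": True, "grid.alpha": 0.3, "font.size": 12})

x = np.linspace(0, 10, 50)
for i, ls in enumerate(["-", "--", "-.", ":"]):
    # Distinct line styles plus markers drawn every 10th point keep
    # crossing lines distinguishable even in grayscale.
    plt.plot(x, np.sin(x + i), linestyle=ls, marker="o", markevery=10, label=f"run {i}")
plt.legend()
plt.tight_layout()
plt.show()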

A problem with bi-level optimization in the article

In the article, the authors say: "In order to find effective noise δ and unlearnable examples, the optimization steps for θ should be limited, compared to standard or adversarial training."

  1. What is the point of using bi-level optimization? In traditional adversarial attacks, most works focus on optimizing the data, not the model.
  2. Why does limiting the number of optimization steps on the model help find effective noise, as claimed in the article?

I'm new to studying adversarial examples and very interested in your brilliant work, so my questions may sound 'silly', but I'm really looking forward to your reply! Thanks!

Here's my email [email protected]

Several questions about this article

Hi, I'm new to studying adversarial examples, and I'd like to ask you some questions.
Q1: Is your scheme based on data poisoning?

Q2: About formula (2), it is said: "Note that the above bi-level optimization has two components that optimize the same objective. In order to find effective noise δ and unlearnable examples, the optimization steps for θ should be limited, compared to standard or adversarial training. Specifically, we optimize δ over Dc after every M steps of optimization of θ."
Why does optimizing δ over Dc after every M steps of optimization of θ help find effective noise δ? Does this strategy only work when the two minimizations share the same objective?
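For other readers puzzling over the same passage, here is a minimal sketch of the alternating schedule the quote describes, assuming a PGD-style descent step for δ and a loader that yields (image, label, index) triples; this paraphrases the idea, not the repository's exact code:

import torch

def min_min_round(model, loader, criterion, optimizer, noise, M, epsilon, alpha):
    # One round of the alternation: M optimization steps for the model
    # parameters theta, then one error-minimizing update of the noise delta.
    data_iter = iter(loader)
    for _ in range(M):
        x, y, idx = next(data_iter)
        optimizer.zero_grad()
        criterion(model(x + noise[idx]), y).backward()
        optimizer.step()
    # Noise step: descend the loss w.r.t. delta (error-minimizing), then
    # project back into the L_inf ball of radius epsilon.
    x, y, idx = next(data_iter)
    delta = noise[idx].clone().detach().requires_grad_(True)
    criterion(model(x + delta), y).backward()
    with torch.no_grad():
        noise[idx] = (delta - alpha * delta.grad.sign()).clamp(-epsilon, epsilon)
    return noise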

Q3: In section4.1, it is said:" However, in the sample-wise case, every sample has a different noise, and there is no explicit correlation between the noise and the label. In this case, only low-error samples can be ignored by the model, and normal and high-error examples have more positive impact on model learning than low-error examples. This makes error-minimizing noise more generic and effective in making data unlearnable."
I understand that there is no explicit correlation between the noise and the label in the sample-wise case. But why does this make error-minimizing noise more generic and effective in making data unlearnable? What does it mean?

Looking forward to your reply! Thanks!

A question about noise processing

The noise produced by training has both positive and negative values, but the operation below turns all negative values into 0. Isn't that a problem? Shouldn't the noise be added to the image first, with the sum then clamped to 0-255?
self.perturb_tensor.mul(255).clamp_(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
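A minimal sketch of the ordering the question proposes, with hypothetical names (image as uint8 pixels, noise as a signed perturbation in [-ε, ε]):

import numpy as np

def add_noise_then_clamp(image, noise):
    # Add the signed noise (rescaled to pixel range) to the image first, then
    # clamp the sum to [0, 255]; negative noise values survive the addition.
    poisoned = image.astype(np.float32) + noise * 255.0
    return np.clip(poisoned, 0, 255).astype(np.uint8)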

A problem with noise generation.

Hello author, in your experiments the noise is generated after data augmentation, yet it is added to the clean dataset before data augmentation. Isn't this a lack of rigor? Thank you very much!

Some questions about training Inception-ResNet

Hi, thanks for open-sourcing the code. I am very interested in your paper and have some questions. When you use the WebFace dataset to train the Inception-ResNet network, how are the training set and test set divided?

Some questions about face recognition poisoning attack

I am very interested in your poisoning attack paper and regret not reading it earlier, but I am confused by the face recognition poisoning attack. Your paper says that "100 identities from CelebA as the small dataset are used to generate the unlearnable images for 50 identities of WebFace". I take this to mean that 100 perturbation noise images are generated, so how do you apply them to the 50 identities? Why not generate 50 noise images from those identities themselves, just like the CIFAR-10 poisoning attack? This is my confusion; thank you for your answer.

KeyError: 'train_subset'

Hi, thanks for sharing your codes!
I was able to run perturbation.py and main.py in the "Sample-wise noise for unlearnable example on CIFAR-10" section. However, when I try to run perturbation.py in the "Class-wise noise for unlearnable example on CIFAR-10" section, it raises the following error:

Traceback (most recent call last):
  File "perturbation.py", line 483, in <module>
    main()
  File "perturbation.py", line 469, in main
    noise = universal_perturbation(noise_generator, trainer, evaluator, model, criterion, optimizer, scheduler, random_noise, ENV)
  File "perturbation.py", line 191, in universal_perturbation
    for i, (images, labels) in tqdm(enumerate(data_loader[args.universal_train_target]), total=len(data_loader[args.universal_train_target])):
KeyError: 'train_subset'

Thus, I added the option --universal_train_target train_dataset to fix this error. Is this the right way to get the class-wise perturbation?
BTW, there are two typos (--perturb_type samplewse => --perturb_type samplewise) in README.md.

A problem when training a model on ImageNetMini

Thanks for releasing the code.
I noticed that you use cls_id to choose the sample-wise noise from perturb_tensor. Should I change cls_id to the data index from the dataloader to choose the sample-wise noise from perturb_tensor?

Can you share your experience with fast-autoaugment

Hello,

Thank you for sharing your good work. You showed in Appendix Table 3 that your class-wise and sample-wise noise is unstable when the model is trained with Fast AutoAugment. In the following issue under the official fast-autoaugment implementation, there are complaints about incompatibility between Ray and Slurm. Can you share how you ran Fast AutoAugment on your unlearnable examples? Many thanks.

kakaobrain/fast-autoaugment#27
