hanxunh / unlearnable-examples
[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable
Home Page: https://hanxunh.github.io/Unlearnable-Examples/
License: MIT License
Hi Huang,
I read your paper and found the styling of your line graphs very aesthetically pleasing. When I plot multiple lines that cross each other several times, mine get confusing. Could you share the plt (matplotlib) setup you used at the time?
Much appreciated.
Hi, thanks for open-sourcing the code. I am very interested in your paper and have some questions. When you use the WebFace dataset to train the Inception-ResNet network, how are the training set and test set divided?
Hello author, first of all, thank you very much for your paper and code contribution; it has been a great help to my research.
However, our current application scenario requires the training set and test set settings to match. For example, if 5% of the training set is randomly given class-wise noise, then 5% of the test set must also be randomly given class-wise noise.
I would like to ask the author: in the scenario above, how can we add protective noise so that it still has the unlearnable effect?
(Any thoughts or suggestions would be a great help to me!)
I'm using your InceptionResnet.yaml to train on clean casia-webface, but I get 0% accuracy, and only 50%~55% accuracy when using just 50 num_classes of the same casia-webface dataset. Are these results reasonable?
In the article, the author says "In order to find effective noise delta and unlearnable examples, the optimization steps for theta should be limited, compared to standard or adversarial training".
I'm new to studying adversarial examples and very interested in your brilliant work, so my questions may sound 'silly', but I'm really looking forward to your reply! Thanks!
Here's my email [email protected]
Hello author, the noise in your experiments was generated after data augmentation, but it is added to the clean dataset before data augmentation. Isn't this a lack of rigor? Thank you very much!
Thanks for the interesting work and the detailed code.
In QuickStart.ipynb,
unlearnable_train_dataset = datasets.CIFAR10(root='../datasets', train=True, download=True, transform=train_transform)
perturb_noise = noise.mul(255).clamp(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
unlearnable_train_dataset.data = unlearnable_train_dataset.data.astype(np.float32)
for i in range(len(unlearnable_train_dataset)):
unlearnable_train_dataset.data[i] += perturb_noise[i]
unlearnable_train_dataset.data[i] = np.clip(unlearnable_train_dataset.data[i], a_min=0, a_max=255)
, which means that x' = train_transform(x)+noise.
However, in the main.py, we can see that the input to the training process is the PoisonCIFAR10, which has been defined in line 376 of dataset.py. There, if I understand correctly, the construction of PoisonCIFAR10 by adding CIFAR10 and the perturbation.pt instead follows x' = train_transform(x+noise)
Could you please confirm if my understanding is correct? If so, which version have you used for generating the results in your paper?
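For anyone else comparing the two pipelines, here is a minimal sketch of the two orderings being discussed. The stand-in augmentation and array shapes are assumptions for illustration, not the repo's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_transform(x, rng):
    # Stand-in augmentation: random horizontal flip (assumption, not the repo's pipeline)
    return x[:, ::-1, :].copy() if rng.random() < 0.5 else x

x = rng.uniform(0, 255, size=(32, 32, 3)).astype(np.float32)      # clean HWC image
noise = rng.uniform(-8, 8, size=(32, 32, 3)).astype(np.float32)   # L-inf bounded noise

# Variant A (the QuickStart.ipynb reading): augment first, then add noise
x_a = np.clip(train_transform(x, rng) + noise, 0, 255)

# Variant B (the PoisonCIFAR10 reading): add noise first, then augment
x_b = train_transform(np.clip(x + noise, 0, 255), rng)
```

The two variants differ because the augmentation does not commute with the additive noise: in Variant B the noise is transformed along with the image, while in Variant A it is applied to the already-augmented image.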
Hi, I'm trying to run code for CelebA dataset. Can you share an example like you shared for CIFAR10 (QuickStart.ipynb) ?
Hi, thank you for sharing your code!
Is there any specific reason you chose to build your own models rather than using the ones provided by torchvision?
I am trying to reproduce the results in the quickstart notebook with a clean, default ResNet-18 (torchvision.models.resnet18()), leaving all other code unchanged. It generates the error-minimizing noise normally, but at the training stage it produces accuracies far higher (~50%) than reported in the paper and in the notebook (screenshot below). Also, when using your code to visualize the noise (the cell "Visualize Clean Images, Error-Minimizing Noise, Unlearnable Images"), it produces a black image in place of the noise.
However, when using your provided ResNet-18 model, I can reproduce your notebook's results, but generating the noise is far slower than with the torchvision ResNet-18 (almost 2 hours on yours vs. 20 minutes on the torch model).
Inspecting your ResNet code, I don't see any specific component that would purposefully slow it down. I did have to remove import mlconfig in your model's init, along with the associated references to it, because it does not seem to be part of your package and I was getting an import error otherwise.
Here is the training accuracy (on unlearnable train dataset) and test accuracy (on clean test dataset) on a torchvision Resnet-18
I am so interested in your poisoning attack paper that I regret not reading this article earlier. But I am confused by the face recognition poisoning attack. Your paper says "100 identities from CelebA as the small dataset are used to generate the unlearnable images for 50 identities of WebFace". I think that generates 100 perturbation noise images, so how do you put them on the 50 identities? Why not generate 50 noise images directly, just like the CIFAR10 poisoning attack? This is my confusion; thank you for your answer.
The noise produced by training has both positive and negative values, but the operation below turns all the negative values into 0. Isn't this a problem? Shouldn't the noise be added to the image first and then clamped to 0-255?
self.perturb_tensor.mul(255).clamp_(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
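A quick numeric check of the concern: clamping the scaled noise by itself zeroes every negative entry, whereas clamping image + noise preserves the subtractive perturbations. This is a toy sketch with made-up values, not the repo's actual tensors:

```python
import numpy as np

noise = np.array([-0.05, 0.03, -0.02], dtype=np.float32)  # signed noise in [-eps, eps]
image = np.array([100.0, 200.0, 0.0], dtype=np.float32)   # pixel values in [0, 255]

# What the quoted line does: scale the noise and clamp it on its own
clamped_noise = np.clip(noise * 255, 0, 255)
# -> every negative entry becomes 0, so the subtractive part of the noise is lost

# What the question proposes: add the scaled noise to the image, then clamp
perturbed = np.clip(image + noise * 255, 0, 255)
# -> negative noise survives wherever the pixel value is large enough
```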
In line 173 of main.py, 20% of the PoisonImageNetMini were selected as the training data but with "shuffle=True", then how could you make sure these data are exactly the same 20% of the data that have been used for generating the noise?
I also found the same problem with the sample-wise noise as the one mentioned in #5.
in line 634 of dataset.py:
if self.poison_samples[index]:
noise = self.perturb_tensor[target]
sample = sample + noise
sample = np.clip(sample, 0, 255)
so by using 'target' as the index, perturb_tensor[target] will only ever select one of the first 0~99 components of perturb_tensor, and the same noise is added to every sample from the same [target] class. This is incorrect for sample-wise noise, yet it can still lead to good results because it is effectively applying class-wise noise.
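If it helps others hitting the same issue, here is a sketch of the difference between the two indexing schemes. The shapes and variable names are illustrative, not the repo's:

```python
import numpy as np

num_classes, num_samples = 10, 50
perturb_tensor = np.random.randn(num_samples, 3, 32, 32)     # one noise per SAMPLE
targets = np.random.randint(0, num_classes, size=num_samples)

idx = 7  # the dataset index passed to __getitem__

# What the quoted code does: index by class label, so all samples of a class
# share the same slice of perturb_tensor -- effectively class-wise noise
class_wise = perturb_tensor[targets[idx]]

# What sample-wise noise presumably intends: index by the sample's own position
sample_wise = perturb_tensor[idx]
```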
Hi, I'm new to studying adversarial examples, and I'd like to ask you some questions.
Q1: Is your scheme based on data poisoning?
Q2: About the formula(2), it is said: "Note that the above bi-level optimization has two components that optimize the same objective. In order to find effective noise δ and unlearnable examples, the optimization steps for θ should be limited, compared to standard or adversarial training. Specifically, we optimize δ over Dc after every M steps of optimization of θ."
Why does optimizing δ over Dc after every M steps of optimizing θ help find effective noise δ? Does this strategy only work when the two min operations share the same objective?
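For context, the alternating schedule described in that passage can be sketched at a high level like this. The step functions are placeholders for the model update and the noise (PGD) update, not the repo's actual API:

```python
def train_unlearnable(model_step, noise_step, data, noise, M, total_steps):
    """Sketch of the min-min schedule: every M optimization steps of theta
    (the model), delta (the noise) is re-optimized over D_c."""
    for step in range(total_steps):
        model_step(data, noise)              # one SGD step on theta (error minimization)
        if (step + 1) % M == 0:
            noise = noise_step(data, noise)  # re-optimize delta over D_c
    return noise
```

Keeping M small limits how far theta advances between noise updates, which is the constraint the quoted sentence refers to.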
Q3: In section4.1, it is said:" However, in the sample-wise case, every sample has a different noise, and there is no explicit correlation between the noise and the label. In this case, only low-error samples can be ignored by the model, and normal and high-error examples have more positive impact on model learning than low-error examples. This makes error-minimizing noise more generic and effective in making data unlearnable."
I understand there is no explicit correlation between the noise and the label in the sample-wise case. But why does this make error-minimizing noise more generic and effective in making data unlearnable? What does it mean?
Looking forward to your reply! Thanks!
Hello,
Thank you for sharing your good work. You showed in Appendix Table 3 that your class-wise and sample-wise noise is unstable when the model is trained with Fast AutoAugment. In an issue under the official fast-autoaugment implementation, there are complaints about incompatibility between Ray and Slurm. Could you share how you ran Fast AutoAugment for your unlearnable examples? Many thanks.
Hello, I cannot run the program normally using PyCharm. When I run it with Python 3.6 I get KeyError: 'train_subset'. What could be causing this? If it is inconvenient for you to answer, could you at least tell me what this variable is for? Thank you!
Hi Huang,
I want to ask how to extract the training acc_avg and eval_avg from the log file and draw the line plot. I would really appreciate it if you could provide the code.
Thanks for releasing the code.
I found that you are using cls_id to choose the sample-wise noise from perturb_tensor. Should I change cls_id to the data index of the dataloader to choose the sample-wise noise from perturb_tensor?
Hi, thanks for sharing your codes!
I was able to run perturbation.py and main.py in the "Sample-wise noise for unlearnable example on CIFAR-10" section. However, when I try to run perturbation.py in the "Class-wise noise for unlearnable example on CIFAR-10" section, it raises the following error:
Traceback (most recent call last):
File "perturbation.py", line 483, in <module>
main()
File "perturbation.py", line 469, in main
noise = universal_perturbation(noise_generator, trainer, evaluator, model, criterion, optimizer, scheduler, random_noise, ENV)
File "perturbation.py", line 191, in universal_perturbation
for i, (images, labels) in tqdm(enumerate(data_loader[args.universal_train_target]), total=len(data_loader[args.universal_train_target])):
KeyError: 'train_subset'
Thus, I added the option --universal_train_target train_dataset to fix this error. Is this the right way to get the class-wise perturbation?
BTW, there are two typos (--perturb_type samplewse => --perturb_type samplewise) in README.md.