vikasverma1077 / manifold_mixup
Code for reproducing Manifold Mixup results (ICML 2019)
Hi! Great paper! I implemented manifold mixup, along with support for interpolated adversarial training (https://github.com/shivamsaboo17/ManifoldMixup), for any custom user-defined model using PyTorch's forward-hook functionality.
For now I select the layer uniformly at random without considering its type (batchnorm, ReLU, etc. each count as separate layers). Should there be a layer-selection rule, such as "mixup should only be applied after a conv block in a ResNet"? And if so, how would that rule extend to arbitrary custom models that users might build?
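A sketch of that hook-based approach, assuming a forward pre-hook that mixes the input to one randomly chosen layer (function and parameter names here are illustrative, not the linked repo's actual API):

```python
import random
import numpy as np
import torch
import torch.nn as nn

def mixup_at_random_layer(model, x, y, alpha=2.0, eligible=(nn.Conv2d, nn.Linear)):
    """Sketch: mix hidden states at one randomly chosen eligible layer
    via a forward pre-hook, so it works on any user-defined model."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0))          # permutation defining each sample's mixing partner
    layers = [m for m in model.modules() if isinstance(m, eligible)]
    chosen = random.choice(layers)

    def pre_hook(module, inputs):
        h = inputs[0]
        # replace the layer's input with its mixup interpolation
        return (lam * h + (1 - lam) * h[index],) + inputs[1:]

    handle = chosen.register_forward_pre_hook(pre_hook)
    try:
        out = model(x)
    finally:
        handle.remove()                        # the hook must not persist across calls
    return out, y, y[index], lam
```

The `try/finally` ensures the hook is removed even if the forward pass raises, so repeated calls keep sampling a fresh layer and a fresh lam.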
When I use the supervised PreActResNets with Manifold Mixup under torch.nn.DataParallel, it only returns the data from one of my GPUs. Is there a known incompatibility with the DataParallel module, or am I likely doing something wrong?
Hi,
I ran the semi-supervised Manifold Mixup for CIFAR-10 and got the following error. I'm using Python 2.7, torch 0.3.1, and torchvision 0.2.0:
Traceback (most recent call last):
File "main_mixup_hidden_ssl.py", line 357, in <module>
train(epoch)
File "main_mixup_hidden_ssl.py", line 216, in train
lam = lam.data.cpu().numpy().item()
ValueError: can only convert an array of size 1 to a Python scalar
I saw that lam here is an array of size 2, not 1, so I replaced item() with item(0) in line 216; then a similar error appears:
File "main_mixup_hidden_ssl.py", line 357, in <module>
train(epoch)
File "main_mixup_hidden_ssl.py", line 241, in train
mixedup_target = target_a*lam.expand_as(target_a) + target_b*(1-lam.expand_as(target_b))
File "/home/wei.z/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 433, in expand_as
return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (10) must match the existing size (2) at non-singleton dimension 1. at /pytorch/torch/lib/THC/generic/THCTensor.c:340
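The asterisks in line 241 were likely stripped by markdown rendering; the intended computation is a lam-weighted soft target. A minimal self-contained sketch of that computation with a scalar lam (the helper name is mine, not the repo's):

```python
import torch

def mix_targets(target_a, target_b, lam):
    """Soft-label interpolation: mixed = lam * y_a + (1 - lam) * y_b.
    target_a / target_b are one-hot (batch, num_classes) tensors and
    lam is a plain Python float, so no expand_as is needed at all."""
    return target_a * lam + target_b * (1.0 - lam)
```

With lam as a float scalar, the size-2 array (one lam gathered per GPU under DataParallel) and the `expand_as` size mismatch both disappear.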
I am attempting to reproduce your CIFAR10 supervised results from Table 1 (https://arxiv.org/pdf/1806.05236.pdf) using code from this repository. I cannot get within 0.5% of the following results:
For example, the paper is vague on details such as the initial learning rate, batch size, whether Nesterov momentum is used, and other settings. Could you kindly provide the command-line invocations needed to reproduce those results?
Also, when running your training code I see test-error variation of around 0.3-0.5% from epoch to epoch. Do you report results over multiple seeds, or are these figures single-seed estimates of the test error?
Hi,
I would like to know the reason for using BCE instead of cross-entropy. Is this critical to Manifold Mixup? Is it also the reason you train for 2000 epochs, which is much longer than common training schedules?
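One plausible (unconfirmed) motivation: mixup produces soft targets, and older `nn.CrossEntropyLoss` only accepted hard class indices, whereas BCE works with soft one-hot vectors directly. A sketch contrasting the two losses on mixed targets (not the authors' stated reason):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
y_a = F.one_hot(torch.tensor([1, 2, 3, 4]), 10).float()
y_b = F.one_hot(torch.tensor([5, 6, 7, 8]), 10).float()
lam = 0.3
mixed = lam * y_a + (1 - lam) * y_b           # soft targets in [0, 1]

# BCE accepts the soft targets as-is:
bce = F.binary_cross_entropy(torch.sigmoid(logits), mixed)

# A soft-target cross-entropy can be written out manually instead:
ce_soft = -(mixed * F.log_softmax(logits, dim=1)).sum(1).mean()
```

Note that BCE treats each class as an independent binary problem (sigmoid per class), while the soft cross-entropy keeps the classes coupled through the softmax; whether that difference matters to Manifold Mixup is exactly the question being asked.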
Hi, the original code in your file (models/resnet.py) reads:
out = x
if layer_mix == 0:
#out = lam * out + (1 - lam) * out[index,:]
out, y_a, y_b, lam = mixup_data(out, target, mixup_alpha)
#print (out)
out = F.relu(self.bn1(self.conv1(x)))
out = self.layer1(out)
At line 109, after the hidden mixup at layer 0, you use x as the input to the self.conv1() layer.
Shouldn't that be changed to self.conv1(out)?
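If the report is right, the corrected forward would feed the mixed `out` into conv1 rather than the raw `x`. A minimal self-contained sketch of the fix (TinyNet and this mixup_data stand-in are illustrative, not the repo's code):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixup_data(x, y, alpha):
    """Minimal stand-in for the repo's mixup_data helper."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[index], y, y[index], lam

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)

    def forward(self, x, target=None, mixup_alpha=None, layer_mix=None):
        out = x
        y_a, y_b, lam = target, target, 1.0
        if layer_mix == 0:
            out, y_a, y_b, lam = mixup_data(out, target, mixup_alpha)
        # the fix: pass the (possibly mixed) `out` to conv1, not the raw `x`
        out = F.relu(self.bn1(self.conv1(out)))
        return out, y_a, y_b, lam
```

With `self.conv1(x)`, the layer-0 mixup would be computed and then silently discarded, which is why the question matters.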
I want to copy the code from this project and apply the method to other tasks.
Thank you very much.
@vikasverma1077
I didn't find anything in the code like nn.Dropout to determine whether the model is in training mode.
I have a question about the number of epochs. Why do you use 600-2000 epochs to validate the superiority of your method? That number seems very large; I usually train on these tiny datasets for only 200 epochs. Is there a reason for this setting?
Best
Hi,
in manifold_mixup_hidden_ssl.py, line 215:
When using two GPUs with DataParallel, lam has dimension two, so .item() cannot convert it to a single scalar.
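One workaround, sketched under the assumption that the dimension-two lam comes from DataParallel gathering one value per replica: sample lam on the host before the forward pass so every replica shares a single Python float (helper names below are mine):

```python
import numpy as np
import torch

def host_side_lam(alpha=2.0):
    """Sample lam once on the host, outside the replicated module, so
    DataParallel never has to gather per-GPU lam values at all."""
    return float(np.random.beta(alpha, alpha))   # plain float: no .item() needed

def collapse_lam(lam_tensor):
    """If you already have a gathered per-GPU lam tensor of shape [n_gpus],
    collapsing it is only safe when every replica used the same value."""
    assert torch.allclose(lam_tensor, lam_tensor[0].expand_as(lam_tensor)), \
        "replicas drew different lam values; sample lam on the host instead"
    return lam_tensor[0].item()
```

If each replica samples its own lam inside forward(), averaging or indexing the gathered vector silently mislabels half the batch, which is why sampling on the host is the safer pattern.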
I was trying to implement mixup-hidden for PreActResNet18. I found that when calculating accuracy you compare the mixed-up output with the original target instead of the re-weighted one, which makes accuracy look low during training. I don't understand what that accuracy signifies or implies?
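For reference, many mixup implementations report training accuracy as a lam-weighted combination against both targets, which avoids the deflation described above. A sketch of that re-weighted metric (an assumption about what's intended here, not necessarily this repo's code):

```python
import torch

def mixup_accuracy(pred, y_a, y_b, lam):
    """lam-weighted training accuracy: a prediction counts as lam-correct
    against y_a and (1 - lam)-correct against y_b. Comparing against the
    raw target alone understates accuracy on heavily mixed batches."""
    correct_a = pred.eq(y_a).float().sum()
    correct_b = pred.eq(y_b).float().sum()
    return (lam * correct_a + (1 - lam) * correct_b) / pred.size(0)
```

Either way, the clean test accuracy (computed on unmixed inputs) is the number that actually matters; the training-time figure is mostly a progress indicator.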
Thanks for sharing your work for all to reproduce!!
I was wondering if you had the extra plotting code to reproduce figure 1a, 1b from the paper. It would be very much appreciated.
Thanks in advance!!
Hi there
I cloned the repo, pre-computed the ZCA matrix, and ran your command as follows on a single GPU using Python 2.7, torch 0.3.1, and torchvision 0.2.0:
python main_mixup_hidden_ssl.py --dataset cifar10 --optimizer sgd --lr 0.1 --l2 0.0005 --nesterov --epochs 1000 --batch_size 100 --mixup_sup 1 --mixup_usup 1 --mixup_sup_hidden --mixup_usup_hidden --mixup_alpha_sup 0.1 --mixup_alpha_usup 2.0 --alpha_max 1.0 --alpha_max_at_factor 0.4 --net_type WRN28_2 --schedule 500 750 875 --gammas 0.1 0.1 0.1 --exp_dir exp1 --data_dir ../data/cifar10/
However I get a final test error of around 18%, whereas in your paper you report around 10%.
Is there something I'm missing here?
Thanks in advance
Liam
[This is similar to #5, but with the current code base and more networks.]
I am trying to recreate the Manifold Mixup CIFAR10 results, it seems that Manifold Mixup is a very promising development! I'm using the command lines from the project's README.md. I'm using Windows10, TitanXP, Python 3.7, PyTorch nightly (1.2, 7/6/2019), torchvision 0.3, and other packages the same or (mostly) slightly newer. My manifold_mixup version is 10/16/2019.
I only had to make one slight change for torchvision 0.3: get_sampler(train_data.targets, ...) instead of get_sampler(train_data.train_labels, ...).
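That change can be made version-agnostic with a small shim, since torchvision renamed the CIFAR10 label attribute from `train_labels` to `targets` (the helper name is illustrative):

```python
def get_labels(dataset):
    """Version-agnostic label access for torchvision datasets:
    newer releases expose .targets, older ones .train_labels."""
    if hasattr(dataset, "targets"):
        return dataset.targets
    return dataset.train_labels
```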
Below, I show the test results from your paper alongside the results that I got. "End" is the final test error; "Best" is the best test error during the run. The "z" column is a z-score based on the mean μ and stdev σ from the arXiv paper and my results: a negative z-score means my run had a lower test error, a positive one a higher test error. CLFR = "Command Line From README.md".
The results are mixed, and I'm not sure why; I thought you might have some thoughts. I'm seeing:
I accidentally tried Manifold Mixup without mixup_hidden for WRN28-10 (i.e. mixup, alpha=2.0), and actually got the mean result reported in the paper.
Any ideas? Some questions:
| CIFAR 10 | Err μ | Err σ | Tm [hrs] | End Err | End z | Best Iter | Best Err | Best z | CLFR |
|---|---|---|---|---|---|---|---|---|---|
| PreActResNet18 | | | | | | | | | |
| No Mixup | 4.83 | .066 | 28.5 | 4.59 | -3.6 | 642 | 4.4 | -6.5 | Y |
| AdaMix (Guo) | 3.52 | | | | | | | | |
| Input Mixup (Zhang) | 4.2 | | | | | | | | |
| Input Mixup (α = 1) | 3.82 | 0.048 | 30 | 3.43 | -8.1 | 1687 | 3.15 | -14.0 | Y |
| Manifold Mixup (α = 2) | 2.95 | 0.046 | 32 | 3.18 | 5.0 | 1640 | 3.01 | 1.3 | Y |
| PreActResNet34 | | | | | | | | | |
| No Mixup | 4.64 | .072 | | | | | | | |
| Input Mixup (α = 1) | 2.88 | 0.043 | 44 | 3.21 | 7.7 | 1159 | 2.99 | 2.6 | Y |
| Manifold Mixup (α = 2) | 2.54 | 0.047 | 45 | 2.7 | 3.4 | 1230 | 2.47 | -1.5 | Y |
| Wide-Resnet-28-10 | | | | | | | | | |
| No Mixup | 3.99 | .118 | 19 | 4.12 | 1.1 | 299 | 3.89 | -0.8 | Y |
| Input Mixup (α = 1) | 2.92 | .088 | 20.5 | 2.79 | -1.5 | 367 | 2.76 | -1.8 | Y |
| Manifold Mixup (α = 2) | 2.55 | .024 | 19 | 2.97 | 17.5 | 353 | 2.82 | 11.3 | Y |
| Manifold Mixup (α = 2), without mixup_hidden | 2.55 | .024 | 18.5 | 2.73 | 7.5 | 391 | 2.55 | 0.0 | N |
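The z column above can be reproduced as (observed - μ) / σ, e.g. for the PreActResNet18 No Mixup row:

```python
def z_score(observed_err, mu, sigma):
    """z column from the table: how many of the paper's stdevs the observed
    test error sits from the paper's mean (negative = better than paper)."""
    return (observed_err - mu) / sigma

# PreActResNet18, No Mixup: end error 4.59 vs paper's 4.83 +/- .066
print(round(z_score(4.59, 4.83, 0.066), 1))   # -> -3.6
```

Treating the paper's σ as the true run-to-run stdev is itself an assumption; with only a few seeds on each side, |z| values this large are suggestive rather than conclusive.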
Also, here is a plot of the test error for each of the scenarios above. (The pink wrn28_10_mixup_alpha=0 curve is shortened and offset to the left because it's from a restart.) Notably:
Hi! Thanks so much for the well organised code!
Just wanted to double-check the torch and torchvision versions. For the supervised models, torchvision 0.2.1 appears to give the error in the title. This seems to be confirmed by another post mentioning that ".targets" is only available from torchvision 0.4.1? Perhaps there were some retroactive library updates?
Warm regards,