

fixmatch-pytorch's People

Contributors

kekmodel


fixmatch-pytorch's Issues

why add an interleave before the input?

Wonderful job!
I want to know what the de_interleave function does. Does performance drop when it is removed?

logits = de_interleave(logits, 2*args.mu+1)
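For context, the surrounding training step looks roughly like this (a sketch based on train.py; inputs_x is the labeled batch, and inputs_u_w / inputs_u_s are the weakly / strongly augmented unlabeled batches):

batch_size = inputs_x.shape[0]
# interleave labeled and unlabeled samples so the forward pass (and each
# GPU's BatchNorm statistics) sees a mix of both
inputs = interleave(
    torch.cat((inputs_x, inputs_u_w, inputs_u_s)), 2*args.mu+1).to(args.device)
logits = model(inputs)
# undo the interleaving so the logits can be split back into their groups
logits = de_interleave(logits, 2*args.mu+1)
logits_x = logits[:batch_size]
logits_u_w, logits_u_s = logits[batch_size:].chunk(2)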

Using FixMatch to train on my own dataset (LR = 0.03, batch size = 64), the model gets worse and worse as the unlabeled loss gradually increases: the training loss decreases, but the validation loss increases and the accuracy drops. Why does this happen?

Cannot get repo-reported accuracy for cifar10@40.5

I am following the repo instructions, kept the original code, and ran the command

python train.py --dataset cifar10 --num-labeled 40 --arch wideresnet --batch-size 64 --lr 0.03 --expand-labels --seed 5 --out results/cifar10@40.5

But I cannot get the reported accuracy. From your accuracy curve, the model reaches 90.49% accuracy at the 100th epoch, but I can only get around 76%.

Can you help me figure it out?

Lack of seed during random labeled image selection may be leading to better performance on training resumption

In the function x_u_split(...) in dataset/cifar.py, the labeled images are selected without a seed. If a training run consists of multiple starts and stops, it is possible that the total number of labeled images the model sees exceeds the set value. For instance, training on 40 labels with 2 stops will lead to 120 unique labeled images over the entire course of training, even though the model only sees 40 labeled images at a time. I think this can explain the much higher accuracy obtained by this implementation, especially for the low-label tasks.

A quick fix would be to add the snippet below before the random label selection in the x_u_split(...) function.

np.random.seed(args.seed)
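For concreteness, a minimal sketch of the seeded selection (a simplified rendition of x_u_split in dataset/cifar.py, not the exact code):

import numpy as np

def x_u_split(args, labels):
    np.random.seed(args.seed)  # the proposed fix: same subset across restarts
    label_per_class = args.num_labeled // args.num_classes
    labels = np.array(labels)
    labeled_idx = []
    unlabeled_idx = np.array(range(len(labels)))  # unlabeled set keeps every image
    for i in range(args.num_classes):
        idx = np.where(labels == i)[0]
        idx = np.random.choice(idx, label_per_class, False)
        labeled_idx.extend(idx)
    return np.array(labeled_idx), unlabeled_idx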

cosine learning rate

Hi,

It seems the last line in the get_cosine_schedule_with_warmup function should be:
return max(0., (math.cos(math.pi * num_cycles * no_progress) + 1) * 0.5)

But I am not sure about this; correct me if I am wrong. Thanks!
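For reference, a sketch of the schedule as it appears in train.py (the signature is assumed from the repo), with the proposed change shown as a comment:

import math
from torch.optim.lr_scheduler import LambdaLR

def get_cosine_schedule_with_warmup(optimizer, num_warmup_steps,
                                    num_training_steps, num_cycles=7./16.,
                                    last_epoch=-1):
    def _lr_lambda(current_step):
        # linear warmup, then cosine decay
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1, num_warmup_steps))
        no_progress = float(current_step - num_warmup_steps) / \
            float(max(1, num_training_steps - num_warmup_steps))
        # repo version: cos(7*pi*k / 16K), matching the FixMatch paper's schedule
        return max(0., math.cos(math.pi * num_cycles * no_progress))
        # proposed version: (math.cos(math.pi * num_cycles * no_progress) + 1) * 0.5
    return LambdaLR(optimizer, _lr_lambda, last_epoch)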

RuntimeError after 1 epoch

01/14/2021 19:40:10 - INFO - models.wideresnet -   Model: WideResNet 28x2
01/14/2021 19:40:10 - INFO - __main__ -   Total params: 1.47M
01/14/2021 19:40:19 - INFO - __main__ -   ***** Running training *****
01/14/2021 19:40:19 - INFO - __main__ -     Task = cifar10@4000
01/14/2021 19:40:19 - INFO - __main__ -     Num Epochs = 1024
01/14/2021 19:40:19 - INFO - __main__ -     Batch size per GPU = 64
01/14/2021 19:40:19 - INFO - __main__ -     Total train batch size = 64
01/14/2021 19:40:19 - INFO - __main__ -     Total optimization steps = 1048576
Train Epoch: 1/1024. Iter: 1024/1024. LR: 0.0300. Data: 0.253s. Batch: 0.569s. Loss: 1.2500. Loss_x: 1.2062. Loss_u: 0.0437. Mask: 0.08. : 100% 1024/1024 [09:42<00:00,  1.76it/s]
  0% 0/157 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 475, in <module>
    main()
  File "train.py", line 291, in main
    model, optimizer, ema_model, scheduler, writer)
  File "train.py", line 393, in train
    test_loss, test_acc = test(args, test_loader, test_model, epoch)
  File "train.py", line 450, in test
    prec1, prec5 = accuracy(outputs, targets, topk=(1, 5))
  File "/content/FixMatch-pytorch/utils/misc.py", line 41, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
  0% 0/157 [00:01<?, ?it/s]

Any help is appreciated!
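A likely fix, following the traceback's own suggestion, is to replace .view with .reshape in the accuracy() function of utils/misc.py:

correct_k = correct[:k].reshape(-1).float().sum(0)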

Failed to reproduce the results of your experiment

I want to reproduce the result when the number of labeled examples is 40.
I executed the following command:
python train.py --dataset cifar10 --num-labeled 40 --arch wideresnet --batch-size 256 --lr 0.01 --expand-labels --seed 5 --out results/cifar10@40.5 --gpu 1

Accuracy hovered around 77.6 and barely improved.
Are my hyperparameters set wrong?
What command did you use?

Train Epoch: 148/1024. Iter: 1024/1024. LR: 0.0294. Data: 0.036s. Batch: 0.507s. Loss: 0.1614. Loss_x: 0.0009. Loss_u: 0.1605. Mask: 0.83. : 100%|█| 1024/
Test Iter: 79/ 79. Data: 0.005s. Batch: 0.017s. Loss: 2.5019. top1: 77.20. top5: 96.61. : 100%|████████████████████████| 79/79 [00:01<00:00, 56.13it/s]
12/07/2020 18:27:21 - INFO - __main__ - top-1 acc: 77.20
12/07/2020 18:27:21 - INFO - __main__ - top-5 acc: 96.61
12/07/2020 18:27:21 - INFO - __main__ - Best top-1 acc: 77.68
12/07/2020 18:27:21 - INFO - __main__ - Mean top-1 acc: 77.20

The RandAugment Implementation Is Wrong

google-research/fixmatch#65

As confirmed by a collaborator on the official implementation, the RandAugment function should be applied all the time, not 50 percent of the time.

The only 50% chance in the original paper refers to the flips etc. of the weak augmentation, and definitely not to the strong augmentation methods.

This could explain why this repository does better than the original paper, but it isn't what the original paper did.

For those who want the true performance of the original FixMatch paper, you need to delete

if random.random() < 0.5:

from https://github.com/kekmodel/FixMatch-pytorch/blob/master/dataset/randaugment.py.
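A hedged sketch of what the patched class might look like (the structure is assumed from dataset/randaugment.py, where the augment pool holds (op, max_v, bias) tuples):

import random
import numpy as np

class RandAugmentMC:
    def __init__(self, n, m, augment_pool):
        self.n = n                        # number of ops applied per image
        self.m = m                        # magnitude upper bound
        self.augment_pool = augment_pool  # list of (op, max_v, bias) tuples

    def __call__(self, img):
        ops = random.choices(self.augment_pool, k=self.n)
        for op, max_v, bias in ops:
            v = np.random.randint(1, self.m)
            # coin flip removed: every sampled op is always applied
            img = op(img, v=v, max_v=max_v, bias=bias)
        return img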

train failed when EMA mode is off

I found that this is because model.train() is not called again after evaluation ends.
Solution: just move model.train() into the epoch loop:

model.train()
for epoch in range(args.start_epoch, args.epochs):

->

for epoch in range(args.start_epoch, args.epochs):
    model.train()

custom dataset

Do you have plans to make this repository compatible with a custom dataset, and if not, which files would need to be modified to do so?

Datasets

Why do the authors include the labeled examples in the unlabeled dataset?

Purpose of Interleave

Just wanted to know the intuition behind the interleave and deinterleave operations. How does this help?

FixMatch for Multi-label Classification

Thanks for sharing this excellent work. I just wonder if there is any way to apply this algorithm to multi-label classification. Could I simply replace the softmax with a sigmoid to implement it?

CIFAR100 has a different architecture from the official implementation

In your implementation, WRN-28-10 is used, which has about 36M parameters.
Your model definition:

FixMatch-pytorch/train.py

Lines 165 to 169 in 9044f2e

elif args.dataset == 'cifar100':
    args.num_classes = 100
    if args.arch == 'wideresnet':
        args.model_depth = 28
        args.model_width = 10

I used the following code to get the number of parameters

wrn = build_wideresnet(depth=28, widen_factor=10, dropout=0, num_classes=100)

print(f"# params: {sum(p.numel() for p in wrn.parameters()):,}")

which gives the following output:

[screenshot: the printed parameter count, roughly 36M]

In the official TensorFlow implementation, a WRN with about 23M parameters is used for CIFAR100:

[screenshot: parameter count from the official implementation, roughly 23M]
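For comparison, the ~23M figure matches WRN-28-8 and can be checked locally with the same helper (assuming build_wideresnet accepts the arguments used above):

wrn8 = build_wideresnet(depth=28, widen_factor=8, dropout=0, num_classes=100)

print(f"# params: {sum(p.numel() for p in wrn8.parameters()):,}")  # roughly 23M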

Notes

The CLI args for the official code can be found in this issue.

BatchNorm has a biased estimation

Hi :)

In the function train() in train.py, I think the batch normalization layers will produce biased estimates if you feed the concatenated inputs to the model. It should rather be something like:

inputs = torch.cat((inputs_x, inputs_u_s)).to(args.device)
targets_x = targets_x.to(args.device)
logits = model(inputs)
logits_x, logits_u_s = logits.chunk(2)
model.eval()
logits_u_w = model(inputs_u_w)
model.train()
del logits

Besides, it is mentioned in the paper that they also apply the unlabeled loss to the labeled data:

In practice, we include all labeled examples as part of unlabeled data
without using their labels when constructing U.

Accuracy lower than reported

For 40 labels on CIFAR-10, the accuracy reaches 93.38% when I run it. I used the same hyperparameters and seed 5. Is the reported accuracy for seed 5?

What does the function `interleave` do?

Hi, nice work! When reading your code, I found a function named interleave in train.py:

def interleave(x, size):
    s = list(x.shape)
    return x.reshape([-1, size] + s[1:]).transpose(0, 1).reshape([-1] + s[1:])

Could you explain this function? I do not understand why this operation is used. Thank you!
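A small self-contained demo of the pair of operations (de_interleave is assumed from the same file; it reverses interleave):

import torch

def interleave(x, size):
    s = list(x.shape)
    return x.reshape([-1, size] + s[1:]).transpose(0, 1).reshape([-1] + s[1:])

def de_interleave(x, size):
    s = list(x.shape)
    return x.reshape([size, -1] + s[1:]).transpose(0, 1).reshape([-1] + s[1:])

x = torch.arange(15)                        # e.g. 2*mu+1 = 5 groups of batch size 3
y = interleave(x, 5)                        # tensor([0, 5, 10, 1, 6, 11, ...])
assert torch.equal(de_interleave(y, 5), x)  # the round trip restores the order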

Unsupervised loss for a single class

I am trying to apply FixMatch to one-class data.
For the unsupervised loss part, I modified the code like this:

logits_u_w, logits_u_s = logits[batch_size:].chunk(2)
pseudo_label = torch.sigmoid(logits_u_w.detach_())
mask = pseudo_label.ge(args.threshold).float()
Lu = (F.binary_cross_entropy(logits_u_s, mask, reduction='none') * mask).mean()

Is that correct?
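One caveat worth checking: F.binary_cross_entropy expects probabilities in [0, 1], while logits_u_s here are raw logits. A hedged sketch of an alternative using the with-logits variant (variable names follow the snippet above; the hard-target scheme is an assumption, not a confirmed answer):

pseudo_prob = torch.sigmoid(logits_u_w.detach())
mask = pseudo_prob.ge(args.threshold).float()   # keep only confident predictions
hard_target = pseudo_prob.ge(0.5).float()       # binarized pseudo-label
# BCE-with-logits applies the sigmoid internally and is numerically stabler
Lu = (F.binary_cross_entropy_with_logits(
    logits_u_s, hard_target, reduction='none') * mask).mean()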

Accuracy curves

Hi --

Are you able to share a screenshot of the accuracy curves (test accuracy vs epoch)? I'm trying to reproduce your results, but it'd be helpful to make sure I'm on the right track since the models take so long to train.

Thanks!

Can't reproduce the result with single GPU

Just wondering if it is necessary to use 4 GPUs to reproduce the reported results? I have been struggling to get the same results with 1 GPU. Has anyone successfully reproduced the results with a single GPU?

RandAugment Implementation

Hello, I am very curious about this part of the RandAugment implementation in the RandAugmentMC class:

if random.random() < 0.5:
    img = op(img, v=v, max_v=max_v, bias=bias)

If I am understanding this right, for RandAugmentMC with n=2 there is a 25% chance of no RandAugment operator, a 50% chance of one operator, and a 25% chance of two operators being applied to a given image.

Is this what you found in the FixMatch implementation? It isn't what I understood the paper to do.

num_labeled in DistributedDataParallel

When using DistributedDataParallel, if N labeled training images and K GPUs are used, should we set num_labeled = N / K instead of N, since np.random.shuffle(idx) generates different indices in different processes?
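One possible alternative (a hedged sketch, not a confirmed answer): seed NumPy identically on every rank before the split, so that all processes draw the same labeled indices and num_labeled can stay N:

import numpy as np

np.random.seed(args.seed)  # same seed on every rank, before the split
labeled_idx, unlabeled_idx = x_u_split(args, labels)  # now identical across processes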

Performance not reproduced

I've run this code with 40 labels for CIFAR10, but I could not reproduce the reported results. I only changed the number 4000 to 40 in the command given in the USAGE part of the main page. (I got about 90% accuracy, a little lower than reported.) Is there anything else that needs to be modified?

No Validation Set

Hi, for supervised and semi-supervised methods, it is generally advised to use a separate validation set. From the code, it looks like the best test-set accuracy is reported.

Is there any specific reason that a separate validation set is not used?

PyTorch Distributed Training

Hello,

Thanks for your good work! I really enjoyed your implementation. While trying to reproduce your results, I ran into some questions about PyTorch distributed training:

  1. Why don't you reduce the losses across GPUs before calling backward? I think you would typically call torch.distributed.all_reduce to synchronize the gradients across the GPUs. Is this just missing from the code, or do you have some equivalent operation that I have overlooked?
  2. When I run the code on a single GPU and on multiple GPUs, distributed training does not make the training process faster (it is even slower). The overall batch sizes are consistent in both settings (64 for a single GPU and 4×16 for 4 GPUs). Is this expected?

Thanks for your help in advance!

total number of iterations

If you train the model with multiple GPUs, the total batch size becomes bigger (batch_size_total = batch_size * num_gpus), but the number of eval_steps in one epoch stays the same. As a result, the overall number of iterations in training is increased by a factor of the number of GPUs. In the original TensorFlow implementation, the overall number of iterations is independent of the number of GPUs, and the batch is divided across the GPUs.
I'm not 100% sure about this, but if it's right, the number of eval_steps per epoch should be reduced, or the batch should be divided across the GPUs, so that the overall number of iterations stays constant when using multiple GPUs; see the worked example below.
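A worked example of the scaling described above (hypothetical numbers; assumes one optimizer step per GPU per iteration):

batch_size = 64      # per-GPU batch size
num_gpus = 4
eval_step = 1024     # iterations per epoch, fixed regardless of GPU count
samples_per_epoch_here = batch_size * num_gpus * eval_step  # 262,144
samples_per_epoch_tf = batch_size * eval_step               # 65,536 (batch split across GPUs)
print(samples_per_epoch_here // samples_per_epoch_tf)       # -> 4x more samples per epoch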
