
cmc's Issues

Why use learning rate 30~50

I saw your note and it seems rather unusual to use such a large learning rate:

Note: When training linear classifiers on top of ResNets, it's important to use large learning rate, e.g., 30~50.

Is there something I'm missing? I can't imagine how you get stable gradient descent with such high learning rates.

Pre-trained weights for linear classifier available?

Hey there, thanks for the well-documented code!

Quick question: Am I correctly assuming that in order to evaluate the model on the 1,000-class ImageNet validation dataset one has to train the linear classifier first (using LinearProbing.py)? If so, would it be possible to release pre-trained weights for the classifier as well, such that one can use classifier.load_state_dict(checkpoint['classifier'])?

shuffle-bn has no effect on single-GPU

It appears to me that shuffle-bn has no effect when run on a single GPU.

Example:

import torch
import torch.nn as nn

(B, C, H, W) = 4, 3, 2, 2

# Two identical BN layers; model2 will see a shuffled copy of model1's batch.
model1 = nn.Sequential(nn.BatchNorm2d(C))
model2 = nn.Sequential(nn.BatchNorm2d(C))
print("Before:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)

shuffle_ids = torch.randperm(B).long()
x1 = torch.randn(B, C, H, W) * 3 + 1
x2 = x1[shuffle_ids]  # same samples, different order within the batch

# BN statistics are computed over the whole batch, so a permutation
# within the batch leaves them unchanged:
model1(x1)
model2(x2)
print("After:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)
Before:
  model1 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
  model2 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
After:
  model1 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
  model2 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])

I guess another approach is necessary on a single GPU. Any thoughts?

Thanks for releasing this code.
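
For reference, one workaround seen in single-GPU reimplementations of MoCo is to compute BN statistics over sub-batches, so that shuffling actually changes which samples share statistics. A minimal sketch (SplitBatchNorm2d and num_splits are illustrative names, not part of this repo; note that the running stats are updated once per chunk):

import torch
import torch.nn as nn

class SplitBatchNorm2d(nn.BatchNorm2d):
    """Normalize each of num_splits chunks of the batch with its own
    statistics, emulating the per-GPU statistics that shuffle-BN relies on."""
    def __init__(self, num_features, num_splits=2, **kwargs):
        super().__init__(num_features, **kwargs)
        self.num_splits = num_splits

    def forward(self, x):
        if not self.training:
            return super().forward(x)
        # Each chunk is normalized independently; combined with a batch
        # shuffle, different samples end up sharing statistics.
        chunks = x.chunk(self.num_splits, dim=0)
        outs = [super(SplitBatchNorm2d, self).forward(c) for c in chunks]
        return torch.cat(outs, dim=0)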

A question about the dataset.

Dear authors,

I recently read your paper, and I think it is really interesting and significant.
So I want to look into the details of the method by running the code.

I mainly focus on graph representation learning, recommendation, and ML.
I am not familiar with the image dataset and processing.

Could you point me to the datasets needed to run the code? Or where can I download the ImageNet-100 and STL-10 datasets?

Thanks.

Xu Chen

Is it possible to train CMC by loading a pretrained ResNet-50 model on ImageNet?

I enjoyed reading the paper. Thanks for open sourcing the code.

Please let me know whether I can train the CMC model (ResNet-50 variant) by loading a ResNet-50 pretrained on ImageNet.

Also, if I want to train on a custom dataset with a custom number of classes, what changes to the hyperparameters would you suggest?

Code to reproduce NYU RGBD results / input pipeline

Hi,
thanks for your repo.
It would be nice if you could provide the code / input pipeline you used to run the NYU RGB-D experiments as well (similar to #4 ). It is not entirely clear to me how you added the different modalities.
Best,

TenCrop Results

Are there any ten-crop results? As far as I know, some methods improve a lot with ten-crop evaluation while others improve only a little. I wonder how much improvement ten-crop yields for CMC.

Unable to reproduce full ImageNet accuracies of pretrained weights for CMC ResNet50v2 and MoCo

Hi @HobbitLong,

Thanks for such clean and readable code.

I am interested in using the pre-trained weights that you were kind enough to provide. I downloaded CMC_resnet50v2.pth and MoCo_softmax_16384_epoch200.pth, then ran the linear evaluation code with the following commands, but couldn't reproduce the accuracies. The accuracies at the final (60th) epoch are 62.0% for CMC and 57.3% for MoCo; they should be 64.1% (from the CMC paper) and 59.4% (from the README).

CUDA_VISIBLE_DEVICES=9 python LinearProbing.py --dataset imagenet \
 --data_folder /datasets/imagenet_nfs1 \
 --save_path ./output/cmc_linear \
 --tb_path ./output/cmc_linear \
 --model_path ./pretrained/CMC_resnet50v2.pth \
 --model resnet50v2 --learning_rate 30 --layer 6

CUDA_VISIBLE_DEVICES=8 python eval_moco_ins.py --dataset imagenet \
 --data_folder /datasets/imagenet_nfs1 \
 --save_path ./output/moco_linear \
 --tb_path ./output/moco_linear \
 --model_path ./pretrained/MoCo_softmax_16384_epoch200.pth \
 --model resnet50 --learning_rate 30 --layer 6

Have I missed something? Do I need to change the default hyperparameters to get the reported numbers?

Thanks

For loops in AliasMethod

https://github.com/HobbitLong/CMC/blob/master/NCE/alias_multinomial.py#L8

Hi!

While reading your code, I noticed that the for loops in the initialization function of AliasMethod cause a lot of computation.

However, the only call site (https://github.com/HobbitLong/CMC/blob/master/NCE/NCEAverage.py#L13) instantiating the class passes torch.ones, which results in self.prob being all ones and self.alias all zeros in AliasMethod.

What could go wrong if I just set them to ones and zeros directly, instead of running the for loops, when initializing AliasMethod?

Thanks for sharing the code :) (and RepDistiller too!)
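
To make the question concrete: assuming the input really is torch.ones, the alias tables degenerate, as this minimal check suggests (the variable names are mine):

import torch

K = 16384
probs = torch.ones(K)  # what NCEAverage passes in

# Walker's alias method stores K * p_i in prob; for a uniform distribution
# K * p_i == 1 everywhere, so prob is all ones and alias is never consulted.
prob = probs * K / probs.sum()            # == torch.ones(K)
alias = torch.zeros(K, dtype=torch.long)  # arbitrary, unused when prob == 1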

Implementing CMC on CIFAR-10

Hi @HobbitLong, I am trying to implement CMC on CIFAR-10 with a shallow ResNet. However, the accuracy only reaches 60%~70%. I have tried tuning the batch size from 64 to 512 and the learning rate from 0.01 to 0.12. In addition, I also tuned nce_k from 8192 to 65536. Unfortunately, the accuracy has not improved. Do you have any suggestions for tuning parameters on small datasets like CIFAR-10? Thank you very much.

Something went wrong when evaluating the results on ImageNet

When I evaluated the result on ImageNet (not the subset), I got the following error:

THCudaCheckWarn FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCStream.cpp line=50 error=59 : device-side assert triggered

Does anyone have any thoughts about this issue?

ImageNet100 subset

Hi,
Would you please share the subset of ImageNet (ImageNet-100) you used?
I want to train the MoCo model and compare it with your results!
Thanks!

Curious about the RandomResizedCrop parameters (minimum crop scale in your code)

Hi, thank you for sharing the code! I am curious about the effect of the data augmentation, specifically the RandomResizedCrop in train_moco_ins.py.
In your code, the minimum crop scale is 0.2 for most configurations but 0.08 (the torchvision default) for the full ImageNet dataset with ResNet. However, other papers such as non-parametric instance discrimination also set it to 0.2 when using a ResNet backbone, so I am curious about the choice of 0.08. Does the smaller minimum scale work better on full ImageNet? Have you compared 0.08 against 0.2 with a ResNet backbone?
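
For reference, the two minimum-scale settings being compared, as a torchvision sketch (values from the question, not this repo's exact pipeline):

from torchvision import transforms

# scale[0] = 0.08 is the torchvision default; 0.2 is what the
# instance-discrimination line of work typically uses.
crop_default = transforms.RandomResizedCrop(224, scale=(0.08, 1.0))
crop_larger_min = transforms.RandomResizedCrop(224, scale=(0.2, 1.0))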

Reproducing MoCo on ImageNet-1k.

Hi Yonglong,

Thanks a lot for the great work and for sharing the code. I am trying to reproduce the results of MoCo on ImageNet-1k with ResNet-50. Did you reproduce the results from Kaiming's paper on full ImageNet? Would you kindly share the specific configuration for reproducing MoCo-ResNet-50?

Thanks a lot!

Support for resnet as backbone

Hi, thanks for open-sourcing the code. I wanted to know when you will enable support for ResNet as a backbone.

Other available views?

Could you please release the code for using views other than "L" and "ab" in the CMC training process?

Code to recreate STL10 experiments

I enjoyed reading the paper and thanks for uploading the code.

Quick question - would it be possible to also upload the scripts to run the STL-10 eval?

Thanks!

Support for DistributedDataParallel

Hi

Thank you for sharing this great work with us.

I saw that you use spawn in the code, so I am wondering about your plans to release code supporting DistributedDataParallel. In particular, I am curious how you would sync the memory banks for L and ab, e.g., self.register_buffer('memory_ab'), during training.

Thank you :)

ImageNet100

Hi,
I'm confused about how to get the ImageNet-100 dataset online, since I can't find any download link.
Could you please share the link?

Thank you.

Convention for number of convolutions in AlexNet

Hi there,

This is a bit of a meta-question.

I noticed that your code uses the original AlexNet parameters, i.e., convolutions with 96, 256, 384, 384, 256 channels, versus the "one weird trick" paper's 64, 192, 384, 256, 256, which is the standard in the official PyTorch implementation.

In comparison, Feng et al. at CVPR 2019 use the smaller version of AlexNet in their code.

I was wondering whether there is a standard for which version of AlexNet should be used in the self-supervised literature, and whether it even makes a difference.

Thanks

Why use memory feature for positive samples?

Hi! Thanks for your code!

I have some questions about your implementation. I notice that negative samples come from the memory bank because 4,096 negatives are too many for one batch. But for positive samples, why still use the memory bank rather than the features computed in the current batch? Is there any harm in doing this?

Thanks

During the training, the loss and probs seem bad.

After 126 epochs of training, the loss still seems huge, and the probs for "L" and "ab" are only about 0.007. We set the learning rate to 6e-2 and the batch size to 1024 (8 Tesla V100s).

Train: [126][930/1252] BT 0.827 (0.953) DT 0.001 (0.234) loss 6.161 (6.071) l_p 0.007 (0.007) ab_p 0.006 (0.006)
torch.Size([1024, 16385, 1])
Train: [126][940/1252] BT 0.630 (0.951) DT 0.001 (0.232) loss 5.945 (6.071) l_p 0.007 (0.007) ab_p 0.006 (0.006)
torch.Size([1024, 16385, 1])

I don't know what's wrong with our experiment setting. Could you share the curves of training loss and probs of 'L' and 'ab'?

ImageNet-100

Hi @HobbitLong, could you please supply the classes you use for the ImageNet-100 dataset? Thanks. Is ImageNet-100 the same as ImageNet except for the number of classes?

High Values of Z_L and Z_ab

Hi,
Thanks for open-sourcing your work. I have been trying to use CMC on my custom toy dataset, which has two views (image (3D) and sensor view (3D)). I am able to run the model successfully, but the Z for view 1 and view 2 is being set to 119973150195712.

I made sure to L2-normalize the final features from each of the AlexNet halves, but I'm really not sure why the Z values are being initialized to such a high value. I kept nce_m, nce_k, and nce_t the same as in your code.

Can you please help me with this? Thank you.

Question about data augmentation and memory bank

Hi, Thanks a lot for sharing this great code.
I have a question about data augmentation and the memory bank. If we use data augmentation, the features stored in the memory bank are not updated to match the current augmentations, especially for the positive examples we draw from the memory bank.
Have you thought about it?

Expected values of `ins_prob` and `ins_loss` in MoCo when training is working

Hi there

Thanks a lot for this great repo!

I am trying out MoCo on my own dataset (I also added additional augmentations). Training appears to have converged, but the max value I get for ins_prob is about 13.35, and the lowest value I get for loss is about 0.2422.

I am wondering what metrics you got when training on ImageNet? I am not sure what a "good" score should look like.

Here are screenshots of the training progress in TensorBoard (ignore the multiple lines at the start of training).

[two TensorBoard screenshots omitted]

Thanks,
Liam

Accuracy on ImageNet using Resnet50v1

Hi,
Thanks for the released code. I want to check something that has been puzzling me.
Does 'resnet50v2' represent 'ResNet-50' in Table 2 of the paper?
Does 'resnet50v3' represent 'ResNet-50 x2' in Table 2 of the paper?
If so, I would like to know whether you have trained 'resnet50v1' on ImageNet. Could you please share the results?

Thanks.

Questions about NCEAverage.py

Hi @HobbitLong , thank you for releasing the code. I wanted to ask a few questions regarding the implementation of NCEAverage.py. I understand some of them might be pretty basic questions but hopefully the answers will also help others to understand the code + implementation better.

  • What is the purpose of T=0.07, and why do out_l and out_ab need to be divided by T?

  • Is there any advantage to starting out with unit vectors (on average) via stdv = 1. / math.sqrt(inputSize / 3) here? I ask because out_l and out_ab need to be normalized anyway, as is done here.

  • Is it correct that you use a moving average (MA) to update weight_l and weight_ab (instead of just copying the values directly) because the model itself is learning and the values l and ab can be noisy? Using an MA reduces variance.

  • As a follow-up, how would this implementation be possible if you were not using memory banks? Is this an incidental advantage of using a memory bank?

  • [Resolved] Why did you not use a gradient-descent-based method to implement NCE? Was it done to reduce the overhead of everything that needed to be learned?

  • [Resolved] Lastly, since NCEAverage has no parameters or nn layers, I believe you don't need with torch.no_grad() here.

Thank you again.
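
To make the first question concrete, here is a small standalone illustration of what dividing by T does (made-up similarity values, not from the repo):

import torch

# Dividing similarities by a small temperature sharpens the softmax,
# so the positive (index 0) dominates the contrastive objective.
sims = torch.tensor([0.9, 0.1, 0.05])
print(torch.softmax(sims, dim=0))         # ~[0.53, 0.24, 0.23], nearly flat
print(torch.softmax(sims / 0.07, dim=0))  # ~[1.00, 0.00, 0.00], sharply peaked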

Reproduce MoCov2 on ImageNet 1k

Hi @HobbitLong, I am trying to reproduce MoCo v2 on ImageNet-1k. Have you tried replacing the linear projection head with an MLP? Do you think it is necessary to add a batch normalization layer or a bias to the fully connected layer? I keep all the hyperparameters the same as in the paper but could only get ~61.4% accuracy with 4 GPUs and a batch size of 256.

Would you kindly share with me the specific configurations based on your codebase for reproducing MoCov2-ResNet-50?

Thanks a lot!
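
For context, the projection head described in the MoCo v2 paper is a two-layer MLP with a ReLU in between; a sketch with the standard ResNet-50 dimensions (this follows the paper, not this repo's code):

import torch.nn as nn

# 2048-d backbone features -> 2048-d hidden -> 128-d embedding,
# replacing the single linear head of MoCo v1.
mlp_head = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),
)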

Using dot product as a proxy for probability in NCEAverage

Hi, it seems that you are using the dot product between vectors from the two views as a proxy for the unknown distribution denoted pd in your paper. In other words, your hθ is the dot product. Theoretically any hθ can work, so that's fine.

But doesn't it force the two representations to be similar? I understand the two representations should have high mutual information, but that is not the same as the two vectors pointing in similar directions.

Obviously it worked out pretty well, but do you think a parameterized NCEAverage loss would have allowed representations with less similar directions but still high MI?

Thank you again!
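
For reference, my reading of the critic from the paper, with f_{\theta_1}, f_{\theta_2} the two encoders and \tau the temperature:

$$ h_\theta(\{v_1, v_2\}) = \exp\left( \frac{f_{\theta_1}(v_1) \cdot f_{\theta_2}(v_2)}{\lVert f_{\theta_1}(v_1) \rVert \, \lVert f_{\theta_2}(v_2) \rVert} \cdot \frac{1}{\tau} \right) $$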

Question about softmax loss

CMC/NCE/NCECriterion.py

Lines 35 to 46 in 58d06e9

class NCESoftmaxLoss(nn.Module):
    """Softmax cross-entropy loss (a.k.a., info-NCE loss in CPC paper)"""
    def __init__(self):
        super(NCESoftmaxLoss, self).__init__()
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        bsz = x.shape[0]
        x = x.squeeze()
        label = torch.zeros([bsz]).cuda().long()
        loss = self.criterion(x, label)
        return loss

Hi, I have a question about using the softmax loss instead of the NCE loss.
In that function, every label is set to zero, including for the critic value of the positive sample, which sits at index 0 of the batch.
I want to know the reason. My take is that the label should be [1, 0, 0, 0, ...]. Shouldn't it?
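
One PyTorch detail that may be relevant when reading this: nn.CrossEntropyLoss takes class indices rather than one-hot vectors, so a target of 0 encodes "the correct class sits at index 0", which is where the positive's critic value is placed. A minimal demo (made-up logits):

import torch
import torch.nn as nn

logits = torch.tensor([[5.0, 1.0, 0.5, 0.2]])  # positive logit first
target = torch.tensor([0])                     # an index, not a one-hot vector
print(nn.CrossEntropyLoss()(logits, target))   # small: the positive already dominates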

Could You Please Share the Curve of Training Loss?

Hi,
I want to use CMC in my own experiment, but the loss is strange. At each epoch, the loss decays as normal (like from 20 to 11). But at the next epoch, the loss becomes nearly the same as begining (the loss is 20 again). I wonder if it is 'normal' in CMC.

Thanks.

The loss label of NCESoftmaxLoss in NCECriterion.py?

Hi,

I see your code for NCESoftmaxLoss as follows:

#########
class NCESoftmaxLoss(nn.Module):
    """Softmax cross-entropy loss (a.k.a., info-NCE loss in CPC paper)"""
    def __init__(self):
        super(NCESoftmaxLoss, self).__init__()
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        bsz = x.shape[0]
        x = x.squeeze()
        label = torch.zeros([bsz]).cuda().long()
        loss = self.criterion(x, label)
        return loss

###########
The label for this loss is label = torch.zeros([bsz]).cuda().long(), but in your paper, according to Eq. 2 [equation image omitted], you have one positive for each sample.

So is something missing here?

Thanks.

Question about NCECriterion. py

loss = - (log_D1.sum(0) + log_D0.view(-1, 1).sum(0)) / bsz

I think the purpose of using NCE is to avoid the expensive summation over the entire vocabulary in the softmax. But in your implementation there is still a summation over the entire log_D0, which confuses me. I'd appreciate it if you could explain this.
I'm new to this field, so please point out my misunderstanding if there is one.
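
For concreteness, my reconstruction of what that line computes, with N the batch size and m the number of noise samples per instance (so the second sum runs over the m noise samples, not the full dictionary):

$$ \mathcal{L}_{\mathrm{NCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ \log D_1(x_i) + \sum_{j=1}^{m} \log D_0(x'_{i,j}) \right] $$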

Unable to run pre-trained models

Hello,

Thanks for making this code available. I am trying to run the pretrained AlexNet model (downloaded from the Dropbox link) with the following command:

python LinearProbing.py --dataset imagenet --data_folder /share/ctn/users/jwl2182/imagenet_data --save_path . --model_path /home/jwl2182/CMC/CMC_alexnet.pth --model alexnet --learning_rate 0.1 --layer 5 --tb_path /home/jwl2182/CMC/tb --gpu 0

But I get the following error. Any ideas what might be happening?

RuntimeError: Error(s) in loading state_dict for MyAlexNetCMC:
Unexpected key(s) in state_dict: "encoder.module.l_to_ab.conv_block_1.1.num_batches_tracked", "encoder.module.l_to_ab.conv_block_2.1.num_batches_tracked", "encoder.module.l_to_ab.conv_block_3.1.num_batches_tracked", "encoder.module.l_to_ab.conv_block_4.1.num_batches_tracked", "encoder.module.l_to_ab.conv_block_5.1.num_batches_tracked", "encoder.module.l_to_ab.fc6.1.num_batches_tracked", "encoder.module.l_to_ab.fc7.1.num_batches_tracked", "encoder.module.ab_to_l.conv_block_1.1.num_batches_tracked", "encoder.module.ab_to_l.conv_block_2.1.num_batches_tracked", "encoder.module.ab_to_l.conv_block_3.1.num_batches_tracked", "encoder.module.ab_to_l.conv_block_4.1.num_batches_tracked", "encoder.module.ab_to_l.conv_block_5.1.num_batches_tracked", "encoder.module.ab_to_l.fc6.1.num_batches_tracked", "encoder.module.ab_to_l.fc7.1.num_batches_tracked".
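
A workaround I would try (hypothetical sketch, assuming the only mismatch is the num_batches_tracked buffers that newer PyTorch versions add to BatchNorm; model stands for the MyAlexNetCMC instance built by LinearProbing.py):

import torch

# Load on CPU and drop the offending buffers before loading the weights.
checkpoint = torch.load('CMC_alexnet.pth', map_location='cpu')
weights = checkpoint.get('model', checkpoint)  # the key name may differ
weights = {k: v for k, v in weights.items()
           if not k.endswith('num_batches_tracked')}
model.load_state_dict(weights)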

Augmenting images at the evaluation of downstream classification task

CMC/eval_moco_ins.py

Lines 372 to 397 in 58d06e9

for idx, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    if opt.gpu is not None:
        input = input.cuda(opt.gpu, non_blocking=True)
    input = input.float()
    target = target.cuda(opt.gpu, non_blocking=True)

    # ===================forward=====================
    with torch.no_grad():
        feat = model(input, opt.layer)
        feat = feat.detach()

    output = classifier(feat)
    loss = criterion(output, target)

    acc1, acc5 = accuracy(output, target, topk=(1, 5))
    losses.update(loss.item(), input.size(0))
    top1.update(acc1[0], input.size(0))
    top5.update(acc5[0], input.size(0))

    # ===================backward=====================
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

If I understand the inner workings of eval_moco_ins.py correctly, the code trains the downstream task (a single FC layer) on augmented images (train_transform == 'CJ').

This augmentation not only slows down training of the downstream task but also seems to violate the purpose of the evaluation ("we freeze the features and train a supervised linear classifier", as stated in the MoCo paper).

Wouldn't it be more correct to save the center-cropped, average-pooled features and train the FC layer on those fixed features?
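
For reference, the deterministic pipeline this suggestion implies (the usual torchvision ImageNet evaluation transform; the values shown are the standard ones, not necessarily this repo's):

from torchvision import transforms

# Resize, center-crop, and normalize: no randomness, so cached features
# would stay fixed across epochs.
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])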
