HobbitLong / PyContrast
PyTorch implementation of Contrastive Learning methods
Thanks for your paper; I think it's fundamental work!
However, I'm not clear on why I(y; x) = I(y; v1, v2), as stated in the proof of Proposition A.1. Could you please explain the reasoning behind it? Maybe it's a naive question.
I hope for your reply!
Best wishes!
Thank you for sharing the code for MoCo. I have two questions about using MoCo in a segmentation task.
In a segmentation task, the input of the network is H*W*3, and the last two feature maps are H*W*C -> H*W*1. The final feature map in segmentation is two-dimensional, rather than one-dimensional as in a classification task. How can we calculate the InfoNCE loss for this two-dimensional feature map?
If the feature map is one-dimensional, f_q and f_k are both 1*n, and f_q*f_k.T returns a single value. But if f_q and f_k are m*n, f_q*f_k.T returns a matrix.
My thought is that we can reshape f_q/f_k from m*n to 1*mn, so that f_q*f_k.T again returns a single value. Am I right? Please correct me if my thought is wrong.
In a classification task, the output should be the same for any augmented image. For an image X and a rotated image Y, the outputs are feature_X and feature_Y, and feature_X should equal feature_Y because they belong to the same class. In that case, feature_X*feature_Y.T should return 1.
However, if imageY = rotation(imageX), then feature_Y should be rotation(feature_X), and feature_X*feature_Y.T should not be 1. How can we deal with this problem? Or should we inversely rotate feature_Y to get irotated_feature_Y and compute feature_X*irotated_feature_Y.T?
Any suggestion is appreciated.
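For what it's worth, the flattening idea above can be sketched in NumPy. This is only a toy illustration (shapes, temperature, and the dense per-location variant are my own framing, not taken from the repo): option 1 reshapes the map into one vector, option 2 treats each spatial location as its own contrastive sample.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 16                      # toy stand-ins for H*W spatial positions and C channels
f_q = rng.standard_normal((m, n))
f_k = rng.standard_normal((m, n))

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Option 1: reshape m*n -> 1*mn, so f_q @ f_k.T is again a single scalar.
global_sim = float(l2norm(f_q.reshape(-1)) @ l2norm(f_k.reshape(-1)))

# Option 2: contrast densely, one sample per spatial location;
# the diagonal of the logit matrix holds the positive pairs.
q, k = l2norm(f_q, axis=1), l2norm(f_k, axis=1)
tau = 0.07                        # temperature, as commonly used in MoCo
logits = q @ k.T / tau            # shape (m, m)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
info_nce = -np.mean(np.log(np.diag(exp) / exp.sum(axis=1)))
```

Option 1 gives one positive pair per image; option 2 keeps spatial structure, which may matter for segmentation, at the cost of treating nearby locations as negatives.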
Is there a pretrained CMCv2 model available for download somewhere? I would be very interested in that. I am assuming that the CMC.pth model in Dropbox is not that one, though a CMCv2 option does exist in datasets/util.py (aug == 'E').
Thank you for the great repo!
Good job! I was wondering how much gain Jigsaw can bring?
Hey there, first of all a huge thank you for providing the pre-trained model, that's pretty cool :) One question regarding the detection weights:
I failed to reproduce the results for rand init. 1x and 2x (all other checkpoints work fine). From the log I saw that the rand init. 1x and 2x models were trained in "RGB" format with the corresponding mean and std. The configs default to BGR with a std of 1.0, which results in a box/mask AP of ~0. Strangely, even when I use RGB format with the correct mean and std, I only get a box/mask AP of 0.257 / 0.235 (reported: 32.8 / 29.9) for the 1x model and 0.304 / 0.278 (reported: 38.4 / 34.7) for the 2x model. The rand init. 6x model seems to be trained in BGR and works fine with the config.
Am I missing something here? Maybe these are not the final checkpoints?
Thanks in advance
Hello @HobbitLong
Thanks for the great work!
We would like to use your pre-trained resnet50 models in our toolkit. Would it be possible to release the weights for the classifier?
Best,
Kantharaju.
Hello again, @HobbitLong.
Following the code, it looks like shuffle BN is applied within each node via the self.local_group variable, but it could also be applied across all nodes.
Could you tell me why you chose this approach?
My guess is that it shuffles faster without hurting performance, but if we run this code on multiple nodes with one GPU each, shuffle BN would have no effect.
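For context, the shuffle/unshuffle bookkeeping behind shuffle BN can be simulated without any distributed setup. This is only a sketch of the index logic (group size and shapes are made up), not the repo's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
group_size, per_gpu, dim = 4, 8, 3                    # assumed: 4 GPUs in one local group
x = rng.standard_normal((group_size * per_gpu, dim))  # all-gathered key batch

# Shuffle samples across the whole group and remember how to undo it.
shuffle_idx = rng.permutation(len(x))
unshuffle_idx = np.argsort(shuffle_idx)
shards = x[shuffle_idx].reshape(group_size, per_gpu, dim)

# Each "GPU" computes BN statistics on its own shard, which now contains
# samples mixed in from other devices, so per-device statistics cannot
# leak which key matches which query.
shards = (shards - shards.mean(1, keepdims=True)) / (shards.std(1, keepdims=True) + 1e-5)

# After encoding, the keys are restored to their original order.
restored = x[shuffle_idx][unshuffle_idx]
assert np.allclose(restored, x)
```

With one GPU per node and node-local shuffling, the permutation would only ever move samples within a single device, which is exactly the concern raised above.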
Hello, @HobbitLong
Thanks for this well-packaged project. Can you give me some advice on inspecting intermediate outputs while debugging, to help me understand the code better? DDP seems to make it hard to set breakpoints.
Thanks.
Hi, @alldbi,
Thanks for your interest! The view learning
experiments will be released in a separate repo later.
Originally posted by @HobbitLong in #3 (comment)
Firstly, my apologies that this repo currently contains much more than "InfoMin", and that the view learning experiments are not released here. I just realized that I should have a separate repo for "InfoMin" to host fun experiments. I will do this in the future.
If you have a specific question about the "InfoMin" paper, you can leave it under this issue.
The function load_encoder_weights assumes the pretrained checkpoint's state_dict has keys of the form "module.encoder.*" (for CMC, "module.encoder1.*" and "module.encoder2.*"). I verified that this is not the case for MoCo and InfoMin; I didn't check the other models.
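One possible workaround is to normalize the checkpoint keys before loading. This is only a hedged sketch: the prefix strings come from the observation above, and the helper name and the example dict are made up for illustration.

```python
def strip_known_prefixes(state_dict, prefixes=("module.encoder.", "module.")):
    """Drop the first matching prefix from each key; keep unmatched keys as-is.

    Order matters: longer prefixes must come first, otherwise "module."
    would match before "module.encoder.".
    """
    cleaned = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        cleaned[key] = value
    return cleaned

# Hypothetical checkpoint fragment mixing both key layouts.
ckpt = {"module.encoder.conv1.weight": 1, "module.fc.weight": 2}
print(strip_known_prefixes(ckpt))
```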
When trying to run main_contrast.py, I got the following error:

```
File "/home/someguy/anaconda2/envs/contrast/lib/python3.8/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
```
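This error typically appears when multiprocessing's spawn start method is launched from an entry point that cannot be cleanly re-imported (for example an interactive session or a notebook). A minimal sketch of the usual remedy, guarding the entry point; the worker logic here is a made-up placeholder, not the repo's code:

```python
import multiprocessing as mp

def worker(rank):
    # Placeholder for per-process work (e.g. one DDP rank).
    return rank * rank

if __name__ == "__main__":
    # spawn re-imports this module in every child process, so anything
    # that starts processes must live behind this guard.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(2)))
```

Running the script directly as a plain `.py` file (rather than from a REPL or notebook) usually resolves the missing `__spec__` attribute.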
Hello,
I apologize for the newbie question.
I'm trying to run some experiments with this repo, in particular instance discrimination. I looked at ImageFolderInstance, but I'm not sure how to structure my dataset. For instance discrimination, should I create a dataset folder containing sub-folders with only one image each (i.e., a single class associated with a single image, as described in the paper), or is it enough to have a single folder containing all the images of my dataset, so that instance discrimination is based directly on the index returned by ImageFolderInstance?
Just to be more clear:
First Structure:
Second Structure
Thanks for the help.
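For what it's worth, either layout can work as long as each sample carries a stable index. A minimal, torchvision-free sketch of the second structure (one flat folder, with the index acting as the instance label); the class name, file names, and loading logic are placeholders of my own, not the repo's ImageFolderInstance:

```python
import os
import pathlib
import tempfile

class FlatInstanceDataset:
    """Every file in one folder is its own instance; the index is the label."""
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, name) for name in os.listdir(root)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # A real version would decode the image (e.g. with PIL) and apply
        # augmentations; the path stands in for the decoded image here.
        image = self.paths[index]
        return image, index           # (sample, instance id)

# Tiny demo with a temporary folder and hypothetical file names.
root = tempfile.mkdtemp()
for name in ("img_000.jpg", "img_001.jpg"):
    pathlib.Path(root, name).touch()
dataset = FlatInstanceDataset(root)
```

The key property is that `sorted()` makes the index deterministic across epochs, so the same image always maps to the same instance id.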
Hey,
Could you please point me to the table that lists the accuracies of the pretrained models when evaluated on ImageNet with a linear classifier?
Hello,
I am wondering if you are still updating the paper list for contrastive learning?
Best,
Jizong
Hi, thanks so much for providing the detection configs.
When I run detectron2 with "Base-RCNN-C4-BN.yaml", the process complains that "Res5ROIHeadsExtraNorm" has not been registered.
I carefully read the detectron2 documentation: it has 'Res5ROIHeads' but no 'Res5ROIHeadsExtraNorm'.
Is Res5ROIHeads the same as Res5ROIHeadsExtraNorm?
MoCo v2 reports 800-epoch results with a batch size of 256, but training MoCo for 800 epochs on a single node is very time-consuming. Have you re-run MoCo v2 with multiple nodes?
I noticed that you shuffle BN within each local node. I have also implemented a local shuffle BN like this before, but I'm not sure whether, in a multi-node scenario, there will be a performance drop when combining the linear lr/batch-size scaling rule with local shuffle BN.
How to run with a single GPU? Or two GPUs in one server?
Hi,
I was wondering why the magnitude of the lr in the linear evaluation stage varies so greatly. SimCLR is a method similar to MoCo and CMC, in that all of them use a contrastive loss to train an unsupervised network, yet the linear-evaluation lr of MoCo and CMC is 30 while SimCLR's is 0.1.
According to your previous answer in the CMC project, adding a parameter-free BN before the fc layer in the linear evaluation stage brings the lr from 30 down to a normal magnitude. Does SimCLR use a similar strategy? I checked the official SimCLR code, but I could not find such an operation.
Thanks!
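The lr gap is plausibly just a feature-scale effect. Here is a toy NumPy illustration (my own, not taken from any of these repos) of how a parameter-free standardization in front of the linear classifier rescales the gradients, so a much smaller lr produces a comparable update:

```python
import numpy as np

rng = np.random.default_rng(0)
num, dim, classes = 256, 128, 10
# Frozen features from a contrastive encoder are often L2-normalized,
# so each coordinate is tiny (magnitude roughly 1/sqrt(dim)).
feats = rng.standard_normal((num, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = rng.integers(0, classes, size=num)

def grad_norm(x):
    """Cross-entropy gradient w.r.t. a zero-initialized linear classifier."""
    probs = np.full((x.shape[0], classes), 1.0 / classes)  # uniform softmax at zero init
    probs[np.arange(x.shape[0]), labels] -= 1.0            # p - y
    return np.linalg.norm(x.T @ probs / x.shape[0])        # dL/dW

def standardize(x):
    # Parameter-free BN: zero mean, unit variance per feature dimension.
    return (x - x.mean(0)) / (x.std(0) + 1e-5)

raw, normed = grad_norm(feats), grad_norm(standardize(feats))
print(raw, normed, normed / raw)   # standardization boosts the gradient magnitude
```

With tiny raw features the gradients are tiny, which is consistent with needing a huge lr like 30; after standardization the gradients grow by roughly sqrt(dim), so an ordinary lr suffices. This is an intuition sketch, not a claim about what SimCLR actually does.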
Are details or code for the toy example in the paper available? It's a very illustrative example and a good starting point for testing some intuitions.
Thanks for sharing this nice work!
Since it seems that currently no license is specified, would you mind updating license for this repo?
Hello, @HobbitLong
I have succeeded in training on the CIFAR-10 dataset, and I am now training with these methods on a custom dataset.
For example, when I apply MoCo, I change the queue length and other parameters, and I find it very hard to judge the convergence of the model.
The contrastive accuracy can reach a very high level (nearly 95%), and the loss is nearly stable. However, under linear evaluation the top-1 accuracy is only 50%. The contrastive accuracy seems to have no direct relationship with linear evaluation performance.
For instance, sometimes with a larger queue the contrastive accuracy is lower than with a smaller queue, yet the larger queue performs better under linear evaluation.
Can you explain why the contrastive accuracy is so high while the linear evaluation performance is so low? Is there anything that could help me resolve this?