HobbitLong / PyContrast
PyTorch implementation of Contrastive Learning methods
Thanks for your paper; I think it's fundamental work!
However, I'm not clear on why I(y; x) = I(y; v1, v2), as stated in the proof of Proposition A.1. Could you please explain the reasoning behind it? Maybe it's a naive question.
I hope for your reply!
Best wishes!
Thank you for sharing the code for MoCo. I have two questions about using MoCo in a segmentation task.
In a segmentation task, the input of the network is H*W*3, and the last two feature maps are H*W*C -> H*W*1. The final feature map in segmentation is two-dimensional, rather than one-dimensional as in a classification task. How can we calculate the InfoNCE loss for this two-dimensional feature map?
If the feature map is one-dimensional, f_q and f_k are both 1*n, and f_q*f_k.T returns a single value. But if f_q and f_k are m*n, f_q*f_k.T returns a matrix.
My thought is that we can reshape f_q/f_k from m*n to 1*mn, so that f_q*f_k.T again returns a single value. Am I right? Please correct me if my thought is wrong.
In a classification task, the output should be the same for any augmented image. For an image X and a rotated image Y, the outputs are feature_X and feature_Y, and feature_X should equal feature_Y because they belong to the same class. In that case, feature_X*feature_Y.T should return 1.
However, if imageY = rotation(imageX), then feature_Y should be rotation(feature_X), and feature_X*feature_Y.T should not be 1. How can we deal with this problem? Or should we inversely rotate feature_Y to get irotated_feature_Y and compute feature_X*irotated_feature_Y.T?
Any suggestion is appreciated.
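For what it's worth, the flattening idea above can be sketched in NumPy. This is only a toy illustration (shapes, temperature, and the dense per-location variant are my own framing, not taken from the repo): option 1 reshapes the map into one vector, option 2 treats each spatial location as its own contrastive sample.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 16                      # toy stand-ins for H*W spatial positions and C channels
f_q = rng.standard_normal((m, n))
f_k = rng.standard_normal((m, n))

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Option 1: reshape m*n -> 1*mn, so f_q @ f_k.T is again a single scalar.
global_sim = float(l2norm(f_q.reshape(-1)) @ l2norm(f_k.reshape(-1)))

# Option 2: contrast densely, one sample per spatial location;
# the diagonal of the logit matrix holds the positive pairs.
q, k = l2norm(f_q, axis=1), l2norm(f_k, axis=1)
tau = 0.07                        # temperature, as commonly used in MoCo
logits = q @ k.T / tau            # shape (m, m)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
info_nce = -np.mean(np.log(np.diag(exp) / exp.sum(axis=1)))
```

Option 1 gives one positive pair per image; option 2 keeps spatial structure, which may matter for segmentation, at the cost of treating nearby locations as negatives.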
Is there a pretrained CMCv2 model available for download somewhere? I would be very interested in that. I am assuming that the CMC.pth model in Dropbox is not that one, though a CMCv2 option does exist in datasets/util.py (aug == 'E').
Thank you for the great repo!
Good job! I was wondering how much gain Jigsaw can bring?
Hey there, first of all a huge thank you for providing the pre-trained model, that's pretty cool :) One question regarding the detection weights:
I failed to reproduce the results for rand init. 1x and 2x (all other checkpoints work fine). From the log I saw that the rand init. 1x and 2x models were trained in "RGB" format with the corresponding mean and std. The configs default to BGR with a std of 1.0, which results in a box/mask AP of ~0. Strangely, even when I use RGB format with the correct mean and std, I only get a box/mask AP of 0.257 / 0.235 (reported: 32.8 / 29.9) for the 1x model and 0.304 / 0.278 (reported: 38.4 / 34.7) for the 2x model. The rand init. 6x model seems to be trained in BGR and works fine with the config.
Am I missing something here? Maybe these are not the final checkpoints?
Thanks in advance
Hello @HobbitLong
Thanks for the great work!
We would like to use your pre-trained resnet50 models in our toolkit. Would it be possible to release the weights for the classifier?
Best,
Kantharaju.
Hello again, @HobbitLong.
Following the code, it looks like shuffle BN is applied within each node via the self.local_group variable, but it could also be applied across all nodes.
Could you tell me why you chose this approach?
My guess is that it shuffles faster without hurting performance, but if we run this code on multiple nodes with one GPU each, shuffle BN would have no effect.
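For context, the shuffle/unshuffle bookkeeping behind shuffle BN can be simulated without any distributed setup. This is only a sketch of the index logic (group size and shapes are made up), not the repo's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
group_size, per_gpu, dim = 4, 8, 3                    # assumed: 4 GPUs in one local group
x = rng.standard_normal((group_size * per_gpu, dim))  # all-gathered key batch

# Shuffle samples across the whole group and remember how to undo it.
shuffle_idx = rng.permutation(len(x))
unshuffle_idx = np.argsort(shuffle_idx)
shards = x[shuffle_idx].reshape(group_size, per_gpu, dim)

# Each "GPU" computes BN statistics on its own shard, which now contains
# samples mixed in from other devices, so per-device statistics cannot
# leak which key matches which query.
shards = (shards - shards.mean(1, keepdims=True)) / (shards.std(1, keepdims=True) + 1e-5)

# After encoding, the keys are restored to their original order.
restored = x[shuffle_idx][unshuffle_idx]
assert np.allclose(restored, x)
```

With one GPU per node and node-local shuffling, the permutation would only ever move samples within a single device, which is exactly the concern raised above.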
Hello, @HobbitLong
Thanks for this well-packaged project. Can you give me some advice on inspecting intermediate outputs while debugging, to help me understand the code better? DDP seems to make it hard to set breakpoints.
Thanks.
Hi, @alldbi,
Thanks for your interest! The view learning
experiments will be released in a separate repo later.
Originally posted by @HobbitLong in #3 (comment)
Firstly, my apologies that this repo currently contains much more than "InfoMin", and that the view learning experiments are not released here. I just realized that I should have a separate repo for "InfoMin" to host fun experiments. I will do this in the future.
If you have a specific question about the "InfoMin" paper, you can leave it under this issue.
The function load_encoder_weights assumes the pretrained checkpoint's state_dict has keys of the form "module.encoder.*" (for CMC, "module.encoder1.*" and "module.encoder2.*"). I verified that this is not the case for MoCo and InfoMin; I didn't check the other models.
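One possible workaround is to normalize the checkpoint keys before loading. This is only a hedged sketch: the prefix strings come from the observation above, and the helper name and the example dict are made up for illustration.

```python
def strip_known_prefixes(state_dict, prefixes=("module.encoder.", "module.")):
    """Drop the first matching prefix from each key; keep unmatched keys as-is.

    Order matters: longer prefixes must come first, otherwise "module."
    would match before "module.encoder.".
    """
    cleaned = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        cleaned[key] = value
    return cleaned

# Hypothetical checkpoint fragment mixing both key layouts.
ckpt = {"module.encoder.conv1.weight": 1, "module.fc.weight": 2}
print(strip_known_prefixes(ckpt))
```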
When trying to run main_contrast.py, I got the following error:

```
File "/home/someguy/anaconda2/envs/contrast/lib/python3.8/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
```
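This error typically appears when multiprocessing's spawn start method is launched from an entry point that cannot be cleanly re-imported (for example an interactive session or a notebook). A minimal sketch of the usual remedy, guarding the entry point; the worker logic here is a made-up placeholder, not the repo's code:

```python
import multiprocessing as mp

def worker(rank):
    # Placeholder for per-process work (e.g. one DDP rank).
    return rank * rank

if __name__ == "__main__":
    # spawn re-imports this module in every child process, so anything
    # that starts processes must live behind this guard.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(2)))
```

Running the script directly as a plain `.py` file (rather than from a REPL or notebook) usually resolves the missing `__spec__` attribute.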
Hello,
I apologize for the newbie question.
I'm trying to run some experiments with this repo, in particular instance discrimination. I looked at ImageFolderInstance, but I'm not sure how to structure my dataset. For instance discrimination, should I create a dataset folder containing sub-folders with only one image each (i.e., a single class associated with a single image, as described in the paper), or is it enough to have a single folder containing all the images of my dataset, so that instance discrimination is based directly on the index returned by ImageFolderInstance?
Just to be more clear:
First Structure:
Second Structure
Thanks for the help.
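For what it's worth, either layout can work as long as each sample carries a stable index. A minimal, torchvision-free sketch of the second structure (one flat folder, with the index acting as the instance label); the class name, file names, and loading logic are placeholders of my own, not the repo's ImageFolderInstance:

```python
import os
import pathlib
import tempfile

class FlatInstanceDataset:
    """Every file in one folder is its own instance; the index is the label."""
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, name) for name in os.listdir(root)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # A real version would decode the image (e.g. with PIL) and apply
        # augmentations; the path stands in for the decoded image here.
        image = self.paths[index]
        return image, index           # (sample, instance id)

# Tiny demo with a temporary folder and hypothetical file names.
root = tempfile.mkdtemp()
for name in ("img_000.jpg", "img_001.jpg"):
    pathlib.Path(root, name).touch()
dataset = FlatInstanceDataset(root)
```

The key property is that `sorted()` makes the index deterministic across epochs, so the same image always maps to the same instance id.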
Hey,
Could you please point me to the table that lists the accuracies of the pretrained models when evaluated on ImageNet with a linear classifier?
Hello,
I am wondering if you are still updating the paper list for contrastive learning?
Best,
Jizong
Hi, thanks so much for providing the detection configs.
When I run detectron2 with "Base-RCNN-C4-BN.yaml", the process complains that "Res5ROIHeadsExtraNorm" has not been registered.
I carefully read the detectron2 documentation: it has 'Res5ROIHeads' but no 'Res5ROIHeadsExtraNorm'.
Is Res5ROIHeads the same as Res5ROIHeadsExtraNorm?
MoCo v2 reports 800-epoch results with a batch size of 256, but training MoCo for 800 epochs on a single node is very time-consuming. Have you re-run MoCo v2 with multiple nodes?
I noticed that you shuffle BN within each local node. I have also implemented a local shuffle BN like this before, but I'm not sure whether, in a multi-node scenario, there will be a performance drop when combining the linear lr/batch-size scaling rule with local shuffle BN.
How to run with a single GPU? Or two GPUs in one server?
Hi,
I was wondering why the magnitude of the lr in the linear evaluation stage varies so greatly. SimCLR is a method similar to MoCo and CMC, in that all of them use a contrastive loss to train an unsupervised network, yet the linear-evaluation lr of MoCo and CMC is 30 while SimCLR's is 0.1.
According to your previous answer in the CMC project, adding a parameter-free BN before the fc layer in the linear evaluation stage brings the lr from 30 down to a normal magnitude. Does SimCLR use a similar strategy? I checked the official SimCLR code, but I could not find such an operation.
Thanks!
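The lr gap is plausibly just a feature-scale effect. Here is a toy NumPy illustration (my own, not taken from any of these repos) of how a parameter-free standardization in front of the linear classifier rescales the gradients, so a much smaller lr produces a comparable update:

```python
import numpy as np

rng = np.random.default_rng(0)
num, dim, classes = 256, 128, 10
# Frozen features from a contrastive encoder are often L2-normalized,
# so each coordinate is tiny (magnitude roughly 1/sqrt(dim)).
feats = rng.standard_normal((num, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = rng.integers(0, classes, size=num)

def grad_norm(x):
    """Cross-entropy gradient w.r.t. a zero-initialized linear classifier."""
    probs = np.full((x.shape[0], classes), 1.0 / classes)  # uniform softmax at zero init
    probs[np.arange(x.shape[0]), labels] -= 1.0            # p - y
    return np.linalg.norm(x.T @ probs / x.shape[0])        # dL/dW

def standardize(x):
    # Parameter-free BN: zero mean, unit variance per feature dimension.
    return (x - x.mean(0)) / (x.std(0) + 1e-5)

raw, normed = grad_norm(feats), grad_norm(standardize(feats))
print(raw, normed, normed / raw)   # standardization boosts the gradient magnitude
```

With tiny raw features the gradients are tiny, which is consistent with needing a huge lr like 30; after standardization the gradients grow by roughly sqrt(dim), so an ordinary lr suffices. This is an intuition sketch, not a claim about what SimCLR actually does.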
Are details or code for the toy example in the paper available? It's a very illustrative example and a good starting point for testing some intuitions.
Thanks for sharing this nice work!
Since it seems that currently no license is specified, would you mind updating license for this repo?
Hello, @HobbitLong
I have succeeded in training on the CIFAR-10 dataset, and I am now training with these methods on a custom dataset.
For example, when I apply MoCo, I change the queue length and other parameters, and I find it very hard to judge the convergence of the model.
The contrastive accuracy can reach a very high level (nearly 95%), and the loss is nearly stable. However, under linear evaluation the top-1 accuracy is only 50%. The contrastive accuracy seems to have no direct relationship with linear evaluation performance.
For instance, sometimes with a larger queue the contrastive accuracy is lower than with a smaller queue, yet the larger queue performs better under linear evaluation.
Can you explain why the contrastive accuracy is so high while the linear evaluation performance is so low? Is there anything that could help me resolve this?