swav's Introduction

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This code provides a PyTorch implementation and pretrained models for SwAV (Swapping Assignments between Views), as described in the paper Unsupervised Learning of Visual Features by Contrasting Cluster Assignments.

SwAV Illustration

SwAV is an efficient and simple method for pre-training convnets without using annotations. Like contrastive approaches, SwAV learns representations by comparing transformations of an image, but unlike contrastive methods it does not require computing pairwise feature comparisons. This makes the framework more efficient, since it needs neither a large memory bank nor an auxiliary momentum network. Specifically, our method simultaneously clusters the data while enforcing consistency between the cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly. Simply put, we use a “swapped” prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data.
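To make the swapped prediction concrete, here is a minimal sketch of the loss (not repository code; scores_t/scores_s are assumed feature-prototype dot products for two views, q_t/q_s their Sinkhorn codes, and the 0.1 temperature follows the paper):

import torch.nn.functional as F

def swapped_prediction_loss(scores_t, scores_s, q_t, q_s, temperature=0.1):
    # predict the code of each view from the *other* view's scores
    p_t = F.log_softmax(scores_t / temperature, dim=1)
    p_s = F.log_softmax(scores_s / temperature, dim=1)
    return -0.5 * ((q_s * p_t).sum(dim=1).mean() + (q_t * p_s).sum(dim=1).mean())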

Model Zoo

We release several models pre-trained with SwAV in the hope that other researchers might benefit from replacing the ImageNet supervised network with a SwAV backbone. To load our best SwAV pre-trained ResNet-50 model, simply do:

import torch
model = torch.hub.load('facebookresearch/swav:main', 'resnet50')
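The entry point returns a model in standard torchvision ResNet-50 format, so the usual ImageNet preprocessing applies. A minimal usage sketch (the random input tensor is purely illustrative, standing in for a real image):

import torch
from torchvision import transforms

model = torch.hub.load('facebookresearch/swav:main', 'resnet50')
model.eval()

# standard ImageNet normalization
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

x = normalize(torch.rand(3, 224, 224)).unsqueeze(0)  # stand-in for a preprocessed image
with torch.no_grad():
    out = model(x)  # shape (1, 1000): output of the final fc layer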

We provide several baseline SwAV pre-trained models with ResNet-50 architecture in torchvision format. We also provide models pre-trained with DeepCluster-v2 and SeLa-v2 obtained by applying improvements from the self-supervised community to DeepCluster and SeLa (see details in the appendix of our paper).

| method | epochs | batch size | multi-crop | ImageNet top-1 acc. | url | args |
|---|---|---|---|---|---|---|
| SwAV | 800 | 4096 | 2x224 + 6x96 | 75.3 | model | script |
| SwAV | 400 | 4096 | 2x224 + 6x96 | 74.6 | model | script |
| SwAV | 200 | 4096 | 2x224 + 6x96 | 73.9 | model | script |
| SwAV | 100 | 4096 | 2x224 + 6x96 | 72.1 | model | script |
| SwAV | 200 | 256 | 2x224 + 6x96 | 72.7 | model | script |
| SwAV | 400 | 256 | 2x224 + 6x96 | 74.3 | model | script |
| SwAV | 400 | 4096 | 2x224 | 70.1 | model | script |
| DeepCluster-v2 | 800 | 4096 | 2x224 + 6x96 | 75.2 | model | script |
| DeepCluster-v2 | 400 | 4096 | 2x160 + 4x96 | 74.3 | model | script |
| DeepCluster-v2 | 400 | 4096 | 2x224 | 70.2 | model | script |
| SeLa-v2 | 400 | 4096 | 2x160 + 4x96 | 71.8 | model | - |
| SeLa-v2 | 400 | 4096 | 2x224 | 67.2 | model | - |

Larger architectures

We provide SwAV models with ResNet-50 networks where the width is multiplied by factors of ×2, ×4, and ×5. To load the corresponding backbones you can use:

import torch
rn50w2 = torch.hub.load('facebookresearch/swav:main', 'resnet50w2')
rn50w4 = torch.hub.load('facebookresearch/swav:main', 'resnet50w4')
rn50w5 = torch.hub.load('facebookresearch/swav:main', 'resnet50w5')
| network | parameters | epochs | ImageNet top-1 acc. | url | args |
|---|---|---|---|---|---|
| RN50-w2 | 94M | 400 | 77.3 | model | script |
| RN50-w4 | 375M | 400 | 77.9 | model | script |
| RN50-w5 | 586M | 400 | 78.5 | model | - |

Running times

We provide the running times for some of our runs:

| method | batch size | multi-crop | scripts | time per epoch |
|---|---|---|---|---|
| SwAV | 4096 | 2x224 + 6x96 | * * * * | 3min40s |
| SwAV | 256 | 2x224 + 6x96 | * * | 52min10s |
| DeepCluster-v2 | 4096 | 2x160 + 4x96 | * | 3min13s |

Running SwAV unsupervised training

Requirements

This implementation targets Python 3.6 and PyTorch with torchvision; NVIDIA apex is additionally required for the --use_fp16 mixed-precision option and the LARC optimizer.

Singlenode training

SwAV is very simple to implement and experiment with. Our implementation consists of a main file, main_swav.py, which imports the dataset definition (src/multicropdataset.py), the model architecture (src/resnet50.py), and miscellaneous training utilities (src/utils.py).

For example, to train the SwAV baseline on a single node with 8 GPUs (an effective batch size of 8 × 32 = 256) for 400 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
--data_path /path/to/imagenet/train \
--epochs 400 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 3840 \
--epoch_queue_starts 15

Multinode training

Distributed training is available via Slurm. We provide several SBATCH scripts to reproduce our SwAV models. For example, to train SwAV on 8 nodes with 64 GPUs and a batch size of 4096 for 800 epochs, run:

sbatch ./scripts/swav_800ep_pretrain.sh

Note that you might need to remove the copyright header from the sbatch file to launch it.

Set up the dist_url parameter: we refer the user to the PyTorch distributed documentation (env, file, or tcp initialization methods) for setting the distributed initialization method (the dist_url parameter) correctly. In the provided sbatch files, we use the tcp init method (see * for example).

Evaluating models

Evaluate models: Linear classification on ImageNet

To train a supervised linear classifier on frozen features/weights on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar

The resulting linear classifier can be downloaded here.

Evaluate models: Semi-supervised learning on ImageNet

To reproduce our results and fine-tune a network with 1% or 10% of ImageNet labels on a single node with 8 GPUs, run:

  • 10% labels
python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "10" \
--lr 0.01 \
--lr_last_layer 0.2
  • 1% labels
python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "1" \
--lr 0.02 \
--lr_last_layer 5

Evaluate models: Transferring to Detection with DETR

DETR is a recent object detection framework that reaches competitive performance with Faster R-CNN while being conceptually simpler and trainable end to end. We evaluate our SwAV ResNet-50 backbone on object detection on the COCO dataset using the DETR framework with full fine-tuning. Here are the instructions for reproducing our experiments:

  1. Install DETR and prepare the COCO dataset following these instructions.

  2. Apply the changes highlighted in this gist to the DETR backbone file in order to load the SwAV backbone instead of the ImageNet supervised weights.

  3. Launch training from the DETR repository with run_with_submitit.py:

python run_with_submitit.py --batch_size 4 --nodes 2 --lr_backbone 5e-5

Common Issues

For help or issues using SwAV, please submit a GitHub issue.

The loss does not decrease and is stuck at ln(nmb_prototypes) (8.006 for 3000 prototypes).

It sometimes happens that the system collapses at the beginning of training and does not manage to converge. We have found the following empirical workarounds that improve convergence and avoid collapse:

  • use a lower epsilon value (--epsilon 0.03 instead of the default 0.05)
  • carefully tune the hyper-parameters
  • freeze the prototypes during first iterations (freeze_prototypes_niters argument)
  • switch to hard assignment
  • remove batch-normalization layer from the projection head
  • reduce the difficulty of the problem (fewer crops or softer data augmentation)

We now analyze the collapsing problem: it happens when all examples are mapped to the same unique representation; in other words, the convnet always produces the same output regardless of its input (it is a constant function). All examples get the same cluster assignment because they are identical, and the only valid assignment that satisfies the equipartition constraint in this case is the uniform assignment (1/K, where K is the number of prototypes). In turn, this uniform assignment is trivial to predict since it is the same for all examples. Reducing the epsilon parameter (see Eq. (3) of our paper) encourages the assignments Q to be sharper (i.e., less uniform), which strongly helps avoid collapse. However, too low a value of epsilon may lead to numerical instability.
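For intuition, here is a minimal non-distributed sketch of the Sinkhorn-Knopp normalization that produces the assignments Q (the repository's distributed_sinkhorn additionally all-reduces the sums across GPUs); note how epsilon controls the sharpness of exp(scores / epsilon):

import torch

def sinkhorn(scores, epsilon=0.05, n_iters=3):
    # scores: (B, K) dot products between L2-normalized features and prototypes
    Q = torch.exp(scores / epsilon).t()   # (K, B); smaller epsilon -> sharper Q
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)   # each prototype receives B/K mass (equipartition)
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)   # each sample becomes a distribution over prototypes
        Q /= B
    return (Q * B).t()                    # (B, K); each row sums to 1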

Training gets unstable when using the queue.

The queue is composed of feature representations from the previous batches. These lines discard the oldest feature representations from the queue and save the newest ones (i.e., from the current batch) through a round-robin mechanism. This way, the assignment problem is performed on more samples: without the queue we assign B examples to num_prototypes clusters, where B is the total batch size, while with the queue we assign (B + queue_length) examples to num_prototypes clusters. This is especially useful when working with small batches because it improves the precision of the assignment. A sketch of the mechanism follows.
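A minimal sketch of that round-robin update (illustrative names and shapes, mirroring the training loop rather than copying it):

import torch

def use_queue(queue_i, embedding_i, out, prototypes_w, bs):
    # queue_i:      (queue_length, D) past features for one assigned crop
    # embedding_i:  (bs, D) current-batch features for that crop
    # out:          (bs, K) current-batch prototype scores
    # prototypes_w: (K, D) prototype weights
    #
    # Score the queued features against the *current* prototypes and prepend them,
    # so Sinkhorn assigns (queue_length + bs) samples instead of bs.
    out = torch.cat((queue_i @ prototypes_w.t(), out))
    # Round-robin: shift the queue back by one batch and store the newest features.
    queue_i[bs:] = queue_i[:-bs].clone()
    queue_i[:bs] = embedding_i
    return queue_i, out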

If you start using the queue too early, or if you use too large a queue, this can considerably disturb training, because the queue members become too inconsistent with the current features. After introducing the queue, the loss should be lower than it was without the queue. On the following loss curve (first 30 epochs of this script) we introduced the queue at epoch 15; we observe that it lowered the loss further.

SwAV training loss batch_size=256 during the first 30 epochs

If, when introducing the queue, the loss goes up and does not decrease afterwards, you should stop your training and change the queue parameters. We recommend (i) using a smaller queue and (ii) starting the queue later in training.

License

See the LICENSE file for more details.

See also

PyTorch Lightning Bolts: Implementation by the Lightning team.

SwAV-TF: A TensorFlow re-implementation.

Citation

If you find this repository useful in your research, please cite:

@inproceedings{caron2020unsupervised,
  title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
  author={Caron, Mathilde and Misra, Ishan and Mairal, Julien and Goyal, Priya and Bojanowski, Piotr and Joulin, Armand},
  booktitle={Proceedings of Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

swav's People

Contributors

mathildecaron31, nzw0301, robbiejones96

swav's Issues

How to use the checkpoint after training?

I tried to train swav with a small dataset, and I got these generated files:

  • checkpoints
  • stats0.pkl
  • params.pkl
  • train.log

If I have the model after training, how can I use it? How do I assign an unseen image to one of the clusters, and how do I retrieve images from the same cluster?

I used this command for training:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py \
--data_path pics1 \
--epochs 5 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 3840 \
--epoch_queue_starts 15

num_prototype

Hi, thanks for your excellent work!
I have a question about num_prototype in DeepCluster-v2: what does num_prototype mean, and why can it be bigger than the number of classes?
Thanks!

How is it ensured that only full resolution views are used for code computation?

Referring to this section of the paper (figure omitted):

In the code, this part is supposedly handled with crops_for_assign:

for i, crop_id in enumerate(args.crops_for_assign):
    with torch.no_grad():
        out = output[bs * crop_id: bs * (crop_id + 1)]

        # time to use the queue
        if queue is not None:
            if use_the_queue or not torch.all(queue[i, -1, :] == 0):
                use_the_queue = True
                out = torch.cat((torch.mm(
                    queue[i],
                    model.module.prototypes.weight.t()
                ), out))
            # fill the queue
            queue[i, bs:] = queue[i, :-bs].clone()
            queue[i, :bs] = embedding[crop_id * bs: (crop_id + 1) * bs]

        # get assignments
        q = torch.exp(out / args.epsilon).t()
        q = distributed_sinkhorn(q, args.sinkhorn_iterations)[-bs:]

I am not sure how the indexing out = output[bs * crop_id: bs * (crop_id + 1)] ensures that we only operate on the full-resolution views (224/160).

Problems running eval_linear.py with the pretrained SwAV model

Hi, thanks for your excellent work! I ran into some problems when running the code.
First, I trained the SwAV model with the command python -m torch.distributed.launch --nproc_per_node=2 main_swav.py ..., and the model parameters were saved in checkpoint.pth.tar. But when I run eval_linear.py on that pretrained SwAV model with the command python -m torch.distributed.launch --nproc_per_node=2 eval_linear.py --pretrained checkpoint.pth.tar, I get some errors; the logs are:

Traceback (most recent call last):
  File "/home/yc/codes/swav/src/utils.py", line 144, in restart_from_checkpoint
    msg = value.load_state_dict(checkpoint[key], strict=False)
TypeError: load_state_dict() got an unexpected keyword argument 'strict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval_linear.py", line 397, in <module>
    main()
  File "eval_linear.py", line 201, in main
    scheduler=scheduler,
  File "/home/yc/codes/swav/src/utils.py", line 147, in restart_from_checkpoint
    msg = value.load_state_dict(checkpoint[key])
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/optim/optimizer.py", line 123, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
Traceback (most recent call last):
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)

Does this mean that something goes wrong when the optimizer is restored from the checkpoint? Could you help me? Thanks!

Pre-trained models (256 batch size, 200 epochs) without multi-crops

Hi, thanks for your excellent work! Could you kindly release the model and results pre-trained with a batch size of 256 for 200 epochs without multi-crop? I am asking because this seems to be a commonly used configuration in the literature, but it is missing both in the paper and in the repo. Some researchers raised this issue before, but it seems it has not been resolved. Given limited computing resources, I think releasing this model would help a lot. Thank you very much!

Experiments of linear classification on different numbers of GPUs

Hello,

Thanks for your inspiring paper and code.

I trained SwAV with a batch size of 4096 for 200 epochs and then trained a linear classifier with your default setting (batch size of 256 on 8 GPUs), achieving 74.5% top-1 accuracy. I wanted to speed up the linear classifier training, so I tried to train it with a batch size of 2048 on 64 GPUs and left all the other settings the same. I observed 73.3% top-1, a slight drop from your default setting.

So I am wondering how to train the linear classifier on 64 GPUs and achieve performance similar to training on 8 GPUs, e.g. by tuning some hyper-parameters? Looking forward to your reply.

Thanks

Demo running on colab

So I was trying to get this working as a prototype on Google Colab. I installed apex, and when I run

python -m torch.distributed.launch main_swav.py \
--data_path /content/data/fer/images \
--epochs  20 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 48 48 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 0 \
--epoch_queue_starts 15

I get this error:

Traceback (most recent call last):
  File "main_swav.py", line 375, in <module>
    main()
  File "main_swav.py", line 123, in main
    init_distributed_mode(args)
  File "/content/swav/src/utils.py", line 65, in init_distributed_mode
    rank=args.rank,
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 391, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'main_swav.py', '--local_rank=0', '--data_path', '/content/data/fer/images', '--epochs', '20', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '48', '48', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '0', '--epoch_queue_starts', '15']' returned non-zero exit status 1.

Thinking this might be related to python -m torch.distributed.launch, because I am obviously not using a distributed computing environment, I tried changing it to something like torch.launch, which of course does not work.

Can I get any help? Thanks.

Pre-trained checkpoint for larger architecture?

Hi! Thanks for sharing SwAV! I wonder whether you have any follow-up plans to release pre-trained weights for larger models (like Res50x4, Res152x4), which would greatly help researchers, as it might be too demanding in computational resources for us to re-train them :(

Again, thanks for your work very much :)

[resnet50w2]Size mismatch between checkpoint model and config model

I tried to load the checkpoint downloaded from resnet50w2 to do some experiments, but an error occurred. It seems the model you published doesn't match the model config in resnet.py for resnet50w2.

size mismatch for module.layer1.0.conv1.weight: copying a param with shape torch.Size([128, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for module.layer1.0.bn1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for module.layer1.0.bn2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).

The script is swav_RN50w2_400ep_pretrain.sh and the checkpoint is resnet50w2.

The learning rate of linear classification

Thanks for your awesome work.
I wonder why the learning rate is so small in linear classification(0.3 in eval_linear.py)?
In the linear classification of MoCo, the initial learning rate is 30 with a two-stage reduction. There is a 100x difference with this repo.
Have you ever run the eval_linear.py with moco v2 weights or run swav weights with the code from MoCo?
I wonder about the performance impact of the lr.

Empty clusters?

Hi @mathildecaron31

I trained a network from scratch with my own dataset and wrote some code that sorts images into different folders according to their cluster assignments. I did this with the following lines of code:

import numpy as np
import torch.nn.functional as F  # assumed import; `softmax` was undefined in the snippet

embedding, output = model(inputs)
p = F.softmax(output / args.temperature, dim=1)
prediction = p.tolist()
prototyp = []
for i in range(len(prediction)):
    prototyp.append(np.argmax(prediction[i]))

The problem is that when I save the images into different folders according to their cluster assignment, some folders remain empty. The number of folders equals the number of prototypes. I always thought the images were equally distributed between the different prototypes. What is the problem? Can you help me?

How to perform multinode training with torch.distributed.launch?

Hi, nice work! I tried to do pretraining with main_swav.py on multiple machines.

Here's the main code for distributed training.

python -m torch.distributed.launch main_swav.py --rank 0 \
--world_size 8 \
--dist_url 'tcp://172.31.11.200:23456'

I commented out lines 55-59 in src/utils.py in order to set the rank for each machine. It runs okay.

But I found that during training, only 1 GPU was used on each machine. I think it is caused by

args.gpu_to_work_on = args.rank % torch.cuda.device_count()

Could you help me figure it out?

Many thanks!

Benchmarking on CIFAR-10

Hi,

I wanted to benchmark SwAV on CIFAR-10.
Is there any recommended configuration for CIFAR-10? For example:

  • The number of prototypes could be set to 50, 100, etc.
  • Since CIFAR-10 images are 32x32, multicrop can be avoided.

Also, do you plan to publish any pretrained model on CIFAR-10?

Question regarding the license

Hi!
I wonder if I can use SwAV internally at a commercial company? We do not charge end users directly, but of course the company is for profit, and its profit may increase due to the usage of DL models.
As the name of the license suggests, I can't use it, but I would like to clarify.
Thanks!

Non-distributed training

If I run main_deepclusterv2.py in a non-distributed training mode, what modifications do I need to make?

Traceback (most recent call last):
  File "main_deepclusterv2.py", line 426, in <module>
    main()
  File "main_deepclusterv2.py", line 119, in main
    init_distributed_mode(args)
  File "/remote_projects/ImageSimilarity/swav/src/utils.py", line 56, in init_distributed_mode
    args.rank = int(os.environ["RANK"])
  File "/root/software/anaconda3/envs/similarity/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

Cannot load the pretrained models

Hi, I ran into a problem when I tried to load the pretrained ResNet-50 model. It seems that the keys in the pre-trained model and the keys in the torchvision ResNet-50 do not match. The same problem appears when I try to load other models listed in the Model Zoo table. Could you please help me with this issue? Thanks.

Here is my code:
import torch, torchvision
model = torchvision.models.resnet50()
checkpoint = torch.load('.user/swav_800ep_pretrain.pth.tar')
model.load_state_dict(checkpoint, strict=False)

When I set strict=False, the model does not load any weights and acts like a randomly initialized model.
When I set strict=True, it raises the following error:

RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict:
"conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var",
......
"layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "layer4.2.conv3.weight", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "layer4.2.bn3.running_mean", "layer4.2.bn3.running_var", "fc.weight", "fc.bias".
Unexpected key(s) in state_dict:
"module.conv1.weight", "module.bn1.weight", "module.bn1.bias", "module.bn1.running_mean", "module.bn1.running_var", "module.bn1.num_batches_tracked", "module.layer1.0.conv1.weight", "module.layer1.0.bn1.weight", "module.layer1.0.bn1.bias", "module.layer1.0.bn1.running_mean",
......
"module.projection_head.0.weight", "module.projection_head.0.bias", "module.projection_head.1.weight", "module.projection_head.1.bias", "module.projection_head.1.running_mean", "module.projection_head.1.running_var", "module.projection_head.1.num_batches_tracked", "module.projection_head.3.weight", "module.projection_head.3.bias", "module.prototypes.weight".

Generate embedding without GPU

Hi, I love the project a lot, thanks for sharing it!

However, for some reason, a GPU is not available to me. I have downloaded the pretrained model, and my ultimate goal is to get visual embeddings from it, so I am wondering if there is an easy way to do the following: if I input a single image into the model, the model outputs the corresponding embedding. Note that I don't expect to re-train or fine-tune the model, so it should still be possible to do the job without a GPU.

Also, if this is not too dumb a question, could you please specify the input/output size? Thank you very much!
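For reference, everything below runs on CPU. A minimal sketch, assuming the torch.hub model from the Model Zoo section; replacing the classifier with an identity is one simple way to expose the 2048-d pooled features:

import torch

model = torch.hub.load('facebookresearch/swav:main', 'resnet50')
model.fc = torch.nn.Identity()   # expose the 2048-d pooled features
model.eval()

x = torch.rand(1, 3, 224, 224)   # one RGB image, resized/normalized as for ImageNet
with torch.no_grad():
    embedding = model(x)
print(embedding.shape)           # torch.Size([1, 2048])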

Multi-Crop on MoCo

Good job! Thanks for sharing the code. However, I was wondering how much gain multi-crop can bring on MoCo. Have you tried it?

A design problem which may lead the model to cheat

Hi
The algorithm design: view A --> code x, view B --> code y; then view B predicts code x and view A predicts code y.
But in some experiments (on the CIFAR-10 dataset) I found that the model learns to cheat by predicting nearly the same embeddings (z in the paper) for all images, including their augmentations. This way the loss decreases rapidly, but the model learns the wrong thing.

Deepclusterv2: why always freeze the prototypes during training?

Hi,
Thanks for your nice work!

I notice in your code that

if iteration < args.freeze_prototypes_niters:

freeze_prototypes_niters is set to 300000 (deepclusterv2_800ep_pretrain.sh), which corresponds to 1000 epochs when the batch size is 4096, so the prototype parameters seem to never be updated. I think the prototypes are the same as the fully connected layer in standard classification tasks, which should be optimized. So why are they always frozen in your code?

Training time is too long; how to accelerate training?

Hi, a wonderful work, and thanks for sharing your code! I am running your code on ImageNet following your settings, but I found it takes about an hour and a half to train just one epoch, which is too slow. I want to know whether I am missing some key points that are important for speeding up training.

Finetuning with 1% and 10%

Hi,

Thanks for sharing this awesome repo.

I was wondering, in the eval script, how is the fine-tuning on 1% and 10% of ImageNet done?

Here it looks like the entire folder is used for the dataset:

train_dataset = datasets.ImageFolder(os.path.join(args.data_path, "train"))

Were 1% and 10% of ImageNet train preselected and placed into separate folders?

Choice of dimension of hidden variable

Hi, I'm curious how you chose 2048 as the dimension of the hidden layer in the projection head; it seems the ResNet in this repo outputs a tensor with 512 channels.

If my ResNet has to output a tensor with 256 channels, do you think I need to decrease it from 2048?

Thx

Fixed seed, but no reproducibility

Hi Mathilde,
Thanks for your great work. I enjoyed reading your paper!

When running main_swav.py, I experience no reproducibility of the results (although the seeds are set properly in utils.fix_random_seeds).

RUN1:
INFO - 12/07/20 09:38:11 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:38:30 - 0:00:25 - Epoch: [0][50] Loss 2.9354 (3.0861)

RUN2:
INFO - 12/07/20 09:37:31 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:37:51 - 0:00:25 - Epoch: [0][50] Loss 2.9074 (3.0710)

Do you experience the same? If yes, do you have a clue why that is the case (maybe distributed training)?

Thanks in advance!

SwAV with 100 epochs, batch size 256, 2x224 + 6x96?

Can you provide a model / results of training SwAV with batch size 256 and input size 2x224 + 6x96 for 100 epochs?

The training time is too long.

Display clustering results?

Hi,
how can I display clustering results? When I forward an image through a pretrained network, I get a vector whose length is the number of prototype vectors. Do I have to pass this vector to the distributed_sinkhorn function?
The distributed_sinkhorn function returns the probabilities for every cluster, is that correct?

prototype norm problems

Hi @mathildecaron31
As you said,
In the paper, the prototypes are indeed normalized along the first dimension because the prototype matrix, C, is of dimension DxK (i.e., 128x3000).

On the contrary, in the code, w is of dimension KxD (i.e., 3000x128). You can easily check that the normalization is done correctly by printing:

print(torch.norm(w, dim=1).shape) # should give 3000
print(torch.norm(w, dim=1)) # should give a vector with 1 everywhere

But the code goes:
self.prototypes = nn.Linear(output_dim, nmb_prototypes, bias=False)
where output_dim = 128 and nmb_prototypes = 3000.
So w is of dimension D x K (128 x 3000), which means there are 3000 prototypes and each prototype is a vector of dim 128.

Thanks

About the pretrained model RN50-w5

Thank you for sharing this wonderful work.

I conducted several experiments using the released pretrained models. However, the pretrained resnet50w5 fails to load because the batch-norm layer of the projection_head is missing in the pretrained model.

So could I just ignore this batch-norm layer when using the pretrained resnet50w5?

Thanks very much!

RuntimeError: No rendezvous handler for ://

Hello

I am trying to train a custom dataset.

I am trying to train in an environment where there is one gpu.

What's the problem?

Also, can you provide a tutorial for testing on a custom dataset?

export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU main_swav.py --data_path /home/ubuntu/merge/src/swav/data/train --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15
Traceback (most recent call last):
  File "main_swav.py", line 374, in <module>
    main()
  File "main_swav.py", line 122, in main
    init_distributed_mode(args)
  File "/home/ubuntu/merge/src/swav/src/utils.py", line 65, in init_distributed_mode
    rank=args.rank,
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 391, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/swav/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--data_path', '/home/ubuntu/merge/src/swav/data/train', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.

main_deepclusterv2.py use

Hi, I would like to use main_deepclusterv2.py to cluster new images without fine-tuning. How can I use main_deepclusterv2.py to implement the DeepCluster project (eval_voc_classif_fc6_8.sh)? Thank you!

Problem about data_path in the training code?

In main_swav.py, the dataset folder used is args.data_path, which is the root path of the ImageNet dataset (containing train, val, test):

train_dataset = MultiCropDataset(
    args.data_path,
    args.size_crops,
    args.nmb_crops,
    args.min_scale_crops,
    args.max_scale_crops,
)

eval_linear.py
train_dataset = datasets.ImageFolder(os.path.join(args.data_path, "train"))

In main_swav.py, if I set args.data_path=/path/to/imagenet, will it use all of (train, val, test) for the self-supervised pretraining? Am I right?

Deepclusterv2: How to display the results?

I have used main_deepclusterv2.py to train a model on my custom dataset, and I want to see the clustering results, for example, putting the images of the same class into the same folder. How can I do that? Thank you.

DeepCluster-v2 linear evaluation for fewer epochs

Thanks for sharing the code of such a wonderful work. Have you run experiments using DeepCluster-v2 for fewer training epochs, e.g. 100 or 200 epochs? If so, can you provide the linear evaluation top-1 acc. for such settings? Many thanks.

How to download a model with projection head

Hello, thanks for the work. As far as I can see, the pretrained models end with a fully connected layer of dim 1000. Shouldn't the 128-dim projection head be there? I want to get embeddings; how can I do this? Do you have an appropriate pretrained model with the 128-dim projection head?

Training loss history

Hi,

Thank you so much for sharing your codes!

May I know if you have a copy of your loss record?

When I trained your model from scratch, the loss was stuck around 8 for the first 2 epochs. (I am still training the model.)

Is it the same for you?

Thank you.

TypeError: optimizers must be either a single optimizer or a list of optimizers.

Hello,

I'm trying to run main_swav.py with the following command:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15

Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:

Traceback (most recent call last):
  File "main_swav.py", line 380, in <module>
    main()
  File "main_swav.py", line 189, in main
    model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
    raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
TypeError: optimizers must be either a single optimizer or a list of optimizers.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
make: *** [Makefile:69: train-rare-planes] Error 1

Immediately before the line that throws the error I placed a couple of print statements:

print("type(OPTIMIZER)", type(optimizer))
print("OPTIMIZER", optimizer)

The output from those is:

type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'>
OPTIMIZER SGD (
Parameter Group 0
    dampening: 0
    lr: 0.6
    momentum: 0.9
    nesterov: False
    weight_decay: 1e-06
)

Here are some version numbers I'm using:

Python 3.6.9 :: Anaconda, Inc.
PyTorch == 1.5.0a0+8f84ded
torchvision == 0.6.0a0
CUDA == 10.2
apex == 0.1

Any ideas why I would be seeing this error? Thanks in advance!

Warning from Training

Great work. When I train the model, each epoch prints a few lines of "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to ***". I didn't change anything. Is this normal? Thanks

Open sourcing imagenet-trained models

Thanks for open-sourcing your codebase. Would it be possible to share the final model corresponding to the ImageNet downstream task that gets 75.3% top-1 accuracy? Thanks in advance.

prototype normalize bugs

Hi,
In the paper, the pseudo-code shows the prototypes normalized along the first dimension:

    with torch.no_grad():
        C = normalize(C, dim=0, p=2)

But in the source code, the prototypes are normalized along the second dimension:

    # normalize the prototypes
    with torch.no_grad():
        w = model.module.prototypes.weight.data.clone()
        w = nn.functional.normalize(w, dim=1, p=2)
        model.module.prototypes.weight.copy_(w)

Since each column of the prototype matrix C is regarded as one cluster, shouldn't the prototypes be normalized along the first dimension (dim=0)?
Thanks
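For reference, PyTorch stores nn.Linear weights as (out_features, in_features); a two-line sanity check (not repository code):

import torch.nn as nn

w = nn.Linear(128, 3000, bias=False).weight
print(w.shape)  # torch.Size([3000, 128]): K x D, so dim=1 normalizes each prototype row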

checkpoint loading failed

Hi,

when I try to load one of your provided checkpoint models in your main_swav.py file, I always receive the warnings:
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load optimizer from checkpoint ...
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load amp from checkpoint ...
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load state_dict from checkpoint ...

What is the reason for these warnings? Isn't it possible to use your provided models for fine-tuning?

Some problems about swav experiment

I'd like to ask a few questions.

  1. Due to the limitation of GPU resources, I can only use a single GPU to run SwAV experiments. In this case, what needs to be adjusted in the experimental parameter settings? Will the performance of the pre-training model decrease significantly?

  2. How many instances are needed, at a minimum, in order to get a relatively good pre-training effect?

  3. Regarding the hyper-parameter args.nmb_prototypes: if the custom dataset has few actual categories (far fewer than 1k), is it necessary to adjust it correspondingly?

  4. In line 371 of main_swav.py, why does args.world_size appear in the code but not in the pseudo-code in the article?

Thanks again for being able to open source this code. I am looking forward to your reply.
