
vicregl's People

Contributors

adrien987k, garridoq


vicregl's Issues

Fine-tuning VICRegL for COCO object detection

Thank you for your excellent contributions and ideas. I have attempted to fine-tune the checkpoints you provided on downstream detection tasks. With VICReg I achieved an mAP of 38.0, but with VICRegL only 35.5. I followed common practice and used Detectron2's R_50_FPN_1x.yaml config for training. Are these results reasonable?

Would you be open to discussing the specifics of the downstream task and the performance difference between VICReg and VICRegL?

Missing keys in ConvNeXt state dict

Hello, thanks for this wonderful paper and the code/models!

I tried to load one of the models, but it seems some keys are missing from the ConvNeXt state dict (resnet50 is working). Here's the full error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [21], in <cell line: 1>()
----> 1 get_convnext_small_alpha0p9()

Input In [20], in get_convnext_small_alpha0p9()
    126 def get_convnext_small_alpha0p9():
--> 127     model = torch.hub.load('facebookresearch/vicregl', 'convnext_base_alpha0p9')
    128     return model

File ~/.pyenv/versions/3.8.10/envs/tvmealvisionapi/lib/python3.8/site-packages/torch/hub.py:404, in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs)
    401 if source == 'github':
    402     repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
--> 404 model = _load_local(repo_or_dir, model, *args, **kwargs)
    405 return model

File ~/.pyenv/versions/3.8.10/envs/tvmealvisionapi/lib/python3.8/site-packages/torch/hub.py:433, in _load_local(hubconf_dir, model, *args, **kwargs)
    430 hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
    432 entry = _load_entry_from_hubconf(hub_module, model)
--> 433 model = entry(*args, **kwargs)
    435 sys.path.remove(hubconf_dir)
    437 return model

File ~/.cache/torch/hub/facebookresearch_vicregl_main/hubconf.py:67, in convnext_base_alpha0p9(pretrained, **kwargs)
     62 if pretrained:
     63     state_dict = torch.hub.load_state_dict_from_url(
     64         url="https://dl.fbaipublicfiles.com/vicregl/convnext_base_alpha0.9.pth",
     65         map_location="cpu",
     66     )
---> 67     model.load_state_dict(state_dict, strict=True)
     68 return model

File ~/.pyenv/versions/3.8.10/envs/tvmealvisionapi/lib/python3.8/site-packages/torch/nn/modules/module.py:1497, in Module.load_state_dict(self, state_dict, strict)
   1492         error_msgs.insert(
   1493             0, 'Missing key(s) in state_dict: {}. '.format(
   1494                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   1496 if len(error_msgs) > 0:
-> 1497     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1498                        self.__class__.__name__, "\n\t".join(error_msgs)))
   1499 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for ConvNeXt:
	Missing key(s) in state_dict: "stages.0.0.gamma", "stages.0.1.gamma", "stages.0.2.gamma", "stages.1.0.gamma", "stages.1.1.gamma", "stages.1.2.gamma", "stages.2.0.gamma", "stages.2.1.gamma", "stages.2.2.gamma", "stages.2.3.gamma", "stages.2.4.gamma", "stages.2.5.gamma", "stages.2.6.gamma", "stages.2.7.gamma", "stages.2.8.gamma", "stages.2.9.gamma", "stages.2.10.gamma", "stages.2.11.gamma", "stages.2.12.gamma", "stages.2.13.gamma", "stages.2.14.gamma", "stages.2.15.gamma", "stages.2.16.gamma", "stages.2.17.gamma", "stages.2.18.gamma", "stages.2.19.gamma", "stages.2.20.gamma", "stages.2.21.gamma", "stages.2.22.gamma", "stages.2.23.gamma", "stages.2.24.gamma", "stages.2.25.gamma", "stages.2.26.gamma", "stages.3.0.gamma", "stages.3.1.gamma", "stages.3.2.gamma".
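
A possible workaround until the checkpoint is fixed (my suggestion, not an official fix) is to build the model without weights and load the state dict non-strictly; this leaves the missing layer-scale gamma parameters at their initialization, which may change the model's outputs:

import torch

# Hedged workaround: load the published weights non-strictly so that the
# missing "stages.*.gamma" layer-scale parameters keep their initialization.
model = torch.hub.load('facebookresearch/vicregl',
                       'convnext_base_alpha0p9', pretrained=False)
state_dict = torch.hub.load_state_dict_from_url(
    url="https://dl.fbaipublicfiles.com/vicregl/convnext_base_alpha0.9.pth",
    map_location="cpu",
)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", missing)        # should list only the stages.*.gamma keys
print("unexpected:", unexpected)  # should be empty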

Plots of losses for different configurations

Hi! Thanks for the amazing work!

I wanted to ask whether you have shared, or would be willing to share, plots of the training loss for the different configs. They would be really useful for building intuition about VICRegL training.

High local loss while fine-tuning resnet50

Hi Adrien,
I am trying to fine-tune the resnet50 model from the complete checkpoint (excluding the classifier) on an e-commerce image dataset.
The variance, covariance, and invariance terms of the local loss are very high (about 100x) compared to the logs you shared in the repo, while the global loss terms are in the same range as yours.

Additional info: since I am fine-tuning, I am using the same ImageNet mean and std in the transforms.

I wanted your opinion on why this could be happening:

  • There is something wrong with the local loss computation
  • More fine-tuning epochs are needed (though I don't think this would help the local loss terms)
  • The augmentations differ substantially between the pretrained model you shared and the codebase

These are the command-line args passed to start fine-tuning:
VICRegL main_vicregl.py --local_rank=0 --fp16 --exp-dir ./resnet_finetuning/ --arch resnet50 --epochs 30 --batch-size 512 --optimizer adamw --base-lr 0.00075 --weight-decay 1e-06 --size-crops 224 --num-crops 2 --min_scale_crops 0.08 --max_scale_crops 1.0 --alpha 0.75

The repo's logs for resnet50_075alpha at the 30th and 300th epochs:
{"ep": 29, "st": 18732, "lr": 1.5814172938771736, "t": "000017683", "stdr": 0.0154, "stde": 0.042409, "corr": 0.008171, "core": 0.000214, "inv_l": 3.825874, "std_l": 9.492188, "cov_l": 2.068358, "minv_l": 0.213284, "mvar_l": 0.053986, "mcov_l": 0.137415, "cls_l": 2.682755, "top1": 42.2363, "top5": 67.4316, "l": 22.093506}

{"ep": 299, "st": 187665, "lr": 0.0016000029958416245, "t": "000176214", "stdr": 0.016162, "stde": 0.007431, "corr": 0.008438, "core": 0.000113, "inv_l": 3.110444, "std_l": 3.189453, "cov_l": 3.890673, "minv_l": 0.07154, "mvar_l": 0.042328, "mcov_l": 0.203255, "cls_l": 1.158083, "top1": 72.6074, "top5": 89.1113, "l": 14.522144}

My fine-tuning log at the 30th epoch:
{"ep": 29, "st": 26650, "lr": 1.5092537293760234e-06, "t": "000031628", "stdr": 0.01549, "stde": 0.011048, "corr": 0.00642, "core": 1e-06, "inv_l": 1.40463, "var_l": 9.71875, "cov_l": 3.121696, "total_loss": 15.580208, "minv_l": 2.408733, "mvar_l": 8.0, "mcov_l": 9.176867}

I am concerned about the minv_l, mvar_l and mcov_l terms.
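
For reference when reading these terms, here is a generic sketch of the three VICReg components (invariance, variance, covariance) following the published formulation; it is an illustration, not the repo's exact code, and the m-prefixed log keys presumably apply the same quantities to the local (maps) embeddings:

import torch
import torch.nn.functional as F

def vicreg_terms(x, y, eps=1e-4):
    # Generic VICReg terms for two batches of embeddings of shape (N, D).
    inv = F.mse_loss(x, y)  # invariance: mean-squared distance between views
    std_x = torch.sqrt(x.var(dim=0) + eps)
    std_y = torch.sqrt(y.var(dim=0) + eps)
    var = (F.relu(1 - std_x).mean() + F.relu(1 - std_y).mean()) / 2  # variance hinge
    x = x - x.mean(dim=0)
    y = y - y.mean(dim=0)
    n, d = x.shape
    cov_x = (x.T @ x) / (n - 1)
    cov_y = (y.T @ y) / (n - 1)
    off_diag_sq = lambda c: (c - torch.diag(torch.diag(c))).pow(2).sum()
    cov = off_diag_sq(cov_x) / d + off_diag_sq(cov_y) / d  # covariance penalty
    return inv, var, cov

inv_l, var_l, cov_l = vicreg_terms(torch.randn(16, 32), torch.randn(16, 32))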

Looking forward to hearing your thoughts.
Thank you!

Symmetry on local and feature-based loss functions

Hello,

Thank you for your work!

I'm raising an issue because your paper says that both the location- and feature-based loss functions are symmetrized (Equation 4), but the code seems to differ.

I think a + sign is missing in the local_loss function of the VICRegL class. In local_loss, the inv, var and cov losses are overwritten rather than accumulated, and they are then divided by the number of iterations.


Best regards,
Elias.
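
To illustrate the suspected pattern (a hypothetical reconstruction, not the actual repo code; local_criterion and the pair list are stand-ins):

import torch

def local_criterion(z_a, z_b):
    # stand-in for the real local loss; returns three scalar terms
    return (z_a - z_b).pow(2).mean(), z_a.var(), z_b.var()

direction_pairs = [(torch.randn(4, 8), torch.randn(4, 8)) for _ in range(2)]

inv_loss = var_loss = cov_loss = 0.0
for z_a, z_b in direction_pairs:   # the two symmetrized directions
    inv, var, cov = local_criterion(z_a, z_b)
    inv_loss += inv                # the issue suggests the code assigns with `=`,
    var_loss += var                # which silently keeps only the last direction
    cov_loss += cov
n = len(direction_pairs)
inv_loss, var_loss, cov_loss = inv_loss / n, var_loss / n, cov_loss / n

Dividing by the iteration count only yields a correct average if the terms were accumulated with +=.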

Missing maps_projection in the checkpoint and batch_norm in the code

Hey guys,

There is no 'maps_projector' layer in the resnet50_alpha0p75 checkpoint, even though the source code has one. Also, when I load the checkpoint, its global projector has batch-norm layers, but the code's projector does not.

Here's the diff:
State dict keys from the checkpoint:
module.projector.0.weight module.projector.0.bias module.projector.1.weight module.projector.1.bias module.projector.1.running_mean module.projector.1.running_var module.projector.1.num_batches_tracked module.projector.3.weight module.projector.3.bias module.projector.4.weight module.projector.4.bias module.projector.4.running_mean module.projector.4.running_var module.projector.4.num_batches_tracked module.projector.6.weight

State dict keys from the repo's global and local projector layers:
module.maps_projector.0.weight module.maps_projector.0.bias module.maps_projector.1.weight module.maps_projector.1.bias module.maps_projector.3.weight module.maps_projector.3.bias module.maps_projector.4.weight module.maps_projector.4.bias module.maps_projector.6.weight module.projector.0.weight module.projector.0.bias module.projector.1.weight module.projector.1.bias module.projector.3.weight module.projector.3.bias module.projector.4.weight module.projector.4.bias module.projector.6.weight

Please share the complete checkpoint, if possible

Thank you!
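
A quick way to produce such a diff (a generic sketch; the filename, and the assumption that `model` was built from the repo's VICRegL class and the file holds a flat state dict, are mine):

import torch

# Hedged sketch: diff the published checkpoint's keys against the model's.
ckpt = torch.load("resnet50_alpha0.75.pth", map_location="cpu")
ckpt_keys = set(ckpt.keys())
model_keys = set(model.state_dict().keys())  # `model` built from the repo's code
print("in checkpoint only:", sorted(ckpt_keys - model_keys))
print("in model only:", sorted(model_keys - ckpt_keys))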

Dimensions of the global and local embeddings

Hardware limitations aside, I would like to ask why the paper uses a larger dimension of 8192 for the global embeddings but a smaller dimension of 512 for the local ones. Also, where does VICRegL apply average pooling?
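
On the pooling question: in a standard ResNet-50 setup the global branch average-pools the final feature maps, while the local branch keeps them per location. A minimal, generic sketch of where that pooling sits (not the repo's exact code):

import torch

maps = torch.randn(8, 2048, 7, 7)             # final feature maps (B, C, H, W)
global_repr = maps.mean(dim=(2, 3))           # global average pooling -> (B, 2048)
local_repr = maps.flatten(2).transpose(1, 2)  # per-location features -> (B, 49, 2048)
# The global expander then maps (B, 2048) into the 8192-d space, and the local
# expander maps each of the 49 location vectors into the smaller 512-d space.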

Do you really use top-k filtering on feature-based matching?

Hello, I very much appreciate that you made this great work open source!

I was wondering: do you actually use top-k filtering on feature-based matching in practice? It is clear from both the paper and the code that you use it for location-based matching, but there is an inconsistency for feature-based matching: the paper says top-k filtering is used, yet in the training settings the arg "l2_all_matching" is left at its default of 1 (True), in which case no filtering is applied.
Looking forward to your ideas!
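
For concreteness, top-gamma filtering on feature-based (L2) matching can be sketched as follows; this is a generic illustration of the technique, not the repo's implementation, and gamma (the number of pairs kept) is a free parameter:

import torch

def topk_l2_matches(z1, z2, gamma):
    # z1: (N, D) local embeddings of view 1; z2: (M, D) of view 2; gamma <= N
    dists = torch.cdist(z1, z2)              # (N, M) pairwise L2 distances
    best_dist, best_idx = dists.min(dim=1)   # nearest neighbor in z2 per z1 row
    keep = best_dist.topk(gamma, largest=False).indices  # gamma closest pairs
    return z1[keep], z2[best_idx[keep]]

pairs = topk_l2_matches(torch.randn(49, 512), torch.randn(49, 512), gamma=20)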

Request for Change in Licensing Terms

Hello,

I'm truly impressed by the work you've done; it fits the use case of a project I'm currently working on. However, I noticed that the project is licensed under CC-BY-NC, which does not permit commercial use.

I understand and respect the choice to use this license, but I'm wondering if there might be a possibility of re-evaluating the license, perhaps to a version that allows for commercial use.

Thank you for your time and consideration.

Linear segmentation evaluation

Hello,

Thank you for your work. Amazing results on both classification and dense prediction tasks!

You have released the evaluation code for the classification task. Are you planning to release your linear segmentation evaluation code for the Pascal VOC dataset as well, so we can easily reproduce your results?

Sincerely,
Shirley

Unable to reproduce object detection results on Pascal VOC

Dear authors, thanks for your great work and for sharing the code!

The object detection result you report on Pascal VOC is 59.5 AP. In the paper, you say you follow DenseCL and fine-tune a Faster R-CNN detector (C4 backbone) on the Pascal VOC trainval07+12 set with the standard 2x schedule, testing on the VOC test2007 set ("Fine-tune" in the table).
I downloaded your pretrained model resnet50_alpha0.75.pth, followed the instructions on the DenseCL site (https://github.com/WXinlong/DenseCL/tree/main/benchmarks/detection), and trained the detector with "pascal_voc_R_50_C4_24k_moco.yaml", but I only got 27 AP. (When I trained from the DenseCL pretrained model with the same config, I got 58 AP, close to the numbers reported in the DenseCL paper.)
Did you use the training config file "pascal_voc_R_50_C4_24k_moco.yaml", or different training parameters? Could you please provide some guidance on reproducing your result? Thanks.
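
One thing worth checking (an assumption on my part, not guidance from the paper): the released full checkpoint appears to carry a DataParallel 'module.' prefix and projector heads (see the state-dict keys quoted in an issue above), so the backbone weights likely need extracting before the DenseCL/MoCo conversion and configs can consume them. A hedged sketch of the extraction step:

import torch

# Hedged sketch: pull the backbone weights out of the released checkpoint.
# The 'module.backbone.' prefix is a guess inferred from the 'module.projector.*'
# keys reported above; verify it against your checkpoint before relying on it.
ckpt = torch.load("resnet50_alpha0.75.pth", map_location="cpu")
prefix = "module.backbone."
backbone = {k[len(prefix):]: v for k, v in ckpt.items() if k.startswith(prefix)}
torch.save({"state_dict": backbone}, "vicregl_r50_backbone.pth")
# The result still needs DenseCL's detectron2 conversion step before training.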

Classification loss

@Adrien987k thank you for the work and the implementation.

I have noticed that you are also using a classification loss, in addition to the VICReg and VICRegL losses. The classification loss is not mentioned in the paper and was not used in the original VICReg either.

At first glance this seems a bit off: labels won't be available when pretraining in the wild, and it seems to defeat the purpose of "self-supervised" pretraining?
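
If the classifier is trained on detached features, as online linear probes typically are in self-supervised codebases, it would only monitor accuracy and contribute no gradient to the backbone; whether that is the case here is for the authors to confirm. A minimal sketch of such a probe (all names and shapes illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

backbone_dim, num_classes, batch = 2048, 1000, 8
classifier = nn.Linear(backbone_dim, num_classes)
representation = torch.randn(batch, backbone_dim)  # stand-in backbone output
labels = torch.randint(0, num_classes, (batch,))

logits = classifier(representation.detach())  # detach: no gradient reaches the
cls_loss = F.cross_entropy(logits, labels)    # backbone; only the probe learns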
