swintransformer / transformer-ssl Goto Github PK

618.0 618.0 66.0 1.19 MB

This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Home Page: https://arxiv.org/abs/2105.04553

License: MIT License

Python 100.00%

self-supervised-learning swin-transformer transformer

transformer-ssl's Issues

Will apex mixed precision training affect the accuracy of the model?

Thank you very much for this great paper. Will Apex mixed-precision training affect the accuracy of the model?
I tried to install Apex using the instructions in the get_started.md file, but that failed, so I ran cd apex and python setup.py install instead.

pretrain fails when image categories are similar

I want to use your work to pretrain for a few epochs on my own dataset, which contains several similar vehicle categories.
So I loaded the ImageNet-pretrained checkpoint and ran another round of pretraining on my dataset, but it fails because the loss does not decrease.
What is the reason?

Have you tried any other initial patch size in the swin transformer apart from the patch size = 4?

Hello dear authors,
Thank you for providing your work and code.

I understand from your paper that you used patch size = 4 in all your models, is there any specific reason to do that?
Did you try any larger initial patch sizes, such as 8 or 16? This would reduce the FLOPs significantly.

I am trying to further compress your network for my application and I was able to successfully do it for patch size = 4 but I was unable to retrain the model with patch size = 8 since I don't see any model with that size.
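
To make the cost argument concrete, here is a minimal sketch of how the initial patch size changes the number of tokens entering the first stage. It assumes a 224x224 input (the resolution that appears in this repository's model names, e.g. patch4_window7_224) and non-overlapping square patches; the function name num_patch_tokens is hypothetical.

```python
def num_patch_tokens(img_size: int, patch_size: int) -> int:
    """Number of non-overlapping patch tokens for a square image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    side = img_size // patch_size  # patches along one image side
    return side * side


# Token counts for the patch sizes mentioned in the question.
for p in (4, 8, 16):
    print(p, num_patch_tokens(224, p))
```

Each doubling of the patch size cuts the token count by a factor of four, which is where most of the FLOPs saving would come from.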

Any comments or suggestions would be really helpful.

Thank you!

config setting "NORM_BEFORE_MLP" takes no effect

In models/build.py:41,
the keyword passed to partial(SwinTransformer, ...) is norm_befor_mlp,
while the keyword in SwinTransformer (models/swin_transformer.py:497) is norm_before_mlp.
The former is missing the letter "e" compared with the latter.

Therefore, the 'bn' setting in the configs takes no effect.
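
One way such a typo can go unnoticed is when the target constructor accepts **kwargs: the misspelled keyword is absorbed and the default value is used. The sketch below reproduces this with a hypothetical stand-in class. FakeSwin is not the repository's SwinTransformer, and whether the real constructor accepts **kwargs is an assumption.

```python
from functools import partial


class FakeSwin:
    """Hypothetical stand-in for a model whose __init__ accepts **kwargs."""

    def __init__(self, norm_before_mlp="ln", **kwargs):
        self.norm_before_mlp = norm_before_mlp
        self.ignored = kwargs  # unknown keywords end up here silently


# Misspelled keyword: swallowed by **kwargs, so the default 'ln' is kept.
wrong = partial(FakeSwin, norm_befor_mlp="bn")()
# Correct keyword: the setting actually reaches the model.
right = partial(FakeSwin, norm_before_mlp="bn")()

print(wrong.norm_before_mlp, wrong.ignored)
print(right.norm_before_mlp, right.ignored)
```

Without **kwargs in the signature, the misspelled call would raise a TypeError immediately, which is one argument for keeping constructors strict.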

Download links of DeiT-S and Swin-T backbone models are interchanged

Download link of DeiT-S model:
https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_swin_t_300ep_pretrained.pth

Download link of Swin-T model:
https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_deit_small_300ep_pretrained.pth

Look at the last part of the download link. I think the model links should be interchanged.

Error when training on a local server with only one GPU

Script command:
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 moby_main.py --cfg configs/moby_swin_tiny.yaml --data-path ucm:8:2 --batch-size 4 --output output --tag job-tag

TypeError: 'Compose' object is not iterable

Traceback (most recent call last):
  File "moby_linear.py", line 385, in <module>
    main(config)
  File "moby_linear.py", line 174, in main
    train_one_epoch(config, model, criterion, data_loader_train, optimizer, epoch, mixup_fn, lr_scheduler)
  File "moby_linear.py", line 199, in train_one_epoch
    for idx, (samples, targets) in enumerate(data_loader):
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haoxing/Transformer-SSL/data/custom_image_folder.py", line 24, in __getitem__
    for t in self.transform:
TypeError: 'Compose' object is not iterable
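
The last frame suggests that data/custom_image_folder.py iterates over self.transform, which works for a list or tuple of transforms but not for a single callable such as a Compose object. Below is a minimal sketch of a helper that tolerates both cases. The name apply_transforms is hypothetical, and the stand-in lambdas replace real image transforms so that the example runs without torchvision.

```python
from typing import Any, Callable, Sequence, Union

Transform = Callable[[Any], Any]


def apply_transforms(transform: Union[Transform, Sequence[Transform], None], img: Any):
    """Return one output per transform.

    A list/tuple yields one view per transform (e.g. two augmented views);
    a single callable yields a one-element list; None returns the input as is.
    """
    if transform is None:
        return [img]
    if isinstance(transform, (list, tuple)):
        return [t(img) for t in transform]
    return [transform(img)]


# Stand-in transforms (assumptions for illustration only).
double = lambda x: x * 2
negate = lambda x: -x

print(apply_transforms([double, negate], 3))  # list of transforms: two views
print(apply_transforms(double, 3))            # single callable: one view
```

For linear evaluation, where one view per image is presumably expected, the caller would take the first element of the returned list.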

Strange output log

Hi authors, I have pretrained your moby_swin_tiny model using 8 Tesla V100 GPUs
and reproduced your results on the downstream tasks. I get 74.394% on linear evaluation, 43.1% on COCO object detection, and 39.3% on COCO segmentation. But the loss and grad_norm look really weird during training. Can you show me your log?
Here is my log. The loss drops to 7, then rises to 16 and never drops again. During pretraining, the average grad norm sometimes becomes infinite.
log_rank0.txt

Train MoBY-SwinT on a local machine with one GPU

I am going to train MoBY-SwinT on my custom dataset.
My machine has one GPU.
I tried a few commands, but they failed with the following errors. All packages are installed.

First try
Second try

What is the correct command to run the training script on a local machine with one GPU?

Thanks in advance.

Multi-machine training

Thanks for your work!
As shown in the markdown file, we can now pretrain Transformer-SSL via 8 GPUs and 1 node.
Do you have scripts for multi-machine training? I want to pretrain it via 64 GPUs on 8 machines.

Cannot import vit_deit_small

from timm.models import vit_deit_small_patch16_224
ImportError: cannot import name 'vit_deit_small_patch16_224' from 'timm.models' (/home/michuan.lh/miniconda3/envs/moby/lib/python3.7/site-packages/timm/models/__init__.py)

Thanks for your work. When I run your code, I get an error saying that vit_deit_small_patch16_224 cannot be imported.

How to load a checkpoint when using the swin transformer as the backbone in a Mask-RCNN model

Hi there,
I've already trained my swin transformer with your proposed SSL method and have the checkpoints saved.
I'm now trying to load my model as the backbone of a mask-rcnn model (also your mmdetection implementation from the other repository). However, I'm getting the following error.

KeyError: 'encoder.layers.0.blocks.0.attn.relative_position_bias_table'

I guess that just requires a naming conversion. I was wondering if you have the script to do so for all layers?
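
A minimal sketch of such a naming conversion follows. It assumes that the pretraining checkpoint stores the backbone weights under an "encoder." prefix, as the key in the error message suggests, and that the detection code expects the same keys without that prefix. The function name strip_prefix and the non-backbone key in the toy checkpoint are hypothetical.

```python
from typing import Any, Dict


def strip_prefix(state_dict: Dict[str, Any], prefix: str = "encoder.") -> Dict[str, Any]:
    """Keep only keys starting with `prefix` and remove that prefix."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


# Toy checkpoint with placeholder values instead of tensors.
ckpt = {
    "encoder.layers.0.blocks.0.attn.relative_position_bias_table": 0,
    "encoder.patch_embed.proj.weight": 1,
    "projector.linear.weight": 2,  # hypothetical non-backbone key, dropped
}
backbone = strip_prefix(ckpt)
print(sorted(backbone))
```

With a real checkpoint, the same function would be applied to the state dict loaded from the saved file, and the result saved again before pointing the detection config at it.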
Thanks,

The interpolation method for BYOL augmentation is wrong

In Transformer-SSL/data/build.py, inside the build_transform function, the "byol" augmentation type uses the default interpolation method for RandomResizedCrop, which is BILINEAR. However, the BYOL paper uses BICUBIC.

The 300-epoch DeiT-S and Swin-T checkpoint download links are the same

Could you please check this?

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx

Launch command:

imagenetpath=mypath
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  moby_main.py \
       --cfg configs/moby_swin_tiny.yaml --data-path ${imagenetpath} --batch-size 256

But I get the "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx" message:

[2023-10-24 17:33:21 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][290/625]  eta 0:05:52 lr 0.002772 time 0.5567 (1.0516)  loss 10.5960 (10.9174)  grad_norm 1.4802 (1.5236)  mem 45716MB
[2023-10-24 17:33:38 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][300/625]  eta 0:05:47 lr 0.002785 time 0.7607 (1.0707)  loss 10.7823 (10.9141)  grad_norm 2.3465 (1.5536)  mem 45716MB
[2023-10-24 17:33:45 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][310/625]  eta 0:05:33 lr 0.002797 time 0.9247 (1.0588)  loss 10.9386 (10.9140)  grad_norm 3.8597 (1.6136)  mem 45716MB
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
[2023-10-24 17:33:53 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][320/625]  eta 0:05:20 lr 0.002810 time 0.5590 (1.0518)  loss 11.4219 (10.9264)  grad_norm 3.9233 (inf)  mem 45716MB
[2023-10-24 17:34:00 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][330/625]  eta 0:05:07 lr 0.002823 time 0.5751 (1.0412)  loss 11.6204 (10.9487)  grad_norm 2.7699 (inf)  mem 45716MB
[2023-10-24 17:34:09 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][340/625]  eta 0:04:55 lr 0.002836 time 0.5561 (1.0365)  loss 11.2880 (10.9609)  grad_norm 2.3273 (inf)  mem 45716MB
[2023-10-24 17:34:16 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][350/625]  eta 0:04:42 lr 0.002849 time 0.5530 (1.0271)  loss 11.0601 (10.9651)  grad_norm 0.9230 (inf)  mem 45716MB
[2023-10-24 17:34:23 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][360/625]  eta 0:04:30 lr 0.002861 time 0.5628 (1.0200)  loss 10.9609 (10.9669)  grad_norm 0.8707 (inf)  mem 45716MB
[2023-10-24 17:34:30 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][370/625]  eta 0:04:17 lr 0.002874 time 0.5648 (1.0094)  loss 10.9728 (10.9655)  grad_norm 1.9388 (inf)  mem 45716MB
[2023-10-24 17:34:36 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][380/625]  eta 0:04:04 lr 0.002887 time 0.5568 (0.9993)  loss 10.8801 (10.9645)  grad_norm 0.6718 (inf)  mem 45716MB
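
In the log above, the current grad_norm stays finite while the value in parentheses becomes inf. If the parenthesized value is a running average (an assumption based on the log format), a single non-finite gradient norm from an overflowed, skipped step is enough to keep it at inf for the rest of the epoch. A minimal sketch, using a hypothetical AverageMeter that is not the repository's class:

```python
import math


class AverageMeter:
    """Running average; optionally ignores non-finite values."""

    def __init__(self, skip_non_finite: bool = False):
        self.skip_non_finite = skip_non_finite
        self.sum = 0.0
        self.count = 0

    def update(self, val: float) -> None:
        if self.skip_non_finite and not math.isfinite(val):
            return  # leave the average untouched for inf/nan inputs
        self.sum += val
        self.count += 1

    @property
    def avg(self) -> float:
        return self.sum / self.count if self.count else 0.0


# Illustrative values: one overflowed step among otherwise normal norms.
norms = [1.4802, 2.3465, float("inf"), 3.9233, 2.7699]
plain, robust = AverageMeter(), AverageMeter(skip_non_finite=True)
for n in norms:
    plain.update(n)
    robust.update(n)

print(plain.avg)   # stays inf once a single inf has been added
print(robust.avg)  # average over the finite values only
```

Under this reading, the inf in the log is a reporting artifact of the skipped steps and does not necessarily mean that later updates are broken.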

AttributeError: TRAINING_IMAGES

Traceback (most recent call last):
  File "main.py", line 347, in <module>
    main(config)
  File "main.py", line 80, in main
    model = build_model(config)
  File "/home/featurize/work/STSL/models/build.py", line 65, in build_model
    pred_num_layers=config.MODEL.MOBY.PRED_NUM_LAYERS,
  File "/home/featurize/work/STSL/models/moby.py", line 77, in __init__
    self.K = int(self.cfg.DATA.TRAINING_IMAGES * 1. / dist.get_world_size() / self.cfg.DATA.BATCH_SIZE) * self.cfg.TRAIN.EPOCHS
  File "/environment/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: TRAINING_IMAGES

Hello everyone! How can I solve this problem?
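
The traceback shows that models/moby.py reads DATA.TRAINING_IMAGES from the config to compute self.K, so the config in use apparently does not define that key. The sketch below only restates the expression from the traceback as a plain function to show what the value is used for. The function name total_steps is hypothetical, and the example numbers (ImageNet-1K training set size, 8 processes, batch size 256 per process, 300 epochs) are assumptions.

```python
def total_steps(training_images: int, world_size: int, batch_size: int, epochs: int) -> int:
    """Iterations per epoch (floored) times the number of epochs."""
    return int(training_images * 1.0 / world_size / batch_size) * epochs


# Assumed values: ImageNet-1K train size, 8 GPUs, per-GPU batch 256, 300 epochs.
print(total_steps(1281167, 8, 256, 300))
```

For a custom dataset, training_images would be the number of images in that dataset's training split.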

Question about the detection/segmentation results

Hi there,
Congrats on the nice work and thanks for providing the code.
I have a question about the experiments you conducted on downstream tasks (detection and segmentation).
For the detection/segmentation results reported in Table 3, did you perform SSL on ImageNet-1K and then use the models as backbones and simply train on COCO? No SSL on COCO data, right?

And if so, could that be a reason why the MoBY model is not outperforming the supervised model?
What I'm trying to understand is whether we can expect a model that is SSL-trained on a large unannotated dataset, and then trained on the downstream tasks on a labeled portion of the same data, to perform significantly better than a model trained solely in a supervised fashion on the annotated portion. Any insight is appreciated.

Best,

dataloader error

When I used moby_main.py for training, memory usage on Linux grew until the process crashed. What is the reason, and how can I solve it?

The error is:
Traceback (most recent call last):
  File "moby_main.py", line 236, in <module>
    main(config)
  File "moby_main.py", line 121, in main
    train_one_epoch(config, model, data_loader_train, optimizer, epoch, lr_scheduler)
  File "moby_main.py", line 151, in train_one_epoch
    scaled_loss.backward()
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True) # allow_unreachable flag
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2605) is killed by signal: Killed.

Some questions about relative_position_index and attn_mask

Wonderful job! I recently read your code and have some questions about the Swin model in swin_transformer.py. Concretely, I can't understand how relative_position_index and attn_mask are computed. Is there anything I can refer to, or could you explain them?
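
For relative_position_index, one way to read the computation is as follows: for every pair of positions inside a window, take the row and column offsets, shift them so they start at zero, and flatten the 2-D offset into a single index into a bias table with (2*Wh-1)*(2*Ww-1) entries. The sketch below is a pure-Python restatement of that idea as commonly described for Swin. It is not the repository's tensor code, and the function signature is an assumption.

```python
def relative_position_index(wh: int, ww: int):
    """Index into a (2*wh-1)*(2*ww-1) bias table for every pair of window positions."""
    coords = [(h, w) for h in range(wh) for w in range(ww)]  # flattened row-major
    index = []
    for (hi, wi) in coords:
        row = []
        for (hj, wj) in coords:
            dh = hi - hj + (wh - 1)  # shift row offset to start at 0
            dw = wi - wj + (ww - 1)  # shift column offset to start at 0
            row.append(dh * (2 * ww - 1) + dw)  # flatten the 2-D offset
        index.append(row)
    return index


# Small 2x2 window: 4 positions, 9 distinct relative offsets.
for row in relative_position_index(2, 2):
    print(row)
```

Pairs with the same relative offset share an index, so the diagonal (offset zero) always maps to the same table entry.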
