transformer-ssl's People

Contributors

ancientmooner, caoyue10, impiga, kamalkraj, microsoftopensource, zdaxie, zeliu98

transformer-ssl's Issues

pretrain fails when image categories are similar

I want to use your work to run a few epochs of pretraining on my dataset, which contains several similar vehicle categories.
So I loaded the ImageNet-pretrained checkpoint and ran another round of pretraining on my dataset, but it fails because the loss does not decrease.
What is the reason?

Have you tried any other initial patch size in the swin transformer apart from the patch size = 4?

Hello dear authors,
Thank you for providing your work and code.

I understand from your paper that you used patch size = 4 in all your models. Is there any specific reason for that?
Did you try any larger initial patch sizes, such as 8 or 16? This would reduce the FLOPs significantly.

I am trying to further compress your network for my application. I was able to do it successfully for patch size = 4, but I was unable to retrain the model with patch size = 8 since I don't see any model with that size.

Any comments or suggestions would be really helpful.

Thank you!
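
To put a rough number on the FLOPs remark above: the count of first-stage tokens falls with the square of the patch size. The sketch below assumes a 224x224 input (the image size is an assumption here), and num_patch_tokens is a hypothetical helper, not something from the repository.

```python
def num_patch_tokens(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


# Compare the token counts for the patch sizes mentioned in the question.
for patch_size in (4, 8, 16):
    print(patch_size, num_patch_tokens(224, patch_size))
```

Going from patch size 4 to 8 cuts the number of tokens by a factor of four under this assumption, which is where most of the compute saving would come from.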

config setting "NORM_BEFORE_MLP" takes no effect

In models/build.py:41,
the keyword passed to partial(SwinTransformer, ...) is norm_befor_mlp,
while the keyword in SwinTransformer (models/swin_transformer.py:497) is norm_before_mlp.
The former is missing the letter "e" in "before".

Therefore, the 'bn' setting in the configs has no effect.
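
A misspelled keyword like this can go unnoticed when the constructor accepts **kwargs, because the unknown name is absorbed without an error and the default stays in place. Below is a minimal sketch of that failure mode. ToyModel is a hypothetical stand-in, and whether the real SwinTransformer constructor accepts **kwargs is an assumption not confirmed in this report.

```python
from functools import partial


class ToyModel:
    """Hypothetical stand-in for a model whose constructor accepts **kwargs."""

    def __init__(self, norm_before_mlp="ln", **kwargs):
        self.norm_before_mlp = norm_before_mlp
        # Unknown keywords land here instead of raising TypeError.
        self.ignored = kwargs


# Misspelled keyword: silently absorbed, so the default "ln" stays in effect.
broken = partial(ToyModel, norm_befor_mlp="bn")()
# Correct keyword: the setting is applied.
fixed = partial(ToyModel, norm_before_mlp="bn")()

print(broken.norm_before_mlp, broken.ignored)
print(fixed.norm_before_mlp, fixed.ignored)
```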

TypeError: 'Compose' object is not iterable

Traceback (most recent call last):
  File "moby_linear.py", line 385, in <module>
    main(config)
  File "moby_linear.py", line 174, in main
    train_one_epoch(config, model, criterion, data_loader_train, optimizer, epoch, mixup_fn, lr_scheduler)
  File "moby_linear.py", line 199, in train_one_epoch
    for idx, (samples, targets) in enumerate(data_loader):
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haoxing/.conda/envs/chx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haoxing/Transformer-SSL/data/custom_image_folder.py", line 24, in __getitem__
    for t in self.transform:
TypeError: 'Compose' object is not iterable
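
The last frame shows the dataset looping over self.transform, which works for a list of transforms but not for a single composed callable. Here is a minimal sketch of a helper that accepts both forms. ToyCompose and apply_transforms are hypothetical names, and ToyCompose only imitates a callable, non-iterable pipeline object.

```python
class ToyCompose:
    """Hypothetical stand-in for a composed transform: callable, not iterable."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def apply_transforms(transform, sample):
    """Return one output per transform, accepting a list/tuple or a single callable."""
    if isinstance(transform, (list, tuple)):
        return [t(sample) for t in transform]
    return [transform(sample)]


def double(v):
    return v * 2


def increment(v):
    return v + 1


# A list yields one view per transform; a composed callable yields a single view.
print(apply_transforms([double, increment], 3))
print(apply_transforms(ToyCompose([double, increment]), 3))
```

Whether the linear-evaluation script should produce one view or several is a design decision for the dataset class. The sketch only shows how to avoid iterating over a non-iterable object.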

Strange output log

Hi authors, I have pretrained your moby_swin_tiny model using 8 Tesla V100 GPUs
and reproduced your results on the downstream tasks. I get 74.394% on linear evaluation, 43.1% on the COCO object detection task, and 39.3% on the COCO segmentation task. But the loss and grad_norm look really weird during training. Could you show me your log?
Here is my log. The loss drops to 7, then rises to 16 and never drops again. During pretraining, the average grad norm sometimes becomes infinite.
log_rank0.txt

Train MoBY-SwinT on a local machine with one GPU

I am going to train MoBY-SwinT on my custom dataset.
My machine has one GPU.
I tried a couple of commands, but they failed with the following errors. All packages are installed.

  • First try
    screenshot_33

  • Second try
    screenshot_34

What is the correct command to run the training script on a local machine with one GPU?

Thanks in advance.

Multi-machine training

Thanks for your work!
As shown in the markdown file, we can now pretrain Transformer-SSL with 8 GPUs on 1 node.
Do you have scripts for multi-machine training? I want to pretrain it with 64 GPUs across 8 machines.

Cannot import vit_deit_small

from timm.models import vit_deit_small_patch16_224
ImportError: cannot import name 'vit_deit_small_patch16_224' from 'timm.models' (/home/michuan.lh/miniconda3/envs/moby/lib/python3.7/site-packages/timm/models/__init__.py)

Thanks for your work. When I run your code, I get an error saying that vit_deit_small_patch16_224 cannot be imported.
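
This error usually means the installed timm version does not export that name. Below is a minimal sketch of a fallback lookup. first_available is a hypothetical helper, and the alternative name deit_small_patch16_224 is an assumption about other timm releases that should be checked against the installed version.

```python
import importlib


def first_available(module_name, candidate_names):
    """Return the first attribute from candidate_names that the module defines."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidate_names} found in {module_name}")


# Demonstration on the standard library so the sketch runs without timm.
sqrt = first_available("math", ["square_root", "sqrt"])
print(sqrt(16.0))

# Assumed usage with timm (names not verified here):
# create_fn = first_available(
#     "timm.models", ["vit_deit_small_patch16_224", "deit_small_patch16_224"]
# )
```

Pinning timm to the version the repository was developed against is the other obvious route, if that version is documented.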

How to load a checkpoint when using the swin transformer as the backbone in a Mask-RCNN model

Hi there,
I've already trained my swin transformer with your proposed SSL method and have the checkpoints saved.
I'm now trying to load my model as the backbone of a mask-rcnn model (also your mmdetection implementation from the other repository). However, I'm getting the following error.

KeyError: 'encoder.layers.0.blocks.0.attn.relative_position_bias_table'

I guess that just requires a naming conversion. I was wondering if you have the script to do so for all layers?
Thanks,
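
If the mismatch is only a key prefix, a conversion could look like the sketch below. It assumes the pretraining checkpoint stores the backbone under an "encoder." prefix and that the detection backbone expects the same names without it. strip_prefix is a hypothetical helper, and the toy dictionary stands in for a real state dict.

```python
def strip_prefix(state_dict, prefix="encoder."):
    """Keep only entries whose key starts with prefix, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy state dict; real values would be tensors loaded from the checkpoint.
# The keys that do not start with "encoder." are made up for illustration.
toy = {
    "encoder.layers.0.blocks.0.attn.relative_position_bias_table": 1,
    "encoder_k.layers.0.blocks.0.attn.relative_position_bias_table": 2,
    "projector.linear.weight": 3,
}

converted = strip_prefix(toy)
print(converted)
```

It is worth printing the checkpoint's keys first to confirm which prefixes are actually present before relying on a conversion like this.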

The interpolation method for BYOL augmentation is wrong

In Transformer-SSL/data/build.py, inside the "build_transform" function, the "byol" augmentation type uses RandomResizedCrop with its default interpolation method, which is BILINEAR. However, the BYOL paper uses BICUBIC.

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx

Start command:

imagenetpath=mypath
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  moby_main.py \
       --cfg configs/moby_swin_tiny.yaml --data-path ${imagenetpath} --batch-size 256

But I get the "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx" message:

[2023-10-24 17:33:21 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][290/625]  eta 0:05:52 lr 0.002772 time 0.5567 (1.0516)  loss 10.5960 (10.9174)  grad_norm 1.4802 (1.5236)  mem 45716MB
[2023-10-24 17:33:38 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][300/625]  eta 0:05:47 lr 0.002785 time 0.7607 (1.0707)  loss 10.7823 (10.9141)  grad_norm 2.3465 (1.5536)  mem 45716MB
[2023-10-24 17:33:45 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][310/625]  eta 0:05:33 lr 0.002797 time 0.9247 (1.0588)  loss 10.9386 (10.9140)  grad_norm 3.8597 (1.6136)  mem 45716MB
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 65536.0
[2023-10-24 17:33:53 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][320/625]  eta 0:05:20 lr 0.002810 time 0.5590 (1.0518)  loss 11.4219 (10.9264)  grad_norm 3.9233 (inf)  mem 45716MB
[2023-10-24 17:34:00 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][330/625]  eta 0:05:07 lr 0.002823 time 0.5751 (1.0412)  loss 11.6204 (10.9487)  grad_norm 2.7699 (inf)  mem 45716MB
[2023-10-24 17:34:09 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][340/625]  eta 0:04:55 lr 0.002836 time 0.5561 (1.0365)  loss 11.2880 (10.9609)  grad_norm 2.3273 (inf)  mem 45716MB
[2023-10-24 17:34:16 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][350/625]  eta 0:04:42 lr 0.002849 time 0.5530 (1.0271)  loss 11.0601 (10.9651)  grad_norm 0.9230 (inf)  mem 45716MB
[2023-10-24 17:34:23 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][360/625]  eta 0:04:30 lr 0.002861 time 0.5628 (1.0200)  loss 10.9609 (10.9669)  grad_norm 0.8707 (inf)  mem 45716MB
[2023-10-24 17:34:30 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][370/625]  eta 0:04:17 lr 0.002874 time 0.5648 (1.0094)  loss 10.9728 (10.9655)  grad_norm 1.9388 (inf)  mem 45716MB
[2023-10-24 17:34:36 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][380/625]  eta 0:04:04 lr 0.002887 time 0.5568 (0.9993)  loss 10.8801 (10.9645)  grad_norm 0.6718 (inf)  mem 45716MB
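
One possible reading of the "(inf)" column: the value in parentheses looks like a running average, and a running average stays infinite once a single infinite sample has been added, even if every later step is finite. Below is a minimal sketch of that effect, assuming a simple sum/count meter. RunningAverage is a hypothetical class, not the meter used by the training script.

```python
import math


class RunningAverage:
    """Hypothetical sum/count meter; optionally skips non-finite samples."""

    def __init__(self, skip_non_finite=False):
        self.total = 0.0
        self.count = 0
        self.skip_non_finite = skip_non_finite

    def update(self, value):
        if self.skip_non_finite and not math.isfinite(value):
            return
        self.total += value
        self.count += 1

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


# One overflowed step among otherwise normal gradient norms.
samples = [1.5, 2.3, float("inf"), 0.9]

plain = RunningAverage()
filtered = RunningAverage(skip_non_finite=True)
for s in samples:
    plain.update(s)
    filtered.update(s)

print(plain.avg)     # remains infinite after the overflowed step
print(filtered.avg)  # average over the finite samples only
```

Under this reading, occasional "Gradient overflow" messages with a dynamic loss scaler are skipped steps rather than a crash, and the "(inf)" average would be a logging artifact. That interpretation is an assumption and should be checked against the actual meter code.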

AttributeError: TRAINING_IMAGES

Traceback (most recent call last):
  File "main.py", line 347, in <module>
    main(config)
  File "main.py", line 80, in main
    model = build_model(config)
  File "/home/featurize/work/STSL/models/build.py", line 65, in build_model
    pred_num_layers=config.MODEL.MOBY.PRED_NUM_LAYERS,
  File "/home/featurize/work/STSL/models/moby.py", line 77, in __init__
    self.K = int(self.cfg.DATA.TRAINING_IMAGES * 1. / dist.get_world_size() / self.cfg.DATA.BATCH_SIZE) * self.cfg.TRAIN.EPOCHS
  File "/environment/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: TRAINING_IMAGES

Hello everyone! How can I solve this problem?
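
The failing line reads DATA.TRAINING_IMAGES from the config, so that key has to be defined there (presumably as the number of images in the training set). The arithmetic on that line can be sketched as follows. total_steps is a hypothetical name and the numbers are illustrative, not taken from any config in the repository.

```python
def total_steps(training_images, world_size, batch_size, epochs):
    """Mirror the expression in the traceback: whole steps per epoch times epochs."""
    steps_per_epoch = int(training_images * 1.0 / world_size / batch_size)
    return steps_per_epoch * epochs


# Illustrative values: 10,000 images, 1 process, batch size 32, 100 epochs.
print(total_steps(10_000, 1, 32, 100))
```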

Question about the detection/segmentation results

Hi there,
Congrats on the nice work, and thanks for providing the code.
I have a question about the experiments you conducted on downstream tasks (detection and segmentation).
For the detection/segmentation results reported in Table 3, did you perform SSL on ImageNet-1K and then use the models as backbones and simply train on COCO? No SSL on COCO data, right?

If so, could that be a reason why the MoBY model does not outperform the supervised model?
What I'm trying to understand is this: if a model is SSL-trained on a large unannotated dataset and then trained on the downstream task using a labeled portion of the same data, can we expect it to perform significantly better than a model trained only in a supervised fashion on the annotated portion? Any insight is appreciated.

Best,

dataloader error

When I used moby_main for training, memory usage on the Linux machine kept growing until the process crashed. What is the reason, and how can I solve it?

The error is:
Traceback (most recent call last):
  File "moby_main.py", line 236, in <module>
    main(config)
  File "moby_main.py", line 121, in main
    train_one_epoch(config, model, data_loader_train, optimizer, epoch, lr_scheduler)
  File "moby_main.py", line 151, in train_one_epoch
    scaled_loss.backward()
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True) # allow_unreachable flag
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2605) is killed by signal: Killed.

Some questions about relative_position_index and attn_mask

Wonderful job! I recently read your code and have some questions about the Swin model in swin_transformer.py. Specifically, I can't understand how relative_position_index and attn_mask are calculated. Is there anything I can refer to, or could you explain them?
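
As a rough illustration of the indexing idea (not the repository's code, which works on tensors): each pair of positions inside a window is mapped to an offset (dh, dw), both offsets are shifted to be non-negative, and the pair is flattened into a single index into a table with (2*Wh-1)*(2*Ww-1) rows. The pure-Python sketch below assumes that scheme. The function here is a hypothetical re-implementation for a small window.

```python
def relative_position_index(window_h, window_w):
    """Map every (query, key) position pair in a window to a bias-table index."""
    coords = [(h, w) for h in range(window_h) for w in range(window_w)]
    index = []
    for qh, qw in coords:
        row = []
        for kh, kw in coords:
            dh = qh - kh + (window_h - 1)  # shift to the range 0 .. 2*Wh-2
            dw = qw - kw + (window_w - 1)  # shift to the range 0 .. 2*Ww-2
            row.append(dh * (2 * window_w - 1) + dw)
        index.append(row)
    return index


# Print the index matrix for a 2x2 window (4 tokens, 9 possible offsets).
for row in relative_position_index(2, 2):
    print(row)
```

Pairs with the same offset share an index, so they look up the same learned bias entry. The diagonal is constant because every token has offset (0, 0) to itself.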
