
Comments (11)

JingyunLiang commented on May 6, 2024

Experiments are conducted on a machine with 8 Nvidia 2080 Ti GPUs. We use batch_size=32 and fewer total iterations to save time.

classical_sr_x2 (trained on DIV2K, patch size = 48x48) takes about 1.75 days to train for 500K iterations.

classical_sr_x4 (trained on DIV2K, patch size = 48x48) takes about 0.95 days to train for 250K iterations. Note that we fine-tune the x3/x4/x8 models from x2 and halve the learning rate and total training iterations to reduce training time.

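A minimal PyTorch sketch of this x2 → x4 fine-tuning setup (the checkpoint filename and the simplified constructor arguments are assumptions for illustration; the repo's actual training scripts differ):

```python
import torch
from models.network_swinir import SwinIR  # see models/network_swinir.py in the repo

# Build the x4 model; most architecture arguments are omitted here and assumed
# to match the x2 config (embed dim, depths, window size, etc.).
model_x4 = SwinIR(upscale=4, upsampler='pixelshuffle')

# Load the pretrained x2 weights, keeping only parameters whose shapes still
# match; the scale-dependent upsampling layers start from random initialization.
x2_ckpt = torch.load('swinir_x2.pth', map_location='cpu')  # assumed filename
x4_state = model_x4.state_dict()
matched = {k: v for k, v in x2_ckpt.items()
           if k in x4_state and v.shape == x4_state[k].shape}
model_x4.load_state_dict(matched, strict=False)

# Halved settings relative to training x2 from scratch:
# initial lr 1e-4 instead of 2e-4, 250K instead of 500K iterations.
optimizer = torch.optim.Adam(model_x4.parameters(), lr=1e-4)
```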

Senwang98 commented on May 6, 2024

@JingyunLiang
Hi, thanks for your work!
I found something strange. You use half the learning rate to fine-tune BIX3/BIX4/BIX8, but why only halve the lr for fine-tuning?
If you use 1e-4 to train SwinIR-BIX2 and then use 1e-5 to fine-tune BIX4, you get a much better result (even an unbelievably high PSNR/SSIM) than with the halved lr=5e-5.
Is this kind of training trick cheating? (It still improves Manga109 by about 0.2 dB PSNR, and the other image SR benchmarks by 0.0x dB.)


shengkelong commented on May 6, 2024

Thanks. So does that mean it is difficult to train on a single card because of memory, even with batch_size=16?


JingyunLiang commented on May 6, 2024

For a medium-size SwinIR (used for classical image SR, around 12M parameters), we need about 24GB (2x 12GB GPUs) for batch_size=16.
For a small SwinIR (used for lightweight image SR, around 900K parameters), we need about 12GB (1x 12GB GPU) for batch_size=16.

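If only a single 12GB card is available, one generic workaround (not part of the SwinIR repo, and assuming `model`, `optimizer`, and `train_loader` are already set up as in the sketch above) is gradient accumulation, which trades training speed for memory:

```python
import torch.nn.functional as F

accum_steps = 4  # e.g. 4 sub-batches of 4 patches -> effective batch size 16
optimizer.zero_grad()

for i, (lq, gt) in enumerate(train_loader):   # low-quality input / ground-truth patches
    pred = model(lq.cuda())
    # Scale the loss so the accumulated gradient matches a full batch of 16.
    loss = F.l1_loss(pred, gt.cuda()) / accum_steps
    loss.backward()                            # gradients add up across sub-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()                       # one parameter update per effective batch
        optimizer.zero_grad()
```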

JingyunLiang commented on May 6, 2024

I only use the halved lr for fine-tuning (2e-4 may be too large, so we use 1e-4) in order to save half of the training time. Fine-tuning from x2 is a common practice, e.g. in RCAN (ECCV 2018).

As for using 1e-4 to train SwinIR-BIX2, I have never tried it before. Your observation is really surprising! In my experience, the learning rate doesn't have much impact as long as you decrease it gradually. Maybe Transformers have different characteristics from CNNs when it comes to learning rate selection.

@Senwang98 Thank you for reporting this. Could you post more details here? Is it classical SR or lightweight SR? What are your PSNR values on the five benchmarks? Is your network architecture identical to ours (see models/network_swinir.py)?

I will try it and validate your finding. My results will be updated here.

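To make the two schedules under discussion concrete, a rough PyTorch sketch (assuming `model` is the x4 network already initialized from the x2 weights; the milestone values are the ones listed in the comparison table later in this thread, the rest is illustrative):

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Strategy 1 (used in the paper): halved initial lr, halved total iterations,
# lr halved again at fixed milestones.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = MultiStepLR(optimizer,
                        milestones=[125000, 200000, 225000, 237500],
                        gamma=0.5)

# Strategy 2 (suggested in this thread): a small constant lr with no decay.
# (Alternative setup; it would replace the optimizer/scheduler above.)
optimizer_fixed = torch.optim.Adam(model.parameters(), lr=1e-5)

for it in range(250_000):
    # ... one training step with `optimizer` goes here ...
    scheduler.step()  # milestones are counted in iterations, so step once per iteration
```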

Senwang98 commented on May 6, 2024

@JingyunLiang
Thanks for your quick reply!
The result was found with a CNN network. I am not sure whether my code is wrong, because I have been using the EDSR-pytorch repo to train my models.
A few days ago I ran an experiment on RCAN: I took RCAN-BIX2.pt (trained with lr 1e-4, decayed by half every 200,000 iterations) and then used 1e-5 to fine-tune RCAN-BIX4 (this time I did not decay the learning rate every 200,000 iterations; that is, I used 1e-5 for the whole training without changing the lr!).
You are an expert in this field, so do you think this training trick is cheating?
If I decay the lr when training BIX4, the result is as expected. If I use a much smaller lr and never change it, the final result is better (for RCAN, the BIX4 Manga109 performance improves from 31.22 to around 31.45).
Can you give me some suggestions? (I don't mean your SwinIR is wrong, I just want to explain this strange behavior!)
Thanks again for your interesting work. Maybe you could fine-tune BIX4 from SwinIR-BIX2 without changing the lr during training!

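As a side note on how numbers like the Manga109 PSNR above are typically compared: SR papers usually report PSNR on the Y channel of YCbCr with a border of `scale` pixels cropped. A small self-contained sketch of that convention (not the exact evaluation code of either repo):

```python
import numpy as np

def rgb_to_y(img):
    """Y channel (ITU-R BT.601) of an RGB image with values in [0, 255]."""
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, gt, scale):
    """PSNR on the Y channel with a border of `scale` pixels cropped."""
    sr_y = rgb_to_y(sr.astype(np.float64))[scale:-scale, scale:-scale]
    gt_y = rgb_to_y(gt.astype(np.float64))[scale:-scale, scale:-scale]
    mse = np.mean((sr_y - gt_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```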

JingyunLiang commented on May 6, 2024

It is possible for RCAN, as it is a very deep network and should have strong representation ability. A better training strategy or longer training time may help.

In my view, if all other settings are the same as the original RCAN (same datasets, same patch size, same training iterations, same optimizer, etc.), changing the lr could be a good trick, and I think it is fair. If it is useful for all other CNNs, all future works should use this strategy from then on! However, we should probably point out the lr strategy in the paper and do some ablation studies when comparing against the older methods.

As for your suggestion of fine-tuning SwinIR-BIX4 with a fixed lr (1e-5), I will try it and keep you updated. Thank you.


Senwang98 commented on May 6, 2024

@JingyunLiang
Yes, you are right: this training setting should be reported in the paper, and some study should also be done to support this trick.
I will test other CNN-based models later. If it actually works, I will tell you.
Thanks for your reply!


JingyunLiang commented on May 6, 2024
  • Update: I compared three learning rate strategies when fine-tuning the x4 classical SR model from x2.

| Case | Init LR | LR steps | Set5 | Set14 | BSD100 | Urban100 | Manga109 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 (used in the paper) | 1e-4 | [125000, 200000, 225000, 237500] (total_iter=250000) | 32.72/0.9021 | 28.94/0.7914 | 27.83/0.7459 | 27.07/0.8164 | 31.67/0.9226 |
| 2 | 1e-5 | none (total_iter=250000) | 32.69/0.9018 | 28.96/0.7920 | 27.84/0.7463 | 27.07/0.8165 | 31.73/0.9227 |
| 3 | 1e-5 | none (total_iter=500000) | 32.69/0.9020 | 28.96/0.7918 | 27.84/0.7462 | 27.08/0.8168 | 31.69/0.9228 |
  • Conclusion: The PSNR change ranges from -0.03 to +0.06 dB across the benchmarks. The second lr strategy (fixing the lr at 1e-5 for x4 fine-tuning) is only slightly better.


Senwang98 commented on May 6, 2024

@JingyunLiang
OK, I will check. Maybe it is more useful for CNN-based models. (Although I think this strategy should not work, the results really are better in my repo.)


JingyunLiang commented on May 6, 2024

Feel free to reopen the issue if you have more questions.

