Comments (11)
Experiments are conducted on a machine with 8 Nvidia 2080 Ti GPUs. We use batch_size=32 and fewer total iterations to save time.
classical_sr_x2
(trained on DIV2K, patch size = 48x48) takes about 1.75 days to train for 500K iterations.
classical_sr_x4
(trained on DIV2K, patch size = 48x48) takes about 0.95 days to train for 250K iterations. Note that we fine-tune the x3/x4/x8 models from x2 and halve both the learning rate and the total training iterations to reduce training time.
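A minimal PyTorch sketch of this fine-tuning recipe, using a toy stand-in model and a hypothetical checkpoint file name (the real architecture is in models/network_swinir.py): the initial lr and the milestone schedule are both halved relative to x2 training.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Conv2d(3, 3, 3, padding=1)  # toy stand-in for SwinIR

# Hypothetical checkpoint name; x3/x4/x8 fine-tuning starts from x2 weights.
# state = torch.load("swinir_classical_sr_x2.pth")
# model.load_state_dict(state, strict=False)  # upsampler weights differ per scale

# x2 training: lr=2e-4 for 500K iters. Fine-tuning halves both: lr=1e-4, 250K iters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[125000, 200000, 225000, 237500], gamma=0.5)

total_iters = 250000  # half of the 500K used for x2
# Training loop (sketch):
# for it in range(total_iters):
#     ...forward/backward on a batch of 48x48 patches...
#     optimizer.step(); scheduler.step()
```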
from swinir.
@JingyunLiang
Hi, thanks for your work!
I found something strange. You fine-tune BIX3/4/8 with half the learning rate. Why only halve the lr for fine-tuning?
If you use 1e-4 to train SwinIR-BIX2 and then use 1e-5 to fine-tune BIX4, you get a much better result (even an unbelievable one, with very high PSNR/SSIM) than with the halved lr=5e-5.
Is this kind of training trick cheating? (It still improves PSNR by 0.2 dB on Manga109, and by 0.0x dB on the other image SR benchmarks.)
Thanks. Does that mean it is difficult to train on a single card because of memory, even with batch size 16?
For a middle-size SwinIR (used for classical image SR, around 12M parameters), we need about 24GB (2x12GB GPUs) for batch_size=16.
For a small-size SwinIR (used for lightweight image SR, around 900K parameters), we need about 12GB (1x12GB GPU) for batch_size=16.
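When a single GPU cannot hold the full batch, gradient accumulation is one standard workaround (not part of the SwinIR repo; a hedged sketch with a toy model): two micro-batches of 8 give the same effective batch of 16, trading time for memory. It assumes the loss is a mean over the batch, so each micro-batch loss is divided by the accumulation count.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # toy stand-in for SwinIR
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
accum = 2  # 2 micro-batches x 8 samples = effective batch 16

losses = []
for step in range(3):  # a few toy iterations
    opt.zero_grad()
    for _ in range(accum):
        lr_patch = torch.randn(8, 3, 48, 48)  # micro-batch of 48x48 patches
        # Scale the mean loss so the summed gradient matches a full batch of 16.
        loss = nn.functional.l1_loss(model(lr_patch), lr_patch) / accum
        loss.backward()  # gradients accumulate across micro-batches
        losses.append(loss.item())
    opt.step()
```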
We fine-tune with half the lr (2e-4 may be too large, so we use 1e-4) and half the iterations, which saves half of the training time. Fine-tuning from x2 is a common practice, e.g., RCAN (ECCV 2018).
As for using 1e-4 to train SwinIR-BIX2, I have never tried it. Your observation is really surprising! In my experience, the learning rate does not have much impact as long as you decrease it gradually. Maybe Transformers have different characteristics from CNNs in learning-rate selection.
@Senwang98 Thank you for your report. Can you post more details here? Is it classical SR or lightweight SR? What are your PSNR values on the five benchmarks? Is your network architecture identical to ours (see models/network_swinir.py)?
I will try it and validate your finding~~ My results will be updated here.
@JingyunLiang
Thanks for your quick reply!
The result was found with a CNN. I am not sure whether my code is wrong, because I used the EDSR-pytorch repo to train my model.
Several days ago, I ran an experiment on RCAN. I took RCAN-BIX2.pt (trained with lr=1e-4, decayed by half every 200,000 iterations) and used 1e-5 to fine-tune RCAN-BIX4. This time I did not decay the learning rate every 200,000 iterations; that is, I used 1e-5 for the whole training without ever changing the lr!
I think you are an expert in this field, so do you think this training trick is cheating?
If I decay the lr when training BIX4, the result is OK. If I use a much smaller lr and never change it, the final result is better (for RCAN, BIX4 performance on Manga109 improves from 31.22 to around 31.45).
Can you give me some suggestions? (I don't mean your SwinIR is wrong, I just want to explain this strange thing!)
Thanks again for your interesting work. Maybe you can fine-tune BIX4 from SwinIR-BIX2 without changing the lr during training!
It is possible for RCAN, as it is a very deep network and should have strong representation ability. A better training strategy or longer training time may help.
In my view, if all other settings are the same as the original RCAN (same datasets, same patch size, same training iterations, same optimizer, etc.), changing the lr could be a good trick. I think it is fair. If it proves useful for other CNNs as well, all future works should adopt this strategy! However, we should probably point out the lr strategy in the paper and do some ablation studies when comparing with these older methods.
As for your advice on fine-tuning SwinIR-BIX4 by using fixed lr (1e-5), I will try it and keep you updated. Thank you.
@JingyunLiang
Yes, you are right; this training setting should be reported in the paper, and some studies should also be done to support this trick.
I will try training other CNN-based models later. If it actually works, I will tell you.
Thanks for your reply!
- Update: I compared three learning-rate strategies when fine-tuning the x4 classical SR model from x2.
| Case | Init LR | LR steps (milestones) | Set5 | Set14 | BSD100 | Urban100 | Manga109 |
|---|---|---|---|---|---|---|---|
| 1 (used in the paper) | 1e-4 | [125000, 200000, 225000, 237500] (total_iter=250000) | 32.72/0.9021 | 28.94/0.7914 | 27.83/0.7459 | 27.07/0.8164 | 31.67/0.9226 |
| 2 | 1e-5 | none (total_iter=250000) | 32.69/0.9018 | 28.96/0.7920 | 27.84/0.7463 | 27.07/0.8165 | 31.73/0.9227 |
| 3 | 1e-5 | none (total_iter=500000) | 32.69/0.9020 | 28.96/0.7918 | 27.84/0.7462 | 27.08/0.8168 | 31.69/0.9228 |
- Conclusion: PSNR changes range from -0.03 to +0.06 dB. The second lr strategy (fix the lr at 1e-5 for x4 fine-tuning) is only slightly better.
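The schedules in the table can be written as a small pure-Python helper (a sketch; milestone values taken from the table above, not from the training code):

```python
def multistep_lr(it, base_lr, milestones, gamma=0.5):
    """Learning rate after `it` iterations: multiplied by gamma at each passed milestone."""
    return base_lr * gamma ** sum(it >= m for m in milestones)

# Case 1 (paper): init lr 1e-4, halved at four milestones within 250K iters.
milestones = [125000, 200000, 225000, 237500]
case1 = [multistep_lr(i, 1e-4, milestones) for i in (0, 125000, 200000, 225000, 240000)]
# lr halves at each milestone: 1e-4, 5e-5, 2.5e-5, 1.25e-5, 6.25e-6

# Cases 2 and 3: constant lr 1e-5, no milestones, so the lr never changes.
case2 = [multistep_lr(i, 1e-5, []) for i in (0, 125000, 249999)]
```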
@JingyunLiang
OK, I will check. Maybe it is more useful for CNN-based models. (Although I think this strategy should not work, the results are really better in my repo, haha.)
Feel free to reopen the issue if you have more questions.