Comments (8)
Hi, my suggestion is to start with gamma=10 and try different itc and itd. The reason is: 1) I only roughly tuned the hyper-parameters, so the provided values are not optimal; if you are looking for better performance, you may want to tune them. 2) Although each GPU has 16 samples, the total mini-batch size differs when you use a different number of GPUs, which I think will influence the hyper-parameter settings. Still, I think gamma=10 will lead to promising results.
I used 'ada=noaug' because 1) for a fair comparison with previous methods, which didn't use augmentation, I wanted to show our effectiveness under the same setting; 2) in my experiments on different datasets (although the hyper-parameters may not have been well tuned), ADA augmentation is not guaranteed to improve performance on every dataset, so I didn't use it in the final experiments for simplicity.
from lafite.
Thanks for your advice, it's a great help to me.
I still have some questions about hyper-parameter tuning and batch size.
- In the paper, you select itd and itc from 0 ~ 50. My question is: are those hyper-parameters likely still in this range, perhaps close to the original setting (e.g. itd = 5, itc = 10)?
- In the latest models that use contrastive learning, I noticed the batch size is usually set quite large (>256). Does a large batch size result in better performance in Lafite? Or does a lower batch size have some benefits?
Yes, I think you can tune the hyper-parameters by searching in this range. I think a larger batch size will lead to a performance improvement, because it provides more discriminative information for training. But in that case you may also need to tune some hyper-parameters for the contrastive loss (--temp=0.5 and --lam=0. were tuned based on a batch size of 16 per GPU).
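To illustrate why the temperature interacts with batch size, here is a minimal sketch of a symmetric InfoNCE-style contrastive loss (illustrative only, not the repo's actual implementation; `temp` mirrors the `--temp` flag mentioned above). Every other sample in the batch serves as a negative, so changing the batch size changes the loss landscape:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_feat, txt_feat, temp=0.5):
    # Normalize so similarities are cosine similarities.
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    # Pairwise similarities scaled by the temperature.
    logits = img_feat @ txt_feat.t() / temp
    # Matched image-text pairs sit on the diagonal; every other
    # sample in the batch acts as a negative, so a larger batch
    # means more negatives per row.
    labels = torch.arange(img_feat.shape[0])
    # Symmetric cross-entropy: image-to-text and text-to-image.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
```

With more samples per batch there are more negatives in each row of `logits`, which is one reason a temperature tuned at 16 samples per GPU may need re-tuning at a different batch size.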
Thank you for answering my questions. I will close this issue and run the experiments mentioned above.
@StolasIn Have you reproduced the results with one GPU now? I have reproduced the results of the paper under the four-GPU setting (batch=32, batch_gpu=8 gets better results than the paper). But when I try to experiment with one GPU, I only get poor results.
@drboog Thank you so much for your work; I'm sorry to keep bothering you. As for why the hyper-parameters need to be re-tuned for one GPU, my observation is that the {gather: false} setting in the contrastive loss distributes the loss computation across the GPUs, so each GPU computes it over only its own samples. I don't know what else causes the difference between one GPU and four GPUs.
What confuses me is that I modified the contrastive-loss computation under the one-GPU setting to simulate the {gather: false} setting (dividing a batch of samples into four parts and computing their contrastive losses separately), but I still only get poor results.
The performance is related to many things: batch size, learning rate, regularizer, etc. For example, for StyleGAN2 without the contrastive loss (image generation rather than text-to-image generation), the number of GPUs still matters a lot.
https://github.com/NVlabs/stylegan2-ada/blob/main/docs/stylegan2-ada-training-curves.png
Assume that under the 4-GPU setting each GPU has N samples, resulting in a total batch size of 4N. Are you using a batch size of 4N when using one GPU?
@drboog Yes, I did, but the performance of one card is still significantly worse than four cards. Thank you very much for providing this picture. I originally thought the difference was caused by different numbers of GPUs mapping to different hyper-parameters when cfg=auto.
But as a beginner, I still can't understand why the forward and backward passes of the network are not equivalent in this case. Are they equivalent in theory?
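On the last question: for a contrastive loss they are not equivalent in theory. Each sample's logits, and hence its gradient, depend on every other sample used as a negative, so computing the loss over four chunks of 8 is not the same as computing it over the full batch of 32, in either the forward or the backward pass. A quick autograd check with a generic InfoNCE loss (a sketch, not the repo's code):

```python
import torch
import torch.nn.functional as F

def info_nce(img, txt, temp=0.5):
    img, txt = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temp
    return F.cross_entropy(logits, torch.arange(img.shape[0]))

torch.manual_seed(0)
img = torch.randn(32, 512, requires_grad=True)
txt = torch.randn(32, 512)

# Full batch: every sample sees 31 negatives.
full = info_nce(img, txt)
full.backward()
g_full = img.grad.clone()

# Four chunks of 8: every sample sees only 7 negatives.
img.grad = None
chunked = torch.stack([info_nce(i, t) for i, t in
                       zip(img.chunk(4), txt.chunk(4))]).mean()
chunked.backward()
g_chunked = img.grad.clone()

# Both the loss values and the gradients differ.
print(full.item(), chunked.item())
print(torch.allclose(g_full, g_chunked))
```

This suggests that with {gather: false} the number of negatives seen per GPU, not just the total batch size, shapes the loss, which may be part of why matching the total batch size on one GPU is not sufficient on its own.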