
Comments (8)

drboog commented on August 16, 2024

Hi, my suggestion is to start with gamma=10 and try different itc and itd values. The reasons are: 1) I only roughly tuned the hyper-parameters, so the provided values are not optimal; if you are looking for better performance, you may want to tune them; 2) although each GPU has 16 samples, the total mini-batch size differs when you use a different number of GPUs, which I think will influence the hyper-parameter settings. Still, I think gamma=10 will lead to promising results.
I used 'ada=noaug' because 1) for a fair comparison, I wanted to compare with previous methods, which didn't use augmentation, under the same setting, to show our effectiveness; 2) in my experiments on different datasets (although the hyper-parameters may not have been well tuned), ADA augmentation is not guaranteed to improve performance on every dataset, so I didn't use it in the final experiments, for simplicity.
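For concreteness, here is a minimal sketch of the kind of sweep suggested above. It assumes a stylegan2-ada-style train.py and reuses the flag names mentioned in this thread (--gamma, --itd, --itc); the dataset path, output directory, and grid values are placeholders, so adapt them to the actual repository before running.

```python
# Hypothetical hyper-parameter sweep over itd/itc with gamma fixed at 10.
# Flag names are taken from this thread; paths and grid values are placeholders.
import itertools
import subprocess

GAMMA = 10                   # starting point suggested above
ITD_GRID = [2, 5, 10, 20]    # example values inside the 0-50 range from the paper
ITC_GRID = [5, 10, 20, 50]

for itd, itc in itertools.product(ITD_GRID, ITC_GRID):
    cmd = [
        "python", "train.py",
        "--outdir", f"runs/gamma{GAMMA}_itd{itd}_itc{itc}",
        "--data", "datasets/mydataset.zip",   # placeholder dataset path
        "--gpus", "1",
        "--gamma", str(GAMMA),
        "--itd", str(itd),
        "--itc", str(itc),
    ]
    print("launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)           # comment out for a dry run
```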


StolasIn commented on August 16, 2024

Thanks for your advice, it's a great help to me.

I still have some questions about hyper-parameter tuning and batch size.

  1. In the paper, you select itd and itc from 0 to 50. My question is: should those hyper-parameters still be in this range, perhaps close to the original setting (e.g. itd = 5, itc = 10)?

  2. In the latest models which use contrastive learning, I noticed the batch size is usually set quite large (>256). Does a large batch size result in better performance in Lafite? Or does a lower batch size have some benefits?


drboog commented on August 16, 2024

Yes, I think you can tune the hyper-parameters by searching in this range. I think a larger batch size will lead to a performance improvement, because it provides better discriminative information for training. But in that case you may also need to tune some hyper-parameters of the contrastive loss (--temp=0.5 and --lam=0. were tuned with a batch size of 16 per GPU).
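For intuition about how --temp interacts with the batch size, here is a minimal, self-contained sketch of a symmetric InfoNCE-style contrastive loss with a temperature (an illustration, not Lafite's exact implementation): every other sample in the mini-batch serves as a negative, so the number of negatives grows with the batch size, which is why --temp and --lam may need re-tuning when the batch grows.

```python
# Minimal symmetric contrastive (InfoNCE-style) loss -- illustration only,
# not the exact loss used in Lafite.
import torch
import torch.nn.functional as F

def contrastive_loss(img_feats, txt_feats, temp=0.5):
    """img_feats, txt_feats: (B, D) feature batches; every off-diagonal pair
    in the B x B similarity matrix serves as a negative."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temp            # (B, B) scaled cosine similarities
    targets = torch.arange(img.size(0))      # matched pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# With B=16 each sample sees 15 negatives; with B=256 it sees 255,
# which changes how sharp the softmax (and hence the temperature) should be.
img = torch.randn(16, 512)
txt = torch.randn(16, 512)
print(contrastive_loss(img, txt, temp=0.5))
```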


StolasIn commented on August 16, 2024

I appreciate you answering my questions. I will close this issue and run the experiments mentioned above.


Cwj1212 commented on August 16, 2024

@StolasIn Have you reproduced the results with one GPU now? I have reproduced the results of the paper under the four-GPU setting (batch=32, batch_gpu=8 gets better results than the paper). But when I try to experiment with one GPU, I only get poor results.
@drboog, thank you so much for your work, for which I have to keep harassing you again. As for why the hyper-parameters need to be re-adjusted for one GPU, my observation is that with the {gather: false} setting the contrastive loss in the code is computed separately on each GPU. I don't know what else causes the difference between one GPU and four GPUs?
What confuses me is that I modified the calculation of the contrastive loss under the one-GPU setting to simulate the {gather: false} setting (dividing a batch of samples into four parts and calculating their contrastive losses separately), but I still only get poor results.
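A toy version of the comparison described above, using a generic InfoNCE-style loss as a stand-in for Lafite's contrastive loss: {gather: false} corresponds to computing the loss inside each per-GPU chunk, while the gathered version computes it over the whole batch, and the two generally give different values.

```python
# Toy comparison of per-GPU (gather: false) vs. gathered contrastive loss.
# The loss here is a generic InfoNCE-style stand-in, not Lafite's implementation.
import torch
import torch.nn.functional as F

def info_nce(img, txt, temp=0.5):
    img, txt = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temp
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

torch.manual_seed(0)
img = torch.randn(32, 512)   # full batch of 32 = 4 chunks of 8 (batch=32, batch_gpu=8)
txt = torch.randn(32, 512)

# gather: false -- each "GPU" only sees its own 8 samples as negatives
per_chunk = torch.stack([info_nce(img[i:i + 8], txt[i:i + 8])
                         for i in range(0, 32, 8)]).mean()

# gathered -- all 32 samples compete as negatives
global_loss = info_nce(img, txt)

print(per_chunk.item(), global_loss.item())  # the two values differ
```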


drboog commented on August 16, 2024

The performance is related to many things: batch size, learning rate, regularizer... For example, for StyleGAN2 without contrastive loss (for image generation, not text-to-image generation), the number of GPUs still matters a lot:
https://github.com/NVlabs/stylegan2-ada/blob/main/docs/stylegan2-ada-training-curves.png


drboog commented on August 16, 2024

Assume that under the 4-GPU setting each GPU has N samples, resulting in a batch size of 4N. Are you using a batch size of 4N when using one GPU?


Cwj1212 commented on August 16, 2024

@drboog Yes, I did, but the performance with one card is still significantly worse than with four cards. Thank you very much for providing this picture. I originally thought the difference was caused by cfg=auto choosing different hyper-parameters for different numbers of GPUs.
But as a beginner, I still can't understand why the forward and backward propagation of the network are not equivalent in this case. Are they equivalent in theory?
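On the "equivalent in theory?" question, a toy gradient check (not Lafite's code) illustrates the usual answer: for a loss that decomposes per sample, averaging the gradients of four chunks of size N equals the gradient of one batch of size 4N, so data-parallel training is equivalent; a batch-coupled term, such as a contrastive loss computed per GPU or a minibatch-stddev layer, breaks that equivalence.

```python
# Toy gradient check: per-sample losses are equivalent under data parallelism,
# batch-coupled losses are not. Illustration only, not Lafite's code.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W0 = torch.randn(8, 8)
x = torch.randn(32, 8)     # batch of 32 = 4 chunks of 8
x2 = torch.randn(32, 8)    # second "view" for the contrastive case
y = torch.randn(32, 1)

def grads(loss_fn):
    """Return (full-batch gradient, average of the four chunk gradients)."""
    def run(idx):
        W = W0.clone().requires_grad_(True)
        loss_fn(W, idx).backward()
        return W.grad
    full = run(slice(0, 32))
    chunked = torch.stack([run(slice(i, i + 8)) for i in range(0, 32, 8)]).mean(0)
    return full, chunked

# 1) Per-sample loss (MSE): 1 GPU with batch 4N and 4 GPUs with batch N match.
def mse(W, idx):
    return F.mse_loss((x[idx] @ W).sum(dim=1, keepdim=True), y[idx])

full, chunked = grads(mse)
print("per-sample loss equivalent:", torch.allclose(full, chunked, atol=1e-6))   # True

# 2) Batch-coupled loss (InfoNCE over the batch): the equivalence breaks.
def nce(W, idx):
    a = F.normalize(x[idx] @ W, dim=-1)
    b = F.normalize(x2[idx] @ W, dim=-1)
    logits = a @ b.t() / 0.5
    return F.cross_entropy(logits, torch.arange(a.size(0)))

full, chunked = grads(nce)
print("batch-coupled loss equivalent:", torch.allclose(full, chunked, atol=1e-6))  # False
```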

