
Comments (6)

yizhongw commented on May 26, 2024

Hi Leo, thanks for reporting your results. May I know how many GPUs you used? This matters because the effective batch size is per_device_train_batch_size x num_gpus x accumulation steps. In my experiment, I used 8 A100 GPUs, which results in a batch size of 16.

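To make the arithmetic above concrete, here is a minimal Python sketch of the effective-batch-size calculation; the per-device batch size and accumulation steps below are hypothetical placeholders, not values taken from the tk-instruct configuration.

```python
# Effective batch size = per-device batch size x number of GPUs x accumulation steps.
def effective_batch_size(per_device_train_batch_size: int,
                         num_gpus: int,
                         gradient_accumulation_steps: int) -> int:
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps

# Illustrative values only: with a per-device batch size of 1 and 2 accumulation
# steps, 8 GPUs give the batch size of 16 mentioned above, while a single GPU
# gives only 2, which changes the effective training setup.
print(effective_batch_size(1, 8, 2))   # 16
print(effective_batch_size(1, 1, 2))   # 2
```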
kuri-leo commented on May 26, 2024

Hey Yizhong,

Thanks for your rapid reply :-)

In my test, I only used one A100 for debugging, so I will try 8 GPUs and share the results later.

Thank you again and have a nice day!

Cheers,
Leo

kuri-leo commented on May 26, 2024

Hi Yizhong,

Thanks for your hints. I have successfully run two tests and obtained results of 48.5 (sampling 8 times) and 55.1235 (sampling 64 times), matching and exceeding the 48.5 and 54.7 reported in the paper, respectively.

That's amazing!

Leo

Yufang-Liu commented on May 26, 2024

Hi everyone,

It's strange that I can only get 48 on 8 RTX 3090 GPUs without changing any parameters. Does anyone know a possible reason?

kuri-leo commented on May 26, 2024

@Yufang-Liu

Hi Yufang,

Given the many factors that could lead to variations in scores, in my previous test it came down to the batch size. I would also suggest checking whether any optimizations such as half precision or ZeRO have been enabled automatically via the DeepSpeed or Accelerate packages.

Hope this helps.

Leo

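A minimal sketch of the kind of check Leo describes, assuming the run is driven by HuggingFace TrainingArguments with an optional DeepSpeed JSON config; ds_config.json and the argument values here are illustrative placeholders.

```python
import json

from transformers import TrainingArguments

# Inspect the precision and batch settings on the TrainingArguments your
# training script actually builds; the instance below is only a placeholder.
args = TrainingArguments(output_dir="output")
print("fp16:", args.fp16, "| bf16:", args.bf16)
print("per_device_train_batch_size:", args.per_device_train_batch_size)
print("gradient_accumulation_steps:", args.gradient_accumulation_steps)

# If a DeepSpeed config is in use, its fp16 and ZeRO sections also apply.
try:
    with open("ds_config.json") as f:
        ds_config = json.load(f)
    print("DeepSpeed fp16 enabled:", ds_config.get("fp16", {}).get("enabled"))
    print("DeepSpeed ZeRO stage:", ds_config.get("zero_optimization", {}).get("stage"))
except FileNotFoundError:
    print("no DeepSpeed config found at ds_config.json")
```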
Yufang-Liu commented on May 26, 2024

Hi Leo, thanks a lot for your helpful suggestions!

I found that the cause was the versions of the installed packages. With the same package versions, I got the same results on 8 RTX 3090 GPUs. I'm still not sure which package affects the performance.

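A minimal sketch of one way to pin down the environment differences mentioned above: record the versions of the likely-relevant packages on both machines and diff the output. The package list is an assumption, not taken from the tk-instruct requirements.

```python
# Print the installed versions of packages that commonly affect training results.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "datasets", "accelerate", "deepspeed"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Running `pip freeze > requirements.lock` on both machines and diffing the files is an alternative that captures the full environment.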