Comments (6)
Hi Leo, thanks for reporting your results. May I ask how many GPUs you used? This is important because the real batch size is per_device_train_batch_size x num_gpus x gradient_accumulation_steps. In my experiment, I used 8 A100 GPUs, which results in an effective batch size of 16.
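For concreteness, the effective batch size described above is just the product of the three quantities. A trivial sketch (the per-device batch size of 2 and accumulation of 1 are assumptions here; only the product 16 is stated in the comment):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         num_gpus: int,
                         gradient_accumulation_steps: int) -> int:
    """Real (effective) batch size = per-device batch size
    x number of GPUs x gradient accumulation steps."""
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps

# Assumed split for the setting above: 8 A100 GPUs, per-device batch
# size 2, no gradient accumulation -> effective batch size 16.
print(effective_batch_size(2, 8, 1))  # 16
```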
from tk-instruct.
Hey Yizhong,
Thanks for your rapid reply :-)
In my test, I only used one A100 for debugging. I will try 8 GPUs and share the results later.
Thank you again and have a nice day!
Cheers,
Leo
Hi Yizhong,
Thanks for your hints. I successfully ran two tests and obtained 48.5 (sampling 8 times) and 55.1235 (sampling 64 times), which match the 48.5 and exceed the 54.7 reported in the paper, respectively.
That's amazing!
Leo
Hi everyone,
It's strange that I can only get 48 on 8 3090 GPUs without changing any parameters. Does anyone know a possible reason?
Hi Yufang,
Given the multitude of factors that could lead to variations in scores, in my previous test it came down to the batch_size. Further, I suggest examining whether any optimization measures, such as half-precision or ZeRO, have been auto-enabled via the DeepSpeed or Accelerate package.
Hope this may help.
Leo
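As a minimal sketch of the check suggested above, assuming a DeepSpeed JSON config is at hand, a small helper can report whether half-precision or ZeRO is enabled. The keys used ("fp16", "bf16", "zero_optimization") follow the standard DeepSpeed config schema:

```python
def report_optimizations(config: dict) -> dict:
    """Inspect a DeepSpeed config dict and report which score-affecting
    optimizations (half-precision, ZeRO) are enabled."""
    fp16 = config.get("fp16", {}).get("enabled", False)
    bf16 = config.get("bf16", {}).get("enabled", False)
    zero_stage = config.get("zero_optimization", {}).get("stage", 0)
    return {
        "half_precision": bool(fp16 or bf16),
        "zero_stage": zero_stage,
    }

# Example: a config with fp16 and ZeRO stage 2 enabled.
cfg = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 2}}
print(report_optimizations(cfg))  # {'half_precision': True, 'zero_stage': 2}
```

If either flag differs between two setups, that alone can explain a score gap of this size.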
Hi Leo, thanks a lot for your helpful suggestions!
I found that the cause was the versions of the installed packages. With the same package versions, I got the same results on 8 3090 GPUs. I'm still not sure which package affects the performance.
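To pin down this kind of version-dependent discrepancy, one option (a small sketch, not part of the original repo; the package names are illustrative) is to record the installed versions of the suspect packages alongside each run:

```python
from importlib.metadata import version, PackageNotFoundError

def record_versions(package_names):
    """Return a mapping of package name -> installed version,
    or None when the package is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions

# Log this dict with every run so results stay comparable across machines.
print(record_versions(["transformers", "deepspeed", "accelerate"]))
```

Diffing these dicts between a run that reproduces the paper's numbers and one that does not narrows the culprit quickly.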