Comments (5)
Verified a fix. Working on a PR.
Hey Morizeyao, thanks for your interest.
> Does the code base support distributed training? If not, is it possible to support it after some code modifications?
We (the authors) haven't tried GPU training yet, but it should be possible. See the example here: https://github.com/tensorflow/mesh#example-network-mnist
@nshazeer in case he wants to chime in.
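For reference, that MNIST example boils down to something like the sketch below (paraphrased from the Mesh TF README; the toy one-layer network, the random placeholder input, and the 4-GPU device list are illustrative assumptions, not part of this repo):

```python
import mesh_tensorflow as mtf
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Build the model in Mesh TF's named-dimension world.
graph = mtf.Graph()
mesh = mtf.Mesh(graph, "my_mesh")
batch_dim = mtf.Dimension("batch", 100)
io_dim = mtf.Dimension("io", 784)
hidden_dim = mtf.Dimension("hidden", 1024)
classes_dim = mtf.Dimension("classes", 10)

tf_images = tf.random.uniform([100, 784])  # stand-in for real data
images = mtf.import_tf_tensor(mesh, tf_images, shape=[batch_dim, io_dim])
w1 = mtf.get_variable(mesh, "w1", [io_dim, hidden_dim])
w2 = mtf.get_variable(mesh, "w2", [hidden_dim, classes_dim])
hidden = mtf.relu(mtf.einsum([images, w1], output_shape=[batch_dim, hidden_dim]))
logits = mtf.einsum([hidden, w2], output_shape=[batch_dim, classes_dim])

# Data parallelism: split the "batch" dimension across 4 GPUs.
devices = ["gpu:0", "gpu:1", "gpu:2", "gpu:3"]
mesh_shape = [("all_processors", 4)]
layout_rules = [("batch", "all_processors")]
mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, devices)

# Lower the Mesh TF graph to ordinary TF ops placed on those devices.
lowering = mtf.Lowering(graph, {mesh: mesh_impl})
tf_logits = lowering.export_to_tf_tensor(logits)
```

The key line is the layout rule mapping the "batch" dimension onto the processor mesh; that mapping is what spreads the batch across the listed devices.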
> By the way, how do I set the batch size and the number of GPUs if I want to train the model on GPU?
The batch size can be set according to how much memory your GPU can hold. The Mesh TF Transformer is in principle smart enough to chop your batch up into microbatches and accumulate gradients if the batch won't fit in memory. The GPU number will depend on which GPUs are available on the machine you're running on and which you want to use. Or do you mean how many GPUs you should use? It should work to use as many as you have available.
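To make that concrete, here is a sketch of the relevant gin bindings (the configurable names `utils.run.batch_size` and `serialize_num_microbatches.tokens_per_microbatch_per_replica` follow `mesh_tensorflow.transformer.utils`, and the numbers are placeholders; verify both against your installed version):

```python
import gin

gin.parse_config("""
# Total batch size, expressed in tokens rather than examples.
utils.run.batch_size = ('tokens_per_batch', 65536)

# If a replica can't hold that many tokens at once, Mesh TF splits each
# batch into microbatches of this size and accumulates gradients.
serialize_num_microbatches.tokens_per_microbatch_per_replica = 8192
""")
```

The same bindings can also be passed to the `t5_mesh_transformer` binary as `--gin_param` flags.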
Thank you for your quick response!
About distributed GPU training, I'll do some research following your link. :)
For the second question, I ran some tests on a machine with 2 GPUs, and the code used only one card.
As you said, the Mesh library should automatically handle splitting the batch, accumulating gradients, and assigning GPUs. I guess the problem may be due to my environment settings.
I will report my findings here soon.
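One sanity check worth trying (a sketch, under the assumption that your build exposes the `utils.run.mesh_shape` and `utils.run.mesh_devices` gin parameters; older versions may not):

```python
import gin
from tensorflow.python.client import device_lib

# If this prints only one GPU, the problem is environmental
# (drivers, CUDA_VISIBLE_DEVICES), not Mesh TF's layout.
print([d.name for d in device_lib.list_local_devices()
       if d.device_type == "GPU"])

# Otherwise, explicitly lay the batch dimension across both cards.
gin.parse_config("""
utils.run.mesh_shape = 'model:1,batch:2'
utils.run.mesh_devices = ['gpu:0', 'gpu:1']
""")
```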
Same question here. I have tried my best to get the code to run on GPUs, but the log shows that tpu_estimator.py says it is training on CPU, and the speed is around 300 times slower than on a TPU in GCP. I believe I have set everything properly. With log_device_placement enabled, the log shows that some operations are assigned to the CPU and some to the GPU. Do you have any idea what is going on?
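For anyone debugging the same thing, this is how device placement logging is turned on in TF1-style code (a generic standalone sketch, not specific to this repo):

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# With log_device_placement on, TF prints every op's assigned device at
# session creation, which shows exactly which ops fell back to the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])
    print(sess.run(tf.reduce_sum(tf.matmul(a, b))))
```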
> And the speed is around 300 times slower than on a TPU in GCP.
How did you get to this number?
Have you tried changing your batch size?
As @craffel mentioned:
> The batch size can be set according to how much memory your GPU can hold.
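(For concreteness, a factor like "300 times slower" usually comes from the `global_step/sec` values TensorFlow logs during training; a back-of-the-envelope comparison, with made-up numbers, looks like this:)

```python
# Illustrative numbers only -- read the real values from the
# "INFO:tensorflow:global_step/sec: ..." lines in each run's logs.
tpu_steps_per_sec = 3.0
gpu_steps_per_sec = 0.01

# Steps are only comparable at equal tokens-per-batch; otherwise
# compare tokens/sec instead.
print(f"slowdown: ~{tpu_steps_per_sec / gpu_steps_per_sec:.0f}x")
```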
Related Issues (20)
- ValueError when evaluating a tuned model using the Mtf library
- Training T5-3B on a server with 8× A100 (40 GB) GPUs reports an OOM "resource is exhausted" error
- How should I speed up a T5 exported saved_model by using TF-TRT?
- model.finetune(...) does not show the loss of the model
- CUDA OOM with HF Model
- Predictions are inconsistent unless the model is reloaded for each prediction
- How to change teacher forcing to autoregressive decoding in the training stage?
- ERROR:root:Path not found: gs://t5-data/pretrained_models/large/operative_config.gin
- Fine-tuning T5 without a TPU
- About "seqio" in "hf_model.py"
- Question about the metric reported in the paper
- All attempts to get a Google authentication bearer token failed, returning an empty token.
- How to fine-tune T5 with a causal language modeling objective?
- cmd vs entrypoint YouTube video suggestion
- Question about cross-node (multi-node) data parallelism on GPU
- Dependencies in `setup.py` have module conflicts.
- How can I get the best checkpoint on SQuAD?
- Custom Model
- Columns and DataType not explicitly set on line 163 of eval_utils_test.py
- Clarification on T5 Model Pre-training Objective and Denoising Process