Comments (20)
Just to add here (this is a hypothesis, but): a TPU has both a normal CPU (ARM- or x86-based) and the accelerator (matrix multiplication units), and the cloud console only shows how much the CPU is being used; it does not provide details about accelerator utilization.
In the case of T5, the TF-DS graph is uploaded to the TPU; it uses the TPU's CPU to execute non-deep-learning ops like loading from GCS and preprocessing/tokenizing, and then uses the accelerator for the deep-learning training itself.
If you want to see how much the TPU's accelerator is being used, you can use TPU profiling. In my case the CPU was being used at <0.1% but the accelerator at ~45%.
from text-to-text-transfer-transformer.
This means the model must be I/O bound, due in part to its small size. We do tokenization and packing on the fly by default. I have a TODO to add support for caching smaller datasets in memory post-tokenization. Let me see if I can get to it today, and you can try it out.
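The idea behind that TODO can be sketched in plain Python (this is an illustration of the concept, not the actual T5/tf.data code): without a cache, tokenization re-runs on every epoch; with a post-tokenization in-memory cache, it runs once per example.

```python
# Toy illustration (not the real T5 code) of caching post-tokenization.
calls = {"tokenize": 0}

def tokenize(text):
    calls["tokenize"] += 1  # stands in for expensive on-the-fly work
    return text.split()

raw = ["hello world", "text to text transfer"]

# Without caching: tokenization re-runs on every epoch (3 epochs here).
for _ in range(3):
    for text in raw:
        tokenize(text)
assert calls["tokenize"] == 6

# With an in-memory cache: tokenize once, replay the results each epoch.
calls["tokenize"] = 0
cache = [tokenize(text) for text in raw]  # first pass fills the cache
for _ in range(3):
    for example in cache:
        pass  # training would consume cached examples; no re-tokenization
assert calls["tokenize"] == 2
```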
Thank you @adarob , let me know if I can help you to implement something. I'm launching the T5 for one of my current working tasks and I'm eager to make it train faster.
Can you try using the latest commit to see if this improves things? It will cache the dataset on the first pass, so it will be much faster after that.
@adarob
Hi Adam,
Thank you for the update. I'm running it now (I had to update Python to 3.6 to be able to run the latest version, which took a good part of the day).
From what I see, it did not improve; the timing parameters are pretty much the same (global_step/sec: 0.401189, examples/sec: 821.635), but strangely the TPU load decreased to 1%.
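As a sanity check, the two reported rates are consistent with each other; dividing them gives the effective per-step batch size:

```python
# Dividing the two rates from the log gives examples per step,
# i.e. the effective batch size.
global_steps_per_sec = 0.401189
examples_per_sec = 821.635
examples_per_step = examples_per_sec / global_steps_per_sec
assert round(examples_per_step) == 2048  # examples per training step
```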
Hi @t5-copybara,
As far as I understand it is here:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/utils.py#L654
I will comment out the condition and keep ds = ds.cache().
Please, let me know if this is the correct approach.
Thank you!
I have found that I could specify use_cached in command line parameters like this:
--gin_param="mesh_train_dataset_fn.use_cached = True"
This is the complete command:
t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="mesh_train_dataset_fn.use_cached = True" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'super_glue_boolq_v102'" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"
When I run it, though, I get the exception here:
Do you know where I can specify cache directories?
OK, I have found out that I can specify cache directories using:
--additional_task_cache_dirs="${CACHE_DIR}"
but in this case the cache does not get created either.
This is the message I get:
22:18:44.970715 140004609750016 utils.py:584] 'super_glue_boolq_v102' does not exist in any task cache directories (searched ['gs://uniquebucketname/t5-boolq-data-dir-cache/super_glue_boolq_v102']).
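The error message suggests the lookup simply checks whether a directory named after the task already exists under each cache directory; passing --additional_task_cache_dirs only tells it where to look. A hypothetical sketch of that behavior (the helper name and signature are invented for illustration; see t5/data/utils.py for the real logic):

```python
import os
import tempfile

def find_task_cache_dir(task_name, cache_dirs):
    """Return the first cache dir containing <task_name>, else None.

    Hypothetical helper mirroring the lookup that produced the error above.
    """
    for cache_dir in cache_dirs:
        candidate = os.path.join(cache_dir, task_name)
        if os.path.isdir(candidate):
            return candidate
    return None

# The cache dir must already contain a directory named after the task;
# the flag alone does not create it.
with tempfile.TemporaryDirectory() as root:
    assert find_task_cache_dir("super_glue_boolq_v102", [root]) is None
    os.makedirs(os.path.join(root, "super_glue_boolq_v102"))
    assert find_task_cache_dir("super_glue_boolq_v102", [root]) is not None
```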
Giving up for now.
The offline use_cached stuff is only supported on our internal infrastructure for the time being. What I added for you is something that will do the caching on the fly. You should be able to explicitly enable it as you mentioned above ("I will comment out the condition and keep ds = ds.cache()"). Are you sure you're actually using this new code when you run, and not what's in the pip package?
@adarob
Hi Adam,
I'll give it a try on another run. But I see several places with ds = ds.cache(); do I have to make the change to all of them?
I'm using the latest master by installing from source with the command:
pip install --upgrade -e ./text-to-text-transfer-transformer
And I see all the recent fixes there, so I'm pretty sure I'm using the most recent version.
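One generic way to double-check which copy of the package Python will actually import (for an editable install, it should resolve to the source checkout, not to site-packages) is to inspect the module spec without importing it:

```python
import importlib.util

# Resolve where "t5" would be imported from, without importing it.
spec = importlib.util.find_spec("t5")
if spec is None:
    print("t5 is not installed in this environment")
else:
    # For a `pip install -e` checkout this should point into the cloned
    # repo directory, not into site-packages.
    print("t5 will be imported from:", spec.origin)
```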
Will let you know as soon as I run it again.
I am currently fine-tuning the large model on a 6 GB TSV file and get a TPU usage of <1%. Anything new here?
@f-lng , @adarob ,
I have ended up using the notebook provided here: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
It seems to use the TPU more effectively: training that took more than 12 hours using the script from this issue completes in less than 4 hours with the notebook.
The dataset is under 500 MB, though.
@adarob , thank you for providing the notebook!
@adarob I was not aware that the size of the TSV file could be an issue; I assumed the code would just read chunks of it. Thank you for clarifying, I will try to pre-shard it.
@anatoly-khomenko Thanks for letting me know, I will have a look at the notebook as well.
@adarob I have now taken 7,000 examples from my dataset and precomputed 7 TFRecord files from them.
To create the TFRecords I used the T5 'Task' directly (https://pastebin.com/36pG5ne4).
I then adjusted your _get_cached_dataset function (and made sure it's called) to load them (https://pastebin.com/cvTzxEXh). The debug print is showing, so the function is called and the in-memory caching is also working.
I am using code adapted from your notebook to train the model (https://pastebin.com/8am5S5H2).
However, I am still getting a speed of ~50 examples/second and a TPU CPU usage of <=0.15% (sic) during training most of the time, with some spikes (1-2%).
(The naive approach of feeding my huge TSV to the command-line util gave me ~90 examples/sec.)
I do not have a lot of experience with the TF ecosystem apart from some hacking around in Tensor2Tensor, and none with TPUs, so perhaps I am missing something important?
By the way, I just checked: the buckets, the TPU, and the VM are all in us-central1(-a), and it is a TPU v3-8.
@adarob I did another experiment: I set tokens_per_batch to 1024^2 and trained on ~250k datapoints. The examples/second stayed at ~50. (Note also that I got OOM errors with such high batch sizes when training using the CLI, but did not get one this time.)
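For scale, here is what that tokens_per_batch implies for examples per batch and seconds per step, assuming a packed sequence length of 512 (the sequence length is an assumption; substitute your own):

```python
# Rough arithmetic for the experiment above; sequence_length is assumed.
tokens_per_batch = 1024 ** 2   # value used in the experiment
sequence_length = 512          # assumed packed input length
examples_per_batch = tokens_per_batch // sequence_length
assert examples_per_batch == 2048

# At the observed ~50 examples/sec, each step would take about 41 s.
seconds_per_step = examples_per_batch / 50
assert 40 < seconds_per_step < 42
```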
@adarob @anatoly-khomenko I am having a similar issue trying to fine-tune on a GPU. When training, GPU usage is less than 7% and CPU usage is huge (it has to be I/O bound). I also ended up using the following notebook, even with the example data provided:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
Moreover, I even tried the parameter mesh_train_dataset_fn.use_cached = True.
Any suggestion or correction on what I might be doing wrong?
use_cached=True won't work unless you have run the cache_tasks_main preprocessing (https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/cache_tasks_main.py). You should also check that your data is stored in the same region as your TPU/GPU. I'm not sure what else could be causing this issue.