Comments (3)
It is possible to do inference on the 11B model in a Colab and to train/fine-tune the 3B param model using the free TPU. We will be releasing a notebook in the near future.
If you launch a larger TPU in Cloud, you can connect to it and train the 11B model via Colab.
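A minimal sketch of that endpoint choice between Colab's attached TPU and a self-launched Cloud TPU (the helper function and all example values below are hypothetical, for illustration only):

```python
def tpu_target(colab_tpu_addr=None, cloud_tpu_name=None):
    """Pick the TPU endpoint for a job.

    colab_tpu_addr: the COLAB_TPU_ADDR env var value when Colab's free TPU
        is attached (e.g. "10.0.0.2:8470"; hypothetical example value).
    cloud_tpu_name: name of a Cloud TPU you launched yourself, as you would
        pass to `t5_mesh_transformer --tpu=...`.
    """
    if colab_tpu_addr:
        # Colab exposes its TPU as a raw grpc address.
        return "grpc://" + colab_tpu_addr
    # A self-launched Cloud TPU is referenced by name and resolved by the
    # TPU client libraries.
    return cloud_tpu_name

print(tpu_target(colab_tpu_addr="10.0.0.2:8470"))  # grpc://10.0.0.2:8470
print(tpu_target(cloud_tpu_name="my-big-tpu"))     # my-big-tpu
```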
from text-to-text-transfer-transformer.
Hi @adarob,
I'm trying to fine-tune the T5-small pre-trained model (60 million parameters) on Google Colab (with the free TPU) on a custom dataset. However, even with an extremely small dataset, the notebook runs out of RAM (35 GB on Google Colab). You said it is possible to fine-tune the 3B-param model using the free TPU, so I'm wondering whether I'm doing something wrong.
I describe below the list of commands that I run.
I install the package with the following command:
pip install t5[gcp]
and then I run this command to fine-tune the model on my dataset:
t5_mesh_transformer \
  --model_dir="/content/small" \
  --gin_file="dataset.gin" \
  --gin_file="/content/small/operative_config.gin" \
  --gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn" \
  --gin_param="tsv_dataset_fn.filename = 'custom_dataset.tsv'" \
  --gin_file="learning_rate_schedules/constant_0_001.gin" \
  --gin_param="run.train_steps = 1010000"
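In case it helps anyone comparing notes: on the Mesh TensorFlow side, memory use is driven largely by the token batch size and the sequence lengths, both of which can be overridden with extra gin params. The values below are illustrative assumptions, not tested recommendations:

```shell
# Hypothetical additional overrides, appended to the t5_mesh_transformer
# command above (values are illustrative; tune for your data):
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 16384)" \
  --gin_param="utils.run.sequence_length = {'inputs': 256, 'targets': 64}"
```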
Have others encountered the same issue?
Thank you!
from text-to-text-transfer-transformer.
Hi @adarob,
I was trying to fine-tune the 11B model on Google Cloud and got an out-of-memory error on the TPU.
Is there anything I should change in the parameters?
Here is how I run the fine-tuning:
```shell
export PROJECT=projectname
export ZONE=us-central1-b
export BUCKET=gs://uniquebucketname
export TPU_NAME=t5-ex2
export DATA_DIR="${BUCKET}/t5-boolq-data-dir"
export MODEL_DIR="${BUCKET}/t5_boolq-small-model_dir"

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE \
  --tpu-size=v3-8 --tpu-only --tf-version=1.15.dev20190821

t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'super_glue_boolq_v102'" \
  --gin_file="gs://t5-data/pretrained_models/11B/operative_config.gin"
```
The complete stack trace is attached.
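A back-of-the-envelope calculation (assumptions: float32 weights, roughly 16 GB of HBM per v3-8 core) suggests why `model_parallelism = 1` in the command above can exhaust memory for the 11B model; without model parallelism, each core needs a full copy of the weights:

```python
def weight_memory_gb(n_params, bytes_per_param=4):
    """GB needed for the raw weights alone; activations and optimizer
    slots (e.g. Adafactor accumulators) add substantially more."""
    return n_params * bytes_per_param / 1e9

per_core_unsharded = weight_memory_gb(11e9)   # full weight copy per core
per_core_8way = per_core_unsharded / 8        # weights split across 8 cores
print(per_core_unsharded, per_core_8way)      # 44.0 5.5
```

If that arithmetic is roughly right, raising `utils.tpu_mesh_shape.model_parallelism` (e.g. toward 8 on a v3-8) would be the natural knob to try, though I have not verified the exact setting required by the 11B config.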
from text-to-text-transfer-transformer.
Related Issues (20)
- ValueError when evaluating tuning model using Mtf library
- using A100(40G)*8 gpus server to train T5-3b,it reports OOM resource is exhausted problem HOT 2
- How should I speed up T5 exported saved_model by using TF-TRT ?
- model.finetune(...) does not show the loss of the model HOT 6
- CUDA OOM with HF Model
- Predictions are inconsistent unless model is reloaded for each prediction HOT 1
- how to change teacher forcing fashion to autogressive fashion in training stage?
- ERROR:root:Path not found: gs://t5-data/pretrained_models/large/operative_config.gin HOT 6
- Fine tuning t5 without TPU
- About "seqio" in "hf_model.py"
- Question about the metric reported in the paper?
- All attempts to get a Google authentication bearer token failed, returning an empty token. HOT 2
- How to fine-tune T5 with a Casual Language Modeling object?
- cmd vs entrypoint youtube video suggestion HOT 1
- Question about cross-node(multi-node) data parallelism on GPU HOT 1
- Dependencies in `setup.py` have module conflicts.
- How can I get the best checkpoint in Squad?
- Custom Model
- Columns and DataType Not Explicitly Set on line 163 of eval_utils_test.py
- Clarification on T5 Model Pre-training Objective and Denoising Process