

adarob commented on June 24, 2024

It is possible to run inference with the 11B model in Colab and to train or fine-tune the 3B-parameter model using the free TPU. We will be releasing a notebook in the near future.

If you launch a larger TPU in Google Cloud, you can connect to it from Colab and train the 11B model.

from text-to-text-transfer-transformer.

alespeggio commented on June 24, 2024


Hi @adarob,

I'm trying to fine-tune the pre-trained T5-small model (60 million parameters) on a custom dataset in Google Colab with the free TPU. However, even with an extremely small dataset, the notebook runs out of RAM (35 GB on Google Colab). You said it is possible to fine-tune the 3B-parameter model using the free TPU, so I'm wondering whether I'm doing something wrong.

Below are the commands I run. First, I install the package:

```shell
pip install "t5[gcp]"
```

Then I run this command to fine-tune the model on my dataset:

```shell
t5_mesh_transformer \
  --model_dir="/content/small" \
  --gin_file="dataset.gin" \
  --gin_file="/content/small/operative_config.gin" \
  --gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn" \
  --gin_param="tsv_dataset_fn.filename = 'custom_dataset.tsv'" \
  --gin_file="learning_rate_schedules/constant_0_001.gin" \
  --gin_param="run.train_steps = 1010000"
```
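The command points `tsv_dataset_fn.filename` at a TSV file. As a minimal sketch, assuming the file holds one example per line with exactly one tab separating the input from the target (the file name and the example rows here are made up for illustration), a tiny test dataset can be generated and sanity-checked like this:

```shell
#!/usr/bin/env bash
# Minimal sketch: write a tiny two-column TSV (input<TAB>target) and check it.
# Assumptions: one example per line, exactly one tab per line; the file name
# and the example rows are hypothetical.
set -euo pipefail

tsv="custom_dataset.tsv"

# printf reuses its format for each pair of arguments, so every pair
# becomes one "input<TAB>target" line.
printf '%s\t%s\n' \
  "translate English to German: Hello." "Hallo." \
  "summarize: The quick brown fox jumps over the lazy dog." "A fox jumps." \
  > "$tsv"

# Collect the line numbers of rows that do not have exactly two tab-separated fields.
bad=$(awk -F'\t' 'NF != 2 { print NR }' "$tsv")
if [ -n "$bad" ]; then
  echo "malformed rows in $tsv: $bad" >&2
  exit 1
fi

rows=$(awk 'END { print NR }' "$tsv")
echo "OK: $rows rows in $tsv"
```

Fields must not contain literal tabs or newlines, since either would break the one-example-per-line layout.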

Have others encountered the same issue?

Thank you!


anatoly-khomenko commented on June 24, 2024

Hi @adarob,

I tried to fine-tune the 11B model on Google Cloud and got an out-of-memory error on the TPU.
Is there a parameter I need to change?

Here is how I run the fine-tuning:

```shell
export PROJECT=projectname
export ZONE=us-central1-b
export BUCKET=gs://uniquebucketname
export TPU_NAME=t5-ex2
export DATA_DIR="${BUCKET}/t5-boolq-data-dir"
export MODEL_DIR="${BUCKET}/t5_boolq-small-model_dir"

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE --tpu-size=v3-8 --tpu-only --tf-version=1.15.dev20190821

t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'super_glue_boolq_v102'" \
  --gin_file="gs://t5-data/pretrained_models/11B/operative_config.gin"
```
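For a rough sense of how much TPU memory the 11B weights occupy on a v3-8, here is a back-of-envelope estimate of per-core weight memory for several `model_parallelism` values. It is a sketch under assumptions not stated in the thread: about 11e9 parameters stored as 4-byte float32 values, 16 GiB of HBM per TPU v3 core, and weights split evenly across `model_parallelism` cores (so with `model_parallelism = 1` every core holds a full copy). Optimizer state and activations are ignored, so real usage is higher.

```shell
#!/usr/bin/env bash
# Back-of-envelope weight memory per TPU core (sketch, not a measurement).
# Assumptions: ~11e9 parameters, 4 bytes each (float32), 16 GiB HBM per v3 core,
# weights split evenly across model_parallelism cores. Optimizer state and
# activations are ignored, so real usage is higher.
set -euo pipefail

params=11000000000
bytes_per_param=4
hbm_per_core_gib=16

# Print the GiB of weights held by one core for a given model_parallelism ($1).
weights_gib() {
  awk -v p="$params" -v b="$bytes_per_param" -v m="$1" \
    'BEGIN { printf "%.1f", p * b / m / (1024 ^ 3) }'
}

for mp in 1 2 4 8; do
  need=$(weights_gib "$mp")
  # Compare the estimate with the per-core HBM budget.
  verdict=$(awk -v n="$need" -v h="$hbm_per_core_gib" \
    'BEGIN { v = (n + 0 < h + 0) ? "under" : "over"; print v }')
  echo "model_parallelism=$mp: ~${need} GiB of weights per core (${verdict} ${hbm_per_core_gib} GiB)"
done
```

Under these assumptions the weights alone need more than 16 GiB per core unless they are split across at least four cores; treat the numbers as an illustration, not a diagnosis of the attached stack trace.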

The complete stack trace is attached:

T5-11B-TPU-stack-trace.txt

