text-to-text-transfer-transformer's Introduction

T5: Text-To-Text Transfer Transformer

As of July 2022, we recommend using T5X:

T5X is the new and improved implementation of T5 (and more) in JAX and Flax. T5 on Tensorflow with MeshTF is no longer actively developed. If you are new to T5, we recommend starting with T5X.

The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. In the paper, we demonstrate how to achieve state-of-the-art results on multiple NLP tasks using a text-to-text transformer pre-trained on a large text corpus.

The bulk of the code in this repository is used for loading, preprocessing, mixing, and evaluating datasets. It also provides a way to fine-tune the pre-trained models released alongside the publication.

The t5 library can be used for future model development by providing useful modules for training and fine-tuning (potentially huge) models on mixtures of text-to-text tasks.

Library

t5.data

t5.data is a package for defining Task objects that provide tf.data.Datasets.

Each Task is made up of:

  • a data source
  • text preprocessor function(s)
  • a SentencePiece model
  • metric function(s)

Additionally, you may optionally provide:

  • token preprocessor function(s)
  • postprocess function(s)

The data source can be an arbitrary function that provides a tf.data.Dataset, but we also provide simpler wrappers for datasets available in TensorFlow Datasets (TFDS) (a TfdsTask) or stored as text files with one example per line (a TextLineTask).

The text preprocessor converts the examples in the source dataset into the appropriate format for a text-to-text model with fields for inputs and targets. For example, the predefined t5.data.preprocessors.translate preprocessor converts inputs in the form

{'de': 'Das ist gut.', 'en': 'That is good.'}

to the form

{'inputs': 'translate German to English: Das ist gut.', 'targets': 'That is good.'}

In addition to text preprocessing, you can also use one or more token preprocessors to modify the inputs post-tokenization. We implemented our unsupervised pre-training objectives using these token preprocessors.

We provide many predefined preprocessors in t5.data.preprocessors, but you may also define your own.
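For illustration, here is a minimal sketch of a custom text preprocessor in the spirit of the translate example above. The function name and the use of tf.strings are our own choices; only the 'inputs'/'targets' output fields are required by the framework:

import tensorflow.compat.v1 as tf

def translate_de_to_en(dataset):
  # Sketch: map examples of the form {'de': ..., 'en': ...} to the
  # text-to-text format expected by T5.
  def to_inputs_and_targets(ex):
    return {
        'inputs': tf.strings.join(['translate German to English: ', ex['de']]),
        'targets': ex['en'],
    }
  return dataset.map(
      to_inputs_and_targets, num_parallel_calls=tf.data.experimental.AUTOTUNE)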

The SentencePiece model is used to tokenize the input strings and decode the output tokens. You can create your own model with the google/sentencepiece library, or use our default one at t5.data.DEFAULT_SPM_PATH. If you create your own, you must use the flags --pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1 with spm_train to be compatible with our model code.
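If you prefer the SentencePiece Python API to the spm_train binary, the following sketch shows a roughly equivalent training call; the corpus path, model prefix, and vocabulary size are placeholders:

import sentencepiece as spm

# Roughly equivalent to:
#   spm_train --pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1 ...
spm.SentencePieceTrainer.train(
    input='my_corpus.txt',   # placeholder path to your training text
    model_prefix='my_spm',   # placeholder output prefix
    vocab_size=32000,        # placeholder vocabulary size
    pad_id=0,
    eos_id=1,
    unk_id=2,
    bos_id=-1)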

The metric function returns a score given the target and prediction from the model. You may also define a postprocess function to convert the target and prediction text to another format before calling the metric. We provide some predefined metrics in t5.evaluation.metrics.
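As a rough sketch (not one of the library's built-ins), a metric function takes the post-processed targets and predictions and returns a dictionary of named scores:

def exact_match(targets, predictions):
  # Hypothetical metric: percentage of predictions that exactly match targets.
  matches = sum(t == p for t, p in zip(targets, predictions))
  return {'exact_match': 100.0 * matches / max(len(targets), 1)}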

Finally, t5.data contains a Mixture class that can be instantiated to combine multiple Task datasets for multi-task training using various functions for specifying the mixture rates.
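For example, a mixture of two of the tasks referenced elsewhere in this README could be registered roughly as follows; the mixture name is ours, and the exact rate-specification arguments may differ between library versions:

import t5

# Combine two registered tasks with (hypothetical) equal mixing rates.
t5.data.MixtureRegistry.add(
    "my_glue_pair_mixture",
    [("glue_mrpc_v002", 1.0), ("super_glue_boolq_v102", 1.0)])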

t5.evaluation

t5.evaluation contains two core components:

  1. metrics to be used during evaluation
  2. utilities for applying these metrics at evaluation time

t5.models

t5.models contains shims for connecting T5 Tasks and Mixtures to a model implementation for training, evaluation, and inference.

Currently there are two shims available: one for the Mesh TensorFlow Transformer that we used in our paper and another for the Hugging Face Transformers library. The Hugging Face API is currently experimental and subject to change, but it provides a simple way to load, fine-tune, and evaluate our pre-trained models using PyTorch on a single GPU. If you want to use our largest models on TPUs and/or reproduce the results in our paper, you should use the MtfModel API and the t5_mesh_transformer binary. If you are interested in fine-tuning our models on a GPU in PyTorch, you should try the HfPyTorchModel API. Since HfPyTorchModel is experimental, the remainder of this README assumes usage of MtfModel and its associated binary. A usage example of HfPyTorchModel is available here.

Usage

The easiest way to try out T5 is with a free TPU in our Colab Tutorial.

Below we provide examples for how to pre-train, fine-tune, evaluate, and decode from a model from the command-line with our codebase. You can use these instructions to reproduce our results, fine-tune one of our released checkpoints with your own data and/or hyperparameters, or pre-train a model from scratch.

Dataset Preparation

You may either use a new or pre-existing Task, or you may load examples from a preprocessed TSV file.

Using a Task

Depending on your data source (see above), you will need to prepare your data appropriately.

Task

If using a vanilla task, just make sure any file(s) loaded by your dataset_fn are accessible to the TPU (i.e., are in a GCS bucket), and you should be good to go!
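As a sketch of what such a dataset_fn might look like (the GCS path, the field names, and the assumed (split, shuffle_files) signature are ours):

import tensorflow.compat.v1 as tf

def my_dataset_fn(split, shuffle_files=False):
  # Sketch: read '<question>\t<answer>' lines from a placeholder GCS path and
  # emit raw fields; a text preprocessor can then map them to inputs/targets.
  del shuffle_files  # ignored in this sketch
  ds = tf.data.TextLineDataset('gs://my-bucket/my_task/%s.tsv' % split)
  def parse(line):
    question, answer = tf.io.decode_csv(
        line, record_defaults=['', ''], field_delim='\t')
    return {'question': question, 'answer': answer}
  return ds.map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)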

TfdsTask

Most of our predefined Tasks use TensorFlow Datasets (TFDS) as their data source. When you run our training binary (see instructions below) with a TfdsTask, the dataset will automatically be downloaded and prepared on its first use. After preparation is complete, the dataset is cached to your local storage to avoid this overhead in future runs. If working in the cloud, we recommend you set the --t5_tfds_data_dir flag to point to a persistent storage location, such as a GCS bucket. This is a requirement when training on TPU.

C4

The C4 dataset we created for unsupervised pre-training is available in TensorFlow Datasets, but it requires a significant amount of bandwidth for downloading the raw Common Crawl scrapes (~7 TB) and compute for its preparation (~335 CPU-days). We suggest you take advantage of the Apache Beam support in TFDS, which enables distributed preprocessing of the dataset and can be run on Google Cloud Dataflow. With 500 workers, the job should complete in ~16 hours.

After defining MY_PROJECT, MY_BUCKET, and MY_REGION appropriately, you can build the dataset on Cloud Dataflow from GCP using the following commands:

pip install tfds-nightly[c4]
echo 'tfds-nightly[c4]' > /tmp/beam_requirements.txt
python -m tensorflow_datasets.scripts.download_and_prepare \
  --datasets=c4/en \
  --data_dir=gs://$MY_BUCKET/tensorflow_datasets \
  --beam_pipeline_options="project=$MY_PROJECT,job_name=c4,staging_location=gs://$MY_BUCKET/binaries,temp_location=gs://$MY_BUCKET/temp,runner=DataflowRunner,requirements_file=/tmp/beam_requirements.txt,experiments=shuffle_mode=service,region=$MY_REGION"

Read more in the TFDS Beam instructions.

TextLineTask

A TextLineTask is useful when your data source is a text file (or files) with one example per line. You can then use a text preprocessor to convert each line into a dictionary of inputs and targets.

Make sure your files are accessible to the TPU (i.e., are in a GCS bucket), and you should be good to go!
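A registration sketch is shown below; it assumes the text preprocessor receives the raw lines, and the task name, file patterns, and argument spellings are placeholders that may differ between t5 versions:

import tensorflow.compat.v1 as tf
import t5

def my_line_parser(dataset):
  # Hypothetical preprocessor: each line is assumed to be '<input>\t<target>'.
  def parse(line):
    inputs, targets = tf.io.decode_csv(
        line, record_defaults=['', ''], field_delim='\t')
    return {'inputs': inputs, 'targets': targets}
  return dataset.map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)

t5.data.TaskRegistry.add(
    "my_text_line_task",
    t5.data.TextLineTask,
    split_to_filepattern={
        "train": "gs://my-bucket/my_task/train.txt",        # placeholder
        "validation": "gs://my-bucket/my_task/validation.txt",  # placeholder
    },
    text_preprocessor=[my_line_parser],
    metric_fns=[t5.evaluation.metrics.accuracy])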

Using a TSV File Directly

Instead of defining a new Task, you may use a TSV file (or files) directly as your dataset where each line is formatted as <input>\t<target>.

However, there are a couple of caveats:

  • There is no way to define a text preprocessor, so the TSV will need to contain your data in a preprocessed format.
  • There is also currently no way to set a token preprocessor, postprocess function, or metric function for evaluation when using a TSV file directly.

If you need any of these features, you must define a new Task, TfdsTask, or TextLineTask.

Similar to the above cases, your TSV file(s) must be accessible to the TPU (i.e., be in a GCS bucket).

Installation

To install the T5 package, simply run:

pip install t5[gcp]

Setting up TPUs on GCP

You will first need to launch a Virtual Machine (VM) on Google Cloud. Details about launching the VM can be found at the Google Cloud Documentation.

In order to run training or eval on Cloud TPUs, you must set up the following variables based on your project, zone and GCS bucket appropriately. Please refer to the Cloud TPU Quickstart guide for more details.

export PROJECT=your_project_name
export ZONE=your_project_zone
export BUCKET=gs://yourbucket/
export TPU_NAME=t5-tpu
export TPU_SIZE=v3-8
export DATA_DIR="${BUCKET}/your_data_dir"
export MODEL_DIR="${BUCKET}/your_model_dir"

Please use the following command to create a TPU device in the Cloud VM.

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE --tpu-size=$TPU_SIZE \
        --tpu-only --noconf

Training

In the command below, we train a model on the GLUE Benchmark MRPC task from scratch. You can change the MIXTURE_NAME gin parameter to use any of the tasks or mixtures provided in our package.

t5_mesh_transformer  \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_file="models/bi_v1.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'"

The full list of tasks and mixtures can be obtained by running:

python -c "import t5; print(t5.data.MixtureRegistry.names())"

You may also define additional tasks and mixtures in a new file and import it using the --module_import flag.

Alternatively, you could train with a TSV file where each line is formatted as <input>\t<target> (see above).

Fine-tuning

In order to fine-tune one of our pre-trained models, you need to pass the operative config of the pre-trained model to the training script. The operative config should be passed in as a gin_file flag. It specifies the model architecture and other hyperparameters. In addition, you need to specify the mixture to fine-tune on. For example, to fine-tune the T5-small model on the glue_mrpc_v002 mixture, please run:

t5_mesh_transformer  \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"

The correct pre-trained checkpoint path is included in the operative config.

You may also define additional tasks and mixtures in a new file and import it using the --module_import flag.

Alternatively, you could fine-tune with a TSV file where each line is formatted as <input>\t<target> (see above). For example, you could try one of the paired translation datasets from the WMT '19 News Commentary 14 training set (e.g., English-French). When using a TSV file, you would replace the MIXTURE_NAME flag with:

--gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn"
--gin_param="tsv_dataset_fn.filename = 'gs:/path/to/tsv'"

To fine-tune with the same hyperparameters we used in the paper (using a constant learning rate of 0.001), you can pass in this gin file which is included in the T5 package:

--gin_file="learning_rate_schedules/constant_0_001.gin"

The operative configs for the pre-trained models are set so that there is effectively no limit on the number of train steps. If you'd like to train for a specific number of steps, you'll need to pass that in. Since the pre-trained models have already been trained for 1,000,000 steps, you should specify the total number of steps after pre-training and fine-tuning. For example, if you want to fine-tune for an additional 10,000 steps, you should pass

--gin_param="run.train_steps = 1010000"

You can also use a different batch size for fine-tuning. We set the batch size according to the total number of tokens in a batch. By default, a batch uses a sequence length of 512. To set the number of tokens in a batch, you should set

--gin_param = "tokens_per_batch=1048576"

Eval

In order to evaluate a model in the T5 framework, you need to use the eval.gin file and specify the model directory, the decoding method, and which checkpoint step(s) to evaluate. For example, to evaluate on the GLUE MRPC task using beam search on all checkpoints, use the following command:

t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="${MODEL_DIR}/operative_config.gin" \
  --t5_tfds_data_dir=${DATA_DIR} \
  --gin_file="eval.gin" \
  --gin_file="beam_search.gin" \
  --gin_param="run.dataset_split = 'validation'" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_param="eval_checkpoint_step = 'all'"

To evaluate a specific checkpoint, simply set the eval_checkpoint_step parameter to the appropriate checkpoint.

--gin_param="eval_checkpoint_step = 100000"

You can also use greedy_decode.gin or sample_decode.gin instead of beam_search.gin in the command above.

Decode

In order to produce predictions from a model in the T5 framework, you need to specify the model directory, decoding method, and which checkpoint step(s) to use for decoding. Assuming you have a text file of input sequences stored at /path/to/inputs.txt, an example command would be:

t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="${MODEL_DIR}/operative_config.gin" \
  --gin_file="infer.gin" \
  --gin_file="sample_decode.gin" \
  --gin_param="input_filename = '/path/to/inputs.txt'"\
  --gin_param="output_filename = '/tmp/outputs.txt'"\
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'"\
  --gin_param="infer_checkpoint_step = 'all'"

To predict with a specific checkpoint, simply set the infer_checkpoint_step parameter to the appropriate checkpoint.

--gin_param="infer_checkpoint_step = 100000"

You can also use beam_search.gin or greedy_decode.gin instead of sample_decode.gin in the command above.

Export

You may also want to export a SavedModel, which is useful for serving your trained model (e.g., when deploying with ML Engine or in a Docker image).

t5_mesh_transformer \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --use_model_api \
  --mode="export_predict" \
  --export_dir="/path/to/export/dir"

The command above exports the latest checkpoint in the model directory. To export a particular checkpoint, add the following flags:

  --checkpoint_mode="specific" \
  --checkpoint_steps=1000000

The t5-deploy notebook demonstrates exporting a SavedModel and packaging it in a Docker image for serving.
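As a quick sanity check of an export, you can load the SavedModel back in Python; the timestamped subdirectory and the 'serving_default' signature key below are assumptions about the export layout:

import tensorflow as tf

# Load the exported model and inspect its serving signatures.
loaded = tf.saved_model.load("/path/to/export/dir/1234567890")  # assumed timestamped subdir
print(list(loaded.signatures.keys()))
predict_fn = loaded.signatures["serving_default"]  # assumed signature name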

GPU Usage

If you would like to use GPU instead of TPUs, you can modify the above commands by removing TPU-specific flags (--tpu, --tpu_zone, --gcp_project) and setting the gin params for mesh_shape and mesh_devices based on your desired setup.

For example, if your machine has access to 6 GPUs and you'd like to do 3-way model parallelism and 2-way data parallelism, the fine-tuning command above would become:

t5_mesh_transformer  \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.run.mesh_shape = 'model:3,batch:2'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1','gpu:2','gpu:3','gpu:4','gpu:5']" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"

With a single GPU, the command is:

t5_mesh_transformer  \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:1'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0']" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"

Reproducing our experiments

We provide operative configs for all of the experiments in the paper in gs://t5-data/experiments. The experiments folder has different subdirectories corresponding to the different sections in our paper. For example, gs://t5-data/experiments/objectives contains the experiments from Section 3.3 ("Unsupervised objectives"). Each subdirectory of the objectives folder contains operative configs for some particular experiment (where loosely speaking an "experiment" is one of the rows in one of the tables in our paper).

Let's say you want to reproduce the results for the "Prefix language modeling" objective (the first row in Table 4). The operative configs for that experiment live in gs://t5-data/experiments/objectives/obj-prefix_lm. In the base directory, there is an operative config for pre-training the model (gs://t5-data/experiments/objectives/obj-prefix_lm/operative_config.gin). Then, there are subdirectories for each of the downstream fine-tuning mixtures we consider, each of which has its own operative config (for example, gs://t5-data/experiments/objectives/obj-prefix_lm/cnn_dailymail_v002/operative_config.gin). To run this experiment, first pre-train a model with the pre-training operative config:

export PRETRAIN_MODEL_DIR="${BUCKET}/obj-prefix_lm"
t5_mesh_transformer  \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${PRETRAIN_MODEL_DIR}" \
  --gin_file="gs://t5-data/experiments/objectives/obj-prefix_lm/operative_config.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'"

Then, you can fine-tune the pre-trained model on CNN/Daily Mail like so:

export FINETUNE_MODEL_DIR="${BUCKET}/obj-prefix_lm/cnn_dailymail_v002"
t5_mesh_transformer  \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${FINETUNE_MODEL_DIR}" \
  --gin_file="gs://t5-data/experiments/objectives/obj-prefix_lm/cnn_dailymail_v002/operative_config.gin" \
  --gin_param="init_checkpoint = '${PRETRAIN_MODEL_DIR}/model.ckpt-524288'" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'"

Useful Options

Some training variants need multiple flags to be set at the same time. For each of the below variants, add the group of flags to ./third_party/py/t5/google/scripts/run_finetune.sh.

Deterministic training

  --train_gin_param="mesh_train_dataset_fn.seed=${SEED}" \
  --train_gin_param="utils.run.skip_seen_data = True" \

Language model

  --objective="lm" \
  --train_gin_param="utils.run.model_type = \"lm\"" \

Released Model Checkpoints

We have released the following checkpoints for the pre-trained models described in our paper:

  • T5-Small (gs://t5-data/pretrained_models/small)
  • T5-Base (gs://t5-data/pretrained_models/base)
  • T5-Large (gs://t5-data/pretrained_models/large)
  • T5-3B (gs://t5-data/pretrained_models/3B)
  • T5-11B (gs://t5-data/pretrained_models/11B)

See here for a list of additional experimental pre-trained model checkpoints.

How to Cite

If you extend or use this work, please cite the paper where it was introduced:

@article{2020t5,
  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {140},
  pages   = {1-67},
  url     = {http://jmlr.org/papers/v21/20-074.html}
}

text-to-text-transfer-transformer's People

Contributors

adarob, afrozenator, alextp, blester125, cghawthorne, copybara-service[bot], craffel, danyoel, daphnei, gauravmishra, hwchung27, iansimon, jinoobaek-qz, katelee168, kehang, leolaugier, lev-prol, marcvanzee, marvin182, mikcnt, mmatena, mmcs-work, nconstant-google, nshazeer, pkch, rhofour, sharannarang, t5-copybara, texasmichelle, thomasw21

text-to-text-transfer-transformer's Issues

Question about model parallel version of adafactor

Thanks for diligently answering all the questions on this repo; it helps the community a lot.

I have a few questions about the model-parallel version of Adafactor:

  1. Do you do extra communication between model-parallel devices in the Adafactor step to get the row and column means, (G_t^2 + \epsilon) 1_m?

  2. Does each device (GPU or TPU core) keep a copy of R_t and C_t?

  3. I am looking at the Mesh TF Adafactor implementation; unfortunately, it is a bit hard for me to digest without knowing the full details of MTF. For example, is mtf.reduce_mean actually doing an all_reduce across devices behind the scenes?

I really appreciate you taking the time to help the community.

Run T5 on a CPU for inference

Hi, great work on T5!

I'm looking to run T5 on a CPU for interactive predictions only, no fine-tuning. The given notebook provides great instructions for using T5 with a TPU, but I'm struggling to figure out how to use it with a CPU.

I've tried changing the notebook similar to this:

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=None,
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
    layout_rules="ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch,model1:batch1,cpu:0", # sometimes I include this, sometimes I don't - it doesn't seem to matter
    sequence_length={"inputs": 128, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=keep_checkpoint_max if ON_CLOUD else None,
    iterations_per_loop=100,
)

But I get these errors:

ValueError: Tensor dimension size not divisible by mesh dimension size: tensor_shape=Shape[outer_batch=1, batch=4, length=128] tensor_layout=TensorLayout(None, 0, None)

It seems likely that this has something to do with my TensorLayout being None. Would you mind giving me some tips? Thanks a bunch in advance.

Do we support GPU distributed training?

Hi, thanks for the awesome project!

Does the code base support distributed training? If not, is it possible to support it after some code modifications?

By the way, how do I set the batch size and the number of GPUs if I want to train the model on GPUs?

Thank you for your kind attention.

Is it possible to share dev set numbers for the models in Table 14 of the paper?

Hi,
You have done an amazing job putting the details in the paper; it is quite helpful to see such a detailed study. Thanks for this.

I was wondering if it is possible to share dev set numbers for the 3 models (T5-Large/3B/11B) presented in Table 14 of the paper. I know that test numbers are more trustworthy in general, but because most of these test numbers are obtained through leaderboard submissions (which are rate limited), they are hard to compare against across different systems. If you can also provide the dev numbers, they would be very helpful for comparison.

I specifically care about the GLUE and SuperGLUE tasks, if those are possible to share.

Thanks again for answering the questions.

How to set batch size of training?

When I try to change the batch size using --gin_param="sequences_per_batch=128" or --gin_param="tokens_per_batch=65536", the batch size always seems to be 32?

INFO:tensorflow:serialize_num_microbatches: tokens_per_microbatch_per_replica=2048 batch_dim=Dimension(name='batch', size=32) sequence_length={'inputs': 512, 'targets': 114} batch_per_replica=4 num_microbatches=1 I1208 11:05:22.407459 140391696871040 utils.py:1440] serialize_num_microbatches: tokens_per_microbatch_per_replica=2048 batch_dim=Dimension(name='batch', size=32) sequence_length={'inputs': 512, 'targets': 114} batch_per_replica=4 num_microbatches=1

Pretrain from scratch

Hi,

I am trying to pre-train from scratch, and I am using the fine-tune Colab example as a base for my code.
Everything runs fine except for the train part.

I have changed the train part to:

TRAIN_STEPS = 25000 #@param {type: "integer"}

model.train(
    mixture_or_task_name="ss3",
    steps = TRAIN_STEPS
)

It shows me the following error:
Required bindings for make_layer_stack not provided in config: ['layers']

I would assume this is a problem because I didn't define the model configuration.

Could you please let us know how to adjust the fine-tune Colab example to pre-train from scratch?

CHECK failed: (index) >= (0) when fine-tune MTF model

When I fine-tune on a custom dataset, it throws this error. Could you help me locate what causes it?

INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
I1212 16:14:50.494199 140587798564480 tpu_estimator.py:600] Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
I1212 16:14:50.494543 140587798564480 tpu_estimator.py:604] Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
I1212 16:14:51.149492 140587798564480 tpu_estimator.py:600] Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
I1212 16:14:51.150063 140587798564480 tpu_estimator.py:604] Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
I1212 16:14:52.438062 140587798564480 tpu_estimator.py:600] Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
I1212 16:14:52.438416 140587798564480 tpu_estimator.py:604] Dequeue next (1) batch(es) of data from outfeed.
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1505] CHECK failed: (index) >= (0):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) >= (0):
Fatal Python error: Aborted

Do you use teacher Forcing during evaluation ?

Hi,

During training, you use teacher forcing to speed up training and improve convergence, but do you also use teacher forcing during evaluation?
Can we rely on the evaluation results, or do we need to run prediction and then calculate the model's performance ourselves?

If you are using it during evaluation, how do we turn it off in the gin files?

Thanks in advance for the reply.

Low TPU usage (under 7%) with default fine-tuning parameters, small model

The problem

It seems like the TPU is not being utilized effectively. The CPU load in the Google Cloud console is under 7% when fine-tuning:

[screenshot: Google Cloud console CPU utilization graph]

Meanwhile, fine-tuning performance seems to be pretty low (global_step/sec: 0.402755, examples/sec: 824.843):

WARNING:tensorflow:TPUPollingThread found TPU b't5-ex2' in state READY, and health HEALTHY.
W1118 22:32:02.386583 140230713861888 preempted_hook.py:91] TPUPollingThread found TPU b't5-ex2' in state READY, and health HEALTHY.
INFO:tensorflow:loss = 0.00076675415, step = 1001000 (248.289 sec)
I1118 22:32:31.888029 140234883987200 basic_session_run_hooks.py:260] loss = 0.00076675415, step = 1001000 (248.289 sec)
INFO:tensorflow:global_step/sec: 0.402755
I1118 22:32:31.890788 140234883987200 tpu_estimator.py:2307] global_step/sec: 0.402755
INFO:tensorflow:examples/sec: 824.843
I1118 22:32:31.891605 140234883987200 tpu_estimator.py:2308] examples/sec: 824.843
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
I1118 22:32:31.893662 140234883987200 tpu_estimator.py:600] Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
I1118 22:32:31.894027 140234883987200 tpu_estimator.py:604] Dequeue next (100) batch(es) of data from outfeed.
I1118 22:32:32.458967 140230713861888 transport.py:157] Attempting refresh to obtain initial access_token
WARNING:tensorflow:TPUPollingThread found TPU b't5-ex2' in state READY, and health HEALTHY.

How to reproduce

I use the following configuration, provided as an example of fine tuning:

export PROJECT=projectname
export ZONE=us-central1-b
export BUCKET=gs://uniquebucketname
export TPU_NAME=t5-ex2
export DATA_DIR="${BUCKET}/t5-boolq-data-dir"
export MODEL_DIR="${BUCKET}/t5_boolq-small-model_dir"

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE --tpu-size=v3-8 --tpu-only --tf-version=1.15.dev20190821 --noconf

t5_mesh_transformer --tpu="${TPU_NAME}" --gcp_project="${PROJECT}" --tpu_zone="${ZONE}" --model_dir="${MODEL_DIR}" --t5_tfds_data_dir="${DATA_DIR}" --gin_file="dataset.gin" --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" --gin_param="MIXTURE_NAME = 'super_glue_boolq_v102'" --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"

More experimental information on C4 dataset

In your paper, Table 9 gives the results for different numbers of tokens and Figure 6 gives the corresponding loss. Can you provide the experimental information about the number of TPUs and the training time? Thank you!

GPU training: program is "killed" after "XLA compilation"

I tried training the code on a GPU after including the changes made earlier today, and I am having a memory issue. Just after the 2020-01-08 16:11:33.715292: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. message, the program crashes with a Killed message.

Here is an extended log for your attention:

> $ t5_mesh_transformer    --model_dir="danielk-files/models"   --t5_tfds_data_dir="danielk-files"   --gin_file="dataset.gin"   --gin_param="utils.run.mesh_shape = 'model:2,batch:1'"   --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1']"   --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'"   --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin" 
.
. 
.
Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/shape (Const)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/min (Const)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/max (Const)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/sub (Sub)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/mul (Mul)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform (Add)
  decoder/block_005/layer_001/EncDecAttention/o_slice_1 (VariableV2) /device:GPU:1
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/Assign (Assign) /device:GPU:1
  decoder/block_005/layer_001/EncDecAttention/o_slice_1/read (Identity) /device:GPU:1
  decoder/block_005/layer_001/EncDecAttention/o_1/parallel_1_1/Assign (Assign) /device:GPU:1
  assign_1/parallel_1_96/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.215915: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/shape (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/min (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/max (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/sub (Sub)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/mul (Mul)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform (Add)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0 (VariableV2) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Assign (Assign) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/read (Identity) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_1/parallel_0_1/Assign (Assign) /device:GPU:0
  assign_1/parallel_0_97/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.216672: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/shape (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/min (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/max (Const)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/sub (Sub)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/mul (Mul)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform (Add)
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1 (VariableV2) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Assign (Assign) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/read (Identity) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wi/kernel_1/parallel_1_1/Assign (Assign) /device:GPU:1
  assign_1/parallel_1_97/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.217428: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/shape (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/min (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/max (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/sub (Sub)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/mul (Mul)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform (Add)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0 (VariableV2) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Assign (Assign) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/read (Identity) /device:GPU:0
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_1/parallel_0_1/Assign (Assign) /device:GPU:0
  assign_1/parallel_0_98/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.218184: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/shape (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/min (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/max (Const)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/sub (Sub)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/mul (Mul)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform (Add)
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1 (VariableV2) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Assign (Assign) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/read (Identity) /device:GPU:1
  decoder/block_005/layer_002/DenseReluDense/wo/kernel_1/parallel_1_1/Assign (Assign) /device:GPU:1
  assign_1/parallel_1_98/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.254151: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/shape (Const)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/min (Const)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/max (Const)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/sub (Sub)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/mul (Mul)
  stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform (Add)
  stacked/shared/embedding_slot_vr_slice_0 (VariableV2) /device:GPU:0
  stacked/shared/embedding_slot_vr_slice_0/Assign (Assign) /device:GPU:0
  stacked/shared/embedding_slot_vr_slice_0/read (Identity) /device:GPU:0
  stacked/shared/embedding_slot_vr/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.254968: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/shape (Const)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/min (Const)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/max (Const)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/sub (Sub)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/mul (Mul)
  stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform (Add)
  stacked/shared/embedding_slot_vr_slice_1 (VariableV2) /device:GPU:1
  stacked/shared/embedding_slot_vr_slice_1/Assign (Assign) /device:GPU:1
  stacked/shared/embedding_slot_vr_slice_1/read (Identity) /device:GPU:1
  stacked/shared/embedding_slot_vr/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.255704: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/shape (Const)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/min (Const)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/max (Const)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/sub (Sub)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform/mul (Mul)
  shared/embedding_slot_vc_slice_0/Initializer/random_uniform (Add)
  shared/embedding_slot_vc_slice_0 (VariableV2) /device:GPU:0
  shared/embedding_slot_vc_slice_0/Assign (Assign) /device:GPU:0
  shared/embedding_slot_vc_slice_0/read (Identity) /device:GPU:0
  shared/embedding_slot_vc_1/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0_1/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.256476: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/shape (Const)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/min (Const)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/max (Const)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/sub (Sub)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform/mul (Mul)
  shared/embedding_slot_vc_slice_1/Initializer/random_uniform (Add)
  shared/embedding_slot_vc_slice_1 (VariableV2) /device:GPU:1
  shared/embedding_slot_vc_slice_1/Assign (Assign) /device:GPU:1
  shared/embedding_slot_vc_slice_1/read (Identity) /device:GPU:1
  shared/embedding_slot_vc_1/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1_1/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.257669: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0 (VariableV2) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Assign (Assign) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/read (Identity) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0_2/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.258527: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1 (VariableV2) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Assign (Assign) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/read (Identity) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1_2/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.260295: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0 (VariableV2) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Assign (Assign) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/read (Identity) /device:GPU:0
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0_3/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.261051: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1 (VariableV2) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Assign (Assign) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/read (Identity) /device:GPU:1
  stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1_3/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.262048: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0 (VariableV2) /device:GPU:0
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Assign (Assign) /device:GPU:0
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/read (Identity) /device:GPU:0
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0_4/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.262775: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/shape (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/min (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/max (Const)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/sub (Sub)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/mul (Mul)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform (Add)
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1 (VariableV2) /device:GPU:1
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Assign (Assign) /device:GPU:1
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/read (Identity) /device:GPU:1
  stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1_4/Assign (Assign) /device:GPU:1

2020-01-08 16:10:35.288719: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/shape (Const)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/min (Const)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/max (Const)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/sub (Sub)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/mul (Mul)
  decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform (Add)
  decoder/final_layer_norm/scale_slot_v_slice_0 (VariableV2) /device:GPU:0
  decoder/final_layer_norm/scale_slot_v_slice_0/Assign (Assign) /device:GPU:0
  decoder/final_layer_norm/scale_slot_v_slice_0/read (Identity) /device:GPU:0
  decoder/final_layer_norm/scale_slot_v_1/parallel_0_1/Assign (Assign) /device:GPU:0
  assign/parallel_0_5/Assign (Assign) /device:GPU:0

2020-01-08 16:10:35.289603: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/shape (Const)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/min (Const)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/max (Const)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/sub (Sub)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/mul (Mul)
  decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform (Add)
  decoder/final_layer_norm/scale_slot_v_slice_1 (VariableV2) /device:GPU:1
  decoder/final_layer_norm/scale_slot_v_slice_1/Assign (Assign) /device:GPU:1
  decoder/final_layer_norm/scale_slot_v_slice_1/read (Identity) /device:GPU:1
  decoder/final_layer_norm/scale_slot_v_1/parallel_1_1/Assign (Assign) /device:GPU:1
  assign/parallel_1_5/Assign (Assign) /device:GPU:1

INFO:tensorflow:Running local_init_op.
I0108 16:10:37.353017 140685207750400 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0108 16:10:37.880563 140685207750400 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Before copy master to slices.
I0108 16:10:38.634351 140685207750400 ops.py:5541] Before copy master to slices.
INFO:tensorflow:Done with copy master to slices.
I0108 16:10:39.607687 140685207750400 ops.py:5543] Done with copy master to slices.
INFO:tensorflow:Saving checkpoints for 0 into danielk-files/models/model.ckpt.
I0108 16:10:51.266983 140685207750400 basic_session_run_hooks.py:606] Saving checkpoints for 0 into danielk-files/models/model.ckpt.
INFO:tensorflow:Before Save.
I0108 16:10:51.276291 140685207750400 ops.py:5516] Before Save.
INFO:tensorflow:About to write a checkpoint
I0108 16:10:52.409570 140685207750400 ops.py:5518] About to write a checkpoint
INFO:tensorflow:danielk-files/models/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
I0108 16:10:53.351364 140685207750400 checkpoint_management.py:95] danielk-files/models/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
INFO:tensorflow:Done writing checkpoint.
I0108 16:10:55.473980 140685207750400 ops.py:5521] Done writing checkpoint.
import feature targets[[[7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][59 834 15 1169 15592 1 7072 1 7072 1 59 834 15 1169 15592 1 7072 1 7072 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]import feature targets_segmentation[[[1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 1 1 1 1 2 2 3 3 4 4 4 4 4 4 5 5 6 6 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]

import feature inputs[[[3 51 52 102 75 7142 536 10 86 3 9 2493 865 3 6 3 88 243 34 4283 112 596 141 9717 3 9 720 710 3 5 7142 357 10 3969 40 18276 28325 26 16 3 9 2493 4683 2818 24 112 563 164 43 9717 96 3 9 720 710 3 5 96 1 3 51 52 102 75 7142 536 10 96 30628 65 2994 186 203 13 3014 3 233 581 385 42 150 2259 3 6 96 10261 1836 243 16 8 2493 3 5 7142 357 10 96 30628 65 2994 186 203 13 3014 3190 4148 8852 581 385 42 150 2259 3 6 96 243 3 5494 2146 15702 6187 10261 1836 3 5 1 3 51 52 102 75 7142 536 10 25103 3 6 8 1113 13 1473 7 6914 19 5657 30587 7 190 8 1719 3 5 7142 357 10 25103 3 6 1473 3 31 7 6914 33 5657 30587 7 190 8 1719 3 5 1 3 51 52 102 75 7142 536 10 907 641 65 1866 8 690 1514 6154 770 16 17524 21 3586 8 166 1751 13 4311 8874 3 5 7142 357 10 907 65 1866 1514 6154 770 16 17524 21 12385 12 942 4311 8874 3 5 1 3 51 52 102 75 7142 536 10 216 3 9925 38 46 1038 4297 8211 30 2645 6834 7 12 36 3 9 14625 2378 11 37 101 1639 222 7505 3 5 7142 357 10 71 9396 1424 8211 113 1279 30 1267 379 2645 6834 7 304 493 3 9 14625 2378 3 58 1 3 51 52 102 75 7142 536 10 23066 43 4313 10209 12778 13485 30 3 10363 3972 7159 1296 24 164 554 10475 32 7 15 6917 12 824 6716 3 5 7142 357 10 9864 11 112 372 43 4313 10209 12778 13485 24 9296 1137 42 554 10475 32 7 15 6917 12 824 6716 3 5 1 3 51 52 102 75 7142 536 10 12394 4794 25394 11385 7 16 1798 3370 2213 4599 3 31 37 29210 127 3 31 18786 21 8 3 4060 189 2041 18050 3 31 7 29952 21670 13 1718 5396 6751 262 1014 21537 3 5 7142 357 10 1881 18 279 2741 7 2213 4599 3 31 8 29210 127 3 31 18786 5978 7 16 8 18050 3 31 7 29952 21670 13 1718 5396 6751 262 1014 21537 1701 3 5 1 3 51 52 102 75 7142 536 10 86 119 1234 3 6 17240 6610 19 3 30273 26 12 726 21 3 476 3205 3426 3 31 7 9953 581 2900 20055 17240 3 5 7142 357 10 9046 6402 3 6 17240 6610 56 36 2418 53 8 2876 21 3 476 3205 3 31 9953 581 2900 20055 17240 3 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][3 51 52 102 75 7142 536 10 71 903 13 8 20 900 6682 646 1187 116 3 9 388 10719 112 443 190 3 9 3 17208 7208 512 16 4625 20268 30 2875 3 5 7142 357 10 37 643 13 3 9 7584 7797 365 3 9 4459 3 2046 102 16 851 13 3 9 443 24 3 102 22411 190 3 9 3 17208 7208 512 16 4625 20268 3 5 1 3 51 52 102 75 7142 536 10 1363 1793 23 19001 23 3 60 3007 5100 1363 2974 32 5208 3 6 2145 112 21029 130 96 21001 96 3 5 7142 357 10 1793 23 19001 23 1219 19644 44 112 828 24 2974 32 5208 3 31 7 21029 130 96 21001 3 5 96 1 3 51 52 102 75 7142 536 10 24583 4300 1390 12 7464 19852 1213 2747 23620 13335 21 81 1514 3 4060 770 16 1723 3 6 8 688 243 1701 3 5 7142 357 10 24583 4300 10052 5 19 3 19031 2747 23620 13335 3937 3 5 3 6 3 9 19852 1213 8106 13 331 18 20393 889 3 6 21 3241 1514 3 4060 770 3 6 8 688 243 1701 3 5 1 3 51 52 102 75 7142 536 10 25394 243 30 2875 24 66 898 4627 724 13928 11 133 36 5285 21 4845 11 4798 3 5 7142 357 10 216 243 8 20395 3 31 7 4627 56 36 19257 11 5285 21 4845 11 4798 3 5 1 3 51 52 102 75 7142 536 10 37 29 8 5015 54 1520 430 356 13 452 3507 7 30 7954 3 287 4246 53 1358 3 6 8 5191 243 3 5 7142 357 10 299 8 5191 3 6 5181 1060 3 6 243 8 5015 54 1520 430 356 13 452 3507 7 30 7954 3 287 4246 53 1358 227 2239 3 5 1 3 51 52 102 75 7142 536 10 486 709 2838 797 12673 43 118 4792 16 1041 437 8905 10126 779 4719 147 30 932 209 3 5 7142 357 10 886 386 9611 797 11 2390 12673 43 118 4792 437 8905 10126 779 4719 147 16 7457 30 932 209 3 5 1 3 51 52 102 75 7142 536 10 37 5923 3271 7 13 1473 3 6 4623 11 662 2069 6578 9352 43 1736 16 14465 11 3754 3 9 1487 21 70 3518 1034 563 53 3 5 7142 357 10 37 3427 6323 7 13 1473 3 6 4623 11 662 2808 6578 1440 
3814 10663 30 2818 24 356 1390 16 4644 21 3 9 307 18 9 13106 3518 1181 18 14389 2050 16 412 172 346 2168 5627 3 5 1 0 0 0 0 0 0 0 0 0 0...]]...]
import feature inputs_segmentation[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0...]]...]
2020-01-08 16:11:33.715292: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Killed 

My GPU specs:

Wed Jan  8 16:25:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro GV100        On   | 00000000:01:00.0 Off |                  Off |
| 29%   41C    P2    25W / 250W |      0MiB / 32478MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     On   | 00000000:02:00.0 Off |                  Off |
| 33%   28C    P8    11W / 260W |      0MiB / 48571MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Memory info:

top - 16:29:15 up 47 days,  7:31,  4 users,  load average: 0.11, 0.54, 1.95
Tasks: 648 total,   1 running, 647 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.7 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
GiB Mem :   62.842 total,   61.903 free,    0.506 used,    0.433 buff/cache
GiB Swap:    8.000 total,    7.730 free,    0.270 used.   61.731 avail Mem 

and pip packages:

$ pip list
Package                       Version
----------------------------- ----------
absl-py                       0.9.0
alabaster                     0.7.12
allennlp                      0.9.0
astor                         0.8.1
attrs                         19.3.0
Babel                         2.8.0
blis                          0.2.4
boto                          2.49.0
boto3                         1.10.49
botocore                      1.13.49
cachetools                    4.0.0
certifi                       2019.11.28
chardet                       3.0.4
Click                         7.0
conllu                        1.3.1
cycler                        0.10.0
cymem                         2.0.3
dill                          0.3.1.1
distro                        1.4.0
docutils                      0.15.2
editdistance                  0.5.3
flaky                         3.6.1
Flask                         1.1.1
Flask-Cors                    3.0.8
ftfy                          5.6
future                        0.18.2
gast                          0.2.2
gevent                        1.4.0
gin-config                    0.3.0
google-api-core               1.15.0
google-api-python-client      1.7.11
google-auth                   1.10.0
google-auth-httplib2          0.0.3
google-cloud-core             1.1.0
google-cloud-storage          1.24.1
google-compute-engine         2.8.13
google-pasta                  0.1.8
google-resumable-media        0.5.0
googleapis-common-protos      1.6.0
greenlet                      0.4.15
grpcio                        1.26.0
h5py                          2.10.0
httplib2                      0.15.0
idna                          2.8
imagesize                     1.2.0
importlib-metadata            1.3.0
itsdangerous                  1.1.0
Jinja2                        2.10.3
jmespath                      0.9.4
joblib                        0.14.1
jsonnet                       0.14.0
jsonpickle                    1.2
Keras-Applications            1.0.8
Keras-Preprocessing           1.1.0
kiwisolver                    1.1.0
Markdown                      3.1.1
MarkupSafe                    1.1.1
matplotlib                    3.1.2
mesh-tensorflow               0.1.9
more-itertools                8.0.2
murmurhash                    1.0.2
nltk                          3.4.5
numpy                         1.18.1
numpydoc                      0.9.2
oauth2client                  4.1.3
opt-einsum                    3.1.0
overrides                     2.8.0
packaging                     20.0
pandas                        0.25.3
parsimonious                  0.8.1
pip                           19.3.1
plac                          0.9.6
pluggy                        0.13.1
portalocker                   1.5.2
preshed                       2.0.1
promise                       2.3
protobuf                      3.11.2
py                            1.8.1
pyasn1                        0.4.8
pyasn1-modules                0.2.7
Pygments                      2.5.2
pyparsing                     2.4.6
pytest                        5.3.2
python-dateutil               2.8.1
pytorch-pretrained-bert       0.6.2
pytorch-transformers          1.1.0
pytz                          2019.3
regex                         2020.1.8
requests                      2.22.0
responses                     0.10.9
rouge-score                   0.0.3
rsa                           4.0
s3transfer                    0.2.1
sacrebleu                     1.4.3
scikit-learn                  0.22.1
scipy                         1.4.1
sentencepiece                 0.1.85
setuptools                    44.0.0
six                           1.13.0
snowballstemmer               2.0.0
spacy                         2.1.9
Sphinx                        2.3.1
sphinxcontrib-applehelp       1.0.1
sphinxcontrib-devhelp         1.0.1
sphinxcontrib-htmlhelp        1.0.2
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.2
sphinxcontrib-serializinghtml 1.1.3
sqlparse                      0.3.0
srsly                         1.0.1
t5                            0.1.7
tensorboard                   1.15.0
tensorboardX                  2.0
tensorflow                    1.15.0
tensorflow-datasets           1.3.2
tensorflow-estimator          1.15.1
tensorflow-metadata           0.21.0
tensorflow-text               1.15.0rc0
termcolor                     1.1.0
thinc                         7.0.8
torch                         1.3.1
tqdm                          4.41.1
typing                        3.7.4.1
Unidecode                     1.1.1
uritemplate                   3.0.1
urllib3                       1.25.7
wasabi                        0.6.0
wcwidth                       0.1.8
Werkzeug                      0.16.0
wheel                         0.33.6
word2number                   1.1
wrapt                         1.11.2
zipp                          0.6.0

Colab example fails with mixture

https://colab.research.google.com/drive/13shlV1wk9e9vcnpI77YvFbBBWAQKYACS

task = t5.data.MixtureRegistry.get("glue_v002_proportional")
ds = task.get_dataset(split="validation", sequence_length={"inputs": 128, "targets": 32})
tfds.as_numpy(ds.take(5))
/usr/local/lib/python3.6/dist-packages/t5/data/utils.py in assert_cached(self)
    608     """Raises an assertion error if cached dataset does not exist."""
    609     assert self.cached, (
--> 610         "'%s' does not exist in any of the task cache directories" % self.name)
    611 
    612   def get_cached_stats(self, split=tfds.Split.TRAIN):

AssertionError: 'glue_cola_v002' does not exist in any of the task cache directories
  In call to configurable 'rate_num_examples' (<function rate_num_examples at 0x7f025a3afae8>)
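
For reference, a possible workaround (a rough sketch, not an official fix): "glue_v002_proportional" computes its mixing rates from cached example counts via rate_num_examples, so every sub-task such as 'glue_cola_v002' must be found in a registered task cache directory. This assumes the installed t5 version exposes t5.data.add_global_cache_dirs and that "/path/to/task_cache" stands in for a directory produced by the dataset-caching pipeline:

import t5

# Register the cache directory before building the mixture; the path below is a
# placeholder, not a real location.
t5.data.add_global_cache_dirs(["/path/to/task_cache"])

mixture = t5.data.MixtureRegistry.get("glue_v002_proportional")
ds = mixture.get_dataset(
    split="validation", sequence_length={"inputs": 128, "targets": 32})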

Proto duplication

Hi there,

I'm trying to fine-tune a model on my own dataset, running on a GCP TPU. After following the setup instructions and installing t5 on the VM instance, I run into the following error when trying to execute t5_mesh_transformer:

TypeError: Conflict register for file "tensorboard/compat/proto/tensor_shape.proto": tensorboard.TensorShapeProto is already defined in file "tensorboard/src/tensor_shape.proto". Please fix the conflict by adding package name on the proto file, or use different name for the duplication.

Thanks in advance for any help you can give,
James
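
Not a confirmed diagnosis, but this class of proto-registration conflict typically points to two installed packages compiling the same tensorboard protos (for example, a leftover standalone tensorboard alongside the one pulled in as a dependency of TensorFlow 1.15). A first diagnostic step:

$ pip list | grep -i tensorboard

If more than one tensorboard-style package shows up, removing the extras and reinstalling tensorboard==1.15.0 to match TensorFlow 1.15 is a reasonable thing to try.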

Customizing tokenizer

Is there any way to customize the tokenizer so that it does not append 1 (the EOS id) at the end of the tokenized output?
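
For context: the trailing 1 is the EOS id from the default vocabulary, and it is appended by the t5 data pipeline rather than by the SentencePiece model itself. A minimal sketch (assuming a local copy of the SentencePiece model file at a placeholder path) of tokenizing and stripping the EOS back off:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("/path/to/sentencepiece.model")  # placeholder path to the vocab file

ids = sp.EncodeAsIds("That is good.")  # SentencePiece itself appends no EOS here
ids_with_eos = ids + [1]               # the data pipeline appends EOS id 1
# Drop the trailing EOS again if a downstream consumer does not want it:
if ids_with_eos and ids_with_eos[-1] == 1:
    ids_with_eos = ids_with_eos[:-1]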

Are newline '\n' characters allowed in the .tsv data format?

I am trying to fine-tune the model on a new dataset. My data contains newline characters. Even though SentencePiece should be able to handle newlines, I am not sure how to mix the newlines inside my data with the newlines that separate records in the TSV file format.

Consider this TSV data:
Hi there!\nHello! How are you doing!?\t I am doing great!
Sample dataline1\nSample dataline2\t Sample dataline3\nSample dataline4

I want to ask: is \n the right way to encode newlines in a TSV file?
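
Not an official answer, but one workable convention is to keep the two-character escape \n inside fields (so every example stays on one physical line) and turn it back into a real newline in the text preprocessor. A rough sketch, assuming a TextLineDataset-style pipeline and a hypothetical train.tsv:

import tensorflow.compat.v1 as tf

def parse_tsv_line(line):
  # Split on tabs; real newlines never appear inside a record on disk.
  fields = tf.strings.split([line], sep="\t").values
  # Convert the literal backslash-n escape back into an actual newline character.
  inputs = tf.strings.regex_replace(fields[0], r"\\n", "\n")
  targets = tf.strings.regex_replace(fields[1], r"\\n", "\n")
  return {"inputs": inputs, "targets": targets}

ds = tf.data.TextLineDataset("train.tsv").map(parse_tsv_line)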

Example of using masked LM inference with a pretrained model?

Hi,

Thanks for releasing the models and experimental configurations. What's not clear to me is something simple: how to download a pretrained model and simply use it for masked-LM prediction as in pre-training. It appears the Colab link registers a new task and fine-tunes on it. Are there instructions for simply taking a model after pre-training and using it directly on new examples?

Thanks!
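
Not an official recipe, but for context: the pre-training objective replaces each masked span in the input with a sentinel token (<extra_id_0>, <extra_id_1>, ...) and trains the model to emit the missing spans, delimited by the same sentinels, as the target. Inference on new examples therefore amounts to writing inputs in that format and running the usual decode path. A hand-written sketch with a placeholder output path:

# Span-corruption style input: each <extra_id_N> marks one masked span.
masked_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
# A pretrained checkpoint would be expected to produce something like:
#   "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
with open("/tmp/masked_inputs.txt", "w") as f:
  f.write(masked_input + "\n")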

Never-ending "The TPU worker may not be ready (still scheduling)" warning

Trying to connect to my TPU instance, but I keep getting this warning (for the past 10 hours or so).
I am not sure if it's a T5-related issue or something to do with the way I set up the TPU.
Any thoughts on what could be wrong?

(env37) danielk0014-2:text-to-text-transfer-transformer danielk$ t5_mesh_transformer  \
>   --tpu="daniels-tpu" \
>   --gcp_project="testing-out-tpus" \
>   --tpu_zone="europe-west4-a" \
>   --t5_tfds_data_dir="gs://t5-files" \
>   --gin_file="dataset.gin" \
>   --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
>   --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
>   --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
>   --model_dir="gs://t5-files/models" \
>   --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin" 
WARNING:tensorflow:From /Users/danielk/ideaProjects/text-to-text-transfer-transformer/env37/lib/python3.7/site-packages/tensorflow_core/python/compat/v2_compat.py:68: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
INFO:tensorflow:model_type=bitransformer
I0108 00:22:22.273789 4582849984 utils.py:1625] model_type=bitransformer
INFO:tensorflow:mode=train
I0108 00:22:22.273949 4582849984 utils.py:1626] mode=train
INFO:tensorflow:sequence_length={'inputs': 512, 'targets': 512}
I0108 00:22:22.274010 4582849984 utils.py:1627] sequence_length={'inputs': 512, 'targets': 512}
INFO:tensorflow:batch_size=2048
I0108 00:22:22.274062 4582849984 utils.py:1628] batch_size=2048
INFO:tensorflow:train_steps=1000000000
I0108 00:22:22.274109 4582849984 utils.py:1629] train_steps=1000000000
INFO:tensorflow:mesh_shape=Shape[batch=8]
I0108 00:22:22.274163 4582849984 utils.py:1630] mesh_shape=Shape[batch=8]
INFO:tensorflow:layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
I0108 00:22:22.274212 4582849984 utils.py:1631] layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
INFO:tensorflow:Building TPUConfig with tpu_job_name=None
I0108 00:22:22.277817 4582849984 utils.py:1646] Building TPUConfig with tpu_job_name=None
I0108 00:22:22.280086 4582849984 discovery.py:271] URL being requested: GET https://www.googleapis.com/discovery/v1/apis/tpu/v1/rest
I0108 00:22:22.579149 4582849984 discovery.py:867] URL being requested: GET https://tpu.googleapis.com/v1/projects/testing-out-tpus/locations/europe-west4-a/nodes/daniels-tpu?alt=json
I0108 00:22:22.579305 4582849984 transport.py:157] Attempting refresh to obtain initial access_token
I0108 00:22:22.608117 4582849984 client.py:777] Refreshing access_token
I0108 00:22:23.236666 4582849984 discovery.py:271] URL being requested: GET https://www.googleapis.com/discovery/v1/apis/tpu/v1/rest
I0108 00:22:23.555082 4582849984 discovery.py:867] URL being requested: GET https://tpu.googleapis.com/v1/projects/testing-out-tpus/locations/europe-west4-a/nodes/daniels-tpu?alt=json
I0108 00:22:23.555221 4582849984 transport.py:157] Attempting refresh to obtain initial access_token
I0108 00:22:23.584164 4582849984 client.py:777] Refreshing access_token
INFO:tensorflow:Using config: {'_model_dir': 'gs://t5-files/models', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.240.1.2:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x160559410>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.240.1.2:8470', '_evaluation_master': 'grpc://10.240.1.2:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=None, num_cores_per_replica=1, per_host_input_for_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': <tensorflow.python.distribute.cluster_resolver.tpu_cluster_resolver.TPUClusterResolver object at 0x15a9b1cd0>}
I0108 00:22:24.162949 4582849984 estimator.py:212] Using config: {'_model_dir': 'gs://t5-files/models', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.240.1.2:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x160559410>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.240.1.2:8470', '_evaluation_master': 'grpc://10.240.1.2:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=None, num_cores_per_replica=1, per_host_input_for_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': <tensorflow.python.distribute.cluster_resolver.tpu_cluster_resolver.TPUClusterResolver object at 0x15a9b1cd0>}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0108 00:22:24.163352 4582849984 tpu_context.py:220] _TPUContext: eval_on_tpu True
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:22:24.408021 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:22:24.409535: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:27:24.414439 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (1/288).
W0108 00:27:24.414757 4582849984 tpu_system_metadata.py:98] Retrying (1/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:27:24.414932 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:27:24.415889: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:32:24.421372 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (2/288).
W0108 00:32:24.421593 4582849984 tpu_system_metadata.py:98] Retrying (2/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:32:24.421711 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:32:24.422493: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:37:24.429055 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (3/288).
W0108 00:37:24.429368 4582849984 tpu_system_metadata.py:98] Retrying (3/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:37:24.429541 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:37:24.430392: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:42:24.454765 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (4/288).
W0108 00:42:24.454980 4582849984 tpu_system_metadata.py:98] Retrying (4/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:42:24.455092 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:42:24.455882: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created. 
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:47:24.462080 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (5/288).
W0108 00:47:24.462295 4582849984 tpu_system_metadata.py:98] Retrying (5/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:47:24.462409 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:47:24.463112: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:52:24.471292 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (6/288).
W0108 00:52:24.471508 4582849984 tpu_system_metadata.py:98] Retrying (6/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:52:24.471624 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:52:24.472351: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
W0108 00:57:24.479638 4582849984 tpu_system_metadata.py:97] Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).
WARNING:tensorflow:Retrying (7/288).
W0108 00:57:24.479855 4582849984 tpu_system_metadata.py:98] Retrying (7/288).
INFO:tensorflow:Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
I0108 00:57:24.479971 4582849984 tpu_system_metadata.py:78] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2020-01-08 00:57:24.480666: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
WARNING:tensorflow:Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or the Tensorflow master address is incorrect: got (grpc://10.240.1.2:8470).

For completeness, here is how I launched my TPU:

ctpu up --name=daniels-tpu --zone=europe-west4-a --tpu-size=v3-8 --tf-version=1.15  --disk-size-gb=2000 
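
One thing worth checking (an assumption based on the local /Users/... path in the log above, not a confirmed diagnosis): the master address 10.240.1.2:8470 is an internal GCP address, so t5_mesh_transformer generally has to run from a VM in the same project and network as the TPU rather than from a local machine. A quick sanity check from a gcloud environment configured for that project:

gcloud compute tpus list --zone=europe-west4-a --project=testing-out-tpus
gcloud compute tpus describe daniels-tpu --zone=europe-west4-a --project=testing-out-tpus

The node should report state: READY, and the machine running t5_mesh_transformer must be able to reach its internal IP.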

Configurable 'utils.run' doesn't have a parameter named 'mesh_devices'.

Hi, I am trying to run T5 on my own server with a single GPU, using your command:

t5_mesh_transformer \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:1'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0']" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_file="./text-to-text-transfer-transformer/operative_config.gin"

(The operative_config.gin was downloaded with gsutil.)
However, it gives the following ValueError:

ValueError: Configurable 'utils.run' doesn't have a parameter named 'mesh_devices'.
In bindings string line 2
utils.run.mesh_devices = ['gpu:0']

Do you know the possible reason, and could you offer a solution? Thank you!
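
One plausible cause (an assumption, not a verified fix): the mesh_devices parameter of utils.run was added in a newer mesh-tensorflow release than the one installed, so the gin binding has nothing to attach to. Upgrading the pair of packages and re-checking the installed version is a reasonable first step:

pip install --upgrade mesh-tensorflow t5
pip show mesh-tensorflow   # confirm which version actually ended up installed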

StructBert

The StructBERT authors have released their paper. It is very good to see the large performance improvement from incorporating language structures into the language model. They use the same data as BERT, yet their base model gets a GLUE score comparable to BERT-large.

So I want to know: has T5 tried this unsupervised objective recently, and does it work better than the span MLM?

"Not found: Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint"

When running the following command for fine-tuning:

t5_mesh_transformer  \
  --t5_tfds_data_dir="gs://danielk-files" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"

I am getting the following error:

Not found: Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint
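
A quick way to see which variables the checkpoint being restored actually contains (a debugging sketch, not part of the original report; the /tmp path is the default model_dir that appears in the log below):

import tensorflow.compat.v1 as tf

# List every variable name and shape stored in the checkpoint the Saver restores from.
for name, shape in tf.train.list_variables("/tmp/transformer_standalone/model.ckpt-0"):
  print(name, shape)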

Here is the full error log:

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:tensorflow:Graph was finalized.
I0107 10:43:38.246214 140625456166720 monitored_session.py:240] Graph was finalized.
2020-01-07 10:43:38.246462: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-07 10:43:38.277149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2020-01-07 10:43:38.279231: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x91ab450 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-07 10:43:38.279274: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-07 10:43:38.755452: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x914b5f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-07 10:43:38.755483: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro RTX 8000, Compute Capability 7.5
2020-01-07 10:43:38.755491: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Quadro RTX 8000, Compute Capability 7.5
2020-01-07 10:43:38.755498: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): Quadro RTX 8000, Compute Capability 7.5
2020-01-07 10:43:38.772836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Quadro RTX 8000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:17:00.0
2020-01-07 10:43:38.774193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Quadro RTX 8000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:65:00.0
2020-01-07 10:43:38.775493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties:
name: Quadro RTX 8000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:b3:00.0
2020-01-07 10:43:38.775601: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775641: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775673: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775704: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775735: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775765: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775796: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-01-07 10:43:38.775803: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required
 libraries for your platform.
Skipping registering GPU devices...
2020-01-07 10:43:38.775937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-07 10:43:38.775945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 2
2020-01-07 10:43:38.775949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N Y Y
2020-01-07 10:43:38.775953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   Y N Y
2020-01-07 10:43:38.775957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2:   Y Y N
INFO:tensorflow:Restoring parameters from /tmp/transformer_standalone/model.ckpt-0
I0107 10:43:38.778871 140625456166720 saver.py:1284] Restoring parameters from /tmp/transformer_standalone/model.ckpt-0
2020-01-07 10:43:44.485832: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint
ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint
         [[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2_1':
  File "/bin/t5_mesh_transformer", line 8, in <module>
    sys.exit(console_entry_point())
  File "/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 218, in console_entry_point
    app.run(main)
  File "/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 212, in main
    model_dir=FLAGS.model_dir)
  File "/lib/python3.6/site-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1701, in run
    train_dataset_fn, train_steps, ensemble_inputs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1092, in train_model
    estimator.train(input_fn=input_fn, max_steps=train_steps)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1663, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 599, in my_model_fn
    save_relative_paths=True)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

E0107 10:43:44.498865 140625456166720 error_handling.py:75] Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint
         [[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2_1':
  File "/bin/t5_mesh_transformer", line 8, in <module>
    sys.exit(console_entry_point())
  File "/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 218, in console_entry_point
    app.run(main)
  File "/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 212, in main
    model_dir=FLAGS.model_dir)
  File "/lib/python3.6/site-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1701, in run
    train_dataset_fn, train_steps, ensemble_inputs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1092, in train_model
    estimator.train(input_fn=input_fn, max_steps=train_steps)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1663, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 599, in my_model_fn
    save_relative_paths=True)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

INFO:tensorflow:training_loop marked as finished
I0107 10:43:44.500117 140625456166720 error_handling.py:101] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0107 10:43:44.500193 140625456166720 error_handling.py:135] Reraising captured error
Traceback (most recent call last):
  File "/home/danielk/text-to-text-transfer-transformer/env36/bin/t5_mesh_transformer", line 8, in <module>
    sys.exit(console_entry_point())
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 218, in console_entry_point
    app.run(main)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/t5/models/mesh_transformer_main.py", line 212, in main
    model_dir=FLAGS.model_dir)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1701, in run
    train_dataset_fn, train_steps, ensemble_inputs)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 1092, in train_model
    estimator.train(input_fn=input_fn, max_steps=train_steps)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1490, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
    return self._sess_creator.create_session()
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 647, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/session_manager.py", line 290, in prepare_session
    config=config)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/session_manager.py", line 220, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/home/danielk/text-to-text-transfer-transformer/env36/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1306, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key decoder/block_000/layer_000/SelfAttention/relative_attention_bias not found in checkpoint
         [[node save/RestoreV2_1 (defined at /lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]


For some reason, when I drop the last line, --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin", it works fine, which is surprising since I was under the impression that this line determines which pre-trained model to use (small, base, large, etc.).

Additional info: I am running this on a GPU machine, but that shouldn't be the problem, since the error happens while loading the model (before any computation).
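
For reference: one common cause of this particular restore failure is a model_dir that already contains a checkpoint saved under a different configuration. The operative config can change the set of variables in the graph (here the decoder's relative_attention_bias), so the Saver fails when it tries to restore the stale checkpoint into the new graph. Below is a minimal, hedged sketch of a fine-tuning invocation in which the config and the restored weights come from the same pre-trained model; MODEL_DIR is assumed to be a fresh, empty directory, and utils.run.init_checkpoint is the gin parameter used to load the pre-trained weights:

# Hedged sketch with placeholder paths; MODEL_DIR must not contain checkpoints
# written under a different operative config.
# The checkpoint step (1000000) is an assumption; check the `checkpoint` file
# inside PRETRAINED_DIR for the actual step of the released model.
PRETRAINED_DIR="gs://t5-data/pretrained_models/small"
MODEL_DIR="/tmp/t5-small-mrpc"   # assumption: a fresh local directory

t5_mesh_transformer \
  --model_dir="${MODEL_DIR}" \
  --gin_file="dataset.gin" \
  --gin_file="${PRETRAINED_DIR}/operative_config.gin" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:1'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0']" \
  --gin_param="utils.run.init_checkpoint = '${PRETRAINED_DIR}/model.ckpt-1000000'"

If --model_dir already holds a checkpoint from an earlier run with a different config, deleting it (or pointing --model_dir at a new path) lets training start from the pre-trained weights instead of failing on restore.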

Can you share perplexity during pretraining for some experiments

Hi, first of all, great work, congrats!!! The experiments in the paper are very detailed and they help in answering some very interesting questions.

Can you please share the pre-training perplexity of some of your experiments (similar to Table 6 in the BERT paper)? It would provide more insight into these models.

Specifically, I am looking for:

  1. The final perplexity achieved by the models in Table 14 of your paper, i.e. T5-11B, T5-3B, T5-Large, T5-Base, and T5-Small.

  2. The final perplexity for the experiments in Table 13.

Thanks, I appreciate your time in answering these questions.

Did you explore harder pretraining objectives for bigger models

Hi,

Thanks again for answering these questions, and congratulations on a great study.

Just a quick question: when you scaled up the model size, were you able to explore different pre-training objectives? My intuition is that bigger models may be able to fit harder pre-training tasks, and that harder pre-training tasks may yield even better downstream representations.

SavedModel export for ML Cloud serving

I have fine-tuned the 3B model in the Colab notebook provided in this repo as notebooks/t5-trivia.ipynb.
After fine-tuning as recommended in the notebook, I would like to export the model as a SavedModel to be served by ML Cloud.

To do this, I use the following code fragment, placed as the last cell in the notebook (so the model has already been created):

import os

import t5
import tensorflow.compat.v1 as tf  # these are typically already imported earlier in the notebook

vocabulary = t5.data.SentencePieceVocabulary(t5.data.DEFAULT_SPM_PATH)
estimator = model.estimator(vocabulary)  # `model` is the MtfModel created earlier in the notebook

your_feature_spec = {
    # "inputs": tf.FixedLenFeature([], dtype=tf.string, default_value=""),
    "inputs": tf.VarLenFeature(dtype=tf.string),
}

def _serving_input_receiver_fn():
    serialized_tf_example = tf.placeholder(dtype=tf.string, shape=None,
                                           name='inputs')
    # The key (e.g. 'examples') should be the same as the input key used when
    # you build the request for prediction.
    receiver_tensors = {'inputs': serialized_tf_example}
    features = tf.parse_example(serialized_tf_example, your_feature_spec)
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

estimator.export_savedmodel(os.path.join(MODEL_DIR, "saved_model/"), _serving_input_receiver_fn)

Executing this code fails with an error. The complete stack trace is attached here:

stack-trace-SavedModel.txt

I assume the error means that this model is only suitable for TPU inference and is not supposed to work on ML Cloud (where only CPU instances are available).

Is it feasible to have this model served by ML Cloud, and if so, what steps should I follow to accomplish that?

Thank you!

CUDA_ERROR_OUT_OF_MEMORY: out of memory (on a GPU)

Here is the full log:

(env37_t5) danielk@aristo-server1 ~ $ t5_mesh_transformer  \ 
>   --model_dir="danielk-files/models" \
>   --t5_tfds_data_dir="danielk-files" \
>   --gin_file="dataset.gin" \
>   --gin_param="utils.run.mesh_shape = 'model:2,batch:1'" \ 
>   --gin_param="utils.run.mesh_devices = ['gpu:0', 'gpu:1']" \
>   --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
>   --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin" \
>   --gin_param="batch_size=2"

WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/compat/v2_compat.py:68: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-01-09 11:11:34.259764: W tensorflow/core/platform/cloud/google_auth_provider.cc:178] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The 
last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
INFO:tensorflow:model_type=bitransformer
I0109 11:11:35.254748 139976006838016 utils.py:1664] model_type=bitransformer
INFO:tensorflow:mode=train
I0109 11:11:35.254887 139976006838016 utils.py:1665] mode=train
INFO:tensorflow:sequence_length={'inputs': 512, 'targets': 512}
I0109 11:11:35.254942 139976006838016 utils.py:1666] sequence_length={'inputs': 512, 'targets': 512}
INFO:tensorflow:batch_size=2048
I0109 11:11:35.254985 139976006838016 utils.py:1667] batch_size=2048
INFO:tensorflow:train_steps=1000000000
I0109 11:11:35.255030 139976006838016 utils.py:1668] train_steps=1000000000
INFO:tensorflow:mesh_shape=model:2,batch:1
I0109 11:11:35.255067 139976006838016 utils.py:1669] mesh_shape=model:2,batch:1
INFO:tensorflow:layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
I0109 11:11:35.255102 139976006838016 utils.py:1670] layout_rules=ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch
INFO:tensorflow:Building TPUConfig with tpu_job_name=None
I0109 11:11:35.255166 139976006838016 utils.py:1685] Building TPUConfig with tpu_job_name=None
INFO:tensorflow:Using config: {'_model_dir': 'danielk-files/models', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorfl
ow.python.training.server_lib.ClusterSpec object at 0x7f4debe16710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=None, num_cores_per_replica=1, per_host_input_for
_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0109 11:11:35.257782 139976006838016 estimator.py:212] Using config: {'_model_dir': 'danielk-files/models', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorfl
ow.python.training.server_lib.ClusterSpec object at 0x7f4debe16710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=None, num_cores_per_replica=1, per_host_input_for
_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0109 11:11:35.258051 139976006838016 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0109 11:11:35.258131 139976006838016 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0109 11:11:35.263432 139976006838016 deprecation.py:506] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0109 11:11:35.263689 139976006838016 deprecation.py:323] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I0109 11:11:35.269644 139976006838016 dataset_builder.py:193] Overwrite dataset info from restored data version.
I0109 11:11:35.373311 139976006838016 dataset_builder.py:193] Overwrite dataset info from restored data version.
I0109 11:11:35.379834 139976006838016 dataset_builder.py:273] Reusing dataset glue (danielk-files/glue/mrpc/0.0.2)
I0109 11:11:35.380300 139976006838016 dataset_builder.py:434] Constructing tf.data.Dataset for split train, from danielk-files/glue/mrpc/0.0.2
2020-01-09 11:11:35.972400: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-09 11:11:36.034612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.036360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Quadro GV100 major: 7 minor: 0 memoryClockRate(GHz): 1.627
pciBusID: 0000:01:00.0
2020-01-09 11:11:36.036418: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.038354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Quadro RTX 8000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:02:00.0
2020-01-09 11:11:36.038504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-01-09 11:11:36.039339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-01-09 11:11:36.040050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-01-09 11:11:36.040232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-01-09 11:11:36.041161: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-01-09 11:11:36.041870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-01-09 11:11:36.044076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-09 11:11:36.044185: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.045984: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.047956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.049633: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:11:36.051551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/mesh_tensorflow-0.1.9-py3.7.egg/mesh_tensorflow/transformer/dataset.py:513: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
W0109 11:11:37.271278 139976006838016 deprecation.py:323] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/mesh_tensorflow-0.1.9-py3.7.egg/mesh_tensorflow/transformer/dataset.py:513: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
INFO:tensorflow:Calling model_fn.
I0109 11:11:38.479681 139976006838016 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running train on CPU
I0109 11:11:38.479841 139976006838016 tpu_estimator.py:3124] Running train on CPU
INFO:tensorflow:feature inputs : Tensor("Reshape:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.480923 139976006838016 utils.py:374] feature inputs : Tensor("Reshape:0", shape=(1, 2048, 512), dtype=int32)
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/mesh_tensorflow-0.1.9-py3.7.egg/mesh_tensorflow/transformer/utils.py:376: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20.
Instructions for updating:
Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensur
e tf.print executes in graph mode:

W0109 11:11:38.481014 139976006838016 deprecation.py:323] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/mesh_tensorflow-0.1.9-py3.7.egg/mesh_tensorflow/transformer/utils.py:376: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20.
Instructions for updating:
Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensur
e tf.print executes in graph mode:

INFO:tensorflow:feature inputs_position : Tensor("Reshape_1:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.482377 139976006838016 utils.py:374] feature inputs_position : Tensor("Reshape_1:0", shape=(1, 2048, 512), dtype=int32)
INFO:tensorflow:feature targets : Tensor("Reshape_2:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.483691 139976006838016 utils.py:374] feature targets : Tensor("Reshape_2:0", shape=(1, 2048, 512), dtype=int32)
INFO:tensorflow:feature targets_position : Tensor("Reshape_3:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.485010 139976006838016 utils.py:374] feature targets_position : Tensor("Reshape_3:0", shape=(1, 2048, 512), dtype=int32)
INFO:tensorflow:feature inputs_segmentation : Tensor("Reshape_4:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.486300 139976006838016 utils.py:374] feature inputs_segmentation : Tensor("Reshape_4:0", shape=(1, 2048, 512), dtype=int32)
INFO:tensorflow:feature targets_segmentation : Tensor("Reshape_5:0", shape=(1, 2048, 512), dtype=int32)
I0109 11:11:38.487596 139976006838016 utils.py:374] feature targets_segmentation : Tensor("Reshape_5:0", shape=(1, 2048, 512), dtype=int32)
INFO:tensorflow:serialize_num_microbatches: tokens_per_microbatch_per_replica=8192 batch_dim=Dimension(name='batch', size=2048) sequence_length={'inputs': 512, 'targets': 512} batch_per_replica=2048 num_microbatches=128
I0109 11:11:38.488244 139976006838016 utils.py:1483] serialize_num_microbatches: tokens_per_microbatch_per_replica=8192 batch_dim=Dimension(name='batch', size=2048) sequence_length={'inputs': 512, 'targets': 512} batch_per_replica=2048 num_microbatches=128
WARNING:tensorflow:Using default tf glorot_uniform_initializer for variable encoder/block_000/layer_000/SelfAttention/relative_attention_bias  The initialzer will guess the input and output dimensions  based on dimension order.
W0109 11:11:38.516117 139976006838016 ops.py:4022] Using default tf glorot_uniform_initializer for variable encoder/block_000/layer_000/SelfAttention/relative_attention_bias  The initialzer will guess the input and output dimensions  based on dimension order.
WARNING:tensorflow:Using default tf glorot_uniform_initializer for variable decoder/block_000/layer_000/SelfAttention/relative_attention_bias  The initialzer will guess the input and output dimensions  based on dimension order.
W0109 11:11:38.731440 139976006838016 ops.py:4022] Using default tf glorot_uniform_initializer for variable decoder/block_000/layer_000/SelfAttention/relative_attention_bias  The initialzer will guess the input and output dimensions  based on dimension order.
INFO:tensorflow:Trainable Variables            count: 99      Total size: 60506624         Total slice_size: 30261504
I0109 11:12:02.097769 139976006838016 ops.py:5656] Trainable Variables            count: 99      Total size: 60506624         Total slice_size: 30261504
INFO:tensorflow:All Variables                  count: 105     Total size: 60691328         Total slice_size: 30386880
I0109 11:12:02.098776 139976006838016 ops.py:5656] All Variables                  count: 105     Total size: 60691328         Total slice_size: 30386880
INFO:tensorflow:Create CheckpointSaverHook.
I0109 11:12:02.323129 139976006838016 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
I0109 11:12:02.323394 139976006838016 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Starting the session.
I0109 11:12:05.723811 139976006838016 ops.py:5512] Starting the session.
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0109 11:12:05.893454 139976006838016 deprecation.py:323] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:tensorflow:Graph was finalized.
I0109 11:12:06.051002 139976006838016 monitored_session.py:240] Graph was finalized.
2020-01-09 11:12:06.052934: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-09 11:12:06.061519: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600320000 Hz
2020-01-09 11:12:06.061715: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5622b19cd0e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-09 11:12:06.061729: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-09 11:12:06.279132: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.292306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.294242: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5622af2a7120 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-09 11:12:06.294260: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro GV100, Compute Capability 7.0
2020-01-09 11:12:06.294267: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Quadro RTX 8000, Compute Capability 7.5
2020-01-09 11:12:06.294765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.296223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Quadro GV100 major: 7 minor: 0 memoryClockRate(GHz): 1.627
pciBusID: 0000:01:00.0
2020-01-09 11:12:06.296279: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.298011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Quadro RTX 8000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:02:00.0
2020-01-09 11:12:06.298046: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-01-09 11:12:06.298064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-01-09 11:12:06.298080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-01-09 11:12:06.298096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-01-09 11:12:06.298111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-01-09 11:12:06.298126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-01-09 11:12:06.298142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-09 11:12:06.298197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.299798: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.301686: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.303165: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.304884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-01-09 11:12:06.305241: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-01-09 11:12:06.310865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-09 11:12:06.310878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1
2020-01-09 11:12:06.310883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N
2020-01-09 11:12:06.310886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N
2020-01-09 11:12:06.311424: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.312928: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.314704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.316186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30553 MB memory) -> physical GPU (device: 0, name: Quadro GV100, pci bus id: 0000:01:00.0, compute capability: 7.0)
2020-01-09 11:12:06.316632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 11:12:06.318361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 45978 MB memory) -> physical GPU (device: 1, name: Quadro RTX 8000, pci bus id: 0000:02:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from danielk-files/models/model.ckpt-0
I0109 11:12:06.319796 139976006838016 saver.py:1284] Restoring parameters from danielk-files/models/model.ckpt-0
WARNING:tensorflow:From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W0109 11:12:09.913856 139976006838016 deprecation.py:323] From /home/danielk/anaconda3/envs/env37_t5/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I0109 11:12:10.712485 139976006838016 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0109 11:12:11.262093 139976006838016 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Before copy master to slices.
I0109 11:12:12.044784 139976006838016 ops.py:5541] Before copy master to slices.
INFO:tensorflow:Done with copy master to slices.
I0109 11:12:13.903039 139976006838016 ops.py:5543] Done with copy master to slices.
INFO:tensorflow:Saving checkpoints for 0 into danielk-files/models/model.ckpt.
I0109 11:12:25.531368 139976006838016 basic_session_run_hooks.py:606] Saving checkpoints for 0 into danielk-files/models/model.ckpt.
INFO:tensorflow:Before Save.
I0109 11:12:25.541100 139976006838016 ops.py:5516] Before Save.
INFO:tensorflow:About to write a checkpoint
I0109 11:12:26.858080 139976006838016 ops.py:5518] About to write a checkpoint
INFO:tensorflow:Done writing checkpoint.
I0109 11:12:30.216168 139976006838016 ops.py:5521] Done writing checkpoint.



import feature targets[[[59 834 15 1169 15592 1 59 834 15 1169 15592 1 7072 1 7072 1 7072 1 7072 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][7072 1 7072 1 7072 1 7072 1 59 834 15 1169 15592 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]import feature inputs_segmentation[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]

import feature targets_segmentation[[[1 1 1 1 1 1 2 2 2 2 2 2 3 3 4 4 5 5 6 6 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 2 2 3 3 4 4 5 5 5 5 5 5 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]
import feature inputs[[[3 51 52 102 75 7142 536 10 37 748 18 11706 26 10571 26 9 1824 27127 11507 3 5 196 4 4666 2490 4659 4848 1828 979 3 6 42 4097 2128 1093 3 6 12 1914 3840 11039 755 3 5 7142 357 10 37 10571 26 9 1824 13723 5538 6292 4357 927 1093 3 6 11 8 5150 3 184 21309 3 31 7 4728 3 27336 1093 3 5 1 3 51 52 102 75 7142 536 10 71 272 7075 13 944 42 756 19 1702 26676 3 117 604 42 756 19 1702 29329 3 5 7142 357 10 71 272 7075 344 209 19253 11 204 27336 19 1702 1389 3 6 147 944 19 1702 26676 11 604 42 2123 19 4802 38 29329 3 5 1 3 51 52 102 75 7142 536 10 451 47 3 10116 6962 26 16 368 1060 538 30 386 12052 13 7738 11 5563 1213 406 15794 3 5 7142 357 10 451 47 3 10116 6962 26 30 386 12052 13 511 18 19706 7738 11 5563 1213 406 15794 16 3 23748 1334 2215 173 3 5 1 3 51 52 102 75 7142 536 10 3 15944 2721 3 6 2449 28017 3 19448 12967 8 1149 4172 11675 6894 45 14617 3 5 7142 357 10 2721 1379 3 6 2449 28017 3 31 7 1476 12967 8 14617 240 1890 462 3 5 1 3 51 52 102 75 7142 536 10 216 243 8 962 13 5025 2298 53 12910 251 81 3 9 205 5 196 5 188 5 5502 47 96 3 9 182 2261 1052 96 24 225 36 96 6665 26 12 8 423 222 5996 96 57 8 6923 1775 3 5 7142 357 10 37 1945 1384 243 24 3 26177 12910 251 47 3 9 2261 1052 24 225 36 96 6665 26 12 8 423 222 5996 96 57 8 6923 1775 3 5 1 3 51 52 102 75 7142 536 10 9765 243 24 8 1025 18 19973 772 130 6737 57 5455 11 2289 1170 3 6 1101 1729 8175 11 3415 772 3 6 11 3798 4539 11 895 14609 3 5 7142 357 10 23686 11 2289 1170 3 6 1101 1729 8175 11 3415 772 11 1101 1170 16 4539 11 895 14609 10719 48 2893 3 31 7 772 3 5 1 3 51 52 102 75 7142 536 10 8979 53 7 81 750 268 1124 3798 15284 45 8 166 2893 3 6 15539 45 1283 12 6897 3 5 7142 357 10 15186 13 750 268 1124 3798 15284 3 6 8 4379 2086 243 3 6 15539 12 6897 45 1283 16 8 166 2893 3 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][3 51 52 102 75 7142 536 10 391 2999 687 3 6 3479 3 6 744 3 31 17 780 43 46 4917 16 8 7738 1567 3 6 5779 243 3 5 7142 357 10 391 2999 687 3 6 3479 3 6 405 59 43 3 9 6297 30 8 7738 1567 3 6 5779 243 3 5 1 3 51 52 102 75 7142 536 10 37 3202 2120 95 192 477 865 44 3 9 6440 1078 633 3119 550 3 6 8944 29 68 1346 3 6 11 2139 991 2095 12 160 3 12554 703 7472 127 3 5 7142 357 10 37 10319 764 716 227 8 3202 2120 95 44 3 9 6440 1078 633 3119 550 11 2139 991 2095 12 160 3 12554 703 7472 127 3 5 1 3 51 52 102 75 7142 536 10 12737 7 7048 47 5510 13139 28 2084 3 9094 45 3 9 142 904 3342 77 30 8 7584 3 31 7 2131 3010 3 5 7142 357 10 86 8388 3 6 227 4169 203 16 5714 3 6 12737 7 7048 47 13139 28 2084 3 9094 45 3 9 142 904 3342 77 30 8 7584 3 31 7 2131 3010 3 5 1 3 51 52 102 75 7142 536 10 6187 630 27575 3 6 113 4037 8 73 28062 26 239 2864 3 6 3725 2098 662 767 16 5714 30 1817 1778 15 152 127 12710 11 861 18339 3991 3 5 7142 357 10 6187 630 27575 3 6 113 4037 8 6016 239 124 3 6 2098 662 767 16 5714 30 1817 1778 15 152 127 12710 11 861 18 29 15 122 3437 3991 3 5 1 3 51 52 102 75 7142 536 10 1960 3 6 8 8183 25553 25093 2086 56 962 165 7469 30 125 2953 8 3125 3 5 7142 357 10 37 8183 25553 25093 2086 65 2681 8 9100 21 8 3125 2812 120 30 10571 9 3 5 1 3 51 52 102 75 7142 536 10 216 19 80 13 192 11882 30 8 874 18 12066 377 2823 3 6 11 3 88 19 3 9 1101 11223 13 11955 49 17524 581 2252 11 4390 6991 24 15108 16 221 75 4392 3786 3 5 7142 357 10 10400 102 7 3 6 80 13 192 11882 30 8 874 18 12066 5473 3 6 65 3 9951 21 11955 49 17524 581 2252 11 4390 6991 24 15108 16 221 75 4392 3786 3 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]



2020-01-09 11:13:56.070937: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2020-01-09 11:15:58.087468: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-01-09 11:15:58.089614: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 34359738368
[... the same CUDA_ERROR_OUT_OF_MEMORY "failed to alloc" / "could not allocate pinned host memory" pair of log lines repeats as the allocator retries with progressively smaller sizes ...]
2020-01-09 11:15:58.090749: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 7074362368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-01-09 11:15:58.090758: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 7074362368

Here is some additional information about the environment:

(env37_t5) danielk@aristo-server1 ~ $ echo "$LD_LIBRARY_PATH"
:/home/danielk/anaconda3/pkgs/cudatoolkit-10.0.130-0/lib/

(env37_t5) danielk@aristo-server1 ~ $ nvidia-smi 
Fri Jan 24 14:24:09 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro GV100        On   | 00000000:01:00.0 Off |                  Off |
| 65%   82C    P2   159W / 250W |  11154MiB / 32478MiB |     74%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     On   | 00000000:02:00.0 Off |                  Off |
| 33%   49C    P8    14W / 260W |   6830MiB / 48571MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

FYI @nalourie-ai2

[Bug+Fix] Batch size other than an integer leads to an AttributeError in the batch_size setter

The init of MtfModel tries to set batch_size before initialising the other attributes that utils.compute_batch_size needs, which throws an AttributeError if batch_size is a tuple, e.g. ("tokens_per_batch", 1024).

To fix it, the following assignment should be moved to the bottom of the init function (a fuller sketch follows below).
self.batch_size = batch_size

I hope this has no consequences for other initialisations done within the init function, and would be glad if someone could verify this.
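
For concreteness, here is a hypothetical sketch of the proposed ordering; the attribute names the batch_size setter depends on are assumptions for illustration, not the actual MtfModel source:

class MtfModel:
    def __init__(self, model_dir, batch_size, sequence_length, **kwargs):
        # First initialise everything the batch_size setter relies on
        # (attribute names here are assumed, not taken from the real code).
        self._model_dir = model_dir
        self._sequence_length = sequence_length
        # ... remaining initialisation ...
        # Set batch_size last, so utils.compute_batch_size can resolve a tuple
        # such as ("tokens_per_batch", 1024) without hitting an AttributeError.
        self.batch_size = batch_size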

Can we run this in Google Colab?

Hey, I would love to try this out, but I'm not very proficient with ML.

Is it possible to run the biggest T5 model in a Google Colab notebook? Did anyone set one up? Thanks!

Mask Strategy

Is the masking strategy whole-word masking?

I am John #son => I am

The original BERT model masks subwords regardless of whether the subword is a whole word on its own. I do not see any detail about this in the T5 paper.

Thanks

GPU support?

This looks amazing. Would it be too slow to run on GPUs?

Low GPU memory usage

Following #20, I know GPU usage is not tested, but thought I'd give it a try.

The current issue is that it does not effectively utilize the GPUs. As the log below shows, the majority of the GPU memory is free:

$ nvidia-smi
Tue Jan  7 14:26:02 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:17:00.0 Off |                  Off |
| 34%   38C    P8     8W / 260W |    171MiB / 48571MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     Off  | 00000000:65:00.0 Off |                  Off |
| 33%   35C    P8    16W / 260W |    171MiB / 48568MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 8000     Off  | 00000000:B3:00.0 Off |                  Off |
| 57%   77C    P2   259W / 260W |   3058MiB / 48571MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     25631      C   ...ransfer-transformer/env36/bin/python3.6   161MiB |
|    1     25631      C   ...ransfer-transformer/env36/bin/python3.6   161MiB |
|    2     13651      C   python                                      2887MiB |
|    2     25631      C   ...ransfer-transformer/env36/bin/python3.6   161MiB |
+-----------------------------------------------------------------------------+

(PS. Ignore process=13651; it's someone else's)

The program is launched with the following command:

t5_mesh_transformer  \
  --t5_tfds_data_dir="gs://danielk-files" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --model_dir="gs://danielk-files/t5-models/" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin" \
  --gin_param="tokens_per_batch=16384" 

My initial guess was that the batch sizes are pretty small, so I tried changing the value of "tokens_per_batch" to larger values, but no luck.
Any thoughts?
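
As a hedged aside rather than a verified fix: the repository's GPU instructions override the mesh-shape and device gin bindings instead of the TPU ones used above. If I recall those instructions correctly, an invocation along the following lines targets the local GPUs; treat the exact binding names and values as assumptions to double-check against the README:

t5_mesh_transformer  \
  --t5_tfds_data_dir="gs://danielk-files" \
  --gin_file="dataset.gin" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:2'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1']" \
  --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" \
  --model_dir="gs://danielk-files/t5-models/" \
  --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin" \
  --gin_param="tokens_per_batch=16384"

With only the TPU mesh-shape bindings set, the Mesh TF layout may not place any computation on the GPUs, which would be consistent with the near-idle utilization in the nvidia-smi output above.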

loss calculation

Hi,

Regarding loss calculation, I have 2 questions:

  1. Do you apply any kind of class-weight balancing over the vocabulary?
  2. Do you calculate the loss as a mean over all tasks' losses, or do you calculate each task's loss separately and then sum them up? (The two options are sketched below.)
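
To make the second question concrete, here is a rough, generic sketch of the two aggregation schemes being contrasted; this is plain NumPy for illustration, not the repository's actual loss code:

import numpy as np

def mixed_batch_loss(token_losses, token_weights):
    # Option A: a single token-level loss averaged over every non-padding
    # token in a mixed batch, regardless of which task each example came from.
    return np.sum(token_losses * token_weights) / np.sum(token_weights)

def per_task_sum_loss(losses_by_task, weights_by_task):
    # Option B: compute each task's mean loss separately, then sum the means.
    return sum(np.sum(l * w) / np.sum(w)
               for l, w in zip(losses_by_task, weights_by_task))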

rate_num_examples Doesn't work with new tasks

Hi,

I am training new mixture tasks (new tasks built from TSV files), but it only works with "default_rate=1.0".
When I try to use "default_rate=rate_num_examples", it doesn't work because it can't find the tasks in the cache.

In issue #15 you mentioned the cache is only supported in your internal infrastructure.

So, how can we use rate_num_examples with new mixture tasks?
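
For reference, here is roughly how the two rate settings differ when registering a mixture; the task names are hypothetical, and the point is that rate_num_examples needs per-task example counts, which is where the cache dependency comes in:

import t5

# Assumes "my_tsv_task_a" and "my_tsv_task_b" were already added to
# t5.data.TaskRegistry from the TSV files (hypothetical names).
t5.data.MixtureRegistry.add(
    "my_mixture_constant",
    ["my_tsv_task_a", "my_tsv_task_b"],
    default_rate=1.0)  # constant rate: no dataset statistics required

t5.data.MixtureRegistry.add(
    "my_mixture_proportional",
    ["my_tsv_task_a", "my_tsv_task_b"],
    # Size-proportional rate: looks up each task's number of examples,
    # which is the lookup that fails without cached stats.
    default_rate=t5.data.rate_num_examples)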

Shape mismatch error while loading the pretrained model

I get a shape mismatch error while running the t5_mesh_transformer either for training or fine-tuning.
Following is an example fine-tuning run, using a sample WMT TSV file:

$ t5_mesh_transformer \
    --tpu="${TPU_NAME}" \
    --gcp_project="${PROJECT}" \
    --tpu_zone="${ZONE}" \
    --model_dir="${MODEL_DIR}" \
    --t5_tfds_data_dir=${DATA_DIR} \
    --gin_file="gs://t5-data/pretrained_models/11B/operative_config.gin" \
    --gin_file="models/bi_v1.gin" \
    --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
    --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
    --gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn" \
    --gin_param="tsv_dataset_fn.filename = 'gs://XYZbucket/t5/misc/news-commentary-v14.ar-it.tsv'"

Then I get the following error:

ERROR:tensorflow:Error recorded from training_loop: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.
E1110 22:05:58.563133 140034595272448 error_handling.py:75] Error recorded from training_loop: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.
INFO:tensorflow:training_loop marked as finished
I1110 22:05:58.563437 140034595272448 error_handling.py:101] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1110 22:05:58.563559 140034595272448 error_handling.py:135] Reraising captured error
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3159, in _model_fn
    _train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3604, in _train_on_tpu_system
    device_assignment=ctx.device_assignment)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/tpu.py", line 1277, in split_compile_and_shard
    name=name)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/tpu.py", line 992, in split_compile_and_replicate
    outputs = computation(*computation_inputs)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3589, in multi_tpu_train_steps_on_single_shard
    inputs=[0, _INITIAL_LOSS])
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/training_loop.py", line 178, in while_loop
    condition_wrapper, body_wrapper, inputs, name="", parallel_iterations=1)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2753, in while_loop
    return_same_structure)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2245, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2170, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/training_loop.py", line 121, in body_wrapper
    outputs = body(*(inputs + dequeue_ops))
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3588, in <lambda>
    lambda i, loss: [i + 1, single_tpu_train_step(i)],
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1715, in train_step
    self._call_model_fn(features, labels))
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 567, in my_model_fn
    init_checkpoint, {v: v for v in restore_vars}
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 329, in _init_from_checkpoint
    tensor_name_in_ckpt, str(variable_map[tensor_name_in_ckpt])
ValueError: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.

Note that the error is reproducible with any of the pretrained models, not just the 11B-parameter one.

What is the task prefix for the unsupervised denoising objective?

Hi!

I want to see how the T5 model restores corrupted spans. I guess that to do this, I need to prepend a task prefix to the input and then replace a few spans with tokens from the end of the vocabulary ([31999, 31998, ...]). Accordingly, the example from the paper (Thank you <x> me to your party <y>) becomes [1562, 25, 31999, 140, 12, 39, 1088, 31998]. I tried to find the prefix in the source code, but the only reference is in preprocessors.py:

Note: Some functionality has been deleted, which we may or may not want to
  restore at a later date.  The code for this functionality can be found in
  the deleted code for this CL.  In particular:
    - mixture of masking and random replacement
    - task labels prepended to the inputs

What is the prefix for this task? It does not seem to work without it.
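
In case it helps while an answer is pending, here is a minimal way to poke at span infilling through the experimental Hugging Face port, where the sentinel tokens are exposed as <extra_id_N>; this sketch uses no prefix at all, which is exactly the open question above:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Corrupted input with sentinel tokens marking the dropped spans; no task prefix.
inputs = tokenizer("Thank you <extra_id_0> me to your party <extra_id_1>.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0]))  # expected to fill in the <extra_id_N> spans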

Runtime Error when decoding

I have fine-tuned a model on a .tsv file using the instructions provided. When I try to decode, I get the error:

RuntimeError: Required bindings for `decode_from_file` not provided in config: ['input_filename', 'output_filename']
  In call to configurable 'run' (<function run at 0x7f76de4e78c8>)

I definitely have the input_filename and output_filename params specified via gin_param, though. My invocation is as follows:

t5_mesh_transformer \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --gin_file=$MODEL_DIR/operative_config.gin \
  --gin_file=sample_decode.gin \
  --gin_param="utils.run.mode='infer'" \
  --gin_param="input_filename='$DATA_DIR/testdata_pred.tsv'" \
  --gin_param="output_filename='$DATA_DIR/testdata_outputs.txt'" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology='2x2'" \
  --gin_param="eval_checkpoint_step = 1005000"

Both $MODEL_DIR and $DATA_DIR are GCS Buckets.
Any thoughts?

Finetuning 11B doesn't seem to save checkpoints

Hi all,

thanks for all the great work on the T5 paper/project + for open sourcing this library!

I'm playing around with finetuning the largest T5 model (11B) on a v3-1024 TPU pod. Finetuning seems to go along just fine; however, it seems like checkpoints are never saved (except at the beginning, when the model.ckpt-1000000 checkpoint is added). Curiously, however, they do seem to be saved when finetuning the 3B model (or smaller). Has anyone else run into this before?

More info: every save_checkpoints_steps while training the 3B model, I get something like

I0109 04:30:54.350781 140357595617024 tpu_estimator.py:279] Outfeed finished for iteration (1, 11675)
I0109 04:30:59.370695 140357587224320 transport.py:157] Attempting refresh to obtain initial access_token
W0109 04:30:59.436725 140357587224320 preempted_hook.py:91] TPUPollingThread found TPU b'news2' in state READY, and health HEALTHY.
I0109 04:31:06.791346 140361433953664 basic_session_run_hooks.py:260] loss = 0.20800781, step = 1023436 (3473.152 sec)
I0109 04:31:06.793378 140361433953664 tpu_estimator.py:2307] global_step/sec: 3.37388
I0109 04:31:06.794843 140361433953664 tpu_estimator.py:2308] examples/sec: 431.857
I0109 04:31:09.892513 140361433953664 basic_session_run_hooks.py:606] Saving checkpoints for 1023436 into gs://adviceeval/t5/models/jan_8_2020/model=3B~lr=0.001~epochs=10~bsize=128/model.ckpt.
I0109 04:31:09.893307 140361433953664 ops.py:5516] Before Save.
I0109 04:31:13.890007 140361433953664 ops.py:5518] About to write a checkpoint
I0109 04:31:29.504065 140357587224320 transport.py:157] Attempting refresh to obtain initial access_token
W0109 04:31:29.583728 140357587224320 preempted_hook.py:91] TPUPollingThread found TPU b'news2' in state READY, and health HEALTHY.
I0109 04:31:50.368857 140361433953664 checkpoint_management.py:95] gs://MYBUCKET/t5/models/jan_8_2020/model=3B~lr=0.001~epochs=10~bsize=128/model.ckpt-1023436 is not in all_model_checkpoint_paths. Manually adding it.
I0109 04:32:00.038406 140357587224320 transport.py:157] Attempting refresh to obtain initial access_token
W0109 04:32:00.187774 140357587224320 preempted_hook.py:91] TPUPollingThread found TPU b'news2' in state READY, and health HEALTHY.
I0109 04:32:02.436602 140361433953664 ops.py:5521] Done writing checkpoint.
I0109 04:32:02.440483 140361433953664 tpu_estimator.py:600] Enqueue next (11718) batch(es) of data to infeed.
I0109 04:32:02.440706 140361433953664 tpu_estimator.py:604] Dequeue next (11718) batch(es) of data from outfeed.
I0109 04:32:02.799611 140357595617024 tpu_estimator.py:279] Outfeed finished for iteration (2, 0)

Whereas, for the 11B model, I get

I0110 02:01:45.591546 140334426838784 tpu_estimator.py:279] Outfeed finished for iteration (0, 4884)
I0110 02:02:11.054709 140334407706368 transport.py:157] Attempting refresh to obtain initial access_token
W0110 02:02:11.129206 140334407706368 preempted_hook.py:91] TPUPollingThread found TPU b'news0' in state READY, and health HEALTHY.
I0110 02:02:39.383076 140338804708736 basic_session_run_hooks.py:262] loss = 0.24804688, step = 1005000
I0110 02:02:39.384954 140338804708736 tpu_estimator.py:600] Enqueue next (5000) batch(es) of data to infeed.
I0110 02:02:39.385145 140338804708736 tpu_estimator.py:604] Dequeue next (5000) batch(es) of data from outfeed.
I0110 02:02:41.191438 140334407706368 transport.py:157] Attempting refresh to obtain initial access_token
W0110 02:02:41.268857 140334407706368 preempted_hook.py:91] TPUPollingThread found TPU b'news0' in state READY, and health HEALTHY.
I0110 02:03:11.331690 140334407706368 transport.py:157] Attempting refresh to obtain initial access_token
W0110 02:03:11.404917 140334407706368 preempted_hook.py:91] TPUPollingThread found TPU b'news0' in state READY, and health HEALTHY.
I0110 02:03:29.703239 140334426838784 tpu_estimator.py:279] Outfeed finished for iteration (1, 0)

(and training progresses despite not having written the checkpoint).

If it helps, I'm using Python 3.6, tensorflow==1.15 and the following settings when constructing the 11B model:

model = t5.models.MtfModel(
        tpu_topology='16x32',
        model_parallelism=32,
        batch_size=128,
        sequence_length={"inputs": 1280, "targets": 512},
        learning_rate_schedule=0.001,
        save_checkpoints_steps=5000,
        keep_checkpoint_max=None,
        iterations_per_loop=FLAGS.iterations_per_loop,
    )

Those settings are the same as for 3B and below (though I also messed around with save_checkpoints_steps, and I set model_parallelism=8 there).

Thanks! +Let me know if I can provide more information that would be helpful :)

What is "TOPOLOGY"?

Here and there in the README I see mentions of "topology" (for instance, utils.tpu_mesh_shape.tpu_topology = '2x2'). I couldn't find a definition in either the README or the paper. Could you elaborate?

How t5 handle MRC multiple choices task like RACE?

Hi there. I notice that, in SuperGLUE, T5 handles ReCoRD by concatenating all candidate answers after the question and before the passage, and lets the model generate the correct answer. But when the length and number of candidates increase, for example in RACE, this might not be the best way to train and predict (I think), since the concatenated input sequence can easily exceed 512 tokens and decoding may also become harder. Another example is passage re-ranking, where we might need a score for each answer.

A basic idea is to concatenate the passage with each candidate, get the logits/perplexity of the model decoding the true or false token, and rank the candidates to get the final prediction (a rough sketch of this scoring idea follows below). The question is: how could we easily get the intermediate logits in the current t5, or is there a better solution for such tasks?
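
Here is a rough sketch of that candidate-scoring idea using the experimental Hugging Face port rather than the Mesh TF binary; the model name "t5-small", the input format, and the "true" target wording are all illustrative assumptions, not the repository's prescribed recipe:

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def score_candidate(passage, question, candidate):
    # Negative loss of decoding "true" for this (passage, question, candidate);
    # a higher score means the model prefers this candidate more strongly.
    inputs = tokenizer(
        "question: %s candidate: %s passage: %s" % (question, candidate, passage),
        return_tensors="pt", truncation=True)
    labels = tokenizer("true", return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss
    return -loss.item()

def predict(passage, question, candidates):
    # Rank candidates by score and return the best one.
    return max(candidates, key=lambda c: score_candidate(passage, question, c))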
