
distilling-step-by-step's Introduction

Distilling Step-by-Step!

Code for the paper Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.

Environment Setup

  • Set up the Conda environment:
conda create --name distill python=3.10.6 -y
conda activate distill
conda install -y pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install git+https://github.com/huggingface/transformers@<version> datasets sentencepiece protobuf==3.20.* tensorboardX
  • Extract datasets to datasets/:
unzip datasets.zip

Command Usages

Args usages

  • --from_pretrained: google/t5-v1_1-small, google/t5-v1_1-base, google/t5-v1_1-large, google/t5-v1_1-xxl
  • --dataset: esnli, anli1, cqa, svamp
  • --label_type:
    • --label_type gt: Use the ground-truth (GT) label for training
    • --label_type llm: Use the LLM-predicted label for training
  • --alpha: Task weight for multi-task training. Loss = alpha * label_prediction_loss + (1 - alpha) * rationale_generation_loss (see the sketch after this list)
    • --alpha 0.5: recommended
  • --batch_size: Batch size
  • --grad_steps: Number of gradient accumulation steps
  • --max_input_length: Maximum input length
  • --eval_steps: Number of training steps between evaluations
  • --max_steps: Maximum number of training steps
  • --run: Random seed to use
  • --model_type:
    • standard: Standard finetuning (--label_type gt) or distillation (--label_type llm)
    • task_prefix: Distilling step-by-step
  • --parallelize: Model parallelism
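
For reference, the weighted objective described for --alpha amounts to a single line of PyTorch. The following is a minimal sketch, assuming the two scalar cross-entropy losses have already been computed by two forward passes of the student model (one predicting the label, one generating the rationale); the function name and arguments are illustrative, not taken from run.py:

import torch

def multitask_loss(label_prediction_loss: torch.Tensor,
                   rationale_generation_loss: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    # Weighted sum from the --alpha description above: alpha scales the
    # label prediction task, (1 - alpha) scales rationale generation.
    return alpha * label_prediction_loss + (1 - alpha) * rationale_generation_loss

With --alpha 0.5, both tasks contribute equally, matching the recommended setting.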

Example usages

  • Standard finetuning:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type gt --batch_size 64
  • Distilling step-by-step with GT label and PaLM rationale:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type task_prefix --label_type gt --llm palm --alpha 0.5 --batch_size 64
  • Standard distillation:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type llm --batch_size 64
  • Distilling step-by-step with PaLM label and PaLM rationale:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type task_prefix --label_type llm --llm palm --alpha 0.5 --batch_size 64
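
Note that the effective batch size is batch_size × grad_steps, assuming these flags map onto the Hugging Face Trainer's per-device batch size and gradient accumulation steps. For example, this hypothetical variant keeps an effective batch of 64 while halving the per-step memory footprint:

python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type gt --batch_size 32 --grad_steps 2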

Cite

If you find this repository useful, please consider citing:

@article{hsieh2023distilling,
  title={Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes},
  author={Hsieh, Cheng-Yu and Li, Chun-Liang and Yeh, Chih-Kuan and Nakhost, Hootan and Fujii, Yasuhisa and Ratner, Alexander and Krishna, Ranjay and Lee, Chen-Yu and Pfister, Tomas},
  journal={arXiv preprint arXiv:2305.02301},
  year={2023}
}


distilling-step-by-step's Issues

Distilling failed

Hello, I tried to run training with:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type task_prefix --label_type llm --llm palm --alpha 0.5 --batch_size 64
The google/t5-v1_1-base model was downloaded from Hugging Face, but the tokenizer ran into problems.

Distilled T5 models

Is it possible to download the distilled/trained models (based on google/t5-v1_1-small, google/t5-v1_1-base, google/t5-v1_1-large, google/t5-v1_1-xxl), in order to do some evaluation or further finetuning on them?

'eval_test_loss': nan

Hello,
I ran the following command and training started successfully:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type task_prefix --label_type gt --llm palm --alpha 0.5 --batch_size 64

However, I get
'eval_test_loss': nan
and the ckpt folder is empty after training.

Do you have any advice on this? Also, do you have an eval script?

Details of reproduction results

Could you please report the numerical results of the experiments? I ran standard finetuning on 8×3090s with:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type gt --batch_size 64 --grad_steps 2
I only got an accuracy of 60.2% on CQA at the last epoch, but the paper seems to report around 63%.
Here is my training log:

dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Bootstrap : Using eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation

dsw-27183-759b57b4d6-kz2vc:261724:353 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Socket : Using [0]eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Using network Socket
NCCL version 2.10.3+cuda11.3
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00/02 :    0   1   2   3   4   5   6   7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01/02 :    0   1   2   3   4   5   6   7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO comm 0x7f0000002fb0 rank 3 nranks 8 cudaDev 3 busId a0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO comm 0x7efff8002fb0 rank 5 nranks 8 cudaDev 5 busId c0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO comm 0x7f0008002fb0 rank 1 nranks 8 cudaDev 1 busId 80 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO comm 0x7efff4002fb0 rank 4 nranks 8 cudaDev 4 busId b0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO comm 0x7effec002fb0 rank 6 nranks 8 cudaDev 6 busId d0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO comm 0x7efff0002fb0 rank 7 nranks 8 cudaDev 7 busId e0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO comm 0x7f0004002fb0 rank 0 nranks 8 cudaDev 0 busId 70 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO comm 0x7efffc002fb0 rank 2 nranks 8 cudaDev 2 busId 90 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Launch mode Parallel
{'loss': 7.6906, 'learning_rate': 4.875e-05, 'epoch': 27.78}
{'eval_test_loss': 0.8547041416168213, 'eval_test_accuracy': 0.2457002457002457, 'eval_test_runtime': 5.5955, 'eval_test_samples_per_second': 218.211, 'eval_test_steps_per_second': 0.536, 'epoch': 27.78}
{'loss': 1.098, 'learning_rate': 4.75e-05, 'epoch': 55.56}
{'eval_test_loss': 0.5050153136253357, 'eval_test_accuracy': 0.5552825552825553, 'eval_test_runtime': 5.5448, 'eval_test_samples_per_second': 220.205, 'eval_test_steps_per_second': 0.541, 'epoch': 55.56}
{'loss': 0.4848, 'learning_rate': 4.6250000000000006e-05, 'epoch': 83.33}
{'eval_test_loss': 0.5543485879898071, 'eval_test_accuracy': 0.5954135954135954, 'eval_test_runtime': 5.569, 'eval_test_samples_per_second': 219.248, 'eval_test_steps_per_second': 0.539, 'epoch': 83.33}
{'loss': 0.2971, 'learning_rate': 4.5e-05, 'epoch': 111.11}
{'eval_test_loss': 0.6299827098846436, 'eval_test_accuracy': 0.6036036036036037, 'eval_test_runtime': 5.5232, 'eval_test_samples_per_second': 221.068, 'eval_test_steps_per_second': 0.543, 'epoch': 111.11}
{'loss': 0.196, 'learning_rate': 4.375e-05, 'epoch': 138.89}
{'eval_test_loss': 0.7029837369918823, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.458, 'eval_test_steps_per_second': 0.539, 'epoch': 138.89}
{'loss': 0.1373, 'learning_rate': 4.25e-05, 'epoch': 166.67}
{'eval_test_loss': 0.7832159399986267, 'eval_test_accuracy': 0.6126126126126126, 'eval_test_runtime': 5.5722, 'eval_test_samples_per_second': 219.125, 'eval_test_steps_per_second': 0.538, 'epoch': 166.67}
{'loss': 0.1015, 'learning_rate': 4.125e-05, 'epoch': 194.44}
{'eval_test_loss': 0.8421533703804016, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5786, 'eval_test_samples_per_second': 218.873, 'eval_test_steps_per_second': 0.538, 'epoch': 194.44}
{'loss': 0.0771, 'learning_rate': 4e-05, 'epoch': 222.22}
{'eval_test_loss': 0.9177669882774353, 'eval_test_accuracy': 0.6183456183456183, 'eval_test_runtime': 5.5181, 'eval_test_samples_per_second': 221.273, 'eval_test_steps_per_second': 0.544, 'epoch': 222.22}
{'loss': 0.0607, 'learning_rate': 3.875e-05, 'epoch': 250.0}
{'eval_test_loss': 0.9690037369728088, 'eval_test_accuracy': 0.6134316134316135, 'eval_test_runtime': 5.5698, 'eval_test_samples_per_second': 219.217, 'eval_test_steps_per_second': 0.539, 'epoch': 250.0}
{'loss': 0.0497, 'learning_rate': 3.7500000000000003e-05, 'epoch': 277.78}
{'eval_test_loss': 1.0180637836456299, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.973, 'eval_test_steps_per_second': 0.54, 'epoch': 277.78}
{'loss': 0.0408, 'learning_rate': 3.625e-05, 'epoch': 305.56}
{'eval_test_loss': 1.040199875831604, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.573, 'eval_test_samples_per_second': 219.091, 'eval_test_steps_per_second': 0.538, 'epoch': 305.56}
{'loss': 0.0348, 'learning_rate': 3.5e-05, 'epoch': 333.33}
{'eval_test_loss': 1.1167311668395996, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5648, 'eval_test_samples_per_second': 219.416, 'eval_test_steps_per_second': 0.539, 'epoch': 333.33}
{'loss': 0.0292, 'learning_rate': 3.375000000000001e-05, 'epoch': 361.11}
{'eval_test_loss': 1.1364021301269531, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.459, 'eval_test_steps_per_second': 0.539, 'epoch': 361.11}
{'loss': 0.0258, 'learning_rate': 3.2500000000000004e-05, 'epoch': 388.89}
{'eval_test_loss': 1.1679093837738037, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5548, 'eval_test_samples_per_second': 219.81, 'eval_test_steps_per_second': 0.54, 'epoch': 388.89}
{'loss': 0.023, 'learning_rate': 3.125e-05, 'epoch': 416.67}
{'eval_test_loss': 1.205809473991394, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.575, 'eval_test_samples_per_second': 219.014, 'eval_test_steps_per_second': 0.538, 'epoch': 416.67}
{'loss': 0.0201, 'learning_rate': 3e-05, 'epoch': 444.44}
{'eval_test_loss': 1.2262288331985474, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5226, 'eval_test_samples_per_second': 221.091, 'eval_test_steps_per_second': 0.543, 'epoch': 444.44}
{'loss': 0.0182, 'learning_rate': 2.8749999999999997e-05, 'epoch': 472.22}
{'eval_test_loss': 1.2057785987854004, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5463, 'eval_test_samples_per_second': 220.146, 'eval_test_steps_per_second': 0.541, 'epoch': 472.22}
{'loss': 0.0157, 'learning_rate': 2.7500000000000004e-05, 'epoch': 500.0}
{'eval_test_loss': 1.2767386436462402, 'eval_test_accuracy': 0.6093366093366094, 'eval_test_runtime': 5.5533, 'eval_test_samples_per_second': 219.867, 'eval_test_steps_per_second': 0.54, 'epoch': 500.0}
{'loss': 0.0149, 'learning_rate': 2.625e-05, 'epoch': 527.78}
{'eval_test_loss': 1.3246893882751465, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.6071, 'eval_test_samples_per_second': 217.759, 'eval_test_steps_per_second': 0.535, 'epoch': 527.78}
{'loss': 0.0133, 'learning_rate': 2.5e-05, 'epoch': 555.56}
{'eval_test_loss': 1.3044090270996094, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5897, 'eval_test_samples_per_second': 218.437, 'eval_test_steps_per_second': 0.537, 'epoch': 555.56}
{'loss': 0.0124, 'learning_rate': 2.375e-05, 'epoch': 583.33}
{'eval_test_loss': 1.3567758798599243, 'eval_test_accuracy': 0.6085176085176085, 'eval_test_runtime': 5.5682, 'eval_test_samples_per_second': 219.28, 'eval_test_steps_per_second': 0.539, 'epoch': 583.33}
{'loss': 0.0116, 'learning_rate': 2.25e-05, 'epoch': 611.11}
{'eval_test_loss': 1.3604899644851685, 'eval_test_accuracy': 0.6060606060606061, 'eval_test_runtime': 5.5799, 'eval_test_samples_per_second': 218.819, 'eval_test_steps_per_second': 0.538, 'epoch': 611.11}
{'loss': 0.011, 'learning_rate': 2.125e-05, 'epoch': 638.89}
{'eval_test_loss': 1.3682199716567993, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5523, 'eval_test_samples_per_second': 219.907, 'eval_test_steps_per_second': 0.54, 'epoch': 638.89}
{'loss': 0.0099, 'learning_rate': 2e-05, 'epoch': 666.67}
{'eval_test_loss': 1.4006143808364868, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5158, 'eval_test_samples_per_second': 221.363, 'eval_test_steps_per_second': 0.544, 'epoch': 666.67}
{'loss': 0.0091, 'learning_rate': 1.8750000000000002e-05, 'epoch': 694.44}
{'eval_test_loss': 1.4297248125076294, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5505, 'eval_test_samples_per_second': 219.981, 'eval_test_steps_per_second': 0.54, 'epoch': 694.44}
{'loss': 0.0091, 'learning_rate': 1.75e-05, 'epoch': 722.22}
{'eval_test_loss': 1.4137226343154907, 'eval_test_accuracy': 0.5945945945945946, 'eval_test_runtime': 5.5458, 'eval_test_samples_per_second': 220.168, 'eval_test_steps_per_second': 0.541, 'epoch': 722.22}
{'loss': 0.0083, 'learning_rate': 1.6250000000000002e-05, 'epoch': 750.0}
{'eval_test_loss': 1.4431531429290771, 'eval_test_accuracy': 0.597051597051597, 'eval_test_runtime': 5.5794, 'eval_test_samples_per_second': 218.841, 'eval_test_steps_per_second': 0.538, 'epoch': 750.0}
{'loss': 0.0083, 'learning_rate': 1.5e-05, 'epoch': 777.78}
{'eval_test_loss': 1.4453905820846558, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.97, 'eval_test_steps_per_second': 0.54, 'epoch': 777.78}
{'loss': 0.0076, 'learning_rate': 1.3750000000000002e-05, 'epoch': 805.56}
{'eval_test_loss': 1.448009967803955, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5691, 'eval_test_samples_per_second': 219.245, 'eval_test_steps_per_second': 0.539, 'epoch': 805.56}
{'loss': 0.0072, 'learning_rate': 1.25e-05, 'epoch': 833.33}
{'eval_test_loss': 1.4657503366470337, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5663, 'eval_test_samples_per_second': 219.357, 'eval_test_steps_per_second': 0.539, 'epoch': 833.33}
{'loss': 0.0073, 'learning_rate': 1.125e-05, 'epoch': 861.11}
{'eval_test_loss': 1.4750993251800537, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5357, 'eval_test_samples_per_second': 220.569, 'eval_test_steps_per_second': 0.542, 'epoch': 861.11}
{'loss': 0.0067, 'learning_rate': 1e-05, 'epoch': 888.89}
{'eval_test_loss': 1.4982140064239502, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5706, 'eval_test_samples_per_second': 219.188, 'eval_test_steps_per_second': 0.539, 'epoch': 888.89}
{'loss': 0.007, 'learning_rate': 8.75e-06, 'epoch': 916.67}
{'eval_test_loss': 1.4590543508529663, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.546, 'eval_test_samples_per_second': 220.16, 'eval_test_steps_per_second': 0.541, 'epoch': 916.67}
{'loss': 0.0062, 'learning_rate': 7.5e-06, 'epoch': 944.44}
{'eval_test_loss': 1.4887970685958862, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.562, 'eval_test_samples_per_second': 219.525, 'eval_test_steps_per_second': 0.539, 'epoch': 944.44}
{'loss': 0.0061, 'learning_rate': 6.25e-06, 'epoch': 972.22}
{'eval_test_loss': 1.506649136543274, 'eval_test_accuracy': 0.6011466011466011, 'eval_test_runtime': 5.5727, 'eval_test_samples_per_second': 219.105, 'eval_test_steps_per_second': 0.538, 'epoch': 972.22}
{'loss': 0.0061, 'learning_rate': 5e-06, 'epoch': 1000.0}
{'eval_test_loss': 1.504927158355713, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5313, 'eval_test_samples_per_second': 220.743, 'eval_test_steps_per_second': 0.542, 'epoch': 1000.0}
{'loss': 0.0059, 'learning_rate': 3.75e-06, 'epoch': 1027.78}
{'eval_test_loss': 1.4991811513900757, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.5587, 'eval_test_samples_per_second': 219.655, 'eval_test_steps_per_second': 0.54, 'epoch': 1027.78}
{'loss': 0.0058, 'learning_rate': 2.5e-06, 'epoch': 1055.56}
{'eval_test_loss': 1.5103862285614014, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5427, 'eval_test_samples_per_second': 220.29, 'eval_test_steps_per_second': 0.541, 'epoch': 1055.56}
{'loss': 0.0058, 'learning_rate': 1.25e-06, 'epoch': 1083.33}
{'eval_test_loss': 1.5154839754104614, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5456, 'eval_test_samples_per_second': 220.175, 'eval_test_steps_per_second': 0.541, 'epoch': 1083.33}
{'loss': 0.0058, 'learning_rate': 0.0, 'epoch': 1111.11}
{'eval_test_loss': 1.5173912048339844, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5573, 'eval_test_samples_per_second': 219.711, 'eval_test_steps_per_second': 0.54, 'epoch': 1111.11}
{'train_runtime': 33468.0187, 'train_samples_per_second': 305.964, 'train_steps_per_second': 0.299, 'train_loss': 0.2646430722594261, 'epoch': 1111.11}
