google-research / distilling-step-by-step
License: Apache License 2.0
Have you tried other values?
Hello,
I ran the following command, and training starts successfully:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type task_prefix --label_type gt --llm palm --alpha 0.5 --batch_size 64
However, I get 'eval_test_loss': nan, and the ckpt folder is empty after training.
Do you have any advice on this? Also, do you have an eval script?
Hello,
I'm interested in the few-shot examples you used when generating rationales with the LLMs, specifically PaLM and GPT-NeoX-20B. Could you kindly share them with the community?
Thank you!
D:\Users\MFL\anaconda3\envs\distill\pythonw.exe C:\Users\MFL\Desktop\distilling-step-by-step-main\distilling-step-by-step-main\run.py
Traceback (most recent call last):
  File "C:\Users\MFL\Desktop\distilling-step-by-step-main\distilling-step-by-step-main\run.py", line 18, in <module>
    from datasets import DatasetDict, concatenate_datasets
  File "D:\Users\MFL\anaconda3\envs\distill\lib\site-packages\datasets\__init__.py", line 24, in <module>
    import pyarrow
  File "D:\Users\MFL\anaconda3\envs\distill\lib\site-packages\pyarrow\__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: DLL load failed while importing lib: The specified procedure could not be found.
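One way to narrow down an import failure like this is to import each package in isolation and print the interpreter and platform details next to the error. The sketch below is only a diagnostic aid, not a fix. The helper name `check_import` is hypothetical, and the code assumes nothing about why the DLL fails to load.

```python
import importlib
import platform
import sys


def check_import(module_name):
    """Try to import a module; return (ok, detail) without raising."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Broad catch is deliberate: this is a diagnostic, and a broken
        # install can fail with ImportError ("DLL load failed") or others.
        return False, f"{type(exc).__name__}: {exc}"
    # Not every module defines __version__, so fall back gracefully.
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    print("python  :", sys.version.split()[0], platform.architecture()[0])
    print("platform:", platform.platform())
    for name in ("pyarrow", "datasets"):
        ok, detail = check_import(name)
        print(f"{name:9s}: {'ok' if ok else 'FAILED'} ({detail})")
```

If `pyarrow` fails on its own while the interpreter details look as expected, the problem is probably in the pyarrow installation itself rather than in `run.py` or `datasets`.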
Is it possible to download the distilled/trained models (from google/t5-v1_1-small, google/t5-v1_1-base, google/t5-v1_1-large, google/t5-v1_1-xxl) in order to evaluate them or fine-tune them further?
not found!
Could you please report the numerical results of the experiments? I ran standard finetuning on eight 3090 GPUs with:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type gt --batch_size 64 --grad_steps 2
I only got an accuracy of 60.2% on CQA at the last epoch, but the paper seems to report around 63%.
Here is my training log:
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Bootstrap : Using eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Socket : Using [0]eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Using network Socket
NCCL version 2.10.3+cuda11.3
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00/02 : 0 1 2 3 4 5 6 7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01/02 : 0 1 2 3 4 5 6 7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO comm 0x7f0000002fb0 rank 3 nranks 8 cudaDev 3 busId a0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO comm 0x7efff8002fb0 rank 5 nranks 8 cudaDev 5 busId c0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO comm 0x7f0008002fb0 rank 1 nranks 8 cudaDev 1 busId 80 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO comm 0x7efff4002fb0 rank 4 nranks 8 cudaDev 4 busId b0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO comm 0x7effec002fb0 rank 6 nranks 8 cudaDev 6 busId d0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO comm 0x7efff0002fb0 rank 7 nranks 8 cudaDev 7 busId e0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO comm 0x7f0004002fb0 rank 0 nranks 8 cudaDev 0 busId 70 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO comm 0x7efffc002fb0 rank 2 nranks 8 cudaDev 2 busId 90 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Launch mode Parallel
{'loss': 7.6906, 'learning_rate': 4.875e-05, 'epoch': 27.78}
{'eval_test_loss': 0.8547041416168213, 'eval_test_accuracy': 0.2457002457002457, 'eval_test_runtime': 5.5955, 'eval_test_samples_per_second': 218.211, 'eval_test_steps_per_second': 0.536, 'epoch': 27.78}
{'loss': 1.098, 'learning_rate': 4.75e-05, 'epoch': 55.56}
{'eval_test_loss': 0.5050153136253357, 'eval_test_accuracy': 0.5552825552825553, 'eval_test_runtime': 5.5448, 'eval_test_samples_per_second': 220.205, 'eval_test_steps_per_second': 0.541, 'epoch': 55.56}
{'loss': 0.4848, 'learning_rate': 4.6250000000000006e-05, 'epoch': 83.33}
{'eval_test_loss': 0.5543485879898071, 'eval_test_accuracy': 0.5954135954135954, 'eval_test_runtime': 5.569, 'eval_test_samples_per_second': 219.248, 'eval_test_steps_per_second': 0.539, 'epoch': 83.33}
{'loss': 0.2971, 'learning_rate': 4.5e-05, 'epoch': 111.11}
{'eval_test_loss': 0.6299827098846436, 'eval_test_accuracy': 0.6036036036036037, 'eval_test_runtime': 5.5232, 'eval_test_samples_per_second': 221.068, 'eval_test_steps_per_second': 0.543, 'epoch': 111.11}
{'loss': 0.196, 'learning_rate': 4.375e-05, 'epoch': 138.89}
{'eval_test_loss': 0.7029837369918823, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.458, 'eval_test_steps_per_second': 0.539, 'epoch': 138.89}
{'loss': 0.1373, 'learning_rate': 4.25e-05, 'epoch': 166.67}
{'eval_test_loss': 0.7832159399986267, 'eval_test_accuracy': 0.6126126126126126, 'eval_test_runtime': 5.5722, 'eval_test_samples_per_second': 219.125, 'eval_test_steps_per_second': 0.538, 'epoch': 166.67}
{'loss': 0.1015, 'learning_rate': 4.125e-05, 'epoch': 194.44}
{'eval_test_loss': 0.8421533703804016, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5786, 'eval_test_samples_per_second': 218.873, 'eval_test_steps_per_second': 0.538, 'epoch': 194.44}
{'loss': 0.0771, 'learning_rate': 4e-05, 'epoch': 222.22}
{'eval_test_loss': 0.9177669882774353, 'eval_test_accuracy': 0.6183456183456183, 'eval_test_runtime': 5.5181, 'eval_test_samples_per_second': 221.273, 'eval_test_steps_per_second': 0.544, 'epoch': 222.22}
{'loss': 0.0607, 'learning_rate': 3.875e-05, 'epoch': 250.0}
{'eval_test_loss': 0.9690037369728088, 'eval_test_accuracy': 0.6134316134316135, 'eval_test_runtime': 5.5698, 'eval_test_samples_per_second': 219.217, 'eval_test_steps_per_second': 0.539, 'epoch': 250.0}
{'loss': 0.0497, 'learning_rate': 3.7500000000000003e-05, 'epoch': 277.78}
{'eval_test_loss': 1.0180637836456299, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.973, 'eval_test_steps_per_second': 0.54, 'epoch': 277.78}
{'loss': 0.0408, 'learning_rate': 3.625e-05, 'epoch': 305.56}
{'eval_test_loss': 1.040199875831604, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.573, 'eval_test_samples_per_second': 219.091, 'eval_test_steps_per_second': 0.538, 'epoch': 305.56}
{'loss': 0.0348, 'learning_rate': 3.5e-05, 'epoch': 333.33}
{'eval_test_loss': 1.1167311668395996, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5648, 'eval_test_samples_per_second': 219.416, 'eval_test_steps_per_second': 0.539, 'epoch': 333.33}
{'loss': 0.0292, 'learning_rate': 3.375000000000001e-05, 'epoch': 361.11}
{'eval_test_loss': 1.1364021301269531, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.459, 'eval_test_steps_per_second': 0.539, 'epoch': 361.11}
{'loss': 0.0258, 'learning_rate': 3.2500000000000004e-05, 'epoch': 388.89}
{'eval_test_loss': 1.1679093837738037, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5548, 'eval_test_samples_per_second': 219.81, 'eval_test_steps_per_second': 0.54, 'epoch': 388.89}
{'loss': 0.023, 'learning_rate': 3.125e-05, 'epoch': 416.67}
{'eval_test_loss': 1.205809473991394, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.575, 'eval_test_samples_per_second': 219.014, 'eval_test_steps_per_second': 0.538, 'epoch': 416.67}
{'loss': 0.0201, 'learning_rate': 3e-05, 'epoch': 444.44}
{'eval_test_loss': 1.2262288331985474, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5226, 'eval_test_samples_per_second': 221.091, 'eval_test_steps_per_second': 0.543, 'epoch': 444.44}
{'loss': 0.0182, 'learning_rate': 2.8749999999999997e-05, 'epoch': 472.22}
{'eval_test_loss': 1.2057785987854004, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5463, 'eval_test_samples_per_second': 220.146, 'eval_test_steps_per_second': 0.541, 'epoch': 472.22}
{'loss': 0.0157, 'learning_rate': 2.7500000000000004e-05, 'epoch': 500.0}
{'eval_test_loss': 1.2767386436462402, 'eval_test_accuracy': 0.6093366093366094, 'eval_test_runtime': 5.5533, 'eval_test_samples_per_second': 219.867, 'eval_test_steps_per_second': 0.54, 'epoch': 500.0}
{'loss': 0.0149, 'learning_rate': 2.625e-05, 'epoch': 527.78}
{'eval_test_loss': 1.3246893882751465, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.6071, 'eval_test_samples_per_second': 217.759, 'eval_test_steps_per_second': 0.535, 'epoch': 527.78}
{'loss': 0.0133, 'learning_rate': 2.5e-05, 'epoch': 555.56}
{'eval_test_loss': 1.3044090270996094, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5897, 'eval_test_samples_per_second': 218.437, 'eval_test_steps_per_second': 0.537, 'epoch': 555.56}
{'loss': 0.0124, 'learning_rate': 2.375e-05, 'epoch': 583.33}
{'eval_test_loss': 1.3567758798599243, 'eval_test_accuracy': 0.6085176085176085, 'eval_test_runtime': 5.5682, 'eval_test_samples_per_second': 219.28, 'eval_test_steps_per_second': 0.539, 'epoch': 583.33}
{'loss': 0.0116, 'learning_rate': 2.25e-05, 'epoch': 611.11}
{'eval_test_loss': 1.3604899644851685, 'eval_test_accuracy': 0.6060606060606061, 'eval_test_runtime': 5.5799, 'eval_test_samples_per_second': 218.819, 'eval_test_steps_per_second': 0.538, 'epoch': 611.11}
{'loss': 0.011, 'learning_rate': 2.125e-05, 'epoch': 638.89}
{'eval_test_loss': 1.3682199716567993, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5523, 'eval_test_samples_per_second': 219.907, 'eval_test_steps_per_second': 0.54, 'epoch': 638.89}
{'loss': 0.0099, 'learning_rate': 2e-05, 'epoch': 666.67}
{'eval_test_loss': 1.4006143808364868, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5158, 'eval_test_samples_per_second': 221.363, 'eval_test_steps_per_second': 0.544, 'epoch': 666.67}
{'loss': 0.0091, 'learning_rate': 1.8750000000000002e-05, 'epoch': 694.44}
{'eval_test_loss': 1.4297248125076294, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5505, 'eval_test_samples_per_second': 219.981, 'eval_test_steps_per_second': 0.54, 'epoch': 694.44}
{'loss': 0.0091, 'learning_rate': 1.75e-05, 'epoch': 722.22}
{'eval_test_loss': 1.4137226343154907, 'eval_test_accuracy': 0.5945945945945946, 'eval_test_runtime': 5.5458, 'eval_test_samples_per_second': 220.168, 'eval_test_steps_per_second': 0.541, 'epoch': 722.22}
{'loss': 0.0083, 'learning_rate': 1.6250000000000002e-05, 'epoch': 750.0}
{'eval_test_loss': 1.4431531429290771, 'eval_test_accuracy': 0.597051597051597, 'eval_test_runtime': 5.5794, 'eval_test_samples_per_second': 218.841, 'eval_test_steps_per_second': 0.538, 'epoch': 750.0}
{'loss': 0.0083, 'learning_rate': 1.5e-05, 'epoch': 777.78}
{'eval_test_loss': 1.4453905820846558, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.97, 'eval_test_steps_per_second': 0.54, 'epoch': 777.78}
{'loss': 0.0076, 'learning_rate': 1.3750000000000002e-05, 'epoch': 805.56}
{'eval_test_loss': 1.448009967803955, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5691, 'eval_test_samples_per_second': 219.245, 'eval_test_steps_per_second': 0.539, 'epoch': 805.56}
{'loss': 0.0072, 'learning_rate': 1.25e-05, 'epoch': 833.33}
{'eval_test_loss': 1.4657503366470337, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5663, 'eval_test_samples_per_second': 219.357, 'eval_test_steps_per_second': 0.539, 'epoch': 833.33}
{'loss': 0.0073, 'learning_rate': 1.125e-05, 'epoch': 861.11}
{'eval_test_loss': 1.4750993251800537, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5357, 'eval_test_samples_per_second': 220.569, 'eval_test_steps_per_second': 0.542, 'epoch': 861.11}
{'loss': 0.0067, 'learning_rate': 1e-05, 'epoch': 888.89}
{'eval_test_loss': 1.4982140064239502, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5706, 'eval_test_samples_per_second': 219.188, 'eval_test_steps_per_second': 0.539, 'epoch': 888.89}
{'loss': 0.007, 'learning_rate': 8.75e-06, 'epoch': 916.67}
{'eval_test_loss': 1.4590543508529663, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.546, 'eval_test_samples_per_second': 220.16, 'eval_test_steps_per_second': 0.541, 'epoch': 916.67}
{'loss': 0.0062, 'learning_rate': 7.5e-06, 'epoch': 944.44}
{'eval_test_loss': 1.4887970685958862, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.562, 'eval_test_samples_per_second': 219.525, 'eval_test_steps_per_second': 0.539, 'epoch': 944.44}
{'loss': 0.0061, 'learning_rate': 6.25e-06, 'epoch': 972.22}
{'eval_test_loss': 1.506649136543274, 'eval_test_accuracy': 0.6011466011466011, 'eval_test_runtime': 5.5727, 'eval_test_samples_per_second': 219.105, 'eval_test_steps_per_second': 0.538, 'epoch': 972.22}
{'loss': 0.0061, 'learning_rate': 5e-06, 'epoch': 1000.0}
{'eval_test_loss': 1.504927158355713, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5313, 'eval_test_samples_per_second': 220.743, 'eval_test_steps_per_second': 0.542, 'epoch': 1000.0}
{'loss': 0.0059, 'learning_rate': 3.75e-06, 'epoch': 1027.78}
{'eval_test_loss': 1.4991811513900757, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.5587, 'eval_test_samples_per_second': 219.655, 'eval_test_steps_per_second': 0.54, 'epoch': 1027.78}
{'loss': 0.0058, 'learning_rate': 2.5e-06, 'epoch': 1055.56}
{'eval_test_loss': 1.5103862285614014, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5427, 'eval_test_samples_per_second': 220.29, 'eval_test_steps_per_second': 0.541, 'epoch': 1055.56}
{'loss': 0.0058, 'learning_rate': 1.25e-06, 'epoch': 1083.33}
{'eval_test_loss': 1.5154839754104614, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5456, 'eval_test_samples_per_second': 220.175, 'eval_test_steps_per_second': 0.541, 'epoch': 1083.33}
{'loss': 0.0058, 'learning_rate': 0.0, 'epoch': 1111.11}
{'eval_test_loss': 1.5173912048339844, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5573, 'eval_test_samples_per_second': 219.711, 'eval_test_steps_per_second': 0.54, 'epoch': 1111.11}
{'train_runtime': 33468.0187, 'train_samples_per_second': 305.964, 'train_steps_per_second': 0.299, 'train_loss': 0.2646430722594261, 'epoch': 1111.11}
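The metric lines in a log like this can be summarized programmatically. The sketch below parses the dictionary-style lines with `ast.literal_eval`, picks the evaluation entry with the highest `eval_test_accuracy`, and derives the samples processed per optimizer step from the final `train_*` line. The helper names are hypothetical, and only a few log lines, trimmed to the fields used here, are reproduced as sample input. It reads back what was printed and does not explain the gap to the paper's number.

```python
import ast

# A few lines from the training log, trimmed to the fields used below.
SAMPLE_LOG = """\
{'eval_test_loss': 0.6299827098846436, 'eval_test_accuracy': 0.6036036036036037, 'epoch': 111.11}
{'loss': 0.0771, 'learning_rate': 4e-05, 'epoch': 222.22}
{'eval_test_loss': 0.9177669882774353, 'eval_test_accuracy': 0.6183456183456183, 'epoch': 222.22}
{'eval_test_loss': 1.5173912048339844, 'eval_test_accuracy': 0.601965601965602, 'epoch': 1111.11}
{'train_runtime': 33468.0187, 'train_samples_per_second': 305.964, 'train_steps_per_second': 0.299, 'epoch': 1111.11}
"""


def parse_records(text):
    """Return the dict-literal lines of a log as Python dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("{") and line.endswith("}")):
            continue  # skip NCCL output and other non-metric lines
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # e.g. a line containing a bare nan is not a literal
        if isinstance(record, dict):
            records.append(record)
    return records


def best_eval(records, key="eval_test_accuracy"):
    """Return the evaluation record with the highest value for `key`."""
    evals = [r for r in records if key in r]
    return max(evals, key=lambda r: r[key]) if evals else None


def samples_per_step(records):
    """Estimate samples per optimizer step from the training summary line."""
    for r in records:
        if "train_samples_per_second" in r and "train_steps_per_second" in r:
            return r["train_samples_per_second"] / r["train_steps_per_second"]
    return None


if __name__ == "__main__":
    recs = parse_records(SAMPLE_LOG)
    print("best eval:", best_eval(recs))
    print("samples per step:", samples_per_step(recs))
```

On the sample lines, the best entry is the one at epoch 222.22 (accuracy about 0.618), while the final entry is about 0.602. The samples-per-step ratio comes out near 1023, which is close to 64 × 8 × 2 = 1024 if `--batch_size` is applied per GPU. That interpretation is an assumption, not something the log states.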
I am looking at the files inside the llm folder of anli1. val_cot_0 has more than 1,400 samples (I counted the occurrences of "so the answer is" in the file), while the validation set without rationales has 1,000 samples. Why is there a difference?
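A raw count of the phrase can differ from the number of examples if some generations contain it more than once, for instance when a generation continues past its first answer. Whether that happens in these files is an assumption to verify, not a confirmed explanation. The sketch below compares the two counts on a hypothetical list of generated strings. The data and the helper name `phrase_stats` are made up for illustration and do not reflect the actual file format.

```python
def phrase_stats(generations, phrase="so the answer is"):
    """Return (num_samples, total_occurrences, samples_with_repeats)."""
    needle = phrase.lower()
    # Case-insensitive count of the phrase inside each generation.
    counts = [text.lower().count(needle) for text in generations]
    repeats = sum(1 for c in counts if c > 1)
    return len(generations), sum(counts), repeats


# Hypothetical generations, not taken from the repository.
examples = [
    "The premise says X. So the answer is entailment.",
    "Y contradicts Z. So the answer is contradiction.\n\n"
    "Premise: a second example. So the answer is neutral.",
    "No rationale produced.",
]

if __name__ == "__main__":
    print(phrase_stats(examples))
```

If the third number is nonzero on the real file, counting occurrences overstates the number of samples, and counting entries would be the more reliable comparison.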