Hi, I have a question about the GLUE task, MNLI. As you know, MNLI h


[P1] MNLI has two validation set, how do you report the score

BaohaoLiao commented on May 17, 2024

[P1] MNLI has two validation set, how do you report the score


Comments (3)

frankaging commented on May 17, 2024 1

@BaohaoLiao Hey, thanks for your question! For the MNLI dataset, we use the validation_matched split for both validation and testing. (I will make this clear in the next revision. I think the RED paper was not clear on this either, so I figured it out by emailing the authors! I might also summarize what the RED paper's appendix says in the ReFT paper, so that it is self-contained about the validation setup and the evaluation metric (accuracy, correlation, etc.).)

To reproduce, here is an example script for RoBERTa-base. For RoBERTa-large, you can copy the hyperparameters from our appendix to reproduce:

python train.py -task glue \
-train_dataset mnli \
-model FacebookAI/roberta-base \
-seed 42 -l all -r 1 -p f1 -e 40 -lr 6e-4 \
-type LoreftIntervention \
-gradient_accumulation_steps 1 \
-batch_size 32 \
-eval_batch_size 32 \
-test_split validation_matched \
-max_length 256 \
--metric_for_best_model accuracy \
--dropout 0.05 \
--weight_decay 0.0000 \
--warmup_ratio 0.00 \
--logging_steps 20 \
--allow_cls_grad

Use the seeds {42, 43, 44, 45, 46}. For the validation set partition, please refer to our code for details. Basically, we split off part of the validation set (a random partition based on the seed) for selecting the best model, and report the final accuracy on the held-out remainder.
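For readers who want a feel for the seed-based partition before reading the repository, here is a minimal sketch. It is not the pyreft code: the function name `split_validation`, the `dev_fraction` parameter, and the 50/50 ratio are all assumptions for illustration, and the example size is a placeholder rather than the real size of validation_matched. Please check the actual code for the exact ratio and procedure.

```python
import random


def split_validation(indices, seed, dev_fraction=0.5):
    """Deterministically split example indices into (dev, held_out).

    dev_fraction is a hypothetical parameter; the thread does not state
    the ratio used in the actual experiments.
    """
    rng = random.Random(seed)      # local RNG, so global state is untouched
    shuffled = list(indices)
    rng.shuffle(shuffled)          # same seed always gives the same order
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder size for illustration only.
n_examples = 1000
for seed in (42, 43, 44, 45, 46):
    dev, held_out = split_validation(range(n_examples), seed)
    # dev      -> used to pick the best checkpoint
    # held_out -> used only to report the final accuracy
    print(seed, len(dev), len(held_out))
```

Because the shuffle depends only on the seed, rerunning with the same seed reproduces the same model-selection and held-out sets, while different seeds give different partitions.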

Please let me know if you have other questions! And feel free to close the ticket if you feel like your question is addressed.

Thanks for your interest!


frankaging commented on May 17, 2024 1

I am also attaching the GLUE benchmark description that will be added to the Appendix to provide more details. Please also see Appendix A.1 of the RED paper for the original implementation (I basically paraphrased their setup description, so credit goes to them).


BaohaoLiao commented on May 17, 2024

Thank you very much for your timely help.
