- Git clone lm-evaluation-harness in this directory:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
```
- Create `know_dist.yaml` in `lm-evaluation-harness/lm_eval/tasks/benchmarks` with the following contents:
```yaml
group: know_dist
task:
  - headqa_en
  - headqa_es
  - paws_en
  - paws_es
  - paws_zh
  - xnli_en
  - xnli_es
  - xnli_zh
  - xstorycloze_en
  - xstorycloze_es
  - xstorycloze_zh
  - lambada_openai_mt_en
  - lambada_openai_mt_es
```
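To check that the new group is registered (a hedged example; in recent harness versions `lm_eval --tasks list` prints all available tasks and groups):

```bash
# list registered tasks/groups and look for the new group
lm_eval --tasks list | grep know_dist
```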
- Run the install.sh script in your environment.
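For example (a minimal sketch assuming a plain Python virtual environment; adapt this to your cluster's module or conda setup):

```bash
# create and activate an environment, then run the repo's install script
python -m venv .venv
source .venv/bin/activate
bash install.sh
```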
- Change directory to `jobs/` and submit the job script for your model with `sbatch`.
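For example (the script name `eval_bloom-3B.sh` is hypothetical; use whichever job script matches your model):

```bash
cd jobs
sbatch eval_bloom-3B.sh
```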
Note: if you need to change the job script or create a new one, there are three things that are crucial to set (see the example script below):
- model_name: the name the results will be saved under, mostly for properly keeping track of which model is which.
- model_path: where the model is located. This is the Hugging Face `pretrained_model_name_or_path`, so you can also pass something like `bigscience/bloom-3B` to load a model from Hugging Face.
- output_path: location where the results will be saved.
You might also want to change the SBATCH job-name, as it makes it much easier to find the right error and output log files if you name the job after the model. Most other things should not be touched, except for other SBATCH settings in case you don't have red queue access or something similar. Each evaluation job script should run for roughly 30 min.
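A minimal sketch of what such a job script could look like (the logging lines and the `lm_eval` invocation are assumptions; copy the real queue, resource, and environment boilerplate from an existing script in `jobs/`):

```bash
#!/bin/bash
#SBATCH --job-name=eval_bloom-3B      # name the job after the model to find its logs easily
#SBATCH --time=00:30:00               # each evaluation job should run for roughly 30 min
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
# Queue/partition, resource, and environment lines from the existing job scripts go here.

# The three values that must be adapted per model:
model_name="bloom-3B"                     # label the results are stored under
model_path="bigscience/bloom-3B"          # local path or Hugging Face pretrained_model_name_or_path
output_path="results/${model_name}"       # where the results will be saved

# Assumed invocation of the lm-evaluation-harness CLI; match it to the existing job scripts.
lm_eval --model hf \
    --model_args pretrained="${model_path}" \
    --tasks know_dist \
    --output_path "${output_path}"
```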