
tk-instruct's Introduction

Tk-Instruct

  • This repo releases our implementation for the Tk-Instruct model in the Super-NaturalInstructions paper.
  • Tk-Instruct is a preliminary attempt towards general-purpose AI that can solve many NLP tasks by following in-context instructions (plain language task definitions or k-shot examples).
  • It is built on top of the pretrained T5 model and finetuned on our data.
  • You can play with the 11B model via our online demo!

Requirements

Our main experiments and analysis are conducted on the following environment:

  • CUDA (11.3)
  • cuDNN (8.2.0.53)
  • Pytorch (1.10.0)
  • Transformers (4.17.0)
  • DeepSpeed

You can refer to the Dockerfile for setting up the environment, and install the required Python libraries by running

pip install -r requirements.txt

Note: after the main exploration with the 3B model, we trained our 11B model on TPUs using the T5 code here.

Data

Our models are trained and evaluated on Super-NaturalInstructions, which can be cloned by running:

git clone git@github.com:allenai/natural-instructions.git data

Since Super-NaturalInstructions doesn't provide an official split for the development set, you need to manually create a dev_tasks.txt in the data/splits/default folder in order to do evaluation during training. We found it unclear what a meaningful validation set should be under such a cross-task generalization setting. You can use a part of the training tasks for validation, or set apart tasks in some categories for validation; see the sketch below.
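
For example, here is a minimal way to create such a split, assuming (as with the other files under data/splits/default) that the split file simply lists one task name per line; the number of held-out tasks below is arbitrary:

# Hold out the last 20 training tasks as a makeshift dev split (illustrative only).
# You may also want to remove these tasks from train_tasks.txt to avoid overlap.
tail -n 20 data/splits/default/train_tasks.txt > data/splits/default/dev_tasks.txt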

If you want to use the T5 code here, you can convert the data into the text2text format by running scripts/convert_data_to_s2s.sh:
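
./scripts/convert_data_to_s2s.sh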

Training

A sample script for training the Tk-Instruct 3B model in our paper can be found at scripts/train_tk_instruct.sh. You can run it as follows:

./scripts/train_tk_instruct.sh

Alternatively, if you are familiar with Beaker, you can refer to beaker_configs/default_experiment.yaml for a sample experiment config, and modify src/create_exps.py to easily start a set of experiments by running:

python src/create_exps.py

Released Checkpoints

Our 3B and 11B model checkpoints are accessible via the Hugging Face Hub. You can load them easily using the Transformers library:

>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("allenai/tk-instruct-3b-def")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("allenai/tk-instruct-3b-def")

>>> input_ids = tokenizer.encode(
        "Definition: return the currency of the given country. Now complete the following example - Input: India. Output:", 
        return_tensors="pt"
    )
>>> output = model.generate(input_ids, max_length=10)
>>> output = tokenizer.decode(output[0], skip_special_tokens=True)

The model should generate 'Indian Rupee' as the output.

Evaluation

The following script evaluates our 3B Tk-Instruct model that uses task definition + 2 positive examples as instructions:

./scripts/eval_tk_instruct.sh

This should give you a ROUGE-L score of ~54.0, as reported in Table 3 of our paper.

You can also evaluate other models under different instruction encodings. Whether to include the task definition or explanations, and the number of positive/negative examples, can be controlled via the arguments of src/run_s2s.py; see the sketch below.
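
For example, here is a sketch of a prediction-only run that encodes each task as the definition plus 2 positive examples. The flag names are taken from scripts/train_tk_instruct.sh; the model name, output directory, and batch size are placeholders that you should adjust for your setup:

python src/run_s2s.py \
    --do_predict \
    --predict_with_generate \
    --model_name_or_path allenai/tk-instruct-3b-def-pos \
    --max_source_length 1024 \
    --max_target_length 128 \
    --add_task_name False \
    --add_task_definition True \
    --num_pos_examples 2 \
    --num_neg_examples 0 \
    --add_explanation False \
    --data_dir data/splits/default \
    --task_dir data/tasks \
    --output_dir output/tk-instruct-3b-def-pos-eval \
    --overwrite_output_dir \
    --per_device_eval_batch_size 1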

The numbers for heuristic baselines and GPT3 can be reproduced by using the following scripts:

./scripts/run_heuristics.sh
./scripts/run_gpt3.sh

Model Predictions and Performance

The predictions of our tested models can be found in the output folder. You can evaluate each prediction file as follows:

python src/compute_metrics.py --predictions output/default/tk-instruct-3b-def-pos/predicted_examples.jsonl --track default --compute_per_category_metrics
python src/compute_metrics.py --predictions output/xlingual/mtk-instruct-3b-def-pos/predicted_examples.jsonl --track xlingual --compute_per_category_metrics

Here are the performance numbers (in ROUGE-L) for our tested models:

Models                                               Default Track (en)    X-lingual Track
Heuristic Baselines        Copying Instance Input    14.20                 5.44
                           Copying Demo. Output      28.54                 50.31
Pretrained LMs             T5-LM (11B)               30.16                 -
                           GPT3 (175B)               45.05                 51.20
Instruction-tuned Models   T0 (11B)                  32.28                 -
                           GPT3-Instruct (175B)      52.06                 53.74
                           Tk-Instruct (Ours, 3B)    54.33                 -
                           Tk-Instruct (Ours, 11B)   60.07                 -
                           mTk-Instruct (Ours, 3B)   -                     56.72

Note that these numbers might differ from the numbers reported in our arXiv paper, because we 1) resampled our evaluation instances and 2) updated our evaluation script. We will update the paper once allowed.

We will keep adding the predictions and performance of new models into this repository.

Citation

@inproceedings{supernaturalinstructions,
  title={Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks},
  author={Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Arunkumar, Anjana and Ashok, Arjun and Dhanasekaran, Arut Selvan and Naik, Atharva and Stap, David and others},
  booktitle={EMNLP},
  year={2022}
}

tk-instruct's People

Contributors

yizhongw


tk-instruct's Issues

Fine tune of Multi-news dataset

Hello,

Could we finetune (few-shot, e.g., 200 steps) or prompt-tune (e.g., soft prompt tuning) the model on a specific dataset? Do you have any code for these tasks?
Thanks

About the Evaluation Metrics

Hi, your work is very cool and I really like it. I have a question about the evaluation metrics you adopted. compute_metrics.py reports both Exact Match (EM) and ROUGE-L scores, but only the ROUGE-L results are reported in the paper. I am confused about this. Could you give me some insight? Thanks.

Results reported by compute_metrics.py: [screenshot]

Results reported by the paper: [screenshot]

The following signatures couldn't be verified because the public key is not available:

I used Ubuntu 18.04 and ran:
docker build -t tk_instruct:v1 .
but it failed:

W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC

E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
The command '/bin/sh -c apt-get -y update' returned a non-zero code: 100

How to manually create a dev_tasks.txt

I ran ./scripts/train_tk_instruct.sh, but it cannot find dev_tasks.txt. I wonder whether it is required.

Traceback (most recent call last):
  File "src/run_s2s.py", line 581, in <module>
    main()
  File "src/run_s2s.py", line 317, in main
    max_num_instances_per_eval_task=data_args.max_num_instances_per_eval_task
  File "/opt/conda/lib/python3.7/site-packages/datasets/load.py", line 1699, in load_dataset
    use_auth_token=use_auth_token,
  File "/opt/conda/lib/python3.7/site-packages/datasets/builder.py", line 596, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
  File "/opt/conda/lib/python3.7/site-packages/datasets/builder.py", line 690, in _download_and_prepare
    ) from None
OSError: Cannot find data file.
Original error:
[Errno 2] No such file or directory: 'data/splits/default/dev_tasks.txt'

question about the train learning rate

Hi, I appreciate the work. I have two questions:

  1. From the code (https://github.com/yizhongw/Tk-Instruct/blob/main/scripts/train_tk_instruct.sh#L34), the learning rate is 5e-05. Why so small? The learning rate in BigScience T0 (https://arxiv.org/pdf/2110.08207.pdf) is 1e-3. Why is there such a big difference? Have you tried a larger learning rate? If I collect 40 million labeled examples and continue multi-task finetuning from Tk-Instruct, what learning rate should I use?

  2. The lr_scheduler_type is set to constant; why not use linear (warm up from min LR to max LR over warmup_num_steps steps, then decay at a linear rate over the remaining training steps)?

The `max_num_instances_per_task` Parameter in Experiments

Hi, thank you for your great work!

Apologies if I misread the paper. When training Tk-Instruct, did you use max_num_instances_per_task=100 or max_num_instances_per_task=64 to report the main results in Table 3 and Figure 4? I cannot find anywhere in the paper how many training instances per task were used. Figure 5(b) mentions that 64 instances per task give the best results, but in the code the default setting is 100, which confuses me.

Thank you!

How to debug deepspeed in vscode?

Dear authors,

I ran scripts/train_tk_instruct.sh successfully, but I don't know how to debug your code with DeepSpeed in VS Code.

For example, you use the command deepspeed --master_port $port src/run_s2s.py **args, but how can we set this up in the launch.json of VS Code?

I tried to write the launch.json like below:

{   "cwd": "/home/xxx/code/project/",
    "env": {"CUDA_VISIBLE_DEVICES":"7",
            "CUDA_DEVICE_ORDER":"PCI_BUS_ID",
            "TRANSFORMERS_CACHE":"/home/xxx/.cache/huggingface"}, 
    "name": "deepspeed",
    //"type": "python",
    "type": "deepspeed",   // there is no supported interpreter for deepspeed
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "justMyCode": false,
    "args": [
        "--master_port","10086",
        "--do_train",
        "--do_predict",
        "--predict_with_generate"
    ]
}

But it doesn't work, because there is no supported interpreter type for deepspeed in VS Code.

Cannot load `tk-instruct-11b-def` with Huggingface transformers

Hi there, it seems like the 11B-def model cannot be loaded with Huggingface's transformers library, because the pytorch_model.bin file is missing.

I've used the following code (which works fine if we replace 11b with 3b):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/tk-instruct-11b-def")

which results in this error:

Traceback (most recent call last):
  File "/transformers/src/transformers/modeling_utils.py", line 1359, in from_pretrained
    resolved_archive_file = cached_path(
  File "/transformers/src/transformers/file_utils.py", line 1938, in cached_path
    output_path = get_from_cache(
  File "/transformers/src/transformers/file_utils.py", line 2142, in get_from_cache
    _raise_for_status(r)
  File "/transformers/src/transformers/file_utils.py", line 2065, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.file_utils.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/allenai/tk-instruct-11b-def/resolve/main/pytorch_model.bin

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/transformers/src/transformers/models/auto/auto_factory.py", line 447, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/transformers/src/transformers/modeling_utils.py", line 1404, in from_pretrained
    raise EnvironmentError(
OSError: allenai/tk-instruct-11b-def does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Unable to reproduce Tk-Instruct predictions on Natural Instructions test

I'm unable to reproduce the predictions found in Tk-Instruct/output/default/tk-instruct-3b-def-pos/predicted_examples.jsonl using the tk-instruct-3b-def-pos model: The predictions I've computed only match with the provided ones ~60% of the time, resulting in a much lower score of 49 vs 54 reported in the paper.

To give one specific example, for task102-87fdccda3ce94464ba5b247a32fb6d74 the input is cob#corn#eat. I used the provided scripts/convert_data_to_s2s.sh script to convert all examples into linearized inputs. In this particular case, doing so returns:

Definition: In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set. Positive Example 1 - Input: mountain#ski#skier. Output: Skier skis down the mountain. Positive Example 2 - Input: call#character#contain#wallpaper. Output: queen of wallpaper containing a portrait called film character . Now complete the following example - Input: cob#corn#eat. Output:

From what I can tell, this is the correct input (definition + 2 positive examples + input). I used the following code to get a prediction for this input:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("allenai/tk-instruct-3b-def-pos").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("allenai/tk-instruct-3b-def-pos")
tokenizer.batch_decode(model.generate(tokenizer.encode(input, return_tensors="pt").to(model.device)))

where input is the string from above. However, this gives the output cob corn and eat, whereas the expected output (according to the predictions file) would be a man cobs corn and eats it..

I have also directly queried the model on the huggingface hub (you can do so using this link), which also gives cob corn and eat as output.

Why am I not getting the "correct" prediction for this example (and many other examples)?

Low ROUGE scores for Tk-instruct large?

Hi Yizhong,

Thanks for the great work and for making everything public!

I'm trying to reproduce/better understand these results you showed in the paper here:
[screenshot of the figure from the paper]

Looking at this graph it seems like T5 Large 770M should be getting 48.0 ROUGE-L on unseen tasks, am I reading this graph correctly?

Some questions

  1. Is this the allenai/tk-instruct-large-def-pos on huggingface hub?
  2. How can I reproduce this training result? I took scripts/train_tk_instruct.sh and simply swapped in T5-large as the base model instead of T5-XL. But I'm getting substantially lower ROUGE scores (see screenshot). Are there different hyperparameters for this training run?


I notice you said in issue #1 that you found

remarkable gap between the smaller models and the 11B or 3B models in generalizing to new tasks

But the scaling results in the original paper don't seem too bad, i.e. 48 ROUGE vs 54 ROUGE for the 3B model. On the other hand the results I'm getting finetuning T5 Large are indeed substantially worse. So just trying to reconcile things here.

Tokenizer & Model info

From the given line, it appears that the data has been transformed using a gpt2 tokenizer.

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

However, the trained model is the T5 v1.1 LM-adapted checkpoint. Can you confirm that the model was trained with a T5 tokenizer?

Also, all the model cards are the same on the Hugging Face Hub:

allenai/tk-instruct-11b-def-pos-neg-expl, allenai/tk-instruct-11b-def-pos, allenai/tk-instruct-11b-def.

Can you confirm that the task encodings for these models are the following?

  1. allenai/tk-instruct-11b-def-pos-neg-expl : {"add_task_name": False, "add_task_definition": True, "num_pos_examples": 1, "num_neg_examples": 1, "add_explanation": True},
  2. allenai/tk-instruct-11b-def-pos : {"add_task_name": False, "add_task_definition": True, "num_pos_examples": 1, "num_neg_examples": 0, "add_explanation": False},
  3. allenai/tk-instruct-11b-def : {"add_task_name": False, "add_task_definition": True, "num_pos_examples": False, "num_neg_examples": 0, "add_explanation": False},

Able to predict more than one test case in one call?

I am wondering if there is a way to predict more than one test case with Tk-Instruct.

Something like the following format:

Model definition
Positive examples
Negative examples

Now predict the followings
Input1
Input2
Input3

Predicting a topic that does not exist in the list

I was using the 3B model and I was prompting it as follows:

Definition: Find the topic of the given text among this topic list: topic1,topic2

Positive example 1
Input
Output: topic1
Explanation: .....because of that result is topic1
....

When I ran the model, it would output a topic that appears neither in the list in the definition nor in the outputs of any of the example inputs.

What can we do to avoid this?

[Question] parameters for performance reproduction in paper

Hello Yizhong and everyone!

Thanks for your great work and contribution. While attempting to replicate the performance in Fig. 5b of the paper with its settings, I found there is a gap, and I was wondering if you could share some experience with this.

I attempted four times to run scripts/train_tk_instruct.sh, changing only --max_num_instances_per_task or --seed:

  • when setting --max_num_instances_per_task 8, it reports train/predict_rougeL 45.866, whereas Fig. 5b shows 48.5
  • when setting --max_num_instances_per_task 8 and --seed 1337, it reports train/predict_rougeL 46.762
  • when setting --max_num_instances_per_task 64, it reports train/predict_rougeL 49.6898, whereas Fig. 5b shows 54.7
  • when setting --max_num_instances_per_task 100 (the default), it reports train/predict_rougeL 49.3467

I simply copied data/splits/default/test_tasks.txt to data/splits/default/dev_tasks.txt while keeping the default settings for everything else. I'm not sure whether the parameters in scripts/train_tk_instruct.sh are the default settings used in the paper, and I'm hoping you can kindly offer some suggestions.

Thanks in advance!

Cheers,
Leo

Request for checkpoints of smaller models

Hi,
Really loved your work!
Thanks for sharing your 3B and 11B checkpoints.
Would it be possible to upload the checkpoints for small models as well? (small, base and large sizes)

Evaluation time estimate?

Hi - thank you very much for the paper and the repository.

I am trying to run the eval_tk_instruct.sh script as instructed here on a Titan RTX to reproduce ROUGE-L and Exact Match metrics.
I'm running this on a SLURM-enabled server, and initially set an (admittedly optimistic) time limit of 4 hours. It turns out the evaluation was cut short by this limit: I was only 7% of the way through evaluation after 4 hours, with a total estimated evaluation time of around 40 hours.

I then noticed in the paper that you ran your experiments "with 8 A100 GPUs with 48G GPU memory per each". Is this also true (and perhaps necessary) for evaluation? Are my time estimates reported above therefore expected, or do you think I am doing something wrong? Is GPU parallelism even enabled in the evaluation command (I see the deepspeed arg is not passed)?

Thanks!

raise RuntimeError("WandbCallback requires wandb to be installed. Run `pip install wandb`.")

Traceback (most recent call last):
  File "src/run_s2s.py", line 590, in <module>
    main()
  File "src/run_s2s.py", line 484, in main
    callbacks=[DenserEvalCallback] if training_args.denser_evaluation else None
  File "/opt/conda/lib/python3.7/site-packages/transformers/trainer.py", line 407, in __init__
    callbacks, self.model, self.tokenizer, self.optimizer, self.lr_scheduler
  File "/opt/conda/lib/python3.7/site-packages/transformers/trainer_callback.py", line 290, in __init__
    self.add_callback(cb)
  File "/opt/conda/lib/python3.7/site-packages/transformers/trainer_callback.py", line 307, in add_callback
    cb = callback() if isinstance(callback, type) else callback
  File "/opt/conda/lib/python3.7/site-packages/transformers/integrations.py", line 542, in __init__
raise RuntimeError("WandbCallback requires wandb to be installed. Run `pip install wandb`.")

I can import wandb, but the callback still raises this error.

python : 3.7.11
pip list: has installed wandb 0.12.10
conda list :wandb 0.12.10 pypi_0 pypi
which wandb : /opt/conda/bin/wandb
which pip : /opt/conda/bin/pip

finetune 11b model

Hi, nice work! In the paper, for the 11B model, "These experiments are run on Google V3-256 TPUs", and for T5 models smaller than 11B, "These experiments are conducted with 8 A100 GPUs with 48GB GPU memory".
Have you tried fine-tuning the 11B model on 8 A100 GPUs with 48GB?
I am trying to finetune T5-11B with DeepSpeed on 8 RTX 6000 GPUs (48GB). I use your script and just change google/t5-xl-lm-adapt to google/t5-xxl-lm-adapt. When I use ds_configs/stage2.config, I get this error:

[INFO|modeling_utils.py:1770] 2023-03-04 18:11:21,751 >> loading weights file /home/lizhi/Tk-Instruct-main/google/t5-xxl-lm-adapt/pytorch_model.bin
[2023-03-04 18:21:36,276] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689226
[2023-03-04 18:21:36,305] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689227
[2023-03-04 18:21:37,442] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689228
[2023-03-04 18:21:38,699] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689229
[2023-03-04 18:21:40,033] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689230
[2023-03-04 18:21:41,370] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689231
[2023-03-04 18:21:42,864] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689232
[2023-03-04 18:21:44,274] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3689233
[2023-03-04 18:21:45,687] [ERROR] [launch.py:324:sigkill_handler] ['/home/lizhi/anaconda3/envs/tk-instruct/bin/python', '-u', 'src/run_s2s.py', '--local_rank=7', '--do_train', '--do_predict', '--predict_with_generate', '--model_name_or_path', '/home/lizhi/Tk-Instruct-main/google/t5-xxl-lm-adapt', '--max_source_length', '1024', '--max_target_length', '128', '--generation_max_length', '128', '--max_num_instances_per_task', '100', '--max_num_instances_per_eval_task', '100', '--add_task_name', 'False', '--add_task_definition', 'True', '--num_pos_examples', '2', '--num_neg_examples', '0', '--add_explanation', 'False', '--tk_instruct', 'False', '--data_dir', 'data/splits/default', '--task_dir', 'data/tasks', '--output_dir', 'output/', '--overwrite_output_dir', '--cache_dir', './cache/', '--overwrite_cache', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--learning_rate', '5e-05', '--num_train_epochs', '1', '--lr_scheduler_type', 'constant', '--warmup_steps', '0', '--logging_strategy', 'steps', '--logging_steps', '500', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2500', '--deepspeed', 'ds_configs/stage2.config', '--bf16', '--run_name', 't5-experiment'] exits with return code = -9

I guess the memory is not enough, but I have noticed that other projects finetune models of this size on similar GPUs.
So I tried changing ds_configs/stage2.config to ds_configs/stage3.config. The server got stuck and the error is similar to the one above:

[2023-03-05 03:26:26,366] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691645
[2023-03-05 03:26:26,366] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691646
[2023-03-05 03:26:27,663] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691647
[2023-03-05 03:26:28,920] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691648
[2023-03-05 03:26:30,178] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691649
[2023-03-05 03:26:31,553] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3691650
[2023-03-05 03:26:32,846] [ERROR] [launch.py:324:sigkill_handler] ['/home/lizhi/anaconda3/envs/tk-instruct/bin/python', '-u', 'src/run_s2s.py', '--local_rank=7', '--do_train', '--do_predict', '--predict_with_generate', '--model_name_or_path', '/home/lizhi/Tk-Instruct-main/google/t5-xxl-lm-adapt', '--max_source_length', '1024', '--max_target_length', '128', '--generation_max_length', '128', '--max_num_instances_per_task', '1', '--max_num_instances_per_eval_task', '1', '--add_task_name', 'False', '--add_task_definition', 'True', '--num_pos_examples', '2', '--num_neg_examples', '0', '--add_explanation', 'False', '--tk_instruct', 'False', '--data_dir', 'data/splits/default', '--task_dir', 'data/tasks', '--output_dir', 'output/', '--overwrite_output_dir', '--cache_dir', './cache/', '--overwrite_cache', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--learning_rate', '5e-05', '--num_train_epochs', '1', '--lr_scheduler_type', 'constant', '--warmup_steps', '0', '--logging_strategy', 'steps', '--logging_steps', '500', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2500', '--deepspeed', 'ds_configs/stage3.config'] exits with return code = -9

Can you give me some advice?

Datasets folder?

Line 32 of src/run_s2s.py reads:
from datasets import load_dataset, load_metric

However, I can't seem to find the datasets folder anywhere in your repository.
Could you point out where I can find this part? Thank you.

[Errno 2] No such file or directory: 'data/tasks/.json

I ran ./scripts/train_tk_instruct.sh and got:

OSError: Cannot find data file.
Original error:
[Errno 2] No such file or directory: 'data/tasks/.json'
Downloading and preparing dataset natural_instructions/default to ./cache/natural_instructions/default-e8e348fe903ab36e/2.0.0/6aaf834f8edcbb685eabac7ce3c7af68bc248c248cea3277a382991604117fc5..
