
wizardlm's Introduction

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

🏠 Home Page

🤗 HF Repo • 🐦 Twitter • 📃 [WizardLM] @ICLR2024 • 📃 [WizardCoder] @ICLR2024 • 📃 [WizardMath]

👋 Join our Discord

WizardLM

Code License Data License Python 3.9+

Unofficial Video Introductions

Thanks to our enthusiastic friends, whose video introductions are more lively and engaging.

  1. NEW WizardLM 70b 🔥 Giant Model...Insane Performance
  2. GET WizardLM NOW! 7B LLM KING That Can Beat ChatGPT! I'm IMPRESSED!
  3. WizardLM: Enhancing Large Language Models to Follow Complex Instructions
  4. WizardCoder AI Is The NEW ChatGPT's Coding TWIN!

News

  • 🔥🔥🔥 [2024/01/04] We released WizardCoder-33B-V1.1, trained from deepseek-coder-33b-base, the SOTA OSS Code LLM on the EvalPlus Leaderboard. It achieves 79.9 pass@1 on HumanEval, 73.2 pass@1 on HumanEval-Plus, 78.9 pass@1 on MBPP, and 66.9 pass@1 on MBPP-Plus. WizardCoder-33B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, and DeepSeek-Coder-33B-instruct on HumanEval and HumanEval-Plus pass@1. WizardCoder-33B-V1.1 is comparable with ChatGPT 3.5 and surpasses Gemini Pro on MBPP and MBPP-Plus pass@1.
  • [2023/08/26] We released WizardCoder-Python-34B-V1.0, which achieves 73.2 pass@1 and surpasses GPT-4 (2023/03/15), ChatGPT-3.5, and Claude 2 on the HumanEval benchmark. For more details, please refer to WizardCoder.
  • [2023/06/16] We released WizardCoder-15B-V1.0, which surpasses Claude-Plus (+6.8), Bard (+15.3), and InstructCodeT5+ (+22.3) on the HumanEval benchmark. For more details, please refer to WizardCoder.
| Model | Checkpoint | Paper | HumanEval | HumanEval+ | MBPP | MBPP+ |
|---|---|---|---|---|---|---|
| GPT-4-Turbo (Nov 2023) | - | - | 85.4 | 81.7 | 83.0 | 70.7 |
| GPT-4 (May 2023) | - | - | 88.4 | 76.8 | - | - |
| GPT-3.5-Turbo (Nov 2023) | - | - | 72.6 | 65.9 | 81.7 | 69.4 |
| Gemini Pro | - | - | 63.4 | 55.5 | 72.9 | 57.9 |
| DeepSeek-Coder-33B-instruct | - | - | 78.7 | 72.6 | 78.7 | 66.7 |
| WizardCoder-33B-V1.1 | 🤗 HF Link | 📃 [WizardCoder] | 79.9 | 73.2 | 78.9 | 66.9 |
| WizardCoder-Python-34B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 73.2 | 64.6 | 73.2 | 59.9 |
| WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 59.8 | 52.4 | -- | -- |
| WizardCoder-Python-13B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 64.0 | -- | -- | -- |
| WizardCoder-Python-7B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 55.5 | -- | -- | -- |
| WizardCoder-3B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 34.8 | -- | -- | -- |
| WizardCoder-1B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 23.8 | -- | -- | -- |
  • [12/19/2023] 🔥 We released WizardMath-7B-V1.1, trained from Mistral-7B, the SOTA 7B math LLM; it achieves 83.2 pass@1 on GSM8k and 33.0 pass@1 on MATH.

  • [12/19/2023] 🔥 WizardMath-7B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, Mixtral MoE, and Claude Instant on GSM8k pass@1.

  • [12/19/2023] 🔥 WizardMath-7B-V1.1 is comparable with ChatGPT 3.5 and Gemini Pro, and surpasses Mixtral MoE on MATH pass@1.

  • 🔥 Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on GSM8k, including ChatGPT 3.5, Claude Instant 1, and PaLM 2 540B.

  • 🔥 Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the GSM8k benchmark, 24.8 points higher than the SOTA open-source LLM.

  • 🔥 Our WizardMath-70B-V1.0 model achieves 22.7 pass@1 on the MATH benchmark, 9.2 points higher than the SOTA open-source LLM.

| Model | Checkpoint | Paper | GSM8k | MATH |
|---|---|---|---|---|
| WizardMath-7B-V1.1 | 🤗 HF Link | 📃 [WizardMath] | 83.2 | 33.0 |
| WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 81.6 | 22.7 |
| WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 63.9 | 14.0 |
| WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 54.9 | 10.7 |
| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | Demo | License |
|---|---|---|---|---|---|---|---|---|
| WizardLM-70B-V1.0 | 🤗 HF Link | 📃 Coming Soon | 7.78 | 92.91% | 77.6% | 50.6 | | Llama 2 License |
| WizardLM-13B-V1.2 | 🤗 HF Link | | 7.06 | 89.17% | 55.3% | 36.6 | Demo | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | | 6.76 | 86.32% | | 25.0 | | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | | 7.01 | | | 37.8 | | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | | 6.35 | 75.31% | | 24.0 | | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | | | | 19.1 | | Non-commercial |

Citation

Please cite the paper if you use the data or code from WizardLM.

@inproceedings{xu2024wizardlm,
  title={Wizard{LM}: Empowering Large Pre-Trained Language Models to Follow Complex Instructions},
  author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Qingwei Lin and Daxin Jiang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=CfXh93NDgH}
}

Please cite the paper if you use the data or code from WizardCoder.

@inproceedings{luo2024wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Ziyang Luo and Can Xu and Pu Zhao and Qingfeng Sun and Xiubo Geng and Wenxiang Hu and Chongyang Tao and Jing Ma and Qingwei Lin and Daxin Jiang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=UnUwSIgK5W}
}

Please cite the paper if you use the model, code, data, or paper from WizardMath.

@article{luo2023wizardmath,
  title={WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct},
  author={Luo, Haipeng and Sun, Qingfeng and Xu, Can and Zhao, Pu and Lou, Jianguang and Tao, Chongyang and Geng, Xiubo and Lin, Qingwei and Chen, Shifeng and Zhang, Dongmei},
  journal={arXiv preprint arXiv:2308.09583},
  year={2023}
}

โ—To commen concern about dataset:

Recently, there have been clear changes in the open-source policy and regulations of our overall organization's code, data, and models. Despite this, we have still worked hard to obtain opening the weights of the model first, but the data involves stricter auditing and is in review with our legal team . Our researchers have no authority to publicly release them without authorization. Thank you for your understanding.

Hiring

  • 📣 We are looking for highly motivated students to join us as interns to create more intelligent AI together. Please contact [email protected]

Note for model system prompts usage:

To obtain results identical to our demo, please strictly follow the prompts and invocation methods provided in "src/infer_wizardlm13b.py" when using our model for inference. Our model adopts the prompt format from Vicuna and supports multi-turn conversation.

For WizardLM, the prompt should be as follows:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am WizardLM.</s>......
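For reference, here is a minimal Python sketch of assembling this Vicuna-style multi-turn prompt; the helper and variable names are illustrative, not part of the released inference scripts:

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_wizardlm_prompt(turns):
    """turns: list of (user_message, assistant_reply) pairs; the last reply may be None
    to leave the prompt open for the model's next answer."""
    prompt = SYSTEM + " "
    for user_msg, assistant_msg in turns:
        prompt += f"USER: {user_msg} ASSISTANT:"
        if assistant_msg is not None:
            prompt += f" {assistant_msg}</s>"
    return prompt

# Example: keep the first exchange as context and ask a follow-up question.
print(build_wizardlm_prompt([("Hi", "Hello."), ("Who are you?", None)]))
```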

For WizardCoder, the prompt should be as follows:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

For WizardMath, the prompts should be as follows:

Default version:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

CoT version: (❗ For simple math questions, we do NOT recommend using the CoT prompt.)

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: Let's think step by step."

GPT-4 automatic evaluation

We adopt the automatic evaluation framework based on GPT-4 proposed by FastChat to assess the performance of chatbot models. As shown in the following figure, WizardLM-30B achieved better results than Guanaco-65B.


WizardLM-30B performance on different skills.

The following figure compares the skills of WizardLM-30B and ChatGPT on the Evol-Instruct test set. The results indicate that WizardLM-30B achieves 97.8% of ChatGPT's performance on average, reaching roughly 100% (or more) of ChatGPT's capacity on 18 skills and more than 90% on 24 skills.


WizardLM performance on NLP foundation tasks.

The following table provides a comparison of WizardLMs and other LLMs on NLP foundation tasks. The results indicate that WizardLMs consistently exhibit superior performance compared to LLaMA models of the same size. Furthermore, our WizardLM-30B model shows performance comparable to OpenAI's text-davinci-003 on the MMLU and HellaSwag benchmarks.

| Model | MMLU (5-shot) | ARC (25-shot) | TruthfulQA (0-shot) | HellaSwag (10-shot) | Average |
|---|---|---|---|---|---|
| text-davinci-003 | 56.9 | 85.2 | 59.3 | 82.2 | 70.9 |
| Vicuna-13B 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
| Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
| WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
| WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
| WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2 |

WizardLM performance on code generation.

The following table provides a comprehensive comparison of WizardLMs and several other LLMs on the code generation task, namely HumanEval. The evaluation metric is pass@1. The results indicate that WizardLMs consistently exhibit superior performance compared to LLaMA models of the same size. Furthermore, our WizardLM-30B model surpasses StarCoder and OpenAI's code-cushman-001. Moreover, our Code LLM, WizardCoder, demonstrates exceptional performance, achieving a pass@1 score of 57.3 and surpassing the open-source SOTA by approximately 20 points.

| Model | HumanEval pass@1 |
|---|---|
| LLaMA-7B | 10.5 |
| LLaMA-13B | 15.8 |
| CodeGen-16B-Multi | 18.3 |
| CodeGeeX | 22.9 |
| LLaMA-33B | 21.7 |
| LLaMA-65B | 23.7 |
| PaLM-540B | 26.2 |
| CodeGen-16B-Mono | 29.3 |
| code-cushman-001 | 33.5 |
| StarCoder | 33.6 |
| WizardLM-7B 1.0 | 19.1 |
| WizardLM-13B 1.0 | 24.0 |
| WizardLM-30B 1.0 | 37.8 |
| WizardCoder-15B 1.0 | 57.3 |

Call for Feedback

We welcome everyone to evaluate WizardLM with your professional and difficult instructions, and to show us examples of poor performance, along with your suggestions, in the issue discussion area. We are currently focusing on improving Evol-Instruct and hope to address existing weaknesses and issues in the next version of WizardLM. After that, we will open-source the code and pipeline of the up-to-date Evol-Instruct algorithm and work with you to improve it.

Overview of Evol-Instruct

Evol-Instruct is a novel method that uses LLMs instead of humans to automatically mass-produce open-domain instructions of various difficulty levels and skill ranges, in order to improve the performance of LLMs. You can easily embark on your own evolutionary journey with the Evol Script we provide.
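To give a flavor of what one evolution step can look like, below is a minimal sketch that asks an LLM to rewrite an instruction into a harder variant. The rewriting prompt is paraphrased and the `complete` callable is a placeholder for whatever LLM API you use; this is not the released Evol Script:

```python
EVOLVE_PROMPT = (
    "I want you to act as a Prompt Rewriter. Rewrite the given prompt into a more "
    "complex version that requires deeper reasoning, while keeping it reasonable "
    "and answerable by humans.\n\n"
    "#Given Prompt#:\n{instruction}\n\n#Rewritten Prompt#:"
)

def evolve_once(instruction: str, complete) -> str:
    """One in-depth evolution step; `complete` maps a prompt string to the model's reply."""
    return complete(EVOLVE_PROMPT.format(instruction=instruction)).strip()

def evolve(seed_instruction: str, complete, rounds: int = 3) -> list[str]:
    """Evolve a seed instruction several times, keeping every generation."""
    generations = [seed_instruction]
    for _ in range(rounds):
        generations.append(evolve_once(generations[-1], complete))
    return generations
```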


Disclaimer

The resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes. The content produced by any version of WizardLM is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.

Star History

Star History Chart

wizardlm's People

Contributors

chiyeunglaw, flyinghpluo, nlpxucan, robertmarton, victorsungo, wenxcs


wizardlm's Issues

Demo 64bit bitwise test minor issue found.

CodeTest.txt
It performs relatively well analyzing this code (much better than StarChat). It gets close to figuring out the compression being used but tends to miss slightly on the particulars of what data is removed when the data is compressed. The answer is every 64-bit value at the beginning of the set with all bits set to true. In one run it came close, saying every 64-bit value set to true, but this is too vague to be correct, since a 64-bit value with just one bit set would still satisfy that statement, yet the compression would in fact leave such a value in.

Code Instruction Data Release

Hi, great work! I was curious if the code_instruction_data is released somewhere or otherwise when it might be released!

Remove "As an AI language model..." from dataset.

The model very often writes "As an AI language model, I do not have opinions", even though the question was factual and did not ask for the model's personal opinion. I think there is too much of that in the dataset. Consider removing it when you start training other models.

Missing LICENSE file

I see you have no LICENSE file for this project. The repository mentions Apache-2.0. When a standardized LICENSE file with the full license text is added, the license clearly appears in the About section on the first page view.

I could make a patch for this if you would like help.

Are you planning WizardLM 65B model?

Are you planning WizardLM 65B model?
Just asking ;)

Nowadays running such big models is quite easy, for instance with a cheap RTX 3090 and llama.cpp, getting about 2 tokens/s.

Duplications in evol_instruct_70k dataset

Firstly, thanks for publicly releasing the instruction dataset.

While looking through the dataset, I've noticed several examples where the output is already present in the instruction with barely any changes.
Example (index 47821 from the dataset):
Instruction:

Can you provide a table of 6 different tools used for data science along with their descriptions and features?
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |

Output:

Sure, here's a table of 6 different tools used for data science along with their descriptions and features:
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |

Indices of some similar samples: 17421, 17926, 24717, 29166, 41705, 61826, 62587

Is this due to a bug during the creation of the dataset? Was the WizardLM model trained with such examples?
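For what it's worth, a rough filter such as the sketch below (standard library only) can flag samples whose output largely repeats the instruction, assuming the usual instruction/output fields; the file name and similarity threshold are just examples:

```python
import difflib
import json

def output_repeats_instruction(sample: dict, threshold: float = 0.9) -> bool:
    """True if the output is nearly a verbatim copy of the instruction."""
    ratio = difflib.SequenceMatcher(
        None, sample["instruction"], sample["output"]
    ).ratio()
    return ratio >= threshold

with open("WizardLM_evol_instruct_70k.json") as f:  # example path to the released JSON
    data = json.load(f)

suspicious = [i for i, s in enumerate(data) if output_repeats_instruction(s)]
print(f"{len(suspicious)} suspicious samples, e.g. indices {suspicious[:10]}")
```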

CMD closes after I select WizardLM as my model.

When I try to load WizardLM in my CMD it says "Loading wizardLM-7B-GPTQ-4bit-128g... Can't determine model type from model name. Please specify it manually using --model_type argument" and then closes. I do not have these issues with two other models that I load.

AssertionError: LLaMA is now in HuggingFace's main branch.

Hey, no matter what I do (I installed the transformers lib from the git repo directly), this error pops up when trying to load the inference code.

Could you please provide a basic example of how to install all dependencies via conda and which ones are required? Thank you very much

Plans for future improvements?

Thank you for providing two excellent models to the open source community. Since you created WizardCoder after WizardLM I wonder: Do you have any plans on optimizing those models further or introduce new ones?

It would be nice if new insights from the Orca paper would be used to improve the same models even further.

Add problem solving to evol-instruct

Description:

The current performance of Evol-Instruct on math, geometry, and physics problem solving is rather poor. To enhance the overall reasoning/basic math capabilities of WizardLM, I believe more high-school level physics, algebra, or geometry problems should be present within the dataset. GPT-4 seems to do mostly fine on them, so it seems doable, though being a smaller model, it would be quite interesting to see how far it can get.

This dataset I found didn't contain many physics questions in particular, which tracks with the hallucinated formulas and the inability to reason step by step to find intermediate values.

DeepSpeed Configuration JSON

What GPUs did you use for training?

Can you please share your deepspeed config json?

I can't get fine-tuning to work with the command you gave in your readme.md and the deepspeed config json in llamax.

I tried 8x 4090 and 4x A100; neither worked.

I will need the exact hardware, exact hyperparameters, and exact deepspeed config file you used.

78k evolved code instructions

Thank you for your wonderful work.
The paper introduces a data volume of 78k, of which 20K comes from Alpaca. Where does the other instruction data come from?

Have you tried LoRA?

It looks like you are doing a full fine-tuning run, have you experimented with LoRA?

Load the model using the transformer and pytorch api

Sorry if this question has been asked and answered but I cannot find anything related to it.

Is there a way to download and use the model through the transformers and PyTorch APIs, just like Dolly or Vicuna, as explained on their model card pages, like below:
https://huggingface.co/databricks/dolly-v2-3b
https://huggingface.co/CarperAI/stable-vicuna-13b-delta

The WizardLM model card on Hugging Face (https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) only shows how to configure it for text-generation-webui.

Can anyone shed some light on this please?
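For reference, loading an HF-format (non-GPTQ) checkpoint with the plain transformers/PyTorch APIs generally looks like the sketch below; the repo ID and generation settings are illustrative, not an official recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/wizardLM-7B-HF"  # example HF-format checkpoint (not the GPTQ one)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Who are you? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```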

Falcon SFT

Great work. I guess the SFT dataset can affect the performance of the model. Did you do supervised fine-tuning on Falcon-40B using your high-quality data? Thanks a lot.

dumb number tokenisation

All modern transformer models have problems with math because they tokenise numbers oddly.
Your model is supposed to be better at math than others, but the main reason transformers are so bad at math hasn't been fixed.
To fix it properly, you'd need a special tokeniser for numbers, but that's a lot of headache.
It's possible to fix it with less effort: just remove all tokens containing numbers except single digits, and train the model a bit more to recover the loss.
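To see the behaviour being described, you can inspect how a tokenizer splits numbers; the checkpoint below is only an example:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TheBloke/wizardLM-7B-HF")  # example checkpoint

for text in ["7", "42", "1234567", "3.14159"]:
    print(f"{text!r} -> {tok.tokenize(text)}")
```

Whether multi-digit numbers end up as single tokens or digit-by-digit depends entirely on the tokenizer's vocabulary.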

can't accurately process text

WizardLM can't take a chunk of text and rewrite it as specified. I use a simple prompt template.

TEXT: {book}

DIRECTIVE: read the text and formulate N relevant questions with answers. All questions should be independent and answerable without context. Start questions strictly with "Q:", and answers with "A:". Begin:
### Response 
Q:

It can follow the format, but it hallucinates questions and answers that are irrelevant to the text.

Over-hyping

Your README claims this model outperforms ChatGPT on "complex tasks." How should we define complex tasks? I asked it to write me shader code for OpenGL ES 2.0, and it generated gibberish.

Once models fail a test like that, I like to ask them science questions because most of them will spew garbage (such as the Earth having 5 moons, or Jupiter being 10x larger than it actually is). I asked Wizard via this interface to describe the Moon... so it told me the Moon directly influences human menstrual cycles:

[screenshot 1]

I then completely changed the parameters from the default, using temp 0.69, top p 0.5, top k 0, and it basically repeated itself:

[screenshot 2]

There is no universe in which ChatGPT would say this.

Please don't misunderstand - I'm by no means trying to degrade your work. It really does seem incredible, and I can't wait to see how this turns out. It kicks ass on some other tests. But claiming you've defeated the industry standard only harms progress. The Alpaca team did the same and claimed they too had beaten ChatGPT. Please be honest in your README. Perhaps this model fulfilled your exact use case, but the goal of defeating ChatGPT has not been achieved.

Wizardcoder for code completion?

First of all, thanks for releasing the model. It is way better than anything else that was available up to this point.

I see the prompt used to get WizardCoder to answer instructions. I was wondering: can unprompted WizardCoder be used for code completion like base StarCoder (e.g. if I processed its output to make it compatible with the HF VSCode extension)?

Secondly, does it retain the capability of performing fill-in-the-middle like the original StarCoder model? If so, are the special tokens needed the same?

Error loading fine-tuned wizard model

Hi guys,
I have used your fine-tuning script on a custom dataset to fine-tune the WizardLM model. The training works without problems, the loss decreases, and all the relevant model files are stored in the specified output directory.
However, I checked the size of pytorch_model.bin and it is only 623 KB, so I guess the error must lie in saving the model. I used "TheBloke/wizardLM-7B-HF" from Hugging Face as the base model for fine-tuning.
If I try to load the model for inference using the inference_wizardlm.py script, pointing it at the output directory specified during training, it produces this error:

Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 132, in
fire.Fire(main)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 121, in main
_output = evaluate(tokenizer, model, instruction)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 57, in evaluate
generation_output = model.generate(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
return self.greedy_search(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
outputs = self(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 530, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
Where could the problem be?

Clarification in the paper

Hi authors, thank you for releasing code and data for this project. I am confused about the following part in the paper.

For fair comparison, we replace Alpaca's original Davinci-003 responses with ChatGPT's responses, and also sample 70K instructions subset to train WizardLM.

(1) In your Alpaca baseline, do you use their original data or responses generated from ChatGPT?
(2) Since you use the dataset from Alpaca as the seed dataset, do you change the original Alpaca responses to train WizardLM?
(3) What does the following sentence mean? My understanding is that you sampled 70k data points from the full 250k data, which already includes Alpaca's seed dataset. Why do you say "also sample 70K instructions subset to train WizardLM"?

also sample 70K instructions subset to train WizardLM.

Thank you for your clarification.

errors in inference_wizardcoder.py

When I use the inference command (inference_wizardcoder.py), an error occurs:

python src/inference_wizardcoder.py --base_model "bigcode/starcoder" --input_data_path "./input.jsonl" --output_data_path "./result.jsonl"

[screenshot of the error]

Please help me! Thank you.

training

Thank you for sharing this amazing model.
Could you please guide me on how I can train this model for another language?

Model used for generating training data?

Hi guys,
what LLM are you using to generate the training data? Is it the proprietary ChatGPT 3.5 Turbo? Have you tried open-source alternatives for data generation as well? How does the use of ChatGPT for training data generation affect the use of WizardLM?

fine-tuning only takes 7% VRAM

I am trying to fine-tune on 8x 4090s

I use this command I copied from the readme.md:

deepspeed --num_gpus 8 --num_nodes 1 train_freeform.py \
    --model_name_or_path /workspace/llama-7b-hf \
    --data_path /workspace/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
    --output_dir /workspace/WizardLM-7B-unfiltered/ \
    --num_train_epochs 3 \
    --model_max_length 2048 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 800 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --warmup_steps 2 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True

I am confused because I don't see any output on stdout, but I see all 8 GPUs at 100% utilization while VRAM is at only 7% utilization. Do I need to modify any of these parameters to use the 8x 4090s' VRAM? Can you please tell me which ones I need to change, and how do I get it to display the status on stdout?

[screenshot]

Evol instruct with Dolly

Please confirm whether there are any licensing terms for Evol-Instruct 70k, especially for commercial use.

Use OpenAI Evals

Introduction

After releasing GPT-4, OpenAI found that there is a significant lack of standardized benchmarks for how LLMs perform at common tasks. Thus, OpenAI created Evals, an open-source collection of crowdsourced tests for LLMs. Unless there are other projects I'm unaware of, this is the closest thing to a standardized benchmark for how we've now come to use LLMs.

Why?

  • I believe that it is vital for our open-access LLM ecosystem to use the same standardized testing methodology, allowing us to objectively compare our results.

  • While surveys are good, they aren't trivial to run. Automated evaluation can help measure a model's performance throughout the training process or compare different model sizes, instruction datasets, or iterations.

  • It is also vital to push this kind of robust testing to other models, especially those whose authors didn't have the time or resources to run surveys and instead opted to ask ChatGPT to "compare" the responses, an obviously flawed approach for many reasons. The ability to objectively compare WizardLM against Vicuna or WizardVicunaLM may, for example, help test whether mixing the two works well or whether some changes to the dataset generation are required.

Implementation:

I cannot speak much on the matter of implementation; however, it is worth mentioning that someone has already used OpenAI Evals on OpenAssistant models. I suppose the implementation would be rather similar.

Failed to recognize model when trying to load it in

I'm having an issue where my computer can't load the 7B parameter model. I downloaded the whole folder and tried to run it, but I keep getting this error. When I run it I use "--auto-devices --chat --wbit 4 --groupsize 128 --pre_layer 27" and load it in this mode from the interface; I've also tried --model wizardLM-7B-GPTQ-4bit-128g with the same result. I don't think it's a memory issue, as I've had those before and they typically tell you the required space couldn't be allocated. Here it doesn't give me that prompt; it just tells me it can't determine the model type. I've run other models like the GPT4-x-Alpaca model, so I know it shouldn't be a location issue.

Can't determine model type from model name. Please specify it manually using --model_type argument

Traceback (most recent call last):
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\server.py", line 102, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\models.py", line 158, in load_model
model = load_quantized(model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 147, in load_quantized
exit()
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\installer_files\env\lib\_sitebuiltins.py", line 26, in __call__
raise SystemExit(code)
SystemExit: None
