
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

💻Online Demo | 🤗Huggingface | 📃Paper | 💭Discord

🏆 The Overall Best Performing Open Source 7B Model 🏆
🤖 Outperforms ChatGPT (March) and Grok-1 🤖

  • OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning.
  • Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model which can be run on a consumer GPU (e.g. RTX 3090).
  • Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.


✨ News


🏷️ Benchmarks

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenChat-3.5-0106 | 7B | 64.5 | 7.8 | 71.3 | 51.5 | 49.1 | 61.0 | 65.8 | 77.4 | 62.2 |
| OpenChat-3.5-1210 | 7B | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | 61.8 | 65.3 | 77.3 | 61.8 |
| OpenChat-3.5 | 7B | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
| ChatGPT (March)* | ???B | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 |
| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
| OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
| Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
| Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
| Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 73.2 | 49.7 | 41.7 | 62.3 | 63.7 | 82.3 | 41.4 |
| | | | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |

Evaluation details

*: ChatGPT (March) results are from the GPT-4 Technical Report, Chain-of-Thought Hub, and our evaluation.

^: Zephyr-β often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data.

**: Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories.

All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions below.

Reproducing benchmarks

Reasoning and Coding:

Note: Please run the following commands at the base directory of this repository.

python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat-3.5-0106 --eval_sets coding fs_cothub/bbh fs_cothub/mmlu zs/agieval zs/bbh_mc_orca zs/truthfulqa_orca
python ochat/evaluation/view_results.py
python ochat/evaluation/convert_to_evalplus.py

All HumanEval code samples are then placed in ochat/evaluation/evalplus_codegen. Use the following command to evaluate an individual code sample file named samples.jsonl, using Docker as a sandbox.

docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples samples.jsonl

Mathematical Reasoning:

Note: Please run the following commands at the base directory of this repository.

python -m ochat.evaluation.run_eval --condition "Math Correct" --model openchat/openchat-3.5-0106 --eval_sets fs_cothub/gsm8k zs/math
python ochat/evaluation/view_results.py

MT-Bench:

Please first launch a local API server, then download FastChat and run the following commands.

Note: Due to non-zero temperature and GPT-4 API changes over time, there might be variations in the results.

cd fastchat/llm_judge
python gen_api_answer.py --model openchat-3.5-0106 --max-tokens 4096 --parallel 128 --openai-api-base http://localhost:18888/v1
python gen_judgment.py --model-list openchat-3.5-0106 --parallel 8 --mode single

🎇 Comparison with X.AI Grok

🔥 OpenChat-3.5-0106 (7B) now outperforms Grok-0 (33B) on all 4 benchmarks and Grok-1 (???B) on average and 3/4 benchmarks.

| Model | License | # Params | Average | MMLU | HumanEval | MATH | GSM8K |
|---|---|---|---|---|---|---|---|
| OpenChat-3.5-0106 | Apache-2.0 | 7B | 61.0 | 65.8 | 71.3 | 29.3 | 77.4 |
| OpenChat-3.5-1210 | Apache-2.0 | 7B | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
| OpenChat-3.5 | Apache-2.0 | 7B | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
| Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
| Grok-1 | Proprietary | ???B | 55.8 | 73.0 | 63.2 | 23.9 | 62.9 |

⬇️ Installation

Note

PyTorch is required to run OpenChat.

pip

pip3 install ochat

Important

If you encounter package compatibility issues with pip, try the conda method below or check this issue.

conda

conda create -y --name openchat python=3.11
conda activate openchat

pip3 install ochat

Windows (WSL 1.x, Ubuntu-22.04)

sudo apt update
sudo apt install build-essential

sudo apt install -y curl
curl -o miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash miniconda.sh

# Restart WSL terminal if the following conda command does not work

conda create -y --name openchat python=3.11
conda activate openchat

pip3 install ochat

From source

Clone this repository and install OpenChat from source in editable mode:
git clone https://github.com/imoneoi/openchat
cd openchat

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e .  # Editable mode, you can make changes in this cloned repo

🚀 Deploying API server

⚡ Our API server is ready for production use and compatible with the OpenAI API protocol. It is highly optimized with vLLM and can dynamically batch requests.

📎 Note: For 20 series or older GPUs that do not support bfloat16, add --dtype float16 to the server args.

For a single GPU (e.g. RTX 3090, 4090)

python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106

For multiple GPUs (tensor parallel)

# N is the number of tensor parallel GPUs
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 --engine-use-ray --worker-use-ray --tensor-parallel-size N

Use -h to see all available settings:

python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 -h

Deploy as an online service

If you want to deploy the server as an online service, you can use --api-keys sk-KEY1 sk-KEY2 ... to specify allowed API keys and --disable-log-requests --disable-log-stats --log-file openchat.log for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server.

Request example

Once started, the server listens at localhost:18888 for requests and is compatible with the OpenAI ChatCompletion API specifications.

💡 Default Mode (GPT4 Correct): Best for coding, chat and general tasks

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'

🧮 Mathematical Reasoning Mode: Tailored for solving math problems

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "condition": "Math Correct",
    "messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
  }'

🌐 Web UI - OpenChat-UI

After launching the API server, OpenChat provides a user interface that is easy to interact with. Click here to check out the Web UI.

🤗 Inference with Transformers

Warning

We recommend using our optimized API server for deployment; inference with Transformers is slower.

💡 Default Mode (GPT4 Correct): Best for coding, chat and general tasks

GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:

🧮 Mathematical Reasoning Mode: Tailored for solving math problems

Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:

⚠️ Notice: Remember to set <|end_of_turn|> as the end-of-generation token.

The default (GPT4 Correct) template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template.
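For manual prompting, the template shown above can be reproduced with a small helper. This is an illustrative sketch (build_prompt is not part of the OpenChat library; in practice the integrated tokenizer.chat_template does this for you):

```python
def build_prompt(messages, condition="GPT4 Correct"):
    """Format a list of {"role", "content"} dicts into the OpenChat
    conversation template, ending with an assistant tag so the model
    continues from there."""
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{condition} {role}: {m['content']}<|end_of_turn|>")
    parts.append(f"{condition} Assistant:")
    return "".join(parts)

# Reproduces the Default Mode example above
print(build_prompt([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]))
```

Passing condition="Math Correct" produces the Mathematical Reasoning Mode template instead.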

🛠️ Training

The OpenChat training system utilizes padding-free training and the Multipack Sampler, achieving a 3-10x speedup compared to conventional padded training.
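The idea behind packing can be illustrated with a simple first-fit-decreasing bin packer. This is a simplified sketch of the general technique, not the actual Multipack Sampler implementation in ochat:

```python
def pack_sequences(lengths, batch_max_len):
    """Greedy first-fit-decreasing: pack variable-length sequences into
    as few fixed-capacity batches as possible, so almost no batch
    capacity is wasted on padding tokens."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins = []  # each bin: [tokens_used, list_of_sequence_indices]
    for i in order:
        for b in bins:
            if b[0] + lengths[i] <= batch_max_len:
                b[0] += lengths[i]
                b[1].append(i)
                break
        else:
            bins.append([lengths[i], [i]])
    return [indices for _, indices in bins]

# Four sequences packed into two full batches of capacity 50
print(pack_sequences([10, 20, 30, 40], 50))
```

With padded batching, the same four sequences would each be padded to the longest length in their batch; packing instead fills each batch close to capacity.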

Choose a base model

OpenChat supports Llama 2 and Mistral models. Please first choose a base model to fit your needs. Each base model has a corresponding weight repo, model type, and recommended batch size as listed below; fill these into BASE_REPO, MODEL_TYPE, and BATCH_SIZE in the following instructions.

| Base Model | Size | Weights (with EOT token) | Model Type | Recommended Batch Size per GPU (8xA100 80GB) |
|---|---|---|---|---|
| Mistral | 7B | imone/Mistral_7B_with_EOT_token | openchat_v3.2_mistral | 77824 |
| Llama 2 | 7B | imone/LLaMA2_7B_with_EOT_token | openchat_v3.2 | 77824 |
| Llama 2 | 13B | imone/Llama2_13B_with_EOT_token | openchat_v3.2 | 36864 |

Note: The OpenChat conversation template requires an <|end_of_turn|> special token, and the base model specified must include it. Our provided weights are the original base weights with this token added. If you want to add the token manually, use convert_llama_weights_to_hf_add_tokens.py or mistral_add_tokens.py in the scripts directory.

Installing DeepSpeed and Flash Attention

First, ensure that the CUDA nvcc compiler is available in your environment. If it is not, install the CUDA toolkit that matches the version used by PyTorch.

Next, install the build dependencies:

pip install packaging ninja

Finally, install the packages:

pip install deepspeed flash-attn

Preparing Your Data

To utilize the OpenChat trainer, prepare your SFT data into a JSON Lines format where each line corresponds to a Conversation object:

class Message(BaseModel):
    role: str     # Must be "user" or "assistant"
    content: str  # Message content
    weight: Optional[float] = None  # Loss weight for this message. Typically 0 for user and 1 for assistant to supervise assistant's responses only


class Conversation(BaseModel):
    items: List[Message]  # All messages within the conversation
    condition: str = ""  # C-RLFT condition, can be any string or empty.
    system: str = ""  # System message for this conversation

For basic SFT, assign weight as 0 for human messages and 1 for assistant responses.

SFT example:

{"items":[{"role":"user","content":"Hello","weight":0.0},{"role":"assistant","content":"Hi","weight":1.0},{"role":"user","content":"How are you today?","weight":0.0},{"role":"assistant","content":"I'm fine.","weight":1.0}],"system":""}
{"items":[{"role":"user","content":"Who are you?","weight":0.0},{"role":"assistant","content":"I'm OpenChat.","weight":1.0}],"system":"You are a helpful assistant named OpenChat."}

For C-RLFT, condition should be set as the class the conversation belongs to (e.g. GPT3 or GPT4). The weight is assigned as 0 for human messages and w for assistant responses, where w is the weight of the class (e.g. 0.1 for GPT3 and 1 for GPT4, as found in our C-RLFT paper).

C-RLFT example:

{"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"C-RLFT is a method for improving open-source LLMs with mixed-quality data.","weight":1.0}],"condition":"GPT4","system":""}
{"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
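Records in this format can be generated programmatically. The following is an illustrative sketch: the make_conversation helper is not part of the ochat library, and the class-weight mapping simply mirrors the example above (0.1 for GPT3, 1 for GPT4):

```python
import json

# Illustrative class weights, matching the C-RLFT example above
CLASS_WEIGHTS = {"GPT4": 1.0, "GPT3": 0.1}

def make_conversation(turns, condition="", system=""):
    """Build one JSONL record: user turns get weight 0, assistant turns
    get the weight of the conversation's quality class."""
    w = CLASS_WEIGHTS.get(condition, 1.0)
    items = [
        {"role": role,
         "content": content,
         "weight": 0.0 if role == "user" else w}
        for role, content in turns
    ]
    return {"items": items, "condition": condition, "system": system}

# One line of the C-RLFT training file
line = json.dumps(make_conversation(
    [("user", "What is C-RLFT?"), ("assistant", "I don't know.")],
    condition="GPT3"))
print(line)
```

Writing one such JSON line per conversation yields a data.jsonl file ready for the pre-tokenization step below.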

Pre-tokenizing the Dataset

You'll then need to pre-tokenize the dataset with the following command (specify a filename as PRETOKENIZED_DATA_OUTPUT_PATH to store the pre-tokenized dataset):

python -m ochat.data.generate_dataset --model-type MODEL_TYPE --model-path BASE_REPO --in-files data.jsonl --out-prefix PRETOKENIZED_DATA_OUTPUT_PATH

Launching the OpenChat Trainer

You can now launch the OpenChat trainer using the command below.

  • The 13B model requires eight A/H100s with 80GB VRAM.
  • The 7B model can be trained with four A/H100s (80GB VRAM) or eight A/H100s (40GB VRAM).

For hyperparameters, start with the recommended batch size. If OOM occurs, set it to the largest multiple of 2048 that fits in VRAM. The other hyperparameters have been carefully selected as defaults, and the learning rate is automatically determined by the inverse square-root rule.
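The "largest multiple of 2048" advice amounts to a simple rounding step. This helper is illustrative only (not part of the trainer); max_tokens_that_fit is whatever batch length you empirically found to fit in VRAM:

```python
def usable_batch_max_len(max_tokens_that_fit, multiple=2048):
    """Round the largest batch length that fits in VRAM down to a
    multiple of 2048, per the recommendation above."""
    return (max_tokens_that_fit // multiple) * multiple

# e.g. if 77900 tokens fit, use the recommended Mistral 7B batch size 77824
print(usable_batch_max_len(77900))
```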

Training Commands (click to expand)
NUM_GPUS=8

deepspeed --num_gpus=$NUM_GPUS --module ochat.training_deepspeed.train \
          --model_path BASE_REPO \
          --data_prefix PRETOKENIZED_DATA_OUTPUT_PATH \
          --save_path PATH_TO_SAVE_MODEL \
          --batch_max_len BATCH_SIZE \
          --epochs 5 \
          --save_every 1 \
          --deepspeed \
          --deepspeed_config ochat/training_deepspeed/deepspeed_config.json

You can find checkpoints of all epochs in PATH_TO_SAVE_MODEL. Then you may evaluate each epoch and choose the best one.

Limitations

Foundation Model Limitations

Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as:

  • Complex reasoning
  • Mathematical and arithmetic tasks
  • Programming and coding challenges

Hallucination of Non-existent Information

OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model.

Safety

OpenChat may sometimes generate harmful content, hate speech, or biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

License

Our OpenChat 3.5 code and models are distributed under the Apache License 2.0.

Models

| Model | Size | Context | Weights | Serving |
|---|---|---|---|---|
| OpenChat 3.5 0106 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 --engine-use-ray --worker-use-ray |
| OpenChat 3.5 1210 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210 --engine-use-ray --worker-use-ray |
| OpenChat 3.5 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray |

Legacy Models

The following models are older versions of OpenChat with inferior performance compared to the latest version, and they will be deprecated in the next release. Note that the OpenChat V1 and V2 series are already deprecated; install version 3.1.x to use V1 and V2 models.

To run the models on multiple GPUs with smaller VRAM, you can enable tensor parallelization, for example, using the --tensor-parallel-size 2 flag.

| Model | Size | Context | Weights | Serving |
|---|---|---|---|---|
| OpenChat 3.2 SUPER | 13B | 4096 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat_v3.2_super --engine-use-ray --worker-use-ray |

Citation

@article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
  author={Wang, Guan and Cheng, Sijie and Zhan, Xianyuan and Li, Xiangang and Song, Sen and Liu, Yang},
  journal={arXiv preprint arXiv:2309.11235},
  year={2023}
}

💌Contact

Project Lead:

  • Guan Wang [imonenext at gmail dot com]
  • Alpay Ariyak [aariyak at wpi dot edu]

Main Contributors:

  • Sijie Cheng [csj23 at mails dot tsinghua dot edu dot cn]
  • Xianyuan Zhan (Tsinghua University)
  • Qiying Yu (Tsinghua University)
  • Changling Liu (GPT Desk Pte. Ltd.)
  • LDJ
  • AutoMeta (Alignment Lab AI)

Sponsors:

Special Thanks:


openchat's Issues

crash in VLLM

Trying to install it in NVIDIA's PyTorch container, I'm getting this while running.
Same issue when trying to install it on Lambda GPU Cloud on an H100 instance (all defaults).

root@0971a018b7ec:/workspace/openchat# python -m ochat.serving.openai_api_server --model_type openchat_v2 --model openchat/openchat_v2_w --engine-use-ray --worker-use-ray
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/openchat/ochat/serving/openai_api_server.py", line 21, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 4, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 7, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 16, in <module>
    from vllm.worker.worker import Worker
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 8, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 9, in <module>
    from vllm.model_executor.models import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.bloom import BloomForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/bloom.py", line 31, in <module>
    from vllm.model_executor.layers.activation import get_act_fn
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/activation.py", line 5, in <module>
    from vllm import activation_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/activation_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Reproduce Training Data

To reproduce the training data, we need ShareGPT HTML files, as stated in the docs:
"The input folder should contain a ShareGPT folder with .html files for each ShareGPT conversation page inside."

It seems that the best ShareGPT source I can find online is here. However, it doesn't give model information and thus we have no way to filter for GPT4 responses.

Any pointers or hints on how to get GPT4 responses would be appreciated!

Openchat3.5 training data

Congrats for the V3.5 release!
May I ask if there are plans to release your finetuning data, just like what you have been always doing with your previous release?

Troubleshooting OpenChat and Non-English Data in the Dataset

Great job! I have a few questions:

  1. I'm using the following script to test OpenChat, but even with the correct prompt template, the output is not very accurate. How should I modify the testing code?

    tokenizer = LlamaTokenizer.from_pretrained(args.model_name_or_path, fast_tokenizer=False)
    model = create_hf_model(AutoModelForCausalLM, args.model_name_or_path, tokenizer, None)
    prompt = "<s>Human: What are all the pairs of natural numbers which sum to 6?<|end_of_turn|>Assistant: "
    inputs = tokenizer(prompt, return_tensors="pt")  # note: this line was missing from the original snippet
    generation_config = GenerationConfig(max_new_tokens=2048, num_beams=1, do_sample=True, temperature=0.7, top_p=0.9)
    generate_ids = model.generate(input_ids=inputs.input_ids, generation_config=generation_config)
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    print(response)

Output: (screenshot omitted)

  2. I noticed that there is a lot of non-English data in the dataset, which obviously wouldn't benefit the performance of alpaca_eval. Why not filter out this data?
    https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/blob/main/openchat.train.text.json

discord invite in readme expired

I wanted to ask about the tokenizer. I quantized the model with the MLC framework and I noticed that the model never generates token 32000 to indicate end of turn, rather it generates the string <|end_of_turn|> as a sequence of tokens. Not what I expected. I don't know if it's a usage issue on my part.

Question about the data source for Openchat-v3.2-super

Hello,

I am reaching out to inquire about the data source used for training Openchat-v3.2-super. Could you please clarify if the dataset openchat/openchat_sharegpt_v3 that was used for its training originates from RyokoAI/ShareGPT52K? Additionally, I would like to know the approximate time frame for the data collection, i.e., up to which date was the data collected?

Thank you for your time and consideration. I look forward to your response.

Can not reproduce the alpaca_eval results of openchat v1

I downloaded the provided OpenChat V1 model on Hugging Face under the model name 'openchat/openchat' and used it to predict the 805 evaluation queries of alpaca-eval, but I can only get a win rate over davinci-003 of around 70, which is far from your reported number. alpaca-eval is verified as having no bugs, since I can reproduce the scores of other LLMs.

FYI, I set the query template to " Human: {query} <|end_of_turn|> Assistant: " and I am using top_p sampling with top_p=1.0, temperature=0.7 and the maximum overall token length to 2048, which are consistent with the configs from this link: https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/openchat-13b/configs.yaml
I also find out that the prompt from https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/openchat-13b/prompt.txt is not consistent with the huggingface model and the training data provided.

Can you kindly explain this performance discrepancy? Or maybe can you provide a script for openchat inference?

{ cause: [Error: AggregateError] }

Hi, I got this error and I can't find information about it:

C:\Users\xxxxxxxxxx\xxxxxxxxx\xxxxxxxx\openchat\openchat-ui>npm run dev

[email protected] dev
next dev

▲ Next.js 13.5.6

✓ Ready in 3.3s
✓ Compiled / in 1387ms (1682 modules)
✓ Compiled in 325ms (1682 modules)
✓ Compiled /api/models in 138ms (70 modules)
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }

Can NOT reproduce the alpaca_eval results of OpenChat V3.1 13B

Hi! Really appreciate your work and open source effort! And openchat is a really great model.
However, I can not reproduce the alpaca_eval results of OpenChat V3.1 13B. I just download the model_outputs.json you uploaded in alpaca_eval repo, and test this using my own gpt-4 API,
however, the winrate is 84.41,
which is not as high as you claimed in the leaderboard.
So can you reveal more details for your evaluation?
Many thanks!

openchat 3.5 log

The logs print token IDs, which are useless and unreadable for humans.

AssertionError: pydantic.dataclasses.dataclass only supports init=False

I tried everything, from using Docker (which gives an error about vLLM) to venv and conda environments. This is the last error I get; do you have any idea what I should do?

File "/home/user/miniconda3/envs/venv/lib/python3.11/site-packages/pydantic/dataclasses.py", line 139, in dataclass
assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
^^^^^^^^^^^^^
AssertionError: pydantic.dataclasses.dataclass only supports init=False

[Question] Can OpenChat be instruct-tuned for further downstream tasks?

Is it possible to further do instruction tuning on OpenChat with domain-specific data? If so, is there any boilerplate that can be used as a starting point? I had earlier fine-tuned Llama 2 on my dataset with the trl-sft script, and tried again with the llama-recipes boilerplate. The time taken by both scripts varied greatly (3x), including the tokenization process and other parameters. The final model, however, didn't perform well at all, giving weird and abrupt answers. Therefore, I'm hoping to get some insights on whether using OpenChat (or Vicuna/WizardLM/Llama2-Chat) might make a difference.

Thank you for your response.

Installation is a mess, instructions are a mess

ChatGPT helped remove all the swear words.

I've encountered difficulties trying to set this up on Ubuntu, MacOS, and Windows. I've noticed some inconsistencies in the instructions, and it seems some tools and libraries might be either too new or outdated. It would be greatly appreciated if these issues could be addressed to make the project more user-friendly for everyone. Thank you.

CUDA out of memory on 8xA100 GPUs

Tried running the sample training script on 8xA100 GPUs, using the sharegpt_v3.2 dataset recommended in your README.

I got this error: CUDA out of memory. Tried to allocate 688.00 MiB (GPU 1; 39.39 GiB total capacity; 37.95 GiB already allocated; 633.12 MiB free; 38.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What setting did you use to train? I tried setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 as a test, still ran into the same out of memory error.

Purpose of the loss weighting

Hi ! Great work :)

I have a question regarding the loss weighting implementing in the repository. Do I understand it correctly that you assign a lower weights to tokens from the longer sequences, so that each sequence contributes more or less the same to the training, irrespective of its length ?

Regards

Detailed Training setting

Hi, may you provide the detailed hyper-paramters when you training llama-13b? For example, how many and what kind of GPUs you use, what are the gradient accumulation steps and batch size per GPU? Moreover, when I directly use your deepspeed config setting to deepspeed-initialize a llama-7b on an 80G A100, the server reports CUDA OOM error.

Looking forward to your reply.

Thank you so much!

Couldn't inference with gpu

When I load the model and perform inference using the Hugging Face framework, I noticed that although the model is loaded into GPU memory, the GPU usage remains at 0% while the CPU usage is at 100%. Here is the code:
def load_openchat_model(model_path: str, device_map):
    model = LlamaForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=False,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    model.to("cuda:0")
    model.eval()
    return model

inference code:
def infer_hf(input_text: str, model, tokenizer, device):
    generation_config = dict(
        temperature=0.8,
        top_k=40,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        repetition_penalty=1.3,
        max_new_tokens=400,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    with torch.inference_mode():
        input_ids = tokenizer(input_text, return_tensors="pt")
        generation_output = model.generate(
            input_ids=input_ids["input_ids"].to(device),
            attention_mask=input_ids["attention_mask"].to(device),
            **generation_config
        )
    s = generation_output[0]
    output = tokenizer.decode(s)
    print(output)
I set device to "cuda:0"

Which flash attention version is being used?

I tried running the training script and got flash_attn_unpadded_func is not defined. Doing some digging, apparently it's deprecated in 2.0: https://github.com/Dao-AILab/flash-attention/blob/d30f2e1cd50185c98ed88c0684b4a603f15bee37/README.md?plain=1#L127

Is upgrading flash-attn to 2.0 trivial (simply renaming some functions)? I'm not familiar with this project, so I can't say. If it's difficult, perhaps adding documentation somewhere specifying that flash-attn 1.x is being used would be helpful for newcomers.

What is the difference between these versions of OpenChat?

Thank you for your amazing work! I have some questions below:

In alpaca eval leaderboard, there are 5 versions: OpenChatV3.1, OpenChatV2-W, OpenChatV2, OpenChat, OpenChat8192.
What is the difference?
And what is the datasets used?
What is the difference between openchat_shareGPT_v3 and openchat_shareGPT4?
Which datasets do you use for OpenChatV3.1?

Looking forward to your reply.

openchat3.5 training data formatting

Congrats to the authors on the great achievement!

Trying to understand your great work a bit more. In the inference examples, there are prompts like GPT4 Correct User, Code User. What are other conditional prompts used in training? What does Correct mean here? Thanks!

Contact

Is there any way to contact you? I want to work with you and I have a proposal.

docker gets undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/ochat/serving/openai_api_server.py", line 29, in <module>
    from ochat.config.model_config import MODEL_CONFIG_MAP
  File "/ochat/config/model_config.py", line 7, in <module>
    import ochat.models
  File "/ochat/models/__init__.py", line 1, in <module>
    from ochat.models.unpadded_llama import LlamaForCausalLM
  File "/ochat/models/unpadded_llama.py", line 31, in <module>
    from transformers.modeling_utils import PreTrainedModel
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 88, in <module>
    from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 136, in <module>
    from .launch import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 33, in <module>
    from ..utils.other import is_port_in_use, merge_dicts
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 27, in <module>
    from .transformer_engine import convert_model
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
    import transformer_engine.pytorch as te
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module.py", line 20, in <module>
    import transformer_engine_extensions as tex
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

It seems like there is some sort of CUDA version mismatch?
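The mangled symbol demangles to a `torch::jit` function, which usually means `transformer_engine_extensions.so` was compiled against a different PyTorch build than the one installed in the image. As a first diagnostic step (a minimal sketch, assuming it is run inside the container), check that both packages are present before comparing their build versions:

```python
# Check which packages in the failing import chain are importable; for the
# extension .so to load, transformer_engine must have been built against the
# same torch build that is installed, otherwise an undefined-symbol error
# like the one above is raised.
from importlib.util import find_spec

status = {pkg: find_spec(pkg) is not None for pkg in ("torch", "transformer_engine")}
for pkg, ok in status.items():
    print(pkg, "importable" if ok else "missing")
```

If both are importable, the next step would be comparing `torch.__version__` (and its CUDA build) against the version transformer_engine was compiled for.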

Issue with `pip3 install ochat`

I'm using a Windows machine, and I've been following the instructions outlined in this answer: #41 (comment)

Everything went smoothly until I reached the step of running `pip3 install ochat`, where I encountered an error.

Here's the error message I'm getting:

error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "C:\Users\hasans\AppData\Local\Temp\pip-install-22ncvin1\deepspeed_d1d81ae59ce344d3a308adf94757a6b8\setup.py", line 165, in <module>
        File "C:\Users\hasans\AppData\Local\Temp\pip-install-22ncvin1\deepspeed_d1d81ae59ce344d3a308adf94757a6b8\setup.py", line 51, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  One can disable async_io with DS_BUILD_AIO=0
       [ERROR]  Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
PS C:\Users\hasans\Documents\openchat> pip install libaio-devel
ERROR: Could not find a version that satisfies the requirement libaio-devel (from versions: none)
ERROR: No matching distribution found for libaio-devel

Could someone please guide me on how to resolve this issue? Your assistance would be greatly appreciated!

Thank you for your time and effort in maintaining this amazing project.
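The warning in the log above already names the escape hatch: `async_io` needs libaio, which is a Linux-only system library, so there is no pip package for it (hence the `libaio-devel` failure). A possible workaround (a sketch grounded only in the warning's own hint, not an official fix) is to set `DS_BUILD_AIO=0` before installing, so DeepSpeed skips pre-compiling the async_io op:

```python
# Disable DeepSpeed's async_io op via the env var mentioned in the warning;
# the actual pip invocation is left commented out here and would be run with
# this environment. In PowerShell the equivalent is:  $env:DS_BUILD_AIO = "0"
import os

env = dict(os.environ, DS_BUILD_AIO="0")
# subprocess.run([sys.executable, "-m", "pip", "install", "ochat"], env=env)
print("DS_BUILD_AIO =", env["DS_BUILD_AIO"])
```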

Can I use system prompt when training?

Hi @imoneoi, I want my assistant to have different emotions, or to act as a particular persona, based on a system prompt. When training, can I use data samples of the form {system_prompt} Human: {human_message} <|end_of_turn|>Assistant: {assistant_message} ..., or are such prompts folded into {human_message} as in the data samples you trained on? Thank you!
Example: <s>Human: Act as SEO expert. I want you to create the best meta descriptions among my competitors.\n\nHere are the list of our competitor's meta descriptions. \n\n\nHere is my meta description. Revise it. I don't want NFT in my description. I do not offer any staking service. \n\nBuy and sell the world's largest selection of 10,000+ Cryptocurrencies<|end_of_turn|>Assistant: ....
Convert to <s>Act as SEO expert. I want you to create the best meta descriptions among my competitors <|end_of_turn|>Human: Here are the list of our competitor's meta descriptions. \n\n\nHere is my meta description. Revise it. I don't want NFT in my description. I do not offer any staking service. \n\nBuy and sell the world's largest selection of 10,000+ Cryptocurrencies<|end_of_turn|>Assistant: ....
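For reference, the second format above can be sketched as a small builder (the tokens follow the example in this question, and the BOS `<s>` is omitted; whether training actually used a separate system turn is exactly what is being asked, so treat this as an illustration, not the confirmed training format):

```python
# Assemble a conversation string with an optional leading system prompt,
# using the <|end_of_turn|>-separated layout shown in the example above.
EOT = "<|end_of_turn|>"

def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [system + EOT] if system else []
    for human, assistant in turns:
        parts.append(f"Human: {human}{EOT}Assistant: {assistant}{EOT}")
    return "".join(parts)

prompt = build_prompt("Act as SEO expert.", [("Revise my meta description.", "Sure...")])
print(prompt)
```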

An error in training

I have installed flash-attn using `pip3 install --no-build-isolation "flash-attn<2"`, but this error emerges:

File "openchat/ochat/models/unpadded_llama.py", line 184, in forward
    attn_output = flash_attn_varlen_func(
                  ^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'flash_attn_varlen_func' is not defined
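A likely explanation (my assumption, not confirmed by the maintainers): `flash_attn_varlen_func` is the flash-attn 2.x name, while 1.x exposed `flash_attn_unpadded_func`, and unpadded_llama.py apparently guards the import, which is why the failure surfaces later as a NameError rather than an ImportError at startup. A quick probe of what the installed flash-attn actually provides:

```python
# Probe which flash-attn attention entry point is available; flash-attn 2.x
# renamed flash_attn_unpadded_func to flash_attn_varlen_func.
try:
    from flash_attn.flash_attn_interface import flash_attn_varlen_func  # v2 name
    found = "flash_attn_varlen_func (flash-attn >= 2)"
except ImportError:
    try:
        from flash_attn.flash_attn_interface import flash_attn_unpadded_func  # v1 name
        found = "flash_attn_unpadded_func (flash-attn < 2)"
    except ImportError:
        found = "flash-attn not importable"
print(found)
```

If the probe reports the v1 name, the installed `flash-attn<2` predates the rename and a flash-attn 2.x build (or the repo revision matching v1) would be needed.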

Error Installing requirements.txt: ModuleNotFoundError for 'torch'

Hi, I'm trying to install your requirements.txt but I'm getting this error message:

Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /root/miniconda3/bin/python /root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpt4c_a3i6
cwd: /tmp/pip-install-_op0fkvy/flash-attn_32ecdb534ca149cebac1b8d1956665eb
Complete output (15 lines):
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in <module>
main()
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 13, in <module>
ModuleNotFoundError: No module named 'torch'

It seems to be unable to find the module 'torch'. However, I have verified that torch is installed in my environment, with version 1.11.0+cu113 and torchvision version 0.12.0+cu113.

I have also tried to install the requirements on a different machine where torch version 2.0.0+cu117 is installed, but the error persists.

Any assistance to resolve this issue would be greatly appreciated. Thank you.
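A likely cause (an assumption from the traceback, not a confirmed diagnosis): flash-attn's setup.py imports torch at build time, and pip's PEP 517 build isolation compiles the wheel in a fresh environment where the torch installed in your conda env is invisible. The `--no-build-isolation` flag, which the repo's own flash-attn install command uses, builds against the current environment instead. A sketch of assembling that command:

```python
# Build the install command that reuses the current environment (where torch
# is already importable) instead of an isolated PEP 517 build environment.
import sys

cmd = [sys.executable, "-m", "pip", "install", "--no-build-isolation", "flash-attn<2"]
print("pip " + " ".join(cmd[3:]))
```

Torch must already be importable in the active environment before running it, since the build now sees exactly that environment.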

OpenCoderPlus outputs will not stop

I tried to launch OpenCoderPlus with the latest code of this repo and vLLM:

python -m ochat.serving.openai_api_server --model-type opencoder --model openchat/opencoderplus

It can work, but the output never stops until hitting the max_tokens limit, even if I pass the stop parameter:

requests.post(
  "http://localhost:18888/v1/chat/completions",
  json={
    "model": "opencoder",
    "messages": [{"role": "user", "content": "Write a bubble sort."}],
    "stop": ["<|end_of_turn|>"]
  }
) 

I referred to OpenCoderPlus's training data; it seems that this model was trained on data containing the <|end_of_turn|> token.

So does anyone know how to stop this model's outputs? Any help will be appreciated.
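Until server-side stop handling works, one possible workaround (my sketch, not an official API) is to truncate the returned completion client-side at the first end-of-turn marker:

```python
# Cut the generated text at the first <|end_of_turn|> marker, discarding the
# runaway continuation after it.
EOT = "<|end_of_turn|>"

def trim_at_eot(text: str) -> str:
    return text.split(EOT, 1)[0]

sample = "def bubble_sort(a):\n    ...<|end_of_turn|>Human: something else"
print(trim_at_eot(sample))
```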

openchat_v3.2_super deployment to SageMaker doesn't work

Hi team,

I would like to deploy the new model to AWS SageMaker with the code below, but I'm getting RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist. It seems something is missing in the model index; at least, I couldn't find that weight in https://huggingface.co/openchat/openchat_v3.2_super/blob/main/pytorch_model.bin.index.json

Thanks in advance for your help!

Here is the deploy.py

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='role-name')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'openchat/openchat_v3.2_super',
    'SM_NUM_GPUS': json.dumps(4)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="openchat-v3-2-super",
    container_startup_health_check_timeout=600,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})

LIMA pipeline

Do you have the pipeline script with which you reduced the 90K dataset to 6K, based on LIMA?

CUDA out of memory error. How can I split the model across multiple GPUs?

When loading the checkpoint, this comes up:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 15.90 GiB total capacity; 15.30 GiB already allocated; 31.81 MiB free; 15.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
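The error message itself suggests one knob: setting `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation (the value 128 below is an arbitrary example, and this must be set before the first CUDA allocation). Note this only helps fragmentation on one GPU; it does not shard the model:

```python
# Configure the CUDA caching allocator before torch initializes CUDA; this
# mitigates fragmentation but does not split the model across GPUs.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

For actually splitting the weights across multiple GPUs when loading with transformers, `device_map="auto"` (via accelerate) shards the layers across visible devices; whether the ochat serving code exposes an equivalent option is a separate question.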
