
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

💻Online Demo | 🤗Huggingface | 📃Paper | 💭Discord

🏆 The Overall Best Performing Open Source 7B Model 🏆
🤖 Outperforms ChatGPT (March) and Grok-1 🤖

  • OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning.
  • Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model which can be run on a consumer GPU (e.g. RTX 3090).
  • Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.


✨ News


🏷️ Benchmarks

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenChat-3.5-0106 | 7B | 64.5 | 7.8 | 71.3 | 51.5 | 49.1 | 61.0 | 65.8 | 77.4 | 62.2 |
| OpenChat-3.5-1210 | 7B | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | 61.8 | 65.3 | 77.3 | 61.8 |
| OpenChat-3.5 | 7B | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
| ChatGPT (March)* | ???B | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 |
| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
| OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
| Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
| Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
| Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 73.2 | 49.7 | 41.7 | 62.3 | 63.7 | 82.3 | 41.4 |
| | | | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |

Evaluation details

*: ChatGPT (March) results are from the GPT-4 Technical Report, Chain-of-Thought Hub, and our evaluation.

^: Zephyr-β often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data.

**: Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories.

All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions below.

Reproducing benchmarks

Reasoning and Coding:

Note: Please run the following commands at the base directory of this repository.

python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat-3.5-0106 --eval_sets coding fs_cothub/bbh fs_cothub/mmlu zs/agieval zs/bbh_mc_orca zs/truthfulqa_orca
python ochat/evaluation/view_results.py
python ochat/evaluation/convert_to_evalplus.py

All HumanEval code samples are then placed in ochat/evaluation/evalplus_codegen. Use the following command to evaluate an individual code sample file named samples.jsonl, using Docker as a sandbox.

docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples samples.jsonl

Mathematical Reasoning:

Note: Please run the following commands at the base directory of this repository.

python -m ochat.evaluation.run_eval --condition "Math Correct" --model openchat/openchat-3.5-0106 --eval_sets fs_cothub/gsm8k zs/math
python ochat/evaluation/view_results.py

MT-Bench:

Please first launch a local API server, then download FastChat and run the following commands.

Note: Due to non-zero temperature and GPT-4 API changes over time, there might be variations in the results.

cd fastchat/llm_judge
python gen_api_answer.py --model openchat-3.5-0106 --max-tokens 4096 --parallel 128 --openai-api-base http://localhost:18888/v1
python gen_judgment.py --model-list openchat-3.5-0106 --parallel 8 --mode single

🎇 Comparison with X.AI Grok

🔥 OpenChat-3.5-0106 (7B) now outperforms Grok-0 (33B) on all 4 benchmarks and Grok-1 (???B) on average and 3/4 benchmarks.

| Model | License | # Params | Average | MMLU | HumanEval | MATH | GSM8K |
|---|---|---|---|---|---|---|---|
| OpenChat-3.5-0106 | Apache-2.0 | 7B | 61.0 | 65.8 | 71.3 | 29.3 | 77.4 |
| OpenChat-3.5-1210 | Apache-2.0 | 7B | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
| OpenChat-3.5 | Apache-2.0 | 7B | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
| Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
| Grok-1 | Proprietary | ???B | 55.8 | 73.0 | 63.2 | 23.9 | 62.9 |

⬇️ Installation

Note

PyTorch is required to run OpenChat.

pip

pip3 install ochat

Important

If you encounter package compatibility issues with pip, try the conda method below or check this issue.

conda

conda create -y --name openchat python=3.11
conda activate openchat

pip3 install ochat

Windows (WSL 1.x, Ubuntu-22.04)

sudo apt update
sudo apt install build-essential

sudo apt install -y curl
curl -o miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash miniconda.sh

# Restart WSL terminal if the following conda command does not work

conda create -y --name openchat python=3.11
conda activate openchat

pip3 install ochat

From source

Clone this repository and install OpenChat from source in editable mode:
git clone https://github.com/imoneoi/openchat
cd openchat

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e .  # Editable mode, you can make changes in this cloned repo

🚀 Deploying API server

⚡ Our API server is ready for production use and compatible with the OpenAI API protocol. It is highly optimized with vLLM and can dynamically batch requests.

📎 Note: For 20 series or older GPUs that do not support bfloat16, add --dtype float16 to the server args.

For a single GPU (e.g. RTX 3090, 4090)

python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106

For multiple GPUs (tensor parallel)

# N is the number of tensor parallel GPUs
python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 --engine-use-ray --worker-use-ray --tensor-parallel-size N

Use -h to see all available settings:

python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 -h

Deploy as an online service

If you want to deploy the server as an online service, you can use --api-keys sk-KEY1 sk-KEY2 ... to specify allowed API keys and --disable-log-requests --disable-log-stats --log-file openchat.log for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server.

Request example

Once started, the server listens at localhost:18888 for requests and is compatible with the OpenAI ChatCompletion API specifications.

💡 Default Mode (GPT4 Correct): Best for coding, chat and general tasks

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'

🧮 Mathematical Reasoning Mode: Tailored for solving math problems

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "condition": "Math Correct",
    "messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
  }'

🌐 Web UI - OpenChat-UI

After launching the API server, OpenChat provides a user interface that is easy to interact with. Click here to check out the Web UI.

🤗 Inference with Transformers

Warning

We recommend using our optimized API server for deployment; inference with Transformers is slower.

💡 Default Mode (GPT4 Correct): Best for coding, chat and general tasks

GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:

🧮 Mathematical Reasoning Mode: Tailored for solving math problems

Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:

⚠️ Notice: Remember to set <|end_of_turn|> as the end-of-generation token.

The default (GPT4 Correct) template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template.
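For manual prompting, the template shown above can be reproduced with a small helper. This is an illustrative sketch (build_prompt is not part of the OpenChat library; in practice the integrated tokenizer.chat_template does this for you):

```python
def build_prompt(messages, condition="GPT4 Correct"):
    """Format a list of {"role", "content"} dicts into the OpenChat
    conversation template, ending with an assistant tag so the model
    continues from there."""
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{condition} {role}: {m['content']}<|end_of_turn|>")
    parts.append(f"{condition} Assistant:")
    return "".join(parts)

# Reproduces the Default Mode example above
print(build_prompt([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]))
```

Passing condition="Math Correct" produces the Mathematical Reasoning Mode template instead.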

🛠️ Training

The OpenChat training system utilizes padding-free training and the Multipack Sampler, achieving a 3-10x speedup compared to conventional padded training.
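The idea behind packing can be illustrated with a simple first-fit-decreasing bin packer. This is a simplified sketch of the general technique, not the actual Multipack Sampler implementation in ochat:

```python
def pack_sequences(lengths, batch_max_len):
    """Greedy first-fit-decreasing: pack variable-length sequences into
    as few fixed-capacity batches as possible, so almost no batch
    capacity is wasted on padding tokens."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins = []  # each bin: [tokens_used, list_of_sequence_indices]
    for i in order:
        for b in bins:
            if b[0] + lengths[i] <= batch_max_len:
                b[0] += lengths[i]
                b[1].append(i)
                break
        else:
            bins.append([lengths[i], [i]])
    return [indices for _, indices in bins]

# Four sequences packed into two full batches of capacity 50
print(pack_sequences([10, 20, 30, 40], 50))
```

With padded batching, the same four sequences would each be padded to the longest length in their batch; packing instead fills each batch close to capacity.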

Choose a base model

OpenChat supports Llama 2 and Mistral models. Please first choose a base model to fit your needs. Each base model has a corresponding weight repo, model type, and recommended batch size as listed below; fill these into BASE_REPO, MODEL_TYPE, and BATCH_SIZE in the following instructions.

| Base Model | Size | Weights (with EOT token) | Model Type | Recommended Batch Size per GPU (8xA100 80GB) |
|---|---|---|---|---|
| Mistral | 7B | imone/Mistral_7B_with_EOT_token | openchat_v3.2_mistral | 77824 |
| Llama 2 | 7B | imone/LLaMA2_7B_with_EOT_token | openchat_v3.2 | 77824 |
| Llama 2 | 13B | imone/Llama2_13B_with_EOT_token | openchat_v3.2 | 36864 |

Note: The OpenChat conversation template requires an <|end_of_turn|> special token, and the base model specified must include it. Our provided weights are the original base weights with this token added. If you want to add the token manually, use convert_llama_weights_to_hf_add_tokens.py or mistral_add_tokens.py in the scripts directory.

Installing DeepSpeed and Flash Attention

First, ensure that the CUDA nvcc compiler is available in your environment. If it is not, install the CUDA toolkit that matches the version used by PyTorch.

Next, install the build dependencies:

pip install packaging ninja

Finally, install the packages:

pip install deepspeed flash-attn

Preparing Your Data

To utilize the OpenChat trainer, prepare your SFT data into a JSON Lines format where each line corresponds to a Conversation object:

class Message(BaseModel):
    role: str     # Must be "user" or "assistant"
    content: str  # Message content
    weight: Optional[float] = None  # Loss weight for this message. Typically 0 for user and 1 for assistant to supervise assistant's responses only


class Conversation(BaseModel):
    items: List[Message]  # All messages within the conversation
    condition: str = ""  # C-RLFT condition, can be any string or empty.
    system: str = ""  # System message for this conversation

For basic SFT, assign weight as 0 for human messages and 1 for assistant responses.

SFT example:

{"items":[{"role":"user","content":"Hello","weight":0.0},{"role":"assistant","content":"Hi","weight":1.0},{"role":"user","content":"How are you today?","weight":0.0},{"role":"assistant","content":"I'm fine.","weight":1.0}],"system":""}
{"items":[{"role":"user","content":"Who are you?","weight":0.0},{"role":"assistant","content":"I'm OpenChat.","weight":1.0}],"system":"You are a helpful assistant named OpenChat."}

For C-RLFT, condition should be set as the class the conversation belongs to (e.g. GPT3 or GPT4). The weight is assigned as 0 for human messages and w for assistant responses, where w is the weight of the class (e.g. 0.1 for GPT3 and 1 for GPT4, as found in our C-RLFT paper).

C-RLFT example:

{"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"C-RLFT is a method for improving open-source LLMs with mixed-quality data.","weight":1.0}],"condition":"GPT4","system":""}
{"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
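Records in this format can be generated programmatically. The following is an illustrative sketch: the make_conversation helper is not part of the ochat library, and the class-weight mapping simply mirrors the example above (0.1 for GPT3, 1 for GPT4):

```python
import json

# Illustrative class weights, matching the C-RLFT example above
CLASS_WEIGHTS = {"GPT4": 1.0, "GPT3": 0.1}

def make_conversation(turns, condition="", system=""):
    """Build one JSONL record: user turns get weight 0, assistant turns
    get the weight of the conversation's quality class."""
    w = CLASS_WEIGHTS.get(condition, 1.0)
    items = [
        {"role": role,
         "content": content,
         "weight": 0.0 if role == "user" else w}
        for role, content in turns
    ]
    return {"items": items, "condition": condition, "system": system}

# One line of the C-RLFT training file
line = json.dumps(make_conversation(
    [("user", "What is C-RLFT?"), ("assistant", "I don't know.")],
    condition="GPT3"))
print(line)
```

Writing one such JSON line per conversation yields a data.jsonl file ready for the pre-tokenization step below.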

Pre-tokenizing the Dataset

You'll then need to pre-tokenize the dataset with the following command (specify a filename as PRETOKENIZED_DATA_OUTPUT_PATH to store the pre-tokenized dataset):

python -m ochat.data.generate_dataset --model-type MODEL_TYPE --model-path BASE_REPO --in-files data.jsonl --out-prefix PRETOKENIZED_DATA_OUTPUT_PATH

Launching the OpenChat Trainer

You can now launch the OpenChat trainer using the command below.

  • The 13B model requires eight A/H100s with 80GB VRAM.
  • The 7B model can be trained with four A/H100s (80GB VRAM) or eight A/H100s (40GB VRAM).

For hyperparameters, start with the recommended batch size. If OOM occurs, set it to the largest multiple of 2048 that fits in VRAM. The other hyperparameters have been carefully selected as defaults, and the learning rate is automatically determined by the inverse square-root rule.
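The "largest multiple of 2048" advice amounts to a simple rounding step. This helper is illustrative only (not part of the trainer); max_tokens_that_fit is whatever batch length you empirically found to fit in VRAM:

```python
def usable_batch_max_len(max_tokens_that_fit, multiple=2048):
    """Round the largest batch length that fits in VRAM down to a
    multiple of 2048, per the recommendation above."""
    return (max_tokens_that_fit // multiple) * multiple

# e.g. if 77900 tokens fit, use the recommended Mistral 7B batch size 77824
print(usable_batch_max_len(77900))
```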

Training Commands (click to expand)
NUM_GPUS=8

deepspeed --num_gpus=$NUM_GPUS --module ochat.training_deepspeed.train \
          --model_path BASE_REPO \
          --data_prefix PRETOKENIZED_DATA_OUTPUT_PATH \
          --save_path PATH_TO_SAVE_MODEL \
          --batch_max_len BATCH_SIZE \
          --epochs 5 \
          --save_every 1 \
          --deepspeed \
          --deepspeed_config ochat/training_deepspeed/deepspeed_config.json

You can find checkpoints of all epochs in PATH_TO_SAVE_MODEL. Then you may evaluate each epoch and choose the best one.

Limitations

Foundation Model Limitations

Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as:

  • Complex reasoning
  • Mathematical and arithmetic tasks
  • Programming and coding challenges

Hallucination of Non-existent Information

OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model.

Safety

OpenChat may sometimes generate harmful content, hate speech, or biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

License

Our OpenChat 3.5 code and models are distributed under the Apache License 2.0.

Models

| Model | Size | Context | Weights | Serving |
|---|---|---|---|---|
| OpenChat 3.5 0106 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 --engine-use-ray --worker-use-ray |
| OpenChat 3.5 1210 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210 --engine-use-ray --worker-use-ray |
| OpenChat 3.5 | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray |

Legacy Models

The following models are older versions of OpenChat with inferior performance compared to the latest version, and they will be deprecated in the next release. Note that the OpenChat V1 and V2 series are already deprecated; install version 3.1.x to use V1 and V2 models.

To run the models on multiple GPUs with smaller VRAM, you can enable tensor parallelization, for example, using the --tensor-parallel-size 2 flag.

| Model | Size | Context | Weights | Serving |
|---|---|---|---|---|
| OpenChat 3.2 SUPER | 13B | 4096 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat_v3.2_super --engine-use-ray --worker-use-ray |

Citation

@article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
  author={Wang, Guan and Cheng, Sijie and Zhan, Xianyuan and Li, Xiangang and Song, Sen and Liu, Yang},
  journal={arXiv preprint arXiv:2309.11235},
  year={2023}
}

💌Contact

Project Lead:

  • Guan Wang [imonenext at gmail dot com]
  • Alpay Ariyak [aariyak at wpi dot edu]

Main Contributors:

  • Sijie Cheng [csj23 at mails dot tsinghua dot edu dot cn]
  • Xianyuan Zhan (Tsinghua University)
  • Qiying Yu (Tsinghua University)
  • Changling Liu (GPT Desk Pte. Ltd.)
  • LDJ
  • AutoMeta (Alignment Lab AI)

Sponsors:

Special Thanks:


openchat's Issues

crash in VLLM

Trying to install it in NVIDIA's PyTorch container, I'm getting this while running.
Same issue when trying to install it on Lambda GPU Cloud on an H100 instance (all defaults).

root@0971a018b7ec:/workspace/openchat# python -m ochat.serving.openai_api_server --model_type openchat_v2 --model openchat/openchat_v2_w --engine-use-ray --worker-use-ray
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/openchat/ochat/serving/openai_api_server.py", line 21, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 4, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 7, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 16, in <module>
    from vllm.worker.worker import Worker
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 8, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 9, in <module>
    from vllm.model_executor.models import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.bloom import BloomForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/bloom.py", line 31, in <module>
    from vllm.model_executor.layers.activation import get_act_fn
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/activation.py", line 5, in <module>
    from vllm import activation_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/activation_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Reproduce Training Data

To reproduce the training data, we need ShareGPT HTML files, as stated in the docs:
"The input folder should contain a ShareGPT folder with .html files for each ShareGPT conversation page inside."

It seems that the best ShareGPT source I can find online is here. However, it doesn't give model information and thus we have no way to filter for GPT4 responses.

Any pointers or hints on how to get GPT4 responses would be appreciated!

Openchat3.5 training data

Congrats for the V3.5 release!
May I ask if there are plans to release your finetuning data, just like what you have been always doing with your previous release?

Troubleshooting OpenChat and Non-English Data in the Dataset

Great job! I have a few questions:

  1. I'm using the following script to test OpenChat, but even with the correct prompt template, the output is not very accurate. How should I modify the testing code?

    tokenizer = LlamaTokenizer.from_pretrained(args.model_name_or_path, fast_tokenizer=False)
    model = create_hf_model(AutoModelForCausalLM, args.model_name_or_path, tokenizer, None)
    prompt = "<s>Human: What are all the pairs of natural numbers which sum to 6?<|end_of_turn|>Assistant: "
    inputs = tokenizer(prompt, return_tensors="pt")  # note: this line was missing from the original snippet
    generation_config = GenerationConfig(max_new_tokens=2048, num_beams=1, do_sample=True, temperature=0.7, top_p=0.9)
    generate_ids = model.generate(input_ids=inputs.input_ids, generation_config=generation_config)
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    print(response)

Output: (screenshot omitted)

  2. I noticed that there is a lot of non-English data in the dataset, which obviously wouldn't benefit the performance of alpaca_eval. Why not filter out this data?
    https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/blob/main/openchat.train.text.json

discord invite in readme expired

I wanted to ask about the tokenizer. I quantized the model with the MLC framework and I noticed that the model never generates token 32000 to indicate end of turn, rather it generates the string <|end_of_turn|> as a sequence of tokens. Not what I expected. I don't know if it's a usage issue on my part.

Question about the data source for Openchat-v3.2-super

Hello,

I am reaching out to inquire about the data source used for training Openchat-v3.2-super. Could you please clarify if the dataset openchat/openchat_sharegpt_v3 that was used for its training originates from RyokoAI/ShareGPT52K? Additionally, I would like to know the approximate time frame for the data collection, i.e., up to which date was the data collected?

Thank you for your time and consideration. I look forward to your response.

Can not reproduce the alpaca_eval results of openchat v1

I downloaded the provided OpenChat V1 model on Hugging Face under the model name 'openchat/openchat' and used it to predict the 805 evaluation queries of alpaca-eval, but I can only get a win rate over davinci-003 of around 70, which is far from your reported number. alpaca-eval is verified as having no bugs, since I can reproduce the scores of other LLMs.

FYI, I set the query template to " Human: {query} <|end_of_turn|> Assistant: " and I am using top_p sampling with top_p=1.0, temperature=0.7 and the maximum overall token length to 2048, which are consistent with the configs from this link: https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/openchat-13b/configs.yaml
I also find out that the prompt from https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/openchat-13b/prompt.txt is not consistent with the huggingface model and the training data provided.

Can you kindly explain this performance discrepancy? Or maybe can you provide a script for openchat inference?

{ cause: [Error: AggregateError] }

Hi, I got this error and I can't find information about it:

C:\Users\xxxxxxxxxx\xxxxxxxxx\xxxxxxxx\openchat\openchat-ui>npm run dev

[email protected] dev
next dev

▲ Next.js 13.5.6

✓ Ready in 3.3s
✓ Compiled / in 1387ms (1682 modules)
✓ Compiled in 325ms (1682 modules)
✓ Compiled /api/models in 138ms (70 modules)
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }
[TypeError: fetch failed] { cause: [Error: AggregateError] }

Can NOT reproduce the alpaca_eval results of OpenChat V3.1 13B

Hi! Really appreciate your work and open source effort! And openchat is a really great model.
However, I can not reproduce the alpaca_eval results of OpenChat V3.1 13B. I just download the model_outputs.json you uploaded in alpaca_eval repo, and test this using my own gpt-4 API,
however, the winrate is 84.41,
which is not as high as you claimed in the leaderboard.
So can you reveal more details for your evaluation?
Many thanks!

openchat 3.5 log

The logs print token IDs, which are useless and unreadable for humans.

AssertionError: pydantic.dataclasses.dataclass only supports init=False

I tried everything, from using Docker (which gives an error about vLLM) to venv and conda environments. This is the last error I get; do you have any idea what I should do?

File "/home/user/miniconda3/envs/venv/lib/python3.11/site-packages/pydantic/dataclasses.py", line 139, in dataclass
assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
^^^^^^^^^^^^^
AssertionError: pydantic.dataclasses.dataclass only supports init=False

[Question] Can OpenChat be instruct-tuned for further downstream tasks?

Is it possible to further do instruction tuning on OpenChat with domain-specific data? If so, is there any boilerplate that can be used as a starting point? I had earlier fine-tuned Llama 2 on my dataset with the trl-sft script, and tried again with the llama-recipes boilerplate. The time taken by both scripts varied greatly (3x), including the tokenization process and other parameters. The final model, however, didn't perform well at all, giving weird and abrupt answers. Therefore, I'm hoping to get some insights on whether using OpenChat (or Vicuna/WizardLM/Llama2-Chat) might make a difference.

Thank you for your response.

Installation is a mess, instructions are a mess

ChatGPT helped remove all the swear words.

I've encountered difficulties trying to set this up on Ubuntu, MacOS, and Windows. I've noticed some inconsistencies in the instructions, and it seems some tools and libraries might be either too new or outdated. It would be greatly appreciated if these issues could be addressed to make the project more user-friendly for everyone. Thank you.

CUDA out of memory on 8xA100 GPUs

Tried running the sample training script on 8xA100 GPUs, using the sharegpt_v3.2 dataset recommended in your README.

I got this error: CUDA out of memory. Tried to allocate 688.00 MiB (GPU 1; 39.39 GiB total capacity; 37.95 GiB already allocated; 633.12 MiB free; 38.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What setting did you use to train? I tried setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 as a test, still ran into the same out of memory error.

Purpose of the loss weighting

Hi ! Great work :)

I have a question regarding the loss weighting implementing in the repository. Do I understand it correctly that you assign a lower weights to tokens from the longer sequences, so that each sequence contributes more or less the same to the training, irrespective of its length ?

Regards

Detailed Training setting

Hi, may you provide the detailed hyper-paramters when you training llama-13b? For example, how many and what kind of GPUs you use, what are the gradient accumulation steps and batch size per GPU? Moreover, when I directly use your deepspeed config setting to deepspeed-initialize a llama-7b on an 80G A100, the server reports CUDA OOM error.

Looking forward to your reply.

Thank you so much!

Couldn't inference with gpu

When I load the model and perform inference using the Hugging Face framework, I noticed that although the model is loaded into GPU memory, the GPU usage remains at 0% while the CPU usage is at 100%. Here is the code:
def load_openchat_model(model_path: str, device_map):
    model = LlamaForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=False,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    model.to("cuda:0")
    model.eval()
    return model

inference code:
def infer_hf(input_text: str, model, tokenizer, device):
    generation_config = dict(
        temperature=0.8,
        top_k=40,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        repetition_penalty=1.3,
        max_new_tokens=400,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    with torch.inference_mode():
        input_ids = tokenizer(input_text, return_tensors="pt")
        generation_output = model.generate(
            input_ids=input_ids["input_ids"].to(device),
            attention_mask=input_ids["attention_mask"].to(device),
            **generation_config
        )
    s = generation_output[0]
    output = tokenizer.decode(s)
    print(output)
I set device to "cuda:0"

Which flash attention version is being used?

I tried running the training script and got flash_attn_unpadded_func is not defined. Doing some digging, apparently it's deprecated in 2.0: https://github.com/Dao-AILab/flash-attention/blob/d30f2e1cd50185c98ed88c0684b4a603f15bee37/README.md?plain=1#L127

Is upgrading flash-attn to 2.0 trivial (simply renaming some functions)? I'm not familiar with this project, so I can't say. If it's difficult, perhaps adding documentation somewhere specifying that flash-attn 1.x is being used would be helpful for newcomers.

What is the difference between these versions of OpenChat?

Thank you for your amazing work! I have some questions below:

In alpaca eval leaderboard, there are 5 versions: OpenChatV3.1, OpenChatV2-W, OpenChatV2, OpenChat, OpenChat8192.
What is the difference?
And what is the datasets used?
What is the difference between openchat_shareGPT_v3 and openchat_shareGPT4?
Which datasets do you use for OpenChatV3.1?

Looking forward to your reply.

openchat3.5 training data formatting

Congrats to the authors on the great achievement!

Trying to understand your great work a bit more. In the inference examples, there are prompts like GPT4 Correct User, Code User. What are other conditional prompts used in training? What does Correct mean here? Thanks!

Contact

Is there any way to contact you? I want to work with you and I have a proposal.

docker gets undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/ochat/serving/openai_api_server.py", line 29, in <module>
    from ochat.config.model_config import MODEL_CONFIG_MAP
  File "/ochat/config/model_config.py", line 7, in <module>
    import ochat.models
  File "/ochat/models/__init__.py", line 1, in <module>
    from ochat.models.unpadded_llama import LlamaForCausalLM
  File "/ochat/models/unpadded_llama.py", line 31, in <module>
    from transformers.modeling_utils import PreTrainedModel
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 88, in <module>
    from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 136, in <module>
    from .launch import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 33, in <module>
    from ..utils.other import is_port_in_use, merge_dicts
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 27, in <module>
    from .transformer_engine import convert_model
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
    import transformer_engine.pytorch as te
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module.py", line 20, in <module>
    import transformer_engine_extensions as tex
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

It seems like there is some sort of CUDA version mismatch?
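The mangled symbol demangles to a `torch::jit` function, which usually means `transformer_engine_extensions.so` was compiled against a different PyTorch build than the one installed in the image. As a first diagnostic step (a minimal sketch, assuming it is run inside the container), check that both packages are present before comparing their build versions:

```python
# Check which packages in the failing import chain are importable; for the
# extension .so to load, transformer_engine must have been built against the
# same torch build that is installed, otherwise an undefined-symbol error
# like the one above is raised.
from importlib.util import find_spec

status = {pkg: find_spec(pkg) is not None for pkg in ("torch", "transformer_engine")}
for pkg, ok in status.items():
    print(pkg, "importable" if ok else "missing")
```

If both are importable, the next step would be comparing `torch.__version__` (and its CUDA build) against the version transformer_engine was compiled for.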

Issue with `pip3 install ochat`

I'm using a Windows machine, and I've been following the instructions outlined in this answer: #41 (comment)

Everything went smoothly until I reached the step of running `pip3 install ochat`, where I encountered an error.

Here's the error message I'm getting:

error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "C:\Users\hasans\AppData\Local\Temp\pip-install-22ncvin1\deepspeed_d1d81ae59ce344d3a308adf94757a6b8\setup.py", line 165, in <module>
        File "C:\Users\hasans\AppData\Local\Temp\pip-install-22ncvin1\deepspeed_d1d81ae59ce344d3a308adf94757a6b8\setup.py", line 51, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  One can disable async_io with DS_BUILD_AIO=0
       [ERROR]  Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
PS C:\Users\hasans\Documents\openchat> pip install libaio-devel
ERROR: Could not find a version that satisfies the requirement libaio-devel (from versions: none)
ERROR: No matching distribution found for libaio-devel

Could someone please guide me on how to resolve this issue? Your assistance would be greatly appreciated!

Thank you for your time and effort in maintaining this amazing project.
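The warning in the log above already names the escape hatch: `async_io` needs libaio, which is a Linux-only system library, so there is no pip package for it (hence the `libaio-devel` failure). A possible workaround (a sketch grounded only in the warning's own hint, not an official fix) is to set `DS_BUILD_AIO=0` before installing, so DeepSpeed skips pre-compiling the async_io op:

```python
# Disable DeepSpeed's async_io op via the env var mentioned in the warning;
# the actual pip invocation is left commented out here and would be run with
# this environment. In PowerShell the equivalent is:  $env:DS_BUILD_AIO = "0"
import os

env = dict(os.environ, DS_BUILD_AIO="0")
# subprocess.run([sys.executable, "-m", "pip", "install", "ochat"], env=env)
print("DS_BUILD_AIO =", env["DS_BUILD_AIO"])
```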

Can I use system prompt when training?

Hi @imoneoi, I want my assistant to have different emotions, or to act as a particular persona, based on a system prompt. When training, can I use data samples of the form {system_prompt} Human: {human_message} <|end_of_turn|>Assistant: {assistant_message} ..., or are such prompts folded into {human_message} as in the data samples you trained on? Thank you!
Example: <s>Human: Act as SEO expert. I want you to create the best meta descriptions among my competitors.\n\nHere are the list of our competitor's meta descriptions. \n\n\nHere is my meta description. Revise it. I don't want NFT in my description. I do not offer any staking service. \n\nBuy and sell the world's largest selection of 10,000+ Cryptocurrencies<|end_of_turn|>Assistant: ....
Convert to <s>Act as SEO expert. I want you to create the best meta descriptions among my competitors <|end_of_turn|>Human: Here are the list of our competitor's meta descriptions. \n\n\nHere is my meta description. Revise it. I don't want NFT in my description. I do not offer any staking service. \n\nBuy and sell the world's largest selection of 10,000+ Cryptocurrencies<|end_of_turn|>Assistant: ....
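For reference, the second format above can be sketched as a small builder (the tokens follow the example in this question, and the BOS `<s>` is omitted; whether training actually used a separate system turn is exactly what is being asked, so treat this as an illustration, not the confirmed training format):

```python
# Assemble a conversation string with an optional leading system prompt,
# using the <|end_of_turn|>-separated layout shown in the example above.
EOT = "<|end_of_turn|>"

def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [system + EOT] if system else []
    for human, assistant in turns:
        parts.append(f"Human: {human}{EOT}Assistant: {assistant}{EOT}")
    return "".join(parts)

prompt = build_prompt("Act as SEO expert.", [("Revise my meta description.", "Sure...")])
print(prompt)
```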

An error in training

I have installed flash-attn using `pip3 install --no-build-isolation "flash-attn<2"`, but this error emerges:

File "openchat/ochat/models/unpadded_llama.py", line 184, in forward
    attn_output = flash_attn_varlen_func(
                  ^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'flash_attn_varlen_func' is not defined
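A likely explanation (my assumption, not confirmed by the maintainers): `flash_attn_varlen_func` is the flash-attn 2.x name, while 1.x exposed `flash_attn_unpadded_func`, and unpadded_llama.py apparently guards the import, which is why the failure surfaces later as a NameError rather than an ImportError at startup. A quick probe of what the installed flash-attn actually provides:

```python
# Probe which flash-attn attention entry point is available; flash-attn 2.x
# renamed flash_attn_unpadded_func to flash_attn_varlen_func.
try:
    from flash_attn.flash_attn_interface import flash_attn_varlen_func  # v2 name
    found = "flash_attn_varlen_func (flash-attn >= 2)"
except ImportError:
    try:
        from flash_attn.flash_attn_interface import flash_attn_unpadded_func  # v1 name
        found = "flash_attn_unpadded_func (flash-attn < 2)"
    except ImportError:
        found = "flash-attn not importable"
print(found)
```

If the probe reports the v1 name, the installed `flash-attn<2` predates the rename and a flash-attn 2.x build (or the repo revision matching v1) would be needed.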

Error Installing requirements.txt: ModuleNotFoundError for 'torch'

Hi, I'm trying to install your requirements.txt but I'm getting this error message:

Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /root/miniconda3/bin/python /root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpt4c_a3i6
cwd: /tmp/pip-install-_op0fkvy/flash-attn_32ecdb534ca149cebac1b8d1956665eb
Complete output (15 lines):
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in <module>
main()
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/root/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-0d5h4yvd/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 13, in <module>
ModuleNotFoundError: No module named 'torch'

It seems to be unable to find the module 'torch'. However, I have verified that torch is installed in my environment, with version 1.11.0+cu113 and torchvision version 0.12.0+cu113.

I have also tried to install the requirements on a different machine where torch version 2.0.0+cu117 is installed, but the error persists.

Any assistance to resolve this issue would be greatly appreciated. Thank you.
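A likely cause (an assumption from the traceback, not a confirmed diagnosis): flash-attn's setup.py imports torch at build time, and pip's PEP 517 build isolation compiles the wheel in a fresh environment where the torch installed in your conda env is invisible. The `--no-build-isolation` flag, which the repo's own flash-attn install command uses, builds against the current environment instead. A sketch of assembling that command:

```python
# Build the install command that reuses the current environment (where torch
# is already importable) instead of an isolated PEP 517 build environment.
import sys

cmd = [sys.executable, "-m", "pip", "install", "--no-build-isolation", "flash-attn<2"]
print("pip " + " ".join(cmd[3:]))
```

Torch must already be importable in the active environment before running it, since the build now sees exactly that environment.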

OpenCoderPlus outputs will not stop

I tried to launch OpenCoderPlus with the latest code of this repo and vLLM:

python -m ochat.serving.openai_api_server --model-type opencoder --model openchat/opencoderplus

It can work, but the output never stops until hitting the max_tokens limit, even if I pass the stop parameter:

requests.post(
  "http://localhost:18888/v1/chat/completions",
  json={
    "model": "opencoder",
    "messages": [{"role": "user", "content": "Write a bubble sort."}],
    "stop": ["<|end_of_turn|>"]
  }
) 

I referred to OpenCoderPlus's training data; it seems that this model was trained on data containing the <|end_of_turn|> token.

So does anyone know how to stop this model's outputs? Any help will be appreciated.
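Until server-side stop handling works, one possible workaround (my sketch, not an official API) is to truncate the returned completion client-side at the first end-of-turn marker:

```python
# Cut the generated text at the first <|end_of_turn|> marker, discarding the
# runaway continuation after it.
EOT = "<|end_of_turn|>"

def trim_at_eot(text: str) -> str:
    return text.split(EOT, 1)[0]

sample = "def bubble_sort(a):\n    ...<|end_of_turn|>Human: something else"
print(trim_at_eot(sample))
```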

openchat_v3.2_super deployment to SageMaker doesn't work

Hi team,

I would like to deploy the new model to AWS SageMaker with the code below, but I'm getting RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist. It seems something is missing in the model index; at least, I couldn't find that weight in https://huggingface.co/openchat/openchat_v3.2_super/blob/main/pytorch_model.bin.index.json

Thanks in advance for your help!

Here is the deploy.py

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='role-name')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'openchat/openchat_v3.2_super',
    'SM_NUM_GPUS': json.dumps(4)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="openchat-v3-2-super",
    container_startup_health_check_timeout=600,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})

LIMA pipeline

Do you have the pipeline script with which you reduced the 90K dataset to 6K, based on LIMA?

CUDA out of memory error. How can I split the model across multiple GPUs?

When loading the checkpoint, this comes up:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 15.90 GiB total capacity; 15.30 GiB already allocated; 31.81 MiB free; 15.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
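The error message itself suggests one knob: setting `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation (the value 128 below is an arbitrary example, and this must be set before the first CUDA allocation). Note this only helps fragmentation on one GPU; it does not shard the model:

```python
# Configure the CUDA caching allocator before torch initializes CUDA; this
# mitigates fragmentation but does not split the model across GPUs.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

For actually splitting the weights across multiple GPUs when loading with transformers, `device_map="auto"` (via accelerate) shards the layers across visible devices; whether the ochat serving code exposes an equivalent option is a separate question.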
