
MFTCoder: High Accuracy and Efficiency Multi-task Fine-Tuning Framework


🤗 HuggingFace • 🤖 ModelScope

[中文] [English]


News

🔥🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models such as Mixtral (MoE), DeepSeek-Coder, and ChatGLM3, adds FSDP as an option, and introduces Self-paced Loss as a solution for convergence balance in multi-task fine-tuning.

🔥🔥🔥 [2024/01/17] CodeFuse-DeepSeek-33B has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval. It ranks as the top-1 LLM on the Big Code Models Leaderboard in terms of win rate; the official result will be published later.

🔥🔥🔥 [2024/01/17] CodeFuse-Mixtral-8x7B has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.

🔥🔥 [2023/11/07] The MFTCoder paper has been released on arXiv, disclosing the technical details of multi-task fine-tuning.

🔥🔥 [2023/10/20] CodeFuse-QWen-14B has been released, achieving a pass@1 (greedy decoding) score of 48.8% on HumanEval, a 16% absolute improvement over the base model Qwen-14B.

🔥🔥 [2023/09/27] CodeFuse-StarCoder-15B has been released, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval.

🔥🔥 [2023/09/26] We are pleased to announce the release of the 4-bit quantized version of CodeFuse-CodeLlama-34B. Despite quantization, the model still achieves a remarkable 73.8% pass@1 (greedy decoding) on HumanEval.

🔥🔥 [2023/09/07] We released CodeFuse-CodeLlama-34B, which achieves 74.4% Python pass@1 (greedy decoding) and surpasses GPT-4 (2023/03/15) and ChatGPT-3.5 on the HumanEval benchmark.

🔥🔥 [2023/08/26] We released MFTCoder v0.1.0, which supports fine-tuning Code Llama, Llama, Llama 2, StarCoder, ChatGLM2, CodeGeeX2, Qwen, and GPT-NeoX models with LoRA/QLoRA.

HumanEval Performance

| Model | HumanEval (pass@1) | Date |
| --- | --- | --- |
| CodeFuse-DeepSeek-33B | 78.7% | 2024/01 |
| CodeFuse-CodeLlama-34B | 74.4% | 2023/09 |
| CodeFuse-CodeLlama-34B-4bits | 73.8% | 2023/09 |
| WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
| GPT-4 (zero-shot) | 67.0% | 2023/03 |
| PanGu-Coder2 15B | 61.6% | 2023/08 |
| CodeFuse-Mixtral-8x7B | 56.1% | 2024/01 |
| CodeFuse-StarCoder-15B | 54.9% | 2023/08 |
| CodeLlama-34b-Python | 53.7% | 2023/08 |
| CodeFuse-QWen-14B | 48.8% | 2023/10 |
| CodeLlama-34b | 48.8% | 2023/08 |
| GPT-3.5 (zero-shot) | 48.1% | 2022/11 |
| OctoCoder | 46.2% | 2023/08 |
| StarCoder-15B | 33.6% | 2023/05 |
| QWen-14B | 32.3% | 2023/10 |

Articles

MFTCoder arXiv paper

Introduction

A high-accuracy and high-efficiency multi-task fine-tuning framework for Code LLMs.

MFTCoder is an open-source project of CodeFuse for accurate and efficient multi-task fine-tuning (MFT) of large language models (LLMs), especially Code LLMs (large language models for code-related tasks). Alongside the MFTCoder framework, we also open-source Code LLMs and code-related datasets.

In MFTCoder, we released two codebases for fine-tuning large language models:

  • MFTCoder-accelerate is a framework built on Accelerate and DeepSpeed/FSDP. All of its tech stack is open source and actively maintained. We highly recommend trying this framework to make your fine-tuning accurate and efficient.
  • MFTCoder-atorch is based on ATorch, a fast distributed training framework for LLMs.

The aim of this project is to foster collaboration and share advancements in large language models, particularly within the domain of code development.

Frameworks

(MFTCoder framework overview diagram)

Highlights

✅ Multi-task: Train models on multiple tasks while maintaining a balance between them. The models can even generalize to new, previously unseen tasks.

✅ Multi-model: It integrates state-of-the-art open-source models such as GPT-NeoX, LLaMA, LLaMA-2, Baichuan, Qwen, ChatGLM2, and more. (These fine-tuned models will be released in the near future.)

✅ Multi-framework: It provides support for both Accelerate (with DeepSpeed and FSDP) and ATorch.

✅ Efficient fine-tuning: It supports LoRA, QLoRA, and full-parameter training, enabling fine-tuning of large models with minimal resources. The training speed meets the demands of almost all fine-tuning scenarios. (A minimal QLoRA sketch is shown below.)
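For orientation, the snippet below is a generic QLoRA-style setup using transformers, peft, and bitsandbytes. It is only a minimal sketch: the base model id, LoRA hyperparameters, and target modules are placeholders, not MFTCoder's actual configuration.

```python
# Generic QLoRA sketch (transformers + peft + bitsandbytes); the model id and
# hyperparameters are illustrative placeholders, not MFTCoder defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-7b-hf"  # placeholder base model

# Load the base model with 4-bit NF4 quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for training and attach LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters are trainable
```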

The main components of this project include:

  • Support for both SFT (Supervised Fine-Tuning) and MFT (Multi-task Fine-Tuning). The current MFTCoder achieves data balance among multiple tasks, and future releases will balance task difficulty and convergence speed during training.
  • Support for QLoRA instruction fine-tuning, LoRA fine-tuning, and full-parameter fine-tuning.
  • Support for most mainstream open-source large models, particularly those relevant to Code LLMs, such as DeepSeek-Coder, Mistral, Mixtral, ChatGLM3, Code Llama, StarCoder, CodeGeeX2, Qwen, GPT-NeoX, and more.
  • Support for merging the weights of a LoRA adapter into the base model, simplifying inference (see the sketch after this list).
  • Release of 2 high-quality code-related instruction fine-tuning datasets: Evol-instruction-66k and CodeExercise-Python-27k.
  • Release of many Code LLMs; please refer to the codefuse-ai organization on HuggingFace or ModelScope.
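As a rough illustration of the adapter-merging step mentioned above, a LoRA adapter can be folded back into its base model with peft's merge_and_unload. The paths below are placeholders, and MFTCoder ships its own merge tooling, so treat this only as a sketch of the underlying idea.

```python
# Minimal LoRA merge sketch using peft; all paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "path/to/base-model"       # placeholder
adapter_path = "path/to/lora-adapter"  # placeholder
output_path = "path/to/merged-model"   # placeholder

# Load the base model and wrap it with the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the LoRA deltas into the base weights so inference needs no peft dependency.
merged = model.merge_and_unload()
merged.save_pretrained(output_path)

# Keep the tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained(base_path)
tokenizer.save_pretrained(output_path)
```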

Requirements

To begin, ensure that you have successfully installed CUDA (version >= 11.4, preferably 11.7) along with the necessary drivers. Additionally, make sure you have installed torch (version 2.0.1).

Next, we have provided an init_env.sh script to simplify the installation of required packages. Execute the following command to run the script:

sh init_env.sh

We highly recommend training with Flash Attention (version >= 2.1.0, preferably 2.3.6). Please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
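To sanity-check the environment against the versions mentioned above, a short script like the following can be used (purely illustrative; it only prints versions and does not enforce them):

```python
# Illustrative environment check for the versions recommended above.
import torch

print("torch:", torch.__version__)                # recommended: 2.0.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA (torch build):", torch.version.cuda)  # recommended: >= 11.4, ideally 11.7

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)  # recommended: >= 2.1.0, ideally 2.3.6
except ImportError:
    print("flash-attn not installed; see https://github.com/Dao-AILab/flash-attention")
```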

Training

As mentioned above, we open-source two training frameworks. You can refer to their respective READMEs for more details, as follows.

If you are familiar with open-source Transformers, DeepSpeed, or FSDP, we highly recommend trying:

🚀🚀 MFTCoder-accelerate: Accelerate + DeepSpeed/FSDP codebase for MFT (Multi-task Fine-Tuning)

If you want to explore a newer framework like ATorch, you can check:

🚀 MFTCoder-atorch: ATorch codebase for MFT (Multi-task Fine-Tuning)

Models

We are excited to release the following CodeLLMs trained with MFTCoder, now available on both HuggingFace and ModelScope:

| Model | HuggingFace Link | ModelScope Link | Base Model | Num of Examples Trained | Batch Size | Seq Length |
| --- | --- | --- | --- | --- | --- | --- |
| 🔥 CodeFuse-DeepSeek-33B | h-link | m-link | DeepSeek-coder-33B | 600K | 80 | 4096 |
| 🔥 CodeFuse-Mixtral-8x7B | h-link | m-link | Mixtral-8x7B | 600K | 80 | 4096 |
| 🔥 CodeFuse-CodeLlama-34B | h-link | m-link | CodeLlama-34b-Python | 600K | 80 | 4096 |
| 🔥 CodeFuse-CodeLlama-34B-4bits | h-link | m-link | CodeLlama-34b-Python | | | 4096 |
| 🔥 CodeFuse-StarCoder-15B | h-link | m-link | StarCoder-15B | 600K | 80 | 4096 |
| 🔥 CodeFuse-QWen-14B | h-link | m-link | Qwen-14b | 1.1M | 256 | 4096 |
| 🔥 CodeFuse-CodeGeex2-6B | h-link | m-link | CodeGeex2-6B | 1.1M | 256 | 4096 |
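As a starting point, the released checkpoints can be loaded with the standard transformers API. The snippet below is a generic sketch: the repo id is inferred from the codefuse-ai organization and the model name above, and the exact prompt template and generation settings should be taken from the corresponding model card.

```python
# Generic loading sketch for a released CodeFuse model; consult the model card
# for the exact prompt template before serious use. The repo id is inferred
# from the table above (codefuse-ai organization), not verified here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codefuse-ai/CodeFuse-DeepSeek-33B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```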

Datasets

We are also pleased to release two code-related instruction datasets, meticulously selected from a range of datasets to facilitate multitask training. Moving forward, we are committed to releasing additional instruction datasets covering various code-related tasks.

| Dataset | Description |
| --- | --- |
| ⭐ Evol-instruction-66k | Based on open-evol-instruction-80k; low-quality, repeated, and HumanEval-similar instructions are filtered out to yield a high-quality code instruction dataset. |
| ⭐ CodeExercise-Python-27k | A Python code exercise instruction dataset. |
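If the datasets are hosted in the usual Hub layout under the codefuse-ai organization (the repo ids below are assumed, not confirmed here), they can be loaded with the datasets library:

```python
# Sketch of loading the released instruction datasets; the repo ids are assumed
# to live under the codefuse-ai organization on HuggingFace.
from datasets import load_dataset

evol = load_dataset("codefuse-ai/Evol-instruction-66k", split="train")
exercises = load_dataset("codefuse-ai/CodeExercise-Python-27k", split="train")

print(evol[0])         # inspect one instruction/response pair
print(len(exercises))  # number of examples
```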

Contributing

Contributions are welcome! If you have any suggestions, ideas, bug reports, or requests for new model or feature support, please open an issue or submit a pull request.

Citation

If you find our work useful or helpful for your R&D work, please feel free to cite our paper as below.

@article{mftcoder2023,
      title={MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning}, 
      author={Bingchang Liu and Chaoyu Chen and Cong Liao and Zi Gong and Huan Wang and Zhichao Lei and Ming Liang and Dajun Chen and Min Shen and Hailian Zhou and Hang Yu and Jianguo Li},
      year={2023},
      journal={arXiv preprint arXiv:2311.02303},
      archivePrefix={arXiv},
      eprint={2311.02303}
}

Star-History

Star History Chart
