
BitNet-1.58-Instruct

📕 Resources

🔗 Blog Post: https://www.oxen.ai/blog/arxiv-dives-bitnet-1-58

๐Ÿฑ GitHub Repo: https://github.com/someshfengde/BitNet-1.58-Instruct

๐Ÿ wandb dashboard: https://wandb.ai/som/1bitllm_finetuning?nw=nwusersom

👨‍💻 Lightning Studio link: https://lightning.ai/someshfengde/vision-model/studios/1-5bitllms-finetuning/code

What's changed in this repo

  • Added wandb tracking (see the sketch after this list)
  • Supports fine-tuning the LLM on CPU and across multiple GPUs
  • Fine-tuned on the mistralai data
  • Evaluated using an LLM as a judge
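
Since wandb tracking is one of the additions, here is a minimal sketch of how it wires into a Hugging Face Trainer run. The project name matches the dashboard linked above, but the config values are illustrative, not the repo's exact settings:

import wandb
from transformers import TrainingArguments

# Illustrative config; the real values live in tools/train.py.
wandb.init(project="1bitllm_finetuning", config={"max_seq_len": 768})

args = TrainingArguments(
    output_dir="results/bitnet_b1_58-large-instruct",
    report_to="wandb",  # stream training loss and metrics to the wandb dashboard
    logging_steps=10,
)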

Implementation of BitNet-1.58 instruct tuning. All data and models are versioned and stored on Oxen.ai at ox/BitNet. This work builds on the pre-trained models released in the 1bitLLM/bitnet_b1_58-large project on Hugging Face.

Code Name: Bessie the BitNet 🐂

Comparison of Model Outputs Before and After Finetuning

Old Outputs (Before Finetuning)

Model Correctness

  is_correct   count
  True            73
  False           27

Model Metric: Accuracy

  is_correct   proportion
  True               0.73
  False              0.27

New Outputs (After Finetuning)

Model Correctness

  is_correct   count
  True            75
  False           25

Model Metric: Accuracy

  is_correct   proportion
  True               0.75
  False              0.25

Improvement: After finetuning, accuracy improves by 2 percentage points (0.73 → 0.75).
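
For reference, tables like the ones above can be produced from the eval output with pandas. This is a minimal sketch assuming eval.jsonl carries a boolean is_correct field (field name taken from the tables):

import pandas as pd

# Assumes each row of eval.jsonl has a boolean `is_correct` field.
df = pd.read_json("eval.jsonl", lines=True)
print(df["is_correct"].value_counts())                # raw counts (73 / 27 style)
print(df["is_correct"].value_counts(normalize=True))  # proportions, i.e. accuracy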

Output Examples

(Example output screenshots from before and after finetuning are omitted here.)

Motivation

This work was originally done for the arXiv Dives community, and more info on BitNets can be found in our blog post.

We also have some internal use cases at Oxen.ai for a fast, local LLM, and BitNet 1.58 seems like an interesting direction. We will open source our models, data, and code as we go.

Inference

There is a simple script for prompting the model given a system message. You can point it at either the base LLM or a fine-tuned LLM.

Run Base Model

python scripts/prompt.py -m 1bitLLM/bitnet_b1_58-large

Run Fine-Tuned Model

oxen download ox/BitNet models/bitnet_b1_58-large-instruct-100k
python scripts/prompt.py -m models/bitnet_b1_58-large-instruct-100k
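
For a rough idea of what scripts/prompt.py does, here is a minimal sketch using Hugging Face transformers. The prompt template below is an assumption, not necessarily the repo's exact formatting:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "1bitLLM/bitnet_b1_58-large"  # or a fine-tuned checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Assumed layout: system message, then the user's query, then a cue to answer.
system = "You are Bessie, created by Oxen.ai."
prompt = f"{system}\n\nUser: What is Oxen.ai?\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))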

Training

The training was done on an A10 with 24GB of VRAM. We cap the max sequence length at 768 tokens because longer sequences run out of VRAM on some batches. It would be nice to kick off a training run on a larger GPU with a longer context length.

python tools/train.py -m 1bitLLM/bitnet_b1_58-large -d train.jsonl -o results/bitnet_b1_58-large-instruct
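
Under the hood, the SFT step amounts to causal-LM fine-tuning with truncation at 768 tokens. Here is a minimal sketch with plain transformers, assuming prompt and response are simply concatenated (the repo's actual formatting may differ):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "1bitLLM/bitnet_b1_58-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-style tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

def tokenize(example):
    # Truncate to 768 tokens so batches fit in the A10's 24GB of VRAM.
    text = f"{example['prompt']}\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=768)

data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="results/bitnet_b1_58-large-instruct",
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()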

Pre-Training

The base models were pre-trained on the RedPajama dataset for 100B tokens. The hyperparameters, as well as the two-stage learning rate and weight decay schedules, follow the suggestions in the authors' follow-up paper.

NOTE: This repo does not perform the pre-training, just uses these models as a jumping off point for instruct tuning.
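
For reference only (pre-training is out of scope here), a two-stage learning rate schedule of the kind the paper describes can be sketched in PyTorch. The stage boundary and rates below are placeholders, not the paper's values:

import torch

model = torch.nn.Linear(8, 8)  # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=1.5e-3, weight_decay=0.1)

def two_stage(step, total=1000, floor_ratio=0.1):
    # Stage 1: full LR for the first half of training.
    # Stage 2: drop to a fraction of the peak LR (the paper also adjusts
    # weight decay between stages; omitted here).
    return 1.0 if step < total // 2 else floor_ratio

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=two_stage)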

Instruct Tuning Data

The instruct tuning was done on a mix of data:

  1. SQuADv2 with context and questions
  2. mosaicml/instruct-v3

You can see the mix of data here:

https://www.oxen.ai/ox/BitNet/file/main/train.jsonl

oxen download ox/BitNet train.jsonl
oxen download ox/BitNet dev.jsonl
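
A hedged sketch of how a mix like this could be assembled into prompt/response JSONL. Field names come from the datasets' published schemas; the exact prompt template used in the released train.jsonl may differ:

import json
from datasets import load_dataset

rows = []

# SQuADv2: answerable questions map to their first answer; unanswerable
# ones map to "Not in context." (see the evaluation section below).
for ex in load_dataset("squad_v2", split="train"):
    texts = ex["answers"]["text"]
    rows.append({
        "prompt": f"Context: {ex['context']}\nQuestion: {ex['question']}",
        "response": texts[0] if texts else "Not in context.",
        "source": "squad_v2",
    })

# mosaicml/instruct-v3 already ships prompt/response/source fields.
for ex in load_dataset("mosaicml/instruct-v3", split="train"):
    rows.append({"prompt": ex["prompt"], "response": ex["response"],
                 "source": ex["source"]})

with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")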

Data Format

The dataset should be jsonl with prompt and response fields for the SFT step.

head -n 1 train.jsonl | jq
{
  "prompt": "What is Oxen.ai?",
  "response": "Oxen.ai is a Open-source tools to track, iterate, collaborate on, and discover multi-modal data in any format.",
  "source": "manual"
}
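
A quick sanity check that a dataset matches this format before kicking off training (field names as above):

import json

# Verify every row carries the prompt/response fields the SFT step expects.
with open("train.jsonl") as f:
    for i, line in enumerate(f):
        row = json.loads(line)
        assert "prompt" in row and "response" in row, f"row {i} is missing fields"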

System Prompt

The system prompt is currently hard coded into bitnet/prompts/assistant_prompt.py.

You are Bessie, created by Oxen.ai. You are happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks. You give concise responses to simple questions or statements, but provide thorough responses to more complex and open-ended questions. Answer the user's query as best as you can, and say "I don't know" if you don't know the answer.
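
A minimal sketch of how the system prompt might be prepended to a user query. The template here is an assumption; the real one lives in bitnet/prompts/assistant_prompt.py:

SYSTEM_PROMPT = "You are Bessie, created by Oxen.ai. ..."  # full text as above

def build_prompt(user_query: str) -> str:
    # Assumed layout: system prompt, blank line, user query, answer cue.
    return f"{SYSTEM_PROMPT}\n\nUser: {user_query}\nAssistant:"

print(build_prompt("What is Oxen.ai?"))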

TODO: Evaluation

For evaluation, we also use the SQuAD dataset. The idea is that the model should be able to answer generic questions as well as extract answers from the context when one is provided.

If the answer is not in the context, the model should say "Not in context."

python tools/eval.py -m results/bitnet_b1_58-large-instruct/final_checkpoint/ -d dev.jsonl -o eval.jsonl -n 100
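
Here is a minimal stand-in for the correctness check behind the is_correct column, using substring matching. The repo's actual evaluation uses an LLM as a judge, and the field names below are assumptions about eval.jsonl's schema:

import json

def is_correct(prediction: str, reference: str) -> bool:
    # Stand-in judge: substring match. The real pipeline asks an LLM to judge.
    return reference.strip().lower() in prediction.strip().lower()

with open("eval.jsonl") as f:
    rows = [json.loads(line) for line in f]

acc = sum(is_correct(r["prediction"], r["response"]) for r in rows) / len(rows)
print(f"accuracy: {acc:.2f}")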

The eval script outputs a dataframe like this:

TODO:
