alpaca-lora-serve's Introduction

Alpaca-LoRA as a Chatbot Service

🚧 This project is still under development. While serving the project, I noticed some bugs originating from the model itself, such as emitting too many line breaks, which eventually causes OOM. You are welcome to propose PRs, but note that I may merge improvements at any time, as soon as I spot any problems.

This repository demonstrates serving Alpaca-LoRA as a chatbot service using Gradio. It comes with the following features:

The easiest way to run this project is to use Colab. Open the alpaca_lora_in_colab notebook in Colab (there is an "Open in Colab" button) and run every cell sequentially. With the standard GPU instance (T4), you can run the 7B and 13B models. With the premium GPU instance (A100 40GB), you can even run the 30B model!

Mode

1. Batch Generation Mode: batch generation mode aggregates up to batch_size requests and passes their prompts to the model together. New requests wait until the current ones are fully handled. For instance, with batch_size=4, if a user sends a request, it starts processing right away. If other users connect while it is being processed, up to 4 of their requests are aggregated and processed as soon as the current one is done.

2. Streaming Mode: streaming mode handles multiple requests in an interleaved way using threads. For instance, if two users (A and B) are connected, A's request is handled, then B's, then A's again, and so on. This follows from the nature of streaming mode, which generates and yields tokens one at a time.
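The two modes above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in app.py: `drain_batches`, `interleave`, `fake_generate`, and `fake_stream` are hypothetical names, the model is replaced by stand-in functions, and the streaming part uses plain generators in a single thread rather than the threads the service uses.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterator, List, Tuple


def drain_batches(
    pending: Deque[str],
    batch_size: int,
    generate: Callable[[List[str]], List[str]],
) -> List[List[str]]:
    """Batch mode: process queued prompts in groups of at most batch_size."""
    results = []
    while pending:
        # Take whatever queued up while the previous batch ran, capped at batch_size.
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        results.append(generate(batch))
    return results


def interleave(streams: Dict[str, Iterator[str]]) -> List[Tuple[str, str]]:
    """Streaming mode: take one token from each active user in turn."""
    active = dict(streams)
    order = []
    while active:
        for user in list(active):
            try:
                order.append((user, next(active[user])))
            except StopIteration:
                # This user's response is complete; stop scheduling it.
                del active[user]
    return order


def fake_generate(prompts: List[str]) -> List[str]:
    # Hypothetical stand-in for a batched model call.
    return [f"response to {p}" for p in prompts]


def fake_stream(text: str) -> Iterator[str]:
    # Hypothetical stand-in for a model that yields one token at a time.
    yield from text.split()


queue = deque(f"prompt-{i}" for i in range(6))
batches = drain_batches(queue, batch_size=4, generate=fake_generate)
print([len(b) for b in batches])  # [4, 2]

schedule = interleave({"A": fake_stream("a1 a2 a3"), "B": fake_stream("b1 b2")})
print(schedule)
```

With six queued prompts and batch_size=4, the first call handles four prompts and the second handles the remaining two. In the streaming sketch, tokens for A and B alternate until B runs out, after which A finishes alone.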

Context management

  • Alpaca-LoRA as a Chatbot Service manages context in two ways. First, it remembers (stores) the entire conversation history by default, as in the following code snippet. context_string is set to "Below is a history of instructions that describe tasks, paired with an input that provides further context. Write a response that appropriately completes the request by remembering the conversation history." by default, but it can be set manually via the Context field at the top of the screen.
    • Additionally, there is a Summarize button in the middle of the screen (you need to expand the component labeled "Helper Buttons"). Clicking this button automatically sends "summarize our conversations so far in three sentences." as a prompt, and the generated text is inserted into the Context field. All conversation history up to that point is then ignored. In other words, the conversation restarts fresh with the code snippet below, except that context_string is filled with the model-generated text.
f"""{context_string}

### Input: {input} # Surrounding information to AI

### Instruction: {prompt1} # First instruction/prompt given by user

### Response: {response1} # First response on the first prompt by AI

### Instruction: {prompt2} # Second instruction/prompt given by user

### Response: {response2} # Second response on the second prompt by AI
....
"""
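As a rough sketch of how such a prompt could be assembled from stored history, and how the Summarize button resets it, consider the following. `build_prompt` and `reset_with_summary` are hypothetical helpers, not functions from this repository, and ending the prompt with an open "### Response:" for the model to complete is an assumption.

```python
from typing import List, Tuple

DEFAULT_CONTEXT = (
    "Below is a history of instructions that describe tasks, paired with an input "
    "that provides further context. Write a response that appropriately completes "
    "the request by remembering the conversation history."
)


def build_prompt(
    history: List[Tuple[str, str]],
    new_instruction: str,
    context_string: str = DEFAULT_CONTEXT,
    input_text: str = "",
) -> str:
    """Join the context, every past (instruction, response) pair, and the new instruction."""
    parts = [context_string, f"### Input: {input_text}"]
    for instruction, response in history:
        parts.append(f"### Instruction: {instruction}")
        parts.append(f"### Response: {response}")
    parts.append(f"### Instruction: {new_instruction}")
    # Assumption: the prompt ends with an open Response header for the model to fill in.
    parts.append("### Response:")
    return "\n\n".join(parts)


def reset_with_summary(summary: str) -> Tuple[str, List[Tuple[str, str]]]:
    """After Summarize: the summary becomes the context and the history is dropped."""
    return summary, []


history = [("What is LoRA?", "A low-rank adaptation method.")]
prompt = build_prompt(history, "Why is it cheap to train?")

# Simulate the Summarize button with a made-up summary string.
context, history = reset_with_summary("We discussed LoRA basics.")
fresh_prompt = build_prompt(history, "Continue from there.", context_string=context)
print(fresh_prompt)
```

After the reset, the new prompt carries only the summary as context, so the prompt length no longer grows with every earlier turn.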

Misc.

  • There is a Continue button in the middle of the screen. It simply sends the prompt "continue." to the model. This is useful when the previous response from the model was incomplete: given "continue.", the model tries to complete it. Since this is a continuation of the response, the "continue." prompt is hidden to make the chat history look more natural.
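One way to hide the "continue." turns when displaying history is to fold each continuation into the response before it. The sketch below assumes history is stored as (prompt, response) pairs; `visible_history` is a hypothetical helper, and merging the texts is an assumption about how the hiding could work, not a description of this project's code.

```python
from typing import List, Tuple

CONTINUE_PROMPT = "continue."


def visible_history(history: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Fold each 'continue.' turn into the previous response for display."""
    shown: List[Tuple[str, str]] = []
    for prompt, response in history:
        if prompt == CONTINUE_PROMPT and shown:
            # Append the continuation to the previous response and drop the prompt.
            prev_prompt, prev_response = shown[-1]
            shown[-1] = (prev_prompt, prev_response + response)
        else:
            shown.append((prompt, response))
    return shown


raw = [
    ("Write a poem", "Roses are"),
    ("continue.", " red."),
    ("Thanks", "Welcome"),
]
print(visible_history(raw))
```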

Currently supported LoRA checkpoints

Instructions

  1. Prerequisites

Note that the code only works with Python >= 3.9.

$ conda create -n alpaca-serve python=3.9
$ conda activate alpaca-serve
  2. Install dependencies
$ cd Alpaca-LoRA-Serve
$ pip install -r requirements.txt
  3. Run Gradio application
$ BASE_URL=decapoda-research/llama-7b-hf
$ FINETUNED_CKPT_URL=tloen/alpaca-lora-7b

$ python app.py --base_url $BASE_URL --ft_ckpt_url $FINETUNED_CKPT_URL --port 6006

The following flags are supported:

usage: app.py [-h] [--base_url BASE_URL] [--ft_ckpt_url FT_CKPT_URL] [--port PORT] [--batch_size BATCH_SIZE]
              [--api_open API_OPEN] [--share SHARE] [--gen_config_path GEN_CONFIG_PATH]

Gradio Application for Alpaca-LoRA as a chatbot service

optional arguments:
  -h, --help            show this help message and exit
  --base_url BASE_URL   Hugging Face Hub url
  --ft_ckpt_url FT_CKPT_URL
                        Hugging Face Hub url
  --port PORT           port to serve app
  --batch_size BATCH_SIZE
                        how many requests to handle at the same time
                        default is set to 1 which enables streaming mode
  --api_open API_OPEN   do you want to open as API
  --share SHARE         do you want to share temporarily
  --gen_config_path GEN_CONFIG_PATH
                        which config to use for GenerationConfig
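The help text says that the default batch_size of 1 enables streaming mode. A minimal sketch of that selection logic follows. It assumes that any larger value selects batch generation mode, and it declares only the --batch_size flag; `select_mode` is a hypothetical helper, not a function from app.py.

```python
import argparse


def select_mode(batch_size: int) -> str:
    """Assumption: batch_size == 1 means streaming, anything larger means batch."""
    return "streaming" if batch_size == 1 else "batch"


parser = argparse.ArgumentParser(
    description="Gradio Application for Alpaca-LoRA as a chatbot service"
)
# Default of 1 is taken from the help text above.
parser.add_argument("--batch_size", type=int, default=1)

default_args = parser.parse_args([])
batch_args = parser.parse_args(["--batch_size", "4"])

print(select_mode(default_args.batch_size))  # streaming
print(select_mode(batch_args.batch_size))    # batch
```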

Design figure

Acknowledgements

I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and to share it with the community to try out.
