LLMLinguaMITM

This is a proof of concept of a small server that sits between your frontend and your LLM backend and compresses prompts using LLMLingua.

So far, the only tested setup is intercepting communication between SillyTavern and oobabooga's Text Generation WebUI.
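
To illustrate the idea of rewriting a request on its way to the backend, here is a minimal sketch. It assumes the frontend sends a JSON body with a `prompt` field; the function names are hypothetical, and the toy compressor is only a stand-in for the LLMLingua call, not the project's actual code.

```python
import json
from typing import Callable


def compress_request_body(raw_body: bytes, compress: Callable[[str], str]) -> bytes:
    """Return a copy of a JSON request body with its "prompt" field compressed.

    `compress` stands in for the LLMLingua call; any str -> str function works.
    """
    payload = json.loads(raw_body)
    # Only touch JSON objects that carry a non-empty string prompt;
    # everything else is passed through unchanged.
    if isinstance(payload, dict):
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt:
            payload["prompt"] = compress(prompt)
    return json.dumps(payload).encode("utf-8")


def keep_every_other_word(text: str) -> str:
    """Toy stand-in compressor for demonstration only (not LLMLingua)."""
    return " ".join(text.split()[::2])
```

All other fields (sampling settings, token limits, and so on) are left as they are, so the backend sees a normal request with a shorter prompt.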

Disclaimer

This was whipped up within a few hours. There are no guarantees that it will work as expected, work at all, or not explode your PC.

I take no responsibility for what happens and have only conducted a limited amount of testing after I got it to work.

Requirements

By default, this software requires:

An Nvidia GPU with at least 6 GB of VRAM (the default setup runs TheBloke/Llama-2-7b-Chat-GPTQ on CUDA)

Other setups have not been tested yet. For further information, see the LLMLingua docs.
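
Once PyTorch is installed (see Installation), a quick check like the following can show whether CUDA is visible and how much VRAM the first GPU reports. This is a sketch, not part of the project; `cuda_report` is a hypothetical helper name.

```python
import importlib.util


def cuda_report() -> str:
    """Describe whether PyTorch can see a CUDA device and how much VRAM it has."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check also runs without torch

    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available"
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 1024**3
    return f"{props.name}: {vram_gib:.1f} GiB VRAM"


if __name__ == "__main__":
    print(cuda_report())
```

Note that a card sold as "6 GB" may report slightly less than 6.0 GiB here.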

Installation

Windows

To install, first set up a Python virtual environment:

python -m venv \path\to\venv

Then activate the venv:

# In cmd.exe
\path\to\venv\Scripts\activate.bat
# In PowerShell
\path\to\venv\Scripts\Activate.ps1

Within the virtual environment, install the requirements:

pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Now you can move on to Running.

Linux

To install, first set up a Python virtual environment:

python -m venv /path/to/venv

Then activate the venv:

source /path/to/venv/bin/activate

Within the virtual environment, install the requirements:

pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Now you can move on to Running.

Configuration

The application expects a config.yaml file next to the Python script:

base_url: 'url' # The target LLM backend
model_name: 'TheBloke/Llama-2-7b-Chat-GPTQ' # The model used to compress
branch: 'main' # Branch to use
device: 'cuda' # Device environment (e.g., 'cuda', 'cpu', 'mps')
ratio: 0.5 # Compression ratio for the prompt
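
As a rough sketch of how these keys could be validated after parsing, the following turns an already-loaded mapping into a typed config object. In practice the mapping would come from a YAML parser such as PyYAML's `yaml.safe_load`; that choice, the `ProxyConfig` name, the defaults (copied from the sample values above), and the accepted range for `ratio` are all assumptions, not something the project documents.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ProxyConfig:
    base_url: str        # the target LLM backend
    model_name: str      # the model used to compress
    branch: str = "main"
    device: str = "cuda"
    ratio: float = 0.5

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "ProxyConfig":
        # base_url and model_name have no sensible default, so require them.
        missing = [key for key in ("base_url", "model_name") if not data.get(key)]
        if missing:
            raise ValueError(f"missing config keys: {', '.join(missing)}")
        cfg = cls(
            base_url=str(data["base_url"]).rstrip("/"),
            model_name=str(data["model_name"]),
            branch=str(data.get("branch", "main")),
            device=str(data.get("device", "cuda")),
            ratio=float(data.get("ratio", 0.5)),
        )
        # Assumption: a ratio outside (0, 1] is treated as a configuration error.
        if not 0.0 < cfg.ratio <= 1.0:
            raise ValueError(f"ratio must be in (0, 1], got {cfg.ratio}")
        return cfg
```

For example, `ProxyConfig.from_mapping({"base_url": "http://127.0.0.1:5000/", "model_name": "TheBloke/Llama-2-7b-Chat-GPTQ"})` yields a config with the trailing slash removed and the remaining fields filled from the defaults. The backend address in this example is hypothetical.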

Running

Simply run the command:

python -m uvicorn main:app --host 0.0.0.0 --port 8080 

Within SillyTavern, point the API URL at this server (port 8080 with the command above) and select the following API settings:

Text Completion
Default (oobabooga)
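
To try the server without SillyTavern, a request can be built by hand. The sketch below only constructs the request. It assumes the server passes through the OpenAI-style `/v1/completions` route that Text Generation WebUI offers; that path, the payload fields, and the helper name are assumptions that have not been checked against this project's code.

```python
import json
import urllib.request


def build_completion_request(proxy_url: str, prompt: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a text completion POST aimed at the proxy."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/v1/completions",  # assumed route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it once the server is running:
#   with urllib.request.urlopen(build_completion_request("http://127.0.0.1:8080", "Hello")) as resp:
#       print(resp.read().decode("utf-8"))
```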
