EmbeddedLLM

Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (coming soon)). The easiest way to launch an OpenAI API compatible server on Windows, Linux, and macOS.

| Support matrix | Supported now | Under development | On the roadmap |
| --- | --- | --- | --- |
| Model architectures | Gemma, Llama *, Mistral +, Phi | | |
| Platform | Linux, Windows | | |
| Architecture | x86, x64 | | Arm64 |
| Hardware acceleration | CUDA, DirectML, IpexLLM, OpenVINO | QNN | ROCm |

* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.

+ The Mistral model architecture supports similar model families such as Zephyr.

🚀 Latest News

  • [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, Yi-1.5.
  • [2024/06] Support vision/chat inference on iGPU, APU, CPU and CUDA.

Supported Models (Quick Start)

| Models | Parameters | Context Length | Link |
| --- | --- | --- | --- |
| Gemma-2b-Instruct v1 | 2B | 8192 | EmbeddedLLM/gemma-2b-it-onnx |
| Llama-2-7b-chat | 7B | 4096 | EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml |
| Llama-2-13b-chat | 13B | 4096 | EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml |
| Llama-3-8b-chat | 8B | 8192 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Mistral-7b-v0.3-instruct | 7B | 32768 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx |
| Phi3-mini-4k-instruct | 3.8B | 4096 | microsoft/Phi-3-mini-4k-instruct-onnx |
| Phi3-mini-128k-instruct | 3.8B | 128k | microsoft/Phi-3-mini-128k-instruct-onnx |
| Phi3-medium-4k-instruct | 14B | 4096 | microsoft/Phi-3-medium-4k-instruct-onnx-directml |
| Phi3-medium-128k-instruct | 14B | 128k | microsoft/Phi-3-medium-128k-instruct-onnx-directml |
| Openchat-3.6-8b | 8B | 8192 | EmbeddedLLM/openchat-3.6-8b-20240522-onnx |
| Yi-1.5-6b-chat | 6B | 32k | EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx |
| Phi-3-vision-128k-instruct | 4.2B | 128k | EmbeddedLLM/Phi-3-vision-128k-instruct-onnx |
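
The ONNX model folders above can be fetched from the Hugging Face Hub before launching the server. A minimal sketch, assuming the huggingface_hub package is installed; the repo id is one row from the table above, and the local directory name is arbitrary:

```python
# Minimal sketch: fetch an ONNX model folder from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; repo id is taken from the table above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
    local_dir="Phi-3-mini-4k-instruct-062024-onnx",  # arbitrary target folder
)
print(f"Model downloaded to: {local_dir}")
```

Point ellm_server --model_path at the backend-specific subfolder (e.g. the directml variant shown in the examples further below).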

Getting Started

Installation

From Source

  • Windows

    1. Custom Setup:
       • IPEX (XPU): requires a conda environment: conda create -n ellm python=3.10 libuv; conda activate ellm.
       • DirectML: if you are using a conda environment, install the additional dependency: conda install conda-forge::vs2015_runtime.
    2. Install the embeddedllm package, e.g. $env:ELLM_TARGET_DEVICE='directml'; pip install -e . Note: the cpu, directml, cuda, ipex, and openvino targets are currently supported.

      • DirectML: $env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]
      • CPU: $env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]
      • CUDA: $env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]
      • IPEX: $env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop
      • OpenVINO: $env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino]
      • With Web UI:
        • DirectML: $env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]
        • CPU: $env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]
        • CUDA: $env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]
        • IPEX: $env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt
        • OpenVINO: $env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino,webui]
  • Linux

    1. Custom Setup:
       • IPEX (XPU): requires a conda environment: conda create -n ellm python=3.10 libuv; conda activate ellm.
       • DirectML: if you are using a conda environment, install the additional dependency: conda install conda-forge::vs2015_runtime.
    2. Install the embeddedllm package, e.g. ELLM_TARGET_DEVICE='directml' pip install -e . Note: the cpu, directml, cuda, ipex, and openvino targets are currently supported.

      • DirectML: ELLM_TARGET_DEVICE='directml' pip install -e .[directml]
      • CPU: ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]
      • CUDA: ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]
      • IPEX: ELLM_TARGET_DEVICE='ipex' python setup.py develop
      • OpenVINO: ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino]
      • With Web UI:
        • DirectML: ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]
        • CPU: ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]
        • CUDA: ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]
        • IPEX: ELLM_TARGET_DEVICE='ipex' python setup.py develop; pip install -r requirements-webui.txt
        • OpenVINO: ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino,webui]

Launch OpenAI API Compatible Server

  1. Custom Setup:

    • IPEX

      • For Intel iGPU:

        set SYCL_CACHE_PERSISTENT=1
        set BIGDL_LLM_XMX_DISABLED=1
      • For Intel Arc™ A-Series Graphics:

        set SYCL_CACHE_PERSISTENT=1
  2. ellm_server --model_path <path/to/model/weight>.

  3. Example code to connect to the API server can be found in scripts/python; a minimal sketch also follows below. Note: run ellm_server --help to see all supported arguments.
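
Because the server is OpenAI API compatible, any OpenAI client can talk to it. A minimal sketch using the openai Python package; the port and model name here are assumptions and should match the flags you passed to ellm_server:

```python
# Minimal sketch: query the local OpenAI-compatible server started by `ellm_server`.
# Assumes `pip install openai` and a server running on port 5555.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5555/v1",  # ellm_server endpoint
    api_key="EMPTY",                      # local server; no real key needed
)

response = client.chat.completions.create(
    model="phi3-mini-4k",  # hypothetical name; use your --served_model_name
    messages=[{"role": "user", "content": "What is an iGPU?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```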

Launch Chatbot Web UI

  1. ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost. Note: run ellm_chatbot --help to see all supported arguments.

(Demo video: asset/ellm_chatbot_vid.webp)

Launch Model Management UI

It is an interface that allows you to download models and deploy the OpenAI API compatible server. The UI shows the disk space required to download each model.

  1. ellm_modelui --port 6678. Note: run ellm_modelui --help to see all supported arguments.

(Screenshot: Model Management UI)

Compile OpenAI-API Compatible Server into Windows Executable

NOTE: OpenVINO packaging currently uses torch==2.4.0. The compiled executable will fail to run if the libomp dependency is missing; make sure to install libomp and add the libomp-xxxxxxx.dll to C:\Windows\System32.

  1. Install embeddedllm.

  2. Install PyInstaller: pip install pyinstaller==6.9.0.

  3. Compile Windows Executable: pyinstaller .\ellm_api_server.spec.

  4. You can find the executable in dist\ellm_api_server.

  5. Use it like ellm_server: .\ellm_api_server.exe --model_path <path/to/model/weight>.

    PowerShell/terminal usage:

    ellm_server --model_path <path/to/model/weight>
    
    # DirectML
    ellm_server --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555
    
    # IPEX-LLM
    ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\'  --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
    
    # OpenVINO
    ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\'  --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

Prebuilt OpenAI API Compatible Windows Executable (Alpha)

You can find the prebuilt OpenAI API compatible Windows executable on the Releases page.

PowerShell/terminal usage (use it like ellm_server):

.\ellm_api_server.exe --model_path <path/to/model/weight>

# DirectML
.\ellm_api_server.exe --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555

# IPEX-LLM
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\'  --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# OpenVINO
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\'  --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
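
To check that the server (launched via ellm_server or the compiled executable) is up, you can query the standard OpenAI model-listing route. A small sketch with the requests package, assuming the server exposes /v1/models and runs on port 5555 as in the examples above:

```python
# Minimal sketch: list the models served by the local OpenAI-compatible server.
# Assumes `pip install requests` and a server running on port 5555.
import requests

resp = requests.get("http://localhost:5555/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))  # should match --served_model_name
```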
