aminediro / cria Goto Github PK
View Code? Open in Web Editor NEWOpenAI compatible API for serving LLAMA-2 model
License: MIT License
OpenAI compatible API for serving LLAMA-2 model
License: MIT License
I think /v1/completions
has streaming code but not /v1/chat/completions
the former API is deprecated and I'm using the new one.
To me it also looks like the code could be combined somehow? i.e. after creating the prompt from the incoming JSON with reference to #18
Perhaps it's a case of passing it to the streaming code in routes/completions.rs
So you're are aware and for some context to my requests we have integrated cria
into https://github.com/purton-tech/bionicgpt
Thanks
Hello, I came across this project while searching for OpenAI API compatible servers for llama.cpp and I was wondering if this can handle multiple requests at once?
Loading another model into RAM for each concurrent user doesn't seem like a great idea, and I was wondering if this was even possible at all with this project.
Thank you for your work!
thread 'main' panicked at 'Failed to load LLaMA model from "/home/cisco/git/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf": invalid magic number 46554747 (GGUF) for "/home/cisco/git/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf"', /home/cisco/git/cria/src/lib.rs:54:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I use Docker for deployment. I downloaded a 7B file, and the environment configuration file looks like this.
CRIA_SERVICE_NAME=cria
CRIA_HOST=0.0.0.0
CRIA_PORT=3000
CRIA_MODEL_ARCHITECTURE=llama
CRIA_MODEL_PATH=/llama/llama-2-7b/consolidated.00.pth
CRIA_USE_GPU=true
CRIA_GPU_LAYERS=32
CRIA_ZIPKIN_ENDPOINT=http://zipkin-server:9411/api/v2/spans
I see you have no LICENSE file for this project. The default is copyright.
I would suggest releasing the code under the GPL-3.0-or-later or AGPL-3.0-or-later license so that others are encouraged to contribute changes back to your project.
I attempted to build on Windows, specifically Windows 11 and it failed and I get the following error.
error[E0308]: mismatched types
--> llm\crates\llm-base\src\tokenizer\huggingface.rs:25:21
|
25 | .decode(vec![idx as u32], true)
| ------ ^^^^^^^^^^^^^^^^ expected `&[u32]`, found `Vec<u32>`
| |
| arguments to this method are incorrect
|
= note: expected reference `&[u32]`
found struct `Vec<u32>`
note: method defined here
--> C:\Users\gabri\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokenizers-0.13.4\src\tokenizer\mod.rs:814:12
|
814 | pub fn decode(&self, ids: &[u32], skip_special_tokens: bool) -> Result<String> {
| ^^^^^^
error[E0308]: mismatched types
--> llm\crates\llm-base\src\tokenizer\huggingface.rs:70:21
|
70 | .decode(tokens, skip_special_tokens)
| ------ ^^^^^^ expected `&[u32]`, found `Vec<u32>`
| |
| arguments to this method are incorrect
|
= note: expected reference `&[u32]`
found struct `Vec<u32>`
note: method defined here
--> C:\Users\gabri\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokenizers-0.13.4\src\tokenizer\mod.rs:814:12
|
814 | pub fn decode(&self, ids: &[u32], skip_special_tokens: bool) -> Result<String> {
| ^^^^^^
help: consider borrowing here
|
70 | .decode(&tokens, skip_special_tokens)
| +
For more information about this error, try `rustc --explain E0308`.
error: could not compile `llm-base` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
If anyone knows how to fix it, I appreciate the help. I just follow the details on how to run cria based on the README.
It seems like there is a problem with the llm
submodule?
I am trying to get this running in Docker
Here is my current build
from ubuntu:latest
RUN apt-get update && apt-get install -y \
sudo
RUN apt-get update && \
apt-get install -y \
curl \
build-essential \
libssl-dev \
pkg-config
# Install Rust and Cargo
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN yes | sudo apt install git
RUN git clone https://github.com/AmineDiro/cria.git
WORKDIR /cria
#RUN yes | git submodule update --init --recursive
# Build the project using Cargo in release mode
#RUN cargo build --release
#COPY ggml-model-q4_0.bin .
Unfortunately, it seems like the git submodule update --init --recursive
command is somehow interfacing with the repo via ssh
Have you considered maybe building this in a way that the dependencies can be installed without ssh?
The response cuts off at around 256tokens.
# .env
CRIA_MODEL_PATH=/home/bala/Models/llama-2-13b-chat.ggmlv3.q8_0.bin
# Other environement variables to set
CRIA_SERVICE_NAME=cria
CRIA_HOST=0.0.0.0
CRIA_PORT=3000
CRIA_MODEL_ARCHITECTURE=llama
CRIA_USE_GPU=false
CRIA_GPU_LAYERS=32
CRIA_ZIPKIN_ENDPOINT=http://zipkin-server:9411/api/v2/spans
Input
{
"prompt":"[INST]<<SYS>>.<</SYS>>How do I get from UNSW to Central Station?[/INST]",
"temperature":0.1
}
Output
console.log(response.choices[0].text)
There are several ways to get from the University of New South Wales (UNSW) to Central Station in Sydney. Here are some options: 1. Train: The easiest and most convenient way to get to Central Station from UNSW is by train. The UNSW campus is located near the Kensington station, which is on the Airport & South Line. You can take a train from Kensington station to Central Station. The journey takes around 20 minutes. 2. Bus: You can also take a bus from UNSW to Central Station. The UNSW campus is served by several bus routes, including the 395, 397, and 398. These buses run frequently throughout the day and the journey takes around 45-60 minutes, depending on traffic. 3. Light Rail: Another option is to take the light rail from UNSW to Central Station. The light rail runs along Anzac Parade and stops at Central Station. The journey takes around 30-40 minutes. 4. Taxi or Ride-sharing: You can also take a taxi or ride-sharing service such as Uber or Ly
I was following the steps given in the readme, new to rust, so have no idea how to get through this one, I tried googling but nothing.
Compiling ggml-sys v0.2.0-dev (D:\textgen\cria\llm\crates\ggml\sys)
error: failed to run custom build command for `ggml-sys v0.2.0-dev (D:\textgen\cria\llm\crates\ggml\sys)`
Caused by:
process didn't exit successfully: `D:\textgen\cria\target\release\build\ggml-sys-347ad8dfc92431e3\build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=llama-cpp
OPT_LEVEL = Some("3")
TARGET = Some("x86_64-pc-windows-msvc")
HOST = Some("x86_64-pc-windows-msvc")
cargo:rerun-if-env-changed=CC_x86_64-pc-windows-msvc
CC_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CC_x86_64_pc_windows_msvc
CC_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CC
HOST_CC = None
cargo:rerun-if-env-changed=CC
CC = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
DEBUG = Some("false")
cargo:rerun-if-env-changed=CFLAGS_x86_64-pc-windows-msvc
CFLAGS_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CFLAGS_x86_64_pc_windows_msvc
CFLAGS_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CFLAGS
HOST_CFLAGS = None
cargo:rerun-if-env-changed=CFLAGS
CFLAGS = None
OPT_LEVEL = Some("3")
TARGET = Some("x86_64-pc-windows-msvc")
HOST = Some("x86_64-pc-windows-msvc")
cargo:rerun-if-env-changed=CC_x86_64-pc-windows-msvc
CC_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CC_x86_64_pc_windows_msvc
CC_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CC
HOST_CC = None
cargo:rerun-if-env-changed=CC
CC = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
DEBUG = Some("false")
cargo:rerun-if-env-changed=CFLAGS_x86_64-pc-windows-msvc
CFLAGS_x86_64-pc-windows-msvc = None
cargo:rerun-if-env-changed=CFLAGS_x86_64_pc_windows_msvc
CFLAGS_x86_64_pc_windows_msvc = None
cargo:rerun-if-env-changed=HOST_CFLAGS
HOST_CFLAGS = None
cargo:rerun-if-env-changed=CFLAGS
CFLAGS = None
--- stderr
thread 'main' panicked at 'Please make sure nvcc is executable and the paths are defined using CUDA_PATH, CUDA_INCLUDE_PATH and/or CUDA_LIB_PATH', llm\crates\ggml\sys\build.rs:344:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
From this guide https://replicate.com/blog/how-to-prompt-llama
A prompt with history would look like
<s>[INST] <<SYS>>
You are are a helpful... bla bla.. assistant
<</SYS>>
Hi there! [/INST] Hello! How can I help you today? </s><s>[INST] What is a neutron star? [/INST] A neutron star is a ... </s><s> [INST] Okay cool, thank you! [/INST]
It may even be that the newlines can be removed.
So I think this prompt technique should replace the one in https://github.com/AmineDiro/cria/blob/main/src/routes/chat.rs#L16
I'm getting this error when trying to run on MacOS:
error: invalid value 'llama-2' for '<MODEL_ARCHITECTURE>': llama-2 is not one of supported model architectures: [Bloom, Gpt2, GptJ, GptNeoX, Llama, Mpt]
If I use LLama
instead, it crashes (as it probably should)
GGML_ASSERT: llama-cpp/ggml.c:6192: ggml_nelements(a) == ne0*ne1*ne2
fish: Job 1, 'target/release/cria Llama ../ll…' terminated by signal SIGABRT (Abort)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.