OpenAI-style API for open-source large language models: use open LLMs just like ChatGPT. Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. (A unified backend interface for open-source large models.)

License: Apache License 2.0



API for Open LLMs

llm.png

Image from the paper: [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf)

📢 News

  • 【2024.06.13】 Added support for the MiniCPM-Llama3-V-2_5 model; set the environment variables MODEL_NAME=minicpm-v PROMPT_NAME=minicpm-v DTYPE=bfloat16

  • 【2024.06.12】 Added support for the GLM-4V model; set the environment variables MODEL_NAME=glm-4v PROMPT_NAME=glm-4v DTYPE=bfloat16. See glm4v for a test example

  • 【2024.06.08】 Added support for QWEN2 models; set the environment variables MODEL_NAME=qwen2 PROMPT_NAME=qwen2

  • 【2024.06.05】 Added support for the GLM4 model; set the environment variables MODEL_NAME=chatglm4 PROMPT_NAME=chatglm4

  • 【2024.04.18】 Added support for the Code Qwen model; see the SQL Q&A demo

  • 【2024.04.16】 Added support for Rerank (re-ranking) models; see the usage guide

  • 【2024.02.26】 QWEN1.5 models require the environment variables MODEL_NAME=qwen2 PROMPT_NAME=qwen2

For more news and history, see here.


Main Features

This project implements a unified backend interface for inference with open-source large language models, with responses consistent with the OpenAI API. Its main features are:

  • ✨ Call a variety of open-source large models in the style of the OpenAI ChatGPT API

  • 🖨️ Streaming responses for a typewriter effect

  • 📖 Text embedding models to support document-based question answering

  • 🦜️ Support for the various features of langchain, the large-language-model development toolkit (see the sketch after this list)

  • 🙌 Use an open-source model as a drop-in replacement for chatgpt by simply changing environment variables, providing a backend for all kinds of applications

  • 🚀 Support for loading LoRA models you have trained yourself

  • ⚡ vLLM inference acceleration and concurrent request handling
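
As a minimal sketch of the langchain integration (not taken from the project docs: the backend address is a placeholder, and the classic langchain.chat_models.ChatOpenAI interface referenced elsewhere in this document is assumed):

# Point langchain's OpenAI-compatible chat client at the local backend.
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="EMPTY",                       # any non-empty string works
    openai_api_base="http://192.168.0.xx:80/v1",  # placeholder backend address
    model_name="gpt-3.5-turbo",
)
print(llm.predict("你好"))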

Content Guide

Section | Description
💁🏻‍♂ Supported Models | The open-source models supported by this project, with brief information
🚄 Startup | Environment configuration and launch commands for starting a model
⚡ vLLM Startup | Environment configuration and launch commands for starting a model with vLLM
💻 Usage | How to call the API after a model has been started
❓ FAQ | Answers to some common questions

🐼 Supported Models

Language Models

Model | Parameter Sizes
Baichuan | 7B/13B
ChatGLM | 6B
DeepSeek | 7B/16B/67B/236B
InternLM | 7B/20B
LLaMA | 7B/13B/33B/65B
LLaMA-2 | 7B/13B/70B
LLaMA-3 | 8B/70B
Qwen | 1.8B/7B/14B/72B
Qwen1.5 | 0.5B/1.8B/4B/7B/14B/32B/72B/110B
Qwen2 | 0.5B/1.5B/7B/57B/72B
Yi (1/1.5) | 6B/9B/34B

For startup instructions, see the vLLM startup guide and the transformers startup guide.

Embedding Models

Model | Dimensions | Weights
bge-large-zh | 1024 | bge-large-zh
m3e-large | 1024 | moka-ai/m3e-large
text2vec-large-chinese | 1024 | text2vec-large-chinese
bce-embedding-base_v1 (recommended) | 768 | bce-embedding-base_v1

🤖 Usage

Environment Variables

  • OPENAI_API_KEY: any string will do here

  • OPENAI_API_BASE: the address of the backend API, e.g. http://192.168.0.xx:80/v1

cd streamlit-demo
pip install -r requirements.txt
streamlit run streamlit_app.py

img.png

👉 Chat Completions
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://192.168.20.59:7891/v1/",
)

# Chat completion API
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "你好",
        }
    ],
    model="gpt-3.5-turbo",
)
print(chat_completion)
# 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。


# stream = client.chat.completions.create(
#     messages=[
#         {
#             "role": "user",
#             "content": "感冒了怎么办",
#         }
#     ],
#     model="gpt-3.5-turbo",
#     stream=True,
# )
# for part in stream:
#     print(part.choices[0].delta.content or "", end="", flush=True)
👉 Completions
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://192.168.20.59:7891/v1/",
)


# Completions API
completion = client.completions.create(
    model="gpt-3.5-turbo",
    prompt="你好",
)
print(completion)
# 你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
👉 Embeddings
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://192.168.20.59:7891/v1/",
)


# compute the embedding of the text
embedding = client.embeddings.create(
    input="你好",
    model="text-embedding-ada-002"
)
print(embedding)
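
As a minimal follow-up sketch (not part of the project), the returned vectors can be compared with cosine similarity, which is the basic operation behind the document question-answering support mentioned above; numpy is assumed to be installed and the base URL is reused from the examples above:

import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://192.168.20.59:7891/v1/",
)

# Embed a query and a candidate passage, then score them by cosine similarity.
texts = ["感冒了怎么办", "多喝水,注意休息,必要时就医。"]
response = client.embeddings.create(input=texts, model="text-embedding-ada-002")
query_vec, doc_vec = (np.array(item.embedding) for item in response.data)

similarity = float(np.dot(query_vec, doc_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))
print(similarity)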

Projects You Can Connect

By changing the OPENAI_API_BASE environment variable, most chatgpt applications and front-end/back-end projects can connect seamlessly!

docker run -d -p 3000:3000 \
   -e OPENAI_API_KEY="sk-xxxx" \
   -e BASE_URL="http://192.168.0.xx:80" \
   yidadaa/chatgpt-next-web

web

# Add the following environment variables to the api and worker services in docker-compose.yml
OPENAI_API_BASE: http://192.168.0.xx:80/v1
DISABLE_PROVIDER_CONFIG_VALIDATION: 'true'

dify

📜 License

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.

🚧 References

Star History

Star History Chart

Contributors

ainzlimuru, anonno2, beatwade, calehh, claudegpt, freerotate, gsy44355, lzhfe, tendo33, wey-gu, xusenlinzy, yimi81


api-for-open-llm's Issues

Docker startup error on an Ubuntu server

Starting docker with the following command, the container exits immediately after startup:

docker run -it -d --gpus all --ipc=host --net=host -p 80:80 --name=chatglm \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v `pwd`:/workspace \
    llm-api:pytorch \
    python api/app.py \
    --port 80 \
    --allow-credentials \
    --model_name chatglm \
    --model_path THUDM/chatglm-6b \
    --embedding_name moka-ai/m3e-base

Checking docker logs shows the following error:

=============
== PyTorch ==
=============

NVIDIA Release 22.12 (build 49968248)
PyTorch Version 1.14.0a0+410ce96

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

/opt/nvidia/nvidia_entrypoint.sh: line 49: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]

Thanks~

GPU memory usage doubles when loading a local firefly-baichuan13b model

The service works normally, but total GPU memory usage reaches about 50 GB, roughly double that of loading the 13B model on its own. The startup command is:

python api/app.py \
--port 80 \
--allow-credentials \
--model_name baichuan13b-firefly \
--model_path /usr/repo/firefly-baichuan-13b \
--device cuda \
--embedding_name /usr/repo/text2vec-large-chinese \
--gpus 0,1 \
--num_gpus 2 \
--prompt_name firefly

Large differences between the chat/completions and completions endpoints

I recently noticed that for the same request, the results from chat/completions and completions differ considerably; this happens on both Baichuan-13B and Qwen.
The prompt I use is identical in both cases, and so are top-p and temperature.
These are the gen_params received for a chat request:
chat generated params: {'model': 'gpt-turbo-3.5', 'prompt': [{'role': 'user', 'content': "内容"}], 'temperature': 1.0, 'top_p': 0.95, 'max_new_tokens': 1024, 'echo': False, 'stream': False, 'stop': ['<|im_end|>']}
These are the gen_params received for a completion request:
completion generated params: {'model': 'text-davinci-003', 'prompt': 'xxx', 'temperature': 1.0, 'top_p': 0.95, 'max_new_tokens': 256, 'echo': False, 'stream': False, 'stop': ['<|im_end|>']}
Because I call the API through langchain.OpenAI and langchain.chat_models.ChatOpenAI respectively, the model field is auto-filled with the two model names above; the backend is still running Qwen.

When using the completions endpoint, the model often returns nothing at all, or just a string of newlines, with no substantive content. The chat endpoint behaves more normally.
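
For reference, a hedged sketch of the two call paths described above (the backend address is a placeholder; the classic langchain interfaces named above are assumed, with their default model names matching the gen_params dumps):

# langchain.llms.OpenAI hits /v1/completions (model defaults to text-davinci-003);
# langchain.chat_models.ChatOpenAI hits /v1/chat/completions (model defaults to gpt-3.5-turbo).
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

base = "http://192.168.0.xx:80/v1"  # placeholder backend address
llm = OpenAI(openai_api_key="EMPTY", openai_api_base=base)
chat = ChatOpenAI(openai_api_key="EMPTY", openai_api_base=base)

print(llm("内容"))           # completions endpoint
print(chat.predict("内容"))  # chat/completions endpoint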

Question about the Baichuan 13B chat code

In the Baichuan support in api/prompt_adapter.py, the code says:

user_prompt = "<reserved_102> {}<reserved_103> "
assistant_prompt = "{}"
stop = ["<reserved_102>", "<reserved_103>"]

But the user token id and assistant token id configured in generation_config.json on Hugging Face do not match this.

{
"pad_token_id": 0,
"bos_token_id": 1,
"eos_token_id": 2,
"user_token_id": 195,
"assistant_token_id": 196,
"max_new_tokens": 2048,
"temperature": 0.3,
"top_k": 5,
"top_p": 0.85,
"repetition_penalty": 1.1,
"do_sample": true,
"transformers_version": "4.29.2"
}

What is the reason for this discrepancy?

Question about inference speed (it's fast)

Model: Baichuan-13B-Chat

Question: with the official web_demo.py, and according to feedback in several chat groups, Baichuan-13B-Chat gets slower the longer the conversation runs (many turns, or large input/generation token counts), but inference with this project stays fairly fast.

  • Comparing Baichuan-13B-Chat with vLLM-accelerated inference, this project's inference speed is quite close. Does this project apply any optimizations to inference?
  • If inference is optimized, does it come at the cost of generation quality?

Error: "mixed dtype (CPU): expect input to have scalar type of BFloat16", code 50001

Running locally in CPU mode raises the following error:

Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_requestor.py", line 331, in handle_error_response
error_data = resp["error"]
KeyError: 'error'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\xxx\Downloads\api-for-open-llm-master-adapt\api-for-open-llm-master\api\untitled0.py", line 16, in
completion = openai.ChatCompletion.create(
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_requestor.py", line 226, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_requestor.py", line 619, in _interpret_response
self._interpret_response_line(
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_requestor.py", line 682, in _interpret_response_line
raise self.handle_error_response(
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\api_requestor.py", line 333, in handle_error_response
raise error.APIError(
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(mixed dtype (CPU): expect input to have scalar type of BFloat16)","code":50001}' (HTTP response code was 500)

Calling Baichuan through this service gives poor, often unusable output, very different from calling the original directly

The prompt template used:

xxxx文本内容
\n\n根据上面内容,回答问题:什么时候上下班。

With the same content, calling Baichuan through this service gives poor, often unusable output (the output frequently repeats the question, e.g. "回答问题:什么时候上下班。"). But Baichuan's own web_demo.py, which internally calls the underlying chat method, works fine.

Original call:
https://github.com/baichuan-inc/Baichuan-13B/blob/main/web_demo.py

Is this service's way of calling generate inconsistent with the original's chat method, and is that the problem here?

Does tiktoken.model.encoding_for_model require internet access?

When I start the model service on a machine without internet access and try to compute embeddings with m3e-base, the code below raises an error:

@app.post("/v1/embeddings")
@app.post("/v1/engines/{model_name}/embeddings")
async def create_embeddings(request: EmbeddingsRequest, model_name: str = None):
    if request.model is None:
        request.model = model_name
    inputs = request.input
    if isinstance(inputs, str):
        inputs = [inputs]
    elif isinstance(inputs, list):
        if isinstance(inputs[0], int):
            decoding = tiktoken.model.encoding_for_model(request.model)
            inputs = [decoding.decode(inputs)]
        elif isinstance(inputs[0], list):
            decoding = tiktoken.model.encoding_for_model(request.model)
            inputs = [decoding.decode(text) for text in inputs]

The exact error message is:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fe8a1f5b550>: Failed to resolve 'openaipublic.blob.core.windows.net' ([Errno -2] Name or service not known)"))

On inspection, request.model is text-embedding-ada-002. Is that expected? And is internet access really required?
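
For context, the handler shown above only calls tiktoken when the request body contains token ids rather than plain strings, so it is the pre-tokenized path that triggers the cl100k_base download. A hedged client-side sketch of the two cases (base URL reused from the examples earlier in this document):

# Plain-string input skips the tiktoken branch entirely; pre-tokenized input
# (lists of token ids) forces the service to decode them with tiktoken,
# which is what tries to fetch cl100k_base over the network.
import tiktoken
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://192.168.20.59:7891/v1/")

# Case 1: plain string, no tiktoken needed on the server.
client.embeddings.create(input="你好", model="text-embedding-ada-002")

# Case 2: token ids, the server must decode them via tiktoken.
enc = tiktoken.encoding_for_model("text-embedding-ada-002")
client.embeddings.create(input=[enc.encode("你好")], model="text-embedding-ada-002")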

ModuleNotFoundError: No module named 'api'

Running:
python api/app.py --port 8090 --allow-credentials --model_path D:\Project\ChatGLM\chatglm-6b-int4 --embedding_name GanymedeNil/text2vec-large-chinese

Output:
Traceback (most recent call last):
File "D:\Project\ChatGLM\api-for-open-llm-master\api\app.py", line 13, in
from api.constants import ErrorCode
ModuleNotFoundError: No module named 'api'

Could someone help me figure out what the cause is? Thanks.

Local startup error: module 'api' not found

(llm-api) root@aiclusternode2-PowerEdge-T640:/data0/api-for-open-llm# python api/app.py --port 6006 --allow-credentials --model_name chatglm --model_path /data0/models/chatglm-6b --device cuda --embedding_name /data0/models/m3e-base --gpus 2 --num-gpus 1
Traceback (most recent call last):
File "/data0/api-for-open-llm/api/app.py", line 15, in
from api.constants import ErrorCode
ModuleNotFoundError: No module named 'api'

Docker starts without errors, but the port is not listening and cannot be accessed

Windows environment.
Docker startup command: docker run -it -d --gpus all --ipc=host --net=host -p 8001:8001 --name=chatglm --ulimit memlock=-1 --ulimit stack=67108864 -v E:\AI\api-for-open-llm:/workspace -v E:\AI\models:/workspace/models llm-api:pytorch

The app.py run log is as follows:
2023-08-08 10:37:53.347 | INFO | api.router:main:474 - args: Namespace(adapter_model_path=None, allow_credentials=True, allowed_headers=[''], allowed_methods=[''], allowed_origins=['*'], alpha=None, context_len=None, device='cuda', embedding_name=None, gpus=None, host='0.0.0.0', load_in_4bit=False, load_in_8bit=False, model_name='chatglm', model_path='/workspace/models/chatglm2-6b', num_gpus=1, port=8001, prompt_name=None, quantize=16, stream_interval=2, use_ptuning_v2=False)
Loading checkpoint shards: 100%|██████████| 7/7 [00:11<00:00, 1.68s/it]
2023-08-08 10:38:07.270 | INFO | api.generate:init:463 - Using ChatGLM Model for Chat!
INFO: Started server process [693]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)

After startup, the process is visible, but netstat -an shows nothing listening on port 8001.

Access error

I started the container on the server:
docker run -it -d --gpus all --ipc=host --net=bridge -p 6006:6006 --name=baichuan-13b-chat --ulimit memlock=-1 --ulimit stack=67108864 -v `pwd`:/workspace -v /data/models:/workspace/models llm-api:pytorch

Run inside the container:
python3 api/app.py --port 6006 --allow-credentials --model_name baichuan-13b-chat --model_path ../models/Baichuan-13B-Chat --device cuda --embedding_name ../models/text2vec-large-chinese --gpus 2,3,4,5

Access the server from my local machine:
ssh -L 7860:127.0.0.1:7860 [email protected]
ssh -L 7860:0.0.0.0:6006 [email protected]

Then open the address in the browser:
http://127.0.0.1:7860/

Error:

Screenshot 2023-07-28 18-48-09

Screenshot 2023-07-28 18-49-55
Screenshot 2023-07-28 18-50-43

Any idea what might be causing this?

fine-tuning problem

Does fine-tuning currently only support LoRA? Would P-Tuning v2 work as well? Thanks.

Question about deploying multiple LLMs at the same time

Hi, I'd like to deploy several open-source LLMs (e.g. ChatGLM2-6B, Baichuan-13B) on my own server at the same time. How can I make the "model" field in the API take effect, so that different models can be selected on the same api_base?

How to use a specific GPU

The machine has 4 GPUs and I want to start the application on the third one. The command is:
docker run -it -d --gpus 2 --ipc=host --net=bridge -p 6011:6006 --name=chatglm \
--ulimit memlock=-1 --ulimit stack=67108864 \
-v `pwd`:/workspace \
-v /data0/models:/workspace/models \
llm-api:pytorch \
python api/app.py \
--port 6006 \
--allow-credentials \
--model_name chatglm \
--model_path models/chatglm-6b \
--device cuda \
--embedding_name models/m3e-base
But once it is running, it still uses the first GPU. Are the command arguments wrong?

Error after adding a knowledge base

Environment: local deployment
I can already open the page through web_demo and chat. After clicking "Knowledge Base", uploading a txt file, and adding it to the knowledge base:

image

api.py console error:
image

web_demo.py console error:
image

Could someone take a look? Thanks.

Loading a file into the vector store fails with "zipfile.BadZipFile: File is not a zip file"

image
2023-07-20 17:47:04.335 | DEBUG | tools.doc_qa:get_documents:82 - Loading documents...
2023-07-20 17:47:04.335 | INFO | tools.doc_qa:_get_documents:30 - Loading file: doc_store/chatgpt.txt
2023-07-20 17:47:04.337 | ERROR | tools.doc_qa:_get_documents:74 - Error loading file: doc_store/chatgpt.txt
Traceback (most recent call last):
File "/storage/workplace/liuyw/project/api-for-open-llm-master/applications/tools/doc_qa.py", line 71, in _get_documents
return loader.load_and_split(text_splitter=text_splitter)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/langchain/document_loaders/base.py", line 43, in load_and_split
docs = self.load()
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
elements = self._get_elements()
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 167, in _get_elements
from unstructured.partition.auto import partition
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/partition/auto.py", line 16, in
from unstructured.partition.doc import partition_doc
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/partition/doc.py", line 8, in
from unstructured.partition.docx import partition_docx
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/partition/docx.py", line 33, in
from unstructured.partition.text_type import (
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 21, in
from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 32, in
_download_nltk_package_if_not_present(package_name, package_category)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
nltk.find(f"{package_category}/{package_name}")
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/data.py", line 555, in find
return find(modified_name, paths)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/data.py", line 542, in find
return ZipFilePathPointer(p, zipentry)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/data.py", line 394, in init
zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/nltk/data.py", line 935, in init
zipfile.ZipFile.init(self, filename)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/zipfile.py", line 1266, in init
self._RealGetContents()
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Error after updating to the latest code

File "/workspace/./api/router.py", line 270, in create_chat_completion
gen_params = get_gen_params(
File "/workspace/./api/router.py", line 115, in get_gen_params
messages = get_qwen_react_prompt(messages, functions, function_call)
File "/workspace/./api/react_prompt.py", line 35, in get_qwen_react_prompt
name_for_model=info["name_for_model"],
KeyError: 'name_for_model'
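
For reference, the traceback suggests the Qwen ReAct prompt builder expected each entry in functions to carry a name_for_model key rather than the plain OpenAI "name" field. A purely hypothetical payload in that inferred shape; every field name other than name_for_model is an assumption here:

functions = [
    {
        # "name_for_model" is the key the traceback shows being read;
        # the remaining keys follow Qwen's ReAct tool convention and are assumptions.
        "name_for_model": "get_weather",
        "name_for_human": "天气查询",
        "description_for_model": "Query the current weather for a given city.",
        "parameters": [
            {"name": "city", "type": "string", "description": "城市名称", "required": True},
        ],
    },
]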

How can I connect to dify? What do I need to change?

Hi, I'd like to know what to change so that a local LLM can be connected to dify. I made the following change:

os.environ["OPENAI_API_BASE"] = "http://xxx"
os.environ["OPENAI_API_KEY"] = "xxx"

But after entering dify, it still asks me to validate my OpenAI API key.
image

Docker startup error

Startup command:

docker run -it -d --gpus 2 --ipc=host --net=bridge -p 6006:6006 --name=chatglm \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v `pwd`:/workspace \
    llm-api:pytorch \
    python api/app.py \
    --port 6006 \
    --allow-credentials \
    --model_name chatglm \
    --model_path /data0/models/chatglm-6b \
    --device cuda \
    --embedding_name /data0/models/m3e-base

Error:

2023-06-18 16:36:45.162 | INFO     | __main__:<module>:463 - args: Namespace(adapter_model_path=None, allow_credentials=True, allowed_headers=['*'], allowed_methods=['*'], allowed_origins=['*'], device='cuda', embedding_name='/data0/models/m3e-base', gpus=None, host='0.0.0.0', load_in_4bit=False, load_in_8bit=False, model_name='chatglm', model_path='/data0/models/chatglm-6b', num_gpus=1, port=6006, quantize=16, stream_interval=2, use_ptuning_v2=False)
Traceback (most recent call last):
  File "api/app.py", line 465, in <module>
    model, tokenizer = load_model(
  File "/workspace/api/models.py", line 231, in load_model
    model, tokenizer = adapter.load_model(
  File "/workspace/api/models.py", line 73, in load_model
    tokenizer = self.tokenizer_class.from_pretrained(model_name_or_path, **tokenizer_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 643, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 487, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/data0/models/chatglm-6b'. Use `repo_type` argument if needed.

The /data0/models/chatglm-6b directory is valid; other applications can load and use the model from it without problems.

baichuan-13b-chat 500 Internal Server Error

Requesting baichuan-13b-chat returns an error:

curl http://192.168.50.176:17864/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxx" \
-d '{
"model": "baichuan-13b-chat",
"messages": [{"role": "assistant", "content": "请你做一道英语选择题\n请你一步一步思考并将 思考过程写在【解析】和之间。你将从A,B,C,D中选出正确的答案,并写在【答案】和之间。\n例如:【答案】: A \n完整的题目回答的格式如下:\n【解析】 ... \n【答案】 ... \n请你严格按照上述格式作答。\n题目如下:"},{"role": "user", "content": "1. (5 分) 已知集合 $A=\{x \mid x=3 n+2, n \in N\}, B=\{6,8,10,12,14\}$, 则集合 $A \cap$ $B$ 中元素的个数为 $(\quad)$\nA. 5\nB. 4\nC. 3\nD. 2\n"}]
}'

Response received by the client:
{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(a Tensor with 573 elements cannot be converted to Scalar)","code":50001}

On the server side:
INFO: 192.168.50.176:53934 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
