Comments (6)
可以去掉,因为 transformers 原来用来生成 token 映射的方式比较慢(两层循环遍历 token 组合),所以这里做个 cache 以免下次重复生成。可以删掉
还有 internlm 的 tp 似乎确实存在问题,正在排查中。
from lmdeploy.
profile_serving.py用的数据是:
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
from lmdeploy.
重新编译并且安装后 用python3 -m lmdeploy.serve.client {server_ip_addresss}:33337 internlm
测试了下 ,结果是正常的了
benchmark中关于profile_serving中测试集的dataset能提供一下嘛,看脚本里面对json格式是有要求的
另外profile_generation测试对应tp为2的workspace目录会coredump
from lmdeploy.
@lvhan028 profiling_serving 的 dataset 方面方便提供一些信息吗
coredump 的问题这边的环境上无法复现,能提供更详细的环境信息吗?
from lmdeploy.
具体步骤如下
#启动容器 宿主机当前为双T4机器
docker run -itd --net=host --name internlm_server
--gpus all -v /data/work/deploy-interlm:/workspace
-v /data/models/:/models
--shm-size 16g
--cap-add=SYS_PTRACE --cap-add=SYS_ADMIN --security-opt seccomp=unconfined
--env NCCL_LAUNCH_MODE=GROUP openmmlab/lmdeploy:latest bash
# 下载仓库并编译
git clone https://github.com/InternLM/lmdeploy.git
mkdir build && cd build
bash ../generate.sh
make && make install
#转换模型为turbomind格式
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b /models/internlm-chat-7b hf --tokenizer_path /models/internlm-chat-7b/tokenizer.model --tp 2
#benchmark
cd /opt/tritonserver/lmdeploy
export PYTHONPATH=./:$PYTHONPATH
python3 benchmark/profile_generation.py --model_path /opt/tritonserver/lmdeploy/workspace --model_name internlm --concurrency 4 --input_seqlen 1 --output_seqlen 512--test_round 4
操作就是这样 这个会和拆分有关嘛
from lmdeploy.
不清楚,profile_generation 和 lmdeploy.turbomind.chat 用的接口类似,如果 chat 能加载成功的话 profile 应该也可以。
from lmdeploy.
Related Issues (20)
- [Bug] InternVL-Chat-V1-5量化报错 HOT 4
- [Bug] w8a8量化后的InternVL-Chat-V1-5模型lmdeploychat启动transformers报错 HOT 2
- [Feature] Guided Decoding HOT 1
- Batch infer seems no speed up HOT 8
- [Bug] WSL2环境下,0.4.2做InternVL量化时,磁盘写入速度极低 HOT 6
- 请问 TurboMind 支持cogvlm系列么? HOT 1
- [Bug] UnboundLocalError: local variable 'head_num' referenced before assignment HOT 4
- [feature] need rope_scaling_factor args in benchmark/profile_generation.py to enable dynamic NTK.
- [Feature] AWQ量化的校准数据集
- [Feature] ModuleNotFoundError: No module named 'timm' with Internal Vision model HOT 2
- [Feature] health endpoint HOT 2
- [Feature] model name should be settable or follow original full HF link name, not random new name HOT 6
- [Bug] lmdeploy lite auto_awq: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! HOT 1
- [Bug] result of W4A16 quantized Qwen1.5-1.8B-Chat model not correct HOT 1
- Support for SWIFT finetuned models HOT 4
- 总是看到一个using default GEMM algo的WARNING,是否会因为使用了默认的GEMM而影响速度或者吞吐量? HOT 1
- [Bug] 下载代码执行internvl-v1.5量化,导入本地模型时报错 HOT 11
- [Feature] peft<=0.9.0 要求的版本要求太低,与较多环境要求peft>0.10冲突,能否修改
- [Feature] Support for LLaVA-NeXT HOT 1
- [Docs] How are multiple images handled? HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lmdeploy.