Comments (8)
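Thanks for your feedback!
- First, both BCE models are BERT-base scale, so they are roughly 3x more efficient than other large-scale models.
- For efficient inference with these two models, we recently released a version that uses onnxruntime-gpu as the inference framework:
- bce-embedding efficient inference: https://github.com/netease-youdao/QAnything/blob/qanything-python/qanything_kernel/connector/embedding/embedding_onnx_backend.py
- bce-reranker efficient inference (includes our reranking scheme for long passages, which some other open-source projects have adopted): https://github.com/netease-youdao/QAnything/blob/qanything-python/qanything_kernel/connector/rerank/rerank_onnx_backend.py
- If you run into any problems integrating these efficient inference backends, feel free to raise them in this issue!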
from bcembedding.
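Thanks! I have this code running now, but I noticed the inference results are not identical. Is that because precision was lost during model conversion?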
sentences = ["This is a test sentence.", "Another sentence for embedding."]
I ran a simple test and found that ONNX inference is even slower than SentenceTransformer. Why is that?
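1. Did you convert the ONNX model yourself, or did you download the ONNX model we open-sourced?
2. If you downloaded the ONNX models we open-sourced in QAnything, they work fine. Make sure you are using the latest embedding and reranker ONNX models released by QAnything.
3. Slight differences between the ONNX and torch results are normal. Check whether the cosine similarity is around 0.99; if it is, there is no problem.
4. Possible reasons for slow inference: (a) your onnxruntime is the CPU build, in which case uninstall onnxruntime first and then run pip install onnxruntime-gpu; (b) confirm that the onnxruntime-gpu package is installed correctly. With our inference code there should be no problem, but check whether CUDAExecutionProvider was loaded successfully.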
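One way to run the CUDAExecutionProvider check from point 4 is sketched below. It is an illustration, not the QAnything backend code: the helper names and the model_path argument are hypothetical, and only the onnxruntime calls (get_available_providers, InferenceSession, get_providers) come from the library. The session-level list is the one to look at, because a session can fall back to CPU when the CUDA libraries fail to load.

```python
# Sketch: confirm that an ONNX Runtime session actually runs on the GPU.
# Helper names and model_path are hypothetical, chosen for illustration.
CUDA_PROVIDER = "CUDAExecutionProvider"


def cuda_is_active(active_providers):
    """Return True if the CUDA provider is among a session's active providers."""
    return CUDA_PROVIDER in active_providers


def active_providers_for(model_path):
    """Create a session that prefers CUDA and report the providers it really uses.

    model_path is a placeholder for a local .onnx file.
    """
    import onnxruntime as ort  # imported lazily so cuda_is_active works without it

    session = ort.InferenceSession(
        model_path,
        providers=[CUDA_PROVIDER, "CPUExecutionProvider"],
    )
    return session.get_providers()


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        # Providers compiled into the installed package (not per-session state).
        print("providers in this build:", ort.get_available_providers())
    except ImportError:
        print("onnxruntime is not installed")
```

If cuda_is_active(active_providers_for(...)) returns False for your model file, the session is running on CPU, which would explain slow inference.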
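Thanks for the guidance. I have tracked down the cause: it appears to be a problem with the CUDA library libcublasLt.so.11, which meant only the CPU was being used.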
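A quick way to test for this kind of missing-library problem is to ask the dynamic loader directly. The sketch below uses only the standard library; the function name is hypothetical, and it only reports whether the library can be loaded, not why loading failed.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can find and load the named library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or one of its dependencies fails.
        return False


if __name__ == "__main__":
    # libcublasLt.so.11 is the library named in the comment above.
    print("libcublasLt.so.11 loadable:",
          can_load_shared_library("libcublasLt.so.11"))
```

A False result suggests the matching CUDA libraries are not installed or not on the loader's search path.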
Cosine similarity: 0.9999987920724557
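For reference, a cosine similarity like the one above can be computed as in the sketch below. The two vectors are made-up stand-ins for the torch and ONNX embeddings of the same sentence, not real model outputs, and the function name is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length 1-D vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a torch embedding and an ONNX embedding of one sentence.
torch_vec = [0.12, -0.53, 0.81, 0.07]
onnx_vec = [0.12, -0.53, 0.81, 0.0701]
print(cosine_similarity(torch_vec, onnx_vec))
```

A value close to 1 means the two backends agree up to small numerical differences.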
After that I ran a quick benchmark: on average it was nearly 20x faster.
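The thread does not show how this speedup was measured. A minimal timing sketch, assuming each backend is wrapped in a zero-argument callable (all names here are hypothetical), might look like this:

```python
import time


def mean_latency(fn, warmup=3, repeats=20):
    """Average wall-clock seconds per call of fn(), after a few warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup(baseline_fn, candidate_fn, warmup=3, repeats=20):
    """How many times faster candidate_fn is than baseline_fn."""
    baseline = mean_latency(baseline_fn, warmup, repeats)
    candidate = mean_latency(candidate_fn, warmup, repeats)
    if candidate == 0.0:
        return float("inf")  # timer resolution too coarse to measure
    return baseline / candidate
```

In practice baseline_fn and candidate_fn would each encode the same batch of sentences, one with the CPU session and one with the GPU session.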