Comments (4)
If I turn this option off (-DPPLNN_ENABLE_CUDA_JIT=OFF), the error above goes away. But with the option disabled, does performance also drop? I ran the benchmark, and every model performs worse than TensorRT.
from ppl.nn.
I've run into a similar problem and haven't found a solution yet.
Hello, I have no RTX 3090 to test on. I tested mnasnet0_5.onnx on T4 and A100 with CUDA 11.2 and JIT=ON, and both work fine. I suggest you substitute conv_jit.cc lines 599-600 with:
int cta_num_limit_by_regs = (regs_per_cta == 0) ? cta_num_limit_by_thds : max_regs_per_sm / regs_per_cta;
int cta_num_limit_by_smem = (smem_per_cta == 0) ? cta_num_limit_by_thds : max_smem_per_sm / smem_per_cta;
and then test again. Please report the result on the RTX 3090, thanks.
Hi,
I tried your suggestion and modified the code, but the problem still persists on the 3090. In addition, multiple tests on the T4 and the Jetson series showed no problem.