neuralmagic / deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Home Page: https://neuralmagic.com/deepsparse/
License: Other
Describe the bug
Trying to run the server-client example.
Environment
Include all relevant environment information:
Ubuntu 18.04
deepsparse 0.1.1
{'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 8, 'available_cores_per_socket': 8, 'threads_per_core': 1, 'available_threads_per_core': 1, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 12582912}
To Reproduce
from deepsparse.utils import arrays_to_bytes, bytes_to_arrays
Errors
Traceback (most recent call last):
File "server.py", line 62, in
from deepsparse.utils import arrays_to_bytes, bytes_to_arrays
ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'
What is the maximum expected acceleration from DeepSparse? (Assuming that the model is sparsified further with SparseML as well.)
In the documentation here, achieving GPU-level performance on CPUs is promised; however, in this article (Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference), the maximum acceleration on top of a CPU is stated as 2.5x. So it is 2.5 times faster than running on a CPU, which is not very close to GPU performance.
Thanks!
I stopped model training at epoch 150 and am trying to convert it into ONNX weights, but I'm getting an issue. Do I need to continue training? What is the impact if we stop earlier and then resume training?
Great contribution.
It would be better if deepsparse could support ARM devices.
Hello,
I have trained several pruned models and saved the weights (using other pruning methods) and converted the saved models to onnx (using torch). I'm interested in comparing their inference times. The results are confusing as the trends do not stay the same when I change the batch size. Also, for some batch size and model combinations, onnx is faster than Deep Sparse which is confusing. I was wondering if there is an explanation for that, or I'm missing something.
Hi, I want to use your quantized and sparse YOLOv5s model.
When I run the compiled engine (engine.run), I get a list of arrays with these shapes:
(1,3,40,40,85), (1,3,20,20,85), ...
How can I get the cropped image (as an array) from this output, as YOLO does?
YOLO itself applies non_max_suppression to the model output (which has shape (1,2550,85)) and then does some post-processing on it. What about your model?
Because I need to integrate it with other code, I want to get and save results from the model without annotate.py. Please don't suggest it!
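A minimal, hedged sketch (not the official DeepSparse pipeline) of the generic step YOLO applies: assuming the raw (1,3,S,S,85) grid outputs have already been decoded with the model's anchors and strides into a flat (num_boxes, 85) array of [cx, cy, w, h, objectness, class scores...] predictions (or the model was exported with post-processing included), you can filter by confidence, run NMS, and crop the boxes out of the original image. All names and thresholds below are illustrative assumptions:
import numpy as np
import torch
from torchvision.ops import nms

def crop_detections(preds, image, conf_thres=0.25, iou_thres=0.45):
    # preds columns: [cx, cy, w, h, objectness, class_0 ... class_79] (hypothetical decoded output)
    scores = preds[:, 4] * preds[:, 5:].max(axis=1)
    keep = scores > conf_thres
    preds, scores = preds[keep], scores[keep]
    # convert center/size boxes to the corner format torchvision's NMS expects
    boxes = np.empty((preds.shape[0], 4), dtype=np.float32)
    boxes[:, 0] = preds[:, 0] - preds[:, 2] / 2  # x1
    boxes[:, 1] = preds[:, 1] - preds[:, 3] / 2  # y1
    boxes[:, 2] = preds[:, 0] + preds[:, 2] / 2  # x2
    boxes[:, 3] = preds[:, 1] + preds[:, 3] / 2  # y2
    kept = nms(torch.from_numpy(boxes), torch.from_numpy(scores.astype(np.float32)), iou_thres)
    h, w = image.shape[:2]
    crops = []
    for i in kept.tolist():
        x1, y1, x2, y2 = boxes[i].clip([0, 0, 0, 0], [w, h, w, h]).astype(int)
        crops.append(image[y1:y2, x1:x2])  # cropped detection as an array
    return crops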
Hi, I tried to run the following code; it runs smoothly on my Mac/terminal, but it always dies when I run it in a Jupyter notebook:
from sparseml.pytorch.models import ModelRegistry
from sparseml.pytorch.datasets import ImagenetteDataset, ImagenetteSize
The error information:
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.4.0 (bc00bf8b) (release) (optimized)
Date: 06-05-2021 @ 03:09:41 EDT
OS: Linux gv02.nyu.cluster 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Fri Oct 16 13:38:49 EDT 2020
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 377.337G
Free memory: 334.654G
Assertion at src/lib/core/cpu.cpp:263
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
1# wand::detail::assert_fail(char const*, char const*, int) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
2# 0x0000148A66A5E51C in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
3# 0x0000148A66A5EEDD in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
4# 0x0000148AF53F8783 in /lib64/ld-linux-x86-64.so.2
5# 0x0000148AF53FD24F in /lib64/ld-linux-x86-64.so.2
6# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
7# 0x0000148AF53FC81A in /lib64/ld-linux-x86-64.so.2
8# 0x0000148AF4BD4F96 in /lib/x86_64-linux-gnu/libdl.so.2
9# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
10# _dl_catch_error in /lib/x86_64-linux-gnu/libc.so.6
11# 0x0000148AF4BD5745 in /lib/x86_64-linux-gnu/libdl.so.2
12# dlopen in /lib/x86_64-linux-gnu/libdl.so.2
13# _PyImport_FindSharedFuncptr in /ext3/miniconda3/bin/python
14# _PyImport_LoadDynamicModuleWithSpec in /ext3/miniconda3/bin/python
15# 0x000055B382CAAE49 in /ext3/miniconda3/bin/python
16# _PyMethodDef_RawFastCallDict in /ext3/miniconda3/bin/python
17# _PyCFunction_FastCallDict in /ext3/miniconda3/bin/python
18# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
19# _PyEval_EvalCodeWithName in /ext3/miniconda3/bin/python
20# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
21# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
22# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
23# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
version:
I tried to install sparseml using pip install sparseml, and it installs torch with version 1.8.1+cu102 (which I found strange since the docs say sparseml requires <=1.8.0). I also tried to downgrade torch to 1.8.0, but the same error still happens.
The error appears on both CPU and GPU.
Describe the bug
Run the instruction
sparseml.transformers.token_classification \
  --output_dir models/teacher \
  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none \
  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-token_classification \
  --recipe_args '{"init_lr":0.00003}' \
  --dataset_name conll2003 --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 --preprocessing_num_workers 6 \
  --do_train --do_eval --evaluation_strategy epoch --fp16 \
  --save_strategy epoch --save_total_limit 1
https://neuralmagic.com/use-cases/sparse-named-entity-recognition/
It always fails with:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.neuralmagic.com', port=443): Max retries exceeded with url: /models/download/nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none/vocab.txt?release_version=0.7.0 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3747532a90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
Expected behavior
The data should download automatically.
Environment
Used the Dockerfile from https://github.com/neuralmagic/deepsparse/tree/main/docker
docker build -t deepsparse_docker .
docker run -itd --gpus all -v $(pwd):/root/deepsparse -p 5543:5543 --name deepsparse deepsparse_docker
docker exec -it deepsparse bash
>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())
{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 28835840, 'architecture': 'x86_64', 'available_cores_per_socket': 20, 'available_num_cores': 40, 'available_num_hw_threads': 80, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 20, 'isa': 'avx512', 'num_cores': 40, 'num_hw_threads': 80, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz', 'vnni': True}
To Reproduce
Exact steps to reproduce the behavior:
Errors
Additional context
1. I used this repo
https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo
and this command
!python annotate.py
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94
--source "/content/loc1min.mp4"
--quantized-inputs
--image-shape 416 416
--save-dir '/content/ops/'
--model-config '/content/coco128.yaml'
--device 'cpu'
and I'm getting low FPS on CPU (yolov5s model). Is this normal FPS, or should we get 50-60 FPS? You have mentioned that the model will be 10x faster, but it is much slower than that.
What's wrong here? My goal is to train a model on custom data with SparseML and do inference using DeepSparse.
Just a quick question. Is it possible to use deepsparse for inference directly in other languages e.g. C++, C# or similar? Or is all code written in python?
Bug Description
I am trying to sparsify and run YOLOv7 on DeepSparse. I used SparseML to sparsify the model and was able to get a sparsity of 0.75. Exporting this sparse YOLOv7 to an ONNX model and running it on OpenVINO was successful, so the model was created fine (albeit with only moderate performance improvements).
Running the same ONNX model on DeepSparse, however, results in an error while compiling the model: compile_model(onnx_filepath, batch_size)
Expected behavior
The onnx model should compile successfully and then run on Deepsparse.
Environment
To Reproduce
Exact steps to reproduce the behavior:
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "pruned_finalized_144_torch.onnx"
batch_size = 1
inputs = generate_random_inputs(onnx_filepath, batch_size)
engine = compile_model(onnx_filepath, batch_size) #<---- error
Errors
2022-08-16 13:22:33 deepsparse.utils.onnx INFO Generating input 'images', type = float32, shape = [1, 3, 640, 640]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized) (system=avx2, binary=avx2)
[nm_ort 7fe93a051340 >ERROR< supported_subgraphs /home/ubuntu/build/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:858] ==== FAILED TO COMPILE ====
Unexpected exception message: map::at
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized)
Date: 08-16-2022 @ 13:22:34 UTC
OS: Linux AZJAIVISIONGPUL05 5.11.0-1028-azure #31~20.04.2-Ubuntu SMP Tue Jan 18 08:46:15 UTC 2022
Arch: x86_64
CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [24, 2, 24]
Available cores/sockets/threads: [24, 2, 24]
L1 cache size data/instruction: 32k/32k
L2 cache size: 0.25Mb
L3 cache size: 35Mb
Total memory: 440.897G
Free memory: 339.802G
Assertion at /home/ubuntu/build/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:76
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
1# 0x00007FE85F146492 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
2# 0x00007FE85F147F2C in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
3# 0x00007FE85F410261 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
4# 0x00007FE85FAA40B8 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
5# 0x00007FE85FAA6ACC in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
6# 0x00007FE85FAA9D99 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
7# 0x00007FE85F3F094B in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
8# 0x00007FE85F3F8BCE in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
9# 0x00007FE85F39F73D in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
10# 0x00007FE85F39F9D5 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
11# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libdeepsparse.so
12# 0x00007FE8E9282309 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
13# 0x00007FE8E92826DE in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
14# 0x00007FE8E92C7D0D in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
15# 0x00007FE8E9298A74 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
16# _PyMethodDef_RawFastCallDict in /data3/anaconda3/envs/export_yolov7/bin/python
17# _PyObject_FastCallDict in /data3/anaconda3/envs/export_yolov7/bin/python
18# 0x000055E951FD01C3 in /data3/anaconda3/envs/export_yolov7/bin/python
19# PyObject_Call in /data3/anaconda3/envs/export_yolov7/bin/python
20# 0x000055E951F5EF94 in /data3/anaconda3/envs/export_yolov7/bin/python
21# 0x000055E951FDB847 in /data3/anaconda3/envs/export_yolov7/bin/python
22# 0x00007FE86CCFD907 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-37m-x86_64-linux-gnu.so
23# _PyObject_FastCallKeywords in /data3/anaconda3/envs/export_yolov7/bin/python
Please email a copy of this stack trace and any additional information to: [email protected]
Aborted
Additional context
Sparse ONNX model on which the error appeared: link
Sparse pytorch model form which the ONNX model was created: link
Hello team. I was trying to use your yolov5 benchmarking script today to compare the FPS achieved by the default yolov5 model in PyTorch and your optimized version running in the DeepSparse engine. Up until now, I was using your script to measure the performance of pruned and quantized models (as you may know, since I was posting about it on your Slack), but I was using my own inference pipeline to measure the performance of the default yolov5 model. However, I'm increasingly coming to the conclusion that this comparison is unfair.
As a result, I decided to use your script, hoping that it would allow me to eliminate the input-related variable. I tried using your script in this way:
python benchmark.py \
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none \
--engine torch \
--batch-size 1 \
--num-iterations 500 \
--num-warmup-iterations 100
Unfortunately, I was unsuccessful and the execution ended with an exception:
Loading torch model for zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
Traceback (most recent call last):
File "benchmark.py", line 518, in <module>
main()
File "benchmark.py", line 514, in main
benchmark_yolo(args)
File "benchmark.py", line 448, in benchmark_yolo
model, has_postprocessing = _load_model(args)
File "benchmark.py", line 381, in _load_model
model = torch.load(args.model_filepath)
File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 658, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none'
Is that by design? Wouldn't enabling such a benchmark make sense? Or am I just doing something wrong?
Hi there,
I have been experimenting with the DeepSparse engine, and this is my second issue. I initially thought that the DeepSparse engine is a general engine designed to exploit the sparsity in a model to achieve faster inference speed. However, I recently discovered that regardless of the sparsity of the model, the model's architecture seems to play a bigger role in the final inference speed achieved by the DeepSparse engine. For example, when I compared the ResNet models' inference speed with and without the DeepSparse engine (all models having zero sparsity), the inference speed using the DeepSparse engine is much faster despite the zero sparsity. The same holds for the EfficientNet and MobileNet models. But this behavior is not observed in networks like ResNeXt, SE-ResNeXt, ViT, etc. I have a feeling that when using the DeepSparse engine, pruning/high model sparsity plays a secondary role, and the main reason for the speed-up is the model's architecture.
May I know what is the view of the NeuralMagic team about the observation/ opinion above regarding the DeepSparse engine?
Thank you.
From community slack https://discuss-neuralmagic.slack.com/archives/C020FPF3MQX/p1657890578280219:
mt
8:09 AM
Hello!
I was using deepsparse on a checkpoint of a yolov5l model generated by --one-shot on a c5.12xlarge and got the following error for batch size >=8
2022-07-14 20:06:54 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)
Date: 07-14-2022 @ 20:06:59 UTC
OS: Linux data-workstation 5.4.0-1072-aws #77~18.04.1-Ubuntu SMP Thu Apr 7 21:38:47 UTC 2022
Arch: x86_64
CPU: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Vendor: GenuineIntel
Cores/sockets/threads: [24, 1, 48]
Available cores/sockets/threads: [24, 1, 48]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 35.75Mb
Total memory: 92.2119G
Free memory: 90.8325G
Assertion at src/lib/engine/execution/pyramidal/exec_graph_utils.cpp:240
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
1# wand::detail::assert_fail(char const*, char const*, int) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
2# 0x00007FEC9F562EBA in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
3# 0x00007FEC9F565E3A in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
4# 0x00007FEC9F550117 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
5# 0x00007FEC9F4C6C01 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
6# 0x00007FEC9F4C8502 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
7# 0x00007FEC9F4C8563 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
8# 0x00007FECA0C4A040 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
9# 0x00007FEDCFE776DB in /lib/x86_64-linux-gnu/libpthread.so.0
10# clone in /lib/x86_64-linux-gnu/libc.so.6
Please email a copy of this stack trace and any additional information to: [email protected]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)
deepsparse_testing.sh: line 2: 30994 Aborted deepsparse.benchmark -b $i checkpoints/logo_l_pruned_quant.onnx
I tried it with the nightly version as well, but it did not work. The process was just killed.
Describe the bug
Hi, I have been experimenting with pruning with SparseML and inference with DeepSparse.
There are two bugs/ questions that I would like to ask here:
I have found that for my own pruned models, they run slower on DeepSparse with batch size 1 than the unpruned version.
In fact, the pruned models' speed exceeds the unpruned version when the batch size is >=16.
For models downloaded from the SparseZoo, the pruned model is always faster than the unpruned version even at batch size==1.
Is there any known explanation for this?
For both SparseZoo pruned models and my own pruned models, when doing inference on DeepSparse, the speed is higher when using a batch size of 2^n, starting from 16.
If I change the batch size to 15 or 17 for example, the pruned models' speed decreases abruptly compared to the batch size 16 inference time.
This is not observed for unpruned models. The speed is relatively uniform across different batch sizes.
Is this an expected behavior of the DeepSparse engine?
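A minimal sketch of the kind of measurement loop behind the observations above, assuming a local pruned ONNX file; the file path and iteration counts are illustrative assumptions, not details from the report:
import time
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "pruned_model.onnx"  # hypothetical path to the pruned model
iterations = 50
for batch_size in [1, 15, 16, 17, 32]:
    engine = compile_model(onnx_filepath, batch_size)
    inputs = generate_random_inputs(onnx_filepath, batch_size)
    for _ in range(5):  # warm-up runs before timing
        engine.run(inputs)
    start = time.time()
    for _ in range(iterations):
        engine.run(inputs)
    elapsed = time.time() - start
    print(f"batch {batch_size}: {batch_size * iterations / elapsed:.1f} items/sec")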
Expected behavior
Environment
Include all relevant environment information:
DeepSparse version or commit hash: 0.11.1
To Reproduce
Run the notebook with the corresponding one-shot pruning recipe inside the zip file.
oneshot_pruning.zip
(I show an example of one-shot pruning because it is faster to reproduce, but the same issue can be reproduced with training-aware pruning.)
Bug description
I tried training-aware pruning on a custom transformer model, reaching the desired accuracy and sparsity (65% total). I then exported the model via ModuleExporter. When I ran the model via the DeepSparse Engine, I got a slightly higher latency compared to when I ran the same exported model via the ONNX Runtime.
Expected behavior
The inference latency of the DeepSparse Engine should be much lower than the inference latency obtained from running the model via the ONNX runtime.
Environment
Include all relevant environment information:
{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576,
'L3_cache_size': 25952256, 'architecture': 'x86_64', 'available_cores_per_socket': 4,
'available_num_cores': 4, 'available_num_hw_threads': 8, 'available_num_numa': 1,
'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2,
'cores_per_socket': 4, 'isa': 'avx512', 'num_cores': 4, 'num_hw_threads': 8, 'num_numa': 1,
'num_sockets': 1, 'threads_per_core': 2, 'vendor': 'GenuineIntel',
'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU @ 3.10GHz', 'vnni': True}
To Reproduce
One can skip the training part, randomly zero out some of the weights of a trained transformer model in PyTorch, and try executing the ONNX-converted model via the engine and also via the ONNX Runtime.
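A hedged sketch of that reproduction, using a Hugging Face BERT base model as a stand-in for the custom transformer and an illustrative 65% unstructured magnitude pruning; the model choice, sequence length, and iteration count are assumptions, not details from the original report:
import time
import numpy as np
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # stand-in for the custom transformer
model.eval()

# Unstructured "pruning": zero out roughly 65% of each linear layer's weights by magnitude
with torch.no_grad():
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight
            threshold = w.abs().flatten().kthvalue(int(0.65 * w.numel())).values
            w.mul_((w.abs() > threshold).float())

# Export to ONNX with a fixed input shape
batch_size, seq_len = 1, 128
dummy = torch.ones(batch_size, seq_len, dtype=torch.long)
torch.onnx.export(model, (dummy,), "sparse_transformer.onnx",
                  input_names=["input_ids"], opset_version=13)
inputs = [np.ones((batch_size, seq_len), dtype=np.int64)]

# Time the DeepSparse engine
from deepsparse import compile_model
engine = compile_model("sparse_transformer.onnx", batch_size)
start = time.time()
for _ in range(50):
    engine.run(inputs)
print("deepsparse avg latency:", (time.time() - start) / 50)

# Time ONNX Runtime on the same exported model
import onnxruntime as ort
session = ort.InferenceSession("sparse_transformer.onnx")
start = time.time()
for _ in range(50):
    session.run(None, {"input_ids": inputs[0]})
print("onnxruntime avg latency:", (time.time() - start) / 50)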
I installed deepsparse using Pip. When I try to import it, Python immediately crashes:
(venv) vvolhejn@eu-login-21 ~> python3
Python 3.8.5 (default, Sep 27 2021, 10:10:37)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import deepsparse
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.1 (18c5ee67) (release) (optimized)
Date: 05-24-2022 @ 15:40:28 CEST
OS: Linux eu-login-21 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 47.349G
Free memory: 8.72411G
Assertion at src/lib/core/cpu.cpp:273
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
1# wand::detail::assert_fail(char const*, char const*, int) in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
2# 0x00002B061C304E9B in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
3# 0x00002B061C30513C in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
4# 0x00002B06015189C3 in /lib64/ld-linux-x86-64.so.2
5# 0x00002B060151D59E in /lib64/ld-linux-x86-64.so.2
6# 0x00002B06015187D4 in /lib64/ld-linux-x86-64.so.2
7# 0x00002B060151CB8B in /lib64/ld-linux-x86-64.so.2
8# 0x00002B06020F8FAB in /lib64/libdl.so.2
9# 0x00002B06015187D4 in /lib64/ld-linux-x86-64.so.2
10# 0x00002B06020F95AD in /lib64/libdl.so.2
11# dlopen in /lib64/libdl.so.2
12# _PyImport_FindSharedFuncptr in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
13# _PyImport_LoadDynamicModuleWithSpec in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
14# 0x00002B06018D0449 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
15# 0x00002B060180CE03 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
16# PyVectorcall_Call in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
17# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
18# _PyEval_EvalCodeWithName in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
19# _PyFunction_Vectorcall in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
20# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
21# 0x00002B0601798209 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
22# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
23# 0x00002B0601798209 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
Please email a copy of this stack trace and any additional information to: [email protected]
fish: Job 1, 'python3' terminated by signal SIGABRT (Abort)
Environment
Include all relevant environment information:
DeepSparse version or commit hash: 0.12.1 (18c5ee67)
{
"vendor" : "GenuineIntel",
"isa" : "avx2",
"vnni" : false,
"num_sockets" : 1,
"available_sockets" : 0,
"cores_per_socket" : 0,
"available_cores_per_socket" : 0,
"threads_per_core" : 0,
"available_threads_per_core" : 0,
"L1_instruction_cache_size" : 32768,
"L1_data_cache_size" : 32768,
"L2_cache_size" : 262144,
"L3_cache_size" : 6291456
}
I am trying to compile an LSTM-based ONNX model, but the kernel dies. It works with a CNN-based ONNX model. Also, is it possible to have a model with dynamic batch size?
Ubuntu - 18.04
Python - 3.8
ONNX - 1.9.0
deepsparse - 0.12.1
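On the dynamic batch size question: compile_model takes an explicit batch_size, so a common workaround is to pin any symbolic batch dimension in the ONNX graph to a static value before compiling. Below is a hedged sketch using the onnx package; the file name and batch size are illustrative assumptions, and this is not an official DeepSparse API:
import onnx
from deepsparse import compile_model

onnx_filepath = "lstm_model.onnx"  # hypothetical path
static_batch = 1

model = onnx.load(onnx_filepath)
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = tensor.type.tensor_type.shape.dim
    if not dims:
        continue
    if dims[0].dim_param or dims[0].dim_value == 0:  # symbolic or unset batch dimension
        dims[0].ClearField("dim_param")
        dims[0].dim_value = static_batch

static_path = "lstm_model_static.onnx"
onnx.save(model, static_path)
engine = compile_model(static_path, batch_size=static_batch)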
Hello,
I'm BM. A very nice and polite guy.
Please make an ARM version for Raspberry Pi 4
The bug
Unable to install deepsparse-transformers dependencies
Environment
To Reproduce
Steps to reproduce the behavior:
1.pip install deepsparse
2.pip install https://github.com/neuralmagic/transformers/releases/download/nightly/transformers-4.18.0.dev0-py3-none-any.whl
Errors
I wanted to try deepsparse.transformers for a summarization task, but as output I got an error, which led me to the issue I am writing to you about.
I tried to find transformers-4.18.0.dev0-py3-none-any.whl by following the path, but I cannot find a releases folder in the transformers repo:
https://github.com/neuralmagic/transformers
Dear @jeanniefinks ,
Firstly, thanks for sharing your work.
We are trying to apply SparseML to NanoDet-Plus-m, which is considered the most suitable model for edge devices so far.
Here are the steps I have been trying:
1. Take the PyTorch (.pth) model, then convert it to an ONNX model. I even tried sparseml.onnx_export and was able to convert to model.onnx, but it still failed in the next step.
2. Compile the .onnx model with DeepSparse. It is similar to issue #218.
I already tried on varying environments:
OS: Ubuntu16.4/18.4
CPU: avx avx2, grep -o 'avx[^ ]*' /proc/cpuinfo
Varying deepsparse, onnx/onnxruntine versions
torch: 1.8.2+cpu
Code to produce error:
>>> from deepsparse import compile_model
>>> from deepsparse.utils import generate_random_inputs
>>> batch_size = 1
>>> onnx_filepath = "checkpoints/nanodet-plus-m_320.onnx"
>>> inputs = generate_random_inputs(onnx_filepath, batch_size)
[INFO onnx.py:176 ] Generating input 'data', type = float32, shape = [1, 3, 320, 320]
>>> engine = compile_model(onnx_filepath, batch_size)
Error:
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx2, binary=avx2)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized)
Date: 03-08-2022 @ 08:50:06 UTC
OS: Linux 7a5617e3c49d 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
Vendor: GenuineIntel
Cores/sockets/threads: [8, 2, 16]
Available cores/sockets/threads: [8, 2, 16]
L1 cache size data/instruction: 32k/32k
L2 cache size: 0.25Mb
L3 cache size: 10Mb
Total memory: 127.793G
Free memory: 10.5767G
Assertion at ./src/include/wand/utility/pyramidal/task_graph_utils.hpp:133
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
1# 0x00007F2DCE3D0D08 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
2# 0x00007F2DCE3D5487 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
3# 0x00007F2DCE3D9B76 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
4# 0x00007F2DCE311F6F in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
5# 0x00007F2DCE3140A5 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
6# 0x00007F2DCE315444 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
7# 0x00007F2DCE315819 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
8# 0x00007F2DCE2C6E1B in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
9# 0x00007F2DCE228704 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
10# 0x00007F2DCE228A32 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
11# 0x00007F2DCE228B78 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
12# 0x00007F2DCE228D5D in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
13# 0x00007F2DCE228FA8 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
14# 0x00007F2DCE229010 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
15# 0x00007F2DCD82BD47 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
16# 0x00007F2DCD8320CF in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
17# 0x00007F2DCD7AA52B in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
18# 0x00007F2DCD79A109 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
19# 0x00007F2DCD79B2C1 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
20# 0x00007F2DCDE266B8 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
21# 0x00007F2DCDE290CC in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
22# 0x00007F2DCDE2C399 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
23# 0x00007F2DCD77B9AB in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
Please email a copy of this stack trace and any additional information to: [email protected]
Aborted
It seems that you have your own onnxruntime?
Could you examine the NanoDet-Plus-m model? I really appreciate your time.
Hi!
How is it going?
First of all, thanks for your good repo and for helping to make models better and faster.
I used your YOLO example to get better speed, and I compared the base, pruned, and quantized models as you said, but all results were approximately the same.
There is no VNNI warning, and my server is Ubuntu 18.
My code is:
import os
yolov5s_base = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none"
yolov5s_pruned ="zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96"
yolov5s_pruned_quant = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"
source_img = "img.bmp"
print("\n base inference:\n")
bash_cmd = f"python annotate.py {yolov5s_base} --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)
print("\n pruned inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned } --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)
print("\n pruned_quant inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned_quant} --source {source_img} --quantized-inputs --image-shape 640 640 "
os.system(bash_cmd)
When I run this code in a bash script, I get these results:
base inference:
2022-03-08 20:28:15 main INFO Results will be saved to annotation_results/deepsparse-annotations-8
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none downloaded to /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
2022-03-08 20:28:17 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:18 main INFO Inference 0 processed in 128.20696830749512 ms
2022-03-08 20:28:18 main INFO Results saved to annotation_results/deepsparse-annotations-8
pruned inference:
2022-03-08 20:28:19 main INFO Results will be saved to annotation_results/deepsparse-annotations-9
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
2022-03-08 20:28:21 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:23 main INFO Inference 0 processed in 124.91464614868164 ms
2022-03-08 20:28:23 main INFO Results saved to annotation_results/deepsparse-annotations-9
pruned_quant inference:
2022-03-08 20:28:24 main INFO Results will be saved to annotation_results/deepsparse-annotations-10
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
2022-03-08 20:28:26 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:28 main INFO Inference 0 processed in 114.76516723632812 ms
2022-03-08 20:28:28 main INFO Results saved to annotation_results/deepsparse-annotations-10
As you see, pruned_quant is no faster!
Please guide me to get faster results.
Thanks!
Describe the bug
Download the yolo example to try.
https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo
But after running pip3 install -r .\requirements.txt, it throws an error:
ERROR: Command errored out with exit status 1:
command: 'c:\users\user\appdata\local\programs\python\python39\python.exe' 'c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py' get_requires_for_build_wheel 'C:\Users\user\AppData\Local\Temp\tmp7thtj5al'
cwd: C:\Users\user\AppData\Local\Temp\pip-install-dwp60hed\deepsparse_e6f915f0eb184b6b86733212e02460ac
Complete output (18 lines):
Traceback (most recent call last):
File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 280, in <module>
main()
File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line
177, in get_requires_for_build_wheel
return self._get_build_requires(
File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line
159, in _get_build_requires
self.run_setup()
File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line
281, in run_setup
super(_BuildMetaLegacyBackend,
File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line
174, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 24, in <module>
from utils.artifacts import (
ModuleNotFoundError: No module named 'utils'
Environment
Include all relevant environment information:
I want to convert a BERT classification model with DeepSparse. I'm unable to find any appropriate examples for this.
I have an ONNX model converted from a model fine-tuned with the Hugging Face API.
Any help in this regard is highly appreciated.
Thanks,
Subhasis
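A hedged sketch (not an official example) of running a Hugging Face fine-tuned BERT classifier, already exported to ONNX, directly on the DeepSparse engine: tokenize with the matching tokenizer and feed the arrays to the compiled engine. The file name, tokenizer, max length, and input ordering are assumptions and must match how the model was exported:
import numpy as np
from deepsparse import compile_model
from transformers import AutoTokenizer

onnx_filepath = "bert_classifier.onnx"  # hypothetical path to the exported model
batch_size = 1
max_length = 128  # must match the fixed sequence length used at export time

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
engine = compile_model(onnx_filepath, batch_size)

encoded = tokenizer(
    "DeepSparse makes sparse inference fast.",
    padding="max_length",
    max_length=max_length,
    truncation=True,
    return_tensors="np",
)
# Input order and dtypes must match the ONNX graph
# (commonly int64 input_ids, attention_mask, token_type_ids)
inputs = [
    encoded["input_ids"].astype(np.int64),
    encoded["attention_mask"].astype(np.int64),
    encoded["token_type_ids"].astype(np.int64),
]
logits = engine.run(inputs)[0]
print("predicted class:", int(np.argmax(logits, axis=-1)[0]))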
Describe the bug
For testing purposes, I want to try if my code works on Windows Subsystem for Linux (WSL2). I'm using Ubuntu 18.04LTS.
Once on Ubuntu on WSL, I create a new Python virtual env, then pip install deepsparse.
After that, while trying to import deepsparse I get:
>>> import deepsparse
arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
Traceback (most recent call last):
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
info_str = subprocess.check_output(file_path).decode("utf-8")
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
from .engine import *
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
from deepsparse.lib import init_deepsparse_lib
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
arch = cpu_architecture()
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
arch = _parse_arch_bin()
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
self.memo[args] = self.f(*args)
File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
raise OSError(
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
Expected behavior
Maybe it should work on WSL :)
Environment
Include all relevant environment information:
DeepSparse version or commit hash: 0.8.0
This is basically what's not working.
To Reproduce
Exact steps to reproduce the behavior:
import deepsparse
Hi, has anyone compared DeepSparse with OpenVINO?
Describe the bug
Unable to install deepsparse engine.
Expected behavior
Should have installed from the pip install deepsparse command.
Environment
Include all relevant environment information:
To Reproduce
pip install deepsparse
Errors
Please see the attached error log.
When I run the script below, this error occurs:
deepsparse.server \
--model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate" \
--task image_classification
ValueError: unsupported task given of image_classification for serve model config task='image_classification' model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate' batch_size=1 alias=None kwargs={} engine='deepsparse' num_cores=None scheduler='async'
The public https://github.com/neuralmagic/deepsparse/tree/main/examples file has the following link to YOLOv3 examples.
https://github.com/neuralmagic/deepsparse/blob/main/examples/ultralytics-yolov3/
This link leads to a GitHub 404 page, indicating that the target directory does not exist.
I expected the link to bring me to a valid page. Perhaps it should be https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo ?
Describe the bug
I installed DeepSparse on a different machine than the one causing trouble in my last issue, but I'm running into another problem: DeepSparse is only using 50% of the CPU, namely only one of the two cores (vCPUs, rather).
This happens no matter the setting of num_cores in the deepsparse.compile_model() call. I've tried everything between 0 and 4.
Expected behavior
The CPU usage should be close to 100%. This is indeed what happens when running using other frameworks such as ONNX Runtime.
Environment
Include all relevant environment information:
DeepSparse version or commit hash: 0.12.2
{'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}
To Reproduce
Run the following script:
import psutil
import deepsparse
import tf2onnx
import tensorflow as tf
import numpy as np
import onnx
print(deepsparse.cpu_architecture())
hidden_size = 512
n_layers = 5
orig_model = tf.keras.Sequential(
[tf.keras.layers.Input(shape=(hidden_size,))]
+ [
tf.keras.layers.Dense(hidden_size, activation=tf.nn.relu)
for _ in range(n_layers)
]
)
input_signature = [
tf.TensorSpec([1] + orig_model.input.shape[1:], dtype=np.float32, name="input")
]
onnx_model, _ = tf2onnx.convert.from_keras(orig_model, input_signature, opset=13)
onnx_filepath = "/tmp/debug.onnx"
onnx.save(onnx_model, onnx_filepath)
batch_size = 32
engine = deepsparse.compile_model(
onnx_filepath,
batch_size=batch_size,
num_cores=3,
# util.get_n_cpus_available()
)
data = np.random.randn(batch_size, *orig_model.input.shape[1:]).astype(np.float32)
psutil.cpu_percent() # Run once to initialize
for i in range(500):
engine.run([data])
print("Average CPU usage:", psutil.cpu_percent())
This is the output:
2022-06-20 14:26:39.778197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64:/opt/intel/openvino_2022.1.0.643/tools/compile_tool:/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022.1.0.643/runtime/lib/intel64
2022-06-20 14:26:39.778240: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
{'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}
2022-06-20 14:26:41.876161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64:/opt/intel/openvino_2022.1.0.643/tools/compile_tool:/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022.1.0.643/runtime/lib/intel64
2022-06-20 14:26:41.876214: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-06-20 14:26:41.876248: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (n1-west1-1): /proc/driver/nvidia/version does not exist
2022-06-20 14:26:41.876626: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-20 14:26:42.060953: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2022-06-20 14:26:42.061236: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-20 14:26:42.063087: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
function_optimizer: function_optimizer did nothing. time = 0.016ms.
function_optimizer: function_optimizer did nothing. time = 0.001ms.
WARNING:tensorflow:From /home/vaclav/venv3.8/lib/python3.8/site-packages/tf2onnx/tf_loader.py:711: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-06-20 14:26:42.173088: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2022-06-20 14:26:42.173281: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-20 14:26:42.202669: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
constant_folding: Graph size after: 28 nodes (-10), 37 edges (-10), time = 14.961ms.
function_optimizer: function_optimizer did nothing. time = 0.004ms.
constant_folding: Graph size after: 28 nodes (0), 37 edges (0), time = 4.993ms.
function_optimizer: function_optimizer did nothing. time = 0.002ms.
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized) (system=avx512, binary=avx512)
Average CPU usage: 49.6
Inspecting htop while the script is running also tells me the CPU usage is around 50%.
Hi, I would like to try DeepSparse with yolov5 on Windows 10.
Can the existing code run on Windows 10?
I tried, but so far no success.
Describe the bug
When running annotate.py in https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo, the text annotation on the inference window is too large.
The text annotation (images_per_sec) on the inference window is so large that some of the text is not visible because it exceeds the window size.
Screenshot attached 👇
Expected behavior
All texts should be visible within the window.
Additional context
Related-To: AICoE/elyra-aidevsecops-tutorial#297
Describe the bug
Expected behavior
Environment
Include all relevant environment information:
To Reproduce
Exact steps to reproduce the behavior:
Created a conda environment
Installed onnx, then pip installed deepsparse
Errors
error: Native Mac is currently unsupported for the DeepSparse Engine. Please run on a Linux system or within a Linux container on Mac. More info can be found in our docs here: https://docs.neuralmagic.com/deepsparse/source/hardware.html
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for deepsparse
Failed to build deepsparse
ERROR: Could not build wheels for deepsparse, which is required to install pyproject.toml-based projects
Describe the bug
Hello,
I am trying to compile the ONNX-converted model of a sparse Hugging Face base Wav2Vec2 model (where sparsity was obtained via unstructured magnitude pruning) through compile_model:
dse_network = compile_model(onnx_filepath, batch_size=batch_size, num_cores=1, num_streams=1)
My kernel crashed and I received the following message:
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
1# 0x00007FFB125A27C4 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
2# 0x00007FFB125A8906 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
3# 0x00007FFB125A89F2 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
4# 0x00007FFB125B12FA in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
5# 0x00007FFB125B1370 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
6# 0x00007FFB11B1F76D in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
7# 0x00007FFB11B25BCF in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
8# 0x00007FFB11A92015 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
9# 0x00007FFB11A81939 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
10# 0x00007FFB11A82AF1 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
11# 0x00007FFB1213F938 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
12# 0x00007FFB121423B3 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
13# 0x00007FFB121456B9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
14# 0x00007FFB11A6312B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
15# 0x00007FFB11A6B3CE in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
16# 0x00007FFB11A11C1A in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
17# 0x00007FFB11A11ED5 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007FFBE3641649 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007FFBE364184B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007FFBE36788B6 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007FFBE364B0F9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
23# 0x0000561F0FD79B66 in /opt/conda/bin/python
Please email a copy of this stack trace and any additional information to: [email protected]
Environment
Include all relevant environment information:
DeepSparse version or commit hash: 1.0.2
>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())
{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 31719424, 'architecture': 'x86_64', 'available_cores_per_socket': 19, 'available_num_cores': 38, 'available_num_hw_threads': 76, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 19, 'isa': 'avx512', 'num_cores': 38, 'num_hw_threads': 76, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz', 'vnni': False}
Do you have any suggestions for a solution?
Thank you
Describe the bug
Can't run quickstart https://github.com/neuralmagic/deepsparse#quickstart-with-sparsezoo-onnx-models on DigitalOcean droplet with "Premium Intel" CPU
Expected behavior
No error message
Environment
To Reproduce
Exact steps to reproduce the behavior:
apt update
apt-get -y install python3-venv build-essential cmake
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install deepsparse
Run "ResNet-50 Dense" example linked above
Errors
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/root/venv/lib/python3.8/site-packages/deepsparse/arch.bin' returned non-zero exit status 1.
Additional context
The binary doesn't output anything:
(venv) root@ubuntu-s-1vcpu-2gb-intel-lon1-01:~# /root/venv/lib/python3.8/site-packages/deepsparse/arch.bin
(venv) root@ubuntu-s-1vcpu-2gb-intel-lon1-01:~#
Hi.
I would be interested to know if this is possible.
I'm trying to export the YOLOv5s Pruned Quantized model to ONNX. The ONNX version of this model from the SparseZoo doesn't work with OpenCV DNN, so I tried downloading the .pt version and exporting to ONNX with the dynamic option.
Here is the command:
sparseml.yolov5.export_onnx --weights model.pt --dynamic --simplify --include onnx
And I get this error:
File "/opt/conda/bin/sparseml.yolov5.export_onnx", line 8, in <module>
sys.exit(export())
File "/opt/conda/lib/python3.8/site-packages/sparseml/yolov5/scripts.py", line 60, in export
export_run(**vars(opt))
File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 712, in export_run
main(opt)
File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 706, in main
run(**vars(opt))
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 595, in run
model, extras = load_checkpoint(type_='ensemble', weights=weights, device=device) # load FP32 model
File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 531, in load_checkpoint
state_dict = load_state_dict(model, state_dict, run_mode=not ensemble_type, exclude_anchors=exclude_anchors)
File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 555, in load_state_dict
model.load_state_dict(state_dict, strict=not run_mode) # load
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "model.24.anchor_grid".
Hi,
I'm trying to convert an onnx model to a deepsparse model, here is the code:
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "fom.onnx"
batch_size = 1
# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)
# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
**Environment**
Include all relevant environment information:
1. Ubuntu 18.04:
2. Python version 3.7.9. :
3. DeepSparse version 0.8.0 :
4. torch 1.9.0+cu102:
5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
6. CPU {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': True, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 18, 'available_cores_per_socket': 18, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256}
**Errors**
[ INFO onnx.py: 128 - generate_random_inputs() ] -- generating random input #0 of shape = [1, 3, 256, 256]
[ INFO onnx.py: 128 - generate_random_inputs() ] -- generating random input #1 of shape = [1, 10, 2]
[ INFO onnx.py: 128 - generate_random_inputs() ] -- generating random input #2 of shape = [1, 10, 2]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized)
Date: 12-05-2021 @ 12:58:29 EST
OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [36, 2, 72]
Available cores/sockets/threads: [36, 2, 72]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 24.75Mb
Total memory: 507.367G
Free memory: 22.4387G
Assertion at ./src/include/wand/engine/compute/planner.hpp:131
Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
1# 0x00007F36EB17C234 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
2# 0x00007F36EB185889 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
3# 0x00007F36EB185982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
4# 0x00007F36EB18AA8A in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
5# 0x00007F36EB18AB00 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
6# 0x00007F36EA7E985D in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
7# 0x00007F36EA7EE443 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
8# 0x00007F36EA76BD6B in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
9# 0x00007F36EA75AB3F in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
10# 0x00007F36EA75C1C1 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
11# 0x00007F36EADA9668 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
12# 0x00007F36EADAC0A2 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
13# 0x00007F36EADAF3B9 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
14# 0x00007F36EA73B76C in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
15# 0x00007F36EA7414C3 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
16# 0x00007F36EA6FB982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
17# 0x00007F36EA6FBC05 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, wand::safe_type<wand::parallel::use_current_affinity_tag, bool>, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data/lib/python3.7/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007F3771031D1B in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007F3771031F39 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007F377105D5C5 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007F377104B250 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
23# _PyMethodDef_RawFastCallDict in python
Please email a copy of this stack trace and any additional information to: [email protected]
Aborted
Do you have any ideas why the code is failing?
Describe the bug
I was trying to run the demo code with the YOLOv5 pruned_quant-aggressive_94 model on a g4dn.2xlarge instance and encountered this exception.
Stack trace
| 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
| 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
| 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
| 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
| 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
| 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
| 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
| 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
| 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
| 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
| 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
| 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
| 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
| 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
| 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
| 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
| 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
| 2021-12-16T15:36:13.559+01:00 | Backtrace:
| 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
| 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
| 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: [email protected]
Environment
torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl
sparseml==0.9.0
sparsezoo==0.9.0
numpy==1.21.4
onnx==1.9.0
onnxruntime==1.7.0
Is there any chance you could help me debug this issue?
Web browsers are generally very resource-constrained environments, which makes them a perfect place to use an engine like DeepSparse. WebAssembly is slowly becoming more mature, with features like threads and SIMD already available in major browsers and more on the way: https://webassembly.org/roadmap/
The ONNX Runtime has recently been ported to WebAssembly (https://github.com/microsoft/onnxruntime/tree/master/js/web), and I've used it to create demos of popular ML models running completely in the browser, including OpenAI's CLIP: https://github.com/josephrocca/clip-image-sorter
The problem is always that inference is quite slow, and in my experience WebGL tends to crash easily (WebGPU may help fix this when it is released), or the device just doesn't have enough GPU memory for the model, which forces me to use CPU-based backends.
So I'm wondering if the team has considered porting the DeepSparse engine to wasm via Emscripten?
Hi, I'm trying to use DeepSparse to run 1D CNNs for audio processing. I was benchmarking the performance depending on whether or not pruning (with 0.9 sparsity) and/or quantization is used, i.e. four different models. I'm using an AVX512 CPU (no VNNI).
I would expect the ordering of performance (higher is better) to be
If I understand correctly, quantized+pruned might not improve performance because "sparse quantization" only works with VNNI as stated here.
However, the inference times I measured for the models are:
deepsparse.benchmark path-to-model.onnx -nstreams 1 -s sync -b 1 confirmed the times as well. Here are the .onnx files of the models I was using -- I hope the naming scheme is clear.
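For completeness, here is a minimal sketch of how the per-model latencies could be cross-checked in Python using compile_model and wall-clock timing (the file names below are placeholders for the four .onnx variants):

import time

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

# Placeholder file names for the four variants (dense, pruned, quantized, pruned+quantized)
model_paths = ["dense.onnx", "pruned.onnx", "quant.onnx", "pruned_quant.onnx"]
batch_size = 1

for path in model_paths:
    inputs = generate_random_inputs(path, batch_size)
    engine = compile_model(path, batch_size)
    # Warm up, then average the latency over a fixed number of runs
    for _ in range(10):
        engine.run(inputs)
    start = time.perf_counter()
    for _ in range(100):
        engine.run(inputs)
    print(path, (time.perf_counter() - start) / 100, "seconds per run")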
I pruned/quantized the models using SparseML, where I built the recipes using these templates:
base_recipe_template = """
version: 0.1.0
modifiers:
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 2

    - !LearningRateModifier
        start_epoch: 0
        end_epoch: 2
        init_lr: 0.005
        lr_class: ExponentialLR
        lr_kwargs:
            gamma: 0.9
"""

pruning_template = """
    - !GMPruningModifier
        start_epoch: 0
        end_epoch: 1
        update_frequency: 1.0
        init_sparsity: 0.05
        final_sparsity: 0.9
        mask_type: block4
        params: {layers_to_prune}
"""

quantization_template = """
    - !QuantizationModifier
        start_epoch: 0.0
"""
where layers_to_prune = sparseml.pytorch.utils.get_prunable_layers(model). I only add pruning_template for the pruned models and quantization_template for the quantized models. I initialize the model randomly and train on random data (torch.randn_like(...)).
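To make the setup above concrete, here is a minimal sketch of how the templates can be combined and parsed with SparseML's ScheduledModifierManager (the parameter names in layers_to_prune are placeholders, and how the parsed recipe is then attached to the optimizer depends on the SparseML version, e.g. manager.modify(...) in recent releases):

import tempfile

from sparseml.pytorch.optim import ScheduledModifierManager

# Hypothetical prunable parameter names for the 1D CNN (placeholders)
layers_to_prune = ["conv1.weight", "conv2.weight", "conv3.weight"]

# Assemble the recipe: base schedule plus the pruning section for the pruned variants;
# append quantization_template as well for the quantized variants
recipe = base_recipe_template + pruning_template.format(layers_to_prune=layers_to_prune)

# Write the recipe to a file and parse it; the resulting manager is then used to
# wrap the optimizer for the short two-epoch training run
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as recipe_file:
    recipe_file.write(recipe)
manager = ScheduledModifierManager.from_yaml(recipe_file.name)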
Expected behavior: quantization and pruning should decrease inference time, not increase it.
Environment
Include all relevant environment information:
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.12.2
{'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}
To Reproduce: Run the .onnx files (with batch size 1) and measure latency, e.g. using deepsparse.benchmark path-to-model.onnx -nstreams 1 -s sync -b 1
Am I doing something wrong? Why is this happening?
Describe the bug
Failed to deploy deepsparse in the cluster.
Expected behavior
Environment
Include all relevant environment information:
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.7.0
>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())
To Reproduce
Image: https://quay.io/repository/thoth-station/neural-magic-deepsparse
Errors
[2021-09-29 16:26:02 +0000] [22] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/app-root/src/wsgi.py", line 61, in <module>
from src.neural_magic_model import Model as NeuralMagicModel
File "/opt/app-root/src/src/neural_magic_model.py", line 27, in <module>
from deepsparse import compile_model
File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/__init__.py", line 28, in <module>
from .engine import *
File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/engine.py", line 44, in <module>
from deepsparse.lib import init_deepsparse_lib
File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/lib.py", line 27, in <module>
CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/cpu.py", line 216, in cpu_details
arch = cpu_architecture()
File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/cpu.py", line 167, in cpu_architecture
raise OSError(
OSError: neuralmagic: cannot determine avx instruction set. Set NM_ARCH to one of avx2,avx512 to continue.
Additional context
Related-To: AICoE/elyra-aidevsecops-tutorial#297
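As the error message itself suggests, one possible workaround when CPU detection fails inside a container is to set NM_ARCH before deepsparse is first imported (a sketch; whether avx2 or avx512 is correct depends on the CPUs the pods actually land on):

import os

# Must be set before the first import of deepsparse; pick the instruction set the node supports
os.environ["NM_ARCH"] = "avx512"  # or "avx2"

from deepsparse import compile_model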
In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.
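A minimal sketch of that workaround, assuming the model is recompiled at a smaller batch size and the workload is fed in chunks (the model path and shapes are placeholders):

import numpy as np
from deepsparse import compile_model

# Compile at a smaller batch size so no single tensor in the graph exceeds the 2GB limit
engine = compile_model("model.onnx", batch_size=8)

# Placeholder workload of 64 images, processed 8 at a time and re-assembled
data = np.random.rand(64, 3, 224, 224).astype(np.float32)
outputs = [engine.run([chunk])[0] for chunk in np.split(data, 8)]
result = np.concatenate(outputs)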
In some cases, models running complicated pre- or post-processing steps could diminish the DeepSparse Engine performance by up to a factor of 10x due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:
If that does not give performance benefit or you want to try additional options:
Use the numactl utility to prevent the process from running on hyperthreads.
Manually set the thread affinity in Python as follows:
import os
from deepsparse.cpu import cpu_architecture

ARCH = cpu_architecture()

if ARCH.vendor == "GenuineIntel":
    # On Intel, the first num_physical_cores logical CPUs map to distinct physical cores
    os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
elif ARCH.vendor == "AuthenticAMD":
    # On AMD, sibling hyperthreads are adjacent, so pin to every other logical CPU
    os.sched_setaffinity(0, range(0, 2 * ARCH.num_physical_cores(), 2))
else:
    raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")
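After the affinity call, a quick check that the process is pinned only to physical cores (a sketch; os.sched_getaffinity is Linux-only):

import os

# Should list one logical CPU per physical core after the sched_setaffinity call above
print(sorted(os.sched_getaffinity(0)))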
Hi
Is deepsparse useful only for optimisations on CPU?
Can I use deepsparse for memory optimizations on GPU also? If yes can you please share some tutorial for it.
Thanks
Hello, thanks for your work. I ran into an issue when executing the example, and I'm not sure if I've misunderstood anything.
Sorry that I mislabeled it as a bug, but I don't know how to remove the label.
Describe the bug
To execute the example - ultralytics-yolo, it would have the error as following.
The AVX instruction set is unknown. Set NM_ARCH to one of avx512,avx2 to continue.
Expected behavior
It could successfully have object detection output.
Environment
Include all relevant environment information:
OS [e.g. Ubuntu 18.04]: Ubuntu 16.04
Python version [e.g. 3.7]: 3.9.10
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.11.0
ML framework version(s) [e.g. torch 1.7.1]: torch 1.9.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:
>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())
I can't execute the command above; it fails with the same error.
My CPU is an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz.
To Reproduce
Exact steps to reproduce the behavior:
python annotate.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --source /home/dev/Documents/teco/experiment/tflite1/test1.jpg --image-shape 416 416 --device cpu
# or
python annotate.py -h
Errors
Traceback (most recent call last):
File "/home/dev/Documents/teco/experiment/deepsparse/examples/ultralytics-yolo/annotate.py", line 119, in
from deepsparse import compile_model
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/init.py", line 33, in
from .engine import *
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in
from deepsparse.lib import init_deepsparse_lib
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in
CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/cpu.py", line 242, in cpu_details
arch = cpu_architecture()
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/cpu.py", line 184, in cpu_architecture
raise OSError(
OSError: Neural Magic: The AVX instruction set is unknown. Set NM_ARCH to one of avx512,avx2 to continue.