Comments (11)
@Y-T-G Here is a quick colab notebook I made using T4 GPU to one-shot sparsify/quantize the model. https://colab.research.google.com/drive/1DLB-tE1ide-55b9gzq6kQyrrW0lvT7xj?usp=sharing
It uses our Sparsify tool (in alpha right now, so leave feedback!) to optimize the ONNX with some (dummy) calibration data: https://github.com/neuralmagic/sparsify
Here is the low-sparsity ONNX: https://drive.google.com/file/d/1qMZCtikHtS4Edy0EBP9R7qX5i9eLyOvz/view?usp=sharing
Here is the high-sparsity ONNX: https://drive.google.com/file/d/1XkVYhX4SJfM0mLRuH-RIx6F0xQ9-vAYy/view?usp=drive_link
I used dummy data, so the model likely isn't accurate, but you can substitute real input data as calibration data to maintain accuracy.
If you want to talk more about this, I'm happy to jump on a call, or you can join our Slack to ask questions: https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ
from deepsparse.
Hey @Y-T-G, thanks for the bug report. We can reproduce this issue and have a fix for it. I'll ping you once it's available in deepsparse-nightly.
Hello @Y-T-G
The latest nightly has been published. Please try `pip install deepsparse-nightly` now.
- THANK YOU! 🥇
Jeannie / Neural Magic
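After installing, it can help to confirm which build is actually active in the environment. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the two distribution names checked are the nightly package from the command above and the stable `deepsparse` package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Check both the nightly and the stable distribution names.
    for name in ("deepsparse-nightly", "deepsparse"):
        print(name, "->", installed_version(name) or "not installed")
```

If both names report a version, the two packages may be shadowing each other, and uninstalling one before retesting is probably the safer route.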
Hi @Y-T-G here is a Colab notebook showing how to export the ONNX and run it on deepsparse-nightly: https://colab.research.google.com/drive/16r8fLUgAEqPWbDlQmgrmmuq8WvFXxLnQ?usp=sharing
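For readers who cannot open the notebook, running an exported ONNX file with the DeepSparse Python API looks roughly like the sketch below. It is not taken from the notebook: the input shape `(1, 3, 640, 640)`, the file name `model.onnx`, and the helper names `make_dummy_input` and `run_once` are assumptions for illustration, and the random input only exercises the engine without producing meaningful predictions.

```python
import numpy as np


def make_dummy_input(shape=(1, 3, 640, 640), seed=0):
    """Build a random float32 array in [0, 1) with the given (assumed) input shape."""
    rng = np.random.default_rng(seed)
    return rng.random(shape, dtype=np.float32)


def run_once(onnx_path, shape=(1, 3, 640, 640)):
    """Compile the ONNX file with DeepSparse and run a single dummy batch."""
    # Imported lazily so make_dummy_input works even without deepsparse installed.
    from deepsparse import compile_model

    engine = compile_model(onnx_path, batch_size=shape[0])
    outputs = engine.run([make_dummy_input(shape)])  # list of numpy arrays
    return [out.shape for out in outputs]


if __name__ == "__main__":
    # "model.onnx" is a placeholder path; substitute your own export.
    print(run_once("model.onnx"))
```

The shape passed in has to match what the exported model expects, so adjust it to your own export before running.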
A fix will be available the next time a nightly release goes out, and I'll close the issue then!
@tlrmchlsmth Cool. Thanks for the fix.
When can I expect the nightly to be available?
@jeanniefinks @mgoin Thanks. I will try it out.
I was able to test it on C++ and it works. Thanks.
Thanks for sharing, @Y-T-G, very cool project! Let me know if you'd be interested in sparsifying the model for more performance.
@mgoin Sure. That would be great. I was wondering how to improve the FPS further.