skku-eslab / acl-lowp
This project is forked from arm-software/computelibrary.
An Arm Compute Library implementation of efficient low-precision neural networks.
License: MIT License
The README mentions 'quantization overhead' in the example results section. Could you provide more information on what this entails and how it affects the overall performance of the library?
Currently, the library only supports the GEMM operation.
To build a full model-inference graph, we still have to implement the remaining operations.
Can you provide more details on how the benchmarking is done, especially the tools or methods used to measure latency?
Currently, we support a low-precision GEMM operation using modified NEON integer operations.
Other frameworks support bit packing and bit-serial operations for low-precision data.
To compare the two approaches, we would also need to provide a bit-serial implementation.
The current low-precision GEMM in this repository applies only to the NEON kernel.
Mali GPUs are said to be vector processing units, so each running thread executes SIMD-style instructions.
It should be possible to implement the low-precision GEMM in OpenCL so that Mali GPUs can run the kernel.
I would like to know the format of the SIMD MAC instructions.
Currently, I'm working with TVM to optimize inference on an Android phone.
How can I apply these kernel examples to other frameworks?
Has the library been tested or updated to work with the latest ARM architectures and devices? It would be beneficial to have information on compatibility and performance with newer hardware
ACL-lowp includes a NEON-based kernel for low-bitwidth quantized matrix multiplication.
It is now integrated into the Arm Compute Library.
There is more room to optimize the kernels for specific devices, e.g. tile size, loop unrolling factor, etc.
However, tuning kernel settings by hand is very inefficient.
Kernel compilers such as TVM provide automatic kernel-tuning functionality.
I think integrating these kernels into TVM could produce better performance.
Is there any support for GEMV in addition to GEMM?
This repository seems to support only low-precision GEMM.
How about adding low-precision elementwise support?
I think elementwise operations like matrix addition would not be that complicated to implement in a low-precision version using a similar technique.
Can you provide guidance on building the library for architectures other than armv7a? Are there specific considerations or modifications needed for different ARM architectures?
Can it only be used on Arm architectures?
Is there an active community around the ANT Framework for developers to seek help and share experiences?
The project currently supports 4-bit data for GEMM. Are there plans to include support for other data precisions, such as 8-bit or 16-bit? If so, what would be the expected timeline for this?
Can you explain the process of computing with a specific kernel file in more detail?
Hi,
I wonder how much your low-precision implementation accelerates GEMM on NEON.
Can you provide some evaluation results?
How does the use of NEON SIMD MAC instructions benefit low precision computations in terms of performance and efficiency?
What challenges might arise when using bit-packed data, and how does the library address these challenges?
Is there any detailed explanation or documentation for the test examples provided, such as neon_lowgemm.cpp?
There are problems with inference time when using lowp.
The quantization overhead is too large, and 8-bit GEMM is not much faster than floating-point GEMM.
We plan to optimize these in future work.
The documentation in some parts, like the building instructions and example usage, seems to be outdated or not detailed enough. Could the documentation be updated to reflect the current state of the project and provide clearer guidance for new users?