skku-eslab / acl-lowp
This project is forked from arm-software/computelibrary.
An Arm Compute Library implementation of efficient low-precision neural networks.
License: MIT License
The README mentions 'quantization overhead' in the example results section. Could you provide more information on what this entails and how it affects the overall performance of the library?
Currently, the library only supports the GEMM operation.
To build a full model-inference graph, we still have to implement the remaining operations.
Can you provide more details on how the benchmarking is done, especially the tools or methods used to measure latency?
Currently, we support a low-precision GEMM operation using modified NEON integer operations.
Other frameworks support bit packing and bit-serial operations for low-precision data.
To compare the two approaches, we would also need to provide a bit-serial implementation.
The current low-precision GEMM in this repository applies only to the NEON kernel.
Mali GPUs are said to be vector processing units, so each running thread executes SIMD-style instructions.
It should be possible to implement the low-precision GEMM in OpenCL so that Mali GPUs can run the kernel.
I would like to know the format of the SIMD MAC instructions.
Currently, I'm working with TVM to optimize inference on an Android phone.
How can I apply these kernel examples to other frameworks?
Has the library been tested or updated to work with the latest ARM architectures and devices? It would be beneficial to have information on compatibility and performance with newer hardware
ACL-lowp includes a NEON-based kernel for low-bitwidth quantized matrix multiplication.
It is now integrated into the Arm Compute Library.
There is more room to optimize the kernels for specific devices, e.g. tile size, loop unrolling factor, etc.
However, tuning kernel settings by hand is very inefficient.
Kernel compilers such as TVM provide automatic kernel-tuning functionality.
I think integrating these kernels into TVM could produce better performance.
Is there any support for GEMV in addition to GEMM?
This repository seems to support only low-precision GEMM.
How about adding low-precision elementwise support?
I think elementwise operations like matrix addition would not be that complicated to implement in a low-precision version using a similar technique.
Can you provide guidance on building the library for architectures other than armv7a? Are there specific considerations or modifications needed for different ARM architectures?
Can it only be used on Arm architectures?
Is there an active community around the ANT Framework for developers to seek help and share experiences?
The project currently supports 4-bit data for GEMM. Are there plans to include support for other data precisions, such as 8-bit or 16-bit? If so, what would be the expected timeline for this?
Can you explain the process of computing with a specific kernel file in more detail?
Hi,
I wonder how much your low-precision implementation accelerates GEMM on NEON.
Can you provide some evaluation results?
How does the use of NEON SIMD MAC instructions benefit low precision computations in terms of performance and efficiency?
What challenges might arise when using bit-packed data, and how does the library address these challenges?
Is there any detailed explanation or documentation for the test examples provided, such as neon_lowgemm.cpp?
There are problems with inference time when using lowp.
The quantization overhead is too large, and 8-bit GEMM is not much faster than floating-point GEMM.
We plan to optimize these in future work.
The documentation in some parts, like the building instructions and example usage, seems to be outdated or not detailed enough. Could the documentation be updated to reflect the current state of the project and provide clearer guidance for new users?