
acl-lowp's Issues

Quantization Overhead Clarification

The README mentions 'quantization overhead' in the example results section. Could you provide more information on what this entails and how it affects the overall performance of the library?

Implementation of model inference

Currently, the library only supports the GEMM operation.

To build the model inference graph, we have to implement the following (a sketch of the quantize and dequantize steps follows the list):

  1. bit packing operation
  2. quantize function
  3. dequantize function
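
For the quantize and dequantize steps, here is a minimal sketch of the standard affine 4-bit scheme, q = clamp(round(x / scale) + zero_point, 0, 15). The function names are hypothetical and this is not the repository's actual API.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine 4-bit quantization: q = clamp(round(x / scale) + zero_point, 0, 15).
// Hypothetical helper names, not the repository's API.
std::vector<uint8_t> quantize_u4(const std::vector<float> &x,
                                 float scale, int zero_point) {
    std::vector<uint8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        int v = static_cast<int>(std::lround(x[i] / scale)) + zero_point;
        q[i] = static_cast<uint8_t>(std::clamp(v, 0, 15));
    }
    return q;
}

// Approximate inverse: x ~= (q - zero_point) * scale.
std::vector<float> dequantize_u4(const std::vector<uint8_t> &q,
                                 float scale, int zero_point) {
    std::vector<float> x(q.size());
    for (size_t i = 0; i < q.size(); ++i)
        x[i] = (static_cast<int>(q[i]) - zero_point) * scale;
    return x;
}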

Benchmarking details

Can you provide more details on how the benchmarking is done, especially the tools or methods used to measure latency?
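
For context, kernel latency is commonly measured with a few warm-up runs followed by averaging over many timed iterations using std::chrono. This is only a sketch of that methodology, not necessarily how the README numbers were produced.

#include <chrono>
#include <cstdio>

// Warm up first so cold caches and CPU frequency scaling do not skew the
// numbers, then report the mean per-call latency over many iterations.
template <typename Kernel>
double measure_latency_ms(Kernel &&kernel, int warmup = 10, int iters = 100) {
    for (int i = 0; i < warmup; ++i) kernel();
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) kernel();
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> dt = t1 - t0;
    return dt.count() / iters;
}

int main() {
    volatile long sink = 0;  // stand-in workload in place of the GEMM call
    double ms = measure_latency_ms([&] { for (long i = 0; i < 1000000; ++i) sink += i; });
    std::printf("mean latency: %.3f ms\n", ms);
}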

Support for bit-serial operation

Currently, we support a low-precision GEMM operation using modified NEON integer operations.

Other frameworks support bit packing and bit-serial operations for low-precision data.

To compare the two approaches, we should also provide a bit-serial implementation (see the sketch below).
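
For reference, the core of the bit-serial approach used by other low-precision frameworks is to store each operand as bit planes and reduce the dot product to AND plus popcount per plane pair. A minimal scalar sketch under that assumption:

#include <cstdint>
#include <cstdio>

// Dot product of two length-64 vectors of unsigned BITS-bit values, stored as
// bit planes: planes[b] holds bit b of every element. The reduction becomes
// popcount(a_plane & b_plane) weighted by 2^(i + j), summed over plane pairs.
template <int BITS>
uint64_t bitserial_dot64(const uint64_t a_planes[BITS],
                         const uint64_t b_planes[BITS]) {
    uint64_t acc = 0;
    for (int i = 0; i < BITS; ++i)
        for (int j = 0; j < BITS; ++j)
            acc += static_cast<uint64_t>(
                       __builtin_popcountll(a_planes[i] & b_planes[j])) << (i + j);
    return acc;
}

int main() {
    // 64 elements, every a[i] = 3 (binary 11) and every b[i] = 2 (binary 10):
    // expected dot product = 64 * 3 * 2 = 384.
    uint64_t a[2] = {~0ULL, ~0ULL};  // bit plane 0, bit plane 1
    uint64_t b[2] = {0ULL, ~0ULL};
    std::printf("%llu\n", static_cast<unsigned long long>(bitserial_dot64<2>(a, b)));
}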

Apply to OpenCL kernel

The lowp GEMM in this repository is currently implemented only as a NEON kernel.
Mali GPUs are said to be vector processing units, so each running thread executes SIMD-style instructions.

I suppose it should be possible to implement lowp GEMM in OpenCL so that Mali GPUs can run the kernel (a rough sketch is below).
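
For illustration, here is a minimal sketch of what a naive 8-bit GEMM could look like in OpenCL C, embedded as a C++ raw string literal the way kernel sources are usually carried in host code. It is not from this repository.

// Hypothetical example, not from this repository: a naive int8 GEMM kernel in
// OpenCL C with int32 accumulation, carried as a C++ raw string literal.
static const char *kLowpGemmClSource = R"CLC(
__kernel void lowp_gemm_naive(const int M, const int N, const int K,
                              __global const char *A,   /* M x K, int8  */
                              __global const char *B,   /* K x N, int8  */
                              __global int        *C)   /* M x N, int32 */
{
    const int col = get_global_id(0);
    const int row = get_global_id(1);
    if (row >= M || col >= N) return;

    int acc = 0;
    for (int k = 0; k < K; ++k)
        acc += (int)A[row * K + k] * (int)B[k * N + col];
    C[row * N + col] = acc;
}
)CLC";

A real Mali kernel would also need vectorized char4/char8 loads, loop tiling, and the 4-bit unpacking step, but the overall structure would be similar.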

Compatibility with Latest ARM Architectures

Has the library been tested or updated to work with the latest ARM architectures and devices? It would be beneficial to have information on compatibility and performance with newer hardware.

Integrating ACL-lowp to TVM

ACL-lowp includes a NEON-based kernel for low-bitwidth quantized matrix multiplication.
It is now integrated into the Arm Compute Library.

There is more room to optimize the kernels for specific devices, e.g. the tile size and loop-unrolling factor.
However, tuning these kernel settings by hand is very inefficient.

A kernel compiler such as TVM provides automatic kernel-tuning functionality.
I think integrating these kernels into TVM may produce better performance (the sketch below shows the kind of parameters a tuner would search over).
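
To make the tunable knobs concrete, here is a hypothetical reference int8 GEMM whose tile sizes are template parameters; an auto-tuner would benchmark many instantiations instead of fixing them by hand. This is an illustration, not the repository's kernel.

#include <algorithm>
#include <cstdint>

// Reference int8 GEMM (C = A * B with int32 accumulation) whose cache-blocking
// tile sizes TM, TN, TK are compile-time knobs. An auto-tuner searches over
// exactly this kind of parameter per device instead of fixing it by hand.
template <int TM, int TN, int TK>
void gemm_s8_tiled(int M, int N, int K,
                   const int8_t *A, const int8_t *B, int32_t *C) {
    for (int i0 = 0; i0 < M; i0 += TM)
        for (int j0 = 0; j0 < N; j0 += TN)
            for (int k0 = 0; k0 < K; k0 += TK)
                for (int i = i0; i < std::min(i0 + TM, M); ++i)
                    for (int j = j0; j < std::min(j0 + TN, N); ++j) {
                        int32_t acc = (k0 == 0) ? 0 : C[i * N + j];
                        for (int k = k0; k < std::min(k0 + TK, K); ++k)
                            acc += int32_t(A[i * K + k]) * int32_t(B[k * N + j]);
                        C[i * N + j] = acc;
                    }
}

// e.g. gemm_s8_tiled<64, 64, 256>(M, N, K, A, B, C); a tuner keeps whichever
// instantiation measures fastest on the target device.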

GEMV Library

Is there any additional support for GEMV besides the GEMM library?

Expand into element-wise operations

This repository seems to support only low-precision GEMM.

How about adding low-precision element-wise support?

I think element-wise operations like matrix addition would not be that complicated to implement in a low-precision version using a similar technique (see the sketch below).
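
As a starting point, here is a minimal sketch of quantized element-wise addition under the simplifying assumption that both inputs share one scale and zero point; mismatched scales would need an extra rescale step. The function name is hypothetical.

#include <algorithm>
#include <cstdint>
#include <vector>

// Element-wise addition of two affine-quantized int8 tensors that share one
// scale s and zero point z. Since a = s*(qa - z) and b = s*(qb - z),
// a + b = s*((qa + qb - z) - z), so the output code is qa + qb - z,
// saturated to the int8 range.
std::vector<int8_t> add_q8(const std::vector<int8_t> &qa,
                           const std::vector<int8_t> &qb,
                           int zero_point) {
    std::vector<int8_t> qc(qa.size());
    for (size_t i = 0; i < qa.size(); ++i) {
        int v = int(qa[i]) + int(qb[i]) - zero_point;
        qc[i] = static_cast<int8_t>(std::clamp(v, -128, 127));
    }
    return qc;
}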

Building for Different Architectures

Can you provide guidance on building the library for architectures other than armv7a? Are there specific considerations or modifications needed for different ARM architectures?

Community Support

Is there an active community around the ANT Framework for developers to seek help and share experiences?

Support for Other Data Precisions

The project currently supports 4-bit data for GEMM. Are there plans to include support for other data precisions, such as 8-bit or 16-bit? If so, what would be the expected timeline for this?

Evaluation of lowpgemm

Hi,
I wonder how much your low-precision implementation accelerates GEMM on NEON.
Can you provide some evaluation results?

NEON SIMD MAC Instructions

How does the use of NEON SIMD MAC instructions benefit low-precision computations in terms of performance and efficiency?
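
In general terms, the benefit is that one multiply-accumulate instruction processes many narrow lanes at once and the narrow data halves memory traffic. A small sketch with standard NEON intrinsics (not code from this repository):

#include <arm_neon.h>
#include <cstdint>

// int8 dot product with NEON multiply-accumulate: vmull_s8 performs 8 widening
// int8 multiplies per instruction, and vpadalq_s16 folds the 16-bit products
// into four int32 accumulators, so several MACs retire per instruction pair
// instead of one scalar MAC at a time.
int32_t dot_s8_neon(const int8_t *a, const int8_t *b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t prod = vmull_s8(vld1_s8(a + i), vld1_s8(b + i));
        acc = vpadalq_s16(acc, prod);
    }
    // Horizontal reduction that works on both AArch32 and AArch64.
    int32x2_t sum2 = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
    sum2 = vpadd_s32(sum2, sum2);
    int32_t sum = vget_lane_s32(sum2, 0);
    for (; i < n; ++i) sum += int32_t(a[i]) * int32_t(b[i]);  // scalar tail
    return sum;
}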

Bit-Packed GEMM

What challenges might arise when using bit-packed data, and how does the library address these challenges?
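
One concrete challenge is storage versus unpacking cost: packing cuts memory traffic, but every kernel must unpack codes before multiplying. A minimal sketch of packing two unsigned 4-bit codes per byte, low nibble first; this layout is an assumption, not necessarily the one the library uses.

#include <cstdint>
#include <vector>

// Pack two unsigned 4-bit codes per byte, low nibble first.
std::vector<uint8_t> pack_u4(const std::vector<uint8_t> &codes) {
    std::vector<uint8_t> out((codes.size() + 1) / 2, 0);
    for (size_t i = 0; i < codes.size(); ++i) {
        uint8_t nibble = codes[i] & 0x0F;
        out[i / 2] |= (i % 2 == 0) ? nibble : static_cast<uint8_t>(nibble << 4);
    }
    return out;
}

// Unpack back to one code per byte.
std::vector<uint8_t> unpack_u4(const std::vector<uint8_t> &packed, size_t count) {
    std::vector<uint8_t> out(count);
    for (size_t i = 0; i < count; ++i)
        out[i] = (i % 2 == 0) ? (packed[i / 2] & 0x0F)
                              : static_cast<uint8_t>(packed[i / 2] >> 4);
    return out;
}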

Running Tests and Examples

Are there any detailed explanations or documentation for the test examples provided, such as neon_lowgemm.cpp?

Performance analysis of the example results

There are problems with the inference time when using lowp.

The quantization overhead is too large, and 8-bit GEMM is not much faster than floating-point GEMM.
We have to optimize these in future work.

Documentation Update

Parts of the documentation, such as the building instructions and example usage, seem to be outdated or not detailed enough. Could the documentation be updated to reflect the current state of the project and provide clearer guidance for new users?
