cmsis-nn's Issues

What is the proper way to define `__CLZ`?

__CLZ is used in the softmax functions; however, it is not defined for non-Arm compilers in CMSIS-NN.

Compare this to CMSIS-DSP, which defines it by ultimately including dsp/none.h; that header contains an inline C function for it (and other intrinsics).

One can solve this by including dsp/none.h in the softmax headers, but that requires both installing CMSIS-DSP and hacking this library.
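For illustration, a self-contained fallback in the spirit of the dsp/none.h intrinsic could look like this (a sketch only, guarded so it never clashes with a toolchain definition; the zero-input behaviour of returning 32 matches the Arm CLZ instruction):

```c
#include <stdint.h>

/* Sketch of a portable __CLZ fallback, modelled on the plain-C intrinsics
 * in CMSIS-DSP's dsp/none.h. Returns 32 for a zero input, matching the
 * Arm CLZ instruction. */
#ifndef __CLZ
static inline uint8_t __CLZ(uint32_t value)
{
    if (value == 0U)
    {
        return 32U;
    }
    uint8_t count = 0U;
    uint32_t mask = 0x80000000U;
    while ((value & mask) == 0U)
    {
        count++;
        mask >>= 1U;
    }
    return count;
}
#endif
```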

Include/Internal/arm_nn_compiler.h: undefined __ARM_FEATURE_MVE

The code is missing a check that `__ARM_FEATURE_MVE` is defined before checking its value.

Include/Internal/arm_nn_compiler.h:130

#if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
    #include <arm_mve.h>
#endif

Update to:

#if defined(__ARM_FEATURE_MVE)
    #if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
        #include <arm_mve.h>
    #endif
#endif

Unable to compile CMSIS-NN for cortex-M7

Hi,
As per the readme.md file, I could successfully run the command "cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/supratim/toolchains/ethos-u-core-platform/cmake/toolchain/arm-none-eabi-gcc.cmake -DTARGET_CPU=cortex-m7 -DCMSIS_PATH=/home/supratim/CMSIS-NN".

But when I executed 'make' from the build folder, the compilation failed with the following error:
"In file included from /home/supratim/CMSIS-NN/Include/arm_nnfunctions.h:114,
from /home/supratim/CMSIS-NN/Source/ConvolutionFunctions/arm_convolve_s8.c:31:
/home/supratim/CMSIS-NN/Include/arm_nn_math_types.h:93:10: fatal error: cmsis_compiler.h: No such file or directory
93 | #include "cmsis_compiler.h"

I know that cmsis_compiler.h exists in https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/Core/Include, but that is a separate GitHub repo, and ideally one GitHub repo should not have a dependency on another.

Could you please let me know if I am missing anything?

Thanks
Supratim

Doubts regarding some of the functions

(1) Can you please explain what exactly arm_nn_requantize() performs?
(2) Also, what is the functionality of arm_nn_activation_s16?
(3) How is the sigmoid lookup table created?
(4) Please explain the working of vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)), which is present in
NNSupportFunctions/arm_nn_vec_mat_mul_result_acc_s8.c.
Can you please clear up the above doubts? They would be very useful for me.
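On point (4), one can note that vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)) is a lane-wise lower clamp to the q15 minimum: vdupq_n_s32 broadcasts NN_Q15_MIN to all four lanes, and vmaxq_s32 keeps the larger value per lane. A scalar sketch of what happens to each lane (helper name is illustrative):

```c
#include <stdint.h>

#define NN_Q15_MIN (-32768)

/* Scalar sketch of what vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)) does to
 * each 32-bit lane: a lower clamp to the q15 minimum. */
static int32_t clamp_low_q15(int32_t acc)
{
    return (acc > NN_Q15_MIN) ? acc : NN_Q15_MIN;
}
```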

arm_elementwise_add_s8 does not work properly for a Residual Architecture

Hi, I appreciate your work!

I am trying to build a Residual-like NN module as follows:

import tensorflow.keras as keras
import tensorflow.keras.layers as KL

nnInput = keras.Input(
	shape=(inputH, inputW, inputCh))
x = KL.Convolution2D(
	1, 
	(3, 3), 
	strides=(2, 2), 
	padding='same', 
	name='conv1',
	bias_initializer='glorot_uniform',
	activation='relu')(nnInput)

shortcut = x 

x = KL.Conv2D(
	filters=num_filters, 
	kernel_size=1, 
	strides=stride, 
	name=name + "_1_conv",
	activation='relu')(x)

x = KL.Conv2D(
	filters=num_filters, 
	kernel_size=kernel_size, 
	padding="SAME", 
	name=name + "_2_conv",
	activation='relu')(x)
x = KL.Conv2D(
	filters=4 * num_filters, 
	kernel_size=1, 
	name=name + "_3_conv")(x)
x = KL.Add(
	name=name + "_add")([shortcut, x])

The calculation results of CMSIS-NN and tensorflow-lite before the elementwise add are all the same. I followed the test case in the repo to build the arm_elementwise_add_s8 call accordingly, but it is still not working. I am wondering whether the function is proven to be compatible with tensorflow-lite, or am I missing something?

Thanks in advance!

Requantize INT32 to INT8 using arm_nn_requantize

Hi,
I am using the FC s8 CMSIS layer for a project. After we invoke an FC s8 kernel, we have an Int32 output that we need to dequantize-requantize (Int32 -> FP -> Int8). For example, for the FC layer in the image, to dequantize-requantize the 32-bit output of the FC layer we need to apply this equation: Output_Int8 = (0.0039215669967234135 * Weight_mult / 0.14308570325374603) * Output_Int32 + 69 => Multiplier * Output_Int32 + shifter.
[image: the FC layer referred to above]

CMSIS-NN has an arm_nn_requantize function that takes an Int32 multiplier and shift as input and returns (val * multiplier)/(2 ^ shift). Can you explain to me how to map the above FP multiplier and Int32 shifter onto the requantize function's parameters?

The mat_mul_kernel_s16 function needs __PKHTB & __PKHBT to re-order the value

Hi,
I noticed the following at the line below:

ip_a0 = read_and_pad(ip_a0, &a01, &a02);

We use read_and_pad to process the weights, expanding the values from q7_t to q15_t, together with a group of __PKHxx instructions for
reordering the values from (a0, a2, a1, a3) to (a0, a1, a2, a3).
My question is: why do we add these two __PKHxx operations? I think we could still use the (a0, a2, a1, a3) order if we process the
input in the same way (I found that the 1x1 conv2d has a similar operation without __PKHxx). That way we could save two instructions and reduce the inference time.

Regards,
Crist
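For context on the ordering question above: the (a0, a2, a1, a3) layout falls out of the SXTB16-based widening inside read_and_pad, which sign-extends the even bytes and the odd bytes of a packed word separately. A plain-C sketch of that ordering (illustrative only, not the actual implementation):

```c
#include <stdint.h>

/* Plain-C sketch of read_and_pad's SXTB16-style widening of a packed
 * word {a0, a1, a2, a3} (a0 in the least significant byte): the first
 * halfword pair holds (a0, a2), the second holds (a1, a3), which is why
 * a __PKHBT/__PKHTB pair is needed to restore (a0, a1, a2, a3). */
static void read_and_pad_sketch(uint32_t packed, int16_t out[4])
{
    out[0] = (int16_t)(int8_t)(packed & 0xFFU);         /* a0 */
    out[1] = (int16_t)(int8_t)((packed >> 16) & 0xFFU); /* a2 */
    out[2] = (int16_t)(int8_t)((packed >> 8) & 0xFFU);  /* a1 */
    out[3] = (int16_t)(int8_t)((packed >> 24) & 0xFFU); /* a3 */
}
```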

Is there a roadmap?

Hi, thank you for your work!

Is there a long/short-term roadmap for future operators, like adding RNN or transformer support?

'asm operand has impossible constraint' - GCC 11.3

GCC version : arm-none-eabi-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712

Reproducer: arm-none-eabi-gcc -mcpu=cortex-m55 -Os -I../../Include/ -mfloat-abi=hard -mfpu=auto -S arm_nn_mat_mult_nt_t_s8.c

The issue is not seen in newer releases or other optimization levels.

Confusing doc on `arm_nn_requantize`

The documentation of arm_nn_requantize is

/**
 * @brief           Requantize a given value.
 * @param[in]       val         Value to be requantized
 * @param[in]       multiplier  multiplier. Range {NN_Q31_MIN + 1, Q32_MAX}
 * @param[in]       shift       left or right shift for 'val * multiplier'
 *
 * @return          Returns (val * multiplier)/(2 ^ shift)
 *
 */

But the CMSIS_USE_SINGLE_ROUNDING path does not compute that; it computes some rounding of (val * multiplier) / (2^(shift+31)).

Besides, the 'left or right shift' part of the doc is not very clear: how does one specify the direction of the shift?
It seems to be based on the sign of shift, with positive values being left shifts and negative values being right shifts.
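To illustrate that sign convention, a plain-C reading of the documented behaviour could look like this (a sketch only, assuming 31 - shift >= 1 and ignoring saturation; not the CMSIS-NN implementation):

```c
#include <stdint.h>

/* Sketch of (val * multiplier) / 2^(31 - shift) with round-to-nearest:
 * a positive shift acts as a left shift of the Q31-scaled product,
 * a negative shift as an additional right shift. */
static int32_t requantize_ref(int32_t val, int32_t multiplier, int32_t shift)
{
    const int64_t prod = (int64_t)val * (int64_t)multiplier;
    const int32_t total_shift = 31 - shift; /* assumed >= 1 here */
    const int64_t rounding = (int64_t)1 << (total_shift - 1);
    return (int32_t)((prod + rounding) >> total_shift);
}
```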

IAR compiler issues

There are some compile errors with the IAR compiler:

__STATIC_FORCEINLINE void arm_memcpy_s8(int8_t *__RESTRICT dst, const int8_t *__RESTRICT src, uint32_t block_size)
                                                             ^
"Include/arm_nnsupportfunctions.h",953  Error[Pe018]: 
          expected a ")"

  __STATIC_FORCEINLINE int32x4_t arm_doubling_high_mult_mve(const int32x4_t m1, const int32_t m2)
                       ^
"Include/arm_nnsupportfunctions.h",990  Error[Pe020]: 
          identifier "int32x4_t" is undefined

An ASM error occurs when compiling arm_nn_mat_mul_core_4x_s8

Processor: Cortex-M55
Error message:
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c: In function 'arm_nn_mat_mul_core_4x_s8':
/(my project path)/cmsis-nn/Include/Internal/arm_nn_compiler.h:97:23: error: 'asm' operand has impossible constraints
97 | #define __ASM __asm
| ^~~~~
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c:84:9: note: in expansion of macro '__ASM'
84 | __ASM volatile(" .p2align 2 \n"
| ^~~~~

Why is this happening, and how can I fix it?
Any help is appreciated!

Incorrect condition check for buffer size in DW convolve s16

The condition to use arm_depthwise_conv_fast_s16() reads as
if (dw_conv_params->ch_mult == 1 && dw_conv_params->dilation.w == 1 && dw_conv_params->dilation.h == 1 &&
filter_dims->w * filter_dims->h * input_dims->c < 512)

The 'filter_dims->w * filter_dims->h * input_dims->c' term should be 'filter_dims->w * filter_dims->h', as DW conv is a layer-based op.

The consequence is that the optimization is missed at times, resulting in slower performance than what could be achieved. This is not an accuracy error.

Possible undefined behavior for `arm_nn_requantize` when compiling with `CMSIS_NN_USE_SINGLE_ROUNDING`

In the CMSIS_NN_USE_SINGLE_ROUNDING ifdef branch, there is UB when shift >= 31, as total_shift - 1 becomes negative.
Such shifts can make sense in some contexts, e.g. when using a call to requantize to also left-align a quantized int8 value.

__STATIC_FORCEINLINE int32_t arm_nn_requantize(const int32_t val, const int32_t multiplier, const int32_t shift)
{
#ifdef CMSIS_NN_USE_SINGLE_ROUNDING
    const int64_t total_shift = 31 - shift;
    const int64_t new_val = val * (int64_t)multiplier;

    int32_t result = new_val >> (total_shift - 1); // <-- Here is the problematic line
    result = (result + 1) >> 1;

    return result;
#else
    return arm_nn_divide_by_power_of_two(arm_nn_doubling_high_mult_no_sat(val * (1 << LEFT_SHIFT(shift)), multiplier),
                                         RIGHT_SHIFT(shift));
#endif
}
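One possible guard, sketched here only to illustrate the point (not a proposed upstream patch; saturation is ignored, and shift >= 31 is treated as a pure left shift since there is then no fractional part to round):

```c
#include <stdint.h>

/* Sketch of the single-rounding path with the negative-shift-count UB
 * guarded: when total_shift <= 0 there is nothing to round, so the
 * product is simply left-shifted instead. */
static int32_t requantize_single_rounding_safe(const int32_t val,
                                               const int32_t multiplier,
                                               const int32_t shift)
{
    const int64_t total_shift = 31 - shift;
    const int64_t new_val = val * (int64_t)multiplier;

    if (total_shift <= 0)
    {
        return (int32_t)(new_val << (-total_shift));
    }

    int32_t result = (int32_t)(new_val >> (total_shift - 1));
    result = (result + 1) >> 1;
    return result;
}
```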

Move test platform to other FVPs

Hi,

I am trying to figure out how to use an FVP as a test platform. I successfully built the tests as the tutorial shows. But when I try to switch to the Corstone SSE-310 FVP to test how my code performs on the M85, it fails. It seems the code for Corstone-300 is hardware-specific. How can I generate this code for other FVPs?

Thanks!

How to build on host

Hi,
I tried to build CMSIS-NN for an x86-64 host CPU and had some trouble compiling it using gcc 9.4.

Finally I ended up with:

cmake  -D CMSIS_PATH="../../CMSIS_5" -DCMAKE_C_FLAGS:STRING="-D__GNUC_PYTHON__ -D__RESTRICT=__restrict" ..

Is this the recommended way to do it?
I had to specify __RESTRICT; otherwise I ran into:

/../cmsis/CMSIS-NN/Include/arm_nnsupportfunctions.h:967:62: error: expected ‘;’, ‘,’ or ‘)’ before ‘dst’
  967 | __STATIC_FORCEINLINE void arm_memcpy_q15(int16_t *__RESTRICT dst, const int16_t *__RESTRICT src, uint32_t block_size)

Best regards.

Compilation fails with arm-none-eabi-gcc 12.2.0

I am using the Arm GCC from the official website: https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build arm-12-mpacbti.16)) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When I try to compile with steps from: https://github.com/ARM-software/CMSIS-NN#building-cmsis-nn-as-a-library

I got the compilation error:

[ 60%] Building C object CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj
during RTL pass: combine
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c: In function 'arm_exp_on_negative_values_mve_32x4':
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c:74:1: internal compiler error: in trunc_int_for_mode, at explow.cc:59
   74 | }
      | ^
0x7f2a15836d8f __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7f2a15836e3f __libc_start_main_impl
        ../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.linaro.org/> for instructions.
make[2]: *** [CMakeFiles/cmsis-nn.dir/build.make:580: CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:295: CMakeFiles/cmsis-nn.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Seems like an issue with MVE support. I will need to use the latest GCC, as my target is also the M85, which older GCC versions don't support.

Missing arm_s8_to_s16_unordered_with_offset.c when including the pack

I am trying to incorporate CMSIS-NN into my project. However, I see the error message:

.\objects\myProject.axf: Error: L6218E: Undefined symbol arm_s8_to_s16_unordered_with_offset (referred from arm_convolve_s8.o).

I traced the function in CMSIS-NN and found that the source file arm_s8_to_s16_unordered_with_offset.c is in the
directory C:\Users\XXX\AppData\Local\Arm\Packs\ARM\CMSIS-NN\4.1.0\Source\NNSupportFunctions, but the pack didn't
include the file.

[screenshots: CMSIS-NN pack file lists 1-3]

Did the pack accidentally miss it? How can I add it back into the pack?

Best Regards,

/Tony

Potential bug in arm_depthwise_conv_wrapper_s8_get_buffer_size

We think there is a bug in arm_depthwise_conv_wrapper_s8_get_buffer_size for the cases
where arm_depthwise_conv_wrapper_s8 calls arm_depthwise_conv_3x3_s8.
arm_depthwise_conv_3x3_s8 has a buffer size of zero, but arm_depthwise_conv_wrapper_s8_get_buffer_size
does not special-case
(filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) && (dw_conv_params->padding.w <= 1).
It should return 0 in that case.
Please compare

if ((filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) &&

and

if (input_dims->c == output_dims->c && input_dims->n == 1 && dw_conv_params->dilation.w == 1 &&

@UlrikHjort @vdkhoi @bewagner
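To make the proposal concrete, here is a toy model of the fix. All struct and function names below are simplified stand-ins, not the CMSIS-NN API, and the generic-path size is a placeholder; only the shape of the special case matters:

```c
#include <stdint.h>

/* Toy model of the proposed fix: the buffer-size query returns 0
 * whenever the wrapper would dispatch to the 3x3 kernel, mirroring the
 * dispatch condition quoted above. All names here are illustrative. */
typedef struct { int32_t w, h; } toy_wh;
typedef struct { toy_wh padding; toy_wh dilation; int32_t ch_mult; } toy_dw_params;
typedef struct { int32_t n, w, h, c; } toy_dims;

static int32_t toy_dw_wrapper_get_buffer_size(const toy_dw_params *p,
                                              const toy_dims *input,
                                              const toy_dims *filter,
                                              const toy_dims *output)
{
    if (input->c == output->c && input->n == 1 && p->dilation.w == 1 && p->dilation.h == 1)
    {
        /* Proposed special case: arm_depthwise_conv_3x3_s8 needs no buffer. */
        if (filter->w == 3 && filter->h == 3 && p->padding.w <= 1 && p->padding.h <= 1)
        {
            return 0;
        }
        /* Placeholder for the generic optimized path's real size. */
        return filter->w * filter->h * input->c * (int32_t)sizeof(int16_t);
    }
    return 0;
}
```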

Is there an easy getting-started guide?

I want to use this but don't know where to get started. I want to become familiar with the workflow of how CMSIS-NN works, with some easy-to-use example like the sine-wave predictor that the TensorFlow Lite library for Arduino comes with.
Is there a getting-started guide for this?
Do I have to train the model on the device, or can I convert a TensorFlow trained and quantized model to use with this? Can I run some simple NN example on an STM32F103C8 blue pill board? No matter how small the example is, I just want to get used to the workflow.
Thanks.

Inconsistency of CMSIS-NN Quantization Method (Q-format) with ARM Documentation

Hello.

I am currently developing using the Q-format (Qm.n) for quantization. However, upon reviewing the revision history, I noticed that starting from version 4.1.0 the Q-format approach is no longer followed. My current approach aligns with the methods outlined in the following ARM documentation links:

While TensorFlow Lite for Microcontrollers employs a zero point and scale factor for quantization, which necessitates additional memory and floating-point operations, it appears that Q-format based quantization would be more suitable for Cortex-M processors given these constraints.

Could you kindly provide a clear explanation for the necessity of this change? The absence of discussion regarding its impact on speed and accuracy has left me somewhat perplexed. Any insight into the rationale behind this decision would be greatly appreciated, as it would aid in understanding the best practices for quantization within the context of TensorFlow Lite for Microcontrollers and CMSIS-NN.

Thank you for your time and consideration.

Create dynamic library for Cortex-M33

Hello,

Is it possible to compile a dynamic library for the Cortex-M33? I changed the CMakeLists.txt file as detailed below, but I still get a static library.

cmake_minimum_required(VERSION 3.15.6)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(BUILD_SHARED_LIBS ON)
project(CMSISNN)
add_library(cmsis-nn SHARED)
target_compile_options(cmsis-nn PRIVATE -Ofast -fPIC)
target_include_directories(cmsis-nn PUBLIC "Include")
add_subdirectory(Source)
set_target_properties(cmsis-nn PROPERTIES
POSITION_INDEPENDENT_CODE ON
)

