cmsis-nn's Issues

What is the proper way to define `__CLZ`?

__CLZ is used in the softmax functions; however, it is not defined for non-Arm compilers in CMSIS-NN.

Compare this to CMSIS-DSP, which defines it by ultimately including dsp/none.h; that header contains an inline C function for it (and other intrinsics).

One can solve this by including dsp/none.h in the softmax headers, but that requires both installing CMSIS-DSP and hacking this library.
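For illustration, a self-contained fallback in the spirit of the dsp/none.h intrinsic could look like this (a sketch only, guarded so it never clashes with a toolchain definition; the zero-input behaviour of returning 32 matches the Arm CLZ instruction):

```c
#include <stdint.h>

/* Sketch of a portable __CLZ fallback, modelled on the plain-C intrinsics
 * in CMSIS-DSP's dsp/none.h. Returns 32 for a zero input, matching the
 * Arm CLZ instruction. */
#ifndef __CLZ
static inline uint8_t __CLZ(uint32_t value)
{
    if (value == 0U)
    {
        return 32U;
    }
    uint8_t count = 0U;
    uint32_t mask = 0x80000000U;
    while ((value & mask) == 0U)
    {
        count++;
        mask >>= 1U;
    }
    return count;
}
#endif
```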

Include/Internal/arm_nn_compiler.h: undefined __ARM_FEATURE_MVE

The code is missing a check that `__ARM_FEATURE_MVE` is defined before checking its value.

Include/Internal/arm_nn_compiler.h:130

#if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
    #include <arm_mve.h>
#endif

Update to:

#if defined(__ARM_FEATURE_MVE)
    #if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
        #include <arm_mve.h>
    #endif
#endif

Unable to compile CMSIS-NN for cortex-M7

Hi,
As per the readme.md file, I could successfully run the command "cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/supratim/toolchains/ethos-u-core-platform/cmake/toolchain/arm-none-eabi-gcc.cmake -DTARGET_CPU=cortex-m7 -DCMSIS_PATH=/home/supratim/CMSIS-NN".

But when I executed 'make' from the build folder, the compilation failed with the following error:
"In file included from /home/supratim/CMSIS-NN/Include/arm_nnfunctions.h:114,
from /home/supratim/CMSIS-NN/Source/ConvolutionFunctions/arm_convolve_s8.c:31:
/home/supratim/CMSIS-NN/Include/arm_nn_math_types.h:93:10: fatal error: cmsis_compiler.h: No such file or directory
93 | #include "cmsis_compiler.h"

I know that cmsis_compiler.h exists in https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/Core/Include, but that is a separate GitHub repo, and ideally one GitHub repo should not have a dependency on another.

Could you please let me know if I am missing anything?

Thanks
Supratim

Doubts regarding some of the functions

(1) Can you please explain what exactly arm_nn_requantize() performs?
(2) Also, what is the functionality of arm_nn_activation_s16?
(3) How is the sigmoid lookup table created?
(4) Please explain the working of vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)), which is present in
NNSupportFunctions/arm_nn_vec_mat_mul_result_acc_s8.c.
Can you please clear up the above doubts? They would be very useful for me.
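On point (4), one can note that vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)) is a lane-wise lower clamp to the q15 minimum: vdupq_n_s32 broadcasts NN_Q15_MIN to all four lanes, and vmaxq_s32 keeps the larger value per lane. A scalar sketch of what happens to each lane (helper name is illustrative):

```c
#include <stdint.h>

#define NN_Q15_MIN (-32768)

/* Scalar sketch of what vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)) does to
 * each 32-bit lane: a lower clamp to the q15 minimum. */
static int32_t clamp_low_q15(int32_t acc)
{
    return (acc > NN_Q15_MIN) ? acc : NN_Q15_MIN;
}
```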

arm_elementwise_add_s8 does not work properly for a Residual Architecture

Hi, I appreciate your work!

I am trying to build a Residual-like NN module as follows:

import tensorflow.keras as keras
import tensorflow.keras.layers as KL

nnInput = keras.Input(
	shape=(inputH, inputW, inputCh))
x = KL.Convolution2D(
	1, 
	(3, 3), 
	strides=(2, 2), 
	padding='same', 
	name='conv1',
	bias_initializer='glorot_uniform',
	activation='relu')(nnInput)

shortcut = x 

x = KL.Conv2D(
	filters=num_filters, 
	kernel_size=1, 
	strides=stride, 
	name=name + "_1_conv",
	activation='relu')(x)

x = KL.Conv2D(
	filters=num_filters, 
	kernel_size=kernel_size, 
	padding="SAME", 
	name=name + "_2_conv",
	activation='relu')(x)
x = KL.Conv2D(
	filters=4 * num_filters, 
	kernel_size=1, 
	name=name + "_3_conv")(x)
x = KL.Add(
	name=name + "_add")([shortcut, x])

The calculation results of CMSIS-NN and tensorflow-lite before the elementwise add are all the same. I followed the test case in the repo to build the arm_elementwise_add_s8 call accordingly, but it is still not working. I am wondering whether the function is proven to be compatible with tensorflow-lite, or am I missing something?

Thanks in advance!

Requantize INT32 to INT8 using arm_nn_requantize

Hi,
I am using the FC s8 CMSIS layer for a project. After we invoke an FC s8 kernel, we have an Int32 output that we need to dequantize-requantize (Int32 -> FP -> Int8). For example, for the FC layer in the image, to dequantize-requantize the 32-bit output of the FC layer we need to apply this equation: Output_Int8 = (0.0039215669967234135 * Weight_mult / 0.14308570325374603) * Output_Int32 + 69 => Multiplier * Output_Int32 + shifter.
[image: the FC layer referred to above]

CMSIS-NN has an arm_nn_requantize function that takes an Int32 multiplier and shift as input and returns (val * multiplier)/(2 ^ shift). Can you explain to me how to map the above FP multiplier and Int32 shifter onto the requantize function's parameters?

The mat_mul_kernel_s16 function needs __PKHTB & __PKHBT to re-order the value

Hi,
I noticed the following at the line below:

ip_a0 = read_and_pad(ip_a0, &a01, &a02);

We use read_and_pad to process the weights, expanding the values from q7_t to q15_t, together with a group of __PKHxx instructions for
reordering the values from (a0, a2, a1, a3) to (a0, a1, a2, a3).
My question is: why do we add these two __PKHxx operations? I think we could still use the (a0, a2, a1, a3) order if we process the
input in the same way (I found that the 1x1 conv2d has a similar operation without __PKHxx). That way we could save two instructions and reduce the inference time.

Regards,
Crist
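For context on the ordering question above: the (a0, a2, a1, a3) layout falls out of the SXTB16-based widening inside read_and_pad, which sign-extends the even bytes and the odd bytes of a packed word separately. A plain-C sketch of that ordering (illustrative only, not the actual implementation):

```c
#include <stdint.h>

/* Plain-C sketch of read_and_pad's SXTB16-style widening of a packed
 * word {a0, a1, a2, a3} (a0 in the least significant byte): the first
 * halfword pair holds (a0, a2), the second holds (a1, a3), which is why
 * a __PKHBT/__PKHTB pair is needed to restore (a0, a1, a2, a3). */
static void read_and_pad_sketch(uint32_t packed, int16_t out[4])
{
    out[0] = (int16_t)(int8_t)(packed & 0xFFU);         /* a0 */
    out[1] = (int16_t)(int8_t)((packed >> 16) & 0xFFU); /* a2 */
    out[2] = (int16_t)(int8_t)((packed >> 8) & 0xFFU);  /* a1 */
    out[3] = (int16_t)(int8_t)((packed >> 24) & 0xFFU); /* a3 */
}
```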

Is there a roadmap?

Hi, thank you for your work!

Is there a long/short-term roadmap for future operators, like adding RNN or transformer support?

'asm operand has impossible constraint' - GCC 11.3

GCC version : arm-none-eabi-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712

Reproducer: arm-none-eabi-gcc -mcpu=cortex-m55 -Os -I../../Include/ -mfloat-abi=hard -mfpu=auto -S arm_nn_mat_mult_nt_t_s8.c

The issue is not seen in newer releases or other optimization levels.

Confusing doc on `arm_nn_requantize`

The documentation of arm_nn_requantize is

/**
 * @brief           Requantize a given value.
 * @param[in]       val         Value to be requantized
 * @param[in]       multiplier  multiplier. Range {NN_Q31_MIN + 1, Q32_MAX}
 * @param[in]       shift       left or right shift for 'val * multiplier'
 *
 * @return          Returns (val * multiplier)/(2 ^ shift)
 *
 */

But the CMSIS_USE_SINGLE_ROUNDING path does not compute that; it computes some rounding of (val * multiplier) / (2^(shift+31)).

Besides, the 'left or right shift' part of the doc is not very clear: how does one specify the direction of the shift?
It seems to be based on the sign of shift, with positive values being left shifts and negative values being right shifts.
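To illustrate that sign convention, a plain-C reading of the documented behaviour could look like this (a sketch only, assuming 31 - shift >= 1 and ignoring saturation; not the CMSIS-NN implementation):

```c
#include <stdint.h>

/* Sketch of (val * multiplier) / 2^(31 - shift) with round-to-nearest:
 * a positive shift acts as a left shift of the Q31-scaled product,
 * a negative shift as an additional right shift. */
static int32_t requantize_ref(int32_t val, int32_t multiplier, int32_t shift)
{
    const int64_t prod = (int64_t)val * (int64_t)multiplier;
    const int32_t total_shift = 31 - shift; /* assumed >= 1 here */
    const int64_t rounding = (int64_t)1 << (total_shift - 1);
    return (int32_t)((prod + rounding) >> total_shift);
}
```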

IAR compiler issues

There are some compile errors with the IAR compiler:

__STATIC_FORCEINLINE void arm_memcpy_s8(int8_t *__RESTRICT dst, const int8_t *__RESTRICT src, uint32_t block_size)
                                                             ^
"Include/arm_nnsupportfunctions.h",953  Error[Pe018]: 
          expected a ")"

  __STATIC_FORCEINLINE int32x4_t arm_doubling_high_mult_mve(const int32x4_t m1, const int32_t m2)
                       ^
"Include/arm_nnsupportfunctions.h",990  Error[Pe020]: 
          identifier "int32x4_t" is undefined

An ASM error occurs when compiling arm_nn_mat_mul_core_4x_s8

Processor: Cortex-M55
Error message:
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c: In function 'arm_nn_mat_mul_core_4x_s8':
/(my project path)/cmsis-nn/Include/Internal/arm_nn_compiler.h:97:23: error: 'asm' operand has impossible constraints
97 | #define __ASM __asm
| ^~~~~
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c:84:9: note: in expansion of macro '__ASM'
84 | __ASM volatile(" .p2align 2 \n"
| ^~~~~

Why is this happening, and how can I fix it?
Any help is appreciated!

Incorrect condition check for buffer size in DW convolve s16

The condition to use arm_depthwise_conv_fast_s16() reads as
if (dw_conv_params->ch_mult == 1 && dw_conv_params->dilation.w == 1 && dw_conv_params->dilation.h == 1 &&
filter_dims->w * filter_dims->h * input_dims->c < 512)

The 'filter_dims->w * filter_dims->h * input_dims->c' term should be 'filter_dims->w * filter_dims->h', as DW conv is a layer-based op.

The consequence is that the optimization is missed at times, resulting in slower performance than what could be achieved. This is not an accuracy error.

Possible undefined behavior for `arm_nn_requantize` when compiling with `CMSIS_NN_USE_SINGLE_ROUNDING`

In the CMSIS_NN_USE_SINGLE_ROUNDING ifdef branch, there is UB when shift >= 31, as total_shift - 1 becomes negative.
Such shifts can make sense in some contexts, e.g. when using a call to requantize to also left-align a quantized int8 value.

__STATIC_FORCEINLINE int32_t arm_nn_requantize(const int32_t val, const int32_t multiplier, const int32_t shift)
{
#ifdef CMSIS_NN_USE_SINGLE_ROUNDING
    const int64_t total_shift = 31 - shift;
    const int64_t new_val = val * (int64_t)multiplier;

    int32_t result = new_val >> (total_shift - 1); // <-- Here is the problematic line
    result = (result + 1) >> 1;

    return result;
#else
    return arm_nn_divide_by_power_of_two(arm_nn_doubling_high_mult_no_sat(val * (1 << LEFT_SHIFT(shift)), multiplier),
                                         RIGHT_SHIFT(shift));
#endif
}
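One possible guard, sketched here only to illustrate the point (not a proposed upstream patch; saturation is ignored, and shift >= 31 is treated as a pure left shift since there is then no fractional part to round):

```c
#include <stdint.h>

/* Sketch of the single-rounding path with the negative-shift-count UB
 * guarded: when total_shift <= 0 there is nothing to round, so the
 * product is simply left-shifted instead. */
static int32_t requantize_single_rounding_safe(const int32_t val,
                                               const int32_t multiplier,
                                               const int32_t shift)
{
    const int64_t total_shift = 31 - shift;
    const int64_t new_val = val * (int64_t)multiplier;

    if (total_shift <= 0)
    {
        return (int32_t)(new_val << (-total_shift));
    }

    int32_t result = (int32_t)(new_val >> (total_shift - 1));
    result = (result + 1) >> 1;
    return result;
}
```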

Move test platform to other FVPs

Hi,

I am trying to figure out how to use an FVP as a test platform. I successfully built the tests as the tutorial shows. But when I try to switch to the Corstone SSE-310 FVP to test how my code performs on the M85, it fails. It seems the code for Corstone-300 is hardware-specific. How can I generate this code for other FVPs?

Thanks!

How to build on host

Hi,
I tried to build CMSIS-NN for an x86-64 host CPU and had some trouble compiling it using gcc 9.4.

Finally I ended up with:

cmake  -D CMSIS_PATH="../../CMSIS_5" -DCMAKE_C_FLAGS:STRING="-D__GNUC_PYTHON__ -D__RESTRICT=__restrict" ..

Is this the recommended way to do it?
I had to specify __RESTRICT; otherwise I ran into:

/../cmsis/CMSIS-NN/Include/arm_nnsupportfunctions.h:967:62: error: expected ‘;’, ‘,’ or ‘)’ before ‘dst’
  967 | __STATIC_FORCEINLINE void arm_memcpy_q15(int16_t *__RESTRICT dst, const int16_t *__RESTRICT src, uint32_t block_size)

Best regards.

Compilation fails with arm-none-eabi-gcc 12.2.0

I am using the Arm GCC from the official website: https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build arm-12-mpacbti.16)) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When I try to compile with steps from: https://github.com/ARM-software/CMSIS-NN#building-cmsis-nn-as-a-library

I got the compilation error:

[ 60%] Building C object CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj
during RTL pass: combine
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c: In function 'arm_exp_on_negative_values_mve_32x4':
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c:74:1: internal compiler error: in trunc_int_for_mode, at explow.cc:59
   74 | }
      | ^
0x7f2a15836d8f __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7f2a15836e3f __libc_start_main_impl
        ../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.linaro.org/> for instructions.
make[2]: *** [CMakeFiles/cmsis-nn.dir/build.make:580: CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:295: CMakeFiles/cmsis-nn.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Seems like an issue with MVE support. I will need to use the latest GCC, as my target is also the M85, which older GCC versions don't support.

Missing arm_s8_to_s16_unordered_with_offset.c when including the pack

I am trying to incorporate CMSIS-NN into my project. However, I see the error message:

.\objects\myProject.axf: Error: L6218E: Undefined symbol arm_s8_to_s16_unordered_with_offset (referred from arm_convolve_s8.o).

I traced the function in CMSIS-NN and found that the source file arm_s8_to_s16_unordered_with_offset.c is in the
directory C:\Users\XXX\AppData\Local\Arm\Packs\ARM\CMSIS-NN\4.1.0\Source\NNSupportFunctions, but the pack didn't
include the file.

[screenshots: CMSIS-NN pack file lists 1-3]

Did the pack accidentally miss it? How can I add it back into the pack?

Best Regards,

/Tony

Potential bug in arm_depthwise_conv_wrapper_s8_get_buffer_size

We think there is a bug in arm_depthwise_conv_wrapper_s8_get_buffer_size for the cases
where arm_depthwise_conv_wrapper_s8 calls arm_depthwise_conv_3x3_s8.
arm_depthwise_conv_3x3_s8 has a buffer size of zero, but arm_depthwise_conv_wrapper_s8_get_buffer_size
does not special-case
(filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) && (dw_conv_params->padding.w <= 1).
It should return 0 in that case.
Please compare

if ((filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) &&

and

if (input_dims->c == output_dims->c && input_dims->n == 1 && dw_conv_params->dilation.w == 1 &&

@UlrikHjort @vdkhoi @bewagner
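To make the proposal concrete, here is a toy model of the fix. All struct and function names below are simplified stand-ins, not the CMSIS-NN API, and the generic-path size is a placeholder; only the shape of the special case matters:

```c
#include <stdint.h>

/* Toy model of the proposed fix: the buffer-size query returns 0
 * whenever the wrapper would dispatch to the 3x3 kernel, mirroring the
 * dispatch condition quoted above. All names here are illustrative. */
typedef struct { int32_t w, h; } toy_wh;
typedef struct { toy_wh padding; toy_wh dilation; int32_t ch_mult; } toy_dw_params;
typedef struct { int32_t n, w, h, c; } toy_dims;

static int32_t toy_dw_wrapper_get_buffer_size(const toy_dw_params *p,
                                              const toy_dims *input,
                                              const toy_dims *filter,
                                              const toy_dims *output)
{
    if (input->c == output->c && input->n == 1 && p->dilation.w == 1 && p->dilation.h == 1)
    {
        /* Proposed special case: arm_depthwise_conv_3x3_s8 needs no buffer. */
        if (filter->w == 3 && filter->h == 3 && p->padding.w <= 1 && p->padding.h <= 1)
        {
            return 0;
        }
        /* Placeholder for the generic optimized path's real size. */
        return filter->w * filter->h * input->c * (int32_t)sizeof(int16_t);
    }
    return 0;
}
```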

Is there an easy getting-started guide?

I want to use this but don't know where to get started. I want to become familiar with the workflow of how CMSIS-NN works, with some easy-to-use example like the sine-wave predictor that the TensorFlow Lite library for Arduino comes with.
Is there a getting-started guide for this?
Do I have to train the model on the device, or can I convert a TensorFlow trained and quantized model to use with this? Can I run some simple NN example on an STM32F103C8 blue pill board? No matter how small the example is, I just want to get used to the workflow.
Thanks.

Inconsistency of CMSIS-NN Quantization Method (Q-format) with ARM Documentation

Hello.

I am currently developing using the Q-format (Qm.n) for quantization. However, upon reviewing the revision history, I noticed that starting from version 4.1.0 the Q-format approach is no longer followed. My current approach aligns with the methods outlined in the following ARM documentation links:

While TensorFlow Lite for Microcontrollers employs a zero point and scale factor for quantization, which necessitates additional memory and floating-point operations, it appears that Q-format based quantization would be more suitable for Cortex-M processors given these constraints.

Could you kindly provide a clear explanation for the necessity of this change? The absence of discussion regarding its impact on speed and accuracy has left me somewhat perplexed. Any insight into the rationale behind this decision would be greatly appreciated, as it would aid in understanding the best practices for quantization within the context of TensorFlow Lite for Microcontrollers and CMSIS-NN.

Thank you for your time and consideration.

Create dynamic library for Cortex-M33

Hello,

Is it possible to compile a dynamic library for the Cortex-M33? I changed the CMakeLists.txt file as detailed below, but I still get a static library.

cmake_minimum_required(VERSION 3.15.6)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(BUILD_SHARED_LIBS ON)
project(CMSISNN)
add_library(cmsis-nn SHARED)
target_compile_options(cmsis-nn PRIVATE -Ofast -fPIC)
target_include_directories(cmsis-nn PUBLIC "Include")
add_subdirectory(Source)
set_target_properties(cmsis-nn PROPERTIES
POSITION_INDEPENDENT_CODE ON
)

