CMSIS-NN Library
Home Page: https://arm-software.github.io/CMSIS-NN
License: Apache License 2.0
__CLZ is used in the softmax functions; however, it is not defined for non-Arm compilers in CMSIS-NN.
Compare this to CMSIS-DSP, which defines it by ultimately including dsp/none.h, which contains an inline C function for it (and other intrinsics).
One can work around this by including dsp/none.h in the softmax headers, but that requires both installing CMSIS-DSP and patching this library.
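For reference, a minimal sketch of a portable fallback, modeled on the kind of inline definition dsp/none.h provides (the helper name here is hypothetical):

#include <stdint.h>

/* Software count-leading-zeros; returns 32 for a zero input, matching the
 * Arm CLZ instruction. */
static inline uint8_t software_clz(uint32_t value)
{
    if (value == 0U)
    {
        return 32U;
    }
    uint8_t count = 0U;
    uint32_t mask = 0x80000000U;
    while ((value & mask) == 0U)
    {
        count++;
        mask >>= 1U;
    }
    return count;
}

#ifndef __CLZ
#define __CLZ software_clz
#endif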
Missing check that __ARM_FEATURE_MVE is defined before checking its value
Include/Internal/arm_nn_compiler.h:130
#if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>
#endif
Update to:
#if defined(__ARM_FEATURE_MVE)
#if ((__ARM_FEATURE_MVE & 3) == 3) || (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>
#endif
#endif
Compilation fails with [-Werror,-Wvla]. Buffers of unknown size should not be allocated on the stack, since memory is already allocated in the scratch buffer for that purpose.
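A minimal sketch of the suggested pattern (assumptions: the kernel receives a cmsis_nn_context whose buf member points at caller-allocated scratch memory, and the element type is illustrative):

#include <stdint.h>
#include "arm_nn_types.h"

void kernel_sketch(const cmsis_nn_context *ctx, int32_t runtime_size)
{
    /* int16_t buffer[runtime_size];   <-- VLA, rejected under -Werror=vla */
    int16_t *buffer = (int16_t *)ctx->buf; /* scratch memory sized by the caller */
    (void)buffer;
    (void)runtime_size;
}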
There are a couple of documentation warnings. If you scroll down to the end of https://github.com/ARM-software/CMSIS-NN/pull/78/files, you can see them.
Hi,
As per the readme.md file, I could successfully run the command "cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/supratim/toolchains/ethos-u-core-platform/cmake/toolchain/arm-none-eabi-gcc.cmake -DTARGET_CPU=cortex-m7 -DCMSIS_PATH=/home/supratim/CMSIS-NN".
But when I executed the 'make' command from the build folder, the compilation failed with the following error:
"In file included from /home/supratim/CMSIS-NN/Include/arm_nnfunctions.h:114,
from /home/supratim/CMSIS-NN/Source/ConvolutionFunctions/arm_convolve_s8.c:31:
/home/supratim/CMSIS-NN/Include/arm_nn_math_types.h:93:10: fatal error: cmsis_compiler.h: No such file or directory
93 | #include "cmsis_compiler.h"
I know that cmsis_compiler.h exists in https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/Core/Include, but this is a separate GitHub repo, and ideally one GitHub repo should not depend on another.
Could you please let me know if I am missing anything?
Thanks
Supratim
I would like to point out that identifiers like "_ARM_NNFUNCTIONS_H" and "_UART_STDOUT_H_" do not conform to the naming rules of the C++ language standard, which reserves such identifiers.
Would you consider changing these to unique, non-reserved names?
One of the users of TVM via CMSIS-NN needs support for N>1 in maxpool. Here is the link: https://discuss.tvm.apache.org/t/cmsis-nn-qnn-max-pool2d-layout-on-cortex-m7-without-npu/12712/3. Would it be possible to add support for this?
(1) Can you please explain what exactly arm_nn_requantize() computes?
(2) Also, what is the functionality of arm_nn_activation_s16?
(3) How is the sigmoid lookup table created?
(4) Please explain how vmaxq_s32(acc, vdupq_n_s32(NN_Q15_MIN)) works; it appears in NNSupportFunctions/arm_nn_vec_mat_mul_result_acc_s8.c.
Can you please clear up the above doubts? They would be very useful for me.
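Regarding (4), a scalar sketch of what that line computes (an assumption for illustration only; NN_Q15_MIN is redefined locally so the sketch stands alone): vdupq_n_s32(NN_Q15_MIN) builds a vector with NN_Q15_MIN in every lane, and vmaxq_s32 takes the per-lane maximum, i.e. it clamps each accumulator lane from below.

#include <stdint.h>

#define NN_Q15_MIN (-32768)

static void clamp_lanes_sketch(int32_t acc[4])
{
    for (int lane = 0; lane < 4; lane++)
    {
        if (acc[lane] < NN_Q15_MIN)
        {
            acc[lane] = NN_Q15_MIN; /* per-lane max with NN_Q15_MIN */
        }
    }
}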
Hi, I appreciate your work!
I am trying to build a residual-like NN module as follows:
import tensorflow.keras as keras
KL = keras.layers  # avoids importing the standalone keras package

# The following names were not defined in the original snippet; placeholder
# values are used here so the block is self-contained.
inputH, inputW, inputCh = 32, 32, 3
num_filters = 1
stride = 1
kernel_size = 3
name = "res"
nnInput = keras.Input(
shape=(inputH, inputW, inputCh))
x = KL.Convolution2D(
1,
(3, 3),
strides=(2, 2),
padding='same',
name='conv1',
bias_initializer='glorot_uniform',
activation='relu')(nnInput)
shortcut = x
x = KL.Conv2D(
filters=num_filters,
kernel_size=1,
strides=stride,
name=name + "_1_conv",
activation='relu')(x)
x = KL.Conv2D(
filters=num_filters,
kernel_size=kernel_size,
padding="SAME",
name=name + "_2_conv",
activation='relu')(x)
x = KL.Conv2D(
filters=4 * num_filters,
kernel_size=1,
name=name + "_3_conv")(x)
x = KL.Add(
name=name + "_add")([shortcut, x])
The calculation results of CMSIS-NN and TensorFlow Lite before the elementwise add are all the same. I followed the test case in the repo to build the arm_elementwise_add_s8 call accordingly, but it is still not working. I am wondering: is the function proven to be compatible with TensorFlow Lite, or am I missing something?
Thanks in advance!
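For reference, a minimal sketch of how such a call is typically parameterized (assumptions: the signature matches current CMSIS-NN headers, and every quantization value below is a placeholder to be derived from the model; in TFLite's reference add kernel the input offsets are the negated input zero points, the output offset is the output zero point, and left_shift is 20 for int8):

#include "arm_nnfunctions.h"

void add_branches_sketch(const int8_t *shortcut, const int8_t *conv_out, int8_t *output)
{
    const int32_t block_size = 8 * 8 * 4; /* total elements: H * W * C (placeholder) */

    /* Placeholder quantization parameters: */
    const int32_t input_1_offset = 128; /* -zero_point of the shortcut tensor    */
    const int32_t input_2_offset = 128; /* -zero_point of the conv output tensor */
    const int32_t input_1_mult = 1073741824, input_1_shift = 0;
    const int32_t input_2_mult = 1073741824, input_2_shift = 0;
    const int32_t out_offset = 0;       /* zero_point of the output tensor */
    const int32_t out_mult = 1073741824, out_shift = -20;
    const int32_t left_shift = 20;      /* fixed to 20 for int8 in TFLite */

    (void)arm_elementwise_add_s8(shortcut, conv_out,
                                 input_1_offset, input_1_mult, input_1_shift,
                                 input_2_offset, input_2_mult, input_2_shift,
                                 left_shift, output, out_offset, out_mult, out_shift,
                                 -128, 127, /* int8 activation range */
                                 block_size);
}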
Hi,
I am using this FC s8 CMSIS layer for a project. After we invoke an FC s8 kernel, we have an Int32 output that we need to dequantize-requantize (Int32 -> FP -> Int8). For example, if we have the FC layer in the image, to dequantize-requantize the 32-bit output of the FC layer we need to apply this equation: Output_Int8 = (0.0039215669967234135 * Weight_mult / 0.14308570325374603) * Output_Int32 + 69, i.e. Multiplier * Output_Int32 + offset.
CMSIS-NN has an arm_nn_requantize function that takes an Int32 multiplier and shift as input and returns (val * multiplier)/(2 ^ shift). Can you explain how to map the above FP multiplier and Int32 offset onto the requantize function?
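A minimal sketch of the usual mapping (an assumption mirroring TFLite's QuantizeMultiplier, not a CMSIS-NN API): decompose the real multiplier into a Q31 multiplier and a left(+)/right(-) shift such that real_multiplier ~= multiplier / 2^31 * 2^shift, then pass both to arm_nn_requantize.

#include <math.h>
#include <stdint.h>

static void quantize_multiplier(double real_multiplier, int32_t *multiplier, int32_t *shift)
{
    if (real_multiplier == 0.0)
    {
        *multiplier = 0;
        *shift = 0;
        return;
    }
    int exponent;
    const double significand = frexp(real_multiplier, &exponent); /* in [0.5, 1) */
    int64_t q = (int64_t)llround(significand * (double)(1LL << 31));
    if (q == (1LL << 31)) /* the significand rounded up to 1.0 */
    {
        q /= 2;
        exponent++;
    }
    *multiplier = (int32_t)q;
    *shift = exponent;
}

Here the real multiplier would be 0.0039215669967234135 * Weight_mult / 0.14308570325374603; the constant 69 is the output zero point, added after the requantize call and then clamped to the int8 range.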
Hi,
I found that at the line below:
Regards,
Crist
Hi, thank you for your work!
Is there a long-/short-term operator roadmap, e.g. adding RNNs, transformers, etc.?
GCC version : arm-none-eabi-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712
Reproducer: arm-none-eabi-gcc -mcpu=cortex-m55 -Os -I../../Include/ -mfloat-abi=hard -mfpu=auto -S arm_nn_mat_mult_nt_t_s8.c
The issue is not seen in newer releases or other optimization levels.
The documentation of arm_nn_requantize is:
/**
* @brief Requantize a given value.
* @param[in] val Value to be requantized
* @param[in] multiplier multiplier. Range {NN_Q31_MIN + 1, Q32_MAX}
* @param[in] shift left or right shift for 'val * multiplier'
*
* @return Returns (val * multiplier)/(2 ^ shift)
*
*/
But the CMSIS_NN_USE_SINGLE_ROUNDING path does not compute that; it computes some rounding of (val * multiplier) / (2^(31 - shift)).
Besides, the "left or right shift" part of the doc is not very clear: how does one specify the direction of the shift?
It seems that it is based on the sign of shift, with positive shifts being left shifts and negative shifts being right shifts.
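For concreteness, a reference sketch (an assumption, not library code) of what the single-rounding path effectively computes when 31 - shift >= 1, with positive shift meaning a left shift:

#include <stdint.h>

static int32_t requantize_ref(int32_t val, int32_t multiplier, int32_t shift)
{
    const int64_t prod = (int64_t)val * multiplier;
    const int64_t total_shift = 31 - shift;
    /* round-to-nearest right shift: add half the divisor before shifting */
    return (int32_t)((prod + ((int64_t)1 << (total_shift - 1))) >> total_shift);
}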
There are some compile errors with IAR:
__STATIC_FORCEINLINE void arm_memcpy_s8(int8_t *__RESTRICT dst, const int8_t *__RESTRICT src, uint32_t block_size)
^
"Include/arm_nnsupportfunctions.h",953 Error[Pe018]:
expected a ")"
__STATIC_FORCEINLINE int32x4_t arm_doubling_high_mult_mve(const int32x4_t m1, const int32_t m2)
^
"Include/arm_nnsupportfunctions.h",990 Error[Pe020]:
identifier "int32x4_t" is undefined
Processor: Cortex-M55
Error message:
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c: In function 'arm_nn_mat_mul_core_4x_s8':
/(my project path)/cmsis-nn/Include/Internal/arm_nn_compiler.h:97:23: error: 'asm' operand has impossible constraints
97 | #define __ASM __asm
| ^~~~~
cmsis-nn/Source/NNSupportFunctions/arm_nn_mat_mul_core_4x_s8.c:84:9: note: in expansion of macro '__ASM'
84 | __ASM volatile(" .p2align 2 \n"
| ^~~~~
Why is this happening, and how can I fix it?
Any help is appreciated!
GCC cannot build all unit tests with -O0.
The condition to use arm_depthwise_conv_fast_s16() reads as
if (dw_conv_params->ch_mult == 1 && dw_conv_params->dilation.w == 1 && dw_conv_params->dilation.h == 1 &&
filter_dims->w * filter_dims->h * input_dims->c < 512)
The 'filter_dims->w * filter_dims->h * input_dims->c' term should be filter_dims->w * filter_dims->h, as DW conv is a layer-based op.
The consequence of this is that the optimization is missed at times, resulting in slower performance than is achievable. This is not an accuracy error. A sketch of the corrected condition is shown below.
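The corrected condition, as proposed in this report (an assumption; not necessarily the upstream fix):

if (dw_conv_params->ch_mult == 1 && dw_conv_params->dilation.w == 1 && dw_conv_params->dilation.h == 1 &&
    filter_dims->w * filter_dims->h < 512)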
The README claims to support Conv2D Int16 with MVE, but it is actually missing.
https://github.com/ARM-software/CMSIS-NN/blob/main/README.md#mve-extension
In the CMSIS_NN_USE_SINGLE_ROUNDING ifdef branch, there is UB when shift >= 31, as total_shift - 1 becomes negative.
Such a shift can make sense in some contexts, e.g. when using a requantize call to also left-align a quantized int8 value.
__STATIC_FORCEINLINE int32_t arm_nn_requantize(const int32_t val, const int32_t multiplier, const int32_t shift)
{
#ifdef CMSIS_NN_USE_SINGLE_ROUNDING
const int64_t total_shift = 31 - shift;
const int64_t new_val = val * (int64_t)multiplier;
int32_t result = new_val >> (total_shift - 1); // <-- Here is the problematic line
result = (result + 1) >> 1;
return result;
#else
return arm_nn_divide_by_power_of_two(arm_nn_doubling_high_mult_no_sat(val * (1 << LEFT_SHIFT(shift)), multiplier),
RIGHT_SHIFT(shift));
#endif
}
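A minimal sketch of one possible guard (an assumption, not the library's fix): treat shift >= 31 as a pure left shift so the rounding shift amount never goes negative.

#include <stdint.h>

static int32_t requantize_single_rounding_guarded(int32_t val, int32_t multiplier, int32_t shift)
{
    const int64_t new_val = (int64_t)val * multiplier;
    const int64_t total_shift = 31 - shift;

    if (total_shift < 1)
    {
        /* shift >= 31: no rounding bit can be added; shift left instead */
        return (int32_t)(new_val << -total_shift);
    }

    int32_t result = (int32_t)(new_val >> (total_shift - 1));
    result = (result + 1) >> 1; /* round to nearest */
    return result;
}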
Hi,
I am trying to figure out how to use an FVP as a test platform. I successfully built the tests as the tutorial shows. But when I try to switch to the Corstone SSE-310 FVP to test how my code performs on the M85, it fails. It seems the code for Corstone-300 is hardware-specific. How can I generate builds for other FVPs?
Thanks!
Is there any tutorial? I am confused about the CMSIS-NN transform of nn.lstm, nn.layernorm, nn.prelu, etc.
Hi,
I tried to build CMSIS-NN
for an x86-64 host CPU and had some trouble compiling it with GCC 9.4.
Finally I ended up with:
cmake -D CMSIS_PATH="../../CMSIS_5" -DCMAKE_C_FLAGS:STRING="-D__GNUC_PYTHON__ -D__RESTRICT=__restrict" ..
Is this the recommended way to do it?
I had to specify __RESTRICT, otherwise I ran into:
/../cmsis/CMSIS-NN/Include/arm_nnsupportfunctions.h:967:62: error: expected ‘;’, ‘,’ or ‘)’ before ‘dst’
967 | __STATIC_FORCEINLINE void arm_memcpy_q15(int16_t *__RESTRICT dst, const int16_t *__RESTRICT src, uint32_t block_size)
Best regards.
I am using the Arm GCC from the official website: https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build arm-12-mpacbti.16)) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
When I try to compile following the steps from https://github.com/ARM-software/CMSIS-NN#building-cmsis-nn-as-a-library,
I get the following compilation error:
[ 60%] Building C object CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj
during RTL pass: combine
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c: In function 'arm_exp_on_negative_values_mve_32x4':
/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c:74:1: internal compiler error: in trunc_int_for_mode, at explow.cc:59
74 | }
| ^
0x7f2a15836d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7f2a15836e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.linaro.org/> for instructions.
make[2]: *** [CMakeFiles/cmsis-nn.dir/build.make:580: CMakeFiles/cmsis-nn.dir/Source/SoftmaxFunctions/arm_softmax_s8.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:295: CMakeFiles/cmsis-nn.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
It seems like an issue with MVE support. I will need to use the latest GCC, as my target is also the M85, which earlier GCC versions don't support.
Besides TFLM, are there any tutorials, especially example code, for learning how to use the arm_lstm_unidirectional_s16_s8 function? It is quite complicated.
I am trying to incorporate CMSIS-NN into my project. However, I see this error message:
.\objects\myProject.axf: Error: L6218E: Undefined symbol arm_s8_to_s16_unordered_with_offset (referred from arm_convolve_s8.o).
I traced the function in CMSIS-NN and found that the source file arm_s8_to_s16_unordered_with_offset.c is in the
directory C:\Users\XXX\AppData\Local\Arm\Packs\ARM\CMSIS-NN\4.1.0\Source\NNSupportFunctions, but the pack didn't
include the file.
Did the pack accidentally omit it? How can I add it back into the pack?
Best Regards,
/Tony
We think there is a bug in arm_depthwise_conv_wrapper_s8_get_buffer_size for the cases
where arm_depthwise_conv_wrapper_s8 calls arm_depthwise_conv_3x3_s8.
arm_depthwise_conv_3x3_s8 has a buffer size of zero, but arm_depthwise_conv_wrapper_s8_get_buffer_size
does not special-case the condition
(filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) && (dw_conv_params->padding.w <= 1).
It should return 0 then.
Please compare the two functions.
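A sketch of the special case the wrapper's get_buffer_size could add (an assumption mirroring the dispatch described above; the real wrapper checks further conditions such as ch_mult and dilation):

#include "arm_nnfunctions.h"

int32_t dw_wrapper_buffer_size_sketch(const cmsis_nn_dw_conv_params *dw_conv_params,
                                      const cmsis_nn_dims *input_dims,
                                      const cmsis_nn_dims *filter_dims)
{
    if ((filter_dims->w == 3) && (filter_dims->h == 3) && (dw_conv_params->padding.h <= 1) &&
        (dw_conv_params->padding.w <= 1))
    {
        return 0; /* arm_depthwise_conv_3x3_s8 uses no scratch buffer */
    }
    /* Otherwise fall back to an existing calculation, e.g. the s8_opt path: */
    return arm_depthwise_conv_s8_opt_get_buffer_size(input_dims, filter_dims);
}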
I wanted to use this but don't know where to get started. I want to get familiar with the workflow of how CMSIS-NN works, with some easy-to-use example like the sine-wave predictor that the TensorFlow Lite library ships with for Arduino, etc.
Is there a getting-started guide for this?
Do I have to train the model on the device, or can I convert a TensorFlow-trained and quantized model with this? Can I run some simple NN example on an STM32F103C8 Blue Pill board? No matter how small the example is, I just want to get used to the workflow.
Thanks.
Hello.
I am currently in the process of developing using the Q-Format (Qm.n) for quantization. However, upon reviewing the revision history, I noticed that starting from version 4.1.0, the q-format approach is no longer being followed. My current approach aligns with the methods outlined in the following ARM documentation links:
While TensorFlow Lite for Microcontrollers employs Zero Point and Scale Factor for quantization, which necessitates additional memory and floating-point operations, it appears that Q-format based quantization would be more suitable for Cortex-M processors due to these constraints.
Could you kindly provide a clear explanation for the necessity of this change? The absence of discussion regarding its impact on speed and accuracy has left me somewhat perplexed. Any insight into the rationale behind this decision would be greatly appreciated, as it would aid in understanding the best practices for quantization within the context of TensorFlow Lite for Microcontrollers and CMSIS-NN.
Thank you for your time and consideration.
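For comparison, a minimal illustration of the two schemes being discussed (an assumption, for illustration only, using the standard formulas):

#include <stdint.h>

/* TFLM affine quantization: real = scale * (q - zero_point) */
static float dequant_affine(int8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}

/* Q-format (Qm.n): real = q / 2^n, with no per-tensor zero point or float scale */
static float dequant_qformat(int16_t q, int32_t n)
{
    return (float)q / (float)(1 << n);
}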
Some corner cases of conv 1_x_n fall into this condition:
Hello,
Is it possible to compile a dynamic library for the Cortex-M33? I changed the CMakeLists.txt file as detailed below, but I still get a static library.
cmake_minimum_required(VERSION 3.15.6)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(BUILD_SHARED_LIBS ON)
project(CMSISNN)
add_library(cmsis-nn SHARED)
target_compile_options(cmsis-nn PRIVATE -Ofast -fPIC)
target_include_directories(cmsis-nn PUBLIC "Include")
add_subdirectory(Source)
set_target_properties(cmsis-nn PROPERTIES
POSITION_INDEPENDENT_CODE ON
)
The new LSTM operator implementation is not bit-exact with TFLM when running a network with lhs_offset = 0 using MVE.