mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory

Home Page: https://mcunet.mit.edu

License: MIT License

Languages: C 88.46%, Python 2.62%, Shell 0.01%, Makefile 5.63%, HTML 2.10%, C++ 0.95%, Assembly 0.21%, Starlark 0.01%, Cuda 0.01%
Topics: c, codegenerator, cpp, deep-learning, microcontroller, pytorch, tinyml, edge-computing, neural-architecture-search, quantization


TinyEngine

This is the official implementation of TinyEngine, a memory-efficient and high-performance neural network library for microcontrollers. TinyEngine is part of MCUNet, which also consists of TinyNAS. MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. TinyEngine and TinyNAS are co-designed to fit the tight memory budgets.

The MCUNet and TinyNAS repo is here.

[demo GIF]

[demo_v3 GIF]

News

If you are interested in getting updates, please sign up here to get notified!

Overview

Microcontrollers are low-cost, low-power hardware. They are widely deployed across a broad range of applications, but their tight memory budget (50,000x smaller than GPUs) makes deep learning deployment difficult.

MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. It consists of TinyNAS and TinyEngine. They are co-designed to fit the tight memory budgets. With system-algorithm co-design, we can significantly improve the deep learning performance on the same tiny memory budget.

[overview figure]

Specifically, TinyEngine is a memory-efficient inference library. TinyEngine adapts the memory scheduling to the overall network topology rather than optimizing layer by layer, reducing memory usage and accelerating inference. It outperforms existing inference libraries such as TF-Lite Micro from Google, CMSIS-NN from Arm, and X-CUBE-AI from STMicroelectronics.

TinyEngine adopts the following optimization techniques to accelerate inference speed and minimize memory footprint.

  • In-place depth-wise convolution: A unique data placement technique for depth-wise convolution that overwrites input data with intermediate/output data to reduce peak SRAM memory (see the sketch after the figure below).
  • Patch-based inference: A generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory.
  • Operator fusion: A method that improves performance by merging one operator into a different operator so that they are executed together without a roundtrip to memory.
  • SIMD (Single instruction, multiple data) programming: A computing method that performs the same operation on multiple data points simultaneously.
  • HWC to CHW weight format transformation: A weight format transformation technique that increases the cache hit ratio for in-place depth-wise convolution.
  • Image to Column (Im2col) convolution: An implementation technique that computes the convolution operation using general matrix multiplication (GEMM) operations.
  • Loop reordering: A loop transformation technique that improves execution speed by reordering/interchanging the sequence of loops.
  • Loop unrolling: A loop transformation technique that improves execution speed at the expense of binary size, an approach known as the space-time tradeoff.
  • Loop tiling: A loop transformation technique that reduces memory access latency by partitioning a loop's iteration space into smaller chunks or blocks, so that data used in a loop stays in the cache until it is reused (see the sketch after this list).
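
To make the im2col and loop-tiling items concrete, here is a minimal hand-written sketch. It is illustrative only, not TinyEngine's actual kernels; the function names and the TILE size are assumptions.

#include <stdint.h>

#define TILE 4  /* block size; an assumption, tuned to the target's SRAM/cache */

/* im2col: copy each k x k receptive field of an HWC int8 input into one row
 * of a patch matrix, turning convolution into a matrix multiplication. */
void im2col(const int8_t *in, int h, int w, int ch, int k, int8_t *cols)
{
    int row = 0;
    for (int y = 0; y + k <= h; ++y)
        for (int x = 0; x + k <= w; ++x, ++row)
            for (int dy = 0; dy < k; ++dy)
                for (int dx = 0; dx < k; ++dx)
                    for (int c = 0; c < ch; ++c)
                        cols[(row * k * k + dy * k + dx) * ch + c] =
                            in[((y + dy) * w + (x + dx)) * ch + c];
}

/* Loop-tiled GEMM, C[M][N] += A[M][K] * B[K][N]: the outer loops walk
 * TILE-sized blocks so each block stays resident in cache/SRAM while it
 * is reused. */
void gemm_tiled(const int8_t *A, const int8_t *B, int32_t *C,
                int M, int N, int K)
{
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE)
                for (int i = i0; i < i0 + TILE && i < M; ++i)
                    for (int j = j0; j < j0 + TILE && j < N; ++j) {
                        int32_t acc = C[i * N + j];
                        for (int kk = k0; kk < k0 + TILE && kk < K; ++kk)
                            acc += (int32_t)A[i * K + kk] * B[kk * N + j];
                        C[i * N + j] = acc;
                    }
}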

[inplace_depthwise figure]
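
A minimal sketch of the in-place idea, assuming a CHW int8 layout, a 3x3 kernel, stride 1, and zero "same" padding; requantization is reduced to a simple clamp, and this is not the actual TinyEngine kernel. Because each output channel of a depth-wise convolution depends only on the matching input channel, one channel's result can be computed into a small scratch buffer and written back over that input channel, so no second full-size activation buffer is needed.

#include <stdint.h>
#include <string.h>

void depthwise3x3_inplace(int8_t *data, int channels, int h, int w,
                          const int8_t *weights,  /* [channels][9] */
                          int8_t *chan_buf)       /* h*w scratch bytes */
{
    for (int c = 0; c < channels; ++c) {
        const int8_t *in = data + c * h * w;
        const int8_t *k = weights + c * 9;
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                int32_t acc = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w)
                            acc += (int32_t)in[yy * w + xx] *
                                   k[(dy + 1) * 3 + (dx + 1)];
                    }
                if (acc > 127) acc = 127;    /* stand-in for requantization */
                if (acc < -128) acc = -128;
                chan_buf[y * w + x] = (int8_t)acc;
            }
        }
        /* overwrite the input channel with its output: peak memory is one
         * activation tensor plus a single channel-sized buffer */
        memcpy(data + c * h * w, chan_buf, (size_t)(h * w));
    }
}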

By adopting the above optimization techniques, TinyEngine not only enhances inference speed but also reduces peak memory, as shown in the figures below.

MAC/s improvement breakdown: [mac_result figure]

Peak memory reduction: [peakmem_result figure]

To sum up, our TinyEngine inference engine could be a useful infrastructure for MCU-based AI applications. Compared to existing libraries like TF-Lite Micro, CMSIS-NN, and X-CUBE-AI, it improves inference speed by 1.1-18.6x and reduces peak memory by 1.3-3.6x.

[measured_result figure]

Save Memory with Patch-based Inference: We can dramatically reduce the inference peak memory by using patch-based inference for the memory-intensive stage of CNNs. [measured_result figure]

For MobileNetV2, using patch-based inference allows us to reduce the peak memory by 8x. [measured_result figure]

With patch-based inference, TinyEngine achieves higher accuracy at the same memory budget. [measured_result figure]
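
Conceptually, the patch-based schedule looks like the sketch below. It is illustrative only, not TinyEngine's generated code; run_initial_stage_on_patch and run_rest_of_network are hypothetical helpers standing in for the generated per-layer kernels.

#include <stdint.h>

/* Hypothetical helpers standing in for the generated kernels. */
void run_initial_stage_on_patch(const int8_t *input, int patch_y, int patch_x,
                                int8_t *stage_output);
void run_rest_of_network(int8_t *stage_output);

/* Per-patch scheduling of the memory-intensive initial stage: only a
 * patch-sized slice of the large early feature maps is alive at a time. */
void patch_based_inference(const int8_t *input, int8_t *stage_output,
                           int num_patches_y, int num_patches_x)
{
    for (int py = 0; py < num_patches_y; ++py)
        for (int px = 0; px < num_patches_x; ++px)
            /* each output patch is computed from its receptive field in
             * the input, including halo overlap with neighboring patches */
            run_initial_stage_on_patch(input, py, px, stage_output);

    /* the remaining layers see much smaller feature maps and run
     * layer-by-layer as usual */
    run_rest_of_network(stage_output);
}

The halo overlap between neighboring patches causes some recomputation, which MCUNetV2 mitigates by redistributing the receptive field across the network.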

Code Structure

code_generator contains a Python library that is used to compile neural networks into low-level source code (C/C++).

TinyEngine contains a C/C++ library that implements operators and performs inference on Microcontrollers.

examples contains the examples of transforming TFLite models into our TinyEngine models; a sample command follows below.
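
For instance, the VWW example referenced throughout the issues below compiles its TFLite model into C source roughly as follows (a sketch; flags and script options may vary by version):

python examples/vww.py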

tutorial contains the demo tutorial (of inference and training) of deploying a visual wake words (VWW) model onto microcontrollers.

assets contains misc assets.

Requirement

  • Python 3.6+
  • STM32CubeIDE 1.5+

Setup for Users

First, clone this repository:

git clone --recursive https://github.com/mit-han-lab/tinyengine.git

(Optional) Using a virtual environment with conda is recommended.

conda create -n tinyengine python=3.6 pip
conda activate tinyengine

Install dependencies:

pip install -r requirements.txt

Setup for Developers

Install pre-commit hooks to automatically format changes in your code.

pre-commit install

Deployment Example

Please see tutorial to learn how to deploy a visual wake words (VWW) model onto microcontrollers using TinyEngine. We include both the inference demo and the training demo in the tutorial, so please take a look!

Measured Results

  • All the tflite models are from the Model Zoo in the MCUNet repo. Please see the MCUNet repo to learn how to build the pre-trained int8 quantized models in TF-Lite format.
  • All the latency, peak memory (SRAM), and flash memory usage results are profiled on an STM32H743, which is limited to 512 KB peak memory and 2 MB storage.
  • Note that we measure the newer versions of the libraries in this repo, so the results here might differ from the ones in the MCUNet papers.
  • For each inference library, we use the git commit ID to indicate the version.
  • All the tflite models are compiled with the -Ofast optimization level in STM32CubeIDE.
  • OOM denotes Out Of Memory.
  • Measurement for X-CUBE-AI v7.3.0 was conducted with the default compilation setting (balanced mode).

The latency results:

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| # mcunet models (VWW) | | | | |
| mcunet-vww0 | 587ms | 53ms | 32ms | 27ms |
| mcunet-vww1 | 1120ms | 97ms | 57ms | 51ms |
| mcunet-vww2 | 5310ms | 478ms | 269ms | 234ms |
| # mcunet models (ImageNet) | | | | |
| mcunet-in0 | 586ms | 51ms | 35ms | 25ms |
| mcunet-in1 | 1227ms | 103ms | 63ms | 56ms |
| mcunet-in2 | 6463ms | 642ms | 351ms | 280ms |
| mcunet-in3 | 7821ms | 770ms | 414ms | 336ms |
| mcunet-in4 | OOM | OOM | 516ms | 463ms |
| # baseline models | | | | |
| proxyless-w0.3-r64 | 512ms | 54ms | 35ms | 23ms |
| proxyless-w0.3-r176 | 3801ms | 380ms | 205ms | 176ms |
| mbv2-w0.3-r64 | 467ms | 43ms | 29ms | 23ms |

The peak memory (SRAM) results:

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| # mcunet models (VWW) | | | | |
| mcunet-vww0 | 163kB | 163kB | 88kB | 59kB |
| mcunet-vww1 | 220kB | 220kB | 113kB | 92kB |
| mcunet-vww2 | 385kB | 390kB | 201kB | 174kB |
| # mcunet models (ImageNet) | | | | |
| mcunet-in0 | 161kB | 161kB | 69kB | 49kB |
| mcunet-in1 | 219kB | 219kB | 106kB | 96kB |
| mcunet-in2 | 460kB | 469kB | 238kB | 215kB |
| mcunet-in3 | 493kB | 493kB | 243kB | 260kB |
| mcunet-in4 | OOM | OOM | 342kB | 416kB |
| # baseline models | | | | |
| proxyless-w0.3-r64 | 128kB | 136kB | 97kB | 35kB |
| proxyless-w0.3-r176 | 453kB | 453kB | 221kB | 259kB |
| mbv2-w0.3-r64 | 173kB | 173kB | 88kB | 61kB |

The Flash memory usage results:

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| # mcunet models (VWW) | | | | |
| mcunet-vww0 | 627kB | 646kB | 463kB | 453kB |
| mcunet-vww1 | 718kB | 736kB | 534kB | 521kB |
| mcunet-vww2 | 1016kB | 1034kB | 774kB | 741kB |
| # mcunet models (ImageNet) | | | | |
| mcunet-in0 | 1072kB | 1090kB | 856kB | 842kB |
| mcunet-in1 | 937kB | 956kB | 737kB | 727kB |
| mcunet-in2 | 1084kB | 1102kB | 849kB | 830kB |
| mcunet-in3 | 1091kB | 1106kB | 867kB | 835kB |
| mcunet-in4 | OOM | OOM | 1843kB | 1825kB |
| # baseline models | | | | |
| proxyless-w0.3-r64 | 1065kB | 1084kB | 865kB | 777kB |
| proxyless-w0.3-r176 | 1065kB | 1084kB | 865kB | 779kB |
| mbv2-w0.3-r64 | 940kB | 959kB | 768kB | 690kB |

Citation

If you find the project helpful, please consider citing our paper:

@article{
  lin2020mcunet,
  title={Mcunet: Tiny deep learning on iot devices},
  author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@inproceedings{
  lin2021mcunetv2,
  title={MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning},
  author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

@inproceedings{
  lin2022ondevice,
  title = {On-Device Training Under 256KB Memory},
  author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year = {2022}
}

Related Projects

MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS'20)

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning (NeurIPS'21)

MCUNetV3: On-Device Training Under 256KB Memory (NeurIPS'22)

tinyengine's People

Contributors

lyken17, meenchen, nixward, raymondwang0, tonylins, zerkclown


tinyengine's Issues

STM32CubeIDE version

Hello, I tried the vww example with the stm32f746-disco board. It works well in STM32CubeIDE version 1.5.0 as written, but it does not run in 1.11.0. Is there somewhere I can find the reason and a solution?

.patch file to build it for other boards

Hey, @meenchen

For person_detection example, the "openmv_person_detection.patch" file handles everything for OpenMV H7 only.

For example, for the section:

diff --git a/src/omv/boards/OPENMV4/omv_boardconfig.h b/src/omv/boards/OPENMV4/omv_boardconfig.h
index 412de472..f7da2c03 100644
--- a/src/omv/boards/OPENMV4/omv_boardconfig.h
+++ b/src/omv/boards/OPENMV4/omv_boardconfig.h
@@ -150,16 +150,18 @@
 // The maximum available fb_alloc memory = FB_ALLOC_SIZE + FB_SIZE - (w*h*bpp).
 #define OMV_FFS_MEMORY          DTCM        // Flash filesystem cache memory
 #define OMV_MAIN_MEMORY         SRAM1       // data, bss and heap memory
+#define OMV_MAIN_MEMORY2        SRAM5       // my memory
 #define OMV_STACK_MEMORY        ITCM        // stack memory
 #define OMV_DMA_MEMORY          SRAM2       // DMA buffers memory.
 #define OMV_FB_MEMORY           AXI_SRAM    // Framebuffer, fb_alloc
 #define OMV_JPEG_MEMORY         SRAM3       // JPEG buffer memory.
 #define OMV_VOSPI_MEMORY        SRAM4       // VoSPI buffer memory.
 
-#define OMV_FB_SIZE             (400K)      // FB memory: header + VGA/GS image
-#define OMV_FB_ALLOC_SIZE       (100K)      // minimum fb alloc size
+#define OMV_FB_SIZE             (100K)      // defualt: 400 FB memory: header + VGA/GS image
+#define OMV_FB_ALLOC_SIZE       (50K)      // default: 100 minimum fb alloc size
 #define OMV_STACK_SIZE          (64K)
-#define OMV_HEAP_SIZE           (236K)
+#define OMV_HEAP_SIZE           (136K)
+// #define OMV_HEAP_SIZE           (236K)
 
 #define OMV_LINE_BUF_SIZE       (3 * 1024)  // Image line buffer round(640 * 2BPP * 2 buffers).
 #define OMV_MSC_BUF_SIZE        (2K)        // USB MSC bot data

This is the copied version of OPENMV4. That is why changing the "OPENMV4" parts to "OPENMV4P" is not enough to successfully build it. Further changes are required, since "OPENMV4P/omv_boardconfig.h" is completely different from "OPENMV4/omv_boardconfig.h".

Potential changes are tough to guess, so it would be excellent to have some information about how to modify this .patch file. Besides, if any further changes are needed, I would appreciate it if you could mention them as well.

Thanks in advance.

About SE Block in tinyengine codes.

Hi, I have some questions about TinyEngine's code.

While looking through TinyEngine's code, I found code in TfliteConvertor.py that handles the SE Block.
The code includes a comment that reads as follows:

#         -> MEAN -> MEAN -> PWCONV -> PWCONV -> | ADD -> MUL ->     |
#  DWCONV                                        |            -> MUL |
#

here are my questions:

  1. Is it correct that SE Block means Squeeze & Excitation module?

  2. If so, does "ADD -> MUL -> MUL" refer to the h-swish activation function that replaces sigmoid?

Thank you

get_kernel_buffer undefined

[error screenshot]

Thanks for your great work. When following the training tutorial, the C files 'convolve_1x1_s8_kbuf.c' and 'convolve_1x1_s8_skip_pad.c' in int_forward_op use the functions 'get_kernel_buffer'/'get_sbuffer_size' internally, and these functions are undefined. May I ask where they are defined, or have I done something wrong? I would appreciate it if you could provide some help.

Conversion of FC Layers and Conv Layers

Thanks for the great work. Unfortunately, as of now the library only works for mcunet models; support for custom models is not completely implemented. For example, when converting a model with fully connected layers, the code generates a Conv operation:
_convert_FULLY_CONNECTED, located in TfliteConvertor.py, returns the wrong operation.
def _convert_FULLY_CONNECTED(self, op): ....... op = conv2d.Conv2d(params) return op
Also, codegen only generates depthwise convolution header files with floating-point quantization, meaning that the genModel.c file might contain an operation which is not yet defined. For instance, the convolve_x_y_z_fpreq.h file is not generated.
To resolve these issues, I think fc.py needs to be implemented inside the operators folder, and code templates need to be implemented for the convolution as well as the fully connected layers. Are my observations correct? Are you planning to implement the missing files? If I decide to implement them myself, where should I start?

Up-to-date ProxylessNAS models?

We're trying to run NAS ourselves using OFA, but you have not open-sourced up-to-date ProxylessNAS models used in the mcunet search. These would be helpful for us to re-create your results and use them in our projects. Are there plans to do this?

Thanks!

patch inference

Thanks for your excellent work! It's very useful for porting AI models to low-power edge devices. I'm very interested in the patch-based inference method, but I can't find any more information about it in the codebase. Will you provide this code?

Cannot run Codegen to generate code for other models

I was trying to deploy a model with a different input shape to the STM32 board, but running this command raises NotImplementedError:

python examples/tiny_training.py -f full_bp-1x3x128x128-graph.json -D full_bp-1x3x128x128-params.pkl -QAS scale.json -m -g -d -FR

Where scale.json comes from img1 (highlighted),
and both full_bp-1x3x128x128-graph.json & full_bp-1x3x128x128-params.pkl come from img2 (highlighted).

These 3 files were generated according to the tiny_training repo's compilation/readme.md.

Any thoughts on this issue?

[img1]
[img2]

Different inference result on my own model using TinyEngine compare to python

Hi, @meenchen. Thanks for your great work. As the title says, when I implemented my own task in STM32CubeIDE and checked the network inference results, I found that the results show some bias compared to the results of running the TFLite model in Python, and this happens especially in the deeper network layers. I would like to ask whether these biases are caused by slight differences between the ops in TinyEngine and the ops in TFLite? Or have you ever encountered this problem? I would appreciate it if you could provide some help. The device I am using is the STM32F746G-DISCO, and my TensorFlow version is 2.11.0.

Is it possible to use a better resolution than QQVGA?

Hey, @meenchen

Like you have written in the person_detection_demo script,

sensor.set_framesize(sensor.LCD)  # Set frame size to QVGA 160x128

we made our inference on QQVGA (160x128):

sensor.set_framesize(sensor.QQVGA)

(since we use the frame buffer instead of an LCD). It worked very nicely. However, when I tested with higher resolutions, no detections occurred. I was wondering if it is possible to use a higher resolution with this engine. Thanks a lot in advance.

mcunet model with cmsis-nn

Hi @meenchen @RaymondWang0, I couldn't find any tutorial or demo for running mcunet models with CMSIS-NN. Can you please point me to the page, or let me know how to generate an mcunet model to deploy on CMSIS-NN functions? Or can I use the same model with similar CMSIS-NN functions?

inference tutorial error

14:35:33 **** Incremental Build of configuration Debug for project TinyEngine_vww_tutorial ****
make -j7 all
arm-none-eabi-g++ -o "TinyEngine_vww_tutorial.elf" @"objects.list" -mcpu=cortex-m7 -T"../STM32F746NGHx_FLASH.ld" --specs=nosys.specs -Wl,-Map="TinyEngine_vww_tutorial.map" -Wl,--gc-sections -static -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -lstdc++ -lsupc++ -Wl,--end-group
/Applications/STM32CubeIDE.app/Contents/Eclipse/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.7-2018-q2-update.macos64_1.5.0.202011040924/tools/bin/../lib/gcc/arm-none-eabi/7.3.1/../../../../arm-none-eabi/bin/ld:../STM32F746NGHx_FLASH.ld:163: warning: memory region `DTCMRAM' not declared
Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.o: In function `convolve_1x1_s8_ch16_fpreq':
/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:61: undefined reference to `write_q15x2_ia'
/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:61: undefined reference to `write_q15x2_ia'
/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:62: undefined reference to `write_q15x2_ia'
/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:62: undefined reference to `write_q15x2_ia'
/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:88: undefined reference to `write_q15x2_ia'
Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.o:/Users/karl/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/src/kernels/fp_requantize_op/convolve_1x1_s8_ch16_fpreq.c:88: more undefined references to `write_q15x2_ia' follow
collect2: error: ld returned 1 exit status
make: *** [makefile:88: TinyEngine_vww_tutorial.elf] Error 1
"make -j7 all" terminated with exit code 2. Build might be incomplete.
14:35:34 Build Failed. 7 errors, 1 warnings. (took 468ms)

I followed your tutorial, using macOS, Python 3.6, and STM32CubeIDE 1.5.0, but got this error. Please help me fix it. Thank you very much!

screen too dark

Hi, thanks for your work.
I tried the tutorial without a camera, and the MCU board is the same as yours. It can detect whether there is a person in the picture, but the screen is too dark. What can I adjust to make the screen brighter?

IDE Compilation Error

I am currently following the inference tutorial and attempting to build the VWW demo (step 3 of the tutorial). STM32CubeIDE seems to be unable to fully compile the project. Any suggestions as to where the source of the error might lie?

Is it in the Makefile? Or could the error have been caused all the way back in the code generation step? Am I including the wrong version of MCUNet?

[error screenshot]

Problem when doing inference tutorial

Hi team,

This is quite an amateur question but I'm doing the Inference Tutorial up to step 2 (created a new directory and moved libraries).

So up to this point, I just want to run the empty int main(void) to make sure the libraries are loaded successfully, and use another model in the future. I:

  • have defined includePath in c_cpp_properties.json of C/C++ Intellisense extension the same as the tutorial's (Image 1)
  • gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)
  • ran main.cpp with C/C++: g++ build active file
  • am using VSCode with the C/C++ Intellisense extension. Is this a good choice? Or what do you recommend?

Still, in main.cpp, an error appears on the first line, #include "main.h", saying fatal error: main.h: No such file or directory, even though main.h is in Inc and included in includePath. What should I do to resolve this issue?

[Image 1]
[Image 2]

Appreciate any support you can provide :) Please ask me if you need any clarification.
Rodo

Using TinyEngine with TensorFlow Lite custom models & different/custom datasets.

Is it possible to capture/improve the performance (in terms of accuracy and peak memory usage) of a custom, already-trained tflite model (converted from an originally simple Keras model) using TinyEngine, compared to the plain TensorFlow Lite implementation of the same model? Also, do I need to add any extra functionality to the existing code base in order to evaluate the model against my own dataset (dataset form: training & validation sets as numpy arrays, a classification problem with 4 classes)?

Any suggestion/guidance on how to conduct the performance analysis described above using the TinyEngine inference library would be deeply appreciated, given that my model only uses compatible TinyEngine operators (i.e., neural net layers).
-Antonios.
p.s. Novice fan/user of TinyEngine.

KWS Model availability

Hi @meenchen, is there a KWS model and its source code available, as mentioned in Paper1? If yes, can you please provide it? If not, is there any reason why the model is not available, and could you let us know when it can be made available?

Would you mind uploading the in-place depthwise kernel files on GitHub?

Hi. Thank you for your contribution.

I generated C++ code for the MobileNet V2 (net id: mbv2-320kB) model with the patch-based option using TinyEngine.

When I ran an end-to-end test in STM32CubeIDE, it failed with the error message: undefined reference to `depthwise_kernel7x7_stride2_inplace_CHW_fpreq'.

I spent a lot of time looking for this file, but I couldn't find it.

Would you mind uploading these files on GitHub?

Error message:

11:21:03 **** Incremental Build of configuration Debug for project mbv2_patch_1226 ****
make -j8 all
arm-none-eabi-g++ -o "mbv2_patch_1226.elf" @"objects.list" -mcpu=cortex-m7 -T"../STM32F746NGHx_FLASH.ld" --specs=nosys.specs -Wl,-Map="mbv2_patch_1226.map" -Wl,--gc-sections -static -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -lstdc++ -lsupc++ -Wl,--end-group
c:\st\stm32cubeide_1.5.0\stm32cubeide\plugins\com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.7-2018-q2-update.win32_1.5.0.202011040924\tools\arm-none-eabi\bin\ld.exe:../STM32F746NGHx_FLASH.ld:163: warning: memory region `DTCMRAM' not declared
Src/TinyEngine/codegen/Source/genModel.o: In function `invoke':
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:62: undefined reference to `depthwise_kernel7x7_stride2_inplace_CHW_fpreq'
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:76: undefined reference to `depthwise_kernel5x5_stride1_inplace_CHW_fpreq'
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:84: undefined reference to `depthwise_kernel7x7_stride2_inplace_CHW_fpreq'
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:90: undefined reference to `depthwise_kernel7x7_stride1_inplace_CHW_fpreq'
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:112: undefined reference to `depthwise_kernel5x5_stride2_inplace_CHW_fpreq'
/Users/raymondwang/STM32CubeIDE/workspace_1.5.0/TinyEngine_vww_tutorial/Debug/../Src/TinyEngine/codegen/Source/genModel.c:134: undefined reference to `depthwise_kernel7x7_stride1_inplace_CHW_fpreq'
collect2.exe: error: ld returned 1 exit status
make: *** [makefile:88: mbv2_patch_1226.elf] Error 1
"make -j8 all" terminated with exit code 2. Build might be incomplete.

11:21:05 Build Failed. 7 errors, 1 warnings. (took 2s.39ms)


Once again, Thank you for your work.

mcunet-10fps inference failing, always shows "no person"

With the help of tinyengine/tutorial I am able to run mcunet_5fps (the default) and can see that inference works fine. I then tried changing the model to mcunet_10fps in tinyengine/tutorial/examples/vww.py; the code runs, but inference fails (it always shows "no person" even though a person is present).

Is there anything I need to take care of to run mcunet_10fps?
@meenchen @RaymondWang0 please help, thanks in advance.

[mcunet_10fps_failed screenshot]

Build Guidance for non-STM32 MCUs

TinyEngine is an exciting project for people like me who want to deploy AI models on MCUs.
I think TinyEngine could enable deep learning on huge numbers of edge devices, including but not limited to STM32 chips, so please consider writing a makefile template / build guidance for people who want to build TinyEngine for non-STM32 MCUs with arm-linux toolchains.

Thanks a lot!

the code of the audio demo

Hi there,
It's fantastic to deploy a model on an MCU.
Inspired by your paper, I want to implement an intelligent door lock with an offline speaker verification system.
However, I cannot find the demo code for the KWS model mentioned in the paper. Could you please share it?

Thanks a lot

Tutorial issues

Thanks for the great work! Currently I do not use an ArduCam, and I just want to test the LCD.
After following the steps described in the tutorial, I can compile without errors, but I run into some errors:

(1) No source available for "d_expression_1() at 0x8001058"

(2) Even if I move lcdsetup() before "SystemClock_Config()" and try to display a string on the LCD, nothing shows on the LCD screen. Do we need to set something else? Or is it just related to the camera device?

(3) Is there any way to print messages in console mode to verify that we enter the main() function?

Furthermore, for the "Measured Results" table, how can we get the time measurements? Use HAL_GetTick() before and after invoke()?

Could you please comment on that? Thanks
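
A minimal sketch of the timing approach suggested above, assuming the STM32 HAL and the invoke() entry point generated in genModel.c (the exact invoke() signature may differ):

#include <stdio.h>
#include <stdint.h>
#include "stm32f7xx_hal.h"   /* assumption: F7-series HAL, as in the tutorial */

extern void invoke(void);    /* assumed signature of the generated entry point */

void measure_inference_latency(void)
{
    uint32_t t0 = HAL_GetTick();   /* 1 ms resolution system tick */
    invoke();
    uint32_t t1 = HAL_GetTick();
    printf("inference latency: %lu ms\r\n", (unsigned long)(t1 - t0));
    /* for cycle-accurate numbers, the Cortex-M DWT cycle counter can be
     * used instead of the millisecond tick */
}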

Using SDRAM of OpenMV instead of SRAM

Hey, @meenchen . it is me again :)

While looking through the OpenMV H7 Plus' features, I realized that it has SDRAM with 32MB of memory. This is way bigger than the SRAM, which has only 1MB. I was wondering if there is a way to embed TinyEngine into the SDRAM.

Just for you to remember, I was trying to handle a memory overflow problem in the firmware for higher resolutions ( #75 ).

I have no limit on inference time; there will be plenty of time for inference in my project's pipeline. So, if using SDRAM for higher resolutions is possible, the increased latency won't be a problem.

Thank you so much in advance.

Would you mind uploading some kernels of TinyEngine?

Hello again, and thanks for your work.

I found that the following kernels are missing: patchpadding_depthwise_kernel5x5_stride1_inplace_CHW.c, patchpadding_depthwise_kernel5x5_stride2_inplace_CHW.c, and the 7x7 stride 1 and stride 2 variants, and so on.

File location: tinyengine/TinyEngine/src/kernels/int_forward_op.

Would you mind uploading these kernels of TinyEngine?

If that's impossible, I'd appreciate it if you could let me know as well.


Recently, I succeeded with your patch-based code generation (with the mcunet GitHub model zoo tflite file), and it worked well with some models on my board. (I solved a lot of errors for this work, but anyway, it's working.)

I did the end-to-end test on the STM32F746G-DISCO board with a camera (the same setup as the tutorial).

But I found that there are no patch_depthwise_kernel5x5_stride1_inplace_kernel_CHW.c and patch_depthwise_kernel7x7_stride1_inplace_kernel_CHW.c files in the project folder path tinyengine/TinyEngine/src/kernels/int_forward_op.

My experiment results (these are wrong because I changed the patchpadding file from 5x5 to 3x3):

[images A, B, C]

once again, Thanks for your work.

NotImplementedError when trying to generate code for one of the already-included networks

Hi @meenchen, thanks for your great work. As the title says, I got the following error (see image):

[error screenshot]

  1. I started by following the tiny-training repo's compilation/README instructions: running mcu_ir_gen.py and using the mcunet-5fps.pkl file.

  2. Then I ran the ir2json.py file, selecting the sparse_bp-49kb-1x3x128x128.ir just generated.

  3. Then I took scale.json from the corresponding ir_zoos folder (generated with mcu_ir_gen), and graph.json and params.pkl from .model/testproj/ (generated with ir2json), and put all three in the assets folder.

  4. Finally I ran: python examples/tiny_training.py -f assets/sparse_bp-49kb-1x3x128x128-graph.json -D assets/sparse_bp-49kb-1x3x128x128-params.pkl -QAS assets/scale.json -m -g -d -FR
    making sure it was using the same 49kb sparse update scheme, and I got that NotImplementedError. Is there something I might have missed during the process?

P.S. It seems to originate from an "abs" operation that could not be handled, but since these are the provided examples in tiny-training, I think I missed something at some point. Any thoughts on it?

Demo without LCD/Camera

Thanks for the great work!

For now, the demos are all strongly tied to the camera and LCD. I think a demo without a camera/LCD, which reads an image from a header file and prints the output results, would be very helpful.

BTW: I'm working on this demo in my spare time, but I'm not familiar with stm32... I don't know how much time I need to finish this.

Recipe for target 'FIRMWARE_OBJS' failed - Error 2

Hello, @meenchen

I get "Error 2" while I am trying to build my OpenMV H7 Plus for your person detection. I could not solve this issue.

make[1]: Leaving directory '/home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src/drivers/winc1500'
omv/ports/stm32/omv_portconfig.mk:593: recipe for target 'FIRMWARE_OBJS' failed
make: *** [FIRMWARE_OBJS] Error 2
make: Leaving directory '/home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src'

How can I handle this? Thanks in advance.

Could you tell me the model hyper-parameters in the MCUNet paper?

Hello, I found your MCUNet paper interesting, and I am writing to ask for information on the hyper-parameters used in the implementation of the models used in the experiments. While I believe that you will eventually provide the training code for MCUNet or ProxylessNAS, I would appreciate it if you could at least let me know what hyper-parameters you used for the MobileNetV2 that was used in the MCUNet experiments. Specifically, I am interested in the following models: MobileNet w0.35-r64 in MCUNet V1 and MobileNet w0.35-r144 in MCUNet V2, as well as MobileNetV2, MobileNetV2-RD, and MobileNetV2 (Non-overlap) in Table 5. It would be great if you could provide the information in fp32.

[screenshots of the referenced tables]

I am looking forward to a prompt response.

Platform-independent operation

Hi team,
Some questions are bothering me.
When I use code generation, Arm-dependent code is automatically generated; for example, "depthwise_kernel3x3_stride1_inplace_CHW_fpreq.c" contains:
#include "arm_nnsupportfunctions.h" //TODO: remove this in the future for self-contained
This header file comes from the NN component of CMSIS, and the generated code contains these Arm dependencies.
If I'm testing a demo on Windows or Linux, this bothers me, because I need to build a simulation environment to test.
I clearly understand that using third-party libraries speeds up the computations, but I want to keep things simple.
So, is there an implementation that is platform-independent or does not require third-party libraries?

I am looking forward to your reply.

mcunet face detection model and code

Hi Team, I would like to run the MCUNet TinyEngine face detection model on an M7 board and am looking for the source code, but I see that only the VWW code is captured here. Is there a way to find the face detection code? Also, it would be helpful if you could add face detection benchmarking numbers for detection timings.

Thanks in advance.

How can I send "3" to the UART input for the stm32 MCU?

Hi,

Context: I am new to this and going through the on-device training tutorial.

Issue: I am on this instruction: Send "3" to the UART input for the MCU: Training mode
Do I need to use another board to direct the UART communication? Or is there a PC app I can use to perform this UART communication?

Thanks!

TinyEngine convolutional layer has greater latency than ARM's CMSIS-NN

Hello,

I was measuring the latency of one of TinyEngine's convolutional kernels (convolve_s8_kernel3_stride1_pad1) versus CMSIS-NN's fast convolutional kernel (arm_convolve_HWC_q7_fast). The TinyEngine kernel had a latency of approx. 200,000 cycles, while the CMSIS kernel had a latency of approx. 130,000 cycles.

  • Is the additional overhead due to the per-channel requantization of TinyEngine? Could you explain why per-channel requantization is needed in the kernel?
  • Have you tried benchmarking the latencies of the frameworks per kernel? If so, could you share the results?

Thank you in advance.

inference demo error

[error screenshot]
When I followed the build steps, these errors occurred. Please help me solve them. Thank you very much!

visual wake word (VWW) model end to end workflow?

Hi,

Where do I find the Jupyter notebook for the example visual wake words (VWW) model in the tutorial folder? I want to check its model architecture and how it is optimized for the MCU (pruning, quantization, and model conversion, i.e., how the C/C++ code is generated from the model) so I can follow the end-to-end workflow.

camera do not work

Hello, your work is great, but when I followed your tutorial to reproduce the VWW demo, with the same software, the same development board model, and the same camera model, the camera does not work and the screen is completely black.

Recipe for target 'firmware' failed

Hello,

Thanks for your job, everything is amazing. Following is what I am trying to face:

I have done everything correctly, and it worked fine on my OpenMV H7 board. However, when I try to run it on my OpenMV H7 Plus board, I get an error at the last step of building the firmware.

I changed TARGET=OPENMV4 to TARGET=OPENMV4P both while building the source and while recompiling it. In the source-building part, everything works correctly, but I get the error below at the last step, while recompiling:

 make[1]: Leaving directory '/home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src/omv'
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: /home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src/build/bin/firmware.elf section `.text' will not fit in region `FLASH_TEXT'
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: /home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src/build/bin/firmware.elf section `.bss' will not fit in region `SRAM1'
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: section .dma_memory VMA [0000000030040000,0000000030043bff] overlaps section .bss VMA [0000000030000adc,0000000030042c8b]
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: section ._heap VMA [0000000030042c8c,000000003007ec8b] overlaps section .dma_memory VMA [0000000030040000,0000000030043bff]
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: section .d2_dma_memory VMA [0000000030043c00,0000000030047bff] overlaps section ._heap VMA [0000000030042c8c,000000003007ec8b]
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: region `SRAM1' overflowed by 257164 bytes
/usr/local/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld: region `FLASH_TEXT' overflowed by 168120 bytes
collect2: error: ld returned 1 exit status
omv/ports/stm32/omv_portconfig.mk:649: recipe for target 'firmware' failed
make: *** [firmware] Error 1
make: Leaving directory '/home/senceryucel/Desktop/tinyengine/examples/openmv_person_detection/openmv/src'

What might be the cause of this? Thanks a lot in advance.

simple convolution outputs not matching with hand calculation and cmsis-nn simp_conv function outputs

Hi @meenchen @RaymondWang0,
Context: I am trying to run a simple convolution layer (using the convolve_s8_kernel3_stride1_pad1 function) and to compare the outputs with hand calculations and with the CMSIS-NN simple convolution outputs, as well as to compare the timing results for the layer.

Issue: The outputs of the convolve_s8_kernel3_stride1_pad1 function do not match the hand-calculated outputs; please refer to the attached code snippet below.

FYI: the hand-calculated outputs matched the CMSIS-NN simple convolution function.

My question is: am I calling the correct simple convolution function, or is there another simple convolution function for testing basic convolution with the mcunet kernels?

#include "arm_math.h"
#include "arm_nnfunctions.h"
#include "arm_nnsupportfunctions.h"
#include "img2col_element.h"
#include "tinyengine_function.h"
#include <stdio.h>

#define CONV_WT_M4 {0, -1, 1, -1, 0, -1, 0, -1, -1, 0, 1, 1, -1, 1, 1, 0, 1, -1, 1, -1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, -1, 1, 0, 0, 0, -1, -1, 1, -1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, -1, -1, -1, 0, 0, 0, 0, 0, 0}
#define CONV_BIAS_M4 {0, 0}

#define CONV_IN_DIM_M4 3
#define CONV_IN_CH_M4 4
#define CONV_KER_DIM_M4 3
#define CONV_PAD_M4 1
#define CONV_STRIDE_M4 1
#define CONV_OUT_CH_M4 2
#define CONV_OUT_DIM_M4 3
#define CONV_BIAS_LSHIFT_M4 0
#define CONV_OUT_RSHIFT_M4 0

const int8_t in_data[36] =
{
    1, 2, 0, 2, 1, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0, 2, 2, 0, 2, 2, 0, 0, 1, 2,
    2, 0, 2, 0, 0, 1, 0, 0, 1, 0, 1, 0
};

/* expected (hand-calculated) outputs */
const int8_t out_data[18] =
{
    -2, -1, 4, -2, 4, -1, -1, 7, 1, -1, -5, -2, 3, 3, -2, -3, 1, 1
};

static const q7_t conv2_wt[CONV_IN_CH_M4 * CONV_KER_DIM_M4 * CONV_KER_DIM_M4 * CONV_OUT_CH_M4] = CONV_WT_M4;
static const q7_t conv2_bias[CONV_OUT_CH_M4] = CONV_BIAS_M4;
static const q7_t conv2_out_mul[CONV_OUT_CH_M4] = {1, 1};
static const q7_t conv2_out_shift[CONV_OUT_CH_M4] = {30, 30};

q7_t scratch_buffer_2[92160];
q15_t *buffer2 = (q15_t *) scratch_buffer_2;
q7_t output_data[CONV_OUT_DIM_M4 * CONV_OUT_DIM_M4 * CONV_OUT_CH_M4];

int success_or_not = !tinyengine_status;

int status = convolve_s8_kernel3_stride1_pad1(
    (const q7_t *) in_data, CONV_IN_DIM_M4, CONV_IN_DIM_M4, CONV_IN_CH_M4,
    conv2_wt, conv2_bias, conv2_out_shift, conv2_out_mul,
    0, 0, -128, 127, (q7_t *) output_data,
    CONV_OUT_DIM_M4, CONV_OUT_DIM_M4, CONV_OUT_CH_M4, buffer2, 0);

if (success_or_not != status)
{
    printf("Function call Failed\r\n");
}
else
{
    printf("Function call Passed\r\n");
}

Uint8 model

Thanks for the great work.
As the tutorial describes, the input and output are all of "int8" type.

Is a uint8 model supported, or should we just convert it to int8 instead?
Thanks

Torch->TFlite Converter?

Your example .tflite files in the /assets folder seem like they were generated by a custom tool. At least, their description field in the binary is "TinyNeuralNetwork Converted." instead of the standard "MLIR Converted." or "TOCO Converted." produced by TensorFlow's tf.lite.TFLiteConverter. Is this correct?

We're trying to convert our own Proxyless models but are having trouble doing so because of restricted op support in the code generator. Are there plans to open-source a torch->tflite converter?

In the original mcunet submodule (the old MCUNet repo), there's some TensorFlow 1.x code to convert a ProxylessNAS network to TFLite. Do you have updated code for this? And updated Proxyless models? Which ties in with... #5

Thanks!

Profiling method?

First of all, I love the project you are doing!
So my question is: how do you profile the memory (and maybe the storage) that the model consumes?

I really appreciate any help you can provide.
Rodo

Some APIs used in "GeneralMemoryScheduler.py"

Excuse me, I cannot trace the exact definitions of the following APIs, which are related to op:

layermem["MAC"] = op.get_macs()
layermem["activation"] = op.get_activation_size()
layermem["scale"] = op.get_scale_size()
layermem["runtime"] = op.get_sbuf_size()
layermem["kernel"] = op.get_kbuf_size()

Where can we find the definitions of get_macs()/get_activation_size()/get_scale_size()/get_sbuf_size()/get_kbuf_size()?
Could you please help comment on that?
Thanks

blank screen

Hi, I built the TinyEngine tutorial with zero errors. Then, after downloading to the board, the screen goes black immediately. No errors are mentioned, and no warnings either. Any thoughts or advice?

[image]

Code Generation Patch Based Inference Bug

Facing an issue related to PR #26:

Traceback (most recent call last):
  File "examples/vww.py", line 31, in <module>
    life_cycle_path="./lifecycle.png",
  File "/Users/amahmed/Desktop/UMass/Spring_2022/Thesis/tinyengine/code_generator/CodegenUtilTFlite.py", line 70, in GenerateSourceFilesFromTFlite
    code_generator.codeGeneration()
  File "/Users/amahmed/Desktop/UMass/Spring_2022/Thesis/tinyengine/code_generator/CodeGenerator.py", line 131, in codeGeneration
    self._genPatchInference()
  File "/Users/amahmed/Desktop/UMass/Spring_2022/Thesis/tinyengine/code_generator/CodeGenerator.py", line 182, in _genPatchInference
    last_patch_op_output_buffer_str_for_patch_inference = last_patch_op._getBufferstr(
AttributeError: 'NoneType' object has no attribute '_getBufferstr'

Deleting lines 179 to 181 in CodeGenerator.py removes the error.

No module named "cexample"

Hello, @meenchen

We were trying to integrate this beautiful engine into our H7 Plus board. We changed TARGET = OPENMV4 to OPENMV4P as needed. Everything seemed to be working fine until we tried to run the example person detection script.


import cexample
import sensor


sensor.reset()  # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)  # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.HD)  # Set frame size to HD

while True:
    img = sensor.snapshot()  # Take a picture and return the image.
    ret = cexample.person_detection(img, 0.15)

Since we have no LCD screen, we modified the code a little to test it via the frame buffer. However, when we run it, it says there is no module named "cexample".

We looked around the repo for it but could not find anything that would help. Any help will be appreciated. Thanks a lot in advance!

vww.py error

Hello, I'm trying to run the example. However, I encounter this error while executing "python examples/vww.py":

(pytorch) user:~/바탕화면/tinyengine$ python examples/vww.py 
Deriving the memory schedule for 41 activation tensors.
100%|██████████████████████████████████████████████████████████████████████████| 41/41 [00:00<00:00, 185109.22it/s]
Traceback (most recent call last):
  File "/home/user/바탕화면/tinyengine/examples/vww.py", line 28, in <module>
    peakmem = GenerateSourceFilesFromTFlite(
  File "/home/user/바탕화면/tinyengine/code_generator/CodegenUtilTFlite.py", line 54, in GenerateSourceFilesFromTFlite
    memory_scheduler.allocateMemory()
  File "/home/user/바탕화면/tinyengine/code_generator/GeneralMemoryScheduler.py", line 190, in allocateMemory
    self.allocator.visualize(self.mem_visual_path)
  File "/home/user/바탕화면/tinyengine/code_generator/allocator/base_allocator.py", line 240, in visualize
    plt.savefig(path, dpi=FIGURE_CONFIG["DPI"])
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/pyplot.py", line 942, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/figure.py", line 3272, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backend_bases.py", line 2338, in print_figure
    result = print_method(
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backend_bases.py", line 2204, in <lambda>
    print_method = functools.wraps(meth)(lambda *args, **kwargs: meth(
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 410, in wrapper
    return func(*inner_args, **inner_kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 520, in print_png
    self._print_pil(filename_or_obj, "png", pil_kwargs, metadata)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 466, in _print_pil
    FigureCanvasAgg.draw(self)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 408, in draw
    self.figure.draw(self.renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/artist.py", line 74, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/artist.py", line 51, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/figure.py", line 3069, in draw
    mimage._draw_list_compositing_images(
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/artist.py", line 51, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 3099, in draw
    self.patch.draw(renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/artist.py", line 51, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/patches.py", line 589, in draw
    self._draw_paths_with_artist_properties(
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/patches.py", line 574, in _draw_paths_with_artist_properties
    renderer.draw_path(gc, *draw_path_args)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 149, in draw_path
    self._renderer.draw_path(gc, path, transform, rgbFace)
TypeError: must be real number, not str
