Comments (4)
Hi @zxros10
The FullyConnected implementation only supports 2D inputs, so a Reshape is added to flatten the 3D input into a 2D one (1x800x256 to 800x256). In the same way, the ExpandDims is added to change the output from 800x256 back to 1x800x256. I am surprised that these layers take a long time to run. Can you post your profiling data for these layers?
Best regards,
Mike
from armnn.
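To make the shape handling described above concrete, here is a minimal sketch in plain Python. It only illustrates the shape arithmetic (collapse the leading dimensions, then re-attach them); the function names are hypothetical and are not part of the Arm NN API.

```python
from math import prod


def flatten_for_fully_connected(shape):
    """Collapse all leading dimensions into one, e.g. (1, 800, 256) -> (800, 256)."""
    if len(shape) < 2:
        raise ValueError("expected at least 2 dimensions")
    return (prod(shape[:-1]), shape[-1])


def restore_leading_dims(original_shape, fc_output_shape):
    """Re-attach the original leading dimensions to the 2D FullyConnected output."""
    rows, out_features = fc_output_shape
    if rows != prod(original_shape[:-1]):
        raise ValueError("row count does not match the original leading dimensions")
    return (*original_shape[:-1], out_features)


input_shape = (1, 800, 256)
flat = flatten_for_fully_connected(input_shape)       # shape fed to FullyConnected
restored = restore_leading_dims(input_shape, (flat[0], 256))  # shape after ExpandDims
print(flat, restored)
```

Neither step changes the number of elements, which is why a large cost for these layers is unexpected.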
I used https://github.com/morgolock/vison to analyze my profiling data:
Total time per kernel    Percentage of total time    Kernel name
24.5090 us               0.0011                      activation_layer_quant_f32
43.7800 us               0.0020                      activation_layer
45.5170 us               0.0021                      concatenate_width_x4
46.9790 us               0.0021                      elementwise_operation_DIV
70.8790 us               0.0032                      concatenate_width
185.5250 us              0.0085                      gemm_reshape_rhs_matrix_t
205.6450 us              0.0094                      transpose
212.1030 us              0.0097                      strided_slice
285.2300 us              0.0130                      elementwise_operation_SUB_quantized
306.3610 us              0.0140                      quantization_layer
310.0540 us              0.0141                      activation_layer_quant
550.5810 us              0.0251                      permute
697.3280 us              0.0318                      elementwise_operation_ADD_quantized
1030.5740 us             0.0469                      gemmlowp_matrix_b_reduction
1085.6590 us             0.0495                      dequantization_layer
1099.9560 us             0.0501                      tile
1787.1760 us             0.0814                      reduction_operation_x
3003.1760 us             0.1368                      pixelwise_mul_quantized
5445.1560 us             0.2480                      reshape_layer
5518.2940 us             0.2514                      gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint
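The reshape_layer kernels take up a large share of the time (about 25% of the total, nearly as much as the main GEMM kernel). These reshape layers are added not only for FullyConnected, but also for Add, Mul, and so on.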
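As a cross-check of the table, the following sketch recomputes each kernel's share from the per-kernel times listed above. It assumes the listed kernels make up the whole total, and it suggests that the column labelled "Percentage of total time" actually holds fractions (0.2480 means roughly 24.8%).

```python
# Per-kernel total times in microseconds, copied from the profiling table.
kernel_times_us = {
    "activation_layer_quant_f32": 24.5090,
    "activation_layer": 43.7800,
    "concatenate_width_x4": 45.5170,
    "elementwise_operation_DIV": 46.9790,
    "concatenate_width": 70.8790,
    "gemm_reshape_rhs_matrix_t": 185.5250,
    "transpose": 205.6450,
    "strided_slice": 212.1030,
    "elementwise_operation_SUB_quantized": 285.2300,
    "quantization_layer": 306.3610,
    "activation_layer_quant": 310.0540,
    "permute": 550.5810,
    "elementwise_operation_ADD_quantized": 697.3280,
    "gemmlowp_matrix_b_reduction": 1030.5740,
    "dequantization_layer": 1085.6590,
    "tile": 1099.9560,
    "reduction_operation_x": 1787.1760,
    "pixelwise_mul_quantized": 3003.1760,
    "reshape_layer": 5445.1560,
    "gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint": 5518.2940,
}

total_us = sum(kernel_times_us.values())

# Fraction of the total GPU kernel time spent in each kernel.
share = {name: t / total_us for name, t in kernel_times_us.items()}

# Print the kernels from most to least expensive.
for name, fraction in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kernel_times_us[name]:10.3f} us  {fraction:7.4f}  {name}")
```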
Here is one of the FullyConnected nodes as an example:
Reshape_for:FullyConnected:0:2_ClReshapeWorkload_Execute_#286
    406.52 us      reshape_layer
FullyConnected:0:2_ClFullyConnectedWorkload_Execute_#287
    44.064 us      transpose
    38.519 us      gemm_reshape_rhs_matrix_t
    173.63 us      gemmlowp_matrix_b_reduction
    2194.185 us    gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint
ExpandDims:0:2_ClReshapeWorkload_Execute_#288
    1204.807 us
The combined time of Reshape_for and ExpandDims (about 1.6 ms) is close to the compute time of the FullyConnected kernels themselves (about 2.45 ms).
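The comparison for this node can be worked out from the numbers above. This is only the arithmetic; it assumes the unnamed 1204.807 us entry under ExpandDims is the reshape kernel of that workload, which the profile does not state explicitly.

```python
# Times in microseconds for the three workloads of one FullyConnected node.
reshape_for_us = 406.52
expand_dims_us = 1204.807  # assumption: the reshape kernel of the ExpandDims workload

fully_connected_kernels_us = {
    "transpose": 44.064,
    "gemm_reshape_rhs_matrix_t": 38.519,
    "gemmlowp_matrix_b_reduction": 173.63,
    "gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint": 2194.185,
}

reshape_overhead_us = reshape_for_us + expand_dims_us
fc_compute_us = sum(fully_connected_kernels_us.values())
overhead_ratio = reshape_overhead_us / fc_compute_us

print(f"reshape overhead: {reshape_overhead_us:.3f} us")
print(f"FullyConnected:   {fc_compute_us:.3f} us")
print(f"overhead / compute: {overhead_ratio:.2f}")
```

Under that assumption, the two reshapes add roughly two thirds of the FullyConnected compute time on top of it.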
How many iterations did you run before getting these execution times?
The first time you run a network it will be a lot slower, because the GPU backend compiles the kernels for each of the workloads during the first inference. The second and subsequent runs will be faster.
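One way to keep the first-run kernel compilation out of the measurement is to discard a warm-up run before averaging. The sketch below is generic Python, not Arm NN code: `run_inference` is a hypothetical callable standing in for whatever executes one inference, and `fake_inference` is only a stand-in so the example runs on its own.

```python
import time
from statistics import mean


def time_inference(run_inference, iterations=100, warmup=1):
    """Time `run_inference`, ignoring the first `warmup` calls.

    Returns (average_ms, samples_ms) over the measured iterations only.
    """
    samples_ms = []
    for i in range(warmup + iterations):
        start = time.perf_counter()
        run_inference()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # skip warm-up runs (e.g. first-run kernel compilation)
            samples_ms.append(elapsed_ms)
    return mean(samples_ms), samples_ms


# Stand-in workload so the sketch is self-contained.
calls = []

def fake_inference():
    calls.append(1)

avg_ms, samples = time_inference(fake_inference, iterations=5, warmup=2)
print(f"average over {len(samples)} measured runs: {avg_ms:.4f} ms")
```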
I ran 100 iterations. In the end I reduced the feature map dimensions in the model and removed the reshape around the FullyConnected layer.
Thanks.