
Comments (4)

MikeJKelly commented on May 27, 2024

Hi @zxros10

The FullyConnected implementation only supports 2D inputs, so a Reshape is added to flatten the 3D input to 2D (1x800x256 to 800x256), in the same way that an ExpandDims is added to change the output from 800x256 back to 1x800x256. I am surprised that these layers take a long time to run. Can you post your profiling data for these layers?

Best regards,
Mike

from armnn.
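
To make the shape bookkeeping described above concrete, here is a minimal sketch in plain Python. It only does the shape arithmetic; the helper names `flatten_to_2d` and `expand_dims` are hypothetical and are not Arm NN APIs. It shows that both steps preserve the element count, so the data itself is unchanged and only the shape it is described with differs.

```python
from math import prod


def flatten_to_2d(shape):
    """Collapse every leading dimension into one and keep the last dimension."""
    if len(shape) < 2:
        raise ValueError("need at least 2 dimensions")
    return (prod(shape[:-1]), shape[-1])


def expand_dims(shape, axis=0):
    """Insert a dimension of size 1 at position `axis`."""
    return shape[:axis] + (1,) + shape[axis:]


# Shapes taken from the thread: a 1x800x256 input, an 800x256 FullyConnected output.
input_shape = (1, 800, 256)
fc_input_shape = flatten_to_2d(input_shape)
fc_output_shape = (800, 256)
restored_shape = expand_dims(fc_output_shape)

print(fc_input_shape, restored_shape)
```

Because the element count is the same before and after each step, a reshape like this is normally expected to be cheap, which is why the reported timings are surprising.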

zxros10 commented on May 27, 2024

I used https://github.com/morgolock/vison to analyze my profiling data:
Total time per kernel    Fraction of total time    Kernel name
      24.5090 us         0.0011                    activation_layer_quant_f32
      43.7800 us         0.0020                    activation_layer
      45.5170 us         0.0021                    concatenate_width_x4
      46.9790 us         0.0021                    elementwise_operation_DIV
      70.8790 us         0.0032                    concatenate_width
     185.5250 us         0.0085                    gemm_reshape_rhs_matrix_t
     205.6450 us         0.0094                    transpose
     212.1030 us         0.0097                    strided_slice
     285.2300 us         0.0130                    elementwise_operation_SUB_quantized
     306.3610 us         0.0140                    quantization_layer
     310.0540 us         0.0141                    activation_layer_quant
     550.5810 us         0.0251                    permute
     697.3280 us         0.0318                    elementwise_operation_ADD_quantized
    1030.5740 us         0.0469                    gemmlowp_matrix_b_reduction
    1085.6590 us         0.0495                    dequantization_layer
    1099.9560 us         0.0501                    tile
    1787.1760 us         0.0814                    reduction_operation_x
    3003.1760 us         0.1368                    pixelwise_mul_quantized
    5445.1560 us         0.2480                    reshape_layer
    5518.2940 us         0.2514                    gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint

The reshape_layer kernel takes up a large share of the total time. These reshape_layer kernels are added not only for FullyConnected, but also for Add, Mul, and so on.
Here is the breakdown for one of the FullyConnected nodes:
Reshape_for:FullyConnected:0:2_ClReshapeWorkload_Execute_#286
    406.52 us      reshape_layer
FullyConnected:0:2_ClFullyConnectedWorkload_Execute_#287
    44.064 us      transpose
    38.519 us      gemm_reshape_rhs_matrix_t
    173.63 us      gemmlowp_matrix_b_reduction
    2194.185 us    gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint
ExpandDims:0:2_ClReshapeWorkload_Execute_#288
    1204.807 us

The time spent in Reshape_for and ExpandDims is close to the FullyConnected kernel compute time.
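
The comparison above can be checked with a few lines of arithmetic. This is a small sketch in Python using only the timings quoted in the breakdown; the function name `overhead_ratio` is made up for illustration.

```python
# Kernel times in microseconds, copied from the per-workload breakdown above.
reshape_for_us = 406.52
expand_dims_us = 1204.807
fully_connected_us = [44.064, 38.519, 173.63, 2194.185]


def overhead_ratio(reshape_times, compute_times):
    """Return total reshape time divided by total compute time."""
    return sum(reshape_times) / sum(compute_times)


reshape_total = reshape_for_us + expand_dims_us
compute_total = sum(fully_connected_us)
ratio = overhead_ratio([reshape_for_us, expand_dims_us], fully_connected_us)

print(f"reshape: {reshape_total:.1f} us, compute: {compute_total:.1f} us, ratio: {ratio:.2f}")
```

With these numbers the two reshapes add up to roughly 1611 us against roughly 2450 us for the FullyConnected kernels, so the reshapes cost about two thirds of the compute time for this node.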


MikeJKelly commented on May 27, 2024

How many iterations did you run before getting these execution times?

The first time you run a network it will be a lot slower as the GpuBackend compiles kernels for each of the workloads during the first inference. The second and subsequent runs will be faster.
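
One way to keep the first-run cost out of the measurements is to run a few untimed warm-up iterations first. The following is a generic Python sketch, not Arm NN code: `time_iterations` and `make_fake_network` are hypothetical names, and the fake network simply sleeps on its first call to stand in for one-off kernel compilation.

```python
import time


def time_iterations(run_inference, iterations, warmup=1):
    """Return per-iteration wall-clock times in seconds, excluding warm-up runs."""
    if iterations < 1 or warmup < 0:
        raise ValueError("iterations must be >= 1 and warmup must be >= 0")
    for _ in range(warmup):
        run_inference()  # untimed: absorbs any one-off setup cost
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        timings.append(time.perf_counter() - start)
    return timings


def make_fake_network(first_run_delay=0.05):
    """Stand-in workload (assumption): slow on the first call only."""
    state = {"compiled": False}

    def run():
        if not state["compiled"]:
            time.sleep(first_run_delay)  # simulates one-off compilation
            state["compiled"] = True

    return run


timings = time_iterations(make_fake_network(), iterations=5, warmup=1)
print(f"mean over {len(timings)} timed runs: {sum(timings) / len(timings):.6f} s")
```

Passing `warmup=0` makes the slow first call show up as the first timing, which is the effect described above.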


zxros10 commented on May 27, 2024

I ran 100 iterations. In the end I reduced the feature map dimensions in the model and removed the reshape around FullyConnected.
Thanks.
