Comments (6)
NESoftmaxLayer is in the list of functions which haven't been ported to the new accurate padding yet; it therefore falls back to the overly conservative auto_padding, which is why the size ends up being so big.
We're in the process of porting the remaining functions to the new accurate padding method.
We'll release an update as soon as it's ready.
from computelibrary.
Thanks, I have read about auto_padding.
Now I have changed my code to:
Tensor in,out;
in.allocator()->init(TensorInfo(10,1,Format::F32));
out.allocator()->init(*in.info());
NESoftmaxLayer softmax;
softmax.configure(&in,&out);
in.allocator()->allocate();
out.allocator()->allocate();
float in_data[10];
for(int i=0;i<10;i++) in_data[i]=i;
std::copy_n(in_data, 10,
in.buffer()+in.info()->offset_element_in_bytes(Coordinates(0, 0)));
softmax.run();
float out_data[10];
std::copy_n(out.buffer()+out.info()->offset_element_in_bytes(Coordinates(0, 0)),
4*10, out_data);
for(int i=0;i<10;i++) printf("%f ",out_data[i]);
puts("");
but the output was
0.000000 0.000000 0.000000 128.000000 0.000000 0.000000 0.000000 128.000000 0.000000 0.000000
What am I doing wrong?
Hello @SCUTE-ZZ,
Can you cast your in pointer to float* when you copy the data to the input tensor? (std::copy works differently than memcpy):
std::copy_n(in_data, 10, reinterpret_cast<float*>(in.buffer() + in.info()->offset_element_in_bytes(arm_compute::Coordinates(0, 0))));
Hello @GeorgeARM,
Now I have used memcpy instead of std::copy:
Tensor in,out;
in.allocator()->init(TensorInfo(10,1,Format::F32));
out.allocator()->init(*in.info());
NESoftmaxLayer softmax;
softmax.configure(&in,&out);
in.allocator()->allocate();
out.allocator()->allocate();
float in_data[10];
for(int i=0;i<10;i++) in_data[i]=i;
memcpy(
in.buffer()+in.info()->offset_element_in_bytes(Coordinates(0, 0)),
in_data,
10*sizeof(float));
softmax.run();
float out_data[10];
memcpy(
out_data,
out.buffer()+out.info()->offset_element_in_bytes(Coordinates(0, 0)),
10*sizeof(float));
for(int i=0;i<10;i++)
printf("%f ",out_data[i]);
puts("");
but the output was
-0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000
@SCUTE-ZZ so it seems there were two problems:
a) you were copying incorrectly with copy_n, which you addressed in the patch above
b) softmax might fail if your output size is not a multiple of 4. We noticed this recently (it is already fixed internally) as we added more boards to our testing farm. An exponent gets raised to a very large negative value, causing the exponent calculation to return -inf, which breaks the sum part of the softmax equation [exp(x - max(x)) / sum(exp(x - max(x)))]; that's why you get -0. We didn't notice this earlier because the boards we initially tested on didn't exhibit this behavior.
In other words, this will be fixed in the next maintenance release in the coming days.
Thanks
@GeorgeARM
I changed the size from 10 to 16 and the output was correct.
Thanks