yeah568 / math-neon
Automatically exported from code.google.com/p/math-neon
Library: MATH-NEON
By: Lachlan Tychsen-Smith
Licence: MIT (expat)
=======================================================================================

This project implements the cmath functions and some optimised matrix functions
with the aim of increasing the floating point performance of ARM Cortex-A8 based
platforms. As well as being implemented in ARM NEON assembly, the functions
sacrifice error checking and some accuracy to achieve better performance.

Function Errors:
=======================================================================================

The measurement and characterisation of the inaccuracies present within these
functions is really a field in itself. For the benchmark I provide the maximum
absolute, maximum relative and root mean squared error compared to the cmath
implementations over the specified range. However, these values can be misleading,
especially for functions which quickly go to infinity, so it is always a good idea
to test the functions within your actual program. In general, this library will
not be as accurate as cmath; for many functions, however, the difference is small
enough to be negligible.

Notes:
=======================================================================================

- The *_c functions are C implementations of the *_neon code.
- As with cmath, the errors present in the functions depend strongly on the range
  in which you are operating, so you should test them first.
- Look in the "math_neon.h" file for descriptions of the functions. Some function
  files also contain notes on the specific implementation.
- The *_neon functions make certain assumptions about the location of arguments
  that are incompatible with inlining.

Contact:
=======================================================================================

Name: Lachlan Tychsen-Smith
Email: [email protected]
I had issues getting correct values with your frexpf algorithm. When I
switched to the algorithm shown at
http://code.metager.de/source/xref/sdcc/sdcc/device/lib/frexpf.c I had better
luck. Honestly, I am not sure whether I messed up my implementation or whether
your algorithm is wrong. I am just suggesting you take a second look at it.
Original issue reported on code.google.com by [email protected]
on 7 May 2014 at 9:26
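For anyone checking an frexpf implementation against a known-good baseline, the following is a minimal sketch of the usual bit-level approach. It is neither the library's code nor the SDCC version linked in the report. The name frexpf_sketch is hypothetical, and the code assumes IEEE-754 binary32 floats with subnormals not flushed to zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of math-neon: split x into a mantissa m and
   an exponent e such that x == m * 2^e with 0.5 <= |m| < 1.
   Assumes IEEE-754 binary32 floats. */
static float frexpf_sketch(float x, int *e)
{
    uint32_t bits;
    uint32_t exp_field;
    int adjust = 0;

    memcpy(&bits, &x, sizeof bits);             /* read the raw bit pattern */
    exp_field = (bits >> 23) & 0xFFu;

    if (exp_field == 0xFFu || x == 0.0f) {      /* inf, NaN or +/-0: pass through */
        *e = 0;
        return x;
    }

    if (exp_field == 0) {                       /* subnormal: normalise first */
        x *= 33554432.0f;                       /* multiply by 2^25 (exact) */
        memcpy(&bits, &x, sizeof bits);
        exp_field = (bits >> 23) & 0xFFu;
        adjust = -25;
    }

    *e = (int)exp_field - 126 + adjust;         /* bias is 127; mantissa in [0.5, 1) */
    bits = (bits & 0x807FFFFFu) | (126u << 23); /* keep sign and fraction, set exponent to -1 */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Comparing the output of a candidate implementation with this sketch (or with the system frexpf) over powers of two, negative values, zero and subnormals is a quick way to tell an implementation slip from an algorithmic problem.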
What steps will reproduce the problem?
1. Call asinf_c(x)
2. Call system asinf(x) with same x
3. Compare the two results
What is the expected output? What do you see instead?
When x=-0.9193184972, the system call returns 0.20978982746601104736 and
asinf_c() returns 0.48538914322853088379
What version of the product are you using? On what operating system?
Trunk version on iPhone simulator.
Please provide any additional information below.
Thanks,
Mike
Original issue reported on code.google.com by [email protected]
on 4 Nov 2010 at 2:06
I put together a simple makefile for building math_debug:
http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch
Original issue reported on code.google.com by [email protected]
on 25 Mar 2011 at 11:59
The function dot4_neon() uses the intrinsics API, which is not correctly handled
by the project. It also collides with the signature defined in math_neon.h.
I just commented it out:
http://gitorious.org/vjaquez-misc/math-neon/commit/3ca3102732e0786350486b52329f07392554bd97?format=patch
Original issue reported on code.google.com by [email protected]
on 25 Mar 2011 at 12:02
What steps will reproduce the problem?
1. call sinf_neon(PI/2) and cosf_neon(0)
What is the expected output? What do you see instead?
I am testing sin(PI/2) and cos(0) and I expect to see 1 as result for both. By
default I see 0, but if I change the code to properly load the function
parameter it works.
What version of the product are you using? On what operating system?
I am using this code on iPad/iPhone 3GS so Cortex-A8 (iOS 4.2 Beta 2 SDK,
LLVM-GCC or GCC 4.2).
(example with the sinf_neon() function) I need to add this code before the hfp
variant of the function is called or at the top of the hfp variant for the
sinf_neon() function to produce the correct result:
    asm volatile ("vdup.f32 d0, %[xInput] \n\t"
                  :                      // no outputs
                  : [xInput] "r" (x)     // input: x in a core register
                  : "d0"                 // clobber: d0 is overwritten
                 );
because the default
asm volatile ("vdup.f32 d0, r0");
is not able to load the input value correctly.
Original issue reported on code.google.com by [email protected]
on 13 Oct 2010 at 9:47
ARM changed the instructions for 64-bit NEON, so none of this will compile for
arm64. The assembly needs to be guarded using the __arm64__ define.
Original issue reported on code.google.com by [email protected]
on 13 Feb 2014 at 2:03
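A guard of the kind suggested here might look like the sketch below. The macro MATH_NEON_USE_ARMV7_ASM and the function fabsf_guarded are hypothetical names chosen for illustration and do not exist in math-neon. The check for __aarch64__ alongside __arm64__ is an addition, since __arm64__ is mainly defined by Apple toolchains. The ARMv7 branch only reuses instructions that appear in other reports on this page and has not been verified on hardware here.

```c
/* Hypothetical macro, not defined by math-neon: enable the ARMv7 NEON
   inline assembly only when targeting 32-bit ARM with NEON. */
#if defined(__ARM_NEON__) && !defined(__arm64__) && !defined(__aarch64__)
#define MATH_NEON_USE_ARMV7_ASM 1
#else
#define MATH_NEON_USE_ARMV7_ASM 0
#endif

/* Illustrative function: absolute value with a guarded NEON path and a
   plain C fallback for every other target (including arm64). */
static float fabsf_guarded(float x)
{
#if MATH_NEON_USE_ARMV7_ASM
    float r;
    __asm__ volatile (
        "vdup.f32 d0, %1 \n\t"   /* d0 = {x, x} */
        "vabs.f32 d0, d0 \n\t"   /* d0 = {|x|, |x|} */
        "vmov.f32 %0, s0 \n\t"   /* r = low lane of d0 */
        : "=r"(r)
        : "r"(x)
        : "d0");
    return r;
#else
    return x < 0.0f ? -x : x;    /* portable fallback */
#endif
}
```

With this pattern an arm64 build falls through to the C path instead of failing to assemble, in the same way the library's *_c functions mirror the *_neon ones.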
sqrtf_neon_hpf() first computes the inverse of the square root and then takes
its reciprocal, i.e.
    t = 1/sqrt(x)
    r = 1/t
It might be easier and faster to compute the inverse of the square root and
then multiply it by the original value, i.e.
    t = 1/sqrt(x)
    r = x * t
Original issue reported on code.google.com by [email protected]
on 15 Sep 2011 at 11:53
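The suggestion can be tried out in plain C. The sketch below is an assumption-laden illustration, not the library's implementation. The names rsqrt_sketch and sqrt_via_rsqrt are hypothetical, and the initial estimate uses the well-known integer bit trick where NEON code would typically use vrsqrte.f32 and vrsqrts.f32. It assumes x is zero or a positive normal float. Negative inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/sqrt(x) for positive, normal x using a
   bit-level initial guess refined by three Newton-Raphson steps. */
static float rsqrt_sketch(float x)
{
    uint32_t i;
    float t;
    int k;

    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);              /* rough initial estimate */
    memcpy(&t, &i, sizeof t);

    for (k = 0; k < 3; ++k)
        t = t * (1.5f - 0.5f * x * t * t);   /* Newton-Raphson refinement */
    return t;
}

/* sqrt(x) computed as x * (1/sqrt(x)), avoiding a second reciprocal. */
static float sqrt_via_rsqrt(float x)
{
    if (x == 0.0f)
        return 0.0f;                         /* 0 * inf would give NaN */
    return x * rsqrt_sketch(x);
}
```

One design point worth noting: the multiply form needs an explicit check for x == 0, because 1/sqrt(0) is infinite and 0 * infinity is NaN, whereas the reciprocal form yields 0 naturally.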
Architecture: Xilinx Zynq (ARM Cortex-A9)
Compiler: arm-xilinx-eabi-gcc (Sourcery CodeBench Lite 2012.09-105) 4.7.2
Arguments: -Wall -O0 -g3 -c -fmessage-length=0 -I../../cpu0_bsp/ps7_cortexa9_0/include
The following warnings appear:
math_sinf.c:123:1: warning: control reaches end of non-void function [-Wreturn-type]
math_sinf.c:111:1: warning: control reaches end of non-void function [-Wreturn-type]
Also the function sinf_neon() does not return the correct value. However, the
following code behaves correctly:
float sinf_neon_rms(float x)
{
    asm volatile (
    "vld1.32 d3, [%1] \n\t"          //d3 = {invrange, range}
    "vdup.f32 d0, %3 \n\t"           //d0 = {x, x}
    "vabs.f32 d1, d0 \n\t"           //d1 = {ax, ax}
    "vmul.f32 d2, d1, d3[0] \n\t"    //d2 = d1 * d3[0]
    "vcvt.u32.f32 d2, d2 \n\t"       //d2 = (int) d2
    "vmov.i32 d5, #1 \n\t"           //d5 = 1
    "vcvt.f32.u32 d4, d2 \n\t"       //d4 = (float) d2
    "vshr.u32 d7, d2, #1 \n\t"       //d7 = d2 >> 1
    "vmls.f32 d1, d4, d3[1] \n\t"    //d1 = d1 - d4 * d3[1]
    "vand.i32 d5, d2, d5 \n\t"       //d5 = d2 & d5
    "vclt.f32 d18, d0, #0 \n\t"      //d18 = (d0 < 0.0)
    "vcvt.f32.u32 d6, d5 \n\t"       //d6 = (float) d5
    "vmls.f32 d1, d6, d3[1] \n\t"    //d1 = d1 - d6 * d3[1]
    "veor.i32 d5, d5, d7 \n\t"       //d5 = d5 ^ d7
    "vmul.f32 d2, d1, d1 \n\t"       //d2 = d1*d1 = {x^2, x^2}
    "vld1.32 {d16, d17}, [%2] \n\t"  //q8 = {p7, p3, p5, p1}
    "veor.i32 d5, d5, d18 \n\t"      //d5 = d5 ^ d18
    "vshl.i32 d5, d5, #31 \n\t"      //d5 = d5 << 31
    "veor.i32 d1, d1, d5 \n\t"       //d1 = d1 ^ d5
    "vmul.f32 d3, d2, d2 \n\t"       //d3 = d2*d2 = {x^4, x^4}
    "vmul.f32 q0, q8, d1[0] \n\t"    //q0 = q8 * d1[0] = {p7x, p3x, p5x, p1x}
    "vmla.f32 d1, d0, d2[0] \n\t"    //d1 = d1 + d0*d2 = {p5x + p7x^3, p1x + p3x^3}
    "vmla.f32 d1, d3, d1[0] \n\t"    //d1 = d1 + d3*d1[0] = {...., p1x + p3x^3 + p5x^5 + p7x^7}
    "vmov.f32 %0, s3 \n\t"           //output x = s3
    : "=r"(x)
    : "r"(__sinf_rng), "r"(__sinf_lut), "r"(x)
    : "q0", "q1", "q2", "q3", "q8", "q9"
    );
    return x;
}
Original issue reported on code.google.com by [email protected]
on 27 Jun 2013 at 9:18
Line 164 of math_atan2f.c, should read atan2f_c(x, y)...
(Or, more appropriately, rename the arguments in _sfp to y, x and fix line 161
to atan2f_neon_hfp(y, x) instead)
Original issue reported on code.google.com by [email protected]
on 7 Aug 2011 at 7:51
What steps will reproduce the problem?
1. Compile math_acosf.c
2. Compile math_vec2.c
3. Compile math_vec4.c
What is the expected output? What do you see instead?
There are errors in the functions: dot4_neon_hfp, dot2_neon_hfp, and
acosf_neon_hfp. The compiler error is "expected string literal before ')'
token," and it refers to what appears to be a missing register string at the
end of the asm block. I do not know ARM Neon assembly, and for that matter I
am super-rusty on assembly in general, so I'm trying to figure out how to fix
it.
What version of the product are you using? On what operating system?
I'm using the only code I've been able to find on the SVN. The OS is Angstrom
Linux, running on a beagleboard.
Original issue reported on code.google.com by [email protected]
on 5 Oct 2010 at 5:12