math-neon's Introduction

Library: 	MATH-NEON
By:			Lachlan Tychsen-Smith
Licence:	MIT (expat)
=======================================================================================
This project implements the cmath functions and some optimised matrix functions 
with the aim of increasing the floating point performance of ARM Cortex-A8
based platforms. As well as being implemented in ARM NEON assembly, the 
functions sacrifice error checking and some accuracy to achieve better performance.

Function Errors:
=======================================================================================
Measuring and characterising the inaccuracies present in these 
functions is really a field in itself. For the benchmark I provide the 
maximum absolute, maximum relative and root mean squared error compared to the
cmath implementations over the specified range. However, these values can be 
misleading, especially for functions which quickly go to infinity, so it's always a 
good idea to test the functions within your actual program. In general, this 
library will not be as accurate as cmath; however, for many functions the 
difference is small enough to be negligible. 
	
Notes:
=======================================================================================
- The *_c functions are C implementations of the *_neon code.
- As with cmath, the errors present in the functions depend heavily on the 
  range in which you're operating, so you should test them first.
- Look in the "math_neon.h" file for descriptions of the functions. In some 
  function files there are also notes on the specific implementation.
- The *_neon functions make certain assumptions about the location of arguments 
  that are incompatible with inlining. 
	  
Contact:
=======================================================================================
Name: 	Lachlan Tychsen-Smith 
Email: 	[email protected]

math-neon's Issues

frexpf

I had issues getting correct values following your frexpf algorithm.  When I 
switched to the algorithm shown at 
http://code.metager.de/source/xref/sdcc/sdcc/device/lib/frexpf.c I had better 
luck.  Honestly not sure if I just messed up my implementation, or if your 
algorithm is wrong or not.  Just suggesting you might take a second look at it.

Original issue reported on code.google.com by [email protected] on 7 May 2014 at 9:26
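
For comparison, a minimal bit-level sketch of frexpf in portable C. It is 
neither the library's algorithm nor the SDCC one linked above; the name 
frexpf_sketch is hypothetical, and IEEE 754 single-precision floats are 
assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns m with 0.5 <= |m| < 1 and stores e so that x == m * 2^e.
   Zero, infinities and NaN are returned unchanged with *e = 0
   (the C standard leaves *e unspecified for infinities and NaN). */
static float frexpf_sketch(float x, int *e)
{
    uint32_t bits;
    int exp_field;
    int adjust = 0;

    assert(e != NULL);
    memcpy(&bits, &x, sizeof bits);            /* type-pun without aliasing UB */
    exp_field = (int)((bits >> 23) & 0xFF);

    if (exp_field == 0xFF || (bits & 0x7FFFFFFFu) == 0) {
        *e = 0;                                 /* inf, NaN, or +/-0 */
        return x;
    }
    if (exp_field == 0) {
        /* Subnormal: scale by 2^25 to normalise, then correct the exponent. */
        x *= 33554432.0f;
        adjust = -25;
        memcpy(&bits, &x, sizeof bits);
        exp_field = (int)((bits >> 23) & 0xFF);
    }

    *e = exp_field - 126 + adjust;              /* bias 127, mantissa in [0.5, 1) */
    bits = (bits & 0x807FFFFFu) | 0x3F000000u;  /* force exponent field to 126 */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Running a candidate implementation and the system frexpf over the same inputs, 
including zero and subnormals, is a quick way to see which edge case diverges.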

asinf_c() does not seem to be giving correct results

What steps will reproduce the problem?
1. Call asinf_c(x) 
2. Call system asinf(x) with same x
3. Compare the two results

What is the expected output? What do you see instead?
When x=-0.9193184972, the system call returns 0.20978982746601104736 and 
asinf_c() returns 0.48538914322853088379

What version of the product are you using? On what operating system?
Trunk version on iPhone simulator.

Please provide any additional information below.

Thanks,
Mike


Original issue reported on code.google.com by [email protected] on 4 Nov 2010 at 2:06

simple makefile for build math_debug

I've cooked up a simple makefile for building math_debug:

http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch

Original issue reported on code.google.com by [email protected] on 25 Mar 2011 at 11:59

Invalid intrinsics in dot4_neon()

The function dot4_neon() uses the intrinsics API, which is not correctly handled 
by the project. It also collides with the signature defined in math_neon.h.

I just commented it out:

http://gitorious.org/vjaquez-misc/math-neon/commit/3ca3102732e0786350486b52329f07392554bd97?format=patch

Original issue reported on code.google.com by [email protected] on 25 Mar 2011 at 12:02

The input value is not properly read by the inline asm code

What steps will reproduce the problem?
1. call sinf_neon(PI/2) and cosf_neon(0)

What is the expected output? What do you see instead?
I am testing sin(PI/2) and cos(0) and I expect to see 1 as result for both. By 
default I see 0, but if I change the code to properly load the function 
parameter it works.

What version of the product are you using? On what operating system?
I am using this code on iPad/iPhone 3GS so Cortex-A8 (iOS 4.2 Beta 2 SDK, 
LLVM-GCC or GCC 4.2).


(example with the sinf_neon() function) I need to add this code before the hfp 
variant of the function is called or at the top of the hfp variant for the 
sinf_neon() function to produce the correct result:

    asm volatile ("vdup.f32 d0, %[xInput]   \n\t"
                  :
                  :[xInput] "r" (x)
                  :
                  );

because the default
    asm volatile ("vdup.f32 d0, r0");

is not able to load the input value correctly.


Original issue reported on code.google.com by [email protected] on 13 Oct 2010 at 9:47

Won't compile on ARM64

ARM changed the instructions for 64-bit NEON, so none of this will compile. You 
need to guard the code using the __arm64__ define.

Original issue reported on code.google.com by [email protected] on 13 Feb 2014 at 2:03

impl. of sqrtf_neon_hfp()


sqrtf_neon_hfp() first computes the inverse of the square root, and then the 
reciprocal, i.e.

t = 1/sqrt(x)
r = 1/t

it might be easier/faster to compute the inverse of the square root, and then 
multiply by the original value, i.e.

t = 1/sqrt(x)
r = x * t

Original issue reported on code.google.com by [email protected] on 15 Sep 2011 at 11:53
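
A portable C sketch of the suggested r = x * t form. The bit-trick seed stands 
in for a hardware reciprocal square root estimate and is an assumption of this 
sketch, not what the library does. The function names are hypothetical, and 
IEEE 754 single-precision floats are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rough 1/sqrt(x) estimate refined by Newton-Raphson steps:
   t <- t * (1.5 - 0.5 * x * t * t). */
static float rsqrt_estimate(float x, int steps)
{
    uint32_t bits;
    float t;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);   /* well-known integer seed */
    memcpy(&t, &bits, sizeof t);

    while (steps-- > 0)
        t = t * (1.5f - 0.5f * x * t * t);
    return t;
}

/* sqrt(x) = x * (1/sqrt(x)): one multiply replaces the second reciprocal. */
static float sqrt_via_rsqrt(float x)
{
    assert(x >= 0.0f);
    if (x == 0.0f)
        return 0.0f;    /* with an exact rsqrt, 0 * inf would be NaN */
    return x * rsqrt_estimate(x, 2);
}
```

The zero check matters for this form: an exact 1/sqrt(0) is infinity, and 
0 * infinity is NaN under IEEE 754, so x = 0 needs separate handling whenever 
the multiply replaces the reciprocal.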

Operands access

Architecture: Xilinx Zynq (ARM Cortex-A9)
Compiler: arm-xilinx-eabi-gcc (Sourcery CodeBench Lite 2012.09-105) 4.7.2
Arguments: -Wall -O0 -g3 -c -fmessage-length=0 -I../../cpu0_bsp/ps7_cortexa9_0/include

The following warnings appear:
math_sinf.c:123:1: warning: control reaches end of non-void function [-Wreturn-type]
math_sinf.c:111:1: warning: control reaches end of non-void function [-Wreturn-type]

Also the function sinf_neon() does not return the correct value. However, the 
following code behaves correctly:

float sinf_neon_rms(float x)
{
    asm volatile (

        "vld1.32                d3, [%1]                                \n\t"   //d3 = {invrange, range}
        "vdup.f32               d0, %3                                  \n\t"   //d0 = {x, x}
        "vabs.f32               d1, d0                                  \n\t"   //d1 = {ax, ax}

        "vmul.f32               d2, d1, d3[0]                           \n\t"   //d2 = d1 * d3[0]
        "vcvt.u32.f32           d2, d2                                  \n\t"   //d2 = (int) d2
        "vmov.i32               d5, #1                                  \n\t"   //d5 = 1
        "vcvt.f32.u32           d4, d2                                  \n\t"   //d4 = (float) d2
        "vshr.u32               d7, d2, #1                              \n\t"   //d7 = d2 >> 1
        "vmls.f32               d1, d4, d3[1]                           \n\t"   //d1 = d1 - d4 * d3[1]

        "vand.i32               d5, d2, d5                              \n\t"   //d5 = d2 & d5
        "vclt.f32               d18, d0, #0                             \n\t"   //d18 = (d0 < 0.0)
        "vcvt.f32.u32           d6, d5                                  \n\t"   //d6 = (float) d5
        "vmls.f32               d1, d6, d3[1]                           \n\t"   //d1 = d1 - d6 * d3[1]
        "veor.i32               d5, d5, d7                              \n\t"   //d5 = d5 ^ d7
        "vmul.f32               d2, d1, d1                              \n\t"   //d2 = d1*d1 = {x^2, x^2}

        "vld1.32                {d16, d17}, [%2]                        \n\t"   //q8 = {p7, p3, p5, p1}
        "veor.i32               d5, d5, d18                             \n\t"   //d5 = d5 ^ d18
        "vshl.i32               d5, d5, #31                             \n\t"   //d5 = d5 << 31
        "veor.i32               d1, d1, d5                              \n\t"   //d1 = d1 ^ d5

        "vmul.f32               d3, d2, d2                              \n\t"   //d3 = d2*d2 = {x^4, x^4}
        "vmul.f32               q0, q8, d1[0]                           \n\t"   //q0 = q8 * d1[0] = {p7x, p3x, p5x, p1x}
        "vmla.f32               d1, d0, d2[0]                           \n\t"   //d1 = d1 + d0*d2 = {p5x + p7x^3, p1x + p3x^3}
        "vmla.f32               d1, d3, d1[0]                           \n\t"   //d1 = d1 + d3*d0 = {...., p1x + p3x^3 + p5x^5 + p7x^7}

        "vmov.f32               %0, s3                                  \n\t"   //x = s3 (result)
        : "=r"(x)
        : "r"(__sinf_rng), "r"(__sinf_lut), "r"(x)
        : "q0", "q1", "q2", "q3", "q8", "q9"
        );

    return x;
}

Original issue reported on code.google.com by [email protected] on 27 Jun 2013 at 9:18

Compilation errors

What steps will reproduce the problem?
1. Compile math_acosf.c
2. Compile math_vec2.c
3. Compile math_vec4.c

What is the expected output? What do you see instead?
There are errors in the functions: dot4_neon_hfp, dot2_neon_hfp, and 
acosf_neon_hfp.  The compiler error is "expected string literal before ')' 
token," and it refers to what appears to be a missing register string at the 
end of the asm block.  I do not know ARM Neon assembly, and for that matter I 
am super-rusty on assembly in general, so I'm trying to figure out how to fix 
it.


What version of the product are you using? On what operating system?
I'm using the only code I've been able to find on the SVN.  The OS is Angstrom 
Linux, running on a beagleboard.



Original issue reported on code.google.com by [email protected] on 5 Oct 2010 at 5:12
