fastmath's Issues

FastPower does not work

FastPower(Single, Single) returns a wrong result for inputs such as Base = 0.474733531475067 and Exponent = 150; it should return zero.
Performing the same checks as the RTL before Result := FastExp2(AExponent * FastLog2(ABase)); makes it work as it should.
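
A minimal sketch of what such guards might look like. It assumes the checks meant are the usual special cases of a general-purpose power routine (zero exponent, zero base, integral exponent); the report does not list them. GuardedPower, StubLog2, StubExp2 and MaxIntExponent are hypothetical names introduced here, and the two stubs stand in for FastLog2 and FastExp2 only so that the program compiles on its own.

```pascal
program FastPowerGuardSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
uses
  SysUtils, Math;

const
  // Hypothetical, arbitrary cut-off for routing integral exponents to IntPower.
  MaxIntExponent = 1000000;

// Hypothetical stand-in for the library's FastLog2 (exact, not approximated).
function StubLog2(const A: Single): Single;
begin
  Result := Ln(A) / Ln(2.0);
end;

// Hypothetical stand-in for the library's FastExp2 (exact, not approximated).
function StubExp2(const A: Single): Single;
begin
  Result := Exp(A * Ln(2.0));
end;

// Assumed guards, tested before falling through to the exp2/log2 identity.
// Not handled in this sketch: a negative base with a fractional exponent,
// and a zero base with a negative exponent.
function GuardedPower(const ABase, AExponent: Single): Single;
begin
  if AExponent = 0 then
    Result := 1
  else if (ABase = 0) and (AExponent > 0) then
    Result := 0
  else if (Frac(AExponent) = 0) and (Abs(AExponent) <= MaxIntExponent) then
    Result := IntPower(ABase, Integer(Trunc(AExponent)))
  else
    Result := StubExp2(AExponent * StubLog2(ABase));
end;

begin
  // The case from the report: integral exponent, handled by the IntPower branch.
  WriteLn(GuardedPower(0.474733531475067, 150));
  // A fractional exponent falls through to the exp2/log2 identity.
  WriteLn(GuardedPower(2, 0.5));
end.
```

In the real unit the last branch would call FastExp2 and FastLog2 directly. Whether the extra comparisons are acceptable in a routine meant to be fast is a separate question for the maintainer.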

FMA thoughts

Thanks for the well-documented and elegantly coded repository. The documentation correctly notes that FMA is a single operation on ARM, but not with SSE.

One option would be to use the FMA3 instructions that modern AMD/Intel CPUs provide alongside AVX. The FreePascal code below illustrates this. I can understand why you might not want to do this: SSE is available on every x86_64 CPU, while AVX and FMA3 require a more recent one, so the feature would have to be detected at run time.

Are there any plans to extend this repository to use other AVX instructions? The 128-bit SSE registers are a perfect fit for your 4-component singles, but I would have thought the 256-bit AVX registers would let you handle 4-component double-precision values as well.

program avxTst;
{$mode delphi}   
uses sysutils;

//https://stackoverflow.com/questions/41468871/how-to-detect-avx2-in-delphi
//Note: AVX2 is used here as a proxy; FMA3 has its own flag, CPUID.(EAX=01H):ECX.FMA[bit 12]
{$ASMMODE INTEL}
function IsAVX2supported: boolean;
asm
    // Save EBX
    {$IFDEF CPUx86}
      push ebx
    {$ELSE CPUx64}
      mov r10, rbx
    {$ENDIF}
    //Check CPUID.0
    xor eax, eax
    cpuid //modifies EAX,EBX,ECX,EDX
    cmp eax, 7 // EAX = highest basic leaf; do we have CPUID leaf 7?
    jae @Leaf7 // unsigned compare on the full register
      xor eax, eax
      jmp @Exit
    @Leaf7:
      //Check CPUID.7
      mov eax, 7h
      xor ecx, ecx
      cpuid
      bt ebx, 5 //AVX2: CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]=1
      setc al
   @Exit:
   // Restore EBX
   {$IFDEF CPUx86}
     pop ebx
   {$ELSE CPUx64}
     mov rbx, r10
   {$ENDIF}
end;

type
TVec4 = packed record 
	x,y,z,w: single;
end;

function v(x,y,z,w:single): TVec4;
begin
	result.x := x;
	result.y := y;
	result.z := z;
	result.w := w;
end;

{$ASMMODE INTEL}
function FMA(const A, B, C: TVec4): TVec4;
//result := (A * B) + C
//https://neslib.github.io/FastMath/Neslib.FastMath.html#FMA
//https://en.wikipedia.org/wiki/FMA_instruction_set#Excerpt_from_FMA3
begin
	asm
		movups    xmm0, [C]
		movups    xmm1, [B]
		movups    xmm2, [A]
		vfmadd231ps xmm0, xmm1, xmm2 //a = b·c + a	
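		// Copy z,w into the low half of xmm1. This appears to rely on the System V x86-64
		// convention of returning a 16-byte record of singles in xmm0:xmm1 (not Win64).
		movhlps xmm1, xmm0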
		vzeroupper		
	end;
end;

procedure tst;
var
	a,b,c,d: TVec4;
begin
	a := v(1,2,3,4);
	b := v(5,10,10,10);
	c := v(22,1,1,1);
	d := FMA(a,b,c);
	writeln(format('xyzw: %g %g %g %g', [d.x, d.y, d.z, d.w])); 
end;

begin
	if not IsAVX2supported then begin
		writeln('AVX2 is not supported');
		exit;
	end;
	tst;
end.
