herumi / fmath Goto Github PK

View Code? Open in Web Editor NEW

210.0 21.0 37.0 238 KB

fast log and exp functions for x86/x64 SSE

Home Page: http://herumi.in.coocan.jp/soft/fmath.html

C++ 73.98% Makefile 1.72% Assembly 15.92% Python 8.39%

fmath's Introduction

fast approximate function of exponential function exp and log

How to use

include fmath.hpp and use fmath::log, fmath::exp, fmath::expd.

fmath::PowGenerator is a class to generate a function to compute pow(x, y) of x >= 0 for a given fixed y > 0.

eg. fmath::PowGenerator f(1.234); f.get(x) returns pow(x, 1.234);

Prototype of function

float fmath::exp(float);
float fmath::log(float);
double fmath::logd(double);
__m128 fmath::exp_ps(__m128);
__m128 fmath::log_ps(__m128);
void fmath::expv_d(double *p, size_t n); // for double p[n];

for AVX-512

fmath.h provides the following functions:

void fmath_expf_avx512(float *dst, const float *src, size_t n);
void fmath_logf_avx512(float *dst, const float *src, size_t n);

void fmath::expf_v(float *dst, const float *src, size_t n);
void fmath::logf_v(float *dst, const float *src, size_t n);

Experimental

If you install xbyak and define FMATH_USE_XBYAK before including fmath.hpp, then fmath::exp() and fmath::exp_ps() will be about 10~20 % faster. Xbyak version uses SSE4.1 if available.

AVX version of fmath::exp is experimental

Remark

gcc puts warnings such as "dereferencing type-punned pointer will break strict-aliasing rules." It is no problem. Please change #if 1 in fmath.hpp:423 if you worry about it. But it causes a little slower.

-ffast-math option of gcc may generate bad code for fmath::expd.

License

modified new BSD License
http://opensource.org/licenses/BSD-3-Clause

History

2022/May/30 log for AVX-512 got 1.5 times faster
2020/Jul/10 add expf_v and logf_v for AVX-512
2012/Oct/30 fix fmath::expd for small value
2011/Aug/26 add fmath::expd_v
2011/Mar/25 exp supports AVX
2011/Mar/25 exp, exp_ps support avx
2010/Feb/16 add fmath::exp_ps, log_ps and optimize functions
2010/Jan/10 add fmath::PowGenerator
2009/Dec/28 add fmath::log()
2009/Dec/09 support cygwin
2009/Dec/08 first version

Author

MITSUNARI Shigeo([email protected]) http://herumi.in.coocan.jp/

Benchmark

compiler

Visual Studio 2010RC
icc 11.1
gcc 4.3.2 on cygwin
gcc 4.4.1 on 64bit Linux

option

cl(icl):

/Ox /Ob2 /GS- /Zi /D_SECURE_SCL=0 /MD /Oy /arch:SSE2 /fp:fast /DNOMINMAX

gcc:

-O3 -fomit-frame-pointer -DNDEBUG -fno-operator-names -msse2 -mfpmath=sse -march=native

see fastexp.cpp

fmath's People

Contributors

Stargazers

Watchers

fmath's Issues

Compiling without xbyak

I had trouble compiling the code because I do not have xbyak installed. Even though the README states that this is OK, headers for xbyak are imported on line 14 of bench.cpp and line 183 of fastexp.cpp. Both of these threw errors for me. I was able to compile fmath by not compiling bench.cpp at all, and then commenting out every line that involved xbyak and it's objects (e.g. every line that involved the clk object).

So either the README should be updated to reflect a non-optional dependence on xbyak, or the use of xbyak should be made optional. Thanks!

Second argument to expd_v

I'm getting compiler errors related to calls to expd_v. It's not clear to me what I'm doing wrong, and I'm wondering whether the documentation for exp_v can be improved to prevent people from getting these errors in the future.

Here is the function of mine (get_fourth_term) that uses expd_v:

double get_fourth_term(const std::vector<double> & spike_times, const double t,
                       const double tau, const double tau_1,
                       const double tau_2, const double I_syn_bar,
                       const double C, const double N, const double t_zero)
{

  // Compute values that don't depend on spike_times.
  const double first_other = 1 - exp(t / tau_1 - t / tau);
  const double second_other = 1 - exp(t / tau_2 - t / tau);

  // t is relative to the most recent spike in the network, but spike_times are
  // absolute -- therefore, make them relative to the most recent spike (which
  // occurred at absolute time t_zero).
  const int n_spikes = spike_times.size();
  double first_num[n_spikes];
  double second_num[n_spikes];
  for (int i = 0; i < n_spikes; i += 1)
    {
      first_num[i] = -(t - (spike_times[i] - t_zero)) / tau_1;
      second_num[i] = -(t - (spike_times[i] - t_zero)) / tau_2;
    }

  fmath::expd_v(first_num, sizeof(first_num));
  const double first_denom = 1 / tau - 1 / tau_1;

  fmath::expd_v(second_num, sizeof(second_num));
  const double second_denom = 1 / tau - 1 / tau_2;

  double sum_term = 0;
  for (int i = 0; i < n_spikes; i++)
    {
      const double first_term = first_num[i] / first_denom * first_other;
      const double second_term = second_num[i] / second_denom * second_other;
      sum_term += first_term - second_term;
    }

  return I_syn_bar / (C * N * (tau_1 - tau_2)) * sum_term;

}

These are the compiler errors I get

(master)dbliss@nx3[dopa_net]> g++ -O0 -g hansel.cpp -o hansel.o
In file included from hansel.cpp:7:
fmath.hpp: In member function ‘void fmath::local::ExpCode::makeExp(const fmath::local::ExpVar<N>*, const Xbyak::util::Cpu&) [with long unsigned int N = 10u]’:
fmath.hpp:236:   instantiated from ‘fmath::local::ExpCode::ExpCode(const fmath::local::ExpVar<N>*) [with long unsigned int N = 10u]’
fmath.hpp:389:   instantiated from ‘static const fmath::local::ExpCode& fmath::local::C<EXP_N, LOG_N, EXPD_N>::getInstance() [with long unsigned int EXP_N = 10u, long unsigned int LOG_N = 12u, long unsigned int EXPD_N = 11u]’
fmath.hpp:830:   instantiated from here
fmath.hpp:273: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::a’  of NULL object
fmath.hpp:273: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:281: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:281: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:283: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::b’  of NULL object
fmath.hpp:283: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:287: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::f1’  of NULL object
fmath.hpp:287: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:296: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:296: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:297: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::minX’  of NULL object
fmath.hpp:297: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp: In member function ‘void fmath::local::ExpCode::makeExpPs(const fmath::local::ExpVar<N>*, const Xbyak::util::Cpu&) [with long unsigned int N = 10u]’:
fmath.hpp:240:   instantiated from ‘fmath::local::ExpCode::ExpCode(const fmath::local::ExpVar<N>*) [with long unsigned int N = 10u]’
fmath.hpp:389:   instantiated from ‘static const fmath::local::ExpCode& fmath::local::C<EXP_N, LOG_N, EXPD_N>::getInstance() [with long unsigned int EXP_N = 10u, long unsigned int LOG_N = 12u, long unsigned int EXPD_N = 11u]’
fmath.hpp:830:   instantiated from here
fmath.hpp:330: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::i7fffffff’  of NULL object
fmath.hpp:330: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:331: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::a’  of NULL object
fmath.hpp:331: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:332: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::b’  of NULL object
fmath.hpp:332: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:333: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:333: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:335: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::i127s’  of NULL object
fmath.hpp:335: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:337: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::mask_s’  of NULL object
fmath.hpp:337: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:348: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:348: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:349: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::f1’  of NULL object
fmath.hpp:349: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:352: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:352: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:354: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:354: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:361: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:361: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:362: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:362: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:364: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:364: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:365: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:365: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:373: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:373: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:374: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::minX’  of NULL object
fmath.hpp:374: warning: (perhaps the ‘offsetof’ macro was used incorrectly)

Explanation about expd

Hi @herumi, 🙂

Could you comment in more detail how the expd function (on fmath.hpp) works? I have tried to understand the flow explained in https://github.com/herumi/fmath/blob/master/algo-ja.md, but it does not clear any of my doubts. 😅

Sorry for these questions, but I am not very familiar with unions and binary operations, so this expd function is kind of difficult to unfold for me. From what I have been able to understand, first you store the values of powers of two from 0 to 1 (2^0 <> 2^1) in a lookup table (ExpdVar c.tbl).

fmath/fmath.hpp

Lines 177 to 182 in 0a10069

 for (int i = 0; i < s; i++) { 

 di di; 

 di.d = ::pow(2.0, i * (1.0 / s)); 

 tbl[i] = di.i & mask64(52); 

 } 

 }

Then, this lookup table is used in the expd function. Let me know if this is correct.
However, I was not able to follow the rest of the operations. More specifically:

What is the purpose of the variable b = 3ULL << 51?
Why do you calculate di.d = x * c.a + b ?
What does the variable iax represent?
Should not the value of t always be zero? I suppose this has something to do with floating numbers, since the equation, with real numbers, should simplify to zero.
What does the variable u represent? This computation is quite hard to understand (to a newbie like me)😭
Finally, I suppose the value of y is the evaluation of a polynomial, but I do not know what it is exactly representing.
And the final two operations (binary OR and the product of y with di.d) also no idea.

fmath/fmath.hpp

Lines 474 to 484 in 0a10069

  const uint64_t b = 3ULL << 51; 

  di di; 

  di.d = x * c.a + b; 

  uint64_t iax = c.tbl[di.i & mask(c.sbit)]; 

  double t = (di.d - b) * c.ra - x; 

  uint64_t u = ((di.i + c.adj) >> c.sbit) << 52; 

  double y = (c.C3[0] - t) * (t * t) * c.C2[0] - t + c.C1[0]; 

  di.i = u | iax; 

  return y * di.d;

I would appreciate very much if you could comment the overall picture for computing expd, and if it is also possible 🙏, a more detailed breakdown of each line in expd.

Thank you very much for your time.

problem with fmath::expd on -711 and lesser values...

Hello,

I started to try fmath::expd this morning and the perf was amazing...
I used it in a bigger piece of code and this problem came up : for values less than -711, the fmath::expd return value becomes wrong...

Maybe my plateform is to blame :

Freebsd 8.2
gcc 4.2.1

cpuflags
OS : 'FreeBSD'
hw.model : 'Intel Core i5-2300 CPU @ 2.80GHz'
hw.machine : 'amd64'
hw.machine_arch : 'amd64'
cpu details :
CPU: Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (2793.68-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x206a7 Family = 6 Model = 2a Stepping = 7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x179ae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,,AESNI,XSAVE,>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1
TSC: P-state invariant
-msse3 -mfpmath=sse

You will find below the code I used to compare std::exp and fmath::exp...

./myTest -s 711 is OK

./myTest -f 711 is not

include <math.h>

include <memory.h>

include <stdio.h>

include <stdlib.h>

include <time.h>

include <emmintrin.h>

include "fmath.hpp"

include "xbyak/xbyak_util.h"

int main(int argc, char** argv)
{
if (argc!=3)
{
printf("usage : ./test -s(=std) -f(=fast) integer (=loop count)\n" );
exit(1);
}

int loopCount = atoi(argv[2]);

double* vals = (double*)malloc(loopCount*sizeof(double));

int sign = -1;

for (int i=0; i < loopCount;i++)
{

    vals[i] = sign*i;
}

double sum = 0;

if (argv[1][1] == 'f')
{
    printf("fast\n");

    for (int i=0; i < loopCount;i++)
    {
       sum += fmath::expd(vals[i]);
    }
}
else
{
    printf("standard\n");

    for (int i=0; i < loopCount;i++)
    {
       sum += exp(vals[i]);
    }
}
printf("sum is : %5.30f\n",sum);

}

Thanks a lot for your great work !

Gwen

Why is exp_pd commented out?

Could you comment on why the function exp_pd is commented out with a note saying "not fast"? It looks like the same thing you do for expd, but doing two at once. And it is pretty much exactly what you do in expd_v when __AVX2__ isn't enabled.

Furthermore, I tried out the existing exp_pd code and it seems to be significantly faster than doing two expd calls. And since I'm already writing code using __m128d variables, it is more convenient to not break out the two doubles, call expd twice, and then go back to __m128d.

Related to that, the code I'm working on is actually doing a non-integer power x^nu over an array of x values (as part of a longer calculation), which I'm implementing as exp(nu * log(x)). The single-precision version looks like this:

__m128 xtothenu = fmath::exp_ps(_mm_mul_ps(nu, fmath::log_ps(x)));

So to make the corresponding double-precision code, I wrote my own log_pd that just calls out to the standard library

inline __m128d log_pd(__m128d x)
{
    union { __m128d m; double d[2]; } logx;
    logx.d[0] = std::log(*reinterpret_cast<double*>(&x));
    logx.d[1] = std::log(*(reinterpret_cast<double*>(&x)+1));
    return logx.m;
}

If you don't think it's possible to do a real "2 at once" version of log_pd that would be any faster than this, then maybe worth putting this into the fmath repo just to make life easier for people doing SSE2 operations and want to take a log of their __m128d values.

there is no logd as claimed?

only fmath::log available ?

	for (int i = 0; i < s; i++) {
	di di;
	di.d = ::pow(2.0, i * (1.0 / s));
	tbl[i] = di.i & mask64(52);
	}
	}

	const uint64_t b = 3ULL << 51;
	di di;
	di.d = x * c.a + b;
	uint64_t iax = c.tbl[di.i & mask(c.sbit)];

	double t = (di.d - b) * c.ra - x;
	uint64_t u = ((di.i + c.adj) >> c.sbit) << 52;
	double y = (c.C3[0] - t) * (t * t) * c.C2[0] - t + c.C1[0];

	di.i = u \| iax;
	return y * di.d;