Git Product home page Git Product logo

fmath's Introduction

fast approximate function of exponential function exp and log

How to use

include fmath.hpp and use fmath::log, fmath::exp, fmath::expd.

fmath::PowGenerator is a class to generate a function to compute pow(x, y) of x >= 0 for a given fixed y > 0.

eg. fmath::PowGenerator f(1.234); f.get(x) returns pow(x, 1.234);

Prototype of function

  • float fmath::exp(float);
  • float fmath::log(float);
  • double fmath::logd(double);
  • __m128 fmath::exp_ps(__m128);
  • __m128 fmath::log_ps(__m128);
  • void fmath::expv_d(double *p, size_t n); // for double p[n];

for AVX-512

fmath.h provides the following functions:

void fmath_expf_avx512(float *dst, const float *src, size_t n);
void fmath_logf_avx512(float *dst, const float *src, size_t n);
void fmath::expf_v(float *dst, const float *src, size_t n);
void fmath::logf_v(float *dst, const float *src, size_t n);

Experimental

If you install xbyak and define FMATH_USE_XBYAK before including fmath.hpp, then fmath::exp() and fmath::exp_ps() will be about 10~20 % faster. Xbyak version uses SSE4.1 if available.

AVX version of fmath::exp is experimental

Remark

gcc puts warnings such as "dereferencing type-punned pointer will break strict-aliasing rules." It is no problem. Please change #if 1 in fmath.hpp:423 if you worry about it. But it causes a little slower.

-ffast-math option of gcc may generate bad code for fmath::expd.

License

History

  • 2022/May/30 log for AVX-512 got 1.5 times faster
  • 2020/Jul/10 add expf_v and logf_v for AVX-512
  • 2012/Oct/30 fix fmath::expd for small value
  • 2011/Aug/26 add fmath::expd_v
  • 2011/Mar/25 exp supports AVX
  • 2011/Mar/25 exp, exp_ps support avx
  • 2010/Feb/16 add fmath::exp_ps, log_ps and optimize functions
  • 2010/Jan/10 add fmath::PowGenerator
  • 2009/Dec/28 add fmath::log()
  • 2009/Dec/09 support cygwin
  • 2009/Dec/08 first version

Author

MITSUNARI Shigeo([email protected]) http://herumi.in.coocan.jp/

Benchmark

compiler

  • Visual Studio 2010RC
  • icc 11.1
  • gcc 4.3.2 on cygwin
  • gcc 4.4.1 on 64bit Linux

option

  • cl(icl):

/Ox /Ob2 /GS- /Zi /D_SECURE_SCL=0 /MD /Oy /arch:SSE2 /fp:fast /DNOMINMAX

  • gcc:

-O3 -fomit-frame-pointer -DNDEBUG -fno-operator-names -msse2 -mfpmath=sse -march=native

see fastexp.cpp

fmath's People

Contributors

80c535 avatar herumi avatar jschueller avatar syohex avatar t-suzuki avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fmath's Issues

Compiling without xbyak

I had trouble compiling the code because I do not have xbyak installed. Even though the README states that this is OK, headers for xbyak are imported on line 14 of bench.cpp and line 183 of fastexp.cpp. Both of these threw errors for me. I was able to compile fmath by not compiling bench.cpp at all, and then commenting out every line that involved xbyak and it's objects (e.g. every line that involved the clk object).

So either the README should be updated to reflect a non-optional dependence on xbyak, or the use of xbyak should be made optional. Thanks!

Second argument to expd_v

I'm getting compiler errors related to calls to expd_v. It's not clear to me what I'm doing wrong, and I'm wondering whether the documentation for exp_v can be improved to prevent people from getting these errors in the future.

Here is the function of mine (get_fourth_term) that uses expd_v:

double get_fourth_term(const std::vector<double> & spike_times, const double t,
                       const double tau, const double tau_1,
                       const double tau_2, const double I_syn_bar,
                       const double C, const double N, const double t_zero)
{

  // Compute values that don't depend on spike_times.
  const double first_other = 1 - exp(t / tau_1 - t / tau);
  const double second_other = 1 - exp(t / tau_2 - t / tau);

  // t is relative to the most recent spike in the network, but spike_times are
  // absolute -- therefore, make them relative to the most recent spike (which
  // occurred at absolute time t_zero).
  const int n_spikes = spike_times.size();
  double first_num[n_spikes];
  double second_num[n_spikes];
  for (int i = 0; i < n_spikes; i += 1)
    {
      first_num[i] = -(t - (spike_times[i] - t_zero)) / tau_1;
      second_num[i] = -(t - (spike_times[i] - t_zero)) / tau_2;
    }

  fmath::expd_v(first_num, sizeof(first_num));
  const double first_denom = 1 / tau - 1 / tau_1;

  fmath::expd_v(second_num, sizeof(second_num));
  const double second_denom = 1 / tau - 1 / tau_2;

  double sum_term = 0;
  for (int i = 0; i < n_spikes; i++)
    {
      const double first_term = first_num[i] / first_denom * first_other;
      const double second_term = second_num[i] / second_denom * second_other;
      sum_term += first_term - second_term;
    }

  return I_syn_bar / (C * N * (tau_1 - tau_2)) * sum_term;

}

These are the compiler errors I get

(master)dbliss@nx3[dopa_net]> g++ -O0 -g hansel.cpp -o hansel.o
In file included from hansel.cpp:7:
fmath.hpp: In member function ‘void fmath::local::ExpCode::makeExp(const fmath::local::ExpVar<N>*, const Xbyak::util::Cpu&) [with long unsigned int N = 10u]’:
fmath.hpp:236:   instantiated from ‘fmath::local::ExpCode::ExpCode(const fmath::local::ExpVar<N>*) [with long unsigned int N = 10u]’
fmath.hpp:389:   instantiated from ‘static const fmath::local::ExpCode& fmath::local::C<EXP_N, LOG_N, EXPD_N>::getInstance() [with long unsigned int EXP_N = 10u, long unsigned int LOG_N = 12u, long unsigned int EXPD_N = 11u]’
fmath.hpp:830:   instantiated from here
fmath.hpp:273: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::a’  of NULL object
fmath.hpp:273: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:281: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:281: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:283: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::b’  of NULL object
fmath.hpp:283: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:287: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::f1’  of NULL object
fmath.hpp:287: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:296: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:296: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:297: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::minX’  of NULL object
fmath.hpp:297: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp: In member function ‘void fmath::local::ExpCode::makeExpPs(const fmath::local::ExpVar<N>*, const Xbyak::util::Cpu&) [with long unsigned int N = 10u]’:
fmath.hpp:240:   instantiated from ‘fmath::local::ExpCode::ExpCode(const fmath::local::ExpVar<N>*) [with long unsigned int N = 10u]’
fmath.hpp:389:   instantiated from ‘static const fmath::local::ExpCode& fmath::local::C<EXP_N, LOG_N, EXPD_N>::getInstance() [with long unsigned int EXP_N = 10u, long unsigned int LOG_N = 12u, long unsigned int EXPD_N = 11u]’
fmath.hpp:830:   instantiated from here
fmath.hpp:330: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::i7fffffff’  of NULL object
fmath.hpp:330: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:331: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::a’  of NULL object
fmath.hpp:331: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:332: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::b’  of NULL object
fmath.hpp:332: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:333: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:333: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:335: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::i127s’  of NULL object
fmath.hpp:335: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:337: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::mask_s’  of NULL object
fmath.hpp:337: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:348: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:348: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:349: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::f1’  of NULL object
fmath.hpp:349: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:352: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:352: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:354: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:354: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:361: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:361: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:362: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:362: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:364: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:364: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:365: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::tbl’  of NULL object
fmath.hpp:365: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:373: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::maxX’  of NULL object
fmath.hpp:373: warning: (perhaps the ‘offsetof’ macro was used incorrectly)
fmath.hpp:374: warning: invalid access to non-static data member ‘fmath::local::ExpVar<10u>::minX’  of NULL object
fmath.hpp:374: warning: (perhaps the ‘offsetof’ macro was used incorrectly)

Explanation about expd

Hi @herumi, 🙂

Could you comment in more detail how the expd function (on fmath.hpp) works? I have tried to understand the flow explained in https://github.com/herumi/fmath/blob/master/algo-ja.md, but it does not clear any of my doubts. 😅

Sorry for these questions, but I am not very familiar with unions and binary operations, so this expd function is kind of difficult to unfold for me. From what I have been able to understand, first you store the values of powers of two from 0 to 1 (2^0 <> 2^1) in a lookup table (ExpdVar c.tbl).

fmath/fmath.hpp

Lines 177 to 182 in 0a10069

for (int i = 0; i < s; i++) {
di di;
di.d = ::pow(2.0, i * (1.0 / s));
tbl[i] = di.i & mask64(52);
}
}

Then, this lookup table is used in the expd function. Let me know if this is correct.
However, I was not able to follow the rest of the operations. More specifically:

  • What is the purpose of the variable b = 3ULL << 51?
  • Why do you calculate di.d = x * c.a + b ?
  • What does the variable iax represent?
  • Should not the value of t always be zero? I suppose this has something to do with floating numbers, since the equation, with real numbers, should simplify to zero.
  • What does the variable u represent? This computation is quite hard to understand (to a newbie like me)😭
  • Finally, I suppose the value of y is the evaluation of a polynomial, but I do not know what it is exactly representing.
  • And the final two operations (binary OR and the product of y with di.d) also no idea.

fmath/fmath.hpp

Lines 474 to 484 in 0a10069

const uint64_t b = 3ULL << 51;
di di;
di.d = x * c.a + b;
uint64_t iax = c.tbl[di.i & mask(c.sbit)];
double t = (di.d - b) * c.ra - x;
uint64_t u = ((di.i + c.adj) >> c.sbit) << 52;
double y = (c.C3[0] - t) * (t * t) * c.C2[0] - t + c.C1[0];
di.i = u | iax;
return y * di.d;

I would appreciate very much if you could comment the overall picture for computing expd, and if it is also possible 🙏, a more detailed breakdown of each line in expd.

Thank you very much for your time.

problem with fmath::expd on -711 and lesser values...

Hello,

I started to try fmath::expd this morning and the perf was amazing...
I used it in a bigger piece of code and this problem came up : for values less than -711, the fmath::expd return value becomes wrong...

Maybe my plateform is to blame :

Freebsd 8.2
gcc 4.2.1

cpuflags
OS : 'FreeBSD'
hw.model : 'Intel Core i5-2300 CPU @ 2.80GHz'
hw.machine : 'amd64'
hw.machine_arch : 'amd64'
cpu details :
CPU: Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (2793.68-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x206a7 Family = 6 Model = 2a Stepping = 7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x179ae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,,AESNI,XSAVE,>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1
TSC: P-state invariant
-msse3 -mfpmath=sse

You will find below the code I used to compare std::exp and fmath::exp...

./myTest -s 711 is OK

./myTest -f 711 is not


include <math.h>

include <memory.h>

include <stdio.h>

include <stdlib.h>

include <time.h>

include <emmintrin.h>

include "fmath.hpp"

include "xbyak/xbyak_util.h"

int main(int argc, char** argv)
{
if (argc!=3)
{
printf("usage : ./test -s(=std) -f(=fast) integer (=loop count)\n" );
exit(1);
}

int loopCount = atoi(argv[2]);

double* vals = (double*)malloc(loopCount*sizeof(double));

int sign = -1;

for (int i=0; i < loopCount;i++)
{

    vals[i] = sign*i;
}

double sum = 0;

if (argv[1][1] == 'f')
{
    printf("fast\n");

    for (int i=0; i < loopCount;i++)
    {
       sum += fmath::expd(vals[i]);
    }
}
else
{
    printf("standard\n");

    for (int i=0; i < loopCount;i++)
    {
       sum += exp(vals[i]);
    }
}
printf("sum is : %5.30f\n",sum);

}

Thanks a lot for your great work !

Gwen

Why is exp_pd commented out?

Could you comment on why the function exp_pd is commented out with a note saying "not fast"? It looks like the same thing you do for expd, but doing two at once. And it is pretty much exactly what you do in expd_v when __AVX2__ isn't enabled.

Furthermore, I tried out the existing exp_pd code and it seems to be significantly faster than doing two expd calls. And since I'm already writing code using __m128d variables, it is more convenient to not break out the two doubles, call expd twice, and then go back to __m128d.

Related to that, the code I'm working on is actually doing a non-integer power x^nu over an array of x values (as part of a longer calculation), which I'm implementing as exp(nu * log(x)). The single-precision version looks like this:

__m128 xtothenu = fmath::exp_ps(_mm_mul_ps(nu, fmath::log_ps(x)));

So to make the corresponding double-precision code, I wrote my own log_pd that just calls out to the standard library

inline __m128d log_pd(__m128d x)
{
    union { __m128d m; double d[2]; } logx;
    logx.d[0] = std::log(*reinterpret_cast<double*>(&x));
    logx.d[1] = std::log(*(reinterpret_cast<double*>(&x)+1));
    return logx.m;
}

If you don't think it's possible to do a real "2 at once" version of log_pd that would be any faster than this, then maybe worth putting this into the fmath repo just to make life easier for people doing SSE2 operations and want to take a log of their __m128d values.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.