
blasfeo's People

Contributors

bnovoselnik, freyjo, giaf, imciner2, jgillis, lvanroye, nielsvd, omersahintas, pkourouklidis, reichardtj, roversch, stefanct, tmmsartor, wdecre, zanellia

blasfeo's Issues

SIGSEGV using hpipm

Hi guys,

I hope this is the right place for this issue.
I have been struggling with hpipm for a couple of days. The example runs perfectly fine, but
when I use hpipm for my particular OCP I get the following error:

Program received signal SIGSEGV, Segmentation fault.
0x00005555558aaf87 in d_ocp_qp_init_var ()

Does anyone know how to deal with that? I appreciate any help.

Thanks a lot.
Best, Miri

blasfeo_ddot: suggestion for improvement

In the 'reduce' step of blasfeo_ddot, a horizontal add _mm_hadd_pd is computed. Instead, one could replace

u_tmp = _mm_hadd_pd(u_tmp, u_tmp);

with

__m128d hi64 = _mm_unpackhi_pd(u_tmp, u_tmp);
u_tmp = _mm_add_sd(u_tmp, hi64);

effectively trading a packed double operation for a scalar one.
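
A minimal sketch of the suggested reduction, written as a standalone helper rather than as the actual blasfeo_ddot kernel. The name reduce_pair and the scalar fallback for builds without SSE2 are assumptions made for illustration only.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_loadu_pd, _mm_unpackhi_pd, _mm_add_sd */
#endif

/* Hypothetical helper: sums the two doubles v[0] and v[1], mimicking the
 * final 'reduce' step on a 128-bit accumulator. */
double reduce_pair(const double *v)
{
    assert(v != 0);
#if defined(__SSE2__)
    __m128d u_tmp = _mm_loadu_pd(v);              /* [v0, v1] */
    __m128d hi64 = _mm_unpackhi_pd(u_tmp, u_tmp); /* [v1, v1] */
    u_tmp = _mm_add_sd(u_tmp, hi64);              /* low lane holds v0 + v1 */
    return _mm_cvtsd_f64(u_tmp);                  /* extract the low lane */
#else
    return v[0] + v[1]; /* scalar fallback on targets without SSE2 */
#endif
}
```

Only the low lane of the result is meaningful here, which is all the reduce step needs. This variant also needs only SSE2, whereas _mm_hadd_pd requires SSE3.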

wrong results using blasfeo_dgemv_t - HP and REFERENCE

Hey guys,

I found a bug in blasfeo_dgemv_t:
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
Running the following example, I get wrong results using both HIGH PERFORMANCE and REFERENCE:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 9;
    int nu = 2;

    struct blasfeo_dmat A;
    struct blasfeo_dvec lambda;
    struct blasfeo_dvec lambda0;

    blasfeo_allocate_dmat(nx, nx, &A); //    
    blasfeo_allocate_dvec(nx+nu, &lambda);
    blasfeo_allocate_dvec(nx+nu, &lambda0);

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
    }
    blasfeo_dgese(nx, nx, 1.0, &A, 0, 0);

    blasfeo_print_dmat(nx, nx, &A, 0, 0);
    printf("lambda = \n");
    blasfeo_print_dvec(nx+nu, &lambda, 0);    

    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 0.0, &lambda0, 0, &lambda, 0); // recheck!
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+  nu, &lambda,0);

    return 0;
}

REFERENCE prints:

lambda: result =
3.600000e+01
3.600000e+01
1.070000e+02
1.070000e+02
3.160000e+02
3.160000e+02
9.390000e+02
9.390000e+02
2.804000e+03
9.000000e+00
1.000000e+01

whereas HP prints:

lambda: result =
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
2.960000e+02
9.000000e+00
1.000000e+01

The correct result should be:

lambda =

    36
    36
    36
    36
    36
    36
    36
    36
    36
     9
    10

I hope this helps fixing stuff!
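
One possible reading of the mismatch, offered as an assumption rather than a confirmed diagnosis: the call above passes &lambda both as the input x and as the output z. The plain-C sketch below (not BLASFEO code; both function names are made up) shows how a naive transposed matrix-vector product goes wrong when the output overwrites the input, and how a temporary avoids it.

```c
#include <assert.h>
#include <string.h>

/* Naive column-major z = A^T * x for an m-by-n matrix A with leading
 * dimension lda. Illustration only; this is not BLASFEO code. */
void naive_gemv_t(int m, int n, const double *A, int lda,
                  const double *x, double *z)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int i = 0; i < m; i++)
            sum += A[i + j * lda] * x[i];
        z[j] = sum; /* if z aliases x, later columns read this new value */
    }
}

/* In-place variant for square A: compute into a temporary, then copy back.
 * This sketch supports n up to 64. */
void naive_gemv_t_inplace(int n, const double *A, int lda, double *x)
{
    double tmp[64];
    assert(n >= 0 && n <= 64);
    naive_gemv_t(n, n, A, lda, x, tmp);
    memcpy(x, tmp, (size_t)n * sizeof(double));
}
```

With a 9x9 matrix of ones and x = 0, 1, ..., 8, the in-place variant yields 36 in every entry, while calling naive_gemv_t with z == x does not. The exact wrong values depend on the order in which a given implementation reads and writes, so they need not match either listing above.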

Testing against other blas version

Hello,
I don't see how to test against other libraries such as OpenBLAS. Where should I add the corresponding references, for example in CMake?

Best regards

Error with avx2 when compiling for Intel Haswell architecture

Hi,

I tried compiling BLASFEO with Intel Haswell as the target architecture, but it threw an error concerning avx2. Here is the log. It compiles for other Intel architectures, but my processor is in the Haswell family.

Thank you very much!

[ggleizer@localhost blasfeo]$ make
rm -f libblasfeo.a
make -C auxiliary clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
rm -f *.o
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make -C kernel clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
make -C sse3 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
make -C fma clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C blas clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/blas'
rm -f *.o
rm -f *.s
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/blas'
make -C test_problems clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/test_problems'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/test_problems'
make -C examples clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/examples'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/examples'
touch ./include/blasfeo_target.h
echo "#ifndef TARGET_X64_INTEL_HASWELL" > ./include/blasfeo_target.h
echo "#define TARGET_X64_INTEL_HASWELL" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
echo "#ifndef LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#define LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
( cd auxiliary; make obj)
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
gcc -O2 -fPIC -m64 -mavx2 -mfma -DTARGET_X64_INTEL_HASWELL -DLA_HIGH_PERFORMANCE -DOS_LINUX -DREF_BLAS_OPENBLAS -I/opt/openblas/include -c -o d_aux_lib4.o d_aux_lib4.c
cc1: error: unrecognized command line option "-mavx2"
make[1]: *** [d_aux_lib4.o] Error 1
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make: *** [static_library] Error 2

Stable branch

Shall we have a stable branch? We could make one based on the commit currently used in acados. Or we could have master and develop.

Incorrect answer for NT mul 13x13, Haswell

Look at the very last element, C[13,13] if we're using 1-based indexing.

julia> M = K = N = 13;

julia> A = rand(M, K); B = rand(K, N); C1 = Matrix{Float64}(undef, M, N);

julia> gemmfeo!(C1, A, B'); C1 # BLASFEO
13×13 Matrix{Float64}:
 3.21111  3.63236  3.60901  3.42567  4.02387  4.42455  3.78604  4.14247  3.90693  3.75106  4.28329  4.65109  3.62135
 2.98562  2.27237  2.43335  3.50058  4.19308  3.65243  3.37025  3.83939  3.60735  3.11551  3.39936  3.44949  2.92242
 2.66924  3.08483  3.67439  3.24781  4.3147   4.44565  3.74392  4.17439  4.14373  3.90033  3.50683  3.26632  3.22764
 3.30279  3.12434  2.83558  3.38436  4.23039  4.41239  3.35741  3.79297  3.58183  3.29104  3.45312  3.61621  3.21767
 2.77241  2.80838  2.23948  3.82555  4.77529  4.13492  3.73856  3.94799  3.69966  3.33374  4.02254  3.63971  3.11314
 2.59138  2.99149  3.29702  2.77667  3.9835   4.59488  3.24307  3.36162  3.14416  3.48404  2.93357  3.27987  3.48901
 2.19375  3.11538  2.99718  3.37941  5.23863  4.17066  3.73315  4.428    4.78703  3.46704  3.72306  3.87886  2.96676
 1.79388  1.9374   1.60787  2.40525  3.53875  2.73698  2.49986  3.09528  2.78403  2.29148  2.9501   2.66847  2.19448
 2.92898  2.96992  2.81447  3.48228  4.17282  4.01132  3.78475  4.21871  4.142    3.84598  4.13346  4.17309  3.47024
 2.21216  2.78679  2.60065  2.4549   3.28243  3.42619  3.12983  3.94333  3.58097  3.43396  3.64217  3.66227  3.1028
 2.08495  2.19327  2.42211  2.86349  3.62425  3.05892  2.78832  3.50663  3.28341  2.83864  3.43406  3.0449   2.58316
 1.91349  2.09611  1.71962  2.35067  2.85976  2.23329  2.3354   3.00702  3.23756  2.17555  2.76727  2.70094  1.74628
 3.16237  3.32345  3.8718   3.80725  4.90455  4.48073  4.10663  4.68758  4.42813  4.03683  4.49546  4.10844  1.88106

julia> A * B'
13×13 Matrix{Float64}:
 3.21111  3.63236  3.60901  3.42567  4.02387  4.42455  3.78604  4.14247  3.90693  3.75106  4.28329  4.65109  3.62135
 2.98562  2.27237  2.43335  3.50058  4.19308  3.65243  3.37025  3.83939  3.60735  3.11551  3.39936  3.44949  2.92242
 2.66924  3.08483  3.67439  3.24781  4.3147   4.44565  3.74392  4.17439  4.14373  3.90033  3.50683  3.26632  3.22764
 3.30279  3.12434  2.83558  3.38436  4.23039  4.41239  3.35741  3.79297  3.58183  3.29104  3.45312  3.61621  3.21767
 2.77241  2.80838  2.23948  3.82555  4.77529  4.13492  3.73856  3.94799  3.69966  3.33374  4.02254  3.63971  3.11314
 2.59138  2.99149  3.29702  2.77667  3.9835   4.59488  3.24307  3.36162  3.14416  3.48404  2.93357  3.27987  3.48901
 2.19375  3.11538  2.99718  3.37941  5.23863  4.17066  3.73315  4.428    4.78703  3.46704  3.72306  3.87886  2.96676
 1.79388  1.9374   1.60787  2.40525  3.53875  2.73698  2.49986  3.09528  2.78403  2.29148  2.9501   2.66847  2.19448
 2.92898  2.96992  2.81447  3.48228  4.17282  4.01132  3.78475  4.21871  4.142    3.84598  4.13346  4.17309  3.47024
 2.21216  2.78679  2.60065  2.4549   3.28243  3.42619  3.12983  3.94333  3.58097  3.43396  3.64217  3.66227  3.1028
 2.08495  2.19327  2.42211  2.86349  3.62425  3.05892  2.78832  3.50663  3.28341  2.83864  3.43406  3.0449   2.58316
 1.91349  2.09611  1.71962  2.35067  2.85976  2.23329  2.3354   3.00702  3.23756  2.17555  2.76727  2.70094  1.74628
 3.16237  3.32345  3.8718   3.80725  4.90455  4.48073  4.10663  4.68758  4.42813  4.03683  4.49546  4.10844  3.44541

julia> C1 .== A * B'
13×13 BitMatrix:
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  0

julia> A[end,:]' * B[end,:]
3.4454147532268555

This is on the latest master, using wrapper code from here to call the gemm routines from Julia.
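
For an independent check from C, a naive reference for the NT product can be sketched as below. This is plain column-major code written for illustration; ref_gemm_nt is a made-up name and not part of BLASFEO or of the Julia wrapper.

```c
#include <assert.h>

/* Naive column-major reference for C = A * B^T,
 * with A of size m-by-k and B of size n-by-k. */
void ref_gemm_nt(int m, int n, int k, const double *A, int lda,
                 const double *B, int ldb, double *C, int ldc)
{
    assert(lda >= m && ldb >= n && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += A[i + l * lda] * B[j + l * ldb];
            C[i + j * ldc] = sum; /* entry (i, j) = row i of A dot row j of B */
        }
    }
}
```

The last entry for M = K = N = 13 is then the dot product of the last row of A with the last row of B, which is the quantity the report singles out.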

Windows Support

It seems that Windows support is implied, yet I couldn't find any explicit documentation about it on the web site (for instance, in the Installation section of https://blasfeo.syscop.de).

Are there any official instructions for building static and shared libraries of BLASFEO under Windows?

bug in blasfeo_dgemm_nn

Hey guys,

I found another bug in blasfeo_dgemm_nn.
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
BLASFEO_VERSION = HIGH_PERFORMANCE gives the wrong results, whereas REFERENCE gives the correct result.
I wrote the following example and hope it helps with fixing it.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 8;
    int nu = 2;
    int nZ = 4;
    int nK1 = nx * 4;
    struct blasfeo_dmat A;
    struct blasfeo_dmat B;
    struct blasfeo_dmat C;
    struct blasfeo_dmat result;

    double *some_doubles;
    some_doubles = (double*) calloc(10, sizeof(double));
    for (int ii = 0; ii < 10; ii++) {
        some_doubles[ii] = (double) ii;
    }
    blasfeo_allocate_dmat(nx, nx+nu, &B); //    
    blasfeo_allocate_dmat(nK1, nx, &A); //    
    blasfeo_allocate_dmat(nK1, nu, &result); //
    blasfeo_allocate_dmat(nK1, nu, &C); //   

    for (int ii = 0; ii < nx; ii++) {
        blasfeo_pack_dmat(1,10, &some_doubles[0], 1, &B, ii,0);
    }
    blasfeo_dgese(nK1, nx, 1.0, &A, 0, 0);

    printf("A = \n");
    blasfeo_print_dmat(nK1,  nx, &A, 0, 0);
    printf("B_multiplication = \n");
    blasfeo_print_dmat(nx, nu, &B, 0, nx);    
    blasfeo_dgemm_nn(nK1, nu,  nx, -1.0, &A, 0, 0, &B, 0, nx, 1.0, &C, 0, 0, &result , 0, 0); // Blasfeo HP & Reference differ here

    printf("A * B_multiplication: result = \n");
    blasfeo_print_exp_dmat(nK1,  nu, &result,0,0);

    return 0;
}

blasfeo_d_blas.h docu

I just got really confused by this:

// y = y + alpha*x
void daxpy_libstr(int kmax, double alpha, struct d_strvec *sx, int xi, struct d_strvec *sy, int yi, struct d_strvec *sz, int zi);

I wonder what z is used for. I guess the comment should be z = y + alpha*x.
There are some similar comments in this .h file.
Fixing these comments would make blasfeo easier to use for beginners like me, I guess.
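
Assuming the guess above is right and the routine computes z = y + alpha*x, the three-operand form can be sketched with plain arrays as follows. The function axpy_sketch is hypothetical and only illustrates the semantics; it is not the library routine.

```c
#include <assert.h>

/* Plain-array sketch of the three-operand form z = y + alpha*x.
 * x and y are only read; z receives the result. */
void axpy_sketch(int kmax, double alpha, const double *x,
                 const double *y, double *z)
{
    assert(kmax >= 0);
    for (int i = 0; i < kmax; i++)
        z[i] = y[i] + alpha * x[i];
}
```

In this sketch, passing the same array for y and z gives the classic in-place update y = y + alpha*x, which may be where the original comment came from.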

[ct-v2] s_aux_lib8.c: undefined reference to `kernel_sgead_8_7_gen_lib8'

Hi all,

When I tried to compile the branch with the ct-v2 tag, the build failed with the following output:

[ 92%] Building C object test_problems/CMakeFiles/s_blas.dir/test_blas_s.c.o
[ 95%] Linking C executable s_blas
../libblasfeo.a(s_aux_lib8.c.o): In function `sgead_libstr':
s_aux_lib8.c:(.text+0x4c48): undefined reference to `kernel_sgead_8_7_gen_lib8'
s_aux_lib8.c:(.text+0x4da2): undefined reference to `kernel_sgead_8_7_lib8'
s_aux_lib8.c:(.text+0x4e64): undefined reference to `kernel_sgead_8_0_lib8'
s_aux_lib8.c:(.text+0x4ebb): undefined reference to `kernel_sgead_8_0_gen_lib8'
s_aux_lib8.c:(.text+0x4f4a): undefined reference to `kernel_sgead_8_3_lib8'
s_aux_lib8.c:(.text+0x4fb2): undefined reference to `kernel_sgead_8_3_gen_lib8'
s_aux_lib8.c:(.text+0x4ff3): undefined reference to `kernel_sgead_8_0_gen_lib8'
s_aux_lib8.c:(.text+0x50aa): undefined reference to `kernel_sgead_8_1_lib8'
s_aux_lib8.c:(.text+0x5112): undefined reference to `kernel_sgead_8_1_gen_lib8'
s_aux_lib8.c:(.text+0x519b): undefined reference to `kernel_sgead_8_2_lib8'
s_aux_lib8.c:(.text+0x5201): undefined reference to `kernel_sgead_8_2_gen_lib8'
s_aux_lib8.c:(.text+0x523e): undefined reference to `kernel_sgead_8_1_gen_lib8'
s_aux_lib8.c:(.text+0x526c): undefined reference to `kernel_sgead_8_7_gen_lib8'
s_aux_lib8.c:(.text+0x5302): undefined reference to `kernel_sgead_8_4_lib8'
s_aux_lib8.c:(.text+0x536a): undefined reference to `kernel_sgead_8_4_gen_lib8'
s_aux_lib8.c:(.text+0x5402): undefined reference to `kernel_sgead_8_5_lib8'
s_aux_lib8.c:(.text+0x546a): undefined reference to `kernel_sgead_8_5_gen_lib8'
s_aux_lib8.c:(.text+0x54a6): undefined reference to `kernel_sgead_8_2_gen_lib8'
s_aux_lib8.c:(.text+0x5552): undefined reference to `kernel_sgead_8_6_lib8'
s_aux_lib8.c:(.text+0x55ba): undefined reference to `kernel_sgead_8_6_gen_lib8'
s_aux_lib8.c:(.text+0x55f6): undefined reference to `kernel_sgead_8_3_gen_lib8'
s_aux_lib8.c:(.text+0x564e): undefined reference to `kernel_sgead_8_4_gen_lib8'
s_aux_lib8.c:(.text+0x56b9): undefined reference to `kernel_sgead_8_5_gen_lib8'
s_aux_lib8.c:(.text+0x56f7): undefined reference to `kernel_sgead_8_6_gen_lib8'
collect2: error: ld returned 1 exit status
test_problems/CMakeFiles/s_blas.dir/build.make:95: recipe for target 'test_problems/s_blas' failed
make[2]: *** [test_problems/s_blas] Error 1
CMakeFiles/Makefile2:124: recipe for target 'test_problems/CMakeFiles/s_blas.dir/all' failed
make[1]: *** [test_problems/CMakeFiles/s_blas.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

My system is 64-bit Ubuntu 16.04, and I compile simply with cmake and make.

I then tried to compile the master branch, and it works without any problem.

Could anyone give me a hint?

Thank you in advance!
Best,
Kahn

Argument aliasing check

As pointed out in #27, aliasing of routine arguments can lead to unexpected and unwanted behavior.
To mitigate this problem I propose two incremental solutions:

  • Specify this requirement more clearly in the documentation
  • Add a compilation flag with which one can enable some kind of checks for argument aliasing on all routines that do not allow them.
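
A rough sketch of what the second option could look like, under stated assumptions: the flag name CHECK_ALIASING_SKETCH, the macro ASSERT_NO_ALIAS and the helper ranges_overlap are all invented here, and a real check would have to derive the byte ranges from the matrix and vector structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the non-empty byte ranges [a, a+na) and [b, b+nb) overlap,
 * else 0. Addresses are compared as integers to avoid relational
 * comparison of pointers into different objects. */
int ranges_overlap(const void *a, size_t na, const void *b, size_t nb)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    return a0 < b0 + nb && b0 < a0 + na;
}

/* Hypothetical opt-in check, enabled by compiling with
 * -DCHECK_ALIASING_SKETCH; otherwise it expands to nothing. */
#ifdef CHECK_ALIASING_SKETCH
#define ASSERT_NO_ALIAS(a, na, b, nb) \
    assert(!ranges_overlap((a), (na), (b), (nb)))
#else
#define ASSERT_NO_ALIAS(a, na, b, nb) ((void)0)
#endif
```

Keeping the check behind a compile-time flag means release builds pay nothing for it, at the cost of needing a separate debug build to catch aliasing mistakes.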

How to compile armv7 library on linux?

I'm a beginner.
I changed the target to armv7 in Makefile.rule and ran make,
but an error occurred:
gcc: error: unrecognized command line option '-mfpu=neon-vfpv4'
What do I have to do?
Thanks in advance.

Downgrade default target

Hi,

I have a reasonably new Laptop with the following CPU:
Intel® Core™ i7-3520M CPU @ 2.90GHz × 4
and Ubuntu 18.04.

Unfortunately, the new default target (X64_INTEL_HASWELL) does not work for me.
I think it would be good to downgrade the default target to SANDY_BRIDGE or GENERIC, in order to reduce the installation effort of HPIPM.
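
To illustrate how a target could be chosen from the CPU's capabilities instead of a fixed default, here is a hedged sketch. The feature-to-target mapping is an assumption made for this example, all names are invented, and the detection relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which are only used on x86.

```c
#include <assert.h>

/* Hypothetical enumeration of the three targets named in this issue. */
enum target_sketch {
    TARGET_SKETCH_GENERIC,
    TARGET_SKETCH_SANDY_BRIDGE,
    TARGET_SKETCH_HASWELL
};

/* Assumed mapping: AVX2 and FMA select HASWELL, AVX alone selects
 * SANDY_BRIDGE, anything else falls back to GENERIC. */
enum target_sketch pick_target(int has_avx, int has_avx2, int has_fma)
{
    if (has_avx2 && has_fma)
        return TARGET_SKETCH_HASWELL;
    if (has_avx)
        return TARGET_SKETCH_SANDY_BRIDGE;
    return TARGET_SKETCH_GENERIC;
}

/* Queries the running CPU on x86 with GCC or Clang; elsewhere GENERIC. */
enum target_sketch detect_target(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return pick_target(__builtin_cpu_supports("avx"),
                       __builtin_cpu_supports("avx2"),
                       __builtin_cpu_supports("fma"));
#else
    return TARGET_SKETCH_GENERIC;
#endif
}

/* Maps the enumeration back to the target names used in the build files. */
const char *target_name(enum target_sketch t)
{
    switch (t) {
    case TARGET_SKETCH_HASWELL:      return "X64_INTEL_HASWELL";
    case TARGET_SKETCH_SANDY_BRIDGE: return "X64_INTEL_SANDY_BRIDGE";
    default:                         return "GENERIC";
    }
}
```

The i7-3520M is an Ivy Bridge part that supports AVX but not AVX2, so under this mapping it would end up on the SANDY_BRIDGE target.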

Performance Comparison to Intel MKL (MATLAB)

Seeing the benchmarks of BLASFEO and how it beats Intel MKL on small matrices made me want to create a MATLAB MEX wrapper for it to speed up small-matrix calculations.

The logic was: since BLASFEO beats Intel MKL in tests with no overhead, with MEX I'd beat MATLAB by a lot, since MATLAB only adds overhead on top of MKL and doesn't use MKL_DIRECT_CALL.
All reasons to be optimistic.

I implemented a MEX wrapper around blasfeo_dgemm() and validated it against MATLAB (the error is negligible).

Then I did a run time analysis:

[Figure: run time analysis plot]

Now, the BLASFEO MEX works in place (namely, it receives a pre-allocated matrix to write the result into) while MATLAB has to use its regular API (it allocates the output, with overhead on the input).
Yet MATLAB is still much faster than BLASFEO compiled with the AVX2 code path.

MATLAB does use multithreading (I don't know the threshold, but it does, as I can see on the CPU utilization graph). But even for very small matrices (sizes 2:10) MATLAB beats BLASFEO.

I think that in order to validate the results we need to use the multithreaded version of MKL in the benchmarks.

This is the analysis MATLAB file: RunTimeAnalysis.zip.

Generic implementation functions wrong number of arguments

When manually compiling the generic kernel implementation, there are a few functions that have a mismatch in the number of arguments in the header and the source (and/or different argument or return types). The functions are:

kernel_dgemm_diag_left_4_a0_lib4
kernel_dgemm_diag_left_4_lib4
kernel_dlarf_t_4_lib4
kernel_strcp_l_2_0_lib4
kernel_sgemm_diag_left_4_a0_lib4
kernel_sgemm_diag_left_4_lib4

It seems that some have been changed in the avx implementation a while ago (e.g. kernel_dlarf_t_4_lib4 in 4604ddc on Apr 25, 2017) but not in the generic.

fast cvt_mat2strmat for row and column

@tmmsartor
The case of a mat with one dimension equal to 1 (e.g. converting a row or a column) should be "fast" (i.e. coded explicitly in the routine and without falling back to kernels) as it is common.

Bug in blasfeo_dgemm_nn

Hey guys,

I am using BLASFEO_VERSION = HIGH_PERFORMANCE and BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE, which worked quite well for me before.
But now I got wrong results using blasfeo_dgemm_nn. I wrote the following minimal example:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 8;
    int nu = 2;
    int nZ = 4;
    struct blasfeo_dmat B;
    struct blasfeo_dmat A;
    struct blasfeo_dmat result;

    double *some_doubles;
    some_doubles = (double*) calloc(10, sizeof(double));
    for (int ii = 0; ii < 10; ii++) {
        some_doubles[ii] = (double) ii;
    }

    blasfeo_allocate_dmat(nx, nx+nu, &B); //
    blasfeo_allocate_dmat(nZ, nx, &A); //
    blasfeo_allocate_dmat(nZ, nu, &result); //

    for (int ii = 0; ii < nx; ii++) {
        blasfeo_pack_dmat(1,10, &some_doubles[0], 1, &B, ii,0);
    }
    for (int ii = 0; ii < 4; ii++) {
        blasfeo_pack_dmat(1,1, &some_doubles[1],1, &A, 0+ii,1+2*ii);
    }
    blasfeo_pack_dmat(1,1, &some_doubles[1],1, &A, 0,1);

    printf("A = \n");
    blasfeo_print_dmat(nZ,  nx, &A,0,0);
    printf("B_multiplication = \n");
    blasfeo_print_dmat(nx, nu, &B,0,nx);

    blasfeo_dgemm_nn(nZ, nu , nx, 1.0, &A, 0, 0, &B, 0, nx, 0.0, &A, 0, 0, &result, 0, 0);

    printf("A * B_multiplication: result = \n");
    blasfeo_print_dmat(nZ,  nu, &result,0,0);

    return 0;
}

This gives me:

A =
0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000

B_multiplication =
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000

A * B_multiplication: result =
8.00000 9.00000
8.00000 9.00000
0.00000 0.00000
0.00000 0.00000

I will switch to REFERENCE instead of HP for now, which gives the correct result:

A * B_multiplication: result =
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000

Shared library target fails on old gcc

/usr/bin/x86_64-linux-gnu-ld: CMakeFiles/blasfeo.dir/kernel/avx2/kernel_dgemm_12x4_lib4.S.o: relocation R_X86_64_PC32 against symbol `inner_kernel_dgemm_add_nt_12x4_lib4' can not be used when making a shared object; recompile with -fPIC
/usr/bin/x86_64-linux-gnu-ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

BLASFEO fails to compile as a shared library, both with make and with CMake, for some gcc versions/configurations
(tested and failing on Ubuntu with gcc 4.9, 4.8 and 6).

Broken since 62d64013: with this commit some x86 internal assembly "functions" are exported as global symbols. Before this change, the shared library compilation works.

The error message is bogus when linking hand-written assembly code.
Most relevant references: stackoverflow 1, 2, 3 and PR #68.

Using MACRO_LEVEL=2 solves the issue. In this case the global symbol is not even defined.

blasfeo_dsyrk_dpotrf_ln() returns incorrect results on X64_AMD_BULLDOZER target

Checked against openblas

Actual value:
( 2.7854532671410537            0            0            0            0 )
( 0.83913770035387469 2.9938299960925923            0            0            0 )
( 0.58622882080751648 0.70368090020327312 2.1283975704681657            0            0 )
( 0.38255481974053374 0.48675723887823252 -0.025473476463104543 2.4204282617974231            0 )
( 0.39976739552731955 0.72857415255088964 -0.19353788779731101 0.0968379384920936 2.4809996556130165 )
Expected value:
( 2.7854532671410532            0            0            0            0 )
( 0.83913770035387469 2.9938299960925918            0            0            0 )
( 0.58622882080751648 0.7036809002032729 2.6269317099125673            0            0 )
( 0.38255481974053374 0.48675723887823247 0.23201943119372695 2.48300863498413            0 )
( 0.39976739552731966 0.72857415255088964 0.31399001255216946 0.35722002546495896 2.4445854284358015 )

Starting from column 2 we see that the results differ.
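
A small helper for locating where two results start to diverge can be sketched as below. It assumes both matrices are stored column-major with leading dimension m; the name first_differing_column and the choice of an absolute tolerance are illustrative assumptions.

```c
#include <assert.h>

/* Returns the first column (0-based) in which two column-major m-by-n
 * matrices differ by more than tol in any entry, or -1 if none does. */
int first_differing_column(int m, int n, const double *X,
                           const double *Y, double tol)
{
    assert(tol >= 0.0);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double d = X[i + j * m] - Y[i + j * m];
            if (d < 0.0)
                d = -d; /* absolute difference without needing math.h */
            if (d > tol)
                return j;
        }
    }
    return -1;
}
```

Applied to the two 5x5 factors above with tol = 1e-12, this should report column 2 (0-based), since the first two columns agree to within rounding.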

Segfault dpotrf_l for target HASWELL

The following program segfaults for HASWELL (not for GENERIC). I'm trying to compute the Cholesky decomposition of a submatrix and store it in a smaller matrix. For GENERIC, I get the correct result (checked in the last lines).

#include <stdlib.h>

#include "blasfeo_d_blas.h"
#include "blasfeo_d_aux.h"
#include "blasfeo_d_aux_ext_dep.h"

int main()
{

    double R[4] = {4, 2, 2, 2};
    double Q[4] = {1, 2, 2, 8};
    double S[4] = {-0.25, -0.5, -0.75, -1};

    struct blasfeo_dmat RSQ, L;

    int num_bytes = blasfeo_memsize_dmat(4, 4);
    void *raw_mem = malloc(num_bytes);
    blasfeo_create_dmat(4, 4, &RSQ, raw_mem);
    blasfeo_dgese(4, 4, 0.0, &RSQ, 0, 0);
    
    num_bytes = blasfeo_memsize_dmat(2, 2);
    raw_mem = malloc(num_bytes);
    blasfeo_create_dmat(2, 2, &L, raw_mem);
    blasfeo_dgese(2, 2, 0.0, &L, 0, 0);

    blasfeo_pack_dmat(2, 2, R, 2, &RSQ, 0, 0);
    blasfeo_pack_dmat(2, 2, Q, 2, &RSQ, 2, 2);
    blasfeo_pack_tran_dmat(2, 2, S, 2, &RSQ, 2, 0);

    /// Segfault occurs here
    blasfeo_dpotrf_l(2, &RSQ, 0, 0, &L, 0, 0);
    ///

    blasfeo_print_dmat(2, 2, &L, 0, 0);

    blasfeo_dgemm_nt(2, 2, 2, 1.0, &L, 0, 0, &L, 0, 0, 0.0, &L, 0, 0, &L, 0, 0);

    blasfeo_print_dmat(2, 2, &L, 0, 0);

}

Do not link to 'm' explicitly

MSVC is failing with following message:

[Screenshot: MSVC error message]

Please do not link explicitly to m in CMake, because it does not exist on Windows systems.

Examples memory leakage

Sorry to bother you again, but the "getting_started" and "example_d_ricatti_recursion" have memory leaks according to valgrind.

const correctness

Have you ever considered making the input vectors/matrices to the BLASFEO routines const? It seems standard with other BLAS implementations, e.g. Netlib CBLAS does this: http://www.netlib.org/blas/cblas.h

If you change to for example

void blasfeo_daxpy(int kmax, const double alpha, const struct blasfeo_dvec *sx, int xi, const struct blasfeo_dvec *sy, int yi, struct blasfeo_dvec *sz, int zi);

it would have the following advantages:

  • it communicates the intent of your code more clearly (i.e. BLASFEO is not going to change anything in the struct pointed to)
  • const correctness (https://www.cprogramming.com/tutorial/const_correctness.html)
  • don't need casts of the type (blasfeo_dvec *)... or const_cast<blasfeo_dvec *>(...) in wrapper code
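
The last point can be made concrete with a small sketch. The struct vec_sketch below is a hypothetical stand-in and not the real blasfeo_dvec, and both function names are invented; the point is only that a wrapper holding const pointers can call a const-correct routine without casts.

```c
#include <assert.h>

/* Hypothetical stand-in for a vector struct; not the real blasfeo_dvec. */
struct vec_sketch {
    int m;      /* number of elements */
    double *pa; /* pointer to the data */
};

/* Inputs are const-qualified, so the compiler rejects accidental
 * modification of the fields of sx and sy inside the routine. */
void axpy_const_sketch(int kmax, double alpha,
                       const struct vec_sketch *sx, int xi,
                       const struct vec_sketch *sy, int yi,
                       struct vec_sketch *sz, int zi)
{
    assert(xi + kmax <= sx->m && yi + kmax <= sy->m && zi + kmax <= sz->m);
    for (int i = 0; i < kmax; i++)
        sz->pa[zi + i] = sy->pa[yi + i] + alpha * sx->pa[xi + i];
}

/* Wrapper that only holds const pointers to its inputs: no cast needed. */
void add_vectors_sketch(const struct vec_sketch *x,
                        const struct vec_sketch *y,
                        struct vec_sketch *out)
{
    axpy_const_sketch(out->m, 1.0, x, 0, y, 0, out, 0);
}
```

Note that const on the struct protects the struct's fields, not the doubles behind pa, so in C this mostly documents intent rather than enforcing it on the data.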

Erroneous "Stack size likely to be exceeded" make error when 'ulimit -s' returns "unlimited"

After making three variable modifications to Makefile.rule:

BLAS_API = 1
FORTRAN_BLAS_API = 1
CBLAS_API = 1

of commit 78c3120, I run make and then receive the following error:

$ make
Parsing Makefile.rule
Makefile.rule:324: *** stack size likely to be exceeded, please decrease the value of K_MAX_STACK .  Stop.

Looking at Makefile.rule more closely, I find around line 321:

STACK_SIZE := $(shell ulimit -s)
STACK_SIZE_EXCEEDED := $(shell echo $(K_MAX_STACK)*12*8*2 \> $(STACK_SIZE)*1024 | bc )
ifeq ($(STACK_SIZE_EXCEEDED), 1)
$(error stack size likely to be exceeded, please decrease the value of K_MAX_STACK )
endif
CFLAGS  += -DK_MAX_STACK=$(K_MAX_STACK)

It appears that this code doesn't work properly when ulimit -s returns unlimited, as it does on my current Cray Linux Environment 6 system:

$ uname -a
Linux nid00019 4.4.103-6.38_4.0.95-cray_ari_c #1 SMP Fri Feb 9 17:52:44 UTC 2018 (172b90b) x86_64 x86_64 x86_64 GNU/Linux

I checked a previous version of BLASFEO that was working fine (2c9f312) and it appears that the code snippet above is new.

Commenting out the first five lines above (all except for update of CFLAGS) seems to allow the build system to finish normally.
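
The intended logic of the check, including the "unlimited" case, can be sketched in C as follows. This is an illustration of the condition only, not a proposed patch to Makefile.rule; the function name is invented, and treating unparsable output as "not exceeded" is a design assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the Makefile condition K_MAX_STACK*12*8*2 > STACK_SIZE*1024,
 * where ulimit_s is the text printed by `ulimit -s` (a size in KiB or
 * the word "unlimited"). Returns 1 if the stack is likely too small. */
int stack_size_exceeded(long k_max_stack, const char *ulimit_s)
{
    char *end;
    long long kib;

    assert(ulimit_s != NULL);
    if (strncmp(ulimit_s, "unlimited", 9) == 0)
        return 0; /* no limit, so the check cannot fail */
    kib = strtoll(ulimit_s, &end, 10);
    if (end == ulimit_s)
        return 0; /* unparsable output: skip the check rather than abort */
    return (long long)k_max_stack * 12 * 8 * 2 > kib * 1024;
}
```

Handling the "unlimited" string before any arithmetic is the important part, since that word cannot be fed into a numeric comparison.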

invalid reads

When running valgrind on the d_riccati_recursion example,

it returns invalid read errors in the functions:
blasfeo_dtrmv_lnn (d_blas2_lib4.c:575) and blasfeo_pack_tran_dmat (d_aux_lib4.c:2184)

warning: using integer absolute value function 'abs' when argument is of floating point type

/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:412:10: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
                temp = abs(*(ptrA+j*row));
                       ^
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:412:10: note: use function 'fabs' instead
                temp = abs(*(ptrA+j*row));
                       ^~~
                       fabs
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:415:12: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
                        temp += abs(*(ptrA+j*row+i));
                                ^
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:415:12: note: use function 'fabs' instead
                        temp += abs(*(ptrA+j*row+i));
                                ^~~
                                fabs
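
The practical effect of the warning can be shown with a short sketch. The two helper names are invented for illustration; the first makes explicit the conversion to int that happens implicitly when abs() is given a double.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* What abs() effectively does with a double argument: the value is
 * converted to int first, so the fractional part is lost. */
double abs_via_int(double x)
{
    return (double)abs((int)x);
}

/* The intended floating-point absolute value. */
double abs_via_fabs(double x)
{
    return fabs(x);
}
```

For matrix entries below 1 in magnitude, the integer version contributes 0 to the sum computed in tools.c, so a norm accumulated that way can come out far too small.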

armv7a: Error: co-processor offset out of range in kernel_dgemm_4x4_lib4.S

When compiling with TARGET_ARMV7A_ARM_CORTEX_A15 (and at least A7) with the current gcc in Debian stable (Buster), which is arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0, kernel/armv7a/kernel_dgemm_4x4_lib4.S does not assemble correctly:

kernel_dgemm_4x4_lib4.S: Assembler messages:
kernel_dgemm_4x4_lib4.S:2053: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2122: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2163: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2247: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2370: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2530: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range

This is related to the .LC00 and .LC01 labels that hold zero constants, which are used in various places as the second parameter of the fldd instructions. The labels must not be too far from where they are used, but apparently the assembler thinks they are at the lines listed above. I could not find a fix for that, but my assembler-fu is lacking ;)

BTW: there is another breakage on armv7a regarding bad instructions but I have a fix for that and will open a pull request soon.

The preprocessed output of /usr/lib/gcc-cross/arm-linux-gnueabihf/8/cc1 -E -lang-asm -quiet -v -imultilib . -imultiarch arm-linux-gnueabihf -D MACRO_LEVEL=1 -D OS_LINUX -D TARGET_ARMV7A_ARM_CORTEX_A15 kernel_dgemm_4x4_lib4.S -mfpu=neon-vfpv4 -mfloat-abi=hard -mthumb -mtls-dialect=gnu -march=armv7-a+neon-vfpv4 -fno-directives-only is attached. It's then fed into /usr/lib/gcc-cross/arm-linux-gnueabihf/8/../../../../arm-linux-gnueabihf/bin/as -v -march=armv7-a -mfloat-abi=hard -mfpu=neon-vfpv4 -meabi=5 -o kernel_dgemm_4x4_lib4.o which produces the errors above. The flow is easily visible by executing arm-linux-gnueabihf-gcc -DMACRO_LEVEL=1 -DOS_LINUX -mfpu=neon-vfpv4 -DTARGET_ARMV7A_ARM_CORTEX_A15 -c -o kernel_dgemm_4x4_lib4.o kernel_dgemm_4x4_lib4.S -v (note the -v) in kernel/armv7a.

cc3nUHts.s.log

blasfeo_dgead - Bug in Sandy Bridge implementation offset 15, 0

Hey guys,

I think I found a bug in the Sandy Bridge implementation of blasfeo_dgead.
It shows up when calling the function
void blasfeo_dgead(int m, int n, double alpha, struct blasfeo_dmat *sA, int ai, int aj, struct blasfeo_dmat *sC, int ci, int cj)
with
m = 5, n = 8, ci = 15, cj = 0.
The corresponding block does not seem to be added: the entries stay zero, whereas the Generic implementation works fine.

I hope this helps to find the bug.
Best
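
To check a case like the one reported above, the result of an optimized target can be compared against a plain reference. The following is a minimal sketch, assuming that blasfeo_dgead adds alpha times a sub-block of A to a sub-block of C, as the report implies. ref_dgead is a hypothetical helper working on ordinary column-major arrays; it is not part of BLASFEO and does not use BLASFEO's internal storage format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a BLASFEO function).
 * Computes C[ci+i, cj+j] += alpha * A[ai+i, aj+j] for i < m, j < n
 * on plain column-major arrays with leading dimensions lda and ldc. */
void ref_dgead(int m, int n, double alpha,
               const double *A, int lda, int ai, int aj,
               double *C, int ldc, int ci, int cj)
{
    /* The addressed sub-blocks must fit into the row range of each matrix. */
    assert(m >= 0 && n >= 0);
    assert(ai >= 0 && aj >= 0 && ci >= 0 && cj >= 0);
    assert(ai + m <= lda && ci + m <= ldc);

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            size_t ia = (size_t)(aj + j) * (size_t)lda + (size_t)(ai + i);
            size_t ic = (size_t)(cj + j) * (size_t)ldc + (size_t)(ci + i);
            C[ic] += alpha * A[ia];
        }
    }
}
```

For the reported case one would call it with m = 5, n = 8, ci = 15, cj = 0 on a C matrix with at least 20 rows, convert the BLASFEO result back to column-major, and compare the two entry by entry.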

Unpredictable execution time of BLASFEO functions caused by uninitialized padding elements of the matrix

Attached is the source code of a benchmark that measures the execution time of the gemm BLAS routine as implemented in BLASFEO and in a BLAS library of choice, for different matrix sizes.

gemm-benchmark.zip

The benchmark uses the Google Benchmark library.

Running the benchmark with the following command line

./bench --benchmark_repetitions=5

gives the following output:

2019-09-18 13:08:50
Running ./bench
Run on (4 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 6144K (x1)
Load Average: 1.86, 1.50, 1.18
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
BM_gemm_blasfeo/2/2/2                  277 ns          277 ns      2521364
BM_gemm_blasfeo/2/2/2                  277 ns          277 ns      2521364
BM_gemm_blasfeo/2/2/2                  277 ns          277 ns      2521364
BM_gemm_blasfeo/2/2/2                  320 ns          320 ns      2521364
BM_gemm_blasfeo/2/2/2                  320 ns          320 ns      2521364
BM_gemm_blasfeo/2/2/2_mean             294 ns          294 ns            5
BM_gemm_blasfeo/2/2/2_median           277 ns          277 ns            5
BM_gemm_blasfeo/2/2/2_stddev          23.5 ns         23.5 ns            5
BM_gemm_blasfeo/3/3/3                 2143 ns         2143 ns       319712
BM_gemm_blasfeo/3/3/3                 2143 ns         2143 ns       319712
BM_gemm_blasfeo/3/3/3                 2142 ns         2142 ns       319712
BM_gemm_blasfeo/3/3/3                 2228 ns         2228 ns       319712
BM_gemm_blasfeo/3/3/3                 2228 ns         2228 ns       319712
BM_gemm_blasfeo/3/3/3_mean            2177 ns         2177 ns            5
BM_gemm_blasfeo/3/3/3_median          2143 ns         2143 ns            5
BM_gemm_blasfeo/3/3/3_stddev          46.6 ns         46.6 ns            5
BM_gemm_blasfeo/5/5/5                11403 ns        11403 ns        61176
BM_gemm_blasfeo/5/5/5                11402 ns        11402 ns        61176
BM_gemm_blasfeo/5/5/5                11402 ns        11402 ns        61176
BM_gemm_blasfeo/5/5/5                 2673 ns         2672 ns        61176
BM_gemm_blasfeo/5/5/5                 2673 ns         2673 ns        61176
BM_gemm_blasfeo/5/5/5_mean            7911 ns         7910 ns            5
BM_gemm_blasfeo/5/5/5_median         11402 ns        11402 ns            5
BM_gemm_blasfeo/5/5/5_stddev          4781 ns         4781 ns            5
BM_gemm_blasfeo/10/10/10             10092 ns        10092 ns        68876
BM_gemm_blasfeo/10/10/10             10093 ns        10093 ns        68876
BM_gemm_blasfeo/10/10/10             10092 ns        10092 ns        68876
BM_gemm_blasfeo/10/10/10              9707 ns         9707 ns        68876
BM_gemm_blasfeo/10/10/10              9707 ns         9706 ns        68876
BM_gemm_blasfeo/10/10/10_mean         9938 ns         9938 ns            5
BM_gemm_blasfeo/10/10/10_median      10092 ns        10092 ns            5
BM_gemm_blasfeo/10/10/10_stddev        211 ns          211 ns            5
BM_gemm_blasfeo/20/20/20              1078 ns         1078 ns       639117
BM_gemm_blasfeo/20/20/20              1078 ns         1078 ns       639117
BM_gemm_blasfeo/20/20/20              1078 ns         1078 ns       639117
BM_gemm_blasfeo/20/20/20              1066 ns         1066 ns       639117
BM_gemm_blasfeo/20/20/20              1067 ns         1067 ns       639117
BM_gemm_blasfeo/20/20/20_mean         1074 ns         1074 ns            5
BM_gemm_blasfeo/20/20/20_median       1078 ns         1078 ns            5
BM_gemm_blasfeo/20/20/20_stddev       6.34 ns         6.34 ns            5
BM_gemm_blasfeo/30/30/30              2594 ns         2594 ns       268109
BM_gemm_blasfeo/30/30/30              2595 ns         2595 ns       268109
BM_gemm_blasfeo/30/30/30              2594 ns         2594 ns       268109
BM_gemm_blasfeo/30/30/30              2595 ns         2595 ns       268109
BM_gemm_blasfeo/30/30/30              2595 ns         2595 ns       268109
BM_gemm_blasfeo/30/30/30_mean         2594 ns         2594 ns            5
BM_gemm_blasfeo/30/30/30_median       2595 ns         2595 ns            5
BM_gemm_blasfeo/30/30/30_stddev      0.340 ns        0.372 ns            5
BM_gemm_cblas/2/2/2                    235 ns          235 ns      2972773
BM_gemm_cblas/2/2/2                    235 ns          235 ns      2972773
BM_gemm_cblas/2/2/2                    235 ns          235 ns      2972773
BM_gemm_cblas/2/2/2                    235 ns          235 ns      2972773
BM_gemm_cblas/2/2/2                    235 ns          235 ns      2972773
BM_gemm_cblas/2/2/2_mean               235 ns          235 ns            5
BM_gemm_cblas/2/2/2_median             235 ns          235 ns            5
BM_gemm_cblas/2/2/2_stddev           0.011 ns        0.011 ns            5
BM_gemm_cblas/3/3/3                    392 ns          392 ns      1786397
BM_gemm_cblas/3/3/3                    392 ns          392 ns      1786397
BM_gemm_cblas/3/3/3                    392 ns          392 ns      1786397
BM_gemm_cblas/3/3/3                    392 ns          392 ns      1786397
BM_gemm_cblas/3/3/3                    392 ns          392 ns      1786397
BM_gemm_cblas/3/3/3_mean               392 ns          392 ns            5
BM_gemm_cblas/3/3/3_median             392 ns          392 ns            5
BM_gemm_cblas/3/3/3_stddev           0.021 ns        0.021 ns            5
BM_gemm_cblas/5/5/5                    472 ns          472 ns      1483886
BM_gemm_cblas/5/5/5                    472 ns          472 ns      1483886
BM_gemm_cblas/5/5/5                    472 ns          472 ns      1483886
BM_gemm_cblas/5/5/5                    472 ns          472 ns      1483886
BM_gemm_cblas/5/5/5                    472 ns          472 ns      1483886
BM_gemm_cblas/5/5/5_mean               472 ns          472 ns            5
BM_gemm_cblas/5/5/5_median             472 ns          472 ns            5
BM_gemm_cblas/5/5/5_stddev           0.380 ns        0.380 ns            5
BM_gemm_cblas/10/10/10                 841 ns          841 ns       817796
BM_gemm_cblas/10/10/10                 841 ns          841 ns       817796
BM_gemm_cblas/10/10/10                 841 ns          841 ns       817796
BM_gemm_cblas/10/10/10                 841 ns          841 ns       817796
BM_gemm_cblas/10/10/10                 841 ns          841 ns       817796
BM_gemm_cblas/10/10/10_mean            841 ns          841 ns            5
BM_gemm_cblas/10/10/10_median          841 ns          841 ns            5
BM_gemm_cblas/10/10/10_stddev        0.249 ns        0.249 ns            5
BM_gemm_cblas/20/20/20                1905 ns         1905 ns       364260
BM_gemm_cblas/20/20/20                1905 ns         1905 ns       364260
BM_gemm_cblas/20/20/20                1904 ns         1904 ns       364260
BM_gemm_cblas/20/20/20                1921 ns         1921 ns       364260
BM_gemm_cblas/20/20/20                1922 ns         1921 ns       364260
BM_gemm_cblas/20/20/20_mean           1911 ns         1911 ns            5
BM_gemm_cblas/20/20/20_median         1905 ns         1905 ns            5
BM_gemm_cblas/20/20/20_stddev         9.13 ns         9.14 ns            5
BM_gemm_cblas/30/30/30                4240 ns         4240 ns       165046
BM_gemm_cblas/30/30/30                4240 ns         4239 ns       165046
BM_gemm_cblas/30/30/30                4239 ns         4239 ns       165046
BM_gemm_cblas/30/30/30                4239 ns         4239 ns       165046
BM_gemm_cblas/30/30/30                4238 ns         4238 ns       165046
BM_gemm_cblas/30/30/30_mean           4239 ns         4239 ns            5
BM_gemm_cblas/30/30/30_median         4239 ns         4239 ns            5
BM_gemm_cblas/30/30/30_stddev        0.559 ns        0.554 ns            5

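One can see that, according to the benchmark, the execution time of BLASFEO dgemm() varies a lot: for the 5/5/5 case the repetitions range from 2673 ns to 11403 ns, and the 10/10/10 case (about 10000 ns) is slower than the 20/20/20 case (about 1078 ns). The execution time of the OpenBLAS implementation is very stable, with a standard deviation below 10 ns for every size.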

  • BLASFEO is built in Release mode for INTEL_HASWELL architecture.
  • The compiler is g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  • The benchmark is run on an Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake architecture).
  • The operating system is Ubuntu Linux 19.04 with 5.0.0-29-generic kernel.
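
The issue title attributes the variance to uninitialized padding elements. A minimal sketch of the usual remedy is to zero the whole backing buffer, padding included, before the matrix is filled. The layout here is an assumption made for illustration only: rows and columns are rounded up to a multiple of 4, and the buffer is 64-byte aligned. All names (EXAMPLE_PANEL, EXAMPLE_ALIGN, round_up, padded_elems, alloc_zeroed_padded) are hypothetical; with BLASFEO itself the required size should come from the library's own size query rather than from this formula.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_PANEL 4   /* assumed panel size, for illustration only */
#define EXAMPLE_ALIGN 64  /* assumed buffer alignment in bytes */

/* Round x up to the next multiple of mult. */
size_t round_up(size_t x, size_t mult)
{
    assert(mult > 0);
    return (x + mult - 1) / mult * mult;
}

/* Number of doubles in the hypothetical padded layout of an m x n matrix. */
size_t padded_elems(size_t m, size_t n)
{
    return round_up(m, EXAMPLE_PANEL) * round_up(n, EXAMPLE_PANEL);
}

/* Allocate the padded buffer and set every byte, padding included, to zero.
 * Returns NULL on failure; the caller releases the buffer with free(). */
double *alloc_zeroed_padded(size_t m, size_t n)
{
    /* aligned_alloc (C11) expects the size to be a multiple of the alignment. */
    size_t bytes = round_up(padded_elems(m, n) * sizeof(double), EXAMPLE_ALIGN);
    if (bytes == 0)
        bytes = EXAMPLE_ALIGN;

    double *p = aligned_alloc(EXAMPLE_ALIGN, bytes);
    if (p != NULL)
        memset(p, 0, bytes);  /* all-zero bytes represent 0.0 for IEEE 754 doubles */
    return p;
}
```

For a 5 x 5 matrix this sketch reserves 8 x 8 = 64 doubles, so 39 of them are padding that would otherwise hold whatever the allocator returned.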

BLASFEO_PROCESSOR_FEATURES as identifier instead of object

We are using BLASFEO in our research project. If the library we are developing has multiple compilation units, we get linking errors because the object BLASFEO_PROCESSOR_FEATURES gets defined multiple times.

We did not find where BLASFEO_PROCESSOR_FEATURES is used as an object, but maybe we missed something (it could depend on the build options). Perhaps the idea was to use BLASFEO_PROCESSOR_FEATURES as an identifier instead of an object? Or is there a reason for it?

enum BLASFEO_PROCESSOR_FEATURES
{
	// x86-64 CPU features
	BLASFEO_PROCESSOR_FEATURE_AVX  = 0x0001,    /// AVX instruction set
	BLASFEO_PROCESSOR_FEATURE_AVX2 = 0x0002,    /// AVX2 instruction set
	BLASFEO_PROCESSOR_FEATURE_FMA  = 0x0004,    /// FMA instruction set
	BLASFEO_PROCESSOR_FEATURE_SSE3 = 0x0008,    /// SSE3 instruction set

	// ARM CPU features
	BLASFEO_PROCESSOR_FEATURE_VFPv3  = 0x0100,  /// VFPv3 instruction set
	BLASFEO_PROCESSOR_FEATURE_NEON   = 0x0100,  /// NEON instruction set
	BLASFEO_PROCESSOR_FEATURE_VFPv4  = 0x0100,  /// VFPv4 instruction set
	BLASFEO_PROCESSOR_FEATURE_NEONv2 = 0x0100,  /// NEONv2 instruction set
};

Thank you,
Wilm and Lander
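
The distinction between an object and an identifier raised in the report above can be illustrated with a small, self-contained sketch. All names starting with EXAMPLE_ or EX_ are hypothetical and only mirror the shape of the declaration under discussion. A name placed after the closing brace of an enum declares a variable; a name placed right after the enum keyword is only a type tag and creates no object.

```c
#include <assert.h>

/* Form 1: anonymous enum followed by a name.
 * The trailing name declares a VARIABLE at file scope. If this line sits in a
 * header, every compilation unit that includes it gets its own definition. */
enum { EX_A_AVX = 0x0001, EX_A_FMA = 0x0004 } EXAMPLE_FEATURES_OBJECT;

/* Form 2: the name is an enum TAG. No storage is reserved, so the
 * declaration can appear in any number of compilation units. */
enum EXAMPLE_FEATURES_TAG { EX_B_AVX = 0x0001, EX_B_FMA = 0x0004 };

/* The object from form 1 has a size, like any other variable. */
static_assert(sizeof(EXAMPLE_FEATURES_OBJECT) > 0,
              "the trailing name is an object with storage");

/* Test whether feature f is set in a bit mask built from form-2 enumerators. */
int has_feature(int mask, enum EXAMPLE_FEATURES_TAG f)
{
    return (mask & (int)f) != 0;
}
```

Whether form 1 actually produces a link error depends on the language and compiler settings: C++ rejects the duplicate definitions, and so does C when common symbols are disabled (for example with GCC's -fno-common, the default since GCC 10).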

Support for Clang-CL on Windows

While there is some Windows support, it is limited to the GENERIC code path.

I suggest adding support for Clang-CL in CMakeLists.txt, so that when the compiler is Clang-CL things work as they do with GCC on Linux; Clang-CL should support the AT&T style of assembly.

The problem is that the current configuration does not support this.
I get errors like:

..\kernel\generic\kernel_dgemm_4x4_lib4.c(3056,15): error: expected ';' at end of declaration
        double CC[16] __declspec(align(64)) = {0};
                     ^
                     ;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(3056,38): error: expected identifier or '('
        double CC[16] __declspec(align(64)) = {0};
                                            ^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6398,15): error: expected ';' at end of declaration
        double CC[16] __declspec(align(64)) = {0};
                     ^
                     ;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6398,38): error: expected identifier or '('
        double CC[16] __declspec(align(64)) = {0};
                                            ^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6480,15): error: expected ';' at end of declaration
        double CC[16] __declspec(align(64)) = {0};
                     ^
                     ;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6480,38): error: expected identifier or '('
        double CC[16] __declspec(align(64)) = {0};
                                            ^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6654,15): error: expected ';' at end of declaration
        double CC[16] __declspec(align(64)) = {0};
                     ^
                     ;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6654,38): error: expected identifier or '('
        double CC[16] __declspec(align(64)) = {0};
                                            ^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6757,15): error: expected ';' at end of declaration
        double CC[16] __declspec(align(64)) = {0};

It also tries to use flags like -fPIC, -msse3, etc.
Those are GCC flags.

So I suggest that the configuration treat Clang-CL like MSVC, with the only difference that it is allowed to use the other configurations.

I think the flags:

set(C_FLAGS_TARGET_X64_INTEL_HASWELL      "-m64 -mavx -mavx2 -mfma")
set(C_FLAGS_TARGET_X64_INTEL_SANDY_BRIDGE "-m64 -mavx")
set(C_FLAGS_TARGET_X64_INTEL_CORE         "-m64 -msse3")
set(C_FLAGS_TARGET_X64_AMD_BULLDOZER      "-m64 -mavx -mfma")

need to be decorated with -Xclang or adapted to Windows.
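
The diagnostics quoted above point at the position of __declspec(align(64)) after the declarator. A minimal sketch of one way to sidestep this is a macro that places the alignment specifier in front of the declaration, a position accepted by the C11, MSVC and GNU spellings alike. EXAMPLE_ALIGNED and aligned_tile_remainder are hypothetical names used for illustration; this is not BLASFEO's actual macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portability macro: pick an alignment specifier that can be
 * written BEFORE the declaration on the compiler in use. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define EXAMPLE_ALIGNED(n) _Alignas(n)                  /* C11 and later */
#elif defined(_MSC_VER)
#define EXAMPLE_ALIGNED(n) __declspec(align(n))         /* MSVC-style */
#else
#define EXAMPLE_ALIGNED(n) __attribute__((aligned(n)))  /* GNU-style */
#endif

/* Declares a zero-initialized 4x4 tile with 64-byte alignment and returns
 * its address modulo 64; a result of 0 means the alignment was honored. */
unsigned aligned_tile_remainder(void)
{
    EXAMPLE_ALIGNED(64) double CC[16] = {0};
    assert(CC[0] == 0.0 && CC[15] == 0.0);  /* {0} zeroes every element */
    return (unsigned)((uintptr_t)CC % 64);
}
```

With such a macro the declaration reads the same on every toolchain, and the choice of spelling is made in one place instead of in each kernel file.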

benchmarks/cpu_freq.h.example licensed under GPL3 + classpath

The file benchmarks/cpu_freq.h.example is licensed under GPL3 + classpath. Is this intentional? It's always a bit tricky to combine GPL with other licenses in one project, particularly for source distributions, due to the classpath exception.
