giaf / blasfeo
Basic linear algebra subroutines for embedded optimization
License: Other
Hi guys,
I hope this is the right place for this issue.
I have been struggling with hpipm for a couple of days. The example runs perfectly fine, but
when I use hpipm for my particular OCP I get the following error:
Program received signal SIGSEGV, Segmentation fault.
0x00005555558aaf87 in d_ocp_qp_init_var ()
Does anyone know how to deal with that? I appreciate any help.
Thanks a lot.
Best, Miri
In the 'reduce' step of blasfeo_ddot, a horizontal add (_mm_hadd_pd) is computed. Instead, one could replace
u_tmp = _mm_hadd_pd(u_tmp, u_tmp);
with
__m128d hi64 = _mm_unpackhi_pd(u_tmp, u_tmp);
u_tmp = _mm_add_sd(u_tmp, hi64);
effectively trading a packed double operation for a scalar one.
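The proposed replacement can be checked in isolation. The sketch below is only an illustration: the function names reduce_pair and reduce_pair_demo are hypothetical, and the code uses SSE2 intrinsics only, with a scalar fallback on targets without SSE2. It sums the two lanes of a vector with _mm_unpackhi_pd followed by _mm_add_sd, as suggested above.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_set_pd, _mm_unpackhi_pd, _mm_add_sd, _mm_cvtsd_f64 */
#endif

/* Sum the two lanes {lo, hi} of a packed-double vector and return the scalar.
 * reduce_pair is a hypothetical helper, not part of BLASFEO. */
double reduce_pair(double lo, double hi)
{
#if defined(__SSE2__)
    __m128d u_tmp = _mm_set_pd(hi, lo);            /* lane 0 = lo, lane 1 = hi */
    __m128d hi64  = _mm_unpackhi_pd(u_tmp, u_tmp); /* both lanes = hi */
    u_tmp = _mm_add_sd(u_tmp, hi64);               /* lane 0 = lo + hi (scalar add) */
    return _mm_cvtsd_f64(u_tmp);                   /* extract lane 0 */
#else
    return lo + hi;                                /* portable fallback without SSE2 */
#endif
}

/* Small self-check; the operands are exactly representable, so == is safe. */
void reduce_pair_demo(void)
{
    assert(reduce_pair(1.5, 2.25) == 3.75);
}
```

Only lane 0 of the result is meaningful after _mm_add_sd, which is all the reduce step needs.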
Hey guys,
I found a bug in blasfeo_dgemv_t:
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
Running the following example, I get wrong results using both HIGH PERFORMANCE and REFERENCE:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 9;
    int nu = 2;
    struct blasfeo_dmat A;
    struct blasfeo_dvec lambda;
    struct blasfeo_dvec lambda0;
    blasfeo_allocate_dmat(nx, nx, &A);
    blasfeo_allocate_dvec(nx+nu, &lambda);
    blasfeo_allocate_dvec(nx+nu, &lambda0);
    // lambda = [0, 1, ..., nx+nu-1]
    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
    }
    // A = all ones
    blasfeo_dgese(nx, nx, 1.0, &A, 0, 0);
    blasfeo_print_dmat(nx, nx, &A, 0, 0);
    printf("lambda = \n");
    blasfeo_print_dvec(nx+nu, &lambda, 0);
    // note: lambda is passed both as the input x and as the output z
    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 0.0, &lambda0, 0, &lambda, 0); // recheck!
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda, 0);
    return 0;
}
REFERENCE prints:
lambda: result =
3.600000e+01
3.600000e+01
1.070000e+02
1.070000e+02
3.160000e+02
3.160000e+02
9.390000e+02
9.390000e+02
2.804000e+03
9.000000e+00
1.000000e+01
whereas HP prints:
lambda: result =
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
2.960000e+02
9.000000e+00
1.000000e+01
The correct result should be:
lambda =
36
36
36
36
36
36
36
36
36
9
10
I hope this helps fixing stuff!
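One reading of the numbers above, offered as an assumption rather than a confirmed diagnosis: the call passes lambda both as the input x and as the output z, so a routine that writes results while it is still reading inputs will see partially overwritten data. The plain-C sketch below does not use BLASFEO. The names gemv_t_blocked and aliasing_demo are hypothetical, and the choice of computing two output entries per step is a guess made only because it reproduces the REFERENCE numbers in the report.

```c
#include <assert.h>

#define N 9

/* z = A^T * x for a column-major n-by-n matrix, two output entries per step.
 * The 2-wide blocking is an assumption chosen to match the reported numbers;
 * it is not copied from the BLASFEO sources. */
void gemv_t_blocked(int n, const double *A, const double *x, double *z)
{
    assert(n > 0);
    int j = 0;
    for (; j + 1 < n; j += 2) {
        double s0 = 0.0, s1 = 0.0;
        for (int i = 0; i < n; i++) {
            s0 += A[i + n * j] * x[i];
            s1 += A[i + n * (j + 1)] * x[i];
        }
        z[j] = s0;      /* if z aliases x, these stores overwrite inputs   */
        z[j + 1] = s1;  /* that the following column pairs still read      */
    }
    if (j < n) {        /* leftover column when n is odd */
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += A[i + n * j] * x[i];
        z[j] = s;
    }
}

/* Rebuild the report's data (A = ones, x = 0..8) and run with or without aliasing. */
void aliasing_demo(int aliased, double *out)
{
    double A[N * N], x[N];
    for (int k = 0; k < N * N; k++) A[k] = 1.0;
    for (int i = 0; i < N; i++) x[i] = (double) i;
    if (aliased) {
        gemv_t_blocked(N, A, x, x);    /* output written over the input */
        for (int i = 0; i < N; i++) out[i] = x[i];
    } else {
        gemv_t_blocked(N, A, x, out);  /* distinct output vector */
    }
}
```

With aliased set to 1 the sketch yields 36, 36, 107, 107, 316, 316, 939, 939, 2804, the same sequence as the REFERENCE output above; with a separate output vector every entry is 36.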
I want to use it on Android. How should I compile it?
Apparently, blasfeo_strsv_lnu and blasfeo_strsv_unn are not implemented for high-performance linear algebra.
Is it on the roadmap anytime soon?
Hello,
I cannot figure out how to test against other libraries such as OpenBLAS: where should I add the corresponding references, for example in CMake?
Best regards
Hi,
I tried compiling BLASFEO with Intel Haswell as target architecture but it threw an error concerning avx2. Here is the log. It works under other Intel architectures but my processor is in the Haswell family.
Thank you very much!
[ggleizer@localhost blasfeo]$ make
rm -f libblasfeo.a
make -C auxiliary clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
rm -f *.o
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make -C kernel clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
make -C sse3 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
make -C fma clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C blas clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/blas'
rm -f *.o
rm -f *.s
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/blas'
make -C test_problems clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/test_problems'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/test_problems'
make -C examples clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/examples'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/examples'
touch ./include/blasfeo_target.h
echo "#ifndef TARGET_X64_INTEL_HASWELL" > ./include/blasfeo_target.h
echo "#define TARGET_X64_INTEL_HASWELL" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
echo "#ifndef LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#define LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
( cd auxiliary; make obj)
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
gcc -O2 -fPIC -m64 -mavx2 -mfma -DTARGET_X64_INTEL_HASWELL -DLA_HIGH_PERFORMANCE -DOS_LINUX -DREF_BLAS_OPENBLAS -I/opt/openblas/include -c -o d_aux_lib4.o d_aux_lib4.c
cc1: error: unrecognized command line option "-mavx2"
make[1]: *** [d_aux_lib4.o] Error 1
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make: *** [static_library] Error 2
Shall we have a stable branch? We could make one based on the commit currently used in acados. Or we could have master and develop.
Look at the very last element, C[13,13] if we're using 1-based indexing.
julia> M = K = N = 13;
julia> A = rand(M, K); B = rand(K, N); C1 = Matrix{Float64}(undef, M, N);
julia> gemmfeo!(C1, A, B'); C1 # BLASFEO
13×13 Matrix{Float64}:
3.21111 3.63236 3.60901 3.42567 4.02387 4.42455 3.78604 4.14247 3.90693 3.75106 4.28329 4.65109 3.62135
2.98562 2.27237 2.43335 3.50058 4.19308 3.65243 3.37025 3.83939 3.60735 3.11551 3.39936 3.44949 2.92242
2.66924 3.08483 3.67439 3.24781 4.3147 4.44565 3.74392 4.17439 4.14373 3.90033 3.50683 3.26632 3.22764
3.30279 3.12434 2.83558 3.38436 4.23039 4.41239 3.35741 3.79297 3.58183 3.29104 3.45312 3.61621 3.21767
2.77241 2.80838 2.23948 3.82555 4.77529 4.13492 3.73856 3.94799 3.69966 3.33374 4.02254 3.63971 3.11314
2.59138 2.99149 3.29702 2.77667 3.9835 4.59488 3.24307 3.36162 3.14416 3.48404 2.93357 3.27987 3.48901
2.19375 3.11538 2.99718 3.37941 5.23863 4.17066 3.73315 4.428 4.78703 3.46704 3.72306 3.87886 2.96676
1.79388 1.9374 1.60787 2.40525 3.53875 2.73698 2.49986 3.09528 2.78403 2.29148 2.9501 2.66847 2.19448
2.92898 2.96992 2.81447 3.48228 4.17282 4.01132 3.78475 4.21871 4.142 3.84598 4.13346 4.17309 3.47024
2.21216 2.78679 2.60065 2.4549 3.28243 3.42619 3.12983 3.94333 3.58097 3.43396 3.64217 3.66227 3.1028
2.08495 2.19327 2.42211 2.86349 3.62425 3.05892 2.78832 3.50663 3.28341 2.83864 3.43406 3.0449 2.58316
1.91349 2.09611 1.71962 2.35067 2.85976 2.23329 2.3354 3.00702 3.23756 2.17555 2.76727 2.70094 1.74628
3.16237 3.32345 3.8718 3.80725 4.90455 4.48073 4.10663 4.68758 4.42813 4.03683 4.49546 4.10844 1.88106
julia> A * B'
13×13 Matrix{Float64}:
3.21111 3.63236 3.60901 3.42567 4.02387 4.42455 3.78604 4.14247 3.90693 3.75106 4.28329 4.65109 3.62135
2.98562 2.27237 2.43335 3.50058 4.19308 3.65243 3.37025 3.83939 3.60735 3.11551 3.39936 3.44949 2.92242
2.66924 3.08483 3.67439 3.24781 4.3147 4.44565 3.74392 4.17439 4.14373 3.90033 3.50683 3.26632 3.22764
3.30279 3.12434 2.83558 3.38436 4.23039 4.41239 3.35741 3.79297 3.58183 3.29104 3.45312 3.61621 3.21767
2.77241 2.80838 2.23948 3.82555 4.77529 4.13492 3.73856 3.94799 3.69966 3.33374 4.02254 3.63971 3.11314
2.59138 2.99149 3.29702 2.77667 3.9835 4.59488 3.24307 3.36162 3.14416 3.48404 2.93357 3.27987 3.48901
2.19375 3.11538 2.99718 3.37941 5.23863 4.17066 3.73315 4.428 4.78703 3.46704 3.72306 3.87886 2.96676
1.79388 1.9374 1.60787 2.40525 3.53875 2.73698 2.49986 3.09528 2.78403 2.29148 2.9501 2.66847 2.19448
2.92898 2.96992 2.81447 3.48228 4.17282 4.01132 3.78475 4.21871 4.142 3.84598 4.13346 4.17309 3.47024
2.21216 2.78679 2.60065 2.4549 3.28243 3.42619 3.12983 3.94333 3.58097 3.43396 3.64217 3.66227 3.1028
2.08495 2.19327 2.42211 2.86349 3.62425 3.05892 2.78832 3.50663 3.28341 2.83864 3.43406 3.0449 2.58316
1.91349 2.09611 1.71962 2.35067 2.85976 2.23329 2.3354 3.00702 3.23756 2.17555 2.76727 2.70094 1.74628
3.16237 3.32345 3.8718 3.80725 4.90455 4.48073 4.10663 4.68758 4.42813 4.03683 4.49546 4.10844 3.44541
julia> C1 .== A * B'
13×13 BitMatrix:
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0
julia> A[end,:]' * B[end,:]
3.4454147532268555
This is on the latest master, using wrapper code from here to call the gemm routines from Julia.
It seems that Windows support is implied, yet I couldn't find any explicit documentation about it on the website (for instance, in the Installation section at https://blasfeo.syscop.de).
Are there any official instructions for building static and shared libraries of BLASFEO under Windows?
Hey guys,
I found another bug in blasfeo_dgemm_nn.
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
BLASFEO_VERSION = HIGH_PERFORMANCE gives the wrong results, whereas REFERENCE gives the correct result.
I wrote the following example and hope it helps in fixing it.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 8;
    int nu = 2;
    int nZ = 4;
    int nK1 = nx * 4;
    struct blasfeo_dmat A;
    struct blasfeo_dmat B;
    struct blasfeo_dmat C;
    struct blasfeo_dmat result;
    double *some_doubles;
    some_doubles = (double*) calloc(10, sizeof(double));
    for (int ii = 0; ii < 10; ii++) {
        some_doubles[ii] = (double) ii;
    }
    blasfeo_allocate_dmat(nx, nx+nu, &B);
    blasfeo_allocate_dmat(nK1, nx, &A);
    blasfeo_allocate_dmat(nK1, nu, &result);
    blasfeo_allocate_dmat(nK1, nu, &C);
    // every row of B is 0, 1, ..., 9
    for (int ii = 0; ii < nx; ii++) {
        blasfeo_pack_dmat(1, 10, &some_doubles[0], 1, &B, ii, 0);
    }
    // A = all ones
    blasfeo_dgese(nK1, nx, 1.0, &A, 0, 0);
    printf("A = \n");
    blasfeo_print_dmat(nK1, nx, &A, 0, 0);
    printf("B_multiplication = \n");
    blasfeo_print_dmat(nx, nu, &B, 0, nx);
    blasfeo_dgemm_nn(nK1, nu, nx, -1.0, &A, 0, 0, &B, 0, nx, 1.0, &C, 0, 0, &result, 0, 0); // Blasfeo HP & Reference differ here
    printf("A * B_multiplication: result = \n");
    blasfeo_print_exp_dmat(nK1, nu, &result, 0, 0);
    return 0;
}
I just got really confused by this:
// y = y + alpha*x
void daxpy_libstr(int kmax, double alpha, struct d_strvec *sx, int xi, struct d_strvec *sy, int yi, struct d_strvec *sz, int zi);
I am wondering what z is used for. I guess the comment should be z = y + alpha*x.
There are some similar comments in this .h file.
Fixing these comments would make blasfeo easier to use for beginners like me, I guess.
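To illustrate the reading suggested above (a separate output vector z), here is a minimal plain-array sketch. The name axpy_out and the use of raw double pointers instead of the library's vector structs are assumptions made for the example; it is not the BLASFEO implementation.

```c
#include <assert.h>

/* z = y + alpha*x on plain arrays. x and y are only read; z is the output.
 * Passing the same array for y and z gives the in-place update y = y + alpha*x. */
void axpy_out(int kmax, double alpha, const double *x, const double *y, double *z)
{
    assert(kmax >= 0);
    for (int k = 0; k < kmax; k++)
        z[k] = y[k] + alpha * x[k];
}
```

Under this reading, the comment "y = y + alpha*x" describes only the special case where the caller passes the same vector as y and z.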
Hi all,
When I tried to compile the branch with the ct-v2 tag, it failed with the following output:
[ 92%] Building C object test_problems/CMakeFiles/s_blas.dir/test_blas_s.c.o
[ 95%] Linking C executable s_blas
../libblasfeo.a(s_aux_lib8.c.o): In function `sgead_libstr':
s_aux_lib8.c:(.text+0x4c48): undefined reference to `kernel_sgead_8_7_gen_lib8'
s_aux_lib8.c:(.text+0x4da2): undefined reference to `kernel_sgead_8_7_lib8'
s_aux_lib8.c:(.text+0x4e64): undefined reference to `kernel_sgead_8_0_lib8'
s_aux_lib8.c:(.text+0x4ebb): undefined reference to `kernel_sgead_8_0_gen_lib8'
s_aux_lib8.c:(.text+0x4f4a): undefined reference to `kernel_sgead_8_3_lib8'
s_aux_lib8.c:(.text+0x4fb2): undefined reference to `kernel_sgead_8_3_gen_lib8'
s_aux_lib8.c:(.text+0x4ff3): undefined reference to `kernel_sgead_8_0_gen_lib8'
s_aux_lib8.c:(.text+0x50aa): undefined reference to `kernel_sgead_8_1_lib8'
s_aux_lib8.c:(.text+0x5112): undefined reference to `kernel_sgead_8_1_gen_lib8'
s_aux_lib8.c:(.text+0x519b): undefined reference to `kernel_sgead_8_2_lib8'
s_aux_lib8.c:(.text+0x5201): undefined reference to `kernel_sgead_8_2_gen_lib8'
s_aux_lib8.c:(.text+0x523e): undefined reference to `kernel_sgead_8_1_gen_lib8'
s_aux_lib8.c:(.text+0x526c): undefined reference to `kernel_sgead_8_7_gen_lib8'
s_aux_lib8.c:(.text+0x5302): undefined reference to `kernel_sgead_8_4_lib8'
s_aux_lib8.c:(.text+0x536a): undefined reference to `kernel_sgead_8_4_gen_lib8'
s_aux_lib8.c:(.text+0x5402): undefined reference to `kernel_sgead_8_5_lib8'
s_aux_lib8.c:(.text+0x546a): undefined reference to `kernel_sgead_8_5_gen_lib8'
s_aux_lib8.c:(.text+0x54a6): undefined reference to `kernel_sgead_8_2_gen_lib8'
s_aux_lib8.c:(.text+0x5552): undefined reference to `kernel_sgead_8_6_lib8'
s_aux_lib8.c:(.text+0x55ba): undefined reference to `kernel_sgead_8_6_gen_lib8'
s_aux_lib8.c:(.text+0x55f6): undefined reference to `kernel_sgead_8_3_gen_lib8'
s_aux_lib8.c:(.text+0x564e): undefined reference to `kernel_sgead_8_4_gen_lib8'
s_aux_lib8.c:(.text+0x56b9): undefined reference to `kernel_sgead_8_5_gen_lib8'
s_aux_lib8.c:(.text+0x56f7): undefined reference to `kernel_sgead_8_6_gen_lib8'
collect2: error: ld returned 1 exit status
test_problems/CMakeFiles/s_blas.dir/build.make:95: recipe for target 'test_problems/s_blas' failed
make[2]: *** [test_problems/s_blas] Error 1
CMakeFiles/Makefile2:124: recipe for target 'test_problems/CMakeFiles/s_blas.dir/all' failed
make[1]: *** [test_problems/CMakeFiles/s_blas.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
My system is Ubuntu 16.04 64-bit, and I compile simply with cmake and make.
Then I tried to compile the master branch, and it works without any problem.
Could anyone give me any hint?
Thank you in advance!
Best,
Kahn
I was wondering what blasfeo_dpotrf_l_mn does. There is no documentation on it in blasfeo_d_blas.h.
As pointed out in #27, aliasing of routine arguments can lead to unexpected and unwanted behavior.
To mitigate this problem I propose two incremental solutions:
The following routine segfaults on Windows with the error written in the title:
blasfeo_dtrsm_lunn
(note that the routine blasfeo_dtrsm_lunu is used just before without problems)
blasfeo is compiled with HIGH_PERFORMANCE and GENERIC (REFERENCE works fine)
I'm a beginner.
I changed the target to armv7 in Makefile.rule and ran make,
but an error occurred:
gcc: error: unrecognized command line option '-mfpu=neon-vfpv4'
What do I have to do?
Thanks in advance.
Hi,
I have a reasonably new Laptop with the following CPU:
Intel® Core™ i7-3520M CPU @ 2.90GHz × 4
and Ubuntu 18.04.
Unfortunately, the new default target (X64_INTEL_HASWELL) does not work for me.
I think it would be good to downgrade the default target to SANDY_BRIDGE or GENERIC, in order to reduce the installation effort of HPIPM.
It would be great to either eliminate those or provide a cmake variable to compile blasfeo without the examples.
Seeing the benchmarks of BLASFEO and how it beats Intel MKL on small matrices made me want to create a MATLAB MEX wrapper for it, to speed up small-matrix calculations.
The logic was: since BLASFEO beats Intel MKL in tests with no overhead, with MEX I would beat MATLAB by a lot, since MATLAB only adds overhead on top of MKL and doesn't use MKL_DIRECT_CALL.
All reasons to be optimistic.
I implemented a MEX wrapper around blasfeo_dgemm() and validated it against MATLAB (the error is negligible).
Then I did a run time analysis:
Now, the BLASFEO MEX works in place (namely, it receives a preallocated matrix to write the result into), while MATLAB has to use its regular API (it allocates the output and has overhead on the input).
Yet MATLAB is still much faster than BLASFEO compiled with the AVX2 code path.
MATLAB does use multithreading (I don't know the threshold, but I can see it on the CPU utilization graph). But even for very small matrices (sizes 2:10) MATLAB beats BLASFEO.
The MATLAB analysis file is RunTimeAnalysis.zip.
When manually compiling the generic kernel implementation, there are a few functions that have a mismatch in the number of arguments in the header and the source (and/or different argument or return types). The functions are:
kernel_dgemm_diag_left_4_a0_lib4
kernel_dgemm_diag_left_4_lib4
kernel_dlarf_t_4_lib4
kernel_strcp_l_2_0_lib4
kernel_sgemm_diag_left_4_a0_lib4
kernel_sgemm_diag_left_4_lib4
It seems that some have been changed in the avx implementation a while ago (e.g. kernel_dlarf_t_4_lib4 in 4604ddc on Apr 25, 2017) but not in the generic.
@tmmsartor
The case of a mat with one dimension equal to 1 (e.g. converting a row or a column) should be "fast" (i.e. coded explicitly in the routine and without falling back to kernels) as it is common.
Hey guys,
I am using BLASFEO_VERSION = HIGH_PERFORMANCE and BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE, which worked quite well for me before.
But now I got wrong results using blasfeo_dgemm_nn. I wrote the following minimal example:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 8;
    int nu = 2;
    int nZ = 4;
    struct blasfeo_dmat B;
    struct blasfeo_dmat A;
    struct blasfeo_dmat result;
    double *some_doubles;
    some_doubles = (double*) calloc(10, sizeof(double));
    for (int ii = 0; ii < 10; ii++) {
        some_doubles[ii] = (double) ii;
    }
    blasfeo_allocate_dmat(nx, nx+nu, &B);
    blasfeo_allocate_dmat(nZ, nx, &A);
    blasfeo_allocate_dmat(nZ, nu, &result);
    // every row of B is 0, 1, ..., 9
    for (int ii = 0; ii < nx; ii++) {
        blasfeo_pack_dmat(1, 10, &some_doubles[0], 1, &B, ii, 0);
    }
    // A(ii, 1+2*ii) = 1
    for (int ii = 0; ii < 4; ii++) {
        blasfeo_pack_dmat(1, 1, &some_doubles[1], 1, &A, 0+ii, 1+2*ii);
    }
    blasfeo_pack_dmat(1, 1, &some_doubles[1], 1, &A, 0, 1);
    printf("A = \n");
    blasfeo_print_dmat(nZ, nx, &A, 0, 0);
    printf("B_multiplication = \n");
    blasfeo_print_dmat(nx, nu, &B, 0, nx);
    blasfeo_dgemm_nn(nZ, nu, nx, 1.0, &A, 0, 0, &B, 0, nx, 0.0, &A, 0, 0, &result, 0, 0);
    printf("A * B_multiplication: result = \n");
    blasfeo_print_dmat(nZ, nu, &result, 0, 0);
    return 0;
}
This gives me:
A =
0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000
B_multiplication =
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
A * B_multiplication: result =
8.00000 9.00000
8.00000 9.00000
0.00000 0.00000
0.00000 0.00000
I will switch to REFERENCE instead of HP for now, which gives the correct result:
A * B_multiplication: result =
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
8.00000 9.00000
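The expected result can be cross-checked without BLASFEO. The sketch below rebuilds the operands of the example on plain column-major arrays and multiplies them with a naive triple loop. The names gemm_nn_ref and expected_result are hypothetical, and a zero matrix stands in for the C operand, which does not affect the result because beta is 0.0.

```c
#include <assert.h>

/* D = beta*C + alpha*A*B on column-major arrays with leading dimensions
 * lda, ldb, ldc, ldd. Naive reference loop, not the BLASFEO kernel. */
void gemm_nn_ref(int m, int n, int k, double alpha,
                 const double *A, int lda, const double *B, int ldb,
                 double beta, const double *C, int ldc, double *D, int ldd)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int l = 0; l < k; l++)
                s += A[i + lda * l] * B[l + ldb * j];
            D[i + ldd * j] = beta * C[i + ldc * j] + alpha * s;
        }
}

/* Rebuild the operands of the report; D receives the 4x2 product (column-major). */
void expected_result(double *D)
{
    enum { nx = 8, nu = 2, nZ = 4 };
    double A[nZ * nx] = {0}, B[nx * (nx + nu)], C[nZ * nu] = {0};
    for (int i = 0; i < nx; i++)          /* every row of B is 0, 1, ..., 9 */
        for (int j = 0; j < nx + nu; j++)
            B[i + nx * j] = (double) j;
    for (int ii = 0; ii < nZ; ii++)       /* A(ii, 1+2*ii) = 1 */
        A[ii + nZ * (1 + 2 * ii)] = 1.0;
    /* multiply A by the last nu columns of B (column offset nx) */
    gemm_nn_ref(nZ, nu, nx, 1.0, A, nZ, &B[nx * nx], nx, 0.0, C, nZ, D, nZ);
}
```

Each row of A selects one row of the B sub-block, so all four result rows come out as 8 and 9, matching the REFERENCE output.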
Create a BLASFEO timing, working on all platforms.
to @tmmsartor ?
/usr/bin/x86_64-linux-gnu-ld: CMakeFiles/blasfeo.dir/kernel/avx2/kernel_dgemm_12x4_lib4.S.o: relocation R_X86_64_PC32 against symbol `inner_kernel_dgemm_add_nt_12x4_lib4' can not be used when making a shared object; recompile with -fPIC
/usr/bin/x86_64-linux-gnu-ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
BLASFEO fails to compile as a shared library with both make and CMake for some gcc versions/configurations (tested and failing on Ubuntu with gcc 4.9, 4.8 and 6).
Broken since 62d64013: with this commit some x86 internal assembly "functions" are exported as global symbols; before this change, shared library compilation worked.
The error message is bogus when linking hand-written assembly code.
Most relevant references: Stack Overflow 1, 2, 3 and PR #68.
Using MACRO_LEVEL=2 solves the issue. In this case the global symbol is not even defined.
Checked against openblas
Actual value:
( 2.7854532671410537 0 0 0 0 )
( 0.83913770035387469 2.9938299960925923 0 0 0 )
( 0.58622882080751648 0.70368090020327312 2.1283975704681657 0 0 )
( 0.38255481974053374 0.48675723887823252 -0.025473476463104543 2.4204282617974231 0 )
( 0.39976739552731955 0.72857415255088964 -0.19353788779731101 0.0968379384920936 2.4809996556130165 )
Expected value:
( 2.7854532671410532 0 0 0 0 )
( 0.83913770035387469 2.9938299960925918 0 0 0 )
( 0.58622882080751648 0.7036809002032729 2.6269317099125673 0 0 )
( 0.38255481974053374 0.48675723887823247 0.23201943119372695 2.48300863498413 0 )
( 0.39976739552731966 0.72857415255088964 0.31399001255216946 0.35722002546495896 2.4445854284358015 )
Starting from column 2 we see that the results differ.
The following program segfaults for HASWELL (not for GENERIC). I'm trying to compute the Cholesky decomposition of a submatrix and store it in a smaller matrix. For GENERIC, I get the correct result (checked in the last lines).
#include <stdlib.h>
#include "blasfeo_d_blas.h"
#include "blasfeo_d_aux.h"
#include "blasfeo_d_aux_ext_dep.h"
int main()
{
    double R[4] = {4, 2, 2, 2};
    double Q[4] = {1, 2, 2, 8};
    double S[4] = {-0.25, -0.5, -0.75, -1};
    struct blasfeo_dmat RSQ, L;
    int num_bytes = blasfeo_memsize_dmat(4, 4);
    void *raw_mem = malloc(num_bytes);
    blasfeo_create_dmat(4, 4, &RSQ, raw_mem);
    blasfeo_dgese(4, 4, 0.0, &RSQ, 0, 0);
    num_bytes = blasfeo_memsize_dmat(2, 2);
    raw_mem = malloc(num_bytes);
    blasfeo_create_dmat(2, 2, &L, raw_mem);
    blasfeo_dgese(2, 2, 0.0, &L, 0, 0);
    blasfeo_pack_dmat(2, 2, R, 2, &RSQ, 0, 0);
    blasfeo_pack_dmat(2, 2, Q, 2, &RSQ, 2, 2);
    blasfeo_pack_tran_dmat(2, 2, S, 2, &RSQ, 2, 0);
    /// Segfault occurs here
    blasfeo_dpotrf_l(2, &RSQ, 0, 0, &L, 0, 0);
    ///
    blasfeo_print_dmat(2, 2, &L, 0, 0);
    blasfeo_dgemm_nt(2, 2, 2, 1.0, &L, 0, 0, &L, 0, 0, 0.0, &L, 0, 0, &L, 0, 0);
    blasfeo_print_dmat(2, 2, &L, 0, 0);
}
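One thing that may be worth ruling out, stated here as an assumption and not as a confirmed diagnosis: the program hands memory from plain malloc to blasfeo_create_dmat, and vectorized targets may use aligned loads and stores that need stricter alignment than malloc guarantees. The sketch below shows one portable way to obtain 64-byte-aligned memory; the helper names aligned_malloc64 and aligned_free64 and the 64-byte figure are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* assumed alignment in bytes (one cache line) */

/* Return a block of at least `size` bytes that starts on a 64-byte boundary.
 * The block is carved out of a larger malloc allocation; the original pointer
 * is stored just before the aligned block so aligned_free64 can release it. */
void *aligned_malloc64(size_t size)
{
    void *raw = malloc(size + ALIGNMENT + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t) raw + sizeof(void *);
    uintptr_t aligned = (start + ALIGNMENT - 1) & ~(uintptr_t) (ALIGNMENT - 1);
    ((void **) aligned)[-1] = raw;   /* remember what to pass to free() */
    assert(aligned % ALIGNMENT == 0);
    return (void *) aligned;
}

/* Release a block obtained from aligned_malloc64. */
void aligned_free64(void *p)
{
    if (p != NULL)
        free(((void **) p)[-1]);
}
```

In the program above this would mean replacing the two malloc(num_bytes) calls with aligned_malloc64(num_bytes) and checking whether the segfault persists.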
Here is the list of functions in math.h in C89 and C99
https://en.wikibooks.org/wiki/C_Programming/math.h
There is no 'test' target.
Hi Gianluca,
I believe this line should be:
$<INSTALL_INTERFACE:${BLASFEO_HEADERS_INSTALLATION_DIRECTORY}>)
instead of:
$<INSTALL_INTERFACE:include/blasfeo/include>)
Best, Niels
BLASFEO converts the matrix format from row-major into panel-major. Is the cost of this conversion included in the experiments?
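To make the question concrete, here is a simplified sketch of what a conversion into a panel-major layout involves. Everything in it is an assumption made for illustration: the panel height of 4, column-major input, and the names panel_index and pack_panel_major. The library's own packing routines also deal with padding and alignment and may differ in detail.

```c
#include <assert.h>

#define PS 4  /* assumed panel height: rows stored together in one panel */

/* Position of element (i, j) in a panel-major buffer whose panels each hold
 * `cn` columns (cn >= number of matrix columns). Within a panel, the PS
 * entries of one column are contiguous. */
int panel_index(int i, int j, int cn)
{
    return (i / PS) * PS * cn + j * PS + (i % PS);
}

/* Copy an m-by-n column-major matrix A (leading dimension lda) into the
 * panel-major buffer pA, which must hold ((m + PS - 1) / PS) * PS * cn doubles.
 * Each element is read and written once, so the cost grows as m*n. */
void pack_panel_major(int m, int n, const double *A, int lda, double *pA, int cn)
{
    assert(cn >= n && lda >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            pA[panel_index(i, j, cn)] = A[i + lda * j];
}
```

The copy is linear in the number of matrix entries, whereas a matrix-matrix product grows cubically, so whether the conversion is counted matters most for the smallest matrices in a benchmark.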
Sorry to bother you again, but the "getting_started" and "example_d_ricatti_recursion" have memory leaks according to valgrind.
Have you ever considered making the input vectors/matrices to the BLASFEO routines const? It seems standard with other BLAS implementations, e.g. Netlib CBLAS does this: http://www.netlib.org/blas/cblas.h
If you change to for example
void blasfeo_daxpy(int kmax, const double alpha, const struct blasfeo_dvec *sx, int xi, const struct blasfeo_dvec *sy, int yi, struct blasfeo_dvec *sz, int zi);
it would have the following advantages:
(blasfeo_dvec *)... or const_cast<blasfeo_dvec *>(...) in wrapper code
After making three variable modifications to Makefile.rule:
BLAS_API = 1
FORTRAN_BLAS_API = 1
CBLAS_API = 1
in commit 78c3120, I ran make and then received the following error:
$ make
Parsing Makefile.rule
Makefile.rule:324: *** stack size likely to be exceeded, please decrease the value of K_MAX_STACK . Stop.
Looking at Makefile.rule more closely, I find around line 321:
STACK_SIZE := $(shell ulimit -s)
STACK_SIZE_EXCEEDED := $(shell echo $(K_MAX_STACK)*12*8*2 \> $(STACK_SIZE)*1024 | bc )
ifeq ($(STACK_SIZE_EXCEEDED), 1)
$(error stack size likely to be exceeded, please decrease the value of K_MAX_STACK )
endif
CFLAGS += -DK_MAX_STACK=$(K_MAX_STACK)
It appears that this code doesn't work properly when ulimit -s returns unlimited, as it does on my current Cray Linux Environment 6 system:
$ uname -a
Linux nid00019 4.4.103-6.38_4.0.95-cray_ari_c #1 SMP Fri Feb 9 17:52:44 UTC 2018 (172b90b) x86_64 x86_64 x86_64 GNU/Linux
I checked a previous version of BLASFEO that was working fine (2c9f312) and it appears that the code snippet above is new.
Commenting out the first five lines above (all except for the update of CFLAGS) seems to allow the build system to finish normally.
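The logic of the check, including the missing "unlimited" case, can be written down as a small C function. This is only a sketch of the comparison from the Makefile.rule snippet above: the function name stack_size_exceeded is hypothetical, and treating unparsable input as "not exceeded" is a design assumption, not something the build system does today.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if K_MAX_STACK*12*8*2 bytes exceed the stack limit, else 0.
 * `ulimit_s` is the text printed by `ulimit -s`: a size in KiB or "unlimited". */
int stack_size_exceeded(long k_max_stack, const char *ulimit_s)
{
    assert(ulimit_s != NULL);
    if (strcmp(ulimit_s, "unlimited") == 0)
        return 0;                        /* no limit, so nothing to exceed */
    char *end = NULL;
    long limit_kib = strtol(ulimit_s, &end, 10);
    if (end == ulimit_s || limit_kib <= 0)
        return 0;                        /* unparsable: do not block the build */
    return k_max_stack * 12 * 8 * 2 > limit_kib * 1024;
}
```

In the Makefile itself, the corresponding fix would be to skip the bc comparison whenever STACK_SIZE is the string unlimited.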
When running valgrind on the d_riccati_recursion example,
it returns invalid read errors in the functions:
blasfeo_dtrmv_lnn (d_blas2_lib4.c:575) and blasfeo_pack_tran_dmat (d_aux_lib4.c:2184)
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:412:10: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
temp = abs(*(ptrA+j*row));
^
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:412:10: note: use function 'fabs' instead
temp = abs(*(ptrA+j*row));
^~~
fabs
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:415:12: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
temp += abs(*(ptrA+j*row+i));
^
/usr/ports/math/blasfeo/work/blasfeo-0.1.2/examples/tools.c:415:12: note: use function 'fabs' instead
temp += abs(*(ptrA+j*row+i));
^~~
fabs
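The warnings point at a real numerical problem, not just a style issue: abs takes an int, so a double argument is converted to int first and loses its fractional part. The sketch below contrasts the two behaviors on a column-major array; the function names col_abs_sum and col_abs_sum_truncating are hypothetical and only mirror the indexing pattern ptrA + j*row + i from the warning.

```c
#include <assert.h>
#include <math.h>    /* fabs */
#include <stdlib.h>  /* abs */

/* Sum of absolute values of column j of a column-major matrix with `row` rows. */
double col_abs_sum(const double *ptrA, int row, int j)
{
    assert(row >= 0);
    double temp = 0.0;
    for (int i = 0; i < row; i++)
        temp += fabs(ptrA[j * row + i]);       /* floating-point absolute value */
    return temp;
}

/* What the integer abs() effectively computes: the value is converted to int
 * before the absolute value is taken, so the fractional part is lost. */
double col_abs_sum_truncating(const double *ptrA, int row, int j)
{
    assert(row >= 0);
    double temp = 0.0;
    for (int i = 0; i < row; i++)
        temp += abs((int) ptrA[j * row + i]);  /* e.g. 0.5 becomes 0 */
    return temp;
}
```

For a column holding 0.5 and -0.25 the fabs version returns 0.75, while the truncating version returns 0.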
When compiling with TARGET_ARMV7A_ARM_CORTEX_A15 (and at least A7) with the current gcc in Debian stable (Buster), which is arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0, kernel/armv7a/kernel_dgemm_4x4_lib4.S does not assemble correctly:
kernel_dgemm_4x4_lib4.S: Assembler messages:
kernel_dgemm_4x4_lib4.S:2053: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2122: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2163: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2247: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2302: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2370: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2530: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
kernel_dgemm_4x4_lib4.S:2621: Error: co-processor offset out of range
This is related to the .LC00 and .LC01 labels, which hold zero constants that are used in various places as the second parameter of the fldd instructions. The labels are not allowed to be too far from where they are used, and apparently the assembler thinks this is the case in the lines above. I could not find a fix for that, but my assembler-fu is lacking ;)
BTW: there is another breakage on armv7a regarding bad instructions but I have a fix for that and will open a pull request soon.
The preprocessed output of
/usr/lib/gcc-cross/arm-linux-gnueabihf/8/cc1 -E -lang-asm -quiet -v -imultilib . -imultiarch arm-linux-gnueabihf -D MACRO_LEVEL=1 -D OS_LINUX -D TARGET_ARMV7A_ARM_CORTEX_A15 kernel_dgemm_4x4_lib4.S -mfpu=neon-vfpv4 -mfloat-abi=hard -mthumb -mtls-dialect=gnu -march=armv7-a+neon-vfpv4 -fno-directives-only
is attached. It is then fed into
/usr/lib/gcc-cross/arm-linux-gnueabihf/8/../../../../arm-linux-gnueabihf/bin/as -v -march=armv7-a -mfloat-abi=hard -mfpu=neon-vfpv4 -meabi=5 -o kernel_dgemm_4x4_lib4.o
which produces the errors above. The flow is easily visible by executing
arm-linux-gnueabihf-gcc -DMACRO_LEVEL=1 -DOS_LINUX -mfpu=neon-vfpv4 -DTARGET_ARMV7A_ARM_CORTEX_A15 -c -o kernel_dgemm_4x4_lib4.o kernel_dgemm_4x4_lib4.S -v
(note the -v) in kernel/armv7a.
Hey guys,
I think I found a bug in the Sandy Bridge implementation of blasfeo_dgead,
namely when using the function
void blasfeo_dgead(int m, int n, double alpha, struct blasfeo_dmat *sA, int ai, int aj, struct blasfeo_dmat *sC, int ci, int cj)
with
m = 5, n = 8, ci = 15, cj = 0.
It seems that the corresponding block is not added: the entries stay zero, in contrast to the GENERIC implementation, where it works fine.
This should help to find the bug.
Best
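For comparison, the behavior described as correct (the block of A scaled by alpha is added into C at offset ci, cj) can be written as a plain reference loop. This sketch assumes that reading of the routine and uses raw column-major arrays with explicit leading dimensions; the name gead_ref is hypothetical and the code is not taken from BLASFEO.

```c
#include <assert.h>

/* C[ci+i, cj+j] += alpha * A[ai+i, aj+j] for an m-by-n block.
 * A and C are column-major with leading dimensions lda and ldc. */
void gead_ref(int m, int n, double alpha,
              const double *A, int lda, int ai, int aj,
              double *C, int ldc, int ci, int cj)
{
    assert(ai >= 0 && aj >= 0 && ci >= 0 && cj >= 0);
    assert(ai + m <= lda && ci + m <= ldc);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            C[(ci + i) + ldc * (cj + j)] += alpha * A[(ai + i) + lda * (aj + j)];
}
```

With m = 5, n = 8, ci = 15, cj = 0 and a zero-initialized C that has 20 rows, only rows 15 to 19 of C should change, which gives a simple check to compare the two targets against.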
Attached is the source code of a benchmark which measures the execution time of the gemm BLAS routine implemented in BLASFEO and in a BLAS library of choice, for different matrix sizes.
The benchmark uses the Google Benchmark library.
Running the benchmark with the following command line
./bench --benchmark_repetitions=5
gives the following output:
2019-09-18 13:08:50
Running ./bench
Run on (4 X 3200 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 6144K (x1)
Load Average: 1.86, 1.50, 1.18
--------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------
BM_gemm_blasfeo/2/2/2 277 ns 277 ns 2521364
BM_gemm_blasfeo/2/2/2 277 ns 277 ns 2521364
BM_gemm_blasfeo/2/2/2 277 ns 277 ns 2521364
BM_gemm_blasfeo/2/2/2 320 ns 320 ns 2521364
BM_gemm_blasfeo/2/2/2 320 ns 320 ns 2521364
BM_gemm_blasfeo/2/2/2_mean 294 ns 294 ns 5
BM_gemm_blasfeo/2/2/2_median 277 ns 277 ns 5
BM_gemm_blasfeo/2/2/2_stddev 23.5 ns 23.5 ns 5
BM_gemm_blasfeo/3/3/3 2143 ns 2143 ns 319712
BM_gemm_blasfeo/3/3/3 2143 ns 2143 ns 319712
BM_gemm_blasfeo/3/3/3 2142 ns 2142 ns 319712
BM_gemm_blasfeo/3/3/3 2228 ns 2228 ns 319712
BM_gemm_blasfeo/3/3/3 2228 ns 2228 ns 319712
BM_gemm_blasfeo/3/3/3_mean 2177 ns 2177 ns 5
BM_gemm_blasfeo/3/3/3_median 2143 ns 2143 ns 5
BM_gemm_blasfeo/3/3/3_stddev 46.6 ns 46.6 ns 5
BM_gemm_blasfeo/5/5/5 11403 ns 11403 ns 61176
BM_gemm_blasfeo/5/5/5 11402 ns 11402 ns 61176
BM_gemm_blasfeo/5/5/5 11402 ns 11402 ns 61176
BM_gemm_blasfeo/5/5/5 2673 ns 2672 ns 61176
BM_gemm_blasfeo/5/5/5 2673 ns 2673 ns 61176
BM_gemm_blasfeo/5/5/5_mean 7911 ns 7910 ns 5
BM_gemm_blasfeo/5/5/5_median 11402 ns 11402 ns 5
BM_gemm_blasfeo/5/5/5_stddev 4781 ns 4781 ns 5
BM_gemm_blasfeo/10/10/10 10092 ns 10092 ns 68876
BM_gemm_blasfeo/10/10/10 10093 ns 10093 ns 68876
BM_gemm_blasfeo/10/10/10 10092 ns 10092 ns 68876
BM_gemm_blasfeo/10/10/10 9707 ns 9707 ns 68876
BM_gemm_blasfeo/10/10/10 9707 ns 9706 ns 68876
BM_gemm_blasfeo/10/10/10_mean 9938 ns 9938 ns 5
BM_gemm_blasfeo/10/10/10_median 10092 ns 10092 ns 5
BM_gemm_blasfeo/10/10/10_stddev 211 ns 211 ns 5
BM_gemm_blasfeo/20/20/20 1078 ns 1078 ns 639117
BM_gemm_blasfeo/20/20/20 1078 ns 1078 ns 639117
BM_gemm_blasfeo/20/20/20 1078 ns 1078 ns 639117
BM_gemm_blasfeo/20/20/20 1066 ns 1066 ns 639117
BM_gemm_blasfeo/20/20/20 1067 ns 1067 ns 639117
BM_gemm_blasfeo/20/20/20_mean 1074 ns 1074 ns 5
BM_gemm_blasfeo/20/20/20_median 1078 ns 1078 ns 5
BM_gemm_blasfeo/20/20/20_stddev 6.34 ns 6.34 ns 5
BM_gemm_blasfeo/30/30/30 2594 ns 2594 ns 268109
BM_gemm_blasfeo/30/30/30 2595 ns 2595 ns 268109
BM_gemm_blasfeo/30/30/30 2594 ns 2594 ns 268109
BM_gemm_blasfeo/30/30/30 2595 ns 2595 ns 268109
BM_gemm_blasfeo/30/30/30 2595 ns 2595 ns 268109
BM_gemm_blasfeo/30/30/30_mean 2594 ns 2594 ns 5
BM_gemm_blasfeo/30/30/30_median 2595 ns 2595 ns 5
BM_gemm_blasfeo/30/30/30_stddev 0.340 ns 0.372 ns 5
BM_gemm_cblas/2/2/2 235 ns 235 ns 2972773
BM_gemm_cblas/2/2/2 235 ns 235 ns 2972773
BM_gemm_cblas/2/2/2 235 ns 235 ns 2972773
BM_gemm_cblas/2/2/2 235 ns 235 ns 2972773
BM_gemm_cblas/2/2/2 235 ns 235 ns 2972773
BM_gemm_cblas/2/2/2_mean 235 ns 235 ns 5
BM_gemm_cblas/2/2/2_median 235 ns 235 ns 5
BM_gemm_cblas/2/2/2_stddev 0.011 ns 0.011 ns 5
BM_gemm_cblas/3/3/3 392 ns 392 ns 1786397
BM_gemm_cblas/3/3/3 392 ns 392 ns 1786397
BM_gemm_cblas/3/3/3 392 ns 392 ns 1786397
BM_gemm_cblas/3/3/3 392 ns 392 ns 1786397
BM_gemm_cblas/3/3/3 392 ns 392 ns 1786397
BM_gemm_cblas/3/3/3_mean 392 ns 392 ns 5
BM_gemm_cblas/3/3/3_median 392 ns 392 ns 5
BM_gemm_cblas/3/3/3_stddev 0.021 ns 0.021 ns 5
BM_gemm_cblas/5/5/5 472 ns 472 ns 1483886
BM_gemm_cblas/5/5/5 472 ns 472 ns 1483886
BM_gemm_cblas/5/5/5 472 ns 472 ns 1483886
BM_gemm_cblas/5/5/5 472 ns 472 ns 1483886
BM_gemm_cblas/5/5/5 472 ns 472 ns 1483886
BM_gemm_cblas/5/5/5_mean 472 ns 472 ns 5
BM_gemm_cblas/5/5/5_median 472 ns 472 ns 5
BM_gemm_cblas/5/5/5_stddev 0.380 ns 0.380 ns 5
BM_gemm_cblas/10/10/10 841 ns 841 ns 817796
BM_gemm_cblas/10/10/10 841 ns 841 ns 817796
BM_gemm_cblas/10/10/10 841 ns 841 ns 817796
BM_gemm_cblas/10/10/10 841 ns 841 ns 817796
BM_gemm_cblas/10/10/10 841 ns 841 ns 817796
BM_gemm_cblas/10/10/10_mean 841 ns 841 ns 5
BM_gemm_cblas/10/10/10_median 841 ns 841 ns 5
BM_gemm_cblas/10/10/10_stddev 0.249 ns 0.249 ns 5
BM_gemm_cblas/20/20/20 1905 ns 1905 ns 364260
BM_gemm_cblas/20/20/20 1905 ns 1905 ns 364260
BM_gemm_cblas/20/20/20 1904 ns 1904 ns 364260
BM_gemm_cblas/20/20/20 1921 ns 1921 ns 364260
BM_gemm_cblas/20/20/20 1922 ns 1921 ns 364260
BM_gemm_cblas/20/20/20_mean 1911 ns 1911 ns 5
BM_gemm_cblas/20/20/20_median 1905 ns 1905 ns 5
BM_gemm_cblas/20/20/20_stddev 9.13 ns 9.14 ns 5
BM_gemm_cblas/30/30/30 4240 ns 4240 ns 165046
BM_gemm_cblas/30/30/30 4240 ns 4239 ns 165046
BM_gemm_cblas/30/30/30 4239 ns 4239 ns 165046
BM_gemm_cblas/30/30/30 4239 ns 4239 ns 165046
BM_gemm_cblas/30/30/30 4238 ns 4238 ns 165046
BM_gemm_cblas/30/30/30_mean 4239 ns 4239 ns 5
BM_gemm_cblas/30/30/30_median 4239 ns 4239 ns 5
BM_gemm_cblas/30/30/30_stddev 0.559 ns 0.554 ns 5
One can see that, according to the benchmark, the execution time of BLASFEO dgemm() varies a lot (in the 5/5/5 case, individual repetitions range from 2673 ns to 11403 ns), whereas the execution time of the openblas implementation is very stable.
INTEL_HASWELL architecture.
g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz CPU (Skylake architecture).
Ubuntu Linux 19.04 with 5.0.0-29-generic kernel.
Here is how the bug can be reproduced: https://gitlab.syscop.de/dimitris.kouzoupis/treeQP-dev/issues/26
We are using BLASFEO in our research project. If the library we are developing has multiple compilation units, we get linking errors because the object BLASFEO_PROCESSOR_FEATURES gets defined multiple times.
We did not find where BLASFEO_PROCESSOR_FEATURES is used as an object, but maybe we missed something (it could depend on the options). Perhaps the idea was to use BLASFEO_PROCESSOR_FEATURES as an identifier instead of an object? Or is there a reason for it?
enum BLASFEO_PROCESSOR_FEATURES
{
// x86-64 CPU features
BLASFEO_PROCESSOR_FEATURE_AVX = 0x0001, /// AVX instruction set
BLASFEO_PROCESSOR_FEATURE_AVX2 = 0x0002, /// AVX2 instruction set
BLASFEO_PROCESSOR_FEATURE_FMA = 0x0004, /// FMA instruction set
BLASFEO_PROCESSOR_FEATURE_SSE3 = 0x0008, /// SSE3 instruction set
// ARM CPU features
BLASFEO_PROCESSOR_FEATURE_VFPv3 = 0x0100, /// VFPv3 instruction set
BLASFEO_PROCESSOR_FEATURE_NEON = 0x0100, /// NEON instruction set
BLASFEO_PROCESSOR_FEATURE_VFPv4 = 0x0100, /// VFPv4 instruction set
BLASFEO_PROCESSOR_FEATURE_NEONv2 = 0x0100, /// NEONv2 instruction set
};
Thank you,
Wilm and Lander
While there is some Windows support, it is limited to the GENERIC code path.
I suggest adding support for Clang-CL in CMakeLists.txt, so that when the compiler is Clang-CL things work like GCC on Linux, as Clang-CL should support AT&T-style assembly.
The problem is that the configuration somehow doesn't support that.
I get errors like:
..\kernel\generic\kernel_dgemm_4x4_lib4.c(3056,15): error: expected ';' at end of declaration
double CC[16] __declspec(align(64)) = {0};
^
;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(3056,38): error: expected identifier or '('
double CC[16] __declspec(align(64)) = {0};
^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6398,15): error: expected ';' at end of declaration
double CC[16] __declspec(align(64)) = {0};
^
;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6398,38): error: expected identifier or '('
double CC[16] __declspec(align(64)) = {0};
^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6480,15): error: expected ';' at end of declaration
double CC[16] __declspec(align(64)) = {0};
^
;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6480,38): error: expected identifier or '('
double CC[16] __declspec(align(64)) = {0};
^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6654,15): error: expected ';' at end of declaration
double CC[16] __declspec(align(64)) = {0};
^
;
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6654,38): error: expected identifier or '('
double CC[16] __declspec(align(64)) = {0};
^
..\kernel\generic\kernel_dgemm_4x4_lib4.c(6757,15): error: expected ';' at end of declaration
double CC[16] __declspec(align(64)) = {0};
It also tries to use flags like -fPIC -msse3 etc.
Those flags match GCC.
So I suggest that the configuration treat Clang-CL as it does MSVC, with the only difference being that Clang-CL is allowed to use different configurations.
I think the flags:
set(C_FLAGS_TARGET_X64_INTEL_HASWELL "-m64 -mavx -mavx2 -mfma")
set(C_FLAGS_TARGET_X64_INTEL_SANDY_BRIDGE "-m64 -mavx")
set(C_FLAGS_TARGET_X64_INTEL_CORE "-m64 -msse3")
set(C_FLAGS_TARGET_X64_AMD_BULLDOZER "-m64 -mavx -mfma")
need to be decorated with -Xclang or adapted to Windows.
Before make, I need to comment out OS = LINUX and uncomment #OS = MAC in Makefile.rule. Can't this be done automatically?
See e.g. [http://stackoverflow.com/questions/714100/os-detecting-makefile]
The file benchmarks/cpu_freq.h.example is licensed under GPL3 + classpath. Is this intentional? It's always a bit tricky to combine GPL with other licenses in one project, particularly for source distributions, because of the classpath exception.
This might be of interest: https://github.com/dockcross/dockcross