Comments (41)
Nice, it helped a lot:
93% tests passed, 3 tests failed out of 41
Total Test time (real) = 38.24 sec
The following tests FAILED:
36 - clblast_test_xsyrk (Failed)
38 - clblast_test_xsyr2k (OTHER_FAULT)
39 - clblast_test_xher2k (SEGFAULT)
I'll post the remaining errors details later.
from clblast.
Hmm, not good. So there are two types of tests: 'regular behaviour' with proper input arguments, and 'invalid buffer sizes' with funny input arguments, such as zero-sized or too-small buffers. Apparently only the latter type fails.
I assume this is on the development branch, given that the verbose output is much more verbose than in the latest version. I re-ran the same command just now on my machine with Beignet on Linux and an Intel(R) HD Graphics Skylake ULT GT2 (almost the same), and everything is fine. However, I noticed that verbose mode doesn't output extra information for the 'invalid buffer sizes' cases; perhaps I should add that to get a bit more information on why things go wrong.
The first thing we should try is to find out whether clBLAS (the reference) or CLBlast crashes. Perhaps you can go to line 218 of correctness/testblas.cc and change:
auto status1 = run_reference_(args, buffers1, queue_);
into:
auto status1 = StatusCode::kSuccess;
If it still crashes then the bug is in CLBlast, otherwise it is in clBLAS (not unreasonable to think, since that library hasn't been tested on Intel/Beignet).
from clblast.
I did your test and tried both clblas and clblast alone and they both segfault ...
I tried to skip both at the same time and I get 23/40 failed.
43% tests passed, 23 tests failed out of 40
Total Test time (real) = 134.66 sec
The following tests FAILED:
11 - clblast_test_xgemv (SEGFAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
19 - clblast_test_xtrmv (SEGFAULT)
20 - clblast_test_xtbmv (SEGFAULT)
21 - clblast_test_xtpmv (SEGFAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
28 - clblast_test_xhpr2 (Failed)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
32 - clblast_test_xspr2 (Failed)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
Looks like I was a little too quick to assume they all failed the same way...
I think I'll open a new issue when this one is solved. One issue at a time!
from clblast.
OK, it seems there is something else causing issues. First of all, I've made the 'invalid buffer sizes' tests more verbose in verbose mode. For example, the ./clblast_test_xswap -verbose command would now output:
* Running on OpenCL device 'Iris Pro'.
* Starting tests for the 'SSWAP' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
. -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for 'default':
Config: n=7 incx=1 incy=1 offx=0 offy=0 -> :
Config: n=7 incx=1 incy=2 offx=0 offy=0 -> :
Config: n=7 incx=1 incy=7 offx=0 offy=0 -> :
Config: n=7 incx=2 incy=1 offx=0 offy=0 -> :
Config: n=7 incx=2 incy=2 offx=0 offy=0 -> :
Config: n=7 incx=2 incy=7 offx=0 offy=0 -> :
Config: n=7 incx=7 incy=1 offx=0 offy=0 -> :
Config: n=7 incx=7 incy=2 offx=0 offy=0 -> :
Config: n=7 incx=7 incy=7 offx=0 offy=0 -> :
Config: n=93 incx=1 incy=1 offx=0 offy=0 -> :
Config: n=93 incx=1 incy=2 offx=0 offy=0 -> :
Config: n=93 incx=1 incy=7 offx=0 offy=0 -> :
Config: n=93 incx=2 incy=1 offx=0 offy=0 -> :
Config: n=93 incx=2 incy=2 offx=0 offy=0 -> :
Config: n=93 incx=2 incy=7 offx=0 offy=0 -> :
Config: n=93 incx=7 incy=1 offx=0 offy=0 -> :
Config: n=93 incx=7 incy=2 offx=0 offy=0 -> :
Config: n=93 incx=7 incy=7 offx=0 offy=0 -> :
Config: n=4096 incx=1 incy=1 offx=0 offy=0 -> :
Config: n=4096 incx=1 incy=2 offx=0 offy=0 -> :
Config: n=4096 incx=1 incy=7 offx=0 offy=0 -> :
Config: n=4096 incx=2 incy=1 offx=0 offy=0 -> :
Config: n=4096 incx=2 incy=2 offx=0 offy=0 -> :
Config: n=4096 incx=2 incy=7 offx=0 offy=0 -> :
Config: n=4096 incx=7 incy=1 offx=0 offy=0 -> :
Config: n=4096 incx=7 incy=2 offx=0 offy=0 -> :
Config: n=4096 incx=7 incy=7 offx=0 offy=0 -> :
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for 'default':
Config: n=64 xsize=0 ysize=0 -> .
Config: n=64 xsize=0 ysize=63 -> .
Config: n=64 xsize=0 ysize=64 -> .
Config: n=64 xsize=63 ysize=0 -> .
Config: n=64 xsize=63 ysize=63 -> .
Config: n=64 xsize=63 ysize=64 -> .
Config: n=64 xsize=64 ysize=0 -> .
Config: n=64 xsize=64 ysize=63 -> .
Config: n=64 xsize=64 ysize=64 -> .
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Completed all test-cases for this routine. Results:
36 test(s) passed
0 test(s) skipped
0 test(s) failed
In the last bit, it shows that it is testing swapping of two buffers with 64 elements using smaller sized buffers. Both clBLAS and CLBlast are protected against this behaviour and return appropriate error codes.
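The kind of guard these 'invalid buffer sizes' tests exercise can be sketched roughly as follows. This is only an illustration, not CLBlast's or clBLAS's actual code: the `Status` enum and the `CheckVector` function are hypothetical names, and treating n=0 as valid is an assumption made for the sketch. Only the idea (compare the required number of elements against the buffer size and return an error code instead of running anything) comes from the description above.

```cpp
#include <cstddef>

// Hypothetical status codes for this sketch (not CLBlast's real enum).
enum class Status { kSuccess, kInsufficientMemory };

// Returns an error when a buffer holding `buffer_elems` elements is too small
// to contain an n-element vector with the given offset and increment.
Status CheckVector(std::size_t n, std::size_t offset, std::size_t inc,
                   std::size_t buffer_elems) {
  if (n == 0) { return Status::kSuccess; }  // assumption: nothing is accessed
  // Index of the last accessed element, plus one.
  const std::size_t required = offset + (n - 1) * inc + 1;
  return (buffer_elems < required) ? Status::kInsufficientMemory
                                   : Status::kSuccess;
}
```

With n=64, offset 0 and increment 1, a buffer of 63 or 0 elements is rejected and a buffer of 64 elements is accepted, which mirrors the xsize/ysize combinations in the log above.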
One more thing I can think of now is that Beignet isn't happy with zero-sized buffers. Perhaps you can change line 66 of test/correctness/testblas.h from:
const std::vector<size_t> kVecSizes = {0, kBufferSize - 1, kBufferSize};
into:
const std::vector<size_t> kVecSizes = {kBufferSize - 1, kBufferSize};
Let's see if that helps for the xswap test.
For the other errors, I would first suggest testing against a CPU BLAS library, since the reference clBLAS might crash or give incorrect results in some cases on Intel GPUs. You can do this by adding -clblas 0 -cblas 1 to the command line. Perhaps I should make this the default behaviour?
from clblast.
I just tested the dev branch and the issue looks gone:
53% tests passed, 19 tests failed out of 40
Total Test time (real) = 70.75 sec
The following tests FAILED:
11 - clblast_test_xgemv (SEGFAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
19 - clblast_test_xtrmv (SEGFAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
There are 4 more tests that pass compared to the status1/status2 trick 3 days ago.
CBLAS is already the default:
$ ./clblast_test_xgemv -verbose 1 -clblas 0 -cblas 1
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [true]
-clblas 0 [=default]
-cblas 1 [=default]
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SGEMV' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
. -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 111 (regular)':
Config: m=61 n=61 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=1 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=1 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 -> :
Segmentation fault
I tried with clblas as reference:
$ ./clblast_test_xgemv -verbose 1 -clblas 1 -cblas 0
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [true]
-clblas 1
-cblas 0
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SGEMV' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
. -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 111 (regular)':
Config: m=61 n=61 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=1 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=1 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=1 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=2 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=2 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=1 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=2 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=61 incx=7 incy=7 offa=0 offx=0 offy=0 -> :
Config: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 -> :
Segmentation fault
from clblast.
OK, I should have said this: the development branch now has the CPU BLAS as a default, so it is not testing against clBLAS anymore (unless you ask for it explicitly). I don't understand why the original issue doesn't show up anymore though...
From your other tests with GEMV we can conclude that the issue is indeed in CLBlast and not in one of the reference libraries. The configuration that would have been tested next is:
Config: m=61 n=512 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 -> .
The small dot there at the end denotes that it is an invalid configuration, i.e. the library should return with a status-code instead of actually trying to run it. I think that is the common thing across all your tests: it only fails for 'invalid' configurations.
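To illustrate why that configuration counts as invalid: for a row-major, non-transposed m-by-n matrix, standard BLAS requires the leading dimension to be at least n, so lda=61 cannot hold rows of n=512 elements. A minimal sketch of such a check is shown below; the `Layout` enum and `IsValidLeadingDimension` are names invented for this example and are not taken from CLBlast.

```cpp
#include <cstddef>

// Memory layout of the matrix (names chosen for this sketch).
enum class Layout { kRowMajor, kColMajor };

// A row-major m-by-n matrix stores each row contiguously, so every row needs
// at least n elements; a column-major matrix needs at least m per column.
bool IsValidLeadingDimension(Layout layout, std::size_t m, std::size_t n,
                             std::size_t lda) {
  const std::size_t minimum = (layout == Layout::kRowMajor) ? n : m;
  return lda >= minimum;
}
```

Under this rule m=61 n=61 lda=61 is valid, while m=61 n=512 lda=61 is not and should produce a status code rather than a kernel launch.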
Are you on the latest Beignet by the way? Perhaps that is influencing the results as well somehow?
from clblast.
I use beignet 1.1.2.
What's your version ?
from clblast.
I'm on a git version from 2 weeks back. I had to do that because my Skylake GPU is quite new. But 1.1.2 seems to be from April this year, so that's quite recent.
I'll try to think of ways to debug this properly. But for now I think you can actually use the library: it only crashes for invalid configurations, it seems.
from clblast.
As some tuners fail too, maybe we can focus on that.
I'll post all failing unit tests too in case you see something obvious.
from clblast.
The tuners crash as well? I'll also investigate the current issue further, but I don't have time until Monday.
from clblast.
That is weird. make alltuners fails during Xgemm, reporting a segfault:
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (2607 ms) - 10 out of 117
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (170 ms) - 11 out of 117
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (545 ms) - 12 out of 117
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (2408 ms) - 13 out of 117
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (1366 ms) - 14 out of 117
CMakeFiles/alltuners.dir/build.make:57: recipe for target 'CMakeFiles/alltuners' failed
make[3]: *** [CMakeFiles/alltuners] Segmentation fault
CMakeFiles/Makefile2:146: recipe for target 'CMakeFiles/alltuners.dir/all' failed
make[2]: *** [CMakeFiles/alltuners.dir/all] Error 2
CMakeFiles/Makefile2:153: recipe for target 'CMakeFiles/alltuners.dir/rule' failed
make[1]: *** [CMakeFiles/alltuners.dir/rule] Error 2
Makefile:186: recipe for target 'alltuners' failed
make: *** [alltuners] Error 2
but running clblast_tuner_xgemm directly works fine ...
from clblast.
Here is the issue (with complex numbers):
$ ./clblast_tuner_xgemm -precision 3232
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-precision 3232 (complex-single)
-m 1024 [=default]
-n 1024 [=default]
-k 1024 [=default]
-alpha 2+0.5i [=default]
-beta 2+0.5i [=default]
-fraction 2048.000000 [=default]
[==========] Initializing on platform 0 device 0
[==========] Device name: 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (OpenCL 1.2 beignet 1.1.2)
[----------] Testing reference Xgemm
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (460 ms) - 1 out of 1
[----------] Testing kernel Xgemm
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (189 ms) - 1 out of 117
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (1471 ms) - 2 out of 117
Segmentation fault
from clblast.
OK, thanks for running the tuner and showing the output. The first thing to check is whether this is a bug in the compiler, in CLTune, or in the CLBlast kernels. Because I don't know how to do that properly when I can't reproduce the errors myself, I've added a 'VERBOSE' setting to CLTune. So, could you do the following for me:
- Pull the latest version of the development branch of CLTune
- Run cmake -DVERBOSE=ON .. to enable 'VERBOSE' mode
- Compile and install CLTune
- Re-build the CLBlast tuners
- Re-run the tuner and post the output here
Perhaps it is not verbose enough yet, but this would be the first step I guess.
Thanks!
from clblast.
The answer is Compilation !
$ ./clblast_tuner_xgemm -precision 3232
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-precision 3232 (complex-single)
-m 1024 [=default]
-n 1024 [=default]
-k 1024 [=default]
-alpha 2+0.5i [=default]
-beta 2+0.5i [=default]
-fraction 2048.000000 [=default]
[==========] Initializing on platform 0 device 0
[==========] Device name: 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (OpenCL 1.2 beignet 1.1.2)
[----------] Testing reference Xgemm
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (461 ms) - 1 out of 1
[----------] Testing kernel Xgemm
[ VERBOSE ] Computing the permutations of all parameters
[ VERBOSE ] Exploring configuration (1 out of 117)
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (2495 ms) - 1 out of 117
[ VERBOSE ] Exploring configuration (2 out of 117)
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (1923 ms) - 2 out of 117
[ VERBOSE ] Exploring configuration (3 out of 117)
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (1126 ms) - 3 out of 117
[ VERBOSE ] Exploring configuration (4 out of 117)
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (3122 ms) - 4 out of 117
[ VERBOSE ] Exploring configuration (5 out of 117)
[ VERBOSE ] Starting compilation
[ VERBOSE ] Finished compilation
[ VERBOSE ] Creating a copy of the output buffer
[ VERBOSE ] Setting kernel arguments
[ RUN ] Running Xgemm
[ VERBOSE ] Launching kernel (1 out of 1 for averaging)
[ OK ] Completed Xgemm (178 ms) - 5 out of 117
[ VERBOSE ] Exploring configuration (6 out of 117)
[ VERBOSE ] Starting compilation
Segmentation fault
from clblast.
If it helps, here is the output of valgrind (with lots of memory errors suppressed):
$ valgrind --tool=memcheck --show-leak-kinds=definite --error-limit=no ./clblast_tuner_xgemm -precision 3232
...
[ VERBOSE ] Exploring configuration (4 out of 117)
[ VERBOSE ] Starting compilation
==13154== Invalid write of size 8
==13154== at 0x858A62E: gbe::Kernel::setSamplerSet(gbe::ir::SamplerSet*) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x85841B1: gbe::Program::buildFromUnit(gbe::ir::Unit const&, std::string&) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x8583F91: gbe::Program::buildFromLLVMFile(char const*, void const*, std::string&, int) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x86FF22C: gbe::genProgramNewFromLLVM(unsigned int, char const*, void const*, void const*, char const*, unsigned long, char*, unsigned long*, int) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x858890F: gbe::programNewFromSource(unsigned int, char const*, unsigned long, char const*, char*, unsigned long*) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x54D27F1: cl_program_build (in /usr/lib64/beignet/libcl.so)
==13154== by 0x54C6C0B: clBuildProgram (in /usr/lib64/beignet/libcl.so)
==13154== by 0x529789C: cltune::TunerImpl::RunKernel(std::string const&, cltune::KernelInfo const&, unsigned long, unsigned long) (in /usr/lib64/libcltune.so)
==13154== by 0x529989D: cltune::TunerImpl::Tune() (in /usr/lib64/libcltune.so)
==13154== by 0x4177A2: void clblast::Tuner<clblast::TuneXgemm<std::complex<float> >, std::complex<float> >(int, char**) (in /home/thomas/src/CLBlast/build/clblast_tuner_xgemm)
==13154== by 0x4081DC: main (in /home/thomas/src/CLBlast/build/clblast_tuner_xgemm)
==13154== Address 0x58 is not stack'd, malloc'd or (recently) free'd
==13154==
==13154==
==13154== Process terminating with default action of signal 11 (SIGSEGV)
==13154== Access not within mapped region at address 0x58
==13154== at 0x858A62E: gbe::Kernel::setSamplerSet(gbe::ir::SamplerSet*) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x85841B1: gbe::Program::buildFromUnit(gbe::ir::Unit const&, std::string&) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x8583F91: gbe::Program::buildFromLLVMFile(char const*, void const*, std::string&, int) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x86FF22C: gbe::genProgramNewFromLLVM(unsigned int, char const*, void const*, void const*, char const*, unsigned long, char*, unsigned long*, int) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x858890F: gbe::programNewFromSource(unsigned int, char const*, unsigned long, char const*, char*, unsigned long*) (in /usr/lib64/beignet/libgbe.so)
==13154== by 0x54D27F1: cl_program_build (in /usr/lib64/beignet/libcl.so)
==13154== by 0x54C6C0B: clBuildProgram (in /usr/lib64/beignet/libcl.so)
==13154== by 0x529789C: cltune::TunerImpl::RunKernel(std::string const&, cltune::KernelInfo const&, unsigned long, unsigned long) (in /usr/lib64/libcltune.so)
==13154== by 0x529989D: cltune::TunerImpl::Tune() (in /usr/lib64/libcltune.so)
==13154== by 0x4177A2: void clblast::Tuner<clblast::TuneXgemm<std::complex<float> >, std::complex<float> >(int, char**) (in /home/thomas/src/CLBlast/build/clblast_tuner_xgemm)
==13154== by 0x4081DC: main (in /home/thomas/src/CLBlast/build/clblast_tuner_xgemm)
==13154== If you believe this happened as a result of a stack
==13154== overflow in your program's main thread (unlikely but
==13154== possible), you can try to increase the size of the
==13154== main thread stack using the --main-stacksize= flag.
==13154== The main thread stack size used in this run was 8388608.
==13154==
==13154== HEAP SUMMARY:
==13154== in use at exit: 191,945,013 bytes in 802,573 blocks
==13154== total heap usage: 23,654,057 allocs, 22,851,484 frees, 2,182,157,488 bytes allocated
==13154==
==13154== LEAK SUMMARY:
==13154== definitely lost: 10,988 bytes in 182 blocks
==13154== indirectly lost: 6,074,124 bytes in 31,357 blocks
==13154== possibly lost: 15,540,120 bytes in 266,188 blocks
==13154== still reachable: 170,319,781 bytes in 504,846 blocks
==13154== suppressed: 0 bytes in 0 blocks
==13154== Rerun with --leak-check=full to see details of leaked memory
==13154==
==13154== For counts of detected and suppressed errors, rerun with: -v
==13154== Use --track-origins=yes to see where uninitialised values come from
==13154== ERROR SUMMARY: 4243812 errors from 245 contexts (suppressed: 0 from 0)
Segmentation fault
from clblast.
Yes, so we can indeed conclude this is an issue with Beignet or the Intel drivers.
What you can do is modify line 254 of src/tuner_impl.cc and replace
fprintf(stdout, "%s Starting compilation\n", kMessageVerbose.c_str());
with
fprintf(stdout, "%s Starting compilation\n%s\n", kMessageVerbose.c_str(), source.c_str());
Then, copy-paste the faulty kernel and report it to the developers of Beignet, possibly with a small test program that does nothing but compile it. Note that this kernel can be quite long for GEMM. In the worst case, if this kernel is not valid OpenCL, the Beignet compiler should still report an error instead of crashing with a segfault.
Before you do this, I recommend building the latest version of Beignet from the git source repository. Then run the included unit tests first to see if they pass. That's what the developers of Beignet will ask you to do, I guess.
Unfortunately Beignet doesn't seem mature enough yet. I've seen some issues myself on Skylake GPUs, mostly with FP16 though.
from clblast.
I switched to beignet git HEAD and it works:
[----------] Printing best result in database format to stdout
{ "Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile", { {"MWG",64}, {"NWG",64}, {"KWG",32}, {"MDIMC",16}, {"NDIMC",16}, {"MDIMA",16}, {"NDIMB",8}, {"KWI",2}, {"VWM",4}, {"VWN",2}, {"STRM",1}, {"STRN",0}, {"SA",0}, {"SB",1}, {"PRECISION",3232} } }
[ -------> ] 121.6 ms or 17.7 GFLOPS
I think we are done with this issue, thanks a lot for your help!
from clblast.
OK, good to hear that a new version of Beignet helped with the tuner issues. But the original issue was with the tests, right? So I suggest that you pull the latest version of the CLBlast development branch (which includes the tuner results for your device) and re-run the tests. If that goes fine, we can close this issue.
from clblast.
Before pulling new dev HEAD:
51% tests passed, 20 tests failed out of 41
Total Test time (real) = 25.75 sec
The following tests FAILED:
11 - clblast_test_xgemv (SEGFAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
19 - clblast_test_xtrmv (SEGFAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
41 - clblast_test_xomatcopy (SEGFAULT)
After pulling new dev HEAD (Updating 61105e3..66908ef):
51% tests passed, 20 tests failed out of 41
Total Test time (real) = 53.28 sec
The following tests FAILED:
11 - clblast_test_xgemv (SEGFAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
19 - clblast_test_xtrmv (SEGFAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
41 - clblast_test_xomatcopy (SEGFAULT)
It doesn't seem to help with the unit tests. If those tests aren't that important, maybe we could use signal handling or a child-process strategy so that a segfault in Beignet doesn't crash the whole unit test...
What do you think ?
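A rough sketch of the child-process idea, assuming a POSIX system: each test case runs in a forked child, and the parent inspects how the child ended. `Outcome`, `RunIsolated` and `CrashingTest` are hypothetical names for this example and are not part of the CLBlast test suite. One further assumption worth stating is that OpenCL state created before the fork may not be usable in the child, so in practice the child would probably have to set up its own context.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <csignal>
#include <functional>

// Outcome of running one test case in a child process.
enum class Outcome { kPassed, kFailed, kCrashed };

// Runs `test` in a forked child so that a segfault (for instance inside the
// OpenCL driver) terminates only the child. `test` returns true on success.
Outcome RunIsolated(const std::function<bool()>& test) {
  const pid_t pid = fork();
  if (pid < 0) { return Outcome::kFailed; }  // could not fork
  if (pid == 0) {
    // Child: report the result through the exit code, skipping atexit handlers.
    _exit(test() ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) { return Outcome::kFailed; }
  if (WIFSIGNALED(status)) { return Outcome::kCrashed; }  // e.g. SIGSEGV
  return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? Outcome::kPassed
                                                         : Outcome::kFailed;
}

// Stand-in for a test case that segfaults somewhere below the test code.
bool CrashingTest() {
  std::raise(SIGSEGV);
  return true;
}
```

Calling `RunIsolated(CrashingTest)` reports a crash to the parent, which can mark that single configuration as failed and continue with the next one.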
from clblast.
Here is the output of all failing tests in case you want to check them:
test_results.txt
from clblast.
Thanks for the data, I will look into it as soon as I have some time. A quick look tells me that, again, the only failures are for tests that should return an error code. So although it is not crucial, it would still be good if the error codes were returned correctly. I am also curious why this happens, since no actual OpenCL kernel should be compiled or executed in that case.
So this is on the git version of Beignet I presume, the one you used to run the tuners successfully?
from clblast.
I just checked it: indeed, it only crashes for tests that should return an error code.
I am still not sure what is the cause of this issue, so I added extra printing statements (and std::flush) to the tests, hopefully they will help us locate the source of the error, whether it is in the test code or in one of the tested libraries.
Could you re-run one of those failing tests after pulling in the latest changes from development? I see output like this, for example, for the GER tests:
./clblast_test_xger -verbose -device 1
* Options given/available:
-platform 0 [=default]
-device 1
-full_test [false]
-verbose [true]
-clblas 0 [=default]
-cblas 1 [=default]
* Running on OpenCL device 'Iris Pro'.
* Starting tests for the 'SGER' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
. -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major)':
Testing: m=61 n=61 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] -> .
Testing: m=61 n=512 lda=512 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] -> .
Testing: m=61 n=512 lda=512 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] -> .
Testing: m=61 n=512 lda=512 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] -> .
Testing: m=61 n=512 lda=512 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] -> .
Testing: m=61 n=512 lda=512 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=2 incy=7 offa=0 offx=0 offy=0 [CLBlast] -> .
(...)
from clblast.
Is this still an issue with the newest version of Beignet?
from clblast.
Hi,
I just tried with latest beignet GIT and I don't get any segfault on tuners.
I get a few errors:
[ RUN ] Running Xger
[ FAILED ] Kernel Xger failed
[ FAILED ] catched exception: Internal OpenCL error: -54
[ WARNING ] Results differ: L2 norm is 6.41e+06
[ FAILED ] Xger; 0 ms; WGS1 512; WGS2 1; WPT 4;PRECISION 3232;
[ OK ] Completed Xgemm (701 ms) - 20 out of 117
device compiler error/warning: Xgemm:(GBE): error: failed in Gen backend.
[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred ^^
[ FAILED ] Xgemm; 0 ms; MWG 128; NWG 128; KWG 32; MDIMC 8; NDIMC 8; MDIMA 16; NDIMB 16; KWI 2; VWM 8; VWN 4; STRM 0; STRN 1; SA 1; SB 0;PRECISION 3232;
[ OK ] Completed Xgemm (445 ms) - 35 out of 117
device compiler error/warning: Xgemm:(GBE): error: failed in Gen backend.
[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred ^^
[ FAILED ] Xgemm; 0 ms; MWG 128; NWG 128; KWG 32; MDIMC 8; NDIMC 8; MDIMA 16; NDIMB 32; KWI 2; VWM 1; VWN 1; STRM 0; STRN 1; SA 0; SB 0;PRECISION 3232;
[ OK ] Completed Xgemm (272 ms) - 40 out of 117
device compiler error/warning: Xgemm:(GBE): error: failed in Gen backend.
[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred ^^
[ FAILED ] Xgemm; 0 ms; MWG 128; NWG 128; KWG 16; MDIMC 8; NDIMC 8; MDIMA 16; NDIMB 16; KWI 2; VWM 1; VWN 8; STRM 1; STRN 0; SA 1; SB 1;PRECISION 3232;
[ OK ] Completed Xgemm (293 ms) - 76 out of 117
device compiler error/warning: Xgemm:(GBE): error: failed in Gen backend.
[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred ^^
[ FAILED ] Xgemm; 0 ms; MWG 128; NWG 128; KWG 32; MDIMC 8; NDIMC 8; MDIMA 8; NDIMB 32; KWI 8; VWM 4; VWN 1; STRM 0; STRN 0; SA 1; SB 1;PRECISION 3232;
[ OK ] Completed Xgemm (327 ms) - 85 out of 117
device compiler error/warning: Xgemm:(GBE): error: failed in Gen backend.
[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred ^^
[ FAILED ] Xgemm; 0 ms; MWG 128; NWG 128; KWG 16; MDIMC 8; NDIMC 8; MDIMA 8; NDIMB 8; KWI 2; VWM 8; VWN 2; STRM 0; STRN 0; SA 1; SB 1;PRECISION 3232;
I'm not sure whether it's an issue or not. What do you think?
from clblast.
Incorrect results during tuning are automatically filtered out, so you don't have to worry. Well, as long as not all tuning results fail of course :-)
I also had some problems with Beignet myself; it does not seem to be 100% reliable. "error: failed in Gen backend" is not the type of error you hope to see from your compiler.
What about the tests, do they work?
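The filtering mentioned above can be pictured roughly as follows. This is only a sketch of the idea, not the tuner's actual code: `TuningResult` and `BestValidResult` are hypothetical names invented for illustration, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one tuning run; not the tuner's real data structure.
struct TuningResult {
  std::string configuration;  // e.g. "MWG 128; NWG 128; KWG 32"
  double time_ms;             // measured kernel time
  bool valid;                 // false if compilation failed or the results differed
};

// Returns the fastest valid result, or an empty optional if every configuration failed.
std::optional<TuningResult> BestValidResult(const std::vector<TuningResult>& results) {
  std::optional<TuningResult> best;
  for (const auto& result : results) {
    if (!result.valid) { continue; }  // failed configurations are skipped
    if (!best || result.time_ms < best->time_ms) { best = result; }
  }
  return best;
}
```

With a list containing one failed configuration and two valid ones, the faster valid one is returned. An empty optional corresponds to the "all tuning results fail" case, which is the only one that needs attention.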
from clblast.
I'll test on the Haswell laptop asap.
I ran Linux on a Broadwell laptop today:
$ clinfo
Platform #0
Name: Intel Gen OCL Driver
Version: OpenCL 1.2 beignet 1.2 (git-8bc5d28)
Device #0
Name: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
Type: GPU
Version: OpenCL 1.2 beignet 1.2 (git-8bc5d28)
Global memory size: 3 GB 888 MB
Local memory size: 64 kB
Max work group size: 512
Max work item sizes: (512, 512, 512)
I get these results:
The following tests FAILED:
2 - clblast_test_xscal (SEGFAULT)
11 - clblast_test_xgemv (SEGFAULT)
12 - clblast_test_xgbmv (OTHER_FAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
17 - clblast_test_xsbmv (OTHER_FAULT)
18 - clblast_test_xspmv (OTHER_FAULT)
19 - clblast_test_xtrmv (SEGFAULT)
20 - clblast_test_xtbmv (OTHER_FAULT)
21 - clblast_test_xtpmv (OTHER_FAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
32 - clblast_test_xspr2 (Failed)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
41 - clblast_test_xomatcopy (SEGFAULT)
Errors while running CTest
There are also a lot of compiler errors in the tuners.
I'll go through the thread again. Maybe Broadwell has similar issues :(
from clblast.
Thanks for the test. Perhaps these issues are in the FP16 versions of the kernels only? What happens if you, for example, look at the full output of clblast_test_xscal? On my Skylake Intel GPU I also see a lot of errors with FP16, again something that's not well implemented in Beignet. Perhaps I should disable inclusion of those tests when running make test or make alltests.
from clblast.
Back to Haswell (I didn't have time to run verbose tests on the Broadwell laptop)
The following tests FAILED:
11 - clblast_test_xgemv (SEGFAULT)
13 - clblast_test_xhemv (SEGFAULT)
16 - clblast_test_xsymv (SEGFAULT)
19 - clblast_test_xtrmv (SEGFAULT)
22 - clblast_test_xger (SEGFAULT)
23 - clblast_test_xgeru (SEGFAULT)
24 - clblast_test_xgerc (SEGFAULT)
25 - clblast_test_xher (SEGFAULT)
27 - clblast_test_xher2 (SEGFAULT)
29 - clblast_test_xsyr (SEGFAULT)
31 - clblast_test_xsyr2 (SEGFAULT)
33 - clblast_test_xgemm (SEGFAULT)
34 - clblast_test_xsymm (SEGFAULT)
35 - clblast_test_xhemm (SEGFAULT)
36 - clblast_test_xsyrk (SEGFAULT)
37 - clblast_test_xherk (SEGFAULT)
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (SEGFAULT)
40 - clblast_test_xtrmm (SEGFAULT)
41 - clblast_test_xomatcopy (SEGFAULT)
Errors while running CTest
I did not rebuild anything; there are considerably fewer errors on Haswell.
Verbose tests on the way.
from clblast.
$ ./clblast_test_xgemv -verbose true
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [true]
-clblas 0 [=default]
-cblas 1 [=default]
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SGEMV' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 111 (regular)':
Testing: m=61 n=61 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=1 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=2 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=2 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=1 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast]Segmentation fault
Valgrind output at the segfault: libcl again
Testing: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast]==4621== Invalid read of size 8
==4621== at 0x6C79BC5: clWaitForEvents (in /usr/lib64/beignet/libcl.so)
==4621== by 0x435DF8: clblast::TestXgemv<float>::RunRoutine(clblast::Arguments<float> const&, clblast::Buffers<float>&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x48AEDA: clblast::TestBlas<float, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x43E9D8: unsigned long clblast::RunTests<clblast::TestXgemv<float>, float, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x4232B1: main (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== Address 0x18 is not stack'd, malloc'd or (recently) free'd
==4621==
==4621==
==4621== Process terminating with default action of signal 11 (SIGSEGV)
==4621== Access not within mapped region at address 0x18
==4621== at 0x6C79BC5: clWaitForEvents (in /usr/lib64/beignet/libcl.so)
==4621== by 0x435DF8: clblast::TestXgemv<float>::RunRoutine(clblast::Arguments<float> const&, clblast::Buffers<float>&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x48AEDA: clblast::TestBlas<float, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x43E9D8: unsigned long clblast::RunTests<clblast::TestXgemv<float>, float, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== by 0x4232B1: main (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==4621== If you believe this happened as a result of a stack
==4621== overflow in your program's main thread (unlikely but
==4621== possible), you can try to increase the size of the
==4621== main thread stack using the --main-stacksize= flag.
==4621== The main thread stack size used in this run was 8388608.
==4621==
==4621== HEAP SUMMARY:
==4621== in use at exit: 14,574,819 bytes in 96,493 blocks
==4621== total heap usage: 855,203 allocs, 758,710 frees, 98,981,746 bytes allocated
==4621==
==4621== LEAK SUMMARY:
==4621== definitely lost: 2 bytes in 2 blocks
==4621== indirectly lost: 0 bytes in 0 blocks
==4621== possibly lost: 471,500 bytes in 10,317 blocks
==4621== still reachable: 14,103,317 bytes in 86,174 blocks
==4621== suppressed: 0 bytes in 0 blocks
==4621== Rerun with --leak-check=full to see details of leaked memory
==4621==
==4621== For counts of detected and suppressed errors, rerun with: -v
==4621== Use --track-origins=yes to see where uninitialised values come from
==4621== ERROR SUMMARY: 254323 errors from 181 contexts (suppressed: 0 from 0)
Segmentation fault
from clblast.
Could you perhaps try again with the latest Beignet and CLBlast? CLBlast now has the tuning parameters for your devices included, perhaps that changes something. If not, please post the latest output again and I'll re-investigate what could be the cause. Thanks!
from clblast.
With Beignet 2c1f246 (current HEAD) and CLBlast b1929d8 (current dev HEAD): identical results.
It still crashes in libcl:
Testing: m=61 n=61 lda=512 incx=7 incy=2 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=61 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=61 lda=512 incx=7 incy=7 offa=0 offx=0 offy=0 [CLBlast] [CPU BLAS] -> :
Testing: m=61 n=512 lda=61 incx=1 incy=1 offa=0 offx=0 offy=0 [CLBlast]==30716== Invalid read of size 8
==30716== at 0x6C802C5: clWaitForEvents (in /usr/lib64/beignet/libcl.so)
==30716== by 0x435EA8: clblast::TestXgemv<float>::RunRoutine(clblast::Arguments<float> const&, clblast::Buffers<float>&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==30716== by 0x48B11A: clblast::TestBlas<float, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==30716== by 0x43EA88: unsigned long clblast::RunTests<clblast::TestXgemv<float>, float, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==30716== by 0x423361: main (in /home/thomas/src/CLBlast/build/clblast_test_xgemv)
==30716== Address 0x78 is not stack'd, malloc'd or (recently) free'd
==30716==
==30716==
==30716== Process terminating with default action of signal 11 (SIGSEGV)
==30716== Access not within mapped region at address 0x78
A pointer with a value of 0x78 is obviously wrong.
Maybe there is something wrong with auto status = Gemv(args.layout, ... , &event); in test/routines/level2/xgemv.hpp setting a wrong value to event.
Then clWaitForEvents(1, &event); would fail.
from clblast.
Indeed, you are right. Your valgrind trace helped me locate the issue. It does crash on clWaitForEvents with an invalid event. Earlier on I had already observed that your crash happens only in particular cases: it only crashes for tests that should return an error code.
Taking both observations together: if the CLBlast routine doesn't finish correctly (it doesn't return StatusCode::kSuccess), its event is never allocated, and thus waiting for it is wrong. I have now guarded all the clWaitForEvents statements in the tests and samples against this. I've also added clReleaseEvent to fix the memory leak.
This is fixed in the development branch in commit d595a8e. Can you test it again?
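The guard described in this comment might look roughly like the sketch below. It deliberately uses stand-in types so that it compiles without an OpenCL SDK: `FakeEvent`, `RunRoutine`, `WaitForEvent`, `ReleaseEvent` and `StatusCode::kError` are hypothetical placeholders for `cl_event`, a CLBlast routine call, `clWaitForEvents`, `clReleaseEvent` and a real error code. It is not the code from the commit.

```cpp
// Stand-ins for illustration only; these are not OpenCL or CLBlast definitions.
enum class StatusCode { kSuccess, kError };
struct FakeEvent { bool completed = false; };

// Mimics a routine that only allocates its event when it succeeds.
StatusCode RunRoutine(bool valid_arguments, FakeEvent** event) {
  if (!valid_arguments) { return StatusCode::kError; }  // *event is left untouched
  *event = new FakeEvent{};
  return StatusCode::kSuccess;
}

// Placeholder for clWaitForEvents: dereferences the event, so it needs a valid one.
void WaitForEvent(FakeEvent* event) { event->completed = true; }

// Placeholder for clReleaseEvent: frees the event so that it does not leak.
void ReleaseEvent(FakeEvent* event) { delete event; }

// The guarded pattern: wait on and release the event only after a successful call.
bool RunAndWait(bool valid_arguments) {
  FakeEvent* event = nullptr;
  const auto status = RunRoutine(valid_arguments, &event);
  if (status != StatusCode::kSuccess) { return false; }  // no event exists, nothing to wait on
  WaitForEvent(event);
  ReleaseEvent(event);
  return true;
}
```

Without the status check, the failing path would pass an unset event on to the wait call, which matches the invalid read inside clWaitForEvents shown in the valgrind trace.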
from clblast.
First clblast_test_xsyrk:
$ ./clblast_test_xsyrk
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [false]
-clblas 0 [=default]
-cblas 1 [=default]
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SSYRK' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
::::--::-X-X---X
Error rate 12.9%: n=64 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 37.5%: 6 passed / 7 skipped / 3 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::---X---X
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 50.0%: 8 passed / 6 skipped / 2 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::---X---X
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 50.0%: 8 passed / 6 skipped / 2 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
::::--::-X-X---X
Error rate 12.8%: n=64 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 37.5%: 6 passed / 7 skipped / 3 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Completed all test-cases for this routine. Results:
66 test(s) passed
52 test(s) skipped
10 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'DSYRK' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'CSYRK' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::---:---:
Pass rate 62.5%: 10 passed / 6 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::--::-:-:---:
Pass rate 56.2%: 9 passed / 7 skipped / 0 failed
* Completed all test-cases for this routine. Results:
76 test(s) passed
52 test(s) skipped
0 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'ZSYRK' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'HSYRK' routine.
* All tests skipped: Unsupported precision
I tried with clBLAS, with slightly different results:
$ ./clblast_test_xsyrk -clblas 1 -cblas 0
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [false]
-clblas 1
-cblas 0
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SSYRK' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
XXXX..::.X.X...X
Error rate 57.1%: n=7 k=7 lda=7 ldc=7 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldc=7 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 56.2%: 9 passed / 0 skipped / 7 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::..::.:.:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::...X...X
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 87.5%: 14 passed / 0 skipped / 2 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::...X...X
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 87.5%: 14 passed / 0 skipped / 2 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
XXXX..::.X.X...X
Error rate 57.1%: n=7 k=7 lda=7 ldc=7 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldc=7 offa=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.8%: n=64 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.9%: n=64 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 56.2%: 9 passed / 0 skipped / 7 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::..::.:.:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Completed all test-cases for this routine. Results:
182 test(s) passed
0 test(s) skipped
18 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'DSYRK' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'CSYRK' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
XXXX..XX.:.:...:
Error rate 30.6%: n=7 k=7 lda=7 ldc=7 offa=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldc=7 offa=0 offc=0
Error rate 18.4%: n=7 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 12.2%: n=7 k=64 lda=64 ldc=7 offa=0 offc=0
Error rate 14.3%: n=7 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 62.5%: 10 passed / 0 skipped / 6 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::..::.:.:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::...:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 111 (regular)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
XXXX..XX.:.:...:
Error rate 34.7%: n=7 k=7 lda=7 ldc=7 offa=0 offc=0
Error rate 20.4%: n=7 k=7 lda=7 ldc=64 offa=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldc=7 offa=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldc=64 offa=0 offc=0
Error rate 36.7%: n=7 k=64 lda=64 ldc=7 offa=0 offc=0
Error rate 32.7%: n=7 k=64 lda=64 ldc=64 offa=0 offc=0
Pass rate 62.5%: 10 passed / 0 skipped / 6 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::..::.:.:...:
Pass rate 100.0%: 16 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 112 (transposed)':
.........
Pass rate 100.0%: 9 passed / 0 skipped / 0 failed
* Completed all test-cases for this routine. Results:
188 test(s) passed
0 test(s) skipped
12 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'ZSYRK' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'HSYRK' routine.
* All tests skipped: Unsupported precision
I think that's a tricky one.
The two others (clblast_test_xsyr2k and clblast_test_xher2k) are memory issues (free error and segfault); I'll try to locate them.
from clblast.
clblast_test_xher2k crashes in CBLAS:
Testing: n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 [CLBlast] [CPU BLAS]==21485== Invalid write of size 4
==21485== at 0x6C7C15F: cblas_cher2k (in /usr/lib64/libgslcblas.so.0.0.0)
==21485== by 0x429BBD: clblast::cblasXher2k(CBLAS_ORDER, CBLAS_UPLO, CBLAS_TRANSPOSE, unsigned long, unsigned long, std::complex<float>, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, float, std::vector<std::complex<float>, std::allocator<std::complex<float> > >&, unsigned long, unsigned long) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x43314C: clblast::TestXher2k<std::complex<float>, float>::RunReference2(clblast::Arguments<float> const&, clblast::Buffers<std::complex<float> >&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x47DB76: clblast::TestBlas<std::complex<float>, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x437E38: unsigned long clblast::RunTests<clblast::TestXher2k<std::complex<float>, float>, std::complex<float>, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x4206A1: main (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== Address 0x98aad7c is 28 bytes after a block of size 32 in arena "client"
==21485==
==21485== Invalid read of size 4
==21485== at 0x6C7C191: cblas_cher2k (in /usr/lib64/libgslcblas.so.0.0.0)
==21485== by 0x429BBD: clblast::cblasXher2k(CBLAS_ORDER, CBLAS_UPLO, CBLAS_TRANSPOSE, unsigned long, unsigned long, std::complex<float>, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, float, std::vector<std::complex<float>, std::allocator<std::complex<float> > >&, unsigned long, unsigned long) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x43314C: clblast::TestXher2k<std::complex<float>, float>::RunReference2(clblast::Arguments<float> const&, clblast::Buffers<std::complex<float> >&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x47DB76: clblast::TestBlas<std::complex<float>, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x437E38: unsigned long clblast::RunTests<clblast::TestXher2k<std::complex<float>, float>, std::complex<float>, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x4206A1: main (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== Address 0x98aad78 is 24 bytes after a block of size 32 in arena "client"
==21485==
==21485== Invalid write of size 4
==21485== at 0x6C7C196: cblas_cher2k (in /usr/lib64/libgslcblas.so.0.0.0)
==21485== by 0x429BBD: clblast::cblasXher2k(CBLAS_ORDER, CBLAS_UPLO, CBLAS_TRANSPOSE, unsigned long, unsigned long, std::complex<float>, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, std::vector<std::complex<float>, std::allocator<std::complex<float> > > const&, unsigned long, unsigned long, float, std::vector<std::complex<float>, std::allocator<std::complex<float> > >&, unsigned long, unsigned long) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x43314C: clblast::TestXher2k<std::complex<float>, float>::RunReference2(clblast::Arguments<float> const&, clblast::Buffers<std::complex<float> >&, clblast::Queue&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x47DB76: clblast::TestBlas<std::complex<float>, float>::TestRegular(std::vector<clblast::Arguments<float>, std::allocator<clblast::Arguments<float> > >&, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x437E38: unsigned long clblast::RunTests<clblast::TestXher2k<std::complex<float>, float>, std::complex<float>, float>(int, char**, bool, std::string const&) (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== by 0x4206A1: main (in /home/thomas/src/CLBlast/build/clblast_test_xher2k)
==21485== Address 0x98aad78 is 24 bytes after a block of size 32 in arena "client"
==21485==
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 96, hi = 1106500021.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata. If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away. Please try that before reporting this as a bug.
clblast_test_xher2k -clblas 1 -cblas 0 passes without errors.
clblast_test_xsyr2k -clblas 1 -cblas 0 fails 20 tests without crashing.
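One way to narrow down out-of-bounds accesses like the ones valgrind reports in cblas_cher2k is to check, before calling the reference, that each host buffer can hold the matrix described by its dimensions, leading dimension and offset. The sketch below is an assumption about how such a check could be written. `RequiredElements` and `FitsInBuffer` are hypothetical helpers, not part of CLBlast or CBLAS, and the sketch does not claim to identify the actual cause of the crash.

```cpp
#include <cstddef>

// Number of elements spanned by a matrix made of `num_vectors` rows (row-major) or
// columns (col-major), each holding `vector_length` contiguous elements, stored with
// leading dimension `ld` and starting at element `offset`.
std::size_t RequiredElements(std::size_t num_vectors, std::size_t vector_length,
                             std::size_t ld, std::size_t offset) {
  if (num_vectors == 0 || vector_length == 0) { return offset; }
  return offset + (num_vectors - 1) * ld + vector_length;
}

// True if the leading dimension is legal and the buffer is large enough.
bool FitsInBuffer(std::size_t buffer_elements, std::size_t num_vectors,
                  std::size_t vector_length, std::size_t ld, std::size_t offset) {
  return ld >= vector_length &&
         RequiredElements(num_vectors, vector_length, ld, offset) <= buffer_elements;
}
```

For the failing case above (n=7, ldc=7, offc=0) the C matrix spans 49 elements, and with a leading dimension of 64 a 7x7 matrix spans 391. A 32-byte block holds only four single-precision complex values, although the trace alone does not establish which buffer that block belongs to.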
from clblast.
Good to see that most tests now pass. I'll look into the few failure cases in a couple of days. Thanks for the feedback and data!
from clblast.
I have just fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters. Could you perhaps try the tests again and see if those are now successful?
from clblast.
clblast_test_xsyrk now works!
95% tests passed, 2 tests failed out of 41
Total Test time (real) = 51.54 sec
The following tests FAILED:
38 - clblast_test_xsyr2k (SEGFAULT)
39 - clblast_test_xher2k (Failed)
./clblast_test_xher2k -clblas 1 -cblas 0 passes.
./clblast_test_xsyr2k -clblas 1 -cblas 0 fails.
There is something really wrong with the CBLAS calls:
./clblast_test_xher2k -clblas 0 -cblas 1 has a bad memory crash (libc memory corruption):
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 128, hi = 13972549651098155320.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata. If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away. Please try that before reporting this as a bug.
./clblast_test_xsyr2k -clblas 0 -cblas 1 has test failures, then segfaults:
X:X::X:--24316-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--24316-- si_code=80; Faulting address: 0x0; sp: 0x807008dd0
valgrind: the 'impossible' happened:
Killed by fatal signal
from clblast.
Details for xsyr2k:
$ ./clblast_test_xsyr2k -clblas 1 -cblas 0
* Options given/available:
-platform 0 [=default]
-device 0 [=default]
-full_test [false]
-verbose [false]
-clblas 1
-cblas 0
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'SSYR2K' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
XXXXXXXX......X:.:.:.:.:.......:
Error rate 57.1%: n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 24.5%: n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Pass rate 71.9%: 23 passed / 0 skipped / 9 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::::::......::.:.:.:.:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
XXXXXXXX......X:.:.:.:.:.......:
Error rate 57.1%: n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 57.1%: n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 24.5%: n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Pass rate 71.9%: 23 passed / 0 skipped / 9 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::::::......::.:.:.:.:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Completed all test-cases for this routine. Results:
454 test(s) passed
0 test(s) skipped
18 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'DSYR2K' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'CSYR2K' routine. Legend:
: -> Test produced correct results
. -> Test returned the correct error code
X -> Test produced incorrect results
/ -> Test returned an incorrect error code
\ -> Test not executed: OpenCL-kernel compilation error
o -> Test not executed: Unsupported precision
- -> Test not completed: Reference CBLAS doesn't output error codes
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
XXXXXXXX......XX.:.:.:.:.......:
Error rate 36.7%: n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 34.7%: n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Pass rate 68.8%: 22 passed / 0 skipped / 10 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
::::::::......::.:.:.:.:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 121 (upper) 112 (transposed)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 121 (upper) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 122 (lower) 112 (transposed)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '101 (row-major) 122 (lower) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
::::::::::::::::.......:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 111 (regular)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 121 (upper) 112 (transposed)':
XXXXXXXX......XX.:.:.:.:.......:
Error rate 36.7%: n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0
Error rate 36.7%: n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0
Pass rate 68.8%: 22 passed / 0 skipped / 10 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 121 (upper) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 122 (lower) 112 (transposed)':
::::::::......::.:.:.:.:.......:
Pass rate 100.0%: 32 passed / 0 skipped / 0 failed
* Testing 'invalid buffer sizes' for '102 (col-major) 122 (lower) 112 (transposed)':
...........................
Pass rate 100.0%: 27 passed / 0 skipped / 0 failed
* Completed all test-cases for this routine. Results:
452 test(s) passed
0 test(s) skipped
20 test(s) failed
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'ZSYR2K' routine.
* All tests skipped: Unsupported precision
* Running on OpenCL device 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile'.
* Starting tests for the 'HSYR2K' routine.
* All tests skipped: Unsupported precision
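The error rates in the log can be converted back into a number of mismatching entries, which gives a hint about the failure pattern. The sketch below assumes that the reported rate is the number of mismatches divided by the n*n entries of the result matrix; the log does not state this, and the function names are hypothetical.

```python
import re

# Matches lines such as "Error rate 57.1%: n=7 k=7 lda=7 ..."
ERROR_LINE = re.compile(r"Error rate (\d+(?:\.\d+)?)%: n=(\d+)")


def mismatched_entries(line):
    """Estimate how many entries of the n-by-n result differ from the reference.

    Assumption: the reported rate is mismatches / (n * n).
    Returns None when the line is not an 'Error rate' line.
    """
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    rate = float(match.group(1))
    n = int(match.group(2))
    return round(rate / 100.0 * n * n)


def upper_triangle_size(n):
    """Number of entries on or above the diagonal of an n-by-n matrix."""
    return n * (n + 1) // 2
```

Under that assumption, the 57.1% lines for SSYR2K with n=7 correspond to 28 of 49 entries, which happens to equal the size of the upper triangle of a 7x7 matrix including the diagonal. That would be consistent with the failures appearing only in some of the 'upper' configurations, although the log itself does not say which entries differ.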
from clblast.
Thanks again for testing. I am now trying to reproduce it myself. I am also on Beignet, but with a Skylake GPU. I am testing with the tuning parameters for your Haswell GPU, so that's as close as I can get to your set-up. Below are my results for syr2k:
./clblast_test_xsyr2k -platform 1 -clblas 1 -cblas 0
: Same 18 & 20 failures as you.
./clblast_test_xsyr2k -platform 1 -clblas 0 -cblas 1
: No failures.
Conclusion: there is a bug in clBLAS and not in CLBlast.
And for her2k:
./clblast_test_xher2k -platform 1 -clblas 1 -cblas 0
: No failures.
./clblast_test_xher2k -platform 1 -clblas 0 -cblas 1
: No failures.
I also tried to run under valgrind but I didn't observe anything interesting. So in conclusion I don't know if I can help you any further. Perhaps there is a genuine bug in the CBLAS library you're using? Or perhaps there is still an issue in Beignet or in the Intel drivers for your GPU?
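The reasoning behind the syr2k conclusion above can be written down as a small decision rule: if CLBlast only disagrees with one of the two references, that reference is the more likely suspect. This is a heuristic sketch, not a proof, and the function name is hypothetical.

```python
def likely_culprit(fails_vs_clblas, fails_vs_cblas):
    """Guess where a mismatch originates from two isolated reference runs.

    Each argument is True when the test run against that reference alone
    reported failures.
    """
    if fails_vs_clblas and fails_vs_cblas:
        # Disagreeing with two independent references points at the library
        # under test, or at the OpenCL driver underneath it.
        return "CLBlast or the OpenCL driver"
    if fails_vs_clblas:
        return "clBLAS"
    if fails_vs_cblas:
        return "CBLAS"
    return "none"
```

For the syr2k results listed above this gives `likely_culprit(True, False)`, i.e. clBLAS, and for her2k it gives `likely_culprit(False, False)`, i.e. nothing to blame on this machine.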
from clblast.
I just updated my system from Beignet 1.2 to 1.2.1 and I see a lot of improvements, especially related to half-precision (fp16). I also re-ran the above commands, and I no longer see any errors. Could you perhaps also re-run the tests with the latest Beignet?
from clblast.
I am closing this issue, since I think most of the bugs are now fixed. The latest version of the code contains SYRK/SYR2K/HERK/HER2K and TRMM fixes, so that should be good. Beignet 1.2.1 should then fix any remaining issues. If this is not the case, please open a new issue with a report of which test(s) fail.
from clblast.