codeplaysoftware / portblas Goto Github PK


An implementation of BLAS using the SYCL open standard.

License: Apache License 2.0

CMake 4.42% C++ 93.87% Python 1.45% Shell 0.11% Dockerfile 0.08% C 0.06% Batchfile 0.01%

portblas's Issues

Adding constructor for Queue_interface to accept the pointer mapper as an input

Incorrect usage of Variable Length Array in the kernel code

The eval method of the RedRowMatVct operation in blas2_trees contains a local array of variable size, which is not supported in C++, OpenCL, or SYCL:

value_type eval(size_t i) {
  auto dim = r2.getSize();
  value_type v[warpSize];
  for (size_t w = 0; w < warpSize; w++) {
    auto valWI = iniAddOp1_struct::eval(r2.eval(0));
    for (size_t j = w; j < dim; j += warpSize) {
      valWI += r1.eval(i, j) * r2.eval(j);
    }
    v[w] = valWI;
  }
  auto valWG = iniAddOp1_struct::eval(r2.eval(0));
  for (size_t w = 0; w < warpSize; w++) {
    valWG += v[w];
  }
  return valWG;
}

I tried to work around this by manually replacing warpSize with 1 (the only value used), but I get an incorrect result. The test works on the Intel CPU but fails on the AMD and Intel GPUs.
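One way to avoid the variable-length array is to make the size a compile-time constant. The following is a minimal host-side sketch under that assumption, not the library's code: `row_dot` and the `WarpSize` template parameter are hypothetical names, and plain `std::vector` inputs stand in for the `r1` and `r2` tree nodes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RedRowMatVct evaluation: computes element i
// of (matrix * vector) for a row-major matrix. WarpSize is a template
// parameter, so the local array has a size known at compile time.
template <std::size_t WarpSize, typename T>
T row_dot(const std::vector<T>& mat, const std::vector<T>& vec,
          std::size_t i) {
  const std::size_t dim = vec.size();
  std::array<T, WarpSize> partial{};  // fixed size, no VLA
  for (std::size_t w = 0; w < WarpSize; ++w) {
    T acc = T{0};
    // Each lane w accumulates the columns w, w + WarpSize, ...
    for (std::size_t j = w; j < dim; j += WarpSize) {
      acc += mat[i * dim + j] * vec[j];
    }
    partial[w] = acc;
  }
  T total = T{0};
  for (std::size_t w = 0; w < WarpSize; ++w) {
    total += partial[w];
  }
  return total;
}
```

Calling `row_dot<4>(m, v, 0)` deduces the element type from the arguments, so only the warp size has to be spelled out.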

Script to plot performance vs size for syclblas vs clblas

We need a script that can plot the performance of syclblas vs clblas over a range of input sizes for the vectors, and the speedup of the fusion.

GEMM gives wrong results with TARGET=INTEL_GPU with DPC++ 2022/10/13 and newer

dfa0579 allows recent DPC++ nightlies to compile SYCL-BLAS after recent changes to multi_ptr to match the SYCL-2020 spec. See #343.

However, if SYCL-BLAS is tested with -DTARGET=INTEL_GPU, the tests blas3_gemm_test, blas3_gemm_batched_test and blas3_trsm_test fail. With -DTARGET=DEFAULT_CPU, the tests pass. The benchmarking suite suggests that gemm fails only for some matrix sizes.

Compiling SYCL-BLAS with -DTARGET=INTEL_GPU using DPC++ nightly 2022/10/19, 2022/10/16, or 2022/10/13 results in test failures when ctest is run:
- blas3_gemm_test, blas3_gemm_batched_test and blas3_trsm_test fail
Compiling with -DTARGET=DEFAULT_CPU using DPC++ nightly 2022/10/19, 2022/10/16, or 2022/10/13 passes the tests.
Compiling with -DTARGET=DEFAULT_CPU or -DTARGET=INTEL_GPU using DPC++ nightly 2022/10/12 passes the tests.

DPC++ compile bug

In file portBLAS/include/blas_meta.h, the header file

#include <ext/oneapi/experimental/sycl_complex.hpp>

no longer exists in DPC++, instead you can use

#include <ext/oneapi/experimental/complex/complex.hpp>

which is located in llvm-dpcpp/build/include/sycl/ext/oneapi/experimental/complex/complex.hpp.
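If both old and new DPC++ versions need to be supported, the header could be selected at preprocessing time. This is only a sketch: the two include paths are the ones quoted above, while the `SKETCH_COMPLEX_HEADER` macro and `complex_header_variant` function are hypothetical names used to report which branch was taken. When neither header is on the include path, the value is 0.

```cpp
// Pick whichever complex header the installed DPC++ provides.
// SKETCH_COMPLEX_HEADER is a hypothetical macro: 2 = new path, 1 = old path,
// 0 = neither header found on the include path.
#if defined(__has_include)
#if __has_include(<ext/oneapi/experimental/complex/complex.hpp>)
#include <ext/oneapi/experimental/complex/complex.hpp>
#define SKETCH_COMPLEX_HEADER 2
#elif __has_include(<ext/oneapi/experimental/sycl_complex.hpp>)
#include <ext/oneapi/experimental/sycl_complex.hpp>
#define SKETCH_COMPLEX_HEADER 1
#endif
#endif

#ifndef SKETCH_COMPLEX_HEADER
#define SKETCH_COMPLEX_HEADER 0
#endif

// Reports which header variant was selected at compile time.
constexpr int complex_header_variant() { return SKETCH_COMPLEX_HEADER; }
```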

Enable the no shared memory version of reduction for devices that do not support shared memory

Special case of vector_view when stride = 1

We should add a special case of vector_view when stride is one (the most common case) to enable special optimizations in this case.
This requires adding a template parameter to the vector view for stride=1.
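As an illustration of the idea, the sketch below adds a boolean template parameter to a simplified view class. It is not the library's vector_view: `strided_view` and `UnitStride` are hypothetical names, and the class only models element access.

```cpp
#include <cstddef>

// Hypothetical simplified view. UnitStride is a compile-time flag; when it
// is true, element access never multiplies by the stride.
template <typename T, bool UnitStride>
class strided_view {
 public:
  strided_view(T* data, std::size_t offset, std::size_t stride,
               std::size_t size)
      : data_(data + offset), stride_(UnitStride ? 1 : stride), size_(size) {}

  std::size_t size() const { return size_; }

  T& eval(std::size_t i) const {
    // The condition is a compile-time constant, so the compiler can drop
    // the unused branch and, for UnitStride, the multiplication.
    return UnitStride ? data_[i] : data_[i * stride_];
  }

 private:
  T* data_;
  std::size_t stride_;
  std::size_t size_;
};
```

A caller would pick `strided_view<T, true>` when the stride is known to be one and fall back to `strided_view<T, false>` otherwise.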

TRSM should assume unit diagonal when diag arg indicates this

Description

The trsm routine (solve triangular system of equations) takes the argument diag, which indicates whether the argument matrix is unit diagonal or not. Unit diagonal means that all values on the diagonal of the matrix are equal to 1.

When diag is set to unit diagonal, the routine should assume this is true without reading the diagonal (LAPACK docs, Intel docs). Instead, the SYCL-BLAS trsm routine depends on the actual values on the diagonal.

Reproducing this issue

In the TRSM unit test here:

const scalar_t diagValue =
      diag == 'u' ? scalar_t{1} : random_scalar(scalar_t{1}, scalar_t{10});

can be set to

const scalar_t diagValue =
      diag == 'u' ? scalar_t{0.4} : random_scalar(scalar_t{1}, scalar_t{10});

The reference implementation ignores the 0.4 on the diagonal. SYCL-BLAS does not.
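The expected behaviour can be illustrated with a small forward-substitution sketch. It is a simplified stand-in (a single right-hand side, row-major storage, lower-triangular only), and `lower_solve` is a hypothetical name rather than a SYCL-BLAS or reference BLAS function.

```cpp
#include <cstddef>
#include <vector>

// Solves L * x = b in place for a row-major n x n lower-triangular L.
// When unit_diag is true, the stored diagonal is never read and is treated
// as 1, so any value stored there (for example 0.4) has no effect.
template <typename T>
void lower_solve(const std::vector<T>& L, std::vector<T>& b, std::size_t n,
                 bool unit_diag) {
  for (std::size_t i = 0; i < n; ++i) {
    T sum = b[i];
    for (std::size_t j = 0; j < i; ++j) {
      sum -= L[i * n + j] * b[j];
    }
    b[i] = unit_diag ? sum : sum / L[i * n + i];
  }
}
```

With 0.4 stored on the diagonal and `unit_diag` set, the solution is the same as if the diagonal held 1.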

[DPC++][SYCL-2020] SYCL-2020 vec breaks SYCL-BLAS

In the DPC++ commit intel/llvm@8700b76, the vec and multi-pointer interface was updated to match the SYCL-2020 spec. In particular, the following change was made to vec::store:

  // Before:
  template <access::address_space Space>
  void store(size_t Offset, multi_ptr<DataT, Space> Ptr) const {

  // After:
  template <access::address_space Space, access::decorated DecorateAddress>
  void store(size_t Offset,
             multi_ptr<DataT, Space, DecorateAddress> Ptr) const {

This breaks the usage of vec.store in SYCL-BLAS. For example, see https://github.com/codeplaysoftware/sycl-blas/blob/master/src/operations/blas3/gemm_no_local_partial_vec.hpp#L474:

SYCL-BLAS uses:

cl::sycl::vec<element_t, work_per_load> in_vec;
in_vec.template store<ADDR_SPACE>(offset, float_ptr);

where float_ptr is float*. In SYCL-1.2.1 this is implicitly cast to a multi-pointer. With SYCL-2020, the multi-pointer constructor from ElemT* is only available where DecorateAddress is access::decorated::legacy.

Add Blas Level 3 to the benchmark

Small matrix support

In SYCL-DNN we have various handwritten tests that operate on small matrices to verify that the underlying matmul implementation is being used correctly.

These tests currently fail on SYCL-BLAS. Further investigation has revealed that SYCL-BLAS performs incorrectly for matrices under 15x15. For example, a 3x3 matrix multiply only computes the first row, leaving the last two rows in their initial state.
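A small host-side reference makes this kind of failure easy to pin down. The sketch below is a naive row-major matrix multiply plus a helper that reports the first row that differs from the reference. Both `reference_matmul` and `first_mismatched_row` are hypothetical helper names, not part of SYCL-DNN or SYCL-BLAS.

```cpp
#include <cstddef>
#include <vector>

// Naive row-major reference: C = A (m x k) * B (k x n).
template <typename T>
std::vector<T> reference_matmul(const std::vector<T>& a,
                                const std::vector<T>& b, std::size_t m,
                                std::size_t k, std::size_t n) {
  std::vector<T> c(m * n, T{0});
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t p = 0; p < k; ++p) {
        c[i * n + j] += a[i * k + p] * b[p * n + j];
      }
    }
  }
  return c;
}

// Returns the index of the first row where got and expected differ,
// or m when all rows match.
template <typename T>
std::size_t first_mismatched_row(const std::vector<T>& got,
                                 const std::vector<T>& expected,
                                 std::size_t m, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      if (got[i * n + j] != expected[i * n + j]) return i;
    }
  }
  return m;
}
```

For a 3x3 result in which only the first row was written, `first_mismatched_row` returns 1, which matches the symptom described above. Exact comparison is used here for brevity; floating-point results would normally be compared with a tolerance.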

Fixing Arg/Min/Max

It is possible to remove code duplication by separating the component tasks into individual tasks.

// ReducAssignNewOp3 can be eliminated completely by introducing the following (in blas1_tree.hpp).
// STEP 1: Add a TupleOP node.
// ...
template <typename RHS>
struct TupleOP {
  using value_type = IndVal<size_t, typename RHS::value_type>;
  RHS r;
  TupleOP(RHS& _r) : r(_r) {}
  size_t getSize() { return r.getSize(); }
  inline value_type eval(size_t i) { return value_type(i, r.eval(i)); }
  inline value_type eval(cl::sycl::nd_item<1> ndItem) {
    return eval(ndItem.get_global(0));
  }
  // ...
};
// STEP 2: With TupleOP added, ReducAssignNewOp2 can be used to do the imax and imin.
// To construct _imax, we therefore wrap the input node with TupleOP<INPUTTYPE>(input_node).
// This is in blas1_interface_sycl.hpp.
// ...
auto vec = std::vector<IndVal<T>>(nWG);
auto buf_val = cl::sycl::buffer<IndVal<T>, 1>(vec.data(), cl::sycl::range<1>{nWG});
auto val1 = BufferVectorView<IndVal<T>>(valT1, 0, 1, nWG);
auto val_tuple = TupleOP<vector_view<IndVal<T>, std::vector<IndVal<T>>>>(my_vx);
auto assignOp1 =
    make_maxIndReducAssignNewOp2(val1, val_tuple, localSize, nWG * localSize);
ex.execute(assignOp1);
std::vector<IndVal<T>> valT2(1);
auto val2 = vector_view<IndVal<T>, std::vector<IndVal<T>>>(valT1, 0, 1, 1);
auto assignOp2 = make_maxIndReducAssignNewOp2(val2, val1, localSize, nWG);
// ...
// The next step is to modify the blas_operators for imax and imin.
// ...
SYCLBLAS_DEFINE_TERNARY_OPERATOR(maxIndOp3_struct, const_val::min, >)
SYCLBLAS_DEFINE_TERNARY_OPERATOR(minIndOp3_struct, const_val::max, <)
// should change to
SYCLBLAS_DEFINE_BINARY_OPERATOR(maxIndOp3_struct, const_val::min,
                                ((l.getVal() > r.getVal()) ? l : r))
SYCLBLAS_DEFINE_BINARY_OPERATOR(minIndOp3_struct, const_val::max,
                                ((l.getVal() < r.getVal()) ? l : r))
// ...
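The core of the proposal, pairing each value with its index and then reducing with an ordinary binary operator, can be sketched on the host without SYCL. Everything here is hypothetical illustration: `IndexValue` stands in for `IndVal`, and `max_ind`, `min_ind` and `reduce_indexed` are not library names. Unlike the operators quoted above, this sketch keeps the left operand on ties, so the first occurrence wins, as is conventional for BLAS index routines.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the IndVal tuple.
template <typename T>
struct IndexValue {
  std::size_t index;
  T value;
};

// Binary operators over tuples. On ties the left operand is kept, so the
// lowest index wins when elements are combined in order.
template <typename T>
IndexValue<T> max_ind(const IndexValue<T>& l, const IndexValue<T>& r) {
  return (r.value > l.value) ? r : l;
}

template <typename T>
IndexValue<T> min_ind(const IndexValue<T>& l, const IndexValue<T>& r) {
  return (r.value < l.value) ? r : l;
}

// Generic reduction: each element is wrapped with its index on the fly,
// which is the role the TupleOP node plays. Precondition: x is not empty.
template <typename T, typename Op>
IndexValue<T> reduce_indexed(const std::vector<T>& x, Op op) {
  IndexValue<T> acc{0, x[0]};
  for (std::size_t i = 1; i < x.size(); ++i) {
    acc = op(acc, IndexValue<T>{i, x[i]});
  }
  return acc;
}
```

The same `reduce_indexed` serves both cases: `reduce_indexed(x, max_ind<int>)` and `reduce_indexed(x, min_ind<int>)` differ only in the operator passed in, which is the duplication the proposal aims to remove.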

blas1_interface_test incorrect result

Blas1_interface_test does not compute the correct result on AMD GPU

~/sycl-blas/build$ ./tests/blas1_interface_test 
ERROR!! --> res = 15280 , sum = 16430.3 , err = 1150.3
ERROR!! --> res = -98 , dot = 47568.3 , err = 47666.3
ERROR!! --> res = 289172 , nrmY = 584.285 , err = -288588
ERROR!! --> res = 1126 , ind = 1022 , max = -41
ERROR!! --> res = -98 , giv = 30433.8 , err = 30531.8
ERROR!! --> res = 33 , diff = 75.5293 , err = 42.5293

Create a script to compare performance syclblas / clblas

We need a script to compare the performance results of paper_blas1 and paper_clblas and produce plots.

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

Local size and global size must be read from the device instead of being hard-coded.

Currently, the global size and the local size are hard-coded. We should stop hard-coding them in the interface model and find a way to query them from the device.

Make allocate and deallocate RAII-safe

Incorrect library name for DPC++ sycl-runtime on Windows.

In this commit to Intel's DPC++ nightly, the Windows SYCL-runtime library's name was changed from sycl.dll/lib to sycl{MAJOR_VERSION}.dll/lib. This breaks the FindDPCPP.cmake module in SYCL-BLAS, which looks for just sycl.dll/lib.

Add GemV0 and GER0 in Blas level 2

Link to BLAS interface?

It would be useful to have a link to BLAS interface in here: https://github.com/codeplaysoftware/sycl-blas#interface

Add BLAS Level2 to the benchmark

run test and benchmark test

Can you please specify the commands to run your tests and benchmark ?

Thanks

examples

Can you please provide some examples (matrix multiply, matrix vector multiply, etc) that use SYCL-BLAS ?

Thanks

Build samples failed with DPC++

I tried to build the samples with the DPC++ compiler, but got some compilation errors. Does this look like a DPC++ compiler version issue?

The build errors follow:

In file included from ../samples/gemv.cpp:1:
In file included from ../samples/../src/sycl_blas.hpp:25:
In file included from ../include/sycl_blas.h:26:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl.hpp:11:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/accessor.hpp:13:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/buffer.hpp:11:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/detail/buffer_impl.hpp:20:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/types.hpp:13:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/detail/generic_type_traits.hpp:16:
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:120:5: error: statement not allowed in constexpr function
do {
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:186:9: error: cannot assign to non-static data member within const member function 'operator-'
Buf ^= 0x8000;
~~~ ^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:185:19: note: member function 'cl::sycl::detail::host_half_impl::half::operator-' is declared const here
constexpr half &operator-() {
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:187:12: error: binding reference of type 'cl::sycl::detail::host_half_impl::half' to value of type 'const cl::sycl::detail::host_half_impl::half' drops 'const' qualifier
return *this;
^~~~~
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:185:19: error: no return statement in constexpr function
constexpr half &operator-() {
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:211:22: error: an explicitly-defaulted copy assignment operator may not have 'const', 'constexpr' or 'volatile' qualifiers
constexpr half_v2 &operator=(const half_v2 &rhs) = default;
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:216:12: error: binding reference of type 'cl::sycl::detail::host_half_impl::half_v2' to value of type 'const cl::sycl::detail::host_half_impl::half_v2' drops 'const' qualifier
return *this;

.....

Restructuring the blas2_tree.hpp and remove the duplicated code from it.

OpenCL Headers Not Found

Hi, the build process for the tests complains that the OpenCL headers cannot be found when the OpenCL SDK is not installed on a regular system path.
It turns out that the FindComputeCpp.cmake doesn't add OpenCL_INCLUDE_DIR variable into include directories.
I'll create a pull request later.

ComputeCpp_INCLUDE_DIRS

I think users need to set ComputeCpp_INCLUDE_DIRS in addition to ComputeCpp_DIR before building the software using ninja.

Generated names for half and double are invalid

If BLAS_DATA_TYPES are set to include double or half, the unit tests generate errors similar to the following:

5: [ FATAL ] test/googletest-src/googletest/include/gtest/internal/gtest-param-util.h:582:: Condition IsValidParamName(param_name) failed. Parameterized test name 'upl0_u__n_14__alpha_1.5__incX_2__ldaMul_2' is invalid, in /<DIR>/sycl-blas/test/unittest/blas2/blas2_syr_test.cpp line 98

The name is invalid because of the '.' in the name: test names cannot include '-' or '.'.

This is fixed for float in https://github.com/codeplaysoftware/sycl-blas/blob/master/test/blas_test.hpp#L212 with a specialized dump_arg function. Similar specializations are required for half and double (and any other floating point types).
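A type-generic version of such a helper might look like the sketch below. It is not the library's dump_arg: `sanitized_name` is a hypothetical name and the replacement characters are arbitrary choices. The only constraint it is built around is that Google Test parameterized test names may contain only ASCII alphanumeric characters and underscores.

```cpp
#include <sstream>
#include <string>

// Formats a numeric test parameter so that it is safe to embed in a Google
// Test parameterized test name. Hypothetical helper; the substitutions
// ('p' for the decimal point, 'm' for minus) are illustrative choices.
template <typename T>
std::string sanitized_name(T value) {
  std::ostringstream out;
  out << value;
  std::string s = out.str();
  for (char& c : s) {
    if (c == '.') {
      c = 'p';
    } else if (c == '-') {
      c = 'm';
    } else if (c == '+') {
      c = '_';  // can appear in exponents such as 1e+10
    }
  }
  return s;
}
```

Because the function is a template, the same code covers float and double, and any other type with a stream insertion operator, without one specialization per type.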

Here we can use some heuristics to select better global, local, and scratch size per device

Build fails with -DTUNING_TARGET=POWER_VR due to missing TBSV implementation

I get a series of errors like this:

In file included from /home/rob/sycl/sycl-blas/build/computecpp_powervr/generated_src/blas2/ger//ger_float_int_BufferIterator_float_BufferIterator_float_BufferIterator_float_int.cpp:28:
/home/rob/sycl/sycl-blas/src/interface/blas2_interface.hpp:1095:30: error: no member named 'tbsv' in namespace 'blas'
  INST_UPLO_TRANS_DIAG(blas::tbsv::backend::_tbsv, sb_handle, _N, _K, _mA, _lda,
                       ~~~~~~^

It looks reasonably easy to add from what I can see from other files, so will open a PR once I have something.

hipSYCL Compilation Error

When compiling with hipSYCL using the README.md command cmake .. -DhipSYCL_DIR=/opt/hipSYCL/CUDA/lib/cmake/hipSYCL -DSYCL_COMPILER=hipsycl, I get the compiler error below multiple times. My guess is that a breaking change in hipSYCL since the initial hipSYCL support is the cause, and that a release version such as v0.9.1 is needed rather than the repo default branch.

Building master on Arch Linux with packages:
computecpp: 2.3.0-1
hipsycl-cuda-git: r1419.72b201e-1
gcc: 11.1.0-1

Compiler Error:

In file included from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/src/policy/sycl_policy_handler.hpp: In instantiation of ‘blas::BufferIterator<element_t, blas::codeplay_policy> blas::PolicyHandler<blas::codeplay_policy>::get_buffer(element_t*) const [with element_t = float]’:
~/sycl-blas/src/policy/sycl_policy_handler.cpp:80:1:   required from here
~/sycl-blas/src/policy/sycl_policy_handler.hpp:62:10: error: no matching function for call to ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >&, long int&)’
   62 |   return BufferIterator<element_t, codeplay_policy>(buff, offset);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:86:3: note: candidate: ‘template<class other_scalar_t, class U, class> blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const blas::BufferIterator<other_scalar_t, blas::codeplay_policy>&) [with other_scalar_t = other_scalar_t; U = U; <template-parameter-2-3> = <template-parameter-1-3>; element_t = float]’
   86 |   BufferIterator(const BufferIterator<other_scalar_t, codeplay_policy>& other);
      |   ^~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:86:3: note:   template argument deduction/substitution failed:
In file included from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/src/policy/sycl_policy_handler.hpp:62:10: note:   ‘hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >’ is not derived from ‘const blas::BufferIterator<element_t, blas::codeplay_policy>’
   62 |   return BufferIterator<element_t, codeplay_policy>(buff, offset);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:275:8: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const buff_t&) [with element_t = float; blas::BufferIterator<element_t, blas::codeplay_policy>::buff_t = hipsycl::sycl::buffer<float, 1, std::allocator<float> >]’
  275 | inline BufferIterator<element_t, codeplay_policy>::BufferIterator(
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:275:8: note:   candidate expects 1 argument, 2 provided
~/sycl-blas/include/container/sycl_iterator.h:269:8: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const buff_t&, std::ptrdiff_t) [with element_t = float; blas::BufferIterator<element_t, blas::codeplay_policy>::buff_t = hipsycl::sycl::buffer<float, 1, std::allocator<float> >; std::ptrdiff_t = long int]’
  269 | inline BufferIterator<element_t, codeplay_policy>::BufferIterator(
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:270:72: note:   no known conversion for argument 1 from ‘hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >’ to ‘const buff_t&’ {aka ‘const hipsycl::sycl::buffer<float, 1, std::allocator<float> >&’}
  270 |     const typename BufferIterator<element_t, codeplay_policy>::buff_t& buff,
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
~/sycl-blas/include/container/sycl_iterator.h:67:3: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator() [with element_t = float]’
   67 |   BufferIterator() : offset_{0}, buffer_{cl::sycl::range<1>{1}} {}
      |   ^~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:67:3: note:   candidate expects 0 arguments, 2 provided
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
                 from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate: ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(const blas::BufferIterator<float, blas::codeplay_policy>&)’
   36 | class BufferIterator<element_t, codeplay_policy> {
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:36:7: note:   candidate expects 1 argument, 2 provided
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate: ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(blas::BufferIterator<float, blas::codeplay_policy>&&)’
~/sycl-blas/include/container/sycl_iterator.h:36:7: note:   candidate expects 1 argument, 2 provided

Tests failing on GPU: Binary format for key='0', ident='' is not recognized

Hi,

Building sycl-blas works fine on my system, but the tests only work on the host device (that is, all tests fail on either my NVIDIA or Intel GPU).
I see the warning

/home/hschulz/SYCL/ComputeCpp-CE-2.1.0-x86_64-linux-gnu/bin/spirv-ll-tool: /lib64/libtinfo.so.5: no version information available (required by /home/hschulz/SYCL/ComputeCpp-CE-2.1.0-x86_64-linux-gnu/bin/spirv-ll-tool)

at every build step but according to codeplay's support pages this should not be an issue.

Testing e.g.
./test/unittest/blas3_gemm_test --device=nvidia:gpu
results in messages of the form

[----------] Global test environment set-up.
[----------] 72 tests from Gemm/GemmSmallBetaNonZeroLDMatchFloat
[ RUN      ] Gemm/GemmSmallBetaNonZeroLDMatchFloat.test/0
Device vendor: NVIDIA Corporation
Device name: GeForce GTX 960M
Device type: gpu
Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 0xf45840 (size: 65) error   : Binary format for key='0', ident='' is not recognized>)
Value mismatch at index 0: -0.814363; expected 101.708
../test/unittest/blas3/blas3_gemm_common.hpp:157: Failure

I am following your instructions to build with CMake using the command below; the log file is attached (cmake.log):
cmake -GNinja ../ -DComputeCpp_DIR=$PWD/../../ComputeCpp-CE-2.1.0-x86_64-linux-gnu/ -DCMAKE_INSTALL_PREFIX=$PWD/../local -DBLAS_ENABLE_TESTING=ON

My system is running Fedora30 and computecpp_info reports:

GLIBC version: 2.29
GLIBCXX: 20190605
This version of libstdc++ is not supported.

[...]

  Device is supported                     : UNTESTED - Untested OS
  Bitcode targets                         : ptx64 
  CL_DEVICE_NAME                          : GeForce GTX 960M
  CL_DEVICE_VENDOR                        : NVIDIA Corporation
  CL_DRIVER_VERSION                       : 440.64
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

  Device is supported                     : NO - No bitcode target found
  Bitcode targets                         : 
  CL_DEVICE_NAME                          : pthread-Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  CL_DEVICE_VENDOR                        : GenuineIntel
  CL_DRIVER_VERSION                       : 1.3-pre
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU 

  Device is supported                     : UNTESTED - Untested OS
  Bitcode targets                         : spir32 
  CL_DEVICE_NAME                          : Intel(R) HD Graphics Skylake Halo GT2
  CL_DEVICE_VENDOR                        : Intel
  CL_DRIVER_VERSION                       : 1.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU

I heard today that the glibc warning should not matter, too.
I am not sure why things are going wrong and would greatly appreciate any help.

Setting BLAS_ENABLE_CONST_INPUT=OFF has no effect in some places

The option BLAS_ENABLE_CONST_INPUT can be set to OFF, theoretically disabling kernel instantiation with const input buffers. However, this has no effect in some parts of the source. This is because the source C++ checks whether BLAS_ENABLE_CONST_INPUT is defined, not its value.

The CMake defines BLAS_ENABLE_CONST_INPUT here
BLAS_ENABLE_CONST_INPUT is #ifdef'd here
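The difference between testing for existence and testing the value can be shown in isolation. The sketch below assumes, for illustration only, that the build passes the macro with a value of 0 when the option is OFF; how the CMake files actually define it is not shown here. The two `enabled_by_*` constants are hypothetical names.

```cpp
// Simulate a build where the option is defined but switched off.
// Assumption for this sketch: the macro carries the value 0 when OFF.
#define BLAS_ENABLE_CONST_INPUT 0

#ifdef BLAS_ENABLE_CONST_INPUT
constexpr bool enabled_by_ifdef = true;  // taken: the macro exists
#else
constexpr bool enabled_by_ifdef = false;
#endif

#if defined(BLAS_ENABLE_CONST_INPUT) && BLAS_ENABLE_CONST_INPUT
constexpr bool enabled_by_value = true;
#else
constexpr bool enabled_by_value = false;  // taken: the macro evaluates to 0
#endif
```

With the macro defined as 0, the `#ifdef` check still reports the feature as enabled while the value check does not, which is the mismatch described above. An alternative would be to leave the macro undefined when the option is OFF, in which case `#ifdef` alone would be sufficient.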

Add wrapper over the BLAS interface to support the old C-style s_, d_, c***_ interface for float, double and complex types

sudo Docker build .

returns:

[ 3%] Building CXX object tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_14.cpp:6:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_17-18.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_7.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_21.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/slide_14.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/slide_14.cpp.o] Error 1
CMakeFiles/Makefile2:1130: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/slide_7.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/slide_7.cpp.o] Error 1
CMakeFiles/Makefile2:1167: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/all] Error 2
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o] Error 1
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/all] Error 2
CMakeFiles/Makefile2:1204: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/all' failed
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/slide_17-18.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/slide_17-18.cpp.o] Error 1
CMakeFiles/Makefile2:1093: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/all] Error 2

Add Attach/Detach buffer to Sycl_queue

Add Gemm Transposed Tests to unit test

mix of sycl-blas and sycl kernels

To get people started quickly, I think this example is helpful.
There are two SYCL kernels: the first kernel produces results read by the second kernel. The first one is a general kernel while the second one uses sycl-blas. It is not clear if we can pass 'output_d' to the sycl-blas function directly. All the sycl-blas examples assume the inputs are initialized on the host before calling 'make_sycl_iterator_buffer'.

Can we declare the input buffers to the sycl-blas functions using the general 'buffer<type, 1>' ?

Thanks for your guide.

sycl:
buffer<float, 1> output_d (n);

sycl-blas
auto a_gpu = blas::make_sycl_iterator_buffer(A, n); // auto a_gpu = output_d ?

blas2_interface_test and blas3_interface_test fail with AMD backend

An instruction selection error seems to affect both tests after the latest changes on the development branch:

~/sycl-blas/build$ ./tests/blas2_interface_test  
i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1LLVM ERROR: Cannot select: 0xf8c920: i32,ch = load 0xf8c710:1, 0xf8d058, 0xd453a0<LD1[undef](align=16)(nontemporal), zext from i1> [ORD=663] [ID=106]
  0xf8d058: i64 = Constant<192> [ORD=663] [ID=7]
  0xd453a0: i64 = undef [ORD=663] [ID=4]
In function: __OpenCL_SYCL_ea23033f1d994e66743d1606f2793a48_0_stub
 SYCL Runtime closed with the following errors:
SYCL objects are still alive while the runtime is shutting down

This is potentially due to the use of boolean values that are converted into integers. This appears to be a bug in the AMD compiler backend. We can work around this in SYCL-BLAS by avoiding conversions from boolean to integer.

Incorrect result of blas3_interface

The blas3_interface_test outputs an incorrect result on Intel and AMD GPUs, although it works fine on Intel CPU.

$ ./tests/blas3_interface_test 
igdrcl: using XCB-DRI2 authentication...
ERROR!! --> res = 1.35646e+21 , addC = 1.43327e+22 , err = 1.29763e+22
