codeplaysoftware / portblas
An implementation of BLAS using the SYCL open standard.
License: Apache License 2.0
The eval method of the RedRowMatVct operation in blas2_trees contains a local array of variable size, which is not supported in C++, OpenCL or SYCL:
value_type eval(size_t i) {
auto dim = r2.getSize();
value_type v[warpSize];
for (size_t w = 0; w < warpSize; w++) {
auto valWI = iniAddOp1_struct::eval(r2.eval(0));
for (size_t j = w; j < dim; j += warpSize) {
valWI += r1.eval(i, j) * r2.eval(j);
}
v[w] = valWI;
}
auto valWG = iniAddOp1_struct::eval(r2.eval(0));
for (size_t w = 0; w < warpSize; w++) {
valWG += v[w];
}
return valWG;
}
I tried to work around this by manually replacing warpSize with 1 (the only value used), but the result is incorrect. The test passes on the Intel CPU but fails on the AMD and Intel GPUs.
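One way to avoid the variable-size array is to make the width a compile-time constant. The following is only a sketch under assumptions: row_dot and its std::vector arguments are hypothetical stand-ins for the expression-tree nodes r1 and r2, and the initial value is assumed to be the additive identity. It does not explain the incorrect GPU result reported above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reduction in RedRowMatVct::eval: computes the
// dot product of one matrix row with x using WarpSize partial sums.
// WarpSize is a template parameter, so `partial` has a compile-time size.
template <std::size_t WarpSize, typename T>
T row_dot(const std::vector<T>& row, const std::vector<T>& x) {
  static_assert(WarpSize > 0, "WarpSize must be positive");
  T partial[WarpSize] = {};  // fixed-size array, value-initialised to zero
  const std::size_t dim = x.size();
  for (std::size_t w = 0; w < WarpSize; ++w) {
    // Each partial sum covers the elements w, w + WarpSize, w + 2*WarpSize, ...
    for (std::size_t j = w; j < dim; j += WarpSize) {
      partial[w] += row[j] * x[j];
    }
  }
  // Combine the partial sums into the final result.
  T total = T{};
  for (std::size_t w = 0; w < WarpSize; ++w) {
    total += partial[w];
  }
  return total;
}
```

Calling row_dot<4>(row, x) and row_dot<1>(row, x) should give the same value for integer inputs, since only the grouping of the additions changes.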
We need a script that plots the performance of syclblas against clblas over a range of input vector sizes, together with the speedup obtained from fusion.
dfa0579 allows recent DPC++ nightlies to compile SYCL-BLAS after recent changes to multi_ptr to match the SYCL-2020 spec. See #343.
However, if SYCL-BLAS is tested with -DTARGET=INTEL_GPU, the tests blas3_gemm_test, blas3_gemm_batched_test and blas3_trsm_test fail. With -DTARGET=DEFAULT_CPU, the tests pass. The benchmarking suite suggests that gemm fails only for some matrix sizes.
-DTARGET=INTEL_GPU with DPC++ nightly 2022/10/19, 2022/10/16, 2022/10/13 results in test failures when ctest is run: blas3_gemm_test, blas3_gemm_batched_test and blas3_trsm_test fail
-DTARGET=DEFAULT_CPU with DPC++ nightly 2022/10/19, 2022/10/16, 2022/10/13 passes tests
-DTARGET=DEFAULT_CPU or -DTARGET=INTEL_GPU with DPC++ nightly 2022/10/12 passes tests
In file portBLAS/include/blas_meta.h, the header file
#include <ext/oneapi/experimental/sycl_complex.hpp>
no longer exists in DPC++; instead you can use
#include <ext/oneapi/experimental/complex/complex.hpp>
which is located in llvm-dpcpp/build/include/sycl/ext/oneapi/experimental/complex/complex.hpp.
We should add a special case of vector_view when stride is one (the most common case) to enable special optimizations in this case.
This requires adding a template parameter to the vector view for stride=1.
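A minimal sketch of what such a template parameter could look like follows. The strided_view type and its UnitStride parameter are hypothetical and much simpler than the library's vector_view; the point is only that the specialisation drops the stride member and the multiplication in the index computation.

```cpp
#include <cstddef>

// Hypothetical simplified view, not the library's actual vector_view.
// General case: every access multiplies the index by the stride.
template <typename T, bool UnitStride = false>
struct strided_view {
  T* data;
  std::size_t stride;
  std::size_t size;
  T& eval(std::size_t i) const { return data[i * stride]; }
};

// Specialisation for stride == 1: no stride member and no multiplication,
// which leaves the compiler free to treat accesses as contiguous.
template <typename T>
struct strided_view<T, true> {
  T* data;
  std::size_t size;
  T& eval(std::size_t i) const { return data[i]; }
};
```

A caller would choose strided_view<T, true> when it knows at compile time that the stride is one, and fall back to the general form otherwise.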
The trsm routine (solve a triangular system of equations) takes the argument diag, which indicates whether the argument matrix is unit diagonal or not. Unit diagonal means that all values on the diagonal of the matrix are equal to 1.
When diag is set to unit diagonal, the routine should assume that this holds rather than read the diagonal (LAPACK docs, Intel docs). Instead, the trsm routine depends on the actual values on the diagonal.
In the TRSM unit test here:
const scalar_t diagValue =
diag == 'u' ? scalar_t{1} : random_scalar(scalar_t{1}, scalar_t{10});
can be set to
const scalar_t diagValue =
diag == 'u' ? scalar_t{0.4} : random_scalar(scalar_t{1}, scalar_t{10});
The reference implementation ignores the 0.4 on the diagonal. SYCL-BLAS does not.
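The expected behaviour can be illustrated with a small forward-substitution solve. This is a sketch only, not the SYCL-BLAS or reference BLAS code: lower_solve, its row-major layout and its bool flag are hypothetical. When unit_diag is true the stored diagonal is never read, so a value such as 0.4 on the diagonal has no effect.

```cpp
#include <cstddef>
#include <vector>

// Solve L * x = b for a lower-triangular n x n matrix L stored row-major.
// The solution overwrites b. Hypothetical helper for illustration only.
inline void lower_solve(const std::vector<double>& L, std::vector<double>& b,
                        std::size_t n, bool unit_diag) {
  for (std::size_t i = 0; i < n; ++i) {
    double sum = b[i];
    // Subtract the contributions of the already-solved unknowns.
    for (std::size_t j = 0; j < i; ++j) {
      sum -= L[i * n + j] * b[j];
    }
    // With a unit diagonal the divisor is taken to be 1 and the stored
    // diagonal entry is ignored; otherwise divide by the stored entry.
    b[i] = unit_diag ? sum : sum / L[i * n + i];
  }
}
```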
In the DPC++ commit intel/llvm@8700b76, the vec and multi-pointer interface was updated to match the SYCL-2020 spec. In particular, the following change was made:
template <access::address_space Space>
void store(size_t Offset, multi_ptr<DataT, Space> Ptr) const {
to
template <access::address_space Space, access::decorated DecorateAddress>
void store(size_t Offset,
multi_ptr<DataT, Space, DecorateAddress> Ptr) const {
This breaks the usage of vec.store in SYCL-BLAS, e.g. https://github.com/codeplaysoftware/sycl-blas/blob/master/src/operations/blas3/gemm_no_local_partial_vec.hpp#L474:
SYCL-BLAS uses:
cl::sycl::vec<element_t, work_per_load> in_vec;
vec.template store<ADDR_SPACE>(offset, float_ptr);
where float_ptr is float*. In SYCL-1.2.1 this is implicitly cast to a multi-pointer. With SYCL-2020, the multi-pointer constructor from ElemT* is only available where DecorateAddress is access::decorated::legacy.
In SYCL-DNN we have various handwritten tests that operate on small matrices to verify that the underlying matmul implementation is being used correctly.
These tests currently fail on SYCL-BLAS. Further investigation has revealed that SYCL-BLAS performs incorrectly for matrices under 15x15. For example, a 3x3 matrix multiply only computes the first row, leaving the last two rows in their initial state.
It is possible to remove code duplication by separating component tasks into individual tasks.
// ReducAssignNewOp3 can be eliminated completely by introducing (In blas1_tree.hpp)
//STEP1:Adding Tuple OP node
//.....
template<typename RHS>
struct TupleOP {
using value_type = IndVal<size_t, typename RHS::value_type>;
RHS r;
TupleOP(RHS& _r) : r(_r) {}
size_t getSize() { return r.getSize(); }
inline value_type eval(size_t i) { return value_type(i, r.eval(i)); }
inline value_type eval(cl::sycl::nd_item<1> ndItem) {
return eval(ndItem.get_global(0));
}/....
//STEP2// By adding that it is possible to use the ReducAssignNewOp2 to do the imax, imean.
// Therefore for constructing the _imax, we wrap the input node with <TupleOP<INPUTTYPE>(input_node);
// This is in blas1_interface_sycl.hpp
///....
auto vec = std::vector<IndVal<T>>(nWG);
auto buf_val = cl::sycl::buffer<IndVal<T>, 1>(vec.data(), cl::sycl::range{nWG});
auto val1 = BufferVectorView<IndVal<T>>(valT1, 0, 1, nWG);
auto val_tuple = TupleOP<vector_view<IndVal<T>, std::vector<IndVal<T>>>>(my_vx);
auto assignOp1 =
make_maxIndReducAssignNewOp2(val1, val_tuple, localSize, nWG * localSize);
ex.execute(assignOp1);
std::vector<IndVal<T>> valT2(1);
auto val2 = vector_view<IndVal<T>, std::vector<IndVal<T>>>(valT1, 0, 1, 1);
auto assignOp2 = make_maxIndReducAssignNewOp2(val2, val1, localSize, nWG);
//.....
// next step is to modify the blas_operators for Imax and Imean
//...
SYCLBLAS_DEFINE_TERNARY_OPERATOR(maxIndOp3_struct, const_val::min, > )
SYCLBLAS_DEFINE_TERNARY_OPERATOR(minIndOp3_struct, const_val::max, < )
// Should change to
SYCLBLAS_DEFINE_BINARY_OPERATOR(maxIndOp3_struct, const_val::min,
((l.getVal() > r.getVal()) ? l : r))
SYCLBLAS_DEFINE_BINARY_OPERATOR(minIndOp3_struct, const_val::max,
((l.getVal() < r.getVal()) ? l : r))
//.....
};
Blas1_interface_test does not compute the correct result on AMD GPU
~/sycl-blas/build$ ./tests/blas1_interface_test
ERROR!! --> res = 15280 , sum = 16430.3 , err = 1150.3
ERROR!! --> res = -98 , dot = 47568.3 , err = 47666.3
ERROR!! --> res = 289172 , nrmY = 584.285 , err = -288588
ERROR!! --> res = 1126 , ind = 1022 , max = -41
ERROR!! --> res = -98 , giv = 30433.8 , err = 30531.8
ERROR!! --> res = 33 , diff = 75.5293 , err = 42.5293
We need a script to compare performance results of paper_blas1 and paper_clblas and produce plots.
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security research to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
Currently, the global size and the local size are hard-coded on the device. We should remove the hard-coded values from the interface model and find a way to query them from the device.
In this commit to Intel's DPC++ nightly, the name of the Windows SYCL runtime library was changed from sycl.dll/lib to sycl{MAJOR_VERSION}.dll/lib. This breaks the FindDPCPP.cmake module in SYCL-BLAS, which looks for just sycl.dll/lib.
It would be useful to have a link to BLAS interface in here: https://github.com/codeplaysoftware/sycl-blas#interface
Can you please specify the commands to run your tests and benchmarks?
Thanks
Can you please provide some examples (matrix multiply, matrix-vector multiply, etc.) that use SYCL-BLAS?
Thanks
I tried to build the samples with the DPC++ compiler, but got compilation errors. Does this look like a DPC++ compiler version issue?
The build errors follow:
In file included from ../samples/gemv.cpp:1:
In file included from ../samples/../src/sycl_blas.hpp:25:
In file included from ../include/sycl_blas.h:26:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl.hpp:11:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/accessor.hpp:13:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/buffer.hpp:11:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/detail/buffer_impl.hpp:20:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/types.hpp:13:
In file included from /home/mlx/llvm/build/bin/../include/sycl/CL/sycl/detail/generic_type_traits.hpp:16:
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:120:5: error: statement not allowed in constexpr function
do {
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:186:9: error: cannot assign to non-static data member within const member function 'operator-'
Buf ^= 0x8000;
~~~ ^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:185:19: note: member function 'cl::sycl::detail::host_half_impl::half::operator-' is declared const here
constexpr half &operator-() {
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:187:12: error: binding reference of type 'cl::sycl::detail::host_half_impl::half' to value of type 'const cl::sycl::detail::host_half_impl::half' drops 'const' qualifier
return *this;
^~~~~
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:185:19: error: no return statement in constexpr function
constexpr half &operator-() {
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:211:22: error: an explicitly-defaulted copy assignment operator may not have 'const', 'constexpr' or 'volatile' qualifiers
constexpr half_v2 &operator=(const half_v2 &rhs) = default;
^
/home/mlx/llvm/build/bin/../include/sycl/CL/sycl/half_type.hpp:216:12: error: binding reference of type 'cl::sycl::detail::host_half_impl::half_v2' to value of type 'const cl::sycl::detail::host_half_impl::half_v2' drops 'const' qualifier
return *this;
.....
Hi, the build process for tests complains that the OpenCL headers cannot be found when the OpenCL SDK is not installed on a regular system path.
It turns out that FindComputeCpp.cmake doesn't add the OpenCL_INCLUDE_DIR variable to the include directories.
I'll create a pull request later.
I think users need to set ComputeCpp_INCLUDE_DIRS in addition to ComputeCpp_DIR before building the software using ninja.
If BLAS_DATA_TYPES is set to include double or half, the unit tests generate errors similar to the following:
5: [ FATAL ] test/googletest-src/googletest/include/gtest/internal/gtest-param-util.h:582:: Condition IsValidParamName(param_name) failed. Parameterized test name 'upl0_u__n_14__alpha_1.5__incX_2__ldaMul_2' is invalid, in /<DIR>/sycl-blas/test/unittest/blas2/blas2_syr_test.cpp line 98
The name is invalid because of the . in the name: test names cannot include - or . characters.
This is fixed for float in https://github.com/codeplaysoftware/sycl-blas/blob/master/test/blas_test.hpp#L212 with a specialized dump_arg function. Similar specializations are required for half and double (and any other floating point types).
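A type-generic way to produce valid names might look like the sketch below. It is not the project's dump_arg: to_param_name is a hypothetical helper, and the choice of 'p' for the decimal point and 'm' for the minus sign is an assumption made here for readability. Any other character that is not alphanumeric is replaced with an underscore.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: format a value and replace characters that are not
// allowed in a parameterized test name.
template <typename T>
std::string to_param_name(T value) {
  std::ostringstream os;
  os << value;
  std::string s = os.str();
  for (char& c : s) {
    if (c == '.') {
      c = 'p';  // decimal point
    } else if (c == '-') {
      c = 'm';  // minus sign
    } else if (!std::isalnum(static_cast<unsigned char>(c))) {
      c = '_';  // anything else that is not alphanumeric
    }
  }
  return s;
}
```

With this scheme the alpha value 1.5 from the failing test name above would be rendered as 1p5.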
I get a series of errors like this:
In file included from /home/rob/sycl/sycl-blas/build/computecpp_powervr/generated_src/blas2/ger//ger_float_int_BufferIterator_float_BufferIterator_float_BufferIterator_float_int.cpp:28:
/home/rob/sycl/sycl-blas/src/interface/blas2_interface.hpp:1095:30: error: no member named 'tbsv' in namespace 'blas'
INST_UPLO_TRANS_DIAG(blas::tbsv::backend::_tbsv, sb_handle, _N, _K, _mA, _lda,
~~~~~~^
It looks reasonably easy to add from what I can see from other files, so will open a PR once I have something.
When compiling with hipSYCL using the README.md command cmake .. -DhipSYCL_DIR=/opt/hipSYCL/CUDA/lib/cmake/hipSYCL -DSYCL_COMPILER=hipsycl, I get the compiler error below multiple times. My guess is that a breaking change in hipSYCL since the initial hipSYCL support is the cause, and that a release version such as v0.9.1 is needed rather than the repo default branch.
Building master on Arch Linux with packages:
computecpp: 2.3.0-1
hipsycl-cuda-git: r1419.72b201e-1
gcc: 11.1.0-1
Compiler Error:
In file included from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/src/policy/sycl_policy_handler.hpp: In instantiation of ‘blas::BufferIterator<element_t, blas::codeplay_policy> blas::PolicyHandler<blas::codeplay_policy>::get_buffer(element_t*) const [with element_t = float]’:
~/sycl-blas/src/policy/sycl_policy_handler.cpp:80:1: required from here
~/sycl-blas/src/policy/sycl_policy_handler.hpp:62:10: error: no matching function for call to ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >&, long int&)’
62 | return BufferIterator<element_t, codeplay_policy>(buff, offset);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:86:3: note: candidate: ‘template<class other_scalar_t, class U, class> blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const blas::BufferIterator<other_scalar_t, blas::codeplay_policy>&) [with other_scalar_t = other_scalar_t; U = U; <template-parameter-2-3> = <template-parameter-1-3>; element_t = float]’
86 | BufferIterator(const BufferIterator<other_scalar_t, codeplay_policy>& other);
| ^~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:86:3: note: template argument deduction/substitution failed:
In file included from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/src/policy/sycl_policy_handler.hpp:62:10: note: ‘hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >’ is not derived from ‘const blas::BufferIterator<element_t, blas::codeplay_policy>’
62 | return BufferIterator<element_t, codeplay_policy>(buff, offset);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:275:8: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const buff_t&) [with element_t = float; blas::BufferIterator<element_t, blas::codeplay_policy>::buff_t = hipsycl::sycl::buffer<float, 1, std::allocator<float> >]’
275 | inline BufferIterator<element_t, codeplay_policy>::BufferIterator(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:275:8: note: candidate expects 1 argument, 2 provided
~/sycl-blas/include/container/sycl_iterator.h:269:8: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator(const buff_t&, std::ptrdiff_t) [with element_t = float; blas::BufferIterator<element_t, blas::codeplay_policy>::buff_t = hipsycl::sycl::buffer<float, 1, std::allocator<float> >; std::ptrdiff_t = long int]’
269 | inline BufferIterator<element_t, codeplay_policy>::BufferIterator(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:270:72: note: no known conversion for argument 1 from ‘hipsycl::sycl::buffer<float, 1, std::allocator<unsigned char> >’ to ‘const buff_t&’ {aka ‘const hipsycl::sycl::buffer<float, 1, std::allocator<float> >&’}
270 | const typename BufferIterator<element_t, codeplay_policy>::buff_t& buff,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
~/sycl-blas/include/container/sycl_iterator.h:67:3: note: candidate: ‘blas::BufferIterator<element_t, blas::codeplay_policy>::BufferIterator() [with element_t = float]’
67 | BufferIterator() : offset_{0}, buffer_{cl::sycl::range<1>{1}} {}
| ^~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:67:3: note: candidate expects 0 arguments, 2 provided
In file included from ~/sycl-blas/include/policy/sycl_policy_handler.h:29,
from ~/sycl-blas/src/policy/sycl_policy_handler.hpp:28,
from ~/sycl-blas/src/policy/sycl_policy_handler.cpp:29:
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate: ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(const blas::BufferIterator<float, blas::codeplay_policy>&)’
36 | class BufferIterator<element_t, codeplay_policy> {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate expects 1 argument, 2 provided
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate: ‘blas::BufferIterator<float, blas::codeplay_policy>::BufferIterator(blas::BufferIterator<float, blas::codeplay_policy>&&)’
~/sycl-blas/include/container/sycl_iterator.h:36:7: note: candidate expects 1 argument, 2 provided
Hi,
building sycl-blas works fine on my system but the tests only work on the host device (that is, all tests fail on both my NVIDIA and Intel GPUs).
I see the warning
/home/hschulz/SYCL/ComputeCpp-CE-2.1.0-x86_64-linux-gnu/bin/spirv-ll-tool: /lib64/libtinfo.so.5: no version information available (required by /home/hschulz/SYCL/ComputeCpp-CE-2.1.0-x86_64-linux-gnu/bin/spirv-ll-tool)
at every build step but according to codeplay's support pages this should not be an issue.
Testing e.g.
./test/unittest/blas3_gemm_test --device=nvidia:gpu
results in messages of the form
[----------] Global test environment set-up.
[----------] 72 tests from Gemm/GemmSmallBetaNonZeroLDMatchFloat
[ RUN ] Gemm/GemmSmallBetaNonZeroLDMatchFloat.test/0
Device vendor: NVIDIA Corporation
Device name: GeForce GTX 960M
Device type: gpu
Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 0xf45840 (size: 65) error : Binary format for key='0', ident='' is not recognized>)
Value mismatch at index 0: -0.814363; expected 101.708
../test/unittest/blas3/blas3_gemm_common.hpp:157: Failure
I am following your instructions to build with cmake using this command, the logfile is attached:
cmake.log
cmake -GNinja ../ -DComputeCpp_DIR=$PWD/../../ComputeCpp-CE-2.1.0-x86_64-linux-gnu/ -DCMAKE_INSTALL_PREFIX=$PWD/../local -DBLAS_ENABLE_TESTING=ON
My system is running Fedora30 and computecpp_info reports:
GLIBC version: 2.29
GLIBCXX: 20190605
This version of libstdc++ is not supported.
[...]
Device is supported : UNTESTED - Untested OS
Bitcode targets : ptx64
CL_DEVICE_NAME : GeForce GTX 960M
CL_DEVICE_VENDOR : NVIDIA Corporation
CL_DRIVER_VERSION : 440.64
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
Device is supported : NO - No bitcode target found
Bitcode targets :
CL_DEVICE_NAME : pthread-Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
CL_DEVICE_VENDOR : GenuineIntel
CL_DRIVER_VERSION : 1.3-pre
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU
Device is supported : UNTESTED - Untested OS
Bitcode targets : spir32
CL_DEVICE_NAME : Intel(R) HD Graphics Skylake Halo GT2
CL_DEVICE_VENDOR : Intel
CL_DRIVER_VERSION : 1.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
I heard today that the glibc warning should not matter, too.
I am not sure why things are going wrong and would greatly appreciate any help.
The option BLAS_ENABLE_CONST_INPUT can be set to OFF, theoretically disabling kernel instantiation with const input buffers. However, this has no effect in some parts of the source, because the C++ source checks whether BLAS_ENABLE_CONST_INPUT is defined, not its value.
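The difference between the two kinds of check can be shown in isolation. In this sketch the macro DEMO_ENABLE_CONST_INPUT is hypothetical, and defining it to 0 is an assumption about how a disabled option might reach the compiler; how the project's build system actually passes the option is not shown here.

```cpp
// Simulates a build in which the option was switched off but the macro is
// still defined (here with the value 0). The macro name is hypothetical.
#define DEMO_ENABLE_CONST_INPUT 0

// Checks only whether the macro exists: reports "enabled" even for 0.
constexpr bool enabled_by_ifdef() {
#ifdef DEMO_ENABLE_CONST_INPUT
  return true;
#else
  return false;
#endif
}

// Checks the macro's value: reports "disabled" for 0.
constexpr bool enabled_by_value() {
#if DEMO_ENABLE_CONST_INPUT
  return true;
#else
  return false;
#endif
}
```

Here enabled_by_ifdef() returns true while enabled_by_value() returns false, which is the mismatch described above.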
[ 3%] Building CXX object tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_14.cpp:6:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_17-18.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_7.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
In file included from /usr/include/boost/compute/detail/meta_kernel.hpp:37,
from /usr/include/boost/compute/algorithm/reduce.hpp:18,
from /usr/include/boost/compute/algorithm/accumulate.hpp:19,
from /usr/include/boost/compute/algorithm.hpp:18,
from /usr/include/boost/compute.hpp:14,
from /trisycl/include/triSYCL/detail/global_config.hpp:51,
from /trisycl/include/triSYCL/sycl.hpp:42,
from /trisycl/include/CL/sycl.hpp:10,
from /trisycl/tests/2014-04-21-HPC-GPU_Meetup/slide_21.cpp:4:
/usr/include/boost/compute/memory/svm_ptr.hpp: In member function 'boost::compute::context& boost::compute::svm_ptr::get_context() const':
/usr/include/boost/compute/memory/svm_ptr.hpp:131:16: error: binding reference of type 'boost::compute::context&' to 'const boost::compute::context' discards qualifiers
return m_context;
^~~~~~~~~
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/slide_14.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/slide_14.cpp.o] Error 1
CMakeFiles/Makefile2:1130: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_14.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/slide_7.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/slide_7.cpp.o] Error 1
CMakeFiles/Makefile2:1167: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_7.dir/all] Error 2
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o] Error 1
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/slide_21.cpp.o' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/all] Error 2
CMakeFiles/Makefile2:1204: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_21.dir/all' failed
tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/build.make:62: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/slide_17-18.cpp.o' failed
make[2]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/slide_17-18.cpp.o] Error 1
CMakeFiles/Makefile2:1093: recipe for target 'tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/all' failed
make[1]: *** [tests/2014-04-21-HPC-GPU_Meetup/CMakeFiles/2014-04-21-HPC-GPU_Meetup_slide_17-18.dir/all] Error 2
To get people started quickly, I think this example is helpful.
There are two SYCL kernels: the first kernel produces results read by the second kernel. The first one is a general kernel while the second one uses sycl-blas. It is not clear if we can pass 'output_d' to the sycl-blas function directly. All the sycl-blas examples assume the inputs are initialized on the host before calling 'make_sycl_iterator_buffer'.
Can we declare the input buffers to the sycl-blas functions using the general 'buffer<type, 1>' ?
Thanks for your guide.
sycl:
buffer<float, 1> output_d (n);
sycl-blas
auto a_gpu = blas::make_sycl_iterator_buffer(A, n); // auto a_gpu = output_d ?
An instruction selection error seems to affect both tests after the latest changes on the development branch:
~/sycl-blas/build$ ./tests/blas2_interface_test
i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1i1LLVM ERROR: Cannot select: 0xf8c920: i32,ch = load 0xf8c710:1, 0xf8d058, 0xd453a0<LD1[undef](align=16)(nontemporal), zext from i1> [ORD=663] [ID=106]
0xf8d058: i64 = Constant<192> [ORD=663] [ID=7]
0xd453a0: i64 = undef [ORD=663] [ID=4]
In function: __OpenCL_SYCL_ea23033f1d994e66743d1606f2793a48_0_stub
SYCL Runtime closed with the following errors:
SYCL objects are still alive while the runtime is shutting down
This is potentially due to the usage of boolean values that are converted into integers. This appears to be a bug in the AMD compiler backend. We can work around this in SYCLBLAS by avoiding conversions from boolean to integer.
The blas3_interface_test outputs an incorrect result on Intel and AMD GPUs, although it works fine on Intel CPU.
$ ./tests/blas3_interface_test
igdrcl: using XCB-DRI2 authentication...
ERROR!! --> res = 1.35646e+21 , addC = 1.43327e+22 , err = 1.29763e+22