Git Product home page Git Product logo

llvm's Introduction

This is the Intel staging area for llvm.org contributions and the home for Intel LLVM-based projects:

For general contribution process see CONTRIBUTING.md

oneAPI DPC++ compiler

oneAPI logo

SYCL Post Commit Generate Doxygen documentation

The DPC++ is a LLVM-based compiler project that implements compiler and runtime support for the SYCL* language. The project is hosted in the sycl branch and is synced with the tip of the LLVM upstream main branch on a regular basis (revisions delay is usually not more than 1-2 weeks). DPC++ compiler takes everything from LLVM upstream as is, however some modules of LLVM might be not included in the default project build configuration. Additional modules can be enabled by modifying build framework settings.

The DPC++ goal is to support the latest SYCL* standard and work on that is in progress. DPC++ also implements a number of extensions to the SYCL* standard, which can be found in the sycl/doc/extensions directory.

The main purpose of this project is open source collaboration on the DPC++ compiler implementation in LLVM across a variety of architectures, prototyping compiler and runtime library solutions, designing future extensions, and conducting experiments. As the implementation becomes more mature, we try to upstream as much DPC++ support to LLVM main branch as possible. See SYCL upstreaming working group notes for more details.

Note that this project can be used as a technical foundation for some proprietary compiler products, which may leverage implementations from this open source project. One of the examples is Intel(R) oneAPI DPC++ Compiler Features parity between this project and downstream projects is not guaranteed.

Project documentation is available at: DPC++ Documentation.

How to use DPC++

Docker containers

See available containers with pre-built/pre-installed DPC++ compiler at: Containers

Releases

Daily builds of the sycl branch on Linux are available at releases. A few times a year, we publish Release Notes to highlight all important changes made in the project: features implemented and issues addressed. The corresponding builds can be found using search in daily releases. None of the branches in the project are stable or rigorously tested for production quality control, so the quality of these releases is expected to be similar to the daily releases.

Build from sources

See Get Started Guide.

Report a problem

Submit an issue or initiate a discussion.

How to contribute to DPC++

This project welcomes contributions from the community. Please refer to CONTRIBUTING for general guidelines around contributing to this project. You can then see ContributeToDPCPP for DPC++ specific guidelines.

Late-outline OpenMP* and OpenMP* Offload

See openmp branch.

License

See LICENSE for details.

*Other names and brands may be claimed as the property of others.

llvm's People

Contributors

akyrtzi avatar arsenm avatar chandlerc avatar chapuni avatar d0k avatar ddunbar avatar douggregor avatar dwblaikie avatar echristo avatar espindola avatar fhahn avatar isanbard avatar jdevlieghere avatar kazutakahirata avatar klausler avatar labath avatar lattner avatar lebedevri avatar lhames avatar maskray avatar nico avatar nikic avatar preames avatar rksimon avatar rnk avatar rotateright avatar rui314 avatar tkremenek avatar topperc avatar zygoloid avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

llvm's Issues

[MacOS] directory copy of OpenCL fails

I understand that MacOS is not a priority and that supporting Apple OpenCL is impossible due to lack of SPIR-V. However, it seems that the current testing for OpenCL support is inadequate, as it doesn't determine if the OpenCL CMake finds is sufficient.

I'd like to use POCL for OpenCL CPU support on MacOS, which shouldn't be any different than POCL on Linux. What are the CMake variables to tell the build system to look for POCL instead? I've made many attempts at this but nothing documented on the internet is working.

jrhammon-mac02:build jrhammon$ make VERBOSE=1 sycl-toolchain
/usr/local/Cellar/cmake/3.14.4/bin/cmake -S/Users/jrhammon/Work/OpenCL/ISYCL/llvm/llvm -B/Users/jrhammon/Work/OpenCL/ISYCL/build --check-build-system CMakeFiles/Makefile.cmake 0
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Makefile2 sycl-toolchain
/usr/local/Cellar/cmake/3.14.4/bin/cmake -S/Users/jrhammon/Work/OpenCL/ISYCL/llvm/llvm -B/Users/jrhammon/Work/OpenCL/ISYCL/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.14.4/bin/cmake -E cmake_progress_start /Users/jrhammon/Work/OpenCL/ISYCL/build/CMakeFiles 56
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Makefile2 tools/sycl/CMakeFiles/sycl-toolchain.dir/all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tools/sycl/CMakeFiles/ocl-icd.dir/build.make tools/sycl/CMakeFiles/ocl-icd.dir/depend
cd /Users/jrhammon/Work/OpenCL/ISYCL/build && /usr/local/Cellar/cmake/3.14.4/bin/cmake -E cmake_depends "Unix Makefiles" /Users/jrhammon/Work/OpenCL/ISYCL/llvm/llvm /Users/jrhammon/Work/OpenCL/ISYCL/llvm/sycl /Users/jrhammon/Work/OpenCL/ISYCL/build /Users/jrhammon/Work/OpenCL/ISYCL/build/tools/sycl /Users/jrhammon/Work/OpenCL/ISYCL/build/tools/sycl/CMakeFiles/ocl-icd.dir/DependInfo.cmake --color=
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tools/sycl/CMakeFiles/ocl-icd.dir/build.make tools/sycl/CMakeFiles/ocl-icd.dir/build
[  0%] Copying OpenCL ICD Loader ...
[  0%] Built target ocl-icd
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tools/sycl/CMakeFiles/ocl-headers.dir/build.make tools/sycl/CMakeFiles/ocl-headers.dir/depend
cd /Users/jrhammon/Work/OpenCL/ISYCL/build && /usr/local/Cellar/cmake/3.14.4/bin/cmake -E cmake_depends "Unix Makefiles" /Users/jrhammon/Work/OpenCL/ISYCL/llvm/llvm /Users/jrhammon/Work/OpenCL/ISYCL/llvm/sycl /Users/jrhammon/Work/OpenCL/ISYCL/build /Users/jrhammon/Work/OpenCL/ISYCL/build/tools/sycl /Users/jrhammon/Work/OpenCL/ISYCL/build/tools/sycl/CMakeFiles/ocl-headers.dir/DependInfo.cmake --color=
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tools/sycl/CMakeFiles/ocl-headers.dir/build.make tools/sycl/CMakeFiles/ocl-headers.dir/build
[  0%] Copying OpenCL headers ...
cd /Users/jrhammon/Work/OpenCL/ISYCL/build/tools/sycl && /usr/local/Cellar/cmake/3.14.4/bin/cmake -E copy_directory /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/OpenCL.framework/CL /Users/jrhammon/Work/OpenCL/ISYCL/build/./lib/clang/9.0.0/include/CL
Error copying directory from "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/OpenCL.framework/CL" to "/Users/jrhammon/Work/OpenCL/ISYCL/build/./lib/clang/9.0.0/include/CL".
make[3]: *** [tools/sycl/CMakeFiles/ocl-headers] Error 1
make[2]: *** [tools/sycl/CMakeFiles/ocl-headers.dir/all] Error 2
make[1]: *** [tools/sycl/CMakeFiles/sycl-toolchain.dir/rule] Error 2
make: *** [sycl-toolchain] Error 2

Steps to reproduce

#!/bin/bash

export SYCL_HOME=$HOME/Work/OpenCL/ISYCL/llvm

mkdir -p $SYCL_HOME

cd $SYCL_HOME && \
    time git pull || \
    time git clone https://github.com/intel/llvm.git $SYCL_HOME

rm -rf $SYCL_HOME/../build

mkdir -p $SYCL_HOME/../build && \
    cd $SYCL_HOME/../build && \
    time cmake \
        -DCMAKE_INSTALL_PREFIX=/opt/isycl \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl" \
        -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
        -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/sycl \
        -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm-spirv \
        -DLLVM_TOOL_SYCL_BUILD=ON \
        -DLLVM_TOOL_LLVM_SPIRV_BUILD=ON \
        $SYCL_HOME/llvm

time make -j4 sycl-toolchain

[SYCL] <type_traits> compile error with -std=gnu++11 flag

An odd one here: adding the -std=gnu++11 flag to the sycl compile command produces the error below. (This occurred unexpectedly within a build script in our project.) The sycl-gnu++11.cpp input file contains only #include <type_traits> and the invocation is:

clang++ -std=c++11 -fsycl -std=gnu++11 -c sycl-gnu++11.cpp

In file included from sycl-gnu++11.cpp:1:
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/type_traits:335:39: error: __float128 is not supported on this target
    struct __is_floating_point_helper<__float128>
                                      ^
1 error generated.
In file included from sycl-gnu++11.cpp:1:
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/type_traits:335:39: error: __float128 is not supported on this target
    struct __is_floating_point_helper<__float128>
                                      ^
1 error generated.

[SYCL] Clang Driver ICE when no input file found

I get an ICE at the moment when I compile using the following command and the file is not found:

$ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl no_file.cpp -o no_file -lOpenCL

I've added the stack dump at the bottom if that's of any use.

If I remove the -lOpenCL component, I do not recieve a stack dump and get the desired output:

$ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl no_file.cpp -o no_file

clang-9: error: no such file or directory: 'no_file.cpp'
clang-9: error: no input files

This may be something you guys have already noticed unless it's an environment or understanding problem on my end? Hopefully it's not a user error on my part in this case, if it is I apologize.


**clang-9: error: no such file or directory: 'no_file.cpp'**
Stack dump:
0.	Program arguments: /storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang++ -std=c++11 -fsycl no_file.cpp -o no_file -lOpenCL 
1.	Compilation construction
2.	Building compilation jobs
3.	Building compilation jobs
4.	Building compilation jobs
5.	Building compilation jobs
6.	Computing output path
 #0 0x000055b529dd325a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x240f25a)
 #1 0x000055b529dd1134 llvm::sys::RunSignalHandlers() (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x240d134)
 #2 0x000055b529dd12b5 SignalHandler(int) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x240d2b5)
 #3 0x00007fa8319dcdd0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12dd0)
 #4 0x00007fa8311e2f11 /build/glibc-B9XfQf/glibc-2.28/string/../sysdeps/x86_64/multiarch/strlen-avx2.S:62:0
 #5 0x000055b52a304747 clang::driver::Driver::GetNamedOutputPath(clang::driver::Compilation&, clang::driver::JobAction const&, char const*, llvm::StringRef, bool, bool, llvm::StringRef) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x2940747)
 #6 0x000055b52a3149e5 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x29509e5)
 #7 0x000055b52a3155c7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x29515c7)
 #8 0x000055b52a313666 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x294f666)
 #9 0x000055b52a3155c7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x29515c7)
#10 0x000055b52a3179c4 void llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)>::callback_fn<clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const::'lambda0'(clang::driver::Action*, clang::driver::ToolChain const*, char const*)>(long, clang::driver::Action*, clang::driver::ToolChain const*, char const*) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x29539c4)
#11 0x000055b52a401404 clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x2a3d404)
#12 0x000055b52a312e8a clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x294ee8a)
#13 0x000055b52a3155c7 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x29515c7)
#14 0x000055b52a315aab clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x2951aab)
#15 0x000055b52a317475 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0x2953475)
#16 0x000055b528479875 main (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0xab5875)
#17 0x00007fa83108309b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:308:16
#18 0x000055b52850c2ea _start (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/clang+++0xb482ea)
Segmentation fault

[SYCL] noexcept specifier device side ICE in LLVM-SPIRV

#include <CL/sycl.hpp>
using namespace cl::sycl;
class exceptions_on_device;

template <typename T>
void invoke(T func) noexcept {
  func();
}

int main() {
  queue q;
  buffer<int> ob(range<1>{1});
  q.submit([&](handler &cgh) {
      auto wb = ob.get_access<access::mode::write>(cgh);
      cgh.single_task<exceptions_on_device>([=]() {
        invoke([&]() {
            wb[0] += 2;
          }
        );
      });
  });
  q.wait();
  return 0;
}

So, in the example above I'm passing a lambda to a function marked with a noexcept specifier that then invokes the lambda. So nothing fancy, but perhaps I'm doing something silly or the above is not legal SYCL in some way.

In either case case, this example will compile without the noexcept on the invoke function, but ICE with noexcept on the invoke function (it will also ICE if I put noexcept on the passed in lambda as well). I've attached the ICE to the bottom of the issue. I imagine even if this isn't legal in someway it shouldn't lead to an ICE, perhaps a diagnostic of some kind?

The compiler invocation I used in this case was:
$ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl device_side_exception.cpp -o device_side_exception -lOpenCL

I dug into this one a bit, the LLVM-SPIRV translator seems to break on an LLVM landing pad call instruction and it's possible to get the example to compile with noexcept on the function by appending -fno-exceptions alongside the previous command.

I did a quick fix based on this and modified line 2839 of clang/lib/Frontend/CompilerInvocation.cpp to disable exceptions for the device similar to OpenMP+NVPTX and OpenCL C++. Maybe this isn't an ideal fix in this case however, and I'm missing the bigger picture though.


**$ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl  device_side_exception.cpp -o device_side_exception -lOpenCL**

Stack dump:
0.	Program arguments: /storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv -spirv-no-deref-attr -o /tmp/device_side_exception-136eb4.out /tmp/spir64-linked-2d4ccd.bc 
1.	Running pass 'LLVMToSPIRV' on module '/tmp/spir64-linked-2d4ccd.bc'.
 #0 0x000055c117a3a65a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x41465a)
 #1 0x000055c117a38554 llvm::sys::RunSignalHandlers() (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x412554)
 #2 0x000055c117a386d5 SignalHandler(int) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x4126d5)
 #3 0x00007fdad51fcdd0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12dd0)
 #4 0x000055c1179cac60 llvm::Value::getName() const (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x3a4c60)
 #5 0x000055c11772365b SPIRV::LLVMToSPIRV::transCallInst(llvm::CallInst*, SPIRV::SPIRVBasicBlock*) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0xfd65b)
 #6 0x000055c117724ef2 SPIRV::LLVMToSPIRV::transValueWithoutDecoration(llvm::Value*, SPIRV::SPIRVBasicBlock*, bool) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0xfeef2)
 #7 0x000055c117725014 SPIRV::LLVMToSPIRV::transValue(llvm::Value*, SPIRV::SPIRVBasicBlock*, bool) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0xff014)
 #8 0x000055c117729b31 SPIRV::LLVMToSPIRV::transFunction(llvm::Function*) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x103b31)
 #9 0x000055c117729f70 SPIRV::LLVMToSPIRV::translate() (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x103f70)
#10 0x000055c11772a0bc SPIRV::LLVMToSPIRV::runOnModule(llvm::Module&) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x1040bc)
#11 0x000055c1179a11e2 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x37b1e2)
#12 0x000055c11772a13d llvm::writeSpirv(llvm::Module*, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x10413d)
#13 0x000055c1176c9e6d convertLLVMToSPIRV() (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0xa3e6d)
#14 0x000055c1176b8377 main (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0x92377)
#15 0x00007fdad48a909b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:308:16
#16 0x000055c1176c90ba _start (/storage/ogozillo/intel-sycl/no-mod/sycl/build/bin/llvm-spirv+0xa30ba)
clang-9: error: unable to execute command: Segmentation fault
/usr/bin/ld: /tmp/device_side_exception-0688e3.o: file not recognized: file truncated
clang-9: error: sycl-link command failed due to signal (use -v to see invocation)
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
clang version 9.0.0 (https://github.com/intel/llvm.git 39e2c74a39cd72b9830e8945bc03ac73d20ed410)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /storage/ogozillo/intel-sycl/no-mod/sycl/build/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.

[SYCL] should inlining work?

This is more of a question, but should I expect inlining to work?

clang++ -O1 -finline-functions -std=c++11 -fsycl simple-sycl-app.cpp -o simple-sycl-app -lsycl -lOpenCL

should the SYCL get all methods inlined?

Compiler crash on ill-formed program

The attached source causes the compiler to crash. I'm on the latest commit (55e9753), using an otherwise standard Ubuntu 19.04 (i.e. gcc 8.3.0 and associated libstdc++, Boost 1.67.0). Apparently the crash is caused by a pointer cast (Ctrl+F for "evil line") as it doesn't appear when returning 0. It does crash when returning nullptr, though.

After successfully reporting the issue in the code the compiler crashes with the following messages:

First pass

Stack dump:
0.	Program arguments: /home/jan/software/sycl/intel/bin/clang-9 -cc1 -triple spir64-unknown-linux-sycldevice -std=c++11 -fsycl-is-device -aux-triple x86_64-unknown-linux-gnu -disable-llvm-passes -emit-llvm-bc -disable-free -disable-llvm-verifier -discard-value-names -main-file-name clang-crash.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -mconstructor-aliases -fuse-init-array -dwarf-column-info -debugger-tuning=gdb -resource-dir /home/jan/software/sycl/intel/lib/clang/9.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/local/include -internal-isystem /home/jan/software/sycl/intel/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/jan/software/sycl/intel/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++2a -fdeprecated-macro -fdebug-compilation-dir /home/jan/workspace/sycl-issues -ferror-limit 19 -fmessage-length 0 -fno-implicit-modules -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/clang-crash-edfeb3.o -x c++ clang-crash.cpp -faddrsig 
1.	<eof> parser at end of file
 #0 0x0000555d3489f15a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/jan/software/sycl/intel/bin/clang-9+0x257d15a)
 #1 0x0000555d3489ce74 llvm::sys::RunSignalHandlers() (/home/jan/software/sycl/intel/bin/clang-9+0x257ae74)
 #2 0x0000555d3489cff5 SignalHandler(int) (/home/jan/software/sycl/intel/bin/clang-9+0x257aff5)
 #3 0x00007fade0d0cf40 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13f40)
 #4 0x0000555d364d581d MarkDeviceFunction::CheckSYCLType(clang::QualType, clang::SourceRange) (/home/jan/software/sycl/intel/bin/clang-9+0x41b381d)
 #5 0x0000555d365046e1 clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseTypeAliasDecl(clang::TypeAliasDecl*) (/home/jan/software/sycl/intel/bin/clang-9+0x41e26e1)
 #6 0x0000555d364f66ad clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseDecl(clang::Decl*) (/home/jan/software/sycl/intel/bin/clang-9+0x41d46ad)
 #7 0x0000555d364f8908 clang::RecursiveASTVisitor<MarkDeviceFunction>::dataTraverseNode(clang::Stmt*, llvm::SmallVectorImpl<llvm::PointerIntPair<clang::Stmt*, 1u, bool, llvm::PointerLikeTypeTraits<clang::Stmt*>, llvm::PointerIntPairInfo<clang::Stmt*, 1u, llvm::PointerLikeTypeTraits<clang::Stmt*> > > >*) (/home/jan/software/sycl/intel/bin/clang-9+0x41d6908)
 #8 0x0000555d36501cd5 clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseStmt(clang::Stmt*, llvm::SmallVectorImpl<llvm::PointerIntPair<clang::Stmt*, 1u, bool, llvm::PointerLikeTypeTraits<clang::Stmt*>, llvm::PointerIntPairInfo<clang::Stmt*, 1u, llvm::PointerLikeTypeTraits<clang::Stmt*> > > >*) (.constprop.6100) (/home/jan/software/sycl/intel/bin/clang-9+0x41dfcd5)
 #9 0x0000555d3650a869 clang::Sema::MarkDevice() (/home/jan/software/sycl/intel/bin/clang-9+0x41e8869)
#10 0x0000555d36049272 clang::Sema::ActOnEndOfTranslationUnitFragment(clang::Sema::TUFragmentKind) (.part.1414) (/home/jan/software/sycl/intel/bin/clang-9+0x3d27272)
#11 0x0000555d36049339 clang::Sema::ActOnEndOfTranslationUnit() (/home/jan/software/sycl/intel/bin/clang-9+0x3d27339)
#12 0x0000555d35f22a01 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) (/home/jan/software/sycl/intel/bin/clang-9+0x3c00a01)
#13 0x0000555d35f16528 clang::ParseAST(clang::Sema&, bool, bool) (/home/jan/software/sycl/intel/bin/clang-9+0x3bf4528)
#14 0x0000555d354b1557 clang::CodeGenAction::ExecuteAction() (/home/jan/software/sycl/intel/bin/clang-9+0x318f557)
#15 0x0000555d34f6dc56 clang::FrontendAction::Execute() (/home/jan/software/sycl/intel/bin/clang-9+0x2c4bc56)
#16 0x0000555d34f3149e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/jan/software/sycl/intel/bin/clang-9+0x2c0f49e)
#17 0x0000555d3502ba7b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/jan/software/sycl/intel/bin/clang-9+0x2d09a7b)
#18 0x0000555d32eeae3f cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/jan/software/sycl/intel/bin/clang-9+0xbc8e3f)
#19 0x0000555d32e63bb4 main (/home/jan/software/sycl/intel/bin/clang-9+0xb41bb4)
#20 0x00007fade07e4b6b __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:342:3
#21 0x0000555d32ee89ba _start (/home/jan/software/sycl/intel/bin/clang-9+0xbc69ba)
clang-9: error: unable to execute command: Segmentation fault (core dumped)

Second pass

Stack dump:
0.	Program arguments: /home/jan/software/sycl/intel/bin/clang-9 -cc1 -triple spir64-unknown-linux-sycldevice -std=c++11 -fsycl-is-device -aux-triple x86_64-unknown-linux-gnu -disable-llvm-passes -fsyntax-only -disable-free -disable-llvm-verifier -discard-value-names -main-file-name clang-crash.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -mconstructor-aliases -fuse-init-array -dwarf-column-info -debugger-tuning=gdb -resource-dir /home/jan/software/sycl/intel/lib/clang/9.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/local/include -internal-isystem /home/jan/software/sycl/intel/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/jan/software/sycl/intel/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++2a -fdeprecated-macro -fdebug-compilation-dir /home/jan/workspace/sycl-issues -ferror-limit 19 -fmessage-length 0 -fno-implicit-modules -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/clang-crash-ad09bb.h -x c++ clang-crash.cpp -fsycl-int-header=/tmp/clang-crash-ad09bb.h -faddrsig 
1.	<eof> parser at end of file
 #0 0x000056399827e15a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/jan/software/sycl/intel/bin/clang-9+0x257d15a)
 #1 0x000056399827be74 llvm::sys::RunSignalHandlers() (/home/jan/software/sycl/intel/bin/clang-9+0x257ae74)
 #2 0x000056399827bff5 SignalHandler(int) (/home/jan/software/sycl/intel/bin/clang-9+0x257aff5)
 #3 0x00007f23ba731f40 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13f40)
 #4 0x0000563999eb481d MarkDeviceFunction::CheckSYCLType(clang::QualType, clang::SourceRange) (/home/jan/software/sycl/intel/bin/clang-9+0x41b381d)
 #5 0x0000563999ee36e1 clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseTypeAliasDecl(clang::TypeAliasDecl*) (/home/jan/software/sycl/intel/bin/clang-9+0x41e26e1)
 #6 0x0000563999ed56ad clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseDecl(clang::Decl*) (/home/jan/software/sycl/intel/bin/clang-9+0x41d46ad)
 #7 0x0000563999ed7908 clang::RecursiveASTVisitor<MarkDeviceFunction>::dataTraverseNode(clang::Stmt*, llvm::SmallVectorImpl<llvm::PointerIntPair<clang::Stmt*, 1u, bool, llvm::PointerLikeTypeTraits<clang::Stmt*>, llvm::PointerIntPairInfo<clang::Stmt*, 1u, llvm::PointerLikeTypeTraits<clang::Stmt*> > > >*) (/home/jan/software/sycl/intel/bin/clang-9+0x41d6908)
 #8 0x0000563999ee0cd5 clang::RecursiveASTVisitor<MarkDeviceFunction>::TraverseStmt(clang::Stmt*, llvm::SmallVectorImpl<llvm::PointerIntPair<clang::Stmt*, 1u, bool, llvm::PointerLikeTypeTraits<clang::Stmt*>, llvm::PointerIntPairInfo<clang::Stmt*, 1u, llvm::PointerLikeTypeTraits<clang::Stmt*> > > >*) (.constprop.6100) (/home/jan/software/sycl/intel/bin/clang-9+0x41dfcd5)
 #9 0x0000563999ee9869 clang::Sema::MarkDevice() (/home/jan/software/sycl/intel/bin/clang-9+0x41e8869)
#10 0x0000563999a28272 clang::Sema::ActOnEndOfTranslationUnitFragment(clang::Sema::TUFragmentKind) (.part.1414) (/home/jan/software/sycl/intel/bin/clang-9+0x3d27272)
#11 0x0000563999a28339 clang::Sema::ActOnEndOfTranslationUnit() (/home/jan/software/sycl/intel/bin/clang-9+0x3d27339)
#12 0x0000563999901a01 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) (/home/jan/software/sycl/intel/bin/clang-9+0x3c00a01)
#13 0x00005639998f5528 clang::ParseAST(clang::Sema&, bool, bool) (/home/jan/software/sycl/intel/bin/clang-9+0x3bf4528)
#14 0x000056399894cc56 clang::FrontendAction::Execute() (/home/jan/software/sycl/intel/bin/clang-9+0x2c4bc56)
#15 0x000056399891049e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/jan/software/sycl/intel/bin/clang-9+0x2c0f49e)
#16 0x0000563998a0aa7b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/jan/software/sycl/intel/bin/clang-9+0x2d09a7b)
#17 0x00005639968c9e3f cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/jan/software/sycl/intel/bin/clang-9+0xbc8e3f)
#18 0x0000563996842bb4 main (/home/jan/software/sycl/intel/bin/clang-9+0xb41bb4)
#19 0x00007f23ba209b6b __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:342:3
#20 0x00005639968c79ba _start (/home/jan/software/sycl/intel/bin/clang-9+0xbc69ba)
clang-9: error: unable to execute command: Segmentation fault (core dumped)
clang-9: error: clang frontend command failed due to signal (use -v to see invocation)
clang-9: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0 (https://github.com/intel/llvm.git 55e9753c087d2e3f735d60d09a41dc951ace40c6)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jan/software/sycl/intel/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg: Error generating preprocessed source(s).

Source code

#include <tuple>
#include <utility>

#include <boost/hana/transform.hpp>
#include <boost/hana/ext/std/tuple.hpp>
#include <boost/hana/fwd/for_each.hpp>
#include <boost/hana/fwd/unpack.hpp>

#include <CL/sycl.hpp>

struct general {};
struct special : general {};
template <typename> struct acc_t { using type = int; };

template <typename Val,
          typename acc_t<decltype(std::declval<Val>().is_placeholder())>::type = 0>
auto require(cl::sycl::handler& cgh, Val&& val, special)
{
    cgh.require(val);
}

template <typename Val>
auto require(cl::sycl::handler&, Val&& val, general)
{}

template <typename Val,
          typename acc_t<decltype(std::declval<Val>().get_pointer())>::type = 0>
auto get_pointer(Val&& val, special)
{
    auto ptr = val.get_pointer();
    return static_cast<typename decltype(ptr)::element_type*>(ptr); // evil line
}

template <typename Val>
auto get_pointer(Val&& val, general)
{
    return val;
}

template <typename Func, typename... Args>
struct generic_kernel
{
    Func m_func;
    std::tuple<Args...> m_args;

    generic_kernel(Func&& func, Args&&... args)
    : m_func{func}, m_args{args...}
    {}

    auto operator()(cl::sycl::handler& cgh)
    {
        boost::hana::for_each(m_args, [&](auto&& arg)
        {
            require(cgh, std::forward<decltype(arg)>(arg), special{});
        });

        // copy by value to prevent 'this' pointer in kernel
        auto k_func = m_func;
        auto k_args = m_args;

        cgh.single_task<class dummy>([=]()
        {
            auto transformed_args = boost::hana::transform(k_args,
            [](auto&& val)
            {
                return get_pointer(std::forward<decltype(val)>(val), special{});
            });

            std::apply(k_func, transformed_args);
        });
    }
};

struct vec_add
{
    auto operator()(int* a, int* b, int* c, std::size_t size) const
    {
        for(auto i = 0; i < size; ++i)
            c[i] = a[i] + b[i];
    }
};

int main()
{
    auto queue = cl::sycl::queue{};
    
    using buf_type = cl::sycl::buffer<int, 1>;
    using read_acc = cl::sycl::accessor<int, 1, cl::sycl::access::mode::read,
                                                cl::sycl::access::target::global_buffer,
                                                cl::sycl::access::placeholder::true_t>;
    using write_acc = cl::sycl::accessor<int, 1, cl::sycl::access::mode::write,
                                                 cl::sycl::access::target::global_buffer,
                                                 cl::sycl::access::placeholder::true_t>;

    auto buf_a = buf_type{1024ul};
    auto buf_b = buf_type{1024ul};
    auto buf_c = buf_type{1024ul};

    auto k = generic_kernel{vec_add{}, read_acc{buf_a}, read_acc{buf_b}, write_acc{buf_c}, 1024ul};
    queue.submit(k);
    queue.wait();

    return 0;
}

Command line

$ clang++ -std=c++2a -fsycl clang-crash.cpp -lOpenCL

[SYCL] [SPEC] USM Feedback

Some comments, questions and feedback about the SYCL USM proposal, very interesting work!

1. Use context instead of device?

All malloc functions target a specific device in the system. However, allocations on SYCL/OpenCL are bound to a context, rather than a specific device. For example, In some platforms, two devices sharing an OpenCL context can also share memory allocations. It is also possible for the same device to have different context's with different allocations on them.

From a SYCL implementation perspective, for the SYCL Runtime to be able to track USM allocations (at least to be aware of their existence and enable conversion to sycl::buffers).
It would be useful to understand to which SYCL context a given allocation belongs to.

This will simplify SYCL implementations "emulating" USM behavior using existing OpenCL buffers (e.g. like we do with the virtual pointer utility in the SDK), because existing allocations can be implemented alongside traditional cl_mem objects

2. Use namespace instead of sycl_ prefix

Is there any particular reason for all functions being pre-fixed with sycl_ ? Seems more natural to use a namespace (sycl::allocate?) in C++. It can still be used as a replacement of any allocation function in the same way, but allows for C++ users to write more generic code.

3. Default selection of device*

The allocation function forms that take no device as parameter is said to "use the device selected by the default selector on success".

However, Note that there is no guarantee on the SYCL specification that a default device selector will chose every time the same device: This means that two consecutive sycl_malloc may put data on different devices.

4. sycl_memcpy and sycl_memset interface*

4.1 Is there any particular reason sycl_memcpy on the handler (a) needs the sycl_ prefix and (b) cannot be done simply by overloading the existing copy method?

4.2. There is no other explicit operation in the SYCL queue, so the direct sycl_memcpy queue operation seems odd. It makes sense for the in-order queue to have it there, but this is adding additional functionality to the SYCL queue for USM that is not matched with buffers at this point.

5. sycl_mem_advise` advice parameter

Is there any particular reason why the device-defined advice for the specified allocation is of type int? Can it not be a template type so implementations can use whatever they prefer?
Ideally from my point of view, this should be using the SYCL properties mechanism so we have a more extensible (and coherent) interface for specifying custom behaviours on the API.

6. get_pointer_info query:

memory::allocation_type returns the type of allocation, but would it be possible to return more information? In particular, in which device/context a given pointer has been allocated can be useful to track allocations on different devices that do not necessarily can share them

7. Table 1, USM device information descriptors

If I understand correctly, the device descriptors additional properties that can be queried from the get_info method of the device class. If so:

7.1 Wording of the table indicates "adds a requirement" but this are info queries. Seems wording in general should change to indicate that (I can do PR if this helps)

7.2 What is the expected value of "info::memory::shared_granularity" for non-shared allocations? should this raise an error or be 0?

7.3 info::memory::valid_shared_devices returns a vector of device objects that can access a shared allocation. Shouldn't this devices be part of the same context anyway? If that is the case, why not simply return a context? See point 8 for details.

8. Multiple devices and USM

Its not clear to me at this point how multiple device allocations will work, when allocations are possible and when they can migrate across devices.
Seems to me it will be clearer if USM is associated with SYCL context objects rather than individual devices. A SYCL context can encapsulate one or multiple devices, so all devices in a given context will share the same USM allocations. This doesn't affect the simple interfaces for sycl_malloc that don't take a device, and will only require one extra step from users to create a context before using the interfaces currently taking a device.

The info queries for "info::memory::valid_shared_devices" are now unnecessary, since, by definition, all devices on the SYCL context will be able to share the allocation.

A SYCL user that wants to ensure the allocations and the queue are using the same underlying resources, can keep the context alive and use it to create the queue, e.g:

// Platform with two devices that can share allocations
// Normal context creation in SYCL, context associated with the default device(s)
sycl::context myContext{default_device_selector()};
// Allocation bound to all devices on said context
void * myPointer = sycl_malloc(sizeof(float)*1024, myContext);
// SYCL queue created on the same context,
sycl::queue myQueue(myContext, myContext.get_devices()[0]);
sycl::queue myQueue(myContext, myContext.get_devices()[1]);

This has the associated benefit the context can be used to track USM allocations, which simplifies some operations such as keeping track of used memory from the SYCL runtime.

9. Conversions between USM pointers and Buffers

9.1 Why a new use_usm_pointer property and not use the existing use_host_ptr one?
9.2 host_no_access is something we implemented as a vendor extension in https://github.com/codeplaysoftware/standards-proposals/blob/master/host_access/sycl-1.2.1/host_access.md which may give some more flexibility

10. SYCL scheduling - DAGs

10.1 The example interface uses sycl_malloc with a template parameter which is not described in the sections above. I rather prefer that format than the different malloc functions :-)

11. Kernel capturing pointers:

The assumption here is that USM refers to allocations in what OpenCL would call "global" memory space.

11.1 SYCL 1.2.1 Section 6.3 restrictions on kernels states that:

[...] Structures containing pointers may be shared but the value of any pointer passed between SYCL devices or between the host and a SYCL device is undefined

However, when USM is available, pointers captured by lambdas are USM pointers - not undefined. This is a significant change on SYCL applications which particularly affects library developers: Even when they don't write their kernels to support USM, the code may be compiled with USM support. This means that pointers that were meaningless and ignored before now are expected to map to global address space.

11.2 In SYCL 1.2.1 Section 6.8:

[...] If no other rule above can be applied to a declaration of a pointer, then it is assumed to be in the private address space. This default assumption is expected to change to be the generic address space for OpenCL versions that support the generic address space.

In the case, the pointers captured by the kernel lambda will be pointers to global memory, rather than private. This changes the address-space deduction rules, which can change what routines get called further down the line (e.g. calling __global specializations vs __private ones).

If USM is relying on generic pointer support this is less of a problem, but we have then to be a bit careful with the potential fragmentation of the ecosystem: Some kernels will be written from USM, and need to be compiled with such support enabled, and some others will not.

11.3 Is USM a feature that is known at compile time? (e.g. compilation flag)

12. USM to Multi pointer:

12.1 Is it possible to convert a USM pointer to a multi-pointer? do you plan to offer an interface for that?

[SYCL] Compiler cast constant address space to generic address space

The following code compiles without any complain

template <typename T, size_t N>
void test(const std::array<T, N> &VA, const std::array<T, N> &VB,
          std::array<T, N> &VC, bool useCst) {
  cl::sycl::queue deviceQueue;
  cl::sycl::range<1> numOfItems{N};
  cl::sycl::buffer<T, 1> bufferA(VA.data(), numOfItems);
  cl::sycl::buffer<T, 1> bufferB(VB.data(), numOfItems);
  cl::sycl::buffer<T, 1> bufferC(VC.data(), numOfItems);

  deviceQueue.submit([&](cl::sycl::handler &cgh) {
    auto accessorA = bufferA.template get_access<sycl_read>(cgh);
    cl::sycl::accessor<T, 1, cl::sycl::access::mode::read,
                       cl::sycl::access::target::constant_buffer,
                       cl::sycl::access::placeholder::false_t>
        accessorB(bufferB, cgh);
    auto accessorC = bufferC.template get_access<sycl_write>(cgh);

    cgh.parallel_for<class Test<T>>(numOfItems,
    [=](cl::sycl::id<1> wiID) {
      T *my_ptr = useCst ? &accessorB.get_pointer()[wiID[0]]
                         : &accessorC.get_pointer()[wiID[0]];
      *my_ptr = accessorA[wiID];
    });
  });
}

Although this force an illegal address space cast as the SPIR-V validator says:

error: line 491: Expected input to have storage class Generic: GenericCastToPtr
  %372 = OpGenericCastToPtr %_ptr_Function_uint %371

[SYCL] Support for cross-compilation?

A quick look at the code suggests that cross-compiling SYCL apps isn't currently possible. Is there any plan to add support? x86->ARM would be particularly interesting.

[SYCL] Possible bug/typo in GetStartedWithSYCLCompiler.md example

A little bit of a silly issue possibly and I may be incorrect and sorry if I am, but I think the NEOGPUDeviceSelector example is perhaps wrong, the following line is using the find function of std::string:

return Device.is_gpu() && DeviceName.find("HD Graphics NEO") ? 1 : -1;

But I think it's assuming the result is a bool that will return true if its found and false otherwise or at least it reads that way at a glance. In that case the result of the overall expression would be true and return 1 if it was a GPU and the correct device.

But I believe find returns a size_t (size_type) and if the string is found its returns the start index of the string, otherwise it returns the maximum value size_type can contain.

So I think in this case the ternary operator is going to do the opposite of what the intent is (or at least at glance value its the opposite) which is return -1 if the value is found (as find returns a 0). Whereas in every other case the ternary will return -1 (as find returns a positive value the maximum of size_type).

So because of this I think the whole statements doing the opposite of what it looks like is intended and return -1 whenever the device is found and 1 every other time it's invoked.

I think changing it to the below will give the correct result:

return Device.is_gpu() && (DeviceName.find("HD Graphics NEO") != std::string::npos) ? 1 : -1;

Got a little bit carried away and could have written this a lot more concisely, sorry!

[SYCL] Cannot compile some standard library functions in SYCL mode

The following (silly) code:

#include <iostream>
#include <vector>
#include <algorithm>

int main () {
  std::vector<int> foo;

  foo.emplace_back(33);

  std::for_each(std::begin(foo), std::end(foo), [&](int& elem) {
      std::cout << elem << std::endl;
  });

  return foo.size();
};

Fails to compile in SYCL mode:

$ intel_sycl -std=c++11 -fsycl  k.cpp  -lOpenCL -lsycl
In file included from k.cpp:3:
/home/ruyman/Projects/sycl/build/lib/clang/8.0.0/include/sycl_wrappers/algorithm:26:10: error: 
      function 'std::for_each<__gnu_cxx::__normal_iterator<int *, std::vector<int,
      std::allocator<int> > >, (lambda at k.cpp:10:49)>' is used but not defined in this
      translation unit, and cannot be defined in any other translation unit because its type does
      not have linkage
Function for_each(InputIterator first, InputIterator last, Function f);
         ^
k.cpp:10:8: note: used here
  std::for_each(std::begin(foo), std::end(foo), [&](int& elem) {
       ^
1 error generated.
In file included from k.cpp:3:
/home/ruyman/Projects/sycl/build/lib/clang/8.0.0/include/sycl_wrappers/algorithm:26:10: error: 
      function 'std::for_each<__gnu_cxx::__normal_iterator<int *, std::vector<int,
      std::allocator<int> > >, (lambda at k.cpp:10:49)>' is used but not defined in this
      translation unit, and cannot be defined in any other translation unit because its type does
      not have linkage
Function for_each(InputIterator first, InputIterator last, Function f);
         ^
k.cpp:10:8: note: used here
  std::for_each(std::begin(foo), std::end(foo), [&](int& elem) {
       ^
1 error generated.

but works without SYCL on the command line:

$ intel_sycl -std=c++11  k.cpp  -lOpenCL -lsycl
$ 

Is this expected?
Am I correct in understanding that the SYCL mode requires different C++ headers, and that some alias are missing?
Is there any particular reason the normal system headers cannot be used? Note there is no SYCL code in that example.

How to make SYCL easily extensible without changing Clang

Since SYCL is focusing on heterogeneous computing and bare-metal efficiency in modern C++, it will require a lot of extensions in Clang/LLVM under the hood, such as vendor-specific built-ins, decorations, attributes...

A good example is fbffaab and obviously we need FPGA extensions. :-)
This is a very good example of the kinds of information we need to carry.

But these extensions are difficult to maintain in a fork with all the merge conflicts or even to be accepted upstream just because of their narrow use-case. So we should think about a way to express in a generic way in Clang/LLVM some attributes, intrinsics, etc.

It would be nice to be able to declare some C++ constructions that abstracts in a nice way the extensions, for example with new types, decorator functions, properties... These constructions at the C++ level would be defined in an extension-specific header, using intrinsic functions or some kind of decoration.

If the back-end is using some "pseudo-intrinsics" (user-defined functions that behaves like intrinsics from the back-end point-of-view), it seems rather easy. You can see an implementation example under https://github.com/triSYCL/triSYCL/tree/master/include/CL/sycl/vendor/Xilinx

But this does not work to decorate other things like types...

Some languages like D have the concept of user-defined attributes https://dlang.org/spec/attribute.html#uda with some introspection mechanism which could serve as a source of inspiration. But it does not provide some way to synthesize some kinds of internal IR decorations.

C++ it-self has the concept of attributes since C++11 https://en.cppreference.com/w/cpp/language/attributes but there is no generic way to implement them into Clang/LLVM as far as I know.

Now we have contracts in C++20, there are 3 new attributes https://en.cppreference.com/w/cpp/language/attributes/contract which do use a C++ expression, so we have at least a source of inspiration about how to implement some part of the mechanic to access to expressions, and indirectly to types through the typeid() operator.

A first approach could be to introduce a new Clang รผber-attribute able to implement everything with a small DSL. For example

[[ using clang::attribute::generic :
     llvm::function::attribute::inaccessiblememonly,
     llvm::function::attribute::builtin,
     llvm::function::attribute::convergent,
     llvm::function::attribute::noduplicate,
     llvm::function::attribute::alignstack(4),
     llvm::metadata::callees(f, g),
     llvm::metadata::loop::distribute::enable ]]

Would allow to synthesize some specific metadata https://llvm.org/docs/LangRef.html#metadata,
There is already such a DSL in C/C++ compilers for decades to deal with extensions like inline assembler expressions, see https://llvm.org/docs/LangRef.html#inline-assembler-expressions for example.

This could be a RFC for the Clang mailing list.

What do you think?

[SYCL] segfault in clCreateProgramWithIL with OpenCL 2.0

My distro (Fedora 29) provides OpenCL with version reported as "OpenCL 2.0 beignet 1.3", which is incapable of using SPIR-V.

So when we unconditionally run clCreateProgramWithIL because we have a SPIR-V binary, we end up crashing with a SIGSEGV.

IMO, we should try to deal with this a bit more gracefully.

[SYCL] compiler issues an error when std::nullptr_t is used as a kernel name type

The following test fails to compile:
null_name.cpp.txt

$ clang++ -fsycl null_name.cpp -lOpenCL
In file included from :1:
/tmp/null_name-36241f.h:41:31: error: unknown type name 'nullptr_t'; did you mean 'std::nullptr_t'?
template <> struct KernelInfo<nullptr_t> {
^~~~~~~~~
std::nullptr_t
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/x86_64-linux-gnu/c++/7.3.0/bits/c++config.h:235:29: note: 'std::nullptr_t' declared here
typedef decltype(nullptr) nullptr_t;
^
1 error generated.

[AddressSpace] Intel OpenCL CPU RT crashes with new address space rules

After #242 Intel OpenCL CPU RT crashes on several LIT tests:

Failing Tests (2):
    SYCLUnitTests :: basic_tests/boolean.cpp
    SYCLUnitTests :: basic_tests/swizzle_op.cpp

This is a known issue and it will be fixed in the next release of Intel OpenCL CPU RT.
Use DISABLE_INFER_AS=1 environment variable as a workaround to switch back to the old address space rules.

[SYCL] Seg fault using simple float4 buffer

The program below gives an unexpected segmentation fault when run on 64-bit Ubuntu 19.04. I compile it with clang++ -std=c++11 -fsycl -lOpenCL float4-intel.cpp.

#include <CL/sycl.hpp>
  
namespace k { struct a; }

int main(int argc, char *argv[])
{
  using namespace cl::sycl;

  const unsigned sz{16};
  float4 v[sz];
  const range<1> r{sz};
  queue q;

  buffer<float4,1> buf(v,sz);

  q.submit([&](handler &cgh) {
    auto acc = buf.get_access<access::mode::read_write>(cgh);
    cgh.parallel_for<k::a>(r, [=](const item<1> ix) {
      acc[ix] = acc[ix];
    });
  });

  return 0;
}

[SYCL][Compilation] Question on Mangling of Record Types

So I was looking through some Kernel IR recently and I had a question on why we're currently mangling RecordDecl's during CodeGen inside of the addRecordTypeName function, snippet here: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenTypes.cpp#L64

There is a comment right now but it isn't too descriptive (probably because I lack the background knowledge) to me about what the problem it's solving is, to me it seems to be a Re-flower thing, as it mentions reflown to a proper name? If it is a Re-flower related segment of code, do we still require it as we've removed the Re-flower pass?

We currently don't have an issue with this piece of code, but I was interested in what the workaround is currently for! Thanks.

[SYCL] Host compilation failure for vec4 sqrt (likely other math functions as well)

#include <CL/sycl.hpp>
using namespace cl::sycl;

int main() {
  queue q;
  q.submit([&](handler &cgh) {
      cgh.single_task<no_known_host_conversion>([=]() {
        float4 f4{2.0};
        auto res = cl::sycl::sqrt(f4);
      });
  });
  q.wait();
  return 0;
}

The host compilation fails using the above code as it cannot find sqrt overloads for cl::sycl::vec<float, 4> as it appears to default to std::sqrt for host rather than a vec4 implementation. This happens when compiling to fatbinary or compiling host/device separately (device compiles, host fails in this case).

Example error output:

../build/lib/clang/9.0.0/include/CL/sycl.hpp:27:
../build/lib/clang/9.0.0/include/CL/sycl/math.hpp:300:10: error: no matching function for call to 'sqrt'
return __sycl_std::sqrt(x);
^~~~~~~~~~~~~~~~
math_host_conflict.cpp:16:30: note: in instantiation of function template specialization 'cl::sycl::sqrt<cl::sycl::vec<float, 4> >' requested here
auto res = cl::sycl::sqrt(f4);
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:143:13: note: candidate function not viable: no known conversion from 'cl::sycl::vec<float, 4>' to 'double' for 1st
argument
__MATHCALL (sqrt,, (Mdouble __x));
^
/usr/include/math.h:273:25: note: expanded from macro '__MATHCALL'
__MATHDECL (Mdouble,function,suffix, args)
^
/usr/include/math.h:275:22: note: expanded from macro '__MATHDECL'
__MATHDECL_1(type, function,suffix, args);
^
/usr/include/math.h:283:31: note: expanded from macro '__MATHDECL_1'
extern type __MATH_PRECNAME(function,suffix) args __THROW
^
/usr/include/math.h:286:42: note: expanded from macro '__MATH_PRECNAME'
#define __MATH_PRECNAME(name,r) __CONCAT(name,r)
^
/usr/include/x86_64-linux-gnu/sys/cdefs.h:105:23: note: expanded from macro '__CONCAT'
#define __CONCAT(x,y) x ## y
^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:463:3: note: candidate function not viable: no known conversion from 'cl::sycl::vec<float, 4>' to 'float'
for 1st argument
sqrt(float __x)
^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:467:3: note: candidate function not viable: no known conversion from 'cl::sycl::vec<float, 4>' to
'long double' for 1st argument
sqrt(long double __x)
^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:475:5: note: candidate template ignored: substitution failure [with _Tp = cl::sycl::vec<float, 4>]: no
type named '__type' in '__gnu_cxx::__enable_if<false, double>'
sqrt(_Tp __x)
^
1 error generated.

I believe this will fail for more than just vec4's and the sqrt function.

Compilation of the toolchain with dynamic libraries

When compiling the SYCL toolchain with option -DLLVM_LINK_LLVM_DYLIB:BOOL=ON the compilation fails in the SPIRV tool chain:

cd $SYCL_HOME/build          
cmake -DCMAKE_BUILD_TYPE=Release \
-DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
-DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/sycl \
-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm-spirv \
-DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl" -DLLVM_LINK_LLVM_DYLIB:BOOL=ON \
$SYCL_HOME/llvm
make -j`nproc` sycl-toolchain
[...]
/usr/bin/ld: CMakeFiles/llvm-spirv.dir/llvm-spirv.cpp.o: in function `convertSPIRV()::{lambda(std::ostream&)#1}::operator()(std::ostream&) const [clone .isra.67]':
llvm-spirv.cpp:(.text._ZZL12convertSPIRVvENKUlRSoE_clES_.isra.67+0x42): undefined reference to `SPIRV::convertSpirv(std::istream&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, bool)'
/usr/bin/ld: CMakeFiles/llvm-spirv.dir/llvm-spirv.cpp.o: in function `convertLLVMToSPIRV()':
llvm-spirv.cpp:(.text._ZL18convertLLVMToSPIRVv+0x14b): undefined reference to `llvm::writeSpirv(llvm::Module*, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
/usr/bin/ld: llvm-spirv.cpp:(.text._ZL18convertLLVMToSPIRVv+0x1d9): undefined reference to `llvm::writeSpirv(llvm::Module*, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
/usr/bin/ld: llvm-spirv.cpp:(.text._ZL18convertLLVMToSPIRVv+0x207): undefined reference to `SPIRV::SPIRVUseTextFormat'
/usr/bin/ld: CMakeFiles/llvm-spirv.dir/llvm-spirv.cpp.o: in function `convertSPIRVToLLVM()':
llvm-spirv.cpp:(.text._ZL18convertSPIRVToLLVMv+0x78): undefined reference to `llvm::readSpirv(llvm::LLVMContext&, std::istream&, llvm::Module*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
/usr/bin/ld: CMakeFiles/llvm-spirv.dir/llvm-spirv.cpp.o: in function `regularizeLLVM()':
llvm-spirv.cpp:(.text._ZL14regularizeLLVMv+0x137): undefined reference to `llvm::regularizeLlvmForSpirv(llvm::Module*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
collect2: error: ld returned 1 exit status
make[3]: *** [tools/llvm-spirv/tools/llvm-spirv/CMakeFiles/llvm-spirv.dir/build.make:85: bin/llvm-spirv] Error 1
make[2]: *** [CMakeFiles/Makefile2:69666: tools/llvm-spirv/tools/llvm-spirv/CMakeFiles/llvm-spirv.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs...

[SYCL] Question : Which GPU devices are supported?

Hello All,

Thanks for excellent efforts! I am new to OpenCL/SYCL ecosystem and would like to clarify about which GPU devices are supported from this sycl branch.

On my linux box I have installed OpenCL and see following devices:

$ clinfo | grep Device
  Device Name                                     Quadro K620
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Device Type                                     GPU
...
  Device Name                                     Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 (Build 0)
  Device Type                                     CPU
...

Using sycl::device::get_devices(), I am able to list both devices. But, can I offload to NVIDIA device? I am assume not (?). (Or, with SPIR-V this is possible?)

It will be great if you could clarify/confirm.

[SYCL] building fails on Fedora/gcc

In file included from /home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl.hpp:46,
from /home/airlied/devel/compute/intel/llvm/sycl/tools/sycl-check.cpp:10:
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp: In member function โ€˜void cl::sycl::simple_scheduler::Node::addAccRequirement(cl::sycl::accessor<dataT, dimensions, accessMode, accessTarget, isPlaceholder>&&, int)โ€™:
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp:66:48: error: expected โ€˜,โ€™ or โ€˜;โ€™ before โ€˜::โ€™ token
isPlaceholder>::__impl()
^~
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp: In member function โ€˜void cl::sycl::simple_scheduler::Node::addExplicitMemOp(cl::sycl::accessor<dataT, dimensions, accessMode, accessTarget, isPlaceholder>&, T)โ€™:
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp:133:62: error: expected โ€˜,โ€™ or โ€˜;โ€™ before โ€˜::โ€™ token
isPlaceholder>::__impl();
^~
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp: In member function โ€˜void cl::sycl::simple_scheduler::Node::addExplicitMemOp(cl::sycl::accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder_src>, cl::sycl::accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder_dest>)โ€™:
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp:158:64: error: expected โ€˜,โ€™ or โ€˜;โ€™ before โ€˜::โ€™ token
isPlaceholder_src>::__impl();
^~
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp:163:54: error: expected โ€˜,โ€™ or โ€˜;โ€™ before โ€˜::โ€™ token
isPlaceholder_dest>::__impl();
^~
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp: In member function โ€˜void cl::sycl::simple_scheduler::Scheduler::updateHost(cl::sycl::accessor<dataT, dimensions, accessMode, accessTarget, isPlaceholder>&, cl::sycl::event&)โ€™:
/home/airlied/devel/compute/intel/llvm/sycl/include/CL/sycl/detail/scheduler/scheduler.cpp:196:60: error: expected โ€˜,โ€™ or โ€˜;โ€™ before โ€˜::โ€™ token
isPlaceholder>::__impl();
^~
Converting :: to . seems to fix this particular one. I'll send a PR for it.

[SYCL] Possible scheduler 2.0 deadlock with write buffer sync

So I have two example snippets of code that write to a single value in a buffer twice using two kernels and use the accessor functionality to read on host and write on device (I believe both snippets are legal SYCL code, but please do correct me if I am wrong and making some incorrect assumptions).

The first doesn't work but the second does, the only difference (from a user perspective) is the braces { } around the submit calls, which I believe forces a wait/synchronization event in SYCL (perhaps I am misunderstanding however). They both work with the old scheduler when -DSCHEDULER_10 is passed to the compiler, which leads me to think that it's less the legality of the two examples and more some incorrect synchronization event.

Tested with following command and unaltered top of the tree (as of May 14th): $ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl scheduler_2_buffer_block.cpp -o scheduler_2_buffer_block -lOpenCL

I tinkered with this for a while, from what I've found:

  • If the get_access inside the second kernel is just a read accessor, it won't block.
  • It seems to block when waiting on the second kernel to complete, however it doesn't appear to be the kernel blocking it seems to be a dependent event generated from the clEnqueueUnmapMemObject invocation from memory_manager.cpp (can comment out the contents of unmap and the non-working snippet should work).
  • You can replace the second host side get_access with a queue wait and it'll still block
  • As far as the OpenCL runtime (and I) can tell, the OpenCL events generated aren't erroneous

Before I decide to dig any deeper I thought it might be worth finding out if this is a bug or a misconception/silliness on my end and if you guys are already aware and working on it!

Invalid, blocks when trying to wait for second kernel submit:

int main() {
  cl::sycl::queue q;
  cl::sycl::buffer<int, 1> ob((int[1]){0}, 1);
  q.submit([&](handler &cgh) {
    auto wb = ob.get_access<access::mode::read_write>(cgh);
    cgh.single_task<class k1>([=]() {
      wb[0] += 1;
    });
  });
  auto rb = ob.get_access<access::mode::read>();
  std::cout << rb[0] << "\n";

  q.submit([&](handler &cgh) {
    auto wb = ob.get_access<access::mode::read_write>(cgh);
    cgh.single_task<class k2>([=]() {
      wb[0] += 1;
    });
  });
  auto rb2 = ob.get_access<access::mode::read>();
  std::cout << rb2[0] << "\n";

  return 0;
}

Valid, no block:

int main() {
  cl::sycl::queue q;
  cl::sycl::buffer<int, 1> ob((int[1]){0}, 1);
  {
    q.submit([&](handler &cgh) {
      auto wb = ob.get_access<access::mode::read_write>(cgh);
      cgh.single_task<class k1>([=]() {
        wb[0] += 1;
      });
    });
    auto rb = ob.get_access<access::mode::read>();
    std::cout << rb[0] << "\n";
  }

  {
    q.submit([&](handler &cgh) {
      auto wb = ob.get_access<access::mode::read_write>(cgh);
      cgh.single_task<class k2>([=]() {
        wb[0] += 1;
      });
    });
    auto rb2 = ob.get_access<access::mode::read>();
    std::cout << rb2[0] << "\n";
  }

  return 0;
}

[SYCL] -save-temps is broken

/opt/sycl/bin/clang++ -save-temps -std=c++11 -fsycl simple-sycl-app.cpp -o simple-sycl-app -lsycl -lOpenCL

clang++: /home/airlied/devel/intel/llvm/clang/lib/Driver/ToolChains/Clang.cpp:5280: virtual void clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, const clang::driver::JobAction&, const clang::driver::InputInfo&, const InputInfoList&, const llvm::opt::ArgList&, const char*) const: Assertion `Tgts && Tgts->getNumValues() && "SYCL offloading has to have targets specified."' failed.

ICE when using libcxx

When using libcxx and a simple example (unrelated to SYCL), then the compiler will ICE. This appears to be due to the use of __global in libcxx:

static locale& __global();

Example
// Originally from cppreference (https://en.cppreference.com/w/cpp/container/array)
#include <string>
#include <iterator>
#include <iostream>
#include <algorithm>
#include <array>

int main()
{
    // construction uses aggregate initialization
    std::array<int, 3> a1{ {1, 2, 3} }; // double-braces required in C++11 prior to the CWG 1270 revision
                                        // (not needed in C++11 after the revision and in C++14 and beyond)
    std::array<int, 3> a2 = {1, 2, 3};  // never required after =
    std::array<std::string, 2> a3 = { std::string("a"), "b" };

    // container operations are supported
    std::sort(a1.begin(), a1.end());
    std::reverse_copy(a2.begin(), a2.end(),
                      std::ostream_iterator<int>(std::cout, " "));

    std::cout << '\n';

    // ranged for loop is supported
    for(const auto& s: a3)
        std::cout << s << ' ';
}
Stack Trace

When invoked as ./bin/clang++ -std=c++11 -stdlib=libc++ repro.cpp -o repro with a build of eacc876:

clang-9: $project/llvm/include/llvm/ADT/StringSwitch.h:189: R llvm::StringSwitch<unsigned int, unsigned int>::operator unsigned int() [T = unsigned int, R = unsigned int]: Assertion `Result && "Fell off the end of a string-switch"' failed.
Stack dump:
0.      Program arguments: $project/build/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name repro.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -resource-dir $project/build/lib/clang/9.0.0 -internal-isystem /usr/include/c++/v1 -internal-isystem /usr/local/include -internal-isystem $project/build/lib/clang/9.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++11 -fdeprecated-macro -fdebug-compilation-dir $project/build/dtw/libcxx-repro -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /run/user/1000/repro-d15a61.o -x c++ repro.cpp -faddrsig
1.      /usr/include/c++/v1/__locale:139:20: current parser token '__global'
2.      /usr/include/c++/v1/__locale:50:1 <Spelling=/usr/include/c++/v1/__config:439:37>: parsing namespace 'std'
3.      /usr/include/c++/v1/__locale:50:1 <Spelling=/usr/include/c++/v1/__config:439:59>: parsing namespace 'std::__1'
4.      /usr/include/c++/v1/__locale:83:1: parsing struct/union/class body 'std::__1::locale'
 #0 0x0000000002ab3d19 llvm::sys::PrintStackTrace(llvm::raw_ostream&) $project/llvm/lib/Support/Unix/Signals.inc:494:11
 #1 0x0000000002ab3ec9 PrintStackTraceSignalHandler(void*) $project/llvm/lib/Support/Unix/Signals.inc:558:1
 #2 0x0000000002ab27d6 llvm::sys::RunSignalHandlers() $project/llvm/lib/Support/Signals.cpp:67:5
 #3 0x0000000002ab457b SignalHandler(int) $project/llvm/lib/Support/Unix/Signals.inc:357:1
 #4 0x00007fade03b9860 __restore_rt (/usr/lib/libpthread.so.0+0x12860)
 #5 0x00007faddfcdfbe0 raise (/usr/lib/libc.so.6+0x35be0)
 #6 0x00007faddfce0dc1 abort (/usr/lib/libc.so.6+0x36dc1)
 #7 0x00007faddfcd86e7 __assert_fail_base (/usr/lib/libc.so.6+0x2e6e7)
 #8 0x00007faddfcd8792 (/usr/lib/libc.so.6+0x2e792)
 #9 0x00000000031d5b05 llvm::StringSwitch<unsigned int, unsigned int>::operator unsigned int() $project/llvm/include/llvm/ADT/StringSwitch.h:190:23
#10 0x0000000006f20c35 getFutureCompatDiagKind(clang::IdentifierInfo const&, clang::LangOptions const&) $project/clang/lib/Lex/Preprocessor.cpp:761:5
#11 0x0000000006f20633 clang::Preprocessor::HandleIdentifier(clang::Token&) $project/clang/lib/Lex/Preprocessor.cpp:844:5
#12 0x0000000006e984c6 clang::Lexer::LexIdentifier(clang::Token&, char const*) $project/clang/lib/Lex/Lexer.cpp:1688:7
#13 0x0000000006e9dd8c clang::Lexer::LexTokenInternal(clang::Token&, bool) $project/clang/lib/Lex/Lexer.cpp:3444:5
#14 0x0000000006e9bb38 clang::Lexer::Lex(clang::Token&) $project/clang/lib/Lex/Lexer.cpp:3152:8
#15 0x0000000006f20cda clang::Preprocessor::Lex(clang::Token&) $project/clang/lib/Lex/Preprocessor.cpp:886:21
#16 0x0000000006ed103e clang::Preprocessor::CachingLex(clang::Token&) $project/clang/lib/Lex/PPCaching.cpp:65:7
#17 0x0000000006f20d19 clang::Preprocessor::Lex(clang::Token&) $project/clang/lib/Lex/Preprocessor.cpp:893:21
#18 0x0000000005713d37 clang::Parser::ConsumeToken() $project/clang/include/clang/Parse/Parser.h:453:12
#19 0x000000000573ca67 clang::Parser::ParseDeclaratorInternal(clang::Declarator&, void (clang::Parser::*)(clang::Declarator&)) $project/clang/lib/Parse/ParseDecl.cpp:5524:24
#20 0x00000000057252f8 clang::Parser::ParseDeclarator(clang::Declarator&) $project/clang/lib/Parse/ParseDecl.cpp:5387:1
#21 0x000000000575b0eb clang::Parser::ParseCXXMemberDeclaratorBeforeInitializer(clang::Declarator&, clang::VirtSpecifiers&, clang::ActionResult<clang::Expr*, true>&, clang::Parser::LateParsedAttrList&) $project/clang/lib/Parse/ParseDeclCXX.cpp:2302:5
#22 0x000000000575c85b clang::Parser::ParseCXXClassMemberDeclaration(clang::AccessSpecifier, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo const&, clang::ParsingDeclRAIIObject*) $project/clang/lib/Parse/ParseDeclCXX.cpp:2670:7
#23 0x000000000575e891 clang::Parser::ParseCXXClassMemberDeclarationWithPragmas(clang::AccessSpecifier&, clang::Parser::ParsedAttributesWithRange&, clang::TypeSpecifierType, clang::Decl*) $project/clang/lib/Parse/ParseDeclCXX.cpp:3137:12
#24 0x0000000005759e2a clang::Parser::ParseCXXMemberSpecification(clang::SourceLocation, clang::SourceLocation, clang::Parser::ParsedAttributesWithRange&, unsigned int, clang::Decl*) $project/clang/lib/Parse/ParseDeclCXX.cpp:3317:7
#25 0x000000000575853f clang::Parser::ParseClassSpecifier(clang::tok::TokenKind, clang::SourceLocation, clang::DeclSpec&, clang::Parser::ParsedTemplateInfo const&, clang::AccessSpecifier, bool, clang::Parser::DeclSpecContext, clang::Parser::ParsedAttributesWithRange&) $project/clang/lib/Parse/ParseDeclCXX.cpp:1954:7
#26 0x000000000573252c clang::Parser::ParseDeclarationSpecifiers(clang::DeclSpec&, clang::Parser::ParsedTemplateInfo const&, clang::AccessSpecifier, clang::Parser::DeclSpecContext, clang::Parser::LateParsedAttrList*) $project/clang/lib/Parse/ParseDecl.cpp:3814:12
#27 0x000000000570e4fd clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) $project/clang/lib/Parse/Parser.cpp:1006:7
#28 0x000000000570e0f0 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) $project/clang/lib/Parse/Parser.cpp:1111:12
#29 0x000000000570d9a0 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) $project/clang/lib/Parse/Parser.cpp:931:12
#30 0x000000000575124a clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) $project/clang/lib/Parse/ParseDeclCXX.cpp:246:7
#31 0x0000000005750bf4 clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) $project/clang/lib/Parse/ParseDeclCXX.cpp:227:18
#32 0x000000000572e902 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) $project/clang/lib/Parse/ParseDecl.cpp:1747:14
#33 0x000000000570d64e clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) $project/clang/lib/Parse/Parser.cpp:885:16
#34 0x000000000575124a clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) $project/clang/lib/Parse/ParseDeclCXX.cpp:246:7
#35 0x0000000005750bf4 clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) $project/clang/lib/Parse/ParseDeclCXX.cpp:227:18
#36 0x000000000572e9fd clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) $project/clang/lib/Parse/ParseDecl.cpp:1753:12
#37 0x000000000570d500 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) $project/clang/lib/Parse/Parser.cpp:864:14
#38 0x000000000570bc7c clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) $project/clang/lib/Parse/Parser.cpp:682:12
#39 0x0000000005707051 clang::ParseAST(clang::Sema&, bool, bool) $project/clang/lib/Parse/ParseAST.cpp:158:16
#40 0x00000000035f0ca2 clang::ASTFrontendAction::ExecuteAction() $project/clang/lib/Frontend/FrontendAction.cpp:1037:1
#41 0x0000000003f4e05f clang::CodeGenAction::ExecuteAction() $project/clang/lib/CodeGen/CodeGenAction.cpp:1057:1
#42 0x00000000035f06c0 clang::FrontendAction::Execute() $project/clang/lib/Frontend/FrontendAction.cpp:938:7
#43 0x0000000003585970 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) $project/clang/lib/Frontend/CompilerInstance.cpp:946:7
#44 0x000000000377a164 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) $project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:273:8
#45 0x00000000011b4e23 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) $project/clang/tools/driver/cc1_main.cpp:225:13
#46 0x00000000011a8981 ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) $project/clang/tools/driver/driver.cpp:309:5
#47 0x00000000011a7d11 main $project/clang/tools/driver/driver.cpp:381:5
#48 0x00007faddfcccb8e __libc_start_main (/usr/lib/libc.so.6+0x22b8e)
#49 0x00000000011a750a _start /build/glibc-2.27/csu/../sysdeps/x86_64/start.S:122:0
clang-9: error: unable to execute command: Aborted
clang-9: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: $project/build/dtw/libcxx-repro/../../bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /run/user/1000/repro-11d120.cpp
clang-9: note: diagnostic msg: /run/user/1000/repro-11d120.sh
clang-9: note: diagnostic msg:

********************

[SYCL]Buffer size zero

I have noticed that It is possible to create a buffer of size zero. e.g.
#include <CL/sycl.hpp> int main(){ auto buff = cl::sycl::buffer<float>(cl::sycl::range<1>(0)); return 0; }
What does buffer size zero mean?

Plan to upstream SYCL to LLVM project.

Milestones

  • Compile device part of "hello world"-like SYCL application.
  • Convert device part of the "hello world"-like SYCL application to SPIR-V format.
  • Run "hello world"-like SYCL application via OpenCL on GPU.
  • Integrate SYCL compilation flow into clang driver.
  • Compile/run multi-source application
  • Compile/run multi-source application linking with an external SYCL library

Components to upstream

Device code outlining

Current approach: we added two new attributes to the compiler: sycl_kernel and sycl_device. SYCL runtime is supposed to use these attributes to mark functions in the single source which are supposed to be compiled for the device. Compiler is supposed to understand if there are other functions/constants/variables needed to compiler.

Idea(s) to try: clang has infrastructure to mark the code compiler is supposed to offload for CUDA. We can utilize this infrastructure. E.g. AdaptiveCpp/AdaptiveCpp#34 (comment) contains some ideas how already supported functionality can be re-used.

Patch to LLVM: https://reviews.llvm.org/D60455

Integration header

Current approach: SYCL device compiler emits "integration header", which provides "implementation defined" part of the host/device ABI (e.g. order of parameters captured by SYCL kernel, SYCL kernel function names, etc.)

Idea(s) to try: integration header is required only if SYCL device compiler is used custom host compiler. To enable compilation flow with clang compiler only, we could emit host-side object file instead of integration header.

Driver support

Current approach: we added new toolchain to support SYCL compilation flow. This toolchain handles host/device compilers invocation, translation to SPIR-V, etc.

Idea(s) to try: current approach can be improved by implementing suggestions from Andrew to make implementation more friendly for adding custom back-ends support.

Device code diagnostics

Current approach: we implemented new diagnostic messages in SemaSYCL.cpp for most of the requirements in SYCL specification.

Idea(s) to try: re-use existing delayed diagnostics engine similar to OpenMP/CUDA. Re-using existing infrastructure for device code outlining should help here too.

SPIR-V format support

Current approach: SYCL compilation flow introduced a new LLVM transformation pass (CxxReflower) that adjust SPIR-V type and SPIR-V built-in function names declared in the SYCL headers to match the format described here: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/docs/SPIRVRepresentationInLLVM.rst

Idea(s) to try: declare SPIR-V built-ins in the SYCL headers, so that transformation is not required and use device specific lib to transform SPIR-V built-ins otherwise. User can link built-ins implementation withย -mlink-builtin-bitcodeย option. We can use SPIR-V built-ins in SYCL headers and define them in device specific library. For instance, Xilinx tool-chain can implement SPIR-V built-ins as SPIR-df built-ins in device specific library, although it's might be better to use something else to avoid name collision between user defined functions and OpenCL built-ins (e.g.ย __xilinx_sqrt).
This approach is used for lowering AMDGPU/NVidia specific built-ins. Here is recent patch adding tool-chain support for linking with device library.
https://reviews.llvm.org/D59321ย - AMDGPU: Teach toolchain to link ROCm device libs. Take a look at the driver test: test/Driver/rocm-device-libs.cl.

SYCL library

Current approach: sub-project in LLVM project. Currently is tested only with Intel OpenCL drivers.

Idea(s) to try: No obvious changes, but can be adjusted based on external users feedback to make it vendor neutral.

[SYCL] Possibly redundant getSpirvSource() call in program_manager.cpp

Fairly irrelevant issue and I could be wrong and misunderstanding the code (I apologize if I am) but I think the new addition of the function createOpenCLProgram inside sycl/source/detail/program_manager/program_manager.cpp makes the call to getSpirvSource() on line 125 inside the function getBuiltOpenCLProgram redundant.

As createOpenCLProgram calls it on line 115 (which is invoked by getBuiltOpenCLProgram) and the return from the invocation on line 125 doesn't appear to be used at the moment. The check for existing Spirv source inside getSpirvSource()probably stops the source getting loaded in twice but thought I'd bring it up in case.

[SYCL] linking sycl gives missing __cpu_model

On Fedora 28 + gcc 8.2.1

CMakeFiles/sycl.dir/source/detail/device_info.cpp.o: In function cl::sycl::detail::get_native_vector_width(unsigned long)': device_info.cpp:(.text._ZN2cl4sycl6detail23get_native_vector_widthEm+0x3): undefined reference to __cpu_model'

I can fix this by manually adding -lgcc to the end of the c++ linker line, but that probably means something has gone wrong elsewhere, just mentioning here in case anyone else hits it.

generated SPIRV contains mangled get_global_id

The SPIR-V generated here using the suggested command line contains a function call out to get_global_id.

This should be lowered by the llvm-spirv code, but isn't and I'm not sure how to fix it.

the llvm-spirv code has an OCL20ToSPIRV lowering code for OpenCL C 2.0, and OCL21toSPIRV lowering for OpenCL C++ 2.1 code.

Now we generate OpenCL C++ 1.0 source version, so neither of these gets hit.

However the OpenCL C++ interface to get_global_id is cl::get_global_id in the CL namespace, so the current SYCL runtime gets this wrong.

I'm not sure what best solution is?

Move to CL C++ 2.1 version (maybe this is a bug in the llvm-spirv, since it's probably meant to be C++ 1.0) and use cl:: for the device only code?

[SYCL] [SPEC] USM Feedback

Is there a reason why this is done through new allocation interfaces rather than with the existing buffer and accessor classes? It seems like all the functionality could be described as new types of buffers corresponding to the tables shown in the proposal. What is the basis for providing a new allocation method in SYCL?

If we must have a new interface, I prefer code that is easy for the reader rather than for the writer and as such, would rather see a single sycl_malloc() call that is parameterized.

sycl_malloc_device(size, device);
sycl_malloc(size, device);

It is not obvious how these two functions are different without looking into the spec in detail. An interface like this:

sycl_malloc(size_t size, size_t alignment, const sycl::device& device, const sycl::memory::allocation_type& type);

sycl_malloc(size, 0, device, sycl::memory::allocation::device);
sycl_malloc(size, 0, device, sycl::memory::allocation::shared);

These two are immediately obvious in how the allocations are different and what they do.

[SYCL] The source code of the project should be commented

The current source code is almost not commented and so very hard to understand.
If this is not done on purpose (such as to have a write-only obfuscated code :-) ) the coding rules should state that the source code has to be clearly commented to have PR merged in.
It will help collaboration and up-streaming too...

[SYCL] Failing to compile SYCL code on C++ mode only

The following simplified application

#include <array>

#include <CL/sycl.hpp>

constexpr cl::sycl::access::mode sycl_read = cl::sycl::access::mode::read;
constexpr cl::sycl::access::mode sycl_write = cl::sycl::access::mode::write;
constexpr size_t N = 10u;

template<typename T>
class SimpleVadd;

int main() {
        using T = int;
       using array_type = std::array<T, N>;
        array_type VA, VB, VC;

  {
                cl::sycl::queue deviceQueue;
                cl::sycl::range<1> numOfItems{N};
                cl::sycl::buffer<T, 1> bufferA(VA.data(), numOfItems);
                cl::sycl::buffer<T, 1> bufferB(VB.data(), numOfItems);
                cl::sycl::buffer<T, 1> bufferC(VC.data(), numOfItems);

                deviceQueue.submit([&](cl::sycl::handler& cgh) {
                        auto accessorA = bufferA.template get_access<sycl_read>(cgh);
                        auto accessorB = bufferB.template get_access<sycl_read>(cgh);
                        auto accessorC = bufferC.template get_access<sycl_write>(cgh);

                        cgh.parallel_for<class SimpleVadd<T>>(numOfItems,
                        [=](cl::sycl::id<1> wiID) {
                                accessorC[wiID] = accessorA[wiID] + accessorB[wiID];
                        });
                });
        }
        return VC[0] == (VA[0] + VB[0]);
}

fails to compile when not passing the -fsycl option to the driver with the following error:

$ intel_sycl --std=c++11 k.cpp  -lOpenCL  -lsycl
In file included from k.cpp:3:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl.hpp:11:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:14:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:10:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl.hpp:15:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/detail/queue_impl.hpp:15:
/home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/handler.hpp:363:13: error: implicit instantiation of
      undefined template 'cl::sycl::detail::KernelInfo<SimpleVadd<int> >'
            KI::getName(), KI::getNumParams(), &KI::getParamDesc(0),
            ^
k.cpp:30:8: note: in instantiation of function template specialization
      'cl::sycl::handler::parallel_for<SimpleVadd<int>, (lambda at k.cpp:31:4), 1>' requested here
                        cgh.parallel_for<class SimpleVadd<T>>(numOfItems,
                            ^
/home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/detail/kernel_desc.hpp:38:40: note: template is
      declared here
template <class KernelNameType> struct KernelInfo;
                                       ^
In file included from k.cpp:3:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl.hpp:11:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:14:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:10:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl.hpp:15:
In file included from /home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/detail/queue_impl.hpp:15:
/home/ruyman/Projects/sycl/build/lib/clang/9.0.0/include/CL/sycl/handler.hpp:363:15: error: incomplete definition of
      type 'cl::sycl::detail::KernelInfo<SimpleVadd<int> >'
            KI::getName(), KI::getNumParams(), &KI::getParamDesc(0),
            ~~^~
2 errors generated.

However, it works with when -fsycl is passed.

Looks like something in the handler header requires/expect the KernelInfo but is not there on non-SYCL mode

[SYCL] event get_profiling_info problem

So another contrived example:

#include <CL/sycl.hpp>
using namespace cl::sycl;

int main() {

  queue q{property::queue::enable_profiling()};
  auto e = q.submit([&](handler &cgh) {
      int w = 512;
      cgh.single_task<class event_wait>([=]() mutable {
        for (int i = 0; i < 512000; ++i)
          ++w;
      });
      printf("%d \n", w);
  });

  auto nsTimeEnd = e.get_profiling_info<info::event_profiling::command_end>();
  return 0;
}

So I have tried this with ComputeCPP (1.0) and the Intel SYCL (ISYCL) compiler in both cases using an Intel OCL selector. In ISYCL the get_profiling_info call will terminate in the SYCL runtime with the following diagnostic:

include/CL/sycl/detail/event_info.hpp:26: OpenCL API returns: -7 (CL_PROFILING_INFO_NOT_AVAILABLE)

Now if you put an explicit queue wait in, all is good in the world and it'll give a reasonable value! So, it seems that e.get_profiling_info is not a blocking call/implicit wait. And this leads to nondeterministic results in this case, some times it'll terminate, sometimes the events completed so you get your result back without problem.

However, the same snippet of code compiled with ComputeCPP seems to lead to a different result, without an explicit queue wait, it still yields a reasonable return value indicating that get_profiling_info is waiting on the event to complete before retrieving the information.

I'm not sure which behavior is correct as far as the specification is concerned (I did skim and I don't think there is a rule specifying get_profiling_info forces a wait till event completion). So this may be more of a specification related issue than an implementation issue, if that's the case I can move the issue.

[SYCL] 11 tests are failing on Ubuntu 18.10 and GCC 8.2.0

Current version bae2b75
has some failing tests on Ubuntu 18.10 and GCC 8.2.0:

********************
Failing Tests (11):
    SYCLUnitTests :: aot/with-llvm-bc.cpp
    SYCLUnitTests :: sub_group/barrier.cpp
    SYCLUnitTests :: sub_group/broadcast.cpp
    SYCLUnitTests :: sub_group/common.cpp
    SYCLUnitTests :: sub_group/common_ocl.cpp
    SYCLUnitTests :: sub_group/info.cpp
    SYCLUnitTests :: sub_group/load_store.cpp
    SYCLUnitTests :: sub_group/reduce.cpp
    SYCLUnitTests :: sub_group/scan.cpp
    SYCLUnitTests :: sub_group/shuffle.cpp
    SYCLUnitTests :: sub_group/vote.cpp

  Expected Passes    : 43607
  Expected Failures  : 172
  Unsupported Tests  : 1292
  Unexpected Failures: 11
make[3]: *** [CMakeFiles/check-all.dir/build.make:58: CMakeFiles/check-all] Error 1
make[2]: *** [CMakeFiles/Makefile2:211: CMakeFiles/check-all.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:218: CMakeFiles/check-all.dir/rule] Error 2
make: *** [Makefile:199: check-all] Error 2

Actually it is also mentionned by @Ruyk in #9

Among the 11, it looks like there are 2 kinds of failures:

FAIL: SYCLUnitTests :: aot/with-llvm-bc.cpp (45019 of 45082)
******************** TEST 'SYCLUnitTests :: aot/with-llvm-bc.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';    /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang -I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers -fsycl -fsycl-targets=spir64-unknown-linux-sycldevice -c /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/aot/with-llvm-bc.cpp -o /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o
: 'RUN: at line 2';    /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang -I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers -fsycl -fsycl-link-targets=spir64-unknown-linux-sycldevice /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o -o /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.spv
: 'RUN: at line 3';   /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/llvm-spirv -r /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.spv -o /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.bc
: 'RUN: at line 4';    /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang -I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers -fsycl -fsycl-add-targets=binary:/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.bc /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o -o /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.out -lOpenCL -lsycl -lstdc++
: 'RUN: at line 7';   env SYCL_DEVICE_TYPE=CPU  /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.out
--
Exit Code: -11

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang" "-I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers" "-fsycl" "-fsycl-targets=spir64-unknown-linux-sycldevice" "-c" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/aot/with-llvm-bc.cpp" "-o" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o"
$ ":" "RUN: at line 2"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang" "-I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers" "-fsycl" "-fsycl-link-targets=spir64-unknown-linux-sycldevice" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o" "-o" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.spv"
$ ":" "RUN: at line 3"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/llvm-spirv" "-r" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.spv" "-o" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.bc"
$ ":" "RUN: at line 4"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang" "-I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers" "-fsycl" "-fsycl-add-targets=binary:/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.bc" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.o" "-o" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.out" "-lOpenCL" "-lsycl" "-lstdc++"
$ ":" "RUN: at line 7"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/aot/Output/with-llvm-bc.cpp.tmp.out"
# command stderr:
Stack dump:
0.	Running pass 'Intel OpenCL Vectorizer' on module 'main'.
1.	Running pass 'Intel OpenCL VectorizerCore' on function ''
2.	Running pass 'OCL Builtin Pre Vectorization Pass' on function ''

error: command failed with exit status: -11

--

********************
FAIL: SYCLUnitTests :: sub_group/broadcast.cpp (45042 of 45082)
******************** TEST 'SYCLUnitTests :: sub_group/broadcast.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';    /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang -I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers -std=c++11 -fsycl /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp -o /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out -lstdc++ -lOpenCL -lsycl
: 'RUN: at line 2';   env SYCL_DEVICE_TYPE=HOST /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out
: 'RUN: at line 3';   env SYCL_DEVICE_TYPE=CPU  /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out
: 'RUN: at line 4';    env SYCL_DEVICE_TYPE=GPU  /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out
: 'RUN: at line 5';   echo /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out
--
Exit Code: 1

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/bin/clang" "-I/home/rkeryell/Xilinx/Projects/OpenCL/Khronos/OpenCL-Headers" "-std=c++11" "-fsycl" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp" "-o" "/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/build/projects/sycl/test/sub_group/Output/broadcast.cpp.tmp.out" "-lstdc++" "-lOpenCL" "-lsycl"
# command stderr:
In file included from /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:14:
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/helper.hpp:105:9: error: call to 'abs' is ambiguous
    if (std::abs(val - ref) != 0) {
        ^~~~~~~~
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:50:7: note: in instantiation of function template specialization 'exit_if_not_equal<unsigned int>' requested here
      exit_if_not_equal<T>(syclacc[j], L * WGid + SGid + SGid * sg_size,
      ^
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:65:3: note: in instantiation of function template specialization 'check<unsigned int>' requested here
  check<unsigned int>(Queue);
  ^
/usr/include/stdlib.h:837:12: note: candidate function
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
           ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:56:3: note: candidate function
  abs(long __i) { return __builtin_labs(__i); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:61:3: note: candidate function
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:70:3: note: candidate function
  abs(double __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:74:3: note: candidate function
  abs(float __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:78:3: note: candidate function
  abs(long double __x)
  ^
In file included from /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:14:
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/helper.hpp:105:9: error: call to 'abs' is ambiguous
    if (std::abs(val - ref) != 0) {
        ^~~~~~~~
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:50:7: note: in instantiation of function template specialization 'exit_if_not_equal<unsigned long>' requested here
      exit_if_not_equal<T>(syclacc[j], L * WGid + SGid + SGid * sg_size,
      ^
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:67:3: note: in instantiation of function template specialization 'check<unsigned long>' requested here
  check<unsigned long>(Queue);
  ^
/usr/include/stdlib.h:837:12: note: candidate function
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
           ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:56:3: note: candidate function
  abs(long __i) { return __builtin_labs(__i); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:61:3: note: candidate function
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:70:3: note: candidate function
  abs(double __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:74:3: note: candidate function
  abs(float __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:78:3: note: candidate function
  abs(long double __x)
  ^
2 errors generated.
In file included from /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:14:
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/helper.hpp:105:9: error: call to 'abs' is ambiguous
    if (std::abs(val - ref) != 0) {
        ^~~~~~~~
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:50:7: note: in instantiation of function template specialization 'exit_if_not_equal<unsigned int>' requested here
      exit_if_not_equal<T>(syclacc[j], L * WGid + SGid + SGid * sg_size,
      ^
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:65:3: note: in instantiation of function template specialization 'check<unsigned int>' requested here
  check<unsigned int>(Queue);
  ^
/usr/include/stdlib.h:837:12: note: candidate function
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
           ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:56:3: note: candidate function
  abs(long __i) { return __builtin_labs(__i); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:61:3: note: candidate function
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:70:3: note: candidate function
  abs(double __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:74:3: note: candidate function
  abs(float __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:78:3: note: candidate function
  abs(long double __x)
  ^
In file included from /home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:14:
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/helper.hpp:105:9: error: call to 'abs' is ambiguous
    if (std::abs(val - ref) != 0) {
        ^~~~~~~~
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:50:7: note: in instantiation of function template specialization 'exit_if_not_equal<unsigned long>' requested here
      exit_if_not_equal<T>(syclacc[j], L * WGid + SGid + SGid * sg_size,
      ^
/home/rkeryell/Xilinx/Projects/LLVM/worktrees/intel/sycl/sycl/test/sub_group/broadcast.cpp:67:3: note: in instantiation of function template specialization 'check<unsigned long>' requested here
  check<unsigned long>(Queue);
  ^
/usr/include/stdlib.h:837:12: note: candidate function
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
           ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:56:3: note: candidate function
  abs(long __i) { return __builtin_labs(__i); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:61:3: note: candidate function
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:70:3: note: candidate function
  abs(double __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:74:3: note: candidate function
  abs(float __x)
  ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:78:3: note: candidate function
  abs(long double __x)
  ^
2 errors generated.

error: command failed with exit status: 1
--
********************

[SYCL] How OpenCL device mode is being reused for SYCL

This is to track the discussion about the device compilation flow in the Clang frontend.

There are multiple ways OpenCL features can be reused:

  • At source level by implementing SYCL libraries using C++ for OpenCL code and enabling OpenCL in LangOpts for SYCL device.
  • At AST level by mapping SYCL constructs onto OpenCL AST nodes.
  • At IR CodeGen level by re-using OpenCL CodeGen functionality.

The objective is to avoid code duplication as much as possible and to keep code base simpler/easier to maintain.

[SYCL] Incorrect Kernel Lambda Capture

#include <CL/sycl.hpp>
using namespace cl::sycl;

int main() {
  queue q;
  q.submit([&](handler &cgh) {
    int w = 512;
    cgh.single_task<class are_you_broken>([=]() {
      printf("%d \n", w);
    });
  });
  q.wait();
  return 0;
}

So it's a pretty simple example, just trying to capture a value and print it, however when using ComputeCPP (1.0) I'll get the correct value 512 inside the single_task, in the Intel implementation I get an arbitrary large integer value (e.g. 946443008). The contrived example should be directly and quickly usable to see if it is indeed a bug in the implementation or if I'm mistaken and its an issue on my end!

I used the following command: clang++ -std=c++11 -fsycl possibly_broken_capture.cpp -o possibly_broken_capture -lsycl -lOpenCL

When I compile just the device side code the header indicates that SemaSYCL does find the parameter and treats it as { kernel_param_kind_t::kind_std_layout, 4, 0 }.

Note: In my test version of the example I have a device selector that forces it to choose Intel OCL, so both SYCL runtimes should find the same OCL implementation.

[SYCL] Cannot use derived accessor objects

The following code:

#include <array>

#include <CL/sycl.hpp>

constexpr cl::sycl::access::mode sycl_read = cl::sycl::access::mode::read;
constexpr cl::sycl::access::mode sycl_write = cl::sycl::access::mode::write;
constexpr size_t N = 10u;

template<typename T>
class SimpleVadd;

template<typename T>
using my_nice_alias = cl::sycl::accessor<T, 1, sycl_write, 
                                        cl::sycl::access::target::global_buffer>;

template<typename T>
struct my_accessor : public my_nice_alias<T> {

  using my_nice_alias<T>::my_nice_alias;
};

int main() {
        using T = int;
  using array_type = std::array<T, N>;

        array_type VA, VB, VC;

  for (size_t i = 0; i < N; i++) {
    VA[i] = VB[i] = i;
  }

  {
                cl::sycl::queue deviceQueue;
                cl::sycl::range<1> numOfItems{N};
                cl::sycl::buffer<T, 1> bufferA(VA.data(), numOfItems);
                cl::sycl::buffer<T, 1> bufferB(VB.data(), numOfItems);
                cl::sycl::buffer<T, 1> bufferC(VC.data(), numOfItems);

                deviceQueue.submit([&](cl::sycl::handler& cgh) {
                        auto accessorA = bufferA.template get_access<sycl_read>(cgh);
                        auto accessorB = bufferB.template get_access<sycl_read>(cgh);
#if NORMAL_ACCESSOR
                        auto accessorC = bufferC.template get_access<sycl_write>(cgh);
#else
                        my_accessor<int> accessorC(bufferC, cgh);
#endif

                        cgh.parallel_for<class SimpleVadd<T>>(numOfItems,
                        [=](cl::sycl::id<1> wiID) {
                                accessorC[wiID] = accessorA[wiID] + accessorB[wiID];
                        });
                });
        }
        return !(VC[0] == (VA[0] + VB[0]));
}

produces incorrrect results when NORMAL_ACCESSOR is not defined, but works fine when it is.

The main difference between the two code paths is that we use a derived class from the SYCL accessor, which is valid from the SYCL specification.
The generated SPIR-V binary doesn't recognize the derived class (my_accessor) as an accessor, and fails to generate the correct code (treats my_accessor as a normal struct). However, no error is reported: It just generates incorrect results at runtime.

Seems this could be to the fact that

bool Util::isSyclAccessorType(const QualType &Ty) {
checks the name of the type to see if it matches the name accessor, which seems to me its a bit fragile for long-term maintenance?
Libraries or users can derive from accessors to create custom types.
I suspect a similar problem may occur if users derive from the stream class.

[SYCL] Calling kernel within library produces CL_INVALID_KERNEL_NAME

I'm facing a runtime error when I call a SYCL kernel, from within a function, which is in a library.

As a MWE I've renamed simple-sycl-app.cpp from GetStartedWithSYCLCompiler.md to simple-sycl-lib.cpp and renamed its main function to simple_sycl. I also added a main.cpp which calls simple_sycl. The commands to build this are listed below:

clang++ -std=c++11 -fsycl -c simple-sycl-lib.cpp -o simple-sycl-lib.o
ar r simple-sycl-lib.a simple-sycl-lib.o
clang++ -std=c++11 -fsycl main.cpp simple-sycl-lib.a -lOpenCL

If I run the resulting a.out executable, the errors below are issued. I've attached the files involved for convenience: simple-sycl-lib.zip

OpenCL API failed. /home/pkeir/code/ext/intel_sycl/sycl/sycl/source/detail/program_manager/program_manager.cpp:112: OpenCL API returns: -46 (CL_INVALID_KERNEL_NAME)
The result is incorrect for element: 0 , expected: 0 , got: 538976288
The result is incorrect for element: 1 , expected: 1 , got: 1886334752
The result is incorrect for element: 2 , expected: 2 , got: 1852387188
The result is incorrect for element: 3 , expected: 3 , got: 795633012

[SYCL] The build system doesn't use the OpenCL_LIBRARY passed in everywhere

There are a number of places where the build system uses a hard-coded reference to OpenCL instead of using the passed in OpenCL_LIBRARY. This prevents the build from succeeding in environments where the OpenCL implementation doesn't reside in a library called OpenCL. The following patch seems to have done the trick for me:

--- a/sycl/CMakeLists.txt
+++ b/sycl/CMakeLists.txt
@@ -32,7 +32,7 @@ set ( LLVM_INST_INC_DIRECTORY "lib${LLVM_LIBDIR_SUFFIX}/clang/${CLANG_VERSION}/i
 find_package(OpenCL REQUIRED)

 include_directories(${OpenCL_INCLUDE_DIRS})
-link_libraries(OpenCL)
+link_libraries(${OpenCL_LIBRARY})

 # Copy SYCL headers
 set(sycl_inc_dir ${CMAKE_CURRENT_SOURCE_DIR}/include/CL)
@@ -96,7 +96,7 @@ add_library("${SYCLLibrary}" SHARED

 include_directories("${SYCLLibrary}" "${includeRootPath}")

-target_link_libraries("${SYCLLibrary}" "${OpenCL_LIBRARIES}")
+target_link_libraries("${SYCLLibrary}" "${OpenCL_LIBRARY}")
 set_target_properties("${SYCLLibrary}" PROPERTIES LINKER_LANGUAGE CXX)

 # Workaround for bug in GCC version 5.
@@ -7,7 +7,7 @@ find_package(OpenCL REQUIRED)
 # All projects need this include directory
 include_directories(${OpenCL_INCLUDE_DIRS})

-link_libraries(OpenCL)
+link_libraries(${OpenCL_LIBRARY})

 add_executable(get_device_count_by_type get_device_count_by_type.cpp)

[SYCL] LLVM ERROR: OCL version mismatch while building test application

Building the simple test application using the 1-step flow results in the following behaviour:

$ ./bin/clang++ -std=c++11 -fsycl -I/path/to/OpenCL-Headers/ simple-sycl-app.cpp -o simple-sycl-app -lsycl  /path/to/opencllib.so
LLVM ERROR: OCL version mismatch
/tmp/simple-sycl-app-e14b6d.o: file not recognised: File truncated
clang-8: error: sycl-link command failed with exit code 1 (use -v to see invocation)
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)

FYI /path/to/opencllib.so contains a version of the SPIRV-LLVM translator.

With the following hack:

--- a/llvm-spirv/lib/SPIRV/OCLUtil.cpp
+++ b/llvm-spirv/lib/SPIRV/OCLUtil.cpp
@@ -231,11 +231,14 @@ unsigned getOCLVersion(Module *M, bool AllowMulti) {
     return std::make_pair(getMDOperandAsInt(MD, 0), getMDOperandAsInt(MD, 1));
   };
   auto Ver = GetVer(0);
+#if 0
   for (unsigned I = 1, E = NamedMD->getNumOperands(); I != E; ++I)
     if (Ver != GetVer(I))
       report_fatal_error("OCL version mismatch");
-
   return encodeOCLVer(Ver.first, Ver.second, 0);
+#else
+  return encodeOCLVer(2, 1, 0);
+#endif
 }

 void decodeMDNode(MDNode *N, unsigned &X, unsigned &Y, unsigned &Z) {

the build succeeds.

[SYCL] Using an aliased type as the type of a non-type template parameter used naming a SYCL kernel

Using a typedef'd type as a non-type template parameter (to name a kernel) produces a compilation error. The error indicates that two declarations are seen - with the second using int rather than the alias name (here sz_t). The code below produces the error.

#include <CL/sycl.hpp>

using sz_t = unsigned long;

namespace kernels {
  template <sz_t> class K;  // Could be size_t         
}

int main(int argc, char *argv[])
{
  using namespace cl::sycl;
  queue q;

  q.submit([&](handler &cgh) {
    cgh.parallel_for<kernels::K<0>>(range<1>{64}, [=](id<1> ix) {
      ix[0];
    });
  });

  return 0;
}

[SYCL] Remove re-flower pass

Current approach with translation to SPIR-V uses re-flower pass. See this section for more details: https://github.com/intel/llvm/blob/sycl/sycl/doc/SYCL_compiler_and_runtime_design.md#integration-with-spir-v-format
To simplify SYCL compiler implementation and upstreaming work we should investigate if it's possible to translate SYCL to SPIR-V w/o this pass.

According to my understanding the main problem this pass solves is name conflicts.
For instance, type names defined for SPIR-V can conflict with user types - e.g. OpTypeEvent.

SPIR-V in LLVM IR format description: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/docs/SPIRVRepresentationInLLVM.rst

[SYCL] OpenCL/SPIR built-ins

The current SYCL run-time implementation is reliant on SPIRV which is a problem if you have a device that doesn't support SPIRV just yet and relies on the old built-ins from OpenCL/SPIR, e.g. get_global_id rather than GlobalInvocationId. This is a bit of a problem for us as one of our ideal use cases is to output LLVM-IR for the device side code and feed it into our other tools which make use of SPIR/OpenCL C built-ins. As currently the IR will contain the SPIRV calls which need to be worked around on our end. It also may be interesting to support both in the long term under the concept that you wish to target multiple devices on the one platform and one does not support SPIRV.

So an initial question is, is this relevant to the current SYCL implementation? Or is this perhaps just something that's relevant to our aims at Xilinx and can thus be ignored (which is fair enough).

If the answer is that it is relevant then the question is how best to implement this and I have came up with some preliminary ideas to get the ball rolling:

  1. Solution we currently have/are working on, include OpenCL C built-in functions inside the cl::__spirv namespace and optionally define them based on a pre-processor macro (at the moment we have only added the ones we are interested in/that are used in the implementation as a minimalist approach, just including the opencl-c.h inside of a namespace is another approach though). An LLVM pass removes the namespace mangling from the name and you end up with the identical SPIR built-in mangling on the device. This seems to avoid host conflicts with same named user functions so far. Device conflicts are a little more tricky as you have to rename the mangling of the user defined function you have a conflict with during the pass after you've removed the namespace manglings from all built-ins.
    The pass could perhaps be turned off and on for SPIRV based on the define that optionally includes OpenCL/SPIR built-ins. The define could work similarly to SYCL__DEVICE_ONLY in that the driver defines it based on some input arguments or it could be user specified on compilation.
    The pass could in theory be part of the Reflower I suppose, but it may be putting too much emphasis on the Reflower at that point. Also having old OpenCL built-ins in the cl::__spirv namespace isn't really ideal if it's not actually SPIRV as it's misleading, so having them in a cl::__spir namespace and perhaps a SPIR folder is a better end goal.
    Note: The original idea for this was hopefully not to have to include any additional SPIR/OpenCL built-ins into the run-time API and to try and impact the compiler as little as possible through having a pass that just converts SPIRV intrinsics to SPIR intrinsics. However some of the cl::__spirv built-ins are trivial to convert in this manner (math.hpp) and others are a little more complex (spirv_vars.hpp) and opt phases can impact how they're translated to SPIR manglings in the pass. So unfortunately addition of the OpenCL C built-ins in someway are a requirement in this solution. At least if we want to keep the pass simple and more maintainable (keeping the pass too linked to SPIRV components in the run-time that may change frequently seems like a nightmare).

  2. Variation of the above except that we define intermediate/placeholder calls that will get replaced by SPIR/SPIRV built-ins by a later pass. This concept of placeholder calls to swap between built-in's may not be great, as it probably falls prey to the same arguments as the accessor class at the moment (string comparisons/brittle).

  3. Another mangling approach along the lines of the above except based on some variation of llvm/lib/Target/AMDGPU/AMDGPULibFunc.h/cpp except implemented for SYCL.h/.cpp (also similar to the Reflower I suppose, so perhaps its just idea 2 with extra steps..), placeholder function is used and mapped to either SPIR or SPIRV based on the target triple or a flag. Unsure how feasible this approach is as certain things in SPIRV don't function the same as in SPIR e.g. get_global_id is a function call but GlobalInvocationId in SPIRV seems to be more akin to CUDA's threadId, so the resulting IR is different.

  4. Sema based approach much like SemaSYCL just now. Placeholder built-in's/calls that have there AST modified by a TreeTransform to be either a SPIR or SPIRV style built-in.

I welcome any ideas/input/feedback on this as there is without a doubt better ways of doing this and flaws in the above cursory ideas. In the above approaches I tunnel visioned a little on our specific needs (e.g. not that bothered about speed of compilation, just need the correct mangled SPIR builtin names in the final LLVM-IR so we can link against our own libraries). I'm also happy to clarify any of the information above if the brief descriptions/spewing of thoughts are unclear!

As an aside: Perhaps a generalized way to offload/map to a specific devices built-ins could be an interesting direction. Alexey also mentioned some interest in a unified format for SPIR intrinsics in LLVM-IR as SPIR 1.2/2.0 is a little lacking. One example of a SPIR problem mentioned in idea 1, is that SPIR 1.2/2.0 manglings can cause conflicts with user defined functions of the same name if they're not handled correctly.

handler2.hpp:573:14: error: no matching member function for call to 'set'

Reproducer

git clone --depth 1 https://github.com/ParRes/Kernels.git PRK
cd PRK/common && cp make.defs.llvm make.defs
# edit make.defs appropriately (see below for suggestions)
cd ../Cxx11 && make stencil-sycl

make.defs hint

#
# SYCL flags
#
SYCLDIR=/opt/sycl
SYCLCXX=${SYCLDIR}/bin/clang++
SYCLFLAG=-fsycl -lsycl -lOpenCL -Wl,-rpath=${SYCLDIR}/lib
SYCLFLAG+=-std=c++17 -O3

Compiler

Freshly built within the past few hours...

jrhammon@jrhammon-nuc:~/PRK/Cxx11$ /opt/sycl/bin/clang++  -v
clang version 9.0.0 (https://github.com/intel/llvm 8a95712bcc9a2a6d34ce66b6b5948c24204fa846)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/sycl/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

Error

This code compiles just fine with CodePlay and triSYCL.

jrhammon@jrhammon-nuc:~/PRK/Cxx11$ make stencil-sycl
/opt/sycl/bin/clang++ -DPRKVERSION="2.16" -fsycl -lsycl -lOpenCL -Wl,-rpath=/opt/sycl/lib -std=c++17 -O3 -DUSE_SYCL -DUSE_2D_INDEXING=0 -DUSE_BOOST_IRANGE  -DUSE_RANGES stencil-sycl.cc -o stencil-sycl
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:10:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star1_1d<float>, (lambda at ./stencil_sycl.hpp:10:92), 2>'
      requested here
    h.parallel_for<class star1_1d<T>>(cl::sycl::range<2> {n-2,n-2}, cl::sycl::id<2> {1,1}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:91:29: note: in instantiation of function template specialization 'star1<float>' requested here
          case 1: stencil = star1; break;
                            ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:49:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star2_1d<float>, (lambda at ./stencil_sycl.hpp:49:92), 2>'
      requested here
    h.parallel_for<class star2_1d<T>>(cl::sycl::range<2> {n-4,n-4}, cl::sycl::id<2> {2,2}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:92:29: note: in instantiation of function template specialization 'star2<float>' requested here
          case 2: stencil = star2; break;
                            ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:98:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star3_1d<float>, (lambda at ./stencil_sycl.hpp:98:92), 2>'
      requested here
    h.parallel_for<class star3_1d<T>>(cl::sycl::range<2> {n-6,n-6}, cl::sycl::id<2> {3,3}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:93:29: note: in instantiation of function template specialization 'star3<float>' requested here
          case 3: stencil = star3; break;
                            ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:157:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star4_1d<float>, (lambda at ./stencil_sycl.hpp:157:92), 2>'
      requested here
    h.parallel_for<class star4_1d<T>>(cl::sycl::range<2> {n-8,n-8}, cl::sycl::id<2> {4,4}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:94:29: note: in instantiation of function template specialization 'star4<float>' requested here
          case 4: stencil = star4; break;
                            ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:226:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star5_1d<float>, (lambda at ./stencil_sycl.hpp:226:94), 2>'
      requested here
    h.parallel_for<class star5_1d<T>>(cl::sycl::range<2> {n-10,n-10}, cl::sycl::id<2> {5,5}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:95:29: note: in instantiation of function template specialization 'star5<float>' requested here
          case 5: stencil = star5; break;
                            ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
stencil-sycl.cc:167:11: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<add<float>, (lambda at stencil-sycl.cc:168:35), 2>' requested
      here
        h.parallel_for<class add<T>>(cl::sycl::range<2> {n, n}, cl::sycl::id<2> {0, 0},
          ^
stencil-sycl.cc:323:9: note: in instantiation of function template specialization 'run<float>' requested here
        run<float>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:10:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star1_1d<double>, (lambda at ./stencil_sycl.hpp:10:92), 2>'
      requested here
    h.parallel_for<class star1_1d<T>>(cl::sycl::range<2> {n-2,n-2}, cl::sycl::id<2> {1,1}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:91:29: note: in instantiation of function template specialization 'star1<double>' requested here
          case 1: stencil = star1; break;
                            ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:49:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star2_1d<double>, (lambda at ./stencil_sycl.hpp:49:92), 2>'
      requested here
    h.parallel_for<class star2_1d<T>>(cl::sycl::range<2> {n-4,n-4}, cl::sycl::id<2> {2,2}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:92:29: note: in instantiation of function template specialization 'star2<double>' requested here
          case 2: stencil = star2; break;
                            ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:98:7: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<star3_1d<double>, (lambda at ./stencil_sycl.hpp:98:92), 2>'
      requested here
    h.parallel_for<class star3_1d<T>>(cl::sycl::range<2> {n-6,n-6}, cl::sycl::id<2> {3,3}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:93:29: note: in instantiation of function template specialization 'star3<double>' requested here
          case 3: stencil = star3; break;
                            ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:157:7: note: in instantiation of function template specialization
      'cl::sycl::handler::parallel_for<star4_1d<double>, (lambda at ./stencil_sycl.hpp:157:92), 2>' requested here
    h.parallel_for<class star4_1d<T>>(cl::sycl::range<2> {n-8,n-8}, cl::sycl::id<2> {4,4}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:94:29: note: in instantiation of function template specialization 'star4<double>' requested here
          case 4: stencil = star4; break;
                            ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
./stencil_sycl.hpp:226:7: note: in instantiation of function template specialization
      'cl::sycl::handler::parallel_for<star5_1d<double>, (lambda at ./stencil_sycl.hpp:226:94), 2>' requested here
    h.parallel_for<class star5_1d<T>>(cl::sycl::range<2> {n-10,n-10}, cl::sycl::id<2> {5,5}, [=] (cl::sycl::item<2> it) {
      ^
stencil-sycl.cc:95:29: note: in instantiation of function template specialization 'star5<double>' requested here
          case 5: stencil = star5; break;
                            ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
In file included from stencil-sycl.cc:63:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:573:14: error: no matching member function for call to 'set'
    MNDRDesc.set(std::move(NumWorkItems), std::move(WorkItemOffset));
    ~~~~~~~~~^~~
stencil-sycl.cc:167:11: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<add<double>, (lambda at stencil-sycl.cc:168:35), 2>'
      requested here
        h.parallel_for<class add<T>>(cl::sycl::range<2> {n, n}, cl::sycl::id<2> {0, 0},
          ^
stencil-sycl.cc:324:9: note: in instantiation of function template specialization 'run<double>' requested here
        run<double>(host, iterations, n, tile_size, star, radius);
        ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:59:29: note: candidate function template not viable: requires single argument 'NumWorkItems', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::range<Dims_> NumWorkItems) {
                            ^
/opt/sycl/lib/clang/9.0.0/include/CL/sycl/detail/cg.hpp:70:29: note: candidate function template not viable: requires single argument 'ExecutionRange', but 2 arguments were
      provided
  template <int Dims_> void set(sycl::nd_range<Dims_> ExecutionRange) {
                            ^
12 errors generated.
make: *** [Makefile:156: stencil-sycl] Error 1

[SYCL] Abstraction layer for device-side SIMT intrinsics like getting global id

SYCL device runtime code uses various "intrinsic" functionalities like getting global and local id of a work item or group, which is usually expressed via efficient hardware-mapped code sequences in the native device code. SYCL device runtime should clearly define an interface layer of such intrinsic functionality and mechanisms to implement it for various devices, including those without SPIRV support. See some discussion here:
#221 (comment)

Currently device-side APIs this functionality are defined in headers under CL/__spirv directory.

[SYCL] build spir-v doesn't validate

This might be an artifact of my install, but attempting to validate the produced spirv with spirv-val errors here.

[airlied@dreadlord-bne-redhat-com ~]$ /opt/intel/bin/clang++ --sycl -Xclang -fsycl-int-header=simple-sycl-app-int-header.h -c simple-sycl-app.cpp -o kernel.spv
[airlied@dreadlord-bne-redhat-com ~]$ spirv-val ./kernel.spv
error: 212: Invalid decoration operand: 4469
[airlied@dreadlord-bne-redhat-com ~]$ spirv-dis ./kernel.spv
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 197
; Schema: 0
OpCapability Addresses
OpCapability Linkage
OpCapability Kernel
OpCapability Int64
OpCapability Int8
%1 = OpExtInstImport "OpenCL.std"
OpMemoryModel Physical64 OpenCL
OpEntryPoint Kernel %17 "_ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE10FillBuffer"
%193 = OpString "kernel_arg_type._ZTSZZ4mainENK3$_0clERN2cl4sycl7handlerEE10FillBuffer.int*,range<1>,id<1>,"
OpSource OpenCL_CPP 100000
OpName %_Z13get_global_idj "_Z13get_global_idj"
OpName %class_cl__sycl__range "class.cl::sycl::range"
OpName %class_cl__sycl__detail__array "class.cl::sycl::detail::array"
OpName %struct_cl__sycl__id "struct.cl::sycl::id"
OpName %arg "arg"
OpName %_arg_Range "_arg_Range"
OpName %_arg_Offset "_arg_Offset"
OpName %class_anon "class.anon"
OpName %class_cl__sycl__accessor "class.cl::sycl::accessor"
OpName %struct_cl__sycl__detail__accessor_impl "struct.cl::sycl::detail::accessor_impl"
OpName %ZZZ4mainENK3__0clERN2cl4sycl7handlerEENKUlNS1_2idILi1EEEE_clES5 "_ZZZ4mainENK3$0clERN2cl4sycl7handlerEENKUlNS1_2idILi1EEEE_clES5"
OpName %struct_cl__sycl__detail__accessor_subscript_w "struct.cl::sycl::detail::accessor_subscript_w"
OpName %struct_cl__sycl__detail__accessor_common "struct.cl::sycl::detail::accessor_common"
OpName %class_cl__sycl__detail__accessor_base "class.cl::sycl::detail::accessor_base"
OpDecorate %194 FuncParamAttr ByVal
%194 = OpDecorationGroup
OpDecorate %195 Alignment 8
%195 = OpDecorationGroup
error: 212: Invalid decoration operand: 4469

[SYCL] Driver/ToolChain Extension

As a foreword this is perhaps more of a clarification question than an issue.

I've been working a fair bit with the Driver/ToolChain recently trying to integrate one of our back-ends, I've managed to get it working in a slightly hacky way and I'm looking to more appropriately integrate it now as I'd like to stay as closely aligned to Intel's intended goals as possible. I've noticed that the SYCL driver implementation works more like CUDA than OpenMP in the sense that instead of creating or looking for a new ToolChain using a getToolChain invocation it will create another SYCL ToolChain for the specific Host/Device pairing.

So my main question is how do you guys over at Intel envision people extending the current implementation for new devices (device targets)?

The way I originally thought it would work was a little more aligned to OpenMP than CUDA in that it would choose a device ToolChain over the SYCL ToolChain in the cases that a more specialized device ToolChain was present, e.g. I specify -fsycl-targets=fpga-intel and it would pick the best fit for the triple.

However, on further inspection it seems that it would perhaps eventually work more like the Myriad ToolChain? In that the SYCL ToolChain overloads the SelectTool function and based on the device triple given to the SYCL ToolChain it will select the appropriate ToolChain to offload the JobAction to? This method is a little closer to my hacky way of initially testing our tool integration.

The other method I can see is just having the one ConstructJob call in the SYCL ToolChain and creating different commands based on the triple internally, however that seems like it could get difficult to maintain pretty fast.

Please correct me if my understanding or assumptions above are wrong in any way!

[SYCL] Handler2 Copy Clarification/Possible bug

#include <vector>
#include <CL/sycl.hpp>

int main()
{
  const size_t nElems = 10u;

  std::vector<int> v(nElems);
  std::iota(std::begin(v), std::end(v), 0);

  buffer<int, 1> b{cl::sycl::range<1>(nElems)};

  cl::sycl::queue q;
  q.submit([&](handler& cgh) {
     accessor<int, 1, access::mode::write, access::target::global_buffer>
        acc(b, cgh, range<1>(nElems / 2), id<1>(0));

      cgh.copy(v.data(), acc);
  });

  q.wait();

  return 0;
}

So, I'm not entirely sure this is a bug, the above example is a slightly altered variation of this spec example: https://github.com/KhronosGroup/SYCL-Docs/blob/master/latex/code/explicitcopy.cpp

What I've found is that Handler2's copy methods are killing the device compilation for Xilinx devices, while the original Handler's implementations are not. It seems to be because the non-buffer pointer (v.data() in the above example) isn't being wrapped in a buffer and being passed by accessor to the parallel_for like it was in the original Handler implementation (in our case this means the pointer isn't given an address space and chokes in one of our passes unfortunately).

Is there a reason why it's accessed directly and not via an accessor at the moment (even if it's a host side parallel_for, the device still has to compile it if I'm not mistaken)? It seems like it's a temporary solution in the TODO and will be replaced by an OpenCL call inside memory_manager.cpp at some point. So I'm wondering if it's worth implementing a fix for it at the moment or if the update based on the memory_manager.cpp is perhaps close to completion?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.