
hcc's Introduction

HCC : An open source C++ compiler for heterogeneous devices

This repository hosts the HCC compiler implementation project. The goal is to implement a compiler that takes a program conforming to a parallel programming standard, such as HC or C++17 Parallel STL, and transforms it into the AMD GCN ISA.

The project is based on LLVM+CLANG. For more information, please visit the hcc wiki:

https://github.com/RadeonOpenCompute/hcc/wiki

Deprecation Notice

AMD is deprecating HCC to put more focus on HIP development and on other languages supporting heterogeneous compute. We will no longer develop any new feature in HCC and we will stop maintaining HCC after its final release, which is planned for June 2019. If your application was developed with the hc C++ API, we would encourage you to transition it to other languages supported by AMD, such as HIP or OpenCL. HIP and hc language share the same compiler technology, so many hc kernel language features (including inline assembly) are also available through the HIP compilation path.

Download HCC

The project now employs git submodules to manage the external components it depends on. It is advised to add --recursive when you clone the project so that all submodules are fetched automatically.

For example:

# automatically fetches all submodules
git clone --recursive -b clang_tot_upgrade https://github.com/RadeonOpenCompute/hcc.git

For more information about git submodules, please refer to git documentation.

Build HCC from source

To configure and build HCC from source, use the following steps:

mkdir -p build; cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

To install it, use the following steps:

sudo make install

Use HCC

For HC source code:

hcc -hc foo.cpp -o foo

Multiple ISA

HCC now supports having multiple GCN ISAs in one executable file. You can do it in different ways:

use --amdgpu-target= command line option

It's possible to specify multiple --amdgpu-target= options. Example:

# ISA for Fiji(gfx803) and Vega10(gfx900) would 
# be produced
hcc -hc \
    --amdgpu-target=gfx803 \
    --amdgpu-target=gfx900 \
    foo.cpp

configure HCC using the CMake HSA_AMDGPU_GPU_TARGET variable

If you build HCC from source, it's possible to configure it to automatically produce multiple ISAs via the HSA_AMDGPU_GPU_TARGET CMake variable.

Use ; to delimit each AMDGPU target. Example:

# ISA for Fiji(gfx803) and Vega10(gfx900) would 
# be produced by default
cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DHSA_AMDGPU_GPU_TARGET="gfx803;gfx900" \
    ../hcc

CodeXL Activity Logger

To enable the CodeXL Activity Logger, set the USE_CODEXL_ACTIVITY_LOGGER CMake variable when configuring the build.

Configure the build in the following way:

cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_CODEXL_ACTIVITY_LOGGER=1 \
    <ToT HCC checkout directory>

In your application compiled using hcc, include the CodeXL Activity Logger header:

#include <CXLActivityLogger.h>

For information about the usage of the Activity Logger for profiling, please refer to its documentation.

HCC with ThinLTO Linking

To enable ThinLTO at link time, set the KMTHINLTO environment variable.

Set up your environment in the following way:

export KMTHINLTO=1

ThinLTO Phase 1 - Implemented

For applications compiled using hcc, ThinLTO could significantly improve link-time performance. This implementation will maintain kernels in their .bc file format, create module-summaries for each, perform llvm-lto's cross-module function importing and then perform clamp-device (which uses opt and llc tools) on each of the kernel files. These files are linked with lld into one .hsaco per target specified.

ThinLTO Phase 2 - Under development

This phase of the ThinLTO implementation will use the llvm-lto LLVM tool to replace the clamp-device bash script. It adds an optllc option to ThinLTOGenerator, which will perform in-program opt and codegen in parallel.

To use HCC Printf Functions

Set the environment variable:

export HCC_ENABLE_PRINTF=1

Then compile the printf kernel with HCC_ENABLE_ACCELERATOR_PRINTF macro defined.

~/build/bin/hcc -hc -DHCC_ENABLE_ACCELERATOR_PRINTF -lhc_am -o printf.out ~/hcc/tests/Unit/HSA/printf.cpp

For more examples on how to use printf, see tests in tests/Unit/HSA/printf*.cpp.

hcc's People

Contributors

aaronenyeshi, aditya4d1, alexratcliff-mcw, alexvlx, alexvoicu, arsenm, bensander, changchengwang, david-salinas, dfukalov, facao, gargrahul, huimcw, jeffdaily, kwu91, lyh-kernel-mcw, mangupta, pfultz2, rocm-hcc, scchan, sunway513, tstellaramd, unclehandsome, vsytch, whchung, xatier, xiangfeng2006, yan-ming, yaoxiaocc, yxsamliu


hcc's Issues

How to use hcc_hsail in Rocm 1.3

I had compiled my code successfully and it ran correctly in the ROCm 1.2 environment (Kaveri).
But after updating ROCm to 1.3, it seems that hcc_hsail has been deprecated.
So I use /opt/rocm/hcc/ instead of /opt/rocm/hcc-hsail/ (for instance, I compile with /opt/rocm/hcc/bin/hcc $(/opt/rocm/hcc/bin/hcc-config --cxxflags --ldflags) -I /opt/rocm/hcc/include/ xxx.cpp -o xxx). It compiles successfully.
But it returns an error when running. The error information follows.

HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS (0x100d) at file:/home/scchan/code/github/radeonopencompute/hcc.1.3/hcc/lib/hsa/mcwamp_hsa.cpp line:2504

Aborted (core dumped)

How can I compile my code in the Kaveri environment with ROCm 1.3?

Couldn't build HCC clang_tot_upgrade branch on Ubuntu 16.04.1

Hi,
I am trying to build hcc on the clang_tot_upgrade branch on an Ubuntu 16.04.1 system. make world works fine, but running make produces this error.

In file included from /home/aditya/rocm/hcc.lc.tot/lib/mcwamp.cpp:8:
In file included from /usr/include/c++/v1/iostream:38:
In file included from /usr/include/c++/v1/ios:216:
In file included from /usr/include/c++/v1/__locale:15:
/usr/include/c++/v1/string:1938:44: error: 'basic_string<_CharT, _Traits, _Allocator>' is
      missing exception specification
      'noexcept(is_nothrow_copy_constructible<allocator_type>::value)'
basic_string<_CharT, _Traits, _Allocator>::basic_string(const allocator_type& __a)
                                           ^
/usr/include/c++/v1/string:1326:40: note: previous declaration is here
    _LIBCPP_INLINE_VISIBILITY explicit basic_string(const allocator_type& __a)
                                       ^
1 error generated.
lib/CMakeFiles/mcwamp.dir/build.make:62: recipe for target 'lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.o' failed
make[2]: *** [lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.o] Error 1
CMakeFiles/Makefile2:229: recipe for target 'lib/CMakeFiles/mcwamp.dir/all' failed
make[1]: *** [lib/CMakeFiles/mcwamp.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Supported Platform Clarification?

I know this is horribly pedantic, but I will have to buy a CPU & board to play with a Polaris card and HCC, my dev box is a Sandy Bridge. If I must buy Intel I would prefer to buy a low price i3 or Pentium instead of an expensive i5 or i7. And, I'm sort of waiting for Zen...

The HCC page at gpuopen says:

"Discrete GPU system support
CPU: Intel Haswell or Newer, Core™ i5, Core™ i7; Xeon® E3 v4 & v5; Xeon® E5 v3"

The HCC wiki page says:

"... Radeon discrete GPUs from the Fiji family ... paired with an Intel Haswell CPU or newer."

Two questions:

  1. Will the HCC stack work on a Haswell/Broadwell/Skylake i3 or Pentium?
  2. Will it work on Zen?

"GPU fault detected" for concurrency::parallel_for_each

I'm trying C++AMP with simple "vector_add" code, but it doesn't work.

The environment is as follows:

  • Ubuntu 14.04.4
  • Core i7-3770K
  • AMD Radeon R9 FuryX
  • DDR3-1600 2GBx4

The steps to reproduce are as follows:

  1. Clean-install ubuntu 14.04.4 ("Erase disk and install ubuntu")
  2. Boot normally, and I got "low level graphic mode" (this is expected for FuryX, right?).
  3. Enter console mode with "Ctrl+Alt+F1".
  4. Install hcc; following https://github.com/RadeonOpenCompute/ROCm#add-the-rocm-apt-repository
  5. reboot
  6. Enter console mode again
  7. build vector_add.cpp
  8. ./vector_add gets "Aborted (core dumped)"
  9. dmesg says the following:
[ 1634.643988] amdgpu 0000:01:00.0: GPU fault detected: 146 0x003ac414
[ 1634.646454] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000007
[ 1634.648978] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x110C4014
[ 1634.650654] VM fault (0x14, vmid 8) at page 7, write from '' (0x00000000) (196)

No core dump occurs when "concurrency::parallel_for_each" is commented out.
I think it's not a problem of the "vector_add.cpp" because it can build and run on VisualStudio2015.

What's the problem and how can I fix this?

HC restriction attribute causes compilation failures

Adding the [[hc]] attribute to template functions results in unexpected compilation errors, even in non-GPU HPX code. Those errors don't make sense most of the time, and they can't be reproduced with any other compiler, or with hcc when neither [[hc]] nor [[hc, cpu]] is used.

One of the problems can be reproduced with this simple sample:
https://gist.github.com/mcopik/6e1cf89c44df783b08a125b83df80b83
If std::forward_as_tuple is used, everything compiles fine. Our functions and tuple functionalities are defined with an additional macro HPX_HOST_DEVICE in front, defined here:
https://github.com/mcopik/hpx/blob/compute/hpx/config/compiler_specific.hpp#L85
If HPX_HOST_DEVICE is [[cpu]], this example compiles fine. If it is [[cpu, hc]], which is required for invoking algorithms on the GPU (those are purely template calls!), we get an unexpected error:
"rvalue reference to type 'int' cannot bind to lvalue of type 'int'"

To reproduce the example, please clone the repository with branch compute. It does not require building the project, only configuring build with cmake:
-DHPX_WITH_HCC=(hcc location) -DHPX_WITH_HCC_HC=On
And then compilation of example requires additional include flags: -I$(HPX_SOURCE) -I$(HPX_BUILD)

LLVM-MC does not support `ds_add_f32`

I wasn't sure where to put this issue, since the llvm repository does not allow posting issues. The llvm-mc currently does not implement a number of instructions. In particular, I need ds_add_f32

Issues with thread_local in host code

I'm experiencing erratic issues with global host variables declared thread_local. I stripped it down to the following:

// func.h
#pragma once

bool some_func();

// func.cpp
#include "func.h"

struct state_t
{
    int tmp;
    bool initialized;
};

thread_local state_t state = { 0 };

bool some_func()
{
    return state.initialized;
}

# Makefile
CC=/opt/rocm/hcc/bin/hcc
CC_CONFIG=/opt/rocm/hcc/bin/hcc-config
GNC_ISA=--amdgpu-target=AMD:AMDGPU:7:0:1 # Hawaii

all:
    $(CC) `$(CC_CONFIG) --cxxflags --ldflags` $(GNC_ISA)  main.cpp func.cpp -I../ -I/opt/rocm/include -L/opt/rocm/lib -lhsa-runtime64 -o func

Compiling this causes hcc to crash (error log). Randomly changing variable types (e.g. changing state_t::initialized from bool to int) fixes this for this particular example (I haven't found any pattern yet).

My configuration is:

> cat /etc/issue
Ubuntu 14.04.5 LTS \n \l

> uname -a
Linux XXXX 4.6.0-kfd-compute-rocm-rel-1.4-16 #1 SMP Tue Dec 13 13:14:21 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

> /opt/rocm/bin/hcc --version
HCC clang version 3.5.0  (based on HCC 0.10.16501-81f0a2f-02246a0 LLVM 3.5.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix

misplaced files after installation

trying to compile a program after the latest cmake changes results in:
clang-5.0: error: unable to execute command: Executable "hc-host-assemble" doesn't exist!
copying the file from compiler/bin to bin fixes the issue.
there are multiple files with the same problem:

clang-5.0: error: unable to execute command: Executable "hc-host-assemble" doesn't exist!
clang-5.0: error: unable to execute command: Executable "hc-kernel-assemble" doesn't exist!
/opt/rocm/hcc-clang-tot/bin/hc-kernel-assemble: line 107: /opt/rocm/hcc-clang-tot/bin/clamp-assemble: No such file or directory
/opt/rocm/hcc-clang-tot/bin/clamp-assemble: line 23: /opt/rocm/hcc-clang-tot/bin/clamp-embed: No such file or directory
/opt/rocm/hcc-clang-tot/bin/clamp-link: line 382: /opt/rocm/hcc-clang-tot/bin/clamp-device: No such file or directory

the problem is that the new cmake rules install a lot of files (but not all) to bin instead of compiler/bin:

$ ls /opt/rocm/hcc-clang-tot/bin/
amdgpu-objdump         ld.lld           llvm-dwarfdump   llvm-rtdyld
bugpoint               llc              llvm-dwp         llvm-size
c-index-test           lld              llvm-extract     llvm-split
clamp-config           lld-link         llvm-lib         llvm-stress
clang                  lli              llvm-link        llvm-strings
clang++                llvm-ar          llvm-lto         llvm-symbolizer
clang-5.0              llvm-as          llvm-lto2        llvm-tblgen
clang-check            llvm-bcanalyzer  llvm-mc          llvm-xray
clang-cl               llvm-cat         llvm-mcmarkup    obj2yaml
clang-cpp              llvm-config      llvm-modextract  opt
clang-format           llvm-cov         llvm-nm          sancov
clang-import-test      llvm-c-test      llvm-objdump     sanstats
clang-offload-bundler  llvm-cxxdump     llvm-opt-report  scan-build
extractkernel          llvm-cxxfilt     llvm-pdbdump     scan-view
git-clang-format       llvm-diff        llvm-profdata    verify-uselistorder
hcc                    llvm-dis         llvm-ranlib      yaml2obj
hcc-config             llvm-dsymutil    llvm-readobj

vs.

$ ls /opt/rocm/hcc-clang-tot-old/bin/
clamp-config  clang++        hcc         lld      llvm-objdump
clang         extractkernel  hcc-config  llvm-mc

Synchronization of copy_async

I have found that synchronizing a device (accelerator) doesn't affect an asynchronous copy, for both explicit synchronization and waiting for an accelerator marker; a sample shows that results are incorrect unless the future returned from copy_async is used.
The consequence is quite clear: using an asynchronous copy results in code where it may not be possible to tell whether a device is busy or not. Neither the C++AMP specification nor the HC docs warn that there is an exception to accelerator synchronization, and I think the observed behaviour is not correct.

After reading the HCC code, my guess is that this happens because copy_async is implemented simply as a synchronous copy executed in a new thread. The copy operation itself, which may be implemented simply as mapping a pointer, calling std::copy and unmapping the pointer through the queue, doesn't record anywhere that there are mapped buffers on the host. Synchronization checks only for asynchronous operations stored in the queue.

Capturing temporary object by reference in completion_future

Method then() in completion_future, responsible for launching a callback functor when completion_future is ready, takes a const reference to functor and then passes it by reference to newly created thread:
https://github.com/RadeonOpenCompute/hcc/blob/master/include/hc.hpp#L953

Hence a reference to a temporary object may be saved by the thread. The result is easy to foresee: a mysterious memory corruption when the callback is launched. It took us three days to figure out why this piece of code was failing:

some_data * ptr = ....;
auto fut = device.create_marker();
fut.then([ptr]() { /* do something with ptr, which is already garbage */ });

C++AMP API declares this function as taking a const reference, not universal reference (which would enforce moving an rvalue), but the whole issue can be solved just by capturing the functor by value in the lambda passed to std::thread.
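The fix suggested above can be illustrated outside of HCC. The following is a minimal sketch in standard C++, not the actual hc.hpp code: then_by_value and run_callback_with_temporary are hypothetical names, and the new thread here runs the functor immediately rather than waiting on a completion_future. It only shows that copying the functor into the lambda handed to std::thread keeps the callback valid after the caller's temporary is destroyed.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for completion_future::then(). It still takes a
// const reference, as the C++AMP-style signature does, but the lambda
// passed to std::thread captures `func` BY VALUE, so the thread owns its
// own copy of the functor.
template <typename Functor>
std::thread then_by_value(const Functor& func) {
    return std::thread([func]() { func(); });
}

// Passes a temporary lambda, which is destroyed at the end of the full
// expression. The worker thread still runs safely on its private copy.
int run_callback_with_temporary() {
    int result = 0;
    int* out = &result;
    std::thread worker = then_by_value([out]() { *out = 42; });
    worker.join();  // result is only read after the worker has finished
    return result;
}
```

Had the inner lambda captured func by reference instead, the worker could run after the temporary was gone, which matches the memory corruption described in this report.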

Executing the binaries generated by the HCC-Example-Application Segfaults

I have a Carrizo APU environment (AMD Embedded R-Series RX-421BD Radeon R7 SOC board) running Ubuntu 16.04. I installed ROCm 1.4 and ran the vector_copy sample, which printed all the success messages. I cloned the HCC-Example-Application repository and built all the example samples; executing the ArrayBandwidth and HCFFT binaries gave the correct output. But when I try to execute the MD, SPMV and SyncVsAsyncArrayCopy binaries, they segfault with a core dump. Also, when I re-execute the ArrayBandwidth and HCFFT binaries (which previously gave the correct output), they segfault with the same message.
Are there any steps I'm missing or should be doing in order to execute all the binaries successfully? I'm a newbie to the HSA programming domain, so any help is much appreciated.

Thanks in advance!
bhsomegowda

clamp-link: issue with handling paths containing spaces

Reported by @briansp2020 in ROCm/ROCm#47

I tried to build my code at cs344 Problem Set 1 and I get errors.

$ make
hipcc -o HW1 main.o student_func.o compare.o reference_calc.o -L /usr/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -g -hc -std=c++amp
/opt/rocm/hcc-lc/compiler/bin/clamp-link: line 302: cd: /home/briansp/git/cs344/Problem: No such file or directory
objdump: 'main.o': No such file
objdump: 'student_func.o': No such file
objdump: 'compare.o': No such file
objdump: 'reference_calc.o': No such file
ld: cannot find main.o: No such file or directory
ld: cannot find student_func.o: No such file or directory
ld: cannot find compare.o: No such file or directory
ld: cannot find reference_calc.o: No such file or directory
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)
Died at /opt/rocm/bin/hipcc line 365.
Makefile:37: recipe for target 'student' failed
make: *** [student] Error 1

I then copied the code to ~/dev/cs344 and it builds fine. It looks like the linker is not handling spaces in the path properly. I'm reporting it here since I'm not sure whether the linker is part of llvm, hcc, clang or lld.

Compile in 32-bit mode

I wonder if there is a way to compile a C++ AMP program in 32-bit mode. I tried to use hcc $(hcc-config --cxxflags --ldflags) -m32 saxpy.cpp -o saxpy, but I got the error

warning: overriding the module target triple with x86_64-unknown-linux-gnu
1 warning generated.
ld: cannot find -lc++
ld: cannot find -lc++abi

How does hcc link HSA runtime? Is there a way we can link another implementation of the runtime library in 32 bit mode?

Linking two modules of different data layouts

After building clang_tot_upgrade and everything else from source, I get this warning, which doesn't appear to cause any problems but is unsightly more than anything.

hcc -Wall $(hcc-config --cxxflags --ldflags) -Os -fPIC -I/opt/rocm/include -o saxpy  -Wl,--rpath=/opt/rocm/lib -L/opt/rocm/lib saxpy.cpp
WARNING: Linking two modules of different data layouts: '/tmp/tmp.WIU6ppDs60/hcc-83b52a.kernel.bc' is '' whereas 'llvm-link' is 'e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64

CPU Node [0] has no GPU connected

First, congratulations on the 1.3 release.

Not sure what you did but my GNOME openbox desktop is working again.

Using the new 4.6 kernel, and much the same setup as the 1.2 setup for clang_tot_upgrade branch, I am having difficulty running the saxpy example.

hcc $(hcc-config --ldflags --cxxflags) -o sax saxpy.cpp -g3
LD_LIBRARY_PATH=/opt/rocm/lib ./sax
CPU Node [0] has no GPU connected
Segmentation fault

Strange behavior of pointer casts

I'm not sure whether pointer typecasts are allowed operations in HC kernels when performed on registers and/or LDS. For example, say I have a register and I access part of it like this:

uint32_t x = 0xffffffff;
char a = ((char*)&x)[0];
char b = ((char*)&x)[1];
char c = ((char*)&x)[2];
char d = ((char*)&x)[3];

Depending, it seems, on which operations are performed before this piece of code, the results in the char variables may be incorrect. I'm confident that this works properly if x is in LDS, but when it is a register I think strange things can happen. I've read that there is no 'indexed register file access' in most GPUs. Is this type of operation illegal (perhaps the correct approach would be to use bitextract, or shifts and a cast)? Note: it seems difficult to reproduce in any sane way. In my case the result depends on whether or not I initialize a completely different register to 0 before this piece of code.
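For reference, the shift-and-cast alternative mentioned in the question could look like the sketch below. It is plain C++ and makes no claim about whether the pointer-cast version is legal in HC kernels or whether the observed behaviour is a compiler bug; byte_at is a hypothetical helper name. Index 0 returns the least significant byte, which is what ((char*)&x)[0] yields on a little-endian target.

```cpp
#include <cassert>
#include <cstdint>

// Extracts byte `index` (0 = least significant, valid range 0..3) from a
// 32-bit value using only a shift and a mask, so the address of the value
// is never taken.
inline std::uint8_t byte_at(std::uint32_t value, unsigned index) {
    return static_cast<std::uint8_t>((value >> (8u * index)) & 0xFFu);
}
```

Because no pointer to x is formed, the compiler has no reason to spill the value to memory in order to index into it.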

Unexpected behaviour of access method to accelerator pointer

The accelerator_pointer() method in hc::array gives the direct address of data allocated on an accelerator. Unfortunately, its behaviour is not predictable, as this sample shows:
https://github.com/mcopik/cppamp_samples/blob/master/hc/accelerator_pointer.cpp

Allocating other data on the CPU by accessing an array_view changes the currently used queue:
https://github.com/RadeonOpenCompute/hcc/blob/master/include/kalmar_runtime.h#L698
And the access method uses the currently used queue instead of a master device queue:
https://github.com/RadeonOpenCompute/hcc/blob/master/include/kalmar_runtime.h#L617

Changing this line to devs[master->getDev].data solves the issue, and the array always returns the correct address on the device.

Event-based timing

Hi,

I'd like to perform event-based timing. With CUDA you can place a special event in the GPU command queue to time asynchronous operations. Is there something similar available with hcc, or maybe by means of interop with HSA? I've seen e.g. HSA_EXTENSION_PROFILING_EVENTS, but I'm not sure whether this is what I'm looking for or how to use it.

Cheers,
Stefan

rocm 1.1 device query?

Hi,
I am using rocm 4.4.0-kfd-compute-rocm-rel-1.1.1-10 and wanted to write a simple device query app with hc:

#include <hc.hpp>
#include <iostream>
#include <vector>

void print_device_info(hc::accelerator& acc) {
    std::cout << "\tname:" << acc.get_description().c_str() << "\n";
    std::cout << "\tmem :" << acc.get_dedicated_memory() << "\n";
    std::cout << "\tvers:" << acc.get_version() << "\n";
}

int main(int argc, char* argv[])
{
    std::vector<hc::accelerator> accs = hc::accelerator::get_all();
    for (size_t i = 0; i < accs.size(); ++i) {
        std::cout << "devid: " << i << " / " << accs.size() << "\n";
        print_device_info(accs[i]);
    }
    return 0;
}

The output I am getting on a R730 with 2xE5-2620v3 and 1x S9300x2 is:

devid: 0 / 3
        name:0x7f7fd1dbcc20
        mem :0
        vers:0
devid: 1 / 3
        name:0x7f7fd1dbbbb0
        mem :0
        vers:0
devid: 2 / 3
        name:0x7f7fd1dbbbb0
        mem :0
        vers:0

Am I using the API wrong or is the driver not ready yet?
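One possible explanation for the pointer-like values in the name field, assuming get_description() returns a std::wstring as in the C++ AMP accelerator interface: streaming a const wchar_t* into the narrow std::cout picks the generic pointer overload and prints an address rather than text. The sketch below is plain C++ with a hypothetical helper, to_narrow_ascii, that is only valid for ASCII descriptions. It does not explain the zero values reported for memory and version.

```cpp
#include <cassert>
#include <string>

// Converts a wide string to a narrow one. Assumption: the content is ASCII;
// any character outside that range is replaced by '?'.
std::string to_narrow_ascii(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t ch : wide) {
        const unsigned long code = static_cast<unsigned long>(ch);
        out.push_back(code < 128ul ? static_cast<char>(code) : '?');
    }
    return out;
}
```

With such a helper the query could print to_narrow_ascii(acc.get_description()) on std::cout; alternatively, the wide string could be written to std::wcout directly. Neither variant has been tested against HCC here.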
Best -

HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS

Saxpy example fails with message

### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS (0x100d) at file:/home/scchan/code/github/hcc-roc-1.4.x/hcc/lib/hsa/mcwamp_hsa.cpp line:2511

Thread 1 "hcc" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007ffff6f3e40a in __GI_abort () at abort.c:89
#2  0x00007ffff679ed31 in Kalmar::HSADevice::BuildOfflineFinalizedProgramImpl(void*, int) () from /opt/rocm/hcc-lc/lib/libmcwamp_hsa.so
#3  0x00007ffff679b6b5 in Kalmar::HSADevice::BuildProgram(void*, void*, bool) () from /opt/rocm/hcc-lc/lib/libmcwamp_hsa.so
#4  0x0000000000404266 in Kalmar::KalmarBootstrap::KalmarBootstrap() ()
#5  0x0000000000403817 in __hcc_shared_library_init ()
#6  0x000000000040811d in __libc_csu_init ()
#7  0x00007ffff6f2a240 in __libc_start_main (main=0x405290 <main>, argc=1, argv=0x7fffffffe548, init=0x4080d0 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe538) at ../csu/libc-start.c:247
#8  0x0000000000407ffa in _start ()

[HSA] Implicitly resource leaking on kernelBufferMap

Hi @whchung,

I'm going through the HCC runtime looking for further chances to improve overall performance for ported applications. After some profiling experiments, I noticed that LaunchKernelWithDynamicGroupMemoryAsync takes a noticeable portion of the time, and the cost of std::map operations seems to be one main reason for that.

By looking closer, I noticed that the map used for recording kernel-buffer dependency chain might take part here. In HSAQueue::Push(), map elements are created implicitly while there are two std::for_each calls to traverse the whole map, which takes O(map.size()) time complexity. The map will be released at a very late stage of HCC runtime.

I came up with the following patch to erase those used map elements. This does reduce the map size but I didn't actually see obvious performance gain from my application. Perhaps we might need a bigger refactoring here on the kernel-buffer dependency handling.

I know you have another ongoing hsa_async_copy branch that leverages the AMD APIs for async copy operations, and I'm wondering if you have any further plans for async kernels here.

diff --git a/lib/hsa/mcwamp_hsa.cpp b/lib/hsa/mcwamp_hsa.cpp                    
index 7a854e8..97d943a 100644                                                   
--- a/lib/hsa/mcwamp_hsa.cpp                                                    
+++ b/lib/hsa/mcwamp_hsa.cpp                                                    
@@ -653,6 +653,7 @@ public:                                                     

         // clear data in kernelBufferMap                                       
         kernelBufferMap[ker].clear();                                          
+        kernelBufferMap.erase(ker);                                          

         delete(dispatch);                                                      
     }                                                                          
@@ -697,6 +698,7 @@ public:                                                     

         // clear data in kernelBufferMap                                       
         kernelBufferMap[ker].clear();                                          
+        kernelBufferMap.erase(ker);                                          

         return sp_dispatch;                                                    
     }

This won't bring regressions on my end.

Failing Tests (7):
    CPPAMP :: Unit/AmpShortVectors/hc_short_vector_device.cpp
    CPPAMP :: Unit/HC/memcpy_symbol1.cpp
    CPPAMP :: Unit/HC/memcpy_symbol3.cpp
    CPPAMP :: Unit/HC/wg_size.cpp
    CPPAMP :: Unit/HSAIL/shfl_xor.cpp
    CPPAMP :: Unit/SharedLibrary/shared_library2.cpp
    CPPAMP :: Unit/SharedLibrary/shared_library3.cpp

  Expected Passes    : 662
  Expected Failures  : 25
  Unsupported Tests  : 10
  Unexpected Failures: 7
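The patch above relies on the difference between clearing a mapped value and erasing its key. A minimal sketch in standard C++, with a hypothetical map type standing in for kernelBufferMap (the real key and value types in mcwamp_hsa.cpp are not shown in this report):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the kernel-to-buffer dependency map.
using BufferList = std::vector<void*>;
using KernelBufferMap = std::map<int, BufferList>;

// operator[] creates the entry implicitly; clear() only empties the value,
// so the key stays in the map and later traversals still visit it.
std::size_t entries_after_clear() {
    KernelBufferMap buffers;
    buffers[1].push_back(nullptr);
    buffers[1].clear();
    return buffers.size();
}

// erase() removes the key itself, so the map actually shrinks.
std::size_t entries_after_erase() {
    KernelBufferMap buffers;
    buffers[1].push_back(nullptr);
    buffers[1].clear();
    buffers.erase(1);
    return buffers.size();
}
```

Any loop over the whole map is O(map.size()), so stale keys left behind by clear() keep adding to the cost of every later traversal.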

Compilation failure when accessing global variables in different compilation unit

test.cpp:

#include <amp.h>
#include <iostream>

extern int value;

int main(void)
{
        ::std::atomic_uint sum;
        sum = value;
        parallel_for_each(concurrency::extent<1>(100),
                          [&](concurrency::index<1> i) restrict (amp)
        {
                sum += value;
        });
        ::std::cout << sum << "\n";
        return 0;
}

other.cpp:

int value = 5;

compiler failure:
error: :0:0: in function ZZ4mainEN3_EC__019__cxxamp_trampolineEPNSt3__16atomicIjEE void (%"struct.std::__1::atomic.0" addrspace(1)*): unsupported initializer for address space

error: :0:0: in function ZZ4mainEN3_EC__019__cxxamp_trampolineEPNSt3__16atomicIjEE void (%"struct.std::__1::atomic.0" addrspace(1)*): unsupported initializer for address space

LLVM ERROR: Cannot select: t6: i64 = GlobalAddress<i32 addrspace(1)* @value2> 0
In function: ZZ4mainEN3_EC__019__cxxamp_trampolineEPNSt3__16atomicIjEE
cannot open /tmp/tmp.Aiz9Ys5QhK/kernel.brig.hsail: No such file or directory
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)

the same local pointer trick as in #62 works here as well
test-works.cpp:

#include <amp.h>
#include <iostream>

extern int value;

int main(void)
{
        ::std::atomic_uint sum;
        sum = value;
        int *val_l = &value;
        parallel_for_each(concurrency::extent<1>(100),
                          [&](concurrency::index<1> i) restrict (amp)
        {
                sum += *val_l;
        });
        ::std::cout << sum << "\n";
        return 0;
}

Runtime crash on APU when using HCC compiler

Ubuntu VERSION="16.04.1 LTS (Xenial Xerus)"
AMD A10-7860K Radeon R7, 12 Compute Cores 4C+8G
Attempting to compile and run the saxpy code found at: https://gist.github.com/scchan/540d410456e3e2682dbf018d3c179008 saved in a file called hcctestfile.cpp
Compiling with: hcc $(hcc-config --cxxflags --ldflags) hcctestfile.cpp -o test && ./test
This compiles successfully, but when run it produces: HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS (0x100d) at file:/home/scchan/code/github/radeonopencompute/hcc.1.3/hcc/lib/hsa/mcwamp_hsa.cpp line:2504 Aborted (core dumped)
I have tried adding --amdgpu-target=AMD:AMDGPU:7:0:1 or --amdgpu-target=AMD:AMDGPU:8:0:1,
resulting in the same error.

The code compiles and runs correctly on a different PC with a discrete GPU: Fiji [Radeon R9 FURY / NANO Series]

HSAIL source buffer not available

Hi - I am trying to contribute a stream benchmark using HCC, as alluded to in #101. The code compiles fine with the default hcc-config --cxxflags and hcc-config --ldflags. When I launch the benchmark, however, I receive a SIGABRT from libc.so: clone()! Inside rocm-gdb, I see the following:

ROCm-gdb: GPU Debugging has been successfully initialized]
GPU-STREAM
Version: 2.0
Implementation: HCC
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using HCC device AMD HSA Agent Fiji2
[New Thread 0x7fffb1e54700 (LWP 21503)]
[ROCm-gdb]: The code object for the current dispatch does not contain debug information
HSAIL source buffer not available
HSAIL kernel source debugging will not occur

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff5aeb700 (LWP 21501)]
0x00007ffff6ed2cc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(ROCm-gdb) bt
#0  0x00007ffff6ed2cc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff6ed60d8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff669925b in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#3  0x00007ffff669c855 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#4  0x00007ffff6684547 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#5  0x00007ffff747f182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007ffff6f9647d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Am I compiling against the wrong backend? I am confused. Can you help?

error: invalid argument '-std=c++amp' not allowed with 'C/ObjC'

I was compiling code that is mostly plain C++, with one function written in C++ HC. When I compile it, I get the error in the title. Even if I only use hcc-config --cxxflags to compile a .o file, I still get this error. Is there an explanation for it? Could it be caused by my header file having the extension .h rather than .hpp? I think .h is still very widely used for C++ headers, so why forbid .h headers?

Need more intrinsic functions

Hi,

I'm porting some C++ AMP applications from Windows/MSVC to Linux/HCC.
I used some GPU intrinsic functions from Microsoft's C++ AMP implementation, such as clamp, mad, imax and imin (declared in the direct3d namespace). These functions can be emulated in software, but that may affect performance.
Will you add such intrinsic functions to your C++ AMP/HCC implementation?
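
For reference, the software emulation mentioned above could look roughly like the following. This is a minimal host-side sketch in plain C++, not HCC or Microsoft API code: the namespace `emulated` and the exact signatures are assumptions chosen to mirror the function names in the report, and a device version would additionally need the appropriate restriction specifier.

```cpp
#include <cassert>

// Hypothetical namespace; these are plain C++ stand-ins, not HCC intrinsics.
namespace emulated {

// Restrict x to the closed range [lo, hi].
inline float clamp(float x, float lo, float hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}

// Multiply-add written as two operations; a hardware intrinsic may fuse them.
inline float mad(float a, float b, float c) {
  return a * b + c;
}

// Integer maximum and minimum.
inline int imax(int a, int b) { return a > b ? a : b; }
inline int imin(int a, int b) { return a < b ? a : b; }

}  // namespace emulated

// Small usage check of the emulated helpers.
inline void emulatedIntrinsicsExample() {
  assert(emulated::clamp(5.0f, 0.0f, 1.0f) == 1.0f);
  assert(emulated::mad(2.0f, 3.0f, 4.0f) == 10.0f);
  assert(emulated::imax(3, -7) == 3);
  assert(emulated::imin(3, -7) == -7);
}
```

Whether such helpers cost anything compared with a native intrinsic depends on what the backend generates for them, which is the performance concern raised in the report.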

Compile error when -Werror, -Wreorder enabled

I get the following error when -Werror -Wall is enabled. I am not sure whether it is my mistake or a potential problem in HCC. Although I can use -Wno-reorder to disable the check, I think it would be better to fix the underlying problem, since it should be a very easy fix.

/opt/rocm/hcc-lc/include/kalmar_runtime.h:95:47: error: field 'seqNum' will be initialized after field 'commandKind' [-Werror,-Wreorder]
  KalmarAsyncOp(hcCommandKind xCommandKind) : seqNum(0), commandKind(xCommandKind) {} 
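
The -Wreorder warning fires when a constructor's member initializer list is written in a different order from the member declarations, because C++ always initializes members in declaration order. Below is a minimal sketch of the usual fix. It assumes, from the warning text only and not from the header itself, that commandKind is declared before seqNum. The class and enum names are hypothetical stand-ins, not the HCC types.

```cpp
#include <cassert>

// Hypothetical stand-in for the command-kind enum; not the HCC definition.
enum CommandKindSketch { KernelCommandSketch, MarkerCommandSketch };

class AsyncOpSketch {
public:
  // The initializer list follows the declaration order below,
  // so -Wreorder has nothing to report.
  explicit AsyncOpSketch(CommandKindSketch kind)
      : commandKind(kind), seqNum(0) {}

  CommandKindSketch getCommandKind() const { return commandKind; }
  unsigned long getSeqNum() const { return seqNum; }

private:
  CommandKindSketch commandKind;  // assumed to be declared first
  unsigned long seqNum;           // assumed to be declared second
};

// Small usage check: both members hold the values given in the constructor.
inline void asyncOpSketchExample() {
  AsyncOpSketch op(MarkerCommandSketch);
  assert(op.getCommandKind() == MarkerCommandSketch);
  assert(op.getSeqNum() == 0);
}
```

Swapping the two entries in the initializer list does not change behaviour here, since neither member depends on the other. It only makes the written order match what the compiler actually does.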

Semantics of asynchronous operations

The API of HC defines three ways to obtain a completion_future:

  1. as a synchronization point with device through create_marker()
  2. as a return value of asynchronous copy
  3. as a return value of p_f_e

In cases 2 and 3, I guess any exception thrown during execution of that specific operation is rethrown when .get() is called. I'm curious how HC defines exception handling in the first case. My expectation is that an exception thrown by any operation in the queue is rethrown by the marker's .get(). How does HC resolve this?

What if two copy operations are enqueued on the device and both fail and throw an exception? What is the expected behaviour of the marker in that case: throw the very last exception, or do nothing and not signal a failure during execution?

HCC compile permission error requiring recompile

Typically hcc seems to need to compile something twice - first time it gives a permission denied error, and the second time it goes through.

hcc `hcc-config --cxxflags --ldflags` proto3.cpp -o proto3.o
/opt/rocm/hcc-lc/compiler/bin/llvm-link: /tmp/tmp.BTueUbkHwn/proto3.kernel.bc:1:1: error: expected top-level entity
ELF@▒@▒#@8@
/opt/rocm/hcc-lc/compiler/bin/llvm-link: error loading file '/tmp/tmp.BTueUbkHwn/proto3.kernel.bc'
ld: cannot open output file /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: Permission denied
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)

Link hcc outfiles with external compilation units

Hi,

sorry for misusing the issue tracker for a mere question.

I'd like to link hcc-compiled compilation units with .o files that are generated with a different compiler, and I found no section in the docs where this is explained. I built the hcc .o with a command similar to hcc `hcc-config --cxxflags` file.cpp -c -o file.o and would now like to link with an "ordinary" host compiler, so I can build a shared library from several external compilation units and from file.o. Since only file.o contains "heterogeneous" code, I'd think that -hc does not need to end up on the linker command line and that any linker can be used. Am I right? In that case, which hcc libraries would I have to link in to make this work?

Couldn't build HCC clang_tot_upgrade branch on Debian gcc 6.2.1

After finding ticket #180 and upgrading the libs, the problem still occurs:

Scanning dependencies of target amptest
[  4%] Building CXX object lib/CMakeFiles/mcwamp_atomic.dir/mcwamp_atomic.cpp.o
[  8%] Building CXX object lib/CMakeFiles/hcc-config.dir/mcwamp_main.cpp.o
[ 12%] Building CXX object lib/CMakeFiles/clamp-config.dir/mcwamp_main.cpp.o
[ 16%] Building CXX object lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.o
[ 20%] Building CXX object lib/hsa/CMakeFiles/hc_am.dir/hc_am.cpp.o
[ 24%] Building CXX object utils/gtest/CMakeFiles/mcwamp_gtest.dir/gtest_main.cc.o
[ 28%] Building CXX object utils/gtest/CMakeFiles/mcwamp_gtest.dir/gtest-all.cc.o
[ 32%] Building CXX object lib/hsa/CMakeFiles/mcwamp_hsa.dir/mcwamp_hsa.cpp.o
[ 36%] Building CXX object lib/hsa/CMakeFiles/mcwamp_hsa.dir/unpinned_copy_engine.cpp.o
[ 40%] Building CXX object lib/cpu/CMakeFiles/mcwamp_cpu.dir/mcwamp_cpu.cpp.o
In file included from /tmp/rocm/hcc/lib/mcwamp_main.cppIn file included from :9/tmp/rocm/hcc/lib/mcwamp_main.cpp:
:In file included from 9/usr/include/c++/v1/stdlib.h:
:In file included from 94/usr/include/c++/v1/stdlib.h:
:In file included from 94/usr/lib/gcc/x86_64-linux-gnu/6.2.1/../../../../include/c++/6.2.1/stdlib.h:
:In file included from 36/usr/lib/gcc/x86_64-linux-gnu/6.2.1/../../../../include/c++/6.2.1/stdlib.h:
:36/usr/include/c++/v1/cstdlib:
:95/usr/include/c++/v1/cstdlib::995:: 9: error: error: no memberno  namedmember  'div_t'named  in'div_t'  thein  globalthe  namespaceglobal
namespace
using ::div_t;
      ~~^using ::div_t;

      ~~^
/usr/include/c++/v1/cstdlib:96:9: error: /usr/include/c++/v1/cstdlib:96no: 9member:  named 'ldiv_t'error : in theno  globalmember  namespacenamed
'ldiv_t' in theusing ::ldiv_t;
global       ~~^namespace

using ::ldiv_t;
      ~~^
/usr/include/c++/v1/cstdlib:98:9: error: no member named /usr/include/c++/v1/cstdlib'lldiv_t': 98in: 9the:  global namespaceerror:
no memberusing ::lldiv_t;
named       ~~^'lldiv_t'
 in the global namespace
using ::lldiv_t;
      ~~^
/usr/include/c++/v1/cstdlib:100:9: error: no member named 'atof' in the global namespace
using ::atof;/usr/include/c++/v1/cstdlib
:100      ~~^:
9: error: no member named 'atof' in the global namespace
using ::atof;
      ~~^
/usr/include/c++/v1/cstdlib:101:9: error: no member named 'atoi' in the global namespace
using ::atoi;
      ~~^
/usr/include/c++/v1/cstdlib:101:9: error: no member named 'atoi' in the global namespace
using ::atoi;
      ~~^
/usr/include/c++/v1/cstdlib:102:9: error: no member named 'atol' in the global namespace
using ::atol;
      ~~^
/usr/include/c++/v1/cstdlib:102:9: error: no member named 'atol' in the global namespace
using ::atol;
      ~~^
/usr/include/c++/v1/cstdlib:104:9: error: no member named 'atoll' in the global namespace
using ::atoll;
      ~~^
/usr/include/c++/v1/cstdlib:104:9: error: no member named 'atoll' in the global namespace
using ::atoll;
      ~~^
/usr/include/c++/v1/cstdlib:106:9: error: no member named 'strtod' in the global namespace
using ::strtod;
      ~~^
/usr/include/c++/v1/cstdlib:106:9: error: no member named 'strtod' in the global namespace
using ::strtod;
      ~~^
/usr/include/c++/v1/cstdlib:107:9: error: no member named 'strtof' in the global namespace
using ::strtof;
      ~~^
/usr/include/c++/v1/cstdlib:107:9: error: no member named 'strtof' in the global namespace
using ::strtof;
      ~~^
/usr/include/c++/v1/cstdlib:108:9: error: no member named 'strtold' in the global namespace
using ::strtold;
      ~~^
/usr/include/c++/v1/cstdlib:108:9: error: no member named 'strtold' in the global namespace
using ::strtold;
      ~~^
/usr/include/c++/v1/cstdlib:109:9: error: no member named 'strtol' in the global namespace
using ::strtol;
      ~~^
/usr/include/c++/v1/cstdlib:109:9: error: no member named 'strtol' in the global namespace
using ::strtol;
      ~~^
/usr/include/c++/v1/cstdlib:111:9: error: no member named 'strtoll' in the global namespace
using ::strtoll;
      ~~^
/usr/include/c++/v1/cstdlib:111:9: error: no member named 'strtoll' in the global namespace
using ::strtoll;
      ~~^
/usr/include/c++/v1/cstdlib:113:9: error: no member named 'strtoul' in the global namespace
using ::strtoul;
      ~~^
/usr/include/c++/v1/cstdlib:113:9: error: no member named 'strtoul' in the global namespace
using ::strtoul;
      ~~^
/usr/include/c++/v1/cstdlib:115:9: error: no member named 'strtoull' in the global namespace
using ::strtoull;
      ~~^
/usr/include/c++/v1/cstdlib/usr/include/c++/v1/cstdlib::115117::99::  errorerror: : nono  membermember  namednamed  'strtoull''rand'  inin  thethe  globalglobal  namespacenamespace

using ::rand;
      ~~^
using ::strtoull;
      ~~^
/usr/include/c++/v1/cstdlib:118:9: error: no member named 'srand' in the global namespace
using ::srand;
/usr/include/c++/v1/cstdlib:      ~~^117
:9: error: no member named 'rand' in the global namespace
using ::rand;
      ~~^
/usr/include/c++/v1/cstdlib:119:9: error: no member named 'calloc' in the global namespace
using ::calloc;
      ~~^
/usr/include/c++/v1/cstdlib:118:9: error: no member named 'srand' in the global namespace
using ::srand;
      ~~^
/usr/include/c++/v1/cstdlib:120:9: error: no member named 'free' in the global namespace
using ::free;
      ~~^
/usr/include/c++/v1/cstdlib:119:9: error: no member named 'calloc' in the global namespace
using ::calloc;
      ~~^
/usr/include/c++/v1/cstdlib:121:9: error: no member named 'malloc' in the global namespace
using ::malloc;
      ~~^
/usr/include/c++/v1/cstdlib:120:9: error: no member named 'free' in the global namespace
using ::free;
      ~~^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
/usr/include/c++/v1/cstdlib:121:9: error: no member named 'malloc' in the global namespace
using ::malloc;
      ~~^
[ 44%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/device.cpp.o
fatal error: too many [ 48%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/runall.cpp.o
errors emitted, stopping now [-ferror-limit=]
[ 52%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/string_utils.cpp.o
[ 56%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/logging.cpp.o
[ 60%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/main.cpp.o
[ 64%] Building CXX object amp-conformance/CMakeFiles/amptest.dir/amp_test_lib/src/context.cpp.o
20 errors generated.
lib/CMakeFiles/hcc-config.dir/build.make:62: recipe for target 'lib/CMakeFiles/hcc-config.dir/mcwamp_main.cpp.o' failed
make[2]: *** [lib/CMakeFiles/hcc-config.dir/mcwamp_main.cpp.o] Error 1
CMakeFiles/Makefile2:303: recipe for target 'lib/CMakeFiles/hcc-config.dir/all' failed
make[1]: *** [lib/CMakeFiles/hcc-config.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
20 errors generated.
lib/CMakeFiles/clamp-config.dir/build.make:62: recipe for target 'lib/CMakeFiles/clamp-config.dir/mcwamp_main.cpp.o' failed
make[2]: *** [lib/CMakeFiles/clamp-config.dir/mcwamp_main.cpp.o] Error 1
CMakeFiles/Makefile2:192: recipe for target 'lib/CMakeFiles/clamp-config.dir/all' failed
make[1]: *** [lib/CMakeFiles/clamp-config.dir/all] Error 2
[ 68%] Linking CXX static library libmcwamp_atomic.a
[ 68%] Built target mcwamp_atomic
[ 72%] Linking CXX shared library ../libmcwamp_cpu.so
[ 76%] Linking CXX static library libmcwamp.a
[ 76%] Built target mcwamp
[ 76%] Built target mcwamp_cpu
[ 80%] Linking CXX shared library ../libhc_am.so
[ 80%] Built target hc_am
[ 84%] Linking CXX static library ../lib/libamptest.a
[ 84%] Built target amptest
[ 88%] Linking CXX shared library ../libmcwamp_hsa.so
[ 88%] Built target mcwamp_hsa
[ 92%] Linking CXX static library ../../lib/libmcwamp_gtest.a
[ 92%] Built target mcwamp_gtest
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Can't access global variables

Simple program produces incorrect results. (see attached file)

#include <amp.h>
#include <iostream>

::std::atomic_uint sum;

int main(void)
{
        sum = 1;
        parallel_for_each(concurrency::extent<1>(100),
                          [&](concurrency::index<1> i) restrict (amp)
        {
                sum += 1;
        });
        ::std::cout << sum << "\n";
        return 0;
}

$ ./test
1

A simple modification that adds a local pointer gives the expected result:

#include <amp.h>

#include <iostream>

::std::atomic_uint sum;

int main(void)
{
        sum = 1;
        ::std::atomic_uint *sum_l = &sum;
        parallel_for_each(concurrency::extent<1>(100),
                          [&](concurrency::index<1> i) restrict (amp)
        {
                *sum_l += 1;
        });
        ::std::cout << sum << "\n";
        return 0;
}

$ ./test
101

I'm using hcc configured with the amdgpu LLVM backend and the latest git versions of llvm and lld. The result is the same on both Kaveri and Carrizo.

Note: CBackend segfaults in the second case, but the resulting binary works.

C++14 support?

Is C++14 support planned for hc? I would quite like this.

Compile problem

I use 'tile_static std::atomic_int' to define an int variable in my code, but compilation fails as follows:

/opt/rocm/hcc-hsail/compiler/bin/clamp-device: line 167: 13996 Killed $OPT -load $LIB/LLVMPromote.so -load $LIB/LLVMEraseNonkernel.so -load $LIB/LLVMTileUniform.so -promote-globals -promote-privates -erase-nonkernels -tile-uniform -malloc-select -dce -globaldce -S -o $2.promote.ll.orig < $1

Here is my code snippet:
..........
............
tile_static std::atomic_int outlier_block_count[1];
........
.......
....
atomic_fetch_add(&outlier_block_count[0], 1); // I think this is the problem point
......

If I comment out the 'atomic_fetch_add(&outlier_block_count[0], outlier_local_count);' line, the code compiles without error.
Is there anything special to note when using a tile_static std::atomic_int variable?

Unable to compile: "fatal error: 'amp.h' file not found"

Hello, I am exploring HPC code written with C++ AMP.

I am having trouble setting up an HCC build environment on Ubuntu. Could you help me?
I am Japanese, sorry for my poor English.

In summary, I couldn't compile this test case with the HCC packaged in ROCm, as explained on this site.
Here is the full code.

#include <iostream>
#include <amp.h>


int main()
{
    std::cout << "hello\n";
    return 0;
}

The compile command is:
hcc `hcc-config --cxxflags --ldflags` test.cpp
It produces this output:
test.cpp:3:10: fatal error: 'amp.h' file not found
My environment is as follows.

CPU : Core i5-6400
memory : DDR4 8GB
OS : Ubuntu 16.04 kernel = 4.4.0-kfd-compute-rocm-rel-1.2-31
GPU : R9 390
graphics driver : Radeon Pro GPU driver from AMD's site

I followed the procedure below:

1, Install Ubuntu 16.04 with the options to install third-party software and system updates.
2, Run apt-get update.
3, Install the ROCm package by following this site.
4, Install the AMD driver above.
5, export PATH=$PATH:/opt/rocm/hcc/compiler/bin
6, Compile the test code.

When I exclude "amp.h", this code runs fine.

Could I get some instructions?
Thank you for reading.

looping through all devices causes pthread_detach to segfault

Hi - I am trying to contribute a stream benchmark using HCC
psteinb/GPU-STREAM/tree/hcc
I got the code to compile, but a simple loop through all devices

void listDevices(void)
{
  // Get number of devices
  std::vector<hc::accelerator> accs = hc::accelerator::get_all();

  // Print device names
  if (accs.empty())
  {
    std::cerr << "No devices found." << std::endl;
  }
  else
  {
    std::cout << std::endl;
    std::cout << "Devices:" << std::endl;
    for (int i = 0; i < accs.size(); i++)
    {
      std::cout << i << ": " << getDeviceName(accs[i]) << std::endl;
    }
    std::cout << std::endl;
  }
}

causes a segfault that originates in Kalmar::HSAContext::~HSAContext:

(ROCm-gdb) bt
#0  0x00007ffff7480545 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fffeeadc204 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime-tools64.so.1
#2  0x00007fffeeadd12e in OnUnload () from /opt/rocm/hsa/lib/libhsa-runtime-tools64.so.1
#3  0x00007ffff669a162 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#4  0x00007ffff669a24c in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#5  0x00007ffff669a488 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#6  0x00007ffff669028f in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#7  0x00007fffee6db253 in HsaDebugAgent_hsa_shut_down () at HSAIntercept.cpp:49
#8  0x00007ffff6a80bd1 in Kalmar::HSAContext::~HSAContext() () from /opt/rocm/hcc-lc/lib/libmcwamp_hsa.so
#9  0x00007ffff6ed8259 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007ffff6ed82a5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x000000000040ebc5 in parseArguments (argc=2, argv=0x7fffffffe428) at /home/steinb95/development/gpu_stream/main.cpp:279
#12 0x000000000040ea78 in main (argc=2, argv=0x7fffffffe428) at /home/steinb95/development/gpu_stream/main.cpp:63

listDevices is called by parseArguments in main.cpp:279. What am I missing?

std::bad_function_call error on examples in simple program

Hello everyone

I installed the hcc compiler from the Debian repository. I tried to run a small example to test whether I can use it properly.

Here's my code:

#include <hc.hpp>
#include <iostream>
using namespace concurrency;

int main(int argc, char ** argv)
{


    char c;
    std::cin >> c;

    return 0;
}

Then I build it as shown in the docs:
hcc `hcc-config --cxxflags --ldflags` ../src/_SandboxTEST/test.cpp -o test

However, when I try to run the result with ./test, I get:

terminating with uncaught exception of type std::bad_function_call: bad_function_call
Abandon

Does anybody have an idea what it could be?

Thank you

Plumax

RFE: select GPU devices using GPU_DEVICE_ORDINAL

It is often beneficial to select/restrict GPU devices exposed to an application, just like it is done with the proprietary AMD OpenCL stack's GPU_DEVICE_ORDINAL env. var.

Please implement this feature for the ROC runtime too.

(The issue originally came up in the context of #197)
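
To illustrate the requested behaviour, here is a rough sketch of how an ordinal list could be read and applied. It is not how the ROC runtime or the OpenCL stack implements it: the comma-separated format (for example "0,2"), the helper names and the use of plain device-name strings are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Parse a comma-separated list of ordinals such as "0,2".
// Entries that are empty, non-numeric or unreasonably long are skipped.
inline std::vector<std::size_t> parseOrdinals(const std::string& value) {
  std::vector<std::size_t> ordinals;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    const bool numeric =
        !token.empty() && token.size() <= 9 &&
        token.find_first_not_of("0123456789") == std::string::npos;
    if (numeric) ordinals.push_back(std::stoul(token));
  }
  return ordinals;
}

// Read the list from the environment; an unset variable yields an empty list.
inline std::vector<std::size_t> ordinalsFromEnvironment() {
  const char* raw = std::getenv("GPU_DEVICE_ORDINAL");
  return raw ? parseOrdinals(raw) : std::vector<std::size_t>();
}

// Keep only the listed devices. An empty list means "no restriction".
inline std::vector<std::string> selectDevices(
    const std::vector<std::string>& all,
    const std::vector<std::size_t>& ordinals) {
  if (ordinals.empty()) return all;
  std::vector<std::string> selected;
  for (std::size_t ordinal : ordinals) {
    if (ordinal < all.size()) selected.push_back(all[ordinal]);
  }
  return selected;
}

// Small usage check with made-up device names.
inline void selectDevicesExample() {
  const std::vector<std::string> all = {"gpu0", "gpu1", "gpu2"};
  const std::vector<std::string> picked = selectDevices(all, parseOrdinals("0,2"));
  assert(picked.size() == 2);
  assert(picked[0] == "gpu0" && picked[1] == "gpu2");
}
```

In a runtime, the same filter would be applied to the list of discovered accelerators before it is handed to the application, so that unlisted devices are never visible.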

Build failure

When trying to build hcc:

[ 20%] Building CXX object lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.o
#0 0x000000000148093a llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/jvesely/hcc-clang_tot_upgrade/compiler/lib/Support/Unix/Signals.inc:406:0
#1 0x000000000147eaee llvm::sys::RunSignalHandlers() /home/jvesely/hcc-clang_tot_upgrade/compiler/lib/Support/Signals.cpp:45:0
#2 0x000000000147ec12 SignalHandler(int) /home/jvesely/hcc-clang_tot_upgrade/compiler/lib/Support/Unix/Signals.inc:246:0
#3 0x00007f41b392d5c0 __restore_rt (/lib64/libpthread.so.0+0x115c0)
#4 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/AST/Decl.cpp:2047:0
#5 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/AST/Decl.h:1551:0
#6 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:56:0
#7 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:96:0
#8 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:122:0
#9 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:114:0
#10 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:133:0
#11 0x00000000029a0800 /home/jvesely/hcc-clang_tot_upgrade/compiler/include/llvm/Support/Casting.h:298:0
#12 0x00000000029a0800 clang::VarDecl::hasInit() const /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/AST/Decl.cpp:2048:0
#13 0x00000000029a0849 clang::VarDecl::getInit() /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/AST/Decl.cpp:2056:0
#14 0x00000000029a7e60 clang::VarDecl::getAnyInitializer(clang::VarDecl const*&) const /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/AST/Decl.cpp:2039:0
#15 0x00000000021b3bef TrackMemoryOperator(clang::Stmt const*, std::vector<clang::Expr*, std::allocator<clang::Expr*> >&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Sema/SemaDecl.cpp:12833:0
#16 0x00000000021b3a5b TrackMemoryOperator(clang::Stmt const*, std::vector<clang::Expr*, std::allocator<clang::Expr*> >&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Sema/SemaDecl.cpp:12874:0
#17 0x00000000021bc836 /usr/include/c++/6.3.1/bits/stl_vector.h:656:0
#18 0x00000000021bc836 clang::Sema::ActOnFinishFunctionBody(clang::Decl*, clang::Stmt*, bool) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Sema/SemaDecl.cpp:13044:0
#19 0x0000000001fabf44 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseStmt.cpp:1964:0
#20 0x0000000001f21984 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/Parser.cpp:1223:0
#21 0x0000000001fb7f42 clang::Parser::ParseSingleDeclarationAfterTemplate(unsigned int, clang::Parser::ParsedTemplateInfo const&, clang::ParsingDeclRAIIObject&, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseTemplate.cpp:301:0
#22 0x0000000001fb91a1 clang::Parser::ParseTemplateDeclarationOrSpecialization(unsigned int, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseTemplate.cpp:149:0
#23 0x0000000001fb938f clang::Parser::ParseDeclarationStartingWithTemplate(unsigned int, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseTemplate.cpp:39:0
#24 0x0000000001f4474d clang::Parser::ParseDeclaration(unsigned int, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseDecl.cpp:1518:0
#25 0x0000000001f23951 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/Parser.cpp:809:0
#26 0x0000000001f549cd /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Sema/AttributeList.h:630:0
#27 0x0000000001f549cd /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Sema/AttributeList.h:721:0
#28 0x0000000001f549cd /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Parse/Parser.h:1264:0
#29 0x0000000001f549cd clang::Parser::ParseInnerNamespace(std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> >&, std::vector<clang::IdentifierInfo*, std::allocator<clang::IdentifierInfo*> >&, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> >&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseDeclCXX.cpp:219:0
#30 0x0000000001f54fd2 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Parse/Parser.h:834:0
#31 0x0000000001f54fd2 clang::Parser::ParseNamespace(unsigned int, clang::SourceLocation&, clang::SourceLocation) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseDeclCXX.cpp:200:0
#32 0x0000000001f44778 clang::Parser::ParseDeclaration(unsigned int, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseDecl.cpp:1531:0
#33 0x0000000001f23951 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/Parser.cpp:809:0
#34 0x0000000001f242b7 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Sema/AttributeList.h:630:0
#35 0x0000000001f242b7 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Sema/AttributeList.h:721:0
#36 0x0000000001f242b7 /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/include/clang/Parse/Parser.h:1264:0
#37 0x0000000001f242b7 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/Parser.cpp:624:0
#38 0x0000000001f18bab clang::ParseAST(clang::Sema&, bool, bool) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Parse/ParseAST.cpp:146:0
#39 0x0000000001bf3abf clang::CodeGenAction::ExecuteAction() /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/CodeGen/CodeGenAction.cpp:978:0
#40 0x000000000192e8f6 clang::FrontendAction::Execute() /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Frontend/FrontendAction.cpp:468:0
#41 0x0000000001902416 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/Frontend/CompilerInstance.cpp:951:0
#42 0x00000000019ae052 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:249:0
#43 0x0000000000900b78 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/tools/driver/cc1_main.cpp:221:0
#44 0x00000000008b6f93 ExecuteCC1Tool /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/tools/driver/driver.cpp:299:0
#45 0x00000000008b6f93 main /home/jvesely/hcc-clang_tot_upgrade/compiler/tools/clang/tools/driver/driver.cpp:380:0
#46 0x00007f41b246c401 __libc_start_main /usr/src/debug/glibc-2.24-33-ge9e69e4/csu/../csu/libc-start.c:323:0
#47 0x00000000008fee0a _start (/home/jvesely/hcc-clang_tot_upgrade-build/compiler/bin/clang-5.0+0x8fee0a)
Stack dump:
0.	Program arguments: /home/jvesely/hcc-clang_tot_upgrade-build/compiler/bin/clang-5.0 -cc1 -D__KALMAR_AMP__=1 -D__HCC_AMP__=1 -D__KALMAR_CPU__=1 -D__HCC_CPU__=1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -discard-value-names -main-file-name mcwamp.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -momit-leaf-frame-pointer -dwarf-column-info -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -coverage-notes-file /home/jvesely/hcc-clang_tot_upgrade-build/lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.gcno -resource-dir /home/jvesely/hcc-clang_tot_upgrade-build/compiler/bin/../lib/clang/5.0.0 -isystem /home/jvesely/hcc-clang_tot_upgrade/utils -D GTEST_HAS_TR1_TUPLE=0 -I /home/jvesely/hcc-clang_tot_upgrade-build/include -I /home/jvesely/hcc-clang_tot_upgrade/include -I /home/jvesely/hcc-clang_tot_upgrade/compiler/include -I /home/jvesely/hcc-clang_tot_upgrade-build/compiler/include -I /home/jvesely/hcc-clang_tot_upgrade-build/lib -D NDEBUG -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../include/c++/6.3.1 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../include/c++/6.3.1/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../include/c++/6.3.1/backward -internal-isystem /usr/local/include -internal-isystem /home/jvesely/hcc-clang_tot_upgrade-build/compiler/bin/../lib/clang/5.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /home/jvesely/hcc-clang_tot_upgrade-build/lib -ferror-limit 19 -fmessage-length 80 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o CMakeFiles/mcwamp.dir/mcwamp.cpp.o -x c++ /home/jvesely/hcc-clang_tot_upgrade/lib/mcwamp.cpp 
1.	/home/jvesely/hcc-clang_tot_upgrade-build/include/amp.h:6020:1: current parser token 'template'
2.	/home/jvesely/hcc-clang_tot_upgrade-build/include/amp.h:59:1: parsing namespace 'Concurrency'
3.	/home/jvesely/hcc-clang_tot_upgrade-build/include/amp.h:5985:73: parsing function body 'Concurrency::parallel_for_each'
clang-5.0: error: unable to execute command: Segmentation fault (core dumped)
clang-5.0: error: clang frontend command failed due to signal (use -v to see invocation)
HCC clang version 5.0.0  (based on HCC 1.0.17086-4ba7223-945c0e0-5dc7066 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jvesely/hcc-clang_tot_upgrade-build/compiler/bin
clang-5.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-5.0: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-5.0: note: diagnostic msg: /tmp/mcwamp-1dce37.cpp
clang-5.0: note: diagnostic msg: /tmp/mcwamp-1dce37.sh
clang-5.0: note: diagnostic msg: 

********************
lib/CMakeFiles/mcwamp.dir/build.make:62: recipe for target 'lib/CMakeFiles/mcwamp.dir/mcwamp.cpp.o' failed

Problem compiling OpenCV example.

Hi,
I'm trying to compile an OpenCV example using hcc and am getting a link error. It's a simple example that loads and saves an image. I compile it with the following command

hcc `hcc-config --cxxflags` test.cpp -lstdc++ -lopencv_highgui -lopencv_imgcodecs -lopencv_core

and get

/tmp/tmp.dCfO0GVjM9/test-5fdd60.host.o: In function `main':
/tmp/test-d09a2a.s.bc:(.text+0x65b): undefined reference to `cv::imwrite(cv::String const&, cv::_InputArray const&, std::__1::vector<int, std::__1::allocator<int> > const&)'

Using g++, it compiles without any issue. Is this an hcc problem? I'm using OpenCV 3.1 compiled locally on my machine. It's interesting that only imwrite is not found. imread, which I believe is in the same library libopencv_imgcodecs, is found without a problem.

test.txt

My ultimate goal is to convert the Udacity Parallel Programming class assignments to run on ROCm. I'm using HIP to convert the problem set 1 code from CUDA to hcc and am having a problem linking the OpenCV library.

ps1.zip

hcc-config output problem.

I installed the rocm-1.0 release from the repository and I'm having a problem compiling hccaffe from bitbucket/multicoreware. I think the problem is that hcc-config is returning an incorrect path.

:hccaffe$ hcc-config --build --cxxflags
-hc -std=c++amp -stdlib=libc++ -I/home/scchan/code/github/radeonopencompute/hcc.roc-1.0.20160413/hcc/include

Shouldn't the path be /opt/rocm/hcc/include ?

False positives in lambda capture

I ran into issues when capturing compound types that should be amp-compatible. I couldn't reproduce the issue outside HPX, and the error was present even for captures without any AMP/HC data structures.
Unfortunately, the error message is a little too vague ("variable captured by lambda has unsupported type in amp restricted code"). To properly understand what is happening I had to debug clang itself, and I found this piece of code:
https://github.com/RadeonOpenCompute/hcc-clang/blob/master/lib/Sema/SemaLambda.cpp#L1488

I think its purpose was to reject code that captures amp/hc::array by copy. However, it does a lot more than that:

  • each class whose name starts with "array" is rejected
  • each class that is a template whose argument happens to be hc::array is rejected, even if it's an amp-compatible object which doesn't store an hc::array inside
  • each class that is a template whose argument name happens to start with "array" is rejected, etc.
    Hence there are infinitely many ways to reproduce this bug.

I haven't worked with clang, hence I can't suggest an easy patch.
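
The false positives described above follow from matching on a name prefix instead of on the exact type. The sketch below shows the difference in plain C++. It is not the clang code linked in the report: both functions, and the list of exact names, are hypothetical and only illustrate the two matching strategies.

```cpp
#include <cassert>
#include <string>

// Hypothetical prefix check: flags any class whose name begins with "array".
inline bool rejectedByPrefix(const std::string& className) {
  return className.compare(0, 5, "array") == 0;
}

// Hypothetical stricter check: flags only exact, fully qualified names.
// The names listed here are assumptions chosen for the example.
inline bool rejectedByExactName(const std::string& qualifiedName) {
  return qualifiedName == "hc::array" || qualifiedName == "Concurrency::array";
}

// Small usage check showing where the two strategies disagree.
inline void captureCheckExample() {
  // An unrelated user type is caught by the prefix check ...
  assert(rejectedByPrefix("array_of_offsets"));
  // ... but not by the exact-name check.
  assert(!rejectedByExactName("my::array_of_offsets"));
  // The container itself is still rejected by the exact-name check.
  assert(rejectedByExactName("hc::array"));
}
```

A real fix would more likely compare declarations than strings, but the sketch shows why any type named like `array_of_offsets` would trip a prefix-based rule.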
