vc4cl's People

Contributors

angryfred, doe300, doublej472, echoix, edsonke, kernhanda, mathbarc, nomaddo, seppo, thijswithaar, xmikus01

vc4cl's Issues

Kernel Float Multiplication Bug

When I write code like this:

int i;
int j;
int k=i * j;

I don't get the correct result, but if I write

int i;
int k=i * 10;

The result is correct. What's wrong with multiplying two variables?
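The snippet above leaves i and j uninitialized, so their values are indeterminate (undefined behavior in C and OpenCL C). A reproduction should give both operands defined values and compare the device result against a host-computed one. A minimal host-side check in C follows; count_mul_mismatches is a hypothetical helper (not a VC4CL API), and device_k stands for the buffer read back from the kernel. Choose inputs whose products fit in 32 bits, because signed overflow is undefined. It is also worth including operands above 2^24: the VideoCore IV QPU's hardware integer multiply (mul24) only uses the low 24 bits of each operand, so a full 32-bit multiplication presumably has to be built from narrower operations by the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts elements where the device result device_k[idx]
   differs from the host product i[idx] * j[idx]. The reference product is
   formed in 64 bits so the check itself cannot overflow. */
size_t count_mul_mismatches(const int32_t *i, const int32_t *j,
                            const int32_t *device_k, size_t n) {
    size_t mismatches = 0;
    for (size_t idx = 0; idx < n; ++idx) {
        int64_t expected = (int64_t)i[idx] * (int64_t)j[idx];
        if ((int64_t)device_k[idx] != expected)
            ++mismatches;
    }
    return mismatches;
}
```

Useful inputs include small values (3 * 7), a product that is exactly 2^23 (2^12 * 2^11), an operand above 2^24 (2^25 * 3), and a large negative product (-70000 * 20000). Any nonzero count isolates the failing operand pairs.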

DepthMap (StereoBM) acceleration

Hello @doe300!

User Gavinmc42 on the Raspberry Pi forum (https://www.raspberrypi.org/forums/viewtopic.php?f=37&t=216940&p=1408611#p1408600) recommended that I contact you.

I'm a hardware developer, and our team built a stereoscopic camera based on the Raspberry Pi Compute Module 3 (http://stereopi.com).

For OpenCV and ROS implementations the depth map is critical, so I'm looking for ways to improve the Pi's performance on this task. I have some questions:

  1. Do you think VC4CL is applicable to this function (StereoBM, FindStereoCorrespondenceBM)? StereoSGBM looks far from real-time, but maybe you have some ideas or suggestions here.
  2. I plan to use the H264 encoder for depth map acceleration. The depth map resolution will be smaller in this case (~16 times), but the main ARM cores will not be loaded with this approach. Do you plan to use the H264 encoder in some way in VC4CL?
  3. StereoPi can be used not only in a stereoscopic setup, but also with two independent cameras simultaneously (for example back-to-back for 360 panoramas with 200 degree fisheye cameras, or with two different cameras such as the classic Pi camera and the NoIR edition). If you think these tasks can be accelerated too, I would appreciate your ideas.
  4. If these tasks interest you, I can send you one of our boards so you can conduct any experiments you need.

Regards,
Eugene
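Regarding question 1: OpenCV's StereoBM is a block-matching algorithm. For each pixel, it compares a small window of the left image with windows in the right image shifted along the same row, and it picks the shift (disparity) with the lowest sum of absolute differences (SAD). Below is a minimal CPU sketch of that core step in C. It assumes rectified, row-major 8-bit grayscale images. best_disparity is a hypothetical helper, not OpenCV code, and it omits StereoBM's pre-filtering, texture threshold and uniqueness checks. Each output pixel is computed independently, which is the kind of work an OpenCL kernel with one work-item per pixel would take on. This sketch does not answer whether the VideoCore IV is fast enough for real time.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns the disparity d in [0, max_disp] whose
   (2*half+1)^2 window in the right image, shifted left by d, has the
   smallest sum of absolute differences to the window around (x, y) in the
   left image. The caller must keep the window vertically inside the image
   and ensure x + half < w. Disparities that would push the right window
   past column 0 are skipped. Returns -1 if no disparity fits. */
int best_disparity(const uint8_t *left, const uint8_t *right, int w,
                   int x, int y, int half, int max_disp) {
    int best_d = -1;
    unsigned best_sad = UINT_MAX;
    for (int d = 0; d <= max_disp; ++d) {
        if (x - half - d < 0)
            break; /* larger d would only move further out of the image */
        unsigned sad = 0;
        for (int dy = -half; dy <= half; ++dy) {
            for (int dx = -half; dx <= half; ++dx) {
                int l = left[(y + dy) * w + (x + dx)];
                int r = right[(y + dy) * w + (x + dx - d)];
                sad += (unsigned)abs(l - r);
            }
        }
        if (sad < best_sad) { /* strict <, so ties keep the smaller d */
            best_sad = sad;
            best_d = d;
        }
    }
    return best_d;
}
```

A full disparity map is this function evaluated for every valid (x, y). An OpenCL port would typically map each pixel to one work-item.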

clBuildProgram ERROR

Hi, I am working on RPI support in the DarkNet fork at https://github.com/sowson/darknet. In the Makefile I enabled RPI=1 and disabled OpenCV (OPENCV=0). After installing VC4CL and building the project, I tried to run it, but without success: one of the OpenCL programs does not compile at runtime, while the others are fine.

root@raspberrypi:~/cifar# ../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/4882_frog.png
Device ID: 0
Device name: VideoCore IV GPU
Device vendor: Broadcom
Device opencl availability: OpenCL 1.2 VC4CL 0.4
Device opencl used: 0.4
Device double precision: NO
Device max group size: 12
Device address bits: 32
opencl_load: could not compile. error: CL_UNKNOWN_ERROR
CL_PROGRAM_BUILD_LOG:
[W] Thu Aug 9 08:06:40 2018: Register conflict resolver has exceeded its maximum rounds, there might still be errors!
[E] Thu Aug 9 08:06:40 2018: Error assigning local to register: %call59.%b.1
[E] Thu Aug 9 08:06:40 2018: (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xac [0x7695f918]
[E] Thu Aug 9 08:06:40 2018: (2) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::GraphColoring::toRegisterMap() const+0x150 [0x76a4d3b8]
[E] Thu Aug 9 08:06:40 2018: (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::CodeGenerator::generateInstructions(vc4c::Method&)+0x474 [0x76a3c900]
[E] Thu Aug 9 08:06:40 2018: (4) /usr/local/lib/libVC4CC.so.1.2 : +0x4983ac [0x769613ac]
[E] Thu Aug 9 08:06:40 2018: (5) /usr/local/lib/libVC4CC.so.1.2 : +0x498588 [0x76961588]
[E] Thu Aug 9 08:06:40 2018: (6) /usr/local/lib/libVC4CC.so.1.2 : +0x49942c [0x7696242c]
[E] Thu Aug 9 08:06:40 2018: (7) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76966214]
[E] Thu Aug 9 08:06:40 2018: (8) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x76964a1c]
[E] Thu Aug 9 08:06:40 2018: (9) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x769698b8]
[E] Thu Aug 9 08:06:40 2018: (10) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x769508e4]
[E] Thu Aug 9 08:06:40 2018: (11) /usr/local/lib/libVC4CC.so.1.2 : +0x486a70 [0x7694fa70]
[E] Thu Aug 9 08:06:40 2018: (12) /usr/local/lib/libVC4CC.so.1.2 : +0x48735c [0x7695035c]
[E] Thu Aug 9 08:06:40 2018: (13) /usr/local/lib/libVC4CC.so.1.2 : +0x4872c8 [0x769502c8]
[E] Thu Aug 9 08:06:40 2018: (14) /usr/local/lib/libVC4CC.so.1.2 : +0x4872a0 [0x769502a0]
[E] Thu Aug 9 08:06:40 2018: (15) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x763df9dc]
[E] Thu Aug 9 08:06:40 2018: Background worker threw error: Label/Register Mapping: There are erroneous register-associations!
[E] Thu Aug 9 08:06:40 2018: While running worker task: CodeGenerator
[E] Thu Aug 9 08:06:40 2018: Compiler threw exception: Label/Register Mapping: There are erroneous register-associations!

CODE:

__attribute__((noinline)) float get_pixel_kernel(__global float *image, int w, int h, int x, int y, int c);
__attribute__((noinline)) float4 rgb_to_hsv_kernel(float4 rgb);
__attribute__((noinline)) float4 hsv_to_rgb_kernel(float4 hsv);
__attribute__((noinline)) float bilinear_interpolate_kernel(__global float *image, int w, int h, float x, float y, int c);

__attribute__((noinline)) float get_pixel_kernel(__global float *image, int w, int h, int x, int y, int c)
{
    if(x < 0 || x >= w || y < 0 || y >= h) return 0;
    return image[x + w*(y + c*h)];
}

__attribute__((noinline)) float4 rgb_to_hsv_kernel(float4 rgb)
{
    float r = rgb.x;
    float g = rgb.y;
    float b = rgb.z;
    float h, s, v;
    float max = (r > g) ? ( (r > b) ? r : b) : ( (g > b) ? g : b);
    float min = (r < g) ? ( (r < b) ? r : b) : ( (g < b) ? g : b);
    float delta = max - min;
    v = max;
    if(max == 0){
        s = 0;
        h = -1;
    }else{
        s = delta/max;
        if(r == max){
            h = (g - b) / delta;
        } else if (g == max) {
            h = 2 + (b - r) / delta;
        } else {
            h = 4 + (r - g) / delta;
        }
        if (h < 0) h += 6;
    }
    return (float4) (h, s, v, 0.0);
}

__attribute__((noinline)) float4 hsv_to_rgb_kernel(float4 hsv)
{
    float h = hsv.x;
    float s = hsv.y;
    float v = hsv.z;
    float r, g, b;
    float f, p, q, t;
    if (s == 0) {
        r = g = b = v;
    } else {
        int index = (int) floor(h);
        f = h - index;
        p = v*(1-s);
        q = v*(1-s*f);
        t = v*(1-s*(1-f));
        if(index == 0){
            r = v; g = t; b = p;
        } else if(index == 1){
            r = q; g = v; b = p;
        } else if(index == 2){
            r = p; g = v; b = t;
        } else if(index == 3){
            r = p; g = q; b = v;
        } else if(index == 4){
            r = t; g = p; b = v;
        } else {
            r = v; g = p; b = q;
        }
    }
    r = (r < 0) ? 0 : ((r > 1) ? 1 : r);
    g = (g < 0) ? 0 : ((g > 1) ? 1 : g);
    b = (b < 0) ? 0 : ((b > 1) ? 1 : b);
    return (float4)(r, g, b, 0.0);
}

__attribute__((noinline)) float bilinear_interpolate_kernel(__global float *image, int w, int h, float x, float y, int c)
{
    int ix = (int) floor(x);
    int iy = (int) floor(y);
    float dx = x - ix;
    float dy = y - iy;
    float val1 = ((1-dy) * (1-dx) * get_pixel_kernel(image, w, h, ix, iy, c));
    float val2 = (dy * (1-dx) * get_pixel_kernel(image, w, h, ix, iy+1, c));
    float val3 = ((1-dy) * dx * get_pixel_kernel(image, w, h, ix+1, iy, c));
    float val4 = (dy * dx * get_pixel_kernel(image, w, h, ix+1, iy+1, c));
    float val = val1 + val2 + val3 + val4;
    return val;
}

__kernel void levels_image_kernel(__global float *image, __global float *rand, int batch, int w, int h, int train, float saturation, float exposure, float translate, float scale, float shift)
{
    int size = batch * w * h;
    int id = (get_group_id(0) + get_group_id(1)*get_num_groups(0)) * get_local_size(0) + get_local_id(0);
    if(id >= size) return;
    int x = id % w;
    id /= w;
    int y = id % h;
    id /= h;
    float rshift = rand[0];
    float gshift = rand[1];
    float bshift = rand[2];
    float r0 = rand[8*id + 0];
    float r1 = rand[8*id + 1];
    float r2 = rand[8*id + 2];
    float r3 = rand[8*id + 3];
    saturation = r0*(saturation - 1) + 1;
    saturation = (r1 > .5) ? 1./saturation : saturation;
    exposure = r2*(exposure - 1) + 1;
    exposure = (r3 > .5) ? 1./exposure : exposure;
    size_t offset = id * h * w * 3;
    image += offset;
    float r = image[x + w*(y + h*0)];
    float g = image[x + w*(y + h*1)];
    float b = image[x + w*(y + h*2)];
    float4 rgb = (float4)(r,g,b, 0.0);
    if(train){
        float4 hsv = rgb_to_hsv_kernel(rgb);
        hsv.y *= saturation;
        hsv.z *= exposure;
        rgb = hsv_to_rgb_kernel(hsv);
    } else {
        shift = 0;
    }
    image[x + w*(y + h*0)] = rgb.x*scale + translate + (rshift - .5)*shift;
    image[x + w*(y + h*1)] = rgb.y*scale + translate + (gshift - .5)*shift;
    image[x + w*(y + h*2)] = rgb.z*scale + translate + (bshift - .5)*shift;
}

__kernel void forward_crop_layer_kernel(__global float *input, __global float *rand, int size, int c, int h, int w, int crop_height, int crop_width, int train, int flip, float angle, __global float *output)
{
    int id = (get_group_id(0) + get_group_id(1)*get_num_groups(0)) * get_local_size(0) + get_local_id(0);
    if(id >= size) return;
    float cx = w/2.;
    float cy = h/2.;
    int count = id;
    int j = id % crop_width;
    id /= crop_width;
    int i = id % crop_height;
    id /= crop_height;
    int k = id % c;
    id /= c;
    int b = id;
    float r4 = rand[8*b + 4];
    float r5 = rand[8*b + 5];
    float r6 = rand[8*b + 6];
    float r7 = rand[8*b + 7];
    float dw = (w - crop_width)*r4;
    float dh = (h - crop_height)*r5;
    flip = (flip && (r6 > .5));
    angle = 2*angle*r7 - angle;
    if(!train){
        dw = (w - crop_width)/2.;
        dh = (h - crop_height)/2.;
        flip = 0;
        angle = 0;
    }
    input += w*h*c*b;
    float x = (flip!=0) ? w - dw - j - 1 : j + dw;
    float y = i + dh;
    float rx = cos(angle)*(x-cx) - sin(angle)*(y-cy) + cx;
    float ry = sin(angle)*(x-cx) + cos(angle)*(y-cy) + cy;
    output[count] = bilinear_interpolate_kernel(input, w, h, rx, ry, k);
}

terminate called without an active exception
Aborted

EDIT: attribute is an __attribute__

Can you help? The CODE section above contains the code I am trying to build. Thanks!

VC4CL installation error

Hello.

I was able to build VC4CLStdLib, VC4C and VC4CL.
I installed both VC4CLStdLib and VC4C with:

$ sudo dpkg -i /tmp/vc4cl-stdlib.deb
$ sudo dpkg -i /tmp/vc4c.deb 

But when I install VC4CL with

$ sudo dpkg -i /tmp/vc4cl.deb

I get:

$ dependency problems prevent configuration of vc4cl:
$   vc4cl depends on opencl-c-headers; however Package opencl-c-headers is not installed.

This is strange, because I've installed it with:

$ sudo apt-get install opencl-headers

Any ideas?
Thanks in advance.

Performance questions

Hello.

I'm optimizing a RANSAC algorithm using Videocore IV and VC4CL.
However, my kernel runs slower on the GPU than on the CPU.

So I need some help with a few performance questions.

  • Since VC4CL only allows a maximum work-group size of 12, I'm using a three-dimensional range with a global size of 12x12x12 and a local size of 1x1x1. Is this a good option, or would, for example, a global size of 6x6x6 with a local size of 2x2x2 be better?

  • Is there any data type that improves performance, like uint8, uint16, float8, float16, etc.?

  • I notice that additions, multiplications and divisions take some time. Is it better to perform these operations using only integers, or should I use floats?

Thanks in advance.
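On the first question: clinfo output elsewhere on this page reports "Device max group size: 12" for VC4CL, which matches the 12 QPUs of the VideoCore IV. OpenCL 1.2 requires each global dimension to be a multiple of the corresponding local dimension, and the product of the local dimensions must not exceed that maximum. A local size of 1x1x1 is therefore valid, but each work-group then holds a single work-item. Below is a small C sketch of one way to pick the largest local size that fits. pick_local_size is a hypothetical helper, and whether fuller work-groups actually run faster on VC4CL should be measured rather than assumed.

```c
#include <stddef.h>

/* Hypothetical helper: largest local size <= max_local that evenly divides
   global (assumes global > 0). For a 3-D range such as 12x12x12 it can be
   applied to one dimension while the others use a local size of 1, so the
   product of the local sizes stays <= max_local. */
size_t pick_local_size(size_t global, size_t max_local) {
    size_t l = max_local < global ? max_local : global;
    for (; l > 1; --l) {
        if (global % l == 0)
            return l;
    }
    return 1; /* e.g. a prime global size larger than max_local */
}
```

For the 12x12x12 range, pick_local_size(12, 12) yields 12, which gives a local size of 12x1x1. For a 1-D range of 1000 items with a maximum of 12, it yields 10.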

Migrate to new docker image

Currently, the problem is

nomaddo@nomaddo-AS:~/idein/VC4CL$ circleci build

====>> Spin up Environment
Build-agent version 0.0.4704-deba4df (2018-02-21T21:01:29+0000)
Starting container nomaddo/cross-rpi:0.1

Using build environment variables:
  BASH_ENV=/tmp/.bash_env-localbuild-1519377596
  CI=true
  CIRCLECI=true
  CIRCLE_BRANCH=master
  CIRCLE_BUILD_NUM=
  CIRCLE_JOB=build
  CIRCLE_NODE_INDEX=0
  CIRCLE_NODE_TOTAL=1
  CIRCLE_REPOSITORY_URL=http://github.com/doe300/VC4CL.git
  CIRCLE_SHA1=68be86b590b85eb37862b01f50753a9ff5af3b48
  CIRCLE_SHELL_ENV=/tmp/.bash_env-localbuild-1519377596
  CIRCLE_WORKING_DIRECTORY=~/project

====>> Checkout code
  #!/bin/sh
mkdir -p /root/project && cp -r /tmp/_circleci_local_build_repo/. /root/project
====>> get vc4c url
  #!/bin/bash -eo pipefail
curl "https://circleci.com/api/v1.1/project/github/doe300/VC4C/latest/artifacts?branch=master&filter=successful" --output /tmp/dump
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1089  100  1089    0     0   1289      0 --:--:-- --:--:-- --:--:--  1288
====>> get vc4cl-stdlib package
  #!/bin/bash -eo pipefail
wget -O /tmp/vc4cl-stdlib.deb $(python .circleci/get_url.py "vc4cl-stdlib-" "/tmp/dump")
--2018-02-23 09:19:59--  https://76-106166771-gh.circle-artifacts.com/0/root/project/vc4cl-stdlib-0.4-Linux.deb
Resolving 76-106166771-gh.circle-artifacts.com (76-106166771-gh.circle-artifacts.com)... 34.194.4.99, 34.198.74.173, 52.44.239.229, ...
Connecting to 76-106166771-gh.circle-artifacts.com (76-106166771-gh.circle-artifacts.com)|34.194.4.99|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-debian-package]
Saving to: '/tmp/vc4cl-stdlib.deb'

/tmp/vc4cl-stdlib.d     [  <=>               ] 103.90K   294KB/s    in 0.4s    

2018-02-23 09:20:01 (294 KB/s) - '/tmp/vc4cl-stdlib.deb' saved [106390]

====>> get vc4c package
  #!/bin/bash -eo pipefail
wget -O /tmp/vc4c.deb $(python .circleci/get_url.py "vc4c-" "/tmp/dump")
--2018-02-23 09:20:01--  https://76-106166771-gh.circle-artifacts.com/0/root/project/vc4c-0.4-Linux.deb
Resolving 76-106166771-gh.circle-artifacts.com (76-106166771-gh.circle-artifacts.com)... 52.72.204.202, 52.44.239.229, 34.198.74.173, ...
Connecting to 76-106166771-gh.circle-artifacts.com (76-106166771-gh.circle-artifacts.com)|52.72.204.202|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-debian-package]
Saving to: '/tmp/vc4c.deb'

/tmp/vc4c.deb           [      <=>           ]   7.84M  4.21MB/s    in 1.9s    

2018-02-23 09:20:04 (4.21 MB/s) - '/tmp/vc4c.deb' saved [8223320]

====>> unpack dependencies
  #!/bin/bash -eo pipefail
dpkg-deb -x /tmp/vc4cl-stdlib.deb / && dpkg-deb -x /tmp/vc4c.deb /
====>> configure
  #!/bin/bash -eo pipefail
cmake . -DCROSS_COMPILE=ON -DBUILD_TESTING=ON -DREGISTER_POKE_KERNELS=ON
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29") 
-- Cross compiling for Raspbian with compiler: /usr/bin/arm-linux-gnueabihf-g++
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/arm-linux-gnueabihf-gcc
-- Check for working C compiler: /usr/bin/arm-linux-gnueabihf-gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/arm-linux-gnueabihf-g++
-- Check for working CXX compiler: /usr/bin/arm-linux-gnueabihf-g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
CMake Error at /usr/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
  Could NOT find OpenCL (missing: OpenCL_LIBRARY) (found version "2.0")
Call Stack (most recent call first):
  /usr/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.7/Modules/FindOpenCL.cmake:127 (find_package_handle_standard_args)
  CMakeLists.txt:66 (find_package)


-- Configuring incomplete, errors occurred!
See also "/root/project/CMakeFiles/CMakeOutput.log".
Error: Exited with code 1
Step failed
====>> Uploading artifacts
Uploading /root/project/build/libVC4CL.so.0.4 to root/project/build/libVC4CL.so.0.4
  No artifact files found at /root/project/build/libVC4CL.so.0.4
====>> Uploading artifacts
Uploading /root/project/build/test/TestVC4CL to root/project/build/test/TestVC4CL
  No artifact files found at /root/project/build/test/TestVC4CL
====>> Uploading artifacts
Uploading /root/project/vc4cl-0.4-Linux.deb to root/project/vc4cl-0.4-Linux.deb
  No artifact files found at /root/project/vc4cl-0.4-Linux.deb
Task failed

I installed opencl-headers (see https://github.com/nomaddo/cross-rpi/blob/master/Dockerfile#L36).
I am not sure why it failed. @doe300 Do you have any idea?

P.S.
Now, in my Dockerfile, I no longer build SPIRV-LLVM; instead I download it from an archive.
Auto-building of docker images works. You can push nomaddo/cross-rpi and try images.

VC4CL clCompileProgram failed with "No matching precompiler available" error

I compiled and installed VC4CL on a RPI v3 with Debian stretch. Running clinfo under root works.

When I run my own OpenCL code (mcxcl: http://github.com/fangq/mcxcl) using vc4cl, I get a "No matching precompiler available" error; see the log below:

==============================================================================
- code name: [Vanilla MCXCL] compiled with OpenCL version [1]
- compiled with: [RNG] Logistic-Lattice [Seed Length] 5
initializing streams ...	terminate called without an active exception
init complete : 0 ms
Building kernel with option: -cl-mad-enable 
Kernel build log:
[E] Sat Jan 27 00:08:04 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xac [0x7668e23c]
[E] Sat Jan 27 00:08:04 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Compiler::compile(std::istream&, std::ostream&, vc4c::Configuration, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&)+0x1e0 [0x76690608]
[E] Sat Jan 27 00:08:04 2018:  (3) /usr/local/lib/libVC4CL.so : +0x4a630 [0x76983630]
[E] Sat Jan 27 00:08:04 2018:  (4) /usr/local/lib/libVC4CL.so : vc4cl::Program::compile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, vc4cl::object_wrapper<vc4cl::Program>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, vc4cl::object_wrapper<vc4cl::Program> > > > const&, void (*)(_cl_program*, void*), void*)+0xd8 [0x76984bc4]
[E] Sat Jan 27 00:08:04 2018:  (5) /usr/local/lib/libVC4CL.so : VC4CL_clCompileProgram+0xe88 [0x76987830]
[E] Sat Jan 27 00:08:04 2018:  (6) /usr/local/lib/libVC4CL.so : VC4CL_clBuildProgram+0x1a8 [0x76988468]
[E] Sat Jan 27 00:08:04 2018:  (7) ../../bin/mcxcl() [0x13834]
[E] Sat Jan 27 00:08:04 2018: Compiler threw exception: Pre-compilation: No matching precompiler available!


MCX ERROR(2):Error: Failed to build program executable! in unit mcx_host.cpp:450
Aborted

My OpenCL kernel has been tested on the NVIDIA, Intel and AMD OpenCL implementations. I am wondering whether the error above is caused by my setup or by an incompatibility between the kernel and vc4cl.

Please let me know if you have any suggestions on how to solve this issue. Thanks!

To reproduce this issue, run the commands below as root:

git clone https://github.com/fangq/mcxcl.git 
cd mcxcl/src
make clean all
cd ../example/benchmark
LD_LIBRARY_PATH=/usr/local/lib ./run_benchmark1.sh -n 1e5 

Set milestones: support CLBlast?

I think supporting practical libraries that use OpenCL is one of the big milestones.
CLBlast or clBLAS would probably be suitable for showing benchmark results and the project's usefulness.

Are you interested in such projects and in setting milestones?
I want to contribute to such practical uses.

Precompilation error on tests

Hello!

I set everything up correctly (clinfo runs fine), but when I compile the tests in the VC4C/example folder I always get precompilation errors.

The compilation command I'm using is:

VC4C --hex -o /dev/null ./example/file.cl

I'm running this in a Raspberry Pi 3B.

Any ideas?
Thanks in advance.

Running with DeepSpeech (TensorFlow OpenCL/ComputeCpp)

I'm currently trying to assess what we can expect from this setup for DeepSpeech, which relies on TensorFlow with ComputeCpp. I have been able to cross-build the driver, and most of the TestVC4C tests do run (properly or not). That means I can see clang doing its job and compiling some CL code.

The GPU is also visible by computecpp_info.

Now I'm trying to run our code on top of that. So far it's not very successful, but in an unexpected way: as documented in codeplaysoftware/computecpp-sdk#117 (comment), ComputeCpp does see the GPU and makes use of it. But when I monitor the system, the deepspeech process just sits at 100%.

I don't see much clang activity, but I did spot a process llvm-spirv /tmp/vc4c-EgnXeW /dev/stdin being run. The file /tmp/vc4c-EgnXeW seems to be non-empty. There is no error when running, so I don't know whether something is going on.

The OpenCL kernels might be big (too big for the current limitations? I'm not sure how to check that), and/or the project might still be too young?

As a comparison, we are able to run the same stack with the Intel Neo driver on my laptop (i7-8650U) using the GPU. On the first run, the Intel driver compiles the OpenCL code and can cache it on disk; this takes ~220 s.

Resulting cl_cache for Intel is:

$ LC_ALL=C ll cl_cache/
total 16M
drwxr-xr-x 1 alex alex  700 May 23 16:31 .
drwxr-xr-x 1 alex alex  222 May 23 16:28 ..
-rw-r--r-- 1 alex alex  73K May 23 16:31 34dcc67199396f96.cl_cache
-rw-r--r-- 1 alex alex 734K May 23 16:30 3b459da76e3bb7c4.cl_cache
-rw-r--r-- 1 alex alex 6.7M May 23 16:30 3d76036abef34a33.cl_cache
-rw-r--r-- 1 alex alex 206K May 23 16:31 66637feca75f4aba.cl_cache
-rw-r--r-- 1 alex alex 443K May 23 16:30 6ad895b3f2af8633.cl_cache
-rw-r--r-- 1 alex alex 728K May 23 16:31 6d252b514c3b944f.cl_cache
-rw-r--r-- 1 alex alex 165K May 23 16:30 8a505eb1718a2ecd.cl_cache
-rw-r--r-- 1 alex alex 425K May 23 16:30 98e4482263ced159.cl_cache
-rw-r--r-- 1 alex alex 523K May 23 16:30 9e55731fa2c7901e.cl_cache
-rw-r--r-- 1 alex alex 124K May 23 16:30 a493ba4fcff0b7d1.cl_cache
-rw-r--r-- 1 alex alex 122K May 23 16:29 b1a69f78813be187.cl_cache
-rw-r--r-- 1 alex alex 108K May 23 16:31 d2e0ea42ff6b24ac.cl_cache
-rw-r--r-- 1 alex alex 344K May 23 16:31 ec50bb7158952ee2.cl_cache
-rw-r--r-- 1 alex alex 5.2M May 23 16:31 f4146d630607b157.cl_cache

I've let deepspeech run with the VC4 driver for ~120 minutes without any visible output or error. Is it possible that our code is too compute-intensive for now and is expected to take this long? Or could a silent error be breaking something?

As far as I could read in the available docs / wiki, I could not find anything (e.g., an environment variable) that could be used to get more information at runtime. I'm a bit reluctant to try a debug build, considering how slow things already are with a release build, but if that can provide useful feedback, I'd be glad to give it a try.

Bug in sin()

When I use the device-side sin() function, no matter how I try to isolate it or work around it, I get results that don't make sense. Here's my original kernel:

void kernel decomposeamp(global real1* stateProb, global real1* stateAngle, constant bitCapInt* bitCapIntPtr, global cmplx* nStateVec)
{
    bitCapInt ID, Nthreads, lcv;
    
    ID = get_global_id(0);
    Nthreads = get_global_size(0);
    bitCapInt maxQPower = bitCapIntPtr[0];
    real1 angle;
    for (lcv = ID; lcv < maxQPower; lcv += Nthreads) {
        angle = stateAngle[lcv];
        nStateVec[lcv] = sqrt(stateProb[lcv]) * sin((float2)(angle + M_PI_2_F, angle));
    }
}

In the cases I'm testing, stateProb should always be an array of values that are exactly (or extremely close to) 1 or 0. If I comment out the multiplication by the sin of the vector, my components all do come out as 1 or 0, via the sqrt. However, the values returned by sin seem to be not even close: for example, cos^2(x) + sin^2(x) > 2.7. By trigonometric identities, the vectorized sin() above should be equivalent to (float2)(cos(angle), sin(angle)), but I'm finding that the resulting norm (sum of the squared components) is around 2.7 or larger for a stateProb[lcv] of 1.

This doesn't seem to be limited to the vectorized version of sin() either: it also happens if I replace it with two separate calls that construct the two components separately. It also happens if I replace M_PI_2_F with a known good value for pi/2.

The kernel is not particularly complicated, and if I implement it host-side at the same point in the code, and use the host sin() instead, it works correctly.

(Thank you for the awesome compiler, by the way.) Any idea what's going on? Am I doing something wrong?
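Because sin(angle + pi/2) equals cos(angle), every element the kernel writes should satisfy re^2 + im^2 = stateProb[lcv]. A small host-side check in C that measures the worst deviation makes the problem easy to quantify after reading nStateVec back. max_norm_error is a hypothetical helper. It assumes that real1 is float and cmplx is float2, so that the buffer read back to the host is an interleaved array of (re, im) pairs.

```c
#include <stddef.h>

/* Hypothetical helper: returns the largest |re^2 + im^2 - prob| over n
   elements, where vec holds interleaved (re, im) pairs. Since the kernel
   writes sqrt(prob) * (cos(angle), sin(angle)) and cos^2 + sin^2 = 1,
   correct sin() results keep this close to 0. */
float max_norm_error(const float *prob, const float *vec, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float re = vec[2 * i];
        float im = vec[2 * i + 1];
        float err = re * re + im * im - prob[i];
        if (err < 0.0f)
            err = -err; /* absolute value without needing libm */
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With stateProb values of 1, a norm of about 2.7, as described above, would show up as an error of about 1.7.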

Raise exception after `main`?

My OpenCL program (which just calls test_mul24 from test_int.cl) raises the following exception:

terminate called without an active exception
Aborted

I am not sure what happened, but I see that ~thread is called from __run_exit_handlers (reached from _start).
Some threads are unfinished (related to #3?).

My program is below. It is a straightforward OpenCL program, so other programs probably raise the same exception.

#include <stdio.h>
#include <stdlib.h>

#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

#define MAX_SOURCE_SIZE (0x100000)

#define N 1024

/* Abort with a message if an OpenCL call does not return CL_SUCCESS. */
#define CHECK(err)                                                    \
  do {                                                                \
    cl_int check_err_ = (err);                                        \
    if (check_err_ != CL_SUCCESS) {                                   \
      fprintf(stderr, "%s:%d: OpenCL error %d\n", __FILE__, __LINE__, \
              (int)check_err_);                                       \
      exit(1);                                                        \
    }                                                                 \
  } while (0)

cl_int a[N];
cl_int b[N];

int main() {
  char fileName[] = "./test_int.cl";

  FILE *fp = fopen(fileName, "r");
  if (!fp) {
    fprintf(stderr, "Failed to load kernel\n");
    exit(1);
  }

  char * source_str = (char*)malloc(MAX_SOURCE_SIZE);
  size_t source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
  fclose(fp);

  cl_platform_id platform_id = NULL;
  cl_uint ret_num_platforms;
  cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
  CHECK(ret);
  cl_device_id device_id = NULL;
  cl_uint ret_num_devices;
  ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &ret_num_devices);
  CHECK(ret);

  cl_context context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);
  CHECK(ret);
  cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
  CHECK(ret);

  for (int i = 0; i < N; i++)
    a[i] = N;

  /* a and b hold cl_int values, so size the buffer with sizeof(cl_int). */
  cl_mem memobj = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_int) * N, NULL, &ret);
  CHECK(ret);
  CHECK(clEnqueueWriteBuffer(command_queue, memobj, CL_TRUE, 0, sizeof(cl_int) * N, a, 0, NULL, NULL));

  cl_program program = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
  CHECK(ret);
  ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
  if (ret != CL_SUCCESS) {
    /* Print the build log; VC4C reports its compilation errors there. */
    size_t log_size = 0;
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &log_size);
    char *log = (char*)malloc(log_size + 1);
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, log_size, log, NULL);
    log[log_size] = '\0';
    fprintf(stderr, "clBuildProgram failed (%d):\n%s\n", (int)ret, log);
    free(log);
    exit(1);
  }

  cl_kernel kernel = clCreateKernel(program, "test_mul24", &ret);
  CHECK(ret);
  cl_int arg0 = 2;

  CHECK(clSetKernelArg(kernel, 0, sizeof(cl_int), (void*)&arg0));
  CHECK(clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&memobj));
  size_t global_item_size[] = { N };
  CHECK(clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
                               global_item_size, NULL, 0, NULL, NULL));
  CHECK(clFinish(command_queue));
  CHECK(clEnqueueReadBuffer(command_queue, memobj, CL_TRUE, 0, sizeof(cl_int) * N, b, 0, NULL, NULL));

  /* Release everything in reverse order of creation. */
  clReleaseKernel(kernel);
  clReleaseProgram(program);
  clReleaseMemObject(memobj);
  clReleaseCommandQueue(command_queue);
  clReleaseContext(context);
  free(source_str);

  return 0;
}

TestVC4C & TestVC4CL Error

Hello! I followed the README file to build VC4CLStdLib, VC4C and VC4CL, and I got some really confusing results.
I didn't change CMakeLists.txt (in the top directory); I only turned off the multi-threading option in VC4C.
Then I ran
cmake .
make
make install

The output of TestVC4CL:


Test 'TestSystem::testGetSystemInfo()' failed!
	Suite: TestSystem
	File: TestSystem.cpp
	Line: 28
	Failure: Got 11, expected 2
Test 'TestSystem::testGetSystemInfo()' failed!
	Suite: TestSystem
	File: TestSystem.cpp
	Line: 30
	Failure: Got 14, expected 4
Test 'TestSystem::testGetSystemInfo()' failed!
	Suite: TestSystem
	File: TestSystem.cpp
	Line: 32
	Failure: Got 14, expected 3
Suite 'TestSystem' finished, 0/1 successful (0%) in 236 microseconds (0.236 ms).
./build/test/TestVC4CL: symbol lookup error: ./build/test/TestVC4CL: undefined symbol: clGetPlatformIDs

The VC4C CMake output:

 
-- VC4CL standard library headers found: /usr/local/include/vc4cl-stdlib/VC4CLStdLib.h
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
PING github.com (13.250.177.223) 56(84) bytes of data.

--- github.com ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

CMake Warning at CMakeLists.txt:96 (message):
  Building in off-line mode, some dependencies might not be up-to-date!


-- CLang compiler found: /usr/bin/clang-3.9
-- Using enhanced OpenCL features of Clang 3.9+!
-- LLVM-link found: /usr/bin/llvm-link
-- Compiling LLVM library front-end with LLVM in version 3.9.1 located in '/usr/lib/llvm-3.9/lib'
-- found clang-format: /usr/bin/clang-format
-- build deb package...
-- Debian package expects VC4CL standard library PCH to be located in: /usr/local/include/vc4cl-stdlib/VC4CLStdLib.h.pch
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pi/TestOpenCL/VC4C

The VC4CL CMake output:


-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29") 
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
PING github.com (13.250.177.223) 56(84) bytes of data.

--- github.com ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

CMake Warning at CMakeLists.txt:77 (message):
  Building in off-line mode, some dependencies might not be up-to-date!


-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/arm-linux-gnueabihf/libOpenCL.so (found version "2.0") 
-- Found headers for OpenCL up to version 2.0 located in /usr/include
-- VC4C library found: /home/pi/TestOpenCL/VC4C/build/libVC4CC.so
-- VC4C compiler header found at: /home/pi/TestOpenCL/VC4C/include/VC4C.h
-- Building with ICD support
-- Checking for one of the modules 'ocl-icd>=1.3'
-- Found Khronos ICD Loader in version 2.2.11 in /usr/lib/arm-linux-gnueabihf
-- Enabling register-poking to run kernels
-- found clang-format: /usr/bin/clang-format
-- build deb package...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pi/TestOpenCL/VC4CL

And the output of clinfo (built from its Git repository):
Number of platforms 0

Then I ran

sudo ./build/test/TestVC4C

and got the output below:


./example/fft2_2.cl
[E] Thu Jun  7 11:01:32 2018: Errors in precompilation:
[E] Thu Jun  7 11:01:32 2018: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-KQBI2v: error: input module is broken!
[E] Thu Jun  7 11:01:32 2018:  (1) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)+0xf4 [0x76b776a8]
[E] Thu Jun  7 11:01:32 2018:  (2) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : +0x6a27d0 [0x76e017d0]
[E] Thu Jun  7 11:01:32 2018:  (3) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkLLVMModules(std::vector, std::allocator > >&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x364 [0x76e02770]
[E] Thu Jun  7 11:01:32 2018:  (4) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkInStdlibModule(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x78 [0x76e01f9c]
[E] Thu Jun  7 11:01:32 2018:  (5) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), void (*)(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x60 [0x76e04ef0]
[E] Thu Jun  7 11:01:32 2018:  (6) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76e03fb0]
[E] Thu Jun  7 11:01:32 2018:  (7) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0xd0 [0x76e02e9c]
[E] Thu Jun  7 11:01:32 2018:  (8) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x5c [0x76e05130]
[E] Thu Jun  7 11:01:32 2018:  (9) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76e03d00]
[E] Thu Jun  7 11:01:32 2018:  (10) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Precompiler::run(std::unique_ptr >&, vc4c::SourceType, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > >)+0x4ac [0x76e09af4]
[E] Thu Jun  7 11:01:32 2018:  (11) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Precompiler::precompile(std::istream&, std::unique_ptr >&, vc4c::Configuration, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > > const&, vc4c::Optional, std::allocator > >)+0x158 [0x76e07a80]
[E] Thu Jun  7 11:01:32 2018:  (12) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Compiler::compile(std::istream&, std::ostream&, vc4c::Configuration, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > > const&)+0xc4 [0x76b79898]
[E] Thu Jun  7 11:01:32 2018:  (13) ./build/test/TestVC4C : RegressionTest::testRegression(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend)+0xb0 [0xbea78]
[E] Thu Jun  7 11:01:32 2018:  (14) ./build/test/TestVC4C : Test::Suite::TestMethod::TestMethod, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend>(std::__cxx11::basic_string, std::allocator > const&, void (Test::Suite::*)(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend), std::__cxx11::basic_string, std::allocator > const, std::__cxx11::basic_string, std::allocator > const, vc4c::Frontend const)::{lambda(Test::Suite*)#1}::operator()(Test::Suite*) const+0xbc [0xd8afc]
[E] Thu Jun  7 11:01:32 2018:  (15) ./build/test/TestVC4C : std::_Function_handler, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend>(std::__cxx11::basic_string, std::allocator > const&, void (Test::Suite::*)(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend), std::__cxx11::basic_string, std::allocator > const, std::__cxx11::basic_string, std::allocator > const, vc4c::Frontend const)::{lambda(Test::Suite*)#1}>::_M_invoke(std::_Any_data const&, Test::Suite*&&)+0x3c [0xda9b4]
[E] Thu Jun  7 11:01:32 2018:  (16) /home/pi/TestOpenCL/VC4C/build/cpptest-lite/src/cpptest-lite-project-build/libcpptest-lite.so.1.1.2 : Test::Suite::runTestMethod(Test::Suite::TestMethod const&)+0xc0 [0x767486ac]
[E] Thu Jun  7 11:01:32 2018:  (17) /home/pi/TestOpenCL/VC4C/build/cpptest-lite/src/cpptest-lite-project-build/libcpptest-lite.so.1.1.2 : Test::Suite::run(Test::Output&, bool)+0x98 [0x76748958]
[E] Thu Jun  7 11:01:32 2018:  (18) ./build/test/TestVC4C : Test::runSuites(int, char**)+0xae0 [0xdd0e4]
[E] Thu Jun  7 11:01:32 2018:  (19) ./build/test/TestVC4C : main+0xc54 [0xde024]
[E] Thu Jun  7 11:01:32 2018:  (20) /lib/arm-linux-gnueabihf/libc.so.6 : __libc_start_main+0x114 [0x73db4678]
[E] Thu Jun  7 11:01:32 2018: Compiler threw exception: Pre-compilation: Error in precompilation: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-KQBI2v: error: input module is broken!
Test-method 'RegressionTest::testRegression("./example/fft2_2.cl", "", 0)' failed with exception!
	Exception: Pre-compilation: Error in precompilation: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-KQBI2v: error: input module is broken!
	Errno: 0
	Error: Success
./example/fibonacci.cl
[E] Thu Jun  7 11:01:33 2018: Errors in precompilation:
[E] Thu Jun  7 11:01:33 2018: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-oqcFrK: error: input module is broken!
[E] Thu Jun  7 11:01:33 2018:  (1) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)+0xf4 [0x76b776a8]
[E] Thu Jun  7 11:01:33 2018:  (2) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : +0x6a27d0 [0x76e017d0]
[E] Thu Jun  7 11:01:33 2018:  (3) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkLLVMModules(std::vector, std::allocator > >&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x364 [0x76e02770]
[E] Thu Jun  7 11:01:33 2018:  (4) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkInStdlibModule(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x78 [0x76e01f9c]
[E] Thu Jun  7 11:01:33 2018:  (5) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), void (*)(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x60 [0x76e04ef0]
[E] Thu Jun  7 11:01:33 2018:  (6) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76e03fb0]
[E] Thu Jun  7 11:01:33 2018:  (7) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0xd0 [0x76e02e9c]
[E] Thu Jun  7 11:01:33 2018:  (8) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x5c [0x76e05130]
[E] Thu Jun  7 11:01:33 2018:  (9) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76e03d00]
[E] Thu Jun  7 11:01:33 2018:  (10) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Precompiler::run(std::unique_ptr >&, vc4c::SourceType, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > >)+0x4ac [0x76e09af4]
[E] Thu Jun  7 11:01:33 2018:  (11) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Precompiler::precompile(std::istream&, std::unique_ptr >&, vc4c::Configuration, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > > const&, vc4c::Optional, std::allocator > >)+0x158 [0x76e07a80]
[E] Thu Jun  7 11:01:33 2018:  (12) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::Compiler::compile(std::istream&, std::ostream&, vc4c::Configuration, std::__cxx11::basic_string, std::allocator > const&, vc4c::Optional, std::allocator > > const&)+0xc4 [0x76b79898]
[E] Thu Jun  7 11:01:33 2018:  (13) ./build/test/TestVC4C : RegressionTest::testRegression(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend)+0xb0 [0xbea78]
[E] Thu Jun  7 11:01:33 2018:  (14) ./build/test/TestVC4C : Test::Suite::TestMethod::TestMethod, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend>(std::__cxx11::basic_string, std::allocator > const&, void (Test::Suite::*)(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend), std::__cxx11::basic_string, std::allocator > const, std::__cxx11::basic_string, std::allocator > const, vc4c::Frontend const)::{lambda(Test::Suite*)#1}::operator()(Test::Suite*) const+0xbc [0xd8afc]
[E] Thu Jun  7 11:01:33 2018:  (15) ./build/test/TestVC4C : std::_Function_handler, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend>(std::__cxx11::basic_string, std::allocator > const&, void (Test::Suite::*)(std::__cxx11::basic_string, std::allocator >, std::__cxx11::basic_string, std::allocator >, vc4c::Frontend), std::__cxx11::basic_string, std::allocator > const, std::__cxx11::basic_string, std::allocator > const, vc4c::Frontend const)::{lambda(Test::Suite*)#1}>::_M_invoke(std::_Any_data const&, Test::Suite*&&)+0x3c [0xda9b4]
[E] Thu Jun  7 11:01:33 2018:  (16) /home/pi/TestOpenCL/VC4C/build/cpptest-lite/src/cpptest-lite-project-build/libcpptest-lite.so.1.1.2 : Test::Suite::runTestMethod(Test::Suite::TestMethod const&)+0xc0 [0x767486ac]
[E] Thu Jun  7 11:01:33 2018:  (17) /home/pi/TestOpenCL/VC4C/build/cpptest-lite/src/cpptest-lite-project-build/libcpptest-lite.so.1.1.2 : Test::Suite::run(Test::Output&, bool)+0x98 [0x76748958]
[E] Thu Jun  7 11:01:33 2018:  (18) ./build/test/TestVC4C : Test::runSuites(int, char**)+0xae0 [0xdd0e4]
[E] Thu Jun  7 11:01:33 2018:  (19) ./build/test/TestVC4C : main+0xc54 [0xde024]
[E] Thu Jun  7 11:01:33 2018:  (20) /lib/arm-linux-gnueabihf/libc.so.6 : __libc_start_main+0x114 [0x73db4678]
[E] Thu Jun  7 11:01:36 2018: Compiler threw exception: Pre-compilation: Error in precompilation: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-oqcFrK: error: input module is broken!
Test-method 'RegressionTest::testRegression("./example/fibonacci.cl", "", 0)' failed with exception!
	Exception: Pre-compilation: Error in precompilation: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-oqcFrK: error: input module is broken!
	Errno: 0
	Error: Success
./example/hello_world.cl
[E] Thu Jun  7 11:01:37 2018: Errors in precompilation:
[E] Thu Jun  7 11:01:37 2018: Expected no forward declarations!
!3 =  !{}
incorrect number of operands in llvm.ident metadata
!0 = !{i32 0, i32 1}
/usr/bin/llvm-link: /tmp/vc4c-Ky3hHy: error: input module is broken!
[E] Thu Jun  7 11:01:37 2018:  (1) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)+0xf4 [0x76b776a8]
[E] Thu Jun  7 11:01:37 2018:  (2) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : +0x6a27d0 [0x76e017d0]
[E] Thu Jun  7 11:01:37 2018:  (3) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkLLVMModules(std::vector, std::allocator > >&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x364 [0x76e02770]
[E] Thu Jun  7 11:01:37 2018:  (4) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : vc4c::precompilation::linkInStdlibModule(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x78 [0x76e01f9c]
[E] Thu Jun  7 11:01:37 2018:  (5) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), void (*)(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x60 [0x76e04ef0]
[E] Thu Jun  7 11:01:37 2018:  (6) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76e03fb0]
[E] Thu Jun  7 11:01:37 2018:  (7) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0xd0 [0x76e02e9c]
[E] Thu Jun  7 11:01:37 2018:  (8) /home/pi/TestOpenCL/VC4C/build/libVC4CC.so.1.2 : std::_Function_handler&&, std::__cxx11::basic_string, std::allocator > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), std::function

VC4CL BOINC compatibility

Hello everyone! I was wondering whether anyone has tried to use BOINC with GPU compute on the Raspberry Pi.

Having this capability would hugely improve the available computing power and would make cheap, energy-efficient hardware available to distributed computing projects, which could GREATLY help the scientific field.

Leaks device memory

Some applications leak device memory, e.g. OpenCL CTS test buffers/test_buffers buffer_copy.

Possible reasons:

  • client application does not free all resources properly
  • implementation does not set/handle the reference count properly
  • a combination of both

Need to make sure that all resources (especially device memory) are freed when the application exits!

NOTE: this could have something to do with the background thread sometimes being aborted (instead of exiting normally).
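The second possible reason listed above (the reference count not being set or handled properly) can be illustrated with a minimal sketch of reference counting, modeled loosely on the OpenCL retain/release rules (`clRetainMemObject`/`clReleaseMemObject`: an object is freed once its count drops to zero). The type `vc_buffer` and the functions `buffer_create`, `buffer_retain`, `buffer_release`, `live_buffer_count` and `report_leaks` are hypothetical names chosen for illustration, not part of VC4CL, and host `malloc` merely stands in for device memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical device-buffer object for illustration; not VC4CL's real type. */
typedef struct {
    unsigned ref_count;  /* 1 after creation, like an object from clCreateBuffer */
    size_t size;
    void *memory;        /* stands in for the GPU-side allocation */
} vc_buffer;

/* Number of buffers whose memory has not been freed yet. */
static int live_buffers = 0;

int live_buffer_count(void) { return live_buffers; }

vc_buffer *buffer_create(size_t size) {
    vc_buffer *buf = malloc(sizeof *buf);
    if (buf == NULL) return NULL;
    buf->memory = malloc(size);
    if (buf->memory == NULL) {
        free(buf);
        return NULL;
    }
    buf->ref_count = 1;
    buf->size = size;
    live_buffers++;
    return buf;
}

/* Every holder (application, kernel argument, queued command) takes a reference. */
void buffer_retain(vc_buffer *buf) {
    assert(buf != NULL && buf->ref_count > 0); /* debug check: must be a live buffer */
    buf->ref_count++;
}

/* Memory is freed only when the last reference is dropped. A missing
   release leaks the buffer; an extra release frees memory still in use. */
void buffer_release(vc_buffer *buf) {
    assert(buf != NULL && buf->ref_count > 0); /* debug check: must be a live buffer */
    if (--buf->ref_count == 0) {
        free(buf->memory);
        free(buf);
        live_buffers--;
    }
}

/* Can be registered with atexit() to report buffers still allocated at exit. */
void report_leaks(void) {
    if (live_buffers != 0)
        fprintf(stderr, "leak: %d buffer(s) still allocated at exit\n", live_buffers);
}
```

Usage: a buffer that is created once and retained once (for example while it is set as a kernel argument) has to be released twice before `live_buffer_count()` returns to 0. Registering `report_leaks` with `atexit()` prints a warning if any buffer is still allocated when the application exits normally; it will not run if the process is aborted.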

clpeak has a lot of errors :(

[W] Tue Jun 19 18:10:46 2018: 64-bit operations are not supported by the VideoCore IV architecture, further compilation may fail!
[W] Tue Jun 19 18:10:46 2018: 64-bit operations are not supported by the VideoCore IV architecture, further compilation may fail!
[W] Tue Jun 19 18:10:46 2018: 64-bit operations are not supported by the VideoCore IV architecture, further compilation may fail!
[E] Tue Jun 19 18:11:23 2018: Failed to in-line or intrinsify function-call: <16 x f16> %call31 = <16 x f16> madDhS_S_(<16 x f16> %splat.splat, <16 x f16> %vecinit29, <16 x f16> %splat.splat)
[W] Tue Jun 19 18:11:23 2018: Candidates:
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_local_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_group_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z14get_local_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_local_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z13get_global_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_local_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_local_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_local_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_local_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_global_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z15get_global_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_global_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_global_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_global_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_global_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v1(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: f32 _Z3madfff(f32 %a, f32 %b, f32 %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v2(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <2 x f32> _Z3madfS_S_(<2 x f32> %a, <2 x f32> %b, <2 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v4(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <4 x f32> _Z3madfS_S_(<4 x f32> %a, <4 x f32> %b, <4 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v8(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <8 x f32> _Z3madfS_S_(<8 x f32> %a, <8 x f32> %b, <8 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v16(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <16 x f32> _Z3madfS_S_(<16 x f32> %a, <16 x f32> %b, <16 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v1(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v2(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v4(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v8(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v16(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v1(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v2(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v4(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v8(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v16(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v1(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v2(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v4(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v8(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v16(i32* %ptr, i32 %_A
[E] Tue Jun 19 18:11:23 2018: Failed to in-line or intrinsify function-call: <8 x f16> %call15 = <8 x f16> madDhS_S_(<8 x f16> %splat.splat, <8 x f16> %vecinit13, <8 x f16> %splat.splat)
[W] Tue Jun 19 18:11:23 2018: Candidates:
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_local_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_group_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z14get_local_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_local_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z13get_global_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_local_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_local_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_local_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_local_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_global_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z15get_global_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_global_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_global_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_global_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_global_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v1(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: f32 _Z3madfff(f32 %a, f32 %b, f32 %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v2(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <2 x f32> _Z3madfS_S_(<2 x f32> %a, <2 x f32> %b, <2 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v4(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <4 x f32> _Z3madfS_S_(<4 x f32> %a, <4 x f32> %b, <4 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v8(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <8 x f32> _Z3madfS_S_(<8 x f32> %a, <8 x f32> %b, <8 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v16(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <16 x f32> _Z3madfS_S_(<16 x f32> %a, <16 x f32> %b, <16 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v1(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v2(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v4(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v8(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v16(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v1(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v2(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v4(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v8(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v16(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v1(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v2(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v4(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v8(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v16(i32* %ptr, i32 %_A
[E] Tue Jun 19 18:11:23 2018: Failed to in-line or intrinsify function-call: <4 x f16> %call7 = <4 x f16> madDhS_S_(<4 x f16> %splat.splat, <4 x f16> %vecinit5, <4 x f16> %splat.splat)
[W] Tue Jun 19 18:11:23 2018: Candidates:
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_local_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_group_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z14get_local_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_local_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z13get_global_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_local_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_local_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_local_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_local_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_global_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z15get_global_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_global_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_global_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_global_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_global_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v1(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: f32 _Z3madfff(f32 %a, f32 %b, f32 %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v2(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <2 x f32> _Z3madfS_S_(<2 x f32> %a, <2 x f32> %b, <2 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v4(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <4 x f32> _Z3madfS_S_(<4 x f32> %a, <4 x f32> %b, <4 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v8(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <8 x f32> _Z3madfS_S_(<8 x f32> %a, <8 x f32> %b, <8 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v16(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <16 x f32> _Z3madfS_S_(<16 x f32> %a, <16 x f32> %b, <16 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v1(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v2(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v4(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v8(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v16(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v1(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v2(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v4(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v8(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v16(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v1(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v2(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v4(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v8(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v16(i32* %ptr, i32 %_A
[E] Tue Jun 19 18:11:23 2018: Failed to in-line or intrinsify function-call: <2 x f16> %call3 = <2 x f16> madDhS_S_(<2 x f16> %splat.splat, <2 x f16> %vecinit1, <2 x f16> %splat.splat)
[W] Tue Jun 19 18:11:23 2018: Candidates:
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_local_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_group_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z14get_local_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z12get_local_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: i32 _Z13get_global_idj(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_local_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_local_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_local_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_local_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v1_global_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: i32 _Z15get_global_sizej(i32 %dim
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v2_global_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v4_global_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v8_global_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void global_bandwidth_v16_global_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v1(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: f32 _Z3madfff(f32 %a, f32 %b, f32 %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v2(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <2 x f32> _Z3madfS_S_(<2 x f32> %a, <2 x f32> %b, <2 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v4(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <4 x f32> _Z3madfS_S_(<4 x f32> %a, <4 x f32> %b, <4 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v8(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <8 x f32> _Z3madfS_S_(<8 x f32> %a, <8 x f32> %b, <8 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_sp_v16(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:23 2018: <16 x f32> _Z3madfS_S_(<16 x f32> %a, <16 x f32> %b, <16 x f32> %c
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v1(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v2(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v4(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v8(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_hp_v16(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v1(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v2(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v4(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v8(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_dp_v16(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v1(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v2(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v4(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v8(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:23 2018: void compute_integer_v16(i32* %ptr, i32 %_A
[E] Tue Jun 19 18:11:23 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:23 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x6cd3a0 [0x76aad3a0]
[E] Tue Jun 19 18:11:23 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPMArea::toWriteSetup(vc4c::DataType const&) const+0x154 [0x76ab1af4]
[E] Tue Jun 19 18:11:23 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPM::insertWriteVPM(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::periphery::VPMArea const*, bool, vc4c::Value const&)+0xd4 [0x76aafeac]
[E] Tue Jun 19 18:11:23 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::insertWriteDMA(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::Value const&, bool)+0xb0 [0x76aaf0b4]
[E] Tue Jun 19 18:11:23 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x669020 [0x76a49020]
[E] Tue Jun 19 18:11:23 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6696a0 [0x76a496a0]
[E] Tue Jun 19 18:11:23 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::mapMemoryAccess(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x3b4 [0x76a4acb0]
[E] Tue Jun 19 18:11:23 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x22c [0x76a580dc]
[E] Tue Jun 19 18:11:23 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:23 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:23 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:23 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:23 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:23 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:23 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:23 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:23 2018:  (18) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:23 2018:  (19) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:23 2018:  (20) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:23 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:23 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x677344 [0x76a57344]
[E] Tue Jun 19 18:11:23 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), void (*)(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x70 [0x76a5c390]
[E] Tue Jun 19 18:11:23 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0x76a5bd00]
[E] Tue Jun 19 18:11:23 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x677558 [0x76a57558]
[E] Tue Jun 19 18:11:23 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x38c [0x76a5823c]
[E] Tue Jun 19 18:11:23 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:23 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:23 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:23 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:23 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:23 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:23 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:23 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:23 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:23 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:23 2018:  (17) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Failed to in-line or intrinsify function-call: f16 %call2 = f16 madDhDhDh(f16 %conv1, f16 %conv, f16 %conv1)
[W] Tue Jun 19 18:11:24 2018: Candidates:
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v1_local_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: i32 _Z12get_group_idj(i32 %dim
[W] Tue Jun 19 18:11:24 2018: i32 _Z14get_local_sizej(i32 %dim
[W] Tue Jun 19 18:11:24 2018: i32 _Z12get_local_idj(i32 %dim
[W] Tue Jun 19 18:11:24 2018: i32 _Z13get_global_idj(i32 %dim
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v2_local_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v4_local_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v8_local_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v16_local_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v1_global_offset(f32* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: i32 _Z15get_global_sizej(i32 %dim
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v2_global_offset(<2 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v4_global_offset(<4 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v8_global_offset(<8 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void global_bandwidth_v16_global_offset(<16 x f32>* %A, f32* %B
[W] Tue Jun 19 18:11:24 2018: void compute_sp_v1(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:24 2018: f32 _Z3madfff(f32 %a, f32 %b, f32 %c
[W] Tue Jun 19 18:11:24 2018: void compute_sp_v2(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:24 2018: <2 x f32> _Z3madfS_S_(<2 x f32> %a, <2 x f32> %b, <2 x f32> %c
[W] Tue Jun 19 18:11:24 2018: void compute_sp_v4(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:24 2018: <4 x f32> _Z3madfS_S_(<4 x f32> %a, <4 x f32> %b, <4 x f32> %c
[W] Tue Jun 19 18:11:24 2018: void compute_sp_v8(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:24 2018: <8 x f32> _Z3madfS_S_(<8 x f32> %a, <8 x f32> %b, <8 x f32> %c
[W] Tue Jun 19 18:11:24 2018: void compute_sp_v16(f32* %ptr, f32 %_A
[W] Tue Jun 19 18:11:24 2018: <16 x f32> _Z3madfS_S_(<16 x f32> %a, <16 x f32> %b, <16 x f32> %c
[W] Tue Jun 19 18:11:24 2018: void compute_hp_v1(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:24 2018: void compute_hp_v2(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:24 2018: void compute_hp_v4(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:24 2018: void compute_hp_v8(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:24 2018: void compute_hp_v16(f16* %ptr, f32 %_B
[W] Tue Jun 19 18:11:24 2018: void compute_dp_v1(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_dp_v2(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_dp_v4(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_dp_v8(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_dp_v16(f64* %ptr, f64 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_integer_v1(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_integer_v2(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_integer_v4(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_integer_v8(i32* %ptr, i32 %_A
[W] Tue Jun 19 18:11:24 2018: void compute_integer_v16(i32* %ptr, i32 %_A
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x677344 [0x76a57344]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), void (*)(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x70 [0x76a5c390]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0x76a5bd00]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x677558 [0x76a57558]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x38c [0x76a5823c]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: Normalizer: Not normalized instruction found: <2 x f16> %call3 = <2 x f16> madDhS_S_(<2 x f16> %splat.splat, <2 x f16> %vecinit1, <2 x f16> %splat.splat)
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x677344 [0x76a57344]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), void (*)(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x70 [0x76a5c390]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0x76a5bd00]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x677558 [0x76a57558]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x38c [0x76a5823c]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: Normalizer: Not normalized instruction found: <4 x f16> %call7 = <4 x f16> madDhS_S_(<4 x f16> %splat.splat, <4 x f16> %vecinit5, <4 x f16> %splat.splat)
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x6cd3a0 [0x76aad3a0]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPMArea::toWriteSetup(vc4c::DataType const&) const+0x154 [0x76ab1af4]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPM::insertWriteVPM(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::periphery::VPMArea const*, bool, vc4c::Value const&)+0xd4 [0x76aafeac]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::insertWriteDMA(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::Value const&, bool)+0xb0 [0x76aaf0b4]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x669020 [0x76a49020]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6696a0 [0x76a496a0]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::mapMemoryAccess(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x3b4 [0x76a4acb0]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x22c [0x76a580dc]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (18) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (19) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (20) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x6cd3a0 [0x76aad3a0]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPMArea::toWriteSetup(vc4c::DataType const&) const+0x154 [0x76ab1af4]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPM::insertWriteVPM(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::periphery::VPMArea const*, bool, vc4c::Value const&)+0xd4 [0x76aafeac]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::insertWriteDMA(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::Value const&, bool)+0xb0 [0x76aaf0b4]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x669020 [0x76a49020]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6696a0 [0x76a496a0]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::mapMemoryAccess(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x3b4 [0x76a4acb0]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x22c [0x76a580dc]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (18) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (19) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (20) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x6cd3a0 [0x76aad3a0]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPMArea::toWriteSetup(vc4c::DataType const&) const+0x154 [0x76ab1af4]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPM::insertWriteVPM(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::periphery::VPMArea const*, bool, vc4c::Value const&)+0xd4 [0x76aafeac]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::insertWriteDMA(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::Value const&, bool)+0xb0 [0x76aaf0b4]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x669020 [0x76a49020]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6696a0 [0x76a496a0]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::mapMemoryAccess(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x3b4 [0x76a4acb0]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x22c [0x76a580dc]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (18) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (19) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (20) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: Normalizer: Not normalized instruction found: <8 x f16> %call15 = <8 x f16> madDhS_S_(<8 x f16> %splat.splat, <8 x f16> %vecinit13, <8 x f16> %splat.splat)
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x677344 [0x76a57344]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), void (*)(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x70 [0x76a5c390]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0x76a5bd00]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: General: Invalid parameter type-size: 64
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: General: Invalid parameter type-size: 64
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: General: Invalid parameter type-size: 64
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x677558 [0x76a57558]
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: General: Invalid parameter type-size: 64
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x38c [0x76a5823c]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x6cd3a0 [0x76aad3a0]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPMArea::toWriteSetup(vc4c::DataType const&) const+0x154 [0x76ab1af4]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::VPM::insertWriteVPM(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::periphery::VPMArea const*, bool, vc4c::Value const&)+0xd4 [0x76aafeac]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::periphery::insertWriteDMA(vc4c::Method&, vc4c::InstructionWalker, vc4c::Value const&, vc4c::Value const&, bool)+0xb0 [0x76aaf0b4]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x669020 [0x76a49020]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6696a0 [0x76a496a0]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::mapMemoryAccess(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x3b4 [0x76a4acb0]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x22c [0x76a580dc]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (18) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (19) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (20) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: General: Invalid parameter type-size: 64
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: Normalizer: Not normalized instruction found: f16 %call2 = f16 madDhDhDh(f16 %conv1, f16 %conv, f16 %conv1)
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:24 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x7682e5d0]
[E] Tue Jun 19 18:11:24 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x677344 [0x76a57344]
[E] Tue Jun 19 18:11:24 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), void (*)(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x70 [0x76a5c390]
[E] Tue Jun 19 18:11:24 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0x76a5bd00]
[E] Tue Jun 19 18:11:24 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x677558 [0x76a57558]
[E] Tue Jun 19 18:11:24 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : vc4c::normalization::Normalizer::normalizeMethod(vc4c::Module&, vc4c::Method&) const+0x38c [0x76a5823c]
[E] Tue Jun 19 18:11:24 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : +0x6775f8 [0x76a575f8]
[E] Tue Jun 19 18:11:24 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : +0x678f74 [0x76a58f74]
[E] Tue Jun 19 18:11:24 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Method* const&)>::operator()(vc4c::Method* const&) const+0x54 [0x76834dec]
[E] Tue Jun 19 18:11:24 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}::operator()() const+0xc8 [0x768334ec]
[E] Tue Jun 19 18:11:24 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (), void vc4c::BackgroundWorker::scheduleAll<vc4c::Method*, std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > >(std::vector<vc4c::Method*, std::allocator<vc4c::Method*> > const&, std::function<void (vc4c::Method* const&)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x24 [0x76838380]
[E] Tue Jun 19 18:11:24 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0x7681f2a4]
[E] Tue Jun 19 18:11:24 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : +0x43e430 [0x7681e430]
[E] Tue Jun 19 18:11:24 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : +0x43ed1c [0x7681ed1c]
[E] Tue Jun 19 18:11:24 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec88 [0x7681ec88]
[E] Tue Jun 19 18:11:24 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : +0x43ec60 [0x7681ec60]
[E] Tue Jun 19 18:11:24 2018:  (17) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0x76ec59dc]
[E] Tue Jun 19 18:11:24 2018: Background worker threw error: Normalizer: Not normalized instruction found: <16 x f16> %call31 = <16 x f16> madDhS_S_(<16 x f16> %splat.splat, <16 x f16> %vecinit29, <16 x f16> %splat.splat)
[E] Tue Jun 19 18:11:24 2018: While running worker task: Normalization
[E] Tue Jun 19 18:11:34 2018: Compiler threw exception: Normalizer: Not normalized instruction found: <8 x f16> %call15 = <8 x f16> madDhS_S_(<8 x f16> %splat.splat, <8 x f16> %vecinit13, <8 x f16> %splat.splat)


Test Compile Error

I am having trouble with the following errors when compiling the test cases.
What I did:

  • cross-compile SPIRV-LLVM and copy it to the Raspberry Pi
  • build VC4C
  • build VC4CL
  • then build VC4CL/test

Any suggestions?

[  6%] Building CXX object CMakeFiles/TestVC4CL.dir/test.o
/usr/bin/c++   -DTEST_OUTPUT_CONSOLE=1 -I/home/pi/work/VC4CL/test/.. -I/home/pi/work/VC4CL/test/../include -I/home/pi/work/VC4CL/test/lib/cpptest-lite/include   -o CMakeFiles/TestVC4CL.dir/test.o -c /home/pi/work/VC4CL/test/test.cpp
In file included from /home/pi/work/VC4CL/test/../src/extensions.h:10:0,
                 from /home/pi/work/VC4CL/test/TestExtension.h:11,
                 from /home/pi/work/VC4CL/test/test.cpp:20:
/home/pi/work/VC4CL/test/../src/common.h:31:106: warning: ignoring attributes on template argument ‘cl_int {aka int}’ [-Wignored-attributes]
     CHECK_RETURN typename std::enable_if<std::is_arithmetic<T>::value | std::is_pointer<T>::value, cl_int>::type
                                                                                                          ^                                                                                                                                                                                                                                                                          
In file included from /home/pi/work/VC4CL/test/../src/types.h:12:0,
                 from /home/pi/work/VC4CL/test/../src/common.h:15,
                 from /home/pi/work/VC4CL/test/../src/extensions.h:10,
                 from /home/pi/work/VC4CL/test/TestExtension.h:11,
                 from /home/pi/work/VC4CL/test/test.cpp:20:
/home/pi/work/VC4CL/test/test.cpp: In function ‘int main(int, char**)’:
/home/pi/work/VC4CL/test/test.cpp:45:34: error: ‘struct _cl_context’ has no member named ‘dispatch’
     assert(offsetof(_cl_context, dispatch) == 0);
                                  ^                                                                                                                                                                                                                                                                                                                                                  
CMakeFiles/TestVC4CL.dir/build.make:374: recipe for target 'CMakeFiles/TestVC4CL.dir/test.o' failed
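The failing assertion encodes a requirement of the cl_khr_icd extension: the ICD loader dispatches every API call through a table pointer stored as the very first member of each OpenCL object, so `dispatch` must sit at offset 0. The error shows that `_cl_context` was compiled without any `dispatch` member at all. A minimal sketch of that layout rule follows; `Context`, `DispatchTable`, `dispatchOf` and the `HAS_ICD_DISPATCH` switch are hypothetical names for illustration, not VC4CL's actual definitions or build option:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical build switch; VC4CL's real option may be named differently.
#define HAS_ICD_DISPATCH 1

// Hypothetical stand-in for the loader's dispatch table (in the ICD headers
// this is struct _cl_icd_dispatch, one function pointer per API entry point).
struct DispatchTable {
    void* entries[4];
};

struct Context {
#if HAS_ICD_DISPATCH
    const DispatchTable* dispatch;  // must be the very first member
#endif
    int referenceCount;
};

// offsetof is only guaranteed to work on standard-layout types.
static_assert(std::is_standard_layout<Context>::value, "Context must be standard-layout");

#if HAS_ICD_DISPATCH
// Compile-time equivalent of the runtime assert in test.cpp.
static_assert(offsetof(Context, dispatch) == 0, "dispatch must be the first member");

// What a loader effectively does with an opaque handle: read the first
// pointer-sized field and treat it as the dispatch table.
inline const DispatchTable* dispatchOf(const void* handle) {
    return *static_cast<const DispatchTable* const*>(handle);
}
#endif
```

If the member is conditionally compiled like this, the test has to be built with the same setting as the library; otherwise `offsetof(..., dispatch)` cannot compile.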

Optimizer error: There should be no more function calls - convert_float_rtef ()

The OpenCL kernel I am trying to compile is mcxcl; the code download URL and test commands can be found in issue #19.

There were two previous errors: a precompilation error caused by the missing CLANG_PATH (fixed in doe300/VC4C#34), and a compilation error caused by the use of 64-bit data types (see the comment in #19).

To avoid the 64-bit data types (ulong and double), one can append the -J "-DUSE_LL5_RAND" flag to the command line. However, I now encounter a new error in the optimization stage; the error log is listed below:

root@raspberrypi:/home/pi/temp/mcxcl/example/benchmark# ./run_benchmark1.sh -n 1e5 -J "-DUSE_LL5_RAND"
...
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DMCX_SRC_PENCIL  -DUSE_LL5_RAND
Kernel build log:
[W] Fri Feb 23 16:27:23 2018: Warnings in precompilation:
[W] Fri Feb 23 16:27:23 2018: <stdin>:1039:1: warning: null character ignored
<U+0000>
^
1 warning generated.

[E] Fri Feb 23 16:27:41 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0xb662977c]
[E] Fri Feb 23 16:27:41 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x520e78 [0xb685de78]
[E] Fri Feb 23 16:27:41 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<vc4c::InstructionWalker (vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&), vc4c::InstructionWalker (*)(vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker&&, vc4c::Configuration const&)+0x7c [0xb6865048]
[E] Fri Feb 23 16:27:41 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : std::function<vc4c::InstructionWalker (vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&)>::operator()(vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x94 [0xb6864624]
[E] Fri Feb 23 16:27:41 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : vc4c::optimizations::OptimizationStep::operator()(vc4c::Module const&, vc4c::Method&, vc4c::InstructionWalker, vc4c::Configuration const&) const+0x54 [0xb685dd58]
[E] Fri Feb 23 16:27:41 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x521158 [0xb685e158]
[E] Fri Feb 23 16:27:41 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&), void (*)(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)>::_M_invoke(std::_Any_data const&, vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)+0x60 [0xb68654a8]
[E] Fri Feb 23 16:27:41 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&)>::operator()(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&) const+0x78 [0xb68644cc]
[E] Fri Feb 23 16:27:41 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : vc4c::optimizations::OptimizationPass::operator()(vc4c::Module const&, vc4c::Method&, vc4c::Configuration const&) const+0x34 [0xb685dbe0]
[E] Fri Feb 23 16:27:41 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : +0x5215a8 [0xb685e5a8]
[E] Fri Feb 23 16:27:41 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : +0x5218c4 [0xb685e8c4]
[E] Fri Feb 23 16:27:41 2018:  (12) /usr/local/lib/libVC4CC.so.1.2 : +0x5228ec [0xb685f8ec]
[E] Fri Feb 23 16:27:41 2018:  (13) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0xb662f1b4]
[E] Fri Feb 23 16:27:41 2018:  (14) /usr/local/lib/libVC4CC.so.1.2 : threading::BackgroundWorker::operator()()::{lambda()#1}::operator()() const+0x58 [0xb662e3d8]
[E] Fri Feb 23 16:27:41 2018:  (15) /usr/local/lib/libVC4CC.so.1.2 : void std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)+0x38 [0xb663617c]
[E] Fri Feb 23 16:27:41 2018:  (16) /usr/local/lib/libVC4CC.so.1.2 : std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::operator()()+0x1c [0xb66360e8]
[E] Fri Feb 23 16:27:41 2018:  (17) /usr/local/lib/libVC4CC.so.1.2 : std::thread::_State_impl<std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()> >::_M_run()+0x20 [0xb6636054]
[E] Fri Feb 23 16:27:41 2018:  (18) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0xb6df99dc]
[E] Fri Feb 23 16:27:41 2018: Background worker threw error: Optimizer: There should be no more function-calls: float %call.i209.%call27.i = float convert_float_rtef(float %call.i209.%tmp.110)
[E] Fri Feb 23 16:27:43 2018: Compiler threw exception: Optimizer: There should be no more function-calls: float %call.i209.%call27.i = float convert_float_rtef(float %call.i209.%tmp.110)


MCX ERROR(2):Error: Failed to build program executable! in unit mcx_host.cpp:475
terminate called without an active exception
Aborted

It is not clear to me whether double-precision data that is accidentally still involved causes this error. The error was raised in the optimizer, at the call to the convert_float_rtef function.

@doe300, do you have any idea what might be wrong? Thanks.

Clpeak compute-sp freezes with mailbox

When executed via the mailbox interface, the clpeak --compute-sp benchmark freezes (the kernel?) on its second execution. The --global-bandwidth benchmark runs fine, as do both benchmarks when run via register-poking.

More exact error description:

  • according to htop, the VC4CL Queue Handler thread moves into status D, which means uninterruptible sleep (usually IO), so it looks like it hangs in the mailbox syscall
  • forcibly stopping the process does not kill the VC4CL Queue Handler thread; even sudo reboot does nothing.

Side note: when this error is fixed, re-examine the delay in src/executor.cpp:108, since at least the --global-bandwidth benchmark no longer seems to require it.
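For context, the mailbox path goes through the Raspberry Pi firmware's property interface: the host fills a buffer of 32-bit words and passes it to the firmware with an ioctl on /dev/vcio. A thread blocked inside that call would be consistent with the D state described above. A minimal sketch of building an "execute QPU" request, modeled on the mailbox.c helper from the hello_fft sample (tag 0x30011); it is not VC4CL's own code, `buildExecuteQpuMessage` is a hypothetical name, and the ioctl itself is left out:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Tag of the firmware's "execute QPU" property, as used by hello_fft's mailbox.c.
constexpr std::uint32_t kTagExecuteQpu = 0x00030011;

// Builds a property-interface message of 32-bit words:
// [total size, request code, tag, value buffer size, request size,
//  num_qpus, control, noflush, timeout, end tag].
std::vector<std::uint32_t> buildExecuteQpuMessage(std::uint32_t numQpus, std::uint32_t controlAddress,
                                                  std::uint32_t noFlush, std::uint32_t timeoutMs) {
    std::vector<std::uint32_t> msg;
    msg.push_back(0);               // total size in bytes, patched below
    msg.push_back(0x00000000);      // request code: process request
    msg.push_back(kTagExecuteQpu);  // tag identifier
    msg.push_back(16);              // value buffer size: 4 words
    msg.push_back(16);              // request data size
    msg.push_back(numQpus);         // number of QPUs to start
    msg.push_back(controlAddress);  // bus address of the per-QPU (uniforms, code) address list
    msg.push_back(noFlush);         // noflush flag, passed through as in hello_fft
    msg.push_back(timeoutMs);       // timeout in milliseconds
    msg.push_back(0x00000000);      // end tag
    msg[0] = static_cast<std::uint32_t>(msg.size() * sizeof(std::uint32_t));
    return msg;
}
```

hello_fft hands such a buffer to ioctl(fd, IOCTL_MBOX_PROPERTY, buffer) with IOCTL_MBOX_PROPERTY defined as _IOWR(100, 0, char *), and reads the firmware's result back from word 5. The call blocks while the firmware runs the request.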

v3d_info linking fails (symbols from libbcm_host not found)

When building VC4CL, I get:

Linking CXX executable v3d_info
../libVC4CL.so.0.4: undefined reference to `bcm_host_get_peripheral_address'
../libVC4CL.so.0.4: undefined reference to `bcm_host_init'
../libVC4CL.so.0.4: undefined reference to `bcm_host_deinit'

Linking succeeds if I modify tools/CMakeLists.txt, adding libbcm_host to the libraries for v3d_info as follows:

target_link_libraries(v3d_info VC4CL ${BCMHOST_LIBRARY})

"clinfo" can't open /dev/mem but "sudo clinfo" does not work

I have successfully compiled and installed VC4C and VC4CL and confirmed that both are working. When I run clinfo, I get OpenCL information on my GPU and

[VC4CL] can't open /dev/mem
[VC4CL] This program should be run as root. Try prefixing command with: sudo
terminate called after throwing an instance of 'std::system_error'
what(): Failed to open /dev/mem: Operation not permitted
aborted

which is expected, as running anything on VC4CL requires root access. However, when I run "sudo clinfo", the output is "Number of platforms 0".

Before this, I created the file /etc/OpenCL/vendors/VC4CL.icd as per the instructions; it contains the single line /usr/local/lib/libVC4CL.so (I also tried just libVC4CL.so as the line, to no avail).
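For reference, an ICD loader scans /etc/OpenCL/vendors for .icd files, treats the text of each as the name of a vendor library and opens it with dlopen. An absolute path is used as-is, while a bare name such as libVC4CL.so is resolved through the dynamic linker's normal search. The library's own dependencies (for example libVC4CC.so in /usr/local/lib) must be resolvable the same way, and sudo by default does not pass LD_LIBRARY_PATH through. A minimal sketch of the first step only, reading and trimming the entry; `readIcdEntry` and `isAbsoluteEntry` are hypothetical helpers, and real loaders go on to dlopen the library and query its platforms via clIcdGetPlatformIDsKHR:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the library named in an .icd file, i.e. the
// first line with surrounding whitespace (including a trailing '\r') removed.
std::string readIcdEntry(const std::string& contents) {
    std::istringstream in(contents);
    std::string line;
    std::getline(in, line);  // this sketch only looks at the first line
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = line.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // blank entry: nothing to load
    }
    const std::string::size_type last = line.find_last_not_of(whitespace);
    return line.substr(first, last - first + 1);
}

// An absolute entry is opened as-is; a bare file name is resolved by the
// dynamic linker's search (LD_LIBRARY_PATH, the ld.so cache, /lib, /usr/lib, ...).
bool isAbsoluteEntry(const std::string& entry) {
    return !entry.empty() && entry.front() == '/';
}
```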

Please help me, thanks.

OpenCL for OpenCV on Pi

Has anyone been able to run OpenCV on the Pi GPU using OpenCL? Is there an example somewhere demonstrating accessing the GPU with OpenCV using OpenCL?

Errors with VC4C. How to use CPU also? Is LD_LIBRARY_PATH=/usr/local/lib/ required before programs?

Hi.

Thank you for creating VC4CL. I am very eager to deploy it on several machines.
On an RPi, I have compiled and installed VC4C and VC4CL, including the tools but not the tests.
I do not yet have VC4CL working. Details of three problems follow.

(1) In VC4C, ./build/test/TestVC4C produced an enormous number of errors, page after page. I ran this command as root, after cmake, make, and make install of both VC4CLStdLib and VC4C (but not VC4CL). After about 45 min and two and a half pages of
"64-bit operations are not supported by the VideoCore IV architecture, further compilation may fail!"
the program stalled. Another shell showed CPU usage at 100%, so after another 15 min I killed the program.
Is this behavior normal?

(2) When I run cmake in the directory named "test", I see:
cmake ../test
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (ExternalProject_Get_Property):
Unknown CMake command "ExternalProject_Get_Property".

CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as

cmake_minimum_required(VERSION 3.7)

should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.

-- Configuring incomplete, errors occurred!
See also "/home/VC4C/VC4CL/test/CMakeFiles/CMakeOutput.log".

(3) clinfo shows a Device 0 when mesa-opencl-icd is installed, which I presume means it is superfluous. Please correct me if I am wrong. I removed mesa-opencl-icd and Device 0 disappeared.
I want to compare the efficacy of OpenCL on the GPU versus the CPU. How can I enable OpenCL support for the RPi's CPU?
The full output of clinfo is below. Note that I had to use the LD_LIBRARY_PATH prefix for clinfo to work.

LD_LIBRARY_PATH=/usr/local/lib/ clinfo

Number of platforms                               1
  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
  Platform Vendor                                 doe300
  Platform Version                                OpenCL 1.2 VC4CL 0.4
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_il_program cl_khr_spir cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
  Platform Extensions function suffix             VC4CL

  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
  Device Name                                     VideoCore IV GPU
  Device Vendor                                   Broadcom
  Device Vendor ID                                0xa5c
  Device Version                                  OpenCL 1.2 VC4CL 0.4
  Driver Version                                  0.4
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  EMBEDDED_PROFILE
  Max compute units                               1
  Max clock frequency                             300MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             12x12x12
  Max work group size                             12
  Preferred work group size multiple              1
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         16 / 16
    int                                           16 / 16
    long                                          0 / 0
    half                                          0 / 0 (n/a)
    float                                         16 / 16
    double                                        0 / 0 (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 Yes
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              67108864 (64MiB)
  Error Correction support                        No
  Max memory allocation                           67108864 (64MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             64 bytes
  Alignment of base address                       512 bits (64 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
  Global Memory cache line                        64 bytes
  Image support                                   No
  Local memory type                               Global
  Local memory size                               67108864 (64MiB)
  Max constant buffer size                        67108864 (64MiB)
  Max number of constant args                     64
  Max size of kernel argument                     256
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            0
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  OpenCL for the Raspberry Pi VideoCore IV GPU
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [VC4CL]
  clCreateContext(NULL, ...) [default]            Success [VC4CL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

How may I bring up the CPU as a second (and/or alternative) device?



My summary questions are as follows.

(1) How can I run clinfo (and other programs) without prefixing each command with LD_LIBRARY_PATH=/usr/local/lib/, short of creating symlinks in /usr/lib/?
(2) How may I bring up the CPU as a second (and/or alternative) device?
(3) The device in clinfo's output is not usable owing to errors such as vc4c::CompilationError::CompilationError,vc4c::precompilation::compileOpenCLWithDefaultHeader, etc. I need to know how to diagnose the underlying cause. Is ./build/test/TestVC4C supposed to give multiple pages of errors and then freeze? If so, where else can I look?

I would be very grateful for any assistance with these problems.

Error on RPI3 when running OpenCV4 DNN module

I compiled and installed VC4CL on a system with OpenCV 4.0.1, Qt5 and the LLVM 3.9 packages installed. When I run my program, I get this error:

[ INFO:0] Initialize OpenCL runtime...
OpenCV(ocl4dnn): consider to specify kernel configuration cache directory
via OPENCV_OCL4DNN_CONFIG_PATH parameter.
[ INFO:0] Successfully initialized OpenCL cache directory: /root/.cache/opencv/4.0/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--Broadcom--VideoCore_IV_GPU--0_4
OpenCL program build log: dnn/dummy
Status -15: CL_COMPILE_PROGRAM_FAILURE
-cl-no-subgroup-ifp
[E] Sat Mar 9 14:55:07 2019: Errors in precompilation:
[E] Sat Mar 9 14:55:07 2019: error: unknown argument: '-cl-no-subgroup-ifp'

OpenCL program build log: dnn/conv_layer_spatial
Status -15: CL_COMPILE_PROGRAM_FAILURE
-D TYPE=1 -D Dtype=float -D Dtype2=float2 -D Dtype4=float4 -D Dtype8=float8 -D Dtype16=float16 -D as_Dtype=as_float -D as_Dtype2=as_float2 -D as_Dtype4=as_float4 -D as_Dtype8=as_float8 -D KERNEL_WIDTH=3 -D KERNEL_HEIGHT=3 -D STRIDE_X=2 -D STRIDE_Y=2 -D DILATION_X=1 -D DILATION_Y=1 -D KERNEL_BASIC -cl-fast-relaxed-math -D ConvolveBasic=BASIC_k3x3_cn3_g1_s2x2_d1x1_b1_in256x256_p1x1_num1_M32_activ1_eltwise0_FP32_4_1_1_1 -D CHANNELS=3 -D APPLY_BIAS=1 -D OUTPUT_Z=32 -D ZPAR=1 -D FUSED_CONV_RELU=1
[W] Sat Mar 9 14:55:09 2019: Warnings in precompilation:
[W] Sat Mar 9 14:55:09 2019: :1484:1: warning: null character ignored
<U+0000>
^
1 warning generated.

[E] Sat Mar 9 14:55:09 2019: Errors in precompilation:
[E] Sat Mar 9 14:55:09 2019: ERROR: Invalid value (Producer: 'LLVM6.0.0svn' Reader: 'LLVM 3.9.1')
/usr/bin/llvm-link: /usr/local/include/vc4cl-stdlib/VC4CLStdLib.bc: error: Corrupted bitcode
/usr/bin/llvm-link: error loading file '/usr/local/include/vc4cl-stdlib/VC4CLStdLib.bc'

Failed to compile kernel: BASIC_k3x3_cn3_g1_s2x2_d1x1_b1_in256x256_p1x1_num1_M32_activ1_eltwise0_FP32_4_1_1_1, buildflags: -D TYPE=1 -D Dtype=float -D Dtype2=float2 -D Dtype4=float4 -D Dtype8=float8 -D Dtype16=float16 -D as_Dtype=as_float -D as_Dtype2=as_float2 -D as_Dtype4=as_float4 -D as_Dtype8=as_float8 -D KERNEL_WIDTH=3 -D KERNEL_HEIGHT=3 -D STRIDE_X=2 -D STRIDE_Y=2 -D DILATION_X=1 -D DILATION_Y=1 -D KERNEL_BASIC -cl-fast-relaxed-math -D ConvolveBasic=BASIC_k3x3_cn3_g1_s2x2_d1x1_b1_in256x256_p1x1_num1_M32_activ1_eltwise0_FP32_4_1_1_1 -D CHANNELS=3 -D APPLY_BIAS=1 -D OUTPUT_Z=32 -D ZPAR=1 -D FUSED_CONV_RELU=1, errmsg: [W] Sat Mar 9 14:55:09 2019: Warnings in precompilation:
[W] Sat Mar 9 14:55:09 2019: :1484:1: warning: null character ignored
<U+0000>
^
1 warning generated.

[E] Sat Mar 9 14:55:09 2019: Errors in precompilation:
[E] Sat Mar 9 14:55:09 2019: ERROR: Invalid value (Producer: 'LLVM6.0.0svn' Reader: 'LLVM 3.9.1')
/usr/bin/llvm-link: /usr/local/include/vc4cl-stdlib/VC4CLStdLib.bc: error: Corrupted bitcode
/usr/bin/llvm-link: error loading file '/usr/local/include/vc4cl-stdlib/VC4CLStdLib.bc'

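There appear to be two issues: the unknown command-line argument -cl-no-subgroup-ifp, and the 'Corrupted bitcode' error for VC4CLStdLib.bc. The latter message reports the bitcode as produced by 'LLVM6.0.0svn' but read by 'LLVM 3.9.1'.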

Can you advise on how to fix this?
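For the first error only, one possible stop-gap, if you control the build-option string before it reaches clBuildProgram()/clCompileProgram() (e.g. in a locally patched copy of OpenCV's OpenCL module), is to drop the flag the compiler rejects. The helper strip_build_option() below is a hypothetical sketch, not part of VC4CL or OpenCV: it removes every space-separated token equal to the given flag. It does nothing about the bitcode version mismatch.

```c
#include <stddef.h>
#include <string.h>

// Copies the space-separated options in `in` to `out` (capacity `out_size`),
// dropping every token equal to `unsupported`. Remaining tokens are re-joined
// with single spaces. Returns 0 on success, -1 if `out` is too small.
int strip_build_option(const char* in, const char* unsupported, char* out, size_t out_size) {
    size_t used = 0;
    const size_t skip_len = strlen(unsupported);
    const char* p = in;
    if (out_size == 0) return -1;
    while (*p != '\0') {
        while (*p == ' ') ++p;                     // skip separators
        const char* start = p;
        while (*p != '\0' && *p != ' ') ++p;       // find the end of the token
        size_t len = (size_t)(p - start);
        if (len == 0) break;                       // only trailing spaces were left
        if (len == skip_len && strncmp(start, unsupported, len) == 0) continue;  // drop it
        size_t needed = len + (used > 0 ? 1 : 0);  // token plus separating space
        if (used + needed + 1 > out_size) return -1;  // +1 for the terminator
        if (used > 0) out[used++] = ' ';
        memcpy(out + used, start, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For the dnn/dummy program in the log, whose only option is -cl-no-subgroup-ifp, this yields an empty option string.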

Wrong result?

I am trying to check whether VC4CL works correctly.
With the following code, the output differs from what I expected and from the output in an NVIDIA environment.

kernel

 kernel void hello(global float * x){
   int ind = get_global_id(0);
   x[ind] = x[ind] * 2;
 }

The host code can be found at https://github.com/nomaddo/opencl-benchmark/blob/master/gpu.c
It takes the kernel file name, the function name, the number of arguments and the length of each argument,
then executes the kernel and prints all arguments.

 pi@nomaddo-pi3:~/opencl-benchmark$ sudo ./gpu mul.cl hello 1 10
 0.000000
 1.000000
 2.000000
 3.000000
 4.000000
 10.000000
 6.000000
 7.000000
 8.000000
 9.000000
 Runtime: 0.000419ms

The expected result (and the one of NVIDIA GPU) is as follows:

nomaddo@nomaddo-AS:~/opencl-benchmark$ ./gpu mul.cl hello 1 10
0.000000
2.000000
4.000000
6.000000
8.000000
10.000000
12.000000
14.000000
16.000000
18.000000
Runtime: 0.000070ms

I am using the latest VC4C build from CircleCI and a self-compiled VC4CL, also built from the latest source code.
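When narrowing down a wrong-result report like this, it can help to compare the device output against a CPU reference automatically instead of by eye. The helper below (report_mismatches(), a hypothetical name) recomputes x[i] * factor on the host and prints every index that differs. Applied to the VC4CL output above, it flags 8 of the 10 elements: only element 5 was doubled, and element 0 matches trivially because 0 * 2 = 0.

```c
#include <stddef.h>
#include <stdio.h>

// Recomputes input[i] * factor on the host and prints every index whose
// device result differs by more than tol. Returns the number of mismatches.
size_t report_mismatches(const float* input, const float* device_out, size_t n,
                         float factor, float tol) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        float expected = input[i] * factor;
        float diff = device_out[i] - expected;
        if (diff < 0.0f) diff = -diff;  // absolute difference without libm
        if (diff > tol) {
            printf("index %zu: expected %f, got %f\n", i, expected, device_out[i]);
            ++mismatches;
        }
    }
    return mismatches;
}
```

In the host program, `input` would be a copy of the buffer taken before the kernel runs, since the kernel overwrites x in place.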

"clinfo" can't open /dev/mem but "sudo clinfo" worked

I installed VC4CL and tried to run "clinfo"; it shows the following:
""
[VC4CL] can't open /dev/mem
[VC4CL] This program should be run as root. Try prefixing command with: sudo
terminate called after throwing an instance of 'std::system_error'
what(): Failed to open /dev/mem: Permission denied
Aborted
""

When I run "sudo clinfo", it works.
When I test PlaidML with "plaidml-setup", I get the same error:
""
[VC4CL] can't open /dev/mem
[VC4CL] This program should be run as root. Try prefixing command with: sudo
terminate called after throwing an instance of 'std::system_error'
what(): Failed to open /dev/mem: Permission denied
Aborted
""

When I run "sudo plaidml-setup", it cannot find any device:
""
No supported devices found. Run 'clinfo' and file an issue containing the full output.
""

Can anyone help me?
Thanks.
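The abort happens because VC4CL cannot open /dev/mem ("Permission denied"), which is why the same command works under sudo. A program built on VC4CL could check this up front and print a clearer hint instead of terminating inside the library. A minimal sketch, assuming read/write access to /dev/mem is what is required; can_open_read_write() is a hypothetical helper, not a VC4CL API. It does not explain why "sudo plaidml-setup" finds no devices.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Returns 1 if `path` can be opened for reading and writing, 0 otherwise.
// On failure, prints the reason reported by the OS (e.g. "Permission denied").
int can_open_read_write(const char* path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return 0;
    }
    close(fd);
    return 1;
}
```

In a host program this would run before the first OpenCL call: if can_open_read_write("/dev/mem") returns 0, print a hint to run the program as root and exit cleanly.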

Compile

Any quick ideas what's going on?

1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 136.308/136.308/136.308/0.000 ms
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - found
-- Found OpenCL: /usr/lib/libOpenCL.so (found version "2.2")
-- Found headers for OpenCL up to version 2.2 located in /usr/include
-- VC4C library found: /usr/lib/libVC4CC.so
-- VC4C compiler header found at: /usr/include/vc4cc/VC4C.h
-- Building with ICD support
-- Checking for one of the modules 'ocl-icd>=1.3'
-- Found Khronos ICD Loader in version 2.2.12 in /usr/lib
-- Enabling register-poking to run kernels
-- found clang-format: /usr/bin/clang-format
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alex/packages/vc4cl_arch/src/build
Scanning dependencies of target generate_icd
[ 3%] Generating VC4CL.icd
[ 3%] Built target generate_icd
Scanning dependencies of target VC4CL
[ 6%] Building CXX object build/CMakeFiles/VC4CL.dir/CommandQueue.cpp.o
[ 10%] Building CXX object build/CMakeFiles/VC4CL.dir/Buffer.cpp.o
c++: fatal error: no input files
compilation terminated.
distcc[1055] ERROR: compile (null) on localhost failed
c++: fatal error: no input files
compilation terminated.
distcc[1056] ERROR: compile (null) on localhost failed
/bin/sh: -g3: command not found
make[2]: *** [build/CMakeFiles/VC4CL.dir/build.make:63: build/CMakeFiles/VC4CL.dir/Buffer.cpp.o] Error 127
make[2]: *** Waiting for unfinished jobs....
/bin/sh: -g3: command not found
make[2]: *** [build/CMakeFiles/VC4CL.dir/build.make:76: build/CMakeFiles/VC4CL.dir/CommandQueue.cpp.o] Error 127
make[1]: *** [CMakeFiles/Makefile2:183: build/CMakeFiles/VC4CL.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
==> ERROR: A failure occurred in build().
Aborting...

Using this:
https://github.com/alexzk1/vc4cl_arch/blob/master/PKGBUILD

Building Debian package on RPi: manual copying required

When building a debian package on RPi using...

mkdir build && cd build

cmake .. -DBUILD_DEBUG=ON -DBUILD_DEB_PACKAGE=ON -DBUILD_TESTING=ON -DCROSS_COMPILE=ON -DBUILD_ICD=ON -DREGISTER_POKE_KERNELS=ON -DIMAGE_SUPPORT=ON

make

cpack -G DEB

an error is output concerning a missing VC4CL.icd.

CPack: Create package using DEB
CPack: Install projects
CPack: - Run preinstall target for: VC4CL
CPack: - Install project: VC4CL
CMake Error at /usr/local/src/VC4CL/build/cmake_install.cmake:44 (file):
  file INSTALL cannot find "/usr/local/src/VC4CL/VC4CL.icd".


CPack Error: Error when generating package: vc4cl

It can be fixed by copying VC4CL.icd to the parent directory

cp VC4CL.icd ..

At some point this should be fixed in the build files... it's a minor issue, I've posted it here for reference.

Running final binary as sudo

...and getting

[ INFO:0] Initialize OpenCL runtime...
[ INFO:0] Successfully initialized OpenCL cache directory: /root/.cache/opencv/3.4.1/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--Broadcom--VideoCore_IV_GPU--0_4
OpenCV(3.4.1) Error: Unknown error code -220 (OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL)) in getProgramBinary, file /build/opencv/src/opencv-3.4.1/modules/core/src/ocl.cpp, line 3752
[ WARN:0] Can't save OpenCL binary into cache: /root/.cache/opencv/3.4.1/opencl_cache/32-bit--Broadcom--VideoCore_IV_GPU--0_4/imgproc--filterSepRow_e99b92fca8604fe253f3c641802ce117.bin
OpenCV(3.4.1) /build/opencv/src/opencv-3.4.1/modules/core/src/ocl.cpp:3752: error: (-220) OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL) in function getProgramBinary

OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('row_filter_C1_D0', dims=2, globalsize=240x544x1, localsize=16x16x1) sync=false
OpenCV(3.4.1) Error: Unknown error code -220 (OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL)) in getProgramBinary, file /build/opencv/src/opencv-3.4.1/modules/core/src/ocl.cpp, line 3752
[ WARN:0] Can't save OpenCL binary into cache: /root/.cache/opencv/3.4.1/opencl_cache/32-bit--Broadcom--VideoCore_IV_GPU--0_4/imgproc--filterSepRow_e99b92fca8604fe253f3c641802ce117.bin
OpenCV(3.4.1) /build/opencv/src/opencv-3.4.1/modules/core/src/ocl.cpp:3752: error: (-220) OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL) in function getProgramBinary

OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('row_filter_C1_D0', dims=2, globalsize=240x544x1, localsize=16x16x1) sync=false
FPS 0.0205593, Objects: 0
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('row_filter_C1_D0', dims=2, globalsize=240x544x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('row_filter_C1_D0', dims=2, globalsize=240x544x1, localsize=16x16x1) sync=false

What is the problem here: unsupported features, permissions, or OpenCL itself?
Just in case, I ran

sudo chmod 777 /root/.cache/opencv/3.4.1/opencl_cache/32-bit--Broadcom--VideoCore_IV_GPU--0_4

but it didn't help.
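The repeated CL_INVALID_WORK_GROUP_SIZE (-54) is consistent with the device limits: the requested local size 16x16x1 is 256 work-items, while the VideoCore IV device in the clinfo listing earlier on this page reports a maximum work-group size of 12. (The CL_INVALID_VALUE from clGetProgramInfo appears, per the WARN line, only to stop OpenCV from saving the binary to its cache.) Where you control the launch, passing NULL as the local work size to clEnqueueNDRangeKernel() lets the implementation choose one. Otherwise, a local size can be derived from the device limit. A minimal sketch with hypothetical helper names, keeping the product within the limit and making each dimension divide the global size as OpenCL 1.2 requires:

```c
#include <stddef.h>

// Largest divisor of global_size that does not exceed limit (at least 1).
size_t largest_divisor_at_most(size_t global_size, size_t limit) {
    size_t d = limit < global_size ? limit : global_size;
    for (; d > 1; --d) {
        if (global_size % d == 0) return d;
    }
    return 1;
}

// Chooses a 2D local size whose product stays within max_wg_size and whose
// entries each divide the matching global size. Dimension 0 is filled first;
// dimension 1 gets whatever budget remains.
void pick_local_size_2d(const size_t global[2], size_t max_wg_size, size_t local[2]) {
    local[0] = largest_divisor_at_most(global[0], max_wg_size);
    local[1] = largest_divisor_at_most(global[1], max_wg_size / local[0]);
}
```

For the 240x544 launch above with a limit of 12, this picks 12x1. A fuller version would also respect the per-dimension CL_DEVICE_MAX_WORK_ITEM_SIZES, and whether OpenCV's row_filter kernel still works with a local size other than 16x16 is not established here.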

Error running

Hello.

I installed everything on Ubuntu 18.04 and confirmed that VC4C is working.
However, while creating a cl::Buffer I get this error message:

[VC4CL] test_cl.o: ioctl_set_msg failed: -1
[VC4CL] Error in mbox_propertie: Inappropriate ioctl for device

Any idea what this may be?
Thanks in advance.

Segfault on kernel waiting for list of events

doe300, as you know, back on the Qrack project's Raspberry Pi 3 support pull request at unitaryfund/qrack#143, we get a segfault if our kernel calls receive a large list of events to wait on, but not if the wait list is manually waited on before calling the kernel.

The wait list is daisy-chained. We pass only the direct wait list dependencies for the kernel, while the events in the list can depend on their own separate sets of dependencies that aren't in the kernel list.

We have this pattern (which works):

#if ENABLE_RASPBERRYPI
    clFinish();
#endif

    // Dispatch the primary kernel, to apply the gate.
    cl::Event kernelEvent;
    std::vector<cl::Event> kernelWaitVec = device_context->ResetWaitEvents();
    queue.enqueueNDRangeKernel(ocl.call, cl::NullRange, // kernel, offset
        cl::NDRange(workItemCount), // global number of work items
        cl::NDRange(localGroupSize), // local number (per group)
        &kernelWaitVec, // vector of events to wait for
        &kernelEvent); // handle to wait for the kernel

    queue.flush();

#if ENABLE_RASPBERRYPI
    clFinish();
#endif

The clFinish() method is a user-code "soft finish" implementation: all it does is manually wait on the list of events returned by device_context->ResetWaitEvents() and then clear the list. In the simplest case we've tested, the events in the list just load data into the buffers needed before calling the kernel. We pass these events to the kernel, and we manually wait on the same buffer loads before exiting our method, to make sure that the loads complete before we could lose ephemeral inputs for the buffers.

This implementation has been tested extensively on NVIDIA GTX cards, Intel Core HDs, Intel Cores and Intel Xeons, and it has never been an issue anywhere except on the Raspberry Pi.

I think this kind of fall-through asynchronous behavior is asking a lot of the Raspberry Pi's resources, but I don't think there's a logical problem with the chain of dependencies, if we remove the clFinish() calls for the Raspberry Pi.

Support additional extensions

Candidates:

  • cl_khr_fp16 (OpenCL 1.2 extension specification, section 9.5): could calculate as float, use un/pack modes to load/store
  • cl_khr_image2d_from_buffer (OpenCL 1.2 extension specification, section 9.14): at least for raster texture-formats, this would be no problem.
  • ✓ cl_khr_initialize_memory (OpenCL 1.2 extension specification, section 9.15): local memory is already initialized to zero by the compiler if no other initial value is explicitly set; private memory is initialized to zero, if configured this way (via the extension).
  • cl_khr_terminate_context (OpenCL 1.2 extension specification, section 9.16): requires the currently running action (e.g. kernel execution) to be aborted, currently not supported.
  • cl_img_cached_allocations (here): the Mailbox supports (un)cached allocation, need to check if it has any side-effects
  • ✓ cl_khr_create_command_queue: adds an extension function which maps directly to the already supported OpenCL 2.x clCreateCommandQueueWithProperties function.
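For cl_khr_fp16, the "calculate as float" approach treats half values purely as a storage format: they are widened to 32-bit float on load and narrowed again on store. As an illustration of the load-side conversion only (a portable software version, not the VideoCore IV unpack mode itself), an IEEE 754 binary16 value maps to binary32 like this:

```c
#include <stdint.h>
#include <string.h>

// Reinterprets a 32-bit pattern as an IEEE 754 single-precision float.
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Widens an IEEE 754 binary16 value to binary32. Every half value is exactly
// representable as a float, so no rounding is needed in this direction.
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  // move sign to bit 31
    uint32_t exponent = (h >> 10) & 0x1Fu;          // 5-bit biased exponent
    uint32_t mantissa = h & 0x3FFu;                 // 10-bit fraction
    if (exponent == 0x1Fu)                          // infinity or NaN
        return bits_to_float(sign | 0x7F800000u | (mantissa << 13));
    if (exponent != 0)                              // normal: rebias 15 -> 127
        return bits_to_float(sign | ((exponent + 112u) << 23) | (mantissa << 13));
    // zero or subnormal: value = mantissa * 2^-24, exact in float
    float magnitude = (float)mantissa * (1.0f / 16777216.0f);
    return sign ? -magnitude : magnitude;
}
```

The store direction (float to half) additionally has to round and is not shown.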

Test debian packages

(On a freshly installed Raspbian distribution) test installation and usage of the three created debian packages built with CircleCI.

Things to test:

  • does VC4C find libLLVM correctly?
  • does VC4C correctly expect the VC4CLStdLib headers to reside in /usr/local/include/vc4cl-stdlib?
  • does VC4C build the PCH correctly on installation in the correct location?
  • do simple compilation tests run with VC4C?
  • is VC4CL correctly configured to use VC4C? Does it find the libVC4CC.so in /usr/local/lib?
  • is VC4CL correctly configured with ICD loader?
  • does e.g. clinfo print correct results?
  • do the tools v3d_info and v3d_profile work?
  • test on various Raspberry Pi models (models A/B/Zero running ARMv6 and model 3 running ARMv8)
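Some of the file-location checks in this list can be scripted. A minimal sketch: count_missing_paths() is a hypothetical helper, and the paths passed to it would be the ones named above (/usr/local/include/vc4cl-stdlib, /usr/local/lib/libVC4CC.so), adjusted for other install prefixes.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

// Returns how many of the given paths do not exist, printing each missing one.
size_t count_missing_paths(const char* const* paths, size_t count) {
    size_t missing = 0;
    for (size_t i = 0; i < count; ++i) {
        if (access(paths[i], F_OK) != 0) {
            printf("missing: %s\n", paths[i]);
            ++missing;
        }
    }
    return missing;
}
```

This only confirms that files are where they are expected; whether VC4C and VC4CL actually pick them up still has to be tested by compiling and running kernels.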

Test case failed

I am not sure what happened, but the test cases seem to fail.
Thanks to #1, I have compiled the test cases in an ad-hoc way:

  • Comment out line 18 to avoid "Unknown CMake command ExternalProject_Get_Property".
    Then download libcpptest-lite and link TestVC4CL by hand.
  • Add #define HAS_COMPILER 1
  • Modify VC4CL_VERSION as constant string to avoid compilation-error

Can you explain what happened?
Or can you tell me how to set up a correct environment (mine may be wrong)?

pi@nomaddo:~/work/VC4CL/test$ sudo ./TestVC4CL
Running suite 'TestSystem' with 1 tests...
Suite 'TestSystem' finished, 1/1 successful (100%) in 48 microseconds (0.048 ms).
Running suite 'TestPlatform' with 2 tests...
Test 'TestPlatform::testGetPlatformInfo()' failed!
	Suite: TestPlatform
	File: TestPlatform.cpp
	Line: 56
	Failure: Assertion 'platform_config::VERSION.compare(buffer) == 0' failed
Test-method 'TestPlatform::testGetPlatformInfo()' finished with errors!
Suite 'TestPlatform' finished, 1/2 successful (50%) in 401 microseconds (0.401 ms).
Running suite 'TestDevice' with 5 tests...
Suite 'TestDevice' finished, 5/5 successful (100%) in 1665 microseconds (1.665 ms).
Running suite 'TestContext' with 5 tests...
Suite 'TestContext' finished, 5/5 successful (100%) in 198 microseconds (0.198 ms).
Running suite 'TestCommandQueue' with 4 tests...
Suite 'TestCommandQueue' finished, 4/4 successful (100%) in 568 microseconds (0.568 ms).
Running suite 'TestBuffer' with 17 tests...
Suite 'TestBuffer' finished, 17/17 successful (100%) in 1536 microseconds (1.536 ms).
Running suite '' with 0 tests...
Suite '' finished, 0/0 successful (0%) in 0 microseconds (0 ms).
Running suite 'TestProgram' with 11 tests...
Test 'TestProgram::testCompileProgram()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 109
	Failure: Got -3, expected 0
Test-method 'TestProgram::testCompileProgram()' finished with errors!
Test 'TestProgram::testLinkProgram()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 118
	Failure: Got -59, expected 0
Test 'TestProgram::testLinkProgram()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 119
	Failure: Got 0, expected 0x138a06c
Test-method 'TestProgram::testLinkProgram()' finished with errors!
Test 'TestProgram::testUnloadPlatformCompiler()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 129
	Failure: Got 5, expected 13
Test-method 'TestProgram::testUnloadPlatformCompiler()' finished with errors!
Test 'TestProgram::testGetProgramInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 180
	Failure: Got 0, expected -30
Test 'TestProgram::testGetProgramInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 199
	Failure: Got -45, expected 0
Test 'TestProgram::testGetProgramInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 200
	Failure: Got 0, expected 4
Test 'TestProgram::testGetProgramInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 201
	Failure: Got 0, expected 1
Test 'TestProgram::testGetProgramInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 204
	Failure: Got -45, expected 0
Test-method 'TestProgram::testGetProgramInfo()' finished with errors!
Test 'TestProgram::testGetProgramBuildInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 139
	Failure: Got -1, expected 0
Test 'TestProgram::testGetProgramBuildInfo()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 152
	Failure: Got 0, expected 4
Test-method 'TestProgram::testGetProgramBuildInfo()' finished with errors!
Suite 'TestProgram' finished, 6/11 successful (54.54%) in 5657030 microseconds (5657.03 ms).
Running suite 'TestKernel' with 13 tests...
Test 'TestKernel::testCreateKernel()' failed!
	Suite: TestKernel
	File: TestKernel.cpp
	Line: 62
	Failure: Got -45, expected 0
Test 'TestKernel::testCreateKernel()' failed!
	Suite: TestKernel
	File: TestKernel.cpp
	Line: 63
	Failure: Assertion 'kernel != NULL' failed
Test-method 'TestKernel::testCreateKernel()' finished with errors!
Test 'TestKernel::testCreateKernelsInProgram()' failed!
	Suite: TestKernel
	File: TestKernel.cpp
	Line: 71
	Failure: Got -45, expected 0
Test 'TestKernel::testCreateKernelsInProgram()' failed!
	Suite: TestKernel
	File: TestKernel.cpp
	Line: 72
	Failure: Got 0, expected 1
Segmentation fault
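The failing assertions print raw OpenCL status codes. If they are API error codes (some assertions may instead compare build-status values, which use the same small negative numbers), the -3 from testCompileProgram() is CL_COMPILER_NOT_AVAILABLE and the later -45 results are CL_INVALID_PROGRAM_EXECUTABLE, i.e. kernels were requested from a program that was never built successfully. A small lookup for the codes that appear in the logs on this page (cl_status_name() is a hypothetical helper; the values are the standard ones from CL/cl.h):

```c
// Names for the OpenCL status codes that appear in the logs on this page.
// Values match the OpenCL 1.2 headers (CL/cl.h); anything else is reported
// as unknown.
const char* cl_status_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -15: return "CL_COMPILE_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -59: return "CL_INVALID_OPERATION";
    default:  return "unknown (see CL/cl.h)";
    }
}
```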

Switch to gcc 6.3 as minimum requirement?

Currently, VC4C and VC4CL are held compatible with the ancient GCC 4.8 to support older Raspbian systems. GCC 4.8 barely supports C++11 [1] and hence lacks a lot of useful new features provided in C++14 and C++17. The latest Raspbian Stretch ships with GCC 6.3, which completely supports C++14 as well as a lot of C++17 features [1].

Some useful C++14/C++17 features we could use when switching [1]:

  • Full support for variadic templates
  • Extended constexpr functions
  • __has_include e.g. for distinguishing LLVM versions
  • New library features: string_view, file-system support, optional and any in experimental namespace

Since the official raspberrypi-tools repository has not yet been upgraded to GCC 6.3 (see here), we would need to use another tool-chain for our CircleCI cross-compilation such as this one (see also here).

So the question is:
Should we drop support for GCC 4.8 and therefore for older Raspbian systems?

[1] https://gcc.gnu.org/projects/cxx-status.html

error while linking

On Arch Linux ARM (aarch64), clang 7:

Scanning dependencies of target generate_icd
[  3%] Generating VC4CL.icd
[  3%] Built target generate_icd
Scanning dependencies of target VC4CL
[  7%] Building CXX object src/CMakeFiles/VC4CL.dir/barriers.cpp.o
[ 10%] Building CXX object src/CMakeFiles/VC4CL.dir/Buffer.cpp.o
[ 14%] Building CXX object src/CMakeFiles/VC4CL.dir/CommandQueue.cpp.o
[ 17%] Building CXX object src/CMakeFiles/VC4CL.dir/common.cpp.o
[ 21%] Building CXX object src/CMakeFiles/VC4CL.dir/Context.cpp.o
[ 25%] Building CXX object src/CMakeFiles/VC4CL.dir/Device.cpp.o
[ 28%] Building CXX object src/CMakeFiles/VC4CL.dir/Event.cpp.o
[ 32%] Building CXX object src/CMakeFiles/VC4CL.dir/executor.cpp.o
/home/as3ii/VC4CL/src/executor.cpp: In function 'unsigned int AS_GPU_ADDRESS(const unsigned int*, vc4cl::DeviceBuffer*)':
/home/as3ii/VC4CL/src/executor.cpp:39:24: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     const char* tmp = *reinterpret_cast<const char**>(&ptr);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/as3ii/VC4CL/src/executor.cpp: In function 'cl_int executeKernel(vc4cl::KernelExecution&)':
/home/as3ii/VC4CL/src/executor.cpp:261:79: warning: conversion from 'long unsigned int' to 'uint32_t' {aka 'unsigned int'} may change value [-Wconversion]
     uint32_t stackFrameSize = kernel->program->moduleInfo.getStackFrameSize() * sizeof(uint64_t);
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
[ 35%] Building CXX object src/CMakeFiles/VC4CL.dir/extensions.cpp.o
[ 39%] Building CXX object src/CMakeFiles/VC4CL.dir/icd_loader.cpp.o
[ 42%] Building CXX object src/CMakeFiles/VC4CL.dir/Image.cpp.o
[ 46%] Building CXX object src/CMakeFiles/VC4CL.dir/Kernel.cpp.o
[ 50%] Building CXX object src/CMakeFiles/VC4CL.dir/Mailbox.cpp.o
[ 53%] Building CXX object src/CMakeFiles/VC4CL.dir/ObjectTracker.cpp.o
[ 57%] Building CXX object src/CMakeFiles/VC4CL.dir/PerformanceCounter.cpp.o
/home/as3ii/VC4CL/src/PerformanceCounter.cpp: In member function 'cl_int vc4cl::PerformanceCounter::getValue(cl_uint*) const':
/home/as3ii/VC4CL/src/PerformanceCounter.cpp:43:40: warning: conversion from 'int64_t' {aka 'long int'} to 'cl_uint' {aka 'unsigned int'} may change value [-Wconversion]
     *value = V3D::instance().getCounter(index);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
[ 60%] Building CXX object src/CMakeFiles/VC4CL.dir/Platform.cpp.o
[ 64%] Building CXX object src/CMakeFiles/VC4CL.dir/Program.cpp.o
[ 67%] Building CXX object src/CMakeFiles/VC4CL.dir/queue_handler.cpp.o
[ 71%] Building CXX object src/CMakeFiles/VC4CL.dir/TextureFormat.cpp.o
[ 75%] Building CXX object src/CMakeFiles/VC4CL.dir/V3D.cpp.o
[ 78%] Linking CXX shared library libVC4CL.so
[ 78%] Built target VC4CL
Scanning dependencies of target v3d_profile
[ 82%] Building CXX object tools/CMakeFiles/v3d_profile.dir/V3DProfile.cpp.o
In file included from /home/as3ii/VC4CL/tools/../src/V3D.h:10,
                 from /home/as3ii/VC4CL/tools/common.h:10,
                 from /home/as3ii/VC4CL/tools/V3DProfile.cpp:7:
/home/as3ii/VC4CL/tools/../src/common.h:46:106: warning: ignoring attributes on template argument 'cl_int' {aka 'int'} [-Wignored-attributes]
     CHECK_RETURN typename std::enable_if<std::is_arithmetic<T>::value | std::is_pointer<T>::value, cl_int>::type
                                                                                                          ^
[ 85%] Linking CXX executable v3d_profile
/usr/bin/ld: ../src/libVC4CL.so.0.4: undefined reference to `bcm_host_deinit'
/usr/bin/ld: ../src/libVC4CL.so.0.4: undefined reference to `bcm_host_init'
/usr/bin/ld: ../src/libVC4CL.so.0.4: undefined reference to `bcm_host_get_peripheral_address'
collect2: error: ld returned 1 exit status
make[2]: *** [tools/CMakeFiles/v3d_profile.dir/build.make:88: tools/v3d_profile] Error 1
make[1]: *** [CMakeFiles/Makefile2:247: tools/CMakeFiles/v3d_profile.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Milestone: Get OpenCL Caffe to run on VC4CL

Hi,
I am the maintainer of OpenCL Caffe (https://github.com/naibaf7/caffe) and (https://github.com/BVLC/caffe/tree/opencl).

I would like to get this running on VC4CL, but I am facing some issues. At its core, though, it should be possible, because Caffe can now run solely on its internal OpenCL kernels (no external OpenCL library dependency, except for ViennaCL's infrastructure, but not its kernels). This also means full control over the work-group sizes required by the kernels, and it's possible to add quirks/workarounds specific to VC4CL.

Is there a way to chat with @doe300 directly? This may get quite involved, due to the scope of OpenCL Caffe.

Compilation Error

Hello.
I'm using a Raspberry Pi 3 to optimise some code. I've already run some test programs like summing two arrays, etc.
Now, I'm stuck with the following error.

[D] Thu Mar  7 17:04:25 2019: Compiling 'src/libviso2/libviso2/kernels/get_inliers.cl' into 'out' with optimization level 2 and options '' ...
[D] Thu Mar  7 17:04:25 2019: Temporary file '/tmp/vc4c-EQ5dwJ' created
[D] Thu Mar  7 17:04:25 2019: Temporary file '/tmp/vc4c-i0dILi' created
[I] Thu Mar  7 17:04:25 2019: Compiling OpenCL to LLVM-IR with: /usr/bin/clang-3.9 -cc1 -triple spir-unknown-unknown  -I src/libviso2/libviso2/kernels -O3 -ffp-contract=off -cl-std=CL1.2 -cl-kernel-arg-info -cl-single-precision-constant  -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -finclude-default-header -include /usr/local/include/vc4cl-stdlib/defines.h -x cl -S -emit-llvm-bc -o /tmp/vc4c-i0dILi src/libviso2/libviso2/kernels/get_inliers.cl
[E] Thu Mar  7 17:04:25 2019: Errors in precompilation:
[E] Thu Mar  7 17:04:25 2019: Stack dump:
0.	Program arguments: /usr/bin/clang-3.9 -cc1 -triple spir-unknown-unknown -I src/libviso2/libviso2/kernels -O3 -ffp-contract=off -cl-std=CL1.2 -cl-kernel-arg-info -cl-single-precision-constant -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -finclude-default-header -include /usr/local/include/vc4cl-stdlib/defines.h -x cl -S -emit-llvm-bc -o /tmp/vc4c-i0dILi src/libviso2/libviso2/kernels/get_inliers.cl 
Segmentation fault

[E] Thu Mar  7 17:04:25 2019:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x76aa9d44]
[E] Thu Mar  7 17:04:25 2019:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x8200ec [0x76dc00ec]
[E] Thu Mar  7 17:04:25 2019:  (3) /usr/local/lib/libVC4CC.so.1.2 : +0x820498 [0x76dc0498]
[E] Thu Mar  7 17:04:25 2019:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::precompilation::compileOpenCLWithDefaultHeader(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x64 [0x76dc0860]
[E] Thu Mar  7 17:04:25 2019:  (5) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), void (*)(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x60 [0x76dc525c]
[E] Thu Mar  7 17:04:25 2019:  (6) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76dc4200]
[E] Thu Mar  7 17:04:25 2019:  (7) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0xa0 [0x76dc336c]
[E] Thu Mar  7 17:04:25 2019:  (8) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x5c [0x76dc5630]
[E] Thu Mar  7 17:04:25 2019:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76dc4200]
[E] Thu Mar  7 17:04:25 2019:  (10) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Precompiler::run(std::unique_ptr<std::istream, std::default_delete<std::istream> >&, vc4c::SourceType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >)+0x608 [0x76dca260]
[E] Thu Mar  7 17:04:25 2019:  (11) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Precompiler::precompile(std::istream&, std::unique_ptr<std::istream, std::default_delete<std::istream> >&, vc4c::Configuration, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >)+0x164 [0x76dc7fd8]
[E] Thu Mar  7 17:04:25 2019:  (12) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Compiler::compile(std::istream&, std::ostream&, vc4c::Configuration, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&)+0xc4 [0x76aab5dc]
[E] Thu Mar  7 17:04:25 2019:  (13) VC4C : main+0xf04 [0x375a4]
[E] Thu Mar  7 17:04:25 2019:  (14) /lib/arm-linux-gnueabihf/libc.so.6 : __libc_start_main+0x114 [0x73be5678]
[D] Thu Mar  7 17:04:25 2019: Temporary file '/tmp/vc4c-i0dILi' deleted
[D] Thu Mar  7 17:04:25 2019: Temporary file '/tmp/vc4c-EQ5dwJ' deleted
[E] Thu Mar  7 17:04:25 2019: Compiler threw exception: Pre-compilation: Error in precompilation: Stack dump:
0.	Program arguments: /usr/bin/clang-3.9 -cc1 -triple spir-unknown-unknown -I src/libviso2/libviso2/kernels -O3 -ffp-contract=off -cl-std=CL1.2 -cl-kernel-arg-info -cl-single-precision-constant -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -finclude-default-header -include /usr/local/include/vc4cl-stdlib/defines.h -x cl -S -emit-llvm-bc -o /tmp/vc4c-i0dILi src/libviso2/libviso2/kernels/get_inliers.cl 
Segmentation fault

terminate called after throwing an instance of 'vc4c::CompilationError'
  what():  Pre-compilation: Error in precompilation: Stack dump:
0.	Program arguments: /usr/bin/clang-3.9 -cc1 -triple spir-unknown-unknown -I src/libviso2/libviso2/kernels -O3 -ffp-contract=off -cl-std=CL1.2 -cl-kernel-arg-info -cl-single-precision-constant -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -finclude-default-header -include /usr/local/include/vc4cl-stdlib/defines.h -x cl -S -emit-llvm-bc -o /tmp/vc4c-i0dILi src/libviso2/libviso2/kernels/get_inliers.cl 
Segmentation fault

Aborted

This happens even with a kernel that compiled successfully before.
The error returned by OpenCL is CL_COMPILE_PROGRAM_FAILURE (-15).
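The -15 above is just the numeric value of CL_COMPILE_PROGRAM_FAILURE. OpenCL reports every failure as a negative integer, which is why raw numbers show up in logs and tools. The following is a minimal lookup sketch that turns such numbers back into names. It covers only a hand-picked subset of the OpenCL 1.2 codes (the values match the constants in the Khronos CL/cl.h header), and the names cl_error_name, cl_error_code and CL_ERROR_TABLE are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* A hand-picked subset of OpenCL 1.2 error codes; the numeric values match
 * the constants defined in the Khronos CL/cl.h header. */
struct cl_error_entry {
    int code;
    const char *name;
};

static const struct cl_error_entry CL_ERROR_TABLE[] = {
    {0, "CL_SUCCESS"},
    {-3, "CL_COMPILER_NOT_AVAILABLE"},
    {-5, "CL_OUT_OF_RESOURCES"},
    {-6, "CL_OUT_OF_HOST_MEMORY"},
    {-11, "CL_BUILD_PROGRAM_FAILURE"},
    {-15, "CL_COMPILE_PROGRAM_FAILURE"},
    {-16, "CL_LINKER_NOT_AVAILABLE"},
    {-17, "CL_LINK_PROGRAM_FAILURE"},
    {-30, "CL_INVALID_VALUE"},
    {-43, "CL_INVALID_BUILD_OPTIONS"},
    {-44, "CL_INVALID_PROGRAM"},
    {-45, "CL_INVALID_PROGRAM_EXECUTABLE"},
};

#define CL_ERROR_COUNT (sizeof CL_ERROR_TABLE / sizeof CL_ERROR_TABLE[0])

/* Returns the symbolic name for code, or "UNKNOWN_CL_ERROR" if it is not in the table. */
const char *cl_error_name(int code) {
    for (size_t i = 0; i < CL_ERROR_COUNT; ++i)
        if (CL_ERROR_TABLE[i].code == code)
            return CL_ERROR_TABLE[i].name;
    return "UNKNOWN_CL_ERROR";
}

/* Reverse lookup: stores the value for name in *code and returns 1,
 * or returns 0 (leaving *code untouched) if the name is not in the table. */
int cl_error_code(const char *name, int *code) {
    for (size_t i = 0; i < CL_ERROR_COUNT; ++i)
        if (strcmp(CL_ERROR_TABLE[i].name, name) == 0) {
            *code = CL_ERROR_TABLE[i].code;
            return 1;
        }
    return 0;
}
```

In real host code, once a compile or build call fails like this, the compiler's own output can be fetched with clGetProgramBuildInfo using CL_PROGRAM_BUILD_LOG. That log is usually more informative than the code alone.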

Any ideas?
Thanks in advance.

Edit:

Sometimes the error is this one instead:

[D] Thu Mar  7 17:18:02 2019: Compiling 'kernels/kernel.cl' into 'out' with optimization level 2 and options '' ...
[D] Thu Mar  7 17:18:02 2019: Temporary file '/tmp/vc4c-2aReX3' created
[D] Thu Mar  7 17:18:02 2019: Temporary file '/tmp/vc4c-cPNtDX' created
[I] Thu Mar  7 17:18:02 2019: Compiling OpenCL to LLVM-IR with: /usr/bin/clang-3.9 -cc1 -triple spir-unknown-unknown  -I kernels -O3 -ffp-contract=off -cl-std=CL1.2 -cl-kernel-arg-info -cl-single-precision-constant  -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -finclude-default-header -include /usr/local/include/vc4cl-stdlib/defines.h -x cl -S -emit-llvm-bc -o /tmp/vc4c-cPNtDX kernels/kernel.cl
[E] Thu Mar  7 17:18:02 2019: Errors in precompilation:
[E] Thu Mar  7 17:18:02 2019: Illegal instruction

[E] Thu Mar  7 17:18:02 2019:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf4 [0x76a8fd44]
[E] Thu Mar  7 17:18:02 2019:  (2) /usr/local/lib/libVC4CC.so.1.2 : +0x8200ec [0x76da60ec]
[E] Thu Mar  7 17:18:02 2019:  (3) /usr/local/lib/libVC4CC.so.1.2 : +0x820498 [0x76da6498]
[E] Thu Mar  7 17:18:02 2019:  (4) /usr/local/lib/libVC4CC.so.1.2 : vc4c::precompilation::compileOpenCLWithDefaultHeader(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x64 [0x76da6860]
[E] Thu Mar  7 17:18:02 2019:  (5) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), void (*)(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x60 [0x76dab25c]
[E] Thu Mar  7 17:18:02 2019:  (6) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76daa200]
[E] Thu Mar  7 17:18:02 2019:  (7) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0xa0 [0x76da936c]
[E] Thu Mar  7 17:18:02 2019:  (8) /usr/local/lib/libVC4CC.so.1.2 : std::_Function_handler<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&), std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> vc4c::precompilation::chainSteps<(vc4c::SourceType)3, (vc4c::SourceType)1, (vc4c::SourceType)3>(std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&, std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)3>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)> const&)::{lambda(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)#1}>::_M_invoke(std::_Any_data const&, vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)+0x5c [0x76dab630]
[E] Thu Mar  7 17:18:02 2019:  (9) /usr/local/lib/libVC4CC.so.1.2 : std::function<void (vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&)>::operator()(vc4c::precompilation::PrecompilationSource<(vc4c::SourceType)1>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::precompilation::PrecompilationResult<(vc4c::SourceType)3>&) const+0x78 [0x76daa200]
[E] Thu Mar  7 17:18:02 2019:  (10) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Precompiler::run(std::unique_ptr<std::istream, std::default_delete<std::istream> >&, vc4c::SourceType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >)+0x608 [0x76db0260]
[E] Thu Mar  7 17:18:02 2019:  (11) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Precompiler::precompile(std::istream&, std::unique_ptr<std::istream, std::default_delete<std::istream> >&, vc4c::Configuration, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >)+0x164 [0x76dadfd8]
[E] Thu Mar  7 17:18:02 2019:  (12) /usr/local/lib/libVC4CC.so.1.2 : vc4c::Compiler::compile(std::istream&, std::ostream&, vc4c::Configuration, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, vc4c::Optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&)+0xc4 [0x76a915dc]
[E] Thu Mar  7 17:18:02 2019:  (13) VC4C : main+0xf04 [0x375a4]
[E] Thu Mar  7 17:18:02 2019:  (14) /lib/arm-linux-gnueabihf/libc.so.6 : __libc_start_main+0x114 [0x73bcb678]
[D] Thu Mar  7 17:18:02 2019: Temporary file '/tmp/vc4c-cPNtDX' deleted
[D] Thu Mar  7 17:18:02 2019: Temporary file '/tmp/vc4c-2aReX3' deleted
[E] Thu Mar  7 17:18:02 2019: Compiler threw exception: Pre-compilation: Error in precompilation: Illegal instruction

terminate called after throwing an instance of 'vc4c::CompilationError'
  what():  Pre-compilation: Error in precompilation: Illegal instruction

Aborted

Are they related?
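Both reports appear to have the same shape. In each one, the clang-3.9 front end that VC4C runs during pre-compilation terminates abnormally instead of reporting an ordinary compile error: SIGSEGV in the first log and SIGILL in the second. "Segmentation fault" and "Illegal instruction" are the usual glibc descriptions of those two signals. The C sketch below is POSIX only. It shows how a parent process can tell a signal-induced crash apart from a normal non-zero exit. The helper names are hypothetical, and this is not how VC4C itself launches clang:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) as a child process and waits for it.
 * Returns the number of the signal that terminated the child (for example
 * SIGSEGV or SIGILL), 0 if the child exited normally with any exit code,
 * or -1 if the child could not be created or waited for. */
int child_termination_signal(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed: reported as a normal, non-signal exit */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Text for a signal number. glibc returns "Segmentation fault" for SIGSEGV
 * and "Illegal instruction" for SIGILL; other C libraries may differ. */
const char *signal_description(int sig) {
    return sig > 0 ? strsignal(sig) : "no signal";
}
```

For example, you could pass the clang-3.9 command line from the log to child_termination_signal. This could show whether the front end crashes on its own, independently of VC4C.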

This is what clinfo returns right now:

Number of platforms                               1
  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
  Platform Vendor                                 doe300
  Platform Version                                OpenCL 1.2 VC4CL 0.4
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
  Platform Extensions function suffix             VC4CL

  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
  Device Name                                     VideoCore IV GPU
  Device Vendor                                   Broadcom
  Device Vendor ID                                0xa5c
  Device Version                                  OpenCL 1.2 VC4CL 0.4
  Driver Version                                  0.4
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  EMBEDDED_PROFILE
  Max compute units                               1
  Max clock frequency                             300MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             12x12x12
  Max work group size                             12
  Preferred work group size multiple              <getWGsizes:498: build program : error -15>
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                               16 / 16      
    int                                                 16 / 16      
    long                                                 0 / 0       
    half                                                 0 / 0        (n/a)
    float                                               16 / 16      
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 Yes
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              134217728 (128MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             64 bytes
  Alignment of base address                       512 bits (64 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
  Global Memory cache line                        64 bytes
  Image support                                   No
  Local memory type                               Global
  Local memory size                               134217728 (128MiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     64
  Max size of kernel argument                     256
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            0
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  OpenCL for the Raspberry Pi VideoCore IV GPU
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [VC4CL]
  clCreateContext(NULL, ...) [default]            Success [VC4CL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1
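The clinfo output above reports very small work-group limits ("Max work item sizes 12x12x12", "Max work group size 12"). Commonly used local sizes such as 64 or 256 therefore exceed them. Below is a minimal C sketch of the OpenCL 1.2 checks a host program could perform before enqueueing a kernel. The two limits are hard-coded from this output; real code should query CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE instead, and the per-kernel CL_KERNEL_WORK_GROUP_SIZE may be lower still. The macro and function names are hypothetical:

```c
#include <stddef.h>

/* Limits copied from the clinfo output for the VideoCore IV GPU (VC4CL 0.4):
 * "Max work item sizes 12x12x12" and "Max work group size 12". */
#define VC4_MAX_WORK_ITEM_SIZE 12u
#define VC4_MAX_WORK_GROUP_SIZE 12u

/* Returns 1 if the local size is acceptable under OpenCL 1.2 rules, else 0:
 *  - 1 to 3 dimensions,
 *  - every local dimension is non-zero and within the per-dimension limit,
 *  - every global dimension is evenly divisible by the local dimension,
 *  - the product of the local dimensions is within the work-group limit. */
int local_size_is_valid(size_t dims, const size_t global[], const size_t local[]) {
    size_t total = 1;
    if (dims < 1 || dims > 3)
        return 0;
    for (size_t d = 0; d < dims; ++d) {
        if (local[d] == 0 || local[d] > VC4_MAX_WORK_ITEM_SIZE)
            return 0;
        if (global[d] % local[d] != 0)
            return 0;
        total *= local[d];
    }
    return total <= VC4_MAX_WORK_GROUP_SIZE;
}
```

For instance, a 2D local size of 4x4 passes every per-dimension check but is rejected because 16 work items exceed the limit of 12, while 3x4 is accepted.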

Add support for unified OpenCL headers

Khronos switched its OpenCL headers to a single set of headers, in which the OpenCL version to be used is selected via macros (see KhronosGroup/OpenCL-Headers@de26592).
This does not yet affect "default" builds, since Raspbian still ships the old-format, OpenCL 1.2-specific headers, but it causes problems when the upstream headers are used to build this library.

See also #33 (comment)
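With the unified headers, a program selects the OpenCL API version it targets by defining CL_TARGET_OPENCL_VERSION before including any CL header. A value of 120 selects OpenCL 1.2, and the older version-specific headers ignore the macro. The following is a minimal sketch of detecting the header layout at compile time. It makes two assumptions: the compiler supports __has_include (GCC 5+ or Clang), and only the unified headers ship CL/cl_version.h. HAVE_UNIFIED_CL_HEADERS and uses_unified_cl_headers are hypothetical names:

```c
/* Target OpenCL 1.2. This must come before any CL header is included.
 * The unified headers read it; the old 1.2-specific headers ignore it. */
#define CL_TARGET_OPENCL_VERSION 120

/* Assumption: CL/cl_version.h exists only in the unified header layout. */
#if defined(__has_include)
#  if __has_include(<CL/cl_version.h>)
#    define HAVE_UNIFIED_CL_HEADERS 1
#  endif
#endif
#ifndef HAVE_UNIFIED_CL_HEADERS
#  define HAVE_UNIFIED_CL_HEADERS 0
#endif

/* Returns 1 if the unified header layout was detected at compile time, else 0. */
int uses_unified_cl_headers(void) {
    return HAVE_UNIFIED_CL_HEADERS;
}
```

Because the old headers ignore CL_TARGET_OPENCL_VERSION, defining it unconditionally is harmless. A build can set it once, for example as a compiler definition (-DCL_TARGET_OPENCL_VERSION=120), and work with both layouts.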

Some test cases failing for TestVC4CL (minor)

Good news! I've finally managed to compile and install everything on my RPi and the tests look good...

Suite 'TestImage' finished, 8/10 successful (80%) in 276470 microseconds (276.47 ms).
Suite 'TestExecutions' finished, 13/15 successful (86.66%) in 100462261 microseconds (100462 ms).

Is this an OK result, or should it be 100% for both? Are these all the test suites, or is something missing?

CMake was called like this:
cmake .. -DBUILD_DEBUG=ON -DBUILD_DEB_PACKAGE=ON -DBUILD_TESTING=ON -DCROSS_COMPILE=ON -DBUILD_ICD=ON -DREGISTER_POKE_KERNELS=ON -DIMAGE_SUPPORT=ON

I am still getting some strange errors, but since I built with -DIMAGE_SUPPORT=ON, that may be the reason. Attached is the 4285-line log with -DDEBUG=ON; perhaps it is helpful.

TestVC4CL.log

Can I consider VC4CL properly installed and ready for doing something useful with it, such as porting a TensorFlow Lite model to OpenCL via a code generator?

clinfo throws an exception

I am a beginner with OpenCL. When I run the clinfo command, I can see that I have one platform with the following information:

Number of platforms                               1
 Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
 Platform Vendor                                 doe300
 Platform Version                                OpenCL 1.2 VC4CL 0.4
 Platform Profile                                EMBEDDED_PROFILE
 Platform Extensions                             cl_khr_il_program cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
 Platform Extensions function suffix             VC4CL

Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
 Device Name                                     VideoCore IV GPU

[...]

terminate called after throwing an instance of 'std::runtime_error'
 what():  Failed to enable QPUs!
Aborted

However, clinfo fails and throws an exception before it finishes. How can I solve this?

CMake configuration failed for cross-compilation

CMake configuration failed when I tried to cross-compile, because the call find_package(OpenCL REQUIRED) failed.
During this call, CMake tries to compile a C program using CMAKE_C_COMPILER.
The problem is that this compiler is run without any option that adds include directories.
So, even though ${CROSS_COMPILE}/include/CL exists, finding OpenCL fails.

CMakeLists.txt probably needs to be modified.

My configuration command is as follows. Am I missing some setting?
Of course, /opt/gcc-linaro-7.1.1-2017.08-x86_64_arm-linux-gnueabihf/include/CL exists.
Compiling with /opt/gcc-linaro-7.1.1-2017.08-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc -I/opt/gcc-linaro-7.1.1-2017.08-x86_64_arm-linux-gnueabihf/include works correctly.

cmake -DBUILD_TESTING=ON -DCROSS_COMPILE=ON -DCROSS_COMPILER_PATH=/opt/gcc-linaro-7.1.1-2017.08-x86_64_arm-linux-gnueabihf

integration with vc4 DRM driver

Have you considered integration with the vc4 kernel driver for dispatching your CL jobs? It could expose a root-only ioctl taking the start pointers and count and trusting the user. I've got some old code at https://github.com/anholt/linux/commits/rpi-4.4.y-qpu-execute that needed a user.

The other question I have is this: for VC4, I was able to make my compiler emit code in such a way that the kernel could verify that we didn't exceed buffer bounds for a given set of buffers with a particular shader (texture samplers have a descriptor, uniform array accesses get their addresses clamped, and the uniform stream is reset at branch points). Do you think your compiler could do the same, so that the root requirement could be removed?
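The address-clamping idea for uniform array access can be illustrated in plain C. Instead of trusting an index, the generated code forces every computed address back into the buffer, so a verifier only needs the buffer's base and size, not the runtime index. The following is a minimal sketch under those assumptions. It uses 32-bit addresses, as on VideoCore IV, and requires the caller to guarantee 0 < access_size <= buffer_size. The function name is hypothetical, and this is neither the vc4 kernel driver's nor VC4C's actual code:

```c
#include <stdint.h>

/* Computes base + offset for an access of access_size bytes, clamped so that
 * the whole access [addr, addr + access_size) stays inside
 * [base, base + buffer_size). Offsets past the end, including negative
 * indices that wrapped around to large unsigned values, are pinned to the
 * last valid position. Precondition: 0 < access_size <= buffer_size. */
uint32_t clamped_address(uint32_t base, uint32_t buffer_size,
                         uint32_t offset, uint32_t access_size) {
    uint32_t max_offset = buffer_size - access_size; /* last valid start offset */
    uint32_t safe_offset = offset < max_offset ? offset : max_offset;
    return base + safe_offset;
}
```

Clamping rather than rejecting keeps the check branch-free, which suits SIMD code where all lanes execute the same instructions. An out-of-range access still reads the wrong element, but only from inside the caller's own buffer.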
