Looks like I forgot to implement the OpenCL built-in function convert_float_rte in the standard library.
I'm offline for the next couple of days, so the fix might take some time.
from vc4cl.
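For reference, OpenCL's convert_float_rte converts a value to float using round-to-nearest-even. The host-side C++ sketch below is not VC4CLStdLib code: `convert_float_rte_host` is a hypothetical name, and it assumes an IEEE 754 host whose integer-to-float conversion follows the FE_TONEAREST rounding mode (the default). It only illustrates the tie-breaking rule on integers just above 2^24, where consecutive floats are 2 apart.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// Hypothetical host-side helper (not VC4CLStdLib code) mirroring the rounding
// that OpenCL's convert_float_rte(int) specifies: round to nearest, ties to even.
// Assumes an IEEE 754 host where int -> float conversion uses the current
// rounding mode, which is set to FE_TONEAREST (also the default) here.
inline float convert_float_rte_host(std::int32_t value) {
    std::fesetround(FE_TONEAREST);
    return static_cast<float>(value);
}
```

For example, 16777217 lies exactly halfway between 16777216 and 16777218, so it rounds to 16777216, whose last mantissa bit is even; 16777219 likewise rounds up to 16777220.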
No problem. I look forward to testing this new function once it is added. Thank you again for developing this wonderful library!
doe300/VC4CLStdLib@0cf109f adds these functions; can you re-check?
Thanks @doe300. I recompiled VC4CLStdLib and VC4C, reinstalled both packages, and reran my benchmark (./run_benchmark1.sh -n 1e5 -J "-DUSE_LL5_RAND"); the previously observed optimizer error no longer shows up.
Unfortunately, this is still not enough to get my kernel to run on the RPi. The program stalled after printing the line:
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DMCX_SRC_PENCIL -DUSE_LL5_RAND
Here is the source code related to this message:
https://github.com/fangq/mcxcl/blob/master/src/mcx_host.cpp#L457-L458
It looks to me like the clBuildProgram() call kept running and never returned; otherwise, it would either print an error message or print the message "build program complete" on line #477.
In the output of top, I no longer see clang; the mcxcl thread is taking 100% of the CPU. It has been running for over 50 minutes, but no new message has been printed. I am going to kill it.
Can you suggest how to debug this and find out what is preventing clBuildProgram from returning?
Thanks again.
I left the code running overnight. This morning I saw the error message below on the screen; it must have terminated many hours into the compilation:
root@raspberrypi:/home/pi/temp/mcxcl/example/benchmark# ./run_benchmark1.sh -n 1e5 -J "-DUSE_LL5_RAND"
==============================================================================
= Monte Carlo eXtreme (MCX) -- OpenCL =
...
- code name: [Vanilla MCXCL] compiled with OpenCL version [1]
- compiled with: [RNG] Logistic-Lattice [Seed Length] 5
initializing streams ... init complete : 1 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DMCX_SRC_PENCIL -DUSE_LL5_RAND
completed clBuildProgram
completed clGetProgramBuildInfo
Kernel build log:
[W] Fri Mar 2 00:30:27 2018: Warnings in precompilation:
[W] Fri Mar 2 00:30:27 2018: <stdin>:1039:1: warning: null character ignored
<U+0000>
^
1 warning generated.
[W] Fri Mar 2 01:22:50 2018: Register conflict resolver has exceeded its maximum rounds, there might still be errors!
[E] Fri Mar 2 01:22:50 2018: Error assigning local to register: %energylaunched.1606
[E] Fri Mar 2 01:22:50 2018: (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xac [0xb66ca484]
[E] Fri Mar 2 01:22:50 2018: (2) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::GraphColoring::toRegisterMap() const+0x150 [0xb6778e18]
[E] Fri Mar 2 01:22:50 2018: (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::CodeGenerator::generateInstructions(vc4c::Method&)+0x464 [0xb6765f4c]
[E] Fri Mar 2 01:22:50 2018: (4) /usr/local/lib/libVC4CC.so.1.2 : +0x2edee8 [0xb66cbee8]
[E] Fri Mar 2 01:22:50 2018: (5) /usr/local/lib/libVC4CC.so.1.2 : +0x2ee0c8 [0xb66cc0c8]
[E] Fri Mar 2 01:22:50 2018: (6) /usr/local/lib/libVC4CC.so.1.2 : +0x2ef7f8 [0xb66cd7f8]
[E] Fri Mar 2 01:22:50 2018: (7) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0xb66d01b4]
[E] Fri Mar 2 01:22:50 2018: (8) /usr/local/lib/libVC4CC.so.1.2 : threading::BackgroundWorker::operator()()::{lambda()#1}::operator()() const+0x58 [0xb66cf3d8]
[E] Fri Mar 2 01:22:50 2018: (9) /usr/local/lib/libVC4CC.so.1.2 : void std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)+0x38 [0xb66d717c]
[E] Fri Mar 2 01:22:50 2018: (10) /usr/local/lib/libVC4CC.so.1.2 : std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::operator()()+0x1c [0xb66d70e8]
[E] Fri Mar 2 01:22:50 2018: (11) /usr/local/lib/libVC4CC.so.1.2 : std::thread::_State_impl<std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()> >::_M_run()+0x20 [0xb66d7054]
[E] Fri Mar 2 01:22:50 2018: (12) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0xb6e9a9dc]
[E] Fri Mar 2 01:22:51 2018: Background worker threw error: Label/Register Mapping: There are erroneous register-associations!
[E] Fri Mar 2 01:22:53 2018: Compiler threw exception: Label/Register Mapping: There are erroneous register-associations!
MCX ERROR(2):Error: Failed to build program executable! in unit mcx_host.cpp:476
terminate called without an active exception
Aborted
My assumption:
The input you are trying to compile is very complex (e.g. it has long kernels). Register allocation is the slowest part of VC4C, especially if it cannot find a suitable register mapping for all locals (as in your case). When that happens, it applies some fixes and tries again a few times, which of course prolongs the compilation even more.
Depending on the structure of the failing kernel, VC4C may not be able to find a valid register mapping at all, since it does not support register spilling yet.
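To illustrate why an allocator without spilling can fail outright, here is a deliberately simplified greedy graph-coloring sketch in C++. It is not VC4C's GraphColoring implementation and omits the fix-up-and-retry rounds described above; `colorGreedy` and `clique` are hypothetical helpers. Each local is a node, an edge means two locals are live at the same time, and a register is a color.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Interference graph: g[v] lists the locals that are live at the same time as local v.
using Graph = std::vector<std::vector<int>>;

// Simplified greedy colouring WITHOUT spilling (not VC4C's GraphColoring class).
// Each local gets the lowest register not used by an already coloured neighbour;
// if some local finds no free register, the whole mapping fails.
std::optional<std::vector<int>> colorGreedy(const Graph& g, int numRegisters) {
    std::vector<int> reg(g.size(), -1);
    for (std::size_t v = 0; v < g.size(); ++v) {
        std::vector<bool> used(numRegisters, false);
        for (int n : g[v])
            if (reg[n] >= 0) used[reg[n]] = true;
        int chosen = -1;
        for (int r = 0; r < numRegisters && chosen < 0; ++r)
            if (!used[r]) chosen = r;
        if (chosen < 0) return std::nullopt;  // no free register and nowhere to spill
        reg[v] = chosen;
    }
    return reg;
}

// Hypothetical helper: n locals that are all live at once (every pair interferes).
Graph clique(int n) {
    Graph g(n);
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (a != b) g[a].push_back(b);
    return g;
}
```

For example, 65 locals that must all be live at the same time need 65 distinct registers, so no ordering or retry can fit them into 64; without spilling, the allocator can only report an error.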
@doe300, sorry for the delay in getting back to you.
Based on our results on NVIDIA/AMD/Intel CPUs and GPUs, MCXCL needs at least 53 registers. If VC4C does not support spilling, this could be an issue.
What is the maximum number of registers per kernel supported by the VideoCore IV GPU? If there is no such restriction, but there is a total size limit per block, I could perhaps reduce the block size.
Also, is there a "wavefront/warp" size on the VideoCore IV? On NVIDIA hardware, I typically set my block size to 32 or 64; on AMD GPUs, I typically use a block size of 64.
A single QPU has 64 registers (plus 3 or 4 accumulators) available. Since kernels are not threaded, every kernel has the same number of available registers. However, not all registers can be used together in one instruction, and additional registers may be reserved for variables that do not appear in the kernel source code.
The VideoCore IV has 12 processors (QPUs), which execute independently and limit the work-group size to 12 work-items, one per QPU. Each QPU has 16 SIMD elements, which execute the same code and can be leveraged by using vector types, e.g. going from int to int16.
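The figures above can be turned into a small worked example. The C++ sketch below takes exactly the numbers from this comment (12 QPUs, 16 SIMD elements each) as given; `pickLocalSize` is a hypothetical helper that replaces the 32 or 64 block sizes used on NVIDIA/AMD hardware with the largest work-group size of at most 12 that evenly divides the global size, as OpenCL 1.x requires when a local size is passed.

```cpp
#include <cassert>

// Numbers from the comment above, assumed fixed for the VideoCore IV.
constexpr int kNumQpus = 12;                           // independent processors
constexpr int kSimdWidth = 16;                         // SIMD elements per QPU
constexpr int kMaxWorkGroupSize = kNumQpus;            // one work-item per QPU
constexpr int kParallelLanes = kNumQpus * kSimdWidth;  // reached only with 16-wide vector types

// Hypothetical helper: largest local work size <= kMaxWorkGroupSize that evenly
// divides the global size (OpenCL 1.x rejects sizes that do not divide evenly).
int pickLocalSize(long globalSize) {
    for (int local = kMaxWorkGroupSize; local > 1; --local)
        if (globalSize % local == 0) return local;
    return 1;
}

static_assert(kParallelLanes == 192, "12 QPUs x 16 SIMD elements");
```

With these numbers, at most 12 × 16 = 192 elements are processed in parallel, and only when the kernel uses 16-wide vector types; a global size of 100 would get a local size of 10, and a prime global size such as 13 would fall back to 1.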
I think the best (or only) way to solve this is to implement register spilling, since this problem will occur with any sufficiently complex kernel.
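Register spilling is usually folded into graph coloring. The C++ sketch below shows the textbook Chaitin-style approach, not VC4C code: locals with fewer conflicts than available registers are set aside as safely colorable, and when none remain, the local with the most remaining conflicts is spilled, i.e. kept in memory and reloaded around its uses (this sketch does not model the loads and stores). `allocateWithSpilling` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook Chaitin-style register allocation with spilling (not VC4C code).
// g[v] lists the locals that interfere with (are live at the same time as) local v.
// Returns one entry per local: a register index, or -1 if the local is spilled.
std::vector<int> allocateWithSpilling(const std::vector<std::vector<int>>& g, int numRegisters) {
    const std::size_t n = g.size();
    std::vector<bool> removed(n, false);
    std::vector<int> degree(n), reg(n, -1), stack;
    for (std::size_t v = 0; v < n; ++v)
        degree[v] = static_cast<int>(g[v].size());

    // Removing a local lowers the conflict count of its remaining neighbours.
    auto removeNode = [&](int v) {
        removed[v] = true;
        for (int u : g[v])
            if (!removed[u]) --degree[u];
    };

    for (std::size_t left = n; left > 0; --left) {
        int pick = -1;
        // Simplify: a local with fewer than numRegisters conflicts can always be coloured later.
        for (std::size_t v = 0; v < n && pick < 0; ++v)
            if (!removed[v] && degree[v] < numRegisters) pick = static_cast<int>(v);
        if (pick >= 0) {
            stack.push_back(pick);
        } else {
            // Spill: every remaining local has too many conflicts, so evict the worst one.
            for (std::size_t v = 0; v < n; ++v)
                if (!removed[v] && (pick < 0 || degree[v] > degree[pick])) pick = static_cast<int>(v);
            // reg[pick] stays -1, marking it as spilled.
        }
        removeNode(pick);
    }

    // Select: pop locals in reverse order and give each the lowest register
    // not already taken by a coloured neighbour; one is always free.
    while (!stack.empty()) {
        const int v = stack.back();
        stack.pop_back();
        std::vector<bool> used(numRegisters, false);
        for (int u : g[v])
            if (reg[u] >= 0) used[reg[u]] = true;
        for (int r = 0; r < numRegisters; ++r)
            if (!used[r]) { reg[v] = r; break; }
    }
    return reg;
}
```

For three mutually interfering locals and two registers, one local is spilled and the other two get distinct registers. Spilling the local with the most conflicts is only one common heuristic; production allocators usually also weigh spill cost, such as how often a local is used inside loops.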