Comments (8)

doe300 avatar doe300 commented on May 25, 2024

Looks like I forgot to implement the OpenCL built-in function convert_float_rte in the standard library.

I'm offline for the next couple of days, so the fix might take some time.

from vc4cl.

fangq avatar fangq commented on May 25, 2024

no problem. I look forward to testing this new function once it is added. thank you again for developing this wonderful library!

doe300 avatar doe300 commented on May 25, 2024

doe300/VC4CLStdLib@0cf109f adds these functions, can you re-check?

fangq avatar fangq commented on May 25, 2024

Thanks @doe300. I recompiled vc4clstdlib and vc4c, reinstalled both packages, and reran my benchmark (./run_benchmark1.sh -n 1e5 -J "-DUSE_LL5_RAND"). The previously observed optimizer error no longer shows up.

Unfortunately, this is still not enough to get my kernel to run on the RPI. The program stalled after printing the line

Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DMCX_SRC_PENCIL -DUSE_LL5_RAND

Here is the source code related to this message:

https://github.com/fangq/mcxcl/blob/master/src/mcx_host.cpp#L457-L458

It looks to me like the clBuildProgram() call kept running and never returned; otherwise, it would either print an error message or print the message "build program complete" on line #477.

In the output of the top command, I no longer see clang, and the mcxcl thread is taking 100% of the CPU. It has been running for over 50 minutes with no new messages printed. I am going to kill it.

Can you suggest how to debug this and find out what is keeping clBuildProgram from returning?

thanks again.

fangq avatar fangq commented on May 25, 2024

I left the code running overnight. This morning, I saw the error message below on the screen; it must have been terminated many hours into the compilation:

root@raspberrypi:/home/pi/temp/mcxcl/example/benchmark# ./run_benchmark1.sh -n 1e5 -J "-DUSE_LL5_RAND"
==============================================================================
=                       Monte Carlo eXtreme (MCX) -- OpenCL                  =
...
- code name: [Vanilla MCXCL] compiled with OpenCL version [1]
- compiled with: [RNG] Logistic-Lattice [Seed Length] 5
initializing streams ...	init complete : 1 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DMCX_SRC_PENCIL  -DUSE_LL5_RAND
completed clBuildProgram
completed clGetProgramBuildInfo
Kernel build log:
[W] Fri Mar  2 00:30:27 2018: Warnings in precompilation:
[W] Fri Mar  2 00:30:27 2018: <stdin>:1039:1: warning: null character ignored
<U+0000>
^
1 warning generated.

[W] Fri Mar  2 01:22:50 2018: Register conflict resolver has exceeded its maximum rounds, there might still be errors!
[E] Fri Mar  2 01:22:50 2018: Error assigning local to register: %energylaunched.1606
[E] Fri Mar  2 01:22:50 2018:  (1) /usr/local/lib/libVC4CC.so.1.2 : vc4c::CompilationError::CompilationError(vc4c::CompilationStep, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xac [0xb66ca484]
[E] Fri Mar  2 01:22:50 2018:  (2) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::GraphColoring::toRegisterMap() const+0x150 [0xb6778e18]
[E] Fri Mar  2 01:22:50 2018:  (3) /usr/local/lib/libVC4CC.so.1.2 : vc4c::qpu_asm::CodeGenerator::generateInstructions(vc4c::Method&)+0x464 [0xb6765f4c]
[E] Fri Mar  2 01:22:50 2018:  (4) /usr/local/lib/libVC4CC.so.1.2 : +0x2edee8 [0xb66cbee8]
[E] Fri Mar  2 01:22:50 2018:  (5) /usr/local/lib/libVC4CC.so.1.2 : +0x2ee0c8 [0xb66cc0c8]
[E] Fri Mar  2 01:22:50 2018:  (6) /usr/local/lib/libVC4CC.so.1.2 : +0x2ef7f8 [0xb66cd7f8]
[E] Fri Mar  2 01:22:50 2018:  (7) /usr/local/lib/libVC4CC.so.1.2 : std::function<void ()>::operator()() const+0x40 [0xb66d01b4]
[E] Fri Mar  2 01:22:50 2018:  (8) /usr/local/lib/libVC4CC.so.1.2 : threading::BackgroundWorker::operator()()::{lambda()#1}::operator()() const+0x58 [0xb66cf3d8]
[E] Fri Mar  2 01:22:50 2018:  (9) /usr/local/lib/libVC4CC.so.1.2 : void std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)+0x38 [0xb66d717c]
[E] Fri Mar  2 01:22:50 2018:  (10) /usr/local/lib/libVC4CC.so.1.2 : std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()>::operator()()+0x1c [0xb66d70e8]
[E] Fri Mar  2 01:22:50 2018:  (11) /usr/local/lib/libVC4CC.so.1.2 : std::thread::_State_impl<std::_Bind_simple<threading::BackgroundWorker::operator()()::{lambda()#1} ()> >::_M_run()+0x20 [0xb66d7054]
[E] Fri Mar  2 01:22:50 2018:  (12) /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 : +0x9c9dc [0xb6e9a9dc]
[E] Fri Mar  2 01:22:51 2018: Background worker threw error: Label/Register Mapping: There are erroneous register-associations!
[E] Fri Mar  2 01:22:53 2018: Compiler threw exception: Label/Register Mapping: There are erroneous register-associations!


MCX ERROR(2):Error: Failed to build program executable! in unit mcx_host.cpp:476
terminate called without an active exception
Aborted


doe300 avatar doe300 commented on May 25, 2024

My assumption:
The input you are trying to compile is very complex (e.g. has long kernels). Register allocation is the slowest part of VC4C, especially if it cannot find a suitable register mapping for all locals (as in your case). In that case, it applies some fixes and tries again a few times, which of course prolongs the execution time even more.
Depending on the structure of the failing kernel, VC4C may not be able to find a valid register-mapping at all, since it doesn't support register-spilling yet.
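
To illustrate why a missing spilling step can make allocation fail outright, here is a minimal sketch of greedy graph-coloring register assignment. It is not VC4C's actual allocator: the function names, the interference-graph format, and the `%tmp` local names are all hypothetical, chosen only to show the idea that locals which are live at the same time need distinct registers.

```python
# Hypothetical illustration of graph-coloring register assignment.
# This is NOT VC4C's implementation; names and data layout are assumptions.

def color_registers(interference, num_registers):
    """Give each local a register index so that interfering locals differ.

    interference: dict mapping a local to the set of locals live at the same time.
    Returns (assignment, unassigned), where unassigned lists locals that got no register.
    """
    assignment = {}
    unassigned = []
    # Handle the most constrained locals (most neighbours) first.
    for local in sorted(interference, key=lambda l: len(interference[l]), reverse=True):
        taken = {assignment[n] for n in interference[local] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[local] = free[0]
        else:
            # A spilling allocator would move this local to memory here.
            # Without spilling, the only options are retrying or failing.
            unassigned.append(local)
    return assignment, unassigned


def clique(names):
    """Build an interference graph in which every local conflicts with every other."""
    return {n: set(names) - {n} for n in names}


# Five locals that are all live at once need five distinct registers.
live_together = clique([f"%tmp{i}" for i in range(5)])
enough, failed_with_five = color_registers(live_together, num_registers=5)
too_few, failed_with_four = color_registers(live_together, num_registers=4)
```

With five registers every local is placed; with four, one local is left over, which corresponds to the "Error assigning local to register" situation when no spilling is available.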

fangq avatar fangq commented on May 25, 2024

@doe300, sorry for the delay in getting back to you.

MCXCL needs at least 53 registers, based on our results on NVIDIA/AMD/Intel CPUs/GPUs. If vc4c does not support spilling, this can be an issue.

What is the maximum number of registers per kernel supported by a VideoCore IV GPU? If there is no such restriction but there is a total size limit for a block, I can perhaps reduce the block size.

Also, is there a "wavefront/warp" size in VideoCore IV? On NVIDIA hardware, I typically set my block size to 32 or 64; on AMD GPUs, I typically use 64.

doe300 avatar doe300 commented on May 25, 2024

A single QPU has 64 + 3/4 registers available. Since kernels are not threaded, every kernel has the same number of available registers. However, not all registers can be used together in instructions, and additional registers may be reserved by variables not specified in the kernel source code.
The VideoCore IV has 12 processors (QPUs), which execute independently and limit the work-group size to 12 work-items, one per QPU. Each QPU has 16 SIMD elements, which execute the same code and can be leveraged by using vector types, e.g. by going from int to int16.

I think the best/only way to solve this is to implement register-spilling, since this problem will occur on any more complex kernel.
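
As a rough worked example of the numbers above (12 QPUs, 16 SIMD elements each), the following sketch estimates how many work-items and work-groups a launch would need. It assumes each work-item processes 16 elements through a 16-wide vector type; the helper name `plan_launch` and the element count of 100,000 are illustrative only.

```python
# Figures from the comment above: 12 QPUs, each with 16 SIMD elements.
NUM_QPUS = 12
SIMD_WIDTH = 16


def plan_launch(total_elements, vector_width=SIMD_WIDTH, work_group_size=NUM_QPUS):
    """Return (work_items, work_groups) for a launch.

    Assumes each work-item handles `vector_width` elements (e.g. via int16/float16)
    and that a work-group holds at most `work_group_size` work-items.
    """
    work_items = -(-total_elements // vector_width)    # ceiling division
    work_groups = -(-work_items // work_group_size)    # ceiling division
    return work_items, work_groups


vectorized = plan_launch(100_000)                  # 16 elements per work-item
scalar = plan_launch(100_000, vector_width=1)      # 1 element per work-item
```

Under these assumptions the vectorized launch needs 6250 work-items in 521 work-groups, while the scalar launch needs 100000 work-items in 8334 work-groups and leaves 15 of the 16 SIMD elements per QPU unused.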
