If the executing program has been compiled with -rdc=true</c

Have just found <a href="https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYP

I'm working with <a class="user-mention notranslate" data-hovercard-type="user" data-h

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Question: Linking RTC code with executing program about jitify HOT 4 CLOSED

nvidia commented on August 29, 2024

Question: Linking RTC code with executing program

from jitify.

Comments (4)

Robadob commented on August 29, 2024

Have just found CU_JIT_GLOBAL_SYMBOL enums, which looks like they might be a less dynamic way of achieving this.

I've modified jitify::experimental::Kernel so that I can pass in arguments nopts, opts, optsvals to the constructor of jtifiy::detail::CUDAKernel.

On passing the 3 CU_JIT_GLOBAL_SYMBOL options, the call to cuModuleLoadDataEx() within the constructor throws CUDA_ERROR_NOT_SUPPORTED (I've added checks to ensure it's not coming from a previous call).

This is not documented as a possible error from that method, so I'm not sure if their lack of support is undocumented or I've done something incorrectly.

I've tested this using Windows, Visual Studio 2015 & 2017, CUDA 10.2, Titan X Pascal (Building for sm_61).

My changes to the above example, and modified Jitify header can be found here, potentially of note I forked the header from the hotfix/more-msvc-issues branch.

from jitify.

benbarsdell commented on August 29, 2024

Unfortunately it is not possible for runtime code to gain access to symbols in the host program.

Jitify does support two related things that may be useful:

Linking external device libraries using -Lpath and -lname options. We could potentially extend this to support linking with ptx/cubin/object files as well if that would be useful.
Accessing __device__ and __constant__ variables declared inside the runtime code via kernel_instantiation.get_constant_ptr() (we will be extending this to support accessing __device__ variables as well soon).

I'm not very familiar with the CU_JIT_GLOBAL_SYMBOL flags, but it looks like they're used to make extern symbols in the runtime code use host addresses instead of looking for device symbols during linking. Is there a reason this is useful for your application vs. say using non-extern symbols in your runtime code and accessing them via get_constant_ptr?

from jitify.

Robadob commented on August 29, 2024

I'm working with @mondus.

We're developing a new framework for developing complex system simulations, whilst abstracting away most of the CUDA (FLAMEGPU2, to improve on usability over FLAMEGPU). He's also interested in developing a runtime compiled Python interface to the library, to make it more accessible, which is where this comes in. (I'm not sure if he told you this much the other day)

Accessing __device__ and __constant__ variables ...

Several common reused items/pointers are stored behind __device__ and __constant__ variables, hence automatic linking with the executing runtime would be convenient. There are several workarounds, such as reducing the number of symbols (by packing items inside a struct) and then manually setting the symbols inside the RTC units as each is compiled (using get_constant_ptr or similar, I hadn't noticed this, but it will require some changes to reduce the number of items we need to set).

One potential downside of this would be that duplicating symbols would inflate constant cache usage, as we're using a large portion of constant cache for a read-only hashtable central to accessing variables within models. For potentially large models with many RTC device functions, this could lead to capacity issues.

Edit: Given we'd be duplicating part/all of that hash table, this could actually fill up constant cache very fast. (The hash table is currently given half the constant cache, simply because it can't be scaled at runtime. We'd need to do some parsing of the RTC code to adapt the size per kernel presumably. This obviously requires some more though on our part.)

Edit2: I suppose we could actually not have the hash table in the main runtime (for a special Python/RTC build), and cat all of the RTC code so it builds as a single unit so we're not duplicating constant memory!

~~But that's a long way off at the stage, so I don't think it's a killer problem.~~

Linking external device libraries using -Lpath and -lname options. ...

At this point I'm not sure how viable it would be to load the entire library from PTX at runtime, as from my understanding that would necessitate require changing large parts of the codebase that interact with CUDA.

I'm assuming, if we built the library a second time to ptx/cubin, and linked against that, it wouldn't also link against the runtime. ~~So your second suggestion is likely the way to go.~~

Edit 3: I suppose the more technically adept approach would be to combined your suggestions, and build a users RTC kernel's separately and link them to achieve similar to edit 2 above.

Thanks for the advice.

from jitify.

Robadob commented on August 29, 2024

@benbarsdell
P.S. could you please add #include <cctype> to jitify.hpp, it doesn't build without this under Visual Studio 2015, due to isdigit and isalpha (removing std:: from them also works but that's probably a less portable solution).

Seems too minor of an issue for it's own ticket, (and the fact it's a deprecated IDE).

from jitify.

Question: Linking RTC code with executing program about jitify HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent