Some comments, questions and feedback about the SYCL USM proposal, very interesting wo


[SYCL] [SPEC] USM Feedback about llvm HOT 5 OPEN

intel commented on May 9, 2024

[SYCL] [SPEC] USM Feedback

from llvm.

Comments (5)

jbrodman commented on May 9, 2024

1) We'll have to think more on this. Device allocations especially are really meant to be accessible only on the device they're allocated on, which doesn't map perfectly to the context.
2) This is possible. However, I do think there is an advantage in having a C API for these allocation routines - it makes it easier to integrate into other C-based programming solutions. C++ wrappers could easily be built on top of a C API.
3) I really don't like that the spec doesn't have this guarantee, as I don't think it matches what a user would expect to happen. I would expect to see a spec issue opened about this (if there isn't one already...).

4.1) No - there is no good reason we can't just do this other than maybe consistency with other approaches.

4.2) Purely a shortcut for enqueueing a memcpy on an OoO (out-of-order) queue that doesn't have any dependences. This is not an uncommon case.

5) Consistency with Linux APIs and other programming models. I get where you're coming from, but I'm not sure it adds a lot of value for this particular method.
6) Yes, I've received similar feedback from other sources. At a minimum, it should also return which device the pointer was allocated against, which makes it easier to, say, launch a kernel on that device.

7.1) Appreciated.

7.2) If the device does not support shared allocations, this should probably be zero or an error. Willing to be sold on which option is better.

7.3) I'd only be comfortable with this if the context ONLY contained devices for which this is a true statement. Not immediately clear how to enforce this.

8) Yeah - you're not being unreasonable here. My only concern about going to the context approach is device allocations. I suppose it's probably ok as long as we are clear about the behavior if you do something naughty. The proposal doesn't really say much about this, but the intent is that there would be optional P2P capabilities that would enable devices to read other devices' memory, which would fit with the context idea.

9.1) This is probably fine.
9.2) Cool.

10) D'oh - artifact of an older draft. The template argument was dropped to avoid ever possibly seeing template<> sycl_malloc<...>(...). My eyes! My eyes!

11.1) Correct.
11.2) I think generic pointers (as an impl detail perhaps) are really the only way to go for this. This is one of the reasons all the USM stuff is opt-in and not positioned to be required by the spec. If you use it, you'd better support it.
11.3) I think this would be easier, but a more proper answer probably depends on a more formal extension mechanism. Probably worth defining a macro that the device compiler could set to specialize code.

12) Umm - I don't see why not. Hadn't thought about it tbh - happy to accept suggestions.


keryell commented on May 9, 2024

After thinking about it for a while: SYCL is based on modern C++, and I feel that a full extension written in plain old C in the first place does not fit that purpose well.
I would prefer a good modern C++ API that looks like SYCL, and then build a C, Fortran, Python, Cobol, APL... API on top of that, even if it is just reusing the C API.
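
A minimal sketch of that layering, under stated assumptions: the `example` namespace, `allocate`, `deallocate`, `example_malloc` and `example_free` are all hypothetical names, not part of the USM proposal, and the allocation is backed by host `std::malloc` so the code compiles without a SYCL toolchain. It only illustrates how a thin C-callable layer could sit on top of a typed C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace example {

// Hypothetical stand-in for a SYCL-style C++ allocation API. It uses host
// malloc so the sketch runs anywhere; a real version would allocate USM.
template <typename T>
T* allocate(std::size_t count) {
  void* p = std::malloc(count * sizeof(T));
  if (p == nullptr) throw std::bad_alloc{};
  return static_cast<T*>(p);
}

inline void deallocate(void* p) noexcept { std::free(p); }

}  // namespace example

// Thin C-callable layer built on the C++ API. Exceptions must not cross the
// C boundary, so a failed allocation is reported as a null pointer instead.
extern "C" void* example_malloc(std::size_t bytes) {
  try {
    return example::allocate<unsigned char>(bytes);
  } catch (...) {
    return nullptr;
  }
}

extern "C" void example_free(void* p) { example::deallocate(p); }
```

With this shape the C layer stays a few lines per entry point, and bindings for other languages can target the `extern "C"` symbols.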


jeffhammond commented on May 9, 2024

I agree with both parties who would like a proper C++ API and not the C-style symbol naming. If C++ users want C-style names, they can use (template) function aliasing.
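
One way such aliasing could look, as a sketch only: `example::malloc_host`, `example_malloc_float` and `example_malloc` are hypothetical names, and host `std::malloc` stands in for a real USM allocation. C++ has no direct syntax for aliasing a function template, so the usual options are a function pointer to one instantiation or a forwarding template under the C-style name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

namespace example {

// Hypothetical C++-style allocation function (host malloc as a stand-in).
template <typename T>
T* malloc_host(std::size_t count) {
  return static_cast<T*>(std::malloc(count * sizeof(T)));
}

}  // namespace example

// Alias for one specific instantiation: a constexpr function pointer.
constexpr auto example_malloc_float = &example::malloc_host<float>;

// "Alias" for the whole template: a forwarding function template that
// exposes the same functionality under a C-style name.
template <typename T, typename... Args>
decltype(auto) example_malloc(Args&&... args) {
  return example::malloc_host<T>(std::forward<Args>(args)...);
}
```

A caller could then write `example_malloc<int>(std::size_t{2})` or `example_malloc_float(4)` and release the memory with `std::free`.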


bd4 commented on May 9, 2024

As a new user of SYCL+USM, for a mixed Fortran/CUDA/C++ fusion code (genecode.org) and a C++ multi-dimensional array library (https://github.com/wdmapp/gtensor), I'm fine with a C++ API. I likely have to maintain a small C layer anyway to gracefully handle CUDA/HIP/SYCL from Fortran parts of the code. Compared to other challenges, it's just a trivial bit of extra code. The most important thing is having the functionality available. Seems like the other initial users of USM (Kokkos and RAJA?) are already C++, so not an issue for them either.

One issue I am still working through for gtensor (the multi-d array lib) is how to handle the queue object. CUDA is a stateful library, in that it keeps track of a default context and you can set the device, and all the functions use it without having to pass it around. SYCL, on the other hand, requires you to keep and pass the queue object, particularly in the USM malloc/free calls. gtensor is a header-only library, so I'm not sure yet how to handle the queue. I think there are clever ways to handle it, and I don't think it's a blocker for us, but it may be something to consider if many other users are running up against it. Perhaps in the form of an example, rather than actually changing any spec.


keryell commented on May 9, 2024

@bd4 maintaining state is difficult at scale. This is why SYCL, Vulkan, OpenCL... try to avoid it nowadays, since a high number of CPU cores and accelerators is the norm.
That said, I can hear you. If you want to contribute a nice SYCL wrapper extension providing some thread_local state because it simplifies programming in some cases, I am pretty sure the SYCL committee will look at it. :-)
The same goes if you have some great ideas about a wrapper layer in C, even if it is not clear how to deal with the single-source aspects that are the strength of SYCL...
In the meantime, you can have some thread_local global variables in your header-only library. C++17 inline static initialization helps a lot for header-only libraries.
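
A rough sketch of that idea, with assumptions spelled out: `example::queue` is a hypothetical stand-in for `sycl::queue` so the code compiles without a SYCL toolchain, and `current_queue`, `get_queue` and `set_device` are illustrative names only. The point is the C++17 `inline thread_local` variable, which can be defined in a header included from many translation units.

```cpp
#include <cassert>

namespace example {

// Hypothetical stand-in for sycl::queue; a real library would store the
// actual queue object here.
struct queue {
  int device_id = 0;
};

// C++17 inline variable: it can be defined in a header that is included
// from many translation units without violating the one-definition rule.
// thread_local gives every thread its own independent copy.
inline thread_local queue current_queue{};

// CUDA-like convenience accessors built on the per-thread state.
inline queue& get_queue() { return current_queue; }
inline void set_device(int id) { current_queue = queue{id}; }

}  // namespace example
```

Library functions could then call `example::get_queue()` internally instead of taking a queue parameter, while each thread still selects its own device with `example::set_device`.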
