I am trying to use Ray ( https://github.co

fork safety issue about sppark HOT 7 CLOSED

supranational commented on June 12, 2024

fork safety issue

from sppark.

Comments (7)

weikengchen commented on June 12, 2024

Actually, I'm starting to reevaluate my assumptions. Is it true that initializing CUDA in the parent basically prevents any child from running CUDA? If so, fork safety is not possible and should not be expected: the child should not work.

dot-asm commented on June 12, 2024

Just in case, it's not like multiple processes can't use the same GPU at the same time. One naturally has to recognize that they will compete for the same resources; most notably, they can't oversubscribe the memory. So if your main application uses a lot of GPU memory, it will effectively make you think of the GPU as an exclusive resource accessible from a single process.

I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use. I would guess that you can always fork prior to the first CUDA call. Otherwise it might be possible to perform some "retention" in the child process, but you won't be able to keep using all the CUDA run-time internal structures after the fork. I'd be surprised if you can use the same device pointers...

With all this in mind, it probably shouldn't come as a surprise that sppark is not fork-safe; we don't even think about CUDA programming in such terms. Moreover, internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization], but multiple threads can use multiple GPUs at the same time. This is also an outcome of the "think of the GPU as an exclusive resource" mentality: you tend to parallelize across GPUs, not across workers...
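To make the "fork prior to the first CUDA call" rule visible in an application, one option is to remember which process initialized the GPU and refuse to use that state from any other process. The following is only a sketch in Python (picked because the original question involved Ray). `GpuOwnerGuard`, `mark_initialized` and `check` are hypothetical names, and nothing here calls CUDA or sppark; the application would have to place the calls itself.

```python
import os


class ForkedAfterInitError(RuntimeError):
    """Raised when GPU state created in one process is used in another."""


class GpuOwnerGuard:
    """Hypothetical helper: remembers which process initialized the GPU."""

    def __init__(self):
        self._owner_pid = None  # no process has initialized the GPU yet

    def mark_initialized(self):
        # Assumption: the application calls this right where it makes
        # its first GPU call.
        self._owner_pid = os.getpid()

    def check(self):
        # Assumption: the application calls this before each later GPU use.
        if self._owner_pid is None:
            raise RuntimeError("GPU used before initialization")
        if self._owner_pid != os.getpid():
            # A forked child inherits _owner_pid but has a different pid.
            raise ForkedAfterInitError(
                f"GPU state belongs to pid {self._owner_pid}, "
                f"but is being used from pid {os.getpid()}"
            )
```

A guard like this does not make anything fork-safe. It only turns a confusing failure in the child into an explicit error that names both processes.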

dot-asm commented on June 12, 2024

> I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use.

And "virtually impractical" might in fact mean "practically impossible," because with all the layers of abstraction that C++ and Rust pile on top of CUDA, you have virtually no control over when the first CUDA call is made...

dot-asm commented on June 12, 2024

> internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization]

In other words, serialization of the operations from multiple threads is considered the application's responsibility. Yes, if you have a lot of smaller operations, ones that don't utilize the whole GPU, it would be an inefficient way to use the GPU. Yes, it's a limitation, and one can make a case for removing it. Just in case, sppark development is effectively driven by case studies, and there has been no compelling case for this so far...
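The "external synchronization" mentioned above can be as simple as one lock per GPU, held for the duration of each operation. A minimal sketch, again in Python and not part of sppark: `GpuSerializer` and its `run` method are hypothetical, and `operation` stands in for whatever GPU call the application makes.

```python
import threading
from collections import defaultdict


class GpuSerializer:
    """Hypothetical helper: at most one operation per GPU at a time."""

    def __init__(self):
        # One lock per device id, created on first use.
        self._locks = defaultdict(threading.Lock)
        # Protects the dictionary itself while a lock is looked up.
        self._registry_lock = threading.Lock()

    def run(self, device_id, operation, *args, **kwargs):
        with self._registry_lock:
            lock = self._locks[device_id]
        # Threads targeting the same device wait here; threads targeting
        # different devices hold different locks and proceed in parallel.
        with lock:
            return operation(*args, **kwargs)
```

Calls such as `serializer.run(0, some_gpu_call, data)` from several threads would then execute one after another on device 0, while calls for device 1 are not blocked by them. This matches the "parallelize across GPUs, not across workers" approach described in the thread.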

dot-asm commented on June 12, 2024

> I don't know if CUDA run-time itself is fork-safe

Another reason for not even thinking about it is platform neutrality. More specifically, there is no fork on Windows (or, should we say, in the Win32 system interface). Even if you personally don't care about Windows, you have to recognize that CUDA itself is supported on Windows, which affects the decisions Nvidia makes in regard to CUDA run-time design. On a related note, one can compile sppark on Windows, and PoCs are known to work.
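On the Python side, the platform-neutral counterpart to fork is the `spawn` start method of the standard `multiprocessing` module. It is available on Windows, macOS and Linux, and it starts each worker as a fresh interpreter instead of a copy of the parent. A sketch, assuming the workers are plain `multiprocessing` processes (how Ray itself starts its workers is not covered here); `gpu_safe_context` is a hypothetical name.

```python
import multiprocessing as mp


def gpu_safe_context():
    """Return a multiprocessing context that never forks the parent.

    With "spawn" the child begins as a new interpreter, so it does not
    inherit GPU run-time state that the parent may have created.
    """
    return mp.get_context("spawn")
```

Workers would then be created with `gpu_safe_context().Process(target=worker)`, where `worker` is a module-level function, and the launching code has to sit under an `if __name__ == "__main__":` guard because each spawned child re-imports the main module.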

weikengchen commented on June 12, 2024

Yes, the problem I am facing right now is that I couldn't figure out where the first CUDA call is made...

Currently I am doing std::cout debugging, just to find out why.
My current situation is that even though the parent does not seem to touch CUDA, the child fails very quickly...

I will close this issue after I figure out what happens, as a record for people in the future...

weikengchen commented on June 12, 2024

Going to close this issue. Still haven't figured out the exact issue.
