
Comments (7)

weikengchen avatar weikengchen commented on June 12, 2024

Actually, I'm starting to reevaluate my assumptions. Is it true that initializing CUDA in the parent basically prevents any child from running CUDA? If so, fork safety is not possible and should not be expected: the child should not work.

from sppark.

dot-asm avatar dot-asm commented on June 12, 2024

Just in case, it's not like multiple processes can't use the same GPU at the same time. One naturally has to recognize that they will compete for the same resources; most notably, they can't oversubscribe the memory. So if your main application uses a lot of GPU memory, it will effectively make you think of the GPU as an exclusive resource accessible from a single process.

I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use. I would guess that you can always fork prior to the very first CUDA call. If you fork later, it might be possible to perform some "retention" in the child process, but you won't be able to keep using all the CUDA run-time internal structures after the fork. I'd be surprised if you can use the same device pointers...

With all this in mind, it probably shouldn't come as a surprise that sppark is not fork-safe; we don't even think about CUDA programming in such terms. Moreover, internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization], but multiple threads can use multiple GPUs at the same time. This is also an outcome of the "think of the GPU as an exclusive resource" mentality: you tend to parallelize across GPUs, not across workers...
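To make the "fork prior to the very first CUDA call" guess concrete, here is a minimal POSIX sketch. It is an illustration only: `gpu_job` and `run_in_child` are hypothetical names, `gpu_job` is a placeholder for whatever code would actually touch the GPU, and nothing here is taken from sppark or confirmed against the CUDA run-time.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the real GPU work. In a real program this is
// where the very first CUDA call of the process would happen.
static int gpu_job() {
    return 0;  // 0 = success
}

// Run `job` in a forked child and return its exit code, or -1 on failure.
// Assumption: the parent has made no CUDA call before this point.
int run_in_child(int (*job)()) {
    assert(job != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed
    if (pid == 0) {
        // Child: any CUDA initialization would happen here, after the fork.
        _exit(job());  // _exit avoids running the parent's atexit handlers
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would use `run_in_child(gpu_job)` and treat a nonzero result as a failed job. The parent stays CUDA-free only as long as nothing it links against initializes CUDA behind its back.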


dot-asm avatar dot-asm commented on June 12, 2024

> I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use.

And "virtually impractical" might in fact mean "practically impossible," because with all the layers of abstraction that C++ and Rust pile on top of CUDA, you have virtually no control over when the very first CUDA call is made...
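One way this loss of control can show up in C++ is through objects with static storage duration, whose constructors run before `main()`. The sketch below only illustrates that mechanism. `FakeGpuContext`, `event_log` and `app_start` are hypothetical names, no real CUDA call is involved, and this is not a claim about how sppark or any particular library behaves.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which things happen.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;  // function-local static: safe to use from other initializers
    return log;
}

// Hypothetical stand-in for a library handle whose constructor touches the GPU.
struct FakeGpuContext {
    FakeGpuContext() { event_log().push_back("gpu-init"); }
};

// A namespace-scope object like this is constructed before main() starts,
// so its "CUDA call" would precede any fork() the application performs in main().
static FakeGpuContext g_ctx;

// What the application believes is its first action.
void app_start() { event_log().push_back("app-start"); }
```

Calling `app_start()` at the top of `main()` and then inspecting `event_log()` shows the "gpu-init" entry recorded ahead of "app-start", even though the application code never asked for it.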


dot-asm avatar dot-asm commented on June 12, 2024

> internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization]

In other words, serializing the operations from multiple threads is considered the application's responsibility. Yes, if you have a lot of smaller operations, ones that don't utilize the whole GPU, it would be an inefficient way to use the GPU. Yes, it's a limitation and one can make a case for removing it. Just in case, sppark development is effectively driven by case studies, and there has been no compelling case for this so far...
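As a rough sketch of what such application-side serialization could look like, the snippet below keeps one mutex per GPU, so threads targeting the same device take turns while threads on different devices proceed independently. `GpuLocks`, `with_gpu` and `run_demo` are hypothetical names, and the counter increment merely stands in for a GPU operation that is not MT-safe; none of this is sppark API.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// One mutex per GPU: operations on the same device are serialized,
// operations on different devices do not block each other.
class GpuLocks {
public:
    explicit GpuLocks(std::size_t gpu_count) : locks_(gpu_count) {}

    // Run `op` while holding the lock for device `gpu_id`.
    template <typename Op>
    void with_gpu(std::size_t gpu_id, Op op) {
        std::lock_guard<std::mutex> guard(locks_.at(gpu_id));
        op();
    }

private:
    std::vector<std::mutex> locks_;
};

// Demo: `threads` workers each perform `ops_per_thread` operations on GPU 0.
// The plain counter stands in for per-GPU state that is not MT-safe.
long run_demo(int threads, int ops_per_thread) {
    assert(threads > 0 && ops_per_thread >= 0);
    GpuLocks locks(1);
    long counter = 0;  // only modified while holding the GPU 0 lock
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < ops_per_thread; ++i)
                locks.with_gpu(0, [&] { ++counter; });
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

The design choice here is coarse on purpose: holding the lock for a whole operation matches the "GPU as an exclusive resource" view, at the cost of leaving the device underused when operations are small.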


dot-asm avatar dot-asm commented on June 12, 2024

> I don't know if CUDA run-time itself is fork-safe

Another reason for not even thinking about it is platform neutrality. More specifically, there is no fork on Windows (or, should we say, in the Win32 system interface). Even if you personally don't care about Windows, you have to recognize that CUDA itself is supported on Windows, which affects the decisions Nvidia makes in regard to CUDA run-time design. On a related note, one can compile sppark on Windows, and PoCs are known to work.


weikengchen avatar weikengchen commented on June 12, 2024

Yes, the problem I am facing right now is that I can't figure out where the very first CUDA call is made...

Currently I am doing std::cout debugging, just to find out why.
My current situation is that even though the parent does not seem to touch CUDA, the child fails very quickly...

I will close this issue once I figure out what happens, so there is a record for people in the future...


weikengchen avatar weikengchen commented on June 12, 2024

Going to close this issue. Still haven't figured out the exact issue.

