Comments (7)
Actually, I am starting to reevaluate my belief system. Is it true that initializing CUDA in the parent basically prevents any child from running CUDA? If so, fork safety is not possible and should not be expected: the child should not work.
from sppark.
Just in case, it's not like multiple processes can't use the same GPU at the same time. One naturally has to recognize that they will compete for the same resources; most notably, they can't oversubscribe the memory. So if your main application uses a lot of GPU memory, it will effectively make you think of the GPU as an exclusive resource accessible from a single process. I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use. I mean I would guess that you can always fork prior to the very first CUDA call; otherwise it might be possible to perform some "retention" in the child process, but you won't be able to keep using all the CUDA run-time internal structures after the fork. I'd be surprised if you can use the same device pointers...

With all this in mind it probably shouldn't come as a surprise that sppark is not fork-safe; we don't even think about CUDA programming in such terms. Moreover, internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization], but multiple threads can use multiple GPUs at the same time. This is also an outcome of the "think of the GPU as an exclusive resource" mentality. I mean you tend to parallelize across GPUs, not across workers...
I don't know if CUDA run-time itself is fork-safe, but if it is, there surely are some limitations. I wouldn't be surprised if the limitations are so stringent that it would be virtually impractical to use.
And "virtually impractical" might in fact mean "practically impossible," because with all the layers of abstraction that C++ and Rust pile on top of CUDA, you have virtually no control over when the very first CUDA call is made...
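To make the "fork prior to the very first CUDA call" guess concrete, here is a minimal POSIX sketch of that process layout. It is not sppark code and makes no claim about what the CUDA run-time guarantees: `gpu_work` is a hypothetical stand-in for whatever GPU-touching code the application runs, and `run_in_child` is an illustrative helper name. The only point is that the parent does nothing GPU-related before `fork()`, and that it reports how the child terminated.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the application's GPU work. In a real program
// this is where the very first CUDA call would be made, in the child only.
int gpu_work() {
    return 0;  // 0 means success in this sketch
}

// Fork, run `work` in the child, and hand the child's exit code back to the
// parent. Returns -1 if fork/waitpid fails or the child dies from a signal.
// Assumption: the parent has not touched CUDA before calling this.
int run_in_child(int (*work)()) {
    assert(work != nullptr);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork itself failed
    }
    if (pid == 0) {
        // Child process: all GPU-related initialization happens here.
        _exit(work());
    }
    // Parent process: wait for the child and decode how it terminated.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_in_child(gpu_work)` from `main` before anything else keeps the parent free of GPU state. A return value of -1 here usually means the child was killed by a signal, which is worth distinguishing from an ordinary error code when the child "fails very quickly".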
internally sppark is not even MT-safe in respect to operations on the same GPU. I mean multiple threads can't use the same GPU at the same time [without external synchronization]
In other words, serialization of the operations from multiple threads is considered the application's responsibility. Yes, if you have a lot of smaller operations, ones that don't utilize the whole GPU, it would be an inefficient way to use the GPU. Yes, it's a limitation, and one can make a case for removing it. Just in case, sppark development is effectively driven by case studies, and there has been no compelling case for this so far...
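One way an application might take on that responsibility is a mutex per GPU, so that threads targeting the same device are serialized while threads targeting different devices still run in parallel. This is only a sketch under that assumption: `GpuLocks` and `with_gpu` are hypothetical names, not part of sppark, and the callable passed in stands for whatever GPU operation the application invokes.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper: one mutex per GPU index. Holding the mutex for a GPU
// means no other thread in this process is running an operation on it.
class GpuLocks {
public:
    explicit GpuLocks(std::size_t gpu_count) : locks_(gpu_count) {}

    // Run `fn` while holding the lock for `gpu_id` and return its result.
    // Operations on different GPUs use different mutexes, so they do not
    // block each other.
    template <typename Fn>
    auto with_gpu(std::size_t gpu_id, Fn&& fn) -> decltype(fn()) {
        assert(gpu_id < locks_.size());
        std::lock_guard<std::mutex> guard(locks_[gpu_id]);
        return fn();
    }

private:
    std::vector<std::mutex> locks_;
};
```

A worker thread would then wrap each GPU operation, for example `locks.with_gpu(0, [&] { return do_op(); })`, where `do_op` is a placeholder for the actual call. As noted above, this wastes capacity when the individual operations are too small to fill the GPU.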
I don't know if CUDA run-time itself is fork-safe
Another reason for not even thinking about it is platform neutrality. More specifically, there is no fork on Windows (or should we say in the Win32 system interface). Even if you personally don't care about Windows, you have to recognize that CUDA itself is supported on Windows, which affects the decisions Nvidia makes with regard to CUDA run-time design. On a related note, one can compile sppark on Windows, and PoCs are known to work.
Yes, the problem I am facing right now is that I can't figure out where the very first CUDA call is made...
Currently I am doing std::cout debugging... just to find out why.
My current situation is that even though the parent does not seem to touch CUDA, the child fails very quickly...
I will close this issue once I figure out what happens, just for people in the future...
Going to close this issue. Still haven't figured out the exact issue.