Comments (6)
It seems that cuFileDriverOpen() contains logic that assumes specific
device mounts, and it crashes when that assumption does not hold.
The same happens on WSL2.
I shared the information with the GDS team, and it is a bug. I filed a bug to address the issue.
- https://nvidia.slack.com/archives/CJ5FK152R/p1658945158190859
- https://nvidia.slack.com/archives/CJ5FK152R/p1658945626109219?thread_ts=1658945272.452739&cid=CJ5FK152R
- https://nvidia.slack.com/archives/CJ5FK152R/p1658946372829689?thread_ts=1658945769.993249&cid=CJ5FK152R
from kvikio.
Out of curiosity, why would it crash on a cloud VM?
from kvikio.
I don't know. @gigony, do you know?
from kvikio.
xref: rapidsai/cucim#346
from kvikio.
Hello @madsbk @gigony,
Has this issue been resolved?
I'm using CUDA 11.7 and still hit the error when installing GDS on a VM:
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Unsupported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
=========
GPU INFO:
=========
GPU index 0 Tesla V100-PCIE-16GB bar:1 bar size (MiB):16384 supports GDS
==============
PLATFORM INFO:
==============
Assertion failure, file index :cufio-udev line :134
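For anyone comparing reports across machines, here is a rough sketch of how a report like the one above could be summarized in Python. The function names `summarize_gds_report` and `no_driver_supported` are made up for illustration and are not part of any NVIDIA or KvikIO tooling. The parsing assumes the layout pasted above (section titles in capitals ending with a colon, `key : value` lines) and may not hold for other versions of the report.

```python
import re

# Section titles in the pasted report look like "DRIVER CONFIGURATION:".
SECTION_RE = re.compile(r"[A-Z ]+:")


def summarize_gds_report(text):
    """Collect driver states, cuFile properties, and any assertion line.

    Hypothetical helper; it assumes the report layout shown in this thread.
    """
    summary = {"drivers": {}, "rdma_details": {}, "properties": {}, "assertion": None}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "=====" rulers around section titles.
        if not line or set(line) == {"="}:
            continue
        if SECTION_RE.fullmatch(line):
            section = line[:-1]
            continue
        if line.startswith("Assertion failure"):
            summary["assertion"] = line
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if section == "DRIVER CONFIGURATION":
            # Lines starting with "--" are sub-entries of the RDMA driver row.
            if key.startswith("--"):
                summary["rdma_details"][key.lstrip("-")] = value
            else:
                summary["drivers"][key] = value
        elif section == "CUFILE CONFIGURATION":
            summary["properties"][key] = value
    return summary


def no_driver_supported(summary):
    """True when at least one driver row exists and all are 'Unsupported'."""
    drivers = summary["drivers"]
    return bool(drivers) and all(v == "Unsupported" for v in drivers.values())
```

On the report in this comment, such a summary would show every driver row as Unsupported, `properties.use_compat_mode` set to `true`, and the assertion line from `cufio-udev`, which is the combination worth mentioning in a bug report.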
from kvikio.
AFAICT, KvikIO should now detect this case.
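As a rough illustration of what such a detection could look like from Python, here is a minimal sketch. It is an assumption based on this thread (the crash on WSL2 and the assertion in `cufio-udev`), not KvikIO's actual implementation. The function names and the choice of `/run/udev` as the path to probe are hypothetical.

```python
import os
import platform
from typing import Optional


def running_in_wsl(release: Optional[str] = None) -> bool:
    """Guess whether we are on WSL by looking at the kernel release string.

    Assumption: WSL kernels carry "microsoft" in their release name.
    """
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def udev_dir_readable(path: str = "/run/udev") -> bool:
    """Check that the udev runtime directory exists and is readable.

    Assumption: a missing or unreadable directory (common in containers
    and some VMs) is what trips the udev-related assertion.
    """
    return os.path.isdir(path) and os.access(path, os.R_OK)


def should_skip_cufile(release: Optional[str] = None, udev_path: str = "/run/udev") -> bool:
    """Return True when opening the cuFile driver looks risky."""
    return running_in_wsl(release) or not udev_dir_readable(udev_path)
```

A caller could check `should_skip_cufile()` before touching cuFile and fall back to plain POSIX I/O when it returns True, so the environment check happens before any call that might abort the process.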
from kvikio.