Comments (7)
Hi @blue-cat-whale,
Can you run the following code in a Python interpreter and paste the output?
from ctypes import c_int, byref
from numba import cuda
dv = c_int(0)
cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
drv_major = dv.value // 1000
drv_minor = (dv.value - (drv_major * 1000)) // 10
run_major, run_minor = cuda.runtime.get_version()
print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
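The arithmetic above relies on the CUDA driver reporting its version as a single integer, 1000 * major + 10 * minor. As a minimal sketch, the decoding step can be checked on its own without a GPU; `decode_cuda_version` is a hypothetical helper name used only for illustration.

```python
def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded - major * 1000) // 10
    return major, minor


print(decode_cuda_version(12000))  # (12, 0)
print(decode_cuda_version(12010))  # (12, 1)
```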
from cudf.
(cudf) [root@localhost nn]# python3
Python 3.11.5 (main, Sep 22 2023, 15:34:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import c_int, byref
>>> from numba import cuda
>>> dv = c_int(0)
>>> cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
>>> drv_major = dv.value // 1000
>>> drv_minor = (dv.value - (drv_major * 1000)) // 10
>>> run_major, run_minor = cuda.runtime.get_version()
>>> print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
12 0 12 1
>>>
Thanks, this is helpful. Something strange is happening while numba is attempting to check the CUDA versions on your system. I expected the command above to produce more useful output, so it looks like we need to dig a bit deeper to reproduce the issue.
Would you be able to run this small Python script in the failing environment and paste the output?
import sys
import subprocess
NUMBA_CHECK_VERSION_CMD = """\
from ctypes import c_int, byref
from numba import cuda
dv = c_int(0)
cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
drv_major = dv.value // 1000
drv_minor = (dv.value - (drv_major * 1000)) // 10
run_major, run_minor = cuda.runtime.get_version()
print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
"""
cp = subprocess.run(
    [sys.executable, "-c", NUMBA_CHECK_VERSION_CMD], capture_output=True
)
print(cp.stdout)
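The script above prints only `cp.stdout`. With `capture_output=True`, `subprocess.run` also captures stderr and records the return code, so a variant that prints all three may help show where any extra output comes from. This is a sketch only: the child command here is a stand-in (an assumption for illustration), not the numba check itself.

```python
import subprocess
import sys

# Stand-in child program (an assumption for illustration): writes one line to
# stdout and one to stderr, the way a version check plus library logging might.
CHILD_CMD = "import sys; print('12 0 12 1'); print('some log line', file=sys.stderr)"

cp = subprocess.run(
    [sys.executable, "-c", CHILD_CMD], capture_output=True, text=True
)
print("returncode:", cp.returncode)
print("stdout:", repr(cp.stdout))
print("stderr:", repr(cp.stderr))
```

Seeing the two streams separately shows which one any additional log lines land on.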
(cudf) [root@localhost test_cuda]# python tmp_0618.py
b'12 0 12 1\n2024-06-19 10:29:28 [DEBUG] After get lib arch_name=cuda lib_name=liborion_client_common.so, version=0, file_path=/root/.orion/lib/cuda4002000/liborion_client_common.so, ret=0.\n2024-06-19 10:29:28 [DEBUG] After get lib arch_name=cuda lib_name=cuda, version=0, file_path=/root/.orion/lib/cuda4002000/libcuda.so, ret=0.\n2024-06-19 10:29:28 [DEBUG] Using group resource 7cbca860-2ac2-4407-9c32-e4503043d0e5\n2024-06-19 10:29:28 [DEBUG] System initialization begin.\n2024-06-19 10:29:28 [DEBUG] Getting Orion resource ...\n2024-06-19 10:29:28 [DEBUG] Checking Unix socket at /var/tmp/orion/comm/orion.sock\n2024-06-19 10:29:28 [DEBUG] Requesting resource through /var/tmp/orion/comm/orion.sock\n2024-06-19 10:29:28 [INFO] Using Orion resource (7cbca860-2ac2-4407-9c32-e4503043d0e5) b103dad8-2472-42e9-bf13-0c95a1aa5b49 : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:b103dad8-2472-42e9-bf13-0c95a1aa5b49\n2024-06-19 10:29:28 [DEBUG] Architecture 66 initialization begin.\n2024-06-19 10:29:28 [INFO] \x1b[33mClient get resource list : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:b103dad8-2472-42e9-bf13-0c95a1aa5b49\x1b[0m\n2024-06-19 10:29:28 [INFO] \x1b[33mRPC mode. Because env ORION_ENABLE_LPC is not 1.\x1b[0m\n2024-06-19 10:29:28 [DEBUG] Skip orionrun initialization.\n2024-06-19 10:29:28 [DEBUG] System initialization is done.\n2024-06-19 10:29:28 [INFO] Releasing Orion resource ...\n'
Thanks @blue-cat-whale. I'm not sure what the issue is yet, and I'll need a little time to dig into this. Until I have a better answer, can you try setting the following three environment variables as a workaround and let me know whether you're able to import cudf.pandas afterwards?
export PTXCOMPILER_CHECK_NUMBA_CODEGEN_PATCH_NEEDED=0
export PTXCOMPILER_KNOWN_DRIVER_VERSION=12.0
export PTXCOMPILER_KNOWN_RUNTIME_VERSION=12.1
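If exporting variables in the shell is inconvenient, the same three values can be passed to a child Python process instead. The sketch below assumes nothing beyond the variable names and values given above. The child command is a stand-in that only echoes the variables back, and `WORKAROUND_ENV` and `CHILD_CMD` are illustrative names.

```python
import os
import subprocess
import sys

# The three variables and values suggested above.
WORKAROUND_ENV = {
    "PTXCOMPILER_CHECK_NUMBA_CODEGEN_PATCH_NEEDED": "0",
    "PTXCOMPILER_KNOWN_DRIVER_VERSION": "12.0",
    "PTXCOMPILER_KNOWN_RUNTIME_VERSION": "12.1",
}

# Stand-in child command (an assumption): it only echoes the variables back.
# In the failing environment, replace it with "import cudf.pandas".
CHILD_CMD = (
    "import os\n"
    "for name in sorted(os.environ):\n"
    "    if name.startswith('PTXCOMPILER_'):\n"
    "        print(name, os.environ[name])\n"
)

# Merge the workaround values over the current environment for the child only.
env = {**os.environ, **WORKAROUND_ENV}
cp = subprocess.run(
    [sys.executable, "-c", CHILD_CMD], env=env, capture_output=True, text=True
)
print(cp.stdout)
```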
(cudf) [wangyu@localhost test_cuda]$ python tmp_0618.py
b'12 0 12 1\n2024-06-19 10:31:35 [DEBUG] After get lib arch_name=cuda lib_name=liborion_client_common.so, version=0, file_path=/home/wangyu/.orion/lib/cuda4002000/liborion_client_common.so, ret=0.\n2024-06-19 10:31:35 [DEBUG] After get lib arch_name=cuda lib_name=cuda, version=0, file_path=/home/wangyu/.orion/lib/cuda4002000/libcuda.so, ret=0.\n2024-06-19 10:31:35 [DEBUG] Using group resource fb5f4dab-87de-4105-b01d-fd1d10a9cc69\n2024-06-19 10:31:35 [DEBUG] System initialization begin.\n2024-06-19 10:31:35 [DEBUG] Getting Orion resource ...\n2024-06-19 10:31:35 [DEBUG] Checking Unix socket at /var/tmp/orion/comm/orion.sock\n2024-06-19 10:31:35 [DEBUG] Requesting resource through /var/tmp/orion/comm/orion.sock\n2024-06-19 10:31:35 [INFO] Using Orion resource (fb5f4dab-87de-4105-b01d-fd1d10a9cc69) 8040c557-aae9-428b-a221-9ed3145a909a : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:8040c557-aae9-428b-a221-9ed3145a909a\n2024-06-19 10:31:35 [DEBUG] Architecture 66 initialization begin.\n2024-06-19 10:31:35 [INFO] \x1b[33mClient get resource list : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:8040c557-aae9-428b-a221-9ed3145a909a\x1b[0m\n2024-06-19 10:31:35 [INFO] \x1b[33mRPC mode. Because env ORION_ENABLE_LPC is not 1.\x1b[0m\n2024-06-19 10:31:35 [DEBUG] Skip orionrun initialization.\n2024-06-19 10:31:35 [DEBUG] System initialization is done.\n2024-06-19 10:31:35 [INFO] Releasing Orion resource ...\n'
P.S. The output in my previous post has also been updated; I made a mistake in the original.
OK, this makes sense. I think it's fair to treat this as a bug: the way cuDF parses the CUDA versions doesn't account for the possibility of additional stdout and stderr output, which could be trimmed. I'll put in a PR for this.
As a temporary workaround, I believe the three environment variables above should allow cuDF to import successfully.
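To illustrate the kind of trimming described above, here is a minimal sketch that pulls the version line out of noisy output. It is a hypothetical helper written for this thread, not cuDF's actual code or the eventual fix. `parse_versions` and `VERSION_LINE` are illustrative names, and the sample input is a shortened form of the output posted earlier.

```python
import re

# Hypothetical helper, not cuDF's actual code: find the first line that
# consists of exactly four integers and ignore everything else.
VERSION_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_versions(stdout: bytes) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((drv_major, drv_minor), (run_major, run_minor)) from noisy output."""
    for line in stdout.decode(errors="replace").splitlines():
        match = VERSION_LINE.match(line)
        if match:
            drv_major, drv_minor, run_major, run_minor = map(int, match.groups())
            return (drv_major, drv_minor), (run_major, run_minor)
    raise ValueError("no version line found in output")


noisy = b"12 0 12 1\n2024-06-19 10:29:28 [DEBUG] System initialization begin.\n"
print(parse_versions(noisy))  # ((12, 0), (12, 1))
```

Matching on the shape of the line, not its position, means the sketch would also cope with log lines printed before the versions.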