Comments (8)
It seems the installation of nvtop is the problem here:
gpu-jupyter/src/Dockerfile.gpulibs
Lines 39 to 42 in 69a81e3
It installs the dependencies libnvidia-compute-418 libnvidia-compute-430 libnvidia-compute-530
which might be incompatible.
After removing nvtop it works, but nvcc no longer works. I will figure out a solution.
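One way to see which driver libraries a package such as nvtop would pull in is to let apt simulate the installation and filter the result. This is a minimal sketch, not part of the gpu-jupyter repository: the function name `nvidia_packages_to_install` is made up for illustration, and it assumes the `Inst <package> (...)` lines that `apt-get --simulate install` prints.

```shell
#!/bin/sh
# Hypothetical helper (not from gpu-jupyter): reads the output of
# `apt-get --simulate install <pkg>` on stdin and prints the names of the
# libnvidia-* packages that apt would install alongside it.
nvidia_packages_to_install() {
  # Simulated installs are reported as lines of the form "Inst <name> (...)".
  awk '$1 == "Inst" && $2 ~ /^libnvidia-/ { print $2 }'
}

# Example (nothing is installed, apt only simulates):
#   apt-get --simulate install nvtop | nvidia_packages_to_install
```

If the list contains several `libnvidia-compute-*` branches, as reported above, that is a hint that the package drags in driver libraries that may not match the host driver.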
from gpu-jupyter.
@ChristophSchranz Sorry for the delay, but I'm now able to run the pushed 1.5 image on a VM with driver version 510 and CUDA 11.6.
Thanks for all the help!
from gpu-jupyter.
I was able to reproduce your issue on another server; however, I can't update packages or reboot it at the moment.
Some users say that a reboot helped.
Otherwise, it seems that version 1.5 requires an updated NVIDIA driver. You are using version 510, but 530 might be required.
Could you tell me if a reboot helped?
I'm about to rebuild the image on the server where I could reproduce your issue; maybe this will help. Another critical part could be the installation added to fix the ptxas issue (#93):
gpu-jupyter/src/Dockerfile.gpulibs
Lines 44 to 49 in 69a81e3
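The driver requirement guessed at above can be turned into a quick pre-flight check on the host. This is a minimal sketch under assumptions: the minimum of 530 is only the guess from this comment, not a documented requirement, the function name `version_ge` is hypothetical, and `sort -V` (version sort) must be available.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is the host driver at least MIN_DRIVER?
# MIN_DRIVER=530 is the guess from this thread, not a documented requirement.
MIN_DRIVER="${MIN_DRIVER:-530}"

# version_ge HAVE MIN: succeeds if version HAVE is greater than or equal to MIN.
version_ge() {
  # With version sort, MIN comes first exactly when MIN <= HAVE.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  have="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  if [ -n "$have" ] && version_ge "$have" "$MIN_DRIVER"; then
    echo "driver $have is at least $MIN_DRIVER"
  else
    echo "driver '${have:-unknown}' is older than $MIN_DRIVER or could not be read"
  fi
fi
```

On a host with driver 510 this would report that the driver is older than the assumed minimum, which matches the situation described in this issue.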
from gpu-jupyter.
Thanks for the help!
Rebooting alone did not work, but updating the NVIDIA drivers on the VM from 510 to 530 resolved the NVML mismatch issue I was seeing. This also updated the VM's CUDA version to 12.1.
The image now reports CUDA version 12.1 instead of the 11.6 I was expecting based on the image name.
(base) root@9aa6ea4bbbce:~# nvidia-smi
Thu Mar 23 16:03:38 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P100-PCIE-16GB On | 00000000:0B:00.0 Off | 0 |
| N/A 27C P0 25W / 250W| 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE-16GB On | 00000000:13:00.0 Off | 0 |
| N/A 27C P0 25W / 250W| 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
(base) root@9aa6ea4bbbce:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
from gpu-jupyter.
I see a similarly strange behavior in my setup: I'm installing CUDA and driver 11.6.2 as described in the Medium blog post:
sudo apt update
apt policy cuda # check available versions of cuda
sudo apt-get install cuda=11.6.2-1
apt policy nvidia-gds # check available versions of nvidia-gds
sudo apt install nvidia-gds=11.6.2-1
nvcc --version shows the correct version, but nvidia-smi shows CUDA 12.1 (and the NVIDIA driver is 530), as seen here.
It seems nvidia-smi can show a different version than nvcc, as noted in the nvidia-forum.
So I suppose the current installation in version 1.5 requires CUDA 11.6.2. I found that one of the packages in src/Dockerfile.gpulibs
forces CUDA to upgrade, which causes the failure on host systems with a CUDA version below 11.6.2.
I will find and downgrade this package!
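The difference between the two tools noted above can be made visible with a small script. As far as I know, the `CUDA Version` in the nvidia-smi header is the highest CUDA version the installed driver supports, while nvcc reports the version of the installed toolkit, so the two can legitimately differ. This is a minimal sketch: the function names are made up for illustration, and the parsing assumes the output formats shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not from gpu-jupyter): extract the CUDA versions that
# nvidia-smi and nvcc report so they can be compared side by side.

# Reads `nvidia-smi` output on stdin, prints the header's "CUDA Version" value.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Reads `nvcc --version` output on stdin, prints the toolkit release number.
nvcc_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only query the tools when both are actually installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  echo "driver supports up to CUDA: $(nvidia-smi | smi_cuda_version)"
  echo "installed toolkit (nvcc):   $(nvcc --version | nvcc_cuda_version)"
fi
```

A toolkit version that is lower than the driver-side version is normally fine; the failure discussed here is the opposite case, where the image needs more than the host driver offers.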
from gpu-jupyter.
Interestingly, nvtop has already caused some trouble, see here.
I'm looking forward to getting rid of it; hopefully all tests pass.
from gpu-jupyter.
I did notice that nvtop wasn't working in my custom build, but it was low on my priority list of things to fix haha
from gpu-jupyter.
Another interesting insight is that the NVIDIA driver version on which the image is built affects the subsequent installations.
An image built on driver version 530 produces this error on a node with version 520:
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
However, if the same Dockerfile is built on the node with 520 it works.
I'll build and push the images now on the server with version 520 and hope it's upwards compatible!
@njacobson-nci please check whether you can successfully build the image and run nvidia-smi
using the merged changes on the driver version 510 you are using, and whether it also works with the pulled version that will be pushed to Docker Hub in the next few hours :)
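The driver/library mismatch described above can be narrowed down by comparing the driver branch of the kernel module with the `libnvidia-compute-*` packages that ended up in the image. This is a rough sketch under assumptions: all function names are hypothetical, the parsing assumes the usual `... Kernel Module  530.30.02 ...` line in /proc/driver/nvidia/version, and it treats any differing package branch as suspicious, which is only what this thread suggests.

```shell
#!/bin/sh
# Hypothetical helpers (not from gpu-jupyter) for spotting a driver/library
# branch mismatch inside a container.

# Reads /proc/driver/nvidia/version style text on stdin, prints the driver major version.
kernel_driver_major() {
  sed -n 's/.*Kernel Module *\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Reads package names on stdin, prints the branch numbers of libnvidia-compute-NNN packages.
compute_lib_branches() {
  sed -n 's/^libnvidia-compute-\([0-9][0-9]*\).*$/\1/p' | sort -u
}

# branches_match_driver MAJOR: succeeds if every branch on stdin equals MAJOR.
branches_match_driver() {
  while read -r branch; do
    [ "$branch" = "$1" ] || return 1
  done
  return 0
}

if [ -r /proc/driver/nvidia/version ] && command -v dpkg-query >/dev/null 2>&1; then
  driver="$(kernel_driver_major < /proc/driver/nvidia/version)"
  # Lists the matching packages known to dpkg; prints nothing if there are none.
  if dpkg-query -W -f='${Package}\n' 'libnvidia-compute-*' 2>/dev/null \
      | compute_lib_branches | branches_match_driver "$driver"; then
    echo "no libnvidia-compute packages conflict with driver branch $driver"
  else
    echo "libnvidia-compute packages differ from driver branch $driver"
  fi
fi
```

Run inside the container, this would flag an image that carries `libnvidia-compute-530` on a node whose kernel module is from the 520 branch.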
from gpu-jupyter.
Related Issues (20)
- nvcc issue HOT 3
- Files are displayed empty after restart of the container HOT 1
- no 'latest' tag on docker hub HOT 1
- image build fails on Debian 11 when trying follow instructions to build for nvidia/cuda:12.1.0-base-ubuntu20.04 HOT 3
- Error when running your current image with host drivers cuda 12.1 : "Could not load dynamic library 'libnvinfer.so.7'" HOT 3
- jupyternbextension-not found HOT 9
- PyTorch 2 needs CUDA 11.7+ HOT 5
- How to add new packages into the image ? HOT 2
- Unable to change conda environment in kernel HOT 4
- Update CUDA to 11.8 HOT 17
- CUDA version incompatibility HOT 10
- TensorFlow throws missing libdevice errors
- Update to Jupyterlab 4.0.10 HOT 6
- Static Token HOT 1
- torch gpu problem HOT 3
- Upgrade to latest versions (e.g. CUDA 12.3) HOT 3
- Suggest way to use latest pytorch (2.2.2) HOT 4
- Container not accessible from the network with Podman instead of Docker HOT 1
- Use other servers like Jupyverse HOT 1
- tensorflow couldn't use nvidia GPU in v1.7_cuda-12.3_ubuntu-22.04 HOT 4