Comments (22)
actually... come to think of it, the CI testing is happening off of ubuntu, and is using your start-local.sh
script successfully. I wonder if it's a docker/docker-compose version thing. Not so sure which versions are used in the testing (can add a step that prints them out, would be useful)
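A version-printing step like the one suggested above could look roughly like this. It is only a sketch: the helper name `print_tool_version` is made up for illustration, and it assumes both tools accept `--version`.

```shell
# Print the version of a tool, or a note if it is not on PATH.
# print_tool_version is a hypothetical helper, not part of the repo.
print_tool_version() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" --version
  else
    echo "$tool: not installed"
  fi
}

print_tool_version docker
print_tool_version docker-compose
```

Running this at the start of a CI job would make it easier to compare the tested versions against a reporter's local setup.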
from gpu-jupyter.
let me try pulling the latest down now and see if I can get the original (root) problem sorted out so we can close out this issue.
from gpu-jupyter.
funny, I just came back to update this part of the convo, then saw your comment
sudo apt install nvidia-cuda-toolkit
seemed to be missing from my system, which I hoped would resolve the (unrelated) --runtime nvidia
errors. Interesting how I could run torch on the GPU without it. beats me...
docker run -ti -d --runtime=nvidia --name gputest -p 8888:8888 -e GRANT_SUDO=1 --user root -v $(pwd):/home/jovyan/work gpu-jupyter
still didn't work for the same reason though about unknown runtime
$ sudo apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-runtime is already the newest version (3.3.0-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
So I went to https://github.com/NVIDIA/nvidia-container-runtime and saw that I needed to point my config file to the runtime.
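For reference, pointing the Docker config at the runtime might be sketched as below. The binary path `/usr/bin/nvidia-container-runtime` is the one mentioned elsewhere in this thread; the helper name and the example output location are assumptions, and the file is written to a scratch path so it can be reviewed and merged with any existing settings first.

```shell
# Write a daemon.json fragment that registers an "nvidia" runtime.
# write_nvidia_runtime_config is a hypothetical helper name.
write_nvidia_runtime_config() {
  out="$1"
  cat > "$out" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

example_file="${TMPDIR:-/tmp}/daemon.json.example"
write_nvidia_runtime_config "$example_file"
echo "wrote $example_file"
# After reviewing it (assumed standard location on Linux):
#   sudo cp "$example_file" /etc/docker/daemon.json
#   sudo systemctl restart docker
```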
There do seem to be some clashes with nvidia-docker2
(which appears deprecated) https://github.com/NVIDIA/nvidia-docker
Also... "Note that you do not need to install the CUDA toolkit on the host, but the driver needs to be installed", which means I can uninstall it from my system after all...
docker-compose is not compatible with GPUs yet (see the GitHub link above), so I do still think the shell script should avoid it since that is the default instruction set.
But I digress... perhaps this really deserves its own issue/discussion. It's not clear to me what the "correct" way to launch nvidia-docker is, but ... That said, I'm just going to keep going without --runtime
and instead using --gpus device=1
so that only my secondary GPU gets passed to docker.
Back to this issue...
good call on the permissions. I was kind of ignoring that part of the message thinking it was some option set in the image, but looking more into it, this was because I'm using docker with its root directory set to a USB device (so I can quickly take all of my work with me on the go, not count on wifi availability). I checked "Disks" and saw that it was being mounted with nosuid
.
Switched that to suid
, unmounted and remounted the docker partition on the USB drive, and bam, sudo apt update
works! woo. can finally close this out.
thanks so much for your help.
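The nosuid check described above can also be done from a terminal instead of "Disks". This is a rough sketch, assuming `findmnt` (util-linux) is available and that `docker info` reports the Docker root directory; the helper name `has_nosuid` is made up for illustration.

```shell
# Succeeds if a comma-separated mount option string contains "nosuid".
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query the live system when both tools are present.
if command -v docker >/dev/null 2>&1 && command -v findmnt >/dev/null 2>&1; then
  root_dir="$(docker info --format '{{.DockerRootDir}}' 2>/dev/null)"
  if [ -n "$root_dir" ]; then
    opts="$(findmnt -n -o OPTIONS --target "$root_dir")"
    if has_nosuid "$opts"; then
      echo "$root_dir is on a nosuid mount; setuid binaries such as sudo may not work in containers"
    else
      echo "$root_dir mount options: $opts"
    fi
  fi
fi
```

If the check reports nosuid, remounting the partition with the suid option (as was done here through "Disks") should clear the sudo error.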
from gpu-jupyter.
Are you starting it via the script start-local.sh
? It uses the docker-compose.yml
that sets:
gpu-jupyter/docker-compose.yml
Lines 9 to 12 in 0091efe
from gpu-jupyter.
ah, well I had tried to start it with that script but was encountering problems with the build (re: other issue), so I started launching it manually.
< goes and tries again >
okay, yes. encountering same error as I was when I launched it myself with the same flag:
jovyan@ca38d65dfb56:~$ sudo apt update
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
jovyan@ca38d65dfb56:~$ apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
jovyan@ca38d65dfb56:~$
from gpu-jupyter.
that first message doesn't show up if I try to launch a cpu-based image from the stacks with the GRANT_SUDO flag, so something's differing about permissions in these builds that I think needs to get sorted out.
also, this docker-compose file doesn't pass a GPU device, so I can't see my card:
jovyan@ca38d65dfb56:~$ nvidia-smi
bash: nvidia-smi: command not found
docker-compose doesn't yet support gpus: docker/compose#6691 (as far as I can tell), so perhaps consider switching to a docker-run command that passes --gpus all
did you really get nvidia-smi
working using start-local.sh
?
(sorry, this seems like a new issue, but this is the first I'm trying this compose file)
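A docker-run based launch that passes a GPU could be sketched like this. It only prints the command so it can be inspected first; the helper name `build_run_args` is hypothetical, the flags are the ones used elsewhere in this thread, and the volume mount is left out for brevity.

```shell
# Build the argument list for docker run.
# $1: GPU spec, e.g. "all" or "device=1" (default: all)
# $2: host port to map to the notebook port 8888 (default: 8888)
build_run_args() {
  gpu_spec="${1:-all}"
  port="${2:-8888}"
  echo "run -d -it --gpus $gpu_spec -p $port:8888 -e GRANT_SUDO=yes --user root --name gpu-jupyter_1 gpu-jupyter"
}

# Dry run: show the command instead of executing it.
echo "docker $(build_run_args all 8848)"
```

Passing `device=1` instead of `all` would expose only the second GPU to the container.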
from gpu-jupyter.
I've deleted the images and overlays and rebuilt it as described in the README. It works fine!
Have you installed docker-cuda correctly, i.e., does docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
yield your GPU stats?
from gpu-jupyter.
yeah it does, and I've been doing successful testing in your image when launching it manually
from gpu-jupyter.
though, that command is not the one I'm using. what docker version are you using?
I have to run docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
from gpu-jupyter.
Maybe it only works on some operating systems (Ubuntu 18.04, 19.10 & 20.04 here).
Which commands work for you? It makes sense for me to switch to commands that work across a variety of OSs.
from gpu-jupyter.
I'm running Ubuntu 20.04
docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:44 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:44:15 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
docker-compose version 1.25.0, build unknown
$ docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
$ docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Sat Jul 11 18:55:18 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... Off | 00000000:01:00.0 On | N/A |
| 31% 60C P0 27W / 130W | 2016MiB / 5910MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
$ docker run --gpus 0 nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Sat Jul 11 18:55:24 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... Off | 00000000:01:00.0 On | N/A |
| 31% 59C P0 27W / 130W | 2016MiB / 5910MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
from gpu-jupyter.
I have the same setup:
docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:36 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:44:07 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
nvidia:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
docker-compose version
docker-compose version 1.24.1, build 4667896b
docker-py version: 3.7.3
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
And all three commands work.
docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker run --gpus 0 nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Maybe the nvidia runtime is not found on your setup. Do you have the file /usr/bin/nvidia-container-runtime
?
In this post I explained my setup, the section Install NVIDIA Docker is of interest here.
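One way to check both things at once (the binary on disk and the runtime being registered with the daemon) is sketched below. It assumes `docker info --format '{{json .Runtimes}}'` returns a JSON object keyed by runtime name; `runtime_listed` is a hypothetical helper that does a plain string match rather than real JSON parsing.

```shell
# Succeeds if runtime name $1 appears as a key in the JSON string $2.
runtime_listed() {
  name="$1"
  runtimes_json="$2"
  case "$runtimes_json" in
    *"\"$name\":"*) return 0 ;;
    *) return 1 ;;
  esac
}

if [ -x /usr/bin/nvidia-container-runtime ]; then
  echo "runtime binary found"
else
  echo "runtime binary missing at /usr/bin/nvidia-container-runtime"
fi

# Only ask the daemon when the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
  if runtime_listed nvidia "$(docker info --format '{{json .Runtimes}}' 2>/dev/null)"; then
    echo "docker lists an nvidia runtime"
  else
    echo "docker does not list an nvidia runtime; --runtime nvidia is expected to fail"
  fi
fi
```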
from gpu-jupyter.
hmm. no I do not. looks like we did install it a bit differently after all.
what's the difference though I wonder... I'm still able to run code fine within the container, I just have to use a different syntax to start it up. hm. docker-compose differs slightly too.
from gpu-jupyter.
thanks for the link to the post. I think I can troubleshoot from that.
but perhaps we should figure out what's the "most general / most recent" invocation of passing GPU devices that docker wants, and use that in the shell script. It's very curious to me that I got the CLI but not the runtime (to be fair, I had been setting it up before finding your post)
from gpu-jupyter.
https://docs.docker.com/config/containers/resource_constraints/#gpu
looks like the --gpus all
syntax is what they list.
I just installed nvidia-container-runtime
with apt
and which nvidia-container-runtime-hook
works fine, but even with a restart of docker, I'm still getting
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
from gpu-jupyter.
That's weird.
Well, I'm about to change the README and also the start-local.sh
to a direct docker run instead of the docker-compose dependency (but leave it as an optional script), because I've also experienced difficulties with it in a Windows setup. Commit 708643b makes a start. Feel free to adapt!
from gpu-jupyter.
Ohh you've already tested it, sorry!
from gpu-jupyter.
same errors as before when using the jupyter terminal. I generated a fresh dockerfile, and copy/pasted the bash call that was printed, replaced with my port number.
adding USER root
at the end of the Dockerfile has no effect. Changing GRANT_SUDO = 1
instead of yes
does nothing... I'm stumped! completely.
are you really able to run bash start-local.sh
and then install packages with apt
in the jupyter terminal?
did you add some sort of option to disable sudo
? I can't even get it to work when manually launching the docker container from command line.
from gpu-jupyter.
Sorry for not having had time for so long. I've now found a solution for your initial problem: Try adding --user root
to the docker run
:
docker run -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --restart always --name gpu-jupyter_1 gpu-jupyter
from gpu-jupyter.
what the actual....
I copy/pasted your command and still got jovyan
for whoami
Still getting the same error message about the effective uid when I run sudo su
...
And I definitely had been passing --user root
the whole time.
so confused.
from gpu-jupyter.
going to try a fresh attempt.
./generate-Dockerfile.sh --slim
docker build -t gpu-jupyter .build/
docker run -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --restart always --name gpu-jupyter_1 gpu-jupyter
I wonder if it's a matter of some mixed up tags/names for images. start-local.sh
seems to give the image a different name.
from gpu-jupyter.
I don't understand this either. I guess it is the same if you run bash in the container via docker exec -it [UID] bash
, right?
I just reviewed your error messages again, and it says ... or an NFS file system without root privileges
. Could that be the cause?
from gpu-jupyter.