
Comments (22)

mathematicalmichael commented on September 22, 2024

actually... come to think of it, the CI tests run on Ubuntu and use your start-local.sh script successfully. I wonder if it's a docker/docker-compose version thing. I'm not sure which versions the tests use (we could add a step that prints them out, which would be useful).
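A minimal sketch of such a step, written as a bash function; it assumes the CI runner has bash, and it prints "not installed" instead of failing the job when a tool is missing. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version of each container tool so the
# CI log records them. Missing tools are reported, not treated as errors.
print_tool_versions() {
  local tool
  for tool in docker docker-compose; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first line of the tool's --version output.
      printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
      printf '%s: not installed\n' "$tool"
    fi
  done
}

print_tool_versions
```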

from gpu-jupyter.

mathematicalmichael commented on September 22, 2024

let me try pulling the latest down now and see if I can get the original (root) problem sorted out so we can close out this issue.

mathematicalmichael commented on September 22, 2024

funny, I just came back to update this part of the convo, then saw your comment


nvidia-cuda-toolkit seemed to be missing from my system, so I installed it (sudo apt install nvidia-cuda-toolkit), hoping it would resolve the (unrelated) --runtime nvidia errors. Interesting that I could run torch on the GPU without it. Beats me...

docker run -ti -d --runtime=nvidia --name gputest -p 8888:8888 -e GRANT_SUDO=1 --user root -v $(pwd):/home/jovyan/work gpu-jupyter

still didn't work, though, for the same reason (unknown runtime)

$ sudo apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-container-runtime is already the newest version (3.3.0-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

So I went to https://github.com/NVIDIA/nvidia-container-runtime and saw that I needed to register the runtime in Docker's daemon config file.
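For reference, a sketch of that registration: a "runtimes" entry named nvidia in Docker's daemon configuration. It assumes the apt package put the binary at /usr/bin/nvidia-container-runtime and that the target file holds no other settings, because the helper (a hypothetical name) overwrites it; merge by hand if you already have a daemon.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Docker daemon config that registers the
# "nvidia" runtime so that `docker run --runtime nvidia ...` is accepted.
# Assumes the runtime binary is /usr/bin/nvidia-container-runtime.
# WARNING: overwrites the target file; merge manually if it has other keys.
write_nvidia_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  cat > "$target" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}
```

Run it as root, then restart the daemon (sudo systemctl restart docker); afterwards the Runtimes line of docker info should list nvidia next to runc.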

There do seem to be some clashes with nvidia-docker2 (which appears to be deprecated): https://github.com/NVIDIA/nvidia-docker

Also... "Note that you do not need to install the CUDA toolkit on the host, but the driver needs to be installed", which means I can uninstall it from my system after all...

docker-compose isn't compatible with GPUs yet (see the GitHub link above), so I still think the shell script should avoid it, since that's the default set of instructions.

But I digress... perhaps this really deserves its own issue/discussion. It's not clear to me what the "correct" way to launch nvidia-docker is. That said, I'm just going to keep going without --runtime and use --gpus device=1 instead, so that only my secondary GPU gets passed to Docker.
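The --gpus flag takes all, a GPU count, or a device= list; Docker's documentation wraps a multi-device list in an extra pair of double quotes (--gpus '"device=1,2"'). A small sketch that builds the value; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "all" or a comma-separated list of GPU indices
# into a value for `docker run --gpus`.
gpus_value() {
  local spec="${1:-all}"
  case "$spec" in
    all) printf 'all\n' ;;
    # Several devices: the literal double quotes must reach Docker,
    # otherwise the comma is treated as a separator between options.
    *,*) printf '"device=%s"\n' "$spec" ;;
    *)   printf 'device=%s\n' "$spec" ;;
  esac
}
```

Usage: docker run --gpus "$(gpus_value 1)" nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi should list only the GPU with index 1.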


Back to this issue...

good call on the permissions. I had been ignoring that part of the message, thinking it was some option set in the image, but on closer inspection the cause was that I run Docker with its root directory on a USB drive (so I can quickly take all of my work with me on the go without counting on Wi-Fi). I checked "Disks" and saw that the drive was mounted with nosuid.

I switched that to suid, unmounted and remounted the Docker partition on the USB drive, and bam, sudo apt update works! Woo. We can finally close this out.
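To catch this earlier, a sketch that looks up Docker's data root and reports whether the filesystem holding it carries nosuid. It assumes findmnt (util-linux) is installed and the Docker daemon is reachable; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: detect whether Docker's data root sits on a nosuid
# mount. There the setuid bit of /usr/bin/sudo inside containers is
# ignored, which yields "sudo: effective uid is not 0".

# Succeeds if a comma-separated mount-option string contains "nosuid".
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *) return 1 ;;
  esac
}

check_docker_root() {
  local root opts
  root="$(docker info --format '{{.DockerRootDir}}')" || return 2
  # OPTIONS of the filesystem that contains $root.
  opts="$(findmnt --noheadings --output OPTIONS --target "$root")" || return 2
  if has_nosuid "$opts"; then
    echo "$root is on a nosuid mount ($opts): sudo in containers will fail"
    return 1
  fi
  echo "$root: no nosuid option ($opts)"
}
```

If it reports nosuid, sudo mount -o remount,suid followed by your own mount point is the command-line counterpart of the change made in Disks above.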

thanks so much for your help.

ChristophSchranz commented on September 22, 2024

Are you starting it via the start-local.sh script? It uses docker-compose.yml, which sets:

environment:
  GRANT_SUDO: "yes"
  JUPYTER_ENABLE_LAB: "yes"
# enable sudo permissions

mathematicalmichael commented on September 22, 2024

ah, well I had tried to start it with that script but was encountering problems with the build (re: other issue), so I started launching it manually.

< goes and tries again >

okay, yes. I'm encountering the same error as when I launched it myself with the same flag:

jovyan@ca38d65dfb56:~$ sudo apt update
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
jovyan@ca38d65dfb56:~$ apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
jovyan@ca38d65dfb56:~$ 

mathematicalmichael commented on September 22, 2024

that first message doesn't show up when I launch a CPU-based image from the stacks with the GRANT_SUDO flag, so something about permissions differs in these builds, and I think it needs to be sorted out.

also, this docker-compose file doesn't pass a GPU device, so I can't see my card:

jovyan@ca38d65dfb56:~$ nvidia-smi
bash: nvidia-smi: command not found

docker-compose doesn't support GPUs yet (docker/compose#6691, as far as I can tell), so perhaps consider switching to a docker run command that passes --gpus all.
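A sketch of what such a docker run replacement could look like. It assumes the image is tagged gpu-jupyter and reuses the GRANT_SUDO and JUPYTER_ENABLE_LAB settings from docker-compose.yml; the function name, default port, data directory, container name, and the DRY_RUN switch are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start gpu-jupyter with plain `docker run` so the GPU
# can be passed with --gpus. Set DRY_RUN=1 to print the command instead.
start_gpu_jupyter() {
  local port="${1:-8848}" gpus="${2:-all}" name="${3:-gpu-jupyter_1}"
  local -a cmd=(
    docker run -d -it
    --gpus "$gpus"
    -p "${port}:8888"
    -v "$(pwd)/data:/home/jovyan/work"
    -e GRANT_SUDO=yes
    -e JUPYTER_ENABLE_LAB=yes
    --user root
    --restart always
    --name "$name"
    gpu-jupyter
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

Usage: DRY_RUN=1 start_gpu_jupyter 8848 all shows the command; drop DRY_RUN to run it.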

did you really get nvidia-smi working using start-local.sh?

(sorry, this seems like a new issue, but this is the first time I'm trying this compose file)

ChristophSchranz commented on September 22, 2024

I've deleted the images and overlays and rebuilt everything as described in the README. It works fine!
Have you installed docker-cuda correctly, i.e., does docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi yield your GPU stats?

mathematicalmichael commented on September 22, 2024

yeah it does, and I've been doing successful testing in your image when launching it manually

mathematicalmichael commented on September 22, 2024

though, that command is not the one I'm using. what docker version are you using?

I have to run docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi

ChristophSchranz commented on September 22, 2024

Maybe it only works on some operating systems (Ubuntu 18.04, 19.10 and 20.04 here).
Which commands work for you? It makes sense for me to switch to commands that work across a variety of OSs.
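One way to find out on any given host, sketched under the assumption that pulling the nvidia/cuda:10.1-base-ubuntu18.04 image used above is acceptable: try the newer --gpus syntax first, fall back to --runtime nvidia, and print whichever works. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which GPU syntax this host's Docker accepts.
# Prints "--gpus all", "--runtime nvidia", or "none" (and returns 1).
probe_gpu_syntax() {
  local image="${1:-nvidia/cuda:10.1-base-ubuntu18.04}"
  if docker run --rm --gpus all "$image" nvidia-smi >/dev/null 2>&1; then
    echo "--gpus all"
  elif docker run --rm --runtime nvidia "$image" nvidia-smi >/dev/null 2>&1; then
    echo "--runtime nvidia"
  else
    echo "none"
    return 1
  fi
}
```

--gpus needs Docker 19.03 or newer, while --runtime nvidia only works once the runtime is registered with the daemon, so the two can disagree on the same host.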

mathematicalmichael commented on September 22, 2024

I'm running Ubuntu 20.04

docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:44 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:15 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
docker-compose version 1.25.0, build unknown
$ docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

$ docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Sat Jul 11 18:55:18 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   60C    P0    27W / 130W |   2016MiB /  5910MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ docker run --gpus 0 nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Sat Jul 11 18:55:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   59C    P0    27W / 130W |   2016MiB /  5910MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

ChristophSchranz commented on September 22, 2024

I have the same setup:

docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:07 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 nvidia:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

docker-compose version
docker-compose version 1.24.1, build 4667896b
docker-py version: 3.7.3
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018

And all three commands work.

docker run --runtime nvidia nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker run --gpus all nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
docker run --gpus 0 nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi

Maybe the nvidia runtime is not found on your setup. Do you have the file /usr/bin/nvidia-container-runtime?
I explained my setup in this post; the section "Install NVIDIA Docker" is the relevant one here.

mathematicalmichael commented on September 22, 2024

hmm. no, I do not. Looks like we did install it a bit differently after all.
[screenshot]
I wonder what the difference is, though... I can still run code fine within the container; I just have to use a different syntax to start it. Hm. Our docker-compose versions differ slightly too.

mathematicalmichael commented on September 22, 2024

thanks for the link to the post. I think I can troubleshoot from that.

but perhaps we should figure out the "most general / most recent" way Docker expects GPU devices to be passed, and use that in the shell script. I find it very curious that I got the CLI but not the runtime (to be fair, I had set things up before finding your post).

mathematicalmichael commented on September 22, 2024

https://docs.docker.com/config/containers/resource_constraints/#gpu
looks like the --gpus all syntax is what they list.

I just installed nvidia-container-runtime with apt, and which nvidia-container-runtime-hook finds it, but even after restarting Docker I'm still getting

docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
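A sketch of a check that separates the two possible causes here: the runtime binary (the /usr/bin/nvidia-container-runtime file asked about above) being absent, versus being installed but not registered with the daemon. It assumes the Docker daemon is reachable and reads the registered runtimes from docker info as JSON; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: explain "Unknown runtime specified nvidia".

# Succeeds if a JSON runtimes map (as printed by docker info) has an
# "nvidia" key.
runtimes_include_nvidia() {
  printf '%s' "$1" | grep -q '"nvidia":'
}

diagnose_nvidia_runtime() {
  local runtimes
  if ! command -v nvidia-container-runtime >/dev/null 2>&1; then
    echo "nvidia-container-runtime binary not found on PATH"
    return 1
  fi
  runtimes="$(docker info --format '{{json .Runtimes}}')" || return 2
  if runtimes_include_nvidia "$runtimes"; then
    echo "nvidia runtime is registered: $runtimes"
  else
    echo "binary installed but not registered with dockerd: $runtimes"
    return 1
  fi
}
```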

ChristophSchranz commented on September 22, 2024

That's weird.
Well, I'm about to change the README and start-local.sh to use a direct docker run instead of depending on docker-compose (which I'll keep as an optional script), because I've also had difficulties with it on a Windows setup. Commit 708643b makes a start. Feel free to adapt it!

ChristophSchranz commented on September 22, 2024

Ohh you've already tested it, sorry!

mathematicalmichael commented on September 22, 2024

same errors as before when using the Jupyter terminal. I generated a fresh Dockerfile and copy/pasted the printed bash command, substituting my port number.
[screenshot]

adding USER root at the end of the Dockerfile has no effect. Setting GRANT_SUDO=1 instead of yes does nothing either... I'm completely stumped!

are you really able to run bash start-local.sh and then install packages with apt in the Jupyter terminal?
did you add some sort of option to disable sudo? I can't even get it to work when manually launching the Docker container from the command line.

ChristophSchranz commented on September 22, 2024

Sorry for not having had time for so long. I've now found a solution to your initial problem: try adding --user root to the docker run command:

docker run -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --restart always --name gpu-jupyter_1 gpu-jupyter 

Works for me:
[screenshot]

mathematicalmichael commented on September 22, 2024

what the actual....

I copy/pasted your command and still got jovyan from whoami. I'm still getting the same error message about suid when I run sudo su...

And I definitely had been passing --user root the whole time.

so confused.

mathematicalmichael commented on September 22, 2024

going to try a fresh attempt.
./generate-Dockerfile.sh --slim
docker build -t gpu-jupyter .build/
docker run -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --restart always --name gpu-jupyter_1 gpu-jupyter

I wonder if it's a matter of mixed-up image tags/names. start-local.sh seems to give the image a different name.
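A sketch for ruling that out: compare the image ID the running container was created from with the ID currently behind the gpu-jupyter tag, and print the user the container is configured with. docker run also refuses a --name that an existing container still holds, so an older gpu-jupyter_1 could keep serving the port. The defaults are the names used in this thread; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does the container run the image the tag points to now?
# Returns 0 if yes, 1 if the container predates the latest build,
# 2 if either the container or the tag cannot be inspected.
check_container_image() {
  local container="${1:-gpu-jupyter_1}" tag="${2:-gpu-jupyter}"
  local running built user
  running="$(docker inspect --format '{{.Image}}' "$container" 2>/dev/null)" || return 2
  built="$(docker image inspect --format '{{.Id}}' "$tag" 2>/dev/null)" || return 2
  # The --user value, or the image's USER when --user was not passed.
  user="$(docker inspect --format '{{.Config.User}}' "$container")"
  echo "configured user: ${user:-<none>}"
  if [ "$running" = "$built" ]; then
    echo "$container runs the current $tag image"
  else
    echo "$container runs $running, but $tag is now $built"
    return 1
  fi
}
```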

ChristophSchranz commented on September 22, 2024

I don't understand this either. I guess it's the same if you run bash in the container via docker exec -it [CONTAINER_ID] bash, right?

I just reviewed your error messages again, and one says "... or an NFS file system without root privileges". Could that be the cause?
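Comparing docker exec with and without -u root could help separate the causes: -u root does not go through sudo at all, so if it works while sudo fails, the setuid path (a nosuid mount or an NFS file system without root privileges) is the likely culprit. A sketch with a hypothetical function name, assuming the container is named gpu-jupyter_1 as above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare root via `docker exec -u root` with root via sudo.
compare_root_paths() {
  local container="${1:-gpu-jupyter_1}"
  echo "default user: $(docker exec "$container" id -un 2>/dev/null)"
  echo "exec -u root: $(docker exec -u root "$container" id -un 2>/dev/null)"
  # sudo -n never prompts; it depends on the setuid bit of /usr/bin/sudo.
  if docker exec "$container" sudo -n true >/dev/null 2>&1; then
    echo "sudo: works"
  else
    echo "sudo: fails"
  fi
}
```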
