
Comments (35)

ccll avatar ccll commented on August 22, 2024 4

I'm using Coder as an online development environment, with sysbox in the dev container to enable dockerd-in-docker. I now need to work on a new project that does GPU-accelerated machine learning, so I hope this will be supported.

from sysbox.

evberrypi avatar evberrypi commented on August 22, 2024 2

The use case would be running sysbox to replace some CI build/test steps that need to be run on special, pet-like servers with multiple GPUs, in favor of something that can run across various servers. Support for 2 or more GPUs is desired. Auto-detecting the number of GPUs and allocating them could work, but specifying the number of GPUs to pass to the container would be ideal and would absolutely meet requirements. Thanks for the prompt reply!

from sysbox.

zhongcloudtian avatar zhongcloudtian commented on August 22, 2024 2

@ctalledo we use Sysbox for k8s-in-Docker, hoping to run GPUs in k8s, but we get these errors:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
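
A quick check is whether the library the hook is complaining about is visible at all inside the container, for example (the paths below assume an x86_64 Ubuntu layout):

# Is libnvidia-ml.so.1 known to the dynamic linker / present on disk?
ldconfig -p | grep libnvidia-ml
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*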

from sysbox.

traverseda avatar traverseda commented on August 22, 2024 2

We're using this feature to simulate flocks of robots, where each Sysbox container runs a replica of our software as it would run on the robot.

This would be a very useful feature for us, as it lets our simulations stay closer to what our real hardware presents. Essentially we're using a nestybox runtime to simulate a complete robot stack, so we can test things like swarm-SLAM.

This is, presumably, possible without implementing nvidia-container-runtime capabilities, but it makes the job a lot harder and means porting to another compute cluster becomes a nightmare.

from sysbox.

r614 avatar r614 commented on August 22, 2024 1

Thanks for the prompt reply - appreciate it! Would love to find out when you start work on this; happy to help test it out.

from sysbox.

Darker7 avatar Darker7 commented on August 22, 2024 1

Another Use Case

My university wants to offer ML execution as a CI service in its private GitLab instance, so that it can offer more ML projects in the future.

I'm working on this for my Bachelor's thesis, so I'll have to just accept the inherent danger of --privileged dind or docker.sock mounting, but I'll include a link here for future upgrades :Ü™
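
For reference, the two stop-gap approaches mentioned above boil down to commands roughly like these (the image tags and commands are just placeholders):

# Privileged Docker-in-Docker: the CI job talks to a nested Docker daemon
docker run --privileged -d --name dind docker:dind

# Socket mounting: the CI job reuses the host's Docker daemon
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:cli docker info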

from sysbox.

christopherhesse avatar christopherhesse commented on August 22, 2024 1

@ctalledo Thanks for the confirmation! I am running into the issue that GPUs are not supported, since the tests require GPUs.

from sysbox.

dray92 avatar dray92 commented on August 22, 2024 1

@ctalledo, wanted to check whether there's any appetite for either scoping this work out or helping build out the capability.
We've been using Sysbox in some areas and are currently exploring sharing GPUs in pods running with K8s+Sysbox. We're looking to support some CUDA workflows that require access to GPUs at runtime.
Happy to contribute and/or help investigate potential solutions if you're open to it.

from sysbox.

rodnymolina avatar rodnymolina commented on August 22, 2024

As expected, gpu-related resources are properly passed inside the sys container ...

$ docker run --runtime=sysbox-runc -it --rm --name test-1 --device=/dev/dri --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-modeset --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools --device=/dev/vga_arbiter nestybox/ubuntu-focal-systemd-docker
admin@d9e318c31bec:~$ ls -lrt /dev/dri /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 195, 255 Sep  5 20:30 /dev/nvidiactl
crw-rw-rw- 1 nobody nogroup 195,   0 Sep  5 20:30 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 235,   1 Sep  5 20:30 /dev/nvidia-uvm-tools
crw-rw-rw- 1 nobody nogroup 235,   0 Sep  5 20:30 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 195, 254 Sep  5 20:30 /dev/nvidia-modeset

/dev/dri:
total 0
crw-rw----+ 1 nobody nogroup 226, 128 Sep  5 20:30 renderD128
crw-rw----+ 1 nobody nogroup 226,   0 Sep  5 20:30 card0
admin@d9e318c31bec:~$

<-- Install same nvidia-driver as in the host (i.e. "440"):

rodny@nestybox-srv-01:~$ sudo nvidia-smi
[sudo] password for rodny:
Sat Sep  5 14:17:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K620         Off  | 00000000:03:00.0 Off |                  N/A |
| 35%   49C    P8     1W /  30W |    177MiB /  1994MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1438      G   /usr/lib/xorg/Xorg                            31MiB |
|    0      3138      G   /usr/lib/xorg/Xorg                            51MiB |
|    0      3343      G   /usr/bin/gnome-shell                          83MiB |
+-----------------------------------------------------------------------------+
rodny@nestybox-srv-01:~$
admin@d9e318c31bec:~$ sudo apt install nvidia-driver-440

<-- Device/driver properly seen within sys container:

admin@d9e318c31bec:~$ sudo nvidia-smi
[sudo] password for admin:
Sat Sep  5 21:11:47 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K620         Off  | 00000000:03:00.0 Off |                  N/A |
| 34%   48C    P8     1W /  30W |    177MiB /  1994MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
admin@d9e318c31bec:~$

<-- Install nvidia-runtime within sys container:

admin@d9e318c31bec:~$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -

admin@d9e318c31bec:~$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

admin@d9e318c31bec:~$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list

admin@d9e318c31bec:~$ sudo apt-get update

admin@d9e318c31bec:~$ sudo apt-get install -y nvidia-docker2

admin@d9e318c31bec:~$ sudo pkill -SIGHUP dockerd

<-- Check everything's looking good so far:

admin@d9e318c31bec:~$ sudo nvidia-container-cli --load-kmods info
NVRM version:   440.100
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          Quadro K620
Brand:          Quadro
GPU UUID:       GPU-6c5b3240-538b-0d41-f327-285da7535a9c
Bus Location:   00000000:03:00.0
Architecture:   5.0
admin@d9e318c31bec:~$

<-- Finally, let's launch a cuda-app as an L2 container:

admin@d9e318c31bec:~$ docker run --rm --runtime=nvidia -ti nvidia/cuda
Unable to find image 'nvidia/cuda:latest' locally
latest: Pulling from nvidia/cuda
3ff22d22a855: Pull complete
e7cb79d19722: Pull complete
323d0d660b6a: Pull complete
b7f616834fd0: Pull complete
c2607e16e933: Pull complete
46a16da628dc: Pull complete
4871b8b75027: Pull complete
e45235afa764: Pull complete
250da266cf64: Pull complete
78f4b6d02e6c: Pull complete
ebf42dcedf4b: Pull complete
Digest: sha256:0fe0406ec4e456ae682226751434bdd7e9b729a03067d795f9b34c978772b515
Status: Downloaded newer image for nvidia/cuda:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
admin@d9e318c31bec:~$

<-- Let's pick an earlier cuda-app image -- looks like our nvidia driver (440) may not be the latest.

admin@d9e318c31bec:~$ docker run --rm --runtime=nvidia -ti nvidia/cuda:10.0-base
Unable to find image 'nvidia/cuda:10.0-base' locally
10.0-base: Pulling from nvidia/cuda
7ddbc47eeb70: Pull complete
c1bbdc448b72: Pull complete
8c3b70e39044: Pull complete
45d437916d57: Pull complete
d8f1569ddae6: Pull complete
de5a2c57c41d: Pull complete
ea6f04a00543: Pull complete
Digest: sha256:e6e1001f286d084f8a3aea991afbcfe92cd389ad1f4883491d43631f152f175e
Status: Downloaded newer image for nvidia/cuda:10.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/27422711f8b99961b9ac6f6758f2016dc930a4704b2655bcc3943fdfb1eb72df/devices.allow: operation not permitted\\\\n\\\"\"": unknown.
admin@d9e318c31bec:~$

Hmm, need to check what this nvidia prestart hook is doing to trigger an EPERM while mounting this resource ...

from sysbox.

evberrypi avatar evberrypi commented on August 22, 2024

Will this work for multiple GPUs?
e.g. --device=/dev/nvidia{0,1,2,3}
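
For context, the shell expands that brace pattern before Docker sees it, so it is equivalent to passing one --device flag per GPU, roughly as follows (reusing the image and control devices from the example above):

# Brace expansion happens in the shell, so --device=/dev/nvidia{0,1,2,3} becomes:
docker run --runtime=sysbox-runc -it --rm \
  --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 --device=/dev/nvidia1 \
  --device=/dev/nvidia2 --device=/dev/nvidia3 \
  nestybox/ubuntu-focal-systemd-docker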

from sysbox.

rodnymolina avatar rodnymolina commented on August 22, 2024

@evberrypi, we haven't initiated this effort yet, so we may find surprises that could easily influence this feature's scope, but yes, we do intend to support multiple GPUs.

Ideally, our runtime should be capable of detecting the available GPUs and exposing them automatically within the system container. Alternatively, the user should be able to specify which GPU to utilize inside the container, and have only that one exposed in the container's rootfs.

Would this meet your requirements? Also, if you don't mind, can you please describe the use-case / setup you have in mind? Thanks.

from sysbox.

SoloGao avatar SoloGao commented on August 22, 2024

We are also working on this. Nvidia-docker is unable to deal with Sysbox-modified userns-remap or cgroups, and setting no-cgroups = true in /etc/nvidia-container-toolkit/config.toml causes Failed to initialize NVML: Unknown Error in the system docker. For now, there's a very dirty solution.
First, according to the debug logs provided by nvidia-container-toolkit, Nvidia doesn't only mount the devices into the system container: instead of installing a driver of the same version, it just bind-mounts the driver libraries. However, when we try to mount the drivers into /usr/lib/x86_64-linux-gnu, it simply tells us operation not permitted.
So we first copy everything ending with .so.{nv_drv_ver} to /usr/lib/x86_64-cuda (here, *.so.455.32.00), then run with this:

docker run --detach --interactive --runtime=sysbox-runc \
--mount type=tmpfs,destination=/proc/driver/nvidia \
--mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi \
--mount type=bind,source=/usr/bin/nvidia-debugdump,target=/usr/bin/nvidia-debugdump \
--mount type=bind,source=/usr/bin/nvidia-persistenced,target=/usr/bin/nvidia-persistenced \
--mount type=bind,source=/usr/bin/nvidia-cuda-mps-control,target=/usr/bin/nvidia-cuda-mps-control \
--mount type=bind,source=/usr/bin/nvidia-cuda-mps-server,target=/usr/bin/nvidia-cuda-mps-server \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-ml.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-cfg.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libcuda.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-opencl.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-ptxjitcompiler.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-allocator.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-cuda/libnvidia-compiler.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.32.00 \
--mount type=bind,source=/run/nvidia-persistenced/socket,target=/run/nvidia-persistenced/socket \
--device /dev/nvidiactl:/dev/nvidiactl \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
--device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidia1:/dev/nvidia1 \
--device /dev/nvidia2:/dev/nvidia2 \
--device /dev/nvidia3:/dev/nvidia3 \
--device /dev/nvidia4:/dev/nvidia4 \
--device /dev/nvidia5:/dev/nvidia5 \
--device /dev/nvidia6:/dev/nvidia6 \
--device /dev/nvidia7:/dev/nvidia7 \
nestybox/ubuntu-focal-systemd-docker:latest

It works, but we don't know whether we missed something or included something unnecessary. Also, /proc/driver/nvidia should be mounted read-only with an overlay, but tmpfs works too (though nvidia-docker run inside still needs /proc/driver/nvidia from the host; this needs further investigation).
On first login, a sudo ldconfig.real is needed, which also comes from the logs.
Then, to start the inner Docker container, we use a similarly dirty script:

docker run \
--mount type=tmpfs,destination=/proc/driver/nvidia \
--mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi \
--mount type=bind,source=/usr/bin/nvidia-debugdump,target=/usr/bin/nvidia-debugdump \
--mount type=bind,source=/usr/bin/nvidia-persistenced,target=/usr/bin/nvidia-persistenced \
--mount type=bind,source=/usr/bin/nvidia-cuda-mps-control,target=/usr/bin/nvidia-cuda-mps-control \
--mount type=bind,source=/usr/bin/nvidia-cuda-mps-server,target=/usr/bin/nvidia-cuda-mps-server \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.32.00 \
--mount type=bind,source=/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.32.00,target=/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.32.00 \
--mount type=bind,source=/run/nvidia-persistenced/socket,target=/run/nvidia-persistenced/socket \
--device /dev/nvidiactl:/dev/nvidiactl \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
--device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidia1:/dev/nvidia1 \
--device /dev/nvidia2:/dev/nvidia2 \
--device /dev/nvidia3:/dev/nvidia3 \
--device /dev/nvidia4:/dev/nvidia4 \
--device /dev/nvidia5:/dev/nvidia5 \
--device /dev/nvidia6:/dev/nvidia6 \
--device /dev/nvidia7:/dev/nvidia7 \
nvcr.io/nvidia/cuda:11.1-devel-ubuntu20.04

After starting the inner Docker container, an ldconfig.real is also needed. Then GPU workloads run in the inner Docker flawlessly.
Since we only have Tesla cards, graphics functionality was not taken into consideration.
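
For reference, the copy step plus the --mount/--device flags above could be generated with a small script along these lines (just a sketch: the driver version 455.32.00, the library path /usr/lib/x86_64-linux-gnu, and the /dev/nvidiaN device naming are assumptions taken from this setup):

#!/bin/bash
# Sketch: copy the host's NVIDIA driver libraries and build the docker run flags
# used in the first command above. Driver version and paths are assumptions.
set -euo pipefail

nv_ver="455.32.00"
lib_src="/usr/lib/x86_64-linux-gnu"
lib_dst="/usr/lib/x86_64-cuda"

# Copy the versioned driver libraries aside so they can be bind-mounted
# into the sys container without touching $lib_src directly.
mkdir -p "$lib_dst"
cp "$lib_src"/*.so."$nv_ver" "$lib_dst"/

flags=( --mount "type=tmpfs,destination=/proc/driver/nvidia" )

# NVIDIA userspace tools.
for bin in nvidia-smi nvidia-debugdump nvidia-persistenced \
           nvidia-cuda-mps-control nvidia-cuda-mps-server; do
  flags+=( --mount "type=bind,source=/usr/bin/$bin,target=/usr/bin/$bin" )
done

# Driver libraries, bind-mounted back to their usual location inside the container.
for lib in "$lib_dst"/*.so."$nv_ver"; do
  flags+=( --mount "type=bind,source=$lib,target=$lib_src/$(basename "$lib")" )
done

flags+=( --mount "type=bind,source=/run/nvidia-persistenced/socket,target=/run/nvidia-persistenced/socket" )

# Control, UVM and per-GPU device nodes.
for dev in /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidia[0-9]*; do
  flags+=( --device "$dev:$dev" )
done

docker run --detach --interactive --runtime=sysbox-runc "${flags[@]}" \
  nestybox/ubuntu-focal-systemd-docker:latest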

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

Hi @SoloGao, thanks for sharing that info, much appreciated. We've not had the cycles to add Nvidia GPU support to Sysbox yet, but your findings will certainly help when we do so.

Is this something that is high priority for you? What is the use-case you have in mind?

from sysbox.

rodnymolina avatar rodnymolina commented on August 22, 2024

@SoloGao, that's excellent feedback, thanks for that!

First, according to the debug logs provided by nvidia-container-toolkit, Nvidia doesn't only mount the devices into the system container: instead of installing a driver of the same version, it just bind-mounts the driver libraries. However, when we try to mount the drivers into /usr/lib/x86_64-linux-gnu, it simply tells us operation not permitted.
So we first copy everything ending with .so.{nv_drv_ver} to /usr/lib/x86_64-cuda (here, *.so.455.32.00), then run with this.

We are also thinking about bind-mounting these libraries to avoid copying all this content back and forth. Ideally, sysbox should be able to hide all this complexity from the user, but I'm not sure how far we can go with this zero-touch approach.

The EPERM you are getting is expected as you're attempting to access a host-owned resource while being in a separate user-namespace. We should be able to fix that too.

Then, to start inner docker, we also use a similar dirty script:

Right, we also need to think about how to simplify this process for the user; it doesn't look like an easy task given that we rely on the regular OCI runc at this level.

Let's keep this communication channel open to exchange notes on your findings as well as our planning in this area. This is an important feature for us; we will start working on it ASAP.

One question for you. Have you tried to share one nvidia device across two separate sys-containers + inner-containers?

from sysbox.

SoloGao avatar SoloGao commented on August 22, 2024

Hello @ctalledo, just a short answer to your question.

Is this something that is high priority for you? What is the use-case you have in mind?

GPGPU support is a must-have feature for us. Basically, we are using Docker to run GPGPU-intensive tasks like deep learning on servers. NVIDIA offers lots of pre-configured Docker images at https://ngc.nvidia.com and has built a workflow to easily run different versions of CUDA or DL frameworks. To let users (rather than admins) safely start/stop/remove containers, Podman and Sysbox are the only choices. Moreover, all containers need to expose ports for services like TensorBoard, so in order to arrange ports manually, we opted for Docker-in-Docker with Sysbox.

from sysbox.

SoloGao avatar SoloGao commented on August 22, 2024

Hi @rodnymolina, thanks for the reply and explanation.

Ideally, sysbox should be able to hide all this complexity from the user, but I'm not sure how far we can go with this zero-touch approach.
Right, we also need to think about how to simplify this process for the user; it doesn't look like an easy task given that we rely on the regular OCI runc at this level.

In my opinion, forking https://github.com/NVIDIA/nvidia-container-toolkit might be a good starting point. NVIDIA offers a set of tools to start Docker with GPU support; the docs are at https://docs.nvidia.com/datacenter/cloud-native/index.html. Users might then only need a patched nvidia-container-toolkit to run the system/inner Docker with GPGPU support. The toolkit does this in Go and is a superset of what I currently do by hand, with lots of validation work on top. However, I don't have much time to dig into the code, so for now I just take its results and reproduce them. Besides, the cgroups problem might need to be solved for nvidia-container-toolkit.

Have you tried to share one nvidia device across two separate sys-containers + inner-containers?

Yes, that works flawlessly, even with a [sys-A(inner-a, inner-b), sys-B(inner-c, inner-d)] scheme.

from sysbox.

rodnymolina avatar rodnymolina commented on August 22, 2024

Thanks for your detailed responses @SoloGao, it all makes sense. Btw, I've already looked at the nvidia-container-toolkit in the past and that's something we will certainly keep in mind.

One last thing. If possible, could you please ping me when you have a chance? ([email protected]) There are a couple of points that I would like to clarify about your setup to make sure we fully address your use-case.

Thanks!

from sysbox.

shinji62 avatar shinji62 commented on August 22, 2024

@rodnymolina any progress on the GPU side? We would love to see that too.

from sysbox.

rodnymolina avatar rodnymolina commented on August 22, 2024

@shinji62, this is one of the top items on our to-do list, but we haven't had cycles for it yet. It would certainly help if you could provide a brief description of your use-case, as that helps us prioritize features accordingly (please ping us on Slack if you'd rather discuss this topic there).

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

FYI: another Sysbox user is looking to use hardware accelerators with Sysbox towards the end of 2021.

from sysbox.

elgalu avatar elgalu commented on August 22, 2024

Me too

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

FYI: some GPU functionality does work inside a Sysbox container currently, as described in this comment in issue #452.

from sysbox.

r614 avatar r614 commented on August 22, 2024

Are there any updates on this issue? Our use case is running heavy inference/scientific workflows inside nested containers on a Kubernetes pod.

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

Hi Roshan (@r614), unfortunately no updates yet.

As Docker recently acquired Nestybox, we are currently busy integrating Sysbox into Docker Desktop but should get some more cycles to work on Sysbox improvements within a couple of months (and GPU passthrough is one of the top items).

Thanks for your patience!

from sysbox.

kkangle avatar kkangle commented on August 22, 2024

So looking forward to seeing this feature! Is there any plan to release this in 2023?

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

Hi @kkangle, we are hoping to get some cycles to work on this soon.

What's the use case you have in mind, if you don't mind sharing?

from sysbox.

zhongcloudtian avatar zhongcloudtian commented on August 22, 2024

@ctalledo regarding Sysbox support for the GPU function: can we speed up the solution? We now have an urgent project.

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

Can we speed up the solution? We now have an urgent project.

Unfortunately we still haven't had the cycles to work on this.

However, some users have had limited success exposing GPUs inside Sysbox containers. See here for an example.

from sysbox.

zhongcloudtian avatar zhongcloudtian commented on August 22, 2024

@SoloGao
Calling into the tensorflow:2.9.1-gpu container in Sysbox to check the number of GPUs fails.
Refer to these cases: #50 (comment)

from sysbox.

christopherhesse avatar christopherhesse commented on August 22, 2024

@ctalledo We would like to be able to use sysbox to run containerized tests on Nvidia GPUs under kubernetes without privileged mode. I'm curious, do you have an estimate on the amount of work required to add this sort of feature to sysbox?

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

Hi @christopherhesse, unfortunately I don't have an estimate at this time for a couple of reasons:

  • I need to fully size-up the work that is required, given the latest changes in Nvidia-Docker integration as well as advancements in the Linux kernel.

  • We've not yet scheduled the work inside Docker, due to other priorities.

If I may ask, what's the big advantage of using Sysbox in your scenario (e.g., why not use regular containers)?

from sysbox.

christopherhesse avatar christopherhesse commented on August 22, 2024

We're unable to launch regular containers inside a kubernetes pod without privileged mode. Is that supported already somehow?

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

We're unable to launch regular containers inside a kubernetes pod without privileged mode. Is that supported already somehow?

With Sysbox as the runtime for the pod, yes; without it, I don't believe so.

from sysbox.

christopherhesse avatar christopherhesse commented on August 22, 2024

@ctalledo Great, then IIUC this would be the big advantage of using Sysbox. Does that answer your question? Within a kubernetes pod, without privileged mode, we want to run a transient container for our tests and it seems like this is the most promising option there.

from sysbox.

ctalledo avatar ctalledo commented on August 22, 2024

@ctalledo Great, then IIUC this would be the big advantage of using Sysbox. Does that answer your question? Within a kubernetes pod, without privileged mode, we want to run a transient container for our tests and it seems like this is the most promising option there.

Hi @christopherhesse, yes that's a very common use case: running Docker inside an unprivileged K8s pod; the pod is launched with K8s + Sysbox. This way you get a lot more isolation / security around the pod as opposed to using a privileged one. Hope that helps; if you run into any trouble let us know please. Thanks.
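
For reference, a minimal sketch of such a pod is below; the RuntimeClass name (sysbox-runc), the CRI-O userns annotation, and the image/command are assumptions based on a typical Sysbox-on-K8s install, so adjust them to your cluster:

# Sketch: an unprivileged pod that can run Docker inside, using Sysbox as the pod's runtime.
# Assumes the Sysbox install created a RuntimeClass named "sysbox-runc"; the annotation
# and the systemd-based image/command may vary per setup.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dind-test
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: syscont
    image: nestybox/ubuntu-focal-systemd-docker
    command: ["/sbin/init"]
EOF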

from sysbox.
