
generic-device-plugin's Introduction

Kubernetes Generic Device Plugin

The generic-device-plugin enables allocating generic Linux devices, such as serial devices, the FUSE device, or video cameras, to Kubernetes Pods. This allows devices that don't require special drivers to be advertised to the cluster and scheduled, enabling various use-cases, e.g.:

  • accessing video and sound devices;
  • running IoT applications, which often require access to hardware devices; and
  • mounting FUSE filesystems without requiring privileged containers.


Overview

The generic-device-plugin can be configured to discover and allocate any desired device using the --device flag. For example, to advertise all video devices to the cluster, the following flag could be given:

--device='{"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}]}'

Now, Pods that require a video capture device, such as an object detection service, could request to be allocated one using the Kubernetes Pod resources field:

resources:
  limits:
    squat.ai/video: 1

The --device flag can be provided multiple times to allow the plugin to discover and allocate different types of resources.
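
For example, to advertise both video devices and the FUSE device, the flag could be repeated as follows (a sketch that simply combines the video example above with the FUSE example from the usage text below):

--device='{"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}]}' \
--device='{"name": "fuse", "groups": [{"count": 10, "paths": [{"path": "/dev/fuse"}]}]}'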

Getting Started

To install the generic-device-plugin, choose what devices should be discovered and deploy the included DaemonSet:

kubectl apply -f https://raw.githubusercontent.com/squat/generic-device-plugin/main/manifests/generic-device-plugin.yaml

Note: the example manifest included in this repository discovers serial devices, the /dev/video0 device, the /dev/fuse device, sound devices, and sound capture devices.

Now, deploy a workload that requests one of the newly discovered resources. For example, the following script could be used to run a Pod that creates an MJPEG stream from a video device on the node:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mjpeg
  labels:
    app.kubernetes.io/name: mjpeg
spec:
  containers:
  - name: kceu
    image: squat/kubeconeu2019
    command:
    - /cam2ip
    args:
    - --bind-addr=:8080
    ports:
    - containerPort: 8080
      name: http
    resources:
      limits:
        squat.ai/video: 1
EOF

This application could then be accessed by port-forwarding to the Pod:

kubectl port-forward mjpeg http

Now, the MJPEG stream could be opened by pointing a browser to http://localhost:8080/mjpeg.

Usage

Usage of bin/amd64/generic-device-plugin:
      --config string             Path to the config file.
      --device stringArray        The devices to expose. This flag can be repeated to specify multiple device types.
                                  Multiple paths can be given for each type. Paths can be globs.
                                  Should be provided in the form:
                                  {"name": "<name>", "groups": [(device definitions)], "count": <count>}]}
                                  The device definition can be either a path to a device file or a USB device. You cannot define both in the same group.
                                  For device files, use something like: {"paths": [{"path": "<path-1>", "mountPath": "<mount-path-1>"},{"path": "<path-2>", "mountPath": "<mount-path-2>"}]}
                                  For USB devices, use something like: {"usb": [{"vendor": "1209", "product": "000F"}]}
                                  For example, to expose serial devices with different names: {"name": "serial", "groups": [{"paths": [{"path": "/dev/ttyUSB*"}]}, {"paths": [{"path": "/dev/ttyACM*"}]}]}
                                  The device flag can specify lists of devices that should be grouped and mounted into a container together as one single meta-device.
                                  For example, to allocate and mount an audio capture device: {"name": "capture", "groups": [{"paths": [{"path": "/dev/snd/pcmC0D0c"}, {"path": "/dev/snd/controlC0"}]}]}
                                  For example, to expose a CH340 serial converter: {"name": "ch340", "groups": [{"usb": [{"vendor": "1a86", "product": "7523"}]}]}
                                  A "count" can be specified to allow a discovered device group to be scheduled multiple times.
                                  For example, to permit allocation of the FUSE device 10 times: {"name": "fuse", "groups": [{"count": 10, "paths": [{"path": "/dev/fuse"}]}]}
                                  Note: if omitted, "count" is assumed to be 1
      --domain string             The domain to use when declaring devices. (default "squat.ai")
      --listen string             The address at which to listen for health and metrics. (default ":8080")
      --log-level string          Log level to use. Possible values: all, debug, info, warn, error, none (default "info")
      --plugin-directory string   The directory in which to create plugin sockets. (default "/var/lib/kubelet/device-plugins/")
      --version                   Print version and exit
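
As the DaemonSet manifests in the issues below demonstrate, the value passed to --device can also be written as YAML rather than JSON. A minimal sketch of the video example above expressed that way, as container args:

args:
- --device
- |
  name: video
  groups:
    - paths:
        - path: /dev/video0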

generic-device-plugin's People

Contributors

aledbf, dejanzelic, dependabot[bot], duckfullstop, gabe565, squat, usa-reddragon, vigmat28


generic-device-plugin's Issues

/dev/ttyUSB0 not detected by device plugin

Hello.

I have a 1-wire USB temperature sensor that sets up /dev/ttyUSB0 as a serial port:

root@excession:~# lsusb | grep Prolific
Bus 001 Device 002: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port / Mobile Phone Data Cable

root@excession:~# ls -l /dev/ttyUSB0 
crw-rw---- 1 root dialout 188, 0 Dec  7 08:29 /dev/ttyUSB0

dmesg:

[77172.100651] usb 1-2: new full-speed USB device number 2 using xhci-hcd
[77172.249691] usb 1-2: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00
[77172.249701] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[77172.249705] usb 1-2: Product: USB-Serial Controller
[77172.249708] usb 1-2: Manufacturer: Prolific Technology Inc.
[77172.255622] pl2303 1-2:1.0: pl2303 converter detected
[77172.256440] usb 1-2: pl2303 converter now attached to ttyUSB0

The sensor works fine directly on the host (a Raspberry Pi 5 with k3s) using digitemp; however, the device plugin isn't recognising /dev/ttyUSB0. If I set up the USB vendor/product codes, they get picked up and passed through successfully, but the container requires /dev/ttyUSB0 rather than using the USB device directly.

I've tried with only /dev/ttyUSB0, with /dev/ttyUSB*, and with both USB and /dev/ttyUSB0 entries, and none of them causes the plugin to detect /dev/ttyUSB0.

FWIW the USB device is detected and passed through successfully to the container, but unfortunately I need the serial device instead.

Plugin manifest:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - |
          name: serial
          groups:
            - paths:
                - path: /dev/ttyUSB0
                  mountPath: /dev/ttyUSB0
        - --device
        - |
          name: temperature
          groups:
            - usb:
                - product: 2303
                  vendor: 067b
        - --log-level=debug
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

kubectl describe node:

...snip...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource              Requests    Limits
  --------              --------    ------
  cpu                   250m (6%)   50m (1%)
  memory                150Mi (1%)  190Mi (2%)
  ephemeral-storage     0 (0%)      0 (0%)
  squat.ai/serial       0           0
  squat.ai/temperature  1           1

Pod logs for device plugin:

root@excession:~# kubectl -n kube-system logs generic-device-plugin-4rl4k
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/serial\".","ts":"2023-12-07T10:26:39.925508384Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/temperature\".","ts":"2023-12-07T10:26:39.92548018Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/serial","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvc2VyaWFs-1701944799.sock","ts":"2023-12-07T10:26:39.925755144Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/temperature","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvdGVtcGVyYXR1cmU=-1701944799.sock","ts":"2023-12-07T10:26:39.925851347Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/serial","ts":"2023-12-07T10:26:39.925926033Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/serial","ts":"2023-12-07T10:26:39.92594757Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:39.925969774Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:39.925993218Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:40.023834494Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:40.023891568Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/serial","ts":"2023-12-07T10:26:40.023910124Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/serial","ts":"2023-12-07T10:26:40.023953827Z"}
{"caller":"generic.go:232","level":"info","msg":"starting listwatch","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:40.123648957Z"}
{"caller":"generic.go:232","level":"info","msg":"starting listwatch","resource":"squat.ai/serial","ts":"2023-12-07T10:26:40.124360291Z"}
{"caller":"usb.go:269","level":"debug","msg":"USB device match","path":"/dev/bus/usb/001/002","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:40.323616625Z","usbdevice":"067b:2303"}
{"caller":"usb.go:269","level":"debug","msg":"USB device match","path":"/dev/bus/usb/001/002","resource":"squat.ai/temperature","ts":"2023-12-07T10:26:45.325038652Z","usbdevice":"067b:2303"}
(...repeats...)

Announcement: New Architecture for 0.1.0!

Hi everyone! I want to announce that the project is nearing 0.1.0 and that in preparation for that release, the device plugin will be re-architected into an operator.

As mentioned in #21 (comment), the device plugin will soon have a new Device CRD with which cluster administrators can define devices that should be discovered at runtime. This will allow for faster and easier configuration of device discovery, without requiring the re-deployment of the DaemonSet across the cluster. For the initial 0.1.0, the spec for the Device CRD will remain virtually identical to the schema for defining devices via the CLI.

Stay tuned!
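
Since the announcement says the Device CRD spec will remain virtually identical to the CLI schema, here is a purely hypothetical sketch of what such a resource might look like; the apiVersion, kind, and field layout are guesses, not a published API:

# hypothetical Device resource; not a published API
apiVersion: devices.squat.ai/v1alpha1
kind: Device
metadata:
  name: video
spec:
  groups:
    - paths:
        - path: /dev/video0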

Device manager does not provide devices to the application container after reboot

I am not sure whether this is a real issue or I am doing something wrong, but:

I have prepared my single-node microk8s cluster for Home Assistant, installed this device plugin, and propagated /dev/ttyUSB0 and /dev/zigbee2 (a symlink to the first one) to the "zigbee2mqtt" pod.

After the first installation everything worked well, but after a reboot the "zigbee2mqtt" pod (with /dev/ttyUSB0 and /dev/zigbee2 imported) didn't start. The pod was stuck in the "UnexpectedAdmissionError" state; another pod was created in the "Pending" state, failed, a new one was created, etc.

zmq-5984c5f8cd-fxxhl    0/1     Pending                    0               3m14s
zmq-5984c5f8cd-z7dvd    0/1     UnexpectedAdmissionError   0               9m24s

In the pod description the following error is written (there is no log since the pod didn't start):

Events:
  Type     Reason                    Age   From     Message
  ----     ------                    ----  ----     -------
  Warning  UnexpectedAdmissionError  48s   kubelet  Allocate failed due to no healthy devices present; cannot allocate unhealthy devices squat.ai/serial, which is unexpected

The situation repeats after each reboot.

When I kill all pods manually (the device manager one and also the application pods that don't work), new pods are started and everything works.

Here is the log of the device manager container after the first boot (the situation when it doesn't work, i.e. doesn't mount devices into the application container):

ha-test@zmh-lip:/home/k8s/_system/kube-system$ kcn logs device-plugin-zigbee-4vdbz
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/zigbee\".","ts":"2024-03-07T13:23:42.934752026Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/serial\".","ts":"2024-03-07T13:23:41.735530601Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/serial","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvc2VyaWFs-1709817821.sock","ts":"2024-03-07T13:23:45.835273691Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/zigbee","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvemlnYmVl-1709817821.sock","ts":"2024-03-07T13:23:43.93416597Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/zigbee","ts":"2024-03-07T13:23:57.034351851Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/zigbee","ts":"2024-03-07T13:23:57.034340518Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/serial","ts":"2024-03-07T13:23:58.334925943Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/serial","ts":"2024-03-07T13:23:58.434129443Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/serial","ts":"2024-03-07T13:24:00.434686183Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/zigbee","ts":"2024-03-07T13:24:01.334968608Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/zigbee","ts":"2024-03-07T13:24:01.335104052Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/serial","ts":"2024-03-07T13:24:01.237744071Z"}
ha-test@zmh-lip:/home/k8s/_system/kube-system$

This leads to the state described above (a non-working application container).

Here is the same log from the container after the first one was killed (and re-created by k8s):

ha-test@zmh-lip:/home/k8s/_system/kube-system$ kcn logs device-plugin-zigbee-tkgsv
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/zigbee\".","ts":"2024-03-07T13:26:41.838459105Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/zigbee","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvemlnYmVl-1709818001.sock","ts":"2024-03-07T13:26:41.839476438Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/zigbee","ts":"2024-03-07T13:26:41.840105846Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/zigbee","ts":"2024-03-07T13:26:41.840379364Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/serial\".","ts":"2024-03-07T13:26:41.934053346Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/serial","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvc2VyaWFs-1709818001.sock","ts":"2024-03-07T13:26:41.934398123Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/zigbee","ts":"2024-03-07T13:26:41.936471975Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/zigbee","ts":"2024-03-07T13:26:41.936560327Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/serial","ts":"2024-03-07T13:26:42.034449846Z"}
{"caller":"plugin.go:176","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/serial","ts":"2024-03-07T13:26:42.034767179Z"}
{"caller":"plugin.go:188","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/serial","ts":"2024-03-07T13:26:42.03771179Z"}
{"caller":"plugin.go:226","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/serial","ts":"2024-03-07T13:26:42.037867012Z"}
{"caller":"generic.go:232","level":"info","msg":"starting listwatch","resource":"squat.ai/zigbee","ts":"2024-03-07T13:26:42.53712666Z"}
{"caller":"generic.go:232","level":"info","msg":"starting listwatch","resource":"squat.ai/serial","ts":"2024-03-07T13:26:42.634323382Z"}
ha-test@zmh-lip:/home/k8s/_system/kube-system$

My environment is a Raspberry Pi 4/8GB (arm64), DietPi OS (a variant of Debian), USB drive. The system doesn't show any other issues.
I am not very experienced with k8s devices, so I am not sure what could cause this strange behaviour.

Acquiring access to a USB device from a Pod container

I am running a k3s cluster on a series of ARM64 devices,

# k3s --version
k3s version v1.25.4+k3s1 (0dc63334)
go version go1.19.3

and am running the device plugin as follows,

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --log-level=debug
        - --device
        - |
          name: hut-monitor-device
          groups:
          - usb:
            - vendor: "2341"
              product: "0042"
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

After I launched the test Pod and got into the Pod's container, I found that the device appears to be present in the container,

root@test:/# ls -lha  /dev/bus/usb/001/003 
crw-rw-r-- 1 root root 189, 2 Jan 29 22:22 /dev/bus/usb/001/003
root@test:/# ls -lha /dev/
total 4.0K
drwxr-xr-x 6 root root  380 Jan 29 22:22 .
drwxr-xr-x 1 root root 4.0K Jan 29 22:22 ..
drwxr-xr-x 3 root root   60 Jan 29 22:22 bus
lrwxrwxrwx 1 root root   11 Jan 29 22:22 core -> /proc/kcore
lrwxrwxrwx 1 root root   13 Jan 29 22:22 fd -> /proc/self/fd
crw-rw-rw- 1 root root 1, 7 Jan 29 22:22 full
drwxrwxrwt 2 root root   40 Jan 29 22:22 mqueue
crw-rw-rw- 1 root root 1, 3 Jan 29 22:22 null
lrwxrwxrwx 1 root root    8 Jan 29 22:22 ptmx -> pts/ptmx
drwxr-xr-x 2 root root    0 Jan 29 22:22 pts
crw-rw-rw- 1 root root 1, 8 Jan 29 22:22 random
drwxrwxrwt 2 root root   40 Jan 29 22:22 shm
lrwxrwxrwx 1 root root   15 Jan 29 22:22 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root   15 Jan 29 22:22 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root   15 Jan 29 22:22 stdout -> /proc/self/fd/1
-rw-rw-rw- 1 root root    0 Jan 29 22:22 termination-log
crw-rw-rw- 1 root root 5, 0 Jan 29 22:22 tty
crw-rw-rw- 1 root root 1, 9 Jan 29 22:22 urandom
crw-rw-rw- 1 root root 1, 5 Jan 29 22:22 zero

Plus, the node seems to have allocated the device,

...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                     Requests     Limits
  --------                     --------     ------
  cpu                          1300m (32%)  50m (1%)
  memory                       630Mi (16%)  1364Mi (35%)
  ephemeral-storage            0 (0%)       0 (0%)
  squat.ai/audio               0            0
  squat.ai/capture             0            0
  squat.ai/fuse                0            0
  squat.ai/hut-monitor-device  1            1
  squat.ai/serial              0            0
  squat.ai/video               0            0

The problem is that I can't read from the serial device inside the container:

>>> from serial import Serial
>>> ser = Serial("/dev/bus/usb/001/003")
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/serial/serialposix.py", line 398, in _reconfigure_port
    orig_attr = termios.tcgetattr(self.fd)
termios.error: (25, 'Inappropriate ioctl for device')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/serial/serialutil.py", line 244, in __init__
    self.open()
  File "/usr/local/lib/python3.8/dist-packages/serial/serialposix.py", line 332, in open
    self._reconfigure_port(force_update=True)
  File "/usr/local/lib/python3.8/dist-packages/serial/serialposix.py", line 401, in _reconfigure_port
    raise SerialException("Could not configure port: {}".format(msg))
serial.serialutil.SerialException: Could not configure port: (25, 'Inappropriate ioctl for device')

I think what I should expect is something like /dev/ttyACM0 for the serial device to access. Is there anything I am missing about reading serial output from the device? How do I create this /dev/ttyACM0-like symlink inside the container? Does this device plugin support creating a symlink for a serial device in the configuration?
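
For what it's worth, the usage text above documents a mountPath field for path-based device groups. The following is only an illustrative sketch, assuming the host actually creates a /dev/ttyACM* node for this adapter, of a --device value that would present the device at a fixed path inside the container:

- --device
- |
  name: hut-monitor-device
  groups:
    - paths:
        # illustrative: assumes the host exposes a ttyACM node for this adapter
        - path: /dev/ttyACM*
          mountPath: /dev/ttyACM0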

Any way to do rolling restarts?

The plugin is working great with a USB Zigbee stick on /dev/ttyUSB0 and Home Assistant as a Deployment, thanks! Is there any way I can roll the Deployment without deleting and re-applying it? E.g., can I make this work:

kubectl rollout restart -n home-assistant deployment home-assistant

At the moment kubectl describe pod gives:

Warning  FailedScheduling  6m21s (x13 over 66m)  default-scheduler  0/1 nodes are available: 1 Insufficient squat.ai/serial. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

I'm limiting the Deployment like this:

        resources:
          limits:
            squat.ai/serial: 1

I'm not fully sure what the number means; I just copy-pasted the examples, but it works.
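
For reference, the usage text above documents a per-group "count" that allows one discovered device group to be allocated more than once. A sketch, assuming it is acceptable for the old and new pod to briefly share the serial device during a rollout (the count of 2 and the path are illustrative):

--device='{"name": "serial", "groups": [{"count": 2, "paths": [{"path": "/dev/ttyUSB0"}]}]}'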

panic: send on closed channel

There's a panic happening when one of the latter two functions in the run group returns.

{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"squat.ai/device\".","ts":"2023-08-16T21:15:42.090254018Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"squat.ai/device","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvd2F0Y2hkb2c=-1692220542.sock","ts":"2023-08-16T21:15:42.090439622Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"squat.ai/device","ts":"2023-08-16T21:15:42.090637276Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/device","ts":"2023-08-16T21:15:42.090677679Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/device","ts":"2023-08-16T21:15:42.091178516Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/device","ts":"2023-08-16T21:15:42.091192467Z"}
{"caller":"generic.go:225","level":"info","msg":"starting listwatch","resource":"squat.ai/device","ts":"2023-08-16T21:15:42.187816593Z"}
panic: send on closed channel

goroutine 22 [running]:
github.com/squat/generic-device-plugin/deviceplugin.(*plugin).serve.func1()
        /src/deviceplugin/plugin.go:123 +0x12b
created by github.com/squat/generic-device-plugin/deviceplugin.(*plugin).serve
        /src/deviceplugin/plugin.go:121 +0x378

This panic masks the logging of the actual error returned by the returning function in the run group and prevents the device-plugin from retrying. I've seen hundreds of restarts from such panics over a few days.

"fusermount3: mount failed: Operation not permitted" in unprivileged pod

Hi squat,

Thanks for making this plugin. It seems to be exactly what I need: mounting a remote filesystem over sshfs without using privileged/SYS_ADMIN pods. Unfortunately I cannot get it working. I have installed the DaemonSet using the example YAML from the README, but I get the error in the title when I actually try to use sshfs.

When I grant a pod the SYS_ADMIN capability it works as expected, but I am trying to get it working without that capability. Unfortunately, examples of how to use this plugin for unprivileged FUSE mounts are a bit scarce.

YAML for the pod:

apiVersion: v1
kind: Pod
metadata:
  name: sshfs-test-pod
  namespace: sshfstest  
  labels:
    app: sshfs-test
spec:
  nodeName: jackalope
  containers:
    - name: sshfs-test
      image: mcr.microsoft.com/dotnet/sdk:7.0.203
      command:
        - sh
      args:
        - '-c'
        - while true; do sleep 2; done
      resources: 
        limits:
          squat.ai/fuse: 1
      # securityContext:
      #   capabilities:
      #     add: ["SYS_ADMIN"]
  restartPolicy: Never

In the pod I execute:

apt update
apt install sshfs -y
mkdir /mnt/sshfs
sshfs <sshtestuser>@<domain>:<path/to/shared/folders> /mnt/sshfs/

The exact same commands work when the pod has SYS_ADMIN capabilities so I am sure there is no issue on the remote server regarding rights/firewalling etc.

Please let me know if you need additional info.

Latest release regressed on matching USB devices

I think the USB detection may have regressed a bit with the recent changes to the USB serial logic. I am seeing an issue on the latest version that was not seen on the one just prior. I can reliably confirm this by pinning the version to one or the other.

Here are my relevant args on the CLI for the DaemonSet pods:

        - '--device'
        - |
          name: coraltpu
          groups:
            - usb:
                - product: 089a
                  vendor: 1a6e
            - usb:
                - product: 9302
                  vendor: 18d1

Question: delay getting access to devices

I see an unexpected behavior (permission denied) when I try to access a device immediately after starting a pod.
Waiting for ~2s before trying to access the device solves the issue.
Do you know if this is expected? Or maybe I need to do something differently?
Thanks

Image platform tag is incorrect for arm and arm64 images

The os/arch label is wrong on the Docker images for both arm and arm64. Docker Hub shows linux/amd64 for all 3 different digests in the latest tag.


Running the image through Docker on an arm64 machine to verify, I get this warning:

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Error on missing device

Hello, I wanted to use generic-device-plugin to schedule pods which need HW-accelerated video (Intel Quick Sync).

This is my setup:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"
      containers:
        - image: ghcr.io/squat/generic-device-plugin
          args:
            - --domain
            - generic-device
            - --device
            - |
              name: serial
              groups:
                - paths:
                    - path: /dev/ttyUSB*
                - paths:
                    - path: /dev/ttyACM*
                - paths:
                    - path: /dev/tty.usb*
                - paths:
                    - path: /dev/cu.*
                - paths:
                    - path: /dev/cuaU*
                - paths:
                    - path: /dev/rfcomm*
            - --device
            - |
              name: video
              groups:
                - paths:
                    - path: /dev/video0
            - --device
            - |
              name: dri
              groups:
                - count: 10
                  paths:
                    - path: /dev/dri/renderD128
                    - path: /dev/dri/card0
            - --device
            - |
              name: fuse
              groups:
                - count: 10
                  paths:
                    - path: /dev/fuse
            - --device
            - |
              name: audio
              groups:
                - count: 10
                  paths:
                    - path: /dev/snd
            - --device
            - |
              name: capture
              groups:
                - paths:
                    - path: /dev/snd/controlC0
                    - path: /dev/snd/pcmC0D0c
                - paths:
                    - path: /dev/snd/controlC1
                      mountPath: /dev/snd/controlC0
                    - path: /dev/snd/pcmC1D0c
                      mountPath: /dev/snd/pcmC0D0c
                - paths:
                    - path: /dev/snd/controlC2
                      mountPath: /dev/snd/controlC0
                    - path: /dev/snd/pcmC2D0c
                      mountPath: /dev/snd/pcmC0D0c
                - paths:
                    - path: /dev/snd/controlC3
                      mountPath: /dev/snd/controlC0
                    - path: /dev/snd/pcmC3D0c
                      mountPath: /dev/snd/pcmC0D0c
          name: generic-device-plugin
          resources:
            requests:
              cpu: 50m
              memory: 10Mi
          ports:
            - containerPort: 8080
              name: http
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: dev
              mountPath: /dev
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: dev
          hostPath:
            path: /dev
  updateStrategy:
    type: RollingUpdate

As you can see, I request /dev/dri/renderD128 as well as /dev/dri/card0.

But not all nodes actually have a /dev/dri/renderD128, so these pods crash:

{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/capture\".","ts":"2023-07-30T17:48:36.817420355Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/video\".","ts":"2023-07-30T17:48:36.817529452Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/serial\".","ts":"2023-07-30T17:48:36.817756832Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/fuse\".","ts":"2023-07-30T17:48:36.81795204Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/fuse","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2UvZnVzZQ==-1690739316.sock","ts":"2023-07-30T17:48:36.81810015Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/serial","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2Uvc2VyaWFs-1690739316.sock","ts":"2023-07-30T17:48:36.818275051Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/dri\".","ts":"2023-07-30T17:48:36.818504645Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/dri","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2UvZHJp-1690739316.sock","ts":"2023-07-30T17:48:36.818620283Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/fuse","ts":"2023-07-30T17:48:36.818764636Z"}
{"caller":"main.go:218","msg":"Starting the generic-device-plugin for \"generic-device/audio\".","ts":"2023-07-30T17:48:36.818575279Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/audio","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2UvYXVkaW8=-1690739316.sock","ts":"2023-07-30T17:48:36.819089851Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/video","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2UvdmlkZW8=-1690739316.sock","ts":"2023-07-30T17:48:36.818050307Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/dri","ts":"2023-07-30T17:48:36.81943341Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/dri","ts":"2023-07-30T17:48:36.819574657Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/audio","ts":"2023-07-30T17:48:36.819797719Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/serial","ts":"2023-07-30T17:48:36.818663114Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/fuse","ts":"2023-07-30T17:48:36.818698491Z"}
{"caller":"plugin.go:114","level":"info","msg":"listening on Unix socket","resource":"generic-device/capture","socket":"/var/lib/kubelet/device-plugins/gdp-Z2VuZXJpYy1kZXZpY2UvY2FwdHVyZQ==-1690739316.sock","ts":"2023-07-30T17:48:36.817984061Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/video","ts":"2023-07-30T17:48:36.819858333Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/video","ts":"2023-07-30T17:48:36.819881907Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/serial","ts":"2023-07-30T17:48:36.818592661Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"generic-device/dri","ts":"2023-07-30T17:48:36.820697529Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"generic-device/dri","ts":"2023-07-30T17:48:36.820886697Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/audio","ts":"2023-07-30T17:48:36.821122172Z"}
{"caller":"plugin.go:122","level":"info","msg":"starting gRPC server","resource":"generic-device/capture","ts":"2023-07-30T17:48:36.821579135Z"}
{"caller":"plugin.go:174","level":"info","msg":"waiting for the gRPC server to be ready","resource":"generic-device/capture","ts":"2023-07-30T17:48:36.821965034Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"generic-device/serial","ts":"2023-07-30T17:48:36.822779514Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"generic-device/serial","ts":"2023-07-30T17:48:36.822833887Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"generic-device/audio","ts":"2023-07-30T17:48:36.82299425Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"generic-device/audio","ts":"2023-07-30T17:48:36.823032272Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"generic-device/fuse","ts":"2023-07-30T17:48:36.823755299Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"generic-device/fuse","ts":"2023-07-30T17:48:36.826360352Z"}
{"caller":"plugin.go:186","level":"info","msg":"the gRPC server is ready","resource":"generic-device/video","ts":"2023-07-30T17:48:36.829967489Z"}
{"caller":"plugin.go:224","level":"info","msg":"registering plugin with kubelet","resource":"generic-device/video","ts":"2023-07-30T17:48:36.830163349Z"}
{"caller":"generic.go:225","level":"info","msg":"starting listwatch","resource":"generic-device/audio","ts":"2023-07-30T17:48:36.830563465Z"}
{"caller":"generic.go:225","level":"info","msg":"starting listwatch","resource":"generic-device/fuse","ts":"2023-07-30T17:48:36.831076755Z"}
{"caller":"generic.go:225","level":"info","msg":"starting listwatch","resource":"generic-device/dri","ts":"2023-07-30T17:48:36.831287243Z"}
panic: runtime error: index out of range [0] with length 0

goroutine 144 [running]:
github.com/squat/generic-device-plugin/deviceplugin.(*GenericPlugin).discoverPath(0x0?)
	/src/deviceplugin/path.go:108 +0x9fc
github.com/squat/generic-device-plugin/deviceplugin.(*GenericPlugin).discover(0xc0002209b0)
	/src/deviceplugin/generic.go:130 +0x25
github.com/squat/generic-device-plugin/deviceplugin.(*GenericPlugin).refreshDevices(0xc0002209b0)
	/src/deviceplugin/generic.go:151 +0x55
github.com/squat/generic-device-plugin/deviceplugin.(*GenericPlugin).ListAndWatch(0xc0002209b0, 0xb43de0?, {0xc873d0, 0xc0003b71b0})
	/src/deviceplugin/generic.go:226 +0xfc
k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1._DevicePlugin_ListAndWatch_Handler({0xb2f780?, 0xc0002209b0}, {0xc86220, 0xc0000e89a0})
	/go/pkg/mod/k8s.io/[email protected]/pkg/apis/deviceplugin/v1beta1/api.pb.go:1424 +0xd0
google.golang.org/grpc.(*Server).processStreamingRPC(0xc00029a5a0, {0xc87cb8, 0xc0002fcd00}, 0xc000292900, 0xc00007c960, 0x10bc180, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1620 +0x11e7
google.golang.org/grpc.(*Server).handleStream(0xc00029a5a0, {0xc87cb8, 0xc0002fcd00}, 0xc000292900, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1708 +0x9ea
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/[email protected]/server.go:963 +0x28a

I kinda expected the plugin to just ignore these then, but apparently not?

Expected result:

  • When one device in a group doesn't exist
    • skip the group

This also sparked another problem: I cannot distinguish between cards or the features they support. A more powerful card might be able to do more and offer certain feature sets like encoders.
In fact, I actually only have one node with Intel Quick Sync, but another node also has the /dev/dri/renderD128 device (one of the Oracle ARM systems), and that one just doesn't work at all. I don't think the plugin could detect that.

We could, when discovering special devices, inspect them more deeply, discover their features, and then make those requestable.

E.g. generic-device/dri-VAProfileH264High to show we discovered the VAProfileH264High encoding profile.

USB discovery error

Hi,
I'm running the device-plugin and I'm trying to discover /dev/fuse, but the device is not found and in the kubelet log I see these errors:

May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.807900   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"device-plugin\" (UniqueName: \"kubernetes.io/host-path/64159588-4463-452f-941e-7f11d39411f4-device-plugin\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.808061   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-5lhdq\" (UniqueName: \"kubernetes.io/projected/64159588-4463-452f-941e-7f11d39411f4-kube-api-access-5lhdq\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.808211   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"dev\" (UniqueName: \"kubernetes.io/host-path/64159588-4463-452f-941e-7f11d39411f4-dev\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:49 xxxx kubelet[14485]: I0510 11:46:49.019886   14485 manager.go:422] "Got registration request from device plugin with resource" resourceName="squat.ai/fuse"
May 10 11:46:49 xxxx kubelet[14485]: E0510 11:46:49.118835   14485 endpoint.go:107] "listAndWatch ended unexpectedly for device plugin" err="rpc error: code = Unknown desc = failed to refresh devices: failed to discover usb devices: open /sys/bus/usb/devices/: no such file or directory" resourceName="squat.ai/fuse"

On the worker node the directory /sys/bus/usb/devices/ doesn't exist:

xxxx@xxxx:/$ ls -la /sys/bus/
total 0
drwxr-xr-x 30 root root 0 May 10 09:55 .
dr-xr-xr-x 13 root root 0 May 10 09:55 ..
drwxr-xr-x  4 root root 0 May 10 09:55 acpi
drwxr-xr-x  4 root root 0 May 10 09:55 cec
drwxr-xr-x  4 root root 0 May 10 09:55 clockevents
drwxr-xr-x  4 root root 0 May 10 09:55 clocksource
drwxr-xr-x  4 root root 0 May 10 09:55 container
drwxr-xr-x  4 root root 0 May 10 09:55 cpu
drwxr-xr-x  4 root root 0 May 10 09:55 dax
drwxr-xr-x  4 root root 0 May 10 09:55 edac
drwxr-xr-x  4 root root 0 May 10 09:55 event_source
drwxr-xr-x  4 root root 0 May 10 09:55 gpio
drwxr-xr-x  4 root root 0 May 10 09:55 i2c
drwxr-xr-x  4 root root 0 May 10 09:55 machinecheck
drwxr-xr-x  4 root root 0 May 10 09:55 memory
drwxr-xr-x  4 root root 0 May 10 09:55 mipi-dsi
drwxr-xr-x  4 root root 0 May 10 09:55 node
drwxr-xr-x  4 root root 0 May 10 09:55 nvmem
drwxr-xr-x  5 root root 0 May 10 09:55 pci
drwxr-xr-x  4 root root 0 May 10 09:55 pci_express
drwxr-xr-x  4 root root 0 May 10 09:55 platform
drwxr-xr-x  4 root root 0 May 10 09:55 pnp
drwxr-xr-x  4 root root 0 May 10 09:55 rbd
drwxr-xr-x  4 root root 0 May 10 09:55 scsi
drwxr-xr-x  4 root root 0 May 10 09:55 serial
drwxr-xr-x  4 root root 0 May 10 09:55 serio
drwxr-xr-x  4 root root 0 May 10 09:55 spi
drwxr-xr-x  4 root root 0 May 10 09:55 workqueue
drwxr-xr-x  4 root root 0 May 10 09:55 xen
drwxr-xr-x  4 root root 0 May 10 09:55 xen-backend

This is the yaml file that I applied:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: default
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - |
          name: fuse
          groups:
            - count: 10
              paths:
                - path: /dev/fuse
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 10Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

In the source code I found this:

pathEntry, err := os.Stat(absolutePath)

How about checking if the folder exists instead of just throwing the error if it doesn't?

Cannot mount fuse in minikube: fusermount: fuse device not found

Hi,
Thanks for making this plug-in.
I want to use rclone to mount several cloud storage (s3, Google Drive, etc.) in a Pod in k8s.
To test it I'm using Minikube where I installed the plugin using:

kubectl apply -f https://raw.githubusercontent.com/squat/generic-device-plugin/main/manifests/generic-device-plugin.yaml

Then I created a pod to test the mount:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: ["/bin/sh", "-c", "apt-get update && apt-get install -y rclone fuse && mkdir /mnt/my_remote_bucket && tail -f /dev/null"]
    resources:
      limits:
        squat.ai/fuse: 1
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
    # mount the rclone config file
    volumeMounts:
    - name: rclone-config
      mountPath: /root/.config/rclone/rclone.conf
      subPath: rclone.conf
  volumes:
  - name: rclone-config
    configMap:
      name: rclone-config
      items:
      - key: rclone.conf
        path: rclone.conf

As soon as the pod starts I run in the pod:

rclone mount s3:bucket /mnt/my_remote_bucket --no-check-certificate --allow-other --allow-non-empty --vfs-cache-mode writes

The command returns:

2023/11/15 19:43:08 mount helper error: fusermount: fuse device not found, try 'modprobe fuse' first
2023/11/15 19:43:08 Fatal error: failed to mount FUSE fs: fusermount: exit status 1

Any idea why I may get this error?

UnexpectedAdmissionError occurs after node restart

Hi!

I use generic-device-plugin to expose a Zigbee USB stick to Zigbee2MQTT:

  - --device
  - '{"name": "zigbee", "groups": [{"paths": [{"path": "/dev/ttyACM0"}]}]}'

It works, and I'm able to match the node with squat.ai/zigbee: 1 in a Deployment with replicas: 1.

However, if the node (it runs as both K8s controller and K8s worker) restarts, I start seeing many instances of that pod in the UnexpectedAdmissionError state, despite it being run as a Deployment with replicas: 1.


kubectl describe pod gives this:

Reason:           UnexpectedAdmissionError
Message:          Pod was rejected: Allocate failed due to no healthy devices present; cannot allocate unhealthy devices squat.ai/zigbee, which is unexpected

My understanding is that at the time K8s tries to run my Deployment, the generic-device-plugin hasn't started yet, and when it does there is some race condition: K8s tries to spin up many pods with access to the same device, only one pod succeeds, and the others fail with UnexpectedAdmissionError.

I wonder if there is any solution to this?

There are no versioned releases

There are no versioned releases in the project; only the latest tag is published, which changes every night as the CI pushes new builds. This can result in version differences between pod instances in the DaemonSet when new nodes are added. Perhaps creating tags when new code releases happen and publishing those tags as Docker image tags would be a solution.

Device Synchronization Lag in Pod Readiness

Hello Squat,

We're experiencing a problem where the generic device plugin pod becomes ready before the devices on our host are actually ready. This leads to complications for another pod relying on this device plugin, as it can't start due to unmet requirements. Currently, we're resolving this by restarting the generic device plugin pod, which appears to solve the problem.

Is there a method to configure the generic device plugin so that it only reaches a ready state after the host devices are fully prepared?

Execution failed: failed to parse device

Dear developers,

the following device flag value, taken from https://github.com/squat/generic-device-plugin/#overview, is failing on my Ubuntu 22.04.1 system:

./bin/amd64/generic-device-plugin --device {"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}]}
Execution failed: failed to parse device "{name:": error converting YAML to JSON: yaml: line 1: did not find expected node content

while the version with single quotes works fine:
./bin/amd64/generic-device-plugin --device '{"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}]}'
Please fix if relevant.

Best regards, P. Mandrik

Push images to Github Container Registry

Docker Hub continues to do weird things to pull limits, license requirements, etc, so building and pushing containers to GHCR seems like a good idea.

I'm not 100% sure how we'd want to implement this - usually I just spin up another workflow task with instructions to docker push to ghcr.io, but as we're making heavy use of the Makefile (which isn't a native tongue by any stretch) I figured this is better as an issue rather than a draft PR.
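
For reference, a minimal sketch of the kind of GitHub Actions steps this could involve, assuming the stock docker/login-action; the step names and image tag are illustrative and the sketch ignores the project's Makefile-driven build:

# hypothetical workflow steps, not the project's actual CI
- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
- name: Push image
  run: docker push ghcr.io/squat/generic-device-plugin:latest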

Plugin doesn't mount my USB devices inside the pod

Hello :)

I have the following problem: I want to access a USB camera inside my minikube pod.
My env: Ubuntu 22.04, Minikube v1.32.0, Kubernetes v1.29.0

I downloaded generic-device-plugin.yaml to my computer and ran kubectl apply -f=generic-device-plugin.yaml. My node configuration after this looks like this:

Name:               minikube
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube
                    kubernetes.io/os=linux
                    minikube.k8s.io/commit=8220a6eb95f0a4d75f7f2d7b14cef975f050512d
                    minikube.k8s.io/name=minikube
                    minikube.k8s.io/primary=true
                    minikube.k8s.io/updated_at=2024_04_04T11_09_57_0700
                    minikube.k8s.io/version=v1.32.0
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 04 Apr 2024 11:09:55 +0300
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  minikube
  AcquireTime:     <unset>
  RenewTime:       Mon, 08 Apr 2024 15:35:00 +0300
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 08 Apr 2024 15:35:00 +0300   Thu, 04 Apr 2024 11:09:54 +0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 08 Apr 2024 15:35:00 +0300   Thu, 04 Apr 2024 11:09:54 +0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 08 Apr 2024 15:35:00 +0300   Thu, 04 Apr 2024 11:09:54 +0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 08 Apr 2024 15:35:00 +0300   Fri, 05 Apr 2024 18:49:09 +0300   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.49.2
  Hostname:    minikube
Capacity:
  cpu:                12
  ephemeral-storage:  59536404Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16100168Ki
  pods:               110
  squat.ai/audio:     10
  squat.ai/capture:   1
  squat.ai/fuse:      10
  squat.ai/serial:    0
  squat.ai/video:     1
Allocatable:
  cpu:                12
  ephemeral-storage:  59536404Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16100168Ki
  pods:               110
  squat.ai/audio:     10
  squat.ai/capture:   1
  squat.ai/fuse:      10
  squat.ai/serial:    0
  squat.ai/video:     1
System Info:
  Machine ID:                 cab3c46fd9344d3d98dd8dcafedee792
  System UUID:                b288c57c-b602-4e49-adf7-c88f90a0c79a
  Boot ID:                    009a4e8a-d144-4dcc-9bd8-a91534acbc50
  Kernel Version:             6.5.0-26-generic
  OS Image:                   Ubuntu 22.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://24.0.7
  Kubelet Version:            v1.29.0
  Kube-Proxy Version:         v1.29.0
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (17 in total)
  Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  default                     udp-server-89b74f87d-h9m8q                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  default                     udp-server-deployment-5c8d47786c-c4hjz        0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  default                     udp-server-deployment-c4476f65b-hpgqb         0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h
  default                     udp-stream-deployment-b88d78b44-ldbch         0 (0%)        0 (0%)      0 (0%)           0 (0%)         16h
  kafka                       kafka-broker-94d6ff58d-8xpkp                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d23h
  kafka                       zookeeper-cd79b98b6-hrqkq                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d23h
  kube-system                 coredns-76f75df574-8ld6b                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     4d4h
  kube-system                 etcd-minikube                                 100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         4d4h
  kube-system                 generic-device-plugin-p8ztd                   50m (0%)      50m (0%)    10Mi (0%)        20Mi (0%)      6s
  kube-system                 kube-apiserver-minikube                       250m (2%)     0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system                 kube-controller-manager-minikube              200m (1%)     0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system                 kube-proxy-t5n4r                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system                 kube-scheduler-minikube                       100m (0%)     0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system                 metrics-server-7c66d45ddc-6mknq               100m (0%)     0 (0%)      200Mi (1%)       0 (0%)         3d23h
  kube-system                 storage-provisioner                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d4h
  kubernetes-dashboard        dashboard-metrics-scraper-7fd5cb4ddc-dwqkw    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d23h
  kubernetes-dashboard        kubernetes-dashboard-8694d4445c-klqrq         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d23h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                900m (7%)   50m (0%)
  memory             380Mi (2%)  190Mi (1%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  squat.ai/audio     0           0
  squat.ai/capture   0           0
  squat.ai/fuse      0           0
  squat.ai/serial    0           0
  squat.ai/video     0           0
Events:              <none>

And squat.ai/serial: 0 looks bad. I tried adding -paths: - path: /dev/bus/usb/*/*/* to the serial group. After this I get squat.ai/serial: 10, but my camera doesn't show up inside the pod.

As a next step I tried connecting a simple webcam and adding -paths: - path: /dev/video* to the video group. That doesn't work either.
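For reference, a glob is normally given to the plugin as a complete --device flag on its DaemonSet rather than as a bare paths entry. A minimal sketch for the video case, assuming /dev/video* matches the webcam on the node and that the plugin container mounts /dev:

        args:
        - --device
        - '{"name": "video", "groups": [{"paths": [{"path": "/dev/video*"}]}]}'

If the glob matches at least one device file on the node, the resource is advertised as squat.ai/video in the node's capacity and can be requested by a Pod with squat.ai/video: 1.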

Keeps crashing on a MacchiatoBin SBC

Hello!
I'm using this in 2 different k3s clusters and I have MacchiatoBin SBCs (along with Rockpro64s and Pi-4s) in both. The MacchiatoBins keep crashing every few minutes (CrashLoopBackOff). While they are running they do seem to work, as they will schedule workloads on the right node. The logs on the pod do not seem to indicate what is wrong; they just look like start-up messages.

Logs

Mon, Mar 27 2023 8:40:03 am | {"caller":"main.go:227","msg":"Starting the generic-device-plugin for \"squat.ai/adsb\".","ts":"2023-03-27T14:40:03.571928222Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"plugin.go:116","level":"info","msg":"listening on Unix socket","resource":"squat.ai/adsb","socket":"/var/lib/kubelet/device-plugins/gdp-c3F1YXQuYWkvYWRzYg==-1679928003.sock","ts":"2023-03-27T14:40:03.572079912Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"plugin.go:123","level":"info","msg":"starting gRPC server","resource":"squat.ai/adsb","ts":"2023-03-27T14:40:03.572447257Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"plugin.go:138","level":"info","msg":"waiting for the gRPC server to be ready","resource":"squat.ai/adsb","ts":"2023-03-27T14:40:03.572464418Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"plugin.go:150","level":"info","msg":"the gRPC server is ready","resource":"squat.ai/adsb","ts":"2023-03-27T14:40:03.573457284Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"plugin.go:188","level":"info","msg":"registering plugin with kubelet","resource":"squat.ai/adsb","ts":"2023-03-27T14:40:03.573553451Z"}
Mon, Mar 27 2023 8:40:03 am | {"caller":"generic.go:215","level":"info","msg":"starting listwatch","resource":"squat.ai/adsb","ts":"2023-03-27T14:40:03.771650039Z"}

Architecture

mach-1:~$ cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

mach-1:~$ uname -a
Linux mach-1 5.1.0-trunk-arm64 #1 SMP Debian 5.1.10-1~exp1sr2 (2019-06-18) aarch64 GNU/Linux

Sorry for providing so little information; there just doesn't seem to be much. Let me know if you need anything else.
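When the container logs only show start-up messages, the termination reason recorded by the kubelet is often more telling. A minimal check (the pod name here is a placeholder):

kubectl -n kube-system get pod generic-device-plugin-xxxxx \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

A result of OOMKilled would point at the container's memory limit rather than at the plugin itself.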

how to release device during pod running

Hi Lucas,

the generic-device-plugin can be perfectly used to limit the number of running pods on a specific node: just create a dummy device or a group with a certain count, add a device requirement to the pod, and you're done (see the sketch after this message).
very nice! very useful!
Now my question: how can I release a device while the pod is running, e.g. after the pod becomes ready?
Why: Java, Jakarta EE, JEE, and Spring applications need a huge amount of CPU during startup but often not during runtime. So when a node restarts with many Java applications, they will not come up.
When you do not define a CPU request, the Java pods do not come up, because no pod gets enough resources and they hit the startup timeout. When you define the CPU request high, you waste a lot of CPU resources when all pods are running.
So it could be useful to somehow serialize the startup of the Java applications.
How: I know I can watch the pods and notice when a pod becomes ready (liveness probe), but how would I release the dummy device at that point, so that the next Java application can start up?
Other pods would not be affected, because they do not have a dummy device request.
Thanks for your feedback!
KR Robert
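For reference, the dummy-device pattern described above can be sketched like this; the resource name startup-slot, the count of 1, and /dev/null as a placeholder path are all illustrative choices rather than plugin defaults:

        - --device
        - '{"name": "startup-slot", "groups": [{"count": 1, "paths": [{"path": "/dev/null"}]}]}'

and on each Java Pod:

        resources:
          limits:
            squat.ai/startup-slot: 1

With a count of 1, only one such Pod fits on a node at a time; the others stay Pending, and the allocation is only freed when the consuming container terminates.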

Namespace-specific Access Control for Low-Level Devices

Hi squat!

I am writing to propose a new feature related to access control for low-level devices. It aims to improve security when managing device access across different namespaces in a Kubernetes environment.

Feature Description

The primary goal is to allow cluster administrators to define which namespaces have access to specific low-level devices. This could be as simple as a list or regex pattern of namespaces, or perhaps there's a way to use Kubernetes RBAC and your new CRD approach.
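A purely hypothetical sketch of what such a restriction could look like in the device configuration; the allowedNamespaces field does not exist in the plugin today and only illustrates the proposed allowlist:

        args:
        - --device
        # hypothetical: "allowedNamespaces" is not an existing field of the plugin
        - '{"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}], "allowedNamespaces": ["media", "vision-*"]}'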

Use Case

We're working in a multi-tenant Kubernetes environment and we'd like to make sure only specific pods have access to hardware resources.

Thank you for considering this proposal.

Evict pod if the device is removed

I hope that I'm just missing something simple here. I have configured a USB device in the generic-device-plugin, and by setting resource limits I'm able to ensure that a certain pod will only be scheduled on nodeA, which has that USB device plugged in. So far: AWESOME!

I can unplug the USB device from nodeA and plug it into nodeB, and each node's .status.capacity and .status.allocatable are updated to reflect which node has the device. PERFECT!

The problem that I have is that if the pod has already been scheduled and is running before I move the USB to nodeB, the pod will remain on the node which no longer has the device available. I was hoping that the scheduler would recognize that the node no longer has the resources to support the pod, evict it, and eventually reschedule it on nodeB once it's available. But this doesn't happen according to my testing.

I've thought of a few possible workarounds (involving labels and affinity rules), but I wanted to see if there are any existing ideas or solutions.
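One of the label-and-affinity workarounds mentioned above could look roughly like this, assuming some external process labels whichever node currently has the device with the hypothetical label usb-device-present=true:

      # Pod spec fragment: schedule only onto the node carrying the label.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: usb-device-present
                operator: In
                values:
                - "true"

Node affinity alone will not evict a running Pod when the label disappears; actual eviction would still need something like a NoExecute taint on the node that lost the device, or the descheduler's RemovePodsViolatingNodeAffinity policy.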

Investigate Increased Memory Usage

Hola!

Thanks for making this, it's super useful!

I'm running this on a 4-node cluster (3x 2GB Pi 4b and 1x 8GB Pi 4b) to expose the sound device and a USB device. I was having a weird issue where everything worked great on 2 of my nodes, but I could only get it to work on the other 2 nodes once (with the default DaemonSet config). The 2 nodes that didn't work would never show any logs.

The issue started when I was trying to use the new USB device feature, but even when I went back to the default config, I still had the same issue.

I read the issue here: #11 and decided to also try upping the memory limit. As soon as I did that, everything worked!

So it does sound like the new build uses more memory. Here is my current config that's working:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - '{"name": "audio", "groups": [{"count": 10, "paths": [{"path": "/dev/snd"}]}]}'
        - --device
        - '{"name": "zwave", "groups": [{"usb": [{"vendor": "0658", "product": "0200"}]}]}'
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 20Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

While I'm here, I figured I should give some additional feedback. On the nodes without the USB device, I get this log message constantly:

{"caller":"usb.go:245","level":"info","msg":"no USB devices found attached to system","resource":"squat.ai/zwave","ts":"2023-04-05T01:31:47.627269533Z"}

It's not a problem, but I don't think this should be an "info"-level message.

Generic Device Plugin vs KubeEdge

Hi, I'm looking to use a Bluetooth USB dongle with my Kubernetes cluster to connect to another Bluetooth-enabled sensor.

I'm trying to understand the difference between this Generic Device Plugin and the device implementation in KubeEdge.

I'm curious how you all differentiate between the two. What are the tradeoffs between this system and KubeEdge? Is there one that is more secure/robust?

Thanks in advance.
