Comments (6)

przemeklal commented on August 29, 2024

Hey @pperiyasamy, if you see the pod being scheduled despite missing resources, then most likely the resources weren't injected into the pod spec. Could you please provide the details below so that we can determine whether this is a legitimate bug:

  • logs of the network-resources-injector (by default kubectl logs -n kube-system network-resources-injector)
  • net-attach-def object spec
  • pod spec of a running pod (kubectl get pod -o yaml <pod_name>)

from network-resources-injector.

pperiyasamy commented on August 29, 2024
  • logs of the network-resources-injector (by default kubectl logs -n kube-system network-resources-injector)
root@dl380-006-ECCD-SUT:~/cnis/sriov-network-device-plugin/deployments# kubectl logs -n kube-system network-resources-injector
I0702 15:00:48.400411       1 main.go:34] starting mutating admission controller for network resources injection
I0702 15:14:07.463084       1 webhook.go:273] Received mutation request
I0702 15:14:07.467068       1 webhook.go:157] 'sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:14:07.474184       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:14:07.474199       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:14:07.474246       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:16:35.683004       1 webhook.go:273] Received mutation request
I0702 15:16:35.683331       1 webhook.go:157] 'sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:16:35.686881       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:16:35.686897       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:16:35.686927       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:21:39.380198       1 webhook.go:273] Received mutation request
I0702 15:21:39.380515       1 webhook.go:157] 'sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:21:39.384277       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:21:39.384288       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:21:39.385425       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:21:39.385435       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:21:39.385475       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:44:52.425686       1 webhook.go:273] Received mutation request
I0702 15:44:52.426019       1 webhook.go:157] 'sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:44:52.429641       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:44:52.429656       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:44:52.430815       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:44:52.430827       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:44:52.430877       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:47:08.030942       1 webhook.go:273] Received mutation request
I0702 15:47:08.031230       1 webhook.go:157] 'sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:47:08.034926       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:47:08.034942       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:47:08.036039       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:47:08.036050       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:47:08.036101       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:49:57.651779       1 webhook.go:273] Received mutation request
I0702 15:49:57.652063       1 webhook.go:157] 'sriov-net1, sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:49:57.655560       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:49:57.655573       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:49:57.656644       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:49:57.656656       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:49:57.657805       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:49:57.657818       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:49:57.657860       1 webhook.go:257] sending response to the Kubernetes API server
I0702 15:55:39.454629       1 webhook.go:273] Received mutation request
I0702 15:55:39.454913       1 webhook.go:157] 'sriov-net1, sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 15:55:39.458468       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:55:39.458481       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:55:39.459630       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:55:39.459640       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:55:39.460671       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 15:55:39.460683       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 15:55:39.460726       1 webhook.go:257] sending response to the Kubernetes API server
I0702 16:13:31.904806       1 webhook.go:273] Received mutation request
I0702 16:13:31.905422       1 webhook.go:157] 'sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0702 16:13:31.908999       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 16:13:31.909011       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 16:13:31.910213       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0702 16:13:31.910224       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0702 16:13:31.910265       1 webhook.go:257] sending response to the Kubernetes API server
I0703 07:57:46.033590       1 webhook.go:273] Received mutation request
I0703 07:57:46.033917       1 webhook.go:157] 'sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0703 07:57:46.037423       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0703 07:57:46.037439       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0703 07:57:46.038600       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0703 07:57:46.038610       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0703 07:57:46.038662       1 webhook.go:257] sending response to the Kubernetes API server
I0703 07:57:54.101089       1 webhook.go:273] Received mutation request
I0703 07:57:54.101746       1 webhook.go:157] 'sriov-net1, sriov-net1, sriov-net1' is not in JSON format: invalid character 's' looking for beginning of value... trying to parse as comma separated network selections list
I0703 07:57:54.103332       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0703 07:57:54.103344       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0703 07:57:54.104544       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0703 07:57:54.104556       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0703 07:57:54.105605       1 webhook.go:312] network attachment definition 'default/sriov-net1' found
I0703 07:57:54.105617       1 webhook.go:318] resource 'intel.com/intel_sriov_netdevice' needs to be requested for network 'default/sriov-net1'
I0703 07:57:54.105659       1 webhook.go:257] sending response to the Kubernetes API server
  • net-attach-def object spec
root@dl380-006-ECCD-SUT:~/cnis/sriov-network-device-plugin/deployments# kubectl get net-attach-def
NAME         AGE
sriov-net1   16h
root@dl380-006-ECCD-SUT:~/cnis/sriov-network-device-plugin/deployments# kubectl describe net-attach-def sriov-net1
Name:         sriov-net1
Namespace:    default
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
API Version:  k8s.cni.cncf.io/v1
Kind:         NetworkAttachmentDefinition
Metadata:
  Creation Timestamp:  2019-07-02T15:10:47Z
  Generation:          1
  Resource Version:    30311
  Self Link:           /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/sriov-net1
  UID:                 926530b4-9cdb-11e9-9f49-3cfdfe9eac40
Spec:
  Config:  { "type": "sriov", "name": "sriov-network", "ipam": { "type": "host-local", "subnet": "10.56.217.0/24", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } }
Events:    <none>

  • pod spec of a running pod (kubectl get pod -o yaml <pod_name>)
root@dl380-006-ECCD-SUT:~/cnis/sriov-network-device-plugin/deployments# kubectl get pod -o yaml testpod3
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1, sriov-net1, sriov-net1
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "k8s-pod-network",
          "ips": [
              "192.168.162.232"
          ],
          "default": true,
          "dns": {}
      },{
          "name": "sriov-network",
          "dns": {}
      },{
          "name": "sriov-network",
          "dns": {}
      },{
          "name": "sriov-network",
          "dns": {}
      }]
  creationTimestamp: "2019-07-03T07:57:54Z"
  name: testpod3
  namespace: default
  resourceVersion: "126321"
  selfLink: /api/v1/namespaces/default/pods/testpod3
  uid: 4332a74b-9d68-11e9-9f49-3cfdfe9eac40
spec:
  containers:
  - args:
    - while true; do sleep 300000; done;
    command:
    - /bin/bash
    - -c
    - --
    image: repo-pmd:v2
    imagePullPolicy: IfNotPresent
    name: appcntr13
    resources:
      limits:
        intel.com/intel_sriov_netdevice: "3"
      requests:
        intel.com/intel_sriov_netdevice: "3"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rpntn
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: dl380-006-eccd-sut
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-rpntn
    secret:
      defaultMode: 420
      secretName: default-token-rpntn
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-07-03T07:57:54Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-07-03T07:58:02Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-07-03T07:58:02Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-07-03T07:57:54Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://e62ef3acd384dc7847c519b097fd9233e7b038b6ea9b0e7c0e49abbf4b1a4376
    image: repo-pmd:v2
    imageID: docker://sha256:4d11eb5511d103af9fc38fe205ad919ae6c17f0ef0adeabc08f4cf527c206b0b
    lastState: {}
    name: appcntr13
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-07-03T07:58:01Z"
  hostIP: 10.85.4.61
  phase: Running
  podIP: 192.168.162.232
  qosClass: BestEffort
  startTime: "2019-07-03T07:57:54Z"

if you see the pod being scheduled despite missing resources, then most likely the resources weren't injected into the pod spec

Actually, there are enough resources. There are 6 PCI devices bound to the i40evf driver (2 of these VFs belong to the ens3f2 PF). Per the configuration below, I expected 4 devices to be allocated to intel_sriov_netdevice, because of the capacity configured for the resource injector, and the remaining 2 devices (matched by the pfNames selector) to be allocated to the intel_sriov_hostdevice resource.
But it looks like the resource injector doesn't honor the configured capacity.

{
    "resourceList": [{
            "resourceName": "intel_sriov_netdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "drivers": ["i40evf"]
            }
        },
        {
            "resourceName": "intel_sriov_dpdk_vfio_device",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "drivers": ["vfio-pci"],
                "pfNames": ["ens3f1"]
            }
        },
        {
            "resourceName": "intel_sriov_hostdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "pfNames": ["ens3f2"]
            }
        },
        {
            "resourceName": "intel_sriov_dpdk_uio_device",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "pfNames": ["ens3f3"]
            }
        }
    ]
}

przemeklal commented on August 29, 2024

@pperiyasamy okay, so if you have 6 resources available then everything works as expected.

To be fair, capacity isn't controlled by the network resources injector, and the injector doesn't use it in any way.

The injector is just a small, handy tool that looks up which net-attach-defs have been requested, checks whether any resources are linked to them, and injects those resources as necessary. Resources don't have to be available or even exist in the cluster: we leave scheduling to Kubernetes and resource management to device plugins.
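A rough sketch of that lookup-and-count flow, in Python rather than the injector's own code. The function names and the in-memory `net_attach_defs` map are hypothetical (a real webhook would query the API server), and the comma-separated fallback only mirrors what the log above reports. The annotation keys are the ones visible in the pod spec and net-attach-def earlier in this thread.

```python
import json
from collections import Counter

# Annotation keys as they appear in the pod spec and net-attach-def above.
NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"
RESOURCE_NAME_ANNOTATION = "k8s.v1.cni.cncf.io/resourceName"


def parse_network_selections(value, default_namespace="default"):
    """Return (namespace, name) pairs from the networks annotation value.

    Hypothetical helper: tries JSON first, then falls back to a
    comma-separated list, like the fallback mentioned in the log.
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        return [
            (item.get("namespace", default_namespace), item["name"])
            for item in parsed
            if isinstance(item, dict)
        ]
    selections = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        entry = entry.split("@", 1)[0]  # drop an optional "@ifname" suffix
        namespace, _, name = entry.rpartition("/")
        selections.append((namespace or default_namespace, name))
    return selections


def resources_to_inject(pod_annotations, net_attach_defs, default_namespace="default"):
    """Count how many of each resource the requested networks refer to.

    net_attach_defs maps (namespace, name) -> annotations dict. Note that
    nothing here checks whether the resource is available on any node.
    """
    counts = Counter()
    value = pod_annotations.get(NETWORKS_ANNOTATION, "")
    for key in parse_network_selections(value, default_namespace):
        nad_annotations = net_attach_defs.get(key)
        if nad_annotations is None:
            continue  # assumption for this sketch: unknown networks are skipped
        resource = nad_annotations.get(RESOURCE_NAME_ANNOTATION)
        if resource:
            counts[resource] += 1
    return dict(counts)


# Values taken from the testpod3 spec and the sriov-net1 net-attach-def above.
nads = {("default", "sriov-net1"): {RESOURCE_NAME_ANNOTATION: "intel.com/intel_sriov_netdevice"}}
pod = {NETWORKS_ANNOTATION: "sriov-net1, sriov-net1, sriov-net1"}
print(resources_to_inject(pod, nads))
# {'intel.com/intel_sriov_netdevice': 3}
```

The count of 3 matches the requests and limits in the testpod3 spec. Whether 3 devices are actually free is decided later by the scheduler and the device plugin, not at this step.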

Compute resources (Capacity and Allocatable) are controlled by the appropriate device plugin (the SR-IOV Network Device Plugin in this case), so the curl command you added in your first post has no effect and isn't required for your use case. It was used in the readme just to show the basic usage of the injector without the need to deploy any device plugins. Apologies if it caused any confusion.
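To see what the device plugin actually advertises, you can read the node's `status.capacity` and `status.allocatable` maps. A small sketch, assuming the node object has been loaded from `kubectl get node <node_name> -o json`. The helper name and the sample node below are hypothetical, and the value 6 is only illustrative of the six devices discussed above.

```python
def extended_resource_counts(node, resource_name):
    """Return (capacity, allocatable) for one extended resource on a node.

    `node` is the parsed JSON of a Node object. Extended resource
    quantities are whole numbers stored as strings; a missing entry
    is treated as 0 here.
    """
    status = node.get("status", {})
    capacity = int(status.get("capacity", {}).get(resource_name, "0"))
    allocatable = int(status.get("allocatable", {}).get(resource_name, "0"))
    return capacity, allocatable


# Hypothetical node status, not copied from a real cluster.
sample_node = {
    "status": {
        "capacity": {"intel.com/intel_sriov_netdevice": "6"},
        "allocatable": {"intel.com/intel_sriov_netdevice": "6"},
    }
}
print(extended_resource_counts(sample_node, "intel.com/intel_sriov_netdevice"))
# (6, 6)
```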

pperiyasamy commented on August 29, 2024

Thanks @przemeklal. In that case, can you just remove those details from the readme?

przemeklal commented on August 29, 2024

I think we should update it to show how it's supposed to work with the actual device plugin, and/or move this basic example somewhere else. Anyway, thanks for the suggestion.

pperiyasamy commented on August 29, 2024

This issue is the same as issue #30, which is being addressed by PR #29, so I'm closing this issue.
