
cstor-operators's Introduction

cStor Operators

Project Status: Beta

We are always happy to list users who run cStor in production; check out our existing adopters and their feedback.

The new cStor Operators support the following Operations on cStor pools and volumes:

  1. Provisioning and De-provisioning of cStor pools.
  2. Pool expansion by adding disks.
  3. Disk replacement by removing a disk.
  4. Volume replica scale up and scale down.
  5. Volume resize.
  6. Backup and Restore via Velero-plugin.
  7. Seamless upgrades of cStor Pools and Volumes.
  8. Support migration from old cStor operators (using SPC) to new cStor operators using CSPC and CSI Driver.

Operators Overview

A collection of enhanced Kubernetes operators for managing the OpenEBS cStor Data Engine. At a high level, the cStor operators consist of the following components:

  • cspc-operator
  • pool-manager
  • cvc-operator
  • volume-manager

An OpenEBS admin/user can use the CSPC (CStorPoolCluster) API (YAML) to provision cStor pools in a Kubernetes cluster. As the name suggests, a CSPC can be used to create a cluster of cStor pools across Kubernetes nodes. It is the job of cspc-operator to reconcile the CSPC object and provision CStorPoolInstance(s) as specified in the CSPC. A cStor pool is provisioned on a node by utilising the disks attached to that node and is represented by a CStorPoolInstance (CSPI) custom resource in the Kubernetes cluster. Users are free to specify which disks they want to use for pool provisioning.

The CSPC API comes with a variety of tunables and features; the full API can be viewed here.
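For illustration, a minimal single-node stripe CSPC could look like the sketch below. The hostname and blockdevice name are placeholders; they must match an actual node and an unclaimed blockdevice in your cluster (compare the full CSPC example further down this page):

apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-disk-pool                              # placeholder CSPC name
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"      # placeholder hostname
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-example" # placeholder blockdevice name
      poolConfig:
        dataRaidGroupType: "stripe"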

Once a CSPC is created, cspc-operator provisions a CSPI CR and a pool-manager deployment on each node where a cStor pool should be created. The pool-manager deployment watches its corresponding CSPI on the node and executes the commands required to perform pool operations, e.g. pool provisioning.

More info on cStor Pool CRs can be found here.

Note: It is not recommended to modify the CSPI CR and pool-manager in the running cluster unless you know what you are trying to do. CSPC should be the only point of interaction.

Once the cStor pool(s) are provisioned successfully from the CSPC, an admin/user can create PVCs to provision CSI cStor volumes. When a user creates a PVC, the cStor CSI driver creates a CStorVolumeConfig (CVC) resource, which is managed and reconciled by the cvc-controller; the cvc-controller in turn creates the volume-specific resources for each persistent volume, which are then managed by their respective controllers. More info can be found here.
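As a sketch, modelled on the StorageClass and PVC examples that appear later on this page (the resource names are placeholders), provisioning a volume on that pool typically looks like:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi-sc                    # placeholder StorageClass name
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-disk-pool     # must match the CSPC name
  replicaCount: "3"                     # number of volume replicas; must not exceed the number of pool instances
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-cstor-pvc                  # placeholder PVC name
spec:
  storageClassName: cstor-csi-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi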

The cStor operators work in conjunction with the cStor CSI driver to provide cStor volumes for stateful workloads.

Minimum Supported Versions

K8S : 1.21+

Usage

Raising Issues And PRs

If you want to raise any issue for cstor-operators, please do so at openebs/openebs.

Contributing

OpenEBS welcomes your feedback and contributions in any form possible.

Code of conduct

Please read the community code of conduct here.

cstor-operators's People

Contributors

ajeetrai7, akhilerm, asquare14, cxfcxf, daximillian, didier-durand, ianroberts, jankoehnlein, kmova, mynktl, nareshdesh, niladrih, nisarg1499, nsathyaseelan, parths007, prateekpandey14, ranjithwingrider, saintmalik, saltperfect, shovanmaity, shubham14bajpai, sonasingh46, soniasingla, sreeharimohan, surajssd, survivant, vaniisgh, w3aman, willyrl, zlymeda

cstor-operators's Issues

Couldn't attach disk, err: stat

I got this error in my pod in the event section

054b55e7" : rpc error: code = Internal desc = failed to find device path: [], last error seen: Couldn't attach disk, err: stat /dev/disk/by-path/ip-10.103.120.238:3260-iscsi-iqn.2016-09.com.openebs.cstor:pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7-lun-0: no such file or directory

but OpenEBS pods look fine

root@test-pcl109:~# kubectl get pods -n openebs | grep pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7
pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7-target-b95977f9f-2b4wv   3/3     Running   0          12m
root@test-pcl109:~#

but I don't have more details because I destroyed everything to reinstall my apps.

I'm using OpenEBS 2.5.0

Unable to mount volume 'SyncFailed' failed to sync CVR error: unable to update snapshot list

What happened

Trying to mount a cstor volume from a CStorPoolCluster with 3 zvol blockdevices fails with:

Events:
  Type     Reason       Age                 From     Message
  ----     ------       ----                ----     -------
  Warning  FailedMount  26m (x3 over 53m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[default-token-f6f4b dbench-pv]: timed out waiting for the condition
  Warning  FailedMount  16m (x16 over 60m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[dbench-pv default-token-f6f4b]: timed out waiting for the condition
  Warning  FailedMount  73s (x38 over 62m)  kubelet  MountVolume.MountDevice failed for volume "pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name cstor.csi.openebs.io not found in the list of registered CSI drivers

Not sure about the "cstor.csi.openebs.io not found" but this is what I get:

$ kubectl get csidriver  cstor.csi.openebs.io
NAME                   ATTACHREQUIRED   PODINFOONMOUNT   MODES                  AGE
cstor.csi.openebs.io   false            true             Persistent,Ephemeral   47h

and I guess this is the registration, which looks fine to me:

$ kubectl -n openebs logs pod/openebs-cstor-csi-node-6nvt5 csi-node-driver-registrar
I0127 12:08:22.461578       1 main.go:113] Version: v2.1.0
I0127 12:08:22.463064       1 main.go:137] Attempting to open a gRPC connection with: "/plugin/csi.sock"
I0127 12:08:22.463099       1 connection.go:153] Connecting to unix:///plugin/csi.sock
I0127 12:08:23.465723       1 main.go:144] Calling CSI driver to discover driver name
I0127 12:08:23.465756       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:23.465767       1 connection.go:183] GRPC request: {}
I0127 12:08:23.469124       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:23.469204       1 connection.go:186] GRPC error: <nil>
I0127 12:08:23.469221       1 main.go:154] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:23.469263       1 node_register.go:52] Starting Registration Server at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470242       1 node_register.go:61] Registration Server started at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470478       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0127 12:08:23.506240       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0127 12:08:23.610771       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
$ kubectl -n openebs logs pod/openebs-cstor-csi-controller-0 csi-attacher
I0127 12:08:55.366696       1 main.go:93] Version: v3.1.0-0-g3a0c5a0e
I0127 12:08:55.371691       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0127 12:08:56.372994       1 common.go:111] Probing CSI driver for readiness
I0127 12:08:56.373046       1 connection.go:182] GRPC call: /csi.v1.Identity/Probe
I0127 12:08:56.373054       1 connection.go:183] GRPC request: {}
I0127 12:08:56.378408       1 connection.go:185] GRPC response: {}
I0127 12:08:56.378479       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.378502       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:56.378508       1 connection.go:183] GRPC request: {}
I0127 12:08:56.379007       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:56.379079       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.379089       1 main.go:147] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:56.379096       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0127 12:08:56.379101       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380098       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0127 12:08:56.380292       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.380304       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.380309       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380965       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381118       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.381132       1 main.go:188] CSI driver does not support ControllerPublishUnpublish, using trivial handler
I0127 12:08:56.381139       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.381143       1 connection.go:183] GRPC request: {}
I0127 12:08:56.381591       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381714       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.382041       1 controller.go:121] Starting CSI attacher
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.VolumeAttachment (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382405       1 reflector.go:255] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382385       1 reflector.go:255] Listing and watching *v1.VolumeAttachment from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.482264       1 shared_informer.go:270] caches populated

The PVC looks like this:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iobench-network-hdd-pv
spec:
  storageClassName: cluster-hdd-x1-gli1
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

and this is my storageclass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cluster-hdd-x1-gli1
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-hdd-x1-gli1-pool
  replicaCount: "1"

The cspi looks fine to me:

$ kubectl -n openebs describe cspi cstor-hdd-x1-gli1-pool-rszn
Name:         cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       kubernetes.io/hostname=iuck8s1
              openebs.io/cas-type=cstor
              openebs.io/cstor-pool-cluster=cstor-hdd-x1-gli1-pool
              openebs.io/version=2.5.0
Annotations:  <none>
API Version:  cstor.openebs.io/v1
Kind:         CStorPoolInstance
Metadata:
  Creation Timestamp:  2021-01-27T06:36:18Z
  Finalizers:
    cstorpoolcluster.openebs.io/finalizer
    openebs.io/pool-protection
  Generation:  4
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:kubernetes.io/hostname:
          f:openebs.io/cas-type:
          f:openebs.io/cstor-pool-cluster:
          f:openebs.io/version:
        f:ownerReferences:
      f:spec:
        .:
        f:hostName:
        f:nodeSelector:
          .:
          f:kubernetes.io/hostname:
        f:poolConfig:
          .:
          f:dataRaidGroupType:
          f:priorityClassName:
          f:roThresholdLimit:
          f:tolerations:
      f:status:
        .:
        f:capacity:
          .:
          f:zfs:
        f:healthyReplicas:
        f:provisionedReplicas:
        f:readOnly:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cspc-operator
    Operation:    Update
    Time:         2021-01-27T06:36:18Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:dataRaidGroups:
      f:status:
        f:capacity:
          f:free:
          f:total:
          f:used:
          f:zfs:
            f:logicalUsed:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:36:24Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorPoolCluster
    Name:                  cstor-hdd-x1-gli1-pool
    UID:                   e92ec43c-70e6-4a6a-b5aa-8f657395f844
  Resource Version:        12149729
  UID:                     842a6004-05f2-4b1b-b486-1bc9a49cd4c7
Spec:
  Data Raid Groups:
    Block Devices:
      Block Device Name:  blockdevice-1aaceed9f7fff8072ff265dcefeb7b42
      Dev Link:           /dev/zd32
      Block Device Name:  blockdevice-c136a064f23bd2e7fc4fd171ac8fdc75
      Dev Link:           /dev/zd16
      Block Device Name:  blockdevice-d72ec61c59779b64eabaadb2adef6b7a
      Dev Link:           /dev/zd0
  Host Name:              iuck8s1
  Node Selector:
    kubernetes.io/hostname:  iuck8s1
  Pool Config:
    Data Raid Group Type:  stripe
    Priority Class Name:
    Ro Threshold Limit:    85
    Tolerations:
      Effect:    NoSchedule
      Key:       ik8/role
      Operator:  Equal
      Value:     storage
Status:
  Capacity:
    Free:   5770G
    Total:  5770000612k
    Used:   612k
    Zfs:
      Logical Used:      204k
  Healthy Replicas:      0
  Phase:                 ONLINE
  Provisioned Replicas:  0
  Read Only:             false
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type    Reason   Age   From               Message
  ----    ------   ----  ----               -------
  Normal  Created  47s   CStorPoolInstance  Pool created successfully

This is the output from cstor-pool-mgmt at the time I created the pool:

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ pool_manager_pid=9
+ /usr/local/bin/pool-manager start
+ trap _sigint INT
+ trap _sigterm SIGTERM
+ wait 9
E0127 06:36:20.648096       9 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0127 06:36:20.648220       9 pool.go:123] Waiting for pool container to start...
I0127 06:36:20.654741       9 controller.go:109] Setting up event handlers for CSPI
I0127 06:36:20.654965       9 controller.go:115] will set up informer event handlers for cvr
I0127 06:36:20.655124       9 new_restore_controller.go:105] Setting up event handlers for restore
I0127 06:36:20.667539       9 controller.go:110] Setting up event handlers for backup
I0127 06:36:20.671985       9 runner.go:38] Starting CStorPoolInstance controller
I0127 06:36:20.672006       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676276       9 common.go:262] CStorPool found: [cannot open 'name': no such pool ]
I0127 06:36:20.676319       9 run_restore_controller.go:38] Starting CStorRestore controller
I0127 06:36:20.676326       9 run_restore_controller.go:41] Waiting for informer caches to sync
I0127 06:36:20.676334       9 run_restore_controller.go:53] Started CStorRestore workers
I0127 06:36:20.676348       9 runner.go:39] Starting CStorVolumeReplica controller
I0127 06:36:20.676353       9 runner.go:42] Waiting for informer caches to sync
I0127 06:36:20.676359       9 runner.go:47] Starting CStorVolumeReplica workers
I0127 06:36:20.676365       9 runner.go:54] Started CStorVolumeReplica workers
I0127 06:36:20.676381       9 runner.go:38] Starting CStorBackup controller
I0127 06:36:20.676397       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676414       9 runner.go:53] Started CStorBackup workers
I0127 06:36:20.772140       9 runner.go:45] Starting CStorPoolInstance workers
I0127 06:36:20.772257       9 runner.go:51] Started CStorPoolInstance workers
I0127 06:36:20.790975       9 handler.go:441] Added Finalizer: cstor-hdd-x1-gli1-pool-rszn, 842a6004-05f2-4b1b-b486-1bc9a49cd4c7
I0127 06:36:20.800769       9 import.go:73] Importing pool 842a6004-05f2-4b1b-b486-1bc9a49cd4c7 cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
E0127 06:36:20.804784       9 import.go:94] Failed to import pool by reading cache file: failed to open cache file: No such file or directory
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
E0127 06:36:21.305102       9 import.go:114] Failed to import pool by scanning directory: 2021-01-27/06:36:20.809 Iterating over all the devices to find zfs devices using blkid
2021-01-27/06:36:21.185 Iterated over cache devices to find zfs devices
2021-01-27/06:36:21.186 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.249 Verified the device /dev/sdb1 for pool existence
2021-01-27/06:36:21.302 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.303 Verified the device /dev/sdb1 for pool existence
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
I0127 06:36:21.462725       9 create.go:41] Creating a pool for cstor-hdd-x1-gli1-pool-rszn cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
I0127 06:36:21.868435       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorPoolInstance", Namespace:"openebs", Name:"cstor-hdd-x1-gli1-pool-rszn", UID:"842a6004-05f2-4b1b-b486-1bc9a49cd4c7", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12149516", FieldPath:""}): type: 'Normal' reason: 'Created' Pool created successfully

Here is the log from cstor-pool

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool
Disabling dumping core
sleeping for 2 sec
2021-01-27/06:36:19.808 disabled auto import (reading of zpool.cache)
physmem = 18471389 pages (70.46 GB)
Disk /dev/zd16 does not support synchronize cache SCSI command
Disk /dev/zd0p1 does not support synchronize cache SCSI command
Disk /dev/zd32 does not support synchronize cache SCSI command

And from the maya-exporter

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 maya-exporter
I0127 06:36:20.054564       1 command.go:118] Starting openebs-exporter ...
I0127 06:36:20.054624       1 command.go:130] Initialising openebs-exporter for the cstor pool
I0127 06:36:20.054797       1 command.go:174] Registered openebs exporter for cstor pool
I0127 06:36:20.054808       1 server.go:41] Starting http server....

Creating the PVC added the following to the logs for cstor-pool:

2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> DEGRADED
2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> INIT
2021-01-27/06:40:11.947 Instantiating zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:11.966 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.006 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.039 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.075 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Connected
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Handshake command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:19.069 Volume:cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 has zvol_guid:9728312340268247483
2021-01-27/06:40:19.069 IO sequence number:0 Degraded IO sequence number:0
2021-01-27/06:40:19.070 New data connection on fd 8
2021-01-27/06:40:19.071 ERROR fail on unavailable snapshot pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485@rebuild_snap
2021-01-27/06:40:19.071 Quorum is on, and rep factor 1
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> DONE
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> HEALTHY
2021-01-27/06:40:19.071 Data connection associated with zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:19.071 Started ack sender for zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:30.098 [tgt 10.90.19.191:6060:21]: Replica status command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485

and cstor-pool-mgmt:

I0127 06:40:11.872702       9 handler.go:571] cVR empty status: 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.872801       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151153", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0127 06:40:11.917723       9 handler.go:225] will process add event for cvr {pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn} as volume {cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485}
I0127 06:40:11.923416       9 handler.go:574] cVR 'pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn': uid '769557cc-1e88-4b0b-b65f-be973e39d928': phase 'Init': is_empty_status: false
I0127 06:40:11.923449       9 handler.go:586] cVR pending: 769557cc-1e88-4b0b-b65f-be973e39d928
2021-01-27T06:40:11.950Z        INFO    volumereplica/volumereplica.go:307              {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485"}
I0127 06:40:11.950570       9 handler.go:468] cVR creation successful: pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn, 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.950670       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0127 06:40:11.967145       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.006655       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151168", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.039959       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151172", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.076037       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151173", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11

Looking at the CVR did not reveal any more useful things to me:

$ kubectl -n openebs describe cvr pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Name:         pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       cstorpoolinstance.openebs.io/name=cstor-hdd-x1-gli1-pool-rszn
              cstorpoolinstance.openebs.io/uid=842a6004-05f2-4b1b-b486-1bc9a49cd4c7
              cstorvolume.openebs.io/name=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/persistent-volume=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/version=2.5.0
Annotations:  cstorpoolinstance.openebs.io/hostname: iuck8s1
API Version:  cstor.openebs.io/v1
Kind:         CStorVolumeReplica
Metadata:
  Creation Timestamp:  2021-01-27T06:40:14Z
  Finalizers:
    cstorvolumereplica.openebs.io/finalizer
  Generation:  120
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:cstorpoolinstance.openebs.io/hostname:
        f:finalizers:
          .:
          v:"cstorvolumereplica.openebs.io/finalizer":
        f:labels:
          .:
          f:cstorpoolinstance.openebs.io/name:
          f:cstorpoolinstance.openebs.io/uid:
          f:cstorvolume.openebs.io/name:
          f:openebs.io/persistent-volume:
          f:openebs.io/version:
        f:ownerReferences:
          .:
          k:{"uid":"4a420048-edad-4aa3-8b3a-0201392695ab"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:targetIP:
      f:status:
        .:
        f:capacity:
          .:
          f:total:
          f:used:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cvc-operator
    Operation:    Update
    Time:         2021-01-27T06:40:14Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:capacity:
        f:replicaid:
      f:status:
        f:capacity:
          f:total:
          f:used:
        f:lastTransitionTime:
        f:lastUpdateTime:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:40:14Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorVolume
    Name:                  pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
    UID:                   4a420048-edad-4aa3-8b3a-0201392695ab
  Resource Version:        12175494
  UID:                     769557cc-1e88-4b0b-b65f-be973e39d928
Spec:
  Capacity:   4G
  Replicaid:  078C69A9AC378F4DFE74A3BC56DD42F9
  Target IP:  10.90.19.191
Status:
  Capacity:
    Total:               6K
    Used:                6K
  Last Transition Time:  2021-01-27T06:40:20Z
  Last Update Time:      2021-01-27T07:37:50Z
  Phase:                 Healthy
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type     Reason      Age                From                Message
  ----     ------      ----               ----                -------
  Normal   Synced      57m                CStorVolumeReplica  Received Resource create event
  Normal   Created     57m                CStorVolumeReplica  Resource created successfully
  Warning  SyncFailed  57m (x4 over 57m)  CStorVolumeReplica  failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11

The iSCSI exports seem to be fine:

$ sudo iscsiadm --mode discovery --type sendtargets --portal 10.90.19.191
10.90.19.191:3260,1 iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
$ sudo iscsiadm -d2 -m node -T iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 -p 10.90.19.191 --login
iscsiadm: Max file limits 1024 262144
iscsiadm: default: Creating session 1/1
Logging in to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260]
Login to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260] successful.

But the device seems to have no partition:
$ sudo fdisk /dev/sda

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe845e9e2.

Command (m for help): p
Disk /dev/sda: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 32768 bytes / 1048576 bytes
Disklabel type: dos
Disk identifier: 0xe845e9e2

feature request: Automatic Node Deletion Detection and Migration of Volume Replicas

Ticket requested by @niladrih via Slack

When using the OpenEBS cStor operators, it would be a very nice feature if, in the event of an unexpected node failure, the cStor operators identified the failed node and automatically moved the replicas attached to its CSPI to other, valid nodes. This would allow the cluster to keep functioning automatically, and the failed node could be removed and replaced later by an engineer. This likely involves several steps:

  1. Identification of failed node(s)
  2. Modification of cStor objects to remove and replace bad CSPI references
  3. Preparation of CSPI to be deleted in the future

From my testing, recovering from this case manually is complicated by the fact that removing old CSPI references does not update the CSPI object, since the pod is no longer running due to the non-existent node. I had to manually remove some finalizers and objects to confirm that all references to the dead CSPI were removed. Then I had to modify the CSPI itself to remove replica counts so the CSPC would allow deletion of the CSPI from the object. Ultimately this all worked, but it was certainly not an ideal situation.

Kubernetes 1.17.12
OpenEBS 2.7.0
cStor CSI/CSPC Operators

version label on ndm-operator should be the same as the chart version

Why is the label on openebs-cstor-openebs-ndm not openebs.io/version=2.5.0, while the label on openebs-ndm is correct? (I think the dependency in the cstor chart should overwrite the label.)
NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   LABELS
openebs-cstor-csi-node      4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,chart=cstor-2.5.0,component=openebs-cstor-csi-node,heritage=Helm,name=openebs-cstor-csi-node,openebs.io/component-name=openebs-cstor-csi-node,openebs.io/version=2.5.0,release=openebs-cstor
openebs-cstor-openebs-ndm   4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,app=openebs-ndm,chart=openebs-ndm-1.1.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=1.1.0,release=openebs-cstor
openebs-ndm                 4         4         4       4            4           <none>          22m   app.kubernetes.io/managed-by=Helm,app=openebs,chart=openebs-2.5.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=2.5.0,release=openebs
root@test-pcl109:~/setup-openebs/migration/2.5.0#

You should override the openebs-cstor-openebs-ndm chart version to match the cstor version.

bug(scaledown): admission server rejects update request of CSPC

  • Created cStor pool using CSPC by populating specs.
  • One of the nodes in the cluster was lost, and I tried to remove the corresponding node spec from the CSPC.
  • The admission server rejects the request with the error below:
could not find common pool specs for validation: invalid no.of nodes 0 from the given node selectors

The above error is thrown from this function.

Expected:

  • Should be able to remove the pool spec from the CSPC when the node doesn't exist.

add integration test for cstor volumes in cvc

List of integration tests to be covered for cStor volumes ( CVC operator side tests only )
Provisioning:

  • Volume provisioning ( single and multiple replicas )
  • Volume provisioning when the replica count is greater than the pool count. (-ve)

Deprovisioning:

  • Volume Deprovisioning

Volume Operations:

  • Scaling up cStor volume.
  • Scaling down cStor volume.
  • Scaling up a cStor volume when not enough pools are available. (-ve)
  • Scale up in the same pool. (-ve)
  • Scale up when already a scale up operation is in progress.(-ve)
  • Scale down when already a scale down operation is in progress.(-ve)

Tunables:

  • Passing resources and limits via CVC
  • Passing tolerations via CVC
  • Passing pod priority class via CVC
  • Configuring cStor tunables like luWorkers, queueDepth, and zvolWorkers via CVC
  • Target node selectors via CVC

cStor (CSI) - GitHub Updates

  • README Updates
    • Badges
    • Project Status - Beta
    • k8s version compatibility
    • Quickstart guide
    • Contributor Docs
    • Adopters.md with links to openebs/openebs adopters
    • Roadmap link to openebs project
    • Community Links
  • Helm Charts
  • GitHub Builds
  • Multiarch builds
  • Disable Travis
  • Downstream tagging
  • e2e tests
  • Upgrades
  • Migration from non CSI
  • Monitoring
  • Troubleshooting guide

CStor Volumes day2 ops documentation

  • Add volume Provisioning Docs
  • Add Snapshot and Clone Provisioning Docs
  • Add Volume Resize Docs
  • Add Raw Block Volume Docs
  • Add Volume Policy Docs
  • CStor Volume Resource Organisation docs.

Add Cstor CSI Volume Integration tests

Provisioning:

  • Add filesystem volume provisioning
  • Add Raw Block volume provisioning

Deprovisioning:

  • Volume deprovisioning
  • Delete volume if clone volume exists

Volume Operations:

  • Add remount volumes tests
  • Snapshot Create and Delete
  • Clone Volume Create and Delete
  • Online filesystem based volume resize
  • Online Raw block volume resize

Policy Tunables:

  • Configure target pod affinity via Policy
  • Configure Replica affinity via Policy

feature request

Automatically add newly created blockdevices to an existing CStorPoolCluster.

feat(migration): CSPC-operator should be able to identify node label changes

User stories:
There are a few user stories where users migrate the underlying storage disks from an existing node to a new node; in that case cstor-operator should be able to intelligently detect the move and inform the pool manager that was pointing to the old node to point to the new storage node.

Pre-requisites for the user story:

  • Disks must be uniquely identifiable, i.e. the blockdevice name shouldn't change across reboots or when a disk is detached from one node and attached to a different node.
  • All the disks participating in a pool should be migrated together to the same node.

High-level implementation steps:

  • Whenever cstor-operator reconciles a CSPC, it should detect that an existing node selector now points to a new node (this can be done by comparing blockdevice names).
  • After finding the CSPI and its corresponding pool manager, cspc-operator should make them point to the new node so that the pool can be imported and made available for serving IOs.

admission-webhook.cstor.openebs.io validation of compression arguments in CStorPoolCluster

The admission webhook expects values that are invalid for the ZFS compression setting.

Example:
Create a new CStorPoolCluster

apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-hdd-x1-gli1-pool
  namespace: openebs
spec:
  pools:
  - dataRaidGroups:
    - blockDevices:
      - blockDeviceName: blockdevice-c89ddb7adce61f8d21b6e27a488b1940
    nodeSelector:
      kubernetes.io/hostname: iuck8s1.core.idnt.net
    poolConfig:
      dataRaidGroupType: stripe
      compression: lz4 # <- both 'lz4' and 'on' get rejected (see below)

Result
Failed CStorPoolInstance:
Failed to create pool due to 'Failed to create pool {cstor-6c7f1f4b-9466-4239-8743-cf0cea515df9} : Failed to create pool.. cannot create 'cstor-6c7f1f4b-9466-4239-8743-cf0cea515df9': 'compression' must be one of 'on | off | lzjb | gzip | gzip-[1-9] | zle | lz4'

Both lz4 and on get rejected by the webhook:
Error from server (BadRequest): error when creating "STDIN": admission webhook "admission-webhook.cstor.openebs.io" denied the request: invalid cspc specification: invalid pool spec: unsupported compression 'lz4'
Error from server (BadRequest): error when creating "STDIN": admission webhook "admission-webhook.cstor.openebs.io" denied the request: invalid cspc specification: invalid pool spec: unsupported compression 'on' specified

cstor-pool : core dump

I have a CSPC that started to crash in a loop this weekend:

cspc-iep-mirror-hr8z-66b67d79c-xtkrx                              3/3     Running            0          3d22h
cspc-iep-mirror-t46v-6d65b46d55-s5hgp                             2/3     CrashLoopBackOff   35         3d22h

root@test-pcl109:~# kubectl logs -n openebs cspc-iep-mirror-t46v-6d65b46d55-s5hgp cstor-pool
Disabling dumping core
sleeping for 2 sec
2020-12-21/14:28:07.678 disabled auto import (reading of zpool.cache)
physmem = 6167776 pages (23.53 GB)
2020-12-21/14:28:34.695 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477 status change: DEGRADED -> DEGRADED
2020-12-21/14:28:34.695 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477 rebuild status change: INIT -> INIT
2020-12-21/14:28:34.696 Instantiating zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477
2020-12-21/14:28:34.696 [tgt 10.103.111.95:6060:23]: Connected
2020-12-21/14:28:34.696 [tgt 10.103.111.95:6060:23]: Handshake command for zvol pvc-dd5be511-db06-4b4e-a362-e0e5456e0477
2020-12-21/14:28:34.699 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-540876e9-f4fb-4494-adaf-9418565234c0 status change: DEGRADED -> DEGRADED
2020-12-21/14:28:34.699 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-540876e9-f4fb-4494-adaf-9418565234c0 rebuild status change: INIT -> INIT
pthread_create(&kt->t_tid, &attr, &zk_thread_helper, kt) == 0 (0x1c == 0x0)
ASSERT at kernel.c:198:zk_thread_create()Fatal signal received: 6
Stack trace:
/usr/local/bin/zrepl(+0x1e82)[0x55f68d4c9e82]
/lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f514f101040]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f514f100fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f514f102921]
/usr/lib/libzpool.so.2(+0x3c7ae)[0x7f514fb857ae]
/usr/lib/libzpool.so.2(zk_thread_create+0x2c8)[0x7f514fb85e38]
/usr/lib/libzpool.so.2(taskq_create+0x161)[0x7f514fb88f11]
/usr/lib/libcstor.so.2(uzfs_zinfo_init+0xa0)[0x7f514f6e8ef0]
/usr/lib/libcstor.so.2(uzfs_zvol_create_cb+0x98)[0x7f514f6e5e38]
/usr/lib/libzpool.so.2(+0x601b0)[0x7f514fba91b0]
/usr/lib/libzpool.so.2(+0x60253)[0x7f514fba9253]
/usr/lib/libzpool.so.2(dmu_objset_find+0x51)[0x7f514fbacdb1]
/usr/lib/libcstor.so.2(uzfs_zvol_create_minors+0x65)[0x7f514f6e5c75]
/usr/lib/libzpool.so.2(spa_import+0x4b4)[0x7f514fbf3964]
/usr/lib/libzfs.so.2(uzfs_handle_ioctl+0x24df)[0x7f514f92efdf]
/usr/lib/libcstor.so.2(+0xd72e)[0x7f514f6df72e]
/usr/lib/libzpool.so.2(zk_thread_helper+0x12c)[0x7f514fb858dc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f514f4ba6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f514f1e371f]
Aborted
root@test-pcl109:~#

/dev/sda shouldn't be mapped as a blockdevice when there are partitions under it

Here is what I did.

My drives were partitioned correctly, but I ran these commands:

  • wipefs -fa /dev/sda4
  • wipefs -fa /dev/sda3
  • wipefs -fa /dev/sda2
  • wipefs -fa /dev/sda1
  • wipefs -fa /dev/sda

and I recreated the partitions (because the last command removed the partitions)

I rebooted the nodes.

after that, I deleted all blockdevices with

kubectl -n openebs delete bd --all

and

kubectl -n openebs delete --all

I think it's a bug: /dev/sda shouldn't be mapped as a blockdevice, it's the root disk.


blockdevice-c846cd569a09bdccd9c4784bdfa69a2d   test-pcl113   /dev/sda1            268435456000    Unclaimed    Inactive   20m
blockdevice-49dc377b3906a0f1b335ea8b75efe1e2   test-pcl113   /dev/sda2            268435456000    Unclaimed    Inactive   19m
blockdevice-20de503ab2d13694af725ca910cd1b75   test-pcl113   /dev/sda3            268435456000    Unclaimed    Inactive   19m
blockdevice-56f3fc2ea97a718ef3314ec9d1aa1a68   test-pcl113   /dev/sda4            154889707520    Unclaimed    Inactive   19m
blockdevice-c6030d2bc7adb3fb2db013e4b4fd06d3   test-pcl113   /dev/sda             960197124096    Unclaimed    Active     21m
root@test-pcl113:~# lsblk -f
NAME   FSTYPE     LABEL                                      UUID                                 MOUNTPOINT
sda
├─sda1
├─sda2
├─sda3
└─sda4
sdb    zfs_member cstor-fb97744a-f60f-488b-82b5-ca42a4e79430 13151713646392886077
├─sdb1 zfs_member cstor-9b4dd361-3a78-4f7d-983f-c3a96d9b2080 12653991938895525317
└─sdb2 zfs_member cstor-9b4dd361-3a78-4f7d-983f-c3a96d9b2080 12653991938895525317
sdc
├─sdc1 vfat                                                  F277-684F                            /boot/efi
└─sdc2 ext4                                                  d39bba24-de10-415e-b2eb-ade9a9ff81f6 /
root@test-pcl113:~#

feature: allow auto-expanding PVs

I created a cStor pool with multiple disks from a few nodes in raid mode: stripe. I have around 4T of free space.

I see allowVolumeExpansion set to true.

I created a PVC starting at 150Mi. I'm expecting the PV to grow until there is no free space left in the cStor pool.

It looks like the PV will now only expand to a maximum of 1Gi.

I'd like that feature to be configurable.

We could have, for example:

  • a threshold, e.g. at 80% used space, double the PV size (or use a growth factor from the config, like 0.5)
  • a maximum value, e.g. don't expand beyond 200Gi
  • a flag: allowAutomaticExpansion: true/false

The goal is to avoid relying on manual intervention to increase the size by editing the PVC by hand. We would still have to monitor the cStor pool's used size.
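Purely to illustrate the knobs requested above, a hypothetical policy could be shaped like the sketch below. None of these autoExpansion fields exist in the current API; the field names are invented, and CStorVolumePolicy is borrowed only as a placeholder kind:

# Hypothetical sketch only -- the autoExpansion block and its fields are invented
# to illustrate the threshold / factor / ceiling / on-off flag idea above.
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy              # existing kind, used here purely as a placeholder
metadata:
  name: auto-expand-policy
  namespace: openebs
spec:
  autoExpansion:                     # hypothetical block
    allowAutomaticExpansion: true    # flag to turn the behaviour on or off
    usedSpaceThreshold: "80%"        # expand once 80% of the PV is used
    growthFactor: 1.0                # grow by 100% of the current size (i.e. double it)
    maxSize: 200Gi                   # never expand beyond this ceiling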

update cstor documentation to explain how to avoid 2 ndm-operators

There is a small gap in the cstor-operator documentation in gh-pages/index.md.

There should be a section telling the user that the OpenEBS operator Helm chart is not needed when using CSPC only.

If I do this:

helm install openebs -n openebs openebs/openebs --version 2.5.0 --set ndm.filters.enableOsDiskExcludeFilter=false --set ndm.sparse.count=5 --set apiserver.sparse.enabled=true --set featureGates.UseOSDisk.enabled=true
helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0

I end up with two ndm operators:

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   LABELS
openebs-cstor-csi-node      4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,chart=cstor-2.5.0,component=openebs-cstor-csi-node,heritage=Helm,name=openebs-cstor-csi-node,openebs.io/component-name=openebs-cstor-csi-node,openebs.io/version=2.5.0,release=openebs-cstor
openebs-cstor-openebs-ndm   4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,app=openebs-ndm,chart=openebs-ndm-1.1.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=1.1.0,release=openebs-cstor
openebs-ndm                 4         4         4       4            4           <none>          22m   app.kubernetes.io/managed-by=Helm,app=openebs,chart=openebs-2.5.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=2.5.0,release=openebs
root@test-pcl109:~/setup-openebs/migration/2.5.0#

I could add a parameter to exclude NDM from the cstor chart like this:

helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0 --set openebsNDM.enabled=false

but it's not clearly explained in the documentation.

And if we don't need openebs/openebs when using cstor-operators with CSPC, that should be written too.

And if we only use cstor-operators, there should be an example explaining how to pass values to NDM, like this:

helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0 --set openebs-ndm.filters.enableOsDiskExcludeFilter=false 

Little formatting error in CSPI output

The CAPACITY in the third row is not formatted correctly:

NAME                                                       HOSTNAME      FREE    CAPACITY        READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-2z7z   test-pcl112   3160G   3160000614k     false      0                     0                 ONLINE   3m19s
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-shtf   test-pcl113   4T      4000000642k     false      0                     0                 ONLINE   3m17s
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-wwjp   test-pcl111   4T      4000000063500   false      0                     0                 ONLINE   3m20s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-ch5f    test-pcl112   1920G   1920000614k     false      0                     0                 ONLINE   3m24s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-cmpv    test-pcl111   1920G   1920000614k     false      0                     0                 ONLINE   3m26s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-fs7l    test-pcl110   1920G   1920000062k     false      0                     0                 ONLINE   3m28s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-tpgh    test-pcl113   1920G   1920000062k     false      0                     0                 ONLINE   3m22s
root@test-pcl109:~#

AttachVolume.FindAttachablePluginBySpec failed for volume

I have pods that are stalled in Init state (and one was in 0/1 Running state), but they all have the event:
attachdetach-controller AttachVolume.FindAttachablePluginBySpec failed for volume "xxxx"

Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedAttachVolume  3m8s (x279 over 9h)  attachdetach-controller  AttachVolume.FindAttachablePluginBySpec failed for volume "pvc-21d65a88-5c98-4759-985c-e7af088798ff"
root@test-pcl109:~#
root@test-pcl109:~# kubectl get pv pvc-21d65a88-5c98-4759-985c-e7af088798ff
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS    REASON   AGE
pvc-21d65a88-5c98-4759-985c-e7af088798ff   2Gi        RWO            Delete           Bound    default/datadir-twin-neo4j-core-0   sc-iep-mirror            4d17h
root@test-pcl109:~# kubectl -n openebs get cva | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-test-pcl111   4d17h
root@test-pcl109:~# kubectl -n openebs get cvr | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-cspc-iep-mirror-zqbr    2.09M       17.3M   Healthy   4d17h
root@test-pcl109:~# kubectl -n openebs get cvc | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff   2Gi        Bound    4d17h
root@test-pcl109:~# kubectl -n openebs get pods | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-target-5dfd4448ccrbv5f   3/3     Running   0          41h
root@test-pcl109:~#
1. kubectl get csidriver cstor.csi.openebs.io
2. kubectl get sts openebs-cstor-csi-controller -n openebs -oyaml


root@test-pcl109:~# kubectl get csidriver cstor.csi.openebs.io
NAME                   ATTACHREQUIRED   PODINFOONMOUNT   MODES                  AGE
cstor.csi.openebs.io   false            true             Persistent,Ephemeral   5d22h
root@test-pcl109:~#

controler-log.txt

I deleted that pod and it returned to Running state fine. But after that, I had other pods that were previously in CrashLoop because they were waiting for the database (the pod that I had just deleted) to come online.

Some of them came back online fine, but other pods got the error:

AttachVolume.FindAttachablePluginBySpec failed for volume

I created a new cluster last week with kubeadm, reinstalled OpenEBS cStor from the cstor Helm chart, and installed my application after recreating the cStor pool.

NAME          STATUS   ROLES    AGE     VERSION
test-pcl109   Ready    master   6d23h   v1.18.4
test-pcl110   Ready    <none>   6d23h   v1.18.4
test-pcl111   Ready    <none>   6d23h   v1.18.4
test-pcl112   Ready    <none>   6d23h   v1.18.4
test-pcl113   Ready    <none>   6d23h   v1.18.4
root@test-pcl109:~#

From the OpenEBS team in Slack: Since our new version of the CSI driver doesn't support attach, attach should not be tried by Kubernetes. I think Kubernetes is looking at some cached version of the driver when the error occurred (not sure). We have disabled the attach/detach functionality in the latest cstor-csi versions (2.6.0).

openebs/api major V2 go modules release

With Go modules, major releases (v2, v3, and onwards) need some changes in order to be consumed as modules:

  • change the openebs/api module path in the go.mod file to github.com/openebs/api/v2
  • then import with the same path at the consuming end.

There are two alternative mechanisms to release a v2 or higher module. Note that with both techniques, the new module release becomes available to consumers when the module author pushes the new tags. Using the example of creating a v3.0.0 release, the two options are:

  • Major branch: Update the go.mod file to include a /v3 at the end of the module path in the module directive (e.g., module github.com/my/module/v3). Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/mypkg"). Tag the release with v3.0.0.

  • Major subdirectory: Create a new v3 subdirectory (e.g., my/module/v3) and place a new go.mod file in that subdirectory. The module path must end with /v3. Copy or move the code into the v3 subdirectory. Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/mypkg"). Tag the release with v3.0.0.

More info https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher

Refer here https://blog.golang.org/v2-go-modules

How to setup virtual iscsi storage in containers

I am using a k3d cluster as my Kubernetes cluster; it has no real nodes, only containers acting as nodes.
How do I set up a cStor pool on a k3d cluster?

If I do kubectl get bd -n openebs I don't see any blockdevices, because the nodes are containers.

How do I configure a cStor pool for containers?

wrong helm repo / chart name

If I try to follow the guide, it won't work, because the name in the guide is not the same as the name in the repo.

root@test-pcl114:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "cstor" chart repository
...Successfully got an update from the "openebs" chart repository
...Successfully got an update from the "vmware-tanzu" chart repository
...Successfully got an update from the "minio" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
root@test-pcl114:~# helm install -n openebs openebs-cstor cstor/openebs-cstor
Error: failed to download "cstor/openebs-cstor" (hint: running `helm repo update` may help)
root@test-pcl114:~# helm repo list
NAME                    URL
bitnami                 https://charts.bitnami.com/bitnami
stable                  https://charts.helm.sh/stable
openebs                 https://openebs.github.io/charts
vmware-tanzu            https://vmware-tanzu.github.io/helm-charts
minio                   https://helm.min.io
cstor                   https://openebs.github.io/cstor-operators
root@test-pcl114:~# helm search repo cstor
NAME            CHART VERSION   APP VERSION     DESCRIPTION
cstor/cstor     2.5.0           2.5.0           CStor-Operator helm chart for Kubernetes
root@test-pcl114:~#

The repo name should be openebs-cstor and the Helm chart should be cstor.

The guide needs to be updated to reflect that.

stuck at FailedMount : UnmountUnderProgress

This morning I got this error again:

 Warning  FailedMount  2m1s (x3 over 6m5s)  kubelet            MountVolume.MountDevice failed for volume "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" : rpc error: code = Internal desc = Volume pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de Busy, status: UnmountUnderProgress

It's the same scenario again. I did a helm install myapp and it was working for 7 days. This morning the database crashed, so I did a helm delete myapp, waited until all the pods were completely removed, and then ran helm install myapp again. Now the database pod is stuck at UnmountUnderProgress.
We never found out how to fix that issue.

I have trouble telling my dev team that I have no idea what is going on. I also tried what was suggested last time, but I still get an error when I run the zpool command:

root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool status
  pool: cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
 state: ONLINE
  scan: none requested
config:
        NAME                                             STATE     READ WRITE CKSUM
        cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77       ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-0ATA_ST8000VE000-2P61_WKD3G868-part1    ONLINE       0     0     0
            scsi-1ATA_ST8000VE000-2P6101_WKD3EQ63-part1  ONLINE       0     0     0
errors: No known data errors
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
cannot scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77: operation not supported on this type of pool
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/#

What should I do next?

I am running OpenEBS 2.7.0.
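Not an official fix, but a generic diagnostic sketch for this kind of stuck unmount: check what the CSI attacher and the node itself report before retrying. The PV name is taken from the event above; the attachment name is a placeholder.

# is there a stale VolumeAttachment for the PV?
kubectl get volumeattachments | grep pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de
kubectl describe volumeattachments <attachment-name-from-previous-command>
# on the node that last held the volume: is the filesystem really still mounted?
findmnt | grep pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de
mount | grep pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de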

cspi status should show usable pool capacity

As of now, kubectl get cspi -n openebs shows the sum of all the raid groups in the pool, but for usability purposes it should show the usable capacity of the pool after excluding metadata and raid parity.
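To illustrate the distinction, inside the pool-manager container the raw and usable figures can be compared directly (a sketch; the pool name is a placeholder):

# raw pool size: sum of all raid-group devices, including parity
zpool list cstor-<pool-uid>
# usable space after metadata and raid parity, which is what the CSPI status should surface
zfs list cstor-<pool-uid>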

feat(cspc-operator): cStor should identify the active pool on the disk

Use cases:

  • CSPC and CSI volumes are provisioned and everything is working fine. Suddenly etcd crashes and no etcd backup exists, but the underlying disks are still intact with the cStor pool data.
  • CSPC and CSI volumes are provisioned and everything works fine. Suddenly the whole cluster crashes or hangs (meaning all the Kubernetes controllers as well as etcd are destroyed), but the external underlying disks are still intact with the pool data.

High-level solution:

  • If a new cluster is created by attaching block devices that hold valid pool data, then upon creation of a CSPC with such block devices the old pool should be re-imported, for example as sketched below.
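A hedged sketch of the scenario: on the freshly built cluster, a CSPC is created that points at the same block devices that still carry the old pool data, and the expectation in this request is that the existing pool is re-imported rather than recreated. The hostname and block device name below are placeholders.

kubectl apply -f - <<EOF
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-reimport
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-holding-old-pool-data"
      poolConfig:
        dataRaidGroupType: "stripe"
EOF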

unable to mount volume for a StatefulSet application when the pod moves between nodes

A new use case problem with a StatefulSet cStor mount.
Here is what I did:
0 - kubectl get statefulset maria -o yaml > statefulset-maria.yaml
1 - I removed my StatefulSet application that was running on node test110 (after that, the node stopped being healthy: PUGL warning).
2 - I updated my StatefulSet to change the database image mariadb 1.3.20 -> 1.3.27.
3 - kubectl apply -f statefulset-maria.yaml
4 - Kubernetes scheduled it on node test111 this time.
5 - I killed the pod a few times, and each time the database came back running fine.
6 - Node test110 came back to life, so I killed my pod and the pod was moved back to test110.
7 - The pod has been stuck in ContainerCreating for over 30 min (got the event: MountVolume.WaitForAttach failed for volume "pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793" : rpc error: code = Internal desc = Volume still mounted on node: test-pcl110).
8 - I did a describe on the volume attachment:

root@test-pcl109:/tmp# kubectl get volumeattachments csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
NAME                                                                   ATTACHER               PV                                         NODE          ATTACHED   AGE
csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806   cstor.csi.openebs.io   pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793   test-pcl110   false      18m

root@test-pcl109:/tmp# kubectl describe volumeattachments csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
Name:         csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: test-pcl110
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:  2020-12-09T16:10:52Z
  Finalizers:
    external-attacher/cstor-csi-openebs-io
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2020-12-09T16:10:52Z
    API Version:  storage.k8s.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/cstor-csi-openebs-io":
      f:status:
        f:attachError:
          .:
          f:message:
          f:time:
    Manager:         csi-attacher
    Operation:       Update
    Time:            2020-12-09T16:29:09Z
  Resource Version:  73111510
  Self Link:         /apis/storage.k8s.io/v1/volumeattachments/csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
  UID:               4626d5a2-0519-47f0-96be-99837947dd79
Spec:
  Attacher:   cstor.csi.openebs.io
  Node Name:  test-pcl110
  Source:
    Persistent Volume Name:  pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793
Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = Volume still mounted on node: test-pcl110
    Time:     2020-12-09T16:29:09Z
  Attached:   false
Events:       <none>

9 - What do I do now?
10 - What do I do to make sure it won't happen again? I'm trying to find a way to handle that automatically.

Add CSPC integration tests.

List of integration tests that should be covered:
Provisioning:

  • Stripe pool provisioning with multiple disks and raid groups (includes write cache and data raid groups)
  • Mirror pool provisioning with multiple disks and raid groups (includes write cache and data raid groups)
  • Raidz1 pool provisioning with multiple disks and raid groups (includes write cache and data raid groups)
  • Raidz2 pool provisioning with multiple disks and raid groups (includes write cache and data raid groups)

Deprovisioning:

  • Deleting a CSPC.
  • Deleting a CSPC when it is being used by volume.
  • Deleting pools by removing pool spec from CSPC.
  • Deleting specs when the pool has a replica on it. (-ve)
  • Deleting raid groups. (-ve)

Pool operations:

  • Block device expansion for stripe pool.

  • Block device expansion for mirror pool.

  • Block device expansion for raidz1 pool.

  • Block device expansion for raidz2 pool.

  • Block device replacement for mirror pool.

  • Block device replacement for raidz1 pool.

  • Block device replacement for raidz2 pool.

  • Block device expansion and replacement on the same raid group.

  • Block device expansion and replacement on the diff raid groups.

  • Block device expansion by adding multiple raid groups.

  • Block device replacement simultaneously in multiple raid groups.

  • Block device expansion using a block device that is already in use by the same CSPC. (-ve)

  • Block device expansion using a block device that is already in use by the other CSPC. (-ve)

  • Block device expansion by adding a raid group with incorrect block device count. (-ve)

  • Block device expansion by adding the same raid group (non-stripe) (-ve)

  • More than 1 block device replacement in a single raid group (-ve)

  • Block device replacement in stripe (-ve)

  • Block device replacement when the new disk has less capacity than the existing one. (-ve)

  • Block device replacement using a block device that is undergoing replacement. (-ve)

Tuneables:

  • Passing resource and limits via CSPC
  • Defaulting resource and limits
  • Passing tolerations via CSPC
  • Defaulting tolerations.
  • Passing pod priority class via CSPC
  • Defaulting pod priority class
  • Passing compression via CSPC
  • Defaulting compression
  • Passing RO threshold via CSPC
  • Defaulting RO threshold

the cstor chart should include ndm settings

The description in readme.md should include the parameters for NDM, for example:

helm upgrade -n openebs openebs-cstor openebs-cstor/cstor --set openebs-ndm.featureGates.UseOSDisk.enabled=true --set openebs-ndm.featureGates.enabled=true
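As an illustration, the same two flags expressed through a values file; the key layout is inferred mechanically from the --set paths above:

# write the NDM feature-gate settings to a values file
cat > ndm-values.yaml <<EOF
openebs-ndm:
  featureGates:
    enabled: true
    UseOSDisk:
      enabled: true
EOF
# apply them with the same chart and release name as above
helm upgrade -n openebs openebs-cstor openebs-cstor/cstor -f ndm-values.yaml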
