openebs / cstor-operators Goto Github PK

View Code? Open in Web Editor NEW

93.0 13.0 69.0 26.27 MB

Collection of OpenEBS cStor Data Engine Operators

Home Page: https://openebs.io

License: Apache License 2.0

Go 95.73% Makefile 0.86% Shell 1.32% Dockerfile 1.06% Smarty 0.19% Mustache 0.36% Jinja 0.48%

kubernetes storage kubernetes-operators openebs golang hacktoberfest cstor-pool cstor cstor-csi-driver

cstor-operators's Introduction

cStor Operators

Project Status: Beta

We are always happy to list users who run cStor in production, check out our existing adopters, and their feedbacks.

The new cStor Operators support the following Operations on cStor pools and volumes:

Provisioning and De-provisioning of cStor pools.
Pool expansion by adding disk.
Disk replacement by removing a disk.
Volume replica scale up and scale down.
Volume resize.
Backup and Restore via Velero-plugin.
Seamless upgrades of cStor Pools and Volumes
Support migration from old cStor operators (using SPC) to new cStor operators using CSPC and CSI Driver.

Operators Overview

Collection of enhanced Kubernetes Operators for managing OpenEBS cStor Data Engine. At a high-level, cstor operators consist of following components.

cspc-operator
pool-manager
cvc-operator
volume-manager

An OpenEBS admin/user can use CSPC(CStorPoolCluster) API (YAML) to provision cStor pools in a Kubernetes cluster. As the name suggests, CSPC can be used to create a cluster of cStor pools across Kubernetes nodes. It is the job of cspc-operator to reconcile the CSPC object and provision CStorPoolInstance(s) as specified in the CSPC. A cStor pool is provisioned on node by utilising the disks attached to the node and is represented by CStorPoolInstance(CSPI) custom resource in a Kubernetes cluster. One has freedom to specify the disks that they want to use for pool provisioning.

CSPC API comes with a variety of tunables and features and the API can be viewed for here

Once a CSPC is created, cspc-operator provision CSPI CR and pool-manager deployment on each node where cStor pool should be created. The pool-manager deployment watches for its corresponding CSPI on the node and finally executes commands to perform pool operations e.g pool provisioning.

More info on cStor Pool CRs can be found here.

Note: It is not recommended to modify the CSPI CR and pool-manager in the running cluster unless you know what you are trying to do. CSPC should be the only point of interaction.

Once the CStor pool(s) get provisioned successfully after creating CSPC, admin/user can create PVC to provision csi CStor volumes. When a user creates PVC, CStor CSI driver creates CStorVolumeConfig(CVC) resource, managed and reconciled by the cvc-controller which creates different volume-specific resources for each persistent volume, later managed by their respective controllers, more info can be found here.

The cStor operators work in conjunction with the cStor CSI driver to provide cStor volumes for stateful workloads.

Minimum Supported Versions

K8S : 1.21+

Usage

Raising Issues And PRs

If you want to raise any issue for cstor-operators please do that at openebs/openebs.

Contributing

OpenEBS welcomes your feedback and contributions in any form possible.

Join OpenEBS community on Kubernetes Slack
- Already signed up? Head to our discussions at #openebs
Want to raise an issue or help with fixes and features?
- See open issues
- See contributing guide
- See Project Roadmap
- Want to join our contributor community meetings, check this out.
Join our OpenEBS CNCF Mailing lists
- For OpenEBS project updates, subscribe to OpenEBS Announcements
- For interacting with other OpenEBS users, subscribe to OpenEBS Users

Code of conduct

Please read the community code of conduct here.

cstor-operators's People

Contributors

Stargazers

Watchers

cstor-operators's Issues

Automate the CRD generate with OpenAPIV3 validation

bug(cspc-operator): Dynamically adding write cache raidgroup is not propogating to CSPI

Provisioned cStor pool using OpenEBS 2.0.0 CStor-Operator2.0.0.
Tried to add write cache raid group dynamically no errors were shown
but changes are not propagated to CSPI.

Break the version dependency between CSI-Controller and CStor-Operator

If CSI-Controller and CVC-Operator are in a different version then CVC will not be reconciled.
To fix this CVC-Version should be populated by CVC-Controller instead of CSI-Controller.

Couldn't attach disk, err: stat

I got this error in my pod in the event section

054b55e7" : rpc error: code = Internal desc = failed to find device path: [], last error seen: Couldn't attach disk, err: stat /dev/disk/by-path/ip-10.103.120.238:3260-iscsi-iqn.2016-09.com.openebs.cstor:pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7-lun-0: no such file or directory

but OpenEBS pods look fine

root@test-pcl109:~# kubectl get pods -n openebs | grep pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7
pvc-38394e5a-c078-4a6b-8c2d-394b054b55e7-target-b95977f9f-2b4wv   3/3     Running   0          12m
root@test-pcl109:~#

but I don't have more details because I destroy everything to reinstall my apps.

I'm using OpenEBS 2.5.0

Unable to mount volume 'SyncFailed' failed to sync CVR error: unable to update snapshot list

What happened

Trying to mount a cstor volume from a CStorPoolCluster with 3 zvol blockdevices fails with:

Events:
  Type     Reason       Age                 From     Message
  ----     ------       ----                ----     -------
  Warning  FailedMount  26m (x3 over 53m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[default-token-f6f4b dbench-pv]: timed out waiting for the condition
  Warning  FailedMount  16m (x16 over 60m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[dbench-pv default-token-f6f4b]: timed out waiting for the condition
  Warning  FailedMount  73s (x38 over 62m)  kubelet  MountVolume.MountDevice failed for volume "pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name cstor.csi.openebs.io not found in the list of registered CSI drivers

Not sure about the "cstor.csi.openebs.io not found" but this is what I get:

$ kubectl get csidriver  cstor.csi.openebs.io
NAME                   ATTACHREQUIRED   PODINFOONMOUNT   MODES                  AGE
cstor.csi.openebs.io   false            true             Persistent,Ephemeral   47h

and I guess this is the registration looking good to me:

$ kubectl -n openebs logs pod/openebs-cstor-csi-node-6nvt5 csi-node-driver-registrar
I0127 12:08:22.461578       1 main.go:113] Version: v2.1.0
I0127 12:08:22.463064       1 main.go:137] Attempting to open a gRPC connection with: "/plugin/csi.sock"
I0127 12:08:22.463099       1 connection.go:153] Connecting to unix:///plugin/csi.sock
I0127 12:08:23.465723       1 main.go:144] Calling CSI driver to discover driver name
I0127 12:08:23.465756       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:23.465767       1 connection.go:183] GRPC request: {}
I0127 12:08:23.469124       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:23.469204       1 connection.go:186] GRPC error: <nil>
I0127 12:08:23.469221       1 main.go:154] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:23.469263       1 node_register.go:52] Starting Registration Server at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470242       1 node_register.go:61] Registration Server started at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470478       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0127 12:08:23.506240       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0127 12:08:23.610771       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

$ kubectl -n openebs logs pod/openebs-cstor-csi-controller-0 csi-attacher
I0127 12:08:55.366696       1 main.go:93] Version: v3.1.0-0-g3a0c5a0e
I0127 12:08:55.371691       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0127 12:08:56.372994       1 common.go:111] Probing CSI driver for readiness
I0127 12:08:56.373046       1 connection.go:182] GRPC call: /csi.v1.Identity/Probe
I0127 12:08:56.373054       1 connection.go:183] GRPC request: {}
I0127 12:08:56.378408       1 connection.go:185] GRPC response: {}
I0127 12:08:56.378479       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.378502       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:56.378508       1 connection.go:183] GRPC request: {}
I0127 12:08:56.379007       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:56.379079       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.379089       1 main.go:147] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:56.379096       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0127 12:08:56.379101       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380098       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0127 12:08:56.380292       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.380304       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.380309       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380965       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381118       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.381132       1 main.go:188] CSI driver does not support ControllerPublishUnpublish, using trivial handler
I0127 12:08:56.381139       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.381143       1 connection.go:183] GRPC request: {}
I0127 12:08:56.381591       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381714       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.382041       1 controller.go:121] Starting CSI attacher
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.VolumeAttachment (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382405       1 reflector.go:255] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382385       1 reflector.go:255] Listing and watching *v1.VolumeAttachment from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.482264       1 shared_informer.go:270] caches populated

The PVC looks like this:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iobench-network-hdd-pv
spec:
  storageClassName: cluster-hdd-x1-gli1
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

and this is my storageclass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cluster-hdd-x1-gli1
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-hdd-x1-gli1-pool
  replicaCount: "1"

The cspi looks fine to me:

$ kubectl -n openebs describe cspi cstor-hdd-x1-gli1-pool-rszn
Name:         cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       kubernetes.io/hostname=iuck8s1
              openebs.io/cas-type=cstor
              openebs.io/cstor-pool-cluster=cstor-hdd-x1-gli1-pool
              openebs.io/version=2.5.0
Annotations:  <none>
API Version:  cstor.openebs.io/v1
Kind:         CStorPoolInstance
Metadata:
  Creation Timestamp:  2021-01-27T06:36:18Z
  Finalizers:
    cstorpoolcluster.openebs.io/finalizer
    openebs.io/pool-protection
  Generation:  4
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:kubernetes.io/hostname:
          f:openebs.io/cas-type:
          f:openebs.io/cstor-pool-cluster:
          f:openebs.io/version:
        f:ownerReferences:
      f:spec:
        .:
        f:hostName:
        f:nodeSelector:
          .:
          f:kubernetes.io/hostname:
        f:poolConfig:
          .:
          f:dataRaidGroupType:
          f:priorityClassName:
          f:roThresholdLimit:
          f:tolerations:
      f:status:
        .:
        f:capacity:
          .:
          f:zfs:
        f:healthyReplicas:
        f:provisionedReplicas:
        f:readOnly:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cspc-operator
    Operation:    Update
    Time:         2021-01-27T06:36:18Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:dataRaidGroups:
      f:status:
        f:capacity:
          f:free:
          f:total:
          f:used:
          f:zfs:
            f:logicalUsed:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:36:24Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorPoolCluster
    Name:                  cstor-hdd-x1-gli1-pool
    UID:                   e92ec43c-70e6-4a6a-b5aa-8f657395f844
  Resource Version:        12149729
  UID:                     842a6004-05f2-4b1b-b486-1bc9a49cd4c7
Spec:
  Data Raid Groups:
    Block Devices:
      Block Device Name:  blockdevice-1aaceed9f7fff8072ff265dcefeb7b42
      Dev Link:           /dev/zd32
      Block Device Name:  blockdevice-c136a064f23bd2e7fc4fd171ac8fdc75
      Dev Link:           /dev/zd16
      Block Device Name:  blockdevice-d72ec61c59779b64eabaadb2adef6b7a
      Dev Link:           /dev/zd0
  Host Name:              iuck8s1
  Node Selector:
    kubernetes.io/hostname:  iuck8s1
  Pool Config:
    Data Raid Group Type:  stripe
    Priority Class Name:
    Ro Threshold Limit:    85
    Tolerations:
      Effect:    NoSchedule
      Key:       ik8/role
      Operator:  Equal
      Value:     storage
Status:
  Capacity:
    Free:   5770G
    Total:  5770000612k
    Used:   612k
    Zfs:
      Logical Used:      204k
  Healthy Replicas:      0
  Phase:                 ONLINE
  Provisioned Replicas:  0
  Read Only:             false
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type    Reason   Age   From               Message
  ----    ------   ----  ----               -------
  Normal  Created  47s   CStorPoolInstance  Pool created successfully

This is the output from the cstor-pool-mgmt at the time I've created the pool:

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ pool_manager_pid=9
+ /usr/local/bin/pool-manager start
+ trap _sigint INT
+ trap _sigterm SIGTERM
+ wait 9
E0127 06:36:20.648096       9 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0127 06:36:20.648220       9 pool.go:123] Waiting for pool container to start...
I0127 06:36:20.654741       9 controller.go:109] Setting up event handlers for CSPI
I0127 06:36:20.654965       9 controller.go:115] will set up informer event handlers for cvr
I0127 06:36:20.655124       9 new_restore_controller.go:105] Setting up event handlers for restore
I0127 06:36:20.667539       9 controller.go:110] Setting up event handlers for backup
I0127 06:36:20.671985       9 runner.go:38] Starting CStorPoolInstance controller
I0127 06:36:20.672006       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676276       9 common.go:262] CStorPool found: [cannot open 'name': no such pool ]
I0127 06:36:20.676319       9 run_restore_controller.go:38] Starting CStorRestore controller
I0127 06:36:20.676326       9 run_restore_controller.go:41] Waiting for informer caches to sync
I0127 06:36:20.676334       9 run_restore_controller.go:53] Started CStorRestore workers
I0127 06:36:20.676348       9 runner.go:39] Starting CStorVolumeReplica controller
I0127 06:36:20.676353       9 runner.go:42] Waiting for informer caches to sync
I0127 06:36:20.676359       9 runner.go:47] Starting CStorVolumeReplica workers
I0127 06:36:20.676365       9 runner.go:54] Started CStorVolumeReplica workers
I0127 06:36:20.676381       9 runner.go:38] Starting CStorBackup controller
I0127 06:36:20.676397       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676414       9 runner.go:53] Started CStorBackup workers
I0127 06:36:20.772140       9 runner.go:45] Starting CStorPoolInstance workers
I0127 06:36:20.772257       9 runner.go:51] Started CStorPoolInstance workers
I0127 06:36:20.790975       9 handler.go:441] Added Finalizer: cstor-hdd-x1-gli1-pool-rszn, 842a6004-05f2-4b1b-b486-1bc9a49cd4c7
I0127 06:36:20.800769       9 import.go:73] Importing pool 842a6004-05f2-4b1b-b486-1bc9a49cd4c7 cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
E0127 06:36:20.804784       9 import.go:94] Failed to import pool by reading cache file: failed to open cache file: No such file or directory
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
E0127 06:36:21.305102       9 import.go:114] Failed to import pool by scanning directory: 2021-01-27/06:36:20.809 Iterating over all the devices to find zfs devices using blkid
2021-01-27/06:36:21.185 Iterated over cache devices to find zfs devices
2021-01-27/06:36:21.186 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.249 Verified the device /dev/sdb1 for pool existence
2021-01-27/06:36:21.302 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.303 Verified the device /dev/sdb1 for pool existence
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
I0127 06:36:21.462725       9 create.go:41] Creating a pool for cstor-hdd-x1-gli1-pool-rszn cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
I0127 06:36:21.868435       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorPoolInstance", Namespace:"openebs", Name:"cstor-hdd-x1-gli1-pool-rszn", UID:"842a6004-05f2-4b1b-b486-1bc9a49cd4c7", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12149516", FieldPath:""}): type: 'Normal' reason: 'Created' Pool created successfully

Here is the log from cstor-pool

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool
Disabling dumping core
sleeping for 2 sec
2021-01-27/06:36:19.808 disabled auto import (reading of zpool.cache)
physmem = 18471389 pages (70.46 GB)
Disk /dev/zd16 does not support synchronize cache SCSI command
Disk /dev/zd0p1 does not support synchronize cache SCSI command
Disk /dev/zd32 does not support synchronize cache SCSI command

And from the maya-exporter

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 maya-exporter
I0127 06:36:20.054564       1 command.go:118] Starting openebs-exporter ...
I0127 06:36:20.054624       1 command.go:130] Initialising openebs-exporter for the cstor pool
I0127 06:36:20.054797       1 command.go:174] Registered openebs exporter for cstor pool
I0127 06:36:20.054808       1 server.go:41] Starting http server....

Creating the PVC added the following to the logs for cstor-pool:

2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> DEGRADED
2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> INIT
2021-01-27/06:40:11.947 Instantiating zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:11.966 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.006 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.039 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.075 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Connected
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Handshake command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:19.069 Volume:cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 has zvol_guid:9728312340268247483
2021-01-27/06:40:19.069 IO sequence number:0 Degraded IO sequence number:0
2021-01-27/06:40:19.070 New data connection on fd 8
2021-01-27/06:40:19.071 ERROR fail on unavailable snapshot pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485@rebuild_snap
2021-01-27/06:40:19.071 Quorum is on, and rep factor 1
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> DONE
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> HEALTHY
2021-01-27/06:40:19.071 Data connection associated with zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:19.071 Started ack sender for zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:30.098 [tgt 10.90.19.191:6060:21]: Replica status command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485

and cstor-pool-mgmt:

I0127 06:40:11.872702       9 handler.go:571] cVR empty status: 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.872801       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151153", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0127 06:40:11.917723       9 handler.go:225] will process add event for cvr {pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn} as volume {cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485}
I0127 06:40:11.923416       9 handler.go:574] cVR 'pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn': uid '769557cc-1e88-4b0b-b65f-be973e39d928': phase 'Init': is_empty_status: false
I0127 06:40:11.923449       9 handler.go:586] cVR pending: 769557cc-1e88-4b0b-b65f-be973e39d928
2021-01-27T06:40:11.950Z        INFO    volumereplica/volumereplica.go:307              {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485"}
I0127 06:40:11.950570       9 handler.go:468] cVR creation successful: pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn, 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.950670       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0127 06:40:11.967145       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.006655       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151168", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.039959       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151172", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.076037       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151173", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11

Looking at the CVR did not reveal any more useful things to me:

$ kubectl -n openebs describe cvr pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Name:         pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       cstorpoolinstance.openebs.io/name=cstor-hdd-x1-gli1-pool-rszn
              cstorpoolinstance.openebs.io/uid=842a6004-05f2-4b1b-b486-1bc9a49cd4c7
              cstorvolume.openebs.io/name=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/persistent-volume=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/version=2.5.0
Annotations:  cstorpoolinstance.openebs.io/hostname: iuck8s1
API Version:  cstor.openebs.io/v1
Kind:         CStorVolumeReplica
Metadata:
  Creation Timestamp:  2021-01-27T06:40:14Z
  Finalizers:
    cstorvolumereplica.openebs.io/finalizer
  Generation:  120
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:cstorpoolinstance.openebs.io/hostname:
        f:finalizers:
          .:
          v:"cstorvolumereplica.openebs.io/finalizer":
        f:labels:
          .:
          f:cstorpoolinstance.openebs.io/name:
          f:cstorpoolinstance.openebs.io/uid:
          f:cstorvolume.openebs.io/name:
          f:openebs.io/persistent-volume:
          f:openebs.io/version:
        f:ownerReferences:
          .:
          k:{"uid":"4a420048-edad-4aa3-8b3a-0201392695ab"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:targetIP:
      f:status:
        .:
        f:capacity:
          .:
          f:total:
          f:used:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cvc-operator
    Operation:    Update
    Time:         2021-01-27T06:40:14Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:capacity:
        f:replicaid:
      f:status:
        f:capacity:
          f:total:
          f:used:
        f:lastTransitionTime:
        f:lastUpdateTime:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:40:14Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorVolume
    Name:                  pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
    UID:                   4a420048-edad-4aa3-8b3a-0201392695ab
  Resource Version:        12175494
  UID:                     769557cc-1e88-4b0b-b65f-be973e39d928
Spec:
  Capacity:   4G
  Replicaid:  078C69A9AC378F4DFE74A3BC56DD42F9
  Target IP:  10.90.19.191
Status:
  Capacity:
    Total:               6K
    Used:                6K
  Last Transition Time:  2021-01-27T06:40:20Z
  Last Update Time:      2021-01-27T07:37:50Z
  Phase:                 Healthy
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type     Reason      Age                From                Message
  ----     ------      ----               ----                -------
  Normal   Synced      57m                CStorVolumeReplica  Received Resource create event
  Normal   Created     57m                CStorVolumeReplica  Resource created successfully
  Warning  SyncFailed  57m (x4 over 57m)  CStorVolumeReplica  failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11

The iSCSI exports seems to be fine:

$ sudo iscsiadm --mode discovery --type sendtargets --portal 10.90.19.191
10.90.19.191:3260,1 iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485

$ sudo iscsiadm -d2 -m node -T iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 -p 10.90.19.191 --login
iscsiadm: Max file limits 1024 262144
iscsiadm: default: Creating session 1/1
Logging in to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260]
Login to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260] successful.

But the device seems to have no partition:
$ sudo fdisk /dev/sda

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe845e9e2.

Command (m for help): p
Disk /dev/sda: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 32768 bytes / 1048576 bytes
Disklabel type: dos
Disk identifier: 0xe845e9e2

When the node increases automaticlly with node autoscling, the block devices will be created automatically, how to add them to the cstor pool cluster automatically.

feature request: Automatic Node Deletion Detection and Migration of Volume Replicas

Ticket requested by @niladrih via Slack

When using OpenEBS cStor operators, it would be a very nice feature that in the event of an unexpected node failure the cStor operators would identify the failed node and automatically move the replicas attached to the CSPI to other, valid nodes. This would allow automatic continuation of cluster functionality and the failed node could be removed and replaced later by an engineer. This likely involves several steps.

Identification of failed node(s)
Modification of cStor objects to remove and replace bad CSPI references
Preparation of CSPI to be deleted in the future

From my testing, recovery from this case manually is complicated by the fact that removal of old CSPI references does not update the CSPI object as the pod is no longer running due to the non-existent node. I had to do manual removal of some finalizers and objects to confirm that all references to the dead CSPI were removed. Then I had to modify the CSPI itself to remove replica counts so the CSPC would allow deletion of the CSPI from the object. Ultimately this all worked but was certainly not an ideal situation.

Kubernetes 1.17.12
OpenEBS 2.7.0
cStor CSI/CSPC Operators

version label on npm-operator should be the same as the chart version

Why openebs-cstor-openebs-ndm the label is not : openebs.io/version=2.5.0 ?  but in openebs-ndm  the label is correct ?    (I think the dependancy in ctsor chart should overwrite the label)
NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   LABELS
openebs-cstor-csi-node      4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,chart=cstor-2.5.0,component=openebs-cstor-csi-node,heritage=Helm,name=openebs-cstor-csi-node,openebs.io/component-name=openebs-cstor-csi-node,openebs.io/version=2.5.0,release=openebs-cstor
openebs-cstor-openebs-ndm   4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,app=openebs-ndm,chart=openebs-ndm-1.1.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=1.1.0,release=openebs-cstor
openebs-ndm                 4         4         4       4            4           <none>          22m   app.kubernetes.io/managed-by=Helm,app=openebs,chart=openebs-2.5.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=2.5.0,release=openebs
root@test-pcl109:~/setup-openebs/migration/2.5.0#

you should overwrite the openebs-cstor-openebs-ndm chart version to match the cstor version.

bug(scaledown): admission server rejects update request of CSPC

Created cStor pool using CSPC by populating specs.
One of the node the cluster was lost and tried to remove corresponding node spec from CSPC.
Admission server rejects the request by throwing below error:

could not find common pool specs for validation: invalid no.of nodes 0 from the given node selectors

Above error is from function.

Expected:

Should be able to remove the pool spec from CSPC when node doesn't exists.

add integration test for cstor volumes in cvc

List of integration tests to be covered for cStor volumes ( CVC operator side tests only )
Provisioning:

Volume provisioning ( single and multiple replicas )
Volume provisioning replica count is greater than pool count. (-ve)

Deprovisioning:

Volume Deprovisioning

Volume Operations:

Scaling up cStor volume.
Scaling down cStor volume.
Scaling up cStor volume when not enough pool are available. (-ve)
Scale up in the same pool. (-ve)
Scale up when already a scale up operation is in progress.(-ve)
Scale down when already a scale down operation is in progress.(-ve)

Tuneables:

Passing resource and limits via CVC
Passing tolerations via CVC
Passing pod priority class via CVC
Configure CStor Tunnables like zvolworkers, quedepth, luworkers via CVC
Target Node Selectors via CVC

Backup/Restore for CSI based volumes using CSPC pools with V1 APIs

This issue to track the task related to migrate , refactore and support for new V1 APIs

Helm chart for cstor-operators

cStor (CSI) - GitHub Updates

Add restore controller to pool manager in accordance to v1 api changes.

CStor Volumes day2 ops documentation

Add volume Provisioning Docs
Add Snapshot and Clone Provisioning Docs
Add Volume Resize Docs
Add Raw Block Volume Docs
Add Volume Policy Docs
CStor Volume Resource Organisation docs.

Velero Plugin for CSI volumes based on CSPC based pools

Add Cstor CSI Volume Integration tests

Provisioning:

Add filesystem volume provisoining
Add Raw Block volume provisioning

Deprovisioning:

Volume deprovisioning
Delete volume if clone volume exists

Volume Operations:

Add remount volumes tests
Snapshot Create and Delete
Clone Volume Create and Delete
Online filesystem based volume resize
Online Raw block volume resize

Policy Tunnables:

Configure target pod affinity via Policy
Configure Replica affinity via Policy

feature request

Automatically add newly created blockdevices to exiting CStorPoolCluster.

Multi arch support for cstor operators images

This will involve multiple PRs in all cstor related repos. Below are the PRs:
openebs-archive/libcstor#69 Merged
openebs-archive/libcstor#71
openebs-archive/istgt#346
openebs-archive/cstor-csi#128
openebs-archive/m-exporter#2
#155

Need to raise PRs for m-exporter & cstor-csi-driver images as well.

feat(migration): CSPC-Operator should able to identify the node label changes

User Stories:
There were few user stories when users migrate
the underlying storage disks from an existing node to
new node then cstor-operator should able intelligently
detect and it should inform the pool manager which is
pointing to old to point to the new storage node.

Pre-Requisite for user story:

Uniquely identifying the disks i.e Blockdevice name
shouldn't be change across reboots or detaching
from one node and attaching to a different node.
All the disks participated in that pool should migrate
all of them to same node.

High-level implementation steps:

When ever cStor-operator reconciles for CSPC
it should detect exiting node selector is now pointing
to the new node(We can do it by comparing it with
blockdevice names).
After finding the CSPI and it's corresponding pool manager
CSPC-Operator should make them point to the new node so
that pool will be importable and available for serving the IOs.

CVC controller mock and fake testing

Write tests using mock and using fake clientset

Add replica scheduling feature for CSPC based pools

This issue to track replica placement feature which is
required to distribute the replicas among the pool.

Add travis sanity checks for cstor-operators

admission-webhook.cstor.openebs.io validation of compression arguments in CStorPoolCluster

The admission webhook expects values that are invalid for the ZFS compression setting.

Example:
Create a new CStorPoolCluster

apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-hdd-x1-gli1-pool
  namespace: openebs
spec:
  pools:
  - dataRaidGroups:
    - blockDevices:
      - blockDeviceName: blockdevice-c89ddb7adce61f8d21b6e27a488b1940
    nodeSelector:
      kubernetes.io/hostname: iuck8s1.core.idnt.net
    poolConfig:
      dataRaidGroupType: stripe
      compression: lz4 # <- lz4 ./ lz

Result
Failed CStorPoolInstance:
Failed to create pool due to 'Failed to create pool {cstor-6c7f1f4b-9466-4239-8743-cf0cea515df9} : Failed to create pool.. cannot create 'cstor-6c7f1f4b-9466-4239-8743-cf0cea515df9': 'compression' must be one of 'on | off | lzjb | gzip | gzip-[1-9] | zle | lz4'

lz4 or on gets rejected by the webhook:
Error from server (BadRequest): error when creating "STDIN": admission webhook "admission-webhook.cstor.openebs.io" denied the request: invalid cspc specification: invalid pool spec: unsupported compression 'lz4'
Error from server (BadRequest): error when creating "STDIN": admission webhook "admission-webhook.cstor.openebs.io" denied the request: invalid cspc specification: invalid pool spec: unsupported compression 'on' specified

cstor-pool : core dump

I have a CSPC that started to crash in loop this weekend

cspc-iep-mirror-hr8z-66b67d79c-xtkrx                              3/3     Running            0          3d22h
cspc-iep-mirror-t46v-6d65b46d55-s5hgp                             2/3     CrashLoopBackOff   35         3d22h

kubectl logs -n openebs cspc-iep-mirror-t46v-6d65b46d55-s5hgp cstor-pool

root@test-pcl109:~# kubectl logs -n openebs cspc-iep-mirror-t46v-6d65b46d55-s5hgp cstor-pool
Disabling dumping core
sleeping for 2 sec
2020-12-21/14:28:07.678 disabled auto import (reading of zpool.cache)
physmem = 6167776 pages (23.53 GB)
2020-12-21/14:28:34.695 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477 status change: DEGRADED -> DEGRADED
2020-12-21/14:28:34.695 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477 rebuild status change: INIT -> INIT
2020-12-21/14:28:34.696 Instantiating zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-dd5be511-db06-4b4e-a362-e0e5456e0477
2020-12-21/14:28:34.696 [tgt 10.103.111.95:6060:23]: Connected
2020-12-21/14:28:34.696 [tgt 10.103.111.95:6060:23]: Handshake command for zvol pvc-dd5be511-db06-4b4e-a362-e0e5456e0477
2020-12-21/14:28:34.699 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-540876e9-f4fb-4494-adaf-9418565234c0 status change: DEGRADED -> DEGRADED
2020-12-21/14:28:34.699 zvol cstor-48e9ac96-24ff-42c0-9a14-2531c8118a87/pvc-540876e9-f4fb-4494-adaf-9418565234c0 rebuild status change: INIT -> INIT
pthread_create(&kt->t_tid, &attr, &zk_thread_helper, kt) == 0 (0x1c == 0x0)
ASSERT at kernel.c:198:zk_thread_create()Fatal signal received: 6
Stack trace:
/usr/local/bin/zrepl(+0x1e82)[0x55f68d4c9e82]
/lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f514f101040]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f514f100fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f514f102921]
/usr/lib/libzpool.so.2(+0x3c7ae)[0x7f514fb857ae]
/usr/lib/libzpool.so.2(zk_thread_create+0x2c8)[0x7f514fb85e38]
/usr/lib/libzpool.so.2(taskq_create+0x161)[0x7f514fb88f11]
/usr/lib/libcstor.so.2(uzfs_zinfo_init+0xa0)[0x7f514f6e8ef0]
/usr/lib/libcstor.so.2(uzfs_zvol_create_cb+0x98)[0x7f514f6e5e38]
/usr/lib/libzpool.so.2(+0x601b0)[0x7f514fba91b0]
/usr/lib/libzpool.so.2(+0x60253)[0x7f514fba9253]
/usr/lib/libzpool.so.2(dmu_objset_find+0x51)[0x7f514fbacdb1]
/usr/lib/libcstor.so.2(uzfs_zvol_create_minors+0x65)[0x7f514f6e5c75]
/usr/lib/libzpool.so.2(spa_import+0x4b4)[0x7f514fbf3964]
/usr/lib/libzfs.so.2(uzfs_handle_ioctl+0x24df)[0x7f514f92efdf]
/usr/lib/libcstor.so.2(+0xd72e)[0x7f514f6df72e]
/usr/lib/libzpool.so.2(zk_thread_helper+0x12c)[0x7f514fb858dc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f514f4ba6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f514f1e371f]
Aborted
root@test-pcl109:~#

/dev/sda shoudn't be mapped as a blockdevice when there are partitions under

Here what I did.

my drives were partionned correctly, but I ran those commands

wipefs -fa /dev/sda4
wipefs -fa /dev/sda3
wipefs -fa /dev/sda2
wipefs -fa /dev/sda1
wipefs -fa /dev/sda

and I recreated the partitions (because the last command removed the partitions)

I rebooted the nodes.

after that, I deleted all blockdevices with

kubectl -n openebs delete bd --all

and

kubectl -n openebs delete --all

I think it's a bug. /dev/sda shoudn't be mapped as a blockdevice, it's the root.


blockdevice-c846cd569a09bdccd9c4784bdfa69a2d   test-pcl113   /dev/sda1            268435456000    Unclaimed    Inactive   20m
blockdevice-49dc377b3906a0f1b335ea8b75efe1e2   test-pcl113   /dev/sda2            268435456000    Unclaimed    Inactive   19m
blockdevice-20de503ab2d13694af725ca910cd1b75   test-pcl113   /dev/sda3            268435456000    Unclaimed    Inactive   19m
blockdevice-56f3fc2ea97a718ef3314ec9d1aa1a68   test-pcl113   /dev/sda4            154889707520    Unclaimed    Inactive   19m
blockdevice-c6030d2bc7adb3fb2db013e4b4fd06d3   test-pcl113   /dev/sda             960197124096    Unclaimed    Active     21m

root@test-pcl113:~# lsblk -f
NAME   FSTYPE     LABEL                                      UUID                                 MOUNTPOINT
sda
├─sda1
├─sda2
├─sda3
└─sda4
sdb    zfs_member cstor-fb97744a-f60f-488b-82b5-ca42a4e79430 13151713646392886077
├─sdb1 zfs_member cstor-9b4dd361-3a78-4f7d-983f-c3a96d9b2080 12653991938895525317
└─sdb2 zfs_member cstor-9b4dd361-3a78-4f7d-983f-c3a96d9b2080 12653991938895525317
sdc
├─sdc1 vfat                                                  F277-684F                            /boot/efi
└─sdc2 ext4                                                  d39bba24-de10-415e-b2eb-ade9a9ff81f6 /
root@test-pcl113:~#

Update CSPI status to offline when the pool pod is restarted

This issue will track the CSPI status:

Update the CSPI status.phase to Offline when the pool pod is restarted.
Update the CSPI status.Phase to Lost when the process not able to import/create the pool.

feature : allow auto-expansionning PV

I create a cstor with multiple disks from few nodes in raidmode : stripes. I have around 4T of free space.

I see the allow expanding at true.

I created PVC starting at 150Mi. I'm expecting the PV to increase until there are not free space on the cstor.

Look like now the PV will expand to a max of 1gi.

I'll like to have that feature configurable.

We could have

a threshold like at 80% of used space -> double the PV size (or.. use a factor in the config.. like 0.5 ...)
a maximum value. Don't expand over 200Gi
flag : allowAutomaticExpansion : true/false

the goal is to avoid to rely on manual intervention to increase the size by editing the PVC manually. We will still have to monitor the cstor pool used size.

Bump k8s and client-go version to use 1.20 k8s version

Failed to create daemonset: openebs-cstor-csi-node and statefulset: openebs-cstor-csi-controller

When getting details via kubectl describe both have:

Error creating: insufficient quota to match these scopes: [{PriorityClass In [system-node-critical system-cluster-critical]}]

What does this mean?

I am on a GKE Private Cluster.

Kubernetes Version: 1.17.12-gke.2502

update cstor documentation to explain how to avoid 2 ndm-operators

there is a little lack in the cstor-operator documentation in gh-pages/index.md

There should have a section that tell the user that the OpenEBS-operator Helm chart is not needed when we use cspc only.

if I do that :

helm install openebs -n openebs openebs/openebs --version 2.5.0 --set ndm.filters.enableOsDiskExcludeFilter=false --set ndm.sparse.count=5 --set apiserver.sparse.enabled=true --set featureGates.UseOSDisk.enabled=true
helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0

I will obtain 2 npm-operators

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   LABELS
openebs-cstor-csi-node      4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,chart=cstor-2.5.0,component=openebs-cstor-csi-node,heritage=Helm,name=openebs-cstor-csi-node,openebs.io/component-name=openebs-cstor-csi-node,openebs.io/version=2.5.0,release=openebs-cstor
openebs-cstor-openebs-ndm   4         4         4       4            4           <none>          17m   app.kubernetes.io/managed-by=Helm,app=openebs-ndm,chart=openebs-ndm-1.1.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=1.1.0,release=openebs-cstor
openebs-ndm                 4         4         4       4            4           <none>          22m   app.kubernetes.io/managed-by=Helm,app=openebs,chart=openebs-2.5.0,component=ndm,heritage=Helm,openebs.io/component-name=ndm,openebs.io/version=2.5.0,release=openebs
root@test-pcl109:~/setup-openebs/migration/2.5.0#

I could add a param to exclude npm with cstor-operator like this

helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0 --set openebsNDM.enabled=false

but it's not clealy explain in the documentation.

and if we don't need openebs/openebs when using ctor-operator with cspc.. it should be written too.

and if we only use ctor-operator there should have a example to explain how to pass the value to ndm like this

helm install -n openebs openebs-cstor openebs-cstor/cstor --version 2.5.0 --set openebs-ndm.filters.enableOsDiskExcludeFilter=false

little formating error in CSPI output

the 3th rows Capacity is not formatted correctly

NAME                                                       HOSTNAME      FREE    CAPACITY        READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-2z7z   test-pcl112   3160G   3160000614k     false      0                     0                 ONLINE   3m19s
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-shtf   test-pcl113   4T      4000000642k     false      0                     0                 ONLINE   3m17s
cstorpoolinstance.cstor.openebs.io/cspc-iep-localpv-wwjp   test-pcl111   4T      4000000063500   false      0                     0                 ONLINE   3m20s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-ch5f    test-pcl112   1920G   1920000614k     false      0                     0                 ONLINE   3m24s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-cmpv    test-pcl111   1920G   1920000614k     false      0                     0                 ONLINE   3m26s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-fs7l    test-pcl110   1920G   1920000062k     false      0                     0                 ONLINE   3m28s
cstorpoolinstance.cstor.openebs.io/cspc-iep-mirror-tpgh    test-pcl113   1920G   1920000062k     false      0                     0                 ONLINE   3m22s
root@test-pcl109:~#

Add cstor-operator helm charts

AttachVolume.FindAttachablePluginBySpec failed for volume

I have pods that are stalled in init state or one was in 0/1 Running state.. but they all have the event :
attachdetach-controller AttachVolume.FindAttachablePluginBySpec failed for volume "xxxx"

Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedAttachVolume  3m8s (x279 over 9h)  attachdetach-controller  AttachVolume.FindAttachablePluginBySpec failed for volume "pvc-21d65a88-5c98-4759-985c-e7af088798ff"
root@test-pcl109:~#
root@test-pcl109:~# kubectl get pv pvc-21d65a88-5c98-4759-985c-e7af088798ff
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS    REASON   AGE
pvc-21d65a88-5c98-4759-985c-e7af088798ff   2Gi        RWO            Delete           Bound    default/datadir-twin-neo4j-core-0   sc-iep-mirror            4d17h
root@test-pcl109:~# kubectl -n openebs get cva | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-test-pcl111   4d17h
root@test-pcl109:~# kubectl -n openebs get cvr | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-cspc-iep-mirror-zqbr    2.09M       17.3M   Healthy   4d17h
root@test-pcl109:~# kubectl -n openebs get cvc | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff   2Gi        Bound    4d17h
root@test-pcl109:~# kubectl -n openebs get pods | grep pvc-21d65a88-5c98-4759-985c-e7af088798ff
pvc-21d65a88-5c98-4759-985c-e7af088798ff-target-5dfd4448ccrbv5f   3/3     Running   0          41h
root@test-pcl109:~#

1. kubectl get csidriver cstor.csi.openebs.io
2. kubectl get sts openebs-cstor-csi-controller -n openebs -oyaml


root@test-pcl109:~# kubectl get csidriver cstor.csi.openebs.io
NAME                   ATTACHREQUIRED   PODINFOONMOUNT   MODES                  AGE
cstor.csi.openebs.io   false            true             Persistent,Ephemeral   5d22h
root@test-pcl109:~#

controler-log.txt

I deleted that pod and it returned to RUNNING STATE fine. BUT after that. I got other pods that were previously in "Crashloop" because they were waiting after the database to came online (the pod that I previously deleted).

Some of them came back online fine.. but I have other pods that got the error :

AttachVolume.FindAttachablePluginBySpec failed for volume

I created a new cluster last week with kubeadm. And reinstalled OpenEBS cstor from ctor Helm chart. and installed my application after recreating the cstor pool.

NAME          STATUS   ROLES    AGE     VERSION
test-pcl109   Ready    master   6d23h   v1.18.4
test-pcl110   Ready    <none>   6d23h   v1.18.4
test-pcl111   Ready    <none>   6d23h   v1.18.4
test-pcl112   Ready    <none>   6d23h   v1.18.4
test-pcl113   Ready    <none>   6d23h   v1.18.4
root@test-pcl109:~#

from OpenEBS team in Slack : Since our new version of csidiver doesn't support attach, attach should not be tried by kubernetes. I think kubernetes is looking at some cashed version of the driver when the error occurred (Not sure). We have disabled attach/detach functionality in latest cstor-csi versions (2.6.0).

openebs/api major V2 go modules release

With go modules the major release (v2, v3 .. onwards) to be consumed as modules need some changes

change openebs/api the module path in go.mod file --> github.com/openebs/api/v2
then import with same path ^^ at the other ends.

There are two alternative mechanisms to release a v2 or higher module. Note that with both techniques, the new module release becomes available to consumers when the module author pushes the new tags. Using the example of creating a v3.0.0 release, the two options are:

Major branch: Update the go.mod file to include a /v3 at the end of the module path in the module directive (e.g., module github.com/my/module/v3). Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/mypkg"). Tag the release with v3.0.0.
Major subdirectory: Create a new v3 subdirectory (e.g., my/module/v3) and place a new go.mod file in that subdirectory. The module path must end with /v3. Copy or move the code into the v3 subdirectory. Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/mypkg"). Tag the release with v3.0.0.

More info https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher

Refer here https://blog.golang.org/v2-go-modules

Provide alternate image registry to dockerhub

Hello,

In order to combat dockerhub's imposed rate limits it would be great to host these images on ghcr or quay.io. Are there on any thoughts from the community on this initiation?

How to setup virtual iscsi storage in containers

i am using k3d cluster as kubernetes cluster it has no nodes ,it has only containers as nodes .
How to setup cStor pool on k3d cluster

if i do kubectl get bd -n openebs i don't see any blockdevices because they are not nodes.

how to configure cStor pool for containers

wrong helm repo / chart name

If I try to follow the guide, it won't works because the name in the guide is not the same name in the repo.

root@test-pcl114:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "cstor" chart repository
...Successfully got an update from the "openebs" chart repository
...Successfully got an update from the "vmware-tanzu" chart repository
...Successfully got an update from the "minio" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
root@test-pcl114:~# helm install -n openebs openebs-cstor cstor/openebs-cstor
Error: failed to download "cstor/openebs-cstor" (hint: running `helm repo update` may help)
root@test-pcl114:~# helm repo list
NAME                    URL
bitnami                 https://charts.bitnami.com/bitnami
stable                  https://charts.helm.sh/stable
openebs                 https://openebs.github.io/charts
vmware-tanzu            https://vmware-tanzu.github.io/helm-charts
minio                   https://helm.min.io
cstor                   https://openebs.github.io/cstor-operators
root@test-pcl114:~# helm search repo cstor
NAME            CHART VERSION   APP VERSION     DESCRIPTION
cstor/cstor     2.5.0           2.5.0           CStor-Operator helm chart for Kubernetes
root@test-pcl114:~#

the repo name should be :

openebs-cstor and the helm chart should be cstor.

the guide need to be updated to reflect that

Add overprovisioning policy for cstor volumes

stuck at FailedMount : UnmountUnderProgress

This morning I have again this error :

 Warning  FailedMount  2m1s (x3 over 6m5s)  kubelet            MountVolume.MountDevice failed for volume "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" : rpc error: code = Internal desc = Volume pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de Busy, status: UnmountUnderProgress

It's the same scenario again. I did a helm install myapp. it was working for 7 days. This morning the database crached, so I did a : helm delete myapp. Waited that all the pods were completly removed. After that I did again : helm install myapp. Not now the database pod is stucked at : UnmountUnderProgress.
We never found how to fix that issue.

I have trouble telling my dev team that I have no idea what going on. I try also what was suggested last time.. but I still have that error when I try the command with zpool..

root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool status
  pool: cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
 state: ONLINE
  scan: none requested
config:
        NAME                                             STATE     READ WRITE CKSUM
        cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77       ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-0ATA_ST8000VE000-2P61_WKD3G868-part1    ONLINE       0     0     0
            scsi-1ATA_ST8000VE000-2P6101_WKD3EQ63-part1  ONLINE       0     0     0
errors: No known data errors
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
cannot scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77: operation not supported on this type of pool
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/#

What I should do next ?

I have openebs 2.7.0

pod not able to mount PV after the pv was resized

I followed this guide : https://github.com/openebs/cstor-operators/blob/master/docs/tutorial/volumes/resize.md

After that I deleted the pod. The pod was never able to mount the PV. I had to scale down to 0 the deployment to fix the problem

Add cstor csi volume metrics docs

Add step by step guide to deploy required components to get the cstor csi volumes available metrics

cspi status should show usable pool capacity

As of know kubectl get cspi -n openebs shows sum of all the raid groups in the pool but for usability purposes, it needs to show the usable capacity of the pool after excluding metadata and raid parity.

feat(cspc-operator): cStor should identify the active pool on the disk

Use cases:

CSPC and CSI volumes are provisioned and everything is working fine. Now all of a sudden etcd was crashed and no backup exists for etcd but underlying disks still intact with cStor pool data.
CSPC and CSI volumes are provisioned and everything works fine. All of a sudden cluster got crashed/hung(which means destruction happens on all K8s controllers as well as etcd) but external underlying disks that were still intact with pool data.

** High-level solution**:

If a new cluster is created by attaching block devices that have valid pool data. Upon creation of CSPC with such block devices old pool should be re-imported.

Enhance cspc events and logs

unable to mount volume for statefulset application when the pods moved from nodes

new usecase problem with statefulset cstor mount :
here what I did.
0 - kubectl get statefulset maria -o yaml > statefulset-maria.yaml
1 - I remove my statefulset application that was running in node test110 (that node after that stop been healthy : PUGL warning)
2 - I updated my statefulset to change the database image mariadb 1.3.20->1.3.27
3 - kubectl apply -f statefulset-maria.yaml
4 - kubernetes installed it on node test 111 this time.
5 - I kill the pods few times.. and each time the database came back running fine
6 - the node 110 came back to life. So I kill my pod, and the pod was move to test110
7 - the pod is in ContainerCreating over 30 min. (got the event : MountVolume.WaitForAttach failed for volume "pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793" : rpc error: code = Internal desc = Volume still mounted on node: test-pcl110)
8 - I did a describe on this

root@test-pcl109:/tmp# kubectl get volumeattachments csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
NAME                                                                   ATTACHER               PV                                         NODE          ATTACHED   AGE
csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806   cstor.csi.openebs.io   pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793   test-pcl110   false      18m

root@test-pcl109:/tmp# kubectl describe volumeattachments csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
Name:         csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: test-pcl110
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:  2020-12-09T16:10:52Z
  Finalizers:
    external-attacher/cstor-csi-openebs-io
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2020-12-09T16:10:52Z
    API Version:  storage.k8s.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/cstor-csi-openebs-io":
      f:status:
        f:attachError:
          .:
          f:message:
          f:time:
    Manager:         csi-attacher
    Operation:       Update
    Time:            2020-12-09T16:29:09Z
  Resource Version:  73111510
  Self Link:         /apis/storage.k8s.io/v1/volumeattachments/csi-a59a3cd375d34f5604e49c735bff447820c4d2b4cef258b9a4a1bf1a6007a806
  UID:               4626d5a2-0519-47f0-96be-99837947dd79
Spec:
  Attacher:   cstor.csi.openebs.io
  Node Name:  test-pcl110
  Source:
    Persistent Volume Name:  pvc-4422472a-6e1c-4b03-9287-63d8ce5aa793
Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = Volume still mounted on node: test-pcl110
    Time:     2020-12-09T16:29:09Z
  Attached:   false
Events:       <none>

9 - What I do now ?
10 - What I do to be sure that it won't happen again.. I'm trying to find a way to do that automaticaly.

Stripe pool provisioning with multiple disks and raid groups ( includes write cache and data raid groups )
mirror pool provisioning with multiple disks and raid groups ( includes write cache and data raid groups )
raidz1 pool provisioning with multiple disks and raid groups ( includes write cache and data raid groups )
raidz2 pool provisioning with multiple disks and raid groups ( includes write cache and data raid groups )

Deprovisioning:

Deleting a CSPC.
Deleting a CSPC when it is being used by volume.
Deleting pools by removing pool spec from CSPC.
Deleting specs when the pool has a replica on it. (-ve)
Deleting raid groups. (-ve)

Pool operations:

Tuneables:

the cstor chart should include ndm settings

the description in readme.md should include the params for NDM like

helm upgrade -n openebs openebs-cstor openebs-cstor/cstor --set openebs-ndm.featureGates.UseOSDisk.enabled=true --set openebs-ndm.featureGates.enabled=true

openebs / cstor-operators Goto Github PK

cstor-operators's Introduction

cStor Operators

Project Status: Beta

Operators Overview

Minimum Supported Versions

Usage

Raising Issues And PRs

Contributing

Code of conduct

cstor-operators's People

Contributors

Stargazers

Watchers

Forkers

cstor-operators's Issues

Recommend Projects

Recommend Topics

Recommend Org