
datashim's Introduction


Datashim


Our framework introduces the Dataset CRD, which is a pointer to existing S3 and NFS data sources. It includes the necessary logic to map these Datasets into Persistent Volume Claims and ConfigMaps which users can reference in their pods, letting them focus on workload development rather than on configuring/mounting/tuning data access. Thanks to the Container Storage Interface (CSI), it is extensible to support additional data sources in the future.

DLF

A Kubernetes framework to provide easy access to S3 and NFS Datasets within pods. It orchestrates the provisioning of the Persistent Volume Claims and ConfigMaps needed for each Dataset. Find more details in our FAQ.

Warning

🚨 (23 Jan 2024) - Group Name Change

If you have an existing installation of Datashim, please DO NOT follow the instructions below to upgrade it to version 0.4.0 or later. The group name of the Dataset and DatasetInternal CRDs (objects) is changing from com.ie.ibm.hpsys to datashim.io. An in-place upgrade will invalidate your Dataset definitions and cause problems in your installation. You can upgrade up to version 0.3.2 without any problems.

To upgrade to 0.4.0 and beyond, please a) delete all datasets safely; b) uninstall Datashim; and c) reinstall Datashim either through Helm or using the manifest file as follows.
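
A minimal sketch of that sequence (assuming a Helm-based installation in the dlf namespace; if you installed via the manifest file, delete the manifest you applied instead of running helm uninstall):

# a) delete all Datasets in every namespace
kubectl delete datasets --all --all-namespaces

# b) uninstall Datashim (Helm-based installation)
helm uninstall -n dlf datashim

# c) reinstall Datashim, for example using the manifest for plain Kubernetes
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml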

Quickstart

First, create the namespace for installing Datashim, if not present

kubectl create ns dlf

To deploy Datashim quickly, execute one of the following commands depending on your environment:

  • Kubernetes/Minikube/kind
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml
  • Kubernetes on IBM Cloud
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-ibm-k8s.yaml
  • Openshift
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-oc.yaml
  • Openshift on IBM Cloud
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-ibm-oc.yaml

Wait for all the pods to be ready :)

kubectl wait --for=condition=ready pods -l app.kubernetes.io/name=datashim -n dlf

As an optional step, label the namespace (or namespaces) you want in order to enable the pod-labelling functionality (see below for an example with the default namespace).

kubectl label namespace default monitor-pods-datasets=enabled

Tip

In case you don't have an existing S3 bucket, follow our wiki to deploy an Object Store and populate it with data.

We will now create a Dataset named example-dataset pointing to your S3 bucket.

cat <<EOF | kubectl apply -f -
apiVersion: datashim.io/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    readonly: "true" #OPTIONAL, default is false  
    region: "" #OPTIONAL
EOF

If everything worked okay, you should see a PVC and a ConfigMap named example-dataset which you can mount in your pods. As an easier way to use the Dataset in your pod, you can instead label the pod as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    dataset.0.id: "example-dataset"
    dataset.0.useas: "mount"
spec:
  containers:
    - name: nginx
      image: nginx

By convention, the Dataset will be mounted in /mnt/datasets/example-dataset. If instead you wish to pass the connection details as environment variables, change the useas line to dataset.0.useas: "configmap".
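
If you prefer not to use the labels, you can also mount the generated PVC directly; a minimal sketch (the volume name below is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: example-dataset                    # illustrative volume name
          mountPath: /mnt/datasets/example-dataset
  volumes:
    - name: example-dataset
      persistentVolumeClaim:
        claimName: example-dataset                 # PVC created by Datashim for the Dataset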

Feel free to explore our other examples

Important

We recommend using secrets to pass your S3/Object Storage Service credentials to Datashim, as shown in this example.
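
A minimal sketch of that pattern, assuming the secret-name/secret-namespace Dataset fields that appear in the examples further down this page (the linked example is authoritative for the exact field names):

apiVersion: v1
kind: Secret
metadata:
  name: example-dataset-creds
stringData:
  accessKeyID: "{AWS_ACCESS_KEY_ID}"
  secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
---
apiVersion: datashim.io/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    secret-name: "example-dataset-creds"   # assumed field name, as used in the issue examples below
    secret-namespace: "default"            # assumed field name
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"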

Note

Datashim uses a Mutating Webhook which uses a self-signed certificate. We recommend the use of cert-manager to manage this certificate. Please take a look at this note for instructions to do this.

Helm Installation

Hosted Helm charts have been made available for installing Datashim. This is how you can do a Helm install:

helm repo add datashim https://datashim-io.github.io/datashim/
helm repo update

This should produce an output of ...Successfully got an update from the "datashim" chart repository in addition to the other Helm repositories you may have.

To install, search for the latest stable release

helm search repo datashim --versions

which will result in:

NAME                    	CHART VERSION	APP VERSION	DESCRIPTION
datashim/datashim-charts	0.4.0        	0.4.0      	Datashim chart
datashim/datashim-charts	0.3.2        	0.3.2      	Datashim chart

Caution

Version 0.3.2 still has com.ie.ibm.hpsys as the apiGroup name, so please proceed with caution. It is fine for upgrading an existing Datashim installation, but going forward the apiGroup will be datashim.io.

Pass the option to create namespace, if you are installing Datashim for the first time:

helm install --namespace=dlf --create-namespace datashim datashim/datashim-charts --version <version_string>

Do not forget to label the target namespace to support pod labels, as shown in the previous section

Uninstalling through Helm

To uninstall, use helm uninstall like so:

helm uninstall -n dlf datashim

Installing intermediate releases

You can query the Helm repo for intermediate releases (.alpha, .beta, etc.). To do this, you need to pass the --devel flag to the Helm repo search, like so:

helm search repo datashim --devel

To install an intermediate version,

helm install --namespace=dlf --create-namespace datashim datashim/datashim-charts --devel --version <version_name>

Questions

The wiki and Frequently Asked Questions documents are a bit out of date. We recommend browsing the issues for previously answered questions. Please open an issue if you are not able to find the answers to your questions, or if you have discovered a bug.

Contributing

We welcome all contributions to Datashim. Please read this document for setting up a Git workflow for contributing to Datashim. This project uses DCO (Developer Certificate of Origin) to certify code ownership and contribution rights.

If you use VSCode, then we have recommendations for setting it up for development.

If you have an idea for a feature request, please open an issue. Let us know in the issue description the problem or the pain point, and how the proposed feature would help solve it. If you are looking to contribute but you don't know where to start, we recommend looking at the open issues first. Thanks!

datashim's People

Contributors

alessandropomponio, anthonyhaussman, captainpatate, chazapis, christian-pinto, dependabot[bot], gdubya, imgbot[bot], imgbotapp, kimxogus, malvag, martialblog, muandane, olevski, pkoutsov, seb-835, srikumar003, stevemar, tomcli, vassilisvassiliadis, viktoriaas, yiannisgkoufas


datashim's Issues

Can Dataset support accessKey and secretAccessKey in Secret?

Can Dataset support a preexisting Secret where accessKeyID and secretAccessKey are stored? There may be two reasons:

  1. End users do not want to store such information in the Dataset if they open-source their project. It can be a security risk.
  2. In larger organizations, such information may not be available to programmers. Administrators create the K8S Secret on their behalf.

Relatedly, the rest of the information (endpoint, bucket, region) may be available in a ConfigMap. Please consider that as secondary.

Bug on dataset operator eviction

If the dataset operator pod gets evicted for some reason, a new instance is started. However, the evicted instance seems to be holding a lock, causing a deadlock.

In the new operator instance logs you can only see the following line repeated indefinitely:
{"level":"info","ts":1589796411.546784,"logger":"leader","msg":"Not the leader. Waiting."}
The pod is not capable of continuing its execution.

This seems to be an issue with the operator-framework that is being triaged at the below link:
operator-framework/operator-sdk#1305

Use datashim for helm with terraform

Hi,

I'd like to integrate your awesome project into my terraform script, using helm. I'm kind of a beginner with helm, so I was wondering if you could explain to me how to add the datashim charts as a repo. As far as I understand, it requires an index.yaml which I cannot find in the charts.
I could install it with kubectl and the yaml file, but I'd like to exclude the efs driver as I do not need it and I don't want it to waste resources.

If you intentionally didn't add an index.yaml, could you please point me into the right direction of how to handle this? Creating my own index.yaml? Thanks a lot in advance!

Throw errors for names containing illegal characters

As reported here #106
In dataset definitions like this:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: kind-example-v0.2-try6-cp4d3f6d318f7c
spec:
  local:
    type: "COS"
    secret-name: "bucket-creds"
    secret-namespace: "m4d-system"
    endpoint: "http://s3.eu.cloud-object-storage.appdomain.cloud"
    provision: "true"
    bucket: "kind-example-v0.2-try6-cp4d3f6d318f7c"

there should be the necessary message in the dataset.status
FYI @shlomitk1

Existing files owned by root inaccessible by nonroot users

Storage buckets work nicely if they are empty. The existing files and directories are owned by root so they are inaccessible by non-root users in a container. I have tried object stores on GCP, AWS and a custom S3-compatible object store. Note that this applies to the files and directories created outside of DLF via S3 APIs. The ones created via DLF have the correct ownership if a bucket is mounted a second time.

Multi user support in NFS shares

Hi,
another question 😅
Have you thought about how to prevent users from mounting the PVCs of others in NFS?

We have one export share. When a user creates a Dataset, he/she needs to specify the path.
Let's say the path is /nfs/export and the option createDirPVC: "true". In this case, the user gets his/her own share at /nfs/export/myshare. However, nothing stops a user from mounting the whole export simply by specifying the path as /nfs/export and setting createDirPVC: "false".

I think this is a serious issue in multi-tenancy environments, and it makes the feature unusable because of the security problem. Maybe if the helm chart were up and ready for use, the path could be configurable somewhere in values, and the user wouldn't actually have to specify whether he/she wants to create a directory; the default would be to create a directory named path + Dataset name and mount only the resulting path in the PVC.
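
For context, a rough sketch of the kind of NFS Dataset being discussed; the issue only mentions path and createDirPVC, so the remaining field names (server, share) are assumptions rather than the plugin's documented spec:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: my-nfs-dataset
spec:
  local:
    type: "NFS"
    server: "nfs.example.com"   # hypothetical NFS server; field name assumed
    share: "/nfs/export"        # the export path discussed above; field name assumed
    createDirPVC: "true"        # per the discussion, creates a per-Dataset subdirectory under the share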

Installation of datashim results in an error

I'm getting the following error while deploying datashim:
error: unable to recognize "https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml": no matches for kind "CSIDriver" in version "storage.k8s.io/v1"
Earlier installations were successful. I suspect that #105 is the cause.

Update deprecated apiextensions.k8s.io/v1beta1 and admissionregistration.k8s.io/v1beta1

The following apiVersions will be deprecated in v1.22 and are used in the dataset-operator:

  • admissionregistration.k8s.io/v1beta1 => admissionregistration.k8s.io/v1
  • apiextensions.k8s.io/v1beta1 => apiextensions.k8s.io/v1

I have done some manual tests, and upgrading the admissionregistration apiVersion from

apiVersion: admissionregistration.k8s.io/v1beta1

is easy and works well. 👌

Anyway, the problem I am blocked on is the migration from apiextensions.k8s.io/v1beta1.

Following these recommendations, I have tried to update the dataset-operator CRD files and deploy them successfully.
Here is what I have for the file src/dataset-operator/chart/templates/crds/com.ie.ibm.hpsys_datasets_crd.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: datasets.com.ie.ibm.hpsys
spec:
  group: com.ie.ibm.hpsys
  names:
    kind: Dataset
    listKind: DatasetList
    plural: datasets
    singular: dataset
  scope: Namespaced
  versions:
  - name: v1alpha1
    subresources:
      status: {}
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        description: Dataset is the Schema for the datasets API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: DatasetSpec defines the desired state of Dataset
            properties:
              local:
                additionalProperties:
                  type: string
                description: 'INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
                  Important: Run "operator-sdk generate k8s" to regenerate code after
                  modifying this file Add custom validation using kubebuilder tags:
                  https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html
                  Conf map[string]string `json:"conf,omitempty"`'
                type: object
              remote:
                additionalProperties:
                  type: string
                type: object
            type: object
          status:
            description: DatasetStatus defines the observed state of Dataset
            properties:
              error:
                description: 'INSERT ADDITIONAL STATUS FIELD - define observed state
                  of cluster Important: Run "operator-sdk generate k8s" to regenerate
                  code after modifying this file Add custom validation using kubebuilder
                  tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html'
                type: string
            type: object

But when I try to deploy a new simple S3 Dataset:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: test
spec:
  local:
    type: "COS"
    accessKeyID: "KeyID"
    secretAccessKey: "Secret"
    endpoint: "https://s3.eu-west-1.amazonaws.com"
    bucket: "test-bucket"
    readonly: "true" #OPTIONAL, default is false  

The controller sees the new resource:

...
{"level":"info","ts":1622124637.1511607,"logger":"controller_dataset","msg":"Reconciling Dataset","Request.Namespace":"default","Request.Name":"test"}
{"level":"info","ts":1622124637.1608975,"logger":"controller_dataset","msg":"Reconciling Dataset","Request.Namespace":"default","Request.Name":"test"}

But nothing happens. 😞
I can describe the dataset resource but the PVC is not created.

It would be great if someone could help with this. 🙂
I'm ready to help and contribute but am blocked on this.

Removing bogus kubelet error message on IKS if possible

On IKS, mounting S3 CSI drivers still shows the errors below. Although this does not block the pod from mounting, it takes the kubelet a few minutes to realize the message is bogus, which creates a bottleneck in mounting time.

Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    5m6s  default-scheduler  Successfully assigned default/nginx to 10.168.14.70
  Warning  FailedMount  5m    kubelet            MountVolume.SetUp failed for volume "pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Error fuseMount command: goofys
args: [--endpoint=http://minio-service.kubeflow:9000 --profile=pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b --type-cache-ttl 1s -f --stat-cache-ttl 1s --dir-mode 0777 --file-mode 0777 --http-timeout 5m -o allow_other -o ro e6dbfd34-1ed9-11eb-8b10-d62589704c0d /var/data/kubelet/pods/f3889d7b-0ee6-40d3-8add-87ddb82a1901/volumes/kubernetes.io~csi/pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b/mount]
output: 2020/11/04 20:11:37.734151 s3.ERROR code=NoCredentialProviders msg=no valid providers in chain. Deprecated.
  For verbose messaging see aws.Config.CredentialsChainVerboseErrors, err=<nil>

2020/11/04 20:11:37.734280 main.ERROR Unable to access 'e6dbfd34-1ed9-11eb-8b10-d62589704c0d': NoCredentialProviders: no valid providers in chain. Deprecated.
  For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2020/11/04 20:11:37.734297 main.FATAL Mounting file system: Mount: initialization failed
  Warning  FailedMount  3m3s   kubelet  Unable to attach or mount volumes: unmounted volumes=[example-dataset], unattached volumes=[default-token-wspcg example-dataset]: timed out waiting for the condition
  Warning  FailedMount  2m59s  kubelet  MountVolume.SetUp failed for volume "pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Pulling      2m50s  kubelet  Pulling image "nginx"
  Normal   Pulled       2m49s  kubelet  Successfully pulled image "nginx"
  Normal   Created      2m49s  kubelet  Created container nginx
  Normal   Started      2m49s  kubelet  Started container nginx

Dataset is stuck in Pending state

Scenario: created a Dataset CRD named "kind-example-v0.2-try6-cp4d3f6d318f7c".

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: kind-example-v0.2-try6-cp4d3f6d318f7c
spec:
  local:
    type: "COS"
    secret-name: "bucket-creds"
    secret-namespace: "m4d-system"
    endpoint: "http://s3.eu.cloud-object-storage.appdomain.cloud"
    provision: "true"
    bucket: "kind-example-v0.2-try6-cp4d3f6d318f7c"

Problem: A bucket has been successfully created. However, the Dataset status is stuck on "Pending".
Reason:
This is caused by a failure to reconcile a pvc resource. From csi-provisioner-s3-0 log in dlf namespace:
volume_store.go:144] error saving volume pvc-80abdaf5-bb2e-4f3b-a733-4ea96c0f1552: PersistentVolume "pvc-80abdaf5-bb2e-4f3b-a733-4ea96c0f1552" is invalid: spec.csi.name: Invalid value: "kind-example-v0.2-try6-cp4d3f6d318f7c": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'
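
In other words, the value is rejected because it must be a valid DNS-1123 label. A possible workaround sketch (not a confirmed fix, and assuming the rejected value comes from the Dataset/bucket name) is to avoid dots in the name:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: kind-example-v0-2-try6-cp4d3f6d318f7c      # dots replaced with dashes to satisfy DNS-1123
spec:
  local:
    type: "COS"
    secret-name: "bucket-creds"
    secret-namespace: "m4d-system"
    endpoint: "http://s3.eu.cloud-object-storage.appdomain.cloud"
    provision: "true"
    bucket: "kind-example-v0-2-try6-cp4d3f6d318f7c" # renamed to match, assuming the bucket name feeds the volume name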

NooBaa install on Mac OS X fails

$ make minikube-install
results in

Installing NooBaa...done
Building NooBaa data loader...done
Creating test OBC...error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"
error: the server doesn't have a resource type "obc"

This is happening because of this line in examples/noobaa/noobaa_install.sh:
wget -P ${DIR} https://github.com/noobaa/noobaa-operator/releases/download/v2.0.10/noobaa-linux-v2.0.10 > /dev/null 2>&1

ARCHIVE type has issues after failing to push datasets to S3

When we try to push a big dataset using the ARCHIVE type, it sometimes fails due to the large workload. After that, all other datasets created by the same DLF cluster cannot be mounted on any pod. Redeploying DLF and minio does not solve this issue.

Here are the events after a pod fails to mount DLF's PVC:

18m         Warning   ProvisioningFailed      persistentvolumeclaim/example-dataset                       failed to provision volume with StorageClass "csi-s3": rpc error: code = DeadlineExceeded desc = context deadline exceeded
2s          Warning   FailedMount             pod/nginx                                                   Unable to attach or mount volumes: unmounted volumes=[example-dataset], unattached volumes=[example-dataset default-token-7qlxh]: timed out waiting for the condition
3m11s       Warning   VolumeFailedDelete      persistentvolume/pvc-6f6e9892-7fdf-4d8f-b1a2-c75d416c9b97   rpc error: code = Unknown desc = failed to initialize S3 client: Endpoint:  does not follow ip address or domain name standards.

Here is the dataset that can ruin the whole DLF cluster:

cat <<EOF | kubectl apply -f -
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
  namespace: default
spec:
  type: "ARCHIVE"
  url: "https://dax-cdn.cdn.appdomain.cloud/dax-oil-reservoir-simulations/1.0.0/oil-reservoir-simulations.tar.gz"
  format: "application/x-tar"
EOF

Dataset for the 1000 Genome project

I am trying to mount a dataset for the 1000 Genome project - https://registry.opendata.aws/1000-genomes/
I have created the dataset object using:

---
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: 1000-genome-dataset
spec:
  local:
    type: "COS"
    accessKeyID: ""
    secretAccessKey: ""
    endpoint: "https://s3-us-east-1.amazonaws.com"
    bucket: "1000genomes"
    readonly: "true" #OPTIONAL, default is false

The PVC for the dataset gets provisioned, but when trying to mount it into a pod, I get errors like this:

Warning  FailedMount  92s   kubelet            MountVolume.SetUp failed for volume "pvc-a785f139-e992-4790-a2f7-57ad1efa5476" : rpc error: code = Unknown desc = Error fuseMount command: goofys
args: [--endpoint=https://s3-us-east-1.amazonaws.com --profile=pvc-a785f139-e992-4790-a2f7-57ad1efa5476 --type-cache-ttl 1s --stat-cache-ttl 1s --dir-mode 0777 --file-mode 0777 --http-timeout 5m -o allow_other -o ro 1000genomes /var/lib/kubelet/pods/256d5c70-5751-4aaa-8095-fc951047a3db/volumes/kubernetes.io~csi/pvc-a785f139-e992-4790-a2f7-57ad1efa5476/mount]
output: 2021/05/03 22:35:32.659638 main.FATAL Unable to mount file system, see syslog for details

Any pointer would be greatly appreciated

S3 config format

I am trying to set this up with an S3 bucket. I have provided my keys. I get:

failed to provision volume with StorageClass "csi-s3": rpc error: code = Unknown desc = failed to initialize S3 client: Endpoint: does not follow ip address or domain name standards.

I have:

endpoint=s3.eu-west-2.amazonaws.com

which is clearly wrong, but what would be RIGHT?
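
For comparison, the other Dataset examples on this page all include the URL scheme in the endpoint, so a value like the following is likely what the provisioner expects (an educated guess based on those examples, not a confirmed answer):

    endpoint: "https://s3.eu-west-2.amazonaws.com"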

design document

Could you please share the design document, to help understand how it mounts the bucket/NFS share?

Multiple namespaces installation problems

If you do the full installation in a different namespace, the previous installation breaks.
We need to have two things fixed:

  • During installation, check if the dataset operator is running in another namespace and prevent re-installation
  • Explain on the wiki how to extend an existing installation to multiple namespaces

FYI @davidyuyuan

error 400 Bad Request

Hi all,

I have a Kubernetes cluster on AWS (EKS).
We are currently using a workaround (a script at node init) to be able to mount an S3 bucket on a pod.

I tried to use datashim which looks very promising.
I installed the setup with https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf.yaml

here my dataset config:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: archive-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "XXX"
    secretAccessKey: "XXX"
    endpoint: "https://s3.amazonaws.com"
    bucket: "bucket_name-ap-east-1"
    region: "ap-east-1" 

But I end up with the error:

  Warning  ProvisioningFailed    3m11s (x9 over 7m32s)  ch.ctrox.csi.s3-driver_csi-provisioner-s3-0_0ef0ce8b-2b1e-4a4e-8ebd-a82731d7ae1b  failed to provision volume with StorageClass "csi-s3": rpc error: code = Unknown desc = failed to check if bucket bucket_name-ap-east-1 exists: 400 Bad Request

All the pods in the dlf namespace are running fine (Running status; I didn't dig into the logs yet).
I tried with different credentials.
I can successfully mount the bucket locally with s3fs or goofys (with the same credentials).

Did I miss anything?
Thank you very much for your work.

Dataset Operator permission mismatch

Hi,

I have a cluster with pod security policy enabled. When I try to deploy the operator I always get Error: container has runAsNonRoot and image will run as root in the dataset-operator Deployment.

The issue is resolved by adding a security context under spec.template.spec. I used

spec:
      securityContext:
        runAsUser: 1000

and the pod starts now.

Could a similar fix be added to the code?

Fix goofys disconnection in case of error

Hi,
goofys needs syslog if a fatal event happens. Installing the netcat-openbsd package and running nc -k -l -U /var/run/syslog & fixes the issue, and the endpoint will not disconnect. I think the image quay.io/k8scsi/csi-node-driver-registrar:v1.2.0 needs the fix. (source)

Dataset labels not working for deployments

When I use a dataset label to mount a dataset to a deployment, nothing happens.

Upon further inspection, it looks like the MutatingWebhookConfiguration does not trigger for deployments, but only for pods. For deployments, the mutate function should make exactly the same changes it does for pods, but work on the /spec/template/spec path instead of /spec.
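
Until the webhook handles Deployments directly, a possible workaround (a sketch, not a confirmed fix) is to put the dataset labels on the pod template, so that the pods the Deployment creates carry the labels the webhook looks for:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        dataset.0.id: "example-dataset"   # labels on the pod template, not on the Deployment itself
        dataset.0.useas: "mount"
    spec:
      containers:
        - name: nginx
          image: nginx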

Addressing by name or ID

The README states that addressing of datasets is done "using the unique ID defined at creation time". When I look at the example, it looks like the addressing is done by the name of the Dataset CR. Can you maybe clarify that in the README?

Problem Mounting Existing Bucket

Hi there,

Thanks for the efforts; I'm finding this project very useful. One issue I'm having, however (apologies if it's obvious), is mounting an existing bucket, even though I am specifying the bucket in the secret. For example, this is for a non-AWS S3 endpoint:

apiVersion: v1
data:
  accessKeyID: accessKey
  bucket: bucket
  endpoint: endpoint
  region: ""
  secretAccessKey: secretAccessKey
kind: Secret
metadata:
  name: csi-s3-pvc
  namespace: test-namespace
type: Opaque

Rather than mounting the specified bucket, it instead generates a new bucket with the name of the Kubernetes PVC. I just want to confirm that I am doing things correctly and, if not, what I need to change.

Versions:
Attacher: 2.2.0
Provisioner: 1.6.0

support IBM Cloud IAM API Key instead of HMAC keypair when configuring COS bucket as dataset

The use case is about working with data on IBM COS. I followed the guide here: https://github.com/IBM/dataset-lifecycle-framework/wiki/Data-Volumes-for-Notebook-Servers#create-a-dataset-for-the-s3-bucket

where it creates a COS bucket, it needs:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "access_key_id"
    secretAccessKey: "secret_access_key"
    endpoint: "https://YOUR_ENDPOINT"
    bucket: "YOUR_BUCKET"
    region: "" #it can be empty

Which requires a service credential to be created.

I wonder if it can support creating dataset via:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    ibm_cloud_iam_apikey: "<base64 encoded api key>"
    bucket: "YOUR_BUCKET"
    region: "" #it can be empty

This would make a COS admin's life much easier, since it delegates secret management/rotation to IBM Cloud IAM.

Specify access mode in NFS Dataset

As asked in the comments of #67, I'm opening a new issue to track specifying the access mode in an NFS Dataset.

Goal: be able to specify whether the RWO or RWX access mode should be used.
A way to do that is briefly described in #67.

Support specifying which cache plugin (or None) to use for a particular dataset

I noticed that at the moment DLF will query for the installed caching plugins and will always use the first result (if any) to cache the dataset. However, in the case where multiple caching plugins are installed, it would come in handy to be able to specify which one to use to cache the dataset. Also, there are cases where the user may want to opt out of caching altogether.

As a solution to the above points, I am thinking that a new label on the dataset, with the key cache.plugin and the name of the caching plugin as its value, could be used to identify which of the installed plugins to use, as sketched below. Also, when the value is None the user could easily opt out of caching the dataset.
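
A sketch of what that could look like on a Dataset (this is the proposal above, not an existing feature; the cache.plugin key and the None value are the suggested convention):

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
  labels:
    cache.plugin: "None"    # proposed label: name of the caching plugin to use, or None to opt out
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"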

Any thoughts?

Thanks

Creating a dataset from s3 bucket with 1GB of data results in ~9000 GB PVC

Hello and thank you for this really cool project.

I am trying to create a dataset on a k8s cluster that is hosted on an OpenStack provider. It seems that every time I create a dataset I get a PVC and PV that are very large (9314Gi), even though the S3 bucket I am using only has dummy data that is less than 1 GB in total.

NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
example-dataset-gcs   Bound    pvc-85d52fa7-ab1e-4a4a-abb9-2ab687455188   9314Gi     RWX            csi-s3         22m

I thought this was happening because I was using OpenStack S3-compatible storage. However, the same thing occurred when using GCS (which is also supposed to be S3-compatible). I apologize that I could not try out AWS S3, because I do not have easy access to an account.

Is there a way to specify how big the PV/PVC can be?

This is my only issue. Everything else seems to work.

I followed the templates here for creating the dataset:
https://github.com/IBM/dataset-lifecycle-framework/blob/master/examples/templates/example-dataset-s3-secrets.yaml
https://github.com/IBM/dataset-lifecycle-framework/blob/master/examples/templates/example-s3-secret.yaml

I installed using this command:

kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf.yaml

For this I am using kubernetes 1.19.6 on a rke cluster deployed on an openstack provider.

Use base64-encoded secrets for dataset configuration

I was trying to configure an S3 dataset with a separate secret definition and realized that Datashim works only when the secrets are in the stringData format. Since kubectl creates secrets with base64-encoded values under data, it would be more convenient to allow both formats (see the comparison below).
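
For reference, the same secret written both ways; per the report above, only the first currently works with Datashim, while the second is what kubectl create secret generic produces:

apiVersion: v1
kind: Secret
metadata:
  name: s3-creds
type: Opaque
stringData:                         # plain-text values; the format Datashim currently accepts
  accessKeyID: myAccessKey
  secretAccessKey: mySecretKey
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-creds
type: Opaque
data:                               # base64-encoded values, as produced by kubectl create secret generic
  accessKeyID: bXlBY2Nlc3NLZXk=     # base64("myAccessKey")
  secretAccessKey: bXlTZWNyZXRLZXk= # base64("mySecretKey")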

Dataset using NFS cannot be attached to pods

kubectl logs csi-attacher-nfsplugin-0 -c csi-attacher on the cluster showed that the volume could not be attached as there was no patch permission for csi-attacher-nfs-plugin
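
A sketch of the kind of RBAC addition that would resolve this, assuming the csi-attacher ServiceAccount in the dlf namespace mentioned elsewhere on this page (resource names and exact verb list are illustrative and may need adjusting):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-csi-attacher-patch            # illustrative name
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "patch", "update"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-csi-attacher-patch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nfs-csi-attacher-patch
subjects:
  - kind: ServiceAccount
    name: csi-attacher                     # ServiceAccount name taken from the RBAC issue further down this page
    namespace: dlf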

Errors during `make minikube-install`

Running make minikube-install mostly seems to work, but between loading the images into minikube and applying the yaml, it spits out these errors:

/bin/bash: ./release-tools/generate-keys.sh: No such file or directory
/bin/bash: line 1: /tmp/tmp.w89e0ut2NV/ca.crt: No such file or directory
W0921 17:11:40.845263  162519 helpers.go:535] --dry-run is deprecated and can be replaced with --dry-run=client.
error: Cannot read file /tmp/tmp.w89e0ut2NV/webhook-server-tls.crt, open /tmp/tmp.w89e0ut2NV/webhook-server-tls.crt: no such file or directory
error: no objects passed to apply
/bin/bash: line 5: ./src/dataset-operator/deploy/webhook.yaml.template: No such file or directory
error: no objects passed to apply

It seems like the generate-keys.sh stuff happens in-cluster now, so maybe this is nothing to worry about? It's a bit disconcerting though :-)

DLF labels are having conflicts with Istio sidecar Injection

When trying to deploy a pod with DLF labels inside a namespace with Istio injection enabled, I'm seeing the errors below. It looks like there are some conflicts between the DLF and Istio mutations.

The Pod "nginx" is invalid: spec.volumes[4].name: Duplicate value: "example-dataset"

Here is my pod

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  labels:
    dataset.0.id: "example-dataset"
    dataset.0.useas: "mount"
spec:
  containers:
    - name: nginx
      image: nginx
EOF

Create new directory in NFS for each Dataset deployment

Hi,
today I tried to deploy both S3 and NFS Datasets in our environment and they work flawlessly.
However, I found out that NFS doesn't set up a new directory for each deployment but uses the same one for all.
Previously we were using nfs-client-provisioner (a helm chart), but it is deprecated now. With that, you configured the NFS path and server and it created a new directory for each PVC (under the configured NFS path).
This behaviour is very handy because, when you don't know in advance what you need, the deployment will create it for you and you don't have to worry about creating a new path for each Pod.

Could this be supported?

Watch multiple namespaces

By default we are looking at all the namespaces; we should pass a list of namespaces to monitor instead (see the sketch below).
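
One possible direction, assuming the operator-sdk conventions this project already uses: set WATCH_NAMESPACE on the dataset-operator Deployment instead of leaving it empty. This is only a sketch; whether the operator accepts a comma-separated list needs to be verified.

# hypothetical namespace list; an empty WATCH_NAMESPACE conventionally means "watch all namespaces"
kubectl set env deployment/dataset-operator -n dlf WATCH_NAMESPACE="team-a,team-b"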

Node-driver-registrar not working after reboot

Hi,
I would like to know whether the node-driver-registrar container in the csi-nodeplugin-nfsplugin daemonset is expected to fail and not restart after a node problem. If a node reboots, the node-driver-registrar container stops working with this log:

I0202 01:11:30.758475       1 node_register.go:58] Starting Registration Server at: /registration/nfs.csi.k8s.io-reg.sock
I0202 01:11:30.758743       1 node_register.go:67] Registration Server started at: /registration/nfs.csi.k8s.io-reg.sock
I0202 01:11:31.425742       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0202 01:11:54.363628       1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
E0202 14:07:00.314932       1 connection.go:129] Lost connection to unix:///plugin/csi.sock.

I found this out when I created a job which had to mount a PVC with the csi-nfs storageclass, and it did not get scheduled on the node which was rebooted yesterday. Logs from the job:

Warning  FailedMount  51s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[dest-volume], unattached volumes=[dest-volume default-token-9btxt]: timed out waiting for the condition
  Warning  FailedMount  45s (x9 over 2m53s)  kubelet            MountVolume.MountDevice failed for volume "pvc-bd3d6316-0342-45f0-981d-0cdc9ca165c3" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers

Shouldn't it be periodically ensured that the daemon is alive? I can create a cronjob or something for my cluster, but I thought I would ask first.

Limit access to datasets to specific pods running specific images

Currently the user can create a dataset like this:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    region: "" #it can be empty

Then if they specify a pod like this:

apiVersion: v1
kind: Pod
metadata:
  name: simple-nginx
  labels:
    dataset.0.id: "your-dataset"
    dataset.0.useas: "configmap"
spec:
  containers:
    - name: nginx
      image: nginx

It will be mutated as follows:

    - configMapRef:
        name: your-dataset
      prefix: your-dataset_
    - prefix: your-dataset_
      secretRef:
        name: your-dataset

As a result the credentials would be available in the pod with the your-dataset_ prefix as env variables.

However, there are scenarios where we only want authorized images to access the credentials and not any pod.

We are designing with @mrsabath how this could be achieved with https://github.com/IBM/trusted-service-identity and this issue will capture this process.

From the DLF perspective, we need to upload the secrets to Vault once a Dataset is created. The key-value paths would look like this:

<cluster>/<namespace>/<dataset>/accessKeyID
<cluster>/<namespace>/<dataset>/secretAccessKey
....

Then we need to modify our admission controller to add the necessary labels to the user's pod, which would allow TSI to check whether this pod can use these credentials or not. Ideally it should work as before and expose them as env variables:

<dataset>_accessKeyID = xxxxx
<dataset>_secretAccessKey = xxxx

In case the image is not authorized, the credentials should not be injected

Support H3 as an additional dataset type

H3 is an embedded High speed, High volume, and High availability object store, backed by a high-performance key-value store (RocksDB, Redis, etc.). H3 also provides a FUSE implementation to allow object access using file semantics. The CSI H3 mount plugin (csi-h3 for short), allows you to use H3 FUSE for implementing persistent volumes in Kubernetes.

In practice, csi-h3 implements a fast and efficient filesystem on top of a key-value store. With csi-h3 deployed, and a Redis server running, you just need to specify the Redis endpoint and the bucket name you want to use, in order to get a mountpoint for your containers. H3 is embedded in csi-h3, so there are no other requirements to install.

H3 could be supported in DLF, with a dataset definition like the following:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "H3"
    storageUri: "redis://redis.default.svc:6379"
    bucket: "b1"

Note that H3 supports many additional key-value stores, but in the distributed environment of Kubernetes, you need a key-value store that can be accessed through a network protocol. For persistent storage, Ardb provides Redis connectivity over a range of key-value implementations, including RocksDB, LevelDB, and others. In that case, the storageUri used will still be in the form redis://..., but the actual service will be provided by Ardb.

DaemonSets csi-s3 and csi-nodeplugin-nfsplugin are unable to create pods due to missing RBAC configuration for ServiceAccounts

The manifest file does not configure the SecurityContextConstraints for the following ServiceAccounts:

  • csi-provisioner
  • csi-s3
  • csi-nodeplugin
  • csi-attacher

As a result, on OpenShift, containers which are expecting to run in privileged mode are unable to get access to features such as hostNetwork, and hostPath. In turn, the DaemonSets csi-s3 and csi-nodeplugin-nfsplugin are unable to spawn pods on the cluster nodes because the ServiceAccounts csi-s3 and csi-nodeplugin are not registered as users in the privileged SecurityContextConstraints. Similar issues manifest due to the ServiceAccounts csi-attacher and csi-provisioner not being registered as users in the privileged SecurityContextConstraints.

Done when

The instructions in the README.md file address the RBAC configuration of the service accounts. This can either be done via

oc adm policy add-scc-to-user privileged -n dlf -z csi-provisioner -z csi-s3 -z csi-nodeplugin -z csi-attacher

Alternatively, the instructions could use the JSON patch feature of the kubectl utility like so:

kubectl patch scc privileged --type=json -p '[
  {"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-provisioner"}, 
  {"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-nodeplugin"}, 
  {"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-attacher"}, 
  {"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-s3"}]'
