
Containerized Data Importer

Licensed under the Apache License, Version 2.0.

Containerized-Data-Importer (CDI) is a persistent storage management add-on for Kubernetes. Its primary goal is to provide a declarative way to build Virtual Machine disks on PVCs for Kubevirt VMs.

CDI works with standard core Kubernetes resources and is storage device agnostic. While its primary focus is building disk images for Kubevirt, it is also useful outside of a Kubevirt context for initializing your Kubernetes Volumes with data.

Introduction

Kubernetes extension to populate PVCs with VM disk images or other data

CDI provides the ability to populate PVCs with VM images or other data upon creation. The data can come from different sources: a URL, a container registry, another PVC (clone), or an upload from a client.

DataVolumes

CDI includes a CustomResourceDefinition (CRD) that provides an object of type DataVolume. The DataVolume is an abstraction on top of the standard Kubernetes PVC and can be used to automate creation and population of a PVC with data. Although you can use PVCs directly with CDI, DataVolumes are the preferred method since they offer full functionality, a stable API, and better integration with kubevirt. More details about DataVolumes can be found here.

Import from URL

This method is selected when you create a DataVolume with an http source. CDI will populate the volume using a pod that will download from the given URL and handle the content according to the contentType setting (see below). It is possible to configure basic authentication using a secret and specify custom TLS certificates in a ConfigMap.
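As a sketch (assuming the cdi.kubevirt.io/v1beta1 API; the URL, Secret and ConfigMap names, and size below are placeholders), such a DataVolume might look like:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-from-url
spec:
  source:
    http:
      url: "https://example.com/disk.qcow2"  # placeholder URL
      secretRef: endpoint-secret             # optional: Secret holding basic-auth credentials
      certConfigMap: tls-certs               # optional: ConfigMap holding custom TLS certificates
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```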

Import from container registry

When a DataVolume has a registry source CDI will populate the volume with a Container Disk downloaded from the given image URL. The only valid contentType for this source is kubevirt and the image must be a Container Disk. More details can be found here.
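A sketch of a registry-source DataVolume (the image URL and names are placeholders; registry images are referenced with a docker:// scheme prefix):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-from-registry
spec:
  contentType: kubevirt   # the only valid contentType for a registry source
  source:
    registry:
      url: "docker://quay.io/example/example-container-disk:latest"  # placeholder Container Disk image
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```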

Clone another PVC

To clone a PVC, create a DataVolume with a pvc source and specify namespace and name of the source PVC. CDI will attempt an efficient clone of the PVC using the storage backend if possible. Otherwise, the data will be transferred to the target PVC using a TLS secured connection between two pods on the cluster network. More details can be found here.
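A minimal sketch of a clone DataVolume (the source namespace and PVC name are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: cloned-pvc
spec:
  source:
    pvc:
      namespace: source-namespace   # placeholder: namespace of the source PVC
      name: source-pvc              # placeholder: name of the source PVC
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```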

Upload from a client

To upload data to a PVC from a client machine first create a DataVolume with an upload source. CDI will prepare to receive data via an upload proxy which will transit data from an authenticated client to a pod which will populate the PVC according to the contentType setting. To send data to the upload proxy you must have a valid UploadToken. See the upload documentation for details.
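An upload DataVolume takes no source parameters; CDI simply prepares the PVC and waits for data to arrive via the upload proxy. A minimal sketch:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: upload-target
spec:
  source:
    upload: {}   # no parameters: CDI waits for data sent through the upload proxy
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```

Once the DataVolume is ready for upload, the client sends the data to the upload proxy together with a valid UploadToken, as described in the upload documentation.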

Prepare an empty Kubevirt VM disk

The special source blank can be used to populate a volume with an empty Kubevirt VM disk. This source is valid only with the kubevirt contentType. CDI will create a VM disk on the PVC which uses all of the available space. See here for an example.
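A minimal sketch of a blank-disk DataVolume (name and size are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: blank-disk
spec:
  contentType: kubevirt   # blank is valid only with the kubevirt contentType
  source:
    blank: {}             # no parameters: CDI creates an empty VM disk on the PVC
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```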

Import from oVirt

Virtual machine disks can be imported from a running oVirt installation using the imageio source. CDI will use the provided credentials to securely transfer the indicated oVirt disk image so that it can be used with kubevirt. See here for more information and examples.
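A hedged sketch of an imageio-source DataVolume (the engine URL, disk ID, and Secret/ConfigMap names are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-from-ovirt
spec:
  source:
    imageio:
      url: "https://ovirt-engine.example.com/ovirt-engine/api"  # placeholder oVirt engine API URL
      diskId: "disk-uuid-placeholder"                           # ID of the oVirt disk to import
      secretRef: engine-secret                                  # Secret holding engine credentials
      certConfigMap: engine-ca                                  # ConfigMap holding the engine CA certificate
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```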

Import from VMware

Disks can be imported from VMware with the vddk source. CDI will transfer the disks using vCenter/ESX API credentials and a user-provided image containing the non-redistributable VDDK library. See here for instructions.
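A hedged sketch of a vddk-source DataVolume (the endpoint, UUID, datastore path, thumbprint, and Secret name are all placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-from-vmware
spec:
  source:
    vddk:
      url: "https://vcenter.example.com"              # placeholder vCenter/ESX endpoint
      uuid: "vm-uuid-placeholder"                     # UUID of the source VM
      backingFile: "[datastore1] vm/vm-disk.vmdk"     # placeholder path of the source disk
      thumbprint: "certificate-thumbprint-placeholder" # SSL thumbprint of the vCenter/ESX host
      secretRef: vcenter-secret                       # Secret holding vCenter/ESX API credentials
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```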

Content Types

CDI features specialized handling for two types of content: Kubevirt VM disk images and tar archives.

  • The kubevirt content type indicates that the data being imported should be treated as a Kubevirt VM disk. CDI will automatically decompress and convert the file from qcow2 to raw format if needed. It will also resize the disk to use all available space.
  • The archive content type indicates that the data is a tar archive. Compression is not yet supported for archives. CDI will extract the contents of the archive into the volume, which can then be used with either a regular pod or a VM using Kubevirt's filesystem feature.

The content type can be selected by specifying the contentType field in the DataVolume. kubevirt is the default content type.

CDI only supports certain combinations of source and contentType as indicated below:

  • http → kubevirt, archive
  • registry → kubevirt
  • pvc → Not applicable - content is cloned
  • upload → kubevirt
  • imageio → kubevirt
  • vddk → kubevirt
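For example, an http import can be combined with the archive content type as in this sketch (the URL and size are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive
spec:
  contentType: archive   # overrides the kubevirt default
  source:
    http:
      url: "https://example.com/files.tar"  # placeholder: an uncompressed tar archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```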

Deploy it

Deploying the CDI controller is straightforward. In this document the default namespace is used, but in a production setup a protected namespace that is inaccessible to regular users should be used instead.

$ export VERSION=$(curl -s https://api.github.com/repos/kubevirt/containerized-data-importer/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
$ kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml
$ kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-cr.yaml

Use it

Create a DataVolume and populate it with data from an http source

$ kubectl create -f https://raw.githubusercontent.com/kubevirt/containerized-data-importer/$VERSION/manifests/example/import-kubevirt-datavolume.yaml

There are quite a few examples in the example manifests; use them as a reference to create DataVolumes from additional sources like registries, S3, GCS, and your local system.

Hack it

CDI includes a self contained development and test environment. We use Docker to build, and we provide a simple way to get a test cluster up and running on your laptop. The development tools include a version of kubectl that you can use to communicate with the cluster. A wrapper script to communicate with the cluster can be invoked using ./cluster-up/kubectl.sh.

$ mkdir $GOPATH/src/kubevirt.io && cd $GOPATH/src/kubevirt.io
$ git clone https://github.com/kubevirt/containerized-data-importer && cd containerized-data-importer
$ make cluster-up
$ make cluster-sync
$ ./cluster-up/kubectl.sh .....

For development on an external cluster (not provisioned by our CI), check out the external provider.

Storage notes

CDI is designed to be storage agnostic. Since it works with the kubernetes storage APIs it should work well with any configuration that can produce a Bound PVC. The following are storage-specific notes that may be relevant when using CDI.

  • NFSv3 is not supported: CDI uses qemu-img to manipulate disk images and this program uses locking which is not compatible with the obsolete NFSv3 protocol. We recommend using NFSv4.

Connect with us

We'd love to hear from you, reach out on Github via Issues or Pull Requests!

Hit us up on Slack

Shoot us an email at: [email protected]

More details

  1. Hacking details
  2. Design docs
  3. Kubevirt documentation

Contributors

aglitke, akalenyu, alicefr, alromeros, annastopel, arnongilboa, assafad, awels, brybacki, copejon, danielerez, davidvossel, eduardgomezescandell, fabiand, igoihman, j-griffith, jeffvance, kubevirt-bot, maya-r, mhenriks, mrnold, nunnatsa, rollandf, screeley44, shellyka13, tomob, u5surf, visheshtanksale, zherman0, zvikorn


containerized-data-importer's Issues

Do we have a plan to integrate this repo into kubevirt/kubevirt ?

I'm researching how to set up e2e tests, and there are two options: 1. set up CI in this repo, or 2. integrate the e2e tests into kubevirt/kubevirt. Integrating into kubevirt/kubevirt has some pros, but it is pointless unless the code base is also integrated into kubevirt/kubevirt, since otherwise PRs would not trigger the tests. So I was asking whether there's a long-term plan for doing that?

@jeffvance @copejon @aglitke

Validate image after copy

We are likely exposed to vulnerabilities by allowing users to import any image into the kubernetes cluster.

CDI version issues

CDI "releases" are arbitrary, do not represent any functional composition, and only serve to keep track of the latest kubevirt version string. There really is no CDI version; instead we have obscured "latest" with kubevirt's version tag.

Per a recent email from @davidvossel:
CDI's releases are overwritten with every commit. This means that someone using CDI v0.5.2 will get different code each time they deploy, depending on what happens upstream.

I propose that we either:

  1. make CDI releases real and immutable. This means CDI updates its release tag when we have collected a group of PRs worthy of being released, and we support that release for some reasonable number of months. It also requires that kubevirt-ansible be able to consume a specific CDI release that may differ from the target kubevirt release tag. Or,
  2. stop pretending that CDI has releases and use "latest".

Interested in everyone's thoughts on this. @copejon @screeley44 @erinboyd @davidvossel @aglitke

Containerized only build breaks downstream

Containerizing the build process removed the ability to do a non-containerized build. Can the ability to do a non-containerized build be put back into the makefile so that the option is available?

add label to pvc if user did not create it

We should add a label to the pvc object if the user has not added one. It's not a requirement for processing, but it is a nice UX improvement that gives the ability to easily filter CDI-type resources using kubectl, e.g.:

   kubectl get pvc -l app=containerized-data-importer --all-namespaces

New containerized `make` requires --privileged or `setenforce 0`

make fails with a permissions error on a RHEL 7.4 VM, e.g. with make controller:

stat /go/src/github.com/kubevirt/containerized-data-importer/cmd/controller/controller.go: permission denied

Also, shelling into the golang container shows the permissions issue:

: docker run -it --rm -w /go/src/github.com/kubevirt/containerized-data-importer -v $PWD:/go/src/github.com/kubevirt/containerized-data-importer 3f30f1fc3c43 sh
# pwd
/go/src/github.com/kubevirt/containerized-data-importer
# ls -l
ls: cannot open directory '.': Permission denied
# id
uid=0(root) gid=0(root) groups=0(root)
# mount|grep import
/dev/mapper/rhel-root on /go/src/github.com/kubevirt/containerized-data-importer type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

Running the golang container with --privileged, or running setenforce 0 on the host, fixes the permissions problem.

cdi rbac roles

The advice in the primary README.md to bind the default service account for a namespace with cluster-admin privileges isn't wise. Even with the disclaimer, we don't want that accidentally replicated.

For example, I already see that kubevirt-ansible is using this method to deploy CDI right now. I'd hate for something like this to accidentally make its way into production someday.

short term fix

As a short term method, we'd be better off giving the CDI deployment a service-account with cluster-admin roles rather than binding cluster-admin to the default account in a namespace.

I'd recommend making this change asap before CDI gains any more traction.

long term fix

add a ServiceAccount, RBAC ClusterRole, and ClusterRoleBinding into a manifest and make the cdi-controller-deployment.yaml reference the service account.

The kubevirt manifest has some examples of how we do this for our controllers.
https://github.com/kubevirt/kubevirt/blob/master/manifests/release/kubevirt.yaml.in

Improve controller unit tests

The current controller unit tests are more like functional tests in that they call higher-level functions (NewController, ProcessNextItem) rather than some of the lower-level funcs (which may need to be exported?).

  1. We should consider moving the current unit tests to /test/ so they are treated as functional tests and create real controller unit tests.
  2. We probably should remove the tests using an empty namespace since this is not a supported condition.

Do not drop items from the queue if they error during processing

The logic should be reversed so that Forget() is called on the key only if processItem() does not return an error. Errors that occur in processing may be ephemeral and only require that the item be retried. As such, keys that error should be requeued.

dockerhub image tag latest needs to be the latest image

When I was working on the CDI e2e test in kubevirt-ansible PR kubevirt/kubevirt-ansible#246, I found that the latest docker images here haven't been updated for a month; the actual latest image is v0.5.0-alpha.0.

This bothers me because the old image doesn't have any label tagged on the importer pod, so it's not convenient to sift out the results you want to check (you could use a regex, but I think using a label is best practice).

So my understanding is: latest == the latest usable image we pushed == v0.5.0-alpha.0.

Could someone help update those images, if I understand correctly? Also related to kubevirt/kubevirt-ansible#243.

https://hub.docker.com/r/kubevirt/cdi-controller/
https://hub.docker.com/r/kubevirt/cdi-importer/

Streaming Data Conversion test fails

Test failure after the travis fix:

• Failure [0.037 seconds]
Streaming Data Conversion
/home/travis/gopath/src/github.com/kubevirt/containerized-data-importer/test/datastream/datastream_test.go:26
  when data is in a supported file format
  /home/travis/gopath/src/github.com/kubevirt/containerized-data-importer/test/datastream/datastream_test.go:28
    should convert .qcow2 [It]
    /home/travis/gopath/src/github.com/kubevirt/containerized-data-importer/test/datastream/datastream_test.go:79
    Test data filename doesn't match expected file name.
    Expected
        <string>: tinyCore.qcow2
    to equal
        <string>: tinyCore.iso.qcow2

Versioning releases

It's high time that CDI releases be versioned in a format that conforms with Openshift and Kubevirt (v#.#.#). A versioning scheme needs to be agreed upon to mark major, minor, and patch milestones. Some automation should be implemented for incrementing each to avoid human error.

controller needs to track pvc updates

Consider:

  1. create pvc but forget endpoint anno
  2. controller sees pvc but calls q.forget()
  3. user edits pvc and adds ep anno
  4. controller never sees the pvc update.
    Therefore the user has to delete the pvc, edit it, then re-create it.

Need ginkgo and gomega vendored

:  cd pkg/controller/
: ls
controller.go  controller_suite_test.go  controller_test.go  util.go
: go test -c
# github.com/kubevirt/containerized-data-importer/pkg/controller
controller_suite_test.go:4:2: cannot find package "github.com/onsi/ginkgo" in any of:
	/root/go/src/github.com/kubevirt/containerized-data-importer/vendor/github.com/onsi/ginkgo (vendor tree)
	/usr/local/go/src/github.com/onsi/ginkgo (from $GOROOT)
	/root/go/src/github.com/onsi/ginkgo (from $GOPATH)
FAIL	github.com/kubevirt/containerized-data-importer/pkg/controller [setup failed]

cdi label names

We need to consider removing the 'kubevirt' prefix from all cdi labels, because we want to eventually decouple cdi from kubevirt.

Add access to secrets for RBAC

rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch", "create"]

need to clean up some general flow and error recovery scenarios

  1. If the importer pod fails to create, don't forget the key in processItem
    - to ensure we try to process it again

  2. importer race condition - multiple imports on a single pvc

$ kubectl logs importer-golden-pvcmzqld
I0425 17:36:37.917054       5 importer.go:35] main: Starting importer
I0425 17:36:37.919191       5 importer.go:50] main: importing file "tinyCore.qcow2.gz"
W0425 17:36:37.919256       5 dataStream.go:42] NewDataStream: IMPORTER_ACCESS_KEY_ID and/or IMPORTER_SECRET_KEY env variables are empty
I0425 17:36:37.919470       5 dataStream.go:71] Using S3 client to get data
I0425 17:36:37.919657       5 dataStream.go:78] Attempting to get object "s3://kubevirt-images/tinyCore.qcow2.gz" via S3 client
I0425 17:36:37.919850       5 importer.go:58] Beginning import from "/tinyCore.qcow2.gz"
I0425 17:36:37.919871       5 decompress.go:26] UnpackData: checking compressed and/or archive for file "tinyCore.qcow2.gz"
I0425 17:36:37.919875       5 decompress.go:45] DecompressData: checking if "tinyCore.qcow2.gz" is compressed
I0425 17:36:38.268947       5 decompress.go:59] DecompressData: decompressed "tinyCore.qcow2.gz"
I0425 17:36:38.268995       5 decompress.go:70] DearchiveData: checking if "tinyCore.qcow2" is an archive file
I0425 17:36:38.269281       5 util.go:44] StreamDataToFile: begin import...
I0425 17:36:52.916024       5 importer.go:88] main: converting qcow2 image to raw
I0425 17:36:52.949694       5 importer.go:98] main: Import complete, exiting
jcope@jonMBP | ~/.../kubevirt/manifests
$ kubectl logs importer-golden-pvcs9ntq
I0425 17:36:39.276378       7 importer.go:35] main: Starting importer
I0425 17:36:39.276943       7 importer.go:50] main: importing file "tinyCore.qcow2.gz"
W0425 17:36:39.276975       7 dataStream.go:42] NewDataStream: IMPORTER_ACCESS_KEY_ID and/or IMPORTER_SECRET_KEY env variables are empty
I0425 17:36:39.277832       7 dataStream.go:71] Using S3 client to get data
I0425 17:36:39.277960       7 dataStream.go:78] Attempting to get object "s3://kubevirt-images/tinyCore.qcow2.gz" via S3 client
I0425 17:36:39.278006       7 importer.go:58] Beginning import from "/tinyCore.qcow2.gz"
I0425 17:36:39.278050       7 decompress.go:26] UnpackData: checking compressed and/or archive for file "tinyCore.qcow2.gz"
I0425 17:36:39.278063       7 decompress.go:45] DecompressData: checking if "tinyCore.qcow2.gz" is compressed
I0425 17:36:39.536339       7 decompress.go:59] DecompressData: decompressed "tinyCore.qcow2.gz"
I0425 17:36:39.536358       7 decompress.go:70] DearchiveData: checking if "tinyCore.qcow2" is an archive file
I0425 17:36:39.536531       7 util.go:44] StreamDataToFile: begin import...
I0425 17:36:41.136349       7 importer.go:88] main: converting qcow2 image to raw
I0425 17:36:41.166394       7 importer.go:98] main: Import complete, exiting

DataVolume CRD Implementation Tasks

Tasks for Initial PR: #189

  • add new package dependencies
  • define DataVolume API and add client/informer generators
  • refactor cdi controller component in preparation for multiple controller loops
  • introduce DataVolume Controller
  • add DataVolume controller unit tests

Followup Doc Tasks

  • add DataVolume documentation to CDI README

Followup Dev Tasks

  • add filesystem source
  • add DataVolume events
  • add DataVolume functional tests
  • add autogen tests to travis (verifies autogen code is up to date)
  • Autogenerated openapiv3 crd validation
  • validating webhook
  • autogenerated swagger (similar to what we have for kubevirt)

import-pod error will conflict with next creation

Let's say I create a pvc but provide a wrong endpoint url; the pod created by cdi will then error out. Once I figure out that my url was wrong, I have to manually delete the pod, then delete the pvc, then change the pvc and re-create it.

But ideally, I would just oc edit the pvc to set a valid url, then let cdi do the rest of the job.

determine image type w/o relying on file extension names

Today VM images are validated only by their filename extension. We could improve this by peeking inside the image and verifying headers, checksums (if any), etc., to be more certain that we have a supported image type.
Something to consider: should we use the file's extension (suffix) at all? What if an image is named
foo.tar.gz but it is not a gzip file? Is this an error, or do we ignore the extension?

(travis) All tests fail due to minikube error

Here's an example:
https://travis-ci.org/kubevirt/containerized-data-importer/builds/365227336#L518

Setting environment variables from .travis.yml
$ export CHANGE_MINIKUBE_NONE_USER=true
$ export K8S_VER=1.9.0
$ export K6T_VER=0.3.0
$ export SRC="http://www.tinycorelinux.net/9.x/x86/release/Core-current.iso"
4.79s$ GIMME_OUTPUT="$(gimme 1.10 | tee -a $HOME/.bashrc)" && eval "$GIMME_OUTPUT"
go version go1.10 linux/amd64
$ export GOPATH=$HOME/gopath
$ export PATH=$HOME/gopath/bin:$PATH
$ mkdir -p $HOME/gopath/src/github.com/kubevirt/containerized-data-importer
$ rsync -az ${TRAVIS_BUILD_DIR}/ $HOME/gopath/src/github.com/kubevirt/containerized-data-importer/
$ export TRAVIS_BUILD_DIR=$HOME/gopath/src/github.com/kubevirt/containerized-data-importer
$ cd $HOME/gopath/src/github.com/kubevirt/containerized-data-importer
0.01s
$ gimme version
v1.3.0
$ go version
go version go1.10 linux/amd64
go.env
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/travis/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/travis/gopath"
GORACE=""
GOROOT="/home/travis/.gimme/versions/go1.10.linux.amd64"
GOTMPDIR=""
GOTOOLDIR="/home/travis/.gimme/versions/go1.10.linux.amd64/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build731311504=/tmp/go-build -gno-record-gcc-switches"
Using Go 1.5 Vendoring, not checking for Godeps
install
0.00s$ true
before_script.1
1.79s$ curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v$K8S_VER/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
before_script.2
0.49s$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
41.27s$ sudo minikube start --vm-driver=none --kubernetes-version=v$K8S_VER
Starting local Kubernetes v1.9.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Downloading kubeadm v1.9.0
Downloading kubelet v1.9.0
Finished Downloading kubelet v1.9.0
Finished Downloading kubeadm v1.9.0
E0411 17:20:38.620011    4114 start.go:234] Error updating cluster:  starting kubelet: running command: 
sudo systemctl daemon-reload &&
sudo systemctl enable kubelet &&
sudo systemctl start kubelet
: exit status 1
