contiv-experimental / volplugin
**EXPERIMENTAL** Contiv Storage: Policy backed Clustered Storage (via Ceph or NFS) for Docker
License: Other
Some ideas (please add your own!):
Splitting the policy into runtime and create-time configuration seems like the best approach: it allows the runtime configuration to be modified while keeping the create-time configuration static, something that has been a problem up to this point.
The runtime configuration consists of things that would be applied at a variable point, such as mount time.
The create-time configuration consists of things that must be applied at create time and cannot be modified at runtime, such as size.
Logic would look something like this:
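A rough sketch of that split in Go (the field names here are hypothetical, not the actual config schema):

```go
package main

import "fmt"

// CreateOptions holds settings fixed at volume-creation time; they cannot
// change afterwards (hypothetical fields).
type CreateOptions struct {
	Size       string // e.g. "10MB"
	FileSystem string // e.g. "ext4"
	Pool       string
}

// RuntimeOptions holds settings that may be changed after creation and are
// applied at mount time (hypothetical fields).
type RuntimeOptions struct {
	ReadIOPS  int
	WriteIOPS int
	Snapshots bool
}

// Policy combines the two halves; only the runtime half may be updated.
type Policy struct {
	Create  CreateOptions
	Runtime RuntimeOptions
}

// UpdateRuntime replaces the runtime half, leaving create-time settings alone.
func (p *Policy) UpdateRuntime(r RuntimeOptions) {
	p.Runtime = r
}

func main() {
	p := Policy{
		Create:  CreateOptions{Size: "10MB", FileSystem: "ext4", Pool: "rbd"},
		Runtime: RuntimeOptions{ReadIOPS: 100, WriteIOPS: 100, Snapshots: true},
	}
	p.UpdateRuntime(RuntimeOptions{ReadIOPS: 500, WriteIOPS: 500, Snapshots: true})
	fmt.Println(p.Create.Size, p.Runtime.ReadIOPS) // prints "10MB 500"
}
```

The point of the split is that UpdateRuntime structurally cannot touch the create-time half.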
This is a big change and I believe I will take it myself.
volsupervisor needs an HTTP REST endpoint to service a signal from applications or database containers that wish to sync their filesystems before snapshotting, for a consistent snapshot; this signal would create a manual snapshot that would still get cleaned up in a normal keep scenario.
The signaling will be toggleable through the policy by adding a signal: true property in the snapshot runtime configuration. It is orthogonal to the standard snapshots (which should now accept a frequency of null to turn them off, but still allow for keep pruning).
It would also be nice if we had a tool that accomplished the client side of this operation, e.g., "run a command then fire the snapshot, then run another command". Wrapping this will play well into the security story later.
Currently this would involve changes to the volmaster, config library, and volcli. If the volmaster were smart enough to return 404 on etcd "key not found" failures, it would be a much nicer user experience, and easier to test in the long term, I think. It could also exit with a different value if the etcd key is missing.
Hi there.
We are about to open source our Ceph RDB driver plugin and wondered if you all would like to collaborate on this to make one good one?
Let me know and thanks!
Hey there,
I am carefully crawling towards Ceph-backed volumes after I played around with the new networking:
http://qnib.org/2015/11/29/multi-host-slurm/
I would like to expand the post with ceph-backed volumes. For now you only support etcd, if I read the config/config.go file right... there is not [yet] a way to plug in Consul. No big deal, I can spin up an etcd instance, no problem.
Just so that I know. Maybe others are interested in this as well.
Cheers, keep up the great work
Christian
volcli has many tools that use this notation: volcli command subcommand tenant volume, which I think would be more canonically represented as volcli command subcommand tenant/volume, as it is used inside docker.
I am just testing this out and I get an error when creating a volume and attempting to mount it into a container.
Here are some logs from volplugin:
time="2015-11-30T19:15:48Z" level=debug msg="Dispatching Path with {\"Name\":\"tenant1/foo\"}"
time="2015-11-30T19:15:48Z" level=info msg="Returning mount path to docker for volume: \"tenant1/foo\""
time="2015-11-30T19:15:48Z" level=warning msg="Returning HTTP error handling plugin negotiation: Requesting tenant configuration Status was not 200: was 404: \"\""
time="2015-11-30T19:15:48Z" level=debug msg="Dispatching Mount with {\"Name\":\"tenant1/foo\"}"
time="2015-11-30T19:15:48Z" level=info msg="Mounting volume \"tenant1/foo\""
time="2015-11-30T19:15:48Z" level=warning msg="Returning HTTP error handling plugin negotiation: Could not determine tenant configuration Status was not 200: was 404: \"\""
The logs from volmaster show that it gets the requests:
time="2015-11-30T19:15:48Z" level=debug msg="Dispatching /request with {\"volume\":\"tenant1/foo\",\"tenant\":\"tenant1\"}"
time="2015-11-30T19:15:48Z" level=debug msg="Dispatching /request with {\"volume\":\"foo\",\"tenant\":\"tenant1\"}"
but the volume is different. Should https://github.com/contiv/volplugin/blob/master/volplugin/handlers.go#L132 be volConfig, err := requestVolumeConfig(master, uc.Tenant, uc.Name)?
A policy system for groups with large numbers of volumes to garbage collect volumes after they have been stale for a certain period of time.
Probably should be applied at the policy level. Volsupervisor can take care of the dirty work.
Currently, if the rbd tool hangs, so does our suite of tools. Incorporate a sideband timeout that errors out appropriately when this happens, so the administrator can fix the ceph installation (which is probably already hosed).
I wonder if IOPS in particular should be handled as a part of the volume policy, or as a parameter on mount. Currently it does the former. I think it might be best if it did the latter. I'm not sure what the best solution is right now.
reproduction:
volcli policy upload policy1 < /testdata/intent1.json
volcli volume create policy1/test
volcli policy upload policy1 < /testdata/fastsnap.json
volcli volume create policy1/test2
volcli volume list-all
<output>
policy1/test2
</output>
The resulting policy1/test still exists, but its etcd record is gone, basically orphaning it.
This should be an easy patch, so I would ask that someone else on the team take it?
Right now we pass these around on the daemon object quite a bit, and in volsupervisor's case, a lot of arguments travel down the stack. We should organize these into structs that the methods live on, so that they can re-use the same data without having ugly call sites.
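A before/after sketch of that refactor (all names hypothetical):

```go
package main

import "fmt"

// Before: every function threads the same values down the stack.
func removeVolumeOld(etcdHosts []string, masterHost, tenant, volume string, debug bool) string {
	return fmt.Sprintf("remove %s/%s via %s", tenant, volume, masterHost)
}

// After: shared state lives on one struct and the methods pick it up.
type DaemonConfig struct {
	EtcdHosts []string
	Master    string
	Debug     bool
}

func (d *DaemonConfig) RemoveVolume(tenant, volume string) string {
	return fmt.Sprintf("remove %s/%s via %s", tenant, volume, d.Master)
}

func main() {
	d := &DaemonConfig{Master: "10.1.0.11:9005"}
	fmt.Println(d.RemoveVolume("tenant1", "foo")) // prints "remove tenant1/foo via 10.1.0.11:9005"
}
```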
I've got a straightforward test setup going. With rbd I can create, map and mount volumes.
Running:
./volmaster --etcd http://172.16.31.215:24 --debug &
./volplugin --debug &
docker volume create -d volplugin --name='tenant1/test3'
I get
#( from volmaster)
DEBU[0109] Dispatching /create with {"tenant":"tenant1","volume":"test3","opts":{}}
DEBU[0110] mapped volume "tenant1.test3" as "/dev/rbd1"
WARN[0110] Returning HTTP error handling plugin negotiation: Creating volume exit status 1
#( from volplugin)
DEBU[0097] Dispatching Create with {"Name":"tenant1/test3","Opts":{}}
WARN[0098] Returning HTTP error handling plugin negotiation: Could not determine tenant configuration Status was not 200: was 500: "Creating volume exit status 1"
#(from docker)
Error response from daemon: Plugin Error: VolumeDriver.Create, {"Mountpoint":"","Err":"Could not determine tenant configuration Status was not 200: was 500: \"Creating volume exit status 1\""}
The end result is that the rbd volume is created, mapped, and the ext4 fs is created on it, but volcli volume list-all doesn't show it. This is what's in etcd:
/volplugin/volumes
/volplugin/volumes/tenant1
/volplugin/users
/volplugin/users/tenant1
/volplugin/users/tenant1/test3
/volplugin/tenants
/volplugin/tenants/tenant1
I'm not sure what other debugging I can do on this. It's not clear to me what step of the create is failing.
After creating a volume, parameters like read/write iops should be tweakable. Currently, the system is designed to do all the work up front, and future attempts to modify the parameters are ignored.
What we need to do is identify modifiable parameters (e.g., iops or snapshot schedule) and partition them from the unmodifiable ones (e.g., volume group or pool), and provide some kind of gate that allows the modifiable ones to dispatch handlers which then correct anything that needs to be corrected, overlay the new configuration over the old one, and push the corrected result safely. This will need to happen on the volmaster and propagate down to the volplugin, so we need some kind of pub/sub functionality I think.
It will also need to acquire a use lock for the transaction processing time.
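A hedged sketch of the gate itself (the struct and its fields are hypothetical stand-ins for the real volume record):

```go
package main

import "fmt"

// VolumeConfig is a hypothetical stand-in for the real volume record.
type VolumeConfig struct {
	Pool      string // create-time: unmodifiable
	Size      string // create-time: unmodifiable
	ReadIOPS  int    // runtime: modifiable
	WriteIOPS int    // runtime: modifiable
}

// applyUpdate overlays the modifiable fields of next onto cur, rejecting
// any change to the create-time ones.
func applyUpdate(cur, next VolumeConfig) (VolumeConfig, error) {
	if cur.Pool != next.Pool || cur.Size != next.Size {
		return cur, fmt.Errorf("cannot modify create-time parameters")
	}
	cur.ReadIOPS = next.ReadIOPS
	cur.WriteIOPS = next.WriteIOPS
	return cur, nil
}

func main() {
	cur := VolumeConfig{Pool: "rbd", Size: "10MB", ReadIOPS: 100}
	updated, err := applyUpdate(cur, VolumeConfig{Pool: "rbd", Size: "10MB", ReadIOPS: 500})
	fmt.Println(updated.ReadIOPS, err) // prints "500 <nil>"
}
```

The dispatch-handlers/pub-sub and use-lock pieces would sit around a function like this on the volmaster.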
We don't right now (we handle this in etcd), but we should do both for safety.
The info package allows us to export some basic information about the health of the services. We can probably use this to define thresholds within the services themselves to take action based on how healthy they are; e.g., a volplugin that is failing a lot of mounts, or has an obnoxious number of goroutines, may choose to restart itself or refuse to mount new devices until serviced.
It should also be able to signal a REST endpoint with any health updates.
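A sketch of what such a threshold check could look like (the field names and threshold values are made up):

```go
package main

import (
	"fmt"
	"runtime"
)

// Health summarizes counters the info package might export (hypothetical fields).
type Health struct {
	Goroutines    int
	MountFailures int
}

// unhealthy applies simple thresholds; a service crossing them could refuse
// new mounts or restart itself, and POST the update to a REST endpoint.
func unhealthy(h Health) bool {
	return h.Goroutines > 10000 || h.MountFailures > 10
}

func main() {
	h := Health{Goroutines: runtime.NumGoroutine(), MountFailures: 0}
	fmt.Println(unhealthy(h)) // prints "false" for a freshly started process
}
```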
This is more of an observation than an issue, so feel free to close it.
I am trying to run the volplugin daemons in containers on a system that has ceph installed in containers.
I am sure I am missing something, but I thought that docker plugins were designed to be able to run in containers. So this is an experiment.
I run some containers as follows:
docker run -d --name volmaster \
--net=host --privileged -e KV_IP=10.1.0.11 \
vol \
volmaster --debug --listen 10.1.0.11:9005 --etcd http://10.1.0.11:2379
docker run -d --name volsuper \
--net=host --privileged -e KV_IP=10.1.0.11 \
vol \
volsupervisor --etcd http://10.1.0.11:2379
docker run -d --name volplug \
--net=host --privileged -e KV_IP=10.1.0.11 \
-v /var/run/docker:/var/run/docker \
-v /sys:/sys -v /dev:/dev \
volplug \
volplugin --debug --master 10.1.0.11:9005
In the volplug container I can list my rbd images and map/unmap them fine with rbd
commands.
docker volume create -d volplugin --name=tenant1/test
works fine.
However docker run -it --rm -v tenant1/test:/mnt ubuntu bash
will generate an error from docker:
Timestamp: 2015-12-01 16:14:49.745337292 -0500 EST
Code: System error
Message: stat /mnt/ceph/rbd/tenant1.test: no such file or directory
Frames:
---
0: setupRootfs
Package: github.com/opencontainers/runc/libcontainer
File: rootfs_linux.go@40
---
1: Init
Package: github.com/opencontainers/runc/libcontainer.(*linuxStandardInit)
File: standard_init_linux.go@57
---
2: StartInitialization
Package: github.com/opencontainers/runc/libcontainer.(*LinuxFactory)
File: factory_linux.go@242
---
3: initializer
Package: github.com/docker/docker/daemon/execdriver/native
File: init.go@35
---
4: Init
Package: github.com/docker/docker/pkg/reexec
File: reexec.go@26
---
5: main
Package: main
File: docker.go@18
---
6: main
Package: runtime
File: proc.go@63
---
7: goexit
Package: runtime
File: asm_amd64.s@2232
Error response from daemon: Cannot start container 18abc76c63d7b2661b6423e553c5c03d03bca6874681cd1564fc225b7e3a923e: [8] System error: stat /mnt/ceph/rbd/tenant1.test: no such file or directory
I am guessing that I am not propagating the mounted filesystem correctly to the system docker. I have put a check in the storage/ceph/ceph.go code and the filesystem is mounted and readable in the volplug container. Everything then gets cleaned up after the error.
I have MountFlags=slave
set for the docker daemon.
Should I share the mount point /mnt/ceph
?
Are there any hints on how to run the volplugin
on CoreOS?
We run a fully Dockerized Mesos and Ceph environment, and for using stateful containers, we'd like to use Docker volumes on Ceph. As far as I understand, the installation of this plugin requires some libraries to be present on the host system.
As CoreOS works a little differently than CentOS/Debian/etc., is there a way to run the plugin within a container itself? If not, how would I otherwise be able to use it on CoreOS?
Several error messages are unclear or incomplete, so we need to flesh this out some.
This breaks VM setup about 50% of the time. Use a static file in the ansible subtree instead.
something that supports 20G or 14.4k.
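Assuming this is about parsing human-readable quantities in policy values (sizes, rates), a hedged sketch of a parser that accepts both forms, using metric (base-1000) suffixes:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQuantity accepts values like "20G" or "14.4k" and returns the
// underlying number, interpreting suffixes as metric multipliers.
func parseQuantity(s string) (float64, error) {
	suffixes := map[string]float64{"k": 1e3, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}
	mult := 1.0
	for suf, m := range suffixes {
		if strings.HasSuffix(s, suf) {
			mult = m
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	n, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity: %v", err)
	}
	return n * mult, nil
}

func main() {
	g, _ := parseQuantity("20G")
	k, _ := parseQuantity("14.4k")
	fmt.Println(int64(g), int64(k)) // prints "20000000000 14400"
}
```

Whether "G" should mean 10^9 or 2^30 (GiB) for volume sizes is a design decision the real parser would have to pin down.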
apply IOPS to new volumes on volmasters for formatting
probably just needs a defer in the right spot.
Questions arising from #78 seemed to indicate that implementing mandatory rbd locking would solve a few problems acquiring locks in circumstances where network partitions were present.
Implementing this directly in the storage layer is the way to go. It should not be part of the driver interface, but instead be an internal component to the ceph driver.
Both use locks and mount operations need to expire when the host dies, so that other hosts can reschedule the job and get the volumes they need.
Currently we have no support for this; the volumes will linger in etcd until someone manually clears them, and cannot be mounted anywhere else by volplugin. This is unacceptable.
Solution: implement TTL backed publishing of records to etcd. This will allow a failing volplugin to expire gracefully.
Requirements:
volplugin needs a feature which allows us to retrieve debugging information from a running system and display it in the CLI.
The following bits of information would be the minimum set of fields to show:
the number of file descriptors
volplugin version
ceph version (?)
architecture
kernel version
This will enable better volplugin recovery on host restart.
This is a bucket-list of ideas @jainvipin and I sketched out while thinking about this problem.
Basically we want to have a basic RBAC in the sense that each consumer has a role, and roles are used to gain access to things.
Certs are tied to roles. Certs are the unit of authentication. Superusers will have a secondary set of certs to access etcd directly. We are thinking about using notary or atlas to manage this portion.
As mentioned, there is a super-user group that has access to several commands, like get, which will be role-scoped. Tenants are policies which group other policies, and a role. Tenants live above the policy level and scope policies by role. Roles in this iteration will have no place in policies themselves (but it is expected this will change).
These are superuser commands:
Role-gated commands:
All other commands with volmaster access will be role-gated, otherwise etcd certs will be used.
This driver shall be primarily used for testing and benchmarking volplugin functionality in isolation.
The volmaster, volsupervisor, and volplugin should all expose metrics endpoints over their transports, e.g. volplugin serves over the unix socket and volmaster can serve over http. volsupervisor should probably publish over a unix socket as well to avoid having to expose a network endpoint.
Metrics to expose:
the info package
Snapshots in volplugin suck. Let's make them better.
(checklist is for separate PRs that could solve this)
volume copy <tenant> <orig> <snap> <dest>:
write the volume after rbd create instead of mkfs.
This would provide top-level configuration that the volmaster, volsupervisor, and volplugin share and read (and refresh) from accordingly.
The system should be designed to:
volcli config upload
See comments in #78 for more information
I noticed today that the process of taking a snapshot can race volcli volume remove when the snapshot delay is short enough that they compete.
Right now / and . are special characters. There are bound to be a few more as we implement data stores. However, we do not ensure that the volume requested to be created does not already have these characters in it. Since we do a lot of transliteration of these two (ceph does not accept / in volume names), it's important we do not kill ourselves in the process.
This should be easy to do.
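A hedged sketch of that validation, assuming names are checked per component before any transliteration happens (function name made up):

```go
package main

import (
	"fmt"
	"strings"
)

// validateVolumeName rejects names whose parts contain characters the
// backends transliterate ("/" and "."), preventing collisions such as
// "tenant1/foo.bar" and "tenant1.foo/bar" mapping to the same ceph image.
func validateVolumeName(tenant, volume string) error {
	for _, part := range []string{tenant, volume} {
		if part == "" {
			return fmt.Errorf("tenant and volume names may not be empty")
		}
		if strings.ContainsAny(part, "/.") {
			return fmt.Errorf("invalid name %q: '/' and '.' are reserved", part)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateVolumeName("tenant1", "foo"))     // prints "<nil>"
	fmt.Println(validateVolumeName("tenant1", "foo.bar")) // prints the error
}
```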
I made a volume with volplugin and saw that I was able to mount it on different hosts while volplugin was running. However, if I tried mounting the volume with more than one container, it either tells me that there is a lock, or sometimes it seems to mess things up and I'm not able to mount the volume anymore.
The current volume / pool relationship is flawed; it currently relies on volumes being the source of truth for a volume's existence, and the pool name / volume name to be present as a combination for mount handling.
This is a relational nightmare and is the result of thousands of cuts to this part of the code. What needs to happen is the naming of volumes needs to change to include the tenant name as well.
It currently returns
{
"Volume": {
"Name": "policy1/postgres-vol",
"Mountpoint": "/mnt/ceph/rbd/policy1.postgres-vol"
},
"Err": ""
}
Expect it to return the same output as volcli volume get policy1/postgres-vol:
{
"policy": "policy1",
"name": "postgres-vol",
"driver": {
"pool": "rbd"
},
"create": {
"size": "10MB",
"filesystem": "ext4"
},
"runtime": {
"snapshots": true,
"snapshot": {
"frequency": "30m",
"keep": 20
},
"rate-limit": {
"write-iops": 0,
"read-iops": 0,
"write-bps": 0,
"read-bps": 0
}
},
"backend": "ceph"
}
/cc @jainvipin
it may just be that an upgrade is required.
docker-archive/classicswarm#1189 looks like the relevant issue here.
Tracking this here: moby/moby#20939
Right now, if the host disappears and docker loses state of the volume, we leak a volume that was intended to be deleted upon container/volume teardown inside of docker.
To resolve this, I propose we implement a scheduled jobs daemon that handles this and snapshots, and of course, can be used for more later. This volsupervisor
could live on one host with a connection to the volmaster and etcd, and can be asynchronously managed with the rest of the cluster due to its tertiary purpose.
make run-build
go get -u github.com/kr/godep
GOGC=1000 godep go install -v ./volcli/volcli/ ./volplugin/volplugin/ ./volmaster/volmaster/ ./volsupervisor/volsupervisor/
config/global.go:12:2: cannot find package "github.com/Sirupsen/logrus" in any of:
/usr/lib/golang/src/github.com/Sirupsen/logrus (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/Sirupsen/logrus (from $GOPATH)
/home/rodrigo/newwork/src/github.com/Sirupsen/logrus
config/volume.go:13:2: cannot find package "github.com/alecthomas/units" in any of:
/usr/lib/golang/src/github.com/alecthomas/units (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/alecthomas/units (from $GOPATH)
/home/rodrigo/newwork/src/github.com/alecthomas/units
volcli/volcli/cli.go:7:2: cannot find package "github.com/codegangsta/cli" in any of:
/usr/lib/golang/src/github.com/codegangsta/cli (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/codegangsta/cli (from $GOPATH)
/home/rodrigo/newwork/src/github.com/codegangsta/cli
volcli/volcli.go:18:2: cannot find package "github.com/contiv/errored" in any of:
/usr/lib/golang/src/github.com/contiv/errored (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/contiv/errored (from $GOPATH)
/home/rodrigo/newwork/src/github.com/contiv/errored
storage/backend/ceph/ceph.go:16:2: cannot find package "github.com/contiv/executor" in any of:
/usr/lib/golang/src/github.com/contiv/executor (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/contiv/executor (from $GOPATH)
/home/rodrigo/newwork/src/github.com/contiv/executor
watch/watch.go:11:2: cannot find package "github.com/coreos/etcd/client" in any of:
/usr/lib/golang/src/github.com/coreos/etcd/client (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/coreos/etcd/client (from $GOPATH)
/home/rodrigo/newwork/src/github.com/coreos/etcd/client
volcli/volcli.go:21:2: cannot find package "github.com/kr/pty" in any of:
/usr/lib/golang/src/github.com/kr/pty (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/kr/pty (from $GOPATH)
/home/rodrigo/newwork/src/github.com/kr/pty
watch/watch.go:12:2: cannot find package "golang.org/x/net/context" in any of:
/usr/lib/golang/src/golang.org/x/net/context (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/golang.org/x/net/context (from $GOPATH)
/home/rodrigo/newwork/src/golang.org/x/net/context
storage/backend/ceph/ceph.go:13:2: cannot find package "golang.org/x/sys/unix" in any of:
/usr/lib/golang/src/golang.org/x/sys/unix (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/golang.org/x/sys/unix (from $GOPATH)
/home/rodrigo/newwork/src/golang.org/x/sys/unix
volplugin/handlers.go:15:2: cannot find package "github.com/docker/docker/pkg/plugins" in any of:
/usr/lib/golang/src/github.com/docker/docker/pkg/plugins (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/docker/docker/pkg/plugins (from $GOPATH)
/home/rodrigo/newwork/src/github.com/docker/docker/pkg/plugins
volplugin/volplugin.go:21:2: cannot find package "github.com/gorilla/mux" in any of:
/usr/lib/golang/src/github.com/gorilla/mux (from $GOROOT)
/home/rodrigo/newwork/src/github.com/contiv/volplugin/Godeps/_workspace/src/github.com/gorilla/mux (from $GOPATH)
/home/rodrigo/newwork/src/github.com/gorilla/mux
godep: go exit status 1
make: *** [run-build] Error 1
I want to use this plugin on CoreOS but rbd isn't available.
I wonder why not use https://github.com/ceph/go-ceph/tree/master/rbd for rbd operations? It seems to be more stable (thanks to godep) and has less dependency on an external tool that can change between versions.
It would be good to tie the bash completion scripts to the go source code, as discussed here: #126 (comment)