kubernetes-retired / kube-deploy
A place for cluster deployment automation
License: Apache License 2.0
This would then let us keep the state store in GCS buckets.
Kube-aws is what we use at CoreOS to deploy production Kubernetes clusters on AWS. It offers a number of features that make it well-suited for a production environment:
We'd like to start the discussion on moving development of this tool over to the kube-deploy repository.
It is easy to typo a zone, or to type a duplicate zone. We should do some validation on the user provided list of zones.
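A minimal sketch of what that validation could look like (the helper and the zone catalogue below are hypothetical; a real implementation would fetch the valid zone list from the cloud provider API):

```python
import re

# Illustrative subset; the real list would come from the provider API.
KNOWN_ZONES = {"us-east-1a", "us-east-1b", "us-west-2a", "eu-west-1a"}

def validate_zones(zones):
    """Return human-readable problems found in a user-provided zone list."""
    problems = []
    seen = set()
    for z in zones:
        if z in seen:
            problems.append("duplicate zone: %s" % z)
        seen.add(z)
        if not re.match(r"^[a-z]+-[a-z]+-\d[a-z]$", z):
            problems.append("malformed zone name: %s" % z)
        elif z not in KNOWN_ZONES:
            problems.append("unknown zone (typo?): %s" % z)
    return problems
```

This catches both failure modes from the issue: duplicates, and plausible-looking names that don't exist.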
We've had some problems with it in kube-up, and it seems likely that we aren't setting multizone everywhere for GCE also.
We have an early upgrade procedure. We should validate it for 1.3, including:
And then document the final procedure.
There is a certain amount of tooling that will benefit all deployment automations. We should discuss how to develop these tools in a way that benefits as many of the maintained deployments as possible. Some work that I can think of that would benefit all deployments:
Obviously, some of these have higher relative priority than others. Let's use this issue to track the deployment shared infrastructure effort. Let me know if there are items on this list that are missing.
edit: see further down for the node hanging issue
I have something close to working with min-turnup, but I'm running into an issue with the playbook. Where does master_ip get added to the cfg object here?
RuntimeError: RUNTIME ERROR: No such field: master_ip
std.jsonnet:584:29-55 thunk <val>
std.jsonnet:589:41-43 thunk <val>
std.jsonnet:440:30-32 thunk <a>
std.jsonnet:28:21
std.jsonnet:28:12-22 thunk <a>
std.jsonnet:28:12-34 function <anonymous>
std.jsonnet:28:12-34 function <anonymous>
std.jsonnet:440:17-33 function <format_code>
std.jsonnet:589:29-63 thunk <s>
std.jsonnet:594:38 thunk <str>
...
std.jsonnet:595:61-68 thunk <v>
std.jsonnet:595:21-69 function <format_codes_obj>
std.jsonnet:595:21-69 function <format_codes_obj>
std.jsonnet:600:13-48 function <anonymous>
std.jsonnet:134:13-28 function <anonymous>
/opt/playbooks/roles/node/templates/kubeconfig.jsonnet:16:17-45 object <anonymous>
/opt/playbooks/roles/node/templates/kubeconfig.jsonnet:(14:16)-(17:7) object <anonymous>
/opt/playbooks/roles/node/templates/kubeconfig.jsonnet:(12:16)-(18:5) thunk <array_element>
/opt/playbooks/roles/node/templates/kubeconfig.jsonnet:(12:15)-(18:6) object <anonymous>
During manifestation
root@kube-master:~/kube-deploy/docker-multinode# service docker start
root@kube-master:~/kube-deploy/docker-multinode# ps axu | grep docker
root 6627 0.0 0.5 146936 10576 ? Ssl 01:59 0:00 docker-containerd -l /var/run/docker-bootstrap/libcontainerd/docker-containerd.sock --runtime docker-runc --start-timeout 2m
root 6937 2.5 1.6 348124 34124 ? Ssl 02:01 0:00 /usr/bin/docker -s overlay daemon -H fd://
root 6944 0.3 0.4 286204 9868 ? Ssl 02:01 0:00 docker-containerd -l /var/run/docker/libcontainerd/docker-containerd.sock --runtime docker-runc --start-timeout 2m
root 6995 0.0 0.0 12956 936 pts/0 S+ 02:01 0:00 grep --color=auto docker
root@kube-master:~/kube-deploy/docker-multinode# ./master.sh
+++ [0612 02:01:36] K8S_VERSION is set to: v1.2.4
+++ [0612 02:01:36] ETCD_VERSION is set to: 2.2.5
+++ [0612 02:01:36] FLANNEL_VERSION is set to: 0.5.5
+++ [0612 02:01:36] FLANNEL_IPMASQ is set to: true
+++ [0612 02:01:36] FLANNEL_NETWORK is set to: 10.1.0.0/16
+++ [0612 02:01:36] FLANNEL_BACKEND is set to: udp
+++ [0612 02:01:36] DNS_DOMAIN is set to: cluster.local
+++ [0612 02:01:36] DNS_SERVER_IP is set to: 10.0.0.10
+++ [0612 02:01:36] RESTART_POLICY is set to: on-failure
+++ [0612 02:01:36] MASTER_IP is set to: 163.172.162.23
+++ [0612 02:01:36] ARCH is set to: amd64
+++ [0612 02:01:36] NET_INTERFACE is set to: eth0
+++ [0612 02:01:36] --------------------------------------------
+++ [0612 02:01:36] Detected OS: ubuntu
+++ [0612 02:01:36] Launching docker bootstrap...
!!! [0612 02:01:55] docker bootstrap failed to start. Exiting...
We should support people building with golang 1.5 (or document a restriction if we can't do that). This may be as simple as adding GO15VENDOREXPERIMENT=1 to the instructions / makefile.
It means port range, but it is easy to confuse with source & destination ports.
The new Shared flag is giving us spurious changes...
*awstasks.VPC vpc/kubernetes.upgrade.awsdata.com
Shared <nil> -> false
*awstasks.InternetGateway internetGateway/kubernetes.upgrade.awsdata.com
Shared <nil> -> false
If we're going to accept traffic to the master (for the API server), we should probably allow ICMP type 3 code 4 (Fragmentation Needed), which path MTU discovery depends on.
We need to check that the terraform output does not change spuriously between generations.
For example, that the blocks are always output in a consistent order.
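One way to get that property is to always serialize the generated output with a deterministic key order. A minimal illustration of the invariant we want (the `render_tf` helper is hypothetical):

```python
import json

def render_tf(resources):
    # Sorting keys makes the emitted JSON independent of map iteration order.
    return json.dumps({"resource": resources}, sort_keys=True, indent=2)

a = render_tf({"aws_vpc": {"main": {"cidr_block": "10.0.0.0/16"}},
               "aws_subnet": {"a": {"cidr_block": "10.0.1.0/24"}}})
b = render_tf({"aws_subnet": {"a": {"cidr_block": "10.0.1.0/24"}},
               "aws_vpc": {"main": {"cidr_block": "10.0.0.0/16"}}})
assert a == b  # same config, same bytes, regardless of insertion order
```

Byte-identical output across generations is what keeps `git diff` and `terraform plan` quiet when nothing actually changed.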
It is easy to add things later; hard to remove them.
If we aren't actively using something, remove it / comment it out from the configuration schema.
W0611 00:02:27.020702 7 client_config.go:355] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0611 00:02:27.020788 7 client_config.go:360] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
We added a fail-fast for GCE, because upup support is lagging behind AWS
We should get GCE back up to par, and then remove the fail-fast
Users like to store their configuration in git. We should create a command to export a copy of the state store.
Hello,
Being able to run the etcd cluster independently from the apiservers would allow:
All other servers would then run etcd in proxy mode.
We should create a command that allows easy editing of the statestore config file, like kubectl edit.
I can't find any docs or anything?
Unable to save the current state when the server is restarted and master.sh is run again.
Should we use ELB for single-master and multi-master configurations?
The DNS approach apparently has problems when not all nodes are available.
In particular:
Just for the record, certs can be generated using Terraform. I haven't tested this feature and only learned about it somewhat recently; I would love to hear if anyone has tried it in a Kubernetes context or in general. It does seem appropriate for min-turnup and any other solutions using Terraform.
Hi,
We're evaluating k8s internally and, to do so, I've brought up a two-machine local cluster with the kube-deploy docker-multinode scripts.
I've read some k8s developers saying that this deployment option is different (less reliable?) from standard 'production-ready' clusters deployed using (I think) the kube-up clusters/ scripts.
Is that so? If yes, what are the main differences between the two deployment options? Is HA covered in kube-up?
Thank you!
Straw-man proposal for how we should get from kube-up v1 to kube-up v2:
Terraform issue 2143 prevents creation of EC2 tags with dots, which we use for k8s.io/...
It actually works for autoscaling groups; the only place it causes us trouble is with volumes.
Going to investigate a workaround where we create the volumes outside of terraform anyway, as you might not want terraform blindly deleting the crucial state of your cluster!
The most important add-ons (DNS, Heapster, Dashboard) should be easy to deploy, or deployed automatically, when using docker-multinode.
We have created an experimental implementation for this. It is an add-on container, similar to hyperkube:
https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/k8s-docker-provisioner/tree/master/addons
One reason is that the descriptors are in YAML, which requires kubectl.
Any other idea?
A consistent request is to make it easy to launch k8s clusters into an existing VPC.
Two parts, I think:
This is what AWS recommends and Amazon Linux does:
https://forums.aws.amazon.com/thread.jspa?messageID=572171
(Probably should check for any other "magic" settings in the official AWS AMIs)
upup includes a state store, currently S3 backed, but we can easily add GCS. We could put e.g. addons into it, so that we can dynamically reconfigure them without any SSH tricks.
This could also be used to dynamically populate the initial manifests (e.g. apiserver, kcm, scheduler etc). Maybe we could also include kubelet, in which case nodeup becomes even smaller.
Issue reported to me:
# I think strip-vendor is the workaround for 25572
glide install --strip-vendor --strip-vcs
[INFO] Downloading dependencies. Please wait...
[INFO] Fetching updates for github.com/aws/aws-sdk-go.
[INFO] Fetching updates for github.com/BurntSushi/toml.
[INFO] Fetching updates for github.com/cloudfoundry-incubator/candiedyaml.
[INFO] Fetching updates for github.com/davecgh/go-spew.
[INFO] Fetching updates for github.com/ghodss/yaml.
[INFO] Fetching updates for github.com/inconshreveable/mousetrap.
[INFO] Fetching updates for github.com/mitchellh/mapstructure.
[INFO] Fetching updates for github.com/spf13/cobra.
[INFO] Fetching updates for github.com/spf13/cast.
[INFO] Fetching updates for github.com/spf13/pflag.
[INFO] Fetching updates for github.com/spf13/jwalterweatherman.
[INFO] Fetching updates for golang.org/x/crypto.
[INFO] Fetching updates for github.com/golang/protobuf.
[INFO] Fetching updates for github.com/hashicorp/hcl.
[INFO] Fetching updates for github.com/magiconair/properties.
[INFO] Fetching updates for github.com/jmespath/go-jmespath.
[INFO] Fetching updates for github.com/golang/glog.
[INFO] Fetching updates for github.com/spf13/viper.
[INFO] Fetching updates for github.com/fsnotify/fsnotify.
[INFO] Fetching updates for github.com/go-ini/ini.
[INFO] Fetching updates for golang.org/x/net.
[INFO] Fetching updates for golang.org/x/oauth2.
[INFO] Fetching updates for golang.org/x/sys.
[INFO] Fetching updates for google.golang.org/api.
[INFO] Fetching updates for google.golang.org/appengine.
[INFO] Fetching updates for google.golang.org/cloud.
[INFO] Fetching updates for google.golang.org/grpc.
[INFO] Fetching updates for gopkg.in/yaml.v2.
[INFO] Fetching updates for k8s.io/kubernetes.
[WARN] Unable to checkout google.golang.org/api
[ERROR] Update failed for google.golang.org/api: Cloning into 'XXX/go/src/k8s.io/kube-deploy/upup/vendor/google.golang.org/api'...
error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502 Bad Gateway
fatal: The remote end hung up unexpectedly
: exit status 128
[INFO] Downloading dependencies. Please wait...
[INFO] Setting references.
[INFO] Setting version for golang.org/x/crypto to 77f4136a99ffb5ecdbdd0226bd5cb146cf56bc0e.
[INFO] Setting version for github.com/spf13/jwalterweatherman to 33c24e77fb80341fe7130ee7c594256ff08ccc46.
[INFO] Setting version for github.com/inconshreveable/mousetrap to 76626ae9c91c4f2a10f34cad8ce83ea42c93bb75.
[INFO] Setting version for github.com/BurntSushi/toml to f0aeabca5a127c4078abb8c8d64298b147264b55.
...
[INFO] Setting version for github.com/spf13/pflag to cb88ea77998c3f024757528e3305022ab50b43be.
[INFO] Setting version for github.com/magiconair/properties to c265cfa48dda6474e208715ca93e987829f572f8.
[ERROR] Failed to set version on google.golang.org/api to 63ade871fd3aec1225809d496e81ec91ab76ea29: open XXX/go/src/k8s.io/kube-deploy/upup/vendor/google.golang.org/api: no such file or directory
[INFO] Setting version for github.com/spf13/cobra to 1238ba19d24b0b9ceee2094e1cb31947d45c3e86.
...
An Error has occurred
make: *** [godeps] Error 2
Retrying did work.
We should support multiple nodesets in upup, at least in the configuration schema.
I'd like to experiment with using the componentconfig types for our configuration. (I think it would be awesome if we could converge on a single configuration system for the various phases, and the types in k8s itself seem a logical choice)
Is there a way to instruct e.g. kubelet to read configuration from a JSON file containing a componentconfig.KubeletConfiguration? Or to generate flags from an object of the same?
I've poked around but didn't see anything.
cc @mikedanese
I'm confused by the project description. Will addon-manager be pulled from k8s and live here? Are there bigger plans for this project?
Support encryption for keystore keys and secretstore secrets.
Unclear whether this should be at the VFS level or not.
It should honor the version passed by cloudup
This would likely make for a better terraform experience, because the lifespan of a DNS zone is likely longer than a particular cluster (not least because users need to reconfigure their DNS hosts)
Or we could maybe mark it as ignored by terraform.
autoscaling-group:kubernetes.master.eu-west-1a.kubernetes-e2e-upup-aws.awsdata.com error deleting resource, will retry: error deleting autoscaling group "kubernetes.master.eu-west-1a.kubernetes-e2e-upup-aws.awsdata.com": ValidationError: AutoScalingGroup name not found - AutoScalingGroup 'kubernetes.master.eu-west-1a.kubernetes-e2e-upup-aws.awsdata.com' not found
I guess this is the "already deleted" message.
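If so, the retry loop could treat "not found" as success rather than as a failure to retry. A hedged sketch (the error-matching is illustrative, modeled on the message above; the real code would inspect the AWS error code, not the message string):

```python
def delete_with_retry(delete_fn, retries=3):
    """Attempt a delete; a 'not found' error means it is already gone."""
    for _ in range(retries):
        try:
            delete_fn()
            return True
        except RuntimeError as e:
            if "not found" in str(e).lower():
                return True  # already deleted; nothing left to do
    return False
```

This makes deletion idempotent: a delete that raced with a previous attempt still reports success.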
It should be possible to specify the Hosted Zone on AWS.
Currently it's implicit, e.g MYZONE="test.foo.bar.example.com"
will result in *awstasks.DNSZone dnsZone/example.com
whereas the Hosted Zone actually is foo.bar.example.com
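One plausible fix, beyond letting the user specify the zone explicitly, is to pick the registered hosted zone with the longest matching suffix rather than assuming the registrable domain. A sketch (the zone list here is illustrative; the real list would come from the Route 53 API):

```python
def pick_hosted_zone(cluster_name, hosted_zones):
    """Choose the hosted zone whose name is the longest suffix of cluster_name."""
    candidates = [z for z in hosted_zones
                  if cluster_name == z or cluster_name.endswith("." + z)]
    return max(candidates, key=len) if candidates else None

zone = pick_hosted_zone("test.foo.bar.example.com",
                        ["example.com", "foo.bar.example.com"])
# picks foo.bar.example.com rather than example.com
```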
It would be nice if the instance name in the AWS console matched the node name in k8s
We don't currently delete:
We've had a number of problems with ephemeral storage on EC2, not least that newer instance types don't include them (e.g. kubernetes/kubernetes#23787). Also symlinking /mnt/ephemeral seems to confuse the garbage collector.
We should figure out how to ensure that we have a big enough root disk, maybe how to re-enable btrfs, and then if there is anything we can do with the instance storage if we're otherwise not going to use it (maybe hostVolumes? Or some sort of caching service?)
I think I saw an error which suggested that the secret store served an invalid JSON from the secret. This would happen if we were mid-write during a read, I believe. We should write files atomically (anyway), and we should probably do some locking, particularly during create-if-not-exists operations.
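The atomic-write part is straightforward on POSIX: write to a temp file in the same directory, fsync, then rename over the destination, so a concurrent reader sees either the old or the new content, never a partial write. A minimal sketch (locking for create-if-not-exists would still be needed on top of this):

```python
import os, tempfile

def atomic_write(path, data):
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)  # same filesystem, so rename is atomic
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX
    except Exception:
        os.unlink(tmp)
        raise
```

Note this only helps local and NFS-like stores; object stores such as S3 already replace objects atomically but offer no native locking.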
Sometimes, typically when something else goes wrong, kube-addons will be stuck in a loop:
Jun 10 19:31:59 ip-172-20-22-48 kube-addons.sh[718]: Error from server: serviceaccounts "default" not found
Jun 10 19:32:00 ip-172-20-22-48 kube-addons.sh[718]: Error from server: serviceaccounts "default" not found
Jun 10 19:32:00 ip-172-20-22-48 kube-addons.sh[718]: Error from server: serviceaccounts "default" not found
This blocks cluster bring-up entirely. KCM won't allocate PodCIDRs etc.
Restarting kube-addons causes it to recover and the cluster starts normally.
Primary tasks before we can declare v1
P0
P1
P2
There are some edge cases where items in AWS won't be tagged, primarily when there is no change other than tags.
We should split out tagging, and also check whether we can somehow avoid failures when we exit between resource creation and tagging.
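A separate "ensure tags" reconciliation pass could make tagging safe to retry: compute which tags are missing or wrong on an existing resource and apply only those. A sketch (tag names below are illustrative):

```python
def missing_tags(desired, actual):
    """Return the subset of desired tags that are absent or wrong on the resource."""
    return {k: v for k, v in desired.items() if actual.get(k) != v}

todo = missing_tags(
    {"KubernetesCluster": "mycluster.example.com", "Name": "master"},
    {"Name": "master"})
# todo == {"KubernetesCluster": "mycluster.example.com"}
```

Because the diff is empty once the tags converge, the pass is idempotent and can simply be re-run after a crash between creation and tagging.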