terraform-kubernetes-installer's Introduction

Terraform Kubernetes Installer for Oracle Cloud Infrastructure

About

The Kubernetes Installer for Oracle Cloud Infrastructure provides a Terraform-based Kubernetes installation for Oracle Cloud Infrastructure. It consists of a set of Terraform modules and an example base configuration that is used to provision and configure the resources needed to run a highly available and configurable Kubernetes cluster on Oracle Cloud Infrastructure (OCI).

Cluster Overview

Terraform is used to provision the cloud infrastructure and any required local resources for the Kubernetes cluster including:

OCI Infrastructure

  • Virtual Cloud Network (VCN) with dedicated subnets for etcd, masters, and workers in each availability domain
  • Dedicated compute instances for etcd, Kubernetes master and worker nodes in each availability domain
  • Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the Kubernetes Master(s)
  • Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the node(s) in the etcd cluster
  • Optional NAT instance for Internet-bound traffic on any private subnets
  • 2048-bit SSH RSA Key-Pair for compute instances when not overridden by ssh_private_key and ssh_public_key_openssh input variables
  • Self-signed CA and TLS cluster certificates when not overridden by the input variables ca_cert, ca_key, etc.
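For example, to supply your own key pair and CA material instead of having them generated, you might set something like the following in terraform.tfvars (a minimal sketch; only the variable names listed above come from this project, and the exact value format should be confirmed against the variable definitions):

# sketch: provide pre-existing key and certificate material instead of generated defaults
ssh_private_key        = "<contents of your SSH private key>"
ssh_public_key_openssh = "<contents of your SSH public key in OpenSSH format>"
ca_cert                = "<PEM contents of your CA certificate>"
ca_key                 = "<PEM contents of your CA private key>"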

Cluster Configuration

Terraform uses cloud-init scripts to handle instance-level configuration for the instances in the control plane, configuring:

  • Highly Available (HA) Kubernetes master configuration
  • Highly Available (HA) etcd cluster configuration
  • Optional GPU support for worker nodes that need to run specific workloads
  • Kubernetes Dashboard and kube-DNS cluster add-ons
  • Kubernetes RBAC (role-based authorization control)
  • Integration with OCI Cloud Controller Manager (CCM)
  • Integration with OCI Flexvolume Driver

The Terraform scripts also accept a number of other input variables to choose instance shapes (including GPU shapes), how instances are placed across the availability domains (ADs), and so on. If your requirements extend beyond the base configuration, the modules can be used to form your own customized configuration.

Prerequisites

  1. Download and install Terraform (v0.10.3 or later)
  2. Download and install the OCI Terraform Provider (v2.0.0 or later)
  3. Create a Terraform configuration file at ~/.terraformrc that specifies the path to the OCI provider:
providers {
  oci = "<path_to_provider_binary>/terraform-provider-oci"
}
  4. Ensure you have kubectl installed if you plan to interact with the cluster locally

Optionally create separate IAM resources for OCI plugins

The OCI Cloud Controller Manager (CCM) and Volume Provisioner (VP) enable Kubernetes to dynamically provision OCI resources, such as Load Balancers and Block Volumes, as part of pod and service creation. To facilitate this, OCI credentials and OCID information are automatically stored in the cluster as a Kubernetes Secret.

By default, the credentials of the user creating the cluster are used. However, in some cases it makes sense to use a more restricted set of credentials whose policies are limited to a particular set of resources within the compartment.

To provision separate IAM users, groups, and policy resources with Terraform, run the terraform plan and terraform apply commands from the identity directory, setting the appropriate input variables for your custom users, fingerprints, and key paths.
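For example, a sketch of that flow (the directory name comes from the text above; the specific input variable names for the restricted user should be checked against the identity module's documentation):

$ cd identity
$ terraform init
$ terraform plan
$ terraform apply
# then point the cluster configuration at the OCID, fingerprint, and key path of the restricted user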

Quick start

Customize the configuration

Create a terraform.tfvars file in the project root that specifies your configuration.

# start from the included example
$ cp terraform.example.tfvars terraform.tfvars
  • Set mandatory OCI input variables relating to your tenancy, user, and compartment.
  • Override optional input variables to customize the default configuration.
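For reference, a minimal terraform.tfvars might look like the following sketch. The variable names follow the usual OCI provider conventions; treat them as assumptions and confirm against terraform.example.tfvars:

tenancy_ocid     = "ocid1.tenancy.oc1..<unique_id>"
compartment_ocid = "ocid1.compartment.oc1..<unique_id>"
user_ocid        = "ocid1.user.oc1..<unique_id>"
fingerprint      = "<api_key_fingerprint>"
private_key_path = "/path/to/oci_api_key.pem"
region           = "us-phoenix-1"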

Deploy the cluster

Initialize Terraform:

$ terraform init

Review what Terraform plans to do before actually doing it:

$ terraform plan

Use Terraform to provision the resources and stand up the Kubernetes cluster on OCI:

$ terraform apply

Access the cluster

The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically, this takes around 5 minutes after terraform apply and will vary depending on the overall configuration, instance counts, and shapes.

A working kubeconfig can be found in the ./generated folder or generated on the fly using the kubeconfig Terraform output variable.
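For example, to point kubectl at the cluster from your local machine (a sketch; assumes kubectl is installed and the master load balancer is reachable from your machine):

# use the generated file, or regenerate it from the Terraform output variable
$ terraform output kubeconfig > ./generated/kubeconfig
$ export KUBECONFIG=$(pwd)/generated/kubeconfig
$ kubectl get nodes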

Your network access settings determine whether your cluster is accessible from the outside. See Accessing the Cluster for more details.

Verify the cluster:

If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from your local machine by running the cluster-check.sh script located in the scripts directory. Note that this script requires your KUBECONFIG environment variable to be set (as above), and SSH and HTTPS access to be open to the etcd and worker nodes.

To temporarily open SSH and HTTPS access for cluster-check.sh, add the following to your terraform.tfvars file:

# warning: 0.0.0.0/0 is wide open. remember to undo this.
etcd_ssh_ingress = "0.0.0.0/0"
master_ssh_ingress = "0.0.0.0/0"
worker_ssh_ingress = "0.0.0.0/0"
master_https_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "0.0.0.0/0"

$ scripts/cluster-check.sh
[cluster-check.sh] Running some basic checks on Kubernetes cluster....
[cluster-check.sh]   Checking ssh connectivity to each node...
[cluster-check.sh]   Checking whether instance bootstrap has completed on each node...
[cluster-check.sh]   Checking Flannel's etcd key from each node...
[cluster-check.sh]   Checking whether expected system services are running on each node...
[cluster-check.sh]   Checking status of /healthz endpoint at each k8s master node...
[cluster-check.sh]   Checking status of /healthz endpoint at the LB...
[cluster-check.sh]   Running 'kubectl get nodes' a number of times through the master LB...

The Kubernetes cluster is up and appears to be healthy.
Kubernetes master is running at https://129.146.22.175:443
KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://129.146.22.175:443/ui

Deploy a simple load-balanced application with shared volumes

Check out the example application deployment for a walk through of deploying a simple application that leverages both the Cloud Controller Manager and Flexvolume Driver plugins.

Scale, upgrade, or delete the cluster

Check out the example cluster operations for details on how to use Terraform to scale, upgrade, replace, or delete your cluster.

Known issues and limitations

  • The OCI Load Balancers that get created and attached to the VCN when a service of type LoadBalancer is created are out-of-band changes from Terraform's perspective. As a result, the cluster's VCN cannot be destroyed until all services of type LoadBalancer have been deleted using kubectl or the OCI Console.
  • The OCI Block Volumes that get created and attached to the workers when persistent volumes are created are also out-of-band changes. As a result, the instances cannot be destroyed until all persistent volumes have been deleted using kubectl or the OCI Console (see the cleanup sketch after this list).
  • Scaling or replacing etcd members in or out after the initial deployment is currently unsupported
  • Failover or HA configuration for NAT instance(s) is currently unsupported
  • Resizing the iSCSI volume will delete and recreate the volume
  • GPU Bare Metal instance shapes are currently only available in the Ashburn region and may be limited to specific availability domains
  • Provisioning a mix of GPU-enabled and non-GPU-enabled worker node instance shapes is currently unsupported
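A cleanup sketch for the first two items above, assuming working kubectl access; service and claim names are placeholders:

# remove services of type LoadBalancer so their OCI Load Balancers are deleted
$ kubectl get svc --all-namespaces | grep LoadBalancer
$ kubectl delete svc <service-name> -n <namespace>

# remove persistent volume claims so their OCI Block Volumes are detached and deleted
$ kubectl get pvc --all-namespaces
$ kubectl delete pvc <claim-name> -n <namespace>

# only then tear down the infrastructure
$ terraform destroy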

Testing

Tests run automatically on every commit to the main branch. Additionally, the tests should be run against any pull request before it is merged.

See Testing for details.

Contributing

This project is open source. Oracle appreciates any contributions that are made by the open source community.

See Contributing for details.

terraform-kubernetes-installer's Issues

oci_core_virtual_network.CompleteVCN: Status: 404; Code: NotAuthorizedOrNotFound;

The issue should be easy to reproduce with the following:

  1. sign up for OCI trial
  2. clone this repo
  3. terraform init
  4. enter terraform.tfvars details
  5. terraform plan. This works, which seems to prove the credentials are correct
  6. terraform apply.
    This fails after 2 minutes with:
Error applying plan:

1 error(s) occurred:

* module.vcn.oci_core_virtual_network.CompleteVCN: 1 error(s) occurred:

* oci_core_virtual_network.CompleteVCN: Status: 404; Code: NotAuthorizedOrNotFound; OPC Request ID: /OMITTED; Message: Authorization failed or requested resource not found.

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

After the failure I ran terraform destroy and it cleaned up successfully.

Any missing prerequisites? Or is this a legitimate defect?

SNAT for nonlocal traffic from a pod is broken (breaks external DNS resolution)

Cluster set up with a fairly recent (last couple of days) version of the installer.

I am seeing the following:

  • for a request from the worker node hosting the kubedns pod:
14:10:00.460463 IP 10.0.42.2.39359 > 169.254.169.254.domain: 48005+ A? ioctl.org. (27)
14:10:00.460982 IP 169.254.169.254.domain > 10.0.42.2.39359: 48005 1/13/0 A 94.136.54.223 (254)

i.e., that’s working.

Traffic on that node that’s targeting the upstream resolver from inside the pod:

14:10:03.873217 IP 10.99.38.7.30422 > 169.254.169.254.domain: 38833+ A? registry-1.docker.io. (38)
14:10:03.873263 IP 10.99.38.7.41485 > 169.254.169.254.domain: 40603+ AAAA? registry-1.docker.io. (38)
  • in other words, it looks to me like upstream resolver traffic is not being correctly SNATted.

In fact, all traffic destined for outside the cluster is not NATted correctly:

14:17:15.508146 IP 10.99.16.11 > 8.8.8.8: ICMP echo request, id 24576, seq 0, length 64

(This is a tcpdump of the traffic from a node containing an arbitrary pod pinging that address.)

It looks like the docker-engine systemd configuration is overriding --ip-masq=false (and the other options there). That is, /usr/lib/systemd/system/docker.service has the right ExecStart, but /etc/systemd/system/docker.service.d/docker-sysconfig.conf (which is package-installed) overrides it with (empty) values from /etc/sysconfig/docker*.

Without the --ip-masq=false setting on dockerd, pod traffic can't correctly get out of the cluster.
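One possible workaround sketch, based on the paths in the report: since the package-installed drop-in sources its (empty) values from /etc/sysconfig/docker*, add the flag there so the drop-in passes it through (the OPTIONS variable name is the usual sysconfig convention and is an assumption here):

# append the flag to the sysconfig file read by the drop-in, then restart docker
$ echo 'OPTIONS="--ip-masq=false"' | sudo tee -a /etc/sysconfig/docker
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
# verify the running daemon picked up the flag
$ ps aux | grep [d]ockerd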

Support etcd data directory on Block Volumes

Currently the etcd data directory lives in the Docker volume. We need to mount a block volume to the instance and use that instead, so that we can take snapshots of the block volume to recover from any failures.

Is it possible to create multiple kubernetes clusters in a single vcn in a single compartment

Terraform Version

# Run this command to get the terraform version:

$ terraform -v

OCI Provider Version

# Execute the plugin directly to get the version:

$ <path-to-plugin>/terraform-provider-oci

Terraform Installer for Kubernetes Version

# The version/tag/release or commit hash (of this project) the issue occurred on

Input Variables

# Values of non-sensitive input variables

Description of issue:

Add support for new OCI Terraform provider v2.

The Oracle Cloud Infrastructure provider has been updated since this installer was published, which breaks the TF configs. The latest version I tried (v2.0.1) does not work with the instructions given for terraform-kubernetes-installer. I had to install an earlier version (v1.0.18) of the provider, from before the name change, to successfully install with this installer.

Failure to parse etcd endpoints using Terraform v0.10.2 and earlier

Error applying plan:

9 error(s) occurred:

  • module.instances-k8smaster-ad2.data.template_file.setup-template: data.template_file.setup-template: failed to render : 4:24: unknown variable accessed: domain_name
    ...
  • module.instances-k8smaster-ad2.data.template_file.kube-apiserver: data.template_file.kube-apiserver: failed to render : 10:40: unknown variable accessed: k8s_ver
    ...

This message is a bit of a red herring: the problem isn't that domain_name and k8s_ver are missing; it is related to parsing outputs that aren't in the state file.

This doesn't reproduce with v0.10.3 and higher.

can't communicate with <pod-ip>:<pod-port> between nodes when second gen (O2) instance shape is used for worker(s)

We're seeing some sort of pod communication issue when second-generation instance shapes are used for worker nodes (e.g. VM.DenseIO2.16, BM.GPU2.2): we cannot connect to pod-ip:pod-port from a node other than the node the pod is running on, even though it is part of the same Flannel overlay network.

Steps to reproduce

  1. Choose a O2 instance shape for worker(s) (e.g. k8sWorkerShape = "VM.DenseIO2.16")
  2. Provision a new cluster: terraform apply
  3. Create and expose an nginx deployment: kubectl run my-nginx --image=nginx --replicas=1 --port=80 followed by kubectl expose deployment my-nginx --port=80
  4. From a master or worker node that is not the node running the pod, run curl http://<pod-ip>:80 and see the command hang (note that you can still ping the pod IP).
  5. From the node where the pod is running: run curl http://<pod-ip>:80 and see the command succeed.

Also note that the issue only occurs when either VXLAN (default) or host-gw is used as the Flannel backend mode. The udp backend mode appears to work correctly, although it comes with the limitation of only being able to have nodes in a single AD.
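For context, the Flannel backend is chosen by the network config stored in etcd. A sketch of switching it to udp as a workaround (the /flannel/network/config key and the Network CIDR are inferred from the subnet keys shown in other issues here and are assumptions):

$ etcdctl set /flannel/network/config '{"Network": "10.99.0.0/16", "Backend": {"Type": "udp"}}'
# restart flannel on each node so the new backend takes effect
$ sudo systemctl restart flannel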

Secure peer-to-peer etcd communication with SSL

Client-to-server communication is already disallowed by ingress, and the etcd LB is also private. However, we should ensure that peer communication within the VCN is also secured using SSL.
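A sketch of what securing peer traffic might look like on the etcd command line, using standard etcd peer-TLS flags; the certificate paths are placeholders:

/usr/local/bin/etcd \
	-name $HOSTNAME \
	-listen-peer-urls https://0.0.0.0:2380 \
	-initial-advertise-peer-urls https://$IP_LOCAL:2380 \
	-peer-cert-file /etc/etcd/tls/peer.crt \
	-peer-key-file /etc/etcd/tls/peer.key \
	-peer-trusted-ca-file /etc/etcd/tls/ca.crt \
	-peer-client-cert-auth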

(enhancement) allow placement of NAT instances in a separate subnet than that of k8s master LB

Description of issue:

Currently, when control_plane_subnet_access=private, a single set of public subnets (public-subnet-AD[1-3]) is provisioned and shared by the NAT instances as well as the k8s master LB (if k8s_master_lb_access=public). As a consequence, the NAT instances and the k8s master LB share the same security list.
This was thought to be OK under the assumption that there would be no overlap between the ingress rules of the two (e.g. 22 for the NAT, 443 for the LB). However, there could be corner cases where that assumption doesn't hold and the resulting topology is not desirable. (Redacting the internal use case from this public filing.) As discussed with @jlamillan, if we could add another flag that separates the two subnets, that would be ideal.

oci_core_images now returns a GPU image as the first Oracle Linux 7.4 image, breaking apply

It looks like oci_core_images is now returning a GPU image as the first image in the list, which will cause an error unless the user is using the BM.GPU2.2 shape due to the way we set up our data and resources:

e.g. data:

data "oci_core_images" "ImageOCID" {
  compartment_id           = "${var.compartment_ocid}"
  operating_system         = "Oracle Linux"
  operating_system_version = "7.4"
}

e.g. resource:

resource "oci_core_instance" "TFInstance" {
  ...
  image               = "${lookup(data.oci_core_images.ImageOCID.images[0], "id")}"
  ...
}

which leads to the following error:

* oci_core_instance.TFInstanceK8sMaster: Status: 400; Code: InvalidParameter; OPC Request ID: /XXXXX; Message: Shape VM.Standard1.8 is not valid for image ocid1.image.oc1.iad.aaaaaaaaflaxipg2k4l7mjudowij5vic76jjnqgzktw46pga4knjpqcrxw6a.

This goes to show that the oci_core_images image list can and will change underneath us. Therefore, I think we'd be better off passing the image name directly into the data source so that it is fixed for the query and the live cluster:

i.e. something like:

data "baremetal_core_images" "ImageOCID" {
    compartment_id = "${var.compartment_ocid}"
    display_name = "Oracle-Linux-7.4-2017.08.25-0"
}

Pods fail to start with network failures after an etcd connectivity issue causes the flannel/cni subnet lease to expire

An extended etcd connectivity issue can lead to pods failing to start.

How: Flannel uses expiring (24-hour) etcd keys to manage the subnets allocated to worker nodes.

e.g. subnet allocated to k8s-worker-ad1-0:

etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.99.82.0-24

When the worker nodes lose connectivity to etcd (e.g. when the etcd-lb is malfunctioning), the /flannel/network/subnets/10.99.82.0-24 key's TTL expires and the key will be gone:

etcdctl ls /flannel/network/subnets

When the connectivity to etcd is restored, a new key is created and distributed to the flannel service on each worker node:

etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.99.43.0-24

At this point, you'll see a number of symptoms, including that new pods will fail to start, complaining (presumably about the pods on the old network):

Failed to setup network for pod \"hello-2093073260-lk3f2_default(9f0a4b8a-90f0-11e6-b54b-080027242396)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.99.43.1/24,

You'll also see the network namespace container for the pod (the "pause" container that starts alongside the other containers) with the related error:

Failed to start with docker id 99a811606b51 with error: API error (500): Cannot start container 99a811606b51cdbeddbea14af474f0df432278ac9f73baea5d8ecaf5453f521e: cannot join network of a non running container: c8a4a648e63d5b18dcd8fad6fd2d70f466584f2a749bb4245bb86a6da1ceea55

It may also have something to do with what these files are doing, or not doing, when a new subnet is allocated to the worker:

./instances/k8smaster/scripts/flannel.service
./instances/k8sworker/scripts/flannel.service
./instances/k8smaster/scripts/cni-bridge.service
./instances/k8smaster/scripts/cni-bridge.sh

Flannel 'After' directive in unit file is not recognized

Terraform Version

10.8

OCI Provider Version

2.02

Terraform Installer for Kubernetes Version

commit f47ed4b

Input Variables

label_prefix ="k8s-"
etcdAd1Count=1
etcdAd2Count=0
etcdAd3Count=0
k8sWorkerAd1Count=1
k8sWorkerAd2Count=1
k8sWorkerAd3Count=0
k8sMasterAd1Count=0
k8sMasterAd2Count=0
k8sMasterAd3Count=1
etcdShape="VM.Standard1.1"
k8sMasterShape="VM.Standard1.2"
k8sWorkerShape="VM.Standard1.4"
k8sMasterLBShape="100Mbps"
etcdLBShape="100Mbps"
etcd_lb_enabled="false"
k8s_ver = "1.8.3"
etcd_docker_max_log_size = "250m"
etcd_docker_max_log_files = "2"
master_docker_max_log_size = "100m"
worker_docker_max_log_size = "75m"
worker_docker_max_log_files = "3"
disable_auto_retries="false"

Description of issue:

systemd: [/etc/systemd/system/flannel.service:10] Unknown lvalue 'After' in section 'Service'

Cluster creation failed because of intermittent failure in getting the etcd discovery URL

Intermittently, the call to https://discovery.etcd.io can fail, which causes the following invalid discovery token to be generated in the generated folder:

jemillanmbp:generated jessem$ cat discoverye3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 
Unable to generate token

This situation in turn leads to an invalid parameter being passed to -discovery in terraform-kubernetes-installer/instances/etcd/cloud_init/bootstrap.template.sh:

docker run -d \
	-p 2380:2380 -p 2379:2379 \
	-v /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt \
	--net=host \
	quay.io/coreos/etcd:${etcd_ver} \
	/usr/local/bin/etcd \
	-name $HOSTNAME \
	-advertise-client-urls http://$IP_LOCAL:2379 \
	-listen-client-urls http://$IP_LOCAL:2379,http://127.0.0.1:2379 \
	-listen-peer-urls http://0.0.0.0:2380 \
	-discovery "Unable to generate token"

This causes the cluster to fail to come up properly.

We are moving away from using the discovery URL, but in the meantime, we should not try to create a cluster if we failed to get a discovery URL.
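A minimal guard sketch for the bootstrap flow: verify that the discovery value actually looks like a token URL before starting etcd (the variable name DISCOVERY_URL is hypothetical):

# fail fast instead of passing an error string to -discovery
case "$DISCOVERY_URL" in
  https://discovery.etcd.io/*) ;;   # looks like a valid discovery token URL
  *) echo "invalid etcd discovery token: $DISCOVERY_URL" >&2; exit 1 ;;
esac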

Kubernetes swap failure running with versions higher than v1.8.0

Terraform Version

10.8

OCI Provider Version

2.0.2

Terraform Installer for Kubernetes Version

1.8.0 and higher

The version/tag/release or commit hash (of this project) the issue occurred on

commit f47ed4b

Description of issue:

localhost kubelet: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/sda2 partition#0118420344#0110#011-1]
Nov 15 17:59:40 localhost systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
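Two mitigations suggested by the kubelet error itself (a sketch; where the kubelet flags live in this installer's bootstrap scripts is not confirmed here):

# option 1: disable swap on the node (and remove the swap entry from /etc/fstab to persist)
$ sudo swapoff -a

# option 2: let the kubelet run with swap enabled (Kubernetes 1.8+)
#   add --fail-swap-on=false to the kubelet startup flags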

Cluster created with default settings should have a public load balancer IP address

Terraform Version

0.10.8

OCI Provider Version

2.02

Terraform Installer for Kubernetes Version

1a48b04

Input Variables

label_prefix ="k8s-"
etcdAd1Count=1
etcdAd2Count=0
etcdAd3Count=0
k8sWorkerAd1Count=1
k8sWorkerAd2Count=1
k8sWorkerAd3Count=0
k8sMasterAd1Count=0
k8sMasterAd2Count=0
k8sMasterAd3Count=1
etcdShape="VM.Standard1.1"
k8sMasterShape="VM.Standard1.2"
k8sWorkerShape="VM.Standard1.4"
k8sMasterLBShape="100Mbps"
etcdLBShape="100Mbps"

Description of issue:

Creating a cluster using the default access policy should give a public load balancer IP address, but the output shows a private one:

master_lb_ip = [
10.0.30.4
]

Allow replacement/updates to etcd nodes in cluster

Currently, after a k8s cluster has been created, etcd nodes cannot successfully be replaced. This is a limitation due to how the etcd cluster is created. The current method is to use a discovery URL. Switching this to a more conventional list of etcd endpoints would allow for more flexible etcd node management within the installer cluster.

Move from using discovery_url to configure the cluster to building a list of cluster IPs or DNS names, to allow for Terraform node updates or replacements.
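For reference, a sketch of static bootstrapping with standard etcd flags in place of -discovery; the member names and IPs are placeholders:

/usr/local/bin/etcd \
	-name $HOSTNAME \
	-advertise-client-urls http://$IP_LOCAL:2379 \
	-listen-client-urls http://$IP_LOCAL:2379,http://127.0.0.1:2379 \
	-listen-peer-urls http://0.0.0.0:2380 \
	-initial-advertise-peer-urls http://$IP_LOCAL:2380 \
	-initial-cluster "etcd-ad1-0=http://10.0.0.10:2380,etcd-ad2-0=http://10.0.1.10:2380,etcd-ad3-0=http://10.0.2.10:2380" \
	-initial-cluster-state new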

Move to Kubernetes 1.8 in terraform-kubernetes-installer

We need to move to Kubernetes 1.8 in the terraform-kubernetes-installer. This task also includes making whatever changes are necessary to the bootstrap scripts, and re-running all tests including Kubernetes conformance tests.

In addition, we should consider moving to the latest:

  • Docker version
  • etcd version
  • Flannel version

With private subnets, nodes come up in not ready state

With private subnets, we see that the worker nodes come up in NotReady state.

Worker nodes (1, 2, or 3) are coming up in NotReady status.

kubectl get nodes
NAME STATUS AGE VERSION
k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4
k8s-worker-ad2-0.k8sworkerad2.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4
k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4

Below are the master node journal entries:

Jan 02 17:37:44 k8s-master-ad1-0 kubelet[13086]: W0102 17:37:44.615767 13086 container_manager_linux.go:218] Running with swap on is not supported, please disable swap! This will be a fatal error by default starting in K8s v1.6! In the meantime, you can opt-in to making this a fatal error by enabling --experimental-fail-swap-on.

In the worker nodes' journal entries, we see the below:

Jan 02 18:20:05 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:05.035951 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found
Jan 02 18:20:15 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:15.073231 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found
Jan 02 18:20:25 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:25.111192 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found

Noticed that kube-proxy is not coming up; the other 2 worker nodes are in NotReady status now.

[opc@k8s-master-ad1-0 ~]$ kubectl get po -n=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-2272871451-v3p1r 3/3 Running 0 16m
kube-proxy-k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com 1/1 Running 0 14m
kube-proxy-k8s-worker-ad2-0.k8sworkerad2.k8sbmcs.oraclevcn.com 0/1 Unknown 0 14m
kube-proxy-k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com 0/1 Pending 0 14m
kubernetes-dashboard-3313488171-1z911 1/1 Running 0 16m
tiller-deploy-2136207906-kct9c 1/1 Running 0 7m

Create issue template for terraform-kubernetes-installer repository

Should include the things we need to reproduce issues including:

  • Description of problem
  • Version / release of terraform-kubernetes-installer
  • Version / release of OCI provider
  • Values of (non-sensitive) input variables
  • The output of terraform plan
  • Possibly the output of scripts/cluster-check.sh

Support for other CNI (calico?)

Hi,

Is there a plan to add support for Calico or another CNI, or is Flannel the only option that you will support in this repo?

Thanks,

I don't understand some of the documentation (Prereq Step 3)

Step 3 says I need to create a file that has

providers {
oci = "<path_to_provider_binary>/terraform-provider-oci"
}

Ooookkkk.....what is a provider binary? That is not discussed anywhere that I can see unless I'm really missing something. Is this something that I downloaded in the previous packages, the code that I want to upload using kubernetes, or is it an option in the OCI dashboard?

Please, could you make this more clear?

differences between Yum repositories cause issues with new k8s default 1.7.10

We recently bumped the default version of Kubernetes (i.e. k8s_ver) from 1.7.4 to 1.7.10. Unfortunately, the 1.7.10 yum packages appear to be available in fewer Yum repos, which leads to issues initializing the software.

Specifically, when $k8s_ver is not available in the Yum repository, we randomly pick up whatever version of the kubelet kubernetes-cni pulls in to fulfill its dependency:

until yum install -y kubelet-${k8s_ver}-$RPM_TAG kubectl-${k8s_ver}-$RPM_TAG kubernetes-cni; do sleep 1 && echo -n ".";done
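A sketch of checking availability up front rather than silently falling back to another version (standard yum options):

# list every kubelet build the enabled repos provide and fail if the pinned version is missing
yum --showduplicates list available kubelet | grep "${k8s_ver}" \
    || { echo "kubelet-${k8s_ver} not found in enabled repos" >&2; exit 1; }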

Empty generated node label values on OCI

Created a new cluster and see empty/null values for some labels:
node.info/node.id_prefix:
node.info/node.id_suffix:
node.info/node.shape:

It was a fresh git clone on 11/2. The provider is terraform-provider-oci_v2.0.3.
More details in Slack #kubernetes-installer.

Can a worker node have ExternalIP?

Terraform Version

# Run this command to get the terraform version:

$ terraform -v
Terraform v0.11.1

  • provider.null v1.0.0
  • provider.oci (unversioned)
  • provider.random v1.1.0
  • provider.template v1.0.0
  • provider.tls v1.0.1

OCI Provider Version

# Execute the plugin directly to get the version:

$ <path-to-plugin>/terraform-provider-oci
terraform-provider-oci 2.0.5

Terraform Installer for Kubernetes Version

# The version/tag/release or commit hash (of this project) the issue occurred on
34b6c90

Input Variables

# Values of non-sensitive input variables

Description of issue:

A worker node doesn't have ExternalIP. Can it be provisioned or configured after the fact?

Latest SRE installer sets up the etcd LBR as private; provide the user an option to make it public

We used the latest SRE installer and noticed that the cluster is set up with the etcd LBR as a private LBR.

We are concerned that with this setup (a private etcd LBR), losing network connectivity to the AD hosting etcd could potentially bring down the service.

The SRE installer should provide the user with an option to make it public, so the user can choose how they want it set up based on their cluster requirements.

Allow VCN CIDR to be optional parameter

Create an optional parameter for VCN CIDR for the created VCN.

This would give users more flexibility to leverage VCN peering, which requires non-overlapping VCN CIDRs.
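A sketch of what the optional parameter might look like; the variable name vcn_cidr is hypothetical, and the default matches the 10.0.0.0/16 addressing seen elsewhere in this document:

variable "vcn_cidr" {
  description = "CIDR block for the cluster VCN; must not overlap with peered VCNs"
  default     = "10.0.0.0/16"
}

# the VCN resource would then use:
#   cidr_block = "${var.vcn_cidr}"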

Remove VCN access to k8s master insecure port

Access to the k8s master insecure port is already disallowed by ingress. However, we should also ensure that communication within the VCN uses only the secure port with SSL.
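A sketch of closing the insecure port at the source, using standard kube-apiserver flags from this Kubernetes era (where the apiserver flags are set in this installer is not confirmed here):

# serve the API only over TLS; either disable the insecure port entirely
--insecure-port=0
# or restrict it to loopback
--insecure-bind-address=127.0.0.1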

Porting to OpenStack

Hi,

Does Oracle use some flavor of OpenStack? If yes, I want to discuss the possibility of porting this project to a generic OpenStack installation. What major changes would be needed to make this work on a recent OpenStack installation?

Thanks.
