terraform-kubernetes-installer's Introduction

Terraform Kubernetes Installer for Oracle Cloud Infrastructure

About

The Kubernetes Installer for Oracle Cloud Infrastructure provides a Terraform-based Kubernetes installation for Oracle Cloud Infrastructure. It consists of a set of Terraform modules and an example base configuration that is used to provision and configure the resources needed to run a highly available and configurable Kubernetes cluster on Oracle Cloud Infrastructure (OCI).

Cluster Overview

Terraform is used to provision the cloud infrastructure and any required local resources for the Kubernetes cluster including:

OCI Infrastructure

  • Virtual Cloud Network (VCN) with dedicated subnets for etcd, masters, and workers in each availability domain
  • Dedicated compute instances for etcd, Kubernetes master and worker nodes in each availability domain
  • Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the Kubernetes Master(s)
  • Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the node(s) in the etcd cluster
  • Optional NAT instance for Internet-bound traffic on any private subnets
  • 2048-bit SSH RSA Key-Pair for compute instances when not overridden by ssh_private_key and ssh_public_key_openssh input variables
  • Self-signed CA and TLS cluster certificates when not overridden by the input variables ca_cert, ca_key, etc.
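For example, to supply your own key pair and CA material instead of having them generated, you might set something like the following in terraform.tfvars (a minimal sketch; only the variable names listed above come from this project, and the exact value format should be confirmed against the variable definitions):

# sketch: provide pre-existing key and certificate material instead of generated defaults
ssh_private_key        = "<contents of your SSH private key>"
ssh_public_key_openssh = "<contents of your SSH public key in OpenSSH format>"
ca_cert                = "<PEM contents of your CA certificate>"
ca_key                 = "<PEM contents of your CA private key>"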

Cluster Configuration

Terraform uses cloud-init scripts to handle instance-level configuration for the instances in the control plane, configuring:

  • Highly Available (HA) Kubernetes master configuration
  • Highly Available (HA) etcd cluster configuration
  • Optional GPU support for worker nodes that need to run specific workloads
  • Kubernetes Dashboard and kube-DNS cluster add-ons
  • Kubernetes RBAC (role-based authorization control)
  • Integration with OCI Cloud Controller Manager (CCM)
  • Integration with OCI Flexvolume Driver

The Terraform scripts also accept a number of other input variables to choose instance shapes (including GPU shapes), how instances are placed across the availability domains (ADs), and so on. If your requirements extend beyond the base configuration, the modules can be used to form your own customized configuration.

Prerequisites

  1. Download and install Terraform (v0.10.3 or later)
  2. Download and install the OCI Terraform Provider (v2.0.0 or later)
  3. Create a Terraform configuration file at ~/.terraformrc that specifies the path to the OCI provider:
providers {
  oci = "<path_to_provider_binary>/terraform-provider-oci"
}
  4. Ensure you have kubectl installed if you plan to interact with the cluster locally

Optionally create separate IAM resources for OCI plugins

The OCI Cloud Controller Manager (CCM) and Volume Provisioner (VP) enable Kubernetes to dynamically provision OCI resources, such as Load Balancers and Block Volumes, as part of pod and service creation. To facilitate this, OCI credentials and OCID information are automatically stored in the cluster as a Kubernetes Secret.

By default, the credentials of the user creating the cluster are used. However, in some cases it makes sense to use a more restricted set of credentials whose policies are limited to a particular set of resources within the compartment.

To provision separate IAM users, groups, and policy resources with Terraform, run the terraform plan and terraform apply commands from the identity directory, setting the appropriate input variables for your custom users, fingerprints, and key paths.
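For example, a sketch of that flow (the directory name comes from the text above; the specific input variable names for the restricted user should be checked against the identity module's documentation):

$ cd identity
$ terraform init
$ terraform plan
$ terraform apply
# then point the cluster configuration at the OCID, fingerprint, and key path of the restricted user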

Quick start

Customize the configuration

Create a terraform.tfvars file in the project root that specifies your configuration.

# start from the included example
$ cp terraform.example.tfvars terraform.tfvars
  • Set mandatory OCI input variables relating to your tenancy, user, and compartment.
  • Override optional input variables to customize the default configuration.
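For reference, a minimal terraform.tfvars might look like the following sketch. The variable names follow the usual OCI provider conventions; treat them as assumptions and confirm against terraform.example.tfvars:

tenancy_ocid     = "ocid1.tenancy.oc1..<unique_id>"
compartment_ocid = "ocid1.compartment.oc1..<unique_id>"
user_ocid        = "ocid1.user.oc1..<unique_id>"
fingerprint      = "<api_key_fingerprint>"
private_key_path = "/path/to/oci_api_key.pem"
region           = "us-phoenix-1"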

Deploy the cluster

Initialize Terraform:

$ terraform init

Review what Terraform plans to do before actually doing it:

$ terraform plan

Use Terraform to provision the resources and stand up the Kubernetes cluster on OCI:

$ terraform apply

Access the cluster

The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically, this takes around 5 minutes after terraform apply and will vary depending on the overall configuration, instance counts, and shapes.

A working kubeconfig can be found in the ./generated folder or generated on the fly using the kubeconfig Terraform output variable.
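For example, to point kubectl at the cluster from your local machine (a sketch; assumes kubectl is installed and the master load balancer is reachable from your machine):

# use the generated file, or regenerate it from the Terraform output variable
$ terraform output kubeconfig > ./generated/kubeconfig
$ export KUBECONFIG=$(pwd)/generated/kubeconfig
$ kubectl get nodes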

Your network access settings determine whether your cluster is accessible from the outside. See Accessing the Cluster for more details.

Verify the cluster:

If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from your local machine by running the cluster-check.sh script located in the scripts directory. Note that this script requires your KUBECONFIG environment variable to be set (as above), and SSH and HTTPS access to be open to the etcd and worker nodes.

To temporarily open SSH and HTTPS access for cluster-check.sh, add the following to your terraform.tfvars file:

# warning: 0.0.0.0/0 is wide open. remember to undo this.
etcd_ssh_ingress = "0.0.0.0/0"
master_ssh_ingress = "0.0.0.0/0"
worker_ssh_ingress = "0.0.0.0/0"
master_https_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "0.0.0.0/0"

$ scripts/cluster-check.sh
[cluster-check.sh] Running some basic checks on Kubernetes cluster....
[cluster-check.sh]   Checking ssh connectivity to each node...
[cluster-check.sh]   Checking whether instance bootstrap has completed on each node...
[cluster-check.sh]   Checking Flannel's etcd key from each node...
[cluster-check.sh]   Checking whether expected system services are running on each node...
[cluster-check.sh]   Checking status of /healthz endpoint at each k8s master node...
[cluster-check.sh]   Checking status of /healthz endpoint at the LB...
[cluster-check.sh]   Running 'kubectl get nodes' a number of times through the master LB...

The Kubernetes cluster is up and appears to be healthy.
Kubernetes master is running at https://129.146.22.175:443
KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://129.146.22.175:443/ui

Deploy a simple load-balanced application with shared volumes

Check out the example application deployment for a walk through of deploying a simple application that leverages both the Cloud Controller Manager and Flexvolume Driver plugins.

Scale, upgrade, or delete the cluster

Check out the example cluster operations for details on how to use Terraform to scale, upgrade, replace, or delete your cluster.

Known issues and limitations

  • The OCI Load Balancers that get created and attached to the VCN when a service of type LoadBalancer is created are out-of-band changes from Terraform's perspective. As a result, the cluster's VCN cannot be destroyed until all services of type LoadBalancer have been deleted using kubectl or the OCI Console.
  • The OCI Block Volumes that get created and attached to the workers when persistent volumes are created are also out-of-band changes. As a result, the instances cannot be destroyed until all persistent volumes have been deleted using kubectl or the OCI Console (see the cleanup sketch after this list).
  • Scaling or replacing etcd members in or out after the initial deployment is currently unsupported
  • Failover or HA configuration for NAT instance(s) is currently unsupported
  • Resizing the iSCSI volume will delete and recreate the volume
  • GPU Bare Metal instance shapes are currently only available in the Ashburn region and may be limited to specific availability domains
  • Provisioning a mix of GPU-enabled and non-GPU-enabled worker node instance shapes is currently unsupported
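A cleanup sketch for the first two items above, assuming working kubectl access; service and claim names are placeholders:

# remove services of type LoadBalancer so their OCI Load Balancers are deleted
$ kubectl get svc --all-namespaces | grep LoadBalancer
$ kubectl delete svc <service-name> -n <namespace>

# remove persistent volume claims so their OCI Block Volumes are detached and deleted
$ kubectl get pvc --all-namespaces
$ kubectl delete pvc <claim-name> -n <namespace>

# only then tear down the infrastructure
$ terraform destroy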

Testing

Tests run automatically on every commit to the main branch. Additionally, the tests should be run against any pull request before it is merged.

See Testing for details.

Contributing

This project is open source. Oracle appreciates any contributions that are made by the open source community.

See Contributing for details.

terraform-kubernetes-installer's Issues

oci_core_virtual_network.CompleteVCN: Status: 404; Code: NotAuthorizedOrNotFound;

The issue should be easy to reproduce with the following:

  1. sign up for OCI trial
  2. clone this repo
  3. terraform init
  4. enter terraform.tfvars details
  5. terraform plan. This works, which seems to prove the credentials are correct
  6. terraform apply.
    This fails after 2 minutes with:
Error applying plan:

1 error(s) occurred:

* module.vcn.oci_core_virtual_network.CompleteVCN: 1 error(s) occurred:

* oci_core_virtual_network.CompleteVCN: Status: 404; Code: NotAuthorizedOrNotFound; OPC Request ID: /OMITTED; Message: Authorization failed or requested resource not found.

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

After the failure I ran terraform destroy and it cleaned up successfully.

Any missing prerequisites? Or is this a legitimate defect?

SNAT for nonlocal traffic from a pod is broken (breaks external DNS resolution)

Cluster set up with a fairly recent (last couple of days) version of the installer.

I am seeing the following:

  • for a request from the worker node hosting the kubedns pod:
14:10:00.460463 IP 10.0.42.2.39359 > 169.254.169.254.domain: 48005+ A? ioctl.org. (27)
14:10:00.460982 IP 169.254.169.254.domain > 10.0.42.2.39359: 48005 1/13/0 A 94.136.54.223 (254)

i.e., that’s working.

Traffic on that node that’s targeting the upstream resolver from inside the pod:

14:10:03.873217 IP 10.99.38.7.30422 > 169.254.169.254.domain: 38833+ A? registry-1.docker.io. (38)
14:10:03.873263 IP 10.99.38.7.41485 > 169.254.169.254.domain: 40603+ AAAA? registry-1.docker.io. (38)
  • in other words, it looks to me like upstream resolver traffic is not being correctly SNATted.

In fact, all traffic destined for outside the cluster is not NATted correctly:

14:17:15.508146 IP 10.99.16.11 > 8.8.8.8: ICMP echo request, id 24576, seq 0, length 64

(This is a tcpdump of the traffic from a node containing an arbitrary pod pinging that address.)

It looks like the docker-engine systemd configuration is overriding --ip-masq=false (and the other options there). That is, /usr/lib/systemd/system/docker.service has the right ExecStart, but /etc/systemd/system/docker.service.d/docker-sysconfig.conf (which is package-installed) overrides it with (empty) values from /etc/sysconfig/docker*.

Without the --ip-masq=false setting on dockerd, pod traffic can't correctly get out of the cluster.
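One possible workaround sketch, based on the paths in the report: since the package-installed drop-in sources its (empty) values from /etc/sysconfig/docker*, add the flag there so the drop-in passes it through (the OPTIONS variable name is the usual sysconfig convention and is an assumption here):

# append the flag to the sysconfig file read by the drop-in, then restart docker
$ echo 'OPTIONS="--ip-masq=false"' | sudo tee -a /etc/sysconfig/docker
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
# verify the running daemon picked up the flag
$ ps aux | grep [d]ockerd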

Support etcd data directory on Block Volumes

Currently the etcd data directory lives in the Docker volume. We need to mount a block volume to the instance and use that instead, so that we can take snapshots of the block volume to recover from any failures.

Is it possible to create multiple kubernetes clusters in a single vcn in a single compartment

Terraform Version

# Run this command to get the terraform version:

$ terraform -v

OCI Provider Version

# Execute the plugin directly to get the version:

$ <path-to-plugin>/terraform-provider-oci

Terraform Installer for Kubernetes Version

# The version/tag/release or commit hash (of this project) the issue occurred on

Input Variables

# Values of non-sensitive input variables

Description of issue:

Add support for new OCI Terraform provider v2.

The Oracle Cloud Infrastructure provider has been updated since this installer was published, which breaks the TF configs. The latest version I tried (v2.0.1) does not work with the instructions given for terraform-kubernetes-installer. I had to install an earlier version (v1.0.18) of the provider, from before the name change, to successfully install with this installer.

Failure to parse etcd endpoints using Terraform v0.10.2 and earlier

Error applying plan:

9 error(s) occurred:

  • module.instances-k8smaster-ad2.data.template_file.setup-template: data.template_file.setup-template: failed to render : 4:24: unknown variable accessed: domain_name
    ...
  • module.instances-k8smaster-ad2.data.template_file.kube-apiserver: data.template_file.kube-apiserver: failed to render : 10:40: unknown variable accessed: k8s_ver
    ...

This message is a bit of a red herring: the problem isn't that domain_name and k8s_ver are missing; it is related to parsing outputs that aren't in the state file.

This doesn't reproduce with v0.10.3 and higher.

can't communicate with <pod-ip>:<pod-port> between nodes when second gen (O2) instance shape is used for worker(s)

We're seeing some sort of pod communication issue when second-generation instance shapes are used for worker nodes (e.g. VM.DenseIO2.16, BM.GPU2.2): we cannot connect to pod-ip:pod-port from a node other than the node the pod is running on, even though it is part of the same Flannel overlay network.

Steps to reproduce

  1. Choose a O2 instance shape for worker(s) (e.g. k8sWorkerShape = "VM.DenseIO2.16")
  2. Provision a new cluster: terraform apply
  3. Create and expose an nginx deployment: kubectl run my-nginx --image=nginx --replicas=1 --port=80 followed by kubectl expose deployment my-nginx --port=80
  4. From a master or worker node that is not the node running the pod, run curl http://<pod-ip>:80 and see the command hang (note that you can still ping the pod IP).
  5. From the node where the pod is running: run curl http://<pod-ip>:80 and see the command succeed.

Also note that the issue only occurs when either VXLAN (default) or host-gw is used as the Flannel backend mode. The udp backend mode appears to work correctly, although it comes with the limitation of only being able to have nodes in a single AD.
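For context, the Flannel backend is chosen by the network config stored in etcd. A sketch of switching it to udp as a workaround (the /flannel/network/config key and the Network CIDR are inferred from the subnet keys shown in other issues here and are assumptions):

$ etcdctl set /flannel/network/config '{"Network": "10.99.0.0/16", "Backend": {"Type": "udp"}}'
# restart flannel on each node so the new backend takes effect
$ sudo systemctl restart flannel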

Secure peer-to-peer etcd communication with SSL

Client-to-server communication is already disallowed by ingress, and the etcd LB is also private. However, we should ensure that peer communication within the VCN is also secured using SSL.
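A sketch of what securing peer traffic might look like on the etcd command line, using standard etcd peer-TLS flags; the certificate paths are placeholders:

/usr/local/bin/etcd \
	-name $HOSTNAME \
	-listen-peer-urls https://0.0.0.0:2380 \
	-initial-advertise-peer-urls https://$IP_LOCAL:2380 \
	-peer-cert-file /etc/etcd/tls/peer.crt \
	-peer-key-file /etc/etcd/tls/peer.key \
	-peer-trusted-ca-file /etc/etcd/tls/ca.crt \
	-peer-client-cert-auth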

(enhancement) allow placement of NAT instances in a separate subnet than that of k8s master LB

Description of issue:

Currently, when control_plane_subnet_access=private, a single set of public subnets (public-subnet-AD[1-3]) is provisioned and shared by the NAT instances as well as the k8s master LB (if k8s_master_lb_access=public). As a consequence, the NAT instances and the k8s master LB share the same security list.
This was thought to be OK under the assumption that there would be no overlap between the ingress rules of the two (e.g. 22 for the NAT, 443 for the LB). However, there could be corner cases where that assumption doesn't hold and the resulting topology is not desirable. (Redacting the internal use case from this public filing.) As discussed with @jlamillan, if we could add another flag that separates the two subnets, that would be ideal.

oci_core_images now returns a GPU image as the first Oracle Linux 7.4 image, breaking apply

It looks like oci_core_images is now returning a GPU image as the first image in the list, which will cause an error unless the user is using the BM.GPU2.2 shape due to the way we set up our data and resources:

e.g. data:

data "oci_core_images" "ImageOCID" {
  compartment_id           = "${var.compartment_ocid}"
  operating_system         = "Oracle Linux"
  operating_system_version = "7.4"
}

e.g. resource:

resource "oci_core_instance" "TFInstance" {
  ...
  image               = "${lookup(data.oci_core_images.ImageOCID.images[0], "id")}"
  ...
}

which leads to the following error:

* oci_core_instance.TFInstanceK8sMaster: Status: 400; Code: InvalidParameter; OPC Request ID: /XXXXX; Message: Shape VM.Standard1.8 is not valid for image ocid1.image.oc1.iad.aaaaaaaaflaxipg2k4l7mjudowij5vic76jjnqgzktw46pga4knjpqcrxw6a.

This goes to show that the oci_core_images image list can and will change underneath us. Therefore, I think we'd be better off passing the image name directly into the data source so that it is fixed for the query and the live cluster:

i.e. something like:

data "baremetal_core_images" "ImageOCID" {
    compartment_id = "${var.compartment_ocid}"
    display_name = "Oracle-Linux-7.4-2017.08.25-0"
}

Pods fail to start with network failures after an etcd connectivity issue causes the flannel/cni subnet lease to expire

An extended etcd connectivity issue can lead to pods failing to start.

How: Flannel uses expiring (24-hour) etcd keys to manage the subnets allocated to worker nodes.

e.g. subnet allocated to k8s-worker-ad1-0:

etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.99.82.0-24

When the worker nodes lose connectivity to etcd (e.g. when the etcd-lb is malfunctioning), the /flannel/network/subnets/10.99.82.0-24 key's TTL expires and the key will be gone:

etcdctl ls /flannel/network/subnets

When the connectivity to etcd is restored, a new key is created and distributed to the flannel service on each worker node:

etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.99.43.0-24

At this point, you'll see a number of symptoms, including that new pods will fail to start, complaining (presumably about the pods on the old network):

Failed to setup network for pod \"hello-2093073260-lk3f2_default(9f0a4b8a-90f0-11e6-b54b-080027242396)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.99.43.1/24,

You'll also see the network namespace container for the pod (the "pause" container that starts alongside the other containers) with the related error:

Failed to start with docker id 99a811606b51 with error: API error (500): Cannot start container 99a811606b51cdbeddbea14af474f0df432278ac9f73baea5d8ecaf5453f521e: cannot join network of a non running container: c8a4a648e63d5b18dcd8fad6fd2d70f466584f2a749bb4245bb86a6da1ceea55

It may also have something to do with what these files are doing, or not doing, when a new subnet is allocated to the worker:

./instances/k8smaster/scripts/flannel.service
./instances/k8sworker/scripts/flannel.service
./instances/k8smaster/scripts/cni-bridge.service
./instances/k8smaster/scripts/cni-bridge.sh

Flannel 'After' directive in unit file is not recognized

Terraform Version

10.8

OCI Provider Version

2.02

Terraform Installer for Kubernetes Version

commit f47ed4b

Input Variables

label_prefix ="k8s-"
etcdAd1Count=1
etcdAd2Count=0
etcdAd3Count=0
k8sWorkerAd1Count=1
k8sWorkerAd2Count=1
k8sWorkerAd3Count=0
k8sMasterAd1Count=0
k8sMasterAd2Count=0
k8sMasterAd3Count=1
etcdShape="VM.Standard1.1"
k8sMasterShape="VM.Standard1.2"
k8sWorkerShape="VM.Standard1.4"
k8sMasterLBShape="100Mbps"
etcdLBShape="100Mbps"
etcd_lb_enabled="false"
k8s_ver = "1.8.3"
etcd_docker_max_log_size = "250m"
etcd_docker_max_log_files = "2"
master_docker_max_log_size = "100m"
worker_docker_max_log_size = "75m"
worker_docker_max_log_files = "3"
disable_auto_retries="false"

Description of issue:

systemd: [/etc/systemd/system/flannel.service:10] Unknown lvalue 'After' in section 'Service'

Cluster creation failed because of intermittent failure in getting the etcd discovery URL

Intermittently, the call to https://discovery.etcd.io can fail, which causes the following invalid discovery token to be generated in the generated folder:

jemillanmbp:generated jessem$ cat discoverye3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 
Unable to generate token

This situation in turn leads to an invalid parameter being passed to -discovery in terraform-kubernetes-installer/instances/etcd/cloud_init/bootstrap.template.sh:

docker run -d \
	-p 2380:2380 -p 2379:2379 \
	-v /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt \
	--net=host \
	quay.io/coreos/etcd:${etcd_ver} \
	/usr/local/bin/etcd \
	-name $HOSTNAME \
	-advertise-client-urls http://$IP_LOCAL:2379 \
	-listen-client-urls http://$IP_LOCAL:2379,http://127.0.0.1:2379 \
	-listen-peer-urls http://0.0.0.0:2380 \
	-discovery "Unable to generate token"

This causes the cluster to fail to come up properly.

We are moving away from using the discovery URL, but in the meantime, we should not try to create a cluster if we failed to get a discovery URL.
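A minimal guard sketch for the bootstrap flow: verify that the discovery value actually looks like a token URL before starting etcd (the variable name DISCOVERY_URL is hypothetical):

# fail fast instead of passing an error string to -discovery
case "$DISCOVERY_URL" in
  https://discovery.etcd.io/*) ;;   # looks like a valid discovery token URL
  *) echo "invalid etcd discovery token: $DISCOVERY_URL" >&2; exit 1 ;;
esac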

Kubernetes swap failure running with versions higher than v1.8.0

Terraform Version

10.8

OCI Provider Version

2.0.2

Terraform Installer for Kubernetes Version

1.8.0 and higher

The version/tag/release or commit hash (of this project) the issue occurred on

commit f47ed4b

Description of issue:

localhost kubelet: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/sda2 partition#0118420344#0110#011-1]
Nov 15 17:59:40 localhost systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
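Two mitigations suggested by the kubelet error itself (a sketch; where the kubelet flags live in this installer's bootstrap scripts is not confirmed here):

# option 1: disable swap on the node (and remove the swap entry from /etc/fstab to persist)
$ sudo swapoff -a

# option 2: let the kubelet run with swap enabled (Kubernetes 1.8+)
#   add --fail-swap-on=false to the kubelet startup flags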

Cluster created with default settings should have a public load balancer IP address

Terraform Version

0.10.8

OCI Provider Version

2.02

Terraform Installer for Kubernetes Version

1a48b04

Input Variables

label_prefix ="k8s-"
etcdAd1Count=1
etcdAd2Count=0
etcdAd3Count=0
k8sWorkerAd1Count=1
k8sWorkerAd2Count=1
k8sWorkerAd3Count=0
k8sMasterAd1Count=0
k8sMasterAd2Count=0
k8sMasterAd3Count=1
etcdShape="VM.Standard1.1"
k8sMasterShape="VM.Standard1.2"
k8sWorkerShape="VM.Standard1.4"
k8sMasterLBShape="100Mbps"
etcdLBShape="100Mbps"

Description of issue:

Creating a cluster using the default access policy should give a public load balancer IP address, but the output shows a private one:

master_lb_ip = [
10.0.30.4
]

Allow replacement/updates to etcd nodes in cluster

Currently, after a k8s cluster has been created, etcd nodes cannot successfully be replaced. This is a limitation due to how the etcd cluster is created. The current method is to use a discovery URL. Switching this to a more conventional list of etcd endpoints would allow for more flexible etcd node management within the installer cluster.

Move from using discovery_url to configure the cluster to building a list of cluster IPs or DNS names, to allow for Terraform node updates or replacements.
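For reference, a sketch of static bootstrapping with standard etcd flags in place of -discovery; the member names and IPs are placeholders:

/usr/local/bin/etcd \
	-name $HOSTNAME \
	-advertise-client-urls http://$IP_LOCAL:2379 \
	-listen-client-urls http://$IP_LOCAL:2379,http://127.0.0.1:2379 \
	-listen-peer-urls http://0.0.0.0:2380 \
	-initial-advertise-peer-urls http://$IP_LOCAL:2380 \
	-initial-cluster "etcd-ad1-0=http://10.0.0.10:2380,etcd-ad2-0=http://10.0.1.10:2380,etcd-ad3-0=http://10.0.2.10:2380" \
	-initial-cluster-state new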

Move to Kubernetes 1.8 in terraform-kubernetes-installer

We need to move to Kubernetes 1.8 in the terraform-kubernetes-installer. This task also includes making whatever changes are necessary to the bootstrap scripts, and re-running all tests including Kubernetes conformance tests.

In addition, we should consider moving to the latest:

  • Docker version
  • etcd version
  • Flannel version

With private subnets, nodes come up in not ready state

With private subnets, we see that the worker nodes come up in NotReady state.

Worker nodes (1, 2, or 3) are coming up in NotReady status.

kubectl get nodes
NAME STATUS AGE VERSION
k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4
k8s-worker-ad2-0.k8sworkerad2.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4
k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com NotReady 31m v1.7.4

Below are the master node journal entries:

Jan 02 17:37:44 k8s-master-ad1-0 kubelet[13086]: W0102 17:37:44.615767 13086 container_manager_linux.go:218] Running with swap on is not supported, please disable swap! This will be a fatal error by default starting in K8s v1.6! In the meantime, you can opt-in to making this a fatal error by enabling --experimental-fail-swap-on.

In the worker nodes' journal entries, we see the below:

Jan 02 18:20:05 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:05.035951 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found
Jan 02 18:20:15 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:15.073231 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found
Jan 02 18:20:25 k8s-worker-ad1-0 kubelet[13436]: E0102 18:20:25.111192 13436 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com' not found

Noticed that kube-proxy is not coming up; the other 2 worker nodes are in NotReady status now.

[opc@k8s-master-ad1-0 ~]$ kubectl get po -n=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-2272871451-v3p1r 3/3 Running 0 16m
kube-proxy-k8s-worker-ad1-0.k8sworkerad1.k8sbmcs.oraclevcn.com 1/1 Running 0 14m
kube-proxy-k8s-worker-ad2-0.k8sworkerad2.k8sbmcs.oraclevcn.com 0/1 Unknown 0 14m
kube-proxy-k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com 0/1 Pending 0 14m
kubernetes-dashboard-3313488171-1z911 1/1 Running 0 16m
tiller-deploy-2136207906-kct9c 1/1 Running 0 7m

Create issue template for terraform-kubernetes-installer repository

Should include the things we need to reproduce issues including:

  • Description of problem
  • Version / release of terraform-kubernetes-installer
  • Version / release of OCI provider
  • Values of (non-sensitive) input variables
  • The output of terraform plan
  • Possibly the output of scripts/cluster-check.sh

Support for other CNI (calico?)

Hi,

Is there a plan to add support for Calico or another CNI, or is Flannel the only option that you will support in this repo?

Thanks,

I don't understand some of the documentation (Prereq Step 3)

Step 3 says I need to create a file that has

providers {
oci = "<path_to_provider_binary>/terraform-provider-oci"
}

Ooookkkk.....what is a provider binary? That is not discussed anywhere that I can see unless I'm really missing something. Is this something that I downloaded in the previous packages, the code that I want to upload using kubernetes, or is it an option in the OCI dashboard?

Please, could you make this more clear?

differences between Yum repositories cause issues with new k8s default 1.7.10

We recently bumped the default version of Kubernetes (i.e. k8s_ver) from 1.7.4 to 1.7.10. Unfortunately, the 1.7.10 yum packages appear to be available in fewer Yum repos, which leads to issues initializing the software.

Specifically, when $k8s_ver is not available in the Yum repository, we randomly pick up whatever version of the kubelet kubernetes-cni pulls in to fulfill its dependency:

until yum install -y kubelet-${k8s_ver}-$RPM_TAG kubectl-${k8s_ver}-$RPM_TAG kubernetes-cni; do sleep 1 && echo -n ".";done
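A sketch of checking availability up front rather than silently falling back to another version (standard yum options):

# list every kubelet build the enabled repos provide and fail if the pinned version is missing
yum --showduplicates list available kubelet | grep "${k8s_ver}" \
    || { echo "kubelet-${k8s_ver} not found in enabled repos" >&2; exit 1; }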

Empty generated node label values on OCI

Created a new cluster and see empty/null values for some labels:
node.info/node.id_prefix:
node.info/node.id_suffix:
node.info/node.shape:

It was a fresh git clone on 11/2. The provider is terraform-provider-oci_v2.0.3.
More details in Slack #kubernetes-installer.

Can a worker node have ExternalIP?

Terraform Version

# Run this command to get the terraform version:

$ terraform -v
Terraform v0.11.1

  • provider.null v1.0.0
  • provider.oci (unversioned)
  • provider.random v1.1.0
  • provider.template v1.0.0
  • provider.tls v1.0.1

OCI Provider Version

# Execute the plugin directly to get the version:

$ <path-to-plugin>/terraform-provider-oci
terraform-provider-oci 2.0.5

Terraform Installer for Kubernetes Version

# The version/tag/release or commit hash (of this project) the issue occurred on
34b6c90

Input Variables

# Values of non-sensitive input variables

Description of issue:

A worker node doesn't have ExternalIP. Can it be provisioned or configured after the fact?

Latest SRE installer sets up the etcd LBR as private; provide the user an option to make it public

We used the latest SRE installer and noticed that the cluster is set up with the etcd LBR as a private LBR.

We are concerned that with this setup (a private etcd LBR), losing network connectivity to the AD hosting etcd could potentially bring down the service.

The SRE installer should provide the user with an option to make it public, so the user can choose how they want it set up based on their cluster requirements.

Allow VCN CIDR to be optional parameter

Create an optional parameter for VCN CIDR for the created VCN.

This would give users more flexibility to leverage VCN peering, which requires non-overlapping VCN CIDRs.
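A sketch of what the optional parameter might look like; the variable name vcn_cidr is hypothetical, and the default matches the 10.0.0.0/16 addressing seen elsewhere in this document:

variable "vcn_cidr" {
  description = "CIDR block for the cluster VCN; must not overlap with peered VCNs"
  default     = "10.0.0.0/16"
}

# the VCN resource would then use:
#   cidr_block = "${var.vcn_cidr}"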

Remove VCN access to k8s master insecure port

Access to the k8s master insecure port is already disallowed by ingress. However, we should also ensure that communication within the VCN uses only the secure port with SSL.
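A sketch of closing the insecure port at the source, using standard kube-apiserver flags from this Kubernetes era (where the apiserver flags are set in this installer is not confirmed here):

# serve the API only over TLS; either disable the insecure port entirely
--insecure-port=0
# or restrict it to loopback
--insecure-bind-address=127.0.0.1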

Porting to OpenStack

Hi,

Does Oracle use some flavor of OpenStack? If yes, I want to discuss the possibility of porting this project to a generic OpenStack installation. What major changes would be needed to make this work on a recent OpenStack installation?

Thanks.
