
kops-cn's People

Contributors

alyanli, bigdrum, cplo, dean205, eagleye1115, elbertwang, fromthebridge, fsadykov, jansony1, jhaohai, jonkeyguan, missingcharacter, nowfox, pahud, satchinjoshi, seimutig, skrieger82, sunfuze, totorochina, walkley, xmubeta, yizhizoe, yujunz, zhangquanhao


kops-cn's Issues

Running create-cluster.sh failed

I ran the install on my Mac.
After configuring env.config and running create-cluster.sh, I got the error message below:

I0130 10:08:37.058539 5466 create_cluster.go:1407] Using SSH public key: /Users/wangqi/.ssh/id_rsa
I0130 10:08:38.840289 5466 subnets.go:184] Assigned CIDR 172.0.32.0/19 to subnet cn-northwest-1a
I0130 10:08:38.840325 5466 subnets.go:184] Assigned CIDR 172.0.64.0/19 to subnet cn-northwest-1b
I0130 10:08:38.840362 5466 subnets.go:184] Assigned CIDR 172.0.96.0/19 to subnet cn-northwest-1c

error determining default DNS zone: error querying zones: RequestError: send request failed
caused by: Get https://route53.cn-northwest-1.amazonaws.com.cn/2013-04-01/hostedzone: dial tcp: lookup route53.cn-northwest-1.amazonaws.com.cn on 172.20.53.163:53: no such host
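One workaround consistent with the rest of this page (the logs elsewhere here use cluster.zhy.k8s.local) is a gossip-based cluster: when the cluster name ends in .k8s.local, kops skips the Route53 hosted-zone lookup that fails here. A minimal sketch, with a hypothetical cluster name:

# a name ending in .k8s.local makes kops use gossip DNS instead of Route53
export NAME=cluster.zhy.k8s.local
kops create cluster \
     --cloud=aws \
     --name=$NAME \
     --zones=cn-northwest-1a,cn-northwest-1b,cn-northwest-1c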

amazon-vpc-routed-eni as default networking for cluster creation

https://github.com/nwcdlabs/kops-cn/blob/448b5fc45d47d525c9db6c6e54dfbb19a34b1c73/create-cluster.sh#L7-L18

Now that AWS ALB Ingress Controller is at v1.0.0 and its ip target mode requires the AWS VPC CNI, we should use it as the default networking mode.

--networking amazon-vpc-routed-eni

The new creation script would look like this:

kops create cluster \
     --cloud=aws \
     --name=$cluster_name \
     --image=$ami \
     --zones=$zones \
     --master-count=$master_count \
     --master-size=$master_size \
     --node-count=$node_count \
     --node-size=$node_size \
     --vpc=$vpcid \
     --networking amazon-vpc-routed-eni \
     --kubernetes-version="$kubernetesVersion" \
     --ssh-public-key=$ssh_public_key

How to get the full ECR repo path from required-images.txt?

  1. Clone the repo and make sure the latest required-images.txt and display-remote-repos.sh are in the ./mirror sub-directory:
$ cd ./mirror
$ bash display-remote-repos.sh 

You'll immediately get the full ECR repo paths:

$ bash display-remote-repos.sh 
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kope-dns-controller:1.11.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-dnsmasq-nanny-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-sidecar-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-kube-dns-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/cluster-proportional-autoscaler-amd64:1.1.2-r2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.1.3
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.2.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:2.2.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/pause-amd64:3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-controller-manager:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-scheduler:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-apiserver:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:v1.3.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-coreos-flannel:v0.10.0-amd64
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.0-3-gc0c26eca
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/894847497797.dkr.ecr.us-west-2.amazonaws.com-aws-alb-ingress-controller:v1.1.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v2.6.12
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v1.11.8
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-controllers:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-policy-controller:v0.7.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-calico-upgrade:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/defaultbackend:1.4
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-kubernetes-ingress-controller-nginx-ingress-controller:0.20.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.18
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.24
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kubernetes-dashboard-amd64:v1.10.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-kubernetes-helm-tiller:v2.12.3
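For reference, a minimal sketch of what a script like display-remote-repos.sh could look like, assuming required-images.txt lists one source image per line and that most names are flattened by replacing registry slashes with dashes (the istio entries above keep their slashes, so the real script's mapping is more nuanced):

#!/bin/bash
# Hypothetical sketch -- not the repo's actual display-remote-repos.sh.
# Prints the mirrored ECR path for every image in required-images.txt.
ECR_REGISTRY=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

while read -r image; do
  [ -z "$image" ] && continue          # skip blank lines
  repo=${image%:*}                     # source repo without the :tag
  tag=${image##*:}
  echo "$ECR_REGISTRY/$(echo "$repo" | tr '/' '-'):$tag"
done < required-images.txt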

faster mirror improvement

  1. compare the image digest before docker push

e.g.

res=$(aws --profile bjs --region $ECR_REGION ecr describe-images --repository-name "$repo" \
  --query "imageDetails[?(@.imageDigest=='$2')].contains(@.imageTags, '$tag') | [0]")

# $2 is the image digest passed into the surrounding function
if [ "$res" == "true" ]; then
  return 0
else
  return 1
fi

If this returns 0, we don't have to push the image to ECR, because it already exists there.
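A self-contained sketch of how this check could sit in the mirror loop, with a hypothetical image_in_ecr helper wrapping the snippet above (variable names are illustrative):

# Returns 0 (true) when the ECR repo already holds this digest under this tag.
image_in_ecr() {
  local repo=$1 digest=$2 tag=$3
  local res
  res=$(aws --profile bjs --region "$ECR_REGION" ecr describe-images --repository-name "$repo" \
    --query "imageDetails[?(@.imageDigest=='$digest')].contains(@.imageTags, '$tag') | [0]")
  [ "$res" == "true" ]
}

# usage inside the mirror loop:
if image_in_ecr "$repo" "$digest" "$tag"; then
  echo "skip $repo:$tag -- already mirrored"
else
  docker push "$ECR_REGISTRY/$repo:$tag"
fi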

Unable to bring up k8s-ec2-srcdst deployment on CentOS 7

Because the CA certificate lives in a different location on CentOS, the k8s-ec2-srcdst container fails to start: it cannot find the certificate. This is mentioned in kubernetes/kops#4331.

Default certificate path: /etc/ssl/certs/ca-certificates.crt
Centos path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

One workaround is to run the 'kubectl patch' command mentioned in that issue. Another thought I have is to change the kops source code and recompile it.

Any other good advice is welcome, thank you.
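For reference, a sketch of the hostPath-style patch (the deployment name and kube-system namespace are assumptions; the exact command in kops#4331 may differ):

# mount the CentOS CA bundle at the path the container expects
kubectl -n kube-system patch deployment k8s-ec2-srcdst --patch '
spec:
  template:
    spec:
      containers:
      - name: k8s-ec2-srcdst
        volumeMounts:
        - name: ca-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
      volumes:
      - name: ca-certs
        hostPath:
          path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
'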

can't create cluster in China - error fetching https://.../channels/stable

summary

The cluster can't be created if the kops client has a poor internet connection to GitHub. You can test the connectivity with:

curl https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable

how to reproduce this issue

  1. Turn on the -v flag in kops create:
kops create cluster \
     -v 9 \
     --cloud=aws \
...
  2. Run create-cluster.sh:

9801a7a9620b:kops-cn hunhsieh $ bash create-cluster.sh
I0129 01:36:33.093275 18727 create_cluster.go:1407] Using SSH public key: /Users/hunhsieh/.ssh/id_rsa.pub
I0129 01:36:33.093795 18727 factory.go:68] state store s3://pahud-kops-state-store-zhy
I0129 01:36:33.342730 18727 s3context.go:194] found bucket in region "cn-northwest-1"
I0129 01:36:33.342805 18727 s3fs.go:220] Reading file "s3://pahud-kops-state-store-zhy/cluster.zhy.k8s.local/config"
I0129 01:36:33.487717 18727 channel.go:97] resolving "stable" against default channel location "https://raw.githubusercontent.com/kubernetes/kops/master/channels/"
I0129 01:36:33.487769 18727 channel.go:102] Loading channel from "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable"
I0129 01:36:33.489467 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:03.492838 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:03.993923 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:33.997853 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:35.000502 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:05.001673 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:07.002652 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:37.009644 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:41.010692 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:39:11.012203 18727 context.go:231] hit maximum retries 5 with error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
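Since the kops client is a Go binary, it honors the standard proxy environment variables, so one generic workaround (an assumption about your environment, not something from this thread) is to route just the client through a proxy that can reach raw.githubusercontent.com:

# hypothetical proxy address; any proxy reachable from the client works
export https_proxy=http://proxy.example.com:3128
bash create-cluster.sh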

k8s.gcr.io/kube-proxy:v1.11.6 is missing

After upgrading to 1.11.6, the kube-proxy pod failed to come up since this image is not mirrored. (I guess it is the same for many other 1.11.6 images.)

I wonder if we could have an automated script that enumerates all versions of the whitelisted images and mirrors them, so that we don't need to worry about this any more.
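A minimal sketch of such a mirror pass, assuming required-images.txt is the whitelist and using the flattened naming shown earlier (illustrative only; ECR repo creation and docker login are omitted):

#!/bin/bash
# Pull each whitelisted image, retag it into the China ECR registry, push it.
ECR_REGISTRY=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

while read -r image; do
  [ -z "$image" ] && continue
  target="$ECR_REGISTRY/$(echo "${image%:*}" | tr '/' '-'):${image##*:}"
  docker pull "$image"
  docker tag "$image" "$target"
  docker push "$target"
done < required-images.txt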

etcd3 as the default cluster

According to the kops etcd roadmap document:
https://github.com/kubernetes/kops/blob/af4df08b694e2a1f8814a7b3649060477be67c86/docs/etcd/roadmap.md

etcd3 will eventually become the default cluster version; however, for the moment it still defaults to v2.2.

We have a PR trying to get this sorted out, but for now, to stay aligned with kops upstream, we are sticking with v2.2.

If you prefer to provision etcd3 as the default, you can update spec.yml like this:
https://github.com/nwcdlabs/kops-cn/pull/29/files#diff-ce22796966d5547919fe1967f7781563
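For illustration, the relevant fields in the cluster spec look roughly like this (a sketch only; see the PR diff above for the exact change, and note the member layout here is assumed):

etcdClusters:
- name: main
  version: 3.2.24
  etcdMembers:
  - name: a
    instanceGroup: master-cn-northwest-1a
- name: events
  version: 3.2.24
  etcdMembers:
  - name: a
    instanceGroup: master-cn-northwest-1a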

Hi @jansony1 , feel free to update this issue if you have any other useful insights.

Thanks.

helm installation requires dependency update

Per the official Helm documentation, helm repo add and helm dep update are required before running helm install.

Add the istio.io chart repository and point it to the daily release:

$ helm repo add istio.io https://storage.googleapis.com/istio-prerelease/daily-build/master-latest-daily/charts
Build the Helm dependencies:

$ helm dep update install/kubernetes/helm/istio
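After that, the install itself follows (the release name and namespace below are from Istio's 1.0-era Helm instructions; adjust as needed):

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system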

Can't pull image for Istio release-1.0-latest-daily

Got ImagePullBackOff errors while installing Istio:

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   Scheduled              22m                default-scheduler                                     Successfully assigned istio-citadel-5768b899d4-jg226 to ip-172-31-60-71.cn-north-1.compute.internal
  Normal   SuccessfulMountVolume  22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  MountVolume.SetUp succeeded for volume "istio-citadel-service-account-token-skxg9"
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Failed to pull image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily": rpc error: code = Unknown desc = Error response from daemon: manifest for 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily not found
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ErrImagePull
  Normal   BackOff                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Back-off pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"
  Warning  Failed                 22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ImagePullBackOff
  Normal   Pulling                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"

Can't pull heptio-images-authenticator

Can't pull 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0 when running aws-iam-authenticator.

Containers:
  aws-iam-authenticator:
    Container ID:
    Image:         937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      server
      --config=/etc/aws-iam-authenticator/config.yaml
      --state-dir=/var/aws-iam-authenticator
      --generate-kubeconfig=/etc/kubernetes/aws-iam-authenticator/kubeconfig.yaml

I have seen the image in https://github.com/nwcdlabs/kops-cn/blob/master/mirror/required-images.txt#L32.
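Note that in the ECR listing earlier on this page, the mirrored repository carries a gcr.io- prefix, so the image reference likely needs to be:

# mirrored repo name as shown in the display-remote-repos.sh output above
image: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0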

How to use a customized nodeup binary?

We need a customized nodeup binary for our cluster. When I try to override the default URL with

export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'

It seems to be hijacked by the fileRepository setting:

I1214 16:28:21.474161   94429 builder.go:297] error reading hash file "https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup.sha1": file does not exist

you may have not staged your files correctly, please execute kops update cluster using the assets phase

Any suggestions?
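Per the error message itself, the file assets may need to be staged into the fileRepository first; a sketch, assuming your kops version supports the assets phase and with the cluster name as a placeholder:

# stage file assets (nodeup etc.) into the configured fileRepository
kops update cluster cluster.zhy.k8s.local --phase assets --yes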

error parsing SSH public key: ssh: no key found

I ran into the following issue when creating the cluster:
I0416 09:48:45.708089 3672 create_cluster.go:1407] Using SSH public key: /home/ec2-user/.ssh/id_rsa.pub

error reading cluster configuration "cluster.zhy.k8s.local": error reading s3://liuhongxi-kops-cn/cluster.zhy.k8s.local/config: Unable to list AWS regions: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, default.
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
caused by:

404 - Not Found

make: *** [create-cluster] Error 1
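The root error is the AWS credential chain failing (EnvAccessKeyNotFound, SharedCredsLoad, no EC2 instance role), so making credentials visible to kops should clear it; for example (the profile name and keys are placeholders):

# either point kops at a configured profile...
export AWS_PROFILE=cn
# ...or export keys directly
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
make create-cluster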

docker repository for Amazon VPC CNI

Symptom
Deploy kops with the Amazon VPC CNI (--networking=amazon-vpc-routed-eni) and the aws-node daemonset fails with ImagePullBackOff.

Root Cause
The generated image URL of aws-node is invalid:
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:1.0.0

From the YAML template, the image URL is generated from the parameter "Networking.AmazonVPC.ImageName", falling back to the default image URL in the us-west-2 ECR.

It works after changing the aws-node image URL to "pahud/amazon-k8s-cni:1.0.0".

Suggested Solution

  1. Specify "Networking.AmazonVPC.ImageName" in kops edit as following:
    networking:
    amazonvpc:
    imageName: amazon-k8s-cni:1.0.0
  2. Add image amazon-k8s-cni:1.0.0 in docker registry 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

Primary interface IP is unreachable, causing out-of-service status behind the ELB

I am using Private topology and Internal ELB to create a K8S cluster.

# other parameters omitted
kops create cluster \
  --topology=private \
  --networking=amazon-vpc-routed-eni \
  --api-loadbalancer-type=internal

After the cluster started, each node was assigned two interfaces. However, I found that only one node is in service behind the ELB.

From the working node, the ip route table is:


core@ip-172-20-53-190 ~ $ ip route
default via 172.20.32.1 dev eth0 proto dhcp src 172.20.53.190 metric 1024
default via 172.20.32.1 dev eth1 proto dhcp src 172.20.61.156 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.32.0/19 dev eth0 proto kernel scope link src 172.20.53.190
172.20.32.1 dev eth0 proto dhcp scope link src 172.20.53.190 metric 1024
172.20.32.1 dev eth1 proto dhcp scope link src 172.20.61.156 metric 1024

You can see that eth0 is on top. That explains why the eth0 IP is reachable.

From the other two defunct nodes, the ip route output looks like:

core@ip-172-20-114-248 ~ $ ip route
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.98.243 metric 1024
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.114.248 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.96.0/19 dev eth1 proto kernel scope link src 172.20.98.243
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.114.248
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.98.243 metric 1024
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.114.248 metric 1024
core@ip-172-20-114-248 ~ $

The eth1 entry is on top, so you can only reach the eth1 IP.

I am not sure if there is something wrong with my creation. Hope someone can help. Thank you.

--Beta

HOWTO - create multiple clusters in a VPC

This example shows how to create two kops clusters inside the same VPC in the Ningxia region, each with its own cluster_name.

cluster name

1st cluster: cluster1.zhy.k8s.local
2nd cluster: cluster2.zhy.k8s.local
(Note: the names must end with k8s.local.)

subnets

Prepare six subnets in one VPC; in this example, each cluster will use three of them.


Makefiles

Prepare two Makefiles, cluster1.mk and cluster2.mk. Sample contents:

https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster1.mk
https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster2.mk

Create the first cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk update-cluster

validate cluster

Switch the kubectl context to cluster1:

$ kubectl config use-context cluster1.zhy.k8s.local

validate cluster1

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk validate-cluster


Create the second cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk update-cluster

validate cluster

Switch the kubectl context to cluster2:

$ kubectl config use-context cluster2.zhy.k8s.local

validate cluster2

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk validate-cluster


get po

Both clusters can list all Pods in kube-system, and all of them are Running normally.


error when: kops edit cluster

Ran kops edit cluster and tried to add:

assets:
  containerRegistry: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn
  fileRepository: https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/
docker:
  logDriver: ""
  registryMirrors:
  - https://registry.docker-cn.com

:wq to save; however, the editor re-opens and I get the error:
error populating cluster spec: error building complete spec: options did not converge after 10 iterations

I tried to ignore it and keep going to the final step, kops validate cluster:
unexpected error during validation: error listing nodes: Get https://api-cluster-zhy-k8s-local-qpbf7n-1465482247.cn-northwest-1.elb.amazonaws.com.cn/api/v1/nodes: EOF

In the AWS console I found the three master instances out of service. I checked the security group rules and they are all right. I already have an ICP exception for ports 80/8080/443, and I can telnet to the ELB on 443.

I googled it and found a similar issue: kubernetes/kops#5061

How can I fix it?

Create your cluster with existing subnets

Customers need to deploy their clusters into existing subnets, but the official kops page is not clear about this, as you can see here:
https://github.com/kubernetes/kops/blob/master/docs/run_in_existing_vpc.md#shared-subnets

Specifically, these parts:

export SUBNET_ID=subnet-12345678 # replace with your subnet id
export SUBNET_CIDR=10.100.0.0/24 # replace with your subnet CIDR
export SUBNET_IDS=$SUBNET_IDS # replace with your comma separated subnet ids

What you really need is to modify the create-cluster target in the Makefile as below, according to your subnets and zones, and leave the rest of the Makefile unchanged (only the necessary part is shown; see the three notes after the snippet):

.PHONY: create-cluster
create-cluster:  
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
     --cloud=aws \
     --name=$(CLUSTER_NAME) \
     --image=$(AMI) \
     --master-count=$(MASTER_COUNT) \
     --master-size=$(MASTER_SIZE) \
     --node-count=$(NODE_COUNT) \
     --node-size=$(NODE_SIZE)  \
     --vpc=$(VPCID) \
     --kubernetes-version=$(KUBERNETES_VERSION_URI) \
     --networking=amazon-vpc-routed-eni \
     --ssh-public-key=$(SSH_PUBLIC_KEY) \
     --zones=cn-northwest-1a,cn-northwest-1b \
     --subnets=subnet-2cf25a45,subnet-9315d7e8

  1. Delete the original zones option.
  2. Then add the last two lines (--zones and --subnets).
  3. The order of your subnets must match the order of your zones.

Add more scripts to the Makefile

Some of our customers may make a mistake in the "make edit-cluster" step, or they may need to update their cluster, so they need a rolling-update option. Also, if they choose to drive their kops operations through the Makefile, we had better list all operations there so that they don't need to maintain another set of environment variables.

Here are two targets my customers asked for, so I'm adding them here.

.PHONY: rolling-cluster
rolling-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
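Usage then mirrors the other targets on this page (profile name assumed):

AWS_PROFILE=cn make get-cluster
AWS_PROFILE=cn make rolling-cluster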

change the default AMI from CoreOS to Amazon Linux 2 LTS

Background

to make sure the OS is more compatible with other components such as

  1. AWS VPC CNI
  2. AWS ALB Ingress

And to eliminate potential maintenance complexity in the future.

TODO

  • make sure the latest AMI in cn-north-1 and cn-northwest-1 is compatible with the latest stable kops
  • make sure latest AWS VPC CNI (1.3) is compatible
  • kubernetes/kops#6341 need to be identified and fixed
  • CoreDNS as the replacement of kube-dns
  • make sure AWS ALB Ingress is compatible
  • update env.config and set Amazon Linux 2 LTS as the default AMI

How to specify c5.large spot instance in Ningxia region?

The C5 instance type, running on the Nitro hypervisor, is available in the Ningxia and Beijing regions and has a great price-performance ratio.

Let's make a note on how to build kops-cn in Ningxia with C5 spot instances to cut the cost by up to 60%.
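A sketch of the relevant instance-group change (kops supports a maxPrice field for spot requests; the machine type comes from this issue, while the bid value is a hypothetical placeholder):

# kops edit ig nodes --name $CLUSTER_NAME
spec:
  machineType: c5.large
  maxPrice: "0.10"    # hypothetical max spot bid; check current Ningxia spot prices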


node NotReady after create and update

When I create, edit, and then update the cluster following the instructions, with 3 masters and 1 node in cn-northwest-1, the node stays in NotReady status.

If I change the Makefile and set NETWORKING to flannel-vxlan, it is OK.

I guess it's because the node has multiple private IPs; if the primary IP is not the first one, the node goes NotReady.
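For reference, the workaround from this report, assuming the variable is named NETWORKING in env.config/Makefile as described above:

# in env.config / Makefile -- variable name taken from this report
NETWORKING=flannel-vxlan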
