
kops-cn's People

Contributors

alyanli, bigdrum, cplo, dean205, eagleye1115, elbertwang, fromthebridge, fsadykov, jansony1, jhaohai, jonkeyguan, missingcharacter, nowfox, pahud, satchinjoshi, seimutig, skrieger82, sunfuze, totorochina, walkley, xmubeta, yizhizoe, yujunz, zhangquanhao


kops-cn's Issues

Running create-cluster.sh failed

I ran the install on my Mac.
After configuring env.config and running create-cluster.sh, I got the error message below:

I0130 10:08:37.058539 5466 create_cluster.go:1407] Using SSH public key: /Users/wangqi/.ssh/id_rsa
I0130 10:08:38.840289 5466 subnets.go:184] Assigned CIDR 172.0.32.0/19 to subnet cn-northwest-1a
I0130 10:08:38.840325 5466 subnets.go:184] Assigned CIDR 172.0.64.0/19 to subnet cn-northwest-1b
I0130 10:08:38.840362 5466 subnets.go:184] Assigned CIDR 172.0.96.0/19 to subnet cn-northwest-1c

error determining default DNS zone: error querying zones: RequestError: send request failed
caused by: Get https://route53.cn-northwest-1.amazonaws.com.cn/2013-04-01/hostedzone: dial tcp: lookup route53.cn-northwest-1.amazonaws.com.cn on 172.20.53.163:53: no such host
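One workaround consistent with the rest of this page (the logs elsewhere here use cluster.zhy.k8s.local) is a gossip-based cluster: when the cluster name ends in .k8s.local, kops skips the Route53 hosted-zone lookup that fails here. A minimal sketch, with a hypothetical cluster name:

# a name ending in .k8s.local makes kops use gossip DNS instead of Route53
export NAME=cluster.zhy.k8s.local
kops create cluster \
     --cloud=aws \
     --name=$NAME \
     --zones=cn-northwest-1a,cn-northwest-1b,cn-northwest-1c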

amazon-vpc-routed-eni as default networking for cluster creation

https://github.com/nwcdlabs/kops-cn/blob/448b5fc45d47d525c9db6c6e54dfbb19a34b1c73/create-cluster.sh#L7-L18

Now that AWS ALB Ingress Controller is at v1.0.0 and its ip target mode requires the AWS VPC CNI, we should use it as the default networking mode.

--networking amazon-vpc-routed-eni

The new creation script would look like this:

kops create cluster \
     --cloud=aws \
     --name=$cluster_name \
     --image=$ami \
     --zones=$zones \
     --master-count=$master_count \
     --master-size=$master_size \
     --node-count=$node_count \
     --node-size=$node_size \
     --vpc=$vpcid \
     --networking amazon-vpc-routed-eni \
     --kubernetes-version="$kubernetesVersion" \
     --ssh-public-key=$ssh_public_key

How to get the full ECR repo path from required-images.txt?

  1. Clone the repo and make sure the latest required-images.txt and display-remote-repos.sh are in the ./mirror sub-directory:
$ cd ./mirror
$ bash display-remote-repos.sh 

You'll immediately get the full ECR repo paths:

$ bash display-remote-repos.sh 
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kope-dns-controller:1.11.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-dnsmasq-nanny-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-sidecar-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-kube-dns-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/cluster-proportional-autoscaler-amd64:1.1.2-r2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.1.3
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.2.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:2.2.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/pause-amd64:3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-controller-manager:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-scheduler:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-apiserver:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:v1.3.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-coreos-flannel:v0.10.0-amd64
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.0-3-gc0c26eca
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/894847497797.dkr.ecr.us-west-2.amazonaws.com-aws-alb-ingress-controller:v1.1.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v2.6.12
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v1.11.8
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-controllers:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-policy-controller:v0.7.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-calico-upgrade:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/defaultbackend:1.4
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-kubernetes-ingress-controller-nginx-ingress-controller:0.20.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.18
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.24
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kubernetes-dashboard-amd64:v1.10.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-kubernetes-helm-tiller:v2.12.3
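For reference, a minimal sketch of what a script like display-remote-repos.sh could look like, assuming required-images.txt lists one source image per line and that most names are flattened by replacing registry slashes with dashes (the istio entries above keep their slashes, so the real script's mapping is more nuanced):

#!/bin/bash
# Hypothetical sketch -- not the repo's actual display-remote-repos.sh.
# Prints the mirrored ECR path for every image in required-images.txt.
ECR_REGISTRY=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

while read -r image; do
  [ -z "$image" ] && continue          # skip blank lines
  repo=${image%:*}                     # source repo without the :tag
  tag=${image##*:}
  echo "$ECR_REGISTRY/$(echo "$repo" | tr '/' '-'):$tag"
done < required-images.txt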

faster mirror improvement

  1. compare the image digest before docker push

e.g.

res=$(aws --profile bjs --region $ECR_REGION ecr describe-images --repository-name "$repo" \
  --query "imageDetails[?(@.imageDigest=='$2')].contains(@.imageTags, '$tag') | [0]")

# $2 is the image digest passed into the surrounding function
if [ "$res" == "true" ]; then
  return 0
else
  return 1
fi

If this returns 0, we don't have to push the image to ECR, because it already exists there.
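A self-contained sketch of how this check could sit in the mirror loop, with a hypothetical image_in_ecr helper wrapping the snippet above (variable names are illustrative):

# Returns 0 (true) when the ECR repo already holds this digest under this tag.
image_in_ecr() {
  local repo=$1 digest=$2 tag=$3
  local res
  res=$(aws --profile bjs --region "$ECR_REGION" ecr describe-images --repository-name "$repo" \
    --query "imageDetails[?(@.imageDigest=='$digest')].contains(@.imageTags, '$tag') | [0]")
  [ "$res" == "true" ]
}

# usage inside the mirror loop:
if image_in_ecr "$repo" "$digest" "$tag"; then
  echo "skip $repo:$tag -- already mirrored"
else
  docker push "$ECR_REGISTRY/$repo:$tag"
fi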

Unable to bring up k8s-ec2-srcdst deployment on CentOS 7

Because the CA certificate lives in a different location on CentOS, the k8s-ec2-srcdst container fails to start: it cannot find the certificate. This is mentioned in kubernetes/kops#4331.

Default certificate path: /etc/ssl/certs/ca-certificates.crt
Centos path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

One workaround is to run the 'kubectl patch' command mentioned in that issue. Another thought I have is to change the kops source code and recompile it.

Any other good advice is welcome, thank you.
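For reference, a sketch of the hostPath-style patch (the deployment name and kube-system namespace are assumptions; the exact command in kops#4331 may differ):

# mount the CentOS CA bundle at the path the container expects
kubectl -n kube-system patch deployment k8s-ec2-srcdst --patch '
spec:
  template:
    spec:
      containers:
      - name: k8s-ec2-srcdst
        volumeMounts:
        - name: ca-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
      volumes:
      - name: ca-certs
        hostPath:
          path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
'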

can't create cluster in China - error fetching https://.../channels/stable

summary

The cluster can't be created if the kops client has a poor internet connection to GitHub. You can test the connectivity with:

curl https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable

how to reproduce this issue

  1. Turn on the -v flag in kops create:
kops create cluster \
     -v 9 \
     --cloud=aws \
...
  2. Run create-cluster.sh:

9801a7a9620b:kops-cn hunhsieh $ bash create-cluster.sh
I0129 01:36:33.093275 18727 create_cluster.go:1407] Using SSH public key: /Users/hunhsieh/.ssh/id_rsa.pub
I0129 01:36:33.093795 18727 factory.go:68] state store s3://pahud-kops-state-store-zhy
I0129 01:36:33.342730 18727 s3context.go:194] found bucket in region "cn-northwest-1"
I0129 01:36:33.342805 18727 s3fs.go:220] Reading file "s3://pahud-kops-state-store-zhy/cluster.zhy.k8s.local/config"
I0129 01:36:33.487717 18727 channel.go:97] resolving "stable" against default channel location "https://raw.githubusercontent.com/kubernetes/kops/master/channels/"
I0129 01:36:33.487769 18727 channel.go:102] Loading channel from "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable"
I0129 01:36:33.489467 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:03.492838 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:03.993923 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:33.997853 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:35.000502 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:05.001673 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:07.002652 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:37.009644 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:41.010692 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:39:11.012203 18727 context.go:231] hit maximum retries 5 with error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
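Since the kops client is a Go binary, it honors the standard proxy environment variables, so one generic workaround (an assumption about your environment, not something from this thread) is to route just the client through a proxy that can reach raw.githubusercontent.com:

# hypothetical proxy address; any proxy reachable from the client works
export https_proxy=http://proxy.example.com:3128
bash create-cluster.sh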

k8s.gcr.io/kube-proxy:v1.11.6 is missing

After upgrading to 1.11.6, the kube-proxy pod failed to come up since this image is not mirrored. (I guess it is the same for many other 1.11.6 images.)

I wonder if we could have an automated script that enumerates all versions of the whitelisted images and mirrors them, so that we don't need to worry about this any more.
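A minimal sketch of such a mirror pass, assuming required-images.txt is the whitelist and using the flattened naming shown earlier (illustrative only; ECR repo creation and docker login are omitted):

#!/bin/bash
# Pull each whitelisted image, retag it into the China ECR registry, push it.
ECR_REGISTRY=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

while read -r image; do
  [ -z "$image" ] && continue
  target="$ECR_REGISTRY/$(echo "${image%:*}" | tr '/' '-'):${image##*:}"
  docker pull "$image"
  docker tag "$image" "$target"
  docker push "$target"
done < required-images.txt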

etcd3 as the default cluster

According to the kops etcd roadmap document:
https://github.com/kubernetes/kops/blob/af4df08b694e2a1f8814a7b3649060477be67c86/docs/etcd/roadmap.md

etcd3 will eventually become the default cluster version; however, for the moment it still defaults to v2.2.

We have a PR trying to get this sorted out, but for now, to stay aligned with kops upstream, we are sticking with v2.2.

If you prefer to provision etcd3 as the default, you can update spec.yml like this:
https://github.com/nwcdlabs/kops-cn/pull/29/files#diff-ce22796966d5547919fe1967f7781563
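For illustration, the relevant fields in the cluster spec look roughly like this (a sketch only; see the PR diff above for the exact change, and note the member layout here is assumed):

etcdClusters:
- name: main
  version: 3.2.24
  etcdMembers:
  - name: a
    instanceGroup: master-cn-northwest-1a
- name: events
  version: 3.2.24
  etcdMembers:
  - name: a
    instanceGroup: master-cn-northwest-1a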

Hi @jansony1 , feel free to update this issue if you have any other useful insights.

Thanks.

helm installation requires dependency update

Per the official Helm documentation, helm repo add and helm dep update are required before running helm install.

Add the istio.io chart repository and point it to the daily release:

$ helm repo add istio.io https://storage.googleapis.com/istio-prerelease/daily-build/master-latest-daily/charts
Build the Helm dependencies:

$ helm dep update install/kubernetes/helm/istio
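After that, the install itself follows (the release name and namespace below are from Istio's 1.0-era Helm instructions; adjust as needed):

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system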

Can't pull image for Istio release-1.0-latest-daily

Got ImagePullBackOff errors while installing Istio:

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   Scheduled              22m                default-scheduler                                     Successfully assigned istio-citadel-5768b899d4-jg226 to ip-172-31-60-71.cn-north-1.compute.internal
  Normal   SuccessfulMountVolume  22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  MountVolume.SetUp succeeded for volume "istio-citadel-service-account-token-skxg9"
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Failed to pull image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily": rpc error: code = Unknown desc = Error response from daemon: manifest for 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily not found
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ErrImagePull
  Normal   BackOff                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Back-off pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"
  Warning  Failed                 22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ImagePullBackOff
  Normal   Pulling                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"

Can't pull heptio-images-authenticator

Can't pull 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0 when running aws-iam-authenticator.

Containers:
  aws-iam-authenticator:
    Container ID:
    Image:         937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      server
      --config=/etc/aws-iam-authenticator/config.yaml
      --state-dir=/var/aws-iam-authenticator
      --generate-kubeconfig=/etc/kubernetes/aws-iam-authenticator/kubeconfig.yaml

I have seen the image in https://github.com/nwcdlabs/kops-cn/blob/master/mirror/required-images.txt#L32.
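Note that in the ECR listing earlier on this page, the mirrored repository carries a gcr.io- prefix, so the image reference likely needs to be:

# mirrored repo name as shown in the display-remote-repos.sh output above
image: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0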

How to use a customized nodeup binary?

We need a customized nodeup binary for our cluster. When I try to override the default URL with

export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'

It seems to be hijacked by the fileRepository setting:

I1214 16:28:21.474161   94429 builder.go:297] error reading hash file "https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup.sha1": file does not exist

you may have not staged your files correctly, please execute kops update cluster using the assets phase

Any suggestions?
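Per the error message itself, the file assets may need to be staged into the fileRepository first; a sketch, assuming your kops version supports the assets phase and with the cluster name as a placeholder:

# stage file assets (nodeup etc.) into the configured fileRepository
kops update cluster cluster.zhy.k8s.local --phase assets --yes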

error parsing SSH public key: ssh: no key found

I ran into the following issue when creating the cluster:
I0416 09:48:45.708089 3672 create_cluster.go:1407] Using SSH public key: /home/ec2-user/.ssh/id_rsa.pub

error reading cluster configuration "cluster.zhy.k8s.local": error reading s3://liuhongxi-kops-cn/cluster.zhy.k8s.local/config: Unable to list AWS regions: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, default.
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
caused by:

404 - Not Found

make: *** [create-cluster] Error 1
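The root error is the AWS credential chain failing (EnvAccessKeyNotFound, SharedCredsLoad, no EC2 instance role), so making credentials visible to kops should clear it; for example (the profile name and keys are placeholders):

# either point kops at a configured profile...
export AWS_PROFILE=cn
# ...or export keys directly
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
make create-cluster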

docker repository for Amazon VPC CNI

Symptom
Deploy kops with the Amazon VPC CNI (--networking=amazon-vpc-routed-eni) and the aws-node daemonset fails with ImagePullBackOff.

Root Cause
The generated image URL of aws-node is invalid:
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:1.0.0

From the YAML template, the image URL is generated from the parameter "Networking.AmazonVPC.ImageName", falling back to the default image URL in the us-west-2 ECR.

It works after changing the aws-node image URL to "pahud/amazon-k8s-cni:1.0.0".

Suggested Solution

  1. Specify "Networking.AmazonVPC.ImageName" in kops edit as following:
    networking:
    amazonvpc:
    imageName: amazon-k8s-cni:1.0.0
  2. Add image amazon-k8s-cni:1.0.0 in docker registry 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

Primary interface IP is unreachable, causing out-of-service status behind the ELB

I am using Private topology and Internal ELB to create a K8S cluster.

# other parameters omitted
kops create cluster \
  --topology=private \
  --networking=amazon-vpc-routed-eni \
  --api-loadbalancer-type=internal

After the cluster started, each node was assigned two interfaces. However, I found that only one node is in service behind the ELB.

From the working node, the ip route table is:


core@ip-172-20-53-190 ~ $ ip route
default via 172.20.32.1 dev eth0 proto dhcp src 172.20.53.190 metric 1024
default via 172.20.32.1 dev eth1 proto dhcp src 172.20.61.156 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.32.0/19 dev eth0 proto kernel scope link src 172.20.53.190
172.20.32.1 dev eth0 proto dhcp scope link src 172.20.53.190 metric 1024
172.20.32.1 dev eth1 proto dhcp scope link src 172.20.61.156 metric 1024

You can see that eth0 is on top. That explains why the eth0 IP is reachable.

From the other two defunct nodes, the ip route output looks like:

core@ip-172-20-114-248 ~ $ ip route
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.98.243 metric 1024
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.114.248 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.96.0/19 dev eth1 proto kernel scope link src 172.20.98.243
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.114.248
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.98.243 metric 1024
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.114.248 metric 1024
core@ip-172-20-114-248 ~ $

The eth1 entry is on top, so you can only reach the eth1 IP.

I am not sure if there is something wrong with my creation. Hope someone can help. Thank you.

--Beta

HOWTO - create multiple clusters in a VPC

This example shows how to create two kops clusters inside the same VPC in the Ningxia region, each with its own cluster_name.

cluster name

1st cluster: cluster1.zhy.k8s.local
2nd cluster: cluster2.zhy.k8s.local
(Note: the names must end with k8s.local.)

subnets

Prepare six subnets in one VPC; in this example, each cluster will use three of them.


Makefiles

Prepare two Makefiles, cluster1.mk and cluster2.mk. Sample contents:

https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster1.mk
https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster2.mk

Create the first cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk update-cluster

validate cluster

Switch the kubectl context to cluster1:

$ kubectl config use-context cluster1.zhy.k8s.local

validate cluster1

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk validate-cluster


Create the second cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk update-cluster

validate cluster

Switch the kubectl context to cluster2:

$ kubectl config use-context cluster2.zhy.k8s.local

validate cluster2

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk validate-cluster


get po

Both clusters can list all Pods in kube-system, and all of them are Running normally.


error when: kops edit cluster

Ran kops edit cluster and tried to add:

assets:
  containerRegistry: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn
  fileRepository: https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/
docker:
  logDriver: ""
  registryMirrors:
  - https://registry.docker-cn.com

:wq to save; however, the editor re-opens and I get the error:
error populating cluster spec: error building complete spec: options did not converge after 10 iterations

I tried to ignore it and keep going to the final step, kops validate cluster:
unexpected error during validation: error listing nodes: Get https://api-cluster-zhy-k8s-local-qpbf7n-1465482247.cn-northwest-1.elb.amazonaws.com.cn/api/v1/nodes: EOF

In the AWS console I found the three master instances out of service. I checked the security group rules and they are all right. I already have an ICP exception for ports 80/8080/443, and I can telnet to the ELB on 443.

I googled it and found a similar issue: kubernetes/kops#5061

How can I fix it?

Create your cluster with existing subnets

Customers need to deploy their clusters into existing subnets, but the official kops page is not clear about this, as you can see here:
https://github.com/kubernetes/kops/blob/master/docs/run_in_existing_vpc.md#shared-subnets

Specifically, these parts:

export SUBNET_ID=subnet-12345678 # replace with your subnet id
export SUBNET_CIDR=10.100.0.0/24 # replace with your subnet CIDR
export SUBNET_IDS=$SUBNET_IDS # replace with your comma separated subnet ids

What you really need is to modify the create-cluster target in the Makefile as below, according to your subnets and zones, and leave the rest of the Makefile unchanged (only the necessary part is shown; see the three notes after the snippet):

.PHONY: create-cluster
create-cluster:  
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
     --cloud=aws \
     --name=$(CLUSTER_NAME) \
     --image=$(AMI) \
     --master-count=$(MASTER_COUNT) \
     --master-size=$(MASTER_SIZE) \
     --node-count=$(NODE_COUNT) \
     --node-size=$(NODE_SIZE)  \
     --vpc=$(VPCID) \
     --kubernetes-version=$(KUBERNETES_VERSION_URI) \
     --networking=amazon-vpc-routed-eni \
     --ssh-public-key=$(SSH_PUBLIC_KEY) \
     --zones=cn-northwest-1a,cn-northwest-1b \
     --subnets=subnet-2cf25a45,subnet-9315d7e8

  1. Delete the original zones option.
  2. Then add the last two lines (--zones and --subnets).
  3. The order of your subnets must match the order of your zones.

Add more scripts to the Makefile

Some of our customers may make a mistake in the "make edit-cluster" step, or they may need to update their cluster, so they need a rolling-update option. Also, if they choose to drive their kops operations through the Makefile, we had better list all operations there so that they don't need to maintain another set of environment variables.

Here are two targets my customers asked for, so I'm adding them here.

.PHONY: rolling-cluster
rolling-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
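Usage then mirrors the other targets on this page (profile name assumed):

AWS_PROFILE=cn make get-cluster
AWS_PROFILE=cn make rolling-cluster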

change the default AMI from CoreOS to Amazon Linux 2 LTS

Background

to make sure the OS is more compatible with other components such as

  1. AWS VPC CNI
  2. AWS ALB Ingress

And to eliminate potential maintenance complexity in the future.

TODO

  • make sure the latest AMI in cn-north-1 and cn-northwest-1 is compatible with the latest stable kops
  • make sure latest AWS VPC CNI (1.3) is compatible
  • kubernetes/kops#6341 need to be identified and fixed
  • CoreDNS as the replacement of kube-dns
  • make sure AWS ALB Ingress is compatible
  • update env.config and set Amazon Linux 2 LTS as the default AMI

How to specify c5.large spot instance in Ningxia region?

The C5 instance type, running on the Nitro hypervisor, is available in the Ningxia and Beijing regions and has a great price-performance ratio.

Let's make a note on how to build kops-cn in Ningxia with C5 spot instances to cut the cost by up to 60%.
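A sketch of the relevant instance-group change (kops supports a maxPrice field for spot requests; the machine type comes from this issue, while the bid value is a hypothetical placeholder):

# kops edit ig nodes --name $CLUSTER_NAME
spec:
  machineType: c5.large
  maxPrice: "0.10"    # hypothetical max spot bid; check current Ningxia spot prices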


node NotReady after create and update

When I create, edit, and then update the cluster following the instructions, with 3 masters and 1 node in cn-northwest-1, the node stays in NotReady status.

If I change the Makefile and set NETWORKING to flannel-vxlan, it is OK.

I guess it's because the node has multiple private IPs; if the primary IP is not the first one, the node goes NotReady.
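For reference, the workaround from this report, assuming the variable is named NETWORKING in env.config/Makefile as described above:

# in env.config / Makefile -- variable name taken from this report
NETWORKING=flannel-vxlan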
