
dcos-kubernetes-quickstart's Introduction

Kubernetes on DC/OS

Kubernetes is now available as a DC/OS package, letting you quickly and reliably run Kubernetes clusters on Mesosphere DC/OS.

NOTE: The latest dcos-kubernetes-quickstart doesn't support any Kubernetes framework version before 2.0.0-1.12.1, because creating Kubernetes clusters now requires the Mesosphere Kubernetes Engine to be installed.

Known limitations

Before proceeding, please check the current package limitations.

Prerequisites

Check the requirements for running this quickstart:

  • Linux or macOS
  • Terraform 0.11.x. On macOS, you can install it with brew (see the version check after this list):
    $ brew install terraform
  • Google Cloud or AWS account with enough permissions to provision the needed infrastructure
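
You can check that a 0.11.x release of Terraform is on your PATH before proceeding (example output; your exact patch version may differ):

$ terraform version
Terraform v0.11.14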

Quickstart

Once the pre-requisites are met, clone this repo:

$ git clone git@github.com:mesosphere/dcos-kubernetes-quickstart.git && cd dcos-kubernetes-quickstart

Prepare infrastructure configuration

This quickstart defaults to Google Cloud.

First, make sure you have followed the Google Cloud setup instructions.

Then, start by generating the default infrastructure configuration:

$ make gcp

This will output sane defaults to .deploy/terraform.tfvars. Now, edit that file and set your gcp_project and ssh_public_key_file (the SSH public key you will use to log in to your new VMs later).

WARNING: Do not set a smaller instance (VM) type, or you risk failing to install Kubernetes.

cluster_name = "dcos-kubernetes"
cluster_name_random_string = true

dcos_version = "1.12.3"

num_of_masters = "1"
num_of_private_agents = "4"
num_of_public_agents = "1"

bootstrap_instance_type = "n1-standard-1"
master_instance_type = "n1-standard-8"
private_agent_instance_type = "n1-standard-8"
public_agent_instance_type = "n1-standard-8"

# admin_ips = "0.0.0.0/0" # uncomment to access master from any IP

gcp_project = "YOUR_GCP_PROJECT"
gcp_region = "us-central1"
ssh_public_key_file = "/PATH/YOUR_GCP_SSH_PUBLIC_KEY.pub"
#
# If you want to use GCP service account key instead of GCP SDK
# uncomment the line below and update it with the path to the key file
# gcp_credentials = "/PATH/YOUR_GCP_SERVICE_ACCOUNT_KEY.json"
#

NOTE: The current release of the DC/OS GCP Terraform module also requires the GOOGLE_PROJECT and GOOGLE_REGION environment variables to be set. Please set them to appropriate values for your deployment:

$ export GOOGLE_PROJECT="YOUR_GCP_PROJECT"
$ export GOOGLE_REGION="us-central1"

Kubernetes configuration

RBAC

NOTE: This quickstart will provision a Kubernetes cluster with RBAC support.

To deploy a cluster with RBAC disabled, update .deploy/options.json:

{
  "service": {
    "name": "dev/kubernetes01"
  },
  "kubernetes": {
    "authorization_mode": "AlwaysAllow"
  }
}

If you want to give users access to the Kubernetes API, check the documentation.

NOTE: The authorization mode for a cluster must be chosen when installing the package. Changing the authorization mode after installing the package is not supported.
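
If RBAC is left enabled, here is a minimal sketch of granting a user read-only cluster access once kubectl is configured (see below); the user name jane is hypothetical:

$ ./kubectl --context devkubernetes01 create clusterrolebinding jane-view \
    --clusterrole=view --user=jane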

HA Cluster

NOTE: By default, this quickstart provisions a Kubernetes cluster with one (1) worker node and a single instance of every control-plane component.

To deploy a highly available cluster with three (3) private Kubernetes nodes, update .deploy/options.json:

{
  "service": {
    "name": "dev/kubernetes01"
  },
  "kubernetes": {
    "high_availability": true,
    "private_node_count": 3
  }
}
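
NOTE: make deploy (described below) consumes .deploy/options.json when installing the package. If you are instead driving the Mesosphere Kubernetes Engine directly, the same options file can be passed on the command line, e.g.:

$ dcos kubernetes cluster create --options=.deploy/options.json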

Download command-line tools

If you haven't already, please download the DC/OS client, dcos, and the Kubernetes client, kubectl:

$ make get-cli

The dcos and kubectl binaries will be downloaded to the current workdir. It's up to you to decide whether to copy or move them to another path, e.g. one included in PATH.
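
For example, assuming /usr/local/bin is in your PATH:

$ sudo install dcos kubectl /usr/local/bin/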

Install

You are now ready to provision the DC/OS cluster and install the Kubernetes package:

$ make deploy

Terraform will now provision the infrastructure on your chosen cloud provider, and then proceed to install DC/OS.

When DC/OS is up and running, the Kubernetes package installation will take place.

Wait until all tasks are running before trying to access the Kubernetes API.

You can watch the progress of what has been deployed so far with:

$ make watch-kubernetes-cluster

Below is an example of what it looks like when the install has run successfully:

Using Kubernetes cluster: dev/kubernetes01
deploy (serial strategy) (COMPLETE)
   etcd (serial strategy) (COMPLETE)
      etcd-0:[peer] (COMPLETE)
   control-plane (dependency strategy) (COMPLETE)
      kube-control-plane-0:[instance] (COMPLETE)
   mandatory-addons (serial strategy) (COMPLETE)
      mandatory-addons-0:[instance] (COMPLETE)
   node (dependency strategy) (COMPLETE)
      kube-node-0:[kubelet] (COMPLETE)
   public-node (dependency strategy) (COMPLETE)

You can access DC/OS Dashboard and check Kubernetes package tasks under Services:

$ make ui

Exposing the Kubernetes API

Check the exposing Kubernetes API doc to understand how the Kubernetes API gets exposed. To actually expose the Kubernetes API for the new Kubernetes cluster using Marathon-LB, run:

$ make marathon-lb

NOTE: If you have changed num_of_public_agents in the .deploy/terraform.tfvars file to more than 1, please scale the marathon-lb service to the same number, so you can access the Kubernetes API from any DC/OS public agent.
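
One way to do that, assuming Marathon-LB runs under the default /marathon-lb app id and you have two public agents:

$ dcos marathon app update /marathon-lb instances=2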

Accessing the Kubernetes API

In order to access the Kubernetes API from outside the DC/OS cluster, one needs to configure kubectl, the Kubernetes CLI tool:

$ make kubeconfig
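
You can confirm that the new context was added:

$ ./kubectl config get-contexts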

Let's test access to the Kubernetes API by listing the Kubernetes cluster nodes:

$ ./kubectl --context devkubernetes01 get nodes
NAME                                                  STATUS   ROLES    AGE     VERSION
kube-control-plane-0-instance.devkubernetes01.mesos   Ready    master   5m18s   v1.16.9
kube-node-0-kubelet.devkubernetes01.mesos             Ready    <none>   2m58s   v1.16.9

And now, let's check how the system Kubernetes pods are doing:

$ ./kubectl --context devkubernetes01 -n kube-system get pods
NAME                                                                          READY   STATUS    RESTARTS   AGE
calico-node-s9828                                                             2/2     Running   0          3m21s
calico-node-zc8qw                                                             2/2     Running   0          3m38s
coredns-6c7669957f-rvz85                                                      1/1     Running   0          3m38s
kube-apiserver-kube-control-plane-0-instance.devkubernetes01.mesos            1/1     Running   0          4m43s
kube-controller-manager-kube-control-plane-0-instance.devkubernetes01.mesos   1/1     Running   0          4m42s
kube-proxy-kube-control-plane-0-instance.devkubernetes01.mesos                1/1     Running   0          4m48s
kube-proxy-kube-node-0-kubelet.devkubernetes01.mesos                          1/1     Running   0          3m21s
kube-scheduler-kube-control-plane-0-instance.devkubernetes01.mesos            1/1     Running   0          4m26s
kubernetes-dashboard-5cbf45898-nkjsm                                          1/1     Running   0          3m37s
local-dns-dispatcher-kube-node-0-kubelet.devkubernetes01.mesos                1/1     Running   0          3m21s
metrics-server-594576c7d8-cb4pj                                               1/1     Running   0          3m35s

Accessing the Kubernetes Dashboard

You will be able to access the Kubernetes Dashboard by running:

$ kubectl --context devkubernetes01 proxy

Then pointing your browser at:

http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

Please note that you will have to sign in to the Kubernetes Dashboard before being able to perform any action.
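
With RBAC enabled, a common way to sign in is with a service-account token. A sketch, assuming a service account with sufficient permissions already exists in kube-system (<secret-name> is a placeholder for that account's token secret):

$ ./kubectl -n kube-system get secrets
$ ./kubectl -n kube-system describe secret <secret-name>

Copy the token value into the Dashboard sign-in form.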

Uninstall Kubernetes

To uninstall the DC/OS Kubernetes package while leaving your DC/OS cluster up, run:

$ make uninstall

NOTE: This will only uninstall Kubernetes. Make sure you destroy your DC/OS cluster using the instructions below when you finish testing; otherwise you will have to delete all cloud resources manually!

Destroy cluster

To destroy the whole deployment:

$ make destroy

Lastly, clean up generated resources:

$ make clean

Documentation

For more details, please see the docs folder, as well as the official service docs.

Community

Get help and connect with other users on the mailing list or on DC/OS community Slack in the #kubernetes channel.

dcos-kubernetes-quickstart's People

Contributors

alejandroesc, bmcustodio, chrisgaun, dkoshkin, dmmcquay, jessesanford, jimmidyson, pires, rimusz, smugcloud, spahl, zmalik


dcos-kubernetes-quickstart's Issues

etcd is unable to start on DC/OS 1.12.1

After installing the Kubernetes Engine, I was able to create a Kubernetes cluster on DC/OS 1.12.1 (CentOS 7.5, Docker 1.13.1):

$ dcos kubernetes cluster create
...
DC/OS Kubernetes is being installed!

But the kubernetes-cluster/etcd-0-peer service could not start (it stays in Staging status).

Any suggestion would be really appreciated.

Add flannel error

quickstart version: beta-kubernetes:0.4.0-1.9.0-beta
dcos version: 1.10.4

I installed Kubernetes with the web UI; every service in DC/OS seems to be running and I can get three Ready nodes:

[root@localhost ~]# kubectl get nodes
NAME                                   STATUS    ROLES     AGE       VERSION
kube-node-0-kubelet.kubernetes.mesos   Ready     <none>    23h       v1.9.0
kube-node-1-kubelet.kubernetes.mesos   Ready     <none>    23h       v1.9.0
kube-node-2-kubelet.kubernetes.mesos   Ready     <none>    23h       v1.9.0

Then I tried to add flannel to the cluster, but got an error:

[root@localhost ~]# kubectl get pods --namespace=kube-system
NAME                        READY     STATUS             RESTARTS   AGE
heapster-7d7b7f4f87-j4j4j   1/1       Running            0          23h
kube-dns-6dc79b7c4-kgmc2    2/3       CrashLoopBackOff   555        23h
kube-flannel-ds-9mkw4       0/1       CrashLoopBackOff   4          2m
kube-flannel-ds-s7w6h       0/1       CrashLoopBackOff   4          2m
kube-flannel-ds-xncpc       0/1       CrashLoopBackOff   4          2m

Logs look like this

[root@localhost ~]# kubectl logs --namespace=kube-system kube-flannel-ds-9mkw4 
I0201 02:24:56.214697       1 main.go:474] Determining IP address of default interface
I0201 02:24:56.218057       1 main.go:487] Using interface with name eth0 and address 10.100.100.15
I0201 02:24:56.218141       1 main.go:504] Defaulting external address to interface address (10.100.100.15)
I0201 02:24:56.277590       1 kube.go:130] Waiting 10m0s for node controller to sync
I0201 02:24:56.277757       1 kube.go:283] Starting kube subnet manager
I0201 02:24:57.277999       1 kube.go:137] Node controller sync successful
I0201 02:24:57.278089       1 main.go:234] Created subnet manager: Kubernetes Subnet Manager - kube-node-1-kubelet.kubernetes.mesos
I0201 02:24:57.278107       1 main.go:237] Installing signal handlers
I0201 02:24:57.278338       1 main.go:352] Found network config - Backend type: vxlan
I0201 02:24:57.278490       1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0201 02:24:57.279419       1 main.go:279] Error registering network: failed to acquire lease: node "kube-node-1-kubelet.kubernetes.mesos" pod cidr not assigned
I0201 02:24:57.279476       1 main.go:332] Stopping shutdownHandler...

How can I fix this?

kube-dns crashes because it can't access Kubernetes API

@wyfligamo reported the following in #57:

(issue comments edited by @pires, in order to make things more clear here)

My kube-dns pod status is CrashLoopBackOff and the kubedns container log says:

[root@localhost ~]# kubectl get pods --all-namespaces 
NAMESPACE     NAME                        READY     STATUS             RESTARTS   AGE
kube-system   heapster-7d7b7f4f87-brp2l   1/1       Running            0          21h
kube-system   kube-dns-6dc79b7c4-dbw5g    1/3       CrashLoopBackOff   506        21h

My k8s only starts the kube-dns and heapster pods; is that right? They can't communicate with the API service, but I can get an API response with a browser outside the cluster:

[root@localhost ~]# kubectl logs --namespace=kube-system heapster-7d7b7f4f87-brp2l
E0202 05:26:09.524966       1 reflector.go:190] k8s.io/heapster/metrics/heapster.go:322: Failed to list *v1.Pod: Get https://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: i/o timeout
[root@localhost ~]# kubectl logs --namespace=kube-system kube-dns-6dc79b7c4-dbw5g kubedns
E0202 05:23:47.244936       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.100.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.100.0.1:443: i/o timeout
E0202 05:23:47.244981       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.100.0.1:443: i/o timeout
I0202 05:23:47.742922       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0202 05:23:48.242824       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0202 05:23:48.742850       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0202 05:23:49.242879       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...

Can't uninstall

The Kubernetes service keeps crashing when I provision it from the UI or from the dcos console.
I see only one task running (the name keeps changing):

kubernetes.fcf88214-b366-11e7-963f-46f4cd883ac6

and 200+ completed tasks (failed/killed).
Trying to install beta-kubernetes 0.2.2-1.7.7-beta on DC/OS 1.10.
I tried to delete Kubernetes; the uninstall command said that it was removed, but I see that the kubernetes service still exists. The only way to remove it is:

dcos marathon app remove /kubernetes

Can't uninstall kubernetes service

We had a k8s service with 5 nodes (21 running tasks) on DC/OS 1.10. I noticed that some tasks failed, and today only 8 tasks were running. I tried to uninstall the service with:

> dcos package uninstall beta-kubernetes --app-id=/kubernetes
WARNING: This action cannot be undone. This will uninstall [/kubernetes] and delete all of its persistent (logs, configurations, database artifacts, everything).
Please type the name of the service to confirm: /kubernetes
Uninstalled package [beta-kubernetes] version [0.2.1-1.7.7-beta]
DC/OS Kubernetes has been uninstalled.

However, the service wasn't completely removed, and now I see only 1 task, kubernetes.565e588c-b318-11e7-963f-46f4cd883ac6, in an unhealthy state. I tried to run the uninstall command again, but it didn't help.
How can I delete the service?

Fail to download pod containers when proxy is in place

I tried to install the default config of the DC/OS beta-kubernetes package v0.2.2-1.7.7-beta.
The first 19 tasks are healthy, but mandatory-addons-0 failed with the following error.
Any idea?

2017/10/14 21:47:28 No $MESOS_SANDBOX/.ssl directory found. Cannot install certificate. Error: stat /var/lib/mesos/slave/slaves/2153cc40-c292-4ee6-a7d9-9771aacd2109-S0/frameworks/dd0f75fe-9b95-4a2d-a514-56ab403fb7bf-0001/executors/mandatory-addons__ae8a9ee7-a3fa-4e8d-88ce-c4edcff236a9/runs/5ce2fe35-496c-4ba6-9032-87f7a05a5bff/containers/c79ed785-29d7-41e9-8b5d-d9e351e05c82/.ssl: no such file or directory
2017/10/14 21:47:28 Local IP --> 10.90.140.54
2017/10/14 21:47:28 SDK Bootstrap successful.

Deploying kube-dns components --

configmap "kube-dns" configured
service "kube-dns" configured
deployment "kube-dns" configured
kube-dns check failed and is not ready

Kubernetes on DC/OS general availability timeline

Hi - I'm currently just getting started with the Kubernetes on DC/OS beta and was asked to reach out about what the timeline is for general availability.

I saw the beta warning added in September 2017 with DC/OS 1.10 and was curious if this status has changed since then. Does GA come with the 1.11 release or before/after?

I searched the docs and release policy but didn't see this info out there yet.

@dmmcquay @pires It looks like one of you might be the best person to ask.

kubeconfig fails

Hi

I have kubernetes-1.0.2-1.9.6 installed on a DC/OS 0.11 cluster. The kubeconfig command fails as follows...

[p3061@cuckoo-accessvm-p3061 ~]$ dcos kubernetes kubeconfig
failed to retrieve 'cluster.name'

Cheers

Daniel Sherwood

Error: 'tls: oversized record received with length 20527'

DC/OS 1.10 on CentOS 7.4

Following these instructions:
https://docs.mesosphere.com/services/beta-kubernetes/0.4.0-1.9.0-beta/connecting-clients/

Everything works fine from the CLI, and http://localhost:9000 returns the expected list of apis...

But when I try to browse to:

http://localhost:9000/ui

I get:

Error: 'tls: oversized record received with length 20527'
Trying to reach: 'https://9.0.10.3:9090/'

If I change the URL from https to http, it will bring up the dashboard:

localhost:9000/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/#!/overview?namespace=default

But heapster graphs will not show.

I'm sure I had this working fine in an earlier try at beta-kubernetes without having to do anything.

I do see a lot of these in the logs:

No $MESOS_SANDBOX/.ssl directory found. Cannot install certificate.
Not sure if it is related.

Would like to get this working properly... thanks.

Internet via proxy

I run DC/OS on CoreOS and I just tried Beta-Kubernetes on DC/OS 1.10.0.
I use a proxy to connect to the Internet.

All the tasks are in a running state, but when I try to install a new pod I get the following error:

createPodSandbox for pod "redis-master-1405623842-zt2hq_default(97f75286-972a-11e7-937b-e41f13303b18)" failed: rpc error: code = 2 desc = unable to pull sandbox image
"gcr.io/google_containers/pause-amd64:3.0": Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 74.125.133.82:443: i/o timeout"}

I tried to pull gcr.io/google_containers/pause-amd64:3.0 from the command line on my CoreOS host and it works.

Can we use this framework with a proxy to access the Internet?

CLI-related make goals are confusing

  • get-cli downloads kubectl and dcos to the working directory. It never checks if the binaries are already available, if the version matches, etc. before downloading, so every call downloads the binaries to the working directory.
  • setup-cli calls dcos from the working directory. It doesn't check if the binary is available, if the version matches, etc.

Then, references to kubectl always assume that kubectl is available in $PATH. If that's the case, we may end up with a kubectl v1.7.x in $PATH being called to interact with Kubernetes 1.10.3, even after get-cli downloaded kubectl v1.10.3 to the working directory.

I ran into this just now and it was painful to figure out.

My suggestion is to keep things simple and just check for dcos and kubectl in $PATH, and show a warning about potential conflicts if their versions don't match the server-side versions.
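
A minimal sketch of such a check for kubectl (the version parsing is illustrative and assumes the kubectl version --short output format):

client=$(kubectl version --short --client | awk '{print $3}')
server=$(kubectl version --short 2>/dev/null | grep '^Server' | awk '{print $3}')
if [ "$client" != "$server" ]; then
  echo "WARNING: kubectl client ($client) and server ($server) versions differ" >&2
fi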

Pod cannot call its own service

I installed flink 1.5 on kubernetes on DC/OS, but it seems the jobmanager pod cannot call itself through the jobmanager service, which means that flink cannot deploy jars and the cluster is useless.

I wonder if there is a bug in the DC/OS overlay network integration with kubernetes? I'm pretty sure this should work on a normally working kubernetes cluster.


kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
flink-jobmanager-75bbb96f4d-p5fh8    1/1     Running   0          1h
flink-taskmanager-7679c9d55d-dzcrk   1/1     Running   0          1h
flink-taskmanager-7679c9d55d-z9lsp   1/1     Running   0          1h

kubectl get svc flink-jobmanager
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
flink-jobmanager   ClusterIP   10.100.242.186   <none>        6123/TCP,6124/TCP,6125/TCP,8081/TCP   1h

Calling jobmanager from a taskmanager pod works fine:

kubectl exec -it flink-taskmanager-7679c9d55d-z9lsp /bin/bash
root@flink-taskmanager-7679c9d55d-z9lsp:/opt/flink# curl flink-jobmanager:8081 -v

* Rebuilt URL to: flink-jobmanager:8081/
*   Trying 10.100.242.186...
* TCP_NODELAY set
* Connected to flink-jobmanager (10.100.242.186) port 8081 (#0)
> GET / HTTP/1.1
> Host: flink-jobmanager:8081
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK

Calling the jobmanager service URL from the jobmanager pod fails with a timeout. Notice the same service IP is used, so DNS seems to be working fine:

kubectl exec -it flink-jobmanager-75bbb96f4d-p5fh8 /bin/bash
root@flink-jobmanager-75bbb96f4d-p5fh8:/opt/flink# curl flink-jobmanager:8081 -v

* Rebuilt URL to: flink-jobmanager:8081/
*   Trying 10.100.242.186...
* TCP_NODELAY set

Calling jobmanager on localhost from jobmanager pod works:

root@flink-jobmanager-75bbb96f4d-p5fh8:/opt/flink# curl localhost:8081 -v

* Rebuilt URL to: localhost:8081/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK

[FIXED] kubectl-tunnel cannot bind: Cannot assign requested address

make kubectl-tunnel
ssh -i .id_key -f -o "UserKnownHostsFile=/dev/null" -o "StrictHostKeyChecking=no" -o "ServerAliveInterval=120" \
	 -N -L 9000:apiserver-insecure.kubernetes.l4lb.thisdcos.directory:9000 \
	[email protected]
Warning: Permanently added '1.2.3.4' (ECDSA) to the list of known hosts.
bind: Cannot assign requested address
channel_setup_fwd_listener_tcpip: cannot listen to port: 9000
Could not request local forwarding.

cannot get 3 nodes

I've deployed DC/OS using CloudFormation with all default parameter values.

The first time I installed Kubernetes (with the default config), I only got 1 node, and 14h later there was still only one:

> kubectl -s localhost:9000 get nodes
NAME                                   STATUS    AGE       VERSION
kube-node-0-kubelet.kubernetes.mesos   Ready     14h       v1.7.7

I've re-deployed it on the same cluster, and got 2 nodes:

> kubectl -s localhost:9000 get nodes
NAME                                   STATUS    AGE       VERSION
kube-node-0-kubelet.kubernetes.mesos   Ready     6m        v1.7.7
kube-node-2-kubelet.kubernetes.mesos   Ready     6m        v1.7.7

Accessing a Service from outside the cluster

I have deployed a kubernetes cluster using the make deploy command mentioned in README.md.
Nginx is deployed on top of this cluster and exposed using a NodePort service. If I try to hit nodeIP:nodePort I get a connection refused error.

I can see the service is listening on the nodePort.

netstat -tulpn | grep 30302
tcp6       0      0 :::30302                :::*                    LISTEN      7168/./kube-proxy 

Hitting the endpoint from the same machine results in the same error.

curl -v http://localhost:30302
* Rebuilt URL to: http://localhost:30302/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 30302 failed: Connection refused
* Failed to connect to localhost port 30302: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 30302: Connection refused

Creating a service of type 'LoadBalancer' remains in a pending state.
What is the right way to access deployed kubernetes workloads from outside the cluster? Are there any additional steps required apart from creating a kubernetes service?

kubernetes-dashboard not found

Hi

I have installed kubernetes-1.0.2-1.9.6 on a DC/OS 0.11 cluster and the Kubernetes Dashboard is not available. The following is returned...

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "services \"kubernetes-dashboard\" not found",
  "reason": "NotFound",
  "details": {
    "name": "kubernetes-dashboard",
    "kind": "services"
  },
  "code": 404
}

I did previously have it working, but after changing a configuration setting (public_nodes) and restarting, this issue occurred. I have tried re-installing Kubernetes and even restoring the master node to a snapshot taken before I first did the Kubernetes install, and I get the same result. I'm guessing this may have to do with which nodes the various kubernetes tasks are allocated to.

Cheers

Daniel Sherwood

JS exception when creating Kubernetes service using GUI

I can't create a kubernetes service (1.7.6-1.7.7) using the GUI in DC/OS 1.10.0. However, I was able to deploy kubernetes 1.7.5 before.
My steps are as follows: Services->Run a Service->Install a Package->beta kubernetes.
When I click Configure or Deploy nothing happens. In web console I see errors:
for Configure

tracker.js:38 Uncaught Error: TrackJS Caught: Cannot convert undefined or null to object
    at keys (<anonymous>)
    at SchemaUtil.js:279
    at Array.forEach (<anonymous>)
    at Object.schemaToMultipleDefinition (SchemaUtil.js:269)
    at t.value (SchemaForm.js:245)
    at t.value (SchemaForm.js:49)
    at h.performInitialMount (vendor.574d2ad67f110a649eab.js:sourcemap:18)
    at h.mountComponent (vendor.574d2ad67f110a649eab.js:sourcemap:18)
    at Object.mountComponent (vendor.574d2ad67f110a649eab.js:sourcemap:3)
    at p.mountChildren (vendor.574d2ad67f110a649eab.js:sourcemap:18)

For Deploy:

tracker.js:38 Uncaught Error: TrackJS Caught: Cannot convert undefined or null to object
    at keys (<anonymous>)
    at SchemaUtil.js:279
    at Array.forEach (<anonymous>)
    at Object.schemaToMultipleDefinition (SchemaUtil.js:269)
    at t.value (SchemaForm.js:245)
    at t.value (SchemaForm.js:49)
    at h.performInitialMount (ReactCompositeComponent.js:351)
    at h.mountComponent (ReactCompositeComponent.js:258)
    at Object.mountComponent (ReactReconciler.js:46)
    at p.mountChildren (ReactMultiChild.js:238)

Warning about DC/OS nodes requirement

The README should have a warning that 6 monster nodes will be created.
I was puzzled for a while as the Docker deploy failed with missing VMs; then I double-checked and saw that the GCE compute instance group was trying to create 6 instances, and my account needed a quota increase.

Unable to set up kubectl command on DC/OS Master

Hello,

I am running the command below to get kubectl on the master node, but getting the error below.

Command:
dcos kubernetes kubeconfig --apiserver-url https://<public-slave-agent-ip>:6443 --insecure-skip-tls-verify

ERROR:
2018/10/10 14:12:40 failed to update kubeconfig context '10211203926443': HTTP GET Query for http://<master-ip>/service/kubernetes/v1/auth/data failed: 403 Forbidden Response: the service account secret cannot be served over an insecure connection Response data (71 bytes): the service account secret cannot be served over an insecure connection HTTP query failed
Please help me.

Regards,
Arnab

Kubernetes installation stuck in IN_PROGRESS

Hi,

I've been trying to install kubernetes (1.0.3-1.9.7 & 1.0.2-1.9.6) but the installation gets stuck. Can't access kubernetes dashboard either (similar error to #77).

$ dcos kubernetes plan status deploy
deploy (serial strategy) (IN_PROGRESS)
├─ etcd (serial strategy) (COMPLETE)
│  └─ etcd-0:[peer] (COMPLETE)
├─ apiserver (parallel strategy) (COMPLETE)
│  └─ kube-apiserver-0:[instance] (COMPLETE)
├─ kubernetes-api-proxy (parallel strategy) (COMPLETE)
│  └─ kubernetes-api-proxy-0:[install] (COMPLETE)
├─ controller-manager (parallel strategy) (COMPLETE)
│  └─ kube-controller-manager-0:[instance] (COMPLETE)
├─ scheduler (parallel strategy) (COMPLETE)
│  └─ kube-scheduler-0:[instance] (COMPLETE)
├─ node (parallel strategy) (IN_PROGRESS)
│  ├─ kube-node-0:[kube-proxy] (PREPARED)
│  ├─ kube-node-0:[coredns] (PREPARED)
│  └─ kube-node-0:[kubelet] (PREPARED)
├─ public-node (parallel strategy) (COMPLETE)
└─ mandatory-addons (serial strategy) (PENDING)
   ├─ mandatory-addons-0:[kube-dns] (PENDING)
   ├─ mandatory-addons-0:[metrics-server] (PENDING)
   ├─ mandatory-addons-0:[dashboard] (PENDING)
   └─ mandatory-addons-0:[ark] (PENDING)

Any ideas why this is?

Unable to update config of kubernetes service

When someone wants to change some configuration parameters, for example the number of Kubernetes CPUs, from the default values:

{
  "kubernetes": {
    "high_availability": true,
    "node_count": 3,
    "public_node_count": 1
  }
}

to:

{
 "kubernetes": {
   "high_availability": true,
   "node_count": 3,
   "public_node_count": 1,
   "reserved_resources": {
     "kube_cpus": 5
   }
 }
}

we never get a ready Kubernetes service, and when using the dcos update command the plans never end successfully.

All tests were launched on GCP, with the following desired_cluster_profile.gcp contents:

num_of_masters = "1"
num_of_private_agents = "3"
num_of_public_agents = "1"

gcp_bootstrap_instance_type = "n1-standard-1"
gcp_master_instance_type = "n1-standard-8"
gcp_agent_instance_type = "n1-standard-8"
gcp_public_agent_instance_type = "n1-standard-8"

kubernetes CLI package error

When I try to create a k8s cluster on a DC/OS cluster (open source):

➜  ws dcos package install kubernetes
Extracting "dcos-core-cli"...
By Deploying, you agree to the Terms and Conditions https://mesosphere.com/catalog-terms-conditions/#certified-services
Mesosphere Kubernetes Engine
Continue installing? [yes/no] yes
Installing Marathon app for package [kubernetes] version [2.2.2-1.13.5]
Installing CLI subcommand for package [kubernetes] version [2.2.2-1.13.5]
New command available: dcos kubernetes
The Mesosphere Kubernetes Engine service is being installed.
➜  vim kc1.json
{
  "service": {
        "name": "kube-cluster",
          "log_level": "INFO"
    }
}
➜  dcos kubernetes cluster create --options=kc1.json
➜  dcos -vv kubernetes cluster create --options=kc1.json
Loading plugin 'dcos-core-cli'...
Loading plugin 'kubernetes'...
Loading plugin 'dcos-core-cli'...
Loading plugin 'kubernetes'...
fork/exec /home/afaik/.dcos/clusters/34bf627a-f463-47a1-950d-8fef42079a99/subcommands/kubernetes/env/bin/dcos-kubernetes: exec format error
➜  vim kc1.json
➜   dcos --version
dcoscli.version=0.7.11
dcos.version=1.12.3
dcos.commit=080f0e7561466c9cd2cc77c54024885607feb689
dcos.bootstrap-id=bf6a8e4690f63cc7e1da8e9b730e96b7c603078e
➜   dcos package list
NAME        VERSION       APP          COMMAND     DESCRIPTION
kubernetes  2.2.2-1.13.5  /kubernetes  kubernetes  Mesosphere Kubernetes Engine

Exception when using custom TLS provisioning

As reported by @deric:

I'm having the same issue with 0.4.0-1.9.0-beta. All artifacts seem to be downloaded into the work dir.

2018/01/11 10:58:59 Resolve disabled via -resolve=false: Skipping host resolution
2018/01/11 10:58:59 Template handling disabled via -template=false: Skipping any config templates
2018/01/11 10:58:59 No $MESOS_SANDBOX/.ssl directory found. Cannot install certificate. Error: stat /var/lib/mesos/slave/slaves/59336e89-bac9-44c0-ba44-78a66392b7b1-S8/frameworks/56308059-3e74-48f4-86a9-67f9bd519f44-0001/executors/dev_k8.69908f7c-f6be-11e7-8ff9-fe0d65b1eb4a/runs/82acd8eb-26f1-47c8-9dce-b72d3cfbb954/.ssl: no such file or directory
2018/01/11 10:58:59 Local IP --> 192.168.5.91
2018/01/11 10:58:59 SDK Bootstrap successful.
Exception in thread "main" com.mesosphere.sdk.kubernetes.scheduler.tls.exceptions.UnrecoverableTLSException: com.mesosphere.sdk.state.StateStoreException: Key cannot contain '/' (reason: LOGIC_ERROR)
	at com.mesosphere.sdk.kubernetes.scheduler.tls.StateStoreTLSStore.loadServiceAccountKeyPair(StateStoreTLSStore.java:69)
	at com.mesosphere.sdk.kubernetes.scheduler.tls.OpenTLSProvisioner.initializeServiceAccountKeyPair(OpenTLSProvisioner.java:64)
	at com.mesosphere.sdk.kubernetes.scheduler.tls.TLSProvisioner.initialize(TLSProvisioner.java:63)
	at com.mesosphere.sdk.kubernetes.scheduler.Main.main(Main.java:61)
Caused by: com.mesosphere.sdk.state.StateStoreException: Key cannot contain '/' (reason: LOGIC_ERROR)
	at com.mesosphere.sdk.state.StateStore.validateKey(StateStore.java:582)
	at com.mesosphere.sdk.state.StateStore.fetchProperty(StateStore.java:387)
	at com.mesosphere.sdk.kubernetes.scheduler.tls.StateStoreTLSStore.loadServiceAccountKeyPair(StateStoreTLSStore.java:60)
	... 3 more
I0111 10:59:21.747092    11 executor.cpp:925] Command exited with status 1 (pid: 15)
I0111 10:59:22.748448    14 process.cpp:1072] Failed to accept socket: future discarded

@bmcstdio any idea what is triggering this?

Fluentd not able to tail logs

Create new cluster:
Mesosphere DC/OS Version 1.11.0

Apply Kubernetes cloud provider support (don't think there is relevance though):
https://aws.amazon.com/blogs/opensource/cloud-provider-support-kubernetes-dcos/
Kubernetes version v1.9.4

Starting Fluentd daemonset:
https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml

Logs fluentd pod output:

2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [info]: #0 starting fluentd worker pid=15 ppid=1 worker=0
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_dnsmasq-e75753d29a5e4879f712525b268575a6c9fbaf55a17d677037980d0e2bd75495.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/metrics-server-54974fd587-cdjc7_kube-system_metrics-server-270577277bc1213532129c467f07c6bf6413c7f25b7d219717e9422cc437117d.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/fluentd-cxkfr_kube-system_fluentd-2b138775674b9b090d1f2492bec0732346a89bf8b2a3758c6fd4d8d9abcb19a9.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kubernetes-dashboard-5cfddd7d5b-cj8c7_kube-system_kubernetes-dashboard-171e2225ecb06d685a38cd61fe04f380cabf91c4bea77c4739117155a3dbe674.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_kubedns-eb34aa08266b0e7b317fb7e21f1857ac727ee98354559d291ec425d08601d3ad.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_sidecar-c6c1709f49fe1c16630531ed199a027a1b48fab093ead6f339015d6936b6aec2.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [info]: #0 fluentd worker is now running worker=0

Earlier today the logs ended with a "not able to follow symlink" error. Seems like sort of the same issue as given at kubernetes/kubernetes#39225, however that thread is rather old.

"Bad CPU type in executable" with macOS Catalina

After upgrading to macOS Catalina, I'm not able to run commands like dcos kubernetes cluster create.

Here is what I get when I try to create a cluster:

dcos -vv kubernetes cluster create --yes --options="cluster-options.json" --package-version="2.3.0-1.14.1"

Loading plugin 'dcos-core-cli'...
Loading plugin 'kubernetes'...
fork/exec /Users/theo/.dcos/clusters/<UUID>/subcommands/kubernetes/env/bin/dcos-kubernetes: bad CPU type in executable

I suspect the upgrade to Catalina is the issue 💥

Cloud Provider option makes kube-controller-manager fail

I am using the new version of Kubernetes, 1.9.6, and I noticed that no matter what configuration I use (even the default options), when enabling the aws cloud-provider the kube-controller-manager is unable to start. The error that it throws is the following:

 ######  Starting Kube CONTROLLER MANAGER -- kube-controller-manager-0-instance ###### 
I0406 16:15:12.779558      10 controllermanager.go:108] Version: v1.9.6
I0406 16:15:12.786006      10 leaderelection.go:174] attempting to acquire leader lease...
I0406 16:15:12.871094      10 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
I0406 16:15:12.871316      10 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"f275b8cf-39ab-11e8-9288-028a57b514c0", APIVersion:"v1", ResourceVersion:"7486", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-1-211.eu-central-1.compute.internal became leader
I0406 16:15:12.893967      10 aws.go:1000] Building AWS cloudprovider
I0406 16:15:12.894005      10 aws.go:963] Zone not specified in configuration file; querying AWS metadata service
E0406 16:15:13.071847      10 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
W0406 16:15:13.071873      10 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
F0406 16:15:13.071930      10 controllermanager.go:150] error building controller context: no ClusterID Found.  A ClusterID is required for the cloud provider to function properly.  This check can be bypassed by setting the allow-untagged-cloud option

If I disable the option, Kubernetes is able to run correctly again.
It's worth mentioning that I tried to install Kubernetes with the cloud-provider set to aws after a brand new DC/OS installation. Still, the result was negative.
This is a major issue because I need to let Traefik expose a LoadBalancer so that I can connect to my services.

Broken pipe for dcos-launch create -c launch.yaml

Running make deploy (after make docker) and getting broken pipe errors:

dcos-launch create -c launch.yaml
...
dcos-launch wait
[2017-12-18 16:13:58,791|dcos_launch.onprem|INFO]: Waiting for bare cluster provisioning status..
...
[2017-12-18 16:15:00,799|googleapiclient.discovery|INFO]: URL being requested: GET https://www.googleapis.com/deploymentmanager/v2/projects/xxx/global/deployments/dcos-cluster-mg8773wr?alt=json
Traceback (most recent call last):
  File "dcos_launch/cli.py", line 133, in <module>
  File "dcos_launch/cli.py", line 125, in main
  File "dcos_launch/cli.py", line 91, in do_main
  File "dcos_launch/onprem.py", line 99, in wait
  File "dcos_launch/gce.py", line 75, in wait
  File "site-packages/retrying.py", line 49, in wrapped_f
  File "site-packages/retrying.py", line 206, in call
  File "site-packages/retrying.py", line 247, in get
  File "site-packages/six.py", line 686, in reraise
  File "site-packages/retrying.py", line 200, in call
  File "dcos_launch/platforms/gce.py", line 271, in wait_for_completion
  File "dcos_launch/platforms/gce.py", line 135, in handle_exception
  File "dcos_launch/platforms/gce.py", line 251, in get_info
  File "site-packages/oauth2client/util.py", line 137, in positional_wrapper
  File "site-packages/googleapiclient/http.py", line 837, in execute
  File "site-packages/googleapiclient/http.py", line 176, in _retry_request
  File "site-packages/googleapiclient/http.py", line 163, in _retry_request
  File "site-packages/oauth2client/transport.py", line 169, in new_request
  File "site-packages/httplib2/__init__.py", line 1322, in request
  File "site-packages/httplib2/__init__.py", line 1072, in _request
  File "site-packages/httplib2/__init__.py", line 996, in _conn_request
  File "http/client.py", line 1106, in request
  File "http/client.py", line 1151, in _send_request
  File "http/client.py", line 1102, in endheaders
  File "http/client.py", line 934, in _send_output
  File "http/client.py", line 908, in send
  File "ssl.py", line 891, in sendall
  File "ssl.py", line 861, in send
  File "ssl.py", line 586, in write
BrokenPipeError: [Errno 32] Broken pipe
Failed to execute script cli
make: *** [launch-dcos] Error 255

I repeated the deploy a few times, always getting the same errors.

How to access kubernetes service from outside Mesos?

I installed kubernetes on top of Mesos on AWS using CloudFormation. I installed the dashboard and was able to open the UI using SSH port forwarding.
However, I want to open the dashboard from outside Mesos; how can I do it?
I don't understand how to expose a port of a kubernetes service with Mesos.

Drop SSH tunnel

When we have figured out if MLB is causing timeouts when executing kubectl {logs,attach,exec,port-forward,proxy}, and if we find a fix, we should drop the SSH tunnel.

Please, let's not forget to update the conformance docs - and make sure tests run!

Launching it in AWS does not finish because snapshot IDs are not public or wrong

When launching the quickstart in AWS, it will hang creating the BareServer autoscaling group due to instances being unable to get the snapshot. These are the IDs listed in the CloudFormation template:
"SDBSnapshot": {
"us-west-2": {"snap": "snap-00ae1a58"},
"us-west-1": {"snap": "snap-2f2c1516"},
"sa-east-1": {"snap": "snap-356b2302"},
"us-east-1": {"snap": "snap-5f580f42"},
"ap-southeast-1": {"snap": "snap-a7be23b6"},
"ap-southeast-2": {"snap": "snap-e1f4bfed"},
"ap-northeast-1": {"snap": "snap-b970ec81"},
"eu-west-1": {"snap": "snap-056ada2d"},
"eu-central-1": {"snap": "snap-2c830027"}
}

but, as the attached image showed, instances cannot launch because of this.


Invalid SSH key format error on GCE dashboard

I have created a DC/OS+Kubernetes cluster on GCE using the procedure mentioned in README.md.
When I try to edit any parameters (network tags etc.) for the VMs spawned by the above procedure, the GCE dashboard does not allow saving any changes, since the SSH public key already injected does not follow the standard SSH key format. The GCE dashboard shows the error below:

Invalid key. Required format: <protocol> <key-blob> <[email protected]> or <protocol> <key-blob> google-ssh {"userName":"<[email protected]>","expireOn":"<date>"}

This makes doing any changes to the VM configuration from the GCE dashboard impossible.
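
For reference, a public-key line in the plain format GCE expects looks like this (the key blob and user are placeholders):

ssh-rsa AAAAB3NzaC1yc2E...key-blob... alice@example.com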

Failed to Write to cpu.cfs_quota_us

Hi!

I am running a small Kubernetes cluster on DC/OS on GCP. For one of my deployments I see the error below when trying to run a Pod with resource limits higher than the number of CPUs specified as reserved resources, i.e. the "Allocatable" resources. E.g. when I specify 3 CPUs reserved for pods, the error below happens with 5 CPUs or more in the container resource limits, but not with up to 4 CPUs. Is this the expected behavior? Any pointers as to what is going wrong here are appreciated.

Cheers and thank you,

Konstantin

Error Message
Error: failed to start container "<placeholder>": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:367: setting cgroup config for procHooks process caused \\\"failed to write 500000 to cpu.cfs_quota_us: write /sys/fs/cgroup/cpu,cpuacct/mesos/c141cb01-5d9c-483c-b6dd-5329d4adb166/kubepods/burstable/pod25a067c5-9b04-11e8-9e77-42010a400405/<placeholder>/cpu.cfs_quota_us: invalid argument\\\"\"": unknown
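
(For context: with the default 100 ms CFS period, a 5-CPU limit maps to a quota of 5 × 100000 µs = 500000 µs, the value being written above. The write likely fails because the parent Mesos container's cgroup quota, derived from the reserved kube_cpus, is lower, and cgroup v1 rejects a child quota larger than its parent's.)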

Container Resource Limits

    Limits:
      cpu:     5
      memory:  16Gi
    Requests:
      cpu:      100m
      memory:   256Mi

Node Resources

Capacity:
 cpu:                8
 ephemeral-storage:  127617744Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             30880100Ki
 pods:               80
Allocatable:
 cpu:                3
 ephemeral-storage:  127515344Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             4000Mi
 pods:               80

Kubernetes Framework Config

{
  "service": {
    "name": "kubernetes",
    "sleep": 1000,
    "service_account": "",
    "service_account_secret": "",
    "log_level": "INFO"
  },
  "kubernetes": {
    "authorization_mode": "AlwaysAllow",
    "high_availability": false,
    "enable_insecure_port": true,
    "control_plane_placement": "",
    "service_cidr": "10.100.0.0/16",
    "network_provider": "dcos",
    "cloud_provider": "(none)",
    "node_count": 3,
    "node_placement": "",
    "public_node_count": 1,
    "public_node_placement": "",
    "reserved_resources": {
      "kube_cpus": 3,
      "kube_mem": 4096,
      "kube_disk": 10240,
      "system_cpus": 1,
      "system_mem": 1024
    }
  },
  "etcd": {
    "cpus": 0.5,
    "mem": 1024,
    "data_disk": 3072,
    "wal_disk": 512,
    "disk_type": "ROOT"
  },
  "scheduler": {
    "cpus": 0.5,
    "mem": 512
  },
  "controller_manager": {
    "cpus": 0.5,
    "mem": 512
  },
  "apiserver": {
    "cpus": 0.5,
    "mem": 1024
  },
  "kube_proxy": {
    "cpus": 0.1,
    "mem": 512
  }
}

2nd replica for dns fails

I'm running open source DC/OS 1.11 kubernetes. When I start with one kubelet, one kube-dns starts up and everything works perfectly. When I start with two or more kubelets, two kube-dns pods start up and the second one fails. This appears to cause failures in the metrics server and dashboard.

po/kube-dns-754f9cd4f5-gxxnf               2/3       CrashLoopBackOff   4          4m
po/kube-dns-754f9cd4f5-zkshx               2/3       Running            4          4m
po/kubernetes-dashboard-5cfddd7d5b-qxtbz   1/1       Running            2          2m
po/metrics-server-54974fd587-6nrnf         1/1       Running            0          2m

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
svc/kube-dns               ClusterIP   10.90.0.2      <none>        53/UDP,53/TCP   4m
svc/kubernetes-dashboard   ClusterIP   10.90.68.207   <none>        80/TCP          2m
svc/metrics-server         ClusterIP   10.90.218.94   <none>        443/TCP         2m
I0314 13:12:58.423621       1 dns.go:48] version: 1.14.8
I0314 13:12:58.424580       1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0314 13:12:58.424615       1 server.go:119] FLAG: --alsologtostderr="false"
I0314 13:12:58.424623       1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0314 13:12:58.424629       1 server.go:119] FLAG: --config-map=""
I0314 13:12:58.424633       1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0314 13:12:58.424638       1 server.go:119] FLAG: --config-period="10s"
I0314 13:12:58.424644       1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0314 13:12:58.424648       1 server.go:119] FLAG: --dns-port="10053"
I0314 13:12:58.424653       1 server.go:119] FLAG: --domain="cluster.local."
I0314 13:12:58.424659       1 server.go:119] FLAG: --federations=""
I0314 13:12:58.424666       1 server.go:119] FLAG: --healthz-port="8081"
I0314 13:12:58.424670       1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0314 13:12:58.424675       1 server.go:119] FLAG: --kube-master-url=""
I0314 13:12:58.424680       1 server.go:119] FLAG: --kubecfg-file=""
I0314 13:12:58.424684       1 server.go:119] FLAG: --log-backtrace-at=":0"
I0314 13:12:58.424691       1 server.go:119] FLAG: --log-dir=""
I0314 13:12:58.424698       1 server.go:119] FLAG: --log-flush-frequency="5s"
I0314 13:12:58.424702       1 server.go:119] FLAG: --logtostderr="true"
I0314 13:12:58.424707       1 server.go:119] FLAG: --nameservers=""
I0314 13:12:58.424711       1 server.go:119] FLAG: --stderrthreshold="2"
I0314 13:12:58.424717       1 server.go:119] FLAG: --v="2"
I0314 13:12:58.424723       1 server.go:119] FLAG: --version="false"
I0314 13:12:58.424729       1 server.go:119] FLAG: --vmodule=""
I0314 13:12:58.424819       1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0314 13:12:58.425795       1 server.go:220] Skydns metrics enabled (/metrics:10055)
I0314 13:12:58.425977       1 dns.go:146] Starting endpointsController
I0314 13:12:58.426116       1 dns.go:149] Starting serviceController
I0314 13:12:58.426248       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0314 13:12:58.426917       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0314 13:12:58.427036       1 sync.go:177] Updated upstreamNameservers to [198.51.100.1]
I0314 13:12:58.927318       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:12:59.427268       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:12:59.927243       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:00.427277       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:00.927293       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:01.427263       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:01.927404       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:02.427330       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:02.927274       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:03.427224       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:03.927308       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:04.427364       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:04.927257       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:05.427258       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:05.927243       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:06.427304       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:06.927258       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:07.427254       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:07.927309       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:08.427261       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:08.927315       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:09.427358       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:09.927340       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0314 13:13:10.427366       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
[... the same message repeats every 500ms ...]
E0314 13:13:28.427771       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.90.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.90.0.1:443: i/o timeout
E0314 13:13:28.427776       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.90.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.90.0.1:443: i/o timeout
[... the "Waiting for services and endpoints" message keeps repeating ...]
F0314 13:13:58.427216       1 dns.go:167] Timeout waiting for initialization

Is it possible to do the same quickstart using API calls rather than the DC/OS CLI?

Hi all,
I am deploying a Kubernetes cluster on DC/OS. Sometimes it works fine, but sometimes it gets stuck at private node deployment, even though I have verified there are more than enough resources in my DC/OS cluster.

If I deploy 4 private nodes in Kubernetes, it will deploy 3 but get stuck at the 4th node; if I choose 3 private nodes, it gets stuck at the 3rd one, and the deployment plan never completes.

Sometimes it works fine, but after deleting all the secrets and Kubernetes-related components and trying to redeploy the same thing, my Kubernetes tasks frequently go back to the staging state in DC/OS and the deployment plan gets stuck.

Also, it would be very helpful for others if you added the same quickstart example using the DC/OS API rather than the CLI.
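
For reference, package installation can also be driven through the DC/OS package API (Cosmos) instead of the CLI. A hedged sketch, assuming DC/OS 1.10+; the endpoint and media-type version strings below should be verified against the API docs for your DC/OS release:

$ export DCOS_URL="https://YOUR_DCOS_MASTER"              # placeholder address
$ export TOKEN="$(dcos config show core.dcos_acs_token)"  # reuse the CLI's auth token
$ curl -k -X POST "$DCOS_URL/package/install" \
    -H "Authorization: token=$TOKEN" \
    -H "Content-Type: application/vnd.dcos.package.install-request+json;charset=utf-8;version=v1" \
    -H "Accept: application/vnd.dcos.package.install-response+json;charset=utf-8;version=v2" \
    -d '{"packageName": "kubernetes", "options": {"service": {"name": "dev/kubernetes01"}}}'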

Add ability to specify existing etcd cluster in options.

Certain folks may have an external etcd cluster running and want to reuse it.

Anyone using the Portworx framework will have one, as will users running certain other services.

DC/OS clusters running on CoreOS will also likely have etcd available.
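
Purely as an illustration of what such a setting could look like in .deploy/options.json (a hypothetical sketch; the keys below do not exist in the package today):

{
  "etcd": {
    "external_endpoints": "https://etcd-1.example.com:2379,https://etcd-2.example.com:2379"
  }
}

Here "external_endpoints" and the example hosts are invented purely to sketch the request.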

"make deploy" fails

The following command:

KUBERNETES_FRAMEWORK_VERSION=1.0.3-1.9.7 make deploy

resulted in the following failure:

Waiting for DC/OS Master to be ready...

/Users/pires/Work/mesosphere/dcos-kubernetes-quickstart/./dcos package install --yes marathon-lb;\
	/Users/pires/Work/mesosphere/dcos-kubernetes-quickstart/./dcos package install --yes kubernetes --package-version=1.0.3-1.9.7 --options=./.deploy/options.json;\
	/Users/pires/Work/mesosphere/dcos-kubernetes-quickstart/./dcos marathon app add ./.deploy/kubeapi-proxy.json
By Deploying, you agree to the Terms and Conditions https://mesosphere.com/catalog-terms-conditions/#community-services
We recommend at least 2 CPUs and 1GiB of RAM for each Marathon-LB instance.

*NOTE*: For additional ```Enterprise Edition``` DC/OS instructions, see https://docs.mesosphere.com/administration/id-and-access-mgt/service-auth/mlb-auth/
Installing Marathon app for package [marathon-lb] version [1.12.2]
Marathon-lb DC/OS Service has been successfully installed!
See https://github.com/mesosphere/marathon-lb for documentation.
By Deploying, you agree to the Terms and Conditions https://mesosphere.com/catalog-terms-conditions/#certified-services
Kubernetes on DC/OS.

	Documentation: https://docs.mesosphere.com/service-docs/kubernetes
	Issues: https://github.com/mesosphere/dcos-kubernetes-quickstart/issues
Installing Marathon app for package [kubernetes] version [1.0.3-1.9.7]
Installing CLI subcommand for package [kubernetes] version [1.0.3-1.9.7]
New command available: dcos kubernetes
DC/OS Kubernetes is being installed!
Can't read from resource: ./.deploy/kubeapi-proxy.json.
Please check that it exists.
make: *** [install] Error 1

node-1 kubelet and kube-proxy failing repeatedly

We have configured a local universe with kubernetes 0.2.2-1.7.7 on DC/OS 1.10. The environment is completely air-gapped, with no access to the internet. Upon installing the k8s framework, all tasks launch properly except for node-1-kubelet and node-1-kubeproxy. The node-0 and node-2 kubelet and kubeproxy tasks run successfully. There are plenty of resources available, so it's not a resource issue. The tasks stage, then fail immediately. Marathon tries to reschedule them on another node, but they fail on every single node.

authentication issue

Hi guys, following the quickstart guide on a local DC/OS community cluster, I've successfully deployed the kubernetes package and a Kubernetes cluster, yet kubectl complains about authentication even though the cluster was set up with "AlwaysAllow" authorization:

If I hit enter to skip the credentials it just hangs; otherwise:

kubectl get nodes
Please enter Username: a
Please enter Password: *
error: You must be logged in to the server (Unauthorized)
curl also outputs:
$ curl --insecure https://xx.xx.xx.xx:6443/api
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

Any suggestion much appreciated. (PS: sorry for posting here; this may not be a code/examples-related issue.)
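
For what it's worth, with this package kubectl credentials are normally configured via the DC/OS CLI rather than typed interactively. A sketch; the exact subcommand depends on the package version:

$ dcos kubernetes kubeconfig                                          # 1.x packages
$ dcos kubernetes cluster kubeconfig --cluster-name=dev/kubernetes01  # 2.x packages

If kubectl still prompts for a username and password, the active kubeconfig context is likely not pointing at the credentials the CLI generated.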

Creating with AWS

When I run make docker, I can see my AWS creds being passed in, but when I run make deploy PLATFORM=aws or make launch-dcos PLATFORM=aws, I get errors about GCE not being found. Shouldn't GCE be skipped as a dependency, since I am trying to install on AWS?

Error in Kubernetes-proxy log

Running DC/OS 1.11 with kubernetes 1.0.1-1.9.6. After launching Kubernetes I see the following error in the kube-proxy log. Startup expects the directory /var/log/nginx to exist, but it does not exist in a generic CentOS 7 install. Creating this directory on the nodes resolves the error at the next launch.

I0323 15:49:06.166402 7 executor.cpp:651] Forked command at 12
nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (2: No such file or directory)
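
A minimal workaround sketch, to be run on each affected agent node (assumes SSH access as a sudo-capable user):

$ sudo mkdir -p /var/log/nginx   # create the log directory kube-proxy's bundled nginx expects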

kube-dns addon fails in air-gapped environment

The mandatory-addons-0-kube-dns task fails in an air-gapped environment due to the inability to pull the following resources:

    "container": {
      "docker": {
        "kubedns": "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5",
        "kubedns-dnsmasq": "gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5",
        "kubedns-sidecar": "gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.5",
        "heapster": "gcr.io/google_containers/heapster-amd64:v1.4.2",
        "dashboard": "gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3"
      }
    },
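
A common workaround for air-gapped installs is to mirror the images into a registry that is reachable from the cluster. A sketch, assuming a private registry at registry.example.com:5000 (a placeholder host) and that the package is configured to pull from it:

# On a machine with internet access:
$ docker pull gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
$ docker tag gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5 \
    registry.example.com:5000/k8s-dns-kube-dns-amd64:1.14.5
$ docker push registry.example.com:5000/k8s-dns-kube-dns-amd64:1.14.5
# Repeat for the dnsmasq-nanny, sidecar, heapster, and dashboard images.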

etcd and api-server Mesos tasks are unable to start on DC/OS v1.9.2

tail of etcd stderr

I0910 13:46:01.699224    36 exec.cpp:162] Version: 1.4.0
I0910 13:46:01.705013    29 exec.cpp:237] Executor registered on agent 6e686eb8-8059-422e-9123-9f709f6e0c1c-S54
2017-09-10 13:46:05.402293 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=data-dir
2017-09-10 13:46:05.402559 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=kubernetes
2017-09-10 13:46:05.402692 I | pkg/flags: recognized and used environment variable ETCD_WAL_DIR=wal-pv/wal-dir
2017-09-10 13:46:05.402807 W | pkg/flags: unrecognized environment variable ETCD_LISTEN_CLIENT_PORT=2379
2017-09-10 13:46:05.402912 W | pkg/flags: unrecognized environment variable ETCD_LISTEN_PEER_PORT=2380
2017-09-10 13:46:05.403017 E | etcdmain: error verifying flags, expected IP in URL for binding (http://:2380). See 'etcd --help'.

tail of api-server stderr

I0910 13:46:26.444809    36 exec.cpp:162] Version: 1.4.0
I0910 13:46:26.450470    30 exec.cpp:237] Executor registered on agent 6e686eb8-8059-422e-9123-9f709f6e0c1c-S52
invalid argument "" for --bind-address=: failed to parse IP: ""
Usage of ./kube-apiserver:

I am trying to run the kubernetes framework on an existing DC/OS v1.9.2 cluster. I used the beta-kubernetes package released in the Docker image mesosphere/universe-server:20170908T222335Z-version-3.x-3d0a4e4f33.

Does this release have any hard dependency on DC/OS v1.10?

marathon app.json for kubernetes framework

{
  "id": "/kubernetes",
  "connected": false,
  "recovered": false,
  "TASK_UNREACHABLE": 0,
  "cmd": "export LD_LIBRARY_PATH=$MESOS_SANDBOX/libmesos-bundle/lib:$LD_LIBRARY_PATH; export MESOS_NATIVE_JAVA_LIBRARY=$(ls $MESOS_SANDBOX/libmesos-bundle/lib/libmesos-*.so); export JAVA_HOME=$(ls -d $MESOS_SANDBOX/jre*/); export JAVA_HOME=${JAVA_HOME%/}; export PATH=$(ls -d $JAVA_HOME/bin):$PATH &&  export JAVA_OPTS=\"-Xms256M -Xmx512M\" &&  ./kubernetes-scheduler/bin/kubernetes ./kubernetes-scheduler/svc.yml",
  "instances": 1,
  "cpus": 1,
  "mem": 1024,
  "disk": 0,
  "gpus": 0,
  "constraints": [],
  "fetch": [
    {
      "uri": "https://downloads.mesosphere.com/java/jre-8u144-linux-x64.tar.gz",
      "extract": true,
      "executable": false,
      "cache": false
    },
    {
      "uri": "https://downloads.mesosphere.com/kubernetes/assets/0.1.0-1.7.5-beta/kubernetes-scheduler.zip",
      "extract": true,
      "executable": false,
      "cache": false
    },
    {
      "uri": "https://downloads.mesosphere.io/libmesos-bundle/libmesos-bundle-1.10-1.4-63e0814.tar.gz",
      "extract": true,
      "executable": false,
      "cache": false
    }
  ],
  "storeUrls": [],
  "backoffSeconds": 1,
  "backoffFactor": 1.15,
  "maxLaunchDelaySeconds": 3600,
  "healthChecks": [
    {
      "gracePeriodSeconds": 900,
      "intervalSeconds": 30,
      "timeoutSeconds": 30,
      "maxConsecutiveFailures": 0,
      "portIndex": 0,
      "path": "/v1/plans/deploy",
      "protocol": "HTTP",
      "ignoreHttp1xx": false
    },
    {
      "gracePeriodSeconds": 900,
      "intervalSeconds": 30,
      "timeoutSeconds": 30,
      "maxConsecutiveFailures": 0,
      "portIndex": 0,
      "path": "/v1/plans/recovery",
      "protocol": "HTTP",
      "ignoreHttp1xx": false
    }
  ],
  "readinessChecks": [],
  "dependencies": [],
  "upgradeStrategy": {
    "minimumHealthCapacity": 0,
    "maximumOverCapacity": 0
  },
  "unreachableStrategy": {
    "inactiveAfterSeconds": 300,
    "expungeAfterSeconds": 600
  },
  "killSelection": "YOUNGEST_FIRST",
  "portDefinitions": [
    {
      "port": 0,
      "protocol": "tcp",
      "name": "api",
      "labels": {
        "VIP_0": "/api.kubernetes:80"
      }
    }
  ],
  "requirePorts": false,
  "labels": {
    "DCOS_COMMONS_UNINSTALL": "true",
    "DCOS_PACKAGE_RELEASE": "0",
    "DCOS_SERVICE_SCHEME": "http",
    "DCOS_PACKAGE_SOURCE": "http://kubernetes-universe.marathon.l4lb.thisdcos.directory/repo",
    "DCOS_PACKAGE_METADATA": "eyJwYWNrYWdpbmdWZXJzaW9uIjoiMy4wIiwibmFtZSI6ImJldGEta3ViZXJuZXRlcyIsInZlcnNpb24iOiIwLjEuMC0xLjcuNS1iZXRhIiwibWFpbnRhaW5lciI6InN1cHBvcnRAbWVzb3NwaGVyZS5jb20iLCJkZXNjcmlwdGlvbiI6IkhpZ2hseSBBdmFpbGFibGUgS3ViZXJuZXRlcyIsInRhZ3MiOlsia3ViZXJuZXRlcyJdLCJzZWxlY3RlZCI6ZmFsc2UsImZyYW1ld29yayI6dHJ1ZSwicHJlSW5zdGFsbE5vdGVzIjoiS3ViZXJuZXRlcyBvbiBEQy9PUyBpcyBjdXJyZW50bHkgaW4gQmV0YSBhbmQgc2hvdWxkIG5vdCBiZSB1c2VkIGluIHByb2R1Y3Rpb24uXG5cbkRlZmF1bHQgY29uZmlndXJhdGlvbiByZXF1aXJlcyAzIGFnZW50IG5vZGVzIGVhY2ggd2l0aDogNiBDUFUgfCA1NjU0IE1CIE1FTSB8IDYxNSBNQiBEaXNrXG5cblBsZWFzZSB2aXNpdCBvdXIgcXVpY2tzdGFydCByZXBvc2l0b3J5IChodHRwczovL2dpdGh1Yi5jb20vbWVzb3NwaGVyZS9kY29zLWt1YmVybmV0ZXMtcXVpY2tzdGFydCkgZm9yIGV4YW1wbGUgZGVwbG95bWVudHMuIiwicG9zdEluc3RhbGxOb3RlcyI6IkRDL09TIEt1YmVybmV0ZXMgaXMgYmVpbmcgaW5zdGFsbGVkIVxuXG5cdERvY3VtZW50YXRpb246IGh0dHBzOi8vZ2l0aHViLmNvbS9tZXNvc3BoZXJlL2Rjb3Mta3ViZXJuZXRlcy1xdWlja3N0YXJ0XG5cdElzc3VlczogaHR0cHM6Ly9qaXJhLm1lc29zcGhlcmUuY29tIiwicG9zdFVuaW5zdGFsbE5vdGVzIjoiREMvT1MgS3ViZXJuZXRlcyBoYXMgYmVlbiB1bmluc3RhbGxlZC4iLCJpbWFnZXMiOnsiaWNvbi1zbWFsbCI6Imh0dHBzOi8vZG93bmxvYWRzLm1lc29zcGhlcmUuY29tL2t1YmVybmV0ZXMvYXNzZXRzL2s4cy1zbWFsbC00OHg0OC5wbmciLCJpY29uLW1lZGl1bSI6Imh0dHBzOi8vZG93bmxvYWRzLm1lc29zcGhlcmUuY29tL2t1YmVybmV0ZXMvYXNzZXRzL2s4cy1tZWRpdW0tOTZ4OTYucG5nIiwiaWNvbi1sYXJnZSI6Imh0dHBzOi8vZG93bmxvYWRzLm1lc29zcGhlcmUuY29tL2t1YmVybmV0ZXMvYXNzZXRzL2s4cy1sYXJnZS0yNTZ4MjU2LnBuZyJ9fQ==",
    "DCOS_PACKAGE_REGISTRY_VERSION": "3.0",
    "DCOS_SERVICE_NAME": "kubernetes",
    "DCOS_PACKAGE_FRAMEWORK_NAME": "kubernetes",
    "DCOS_SERVICE_PORT_INDEX": "0",
    "DCOS_PACKAGE_VERSION": "0.1.0-1.7.5-beta",
    "DCOS_COMMONS_API_VERSION": "v1",
    "DCOS_PACKAGE_NAME": "beta-kubernetes",
    "MARATHON_SINGLE_INSTANCE_APP": "true",
    "DCOS_PACKAGE_IS_FRAMEWORK": "true"
  },
  "env": {
    "APISERVER_CPUS": "0.2",
    "KUBE_PROXY_MEM": "128",
    "ETCD_DISK_TYPE": "ROOT",
    "APISERVER_MEM": "512",
    "KUBERNETES_SERVICE_CIDR": "10.100.0.0/16",
    "KUBELET_MEM": "2048",
    "KUBERNETES_NODE_COUNT": "3",
    "KUBELET_CPUS": "2",
    "JAVA_URI": "https://downloads.mesosphere.com/java/jre-8u144-linux-x64.tar.gz",
    "CONTROLLERMANAGER_COUNT": "3",
    "KUBERNETES_VERSION": "v1.7.5",
    "LIBMESOS_URI": "https://downloads.mesosphere.io/libmesos-bundle/libmesos-bundle-1.10-1.4-63e0814.tar.gz",
    "ETCD_VERSION": "v3.2.6",
    "CONTROLLERMANAGER_CPUS": "0.2",
    "ETCD_PLACEMENT": "hostname:UNIQUE",
    "ETCD_CPUS": "0.2",
    "KUBERNETES_NODE_DISK": "1024",
    "SCHEDULER_COUNT": "3",
    "APISERVER_PLACEMENT": "hostname:UNIQUE",
    "ETCD_DISK": "512",
    "CONTROLLERMANAGER_PLACEMENT": "hostname:UNIQUE",
    "CONTROLLERMANAGER_MEM": "256",
    "APISERVER_COUNT": "3",
    "ETCD_COUNT": "3",
    "ETCD_MEM": "512",
    "SCHEDULER_CPUS": "0.2",
    "SCHEDULER_MEM": "256",
    "SCHEDULER_PLACEMENT": "hostname:UNIQUE",
    "EXECUTOR_URI": "https://downloads.mesosphere.com/dcos-commons/artifacts/0.30.0/executor.zip",
    "KUBE_PROXY_CPUS": "0.1",
    "FRAMEWORK_NAME": "kubernetes",
    "KUBERNETES_NODE_PLACEMENT": "hostname:UNIQUE",
    "FRAMEWORK_LOG_LEVEL": "INFO"
  }
}
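
Incidentally, the DCOS_PACKAGE_METADATA label above is base64-encoded JSON. To inspect it, something like the following should work (a sketch, assuming jq and python are installed):

$ dcos marathon app show /kubernetes \
    | jq -r '.labels.DCOS_PACKAGE_METADATA' \
    | base64 --decode | python -m json.tool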

[FIXED] Can't create Kubernetes nodes

I'm trying to install beta-kubernetes on DC/OS 1.10.0 using the GUI (Services -> Run a Service -> Install a Package). I changed the default parameters:
Kubelet MEMORY SIZE -> 8192
Kubernetes NODE COUNT -> 5
After the service is deployed I see only 7 active tasks and there are no nodes:

#kubectl get namespaces
NAME          STATUS    AGE
default       Active    1d
kube-public   Active    1d
kube-system   Active    1d

#kubectl get nodes
No resources found.

We run DC/OS on AWS with 7 m3.xlarge instances. Our current usage is 8 of 28 CPU shares and 9 GiB of 96 GiB memory, so lack of resources shouldn't be an issue.
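
When worker nodes fail to appear, the scheduler's deploy plan usually shows which step is blocked and why (often unmatched resource offers). A sketch; the plan subcommand layout varies by package version:

$ dcos kubernetes plan show deploy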

Unable to deploy application on Kubernetes running on DC/OS

Hello All,

I am deploying an application on top of a Kubernetes cluster, which itself runs on top of a DC/OS cluster.

But when I execute the command below:

# kubectl create -f deploy.yaml

The container is not created, and I can see the following error in the kubectl describe output:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m41s (x156 over 49m) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.

Please let me know in case you need any further information.

Thanks in advance.

Regards,
Arnab
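
For reference, the "taints that the pod didn't tolerate" event means every node carries a taint that the pod spec does not tolerate. Inspecting the taints and adding a matching toleration usually unblocks scheduling; a sketch, where the key and effect are placeholders to be replaced with whatever the inspection reports:

$ kubectl describe nodes | grep -i -A1 taints

And in the pod spec:

tolerations:
- key: "example/taint-key"   # placeholder: use the key reported on your nodes
  operator: "Exists"
  effect: "NoSchedule"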
