
nats-operator's Introduction

NATS Operator

⚠️ The recommended way to run NATS on Kubernetes is with the Helm charts, which also provide JetStream support. The NATS Operator is not recommended for new deployments.


NATS Operator manages NATS clusters atop Kubernetes using CRDs. If you are looking to run NATS on Kubernetes without the operator, you can also find Helm charts in the nats-io/k8s repo. More information about running NATS on Kubernetes is available in the docs, along with a minimal setup that uses only StatefulSets, without the operator, to get started here.

Requirements

Introduction

NATS Operator provides a NatsCluster Custom Resource Definition (CRD) that models a NATS cluster. This CRD allows for specifying the desired size and version of a NATS cluster, as well as several other advanced options:

apiVersion: nats.io/v1alpha2
kind: NatsCluster
metadata:
  name: example-nats-cluster
spec:
  size: 3
  version: "2.1.8"

NATS Operator monitors the creation, modification, and deletion of NatsCluster resources and reacts by performing any operations needed to align the current status of the associated NATS clusters with the desired one.

Installing

NATS Operator supports two different operation modes:

  • Namespace-scoped (classic): NATS Operator manages NatsCluster resources on the Kubernetes namespace where it is deployed.
  • Cluster-scoped (experimental): NATS Operator manages NatsCluster resources across all namespaces in the Kubernetes cluster.

The operation mode must be chosen when installing NATS Operator and cannot be changed later.

Namespace-scoped installation

To perform a namespace-scoped installation of NATS Operator in the Kubernetes cluster pointed at by the current context, you may run:

$ kubectl apply -f https://github.com/nats-io/nats-operator/releases/latest/download/00-prereqs.yaml
$ kubectl apply -f https://github.com/nats-io/nats-operator/releases/latest/download/10-deployment.yaml

By default, this installs NATS Operator in the default namespace and watches only NatsCluster resources created in the default namespace. To install in a different namespace, you must first create that namespace and edit the manifests above to specify its name wherever necessary.
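For illustration, a minimal sketch of installing into a hypothetical nats-apps namespace, assuming the downloaded manifests pin resources to the default namespace via explicit namespace: default fields (verify this against the manifests before relying on it):

$ kubectl create ns nats-apps
$ curl -sL https://github.com/nats-io/nats-operator/releases/latest/download/00-prereqs.yaml | \
    sed 's/namespace: default/namespace: nats-apps/g' | kubectl apply -f -
$ curl -sL https://github.com/nats-io/nats-operator/releases/latest/download/10-deployment.yaml | \
    sed 's/namespace: default/namespace: nats-apps/g' | kubectl apply -f -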

WARNING: To perform multiple namespace-scoped installations of NATS Operator, you must manually edit the nats-operator-binding cluster role binding in the deploy/00-prereqs.yaml file in order to add all the required service accounts. Failing to do so may cause all NATS Operator instances to malfunction.

WARNING: When performing a namespace-scoped installation of NATS Operator, you must make sure that all other namespace-scoped installations that may exist in the Kubernetes cluster share the same version. Installing different versions of NATS Operator in the same Kubernetes cluster may cause unexpected behavior as the schema of the CRDs which NATS Operator registers may change between versions.

Alternatively, you may use Helm to perform a namespace-scoped installation of NATS Operator by using the Helm charts found under helm/nats-operator in this repo.
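For example, a sketch assuming the repository has been cloned locally and Helm 3 is in use (chart values, including the target namespace, are left at their defaults):

$ git clone https://github.com/nats-io/nats-operator.git
$ helm install nats-operator ./nats-operator/helm/nats-operator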

Cluster-scoped installation (experimental)

Cluster-scoped installations of NATS Operator must live in the nats-io namespace. This namespace must be created beforehand:

$ kubectl create ns nats-io

Then, you must manually edit the manifests in deployment/ in order to reference the nats-io namespace and to enable the ClusterScoped feature gate in the NATS Operator deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nats-operator
  namespace: nats-io
spec:
  (...)
    spec:
      containers:
      - name: nats-operator
        (...)
        args:
        - nats-operator
        - --feature-gates=ClusterScoped=true
        (...)

Once you have done this, you may install NATS Operator by running:

$ kubectl apply -f https://github.com/nats-io/nats-operator/releases/latest/download/00-prereqs.yaml
$ kubectl apply -f https://github.com/nats-io/nats-operator/releases/latest/download/10-deployment.yaml

WARNING: When performing a cluster-scoped installation of NATS Operator, you must make sure that there are no other deployments of NATS Operator in the Kubernetes cluster. If you have a previous installation of NATS Operator, you must uninstall it before performing a cluster-scoped installation of NATS Operator.

Creating a NATS cluster

Once NATS Operator has been installed, you will be able to confirm that two new CRDs have been registered in the cluster:

$ kubectl get crd
NAME                       CREATED AT
natsclusters.nats.io       2019-01-11T17:16:36Z
natsserviceroles.nats.io   2019-01-11T17:16:40Z

To create a NATS cluster, you must create a NatsCluster resource representing the desired status of the cluster. For example, to create a 3-node NATS cluster you may run:

$ cat <<EOF | kubectl create -f -
apiVersion: nats.io/v1alpha2
kind: NatsCluster
metadata:
  name: example-nats-cluster
spec:
  size: 3
  version: "1.3.0"
EOF

NATS Operator will react to the creation of such a resource by creating three NATS pods. It will keep monitoring these pods (and replace them in case of failure) for as long as the NatsCluster resource exists.
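The operator labels the pods it creates with nats_cluster=<cluster name> (the same label used in the examples later in this document), so one way to inspect them is:

$ kubectl get pods -l nats_cluster=example-nats-cluster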

Listing NATS clusters

To list all the NATS clusters:

$ kubectl get nats --all-namespaces
NAMESPACE   NAME                   AGE
default     example-nats-cluster   2m

TLS support

By using a pair of Opaque secrets (one for the clients and another for the routes), it is possible to enable TLS both for the communication between clients and the server and for the transport between the routes:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "nats"
spec:
  # Number of nodes in the cluster
  size: 3
  version: "1.3.0"

  tls:
    # Certificates to secure the NATS client connections:
    serverSecret: "nats-clients-tls"

    # Certificates to secure the routes.
    routesSecret: "nats-routes-tls"

In order for TLS to be properly established between the nodes, it is necessary to create wildcard certificates that match the subdomains created for the client service and for the routes service.

By default, the routesSecret has to provide the files ca.pem, route-key.pem, and route.pem, for the CA certificate, the route private key, and the route certificate, respectively.

$ kubectl create secret generic nats-routes-tls --from-file=ca.pem --from-file=route-key.pem --from-file=route.pem

Similarly, by default the serverSecret has to provide the files ca.pem, server-key.pem, and server.pem, for the CA certificate, the server private key, and the server certificate used to secure the connection with the clients.

$ kubectl create secret generic nats-clients-tls --from-file=ca.pem --from-file=server-key.pem --from-file=server.pem
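For reference, a minimal sketch of generating the route certificate files with openssl, assuming a CA key pair (ca.pem and ca-key.pem) already exists and the cluster is named nats in the nats-io namespace; the file names match the routesSecret defaults above:

$ cat <<EOF > route-ext.cnf
subjectAltName = DNS:*.nats-mgmt.nats-io.svc, DNS:*.nats-mgmt.nats-io.svc.cluster.local
EOF
$ openssl req -new -newkey rsa:2048 -nodes -keyout route-key.pem \
    -subj "/CN=*.nats-mgmt.nats-io.svc" -out route.csr
$ openssl x509 -req -in route.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
    -days 365 -extfile route-ext.cnf -out route.pem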

Consider, though, that you may wish to manage the certificate authorities for routes between clusters independently, to support the ability to roll between CAs or their intermediates.

Any filename below can also be an absolute path, allowing you to mount a CA bundle in a location of your choosing.

NATS Operator also supports kubernetes.io/tls secrets (like the ones managed by cert-manager), as well as any secret containing a CA certificate, private key, and certificate under arbitrary names. It is possible to override the default file names as follows:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "nats"
spec:
  # Number of nodes in the cluster
  size: 3
  version: "1.3.0"

  tls:
    # Certificates to secure the NATS client connections:
    serverSecret: "nats-clients-tls"
    # Name of the CA in serverSecret
    serverSecretCAFileName: "ca.crt"
    # Name of the key in serverSecret
    serverSecretKeyFileName: "tls.key"
    # Name of the certificate in serverSecret
    serverSecretCertFileName: "tls.crt"

    # Certificates to secure the routes.
    routesSecret: "nats-routes-tls"
    # Name of the CA; here an absolute path pointing outside this secret (see the volume mount below)
    routesSecretCAFileName: "/etc/ca-bundle/routes-bundle.pem"
    # Name of the key in routesSecret
    routesSecretKeyFileName: "tls.key"
    # Name of the certificate in routesSecret
    routesSecretCertFileName: "tls.crt"

  template:
    spec:
      containers:
      - name: "nats"
        volumeMounts:
        - name: "ca-bundle"
          mountPath: "/etc/ca-bundle"
          readOnly: true
      volumes:
      - name: "ca-bundle"
        configMap:
          name: "our-ca-bundle"

Cert-Manager

If cert-manager is available in your cluster, you can easily generate TLS certificates for NATS as follows:

Create a self-signed cluster issuer (or namespace-bound issuer) to create NATS' CA certificate:

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigning
spec:
  selfSigned: {}

Create your NATS cluster's CA certificate using the new selfsigning issuer:

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: nats-ca
spec:
  secretName: nats-ca
  duration: 8736h # 1 year
  renewBefore: 240h # 10 days
  issuerRef:
    name: selfsigning
    kind: ClusterIssuer
  commonName: nats-ca
  usages: 
    - cert sign # workaround for odd cert-manager behavior
  organization:
  - Your organization
  isCA: true

Create your NATS cluster issuer based on the new nats-ca CA:

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: nats-ca
spec:
  ca:
    secretName: nats-ca

Create your NATS cluster's server certificate (assuming NATS is running in the nats-io namespace, otherwise, set the commonName and dnsNames fields appropriately):

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: nats-server-tls
spec:
  secretName: nats-server-tls
  duration: 2160h # 90 days
  renewBefore: 240h # 10 days
  usages:
  - signing
  - key encipherment
  - server auth
  issuerRef:
    name: nats-ca
    kind: Issuer
  organization:
  - Your organization
  commonName: nats.nats-io.svc.cluster.local
  dnsNames:
  - nats.nats-io.svc

Create your NATS cluster's routes certificate (assuming NATS is running in the nats-io namespace, otherwise, set the commonName and dnsNames fields appropriately):

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: nats-routes-tls
spec:
  secretName: nats-routes-tls
  duration: 2160h # 90 days
  renewBefore: 240h # 10 days
  usages:
  - signing
  - key encipherment
  - server auth
  - client auth # included because routes mutually verify each other
  issuerRef:
    name: nats-ca
    kind: Issuer
  organization:
  - Your organization
  commonName: "*.nats-mgmt.nats-io.svc.cluster.local"
  dnsNames:
  - "*.nats-mgmt.nats-io.svc"

Authorization

Using ServiceAccounts

⚠️ The ServiceAccounts integration uses a fairly rudimentary approach to config reloading and CRD watching, and relies on advanced Kubernetes APIs that may not be available in your cluster. The decentralized JWT approach should be preferred instead; to learn more, see: https://docs.nats.io/developing-with-nats/tutorials/jwt

The NATS Operator can define permissions based on roles by using any ServiceAccount present in a namespace. This feature requires a Kubernetes v1.12+ cluster with the TokenRequest API enabled. To try this feature using minikube v0.30.0+, you can configure it to start as follows:

$ minikube start \
    --extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key \
    --extra-config=apiserver.service-account-key-file=/var/lib/minikube/certs/sa.pub \
    --extra-config=apiserver.service-account-issuer=api \
    --extra-config=apiserver.service-account-api-audiences=api,spire-server \
    --extra-config=apiserver.authorization-mode=Node,RBAC \
    --extra-config=kubelet.authentication-token-webhook=true

Please note that availability of this feature across Kubernetes offerings may vary widely.

ServiceAccounts integration can then be enabled by setting the enableServiceAccounts flag to true in the NatsCluster configuration.

apiVersion: nats.io/v1alpha2
kind: NatsCluster
metadata:
  name: example-nats
spec:
  size: 3
  version: "1.3.0"

  pod:
    # NOTE: Only supported in Kubernetes v1.12+.
    enableConfigReload: true
  auth:
    # NOTE: Only supported in Kubernetes v1.12+ clusters having the "TokenRequest" API enabled.
    enableServiceAccounts: true

Permissions for a ServiceAccount can be set by creating a NatsServiceRole for that account. In the example below there are two accounts: a regular user and an admin user with broader permissions.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nats-admin-user
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nats-user
---
apiVersion: nats.io/v1alpha2
kind: NatsServiceRole
metadata:
  name: nats-user
  namespace: nats-io

  # Specifies which NATS cluster will be mapping this account.
  labels:
    nats_cluster: example-nats
spec:
  permissions:
    publish: ["foo.*", "foo.bar.quux"]
    subscribe: ["foo.bar"]
---
apiVersion: nats.io/v1alpha2
kind: NatsServiceRole
metadata:
  name: nats-admin-user
  namespace: nats-io
  labels:
    nats_cluster: example-nats
spec:
  permissions:
    publish: [">"]
    subscribe: [">"]

The above will create two different Secrets which can then be mounted as volumes for a Pod.

$ kubectl -n nats-io get secrets
NAME                                       TYPE          DATA      AGE
...
nats-admin-user-example-nats-bound-token   Opaque        1         43m
nats-user-example-nats-bound-token         Opaque        1         43m

Please note that the NatsServiceRole must be created in the same namespace where the NatsCluster is running, but the bound-token Secrets are created for ServiceAccount resources that can live in other namespaces.

An example of mounting the secret in a Pod can be found below:

apiVersion: v1
kind: Pod
metadata:
  name: nats-user-pod
  labels:
    nats_cluster: example-nats
spec:
  volumes:
    - name: "token"
      projected:
        sources:
        - secret:
            name: "nats-user-example-nats-bound-token"
            items:
              - key: token
                path: "token"
  restartPolicy: Never
  containers:
    - name: nats-ops
      command: ["/bin/sh"]
      image: "wallyqs/nats-ops:latest"
      tty: true
      stdin: true
      stdinOnce: true
      volumeMounts:
      - name: "token"
        mountPath: "/var/run/secrets/nats.io"

Then, within the Pod, the created token can be used to authenticate against the server.

$ kubectl -n nats-io attach -it nats-user-pod

/go # nats-sub -s nats://nats-user:`cat /var/run/secrets/nats.io/token`@example-nats:4222 hello.world
Listening on [hello.world]
^C
/go # nats-sub -s nats://nats-admin-user:`cat /var/run/secrets/nats.io/token`@example-nats:4222 hello.world
Can't connect: nats: authorization violation

Using a single secret with explicit configuration

Authorization can also be set for the server by using a secret where the permissions are defined in JSON:

{
  "users": [
    { "username": "user1", "password": "secret1" },
    { "username": "user2", "password": "secret2",
      "permissions": {
        "publish": ["hello.*"],
        "subscribe": ["hello.world"]
      }
    }
  ],
  "default_permissions": {
    "publish": ["SANDBOX.*"],
    "subscribe": ["PUBLIC.>"]
  }
}

Example of creating a secret to set the permissions:

kubectl create secret generic nats-clients-auth --from-file=clients-auth.json

Now when creating a NATS cluster it is possible to set the permissions as in the following example:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-auth"
spec:
  size: 3
  version: "1.1.0"

  auth:
    # Definition in JSON of the users permissions
    clientsAuthSecret: "nats-clients-auth"

    # How long to wait for authentication
    clientsAuthTimeout: 5
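
Clients can then authenticate with the credentials defined in the secret. For example, a sketch using the nats-sub tool shown earlier, assuming the cluster's client service is reachable as example-nats-auth:

$ nats-sub -s nats://user2:secret2@example-nats-auth:4222 hello.world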

Configuration Reload

On Kubernetes v1.12+ clusters it is possible to enable on-the-fly configuration reload for the servers that are part of the cluster. This can also be combined with the authorization support, so that if the user permissions change, the servers will reload and apply the new permissions.

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-auth"
spec:
  size: 3
  version: "1.1.0"

  pod:
    # Enable on-the-fly NATS Server config reload
    # NOTE: Only supported in Kubernetes v1.12+.
    enableConfigReload: true

    # Possible to customize version of reloader image
    reloaderImage: connecteverything/nats-server-config-reloader
    reloaderImageTag: "0.2.2-v1alpha2"
    reloaderImagePullPolicy: "IfNotPresent"
  auth:
    # Definition in JSON of the users permissions
    clientsAuthSecret: "nats-clients-auth"

    # How long to wait for authentication
    clientsAuthTimeout: 5
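
With config reload enabled, updating the mounted secret is enough to roll out permission changes. A sketch of updating the clients-auth secret in place after editing clients-auth.json (a standard kubectl idiom, not specific to the operator):

$ kubectl create secret generic nats-clients-auth --from-file=clients-auth.json \
    --dry-run=client -o yaml | kubectl apply -f -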

Connecting operated NATS clusters to external NATS clusters

By using the extraRoutes field on the spec you can make the operated NATS cluster create routes against clusters outside of Kubernetes:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "nats"
spec:
  size: 3
  version: "1.4.1"

  extraRoutes:
    - route: "nats://nats-a.example.com:6222"
    - route: "nats://nats-b.example.com:6222"
    - route: "nats://nats-c.example.com:6222"

It is also possible to connect to another operated NATS cluster as follows:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "nats-v2-2"
spec:
  size: 3
  version: "1.4.1"

  extraRoutes:
    - cluster: "nats-v2-1"

Resolvers

The operator only supports the URL() resolver; see example/example-super-cluster.yaml.

Development

Building the Docker Image

To build the nats-operator Docker image:

$ docker build -f docker/operator/Dockerfile . -t <image:tag>

To build the nats-server-config-reloader:

$ docker build -f docker/reloader/Dockerfile . -t <image:tag>

You'll need Docker 17.06.0-ce or higher.

nats-operator's People

Contributors

arminbuerkle, benconstable, bmcustodio, devklauss, gcolliso, georgesapkin, matt-christiansen-exa, matthiashanel, michaelsp, mjpitz, mtryfoss, nberlee, nsurfer, patricio78, philpennock, pires, prune998, rayjanoka, richerlariviere, schancel, schantaraud, shaunc, siredmar, smana, squat, ttjiaa, variadico, wallyqs, whynowy, will2817


nats-operator's Issues

service?

How to create a service?

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "cryptovalue-nats"
spec:
  size: 3
  version: "1.1.0"

Helm chart

First of all, thank you for your amazing work on Nats and particularly nats-operator.

What do you think of creating a Helm chart for nats-operator? That would be nice I think because we could do something like this:

helm install nats-operator

I have some knowledge in creating helm charts but my global understanding of nats is not so great so I can help creating a base chart for a minimal setup. Edge cases and more specific uses of nats could be supported later. What do you think?

AKS TLS

Has anyone gotten the nats-operator running in AKS using TLS?

Documentation used to generate .pem files for cert / server-cert / ca:
https://docs.docker.com/engine/security/https/

Nats:
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
name: "nats-example"
spec:
size: 3

auth:
# Definition in JSON of the users permissions
clientsAuthSecret: "nats-clients-auth"

tls:
# Certificates to secure the NATS client connections:
serverSecret: "nats-clients-tls"

The pod starts:
[1] 2019/02/01 01:30:10.672066 [INF] Starting nats-server version 1.3.0
[1] 2019/02/01 01:30:10.672096 [INF] Git commit [eed4fbc]
[1] 2019/02/01 01:30:10.672214 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2019/02/01 01:30:10.672239 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2019/02/01 01:30:10.672266 [INF] TLS required for client connections
[1] 2019/02/01 01:30:10.672269 [INF] Server is ready
[1] 2019/02/01 01:30:10.672400 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2019/02/01 01:30:10.678703 [INF] 10.244.1.22:6222 - rid:1 - Route connection created
[1] 2019/02/01 01:30:10.678811 [INF] 10.244.2.19:6222 - rid:2 - Route connection created
[1] 2019/02/01 01:30:10.680400 [INF] 10.244.2.19:54850 - rid:3 - Route connection created

But then immediately start receiving these errors:
[1] 2019/02/01 01:31:28.071952 [ERR] 10.240.0.4:56227 - cid:31 - TLS handshake error: read tcp 10.244.2.19:4222->10.240.0.4:56227: i/o timeout
[1] 2019/02/01 01:31:28.072088 [ERR] 10.240.0.4:56227 - cid:31 - TLS handshake timeout

Revamp upgrade

Need to find a way to upgrade live clusters with minimum disruption, e.g. no message loss.

RBAC support

Currently it is not possible to run the provided resources when RBAC is enabled.

can't set NatsCluster pods to specific nodes

I am trying to have all the nats related pods on particular nodes. I have everything but the NatsCluster pods working

---
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "my-nats"
spec:
  size: 3
  template:
    spec:
      nodeSelector:
        my.pool: nats

I have the nodeselector in place for NatsStreamingCluster and it works.
But for NatsCluster, it puts the pods randomly on different nodes.

Deploy in namespace without using a ClusterRole

I want nats operator to be fully self-contained in a namespace, without requiring cluster level permissions. To that end I modified the yamls to use Role instead of ClusterRole. But I get this error from nats operator:

time="2018-11-30T18:40:50Z" level=error msg="initialization failed: fail to create CRD: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:molly-dev:nats-operator\" cannot create resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope" pkg=controller

Why does nats-operator need to create customresourcedefinitions?

I've been googling for the issue, and the closest thing I can find that is relevant is the etcd-operator: https://github.com/coreos/etcd-operator/blob/master/doc/user/rbac.md#role-vs-clusterrole

--create-crd=false Creates a CR without first creating a CRD.
In this mode the operator can be run with just a Role without the permission to create a CRD.

Maybe nats operator needs a similar option?

Here's my full yaml file:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nats-operator
rules:
# Allow creating CRDs
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs: ["*"]
# Allow all actions on NatsClusters
- apiGroups:
  - nats.io
  resources:
  - natsclusters
  - natsserviceroles
  verbs: ["*"]
# Allow actions on basic Kubernetes objects
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - pods
  - services
  - serviceaccounts
  - serviceaccounts/token
  - endpoints
  - events
  verbs: ["*"]

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nats-operator

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nats-operator-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nats-operator
subjects:
- kind: ServiceAccount
  name: nats-operator

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: natsclusters.nats.io
spec:
  group: nats.io
  names:
    kind: NatsCluster
    listKind: NatsClusterList
    plural: natsclusters
    singular: natscluster
    shortNames:
    - nats
  scope: Namespaced
  version: v1alpha2

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: natsserviceroles.nats.io
spec:
  group: nats.io
  names:
    kind: NatsServiceRole
    listKind: NatsServiceRoleList
    plural: natsserviceroles
    singular: natsservicerole
  scope: Namespaced
  version: v1alpha2

---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nats-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nats-operator
  template:
    metadata:
      labels:
        name: nats-operator
    spec:
      serviceAccountName: nats-operator
      containers:
      - name: nats-operator
        image: connecteverything/nats-operator:0.3.0-v1alpha2
        imagePullPolicy: Always
        ports:
        - name: readyz
          containerPort: 8080
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        readinessProbe:
          httpGet:
            path: /readyz
            port: readyz
          initialDelaySeconds: 15
          timeoutSeconds: 3

Support Cluster-Wide Operator

Would love for the nats-operator to function in a similar way to istio where there is a management namespace and the operator can watch all namespaces (w/ opt-out). Our main use case for this is that we leverage namespaces at the environment level.

My ideal setup with where resources are defined would be as follows:

Cluster

  • ClusterRole
  • CRDs

ns: nats-system

  • ServiceAccount
  • Deployment
  • ClusterRoleBinding

ns: dev

  • NatsCluster (size: 1)

ns: staging

  • NatsCluster (size: 3)

ns: production

  • NatsCluster (size: 3)

Readiness probe never passes when updating nats-operator deployment

When making changes to the nats-operator deployment, the pod in the new ReplicaSet from the updated deployment never passes its readiness probe until the previous nats-operator pod is manually deleted.

Example

For example, I apply the updated deployment (with a newer nats-operator image version):

kubectl apply -f ./nats-operator/ --namespace my-namespace

I observe a new pod is created:

kubectl get pods -l app=nats-operator --namespace=my-namespace
NAME                             READY   STATUS    RESTARTS   AGE
nats-operator-68856b7bb6-tlghf     1/1     Running   0          4h17m
nats-operator-68856b7bb6-ntpwj     0/1     Running   0          15m

But the pod is failing the readiness probe:

kubectl describe pod nats-operator-68856b7bb6-ntpwj --namespace=my-namespace

Name:           nats-operator-68856b7bb6-ntpwj
Namespace:      my-namespace
…
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
…
Warning  Unhealthy              3m29s (x149 over 28m)  kubelet, gke-my-cluster-default-pool-xxx  Readiness probe failed: HTTP probe failed with statuscode: 500

If I delete the old pod:

kubectl delete pod nats-operator-5fc7849b6-tlghf --namespace=my-namespace

The new one becomes Ready:

kubectl get pods -l app=nats-operator --namespace=my-namespace
NAME                             READY   STATUS    RESTARTS   AGE
nats-operator-68856b7bb6-ntpwj   1/1     Running   0          33m

Potential Solution

Change nats-operator deployment to use Recreate strategy (default is a Rolling Update) so the old pod is deleted first:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
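
A sketch of that change on the nats-operator Deployment (only the strategy field is shown; the rest of the manifest is assumed to stay as shipped):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nats-operator
spec:
  strategy:
    type: Recreate  # delete the old pod before creating the new one
  # (remaining fields unchanged)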

But this would mean nats-operator might not exist for a short period; is that actually a problem? The NATS cluster should still be in place and functional during that time.

Didn't get any resource after apply example/deployment.yaml

Hi

I'm trying the example in README.md

I did kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/master/example/deployment.yaml

But kubectl get crd got nothing.

Here is my kubectl get all:

NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/nats-operator   1         1         1            1           8m

NAME                          DESIRED   CURRENT   READY     AGE
rs/nats-operator-7b89ff4879   1         1         1         8m

NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/nats-operator   1         1         1            1           8m

NAME                          DESIRED   CURRENT   READY     AGE
rs/nats-operator-7b89ff4879   1         1         1         8m

NAME                                READY     STATUS    RESTARTS   AGE
po/nats-operator-7b89ff4879-hqm5j   1/1       Running   0          8m

NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   ClusterIP   10.11.240.1   <none>        443/TCP   49m

Support config reload

When certain updates are applied to the configmap, it would be convenient to trigger a reload of the NATS server so that the new config is applied.

How to update server config?

Hello,

I've been trying to figure out how to set a custom configuration for the NATS Cluster, and I couldn't find anything in the docs or looking for uses of ServerConfig in the code.

Any leads or clues on how to set it?
My specific need is to find out how to increase max_payload_size.

Thanks!

Exposing the NATS server with an External IP

In my use case, subscribers can come from outside the K8s cluster, and thus I need an external IP for the messaging port. What would be the right design to accomplish this in the chart as a configurable parameter?

Kubernetes ServiceAccounts Integration with NATS Authorization

In the last release, support was added for authorization using a custom secret where the credentials are defined in JSON, but we might be able to simplify this by using the service accounts already present in Kubernetes.

Below is a full example of a manifest creating 2 pods, one binding to the default service account and the other to a new service account, plus a couple of custom ServiceRoles (a new CRD managed by the operator) created to define the pub/sub permissions. A NATS client running in one of the pods would be able to use the token provided by the service account, and in case of changes the configuration would be reloaded with the new authorization rules.

# Container using a new service account
nats-sub -s nats://demo-nats-service-account:`cat /var/run/secrets/kubernetes.io/serviceaccount/token`@demo-nats:4222 SANDBOX.hello

# Container using the default service account
nats-sub -s nats://default:`cat /var/run/secrets/kubernetes.io/serviceaccount/token`@demo-nats:4222 SANDBOX.hello

Example manifest of how this would work:

---
apiVersion: nats.io/v1alpha3
kind: NatsCluster
metadata:
  name: demo-nats
spec:
  size: 3
  version: "1.1.0"
  pod:
    enableConfigReload: true
  auth:
    enableServiceAccounts: true
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-nats-service-account
---
apiVersion: nats.io/v1alpha3
kind: ServiceRole
metadata:
  name: demo-nats-role
  namespace: default

  # Specifies which NATS cluster will be mapping this account,
  # (have to create a service role with permission per cluster).
  labels:
    nats_cluster: demo-nats
spec:
  serviceAccountName: demo-nats-service-account
  permissions:
    publish: ["foo.*", "foo.bar.quux"]
    subscribe: ["foo.bar"]
---
apiVersion: nats.io/v1alpha3
kind: ServiceRole
metadata:
  name: demo-nats-default-role
  namespace: default

  # Specifies which NATS cluster will be mapping this account,
  # (have to create a service role with permission per cluster).
  labels:
    nats_cluster: demo-nats
spec:
  serviceAccountName: default
  permissions:
    publish: ["SANDBOX.>"]
    subscribe: ["SANDBOX.>"]
---
apiVersion: nats.io/v1alpha3
kind: ServiceRole
metadata:
  name: default
  namespace: default
  labels:
    nats_cluster: demo-nats
spec:
  serviceAccountName: demo-nats-service-account
  permissions:
    publish: ["foo.*"]
    subscribe: ["foo.bar"]
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-nats-client-pod
spec:
  serviceAccountName: demo-nats-service-account
  restartPolicy: Never
  containers:
    - name: nats-ops
      command: ["/bin/sh"]
      image: "wallyqs/nats-ops:latest"
      tty: true
      stdin: true
      stdinOnce: true
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-nats-client-pod-default-account
spec:
  # No account means using the default service token
  restartPolicy: Never
  containers:
    - name: nats-ops
      command: ["/bin/sh"]
      image: "wallyqs/nats-ops:latest"
      tty: true
      stdin: true
      stdinOnce: true

Support installing the CRD from k8s manifests instead of from the operator

Instead of creating the CRDs from the code, I think it would make sense to create them from k8s manifests while installing the operator itself. This could be done for the helm chart or for all installs.

This would allow:

  • using the crd-install helm hook in helm charts
  • getting rid of a rule in Role definitions when using rbac required to install a CRD:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs: ["*"]

Implementation:

  • support a command line flag to toggle crd install from the operator
  • in the helm chart, disable crd installs from the operator
  • in the helm chart, remove the rule for CRDs CRUD

Operator exits on disconnect from API server

We should change it so that it retries and reconnects without causing it to restart.

leaderelection.RunOrDie(context.TODO(), leaderelection.LeaderElectionConfig{
	Lock:          rl,
	LeaseDuration: 15 * time.Second,
	RenewDeadline: 10 * time.Second,
	RetryPeriod:   2 * time.Second,
	Callbacks: leaderelection.LeaderCallbacks{
		OnStartedLeading: func(ctx context.Context) {
			run(ctx, kubeCfg, kubeClient)
		},
		OnStoppedLeading: func() {
			logrus.Fatalf("leader election lost")
		},
	},
})

error viewing pod in gui

Everything works just fine, but if I try to look at a pod from the GUI I get an error (screenshot not included).

RBAC is enabled and I've installed the RBAC YAML.

Helm chart: support permissions configuration

This is an issue for the future me when the PR #62 is merged. Based on this discussion, we want to be able to configure NATS like this:

{
  "users": [
    { "username": "user1", "password": "secret1" },
    { "username": "user2", "password": "secret2",
      "permissions": {
        "publish": ["hello.*"],
        "subscribe": ["hello.world"]
      }
    }
  ],
  "default_permissions": {
    "publish": ["SANDBOX.*"],
    "subscribe": ["PUBLIC.>"]
  }
}

As of now we only support creating multiple users.

Remove adhoc natsconf config generation

In latest release of the server v1.1.0, JSON would be supported for configuring the server: nats-io/nats-server#653
This would mean that we can remove the current config generation and just convert to JSON:

func Marshal(conf *ServerConfig) ([]byte, error) {
	js, err := json.MarshalIndent(conf, "", " ")
	if err != nil {
		return nil, err
	}
	if len(js) < 1 || len(js)-1 <= 1 {
		return nil, ErrInvalidConfig
	}
	// Slice the initial and final brackets from the
	// resulting JSON configuration so gnatsd config parsers
	// almost treats it as valid config.
	js = js[1:]
	js = js[:len(js)-1]
	// Replacing all commas with line breaks still keeps
	// arrays valid and makes the top level configuration
	// be able to be parsed as gnatsd config.
	result := bytes.Replace(js, []byte(","), []byte("\n"), -1)
	return result, nil
}

Support NATS Streaming

At first let me say that this is super cool, love the concept of operators & thanks for doing this with NATS.

I was wondering (and would like to see) if there are any plans to support NATS Streaming?

can't lock NatsCluster to certain nodes

I am trying to have all the nats related pods on particular nodes. I have everything but the NatsCluster pods working

---
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "my-nats"
spec:
  size: 3
  template:
    spec:
      nodeSelector:
        my.pool: nats

I have the nodeselector in place for NatsStreamingCluster and it works.
But for NatsCluster, it puts the pods randomly on different nodes.

Missing client authentication enabled check in natscluster.yaml helm template

Currently the auth section in natscluster.yaml does not check if client authentication is set to enabled in values.yaml.

It is missing {{- if .Values.auth.enabled }}

  {{- if .Values.auth.enabled }}
  auth:
    enableServiceAccounts: {{ .Values.auth.enableServiceAccounts }}
    clientsAuthSecret: {{ template "nats.fullname" . }}-clients-auth
    clientsAuthTimeout: 5
  {{- end }}

Official docker images

Would be great to provide official or nats supported docker images to use for the nats-operator and reloader.

TLS Route Certificates need IP SANs when Deployed to same IP

I know this is not a NATS issue, but I wanted to point out an issue we are experiencing due to the nature of k8s deployments and the way the operator is currently configured.

Also, I'm not sure if this is related to #32

I'm using a wildcard CN *.nats-mgmt.nats-io.svc for the routes and when a cluster node is deployed with the same IP Address as another node, I receive the following error:

[1] 2018/10/12 18:46:42.988808 [ERR] 172.28.219.187:6222 - rid:4 - TLS route handshake error: x509: cannot validate certificate for 172.28.219.187 because it doesn't contain any IP SANs

I could add IP SANs to my certificate, but that seems crazy since the IP is unpredictable (and IP SANs can't use wildcards AFAIK).

Is there a strategy I'm missing? Perhaps we need an option to skip hostname verification? Or better yet, force deployment to different IPs?

Add Cluster/Client Advertise to cluster config

Currently, when TLS is set up for clients and routes, the NATS clients receive routes as IPs via auto discovery, so hostname verification would fail when reconnecting to one of these IPs. Setting the pods to advertise the A record that they get in the cluster instead would allow clients to fail over to an available node right away.

https://github.com/nats-io/gnatsd/blob/163ba3f6a7521300a4f578eb90b9ff5d40d658c4/main.go#L24

https://github.com/nats-io/gnatsd/blob/163ba3f6a7521300a4f578eb90b9ff5d40d658c4/main.go#L51
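
For illustration, a hedged sketch of what the advertised addresses might look like in the generated NATS config (using the server's client_advertise and cluster advertise settings; the pod DNS name is illustrative):

# nats.conf fragment (sketch)
client_advertise: "nats-1.nats-mgmt.nats-io.svc:4222"
cluster {
  advertise: "nats-1.nats-mgmt.nats-io.svc:6222"
}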

logs being spammed when a pod is deleted

I'm testing the resilience of an app of mine that uses NATS, so I've been removing pods from the cluster and waiting for new ones to come up. They do come up and everything seems to be working fine; however, logs like these are being spammed:

nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:27.277348 [ERR] Error trying to connect to route: dial tcp: lookup nats-glbxsvxbb3.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:27.360110 [ERR] Error trying to connect to route: dial tcp: lookup nats-jftp8nrf0r.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-xnqsk7ncm4 nats [1] 2018/04/20 19:40:27.690995 [ERR] Error trying to connect to route: dial tcp: lookup nats-jftp8nrf0r.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:28.285597 [ERR] Error trying to connect to route: dial tcp: lookup nats-glbxsvxbb3.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:28.364757 [ERR] Error trying to connect to route: dial tcp: lookup nats-jftp8nrf0r.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-xnqsk7ncm4 nats [1] 2018/04/20 19:40:28.699119 [ERR] Error trying to connect to route: dial tcp: lookup nats-jftp8nrf0r.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:29.292426 [ERR] Error trying to connect to route: dial tcp: lookup nats-glbxsvxbb3.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host
nats-4xwt4vd2gl nats [1] 2018/04/20 19:40:29.376717 [ERR] Error trying to connect to route: dial tcp: lookup nats-jftp8nrf0r.nats-cluster-1-mgmt.nats-io.svc on 100.64.0.10:53: no such host

it seems that nats will never stop trying to find the members that I've deleted...

Nats-Operator incompatible with istio?

When I follow the instructions in the project readme to create a NATS cluster with 3 members on a GKE cluster using Istio, all three members immediately show unhealthy and quickly go to CrashLoopBackOff. Is there something additional I need to do to get nats-operator to play nice with a service mesh?

My Nats Cluster:

echo '
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-cluster"
spec:
  size: 3
  version: "1.3.0"
' | kubectl apply -f -

Log from one member:

[1] 2018/10/30 20:27:15.907885 [INF] Starting nats-server version 1.3.0
[1] 2018/10/30 20:27:15.907943 [INF] Git commit [eed4fbc]
[1] 2018/10/30 20:27:15.908133 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/10/30 20:27:15.908194 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/10/30 20:27:15.908208 [INF] Server is ready
[1] 2018/10/30 20:27:15.908541 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/10/30 20:27:15.914868 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:16.930604 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:17.935214 [INF] 10.12.12.4:6222 - rid:1 - Route connection created
[1] 2018/10/30 20:27:17.940613 [INF] 127.0.0.1:41486 - rid:2 - Route connection created
[1] 2018/10/30 20:27:18.962862 [INF] 10.12.12.4:6222 - rid:3 - Route connection created

(and the Route connection messages continue 290 times before the container is shut down as unhealthy)

My Istio deployment is the default Istio App from the GCP marketplace, with three nodes in it.
K8S version info:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.6", GitCommit:"9b635efce81582e1da13b35a7aa539c0ccb32987", GitTreeState:"clean", BuildDate:"2018-08-16T21:33:47Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

istio-pilot version is 1.3

I'd be happy to add more detail if there are follow up questions. I can also cross-post this issue to Istio if the problem appears to be on their side...

Unable to recover from node crash

I was testing this locally until the node crashed (minikube). Now I get this constantly:

time="2018-12-05T16:34:51Z" level=info msg="Cluster size needs reconciling: expected 3, has 0" cluster-name=nats pkg=cluster
time="2018-12-05T16:34:51Z" level=error msg="failed to reconcile: pods \"nats-1\" already exists" cluster-name=nats pkg=cluster

The describe on nats-1 (it's permanently terminated):

Name:           nats-1
Namespace:      nats-io
Node:           minikube/10.0.2.15
Start Time:     Wed, 05 Dec 2018 11:46:42 +0100
Labels:         app=nats
                nats_cluster=nats
                nats_version=1.3.0
Annotations:    nats.version=1.3.0
Status:         Failed
IP:             
Controlled By:  NatsCluster/nats
Containers:
  nats:
    Container ID:  docker://b0892b87b0ac1d25ff64e197a53e138b3e635f76f74589924d16eafa3a55d50e
    Image:         nats:1.3.0
    Image ID:      docker-pullable://nats@sha256:5e99caf7ca7b2e4a242e741328bde393bbd7a529a2cfdd19b84870da87ad6ca1
    Ports:         6222/TCP, 4222/TCP, 8222/TCP
    Command:
      /gnatsd
      -c
      /etc/nats-config/nats.conf
      -P
      /var/run/nats/gnatsd.pid
    State:          Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 05 Dec 2018 11:46:49 +0100
      Finished:     Wed, 05 Dec 2018 17:22:54 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:8222/ delay=10s timeout=10s period=60s #success=1 #failure=3
    Environment:
      SVC:    nats-mgmt
      EXTRA:  --http_port=8222
    Mounts:
      /etc/nats-config from nats-config (rw)
      /var/run/nats from pid (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pzlst (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  nats-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nats
    Optional:    false
  pid:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-pzlst:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pzlst
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

Operator crashes constantly on Azure AKS

I am trying to deploy NATS using NATS Operator on a 3-node Kubernetes (v1.9.6) cluster on Azure, following this project's readme.md. While the NATS pods appear to be fine, the operator is in a never-ending crash/restart loop.

The k8s resources and logs of the NATS pods are listed below. At the very end of the operator's log, an error and a fatal entry are the last things logged before the operator goes down.

PS C:\src\aks_test> kubectl get all -o wide
NAME                                 READY     STATUS             RESTARTS   AGE       IP           NODE
pod/example-nats-cluster-1           1/1       Running            0          24m       10.244.2.5   aks-nodepool1-32539510-1
pod/example-nats-cluster-2           1/1       Running            0          23m       10.244.1.5   aks-nodepool1-32539510-2
pod/example-nats-cluster-3           1/1       Running            0          23m       10.244.0.4   aks-nodepool1-32539510-0
pod/nats-operator-7fdf945577-jxg5s   0/1       CrashLoopBackOff   7          24m       10.244.1.4   aks-nodepool1-32539510-2

NAME                                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE       SELECTOR
service/example-nats-cluster        ClusterIP   10.0.5.144   <none>        4222/TCP            24m       app=nats,nats_cluster=example-nats-cluster
service/example-nats-cluster-mgmt   ClusterIP   None         <none>        6222/TCP,8222/TCP   24m       app=nats,nats_cluster=example-nats-cluster,nats_version=1.1.0
service/kubernetes                  ClusterIP   10.0.0.1     <none>        443/TCP             30m       <none>

NAME                                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS      IMAGES                                           SELECTOR
deployment.extensions/nats-operator   1         1         1            0           24m       nats-operator   connecteverything/nats-operator:0.2.2-v1alpha2   name=nats-operator

NAME                                             DESIRED   CURRENT   READY     AGE       CONTAINERS      IMAGES                                           SELECTOR
replicaset.extensions/nats-operator-7fdf945577   1         1         0         24m       nats-operator   connecteverything/nats-operator:0.2.2-v1alpha2   name=nats-operator,pod-template-hash=3989501133

NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS      IMAGES                                           SELECTOR
deployment.apps/nats-operator   1         1         1            0           24m       nats-operator   connecteverything/nats-operator:0.2.2-v1alpha2   name=nats-operator

NAME                                       DESIRED   CURRENT   READY     AGE       CONTAINERS      IMAGES                                           SELECTOR
replicaset.apps/nats-operator-7fdf945577   1         1         0         24m       nats-operator   connecteverything/nats-operator:0.2.2-v1alpha2   name=nats-operator,pod-template-hash=3989501133
PS C:\src\aks_test> kubectl logs example-nats-cluster-1
[1] 2018/06/24 07:57:54.986814 [INF] Starting nats-server version 1.1.0
[1] 2018/06/24 07:57:54.986911 [INF] Git commit [add6d79]
[1] 2018/06/24 07:57:54.987207 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/06/24 07:57:54.987305 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/06/24 07:57:54.987370 [INF] Server is ready
[1] 2018/06/24 07:57:54.987877 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/06/24 07:58:00.347433 [INF] 10.244.1.5:52728 - rid:1 - Route connection created
[1] 2018/06/24 07:58:07.904174 [INF] 10.244.0.4:60902 - rid:2 - Route connection created
PS C:\src\aks_test> kubectl logs example-nats-cluster-2
[1] 2018/06/24 07:58:00.328032 [INF] Starting nats-server version 1.1.0
[1] 2018/06/24 07:58:00.328063 [INF] Git commit [add6d79]
[1] 2018/06/24 07:58:00.328240 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/06/24 07:58:00.328276 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/06/24 07:58:00.328286 [INF] Server is ready
[1] 2018/06/24 07:58:00.328526 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/06/24 07:58:00.336456 [INF] 10.244.2.5:6222 - rid:1 - Route connection created
[1] 2018/06/24 07:58:00.373683 [ERR] Error trying to connect to route: dial tcp: lookup example-nats-cluster-2.example-nats-cluster-mgmt.default.svc on 10.0.0.10:53: no such host
[1] 2018/06/24 07:58:01.377851 [INF] 10.244.1.5:43288 - rid:2 - Route connection created
[1] 2018/06/24 07:58:01.378108 [INF] 10.244.1.5:6222 - rid:3 - Route connection created
[1] 2018/06/24 07:58:07.893824 [INF] 10.244.0.4:60642 - rid:4 - Route connection created
PS C:\src\aks_test> kubectl logs example-nats-cluster-3
[1] 2018/06/24 07:58:07.882819 [INF] Starting nats-server version 1.1.0
[1] 2018/06/24 07:58:07.882851 [INF] Git commit [add6d79]
[1] 2018/06/24 07:58:07.883013 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/06/24 07:58:07.883045 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/06/24 07:58:07.883055 [INF] Server is ready
[1] 2018/06/24 07:58:07.883346 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/06/24 07:58:07.893434 [INF] 10.244.2.5:6222 - rid:1 - Route connection created
[1] 2018/06/24 07:58:07.893807 [INF] 10.244.1.5:6222 - rid:2 - Route connection created
[1] 2018/06/24 07:58:07.901949 [ERR] Error trying to connect to route: dial tcp: lookup example-nats-cluster-3.example-nats-cluster-mgmt.default.svc on 10.0.0.10:53: no such host
[1] 2018/06/24 07:58:08.906155 [INF] 10.244.0.4:6222 - rid:3 - Route connection created
[1] 2018/06/24 07:58:08.906461 [INF] 10.244.0.4:36868 - rid:4 - Route connection created
PS C:\src\aks_test> kubectl logs -f nats-operator-7fdf945577-jxg5s
time="2018-06-24T08:22:07Z" level=info msg="nats-operator Version: 0.2.2-v1alpha2+git"
time="2018-06-24T08:22:07Z" level=info msg="Git SHA: fb2847b"
time="2018-06-24T08:22:07Z" level=info msg="Go Version: go1.9"
time="2018-06-24T08:22:07Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-06-24T08:22:08Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"default\", Name:\"nats-operator\", UID:\"3902fb5a-7784-11e8-b0cb-0a58ac1f054
6\", APIVersion:\"v1\", ResourceVersion:\"3266\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' nats-operator-7fdf945577-jxg5s became leader"
time="2018-06-24T08:22:08Z" level=info msg="finding existing clusters..." pkg=controller
time="2018-06-24T08:22:08Z" level=info msg="starts running from watch version: 3270" pkg=controller
time="2018-06-24T08:22:08Z" level=info msg="start running..." cluster-name=example-nats-cluster pkg=cluster
time="2018-06-24T08:22:08Z" level=info msg="start watching at 3270" pkg=controller
time="2018-06-24T08:23:08Z" level=error msg="received invalid event from API server: fail to decode raw event from apiserver (unexpected EOF)" pkg=controller
time="2018-06-24T08:23:08Z" level=fatal msg="controller Run() ended with failure: fail to decode raw event from apiserver (unexpected EOF)"

Full repro:

Prerequisites:

$ az login
$ az provider register -n Microsoft.Network
$ az provider register -n Microsoft.Storage
$ az provider register -n Microsoft.Compute
$ az provider register -n Microsoft.ContainerService
$ az group create --name aks --location westeurope
$ az aks create --resource-group aks --name aksCluster --node-count 3 --generate-ssh-keys #Create k8s cluster with 3 nodes, this will take 10 to 15 minutes
$ az aks install-cli #Install kubectl, may not be necessary if already installed or if running Azure Cloud Shell
$ az aks get-credentials --resource-group aks --name aksCluster #Setup kubectl to connect to our Azure k8s cluster
$ kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/88f19bcd7da571a3004c364859ebbce3202c510e/example/deployment.yaml
$ echo '
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-cluster"
spec:
  size: 3
  version: "1.1.0"
' | kubectl apply -f -

$ az group delete --name aks --yes --no-wait #Delete everything once you don't need the cluster anymore

No clusters created after CRD creation. RBAC enabled.

Hey,

I'm using a local cluster based on https://github.com/Mirantis/kubeadm-dind-cluster, version 1.11, with 3 nodes:

NAME          STATUS    ROLES     AGE       VERSION  
kube-master   Ready     master    30m       v1.11.0
kube-node-1   Ready     <none>    29m       v1.11.0
kube-node-2   Ready     <none>    29m       v1.11.0
kube-node-3   Ready     <none>    29m       v1.11.0

RBAC is enabled, so I executed:
kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/master/example/deployment-rbac.yaml
which ended up with:

$ kubectl -n nats-io logs deployment/nats-operator
time="2018-08-24T11:20:15Z" level=info msg="nats-operator Version: 0.2.3-v1alpha2+git"
time="2018-08-24T11:20:15Z" level=info msg="Git SHA: d88048a"
time="2018-08-24T11:20:15Z" level=info msg="Go Version: go1.9"
time="2018-08-24T11:20:15Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-08-24T11:20:33Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"nats-io\", Name:\"nats-operator\", UID:\"fd436814-a78d-11e8-9920-e20744c33dad\", APIVersion:\"v1\", ResourceVersion:\"3701\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' nats-operator-f44c5854d-ph6cg became leader"
time="2018-08-24T11:20:33Z" level=info msg="finding existing clusters..." pkg=controller
time="2018-08-24T11:20:33Z" level=info msg="starts running from watch version: 3702" pkg=controller
time="2018-08-24T11:20:33Z" level=info msg="start watching at 3702" pkg=controller
time="2018-08-24T11:22:16Z" level=info msg="apiserver closed watch stream, retrying after 5s..." pkg=controller
time="2018-08-24T11:22:21Z" level=info msg="start watching at 3702" pkg=controller
time="2018-08-24T11:23:52Z" level=info msg="apiserver closed watch stream, retrying after 5s..." pkg=controller
time="2018-08-24T11:23:57Z" level=info msg="start watching at 3702" pkg=controller

so far so good BUT then

$ kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/master/example/example-nats-cluster.yaml
natscluster.nats.io/example-nats-1 created

and then:

$ kubectl get natsclusters.nats.io
NAME             AGE                                                         
example-nats-1   2s

but sadly

$ kubectl -n nats-io get pods -l nats_cluster=example-nats-1
No resources found.

There are no logs in the operator and no other pods whatsoever. I tried to check what it actually tries to do, but it just seems to be stalled. I also recreated the deployment, with the same result.

Failed to re-create cluster

I created a sample cluster with

kc apply -f example-nats-cluster.yaml -n nats-io

After playing with the cluster a while, I did

kc delete -f example-nats-cluster.yaml -n nats-io

and then

kc apply -f example-nats-cluster.yaml -n nats-io

again. But it failed this time, describe shows the following reason:

cluster create: fail to create shared config map: configmaps "example-nats-1" already exists

Allow setting custom image for the NATS server

It should be possible to set a custom image for the server, for example:

apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-1"
spec:
  size: 3
  # version "1.3.0"
  image: "nats:1.3.0"
