
timflannagan commented on June 25, 2024

@nioshield Awesome, thanks for the link! Okay, it sounds like this is a known issue that folks have seen before.

cwen0 commented on June 25, 2024

Delete the chaos manager Pod in the "management", "base", etc. cluster (i.e. the cluster where those RemoteCluster CRs live) and re-create the same PodChaos resource that was previously working

The master chaos manager pod doesn't communicate with the remote chaos manager pod directly; it communicates with the remote cluster's API server. Can you check whether the target RemoteCluster CRD exists and whether the remote cluster's kubeconfig has changed?
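Since the manager only talks to the remote API server, one way to check both points from the base cluster is to follow the RemoteCluster's `spec.kubeConfig.secretRef` to its Secret, decode the stored kubeconfig, and query the remote API server with it. This is a minimal sketch, not part of Chaos Mesh: `check_remote_cluster` is a hypothetical helper name, and it assumes bash, GNU `base64`, and a current kubectl context that points at the base cluster. If the stored server address is only reachable from inside the Docker network, run it from a host that can reach that network.

```bash
# Hypothetical helper (not part of Chaos Mesh): follow a RemoteCluster's
# spec.kubeConfig.secretRef, decode the stored kubeconfig, and ask the
# remote API server whether it is ready.
# Assumes bash, GNU base64, and a kubectl context pointing at the base cluster.
check_remote_cluster() {
	local rc=$1 refs ns name key data tmp status
	# Namespace, name, and key of the kubeconfig Secret referenced by the CR
	refs=$(kubectl get remotecluster "$rc" -o jsonpath='{.spec.kubeConfig.secretRef.namespace} {.spec.kubeConfig.secretRef.name} {.spec.kubeConfig.secretRef.key}') || return 1
	read -r ns name key <<<"$refs"
	[ -n "$ns" ] && [ -n "$name" ] && [ -n "$key" ] || return 1
	# Base64-encoded kubeconfig stored in the Secret
	data=$(kubectl -n "$ns" get secret "$name" -o "jsonpath={.data.$key}") || return 1
	[ -n "$data" ] || return 1
	tmp=$(mktemp) || return 1
	if ! printf '%s' "$data" | base64 -d > "$tmp"; then
		rm -f "$tmp"
		return 1
	fi
	# /readyz prints "ok" when the remote API server is ready to serve requests
	kubectl --kubeconfig "$tmp" get --raw /readyz
	status=$?
	rm -f "$tmp"
	return "$status"
}
```

For example, `check_remote_cluster cluster-1` should print `ok` if the API server behind `cluster-1-kubeconfig` is reachable and healthy; a connection error points at a stale or unreachable server address in the Secret.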

timflannagan commented on June 25, 2024

Can you check if the target remotecluster crd is exist

Yep, here are the RemoteCluster CRs in my local environment. I'm spinning up three kind clusters, where the kc1 and kc2 aliases point to the kind-cluster-1 and kind-cluster-2 kube contexts. In this case, the base cluster is named "kind-mgmt-cluster", and it houses all the RemoteCluster CRs:

$ kc1 get remotecluster -A
No resources found
$ kc2 get remotecluster -A
No resources found
$ k get remotecluster -A -oyaml
apiVersion: v1
items:
- apiVersion: chaos-mesh.org/v1alpha1
  kind: RemoteCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"chaos-mesh.org/v1alpha1","kind":"RemoteCluster","metadata":{"annotations":{},"name":"cluster-1"},"spec":{"configOverride":{"chaosDaemon":{"hostNetwork":true,"privileged":true,"runtime":"containerd","socketPath":"/run/containerd/containerd.sock"},"controllerManager":{"leaderElection":{"enabled":false},"replicaCount":1},"dashboard":{"create":false}},"kubeConfig":{"secretRef":{"key":"kubeconfig","name":"cluster-1-kubeconfig","namespace":"default"}},"namespace":"chaos-mesh","version":"2.6.3"}}
    creationTimestamp: "2024-03-19T01:34:54Z"
    finalizers:
    - chaos-mesh/remotecluster-controllers
    generation: 2
    name: cluster-1
    resourceVersion: "49064"
    uid: 03c71898-53e9-4172-8b08-c13f1f02f166
  spec:
    configOverride:
      chaosDaemon:
        hostNetwork: true
        privileged: true
        runtime: containerd
        socketPath: /run/containerd/containerd.sock
      controllerManager:
        leaderElection:
          enabled: false
        replicaCount: 1
      dashboard:
        create: false
    kubeConfig:
      secretRef:
        key: kubeconfig
        name: cluster-1-kubeconfig
        namespace: default
    namespace: chaos-mesh
    version: 2.6.3
  status:
    currentVersion: 2.6.3
    observedGeneration: 2
- apiVersion: chaos-mesh.org/v1alpha1
  kind: RemoteCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"chaos-mesh.org/v1alpha1","kind":"RemoteCluster","metadata":{"annotations":{},"name":"cluster-2"},"spec":{"configOverride":{"chaosDaemon":{"hostNetwork":true,"privileged":true,"runtime":"containerd","socketPath":"/run/containerd/containerd.sock"},"controllerManager":{"leaderElection":{"enabled":false},"replicaCount":1},"dashboard":{"create":false}},"kubeConfig":{"secretRef":{"key":"kubeconfig","name":"cluster-2-kubeconfig","namespace":"default"}},"namespace":"chaos-mesh","version":"2.6.3"}}
    creationTimestamp: "2024-03-19T01:34:54Z"
    finalizers:
    - chaos-mesh/remotecluster-controllers
    generation: 2
    name: cluster-2
    resourceVersion: "49075"
    uid: 6bceffa1-1b78-4a01-a41b-cfb1c0d2588d
  spec:
    configOverride:
      chaosDaemon:
        hostNetwork: true
        privileged: true
        runtime: containerd
        socketPath: /run/containerd/containerd.sock
      controllerManager:
        leaderElection:
          enabled: false
        replicaCount: 1
      dashboard:
        create: false
    kubeConfig:
      secretRef:
        key: kubeconfig
        name: cluster-2-kubeconfig
        namespace: default
    namespace: chaos-mesh
    version: 2.6.3
  status:
    currentVersion: 2.6.3
    observedGeneration: 2
kind: List
metadata:
  resourceVersion: ""

I also verified that the CRD exists in both remote clusters:

$ kc1 api-resources | grep remotecluster
remoteclusters                                            chaos-mesh.org/v1alpha1                 false        RemoteCluster
$ kc2 api-resources | grep remotecluster
remoteclusters                                            chaos-mesh.org/v1alpha1                 false        RemoteCluster
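The `k`, `kc1`, and `kc2` commands in the transcript above are shell aliases for the three kind contexts. A minimal equivalent of this setup could look like the sketch below. It assumes kind cluster names `mgmt-cluster`, `cluster-1`, and `cluster-2` (kind names each kubectl context `kind-<name>`) and assumes that `k` targets the base cluster; `create_kind_clusters` is a hypothetical helper name. Functions are used instead of aliases because bash does not expand aliases in non-interactive scripts by default.

```bash
# Hypothetical helper: create the three kind clusters used in this setup.
# kind exposes each cluster as the kubectl context "kind-<name>".
create_kind_clusters() {
	local name
	for name in mgmt-cluster cluster-1 cluster-2; do
		kind create cluster --name "$name" || return 1
	done
}

# Function equivalents of the aliases (k is assumed to target the base cluster).
k()   { kubectl --context kind-mgmt-cluster "$@"; }
kc1() { kubectl --context kind-cluster-1 "$@"; }
kc2() { kubectl --context kind-cluster-2 "$@"; }
```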

and the remote cluster kubeconfig changed

Neither remote cluster was changed when the base cluster's chaos manager pod was restarted: there were no updates to the control plane components of either remote cluster, and no updates to the kubeconfig Secrets that live in the base cluster. Hope that makes sense, but let me know if you need any additional details.
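One hypothetical way to confirm that the kubeconfig Secrets really are untouched is to fingerprint them before and after restarting the base cluster's chaos manager pod and diff the two snapshots. This sketch assumes the Secrets live in the `default` namespace with the data key `kubeconfig` (as in the RemoteCluster specs above), that the current kubectl context is the base cluster, and that GNU `sha256sum` is available; `snapshot_kubeconfig_secrets` is a hypothetical name.

```bash
# Hypothetical helper: print "<secret> <resourceVersion> <sha256 of kubeconfig data>"
# for each named Secret in the default namespace of the current kubectl context.
snapshot_kubeconfig_secrets() {
	local s rv sum
	for s in "$@"; do
		# resourceVersion changes on any update to the Secret object
		rv=$(kubectl -n default get secret "$s" -o jsonpath='{.metadata.resourceVersion}') || return 1
		# Hash of the base64-encoded kubeconfig payload
		sum=$(kubectl -n default get secret "$s" -o jsonpath='{.data.kubeconfig}' | sha256sum | cut -d' ' -f1)
		echo "$s $rv $sum"
	done
}

# Usage:
#   snapshot_kubeconfig_secrets cluster-1-kubeconfig cluster-2-kubeconfig > before.txt
#   (restart the chaos manager pod in the base cluster)
#   snapshot_kubeconfig_secrets cluster-1-kubeconfig cluster-2-kubeconfig > after.txt
#   diff before.txt after.txt && echo "kubeconfig Secrets unchanged"
```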

As an aside, I used the following bash function to generate the kubeconfig files for those kind clusters:

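kind_write_kubeconfig_files () {
	# Context of the management cluster that stores the kubeconfig Secrets
	context=${1:-kind-mgmt-cluster}
	mkdir -p .bin
	for cluster in $(kind get clusters)
	do
		# Skip the management cluster itself
		if [[ $cluster == "mgmt-cluster" ]]
		then
			continue
		fi
		# IP of the control-plane container on the kind Docker network,
		# reachable from the other kind nodes
		external_ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$cluster-control-plane")
		# Replace the host-mapped server address with <container IP>:6443
		kind get kubeconfig --name "$cluster" | sed -E "s/server: https:\/\/[0-9\.]+:[0-9]+/server: https:\/\/$external_ip:6443/g" > ".bin/$cluster.kubeconfig"
		# Store the kubeconfig as a Secret in the management cluster
		kubectl --context "$context" -n default create secret generic "$cluster-kubeconfig" --from-file=kubeconfig=".bin/$cluster.kubeconfig"
	done
}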
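Because this approach bakes each control-plane container's current Docker IP into the Secret, a Docker or host restart that reassigns those IPs could leave the stored kubeconfigs pointing at the wrong address. A hypothetical check (`check_kubeconfig_ip` is an illustrative name) compares the two, reusing the same `docker inspect` template and `<cluster>-kubeconfig` Secret naming. It assumes bash, GNU `base64`, and a current kubectl context that points at the management cluster.

```bash
# Hypothetical helper: compare the server address stored in "<cluster>-kubeconfig"
# (default namespace, current kubectl context) with the control-plane
# container's current Docker IP.
check_kubeconfig_ip() {
	local cluster=$1 ip server
	ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$cluster-control-plane") || return 1
	# Decode the stored kubeconfig and extract the value of its "server:" line
	server=$(kubectl -n default get secret "$cluster-kubeconfig" -o jsonpath='{.data.kubeconfig}' | base64 -d | sed -n -E 's/^[[:space:]]*server:[[:space:]]*//p')
	if [ "$server" = "https://$ip:6443" ]; then
		echo "$cluster: up to date ($server)"
	else
		echo "$cluster: stale (Secret has $server, container has $ip)"
		return 1
	fi
}
```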

I was largely following the multi-cluster documentation and the steps outlined in issue #4150. I can reproduce this consistently, so let me know if more information is needed. Lastly, I tried manually restarting the chaos manager pods in the remote clusters to rule out a race between the base and remote clusters' Chaos Mesh deployments, but that didn't help either.
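Restarting the chaos manager in the remote clusters and waiting for it to become ready again can be done with `kubectl rollout`. This is a sketch that assumes the Helm chart's default Deployment name `chaos-controller-manager` in the `chaos-mesh` namespace (the namespace matches the RemoteCluster specs above); `restart_chaos_manager` is a hypothetical name. Note that `rollout restart` replaces the pod through a new rollout rather than by deleting it directly.

```bash
# Hypothetical helper: restart the chaos controller manager in each given
# kubectl context and wait until the Deployment reports the new pod ready.
restart_chaos_manager() {
	local ctx
	for ctx in "$@"; do
		kubectl --context "$ctx" -n chaos-mesh rollout restart deployment/chaos-controller-manager || return 1
		kubectl --context "$ctx" -n chaos-mesh rollout status deployment/chaos-controller-manager --timeout=120s || return 1
	done
}

# Usage: restart_chaos_manager kind-cluster-1 kind-cluster-2
```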

nioshield commented on June 25, 2024

Delete the chaos manager Pod in the "management", "base", etc. cluster (i.e. the cluster where those RemoteCluster CRs live) and re-create the same PodChaos resource that was previously working

This looks like the problem described in #4208: after you delete the controller pod, the previously registered RemoteCluster cannot be re-registered.
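The reproduction discussed in this thread can be sketched as a script run against the base cluster. Everything here beyond the steps in the thread is an assumption: the helper name `reproduce_remotecluster_restart`, a PodChaos manifest in a hypothetical file `podchaos.yaml` that targets a remote cluster (for example through `spec.remoteCluster: cluster-1`), and the `app.kubernetes.io/component=controller-manager` pod label and `chaos-controller-manager` Deployment name, which are assumed to follow the Helm chart's defaults.

```bash
# Hypothetical reproduction script. Assumptions: base cluster context
# kind-mgmt-cluster, Chaos Mesh in the chaos-mesh namespace with the chart's
# default labels, and a PodChaos manifest that targets a RemoteCluster.
reproduce_remotecluster_restart() {
	local ctx=${1:-kind-mgmt-cluster} manifest=${2:-podchaos.yaml}
	# 1. Apply the PodChaos while the original controller pod is running
	kubectl --context "$ctx" apply -f "$manifest" || return 1
	# 2. Delete the chaos manager pod and wait for the Deployment to be ready again
	kubectl --context "$ctx" -n chaos-mesh delete pod -l app.kubernetes.io/component=controller-manager || return 1
	kubectl --context "$ctx" -n chaos-mesh rollout status deployment/chaos-controller-manager --timeout=120s || return 1
	# 3. Re-create the same PodChaos; per this thread, it no longer takes effect remotely
	kubectl --context "$ctx" delete -f "$manifest" --wait=true || return 1
	kubectl --context "$ctx" apply -f "$manifest"
}
```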
