@nioshield Awesome, thanks for the link! Okay, it sounds like this is a known issue that folks have seen before.
from chaos-mesh.
> Delete the chaos manager Pod in the "management", "base", etc. cluster (i.e. the cluster where those RemoteCluster CRs live) and re-create the same PodChaos resource that was previously working

The master chaos manager pod doesn't communicate with the remote chaos manager pods directly; it talks to the remote cluster's API server. Can you check whether the target RemoteCluster CRD exists, and whether the remote cluster's kubeconfig has changed?
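Both checks can be scripted from the base cluster's kubectl context. The sketch below is hypothetical (the function names are not part of Chaos Mesh) and assumes the Secret layout used by the RemoteCluster specs in this thread, i.e. `spec.kubeConfig.secretRef` with `name`, `namespace`, and `key`. In a kind setup the decoded kubeconfig may point at a Docker network IP, which is not necessarily reachable from the host on every platform.

```bash
# Print the first "server:" URL found in a kubeconfig read from stdin.
kubeconfig_server() {
    sed -n -E 's/^[[:space:]]*server:[[:space:]]*([^[:space:]]+).*/\1/p' | head -n 1
}

# Hypothetical helper: check that RemoteCluster $1 exists in the base cluster
# and that the kubeconfig stored in its Secret can still reach the remote
# API server. Assumes the current kubectl context is the base cluster.
check_remote_cluster() {
    local name=$1 secret ns key tmp rc
    # RemoteCluster is cluster-scoped, so no -n flag is needed here.
    kubectl get remotecluster "$name" >/dev/null || return 1
    secret=$(kubectl get remotecluster "$name" -o jsonpath='{.spec.kubeConfig.secretRef.name}')
    ns=$(kubectl get remotecluster "$name" -o jsonpath='{.spec.kubeConfig.secretRef.namespace}')
    key=$(kubectl get remotecluster "$name" -o jsonpath='{.spec.kubeConfig.secretRef.key}')
    tmp=$(mktemp)
    # Decode the kubeconfig exactly as it is stored in the Secret.
    kubectl -n "$ns" get secret "$secret" -o "jsonpath={.data.$key}" | base64 -d > "$tmp"
    echo "$name: server $(kubeconfig_server < "$tmp")"
    # Ask the remote API server for its readiness status with that kubeconfig.
    kubectl --kubeconfig "$tmp" get --raw /readyz
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage: check_remote_cluster cluster-1 && check_remote_cluster cluster-2
```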
> Can you check if the target remotecluster crd is exist

Yep, here are the RemoteCluster CRs in my local environment. I'm spinning up three kind clusters, where the `kc1` and `kc2` aliases point to the `kind-cluster-1` and `kind-cluster-2` kube contexts. In this case, the base cluster is named "kind-mgmt-cluster" and it houses all the RemoteCluster CRs:
```
$ kc1 get remotecluster -A
No resources found
$ kc2 get remotecluster -A
No resources found
$ k get remotecluster -A -oyaml
apiVersion: v1
items:
- apiVersion: chaos-mesh.org/v1alpha1
  kind: RemoteCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"chaos-mesh.org/v1alpha1","kind":"RemoteCluster","metadata":{"annotations":{},"name":"cluster-1"},"spec":{"configOverride":{"chaosDaemon":{"hostNetwork":true,"privileged":true,"runtime":"containerd","socketPath":"/run/containerd/containerd.sock"},"controllerManager":{"leaderElection":{"enabled":false},"replicaCount":1},"dashboard":{"create":false}},"kubeConfig":{"secretRef":{"key":"kubeconfig","name":"cluster-1-kubeconfig","namespace":"default"}},"namespace":"chaos-mesh","version":"2.6.3"}}
    creationTimestamp: "2024-03-19T01:34:54Z"
    finalizers:
    - chaos-mesh/remotecluster-controllers
    generation: 2
    name: cluster-1
    resourceVersion: "49064"
    uid: 03c71898-53e9-4172-8b08-c13f1f02f166
  spec:
    configOverride:
      chaosDaemon:
        hostNetwork: true
        privileged: true
        runtime: containerd
        socketPath: /run/containerd/containerd.sock
      controllerManager:
        leaderElection:
          enabled: false
        replicaCount: 1
      dashboard:
        create: false
    kubeConfig:
      secretRef:
        key: kubeconfig
        name: cluster-1-kubeconfig
        namespace: default
    namespace: chaos-mesh
    version: 2.6.3
  status:
    currentVersion: 2.6.3
    observedGeneration: 2
- apiVersion: chaos-mesh.org/v1alpha1
  kind: RemoteCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"chaos-mesh.org/v1alpha1","kind":"RemoteCluster","metadata":{"annotations":{},"name":"cluster-2"},"spec":{"configOverride":{"chaosDaemon":{"hostNetwork":true,"privileged":true,"runtime":"containerd","socketPath":"/run/containerd/containerd.sock"},"controllerManager":{"leaderElection":{"enabled":false},"replicaCount":1},"dashboard":{"create":false}},"kubeConfig":{"secretRef":{"key":"kubeconfig","name":"cluster-2-kubeconfig","namespace":"default"}},"namespace":"chaos-mesh","version":"2.6.3"}}
    creationTimestamp: "2024-03-19T01:34:54Z"
    finalizers:
    - chaos-mesh/remotecluster-controllers
    generation: 2
    name: cluster-2
    resourceVersion: "49075"
    uid: 6bceffa1-1b78-4a01-a41b-cfb1c0d2588d
  spec:
    configOverride:
      chaosDaemon:
        hostNetwork: true
        privileged: true
        runtime: containerd
        socketPath: /run/containerd/containerd.sock
      controllerManager:
        leaderElection:
          enabled: false
        replicaCount: 1
      dashboard:
        create: false
    kubeConfig:
      secretRef:
        key: kubeconfig
        name: cluster-2-kubeconfig
        namespace: default
    namespace: chaos-mesh
    version: 2.6.3
  status:
    currentVersion: 2.6.3
    observedGeneration: 2
kind: List
metadata:
  resourceVersion: ""
```
I also verified that the RemoteCluster CRD exists in both remote clusters:

```
$ kc1 api-resources | grep remotecluster
remoteclusters chaos-mesh.org/v1alpha1 false RemoteCluster
$ kc2 api-resources | grep remotecluster
remoteclusters chaos-mesh.org/v1alpha1 false RemoteCluster
```
> and the remote cluster kubeconfig changed

Neither of the remote clusters was updated when the base cluster's chaos manager pod was kicked, i.e. there were no updates to the control plane components of those remote clusters and no updates to the kubeconfig Secrets that live in the base cluster. Hope that makes sense, but let me know if you need any additional details.
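One way to confirm that the kubeconfig Secrets really are unchanged is to fingerprint them before and after kicking the pod. This is a sketch with hypothetical helper names; it assumes GNU `sha256sum`, the `default` namespace, and the `kubeconfig` data key used by the RemoteCluster specs above.

```bash
# Print a SHA-256 fingerprint of the "kubeconfig" key of Secret $1 in
# namespace $2 (default: "default"). Assumes GNU coreutils' sha256sum.
secret_fingerprint() {
    local name=$1 ns=${2:-default}
    kubectl -n "$ns" get secret "$name" -o jsonpath='{.data.kubeconfig}' \
        | sha256sum | cut -d' ' -f1
}

# Fingerprint every "*-kubeconfig" Secret in a namespace, one per line.
fingerprint_kubeconfig_secrets() {
    local ns=${1:-default} s
    for s in $(kubectl -n "$ns" get secrets -o name | grep -- '-kubeconfig$'); do
        s=${s#secret/}
        printf '%s %s\n' "$s" "$(secret_fingerprint "$s" "$ns")"
    done
}

# Usage (against the base cluster):
#   fingerprint_kubeconfig_secrets > before.txt
#   kubectl -n chaos-mesh delete pod <controller-manager pod>
#   fingerprint_kubeconfig_secrets > after.txt
#   diff before.txt after.txt && echo "kubeconfig Secrets unchanged"
```

Comparing `metadata.resourceVersion` would work as well; hashing the data has the advantage of ignoring metadata-only updates.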
As an aside, I used the following bash function to generate the kubeconfig files for those kind clusters:
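```
kind_write_kubeconfig_files () {
    # Context of the base cluster, where the kubeconfig Secrets are created.
    context=${1:-kind-mgmt-cluster}
    # use-context (not set-context) so the Secrets below land in the base cluster.
    kubectl config use-context "$context"
    mkdir -p .bin
    for cluster in $(kind get clusters)
    do
        # The base cluster does not need a kubeconfig Secret of its own.
        if [[ $cluster == "mgmt-cluster" ]]
        then
            continue
        fi
        # Docker network IP of the control-plane node; the 127.0.0.1 address in
        # the default kind kubeconfig is not reachable from inside the base cluster.
        external_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$cluster-control-plane")
        kind get kubeconfig --name "$cluster" | sed -E "s/server: https:\/\/[0-9\.]+:[0-9]+/server: https:\/\/$external_ip:6443/g" > ".bin/$cluster.kubeconfig"
        kubectl -n default create secret generic "$cluster-kubeconfig" --from-file=kubeconfig=".bin/$cluster.kubeconfig"
    done
}
```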
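The address rewrite is the least obvious step of that function, so here it is isolated as a helper that can be checked without a cluster. `rewrite_kubeconfig_server` is a hypothetical name; it performs the same substitution as the `sed` call above, using `|` as the delimiter so the slashes in the URL need no escaping. kind also offers `kind get kubeconfig --internal`, which writes the control-plane container's hostname instead of 127.0.0.1; whether that hostname resolves from pods in the base cluster is untested here.

```bash
# Rewrite the API server URL in a kind kubeconfig read from stdin so it points
# at $1 (the control-plane container's Docker network IP) on port 6443.
rewrite_kubeconfig_server() {
    local ip=$1
    sed -E "s|server: https://[0-9.]+:[0-9]+|server: https://${ip}:6443|g"
}

# Usage (example IP only):
#   kind get kubeconfig --name cluster-1 | rewrite_kubeconfig_server 172.18.0.3
```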
I was largely following the multi-cluster documentation and the steps outlined in #4150. I can reproduce this consistently, so let me know if more information is needed. Lastly, I tried manually restarting the chaos manager pod in the remote clusters to rule out a race between the base and remote clusters' Chaos Mesh deployments, but that didn't help either.
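For reference, restarting the remote controller managers can be scripted as below. This sketch assumes the `kind-cluster-1`/`kind-cluster-2` contexts, Chaos Mesh installed in the `chaos-mesh` namespace (as in the RemoteCluster specs above), and a Deployment named `chaos-controller-manager`, which is the usual name in the Chaos Mesh Helm chart; adjust if your install differs.

```bash
# Restart the Chaos Mesh controller manager in each given kube context and
# wait for each rollout to finish before moving to the next cluster.
restart_remote_controller_managers() {
    local ctx
    for ctx in "$@"; do
        kubectl --context "$ctx" -n chaos-mesh rollout restart deployment/chaos-controller-manager || return 1
        kubectl --context "$ctx" -n chaos-mesh rollout status deployment/chaos-controller-manager --timeout=120s || return 1
    done
}

# Usage: restart_remote_controller_managers kind-cluster-1 kind-cluster-2
```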
> Delete the chaos manager Pod in the "management", "base", etc. cluster (i.e. the cluster where those RemoteCluster CRs live) and re-create the same PodChaos resource that was previously working

It seems to be related to #4208: after the controller pod is deleted, the previously registered RemoteCluster cannot be re-registered.
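The reproduction discussed in this thread can be sketched as follows. Everything specific here is an assumption for illustration: the experiment name `pod-kill-demo`, the `default` target namespace, the `app.kubernetes.io/component=controller-manager` label on the base cluster's controller pod, Chaos Mesh living in the `chaos-mesh` namespace of the base cluster, and the use of the `remoteCluster` field in the PodChaos spec to target a registered RemoteCluster.

```bash
# Print a minimal PodChaos manifest aimed at RemoteCluster $2.
# $1 = experiment name, $3 = target namespace (default: "default").
podchaos_manifest() {
    local name=$1 remote=$2 target_ns=${3:-default}
    cat <<EOF
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: ${name}
  namespace: ${target_ns}
spec:
  action: pod-kill
  mode: one
  remoteCluster: ${remote}
  selector:
    namespaces:
      - ${target_ns}
EOF
}

# Hypothetical end-to-end repro against the base cluster's current context.
reproduce_remotecluster_issue() {
    local remote=${1:-cluster-1}
    # 1. Create the experiment while the base controller pod is fresh, then delete it.
    podchaos_manifest pod-kill-demo "$remote" default | kubectl apply -f -
    kubectl -n default get podchaos pod-kill-demo
    kubectl -n default delete podchaos pod-kill-demo
    # 2. Delete the base cluster's controller-manager pod; its Deployment recreates it.
    kubectl -n chaos-mesh delete pod -l app.kubernetes.io/component=controller-manager
    kubectl -n chaos-mesh rollout status deployment/chaos-controller-manager --timeout=120s
    # 3. Re-create the same experiment and inspect its status and events.
    podchaos_manifest pod-kill-demo "$remote" default | kubectl apply -f -
    kubectl -n default describe podchaos pod-kill-demo
}
```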