chaos-mesh / chaos-mesh

A Chaos Engineering Platform for Kubernetes.

Home Page: https://chaos-mesh.org

License: Apache License 2.0

Makefile 0.48% Go 65.54% Shell 4.31% Dockerfile 0.24% C 0.15% HTML 0.04% TypeScript 27.82% Python 0.25% JavaScript 0.66% Mustache 0.15% CSS 0.05% Smarty 0.19% MDX 0.13%
chaos chaos-engineering chaos-testing kubernetes operator golang site-reliability-engineering fault-injection cncf cloud-native

chaos-mesh's Introduction

Chaos Mesh Logo



Chaos Mesh is an open source cloud-native Chaos Engineering platform. It offers various types of fault simulation and powerful capabilities for orchestrating fault scenarios.

Using Chaos Mesh, you can conveniently simulate abnormalities that might occur in development, testing, and production environments, and find potential problems in the system. To lower the barrier to entry for Chaos Engineering, Chaos Mesh also provides a visual interface: you can easily design Chaos scenarios on the Web UI and monitor the status of Chaos experiments.

cncf_logo

Chaos Mesh is a Cloud Native Computing Foundation (CNCF) incubating project. If you are an organization that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Chaos Mesh plays a role, read the CNCF announcement.


At the current stage, Chaos Mesh has the following components:

  • Chaos Operator: the core component for chaos orchestration. Fully open sourced.
  • Chaos Dashboard: a Web UI for managing, designing, and monitoring Chaos experiments.

See the following demo video for a quick view of Chaos Mesh:

Watch the video

Chaos Operator

Chaos Operator injects chaos into the applications and Kubernetes infrastructure in a manageable way, which provides easy, custom definitions for chaos experiments and automatic orchestration. There are two components at play:

Chaos Controller Manager: primarily responsible for the scheduling and management of Chaos experiments. This component contains several CRD controllers, such as the Workflow Controller, the Scheduler Controller, and the controllers for the various fault types.

Chaos Daemon: runs as a DaemonSet and has the Privileged permission by default (which can be disabled). This component mainly interferes with specific network devices, file systems, and kernels by entering the target Pod's namespaces.
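The namespace-entering mechanism can be sketched as follows. This is an illustration only, not the daemon's actual code path: the PID, interface name, and delay value are hypothetical, and the command is composed and printed rather than executed, since really running it requires root on the node.

```shell
# Sketch: inject network delay the way a privileged daemon could, by entering
# the target pod's network namespace with nsenter and attaching a netem qdisc
# with tc. All concrete values below are hypothetical.
TARGET_PID=12345   # hypothetical: PID of a process inside the target pod
DEV=eth0           # hypothetical: network interface inside the pod's netns
CMD="nsenter -t ${TARGET_PID} -n -- tc qdisc add dev ${DEV} root netem delay 100ms"
echo "$CMD"        # printed only; a real daemon would execute this as root
```

Undoing the fault is the mirror image: deleting the root qdisc in the same namespace restores normal networking.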

Chaos Operator

Chaos Operator uses CustomResourceDefinition (CRD) to define chaos objects.

The current implementation supports a few types of CRD objects for fault injection, namely PodChaos, NetworkChaos, IOChaos, TimeChaos, StressChaos, and so on. You can get the full list of CRD objects and their specifications in the Chaos Mesh Docs.
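As a hedged illustration of what such a CRD object looks like (field names follow the chaos-mesh.org/v1alpha1 API described in the docs; the target namespace and labels are hypothetical), a minimal PodChaos manifest can be written like this:

```shell
# Write a minimal PodChaos manifest. Applying it would be
# `kubectl apply -f pod-kill-example.yaml` against a cluster
# with Chaos Mesh installed; here we only generate the file.
cat > pod-kill-example.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: chaos-testing
spec:
  action: pod-kill        # kill the selected pod(s)
  mode: one               # pick one pod at random from the matches
  selector:
    namespaces:
      - app-namespace     # hypothetical target namespace
    labelSelectors:
      app: example-app    # hypothetical target label
EOF
echo "wrote pod-kill-example.yaml"
```

The other chaos kinds follow the same shape: a `selector` to choose targets, a `mode` to choose how many, and kind-specific fields for the fault itself.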

Quick start

See Quick Start and Install Chaos Mesh using Helm.

Contributing

See the contributing guide and development guide.

Adopters

See ADOPTERS.

Blogs

Blogs on Chaos Mesh design and implementation, features, chaos engineering, community updates, and more. See Chaos Mesh Blogs.

Community

Please reach out for bugs, feature requests, and other issues via:

  • Following us on Twitter @chaos_mesh.

  • Joining the #project-chaos-mesh channel in the CNCF Slack workspace.

  • Filing an issue or opening a PR against this repository.

Community meetings

  • Chaos Mesh Community Monthly (Community and project-level updates, community sharing/demo, office hours)

  • Chaos Mesh Development Meeting (Releases, roadmap/features/RFC planning and discussion, issue triage/discussion, etc)

Community blogs

Community talks

Media coverage

License

Chaos Mesh is licensed under the Apache License, Version 2.0. See LICENSE for the full content.

FOSSA Status

Trademark

Chaos Mesh is a trademark of The Linux Foundation. All rights reserved.

chaos-mesh's People

Contributors

andrewmatilde, anuragpaliwal80, asternight, bellaxiang, colstuwjx, cwen0, dcalvin, dependabot[bot], fewdan, fingerleader, g1eny0ung, gallardot, hexilee, iguoyr, johncming, lucklove, mahjonp, maplefu, milasuperstar, oraluben, strrl, tanglizigit, wangxiangustc, xlgao-zju, yangkeao, yeya24, yisaer, yiyiyimu, yujunz, zhouqiang-cl


chaos-mesh's Issues

The IO doc is not very correct

Bug Report

What version of Kubernetes are you using?

None

What did you do?

None

What did you expect to see?

  • What are the common error numbers for IO injection?
  • What happens when errno is left empty and another errno, such as EL2HLT (51), occurs?
  • Errno 32 is EPIPE (broken pipe), which does not seem to be an IO error.
  • The linked Linux system error list dates from 2004; should we link to a newer reference?

What did you see instead?
The error number used when creating a YAML is misleading: "32" is not an IO error number.

UCP: Add CPU chaos

Description

In an actual production environment, the CPU may be busy, so there needs to be a way to simulate a busy-CPU situation. Sometimes we need to look at the server's performance while the CPU is busy.

So we need to add a new chaos type, namely "CPU chaos". Maybe we can use stress-ng to simulate a busy CPU.
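As a sketch of the suggestion: stress-ng can occupy CPUs with an invocation like `stress-ng --cpu 4 --cpu-load 80 --timeout 60s` (flags taken from stress-ng's common usage, not from this project). A dependency-free stand-in using plain shell busy loops, with illustrative worker and iteration counts:

```shell
# Burn CPU with N background busy loops, then wait for them to finish.
# WORKERS and ITER are illustrative; real chaos would run much longer.
WORKERS=2
ITER=100000
for w in $(seq "$WORKERS"); do
  ( i=0; while [ "$i" -lt "$ITER" ]; do i=$((i + 1)); done ) &
done
wait
MSG="cpu burn finished"
echo "$MSG"
```

A real CPU-chaos implementation would also need to scope the load to the target pod (e.g. via its cgroup) rather than the whole node.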

Score

1679.5

Mentor(s)

@ethercflow

Recommended Skills

  • Golang
  • stress-ng

Learning Materials

Add SLA Measurement to Chaos-verify

Sometimes we need to measure SLA in terms of QPS/latency drops during tests. I think it is necessary to calculate QPS drops (such as a 20% drop in QPS) and RTO (Recovery Time Objective), and to publish the resulting event.

refine `get started on your local machine` section in readme

Feature Request

The get started on your local machine section in the readme is not very friendly; we need to improve it.

Describe the feature you'd like:

The kind mode is not very fault-tolerant and lacks necessary error handling. Maybe we can make minikube the primary mode for local machines and kind the secondary mode.

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

fail to recover network chaos

NetworkChaos stopped working partway through and failed to recover.

chaos-daemon error log

2020-01-06T02:37:41.045Z        INFO    chaos-daemon-server     Delete netem    {"Request": "container_id:\"docker://3b45b159bfddcf83602149939bdde8e89643ef47003e1f1b3b15da5ce8564c2e\" "}
2020-01-06T02:37:41.047Z        INFO    chaos-daemon-server     Cancel netem on PID     {"pid": 121865}
2020-01-06T02:37:41.047Z        ERROR   chaos-daemon-server     failed to remove Qdisc  {"error": "invalid argument"}
github.com/go-logr/zapr.(*zapLogger).Error
        /home/pingcap/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/pingcap/chaos-mesh/pkg/chaosdaemon.Cancel
        /home/pingcap/go/src/github.com/pingcap/chaos-mesh/pkg/chaosdaemon/netem.go:98
github.com/pingcap/chaos-mesh/pkg/chaosdaemon.(*Server).DeleteNetem
        /home/pingcap/go/src/github.com/pingcap/chaos-mesh/pkg/chaosdaemon/netem_server.go:51
github.com/pingcap/chaos-mesh/pkg/chaosdaemon/pb._ChaosDaemon_DeleteNetem_Handler
        /home/pingcap/go/src/github.com/pingcap/chaos-mesh/pkg/chaosdaemon/pb/chaosdaemon.pb.go:625
google.golang.org/grpc.(*Server).processUnaryRPC
        /home/pingcap/go/pkg/mod/google.golang.org/[email protected]/server.go:995
google.golang.org/grpc.(*Server).handleStream
        /home/pingcap/go/pkg/mod/google.golang.org/[email protected]/server.go:1275
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /home/pingcap/go/pkg/mod/google.golang.org/[email protected]/server.go:710

Make scheduler optional in APIs

I'm trying to use chaos-mesh and I found that most APIs have a scheduler option, and it is mandatory. I think it's weird for some APIs to require a scheduler, and it makes it hard for users to observe the status of a k8s cluster or tidb cluster.

For example, if I want to observe the performance of my distributed system under a network partition, I would like to apply a network-partition chaos object and have it take effect once the chaos-operator receives it. After this, I can observe my cluster and see how it performs. After the chaos test is over, I can delete the network-partition object and the network recovers to its previous state.

I think the scheduler is still valuable for simulating a weak-connection network, but it is not user-friendly for a first quick trial.
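A hedged sketch of what is being asked for: a NetworkChaos partition with no scheduler (and no duration), so the fault takes effect on apply and recovers on delete. Field names follow the chaos-mesh.org/v1alpha1 API; the namespaces and labels are hypothetical.

```shell
# Write a scheduler-less NetworkChaos partition manifest. Applying it would be
# `kubectl apply -f network-partition-example.yaml`; deleting the object would
# recover the network. Here we only generate the file.
cat > network-partition-example.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: partition-example
  namespace: chaos-testing
spec:
  action: partition
  mode: all
  selector:
    labelSelectors:
      app: web            # hypothetical source side of the partition
  direction: both
  target:
    mode: all
    selector:
      labelSelectors:
        app: db           # hypothetical target side of the partition
EOF
echo "wrote network-partition-example.yaml"
```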

Add contribution doc

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

A contribution guide doc is needed for contributors.

Describe alternatives you've considered:

No doc about the contribution guide.

Teachability, Documentation, Adoption, Migration Strategy:

Controller is always restarted and has some errors

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.7-gke.2", GitCommit:"96c91b15a936b36701e8704844bc356862440a21", GitTreeState:"clean", BuildDate:"2019-12-13T12:37:28Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Install chaos-mesh and keep it running for several days.

What did you expect to see?

Running without crashes and errors.

What did you see instead?

The controller is restarted several times.

NAME                                        READY   STATUS    RESTARTS   AGE
chaos-collector-database-69f75c4fd6-k6ghw   1/1     Running   0          6d5h
chaos-controller-manager-8fb4bfc78-68sdb    1/1     Running   11         6d5h
chaos-daemon-dfg59                          1/1     Running   0          6d5h
chaos-daemon-gndkg                          1/1     Running   0          6d5h
chaos-daemon-mfhmp                          1/1     Running   0          6d5h
chaos-daemon-n6m7h                          1/1     Running   0          6d5h
chaos-daemon-r9jdw                          1/1     Running   0          6d5h
chaos-dashboard-6654866f7c-twdwg            1/1     Running   0          6d5h
2020-01-15T10:24:51.258Z	ERROR	inject-webhook	channel has closed, should restart watcher
github.com/go-logr/zapr.(*zapLogger).Error
	/home/jenkins/agent/workspace/build_chaos_mesh_master/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/pingcap/chaos-mesh/pkg/webhook/config/watcher.(*K8sConfigMapWatcher).Watch
	/home/jenkins/agent/workspace/build_chaos_mesh_master/go/src/github.com/pingcap/chaos-mesh/pkg/webhook/config/watcher/watcher.go:122
main.watchConfig.func1.1
	/home/jenkins/agent/workspace/build_chaos_mesh_master/go/src/github.com/pingcap/chaos-mesh/cmd/controller-manager/main.go:176
2020-01-15T10:24:51.258Z	ERROR	setup	watcher got error, try to restart watcher	{"error": "watcher channel has closed"}
github.com/go-logr/zapr.(*zapLogger).Error
	/home/jenkins/agent/workspace/build_chaos_mesh_master/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
main.watchConfig.func1.1
	/home/jenkins/agent/workspace/build_chaos_mesh_master/go/src/github.com/pingcap/chaos-mesh/cmd/controller-manager/main.go:180
2020-01-15T10:24:53.259Z	INFO	setup	Launching watcher for ConfigMaps
2020-01-15T10:24:53.259Z	INFO	inject-webhook	Watching for ConfigMaps for changes	{"namespace": "chaos-testing", "labels": {"app.kubernetes.io/component":"webhook"}}
2020-01-15T11:13:57.608Z	ERROR	inject-webhook	channel has closed, should restart watcher

error occurs when creating chaos-controller-manager

dongxu@dongxus-MacBook-Pro ~
$ kubectl get po -n chaos-testing                                                                                                                                                                                                                                    [17:17:56]
NAME                                        READY   STATUS      RESTARTS   AGE
chaos-controller-manager-75ff888c45-nrq4z   0/1     Error       5          94m
chaos-daemon-2tvb4                          1/1     Running     0          94m
chaos-daemon-644l2                          1/1     Running     0          94m
chaos-daemon-6vfkf                          1/1     Running     0          94m
chaos-daemon-8kw2t                          1/1     Running     0          94m
chaos-daemon-ffgz9                          1/1     Running     0          94m
chaos-daemon-kwr6s                          1/1     Running     0          94m
webhook-certs-job-n2ffj                     0/1     Completed   0          94m
webhook-mw-job-4n6jz                        0/1     Completed   0          93m


$ kubectl describe po chaos-controller-manager-75ff888c45-nrq4z -n chaos-testing                                                                                                                                                                                     [17:18:25]
Name:               chaos-controller-manager-75ff888c45-nrq4z
Namespace:          chaos-testing
Priority:           0
PriorityClassName:  <none>
Node:               kind-worker2/172.17.0.3
Start Time:         Mon, 06 Jan 2020 15:43:40 +0800
Labels:             app.kubernetes.io/component=controller-manager
                    app.kubernetes.io/instance=chaos-mesh
                    app.kubernetes.io/name=chaos-mesh
                    pod-template-hash=75ff888c45
Annotations:        <none>
Status:             Running
IP:                 10.244.2.4
Controlled By:      ReplicaSet/chaos-controller-manager-75ff888c45
Containers:
  chaos-mesh:
    Container ID:  containerd://c634de296bd9bfc5b4f850ab29e3317ec34779ffa9b7e9ffea6fc2a53599091e
    Image:         pingcap/chaos-mesh:latest
    Image ID:      docker.io/pingcap/chaos-mesh@sha256:79dcc52939857a48bc524465985617765b3bde14f4b32a64c0c5e24654244f4a
    Port:          9443/TCP
    Host Port:     0/TCP
    Command:
      /usr/local/bin/chaos-controller-manager
      -configmap-labels=app.kubernetes.io/component=webhook
      -conf=/etc/webhook/conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 06 Jan 2020 17:18:17 +0800
      Finished:     Mon, 06 Jan 2020 17:18:19 +0800
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Environment:
      NAMESPACE:          chaos-testing (v1:metadata.namespace)
      TZ:                 UTC
      CHAOS_DAEMON_PORT:  31767
    Mounts:
      /etc/webhook/certs from webhook-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from chaos-controller-manager-token-htbrf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-certs
    Optional:    false
  chaos-controller-manager-token-htbrf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chaos-controller-manager-token-htbrf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                    From                   Message
  ----     ------       ----                   ----                   -------
  Normal   Scheduled    94m                    default-scheduler      Successfully assigned chaos-testing/chaos-controller-manager-75ff888c45-nrq4z to kind-worker2
  Warning  FailedMount  94m (x7 over 94m)      kubelet, kind-worker2  MountVolume.SetUp failed for volume "webhook-certs" : secrets "webhook-certs" not found
  Normal   Started      2m40s (x4 over 3m39s)  kubelet, kind-worker2  Started container
  Warning  BackOff      2m5s (x7 over 3m32s)   kubelet, kind-worker2  Back-off restarting failed container
  Normal   Pulling      113s (x5 over 93m)     kubelet, kind-worker2  pulling image "pingcap/chaos-mesh:latest"
  Normal   Pulled       110s (x5 over 3m40s)   kubelet, kind-worker2  Successfully pulled image "pingcap/chaos-mesh:latest"
  Normal   Created      110s (x5 over 3m40s)   kubelet, kind-worker2  Created container

Make sidecar image configurable

Feature Request

Is your feature request related to a problem? Please describe:

Sometimes a different sidecar image is needed for many use cases.

Describe the feature you'd like:

The sidecar image should be configurable.

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

Add your FAQ

Feature Request

Is your feature request related to a problem? Please describe:

Sometimes it takes a long time to understand chaos-mesh and even more time to make it run normally. If we improve the FAQ, we can help others get it running more quickly.

Describe the feature you'd like:

An FAQ covering the Chaos Mesh problems you have met.

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

Have you tried operator-framework or controller-runtime?

While developing network-related chaos operations for chaos-operator, I found there is a lot of duplicated code between network chaos and pod chaos. I then tried to abstract it into a framework or package to reuse the duplicated code.

However, I then found two frameworks: operator-sdk and controller-runtime.

operator-sdk provides an SDK to add controllers, managers, and versions with an easy-to-use CLI.

controller-runtime is the low-level abstraction underneath operator-sdk, which provides abstractions of managers, controllers, and hooks. Here is an example of it.

Write a tutorial to use Chaos Mesh to test a simple program

Feature Request

Is your feature request related to a problem? Please describe:

Write a very simple program and use Chaos Mesh to test it.

Describe the feature you'd like:

  • Write a simple program
  • Use Chaos Mesh to test it
  • Write a tutorial about the program and how to use Chaos Mesh to test it

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

Add container kill

Feature Request

Is your feature request related to a problem? Please describe:

When a pod contains multiple containers, how do we kill a specific container?

Describe the feature you'd like:

We should support container kill.

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

Cannot start tiflash when injecting iochaos

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
inject iochaos

tiflash configmap

apiVersion: v1
data:
  chaosfs-tiflash.yaml: |
    name: chaosfs-tiflash
    selector:
      labelSelectors:
        "app": "tiflash"
    initContainers:
    - name: inject-scripts
      image: pingcap/chaos-scripts:latest
      imagePullpolicy: Always
      command: ["sh", "-c", "/scripts/init.sh -d /data/data/db -f /data/data/fuse-data"]
    containers:
    - name: chaosfs
      image: pingcap/chaos-fs:latest
      imagePullpolicy: Always
      ports:
      - containerPort: 65534
      securityContext:
        privileged: true
      command:
        - /usr/local/bin/chaosfs
        - -addr=:65534
        - -pidfile=/tmp/fuse/pid
        - -original=/data/data/fuse-data
        - -mountpoint=/data/data/db
      volumeMounts:
        - name: tiflash
          mountPath: /data/data
          mountPropagation: Bidirectional
    volumeMounts:
    - name: tiflash
      mountPath: /data/data
      mountPropagation: HostToContainer
    - name: scripts
      mountPath: /tmp/scripts
    - name: fuse
      mountPath: /tmp/fuse
    volumes:
    - name: scripts
      emptyDir: {}
    - name: fuse
      emptyDir: {}
    postStart:
      tiflash:
        command:
          - /tmp/scripts/wait-fuse.sh

error log

E0117 13:35:00.855365       1 stateful_set.go:400] Error syncing StatefulSet tiflash-test5/tiflash, requeuing: Internal error occurred: Internal error occurred: jsonpatch add operation does not apply: doc is missing path: "/
metadata/annotations/admission-webhook.pingcap.com~1status"
I0117 13:35:00.856044       1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"tiflash-test5", Name:"tiflash", UID:"3956b26b-385b-11ea-b391-4cd98f4bd3ae", APIVersion:"apps/v1", ResourceVersion:"393890116
", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' create Pod tiflash-1 in StatefulSet tiflash failed error: Internal error occurred: Internal error occurred: jsonpatch add operation does not apply: doc is missing pat
h: "/metadata/annotations/admission-webhook.pingcap.com~1status"
E0117 13:35:00.868822       1 stateful_set.go:400] Error syncing StatefulSet tiflash-test5/tiflash, requeuing: Internal error occurred: Internal error occurred: jsonpatch add operation does not apply: doc is missing path: "/
metadata/annotations/admission-webhook.pingcap.com~1status"
I0117 13:35:00.868891       1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"tiflash-test5", Name:"tiflash", UID:"3956b26b-385b-11ea-b391-4cd98f4bd3ae", APIVersion:"apps/v1", ResourceVersion:"393890116
", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' create Pod tiflash-1 in StatefulSet tiflash failed error: Internal error occurred: Internal error occurred: jsonpatch add operation does not apply: doc is missing pat
h: "/metadata/annotations/admission-webhook.pingcap.com~1status"
E0117 13:35:00.886926       1 stateful_set.go:400] Error syncing StatefulSet tiflash-test5/tiflash, requeuing: Internal error occurred: Internal error occurred: jsonpatch add operation does not apply: doc is missing path: "/
metadata/annotations/admission-webhook.pingcap.com~1status"

Cannot uninstall iochaos when sidecar chaosfs running failed

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Follow the doc to test iochaos #121 on TiKV.

I found the sidecar chaosfs was in the Waiting state because of CrashLoopBackOff, so I want to uninstall iochaos.

What did you expect to see?

uninstall iochaos successfully.

What did you see instead?

I found the command hangs:

$ kubectl delete -f examples/io-mixed-example.yaml
iochaos.pingcap.com "io-delay-example" deleted
...

here are part of controller logs:

2020-01-12T02:42:05.380Z	INFO	controllers.IoChaos	Recover I/O chaos action, network is not ok, retrying...	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0"}
2020-01-12T02:42:07.380Z	INFO	controllers.IoChaos	Recover I/O chaos action, network is not ok, retrying...	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0"}
2020-01-12T02:42:09.379Z	INFO	controllers.IoChaos	Recover I/O chaos action, network is not ok, retrying...	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0"}
2020-01-12T02:42:11.380Z	INFO	controllers.IoChaos	Recover I/O chaos action, network is not ok, retrying...	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0"}
2020-01-12T02:42:13.380Z	INFO	controllers.IoChaos	Recover I/O chaos action, network is not ok, retrying...	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0"}
2020-01-12T02:42:15.379Z	ERROR	controllers.IoChaos	failed to recover I/O chaos action	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "namespace": "default", "name": "demo-tikv-0", "error": "timed out waiting for the condition"}
github.com/go-logr/zapr.(*zapLogger).Error
	/Users/manjunpeng/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/pingcap/chaos-mesh/controllers/iochaos/fs.(*Reconciler).recoverPod
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos/fs/types.go:184
github.com/pingcap/chaos-mesh/controllers/iochaos/fs.(*Reconciler).cleanFinalizersAndRecover
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos/fs/types.go:152
github.com/pingcap/chaos-mesh/controllers/iochaos/fs.(*Reconciler).Recover
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos/fs/types.go:112
github.com/pingcap/chaos-mesh/controllers/twophase.(*Reconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/twophase/types.go:87
github.com/pingcap/chaos-mesh/controllers/iochaos.(*Reconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos/types.go:46
github.com/pingcap/chaos-mesh/controllers.(*IoChaosReconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos_controller.go:43
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2020-01-12T02:42:15.379Z	ERROR	controllers.IoChaos	failed to recover chaos	{"iochaos": "chaos-testing/io-delay-example", "reconciler": "chaosfs", "error": "timed out waiting for the condition"}
github.com/go-logr/zapr.(*zapLogger).Error
	/Users/manjunpeng/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/pingcap/chaos-mesh/controllers/twophase.(*Reconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/twophase/types.go:89
github.com/pingcap/chaos-mesh/controllers/iochaos.(*Reconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos/types.go:46
github.com/pingcap/chaos-mesh/controllers.(*IoChaosReconciler).Reconcile
	/Users/manjunpeng/Workspaces/chaos-mesh/controllers/iochaos_controller.go:43
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/Users/manjunpeng/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/Users/manjunpeng/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

chaos-manager: panic when creating client to chaos-daemon

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Install chaos-mesh and keep it running for several days.

What did you expect to see?

Running without crashes and errors.

What did you see instead?

chaos-manager panic when creating client to chaos-daemon

2020-01-17T05:40:08.498Z	INFO	util	Creating client to chaos-daemon	{"node": "172.16.4.33"}
E0117 05:40:08.498571       1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 351 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x159cdc0, 0xc008d30d60)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x159cdc0, 0xc008d30d60)
	/home/pingcap/local/go/src/runtime/panic.go:679 +0x1b2
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).sendIPTables(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc0029b4c00, 0x1, 0xc008d30ca0, 0x1b, 0x0, 0x0, 0x0, ...)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:350 +0x30c
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).cleanFinalizersAndRecover(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc007b94380, 0x0, 0x0)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:292 +0x758
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).Recover(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0x18ced40, 0xc007b94380, 0x0, ...)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:194 +0x71
github.com/pingcap/chaos-mesh/controllers/twophase.(*Reconciler).Reconcile(0xc0016d5b28, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0041fe1e0, 0x19, 0x187f120, 0xc006ca1c00)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/twophase/types.go:98 +0x6a5
github.com/pingcap/chaos-mesh/controllers/networkchaos.(*Reconciler).Reconcile(0xc0016d5c40, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0041fe040, 0x1d4abffc08e8bc00, 0x5e2148b8, 0xc0016d5c60)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/types.go:50 +0x6ff
github.com/pingcap/chaos-mesh/controllers.(*NetworkChaosReconciler).Reconcile(0xc0001433e0, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0016d5cd8, 0xc00069c120, 0xc0007ca908, 0x1885800)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos_controller.go:43 +0x10e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001a4b40, 0x1515ae0, 0xc008e8bc00, 0xc0004b5d00)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001a4b40, 0x0)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001a4b40)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0000c7460)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0000c7460, 0x3b9aca00, 0x0, 0x1, 0xc00027a720)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0000c7460, 0x3b9aca00, 0xc00027a720)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 351 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
panic(0x159cdc0, 0xc008d30d60)
	/home/pingcap/local/go/src/runtime/panic.go:679 +0x1b2
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).sendIPTables(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc0029b4c00, 0x1, 0xc008d30ca0, 0x1b, 0x0, 0x0, 0x0, ...)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:350 +0x30c
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).cleanFinalizersAndRecover(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc007b94380, 0x0, 0x0)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:292 +0x758
github.com/pingcap/chaos-mesh/controllers/networkchaos/partition.(*Reconciler).Recover(0xc0041fe200, 0x18afd20, 0xc000044230, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0x18ced40, 0xc007b94380, 0x0, ...)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/partition/types.go:194 +0x71
github.com/pingcap/chaos-mesh/controllers/twophase.(*Reconciler).Reconcile(0xc0016d5b28, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0041fe1e0, 0x19, 0x187f120, 0xc006ca1c00)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/twophase/types.go:98 +0x6a5
github.com/pingcap/chaos-mesh/controllers/networkchaos.(*Reconciler).Reconcile(0xc0016d5c40, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0041fe040, 0x1d4abffc08e8bc00, 0x5e2148b8, 0xc0016d5c60)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos/types.go:50 +0x6ff
github.com/pingcap/chaos-mesh/controllers.(*NetworkChaosReconciler).Reconcile(0xc0001433e0, 0xc007db08f0, 0xd, 0xc007081060, 0x19, 0xc0016d5cd8, 0xc00069c120, 0xc0007ca908, 0x1885800)
	/home/pingcap/go/src/github.com/pingcap/chaos-mesh/controllers/networkchaos_controller.go:43 +0x10e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001a4b40, 0x1515ae0, 0xc008e8bc00, 0xc0004b5d00)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001a4b40, 0x0)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001a4b40)
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0000c7460)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0000c7460, 0x3b9aca00, 0x0, 0x1, 0xc00027a720)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0000c7460, 0x3b9aca00, 0xc00027a720)
	/home/pingcap/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/home/pingcap/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
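
The trace above points at an unguarded index in `sendIPTables` (types.go:350): a slice of length 0 is indexed at `[0]`. A minimal sketch of the shape of the guard that avoids this class of panic, with illustrative names rather than the actual chaos-mesh code:

```go
package main

import (
	"errors"
	"fmt"
)

// firstPodIP returns the first IP from a resolved pod list, guarding the
// empty case instead of indexing blindly (which panics with
// "index out of range [0] with length 0").
func firstPodIP(ips []string) (string, error) {
	if len(ips) == 0 {
		return "", errors.New("no pod IPs resolved; skipping iptables rule")
	}
	return ips[0], nil
}

func main() {
	if _, err := firstPodIP(nil); err != nil {
		fmt.Println("handled gracefully:", err)
	}
	ip, _ := firstPodIP([]string{"10.0.0.1"})
	fmt.Println(ip) // 10.0.0.1
}
```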

Make Chaos Mesh run with Istio

Feature Request

Is your feature request related to a problem? Please describe:

If Chaos Mesh can run with Istio, it will be very helpful for service mesh users

Describe the feature you'd like:

Make Chaos Mesh run with Istio

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

TBD

proposal: use sidecar to pause the pod

Now we use a fake image to pause the pod. This is not elegant; I think we can use the sidecar mechanism instead, maybe:

  1. When we start the pod, use the sidecar mechanism to inject an init container and start it.
  2. Let the init container start the real container, so we can support pausing.

This is just my guess; I don't know whether it can work or not, but we can give it a try.

Or maybe we can find a better way?

Add codecov for test code coverage

Feature Request

Is your feature request related to a problem? Please describe:

Test code coverage is currently unclear and may need improvement.

Describe the feature you'd like:

We can use Codecov to visualize our test code coverage.

Describe alternatives you've considered:

No
Teachability, Documentation, Adoption, Migration Strategy:

No

SW-2: Using chaos-operator to test binlog

Description

Binlog chaos testing is ongoing; we should finish it and add more tests to abtests.

Difficulty

  • Hard

Score

  • 3000

TODO list

  • Make a helm chart that can start short road tests (500 points / medium)
  • Make a helm chart that can start long road tests (500 points / medium)
  • Support more nodes in SQLSmith (500 points / medium)
  • Refine abtest framework and SQLSmith (1000 points / hard)
  • Use test framework finding binlog bugs (500 points)

Mentor(s)

Recommended Skills

  • k8s, chaos, binlog architect

More information

Short Road

TiDB -> pump -> drainer -> TiDB/MySQL

Long Road

TiDB -> pump -> drainer -> Kafka -> arbiter -> TiDB/MySQL

SQLSmith

There are some urgently needed nodes in SQLSmith.

  • Alter table statement (add/delete index/column)

  • Delete table statement

Add more unit test

Feature Request

Is your feature request related to a problem? Please describe:

Unit test coverage is currently only about 30%; I think we should first increase it to 50%.
(Screenshot: coverage report, 2020-01-12)

Describe the feature you'd like:

Increase unit test coverage to 50%; we can split the work into many PRs.

Code coverage detail

  • pkg/chaosdaemon
  • pkg/webhook
  • pkg/chaosfs
  • pkg/mapreader
  • pkg/pidfile
  • pkg/util
  • pkg
  • api/v1alpha1

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

Add event to record each chaos action

Feature Request

Is your feature request related to a problem? Please describe:

Currently, we don't have any place to show the chaos actions that have been performed after we apply the chaos API. I think it is important for users to know the effect caused by Chaos Mesh for each chaos API component.

Describe the feature you'd like:

For example, after I apply the PodChaos API, I could get the information with

kubectl describe pod-chaos xx 

and find in its events when and which pods the PodChaos affected.

Weird message output when using kind

When I was running ./hack/kind-cluster-build.sh, it showed a warning: kind get kubeconfig-path is deprecated!
Not sure if the software was successfully installed.

...
############# success create cluster:[kind] #############
To start using your cluster, run:
kind get kubeconfig-path is deprecated!

KIND will export and merge kubeconfig like kops, minikube, etc.
This command is now unnecessary and will be removed in a future release.

For more info see: kubernetes-sigs/kind#1060
See also the output of kind create cluster

export KUBECONFIG=/Users/dongxu/.kube/config

dongxu@dongxus-MacBook-Pro ~/gopkg/src/github.com/pingcap/chaos-mesh
master $ kind --version [15:40:42]
kind version 0.6.1

proposal: support restful API to trigger chaos

A friendly HTTP API would make chaos-mesh much easier to use. We could support APIs such as a trigger request:

POST /api/v1/network
{
     "type":"partition",
     "source":"1.1.1.1",
     "target":"2.2.2.2",
     ...
}

Add Roadmap for Chaos Mesh

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

Add a roadmap for Chaos Mesh

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

chaos-controller: use pod ip instead of the node ip when connecting to chaos-daemon instances

Feature Request

Is your feature request related to a problem? Please describe:

Although the spec of chaos-daemon sets hostPort, it doesn't use the host network explicitly, so in some cases a no route to host error may occur when the chaos-controller tries to connect to chaos-daemon instances using nodeip:port.

Describe the feature you'd like:

Use the pod IP instead of the node IP when connecting to chaos-daemon instances.

Describe alternatives you've considered:

NONE

Teachability, Documentation, Adoption, Migration Strategy:

NONE

Test Chaos Mesh with Istio

Feature Request

Is your feature request related to a problem? Please describe:

Chaos Mesh may be able to work with Istio; if it can, we should test it.

Describe the feature you'd like:

Make Chaos Mesh work with Istio; first we should test it.

Describe alternatives you've considered:

None

Teachability, Documentation, Adoption, Migration Strategy:

None

SW-1: Using chaos-operator to test DM

Description

DM does not have chaos testing yet; we should add it. First, DM should run on k8s; then use chaos-operator to test it and add some tests.

Difficulty

  • Hard

Score

  • 3000

Mentor(s)

Recommended Skills

  • k8s, chaos, dm architect

SW-3: Using chaos-operator to test cdc

Description

cdc does not have chaos tests yet; we should add them.

Difficulty

  • Hard

Score

  • 4500

Mentor(s)

Recommended Skills

  • k8s, chaos,cdc architect

TODOs

  • run cdc in Kubernetes
  • develop a workload case
  • develop a check case (can be merged with task 2)
  • inject chaos into tikv and check cdc (pod kill / pod failure / network delay / network partition / io delay)
  • inject chaos into cdc
  • inject chaos into both cdc and tikv
  • last but not least, output a test report recording issues and test results

Support containerd when getting networknamespace

Feature Request

As we have suggested kind as the development environment in the development guide #160, and the minikube Linux images don't support netem (because the kernel lacks the sch_netem module), we should support the containerd runtime.

Now we use the docker pkg to get the pid by container ID, and we can do the same thing with containerd: filter the container out from client.Containers() and find the pid with container.Task().Pid().
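
The lookup logic can be sketched against a stand-in type in place of the real containerd client (the names mirror the client.Containers() and container.Task().Pid() calls mentioned above; the fake data is illustrative):

```go
package main

import "fmt"

// container is a stand-in for the containerd client's container handle;
// id and pid correspond to the real client's Container.ID() and
// container.Task(ctx, nil) -> task.Pid() calls.
type container struct {
	id  string
	pid uint32
}

// pidByContainerID filters the list returned by the (stand-in)
// Containers() call and returns the task pid of the matching container.
func pidByContainerID(containers []container, id string) (uint32, error) {
	for _, c := range containers {
		if c.id == id {
			return c.pid, nil
		}
	}
	return 0, fmt.Errorf("container %q not found", id)
}

func main() {
	fake := []container{{id: "abc123", pid: 4321}}
	pid, err := pidByContainerID(fake, "abc123")
	if err != nil {
		panic(err)
	}
	fmt.Println(pid) // 4321
}
```

With the real client this would start from containerd.New("/run/containerd/containerd.sock"); once the pid is known, the container's network namespace is presumably reached at /proc/&lt;pid&gt;/ns/net, as with the docker-based lookup.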

failed to create cluster using `hack/kind-cluster-build.sh`

Bug Report

What did you do?

Run bash hack/kind-cluster-build.sh in a CentOS 7 VM.

What did you expect to see?

The cluster is created successfully with kind.

What did you see instead?

ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

When running docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6 manually, the output is

Error response from daemon: No such container: kind-control-plane

Add bandwidth chaos

Feature Request

Is your feature request related to a problem? Please describe:

Sometimes bandwidth limits and bandwidth chaos are necessary in multi-cloud setups.

Describe the feature you'd like:

We can limit a pod's egress bandwidth, for example to 50Mb.
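
For reference, the likely underlying mechanism is a tc token-bucket filter applied in the target pod's network namespace; an illustrative rule (device name and figures are examples, following the shape in the tc-tbf man page):

```shell
# Illustrative only: cap egress on eth0 inside the pod's netns to 50mbit.
tc qdisc add dev eth0 root tbf rate 50mbit burst 5kb latency 70ms
```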

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Add chaos-fs client in repo and docker image

Feature Request

Is your feature request related to a problem? Please describe:

When I want to debug chaos-fs, currently we can only start a chaos experiment and define a YAML to achieve the fault injection.

Describe the feature you'd like:

It would be useful to have a chaos-fs client in the Docker image and the repo for debugging, so faults can be triggered manually.

Describe alternatives you've considered:

No.
Teachability, Documentation, Adoption, Migration Strategy:

No

make commands clearer in readme

Feature Request

Is your feature request related to a problem? Please describe:

  • refine readme: by following the step-by-step commands, a simple copy and paste should start the whole cluster with chaos

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

proposal: add metrics for chaos CRD

From @yeya24 's suggestion:
Chaos Mesh needs some metrics to show how many CRDs it has for each kind of injection; we can use a gauge metric.
With such metrics, we can display the history of CRDs and add them to the dashboard.
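
A minimal sketch of such a per-kind gauge using only the standard library (in practice this would more likely be a client_golang GaugeVec labeled by chaos kind; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// chaosGauge tracks how many chaos CRDs currently exist per kind,
// the shape a Prometheus gauge labeled by kind would expose.
type chaosGauge struct {
	mu     sync.Mutex
	counts map[string]int
}

func newChaosGauge() *chaosGauge {
	return &chaosGauge{counts: make(map[string]int)}
}

// Inc/Dec would be called from the controller on CRD create/delete.
func (g *chaosGauge) Inc(kind string) { g.mu.Lock(); g.counts[kind]++; g.mu.Unlock() }
func (g *chaosGauge) Dec(kind string) { g.mu.Lock(); g.counts[kind]--; g.mu.Unlock() }

func (g *chaosGauge) Get(kind string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.counts[kind]
}

func main() {
	g := newChaosGauge()
	g.Inc("NetworkChaos")
	g.Inc("NetworkChaos")
	g.Inc("PodChaos")
	g.Dec("NetworkChaos")
	fmt.Println(g.Get("NetworkChaos"), g.Get("PodChaos")) // 1 1
}
```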
