Comments (4)

bsamuels453 commented on May 26, 2024

I was able to reproduce this with another program using a similar configuration and a latency delay of 0s.

from chaos-mesh.

PerSundstr8m commented on May 26, 2024

(Sorry for the formatting, I hate these editors)

I looked into this bug and the issue is not a "toda" crash.
The issue is that there is a "timing hole" between the "fd replacer" and the "cwd replacer" in which a process is allowed to run and open new files, causing an EBUSY on the "umount".

"postgres" typically creates a new process following a network request, and that process will open files mapped to the request.
In the reported case, there is a large backlog of requests that are released when the experiment closes causing a burst of new processes [that open files].

I wrote a mock program that tries to reproduce this behaviour and it causes the problem almost every time.
(script included at the end of the post)

If I run it in a loop like this:
while true; do kubectl -n "$NAMESPACE" delete pod test-pod; timeout 100 ./test-chaos.sh; done
and then do a selective "grep" on the output, I see this pattern:

$ grep -E 'Transport|Starting|delay_..ms' typescript
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:42 /proc/110/fd/3 -> /mnt/__chaosfs__data__/delay_40ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/111/fd/3 -> /mnt/__chaosfs__data__/delay_50ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/112/fd/3 -> /mnt/__chaosfs__data__/delay_60ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/107/fd/3 -> /mnt/data/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:42 /proc/109/fd/3 -> /mnt/data/delay_30ms
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:45 /proc/112/fd/3 -> /mnt/__chaosfs__data__/delay_60ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/107/fd/3 -> /mnt/data/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/109/fd/3 -> /mnt/data/delay_30ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/110/fd/3 -> /mnt/data/delay_40ms
lrwx------ 1 root root 64 Mar 14 23:45 /proc/111/fd/3 -> /mnt/data/delay_50ms
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:47 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/107/fd/3 -> /mnt/data/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/109/fd/3 -> /mnt/data/delay_30ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/110/fd/3 -> /mnt/data/delay_40ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/111/fd/3 -> /mnt/data/delay_50ms
lrwx------ 1 root root 64 Mar 14 23:47 /proc/112/fd/3 -> /mnt/data/delay_60ms
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:49 /proc/110/fd/3 -> /mnt/__chaosfs__data__/delay_40ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/111/fd/3 -> /mnt/__chaosfs__data__/delay_50ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/112/fd/3 -> /mnt/__chaosfs__data__/delay_60ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/107/fd/3 -> /mnt/data/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:49 /proc/109/fd/3 -> /mnt/data/delay_30ms
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:51 /proc/107/fd/3 -> /mnt/__chaosfs__data__/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/111/fd/3 -> /mnt/__chaosfs__data__/delay_50ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/112/fd/3 -> /mnt/__chaosfs__data__/delay_60ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/109/fd/3 -> /mnt/data/delay_30ms
lrwx------ 1 root root 64 Mar 14 23:51 /proc/110/fd/3 -> /mnt/data/delay_40ms
Starting I/O chaos
ls: cannot access '/mnt/data': Transport endpoint is not connected
lrwx------ 1 root root 64 Mar 14 23:53 /proc/112/fd/3 -> /mnt/__chaosfs__data__/delay_60ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/113/fd/3 -> /mnt/__chaosfs__data__/delay_70ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/114/fd/3 -> /mnt/__chaosfs__data__/delay_80ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/115/fd/3 -> /mnt/__chaosfs__data__/delay_90ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/107/fd/3 -> /mnt/data/delay_10ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/108/fd/3 -> /mnt/data/delay_20ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/109/fd/3 -> /mnt/data/delay_30ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/110/fd/3 -> /mnt/data/delay_40ms
lrwx------ 1 root root 64 Mar 14 23:53 /proc/111/fd/3 -> /mnt/data/delay_50ms
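To make the pattern easier to compare across runs, the listing can be tallied per run. The following is only a small sketch in Rust: the function name `tally` is made up for illustration, and it assumes the two path prefixes seen in the output above (`/mnt/__chaosfs__data__/` for descriptors left on the chaos mount, `/mnt/data/` for descriptors on the original mount).

```rust
/// Count symlink targets in `ls -l /proc/*/fd/*` style output.
/// Returns (left_on_chaosfs, on_original_mount).
/// Hypothetical helper; the prefixes are taken from the output above.
pub fn tally(output: &str) -> (usize, usize) {
    let mut on_chaosfs = 0;
    let mut on_data = 0;
    for line in output.lines() {
        // Only lines of the form "... -> <target>" are of interest.
        if let Some((_, target)) = line.split_once(" -> ") {
            if target.starts_with("/mnt/__chaosfs__data__/") {
                on_chaosfs += 1;
            } else if target.starts_with("/mnt/data/") {
                on_data += 1;
            }
        }
    }
    (on_chaosfs, on_data)
}
```

Any descriptor counted in the first number still points at the chaos mount after the experiment, which is what keeps the "umount" busy.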

A possible solution would be to swap the loop order. Instead of

for replacer in replacers {
  for process in processes {
    replace
  }
}

do

for process in processes {
  for replacer in replacers {
    replace
  }
}
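The difference between the two loop orders can be illustrated with a toy model. This is a sketch only and not toda's code: `Step`, `replacer_major`, `process_major` and `window` are hypothetical names, and each "replace" is just recorded in a log. `window` counts how many operations lie between the first and the last replacement applied to one process, that is, how long that process stays half-replaced.

```rust
/// Toy stand-ins for the replacers (hypothetical, not toda's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Fd,
    Cwd,
}

/// Outer loop over replacers: every process gets step 1 before any gets step 2.
pub fn replacer_major(steps: &[Step], pids: &[u32]) -> Vec<(u32, Step)> {
    let mut log = Vec::new();
    for &step in steps {
        for &pid in pids {
            log.push((pid, step));
        }
    }
    log
}

/// Outer loop over processes: each process gets all steps back to back.
pub fn process_major(steps: &[Step], pids: &[u32]) -> Vec<(u32, Step)> {
    let mut log = Vec::new();
    for &pid in pids {
        for &step in steps {
            log.push((pid, step));
        }
    }
    log
}

/// Number of log entries between the first and last step applied to `pid`.
pub fn window(log: &[(u32, Step)], pid: u32) -> usize {
    let first = log.iter().position(|&(p, _)| p == pid);
    let last = log.iter().rposition(|&(p, _)| p == pid);
    match (first, last) {
        (Some(f), Some(l)) => l - f,
        _ => 0,
    }
}
```

In the replacer-major order the window grows with the number of processes, while in the process-major order it depends only on the number of replacers. Whether that actually closes the hole in toda depends on when processes are stopped and resumed, which this model does not cover.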

Another possible solution would be to run the "cwd replacer" before the "fd replacer", but I do not know if that would have other implications.

/Per
reproducer script:

$ more test-chaos.sh
#!/bin/bash
NAMESPACE=${NAMESPACE:-default}

cat > test.c <<"EOF"
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int delay_ms = 0;
    // Default delay 10ms
    int delay_step_ms = argc > 1 ? atoi(argv[1]) : 10;
    int ret;
    char fname[30];

    // Create 50 processes with open files
    // to give "toda" something to work with
    for (int i=0; i<50 ; i++) {
        if (fork() == 0) {
            // one file per child (the original used delay_ms, which is always 0 here)
            sprintf(fname, "init_%d", i);
            open(fname, O_RDWR|O_CREAT, 0444);
            sleep(200);
            exit(0);
        }
    }

    // Allow some time for starting the i/o experiment
    fprintf(stderr, "Using %dms delay\n", delay_step_ms);
    fprintf(stderr, "Please start the experiment\n");
    sleep(10);

    // block
    struct stat s;
    fprintf(stderr, "Block..\n");
    stat("a_file", &s);
    fprintf(stderr, "..done blocking\n");

    // simulate the case where a backlog of processes finally
    // get to run (as in a listen+fork setup)
    //
    // open a file with varying delay ([0 .. 2] second) from the unfreeze

    while(delay_ms < 2000) {
        delay_ms += delay_step_ms;
        if (fork() == 0) {
            char fname[30];
            sprintf(fname, "delay_%dms",delay_ms);
            usleep(1000 * delay_ms);
            open(fname, O_RDWR|O_CREAT, 0444);
            sleep(1000000);
            exit(0);
        }
        fprintf(stderr,"forked child\n");
    }
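    // Reap children until wait() fails with ECHILD (no children left);
    // errno is only meaningful when wait() returned -1
    do {
        int status;
        fprintf(stderr,"waiting for children\n");
        ret = wait(&status);
    } while (ret != -1 || errno != ECHILD);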
}
EOF

make test

cat >test-pod.yaml <<"EOF"
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
  labels:
    app: chaos-test
spec:
  containers:
    - name: test-container
      image: registry.suse.com/bci/bci-base:15.5
      volumeMounts:
      - mountPath: "/mnt/data"
        name: test-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: test-volume
      emptyDir: {}
EOF

# Optionally pin the pod to a node (sed -i edits the file in place, so no input pipe is needed)
test -n "$NODENAME" && {
  sed -i "s/^spec:/spec:\n  nodeName: $NODENAME/" test-pod.yaml
}

sed -e "s/_NAMESPACE_/$NAMESPACE/" >test-chaos.yaml <<"EOF"
apiVersion: chaos-mesh.org/v1alpha1
kind: IOChaos
metadata:
  name: io-latency-test
  namespace: _NAMESPACE_
spec:
  action: latency
  mode: one
  selector:
    labelSelectors:
      app: chaos-test
  volumePath: /mnt/data
  path: /mnt/data/**/*
  delay: "60s"
  percent: 100
  duration: "60s"
  containerNames: [test-container]
EOF

kubectl -n "$NAMESPACE" create -f test-pod.yaml
kubectl -n "$NAMESPACE" wait --for=condition=ready pod/test-pod
kubectl cp test "$NAMESPACE"/test-pod:/test
echo "cd /mnt/data;/test" | kubectl -n "$NAMESPACE" exec -i test-pod -- bash &
sleep 4
echo "Starting I/O chaos"
for cmd in delete apply;do kubectl $cmd -f test-chaos.yaml;done
sleep 80
echo "Status after experiment:"
echo "ls -l /mnt;ls -l /proc/*/fd/*|grep delay|sort -k11" | kubectl -n "$NAMESPACE" exec -i test-pod -- bash
wait

PerSundstr8m commented on May 26, 2024

I built a "toda" where the order of the replacers is "CWD", "FD", "MMAP", but the problem just moved: some processes now end up with an incorrect "cwd" instead of incorrect fds, still causing an EBUSY on the umount.

PerSundstr8m commented on May 26, 2024

Looks like the "timing hole" is between the "prepare" and the "run" methods.

If we assume that the process that creates new processes (the one with its working directory on the mount point) is stable and that its "cwd" replacement was successful, then re-running the prepare/run sequence will pick up any missed processes, and new processes will create files in the right place.

With this patch, I cannot reproduce the problem.
A more intelligent solution would of course be to map the retries to a failed recover_mount().

diff --git a/src/main.rs b/src/main.rs
index d860e90..49c7bcc 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -122,6 +122,12 @@ fn resume(option: Options, mount_guard: MountInjectionGuard) -> Result<()> {
         let result = replacer.run();
         info!("replace result: {:?}", result);

+        replacer = UnionReplacer::default();
+        replacer.prepare(&path, &new_path)?;
+        info!("running replacer again");
+        let result2 = replacer.run();
+        info!("replace2 result: {:?}", result2);
+
         Some(replacer)
     } else {
         None
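The "more intelligent" variant mentioned above, retrying only when the recovery step fails, could look roughly like the following. This is a sketch under assumptions, not a patch for toda: `replace_until_recovered` is a hypothetical helper, `replace` stands in for a fresh prepare/run pass, and `recover` stands in for whatever recover_mount() does. Their real signatures are not shown in this thread, so both are passed in as closures.

```rust
/// Run `replace` (a fresh prepare + run pass), then `recover` (e.g. the umount).
/// If `recover` fails, repeat the whole sequence, up to `max_attempts` times.
/// Returns the number of attempts used, or the last recovery error.
/// Hypothetical helper; panics if `max_attempts` is 0.
pub fn replace_until_recovered<R, U, E>(
    max_attempts: usize,
    mut replace: R,
    mut recover: U,
) -> Result<usize, E>
where
    R: FnMut() -> Result<(), E>,
    U: FnMut() -> Result<(), E>,
{
    let mut last_err = None;
    for attempt in 1..=max_attempts {
        // A failing replace pass is treated as fatal and returned immediately.
        replace()?;
        match recover() {
            Ok(()) => return Ok(attempt),
            // Recovery failed (for example EBUSY): remember the error and retry.
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("max_attempts must be at least 1"))
}
```

Compared with the patch above, which always runs the replacer twice, this only pays for extra passes when the recovery actually fails, and it gives up after a bounded number of attempts.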
