Comments (12)

aakarshg commented on August 19, 2024

@dry923 might be worth the time to investigate the issue when you get back and have some cycles.

aakarshg commented on August 19, 2024

@bengland2 has pointed out that the CI hasn't been rebuilding the workload images (the snafu_ci tag) in over a month, which is evident from the logs. This means the CI has been using old workload images, which could be why the tests have been passing.

aakarshg commented on August 19, 2024

The fix for this, as pointed out by @bengland2, would be to clear out the prior incarnation of the image before pushing the new one. That way, either the test fails or the image used actually reflects the source code that we want to test.
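
For illustration, a minimal sketch of what that could look like in the CI build step, assuming a podman-based build; the image name and Dockerfile path below are placeholders, not the actual repo layout:

```bash
#!/usr/bin/env bash
# Hypothetical CI step: never let a stale snafu_ci image survive a rebuild.
# IMAGE and the Dockerfile path are placeholders for illustration only.
IMAGE="quay.io/example-org/fs-drift-wrapper:snafu_ci"

# Drop any prior local copy so the build cannot silently reuse it.
podman rmi -f "$IMAGE" 2>/dev/null || true

# Rebuild from scratch and push; fail the job if either step fails.
podman build --no-cache -t "$IMAGE" -f fs_drift_wrapper/Dockerfile . || exit 1
podman push "$IMAGE" || exit 1
```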

bengland2 commented on August 19, 2024

Here is a set of changes that I believe fixes most of the problem. It isn't fully tested yet because I have to finish other PRs, but I wanted to make you aware of what I have so far:

https://github.com/bengland2/snafu/tree/ensure-benchmark-updated

I'll submit it as a PR if there is interest, but I was waiting to get Russ's input, etc.
The basic ideas are: 1) centralize as much logic as possible in ci/common.sh, and 2) never assume that a bash command succeeds. While bash "set -e" is a brute-force way to accomplish this, it also prevents you from doing things like retrying a podman push, or using an error status as input to an if conditional.
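
For illustration, a hedged sketch of the kind of helper that approach enables in ci/common.sh; the function name, retry count, and sleep interval are my own choices, not taken from the branch:

```bash
# Hypothetical helper for ci/common.sh: retry a flaky command a fixed number
# of times instead of relying on "set -e" to abort on the first failure.
retry() {
    local attempts=$1; shift
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        sleep 5
    done
    return 1
}

# Example use: retry the push and branch on its exit status explicitly.
# "$IMAGE" is assumed to have been built earlier in the CI script.
if ! retry 3 podman push "$IMAGE"; then
    echo "giving up on podman push" >&2
    exit 1
fi
```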

rsevilla87 commented on August 19, 2024

@bengland2 just merged this PR a few hours ago: #113

bengland2 commented on August 19, 2024

I didn't know whether anyone was working on it; that's why I didn't submit it as a PR. As long as the problem gets solved...

aakarshg commented on August 19, 2024

There are still quite a few issues with the CI. If you look at http://perf-sm5039-4-8.perf.lab.eng.rdu2.redhat.com:8080/job/snafu_jjb/87/consoleFull for fs-drift, you can see that the uuid cannot even be parsed (the reason is that we delete everything before we can look at the uuid), but the CI still reported fs-drift as passed!

dry923 commented on August 19, 2024

There are still quite a few issues with the CI. If you look at http://perf-sm5039-4-8.perf.lab.eng.rdu2.redhat.com:8080/job/snafu_jjb/87/consoleFull for fs-drift, you can see that the uuid cannot even be parsed (the reason is that we delete everything before we can look at the uuid), but the CI still reported fs-drift as passed!

@aakarshg so I found an issue: when the kubectl delete namespace in wait_clean fails because there is no namespace, it falls into the error function. It would still show the test as successful, since this likely happens outside of that loop (there's a wait_clean at the end of run_test.sh, which I think is where this is occurring) and the test would have really succeeded. I'm testing out a simple fix now.
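
A minimal sketch of one way to make the cleanup tolerant of a missing namespace; the function body and namespace name below are illustrative assumptions, not the actual wait_clean code:

```bash
# Hypothetical wait_clean: don't trip the error handler when the namespace
# has already been removed by an earlier cleanup pass.
wait_clean() {
    local ns=${1:-benchmark}   # namespace name is a placeholder
    # --ignore-not-found turns the delete into a no-op if the namespace is gone;
    # --wait blocks until the deletion actually finishes.
    kubectl delete namespace "$ns" --ignore-not-found=true --wait=true
}
```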

bengland2 commented on August 19, 2024

Did this have anything to do with fs-drift at all? I couldn't see it.

aakarshg commented on August 19, 2024

Nothing to do with fs-drift, @bengland2; the benchmark actually works fine. It's the CI script in ripsaw that's also affecting the CI runs in snafu.

rsevilla87 commented on August 19, 2024

The problem is that the fs-drift, fio, and smallfile ripsaw tests deploy two different CRs: one for standard I/O and another one for hostPath I/O. The tests in ripsaw work well, but snafu uses these tests with a different wrapper.

rsevilla87 commented on August 19, 2024

I figured out that wait_clean is being called twice in the I/O tests: once at the end of each test in snafu (https://github.com/cloud-bulldozer/snafu/blob/master/ci/run_ci.sh#L61) and again at the end of the I/O tests (https://github.com/cloud-bulldozer/ripsaw/blob/master/tests/test_fiod.sh#L25).

The two wait_clean functions live in different source files, but both of them delete the namespace, so the second call fails.
I have also noticed that the check_es function does not check whether all of its arguments have been passed, so when, for example, the second argument (the indices to check) is missing, it does not check any index and returns 0.
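
For illustration, a hedged sketch of the kind of argument guard check_es could grow; the parameter names below are assumptions about its interface, not the real signature:

```bash
# Hypothetical guard for check_es: refuse to report success when the index
# list is missing instead of silently checking nothing and returning 0.
check_es() {
    local uuid=$1
    shift
    local indices=("$@")   # remaining arguments are the ES indices to verify
    if [[ -z "$uuid" || ${#indices[@]} -eq 0 ]]; then
        echo "check_es: missing uuid or index list" >&2
        return 1
    fi
    # ... the existing Elasticsearch document checks would follow here ...
}
```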
