Some of the tests intermittently fail due to race conditions, and this happens enough

Fix flaky tests about balena-sdk HOT 14 CLOSED

balena-io commented on August 18, 2024 1

Fix flaky tests

from balena-sdk.

Comments (14)

pimterry commented on August 18, 2024 2

Seems like https://github.com/resin-io/resin-api/issues/743 was indeed the cause of this. Now resolved, seems completely stable so far, closing.


pimterry commented on August 18, 2024 1

Progress! This got a lot worse, which made it easier (and more important) to find a local repro case for this (https://www.flowdock.com/app/rulemotion/resin-tech/threads/L1h5i3yoD9wLEKYKqI9k6pmoWxH), and we've now got a fix for at least one underlying cause: https://github.com/resin-io/resin-api/pull/744


pimterry commented on August 18, 2024

Example failure:

1) SDK Integration Tests given a logged in fresh user given a single application with a single offline device "before each" hook for "should be rejected if the device does not exist":
     ResinDeviceNotFound: Device not found: cb4200a4845a00156a01bf5dca5911e55c746603af2d14ddbde9e00cc2f366
    at lib/models/device.coffee:205:15
    at PassThroughHandlerContext.finallyHandler (node_modules/bluebird/js/release/finally.js:56:23)
    at bound (domain.js:280:14)
    at PassThroughHandlerContext.runBound (domain.js:293:12)
    at PassThroughHandlerContext.tryCatcher (node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:510:31)
    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:567:18)
    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:612:10)
    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:691:18)
    at Async._drainQueue (node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)

Failing for test at https://github.com/resin-io/resin-sdk/blob/6f27556db7a14725fb5357f2e534e6e2bddf7e1a/tests/integration.spec.coffee#L814, failing in beforeEach here: https://github.com/resin-io/resin-sdk/blob/6f27556db7a14725fb5357f2e534e6e2bddf7e1a/tests/integration.spec.coffee#L726-L735


emirotin commented on August 18, 2024

I can think of two ways to fix it:

1. Ideally: run the tests in an isolated environment. Can be tricky as the env is big and also parts of it are closed-source. Maybe some job of @hedss can be helpful here though.
2. As a much easier plan: generate a random test session ID and dynamically augment all IDs/UUIDs to include it.
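A minimal sketch of the second option, assuming Node's built-in `crypto` module. The helper names (`TEST_SESSION_ID`, `sessionName`, `sessionUuid`) are hypothetical and not part of the SDK or its test suite; the 62-character UUID length is an assumption taken from the device UUIDs shown in the failure traces in this thread.

```javascript
const crypto = require('crypto');

// One random ID per test run (hypothetical helper, not part of the SDK).
const TEST_SESSION_ID = crypto.randomBytes(4).toString('hex');

// Suffix a fixture name so parallel runs never share application names.
function sessionName(base) {
  return `${base}-${TEST_SESSION_ID}`;
}

// Build a random hex UUID that starts with the session ID, so every
// resource created by this run can be traced back to it.
// The default length of 62 is an assumption based on the traces in this thread.
function sessionUuid(totalHexLength = 62) {
  const randomLength = totalHexLength - TEST_SESSION_ID.length;
  const random = crypto.randomBytes(Math.ceil(randomLength / 2)).toString('hex');
  return TEST_SESSION_ID + random.slice(0, randomLength);
}

console.log(sessionName('TestApp'));
console.log(sessionUuid());
```

This only helps with name and ID collisions between runs; it would not protect against a cleanup step that deletes everything in a shared account.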


pimterry commented on August 18, 2024

@emirotin It's interesting. I'm pretty sure I'd seen failures before this that were because of reused ids and conflicts, but the error above (which was failing the UMD build, until I reran the job) is already using a random device UUID, but it's still saying that that random id doesn't exist. The application name is the same between test suites, but that's not what it's complaining about, and I don't think that should matter.

To me, the example above at least doesn't look like an isolation problem. Could be a race condition in the API, where querying immediately after device registration sometimes gets a 404?

I think I have seen other isolation problems elsewhere though - we should keep an eye out and add more examples here if they come up.
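One way to probe the suspected race (a 404 when querying right after registration) would be to retry the lookup a few times and see whether the device shows up eventually. The sketch below is only a diagnostic aid, not a fix, since a retry would hide the underlying problem. `retryWhileNotFound` and `flakyLookup` are hypothetical names, and matching on `err.name === 'ResinDeviceNotFound'` is an assumption based on the error label in the trace above.

```javascript
// Retry an async lookup while it keeps failing with a "not found" error.
// Any other error, or a "not found" on the last attempt, is rethrown.
async function retryWhileNotFound(lookup, {
  attempts = 5,
  delayMs = 200,
  // Assumption: the error name matches the label printed in the stack trace.
  isNotFound = (err) => err.name === 'ResinDeviceNotFound',
} = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await lookup();
    } catch (err) {
      if (attempt >= attempts || !isNotFound(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Stand-in for a lookup that only succeeds on the third call (illustration only).
let calls = 0;
async function flakyLookup() {
  calls += 1;
  if (calls < 3) {
    const err = new Error('Device not found');
    err.name = 'ResinDeviceNotFound';
    throw err;
  }
  return { uuid: 'abc123' };
}

const demo = retryWhileNotFound(flakyLookup, { delayMs: 10 }).then((device) => {
  console.log(`found ${device.uuid} after ${calls} calls`);
  return device;
});
```

If the lookup succeeds after a retry or two, that points towards delayed visibility in the API; if it never succeeds, something is more likely deleting the device.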


emirotin commented on August 18, 2024

Wow interesting indeed, I thought we used fixed fixtures.
The API should not result in 404s: as far as I know, the response is supposed to be closed only after the entry is saved in the DB.
The app name must be unique within the account, so in case of race conditions the app creation could fail.


pimterry commented on August 18, 2024

I've been doing some local testing on this. I can't seem to reproduce conflicts between parallel tests using different user accounts, but I can very reliably with parallel tests using the same user, including that 'Device not found' error.

As discussed above though, this probably isn't just because of potentially conflicting names. It seems to happen if you use the same user because each test deletes everything in the account, but that doesn't explain much because Travis and Appveyor use different user credentials, and our Travis settings don't allow any concurrent jobs - they should never see this (but they are). Appveyor did allow parallel jobs, but I've just disabled that.

I'm going to keep an eye on the builds while I work on #243 and include other error details in here. I'm not sure what to make of this otherwise for now - each user should have only one test run going at a time, but it seems like other tests are still interfering with them.
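To illustrate why two runs sharing one account can produce a 'Device not found' error, here is a small in-memory simulation. Everything in it (`FakeAccount`, `runA`, `runB`, `simulate`) is a hypothetical stand-in for illustration; it does not use the real SDK or API, and it only models the "each test deletes everything in the account" behaviour described above.

```javascript
// In-memory stand-in for a user account on the API (illustration only).
class FakeAccount {
  constructor() { this.devices = new Set(); }
  async register(uuid) { this.devices.add(uuid); return uuid; }
  async get(uuid) {
    if (!this.devices.has(uuid)) throw new Error(`Device not found: ${uuid}`);
    return { uuid };
  }
  async wipe() { this.devices.clear(); }
}

const tick = () => new Promise((resolve) => setImmediate(resolve));

// Run A registers a device, yields once, then fetches it again.
async function runA(account) {
  const uuid = await account.register('device-a');
  await tick(); // any await point lets another run interleave here
  return account.get(uuid);
}

// Run B begins with the usual cleanup step: delete everything in the account.
async function runB(account) {
  await account.wipe();
}

// Start both runs together, either on one shared account or on two separate ones.
async function simulate(shared) {
  const accountA = new FakeAccount();
  const accountB = shared ? accountA : new FakeAccount();
  const [a] = await Promise.allSettled([runA(accountA), runB(accountB)]);
  return a.status === 'fulfilled' ? 'ok' : a.reason.message;
}

const results = Promise.all([simulate(true), simulate(false)]);
results.then(([shared, separate]) => {
  console.log('same account:', shared);        // Device not found: device-a
  console.log('different accounts:', separate); // ok
});
```

With a shared account the cleanup from run B lands between run A's register and get; with separate accounts run A is unaffected, which matches the local observations above.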


emirotin commented on August 18, 2024

If the tests use different users, they can't delete anything from other users' accounts.


pimterry commented on August 18, 2024

And yet, something seems to be.

Do we have any other builds somewhere that could be using these same credentials maybe? Or something else? It seems a lot like something is resetting these user accounts while the tests are running, but they're not run in parallel, and Appveyor and Travis are using different user accounts. I don't understand where the interference is coming from. Any ideas?


emirotin commented on August 18, 2024

No ideas :(


Page- commented on August 18, 2024

Maybe end to end tests? Or people running local tests? (@jviotti did you maybe use the credentials you use locally when setting up travis/appveyor?) Do the cli tests use the same credentials maybe?


emirotin commented on August 18, 2024

I think the CLI doesn't have tests. e2e should not use the same creds afaik.


pimterry commented on August 18, 2024

Continuing to collect data: on #263 the Appveyor build passed happily the first time, even though it seemingly ran two jobs with the same credentials in parallel, and even though it is now set to run only one job at a time. That's weird for starters.

Then the Travis build, which did run jobs sequentially, entirely after the Appveyor build had finished, failed the first job with:

  1) SDK Integration Tests given a logged in fresh user given a single application with a single offline device "before each" hook for "should be rejected if the device uuid does not exist":
     ResinDeviceNotFound: Device not found: d505a6fb66352e5cf4b5e2ad7c58362c05331146048518b43002ad3ddf80ed
    at build/models/device.js:238:15
    at PassThroughHandlerContext.finallyHandler (node_modules/bluebird/js/release/finally.js:56:23)
    at bound (domain.js:280:14)
    at PassThroughHandlerContext.runBound (domain.js:293:12)
    at PassThroughHandlerContext.tryCatcher (node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:510:31)
    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:567:18)
    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:612:10)
    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:691:18)
    at Async._drainQueue (node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)

That's another intermittent failure from this beforeEach hook. It fails in device.get, looking for the uuid that was just returned by device.register. The second job passed happily, and the first job then passed after being rerun.

Seems unlikely that there are other people running local builds at the exact same time so reliably (this seems to hit around half of PRs as they're submitted). I'm trying to do some more local testing, but I still can't reproduce this locally. I'll keep slowly running at it, but any more ideas are very welcome too.


lurch commented on August 18, 2024

> I still can't reproduce this locally. I'll keep slowly running at it, but any more ideas are very welcome too.

Would it be worth using something like ApacheBench (or a modified version thereof) to temporarily increase the load on the server while running the unit tests? *shrug* https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/
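This is not ApacheBench, but a rough in-process alternative along the same lines: repeat the register-then-get sequence many times with some concurrency and count how often the fresh device can't be fetched. `stressRegisterThenGet` and `fakeApi` are hypothetical; the `register` and `get` callbacks are placeholders that would need to be wired to the real calls, and the fake below just pretends every fourth device is not yet visible.

```javascript
// Repeat register-then-get `iterations` times, `concurrency` at a time,
// and count how often the freshly registered device cannot be fetched.
async function stressRegisterThenGet({ register, get, iterations = 100, concurrency = 10 }) {
  let failures = 0;
  let started = 0;
  async function worker() {
    // The check and increment run synchronously, so workers never over-count.
    while (started < iterations) {
      started += 1;
      const uuid = await register();
      try {
        await get(uuid);
      } catch (err) {
        failures += 1;
      }
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return { iterations, failures };
}

// Fake API for illustration only: every fourth device "is not found".
let nextId = 0;
const fakeApi = {
  register: async () => `device-${nextId++}`,
  get: async (uuid) => {
    if (Number(uuid.split('-')[1]) % 4 === 0) throw new Error(`Device not found: ${uuid}`);
    return { uuid };
  },
};

const stressDemo = stressRegisterThenGet({ ...fakeApi, iterations: 20, concurrency: 5 });
stressDemo.then((summary) => console.log(summary)); // { iterations: 20, failures: 5 }
```

Against a real server the failure count would give a rough rate for the race, which could then be compared with and without extra load.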
