
Comments (14)

pimterry commented on August 18, 2024

Seems like https://github.com/resin-io/resin-api/issues/743 was indeed the cause of this. Now resolved, seems completely stable so far, closing.

from balena-sdk.

pimterry commented on August 18, 2024

Progress! This got a lot worse, which made it easier (and more important) to find a local repro case for this (https://www.flowdock.com/app/rulemotion/resin-tech/threads/L1h5i3yoD9wLEKYKqI9k6pmoWxH), and we've now got a fix for at least one underlying cause: https://github.com/resin-io/resin-api/pull/744

pimterry commented on August 18, 2024

Example failure:

1) SDK Integration Tests given a logged in fresh user given a single application with a single offline device "before each" hook for "should be rejected if the device does not exist":
     ResinDeviceNotFound: Device not found: cb4200a4845a00156a01bf5dca5911e55c746603af2d14ddbde9e00cc2f366
    at lib/models/device.coffee:205:15
    at PassThroughHandlerContext.finallyHandler (node_modules/bluebird/js/release/finally.js:56:23)
    at bound (domain.js:280:14)
    at PassThroughHandlerContext.runBound (domain.js:293:12)
    at PassThroughHandlerContext.tryCatcher (node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:510:31)
    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:567:18)
    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:612:10)
    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:691:18)
    at Async._drainQueue (node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)

Failing for test at https://github.com/resin-io/resin-sdk/blob/6f27556db7a14725fb5357f2e534e6e2bddf7e1a/tests/integration.spec.coffee#L814, failing in beforeEach here: https://github.com/resin-io/resin-sdk/blob/6f27556db7a14725fb5357f2e534e6e2bddf7e1a/tests/integration.spec.coffee#L726-L735

emirotin commented on August 18, 2024

I can think of two ways to fix it:

  • ideally: run the tests in an isolated environment. That could be tricky, as the environment is big and parts of it are closed-source. Some of @hedss's work might be helpful here, though
  • as a much easier plan: generate a random test session ID and dynamically augment all IDs/UUIDs to include it
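
A minimal sketch of the second option, assuming Node.js. `TEST_SESSION_ID`, `withSession` and `belongsToSession` are hypothetical names for illustration, not part of the SDK or its test suite:

```javascript
const crypto = require('crypto');

// Hypothetical: one random ID per test run, generated once at startup.
const TEST_SESSION_ID = crypto.randomBytes(4).toString('hex');

// Append the session ID so names created by concurrent runs cannot collide.
function withSession(name, sessionId = TEST_SESSION_ID) {
  return `${name}-${sessionId}`;
}

// Lets cleanup touch only resources created by this run,
// instead of deleting everything in the account.
function belongsToSession(name, sessionId = TEST_SESSION_ID) {
  return name.endsWith(`-${sessionId}`);
}
```

With something like this, each run would create `withSession('FooBar')` rather than a fixed application name, and cleanup could filter on `belongsToSession` so that one run never removes another run's resources.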

pimterry commented on August 18, 2024

@emirotin It's interesting. I'm pretty sure I'd seen failures before this that were caused by reused ids and conflicts. The error above (which was failing the UMD build, until I reran the job) is already using a random device UUID though, and it's still saying that that random id doesn't exist. The application name is the same between test suites, but that's not what it's complaining about, and I don't think that should matter.

To me, the example above at least doesn't look like an isolation problem. Could be a race condition in the API, where querying immediately after device registration sometimes gets a 404?

I think I have seen other isolation problems elsewhere though - we should keep an eye out and add more examples here if they come up.

emirotin commented on August 18, 2024

Wow interesting indeed, I thought we used fixed fixtures.
The API should not return 404s here: as far as I know, the response is supposed to be closed only after the entry is saved in the DB.
The app name must be unique within the account, so in case of race conditions the app creation could fail.

pimterry commented on August 18, 2024

I've been doing some local testing on this. I can't seem to reproduce conflicts between parallel tests using different user accounts, but I can very reliably with parallel tests using the same user, including that 'Device not found' error.

As discussed above though, this probably isn't just because of potentially conflicting names. It seems to happen if you use the same user because each test deletes everything in the account, but that doesn't explain much because Travis and Appveyor use different user credentials, and our Travis settings don't allow any concurrent jobs - they should never see this (but they are). Appveyor did allow parallel jobs, but I've just disabled that.
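
To make the suspected interference concrete, here is a small in-memory sketch. `FakeAccount` and `interleavedRuns` are hypothetical stand-ins, not the real API or SDK; the only assumption taken from the above is that each test's setup deletes everything in the account:

```javascript
// In-memory stand-in for one user account on the API (not the real SDK).
class FakeAccount {
  constructor() { this.devices = new Set(); }
  register(uuid) { this.devices.add(uuid); return uuid; }
  get(uuid) {
    if (!this.devices.has(uuid)) throw new Error(`Device not found: ${uuid}`);
    return { uuid };
  }
  wipe() { this.devices.clear(); } // what each test's setup does
}

// Interleave two runs by hand: run A registers a device,
// run B's setup wipes its account, then run A fetches the device.
function interleavedRuns(accountA, accountB) {
  const uuid = accountA.register('device-a');
  accountB.wipe();
  try {
    accountA.get(uuid);
    return 'ok';
  } catch (err) {
    return err.message;
  }
}

const shared = new FakeAccount();
console.log(interleavedRuns(shared, shared));                       // Device not found: device-a
console.log(interleavedRuns(new FakeAccount(), new FakeAccount())); // ok
```

This only shows that a wipe from a second run on the same account is enough to produce the error. It says nothing about where such a second run would be coming from.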

I'm going to keep an eye on the builds while I work on #243 and include other error details in here. I'm not sure what to make of this otherwise for now - each user should have only one test run going at a time, but it seems like other tests are still interfering with them.

emirotin commented on August 18, 2024

pimterry commented on August 18, 2024

And yet, something seems to be.

Do we have any other builds somewhere that could be using these same credentials maybe? Or something else? It seems a lot like something is resetting these user accounts while the tests are running, but they're not run in parallel, and appveyor and travis are using different user accounts. I don't understand where the interference is coming from. Any ideas?

emirotin commented on August 18, 2024

Page- commented on August 18, 2024

Maybe end to end tests? Or people running local tests? (@jviotti did you maybe use the credentials you use locally when setting up travis/appveyor?) Do the cli tests use the same credentials maybe?

emirotin commented on August 18, 2024

pimterry commented on August 18, 2024

Continuing to collect data: on #263 the Appveyor build passed happily first time, even though it seemingly ran two jobs with the same credentials in parallel, despite now being set to run only one job at a time. That's weird for starters.

Then the Travis build, which did run jobs sequentially, entirely after the Appveyor build had finished, failed the first job with:

  1) SDK Integration Tests given a logged in fresh user given a single application with a single offline device "before each" hook for "should be rejected if the device uuid does not exist":
     ResinDeviceNotFound: Device not found: d505a6fb66352e5cf4b5e2ad7c58362c05331146048518b43002ad3ddf80ed
    at build/models/device.js:238:15
    at PassThroughHandlerContext.finallyHandler (node_modules/bluebird/js/release/finally.js:56:23)
    at bound (domain.js:280:14)
    at PassThroughHandlerContext.runBound (domain.js:293:12)
    at PassThroughHandlerContext.tryCatcher (node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:510:31)
    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:567:18)
    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:612:10)
    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:691:18)
    at Async._drainQueue (node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)

That's another intermittent failure from this beforeEach hook. It fails in device.get, looking for the uuid that was just returned by device.register. The second job passed happily, and the first job did then pass after being rerun.
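
One way to tell a short-lived race apart from the device really being gone would be to retry the lookup and record how many attempts it takes. This is a rough diagnostic sketch only: `getWithRetry` and `sleep` are hypothetical helpers, and `getFn` stands for whatever function performs the device.get call in the hook:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical diagnostic wrapper: call `getFn` up to `attempts` times
// and report which attempt succeeded. Rethrows the last error if none do.
async function getWithRetry(getFn, { attempts = 5, delayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const result = await getFn();
      return { result, attempt };
    } catch (err) {
      lastError = err;
      // Wait before the next try, but not after the final one.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

If the lookup succeeds on a later attempt, that would point towards a delay between registration and the device becoming visible. If it never succeeds, it looks more like something deleted the device in the meantime.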

Seems unlikely that there are other people running local builds at the exact same time so reliably (this seems to hit around half of PRs as they're submitted). I'm trying to do some more local testing, but I still can't reproduce this locally. I'll keep slowly running at it, but any more ideas are very welcome too.

lurch commented on August 18, 2024

"I still can't reproduce this locally. I'll keep slowly running at it, but any more ideas are very welcome too."

Would it be worth using something like ApacheBench (or a modified version thereof) to temporarily increase the load on the server while running the unit tests? *shrug* https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/
