
Comments (16)

nellh commented on August 19, 2024

I have not seen any issues caused by the Docker upgrade on the newest AMI on dev and it does seem to do better with this device mapper issue, so I'm going to go ahead and update this in production in the next couple days.

from openneuro.

nellh commented on August 19, 2024

This looks like a known issue with the device mapper storage driver never freeing space. 2017.03.f was recently released, and it looks like the issues that previously prevented an upgrade are fixed, so we should test upgrading this again on dev.
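
To watch for this condition while reproducing it, pool usage can be read from the output of `docker info`, which reports `Data Space Used`, `Data Space Total`, and `Data Space Available` when the devicemapper driver is in use. The following is a minimal sketch, not something taken from the OpenNeuro code. The sample text, the function names, and the 0.75 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical sample of `docker info` output for the devicemapper driver.
# The pool name and the numbers are made up for illustration.
SAMPLE = """\
Storage Driver: devicemapper
 Pool Name: docker-pool
 Data Space Used: 20 GB
 Data Space Total: 25 GB
 Data Space Available: 5 GB
"""

# Decimal unit suffixes as they appear in the human-readable sizes.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

FIELD = re.compile(
    r"^[ \t]*Data Space (Used|Total|Available):[ \t]*([\d.]+)[ \t]*(\w+)[ \t]*$",
    re.MULTILINE,
)


def parse_data_space(info_text):
    """Return the 'used', 'total' and 'available' data space in bytes."""
    fields = {}
    for name, value, unit in FIELD.findall(info_text):
        # An unknown unit suffix raises KeyError rather than being guessed.
        fields[name.lower()] = float(value) * UNITS[unit]
    return fields


def pool_nearly_full(info_text, threshold=0.75):
    """True when the used fraction of the data pool reaches the threshold."""
    space = parse_data_space(info_text)
    return space["used"] / space["total"] >= threshold
```

In practice the text would come from running `docker info` on the container instance. Here it is passed in as a string so the parsing can be checked on its own.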

nellh commented on August 19, 2024

Dev is updated with a 2017.03.f compute environment.

@chrisfilo If you can test some of the jobs you've had issues with (TRACULA for example) it would be good to see if this fixes any other storage bugs (or introduces new ones). It takes a lot of jobs to reproduce the out of disk error, so the more jobs run the better.

chrisgorgo commented on August 19, 2024

Running - will let you know when it finishes.

oesteban commented on August 19, 2024

Re-running MRIQC on https://openneuro.org/datasets/ds001060/versions/00001, which also showed this issue.

chrisgorgo commented on August 19, 2024

Hey @oesteban, the fix is on dev (openneuro.dev.sqm.io), not on prod (openneuro.org). You would have to run this job on dev to test it.

oesteban commented on August 19, 2024

Oops, sorry about that.

chrisgorgo commented on August 19, 2024

@nellh The TRACULA job is still running (over 24h now) and should have finished by now: https://openneuro.dev.sqm.io/datasets/ds000118/versions/00001?app=TRACULA&version=10

nellh commented on August 19, 2024

@chrisfilo It looks like it is still running successfully but it was started late, so it has been running for around 12 hours. We did run a fair number of other jobs yesterday on dev and the compute environment is fairly small.

You can see the logs by looking at TRACULA/default/83ec2b17-50d3-4cd5-af19-5358ccf99b72 and TRACULA/default/197c2379-2a61-44b4-8ec1-852c5131a7a0.

One thing I noticed: 83ec2b has some shell errors in the log.

trac-preproc finished without error at Wed Sep 13 14:15:28 UTC 2017

INFO: SUBJECTS_DIR is /output/data
INFO: Diffusion root is /output/data
Actual FREESURFER_HOME /opt/freesurfer

Running /output/data/sub-02/jobs/bedp.pre.txt
Running commands ['bedpostx_preproc.sh /output/data/sub-02_ses-retest.long.sub-02/dmri', 'bedpostx_preproc.sh /output/data/sub-02_ses-test.long.sub-02/dmri']
Copying files to bedpost directory
/usr/lib/fsl/5.0/bedpostx_preproc.sh: 79: [: -eq: unexpected operator
Done

Copying files to bedpost directory
/usr/lib/fsl/5.0/bedpostx_preproc.sh: 79: [: -eq: unexpected operator
Done

chrisgorgo commented on August 19, 2024

The queue is empty:

[image]

But the job is marked as running:

[image]

chrisgorgo commented on August 19, 2024

This is even weirder: no jobs are marked as running on the AWS console, and the OpenNeuro log (production) still shows a job that began but never completed:

[image]

nellh commented on August 19, 2024

@chrisfilo Somehow that job stopped getting polled after dev was updated. I'm looking into why but it lines up with the dev instance being restarted earlier. It looks like the Batch job finished successfully.

chrisgorgo commented on August 19, 2024

How can I check the outputs?

nellh commented on August 19, 2024

@chrisfilo The job is fixed on dev and you can see the results there now. Let me know if anything looks odd with them. I filed #68 to track the polling issue discovered here. To work around it for this job, I had to kick the status in mongo manually.
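
One way to catch jobs that finished in Batch while the application still shows them as running is a reconciliation pass over the tracked job IDs. The following is a minimal sketch, not the fix tracked in #68. It assumes a client exposing the AWS Batch `describe_jobs` call (for example one created with `boto3.client("batch")`), and the function name `find_finished_jobs` is hypothetical. How the stored status is then updated is left out, because the database schema is not described in this thread.

```python
# Statuses after which AWS Batch will not change a job any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def find_finished_jobs(batch_client, job_ids):
    """Return {job_id: status} for tracked jobs that Batch reports as finished.

    `batch_client` is any object with a `describe_jobs(jobs=[...])` method
    that returns a dict with a "jobs" list, as the boto3 Batch client does.
    """
    finished = {}
    # describe_jobs accepts at most 100 job IDs per call, so query in chunks.
    for start in range(0, len(job_ids), 100):
        chunk = job_ids[start:start + 100]
        response = batch_client.describe_jobs(jobs=chunk)
        for job in response["jobs"]:
            if job["status"] in TERMINAL_STATUSES:
                finished[job["jobId"]] = job["status"]
    return finished
```

Running a pass like this when the server starts would cover jobs that completed during a restart, which matches the timing described above.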

chrisgorgo commented on August 19, 2024

Thanks for triggering this. The outputs look good!

nellh commented on August 19, 2024

The Docker updates are deployed. Any currently running jobs are on the old version, but any new jobs will use the new one.
