I have not seen any issues caused by the Docker upgrade on the newest AMI on dev, and it does seem to handle this device mapper issue better, so I'm going to go ahead and update this in production in the next couple of days.
from openneuro.
This looks like a known issue with the device mapper storage driver never freeing disk space. 2017.03.f was recently released, and it looks like the issues that previously prevented an upgrade are fixed. We should test upgrading again on dev.
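One way to watch for this while testing is to track the thin-pool usage that `docker info` reports under the devicemapper driver (the `Data Space Used`, `Data Space Total` and `Data Space Available` lines). A minimal sketch, assuming that output format and Docker's decimal size units; the function names are hypothetical and not part of OpenNeuro:

```python
import re
import subprocess
from typing import Dict

# Docker prints sizes with decimal (SI) units, e.g. "9.5 GB" or "500 MB".
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Matches lines such as " Data Space Used: 9.5 GB"; "Metadata Space ..." lines do not match.
_DATA_SPACE = re.compile(
    r"^[ \t]*Data Space (Used|Total|Available):[ \t]*([\d.]+)[ \t]*([kKMGT]?B)[ \t]*$",
    re.MULTILINE,
)


def devicemapper_space(docker_info: str) -> Dict[str, float]:
    """Return {'used', 'total', 'available'} in bytes parsed from `docker info` text."""
    space = {}
    for field, value, unit in _DATA_SPACE.findall(docker_info):
        space[field.lower()] = float(value) * _UNITS[unit]
    return space


def data_space_fraction_used(docker_info: str) -> float:
    """Fraction of the devicemapper data pool in use (0.0 to 1.0)."""
    space = devicemapper_space(docker_info)
    return space["used"] / space["total"]


def read_docker_info() -> str:
    """Capture `docker info` from the local daemon (requires the docker CLI)."""
    return subprocess.run(
        ["docker", "info"], capture_output=True, text=True, check=True
    ).stdout
```

On a compute host, `data_space_fraction_used(read_docker_info())` could be sampled between jobs; a value that keeps climbing toward 1.0 even after containers exit would be consistent with space never being freed.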
Dev is updated with a 2017.03.f compute environment.
@chrisfilo If you can test some of the jobs you've had issues with (TRACULA, for example), it would be good to see whether this fixes any other storage bugs (or introduces new ones). It takes a lot of jobs to reproduce the out-of-disk error, so the more jobs we run the better.
Running - will let you know when it finishes.
Re-running MRIQC on https://openneuro.org/datasets/ds001060/versions/00001, which also showed this issue.
Hey @oesteban, the fix is on dev (openneuro.dev.sqm.io), not on prod (openneuro.org). You would have to run this job on dev to test it.
Oops, sorry about that.
@nellh The TRACULA job is still running (over 24 h now): https://openneuro.dev.sqm.io/datasets/ds000118/versions/00001?app=TRACULA&version=10. It should have finished by now.
@chrisfilo It looks like it is still running successfully but it was started late, so it has been running for around 12 hours. We did run a fair number of other jobs yesterday on dev and the compute environment is fairly small.
You can see the logs by looking at TRACULA/default/83ec2b17-50d3-4cd5-af19-5358ccf99b72 and TRACULA/default/197c2379-2a61-44b4-8ec1-852c5131a7a0.
One thing I noticed: 83ec2b has some shell errors in its log.
```
trac-preproc finished without error at Wed Sep 13 14:15:28 UTC 2017
INFO: SUBJECTS_DIR is /output/data
INFO: Diffusion root is /output/data
Actual FREESURFER_HOME /opt/freesurfer
Running /output/data/sub-02/jobs/bedp.pre.txt
Running commands ['bedpostx_preproc.sh /output/data/sub-02_ses-retest.long.sub-02/dmri', 'bedpostx_preproc.sh /output/data/sub-02_ses-test.long.sub-02/dmri']
Copying files to bedpost directory
/usr/lib/fsl/5.0/bedpostx_preproc.sh: 79: [: -eq: unexpected operator
Done
Copying files to bedpost directory
/usr/lib/fsl/5.0/bedpostx_preproc.sh: 79: [: -eq: unexpected operator
Done
```
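For context, `[: -eq: unexpected operator` is the message dash (the `/bin/sh` on Debian and Ubuntu images) prints when a `[ ... -eq ... ]` test is missing its left operand, typically because an unquoted variable expanded to nothing, so `[ $n -eq 1 ]` becomes `[ -eq 1 ]`. Whether that is what happens on line 79 of `bedpostx_preproc.sh` is an assumption. The sketch below only reproduces the general pattern and the usual fix (quote the variable and give it a default) by driving `/bin/sh` from Python; `n` and the snippets are hypothetical:

```python
import os
import subprocess

# Hypothetical snippets; `n` stands in for whatever value the real script tests.
BROKEN = '[ $n -eq 1 ] && echo one || echo not-one'
FIXED = '[ "${n:-0}" -eq 1 ] && echo one || echo not-one'


def run_sh(snippet: str, n: str) -> subprocess.CompletedProcess:
    """Run `snippet` under /bin/sh with the environment variable n set to `n`."""
    env = {**os.environ, "n": n}
    return subprocess.run(["/bin/sh", "-c", snippet], env=env,
                          capture_output=True, text=True)


if __name__ == "__main__":
    for name, snippet in (("broken", BROKEN), ("fixed", FIXED)):
        result = run_sh(snippet, n="")  # empty value, e.g. a lookup that returned nothing
        print(name, result.stdout.strip(), result.stderr.strip() or "(no error)")
```

With an empty `n`, the broken test writes an error to stderr and the `||` branch still runs, which matches the log above: the step reports the shell error and then prints `Done` instead of failing outright.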
But the job is still marked as running.
This is even weirder: no jobs are marked as running in the AWS console, and the OpenNeuro log (production) still shows a job that began but never completed.
@chrisfilo Somehow that job stopped getting polled after dev was updated. I'm looking into why, but it lines up with the dev instance being restarted earlier. It looks like the Batch job finished successfully.
How can I check the outputs?
@chrisfilo The job is fixed on dev and you can see the results there now. Let me know if anything looks odd with them. I filed #68 to track the polling issue discovered here; to work around it for this job, I had to kick the status in Mongo manually.
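One way to avoid fixing statuses by hand after a restart would be to re-sync every job still recorded as active whenever the server starts. A minimal sketch of that idea, assuming a store that maps job IDs to their last recorded status (a stand-in for the Mongo collection) and using AWS Batch's job states; `resume_polling` and `batch_describe` are hypothetical names, not OpenNeuro's actual code:

```python
from typing import Callable, Dict, List

# AWS Batch job states; only SUCCEEDED and FAILED are terminal.
TERMINAL = {"SUCCEEDED", "FAILED"}


def resume_polling(store: Dict[str, str],
                   describe: Callable[[List[str]], Dict[str, str]]) -> List[str]:
    """Refresh every job the store still considers active.

    store:    hypothetical job-id -> last recorded status (updated in place).
    describe: returns job-id -> current Batch status for the given ids.
    Returns the ids that are still active and should keep being polled.
    """
    active = [job_id for job_id, status in store.items() if status not in TERMINAL]
    if not active:
        return []
    current = describe(active)
    still_active = []
    for job_id in active:
        # Keep the old status if Batch did not report this job.
        status = current.get(job_id, store[job_id])
        store[job_id] = status
        if status not in TERMINAL:
            still_active.append(job_id)
    return still_active


def batch_describe(job_ids: List[str]) -> Dict[str, str]:
    """Look up job statuses with boto3 (needs AWS credentials at call time)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    client = boto3.client("batch")
    statuses = {}
    for i in range(0, len(job_ids), 100):  # describe_jobs takes up to 100 ids per call
        response = client.describe_jobs(jobs=job_ids[i:i + 100])
        statuses.update({job["jobId"]: job["status"] for job in response["jobs"]})
    return statuses
```

Called once at startup as `resume_polling(store, batch_describe)`, this would have marked the TRACULA job as finished without a manual Mongo update; the returned ids can then be handed back to the regular poll loop.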
Thanks for triggering this. The outputs look good!
The docker updates are deployed. Any currently running jobs are on the old version but any new jobs will use the new one.