We have now submitted PR #91 to the dev branch, which we believe fixes this issue and the related issue #89. Essentially, we have created a script that changes the absolute paths in the MISO data files to relative paths. Big thanks to @jhgraber for pointing us in the right direction. This also means we no longer have to use the scratch directory or copy the files into the working directory. Could those interested please test the PR code on their systems? Testing both locally and in HPC/AWS environments would be great, to make sure we catch all edge cases. Notifying @dkoppstein as well, as this may fix the issues you opened previously.
from rnasplice.
Hi @heathfuqua
Thanks for submitting the bug report. I've assigned Valentino to investigate. Please give us a few days to get back to you.
Thanks!
Hi @heathfuqua
To replicate your issue, I need to know whether you made any changes to the nextflow.config file (I saw you passed one through the -c option). If you used a custom nextflow.config file, please share it with me so I can replicate your run configuration.
Thank you
Sure. It's a minimal config file, just setting up the batch parameters.
aws {
    accessKey = ''
    secretKey = ''
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
process {
    executor = 'awsbatch'
    queue = 'nextflow-queue-2'
}
workDir = 's3://mdibl-nextflow-work/jhfuqua/'
Great, thank you.
Have you specified a containerization method (docker, conda, singularity) for the execution?
I normally run my pipelines with '-profile docker' but only used the 'test' profile for this, assuming that it would have all necessary profile settings.
Could you try with '-profile test,docker' ?
Got the same error.
Hi again. I just wanted to add that the referenced work directory contains not only no .gff3 file but also no index directory. I'm not sure whether this helps narrow down the possibilities, but I thought I should mention it.
Jumping in here (I'm Heath's supervisor). I found what seems to be the problem. The MISO_INDEX process, defined in the local module "miso_index.nf", includes the directive "scratch false". After comparing it with various other modules, this seemed to be one difference worth trying, so I deleted it. With that deletion, the pipeline was able to find the output directory "index" and move it to the working directory, and the process is now moving forward.
As Heath noted above, we are running this using awsbatch as the executor, so perhaps this is specific to that.
ON EDIT: it still failed, but later in the process, at the VISUALISE_MISO:MISO_SETTINGS step. Will dig into that next.
The new error in MISO_SETTINGS appears to be a problem with the ext.args setting, which specifies two prefixes, --bam_prefix and --miso_prefix.
In each case, the value is a concatenation of two parts. For both, the first part is the local directory (on my Mac) from which I am launching the Nextflow process.
The second part of the incorrect BAM prefix is the star_salmon subdirectory of the publish directory.
The second part of the incorrect MISO prefix is the misopy subdirectory of the publish directory.
I haven't yet found where this setting is created.
ON EDIT: this setting is set explicitly in the conf/modules.config file. I disabled it, and MISO_SETTINGS now runs successfully; however, MISO_SASHIMI now fails. This appears to be an issue with "sashimi_plot", which looks for the MISO index directory under /tmp/nxf.XXXX149UyS/, even though the index directory is located in the current working directory (verified by looking at the .command.run file).
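As a rough illustration of the relative-prefix idea, the sketch below renders the [data] section of a sashimi_plot settings file with bam_prefix and miso_prefix set to "./", so both resolve against the task's working directory instead of the launch directory on the submitting machine. It assumes the layout of MISO's example sashimi_plot settings file (bam_prefix, miso_prefix, bam_files and miso_files under [data]); the function name build_sashimi_settings is hypothetical and is not what the MISO_SETTINGS module actually runs.

```python
import configparser
import io
import json


def build_sashimi_settings(bam_files, miso_files, bam_prefix="./", miso_prefix="./"):
    """Render a sashimi_plot-style [data] section as a string (hypothetical helper).

    Assumption: the keys follow MISO's example settings file. Prefixes default
    to "./" so they resolve relative to the task directory, not the launch dir.
    """
    # interpolation=None so values containing '%' are written verbatim
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["data"] = {
        "bam_prefix": bam_prefix,
        "miso_prefix": miso_prefix,
        # json.dumps of a list of strings is also a valid Python list literal
        "bam_files": json.dumps(list(bam_files)),
        "miso_files": json.dumps(list(miso_files)),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

For example, build_sashimi_settings(["ERR204916_sorted.bam"], ["ERR204916"]) yields a BAM entry that resolves to ./ERR204916_sorted.bam, which only works if that BAM is actually staged into the task directory.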
I haven't yet been able to figure out how to override where sashimi_plot looks for the index.
THAT SAID, the rest of the pipeline now runs successfully on awsbatch with --sashimi_plot=false.
Hi @heathfuqua @jhgraber
Thank you both for pointing out the issues.
Similar problems were highlighted in issues #74 and #75, and we are still working on those to find a proper solution.
We'll keep you updated.
Hello again. I worked on this a bit further today. You may already know this, but just in case: I have identified where the problem with sashimi_plot lies. It's an issue in misopy itself, in that the miso_data files hard-code the absolute paths to the "pickle" files on the local machine. As far as I can tell, these paths are in the files found in "miso_data/[sampleID]/batch-genes/". (They're also in the batch-logs files, but I doubt that affects downstream steps.)
A typical line in the batch-genes files (named batch-0_genes.txt, batch-1_genes.txt, ...) looks like this:
ENSG00000102054 /tmp/nxf.XXXXLWuex4/index/chrX/ENSG00000102054.pickle
I don't see any obvious command-line option in MISO that would disable the use of absolute paths. However, I think you might be able to get around it by globally deleting prefixes of the form "/tmp/nxf.XXXXLWuex4/" from all of these files, so that MISO instead looks for a relative path starting at index in the current directory. That is how the pipeline currently places the index in the working directory on AWS (once I disabled the scratch option).
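The prefix-stripping idea above could be sketched in Python roughly as follows. This is not the script from PR #91: it assumes the scratch prefix always matches /tmp/nxf.<suffix>/ and that the files follow the miso_data/<sample>/batch-genes/batch-N_genes.txt layout described above. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: Nextflow scratch prefixes look like "/tmp/nxf.XXXXLWuex4/".
SCRATCH_PREFIX = re.compile(r"/tmp/nxf\.[^/\s]+/")


def strip_scratch_prefix(text: str) -> str:
    """Drop every scratch prefix so pickle paths start at index/."""
    return SCRATCH_PREFIX.sub("", text)


def relativise_batch_genes(miso_data: Path) -> int:
    """Rewrite miso_data/<sample>/batch-genes/batch-*_genes.txt in place.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in sorted(miso_data.glob("*/batch-genes/batch-*_genes.txt")):
        text = path.read_text()
        new_text = strip_scratch_prefix(text)
        if new_text != text:
            path.write_text(new_text)
            changed += 1
    return changed
```

Applied to the example line above, strip_scratch_prefix turns it into "ENSG00000102054 index/chrX/ENSG00000102054.pickle". On its own this leaves index/genes_to_filenames.shelve.dat untouched, so sashimi_plot may still pick up absolute paths through that mapping.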
ON EDIT: unfortunately, it's still not that simple. I tried it on a local machine, and the offending file is index/genes_to_filenames.shelve.dat, a binary file (probably a database) that maps gene IDs to filenames, and the filenames stored in it are clearly absolute.
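If the binary file is a Python shelve database, which the .shelve.dat name suggests (Python's dbm.dumb backend stores its data in a .dat file alongside a .dir file), its values could in principle be rewritten the same way. The sketch below is hypothetical: it assumes the mapping can be opened with Python 3's shelve module and that each value is a path string. misopy itself runs on Python 2, so a shelve written there may not load under Python 3 as-is; treat this as an illustration, not a verified fix.

```python
import shelve


def relativise_shelve(shelve_path: str, old_prefix: str, new_prefix: str = "") -> int:
    """Rewrite shelve values that start with old_prefix (hypothetical helper).

    Assumes gene IDs map to str paths. Returns the number of entries changed.
    """
    changed = 0
    with shelve.open(shelve_path, writeback=False) as db:
        for gene_id in list(db.keys()):  # copy keys before mutating
            filename = db[gene_id]
            if isinstance(filename, str) and filename.startswith(old_prefix):
                db[gene_id] = new_prefix + filename[len(old_prefix):]
                changed += 1
    return changed
```

With the dbm.dumb backend, the path passed to shelve.open omits the .dat suffix (for example "index/genes_to_filenames.shelve"), and old_prefix has to be whatever scratch prefix is actually stored in the file.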
We are successfully running the pipeline with --sashimi_plot=false for the time being.
We tried it again using the AWS execution engine, and unfortunately it's still not working. The error that stopped the run occurred in VISUALISE_MISO:MISO_SASHIMI, but when I went back through the process, it was clear that VISUALISE_MISO:MISO_RUN had also failed to execute properly.
I spent a little time reading through the .command.run files as well as some of the others, and my best guess so far is that it's an issue with the channels. For example, the index directory is not being created as a directory; it is instead just a file. Comparing the .command.run files from different runs: when MISO_RUN previously ran correctly, the channel led to an S3 download statement:
downloads+=("nxf_s3_download s3://mdibl-nextflow-work/jhgraber/rnasplice/3a/46a2e1dba62bc98c2b662b360ecf18/index index")
In contrast, in the run that just failed, the equivalent setup was a symbolic link (which would definitely not work in the AWS environment):
ln -s /mdibl-nextflow-work/jhfuqua/rna_splice_test/ae/2e9df2bb10ce05289025b6c51be3ab/index index
To me, that points to an issue with the channel connecting MISO_INDEX to MISO_RUN and MISO_SASHIMI.
One more issue: the exact error message was this:
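The difference between the two staging lines above can be checked mechanically. This hypothetical helper scans the text of a .command.run file and reports whether a named input (index by default) is staged by nxf_s3_download or by ln -s. The regular expressions are written against the two lines quoted above and may not cover every form Nextflow generates.

```python
import re

# Patterns modelled on the two staging lines quoted above (assumption:
# other Nextflow versions may format these lines differently).
S3_DOWNLOAD = re.compile(r'nxf_s3_download\s+(s3://\S+)\s+([^\s")]+)')
SYMLINK = re.compile(r"^\s*ln -s\s+(\S+)\s+(\S+)\s*$")


def staging_mode(command_run: str, name: str = "index"):
    """Return ("s3", source), ("symlink", source) or (None, None) for `name`."""
    for line in command_run.splitlines():
        line = line.strip()
        m = S3_DOWNLOAD.search(line)
        if m and m.group(2) == name:
            return "s3", m.group(1)
        m = SYMLINK.match(line)
        if m and m.group(2) == name:
            return "symlink", m.group(1)
    return None, None
```

Passing the contents of each MISO_RUN or MISO_SASHIMI task's .command.run to staging_mode would show at a glance whether index was downloaded from S3 or symlinked to a path that does not exist on the Batch worker.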
IOError: [Errno 2] could not open alignment file ./ERR204916_sorted.bam
: No such file or directory`, size: 1580 (max: 255)
This seems to indicate that the BAM files should be provided to the MISO_SASHIMI module, but as far as I can tell from the VISUALISE_MISO subworkflow, the BAM channel isn't connected to MISO_SASHIMI.
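One way to confirm this kind of failure is to check, inside the task directory, that every BAM listed for sashimi_plot exists under bam_prefix together with its index. The sketch below is hypothetical and assumes samtools-style <file>.bam.bai index names; it only reports what is missing and does not change the pipeline.

```python
from pathlib import Path


def missing_alignments(bam_prefix, bam_files, workdir="."):
    """List BAM files (or their .bai indexes) that cannot be found.

    Assumption: each index is named <bam>.bai next to the BAM.
    """
    missing = []
    base = Path(workdir) / bam_prefix
    for name in bam_files:
        bam = base / name
        for candidate in (bam, bam.with_name(bam.name + ".bai")):
            if not candidate.is_file():
                missing.append(str(candidate))
    return missing
```

Run from the failing task directory, missing_alignments("./", ["ERR204916_sorted.bam"]) would list ERR204916_sorted.bam if the BAM channel never staged it there.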
I hope this is helpful.
That's disappointing. Could you provide the logs for us to inspect, either here or by direct message on the Slack channel if they contain any private information? Could you also confirm that you are running the PR code? For example, you mention that the MISO_SASHIMI module requires a BAM channel, but if you look at the PR code for that module and subworkflow, you can see we added a ch_bam_bai channel. See the #91 code for subworkflows/local/visualise_miso.nf and modules/local/miso_sashimi.nf.
OK. I didn't notice that the PR hadn't yet been merged into the dev branch, so my last run didn't reflect its changes. I will clone the PR and run that.
Good news. After running the latest PR, the pipeline completed without errors.
Thanks for your help.
That’s great news, I will leave the issue open until @jhgraber reports back as well.
All good here. My comments above were about Heath's run of the data. He has confirmed both that it ran without errors and that it produced MISO output. That's great news. Thanks for getting it updated so quickly!
Related Issues
- dexseq: stager.R: subscript out of bounds.
- prepare_genome gencode
- check_contrastsheet not identifying headers even though they're present
- DRIMSEQ_FILTER error
- ERROR ~ No such variable: Exception evaluating property 'out' for nextflow.script.ChannelOut
- SUPPA cluster events error
- GTF_2_GFF3 fails with latest Gencode GTF for human
- MISO_SASHIMI step fails when using scratch directory
- MISO_SASHIMI looks in NXF_HOME subdirectory instead of working directory for output files
- The default --miso_genes are invalid when --gencode is specified and cause the pipeline to fail
- EDGER_EXON fails if only single entry in contrast sheet
- STAGER error: subscript out of bounds
- The pipeline should be able to infer strandedness from FASTQ (i.e. allow "auto" for strandedness in the CSV)
- Error with rmats.py when running with multiple contrasts: <filename>.bam not found in .rmats files
- MISO_SASHIMI error: Could not find MISO output files
- example contrastsheet.csv missing