
Comments (19)

jma1991 commented on May 28, 2024

We have now submitted PR #91 to the dev branch, which we believe fixes this issue and the related issue #89. Essentially, we have created a script that changes the absolute paths in the MISO data files to relative paths. Big thanks to @jhgraber for pointing us in the right direction. This also means we no longer have to use the scratch directory or copy the files into the working directory. Could those interested please test the PR code on their systems? Testing both locally and through your HPC/AWS environments would be great, to make sure we catch all edge cases. Notifying @dkoppstein as well, as this may fix the issues you opened previously.

from rnasplice.

jma1991 commented on May 28, 2024

Hi @heathfuqua

Thanks for submitting the bug report. I've assigned Valentino to investigate. Please give us a few days to get back to you.

Thanks!

valentinoruggieri commented on May 28, 2024

Hi @heathfuqua

To replicate your issue, I need to know whether you made any changes to the nextflow.config file (I saw you passed it through the -c option). If you used a custom nextflow.config file, please share it with me so I can replicate your configuration for the run.

Thank you

heathfuqua commented on May 28, 2024

Sure. It's a minimal config file, just setting up the batch parameters.

```groovy
aws {
    accessKey = ''
    secretKey = ''
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}

process {
    executor = 'awsbatch'
    queue = 'nextflow-queue-2'
}

workDir = 's3://mdibl-nextflow-work/jhfuqua/'
```

valentinoruggieri commented on May 28, 2024

Great, thank you.
Have you specified a containerization method (Docker, Conda, Singularity) for the execution?

heathfuqua commented on May 28, 2024

I normally run my pipelines with '-profile docker', but I only used the 'test' profile for this, assuming it would include all the necessary profile settings.

valentinoruggieri commented on May 28, 2024

Could you try with '-profile test,docker'?

heathfuqua commented on May 28, 2024

Got the same error.

heathfuqua commented on May 28, 2024

Hi again. I just wanted to add that the referenced work directory contains neither a .gff3 file nor an /index directory. I'm not sure whether this helps narrow down the possibilities, but I thought I should mention it.

jhgraber commented on May 28, 2024

Jumping in here (I'm Heath's supervisor). I found what seems to be the problem. The MISO_INDEX process, defined in the local module "miso_index.nf", includes the directive "scratch false". After comparing it with various other modules, this stood out as one difference worth testing, so I deleted it. With that deletion, the pipeline was able to find the output directory "index" and move it to the working directory, and the process is now moving forward.

As Heath noted above, we are running this using awsbatch as the executor, so perhaps this is specific to that.

ON EDIT: it still failed, but later in the process, at the VISUALISE_MISO:MISO_SETTINGS step. I will dig into that next.

jhgraber commented on May 28, 2024

The new error in MISO_SETTINGS appears to be a problem with the ext.args setting, which is specifying the two prefixes, --bam_prefix and --miso_prefix.

In both cases, the value is a concatenation of two parts. The first part is the local directory (on my Mac) from which I am launching the Nextflow process.

The second part of the incorrect BAM prefix is the star_salmon subdirectory of the publish directory.
The second part of the incorrect MISO prefix is the misopy subdirectory of the publish directory.
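To illustrate why the concatenated prefixes fail on AWS Batch, here is a minimal Python sketch that renders the `[data]` section of a sashimi_plot settings file with `configparser`. It assumes the settings use `bam_prefix` and `miso_prefix` keys in a `[data]` section (the layout of MISO's example settings file) and that the BAM files and MISO outputs are staged into the task's working directory; `build_settings`, the launch directory, and the `results` publish directory name are hypothetical. A prefix built from the launch directory only exists on the local machine, while a relative prefix resolves against whichever directory the task runs in.

```python
import configparser
import io
import os


def build_settings(bam_prefix: str, miso_prefix: str) -> str:
    """Render a minimal [data] section for a sashimi_plot-style settings file.

    Hypothetical helper: only the two prefix keys are written; a real
    settings file also lists bam_files, miso_files and plotting options.
    """
    config = configparser.ConfigParser()
    config["data"] = {"bam_prefix": bam_prefix, "miso_prefix": miso_prefix}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


# What the thread describes: the launch directory on the local machine
# joined with a publish subdirectory. This path does not exist on a Batch worker.
launch_dir = "/Users/me/projects/rnasplice-run"  # hypothetical launch directory
broken = build_settings(
    os.path.join(launch_dir, "results", "star_salmon"),
    os.path.join(launch_dir, "results", "misopy"),
)

# Relative prefixes resolve against the task directory, where inputs are staged.
portable = build_settings("./", "./")
print(portable)
```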

I haven't yet found where this setting is being created.

ON EDIT: this setting is put in place manually in the conf/modules.config file. I disabled it, and MISO_SETTINGS now runs successfully; however, MISO_SASHIMI now fails. This appears to be an issue with "sashimi_plot", which searches for a directory /tmp/nxf.XXXX149UyS/ where the MISO index directory should be found, even though the directory is located in the current working directory (verified by looking at the .command.run file).

I haven't yet figured out how to override where sashimi_plot looks for the index.

That said, the rest of the pipeline now runs successfully on awsbatch with --sashimi_plot=false.

valentinoruggieri commented on May 28, 2024

Hi @heathfuqua @jhgraber
Thank you both for pointing out the issues.
Similar problems were highlighted in issues #74 and #75, and we are still working on those to find a proper solution.
We'll keep you updated.

jhgraber commented on May 28, 2024

Hello again. I worked on this a bit further today. You may already know this, but just in case: I have identified where the problem with sashimi_plot lies. It's directly an issue with misopy, in that the miso_data files hard-code the absolute path to the "pickle" files on the local machine. As far as I can tell, these paths are in the files found in "miso_data/[sampleID]/batch-genes/". (They are also in the batch-logs files, but I doubt that affects downstream steps.)

A typical line in the batch-genes files (named batch-0_genes.txt, batch-1_genes.txt, ...) looks like this:
ENSG00000102054 /tmp/nxf.XXXXLWuex4/index/chrX/ENSG00000102054.pickle

I don't see any obvious command-line option to MISO that would disable the use of absolute paths. However, I think you might be able to get around it by globally deleting prefixes of the form "/tmp/nxf.XXXXLWuex4/" from all of these files, so that MISO instead looks for a relative path starting at index in the current directory. That is how the pipeline currently places the index in the working directory on AWS (once I disabled the scratch option).
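The prefix-stripping idea above can be sketched in Python as follows. This is a minimal illustration, not the script that went into PR #91: it assumes every prefix looks like `/tmp/nxf.<random>/` (generalised from the single example line quoted above) and that the files follow the `miso_data/[sampleID]/batch-genes/batch-N_genes.txt` layout described in this comment; `relativize_text` and `relativize_batch_genes` are hypothetical names.

```python
import re
from pathlib import Path

# Matches a Nextflow scratch prefix such as "/tmp/nxf.XXXXLWuex4/".
# The pattern is an assumption generalised from one example line.
SCRATCH_PREFIX = re.compile(r"/tmp/nxf\.[^/\s]+/")


def relativize_text(text: str) -> str:
    """Drop scratch prefixes so paths start at index/ in the task directory."""
    return SCRATCH_PREFIX.sub("", text)


def relativize_batch_genes(miso_data: Path) -> int:
    """Rewrite every batch-*_genes.txt under miso_data in place.

    Hypothetical helper. Returns the number of files that were changed.
    """
    changed = 0
    for path in miso_data.glob("*/batch-genes/batch-*_genes.txt"):
        text = path.read_text()
        new_text = relativize_text(text)
        if new_text != text:
            path.write_text(new_text)
            changed += 1
    return changed


line = "ENSG00000102054 /tmp/nxf.XXXXLWuex4/index/chrX/ENSG00000102054.pickle"
print(relativize_text(line))
# ENSG00000102054 index/chrX/ENSG00000102054.pickle
```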

ON EDIT: it's still not that simple, unfortunately. I tried it on a local machine, and the offending file is index/genes_to_filenames.shelve.dat, a binary file (probably a database) that maps gene IDs to filenames; the filenames in this database are clearly absolute.
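If the binary index is a Python `shelve`, its values could in principle be rewritten with the standard library. The sketch below is speculative: it assumes the `.shelve.dat` suffix means the `dbm.dumb` backend was used, that each value is a plain path string, and that MISO (a Python 2 package) can read pickles written with protocol 2 by Python 3. It has not been tested against a real MISO index, and `open_index`/`relativize_shelve` are hypothetical helpers.

```python
import dbm.dumb
import os
import re
import shelve
import tempfile

# Assumed scratch-prefix shape, e.g. "/tmp/nxf.XXXXLWuex4/".
SCRATCH_PREFIX = re.compile(r"^/tmp/nxf\.[^/]+/")


def open_index(base_path: str, flag: str = "w") -> shelve.Shelf:
    # base_path excludes the ".dat"/".dir" suffixes. Force the dumb backend
    # and pickle protocol 2, which Python 2 can still unpickle.
    return shelve.Shelf(dbm.dumb.open(base_path, flag), protocol=2)


def relativize_shelve(base_path: str) -> int:
    """Strip scratch prefixes from string values; return entries changed."""
    changed = 0
    with open_index(base_path) as db:
        for key in list(db.keys()):
            value = db[key]
            if isinstance(value, str):
                new_value = SCRATCH_PREFIX.sub("", value)
                if new_value != value:
                    db[key] = new_value
                    changed += 1
    return changed


# Demo on a throwaway shelve that mimics one entry from the thread.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "genes_to_filenames.shelve")
    with open_index(base, "c") as db:
        db["ENSG00000102054"] = "/tmp/nxf.XXXXLWuex4/index/chrX/ENSG00000102054.pickle"
    n_changed = relativize_shelve(base)
    with open_index(base, "r") as db:
        rewritten = db["ENSG00000102054"]

print(n_changed, rewritten)
```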

We are successfully running the pipeline with --sashimi_plot=false for the time being.

jhgraber commented on May 28, 2024

We tried it again using the AWS execution engine, and unfortunately it's still not working. The specific error that stopped the process came in VISUALISE_MISO:MISO_SASHIMI; however, when I went back through the process, it was clear that VISUALISE_MISO:MISO_RUN had also failed to execute properly.

I spent a little time reading through the .command.run files as well as some of the others, and my best guess so far is that it's an issue with the channels. For example, the index directory is not being created as a directory but instead as a plain file. Comparing the .command.run files from different runs: when MISO_RUN previously ran correctly, the channel led to an S3 download statement:

downloads+=("nxf_s3_download s3://mdibl-nextflow-work/jhgraber/rnasplice/3a/46a2e1dba62bc98c2b662b360ecf18/index index")

In contrast, in the run that just failed, the equivalent setup was a symbolic link (which would definitely not work in the AWS environment):

ln -s /mdibl-nextflow-work/jhfuqua/rna_splice_test/ae/2e9df2bb10ce05289025b6c51be3ab/index index

To me, that indicates an issue with the channel connecting MISO_INDEX to MISO_RUN and MISO_SASHIMI.
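A quick way to compare runs like this is to scan each task's .command.run for how an input is staged. The Python sketch below is a hypothetical diagnostic that only recognises the two line shapes quoted above (an `nxf_s3_download` entry and an `ln -s` symlink); other staging modes are ignored, and `classify_staging` is an illustrative name.

```python
import re
from typing import Dict

# The two staging styles quoted in this thread.
S3_DOWNLOAD = re.compile(r'nxf_s3_download\s+(s3://\S+)\s+(\S+?)"?\)?$')
SYMLINK = re.compile(r"^ln -s\s+(\S+)\s+(\S+)$")


def classify_staging(command_run: str) -> Dict[str, str]:
    """Map each staged input name to how .command.run stages it."""
    staging: Dict[str, str] = {}
    for raw in command_run.splitlines():
        line = raw.strip()
        m = S3_DOWNLOAD.search(line)
        if m:
            staging[m.group(2)] = "s3_download"
            continue
        m = SYMLINK.match(line)
        if m:
            staging[m.group(2)] = "symlink"
    return staging


good = 'downloads+=("nxf_s3_download s3://mdibl-nextflow-work/jhgraber/rnasplice/3a/46a2e1dba62bc98c2b662b360ecf18/index index")'
bad = "ln -s /mdibl-nextflow-work/jhfuqua/rna_splice_test/ae/2e9df2bb10ce05289025b6c51be3ab/index index"
print(classify_staging(good), classify_staging(bad))
```

On AWS Batch, a `symlink` result for an input that lives in the S3 work directory is the suspicious case described here.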

One more issue as well: the exact error that came out was this:
IOError: [Errno 2] could not open alignment file ./ERR204916_sorted.bam: No such file or directory`, size: 1580 (max: 255)

This seems to indicate that the BAM files should be provided to the MISO_SASHIMI module, but as far as I can tell from looking at the VISUALISE_MISO subworkflow, the BAM channel isn't connected to MISO_SASHIMI.
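The error above is what happens when a task runs without its alignments staged. A pre-flight check along the lines of the Python sketch below could make that failure explicit. It is hypothetical: it assumes the `<sample>_sorted.bam` naming seen in the error message and assumes MISO reads indexed BAMs, so both the BAM and a `<bam>.bai` index must be present in the task directory; `missing_alignments` is an illustrative name.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_alignments(workdir: Path, samples: Iterable[str]) -> List[str]:
    """Return the BAM/BAI file names expected in workdir but not found."""
    missing: List[str] = []
    for sample in samples:
        bam = workdir / f"{sample}_sorted.bam"
        bai = bam.with_name(bam.name + ".bai")
        for path in (bam, bai):
            if not path.exists():
                missing.append(path.name)
    return missing


# Demo: a task directory where only the BAM was staged, not its index.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "ERR204916_sorted.bam").touch()
    result = missing_alignments(work, ["ERR204916"])

print(result)  # ['ERR204916_sorted.bam.bai']
```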

I hope this is helpful.

jma1991 commented on May 28, 2024

That's disappointing. Could you provide the logs for us to inspect, either here or by direct message on the Slack channel if they contain any private information? Could you also confirm that you are running the PR code? For example, you mention that the MISO_SASHIMI module requires a BAM channel, but if you look at the PR code for that module and subworkflow, you can see we added a ch_bam_bai channel. See the #91 code for subworkflows/local/visualise_miso.nf and modules/local/miso_sashimi.nf.

heathfuqua commented on May 28, 2024

OK. I didn't notice that the PR hadn't yet been merged into the dev branch, so my last run didn't reflect the changes in that PR. I will clone the PR and run that.

heathfuqua commented on May 28, 2024

Good news. After running the latest PR, the pipeline completed without errors.
Thanks for your help.

jma1991 commented on May 28, 2024

That's great news. I will leave the issue open until @jhgraber reports back as well.

jhgraber commented on May 28, 2024

All good here. My comments above were about Heath's run of the data. He has confirmed both that it ran without errors and that it produced MISO output. That's great news. Thanks for getting it updated so quickly!
