
Comments (12)

lczech commented on September 5, 2024

Hi Ben,

interesting. There are several things that might be off here.

Job needs threads=5 but only threads=3 are available. This is likely because two jobs are connected via a pipe and have to run simultaneously. Consider providing more resources (e.g. via --cores).

You might want to try using more cores to resolve this warning. However, that is probably not the cause here.

The problem here seems to be with samtools stats, which is a quality control tool that is independent of freebayes and GATK HaplotypeCaller, and hence can be executed by the pipeline at any time. It might therefore be a coincidence that it fails before freebayes is even started, and you might run into the same problem with HaplotypeCaller as well. Hence: Did the same configuration work for you with only the calling tool changed? Was that with the exact same grenepipe version?

One thing that might help: Could you please post the samtools stats log file here? It should be at logs/samtools-stats/111D03-1.log.

Cheers and so long
Lucas

from grenepipe.

bensprung commented on September 5, 2024

I'm actually running on a pretty old 4-core machine so I only allocate 3 cores, but yes--it works fine with HaplotypeCaller set instead of freebayes and everything else exactly the same. Unfortunately, the log file logs/samtools-stats/111D03-1.log is not actually created so I can't post it (you can see it complaining that it doesn't exist in the error message above). The logs/samtools-stats directory exists but there are no files in it.

Interestingly, for the runs where HaplotypeCaller is used (with success), that samtools-stats log file exists but it's totally empty.


bensprung commented on September 5, 2024

I'm back from other projects trying to troubleshoot this. Is there a way, for a particular config.yaml, to get grenepipe to produce a list of the commands that it is going to do, without actually running them? I was hoping to run each step by hand as it were to try to get a more precise sense of what the issue is.

Also, if I allocate another core, I just get Job needs threads=6 but only threads=4 are available.


lczech commented on September 5, 2024

Hi @bensprung,

thanks for digging into this!

Is there a way, for a particular config.yaml, to get grenepipe to produce a list of the commands that it is going to do, without actually running them?

Kind of. Snakemake offers the option --dry-run to list all rules that are going to be executed, see here. This will give you the tools and their input and output files, but you will have to somehow cobble together the actual command lines to execute. I don't think there is another way, as the construction of command lines and their subsequent execution are part of scripts that are just executed as a whole by snakemake.
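As a rough sketch of the dry-run idea, the following Python helper only assembles a Snakemake command line and prints it; it does not run anything. `--dry-run`, `--printshellcmds`, `--cores`, and `--directory` are existing Snakemake options, while the helper name `dry_run_command` and the path `/path/to/analysis` are placeholders made up for illustration. It assumes the run directory containing config.yaml is passed via `--directory`. Because many of the command lines are constructed inside scripts, `--printshellcmds` will likely not show every command.

```python
import shlex


def dry_run_command(analysis_dir: str, cores: int) -> list[str]:
    """Assemble (but do not run) a Snakemake dry-run invocation."""
    return [
        "snakemake",
        "--cores", str(cores),
        "--directory", analysis_dir,  # hypothetical directory with config.yaml
        "--dry-run",                  # list the jobs without executing them
        "--printshellcmds",           # print shell commands where rules define them
    ]


if __name__ == "__main__":
    # Print a command line that can be pasted into a shell.
    print(shlex.join(dry_run_command("/path/to/analysis", cores=3)))
```

Redirecting the output of the printed command to a file makes it easier to compare runs with different calling tools.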

However, I am not entirely sure that this is necessary. According to your above log, you can see which tools fail, can you not? So you'd only need to execute the failing ones by hand, I think.

Also, if I allocate another core, I just get Job needs threads=6 but only threads=4 are available.

As for that, yes, I see. I am developing on my 8-core (16 with hyperthreading) laptop, and the pipeline is mostly geared towards even larger systems such as clusters. Hence, I did not optimize it for smaller laptops. However, changing the pipeline to work without warnings on 4 cores or fewer does not make much sense to me: Such a change would make it slower on larger systems and datasets. And for small datasets that can be run on your laptop, it does not matter much anyway - you should be able to just use --cores 6. This will of course oversubscribe your cores, and so your laptop will be slow while running, but it should work. For larger datasets where this is inconvenient, I would suggest using a larger machine or cluster anyway.

Let me know if that helped or if you need any further input for now! Worst case, send me your data, and I can help with debugging.

Cheers and a happy holiday season!
Lucas


bensprung commented on September 5, 2024

Well I can't really tell tbh. It seems like samtools-stats is failing. I will look at --dry-run.

The strange thing is I can run with 1 core with the default tools with no errors. But it's only changing the caller to freebayes that creates this error--but it's very early in the pipeline. Seems weird?


lczech commented on September 5, 2024

That is indeed weird. The core issue should at most lead to snakemake complaining or failing. But these errors seem to come from the tools being run, and not from snakemake itself... I did have issues in the past where one tool complained, but another was at fault, by producing erroneous or empty output files. You could check that the files that samtools stats wants to use (e.g., dedup/111D03-1.bam) are correct.

If that does not help - would you mind sharing your data or part of it with me?


bensprung commented on September 5, 2024

Hrm, I don't think it's a problem with the bam file, because it turns out (surprisingly) that I get the same error with --dry-run, but only if calling-tool is set to freebayes. If it is set to haplotypecaller or bcftools it completes without issue. I'll attach the output of --dry-run for all three.

Happy to send the data if you think it makes sense.

freebayes.txt
haplotypecaller.txt
bcftools.txt


lczech commented on September 5, 2024

Oh interesting, that error is indeed simply caused by too few cores. I thought snakemake would handle that differently, sorry for that. As said above, just run it with --cores 6 - that should work, but will make your computer slow while the pipeline is running. As I would not recommend running the pipeline for any large dataset on a laptop anyway, that should not be a limitation, and hence suffice for testing ;-)
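To illustrate why too few cores can trigger the warning, here is a simplified model in Python. It is not Snakemake's actual scheduler. It assumes that each job's thread count is capped at the value given to --cores, and that jobs connected via a pipe must run simultaneously, so their capped thread counts add up. The group `[4, 1, 1]` (one job declaring 4 threads plus two single-threaded jobs) is hypothetical, chosen only because it reproduces the numbers quoted in this thread; the actual rules and settings may differ.

```python
def group_thread_demand(declared_threads: list[int], cores: int) -> int:
    """Threads a pipe-connected group needs when all its jobs run at once.

    Simplified model (assumption): each job's thread count is capped at
    `cores`, and the capped values are summed over the group.
    """
    return sum(min(t, cores) for t in declared_threads)


# Hypothetical group: one job declaring 4 threads plus two single-threaded jobs.
hypothetical_group = [4, 1, 1]

for cores in (3, 4, 6):
    demand = group_thread_demand(hypothetical_group, cores)
    status = "fits" if demand <= cores else "does not fit"
    print(f"--cores {cores}: group needs {demand} threads, {status}")
```

Under these assumed values, the model needs 5 threads with --cores 3 and 6 threads with --cores 4, matching the two warnings above, and the group fits with --cores 6.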


bensprung commented on September 5, 2024

Ok will try that. Any idea why it only happens with freebayes as the caller?


lczech commented on September 5, 2024

Yes, because the freebayes rules are implemented to use more cores by default, see the config file. You can change this setting as well (instead of changing --cores) to help with the issue.


bensprung commented on September 5, 2024

Got it. So, changing threads: 8 for freebayes in the config file didn't yield a completed run (I tried ramping it down all the way to 1 but still continued to get various odd errors) but running snakemake with --cores 8 (which I didn't think I could do with only 4 physical cores) worked. Thank you!


lczech commented on September 5, 2024

Ah nice, glad to hear it worked out now! Closing the issue now, but feel free to re-open if needed.

Some things remain though:

I tried ramping it down all the way to 1 but still continued to get various odd errors

Hm, what exactly happened there?

which I didn't think I could do with only 4 physical cores

Ah yes, it's possible to over-subscribe your cores. As said, that will make your computer slow for a while, but it is absolutely okay to do, technically speaking.

