Comments (4)

lczech commented on July 30, 2024

Hi Rifa,

Hm, from the log files you sent, I cannot tell what is going on there. You can try without the --rerun-incomplete flag; maybe Snakemake thinks that some of the files are incomplete when in fact they are complete. Also add the --reason flag, which prints the reason for each job being executed. If you combine that with -n (dry run), you can check what Snakemake wants to do without actually using up resources. Maybe post the resulting output here as well, so that I can have a look.
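
A minimal sketch of such a dry run, saved to a file that can be posted. The helper name `dry_run_with_reasons`, the output file name, and the run directory in the usage line are placeholders and not part of grenepipe; it assumes `snakemake` is on the PATH and accepts `--reason`, as the Snakemake versions bundled with grenepipe at the time did.

```shell
# Hypothetical helper: dry-run Snakemake and save the per-job reasons to a file.
# Usage: dry_run_with_reasons <output-file> [extra snakemake arguments...]
dry_run_with_reasons() {
    out_file="$1"
    shift
    # -n: dry run, no job is executed; --reason: print why each job would run.
    # --rerun-incomplete is deliberately left out.
    # Snakemake writes its job listing to stderr, so both streams are captured.
    snakemake -n --reason "$@" > "$out_file" 2>&1
}
```

For example, `dry_run_with_reasons dry_run.txt --directory /path/to/run` writes the planned jobs and their reasons to `dry_run.txt` without starting anything.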

In general, Snakemake usually repeats jobs like this because some of the input files were modified after the outputs were created. For instance, it could be that some of the reference genome index files got updated later on, and because those are inputs to the mapping, Snakemake then thinks "oh, updated files in the input, better re-generate the output", although the contents of those files might not have changed... That's what's usually behind these things.
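
Whether timestamps are the trigger can be checked by listing the inputs that carry a later modification time than an existing output. This is a sketch only: the helper name `list_newer_inputs` and the paths in the usage line are placeholders, and it compares modification times the way the shell does, which approximates but is not Snakemake's own logic.

```shell
# Hypothetical helper: print every input file that is newer than a given output file.
# Usage: list_newer_inputs <output-file> <input-file>...
list_newer_inputs() {
    output="$1"
    shift
    for input in "$@"; do
        # -nt is true when the left file has a later modification time.
        if [ "$input" -nt "$output" ]; then
            printf '%s\n' "$input"
        fi
    done
}
```

For example, `list_newer_inputs path/to/sample.sorted.bam /path/to/reference.fasta*` prints the reference files that would look "updated" relative to that BAM file; no output means none of them is newer.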

If that is the case, you can try to manually reset the time stamps of those reference genome index files. It's just a few files, so that should not be too much effort. That way, Snakemake will think that they are older than your mapped files, and will not try to update the mapping again. See here for how to do that.

If that is the reason, the files in question are

/dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

in your case. You can then probably just do

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

to reset all of them to the first of July (or any other date that is older than your mapping run).

Also, as you are using grenepipe v0.12.2: I recently had users with issues due to conda having broken backwards compatibility. Did you get it to work still? Or when did you install it and use it for the first time (which is when the conda environments it needs are created)? Interesting that you still got this to work :-) There is a new version out now, which updates all conda-related things, as well as Snakemake. Maybe you could try that as well; it could be that the new version of Snakemake is smarter with respect to your file issue. However, that will definitely require at least one full re-run, as the complete output file structure has changed, so all files will have to be created again. But from then on it might work better.

Finally, have you figured out what was wrong with that one mapping job in the first place? That of course needs to be fixed as well, so maybe check the log file logs/samtools/merge/merge-N_1413.log for that.

Hope that helps, so long
Lucas

from grenepipe.

athenasyarifa commented on July 30, 2024

Hi Lucas,

Thanks so much for the help and suggestions! A few days ago I encountered another error with my grenepipe run, and I did as you suggested: (1) rerunning without the --rerun-incomplete flag, and (2) manually resetting the time for the reference genome. The run has been going smoothly so far; hopefully I don't have to rerun the whole thing again.

Not completely sure why grenepipe v.0.12.2 is working for me, but my best guess is that I am using an older version of conda (I never bothered to update my conda although I should). I installed grenepipe v.0.12.2 on 8th of April if it helps you.

About the mapping, I figured out the problem: the BAM file was truncated somehow (maybe when I stopped grenepipe in the middle of a run or something?).

$ samtools quickcheck -v N_1413-1.sorted.bam
N_1413-1.sorted.bam was missing EOF block when one should be present.
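
For context, the EOF block that quickcheck reports as missing is the fixed 28-byte empty BGZF block that the SAM/BAM specification places at the end of every BAM file. The sketch below checks only that one condition with standard tools; the helper name `has_bgzf_eof` is hypothetical, and `samtools quickcheck` performs further checks beyond this one.

```shell
# Hypothetical helper: succeed if the file ends with the 28-byte BGZF EOF marker.
# Usage: has_bgzf_eof <file.bam>
has_bgzf_eof() {
    # EOF marker bytes from the SAM/BAM specification, written as hex with spaces removed.
    eof_hex=$(printf '%s' "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')
    # Hex dump of the last 28 bytes of the file, without offsets or whitespace.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$eof_hex" ]
}
```

A loop such as `for bam in *.bam; do has_bgzf_eof "$bam" || echo "no EOF block: $bam"; done` would then flag files that were cut off before the writer finished.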

But of course, after the rerun the file is now fine. I will close the issue, thanks again!

Best,
Rifa

lczech commented on July 30, 2024

Hey Rifa,

thanks for the feedback, and happy to hear that things are working now!

> Thanks so much for the help and suggestions! A few days ago I encountered another error with my grenepipe run, and I did as you suggested: (1) rerunning without the --rerun-incomplete flag, and (2) manually resetting the time for the reference genome. The run has been going smoothly so far; hopefully I don't have to rerun the whole thing again.

Ah nice, glad that worked!

> Not completely sure why grenepipe v.0.12.2 is working for me, but my best guess is that I am using an older version of conda (I never bothered to update my conda although I should). I installed grenepipe v.0.12.2 on 8th of April if it helps you.

I see, that makes sense. I think once you update conda, you will have to use the newer grenepipe versions though. The other way round, I don't know if you could already use those with your older conda version; I would not be surprised if conda breaks there somehow as well. But anyway, as you have a running system now, all good :-)

> About the mapping, I figured out the problem: the BAM file was truncated somehow (maybe when I stopped grenepipe in the middle of a run or something?).

Makes sense. It could have just been some job on the cluster that failed for mysterious reasons; that happens sometimes. If you are curious, you might be able to trace this via the log files, but honestly, if a simple re-run has worked now, all good.

Cheers
Lucas

athenasyarifa commented on July 30, 2024

Hi @lczech, sorry to bother you again with this.

My grenepipe run failed again, with a few contig groups missing in a few samples; I attached the Snakemake log file below. I checked the haplotype caller logs for these samples, and they were truncated (one example attached). I'm not sure about the cause of these errors, but it may be because I disabled slurm-status.py a while back, after my Slurm administrators said that it was slowing down the Slurm database.

2024-07-22T110323.749526.snakemake.log
N_4843.contig-group-5.log

Now I want to restart the grenepipe run so that it reruns only these samples. I first ran the following, as you suggested:

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta

and then tried a dry run with the -n flag. I attached the dry run log here; it seems that grenepipe wants to repeat the variant calling (and BAM index generation) for all of the samples. Do you have an idea why? Should I also manually change the dates of the BAM files?

dry_run_1.txt

How can I rerun grenepipe without starting the variant calling for all samples? Thank you so much in advance for your help, Lucas.

Cheers,
Rifa
