Hi, I have found your pipeline very helpful and would like to know the best way to use

Updating run with new reads, about moiexpositoalonsolab/grenepipe

Comments (12)

Brent-Saylor-Canopy commented on July 30, 2024 1

Ahh, Ok. That setting won't change for me, so hopefully I'll only need to rerun the variant calling this one time.

Thanks for your help!

from grenepipe.

Brent-Saylor-Canopy commented on July 30, 2024

Never mind.

lczech commented on July 30, 2024

Hey @Brent-Saylor-Canopy,

did it work in the end? Usually, this should work - as you said, snakemake is good at this. But I never explicitly tested this, so I'd be curious to hear your feedback!

Cheers and so long
Lucas

Brent-Saylor-Canopy commented on July 30, 2024

It does work. Initially, I had copied an input file that was older than the output file, but that was fixed with a touch command.

The only error that comes up is that filtered/all.vcf.gz is write-protected, so it needs to be deleted before it can be regenerated.
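The touch fix mentioned above can be sketched in Python as follows. This is only an illustration of the idea, not part of grenepipe; the function name `touch_inputs` and its `when` parameter are hypothetical.

```python
import os
import time
from pathlib import Path


def touch_inputs(paths, when=None):
    """Set the modification time of each existing file to `when` (default: now).

    Hypothetical helper: it mimics what `touch` does for files that already exist.
    """
    stamp = time.time() if when is None else when
    touched = []
    for p in map(Path, paths):
        if p.is_file():
            # Keep the access time and update only the modification time.
            os.utime(p, (p.stat().st_atime, stamp))
            touched.append(p)
    return touched
```

Calling it on the newly copied read files before the next run would make them newer than the existing outputs, which is what the plain touch command achieved here.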

lczech commented on July 30, 2024

Ah thanks, the touch makes sense :-)

As for the write-protected file: I think it is good to keep it that way to avoid accidental overwriting, so that users have to decide deliberately that they want to regenerate it.
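For the deliberate case, a minimal sketch of checking for write protection and deleting the file by hand could look like this. The helper names are hypothetical, and the check assumes POSIX-style permission bits.

```python
import stat
from pathlib import Path


def is_write_protected(path):
    """True if no write bit (user, group, other) is set on the file."""
    mode = Path(path).stat().st_mode
    return not (mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def remove_if_protected(path):
    """Delete a write-protected file so that it can be regenerated.

    Files that are missing or still writable are left untouched.
    """
    p = Path(path)
    if p.is_file() and is_write_protected(p):
        # Restore the user write bit first; some platforms refuse to
        # delete read-only files otherwise.
        p.chmod(p.stat().st_mode | stat.S_IWUSR)
        p.unlink()
        return True
    return False
```

In this thread, the file in question would be `filtered/all.vcf.gz`, relative to the run directory.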

Brent-Saylor-Canopy commented on July 30, 2024

Yes, the touch is easy enough. The other option would be to add a step that creates links to each of the read files and refreshes them on each run. That way, the timestamp on the files would correspond to when the analysis was run rather than when the file was created.
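A rough sketch of such a linking step is shown below. It is not something grenepipe does; `refresh_links` and the link directory are hypothetical, and whether Snakemake would look at the link's own timestamp or at the target's is not established in this thread.

```python
from pathlib import Path


def refresh_links(read_files, link_dir):
    """(Re)create one symlink per read file inside `link_dir`.

    Each link is removed and created again, so the link itself gets a
    fresh timestamp while the original read file stays untouched.
    """
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, read_files):
        link = link_dir / src.name
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(src.resolve())
        links.append(link)
    return links
```

The samples table would then have to point at the links instead of the original files.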

Yes, the file protection makes sense. I'm not sure how many people will have a similar use case to mine.

lczech commented on July 30, 2024

Hm, interesting idea to use symlinks. It might also solve some file naming issues with downstream tools. I am not entirely sure though that it would not also introduce new issues - I'll have to think about this. But thanks for the suggestion!

Also, here is another way to solve this: https://snakemake.readthedocs.io/en/stable/project_info/faq.html#snakemake-does-not-trigger-re-runs-if-i-add-additional-input-files-what-can-i-do
(the "snakemake" way).

Brent-Saylor-Canopy commented on July 30, 2024

I've encountered another problem while trying a larger-scale test of adding new samples and rerunning the pipeline.

I keep getting an error when the call_variants rule is launched, stating:

ProtectedOutputException in line 38 of /mnt/Data1/GBS_data/grenepipe/rules/calling-haplotypecaller.smk:
Write-protected output files for rule call_variants:
called/Sample23.10A.g.vcf.gz

This seems to happen when the rule is launched. I'm not sure why, but the pipeline seems to be re-calling the variants for every sample, not just the new ones.

lczech commented on July 30, 2024

Not sure that I understand your question here.

Is the issue that it fails with the exception about write protected files? Because that is intentional: I've marked these files as write-protected in order to avoid accidental re-computation. Hence, in cases where you want to compute them again (which does not seem to be what you want here...), you'd have to delete them manually first - this is meant as a protection from mistakes that could otherwise lead to expensive re-computation.

If your question however is why these files are being re-computed in the first place: As you noted before, snakemake works by comparing timestamps of files and rules, and re-runs downstream rules if their input is newer than their output. I cannot tell from the information that you provided what exact chain of updates leads snakemake to want to do this, but you can call the pipeline with the -n --reason flags, which is a dry run (-n) that gives you this information for each executed rule. It might be that your input sample files were somehow updated, or some intermediate files changed.

Let me know if this helped and if you have further questions :-)
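The timestamp rule described above can be illustrated with a small sketch. This is a deliberately simplified model, not Snakemake's actual implementation, which can also react to other changes such as modified parameters; the function name `outdated_outputs` is hypothetical.

```python
import os


def outdated_outputs(inputs, outputs):
    """Return the outputs that are missing or older than the newest input.

    Simplified model of the rule "re-run if the input is newer than the output".
    """
    newest_input = max(os.path.getmtime(p) for p in inputs)
    stale = []
    for out in outputs:
        if not os.path.exists(out) or os.path.getmtime(out) < newest_input:
            stale.append(out)
    return stale
```

An empty result means that, under this model, nothing needs to be recomputed. A copied input file that kept an old timestamp would therefore not trigger a re-run until it is touched.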

Brent-Saylor-Canopy commented on July 30, 2024

My question is why they are being recomputed at all. It is running call_variants on both the 100 samples that were run previously, and the 20 samples I added. I would expect that variants would only need to get called for the new 20 samples. The only thing I changed was to add new samples and change the "known-variants" setting in the config file. I'll have to test it out on another run. I removed the write protection on the files for now so I could get the updated results.

Would changing the config parameter "known-variants" cause each sample to get call_variants run on them again?

lczech commented on July 30, 2024

Ah yes, that is the reason then! The variant calling takes these known variants into account and therefore produces different output, so it needs to be repeated. You can check with the --reason flag as well, as there might be additional reasons, but this is definitely one of them!

lczech commented on July 30, 2024

For anyone finding this in the future: In recent versions, I have removed the file write-protection, because users were confused by it. This comes at the risk of unnecessary computation, but that can easily be checked beforehand by running snakemake with -n or -nd for a dry run, to verify that the rule executions are as expected.
