Comments (12)
Ahh, Ok. That setting won't change for me, so hopefully I'll only need to rerun the variant calling this one time.
Thanks for your help!
from grenepipe.
Never mind.
Hey @Brent-Saylor-Canopy,
did it work in the end? Usually, this should work - as you said, snakemake is good at this. But I never explicitly tested this, so I'd be curious to hear your feedback!
Cheers and so long
Lucas
It does work. I had initially copied an input file that was older than the output file, but that was fixed with a touch command.
The only error that comes up is that filtered/all.vcf.gz is write protected, so that needs to be deleted before it can be regenerated.
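The touch fix described above can be sketched in Python as follows. This is only an illustration of the idea, not part of grenepipe; the function name and the file names in the demo are made up.

```python
import os
import tempfile
import time
from pathlib import Path


def touch_if_older(input_path: Path, output_path: Path) -> bool:
    """Refresh the input's modification time if it is older than the output.

    Returns True if the timestamp was updated.
    """
    if input_path.stat().st_mtime < output_path.stat().st_mtime:
        input_path.touch()  # same effect as the shell command `touch <file>`
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "sample.fastq.gz"  # hypothetical input file
        calls = Path(tmp) / "all.vcf.gz"       # hypothetical output file
        reads.write_text("reads")
        calls.write_text("calls")
        # Make the input look older than the output, as after copying an old file.
        old = time.time() - 3600
        os.utime(reads, (old, old))
        print("timestamp updated:", touch_if_older(reads, calls))
```

After the call, the input is at least as new as the output, so a timestamp-based check will treat the output as outdated.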
Ah thanks, the touch makes sense :-)
As for the write protected file: I think that it is good to keep it that way, in order to avoid accidental overwriting, meaning that users need to make sure that they actually want this.
Yes, the touch is easy enough. The other option would be to add a step where links are made to each of the reads and updated on each run. That way, the timestamp on the files corresponds to when the analysis was run rather than when the file was created.
Yes, the file protection makes sense. I'm not sure how many people will have a similar use case to mine.
Hm, interesting idea to use symlinks. It might also solve some file naming issues with downstream tools. I am not entirely sure though that it would not also introduce new issues - I'll have to think about this. But thanks for the suggestion!
Also, here is another way to solve this (the "snakemake" way): https://snakemake.readthedocs.io/en/stable/project_info/faq.html#snakemake-does-not-trigger-re-runs-if-i-add-additional-input-files-what-can-i-do
I've encountered another problem while running a larger-scale test of adding new samples and rerunning the pipeline.
I keep getting the following error when the call_variants rule is launched:
ProtectedOutputException in line 38 of /mnt/Data1/GBS_data/grenepipe/rules/calling-haplotypecaller.smk:
Write-protected output files for rule call_variants:
called/Sample23.10A.g.vcf.gz
This seems to happen as soon as the rule is launched. I'm not sure why, but the pipeline seems to be calling the variants again for every sample, not just the new ones.
Not sure that I understand your question here.
Is the issue that it fails with the exception about write protected files? Because that is intentional: I've marked these files as write-protected in order to avoid accidental re-computation. Hence, in cases where you want to compute them again (which does not seem to be what you want here...), you'd have to delete them manually first - this is meant as a protection from mistakes that could otherwise lead to expensive re-computation.
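For the case where re-computation really is intended, the manual deletion mentioned above could look roughly like this. It is a sketch under the assumption of a POSIX file system (where deleting a read-only file only requires write access to its directory); the helper names are made up and not part of grenepipe or snakemake.

```python
import stat
import tempfile
from pathlib import Path


def is_write_protected(path: Path) -> bool:
    """Return True if no write permission bit is set on the file."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    return (path.stat().st_mode & write_bits) == 0


def remove_if_protected(path: Path) -> bool:
    """Delete a write-protected file so that it can be regenerated.

    Returns True if the file was deleted.
    """
    if path.exists() and is_write_protected(path):
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vcf = Path(tmp) / "Sample.g.vcf.gz"  # hypothetical output file
        vcf.write_text("calls")
        vcf.chmod(0o444)  # mimic a write-protected output
        print("protected:", is_write_protected(vcf))
        print("removed:", remove_if_protected(vcf))
```

Files that are still writable are left alone, so the helper only touches outputs that were explicitly protected.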
If your question however is why these files are being re-computed in the first place: As you noted before, snakemake works by comparing timestamps of files and rules, and re-runs downstream rules if their input is newer than their output. I cannot tell from the information that you provided what exact chain of updates leads snakemake to want to do this, but you can call the pipeline with the -n --reason flags, which is a dry run (-n) that gives you this information for each executed rule. It might be that your input sample files were somehow updated, or some intermediate files changed.
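To illustrate the timestamp comparison described above, here is a deliberately simplified sketch. It is not snakemake's actual implementation, which considers more than file timestamps; the function name is made up for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def needs_rerun(inputs: Iterable[Path], output: Path) -> bool:
    """Simplified rule: re-run if the output is missing or any input is newer."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(p.stat().st_mtime > out_mtime for p in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bam"   # hypothetical input
        called = Path(tmp) / "sample.vcf"   # hypothetical output
        sample.write_text("reads")
        called.write_text("calls")
        # Set explicit timestamps: input older than output, so no re-run.
        os.utime(sample, (1000, 1000))
        os.utime(called, (2000, 2000))
        print(needs_rerun([sample], called))
        # Input updated after the output was written, so a re-run is due.
        os.utime(sample, (3000, 3000))
        print(needs_rerun([sample], called))
```

This also shows why copying or touching an input file changes what gets re-run: only the modification times matter in this simplified model, not the file contents.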
Let me know if this helped and if you have further questions :-)
My question is why they are being recomputed at all. It is running call_variants on both the 100 samples that were run previously, and the 20 samples I added. I would expect that variants would only need to get called for the new 20 samples. The only thing I changed was to add new samples and change the "known-variants" setting in the config file. I'll have to test it out on another run. I removed the write protection on the files for now so I could get the updated results.
Would changing the config parameter "known-variants" cause each sample to get call_variants run on them again?
Ah yes, that is the reason then! The variant calling takes these known variants into account, and hence produces different output - hence, the variant calling needs to be repeated. You can check with the --reason flag as well, as there might be additional reasons, but this is definitely one of them!
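The general idea that a changed setting invalidates earlier results can be sketched as below. This is purely illustrative and does not claim to show how snakemake or grenepipe detect the change; the function names and the file name are assumptions made for the example.

```python
import hashlib
import json


def params_fingerprint(params: dict) -> str:
    """Hash the settings in a canonical (key-sorted) JSON form."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def params_changed(old_params: dict, new_params: dict) -> bool:
    """True if the settings used for a previous run differ from the current ones."""
    return params_fingerprint(old_params) != params_fingerprint(new_params)


if __name__ == "__main__":
    before = {"known-variants": ""}
    after = {"known-variants": "known.vcf.gz"}  # hypothetical file name
    print(params_changed(before, after))
```

Because the setting applies to every sample, a change like this affects all previously called samples, not only the newly added ones.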
For anyone finding this in the future: In recent versions, I have removed the file write-protection, because users were confused by this. This comes at the risk that unnecessary computation is done, but that can easily be checked beforehand by running snakemake with -n or -nd for a dry-run to check that the rule executions are as expected.
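A small wrapper for the dry-run check could look like this. It only uses the -n and --reason flags mentioned in this thread; newer snakemake releases may print reasons by default, in which case the second flag can be dropped. The function name is made up.

```python
import shutil
import subprocess
from typing import List


def dry_run_command(show_reasons: bool = True) -> List[str]:
    """Build the snakemake dry-run command discussed in this thread."""
    cmd = ["snakemake", "-n"]      # -n: dry run, nothing is executed
    if show_reasons:
        cmd.append("--reason")     # print why each rule would be run
    return cmd


if __name__ == "__main__":
    cmd = dry_run_command()
    if shutil.which("snakemake") is None:
        print("snakemake not found; would run:", " ".join(cmd))
    else:
        # Run from the pipeline directory; the exit code is not checked here.
        subprocess.run(cmd, check=False)
```

Reading the listed rules before the real run shows whether only the expected samples would be processed.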