The Snakemake Wrapper Repository is a collection of reusable wrappers that make it easy to quickly use popular command line tools from Snakemake rules and workflows.
Visit https://snakemake-wrappers.readthedocs.io for more information.
This is the development home of the Snakemake Wrapper Repository.
Home Page: https://snakemake-wrappers.readthedocs.io
Is your feature request related to a problem? Please describe.
I am currently writing a manuscript that uses snakemake with a lot of wrapper scripts, and I have to go through all tools individually to get (a) the exact version of the tool, and (b) then look up its reference for the bibliography.
My process is: for each rule that uses a wrapper, e.g.
wrapper:
"0.51.3/bio/trimmomatic/pe"
This is repetitive work, not only for me, but for everyone who is in a similar situation. The two pieces of information that I am interested in here are the version of the tool being used, and its reference.
Describe the solution you'd like
Ideally, snakemake could offer some command that takes a config file/workflow, and figures out all tools being used in that exact workflow configuration (skipping any rules that are not being invoked!). It then automatically collects the needed information, and prints it in some format.
Tool version information is already available via the environment.yaml
in the wrapper, and citation or links to the tool websites could be added (optionally and bit by bit for each tool) to the meta.yaml
and read from there.
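As a rough sketch of what such a collector could do for a single wrapper directory, assuming a hypothetical `reference` field were added to meta.yaml (the function and field names are illustrative, not an existing snakemake feature):

import os
import yaml

def describe_wrapper(wrapper_dir):
    # Tool versions come from the wrapper's environment.yaml ...
    with open(os.path.join(wrapper_dir, "environment.yaml")) as f:
        env = yaml.safe_load(f)
    # ... while the name, URL and (hypothetically) a citation would come from meta.yaml.
    with open(os.path.join(wrapper_dir, "meta.yaml")) as f:
        meta = yaml.safe_load(f)
    print("- Uses wrapper `{}`".format(meta.get("name", wrapper_dir)))
    for dep in env.get("dependencies", []):
        print("  - Tool: {}".format(dep))  # e.g. "trimmomatic =0.36"
    if "reference" in meta:  # hypothetical field, would need to be added per wrapper
        print("  - Reference: {}".format(meta["reference"]))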
I guess that this is only feasible for wrapped tools, and that I would still have to go through my individual (shell/script based) rules. But the snakemake command above could then at least also list all those rules, so that I do not forget about them. Basically, it would print out a linear list of the dependency graph (similar to the normal terminal output when running snakemake, but only once for each rule), but with all tools and references listed.
So, my ideal solution would look something like this:
$ snakemake [...] --print-tools # or whatever you would want to call this command
Rule `trim_read`:
- Uses wrapper `TRIMMOMATIC PE`
- Tool: trimmomatic v0.36
- Conda: https://anaconda.org/bioconda/trimmomatic
- URL: http://www.usadellab.org/cms/index.php?page=trimmomatic
- Reference: Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170
Rule `my_rule`:
- Uses shell-based command
Rule `my_other_rule`:
- Uses python-based command
....
That would be awesome!
Describe alternatives you've considered
I thought that a simple overview table here in the repo might also be a solution, but that would get too messy rather quickly, with all wrappers in different versions etc... and it would not really speed things up, would be extra maintenance work, and would probably be outdated all the time, and... No, not a good alternative.
Hi, apologies for this very basic question, but I am wondering how snakemake deals with the threads parameter in the bwa mem wrapper. The wrapper uses this parameter to specify the number of threads used by bwa mem. But when a sort is also applied, the output is piped to either samtools or picard, and this I assume requires at least one additional thread.
So my question is whether this causes a mismatch between the number of threads anticipated by snakemake (i.e. the threads parameter) and the actual threads used (i.e. the threads parameter + 1), and if this could have unwanted effects?
Thanks
Is your feature request related to a problem? Please describe.
If I remember correctly, the order of channels has a big impact on the performance of the install process.
conda-forge should be prioritized over bioconda, but many wrappers have it the other way around.
Describe the solution you'd like
Have consistent channel priorities in all environment.yaml files.
If I remember correctly, this is the optimal ordering:
channels:
- conda-forge
- bioconda
- defaults
Describe alternatives you've considered
I know mamba is another solution
Additional context
I can help, if this change is accepted.
Snakemake version
snakemake version: 5.26.1
wrapper version: not yet released, current version on master branch
Describe the bug
The tests on the pull request for unicycler's addition ran fine (and quickly, in a few minutes):
https://github.com/snakemake/snakemake-wrappers/runs/1342977411#step:7:265
Since it has been merged into master, tests on master are consistently failing with the following error (after running for hours):
Creating conda environment /tmp/tmprc8d9w2k/master/bio/unicycler/environment.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /tmp/tmprc8d9w2k/master/bio/unicycler/environment.yaml:
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... failed
NoSpaceLeftError: No space left on devices.
See here:
https://github.com/snakemake/snakemake-wrappers/runs/1343943714#step:7:321
Logs
If applicable, any terminal output to help explain your problem.
Minimal example
See GitHub Actions results of recent merge commits.
Snakemake version
5.8.1+12.g85a09c4
Describe the bug
Namedlist attributes are used in several wrappers. Originally reported here
Additional context
This is the result of making functions in the Namedlist class read-only. Any parameters with conflicting names need to be changed to function properly, e.g. index, sort, etc.
Is your feature request related to a problem? Please describe.
I noticed that while the meta.yaml
files are well filled in, the info within is not displayed in the documentation (even in v0.67.0), especially the descriptions of the wrapper's input and output.
Describe the solution you'd like
Update the Jinja .rst
template to print out this info.
Describe alternatives you've considered
I will do a PR with a suggestion of template.
Best,
Charlie
Some wrappers, e.g. samtools sort
or samtools index
consume params entirely to customize tool invocation. Others, like trim_galore-pe
use a named param called extra
.
The latter allows the specification of additional params for cluster execution, as suggested in the documentation; the former breaks when additional named params are used. I'm not sure if there are use cases for additional params other than --cluster, as they are obviously not going to be used in the rule code itself.
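To illustrate the breakage described above, a rule like the following (version, paths and the partition param are purely illustrative) adds a named param that is only meant for the --cluster command; a wrapper that injects snakemake.params wholesale into its shell command would also receive it as a tool argument:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "sorted/{sample}.bam"
    params:
        partition="short"  # hypothetical, used only via --cluster "sbatch -p {params.partition}"
    wrapper:
        "0.51.3/bio/samtools/sort"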
Is your feature request related to a problem? Please describe.
I was wondering why the fastqc
wrapper is not using the threads
directive unlike other tools such as bwa mem
.
Describe the solution you'd like
Could the wrapper be modified to use the threads
directive?
Describe alternatives you've considered
Right now I'm using the params
directive with -t 8
alongside threads: 8
for it to work on my HPC infrastructure, but I would like to avoid hardcoding the number of threads I'm using.
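For reference, the workaround described above looks roughly like this (paths and the wrapper version are illustrative); the point of the request is to drop the hard-coded -t 8 and have the wrapper read the threads directive directly:

rule fastqc:
    input:
        "reads/{sample}.fastq.gz"
    output:
        html="qc/{sample}_fastqc.html",
        zip="qc/{sample}_fastqc.zip"
    params: "-t 8"   # currently has to be kept in sync with threads by hand
    threads: 8
    log:
        "logs/fastqc/{sample}.log"
    wrapper:
        "0.64.0/bio/fastqc"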
Would it be ok for me to create a PR where I add snakefmt
to the CI checks and also format all current Snakefiles with it?
Snakemake version
snakemake version: 0.63.x
onwards
Describe the bug
The log filepath gets improperly quoted in the fastqc wrapper, and hence its logs won't be written to a file. Note that a job using this wrapper will be successful (unless a singularity container is used), but the log file will be empty.
Logs
$snakemake --use-conda -f qc/fastQC/before_trim/rep1_R1_fastqc.html -p
....
....
python /workflow_path/.snakemake/scripts/tmpsz98rhl8.wrapper.py
Activating conda environment:/workflow_path/.snakemake/conda/06b7292e
fastqc --quiet -t 1 --outdir /scratch/local/tmpenzw80c2/workflow_path/rep1_R1.fastq.gz ' 2> logs/fastQC/rep1_R1.log'
Skipping ' 2> logs/fastQC/rep1_R1.log' which didn't exist, or couldn't be read
mv /scratch/local/tmpenzw80c2/rep1_R1_fastqc.html qc/fastQC/before_trim/rep1_R1_fastqc.html
mv /scratch/local/tmpenzw80c2/rep1_R1_fastqc.zip qc/fastQC/before_trim/rep1_R1_fastqc.zip
[Tue Sep 22 17:14:16 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log:/workflow_path/.snakemake/log/2020-09-22T171404.568908.snakemake.log
Minimal example
Try using 0.62.0
and 0.66.0
to replicate the issue. The former should be problem-free (assuming other dirs/filepaths don't contain whitespace characters).
Additional context
Add any other context about the problem here.
snakemake-wrappers/bio/fastqc/wrapper.py
Lines 32 to 34 in 517d848
In version 0.63.0, the log directive was quoted using snakemake's :q feature, which has the side effect of quoting not just the filepath but also the redirection symbol.
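A minimal sketch of the distinction (not the wrapper's actual code): the redirection has to stay outside the quoted part, which is what snakemake's log_fmt_shell helper produces:

import tempfile
from snakemake.shell import shell

# The failing versions quoted the whole " 2> logs/x.log" fragment, turning the
# redirection into a literal fastqc argument; log_fmt_shell builds the fragment so
# that the "2>" stays shell syntax and the redirect actually happens.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
    shell("fastqc --quiet -t {snakemake.threads} --outdir {tmpdir:q} {snakemake.input[0]:q} {log}")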
Snakemake version
snakemake: 5.9.1
snakemake-wrappers: 0.45.1
Describe the bug
It's not possible to download the latest versions of the GRCh37 build (releases from 76 onward). These are at a different location than the main releases:
ftp://ftp.ensembl.org/pub/grch37/release-98/fasta/homo_sapiens/dna/
and ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/
respectively.
Logs
ValueError: Requested sequence does not seem to exist on ensembl FTP servers or servers are unavailable (url ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/Homo_sapiens.grch37.dna.toplevel.fa.gz)
Snakemake version
Snakemake - 5.6.0
Snakemake wrapper - 0.47.0
Describe the bug
Rule markduplicates uses {snakemake.log} directly, instead of utilizing snakemake.log_fmt_shell. This necessitates the use of a file in the log: directive, and its absence results in the job ending in error.
Relevant code:
snakemake-wrappers/bio/picard/markduplicates/wrapper.py
Lines 10 to 14 in dfe6d54
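A sketch of a possible fix, using log_fmt_shell so that an empty log: directive no longer leaves a dangling "&>" at the end of the command (based on the command visible in the traceback below; the original wrapper lines referenced above are not reproduced here):

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
    "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} {log}"
)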
Logs
/bin/bash: -c: line 0: syntax error near unexpected token `newline'
/bin/bash: -c: line 0: `set -euo pipefail; picard MarkDuplicates REMOVE_DUPLICATES=true TMP_DIR=/tmp INPUT=a.sorted.bam OUTPUT=dedup/a.bam METRICS_FILE=dedup/a.metrics.txt &>'
Traceback (most recent call last):
File "/up/.snakemake/scripts/tmp075fw1hj.wrapper.py", line 14, in <module>
shell("picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
File "/mnt/snakemake/snakemake/shell.py", line 149, in __new__
raise sp.CalledProcessError(retcode, cmd)
Minimal example
rule mark_duplicates:
input:
"a.sorted.bam"
output:
bam="dedup/a.bam",
metrics="dedup/a.metrics.txt"
log:
"/logs/a.log"
wrapper:
"0.47.0/bio/picard/markduplicates"
Additional context
Add any other context about the problem here.
Snakemake version
snakemake 5.30.1
wrapper:
"0.68.0/bio/bowtie2/build"
Describe the bug
When building a large index through the bowtie2_build wrapper, an extra "l" is appended in the middle of the index base name, which results in output like:
bowtie2/index/rep82l.1.bt2l SHOULD BE bowtie2/index/rep82.1.bt2l
bowtie2/index/rep82l.2.bt2l SHOULD BE bowtie2/index/rep82.2.bt2l
....
....
....
bowtie2/index/rep82l.rev.2.bt2l SHOULD BE bowtie2/index/rep82.rev.2.bt2l
which further results in a MissingOutputException, as the output differs from what is expected.
Logs
MissingOutputException in line 77 of /klaster/scratch/kkopera/COVID/microbiome/snakemake/QC_and_FT/Snakefile:
Job Missing files after 5 seconds:
bowtie2/index/rep82.1.bt2l
bowtie2/index/rep82.2.bt2l
bowtie2/index/rep82.3.bt2l
bowtie2/index/rep82.4.bt2l
bowtie2/index/rep82.rev.1.bt2l
bowtie2/index/rep82.rev.2.bt2l
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 4 completed successfully, but some output files are missing.
Minimal example
rule bowtie2_build:
input:
reference="prebuilt_data/rep82.fna"
output:
multiext(
"bowtie2/index/rep82",
".1.bt2l", ".2.bt2l", ".3.bt2l", ".4.bt2l", ".rev.1.bt2l", ".rev.2.bt2l",
),
log:
"logs/bowtie2_build/build.log"
params:
extra="--large-index" # optional parameters
threads: 10
wrapper:
"0.68.0/bio/bowtie2/build"
Would it be possible to fix this bug, as it prevents using the wrapper for large indexes?
Snakemake version
Snakemake 5.26.1 and snakemake-wrappers 0.66.0
Describe the bug
Building the documentation failed with an extension error: No module named 'sphinx_copybutton'.
Logs
Extension error:
L'extension ne peut pas être importée sphinx_copybutton (exception: No module named 'sphinx_copybutton')
Makefile:20 : la recette pour la cible « html » a échouée
make: *** [html] Erreur 2
Minimal example
conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml
conda activate test-snakemake-wrapper-docs
cd docs
make html
Additional context
The code was executed within the test-snakemake-wrapper-docs
environment as documented in the Testing locally section here.
When running the two snakemake wrappers Varscan
-> SnpEff
in succession there is an error thrown pcingola/SnpEff#267 (comment)
This does not happen when feeding Varscan
to the VEP
annotation tool.
This should be mentioned in the docs: https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/varscan.html
Is your feature request related to a problem? Please describe.
@tdayris is currently working on unifying the handling of Java memory specifications in wrappers where the bioconda recipes allow for this. This leads to rather large blocks of code that are repeated across all the respective recipes, and whenever we want to change the memory handling strategy, we will have to edit all those places in sync. See here:
#204 (comment)
Describe the solution you'd like
I would like to have some kind of common
or utils
module that provides useful functions. In the case above, it would e.g. need to take the snakemake object and a java_opts
string variable as parameters, and would adjust the java_opts string with the appropriate Java memory option.
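A hypothetical sketch of what such a shared helper could look like; the function name, its location, and the use of resources.mem_mb are assumptions, not an existing snakemake-wrappers API:

def get_java_opts(snakemake, java_opts=""):
    """Append an -Xmx option derived from resources.mem_mb, if one was requested."""
    mem_mb = snakemake.resources.get("mem_mb", None)
    if mem_mb and "-Xmx" not in java_opts:
        java_opts += " -Xmx{}M".format(mem_mb)
    return java_opts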
Describe alternatives you've considered
The alternative is keeping all those copies of code around. This might make the wrappers more readable, as no reference to another code file is needed. But as described above, it incurs the maintenance burden of having to keep all those code blocks in sync.
Snakemake version
Snakemake v5.31.1
Wrapper 0.68.0/bio/gatk/variantrecalibrator
Describe the bug
For VariantRecalibrator, I have my own path for the vcf files of the resources. When running the rule, I get an error from GATK that the resource does not exist. Checking the wrapper's code, it seems that there is a syntax error when returning the resources path. For GATK v4.1.1, the ":" in the parameters is removed according to:
https://gatk.broadinstitute.org/hc/en-us/community/posts/360072126112-Variant-Recalibrator-Couldn-t-read-file
Logs
Snakemake
RuleException:
CalledProcessError in line 52 of /home/VITO/correara/genomics/rules/filtering.smk:
Command 'source /home/VITO/correara/miniconda3/bin/activate '/home/VITO/correara/genomics/.snakemake/conda/d509627e'; set -euo pipefail; python /home/VITO/correara/genomics/.snakemake/scripts/tmpz9cha92t.wrapper.py' returned non-zero exit status 1.
File "/home/VITO/correara/miniconda3/envs/BioSnake/lib/python3.9/site-packages/snakemake/executors/init.py", line 2317, in run_wrapper
File "/home/VITO/correara/genomics/rules/filtering.smk", line 52, in __rule_recalibrate_calls
File "/home/VITO/correara/miniconda3/envs/BioSnake/lib/python3.9/site-packages/snakemake/executors/init.py", line 566, in _callback
File "/home/VITO/correara/miniconda3/envs/BioSnake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
File "/home/VITO/correara/miniconda3/envs/BioSnake/lib/python3.9/site-packages/snakemake/executors/init.py", line 552, in cached_or_run
File "/home/VITO/correara/miniconda3/envs/BioSnake/lib/python3.9/site-packages/snakemake/executors/init.py", line 2348, in run_wrapper
GATK
Using GATK jar /home/VITO/correara/genomics/.snakemake/conda/d509627e/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/VITO/correara/genomics/.snakemake/conda/d509627e/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar VariantRecalibrator --max-gaussians 4 --resource hapmap,known=false,training=true,truth=true,prior=15.0:/home/VITO/correara/genomics/hapmap/hapmap_3.3.hg38.vcf.gz --resource omni,known=false,training=true,truth=false,prior=12.0:/home/VITO/correara/genomics/omni/1000G_omni2.5.hg38.vcf.gz --resource g1k,known=false,training=true,truth=false,prior=10.0:/home/VITO/correara/genomics/g1k/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource dbsnp,known=true,training=false,truth=false,prior=2.0:/home/VITO/correara/genomics/dbsnp/hg38_dbsnp138.vcf.gz -R resources/hg38/hg38.fa -V filtered/ERR032031.indels.vcf.gz -mode INDEL --output filtered/ERR032031.indels.recalibrated.vcf.gz --tranches-file filtered/ERR032031.indels.tranches -an QD -an FS -an ReadPosRankSum -an MQRankSum -an SOR -an DP
13:58:07.143 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/VITO/correara/genomics/.snakemake/conda/d509627e/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 09, 2021 1:58:07 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:58:07.421 INFO VariantRecalibrator - ------------------------------------------------------------
13:58:07.421 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.4.1
13:58:07.422 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
13:58:07.422 INFO VariantRecalibrator - Executing as correara@dev01 on Linux v4.4.0-198-generic amd64
13:58:07.422 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-b11
13:58:07.422 INFO VariantRecalibrator - Start Date/Time: January 9, 2021 1:58:07 PM CET
13:58:07.422 INFO VariantRecalibrator - ------------------------------------------------------------
13:58:07.422 INFO VariantRecalibrator - ------------------------------------------------------------
13:58:07.422 INFO VariantRecalibrator - HTSJDK Version: 2.21.0
13:58:07.422 INFO VariantRecalibrator - Picard Version: 2.21.2
13:58:07.422 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:58:07.422 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:58:07.422 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:58:07.423 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:58:07.423 INFO VariantRecalibrator - Deflater: IntelDeflater
13:58:07.423 INFO VariantRecalibrator - Inflater: IntelInflater
13:58:07.423 INFO VariantRecalibrator - GCS max retries/reopens: 20
13:58:07.423 INFO VariantRecalibrator - Requester pays: disabled
13:58:07.423 INFO VariantRecalibrator - Initializing engine
13:58:08.124 INFO VariantRecalibrator - Shutting down engine
[January 9, 2021 1:58:08 PM CET] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=627572736
A USER ERROR has occurred: Couldn't read file file:///home/VITO/correara/genomics/hapmap,known=false,training=true,truth=true,prior=15.0:/home/VITO/correara/genomics/hapmap/hapmap_3.3.hg38.vcf.gz. Error was: It doesn't exist.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Minimal example
rule recalibrate_calls:
input:
vcf = "filtered/{sample}.{vartype}.vcf.gz",
ref = "resources/hg38/hg38.fa",
hapmap = "/home/VITO/correara/genomics/hapmap/hapmap_3.3.hg38.vcf.gz",
omni = "/home/VITO/correara/genomics/omni/1000G_omni2.5.hg38.vcf.gz",
g1k = "/home/VITO/correara/genomics/g1k/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
dbsnp = "/home/VITO/correara/genomics/dbsnp/hg38_dbsnp138.vcf.gz",
output:
vcf = temp("filtered/{sample}.{vartype}.recalibrated.vcf.gz"),
tranches = "filtered/{sample}.{vartype}.tranches",
rscript = "filtered/{sample}.{vartype}.recal.plots.R"
params:
mode = get_mode,
resources = {"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
"omni": {"known": False, "training": True, "truth": False, "prior": 12.0},
"g1k": {"known": False, "training": True, "truth": False, "prior": 10.0},
"dbsnp": {"known": True, "training": False, "truth": False, "prior": 2.0}},
annotation = ["QD", "FS", "ReadPosRankSum", "MQRankSum", "SOR", "DP"],
extra = get_gaussians
log:
"logs/gatk.{sample}.{vartype}.variantrecalibrator.log"
wrapper:
"0.68.0/bio/gatk/variantrecalibrator"
Would it be possible to fix the wrapper to use the updated GATK syntax?
Is your feature request related to a problem? Please describe.
At least in bwa
and bwa-mem2
index, the destination (prefix
) could be directly inferred from the output, making it unnecessary to specify the prefix param.
Describe the solution you'd like
The wrapper would infer the prefix from snakemake.output[0]
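A rough sketch of that inference, assuming the outputs follow the usual pattern (e.g. "refs/genome.amb"), so that stripping the last extension of the first output recovers the prefix:

import os

# All index files share a common stem, so the prefix can be derived from any output.
prefix = os.path.splitext(snakemake.output[0])[0]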
@Smeds do you think it makes sense?
Transferring issue from bitbucket.
Adrien Leger:
Conda env files are a great way to easily deploy software in wrappers, but sometimes the program you want is not on anaconda cloud, or it requires more complicated installation and setup steps.
What about having the possibility to use singularity/docker recipes instead, to auto-deploy a wrapper?
Johannes:
Yes, that indeed won't hurt. But the container images have to come from somewhere where sustainability is guaranteed, e.g. biocontainers. Snakemake wrapper implementation needs a minor extension such that it also searches for singularity/docker image definitions in the wrapper repo. I will put this on my TODO list.
Adrien:
Awesome
A syntax to use a local file, similar to what is currently available for a conda recipe, would be nice as well, at least for dev?
Thanks
Johannes:
I think it would make sense to simply put a container URL into the meta.yaml.
Is your feature request related to a problem? Please describe.
Specifying this via params
strings makes snakemake unaware of these resources.
Describe the solution you'd like
Via resources
, snakemake can manage available memory in its scheduling.
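As an illustrative example (rule, paths and wrapper version are placeholders) of requesting memory through resources rather than hiding it in a params string:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    resources:
        mem_mb=4096   # visible to the scheduler, instead of e.g. a "-Xmx4g" params string
    log:
        "logs/picard/dedup/{sample}.log"
    wrapper:
        "0.68.0/bio/picard/markduplicates"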
I've already submitted related Pull Requests to the old repo:
Do I need to resubmit the PR here?
The wrapper's logic for deriving the basename (and thus the output filenames) doesn't quite match FastQC's own. For example:
Input file | FastQC Wrapper | FastQC |
---|---|---|
sample.fastq.gz | sample_fastqc.html | sample_fastqc.html |
sample.fastqsanger.gz | sample_fastqc.html | sample.fastqsanger_fastqc.html |
I think this file has the logic used: https://github.com/s-andrews/FastQC/blob/master/uk/ac/babraham/FastQC/Utilities/CasavaBasename.java
Thus something like:
split_ind = 2 if base.endswith(".fastq.gz") else 1
will work.
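Put into context, the proposed fix would slot into the basename computation roughly like this (a sketch with illustrative variable names, not the wrapper's current code):

import os

base = os.path.basename(snakemake.input[0])
# Strip two extensions for plain ".fastq.gz" inputs, but only the last one otherwise,
# so "sample.fastq.gz" -> "sample" and "sample.fastqsanger.gz" -> "sample.fastqsanger".
split_ind = 2 if base.endswith(".fastq.gz") else 1
base = base.rsplit(".", split_ind)[0]
html = base + "_fastqc.html"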
Snakemake version
5.8.1
NOTE: I will go over why I'm not using the latest version in the additional context section
Wrapper version
0.64.0/bio/fastqc
Describe the bug
I'm getting the following error trying to use the fastqc
wrapper:
Error occurred during initialization of VM
Logs
Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_trim_r2
1
[Wed Aug 26 09:34:45 2020]
Job 0: --- Quality check of raw data with FastQC before trimming.
python /scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/.snakemake/scripts/tmpb0_czsv2.wrapper.py
Activating conda environment: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/.snakemake/conda/91998b6c
fastqc --quiet -t 1 --outdir /tmp/tmpjnd9fio7 /home/moldach/projects/def-mtarailo/common/WGS_6/MTG324/MTG324_R2.fastq.gz ' 2> logs/fastqc/MTG324_R2.log'
Error occurred during initialization of VM
java/lang/NoClassDefFoundError: java/lang/Object
Traceback (most recent call last):
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/.snakemake/scripts/tmpb0_czsv2.wrapper.py", line 35, in <module>
shell(
File "/home/moldach/bin/snakemake/lib/python3.8/site-packages/snakemake/shell.py", line 156, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; fastqc --quiet -t 1 --outdir /tmp/tmpjnd9fio7 /home/moldach/projects/def-mtarailo/common/WGS_6/MTG324/MTG324_R2.fastq.gz ' 2> logs$
[Wed Aug 26 09:35:09 2020]
Error in rule qc_before_trim_r2:
jobid: 0
output: qc/fastQC/before_trim/MTG324_R2_fastqc.html, qc/fastQC/before_trim/MTG324_R2_fastqc.zip
log: logs/fastqc/MTG324_R2.log (check log file(s) for error message)
conda-env: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/.snakemake/conda/91998b6c
RuleException:
CalledProcessError in line 140 of /scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/Snakefile:
Command 'source /home/moldach/miniconda3/bin/activate '/scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/.snakemake/conda/91998b6c'; set -euo pipefail; python /scratch/moldach/MADDOG/V$
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/MTG353/Snakefile", line 140, in __rule_qc_before_trim_r2
File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.8.0/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Minimal example
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{QC_DIR}/{QC_TOOL}/before_trim/{sample}_{pair}_fastqc.{ext}', QC_DIR=dirs_dict["QC_DIR"], QC_TOOL=config["QC_TOOL"], sample=sample_names, pair=['R1', 'R2'], ext=['html', 'zip'])
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule qc_before_trim_r1:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.zip"),
params: ""
log:
"logs/fastqc/{sample}_R1.log"
resources:
mem = 1000,
time = 30
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.64.0/bio/fastqc"
rule qc_before_trim_r2:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[1]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.zip"),
params: ""
log:
"logs/fastqc/{sample}_R2.log"
resources:
mem = 1000,
time = 30
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.64.0/bio/fastqc"
Additional context
It was mentioned in the preamble to this issue that I should try the newest version of Snakemake. I downloaded the newest version via:
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
$ conda activate snakemake
But now when I try a dry-run I get a Segmentation fault
$ source ~/bin/snakemake/bin/activate
(snakemake) $ snakemake -n -r
Building DAG of jobs...
Job counts:
count jobs
1 all
1 qc_before_trim_r1
1 qc_before_trim_r2
3
[Wed Aug 26 10:28:41 2020]
Job 2: --- Quality check of raw data with FastQC before trimming.
Reason: Missing output files: qc/fastQC/before_trim/MTG324_R2_fastqc.html, qc/fastQC/before_trim/MTG324_R2_fastqc.zip
[Wed Aug 26 10:28:41 2020]
Job 1: --- Quality check of raw data with FastQC before trimming.
Reason: Missing output files: qc/fastQC/before_trim/MTG324_R1_fastqc.zip, qc/fastQC/before_trim/MTG324_R1_fastqc.html
[Wed Aug 26 10:28:41 2020]
localrule all:
input: qc/fastQC/before_trim/MTG324_R1_fastqc.html, qc/fastQC/before_trim/MTG324_R1_fastqc.zip, qc/fastQC/before_trim/MTG324_R2_fastqc.html, qc/fastQC/before_trim/MTG324_R2_fastqc.zip
jobid: 0
reason: Input files updated by another job: qc/fastQC/before_trim/MTG324_R2_fastqc.html, qc/fastQC/before_trim/MTG324_R2_fastqc.zip, qc/fastQC/before_trim/MTG324_R1_fastqc.zip, qc/fastQC/before_trim/MTG324_R1_fastqc.html
Job counts:
count jobs
1 all
1 qc_before_trim_r1
1 qc_before_trim_r2
3
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
$ deactivate
$ conda activate snakemake
$ conda --version
5.23.0
$ snakemake -n -r
Segmentation fault
I guess other pertinent information is that I'm on an academic HPC with a SLURM scheduler.
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
The issue I see with me using the newest Snakemake
version is the following:
If I were to run an interactive job (to get more memory) (via salloc --time=1:0:0 --mem=1000
) and then try to submit a job (which is a full pipeline consisting of many wrappers) (via bash -c "nohup snakemake --profile slurm --use-conda --jobs 500 &"
) it would only run jobs as long as the interactive job was set for.
As I understand it, Snakemake needs to be run from the head node - it submits jobs to the SLURM scheduler.
Is it possible that Snakemake
version 5.23.0
is more memory intensive than 5.8.1
? And if so does this preclude me from using it?
Previously my Trimmomatic
shell command included the following trimmers:
ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
However, only LEADING:3 and MINLEN:36 are working in the params: trimmer list?
rule trimming:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0],
r2 = lambda wildcards: getHome(wildcards.sample)[1]
output:
r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fastq.gz"),
r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_unpaired.fastq.gz"),
r2 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_paired.fastq.gz"),
r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_unpaired.fastq.gz")
log: os.path.join(dirs_dict["LOG_DIR"],config["TRIM_TOOL"],"{sample}.log")
threads: 32
params:
# list of trimmers (see manual)
trimmer=["LEADING:3", "MINLEN:36"],
# optional parameters
extra="",
compression_level="-9"
resources:
mem = 1000,
time = 120
message: """--- Trimming FASTQ files with Trimmomatic."""
wrapper:
"0.64.0/bio/trimmomatic/pe"
When trying to use any of the other parameters (ILLUMINACLIP:adapters.fa:2:30:10 TRAILING:3 SLIDINGWINDOW:4:15
) it fails.
For example, trying only TRAILING:3
:
rule trimming:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0],
r2 = lambda wildcards: getHome(wildcards.sample)[1]
output:
r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fastq.gz"),
r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_unpaired.fastq.gz"),
r2 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_paired.fastq.gz"),
r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_unpaired.fastq.gz")
log: os.path.join(dirs_dict["LOG_DIR"],config["TRIM_TOOL"],"{sample}.log")
threads: 32
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
compression_level="-9"
resources:
mem = 1000,
time = 120
message: """--- Trimming FASTQ files with Trimmomatic."""
wrapper:
"0.64.0/bio/trimmomatic/pe"
Results in the following error:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_align_r1
1
[Mon Sep 14 13:42:08 2020]
Job 0: --- Quality check of raw data with FastQC before alignment.
Activating conda environment: /home/moldach/wrappers/.snakemake/conda/975fb1fd
Activating conda environment: /home/moldach/wrappers/.snakemake/conda/975fb1fd
Skipping ' 2> logs/fastqc/before_align/MTG324_R1.log' which didn't exist, or couldn't be read
Failed to process file MTG324_R1_trim_paired.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
at java.base/java.lang.Thread.run(Thread.java:834)
mv: cannot stat '/tmp/tmpsnncjthh/MTG324_R1_trim_paired_fastqc.html': No such file or directory
Traceback (most recent call last):
File "/home/moldach/wrappers/.snakemake/scripts/tmpp34b98yj.wrapper.py", line 47, in <module>
shell("mv {html_path:q} {snakemake.output.html:q}")
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/shell.py", line 205, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; mv /tmp/tmpsnncjthh/MTG324_R1_trim_paired_fastqc.html qc/fastQC/before_align/MTG324_R1_trim_paired_fastqc.html' returned non-zero exit status $
[Mon Sep 14 13:45:16 2020]
Error in rule qc_before_align_r1:
jobid: 0
output: qc/fastQC/before_align/MTG324_R1_trim_paired_fastqc.html, qc/fastQC/before_align/MTG324_R1_trim_paired_fastqc.zip
log: logs/fastqc/before_align/MTG324_R1.log (check log file(s) for error message)
conda-env: /home/moldach/wrappers/.snakemake/conda/975fb1fd
RuleException:
CalledProcessError in line 181 of /home/moldach/wrappers/Trim:
Command 'source /home/moldach/anaconda3/bin/activate '/home/moldach/wrappers/.snakemake/conda/975fb1fd'; set -euo pipefail; python /home/moldach/wrappers/.snakemake/scripts/tmpp34b98yj.wrapper.py' retu$
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/home/moldach/wrappers/Trim", line 181, in __rule_qc_before_align_r1
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I'm struggling with editing snakemake/wrappers.py because I don't understand all the options - the default seems to be to specify the subfolder in the GitHub repository here, but it wouldn't make sense for the testing files to use that (unless they are merged to master), because we would always be testing the master branch. I was testing a local wrapper with a file://../wrapper.py
(up one folder from the test Snakemake file) but that doesn't seem to work. It looks like it could also be specified as http: or file: or git+file: and I'm hoping someone can help me to clearly lay out how this is supposed to work so I can do the implementation (the current one doesn't work / make sense to me). Thanks!
Hello,
I'd like to contribute adding this tool.
I've already made my own pipeline with 3 rules: it starts with a target sequencing VCF file and makes all the necessary input files for plink using that and any column from a CSV file with a continuous variable (preprocessing). With these I perform the association testing (with multiple testing correction by permutation) and finally output a nice HTML report (using Rmarkdown) with the "tsv" tables that Plink outputs.
Thanks in advance!
Hi folks, I am on parental leave until end of February.
Hence, I won't have the chance to look into your bug reports and PRs until then.
Of course, you are all invited to do mutual reviews :-)!
Thanks a lot for your patience.
Johannes
Snakemake version
Stable release of wrappers - 0.66.0
Describe the bug
When viewing the documentation on readthedocs, the stable
tag doesn't show the code
blocks. Other version tags and the latest
tag do.
Logs
Did not try building the docs, so I'm not sure if there are useful log messages.
SonarCloud inspections aren't configured properly for this project, so current master and all recent wrapper releases (e.g. v0.41.0) are marked as "Failed" by SonarCloud.
In detail, SonarCloud doesn't understand that snakemake
is already available in all wrapper.py
files, and shouldn't be flagged as a blocker error due to an undefined variable.
Such a SonarCloud profile marks pull requests into this repo as failed, not due to bugs but due to false positives.
I've made a custom wrapper in a local repo. When I use the wrapper locally, the environment is built just fine and everything runs as expected. However, when I try to call it through GitHub, conda fails to build the environment from environment.yaml
. For reference, I've been able to run wrappers for this test case and this test case for other wrappers without any issues.
I've tried modifying the environment file several different ways:
I've even tried using an absolute path for the wrapper as opposed to specifying a prefix. But nothing seems to be working. Here's an example that replicates the error.
snakemake version: 5.26.1
Here's the Snakefile:
configfile: "config.yaml"
MODELS = config["NNI_MODEL_PARAMS"].keys()
rule optimize_hypperparameters:
output:
experiment = expand("nni/{model}_experiments.yaml",
model = MODELS),
search_space = expand("nni/search_space/{model}_search_space.json",
model = MODELS),
experiment_results = expand("nni/experiment_results/{model}.csv",
model = MODELS)
wrapper:
"https://github.com/DamLabResources/pipeline-multitool/tree/main/nni_automation"
Here's the output log:
Building DAG of jobs...
Creating conda environment https:/github.com/DamLabResources/pipeline-multitool/tree/main/nni_automation/environment.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /tmp/tmpc5g0j_1m.yaml:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
exit_code = getattr(module, func_name)(args, parser)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 86, in execute
spec = specs.detect(name=name, filename=filename, directory=os.getcwd())
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 43, in detect
if spec.can_handle():
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
self._environment = env.from_file(self.filename)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/env.py", line 157, in from_file
return from_yaml(yamlstr, filename=filename)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda_env/env.py", line 138, in from_yaml
data = yaml_safe_load(yamlstr)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/conda/common/serialize.py", line 67, in yaml_safe_load
return yaml.safe_load(string, version="1.2")
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/main.py", line 980, in safe_load
return load(stream, SafeLoader, version)
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/main.py", line 935, in load
return loader._constructor.get_single_data()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/constructor.py", line 109, in get_single_data
node = self.composer.get_single_node()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/composer.py", line 78, in get_single_node
document = self.compose_document()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/composer.py", line 104, in compose_document
self.parser.get_event()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/parser.py", line 163, in get_event
self.current_event = self.state()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/parser.py", line 239, in parse_document_end
token = self.scanner.peek_token()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 182, in peek_token
self.fetch_more_tokens()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 282, in fetch_more_tokens
return self.fetch_value()
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 655, in fetch_value
self.reader.get_mark(),
ruamel_yaml.scanner.ScannerError: mapping values are not allowed here
in "<unicode string>", line 148, column 34:
<span style="background-color: #79b8ff;width: 0%;" class="Pro ...
^ (line: 148)
`$ /home/robertlink/anaconda3/bin/conda-env create --quiet --file /home/robertlink/nni_wrapper_test/.snakemake/conda/2002495d.yaml --prefix /home/robertlink/nni_wrapper_test/.snakemake/conda/2002495d`
environment variables:
CIO_TEST=<not set>
CNI_PATH=/home/robertlink/anaconda3/lib/cni
CONDA_AUTO_UPDATE_CONDA=false
CONDA_DEFAULT_ENV=base
CONDA_EXE=/home/robertlink/anaconda3/bin/conda
CONDA_MKL_INTERFACE_LAYER_BACKUP=
CONDA_PREFIX=/home/robertlink/anaconda3
CONDA_PROMPT_MODIFIER=(base)
CONDA_PYTHON_EXE=/home/robertlink/anaconda3/bin/python
CONDA_ROOT=/home/robertlink/anaconda3
CONDA_SHLVL=1
CUDA_PATH=/home/robertlink/anaconda3
CURL_CA_BUNDLE=<not set>
LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:/usr/local/cuda/extras/CUPTI/lib64
PATH=/usr/local/cuda-10.2/bin:/home/robertlink/edirect:/home/robertlink/CNN
_promoter_prediction/Jurkat_CNN_transcriptome/Jurkat_RNA_Seq_Processin
g/Dart:/home/robertlink/sratoolkit.2.8.2-1-ubuntu64/bin:/home/robertli
nk/cd-hit-v4.8.1-2019-0228:/usr/local/cuda-10.0/bin:/home/robertlink/b
in:/home/robertlink/.local/bin:/usr/local/cuda-10.2/bin:/home/robertli
nk/anaconda3/bin:/home/robertlink/anaconda3/condabin:/home/robertlink/
edirect:/home/robertlink/CNN_promoter_prediction/Jurkat_CNN_transcript
ome/Jurkat_RNA_Seq_Processing/Dart:/home/robertlink/sratoolkit.2.8.2-1
-ubuntu64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:
/bin:/usr/games:/usr/local/games:/snap/bin
PYTHONPATH=/home/robertlink/custom_python_modules:/home/robertlink/custom_python_
modules:
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
active environment : base
active env location : /home/robertlink/anaconda3
shell level : 1
user config file : /home/robertlink/.condarc
populated config files :
conda version : 4.8.5
conda-build version : 3.18.11
python version : 3.7.8.final.0
virtual packages : __cuda=10.2
__glibc=2.27
base environment : /home/robertlink/anaconda3 (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/robertlink/anaconda3/pkgs
/home/robertlink/.conda/pkgs
envs directories : /home/robertlink/anaconda3/envs
/home/robertlink/.conda/envs
platform : linux-64
user-agent : conda/4.8.5 requests/2.24.0 CPython/3.7.8 Linux/5.4.0-47-generic ubuntu/18.04.4 glibc/2.27
UID:GID : 1002:1002
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
File "/home/robertlink/anaconda3/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 352, in create
Any thoughts or help would be very appreciated. Thanks!
snakemake --version
5.10.0
Not sure if this belongs here, or on the general snakemake page, but it would be really nice to see the shell command for the snakemake wrapper rules when using the -p
parameter to submit the job.
Hi,
Snakemake version
Snakemake: bioconda:snakemake=5.10.0 (latest)
Snakemake-wrappers: 0.49.0 (latest)
However, the issue has been present since commit d90abc3, which includes a modification of the wrapper.py file of SnpEff.
Describe the bug
In the file bio/snpeff/wrapper.py
, lines 22 to 24 redefine the value of the variable incalls:
incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
incalls = "<(bcftools view {})".format(incalls)
However, the latest modifications include bcftools in the final shell line, at line 40:
shell(
"(bcftools view {incalls} | "
"snpEff {data_dir} {stats_opt} {csvstats_opt} {extra} "
"{snakemake.params.reference} "
"{outprefix} > {outcalls}) {log}"
)
So, with bcf files, the final command line is:
(bcftools view <(bcftools view {incalls}) | snpEff ...
Additional context
Additionally, this behaviour leads to the systematic use of BCFTools piped with SnpEff. This wrapper therefore uses at least two threads while running (one for each tool).
Solution
Please remove lines 23 and 24, since they are no longer useful.
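In other words, the wrapper's input handling would reduce to something like this sketch (the shell call quoted above, which already runs bcftools view, stays unchanged):

# No bcf special-casing needed: the shell command already pipes the input
# through "bcftools view {incalls}".
incalls = snakemake.input[0]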
Many thanks in advance.
Hi,
the wrapper for PICARD CREATESEQUENCEDICTIONARY creates an unfinished *.dict file.
Do I need to pass any additional parameters to get a correct *.dict file?
Many thanks.
Best regards,
Raphael
Hi,
Currently the FastQ_Screen wrapper defines the prefix after removing only the '.fastq' extension from the input file.
However, as shown below, fastq_screen also looks for other extensions ('.seq', '.txt' and '.fq')
Line 358 of fastq_screen:
This means the wrapper will fail for FASTQ files ending in .fq, .txt or .seq as there will be a filename mismatch between the actual fastq_screen output and the anticipated output based on the prefix.
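Something along these lines might cover all of the extensions fastq_screen recognizes, plus an optional .gz (a sketch, not the wrapper's current code):

import os
import re

# Strip any of the extensions fastq_screen itself accepts (.fastq, .fq, .txt, .seq),
# optionally followed by .gz, so the anticipated output name matches the real one.
prefix = re.sub(r"\.(fastq|fq|txt|seq)(\.gz)?$", "", os.path.basename(snakemake.input[0]))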
Thanks,
Stephen
Snakemake version
Snakemake
5.23.0
Issue
I'm trying to adapt my regular VEP code to use the snakemake wrapper instead but am running into an issue.
I want to make sure that a) the wrapper works for me and b) it produces the same results as the following:
vep \
-i {input.sample} \
--species "caenorhabditis_elegans" \
--format "vcf" \
--everything \
--offline \
--force_overwrite \
--fasta {input.ref} \
--gff {input.annot} \
--tab \
--variant_class \
--regulatory \
--show_ref_allele \
--numbers \
--symbol \
--protein \
-o {params.sample}
In order to use VEP
with wrappers there are 3 different rules.
I have got the following two working:
# VEP Download Plugins
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
wrapper:
"0.64.0/bio/vep/plugins"
# VEP Cache
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
log:
"logs/vep/cache.log"
wrapper:
"0.64.0/bio/vep/cache"
The third rule is to actually run VEP, for which I've written the following rule:
rule variant_annotation:
input:
calls= lambda wildcards: getVCFs(wildcards.sample),
cache="resources/vep/cache",
plugins="resources/vep/plugins",
output:
calls=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.annotated.vcf"),
stats=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.html")
params:
plugins=["LoFtool"],
extra="--everything"
message: """--- Annotating Variants."""
resources:
mem = 30000,
time = 120
threads: 4
wrapper:
"0.64.0/bio/vep/annotate"
When submitting the job this is the error I receive:
Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 variant_annotation
1
[Tue Aug 25 11:23:04 2020]
Job 0: --- Annotating Variants.
Activating conda environment: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
Failed to open VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf: unknown file type
Possible precedence issue with control flow operator at /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Traceback (most recent call last):
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py", line 44, in <module>
"(bcftools view {snakemake.input.calls} | "
File "/home/moldach/bin/snakemake/lib/python3.8/site-packages/snakemake/shell.py", line 156, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; (bcftools view VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf | vep --everything --fork 4 --format vcf --vcf --cache --cache_version 100 --species caenorhabditis_elegans --assembly WBcel235 --dir_cache resources/vep/cache --dir_plugins resources/vep/plugins --offline --plugin LoFtool --output_file STDOUT --stats_file ANNOTATION/VEP/470.html | bcftools view -Ov > ANNOTATION/VEP/470.annotated.vcf)' returned non-zero exit status 1.
[Tue Aug 25 11:25:02 2020]
Error in rule variant_annotation:
jobid: 0
output: ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
conda-env: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
RuleException:
CalledProcessError in line 393 of /scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile:
Command 'source /home/moldach/miniconda3/bin/activate '/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f'; set -euo pipefail; python /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py' returned non-zero exit status 1.
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile", line 393, in __rule_variant_annotation
File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.8.0/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job variant_annotation since they might be corrupted:
ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
This issue was originally brought up on StackOverflow, then on the ensembl-vep
repo, but it seems it should be posted here for a definitive answer.
Snakemake version
5.10.0
Describe the bug
For each of bwa index and bwa mem the user might put in a prefix that will work for a local job, but fail given a remote. Examples are provided below:
For bwa mem, note that params has an "index" entry below that is used to specify the path. I don't think the first example below will work:
rule map_reads:
input:
reads=get_merged,
idx=rules.bwa_index.output
output:
temp("mapped/{sample}.sorted.bam")
log:
"logs/bwa_mem/{sample}.log"
params:
index="refs/genome",
extra=get_read_group,
sort="samtools",
sort_order="coordinate"
threads: 8
wrapper:
"0.39.0/bio/bwa/mem"
But this would:
rule map_reads:
input:
reads=get_merged,
idx=rules.bwa_index.output
output:
temp("mapped/{sample}.sorted.bam")
log:
"logs/bwa_mem/{sample}.log"
params:
index="snakemake-testing/kim-wxs-varlociraptor/refs/genome",
extra=get_read_group,
sort="samtools",
sort_order="coordinate"
threads: 8
wrapper:
"0.39.0/bio/bwa/mem"
Unlike bwa index (discussed next), I'm not sure we can just remove this one.
For the bwa index prefix, when running on a remote with this recipe:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
params:
prefix="refs/genome"
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
The error log will report that bwa/genome.pac cannot be found. It's not the inputs or outputs, but rather that prefix is used to determine the path to the file! The correct usage would be:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
params:
prefix="snakemake-testing/kim-wxs-varlociraptor/refs/genome"
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
But we aren't actually required to specify it, so even better would be to remove it entirely:
rule bwa_index:
input:
"refs/genome.fasta"
output:
multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
log:
"logs/bwa_index.log"
resources:
mem_mb=6000,disk_mb=128000
benchmark:
"benchmarks/bwa_index.tsv"
wrapper:
"0.45.1/bio/bwa/index"
Of course, the user writing the pipeline might not know this, in which case maybe there should be a fix so that, given a default remote prefix, any prefix the user specifies gets that remote prefix prepended as well?
For both of the above, if I can get started on work to fix the wrappers and then do a PR here, I'd be happy to do that! I'm not sure if there are other cases like this too in the wrappers. Let me know your thoughts.
Snakemake version
snakemake: 5.25.0
snakemake-wrappers: 0.65.0
Describe the bug
Wrapper vep/cache
has, as its single output, a path to which the cache is downloaded:
https://github.com/snakemake/snakemake-wrappers/blob/master/bio/vep/cache/test/Snakefile#L3
Inside this folder, the wrapper will create subfolders for species, build and release. However, these folders are not part of the rule's output path, so the rule cannot properly check whether the output exists or not. That is, if the pipeline is run twice with different genomes, the second run will use the cache of the first run.
Suggested fix
I'd suggest having the species
/build
/release
as wildcards and part of the output folder. The wrapper would have to make sure the output path is valid (would have to end in {species}/{release}_{build}
), download the cache to a temp folder, and then move it to the right output folder; for example:
rule get_vep_cache:
output:
directory("resources/vep/cache/{species}/{release}_{build}")
params:
species = lambda wildcards: wildcards.species,
build = lambda wildcards: wildcards.build,
release = lambda wildcards: wildcards.release,
log:
"logs/vep/cache/{species}/{release}_{build}.log"
wrapper:
"master/bio/vep/cache"
The params
could also be deprecated, if the wrapper can get species
/build
/release
from the wildcards.
The vep annotate
rule would take the full path to the cache and infer species
/build
/release
.
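A sketch of how the annotate wrapper could recover those fields from a cache path laid out as .../{species}/{release}_{build}, as proposed above (variable names illustrative):

import os

cache = snakemake.input.cache
species = os.path.basename(os.path.dirname(cache))
release, build = os.path.basename(cache).split("_", 1)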
Is your feature request related to a problem? Please describe.
Several bwa
wrappers include a pipe to convert to bam
or to sort:
bwa/mem
bwa/mem-samblaster
bwa/samse
bwa/sampe
bwa-mem2/mem
bwa-mem2/mem-samblaster
I believe this is not needed, does not make sense and is, sometimes, inconvenient.
Not needed since the pipe can be easily implemented in snakemake with other rules/wrappers. Since these are actually wrappers of two wrappers, shouldn't they actually be meta wrappers? And inconvenient when you actually want a sam
file (and not a bam
).
Describe the solution you'd like
Remove the optional sorting from all bwa
wrappers and have them output a sam
file. Optionally, they could be re-implemented as meta-wrappers.
The same would apply to bwa/mem-samblaster
and bwa-mem2/mem-samblaster
.
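As a rough sketch of how the current pipe could be expressed as two separate steps instead, assuming the mem wrapper emitted SAM as proposed (wrapper versions, paths and input layout are illustrative, not current wrapper behaviour):

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        idx=multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    output:
        temp("mapped/{sample}.sam")
    log:
        "logs/bwa_mem/{sample}.log"
    threads: 8
    wrapper:
        "0.68.0/bio/bwa/mem"

rule samtools_sort:
    input:
        "mapped/{sample}.sam"
    output:
        "mapped/{sample}.sorted.bam"
    log:
        "logs/samtools_sort/{sample}.log"
    wrapper:
        "0.68.0/bio/samtools/sort"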
Snakemake version
Snakemake-wrappers version 0.67.0 and Sphinx v3.2.1
Describe the bug
A warning is issued by Sphinx when building the docs locally or remotely with GitHub Actions.
Logs
/home/runner/work/snakemake-wrappers/snakemake-wrappers/docs/generate_docs.py:72: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
env = yaml.load(env)
Can be found here as well.
Additional context
https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation#how-to-disable-the-warning
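For reference, the usual one-line fix in docs/generate_docs.py would look something like this (safe_load should be enough here, since the wrapper environment files are plain data):
import yaml

# passing an explicit loader silences the YAMLLoadWarning
env = yaml.load(env, Loader=yaml.SafeLoader)
# or, equivalently:
env = yaml.safe_load(env)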
This is not a priority; I just stumbled upon it several times and decided to register it. I will probably PR a minor fix soon.
Best,
Charlie
There are currently two separate wrappers for SAMTOOLS MPILEUP and VARSCAN MPILEUP2SNP.
These tools are used sequentially and are, unfortunately, single-threaded.
I'm in the process of converting shell commands to wrappers, so I have not had a chance to benchmark these wrappers specifically; however, I assume it is the same as piping the output of samtools mpileup into varscan, e.g.:
samtools mpileup -f $ref ${array_bam[$j]{"bam"}} |\
java -jar /home/${user}/projects/def-mtarailo/common/tools/VarScan.v2.3.9.jar pileup2snp \
--variants \
--min-coverage $mincov \
--min-avg-qual $minqual \
--min-var-freq $minfreq > ${path}/calling/varscan/${array_bam[$j]{"id"}}_snp_varscan.vcf ;
When I initially benchmarked the above code on C. elegans it took 101 minutes.
It would be ideal to create a joint wrapper combining the two tools, taking advantage of samtools mpileup's --region parameter and GNU Parallel.
The shell command I'm currently using is:
rule variant_calling:
input:
ref = os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
bam = lambda wildcards: getDeduppedBams(wildcards.sample),
bam_index = lambda wildcards: getDeduppedBamsIndex(wildcards.sample)
output:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_sorted_dedupped_snp_varscan.vcf")
log: os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"{sample}.log")
resources:
mem = 10000,
time = 90
threads: 7
params:
sample = "{sample}"
message: """--- Varscan pileup2snp."""
shell: """
module load samtools;
module load java/13.0.1;
echo -n "I II III IV V X MtDNA" |\
xargs -d " " -n 1 -P 7 -I {{}} /bin/bash -c \
"samtools mpileup -r {{}} \
-f ~/projects/def-mtarailo/common/indexes/WS265_wormbase/{{}}.fa \
{input.bam} |\
java -Xmx5G -jar ~/projects/def-mtarailo/common/tools/VarScan.v2.3.9.jar pileup2snp \
--variants \
--min-coverage 5 \
--min-avg-qual 30 \
--min-var-freq 0.9 > {params.sample}_{{}}.vcf"
awk 'FNR==1 && NR!=1 {{ while (/^<header>/) getline; }} 1 {{print}} ' *.vcf > {output}
rm {params.sample}_I.vcf {params.sample}_II.vcf {params.sample}_III.vcf {params.sample}_IV.vcf {params.sample}_V.vcf {params.sample}_X.vcf {params.sample}_MtDNA.vcf
"""
This parallelized the variant-calling process by applying these operations to each chromosome (on a separate core), reducing computation time to 17 minutes, roughly an 83% decrease in processing time for the lengthiest step in the C. elegans pipeline.
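Until such a joint wrapper exists, a more Snakemake-native version of the same idea is to scatter over regions with a wildcard and merge afterwards, letting Snakemake handle the parallelism. A rough sketch with illustrative names and paths (not a wrapper, just plain rules):
REGIONS = ["I", "II", "III", "IV", "V", "X", "MtDNA"]

rule varscan_region:
    input:
        ref="refs/genome.fasta",
        bam="dedup/{sample}.sorted.dedupped.bam",
        bai="dedup/{sample}.sorted.dedupped.bam.bai"
    output:
        temp("calling/varscan/{sample}.{region}.vcf")
    log:
        "logs/varscan/{sample}.{region}.log"
    # one region per job; parallelism comes from running several jobs at once
    shell:
        "(samtools mpileup -r {wildcards.region} -f {input.ref} {input.bam} | "
        "varscan pileup2snp --variants --min-coverage 5 --min-avg-qual 30 "
        "--min-var-freq 0.9 > {output}) 2> {log}"

# concatenate the per-region files, keeping the header line from the first one
rule merge_regions:
    input:
        expand("calling/varscan/{{sample}}.{region}.vcf", region=REGIONS)
    output:
        "calling/varscan/{sample}_sorted_dedupped_snp_varscan.vcf"
    shell:
        "awk 'FNR==1 && NR!=1 {{ next }} {{ print }}' {input} > {output}"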
I've used snakemake version 5.19.2 and cutadapt wrapper version 0.60 (cutadapt version 1.9.1).
I got an error when running my cutadapt rule, which still worked a couple of months ago. From the logfile:
cutadapt: error: no such option: -j
(the threads option)
rule in *.smk file:
rule cutadapt:
input:
"fastq/raw/{unit}.R1.fastq.gz"
output:
fastq=temp("fastq/trimmed/{unit}.R1.fastq.gz"),
qc="qc/cutadapt/{unit}.txt"
params:
cutadapt_extra
log:
"logs/cutadapt/{unit}.log"
wrapper:
"0.60.0/bio/cutadapt/se"
Configfile:
cutadapt:
extra:
- "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
Snakemake version
5.24.2
Describe the bug
I'm getting an error when using the Strelka2 wrapper with the --use-singularity and --use-conda flags in conjunction, but not when --use-conda is used alone.
I'm wondering if this is a problem with the wrapper or with the singularity image?
If it's the image, can you suggest a better one to use, please?
Logs
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 strelka2
1
[Thu Sep 24 17:16:03 2020]
Job 0: --- Call germline variants with Strelka2.
python /home/moldach/wrappers/SUBSET/.snakemake/scripts/tmpoc6_0bnr.wrapper.py
Activating singularity image /home/moldach/wrappers/SUBSET/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/.snakemake/conda/1995398f
Traceback (most recent call last):
File "/home/moldach/wrappers/SUBSET/.snakemake/scripts/tmpoc6_0bnr.wrapper.py", line 3, in <module>
x00sampleqhK\x00N\x86qish\x15]qj(h\x17h\x18eh\x17h\x19h\x1e\x85qkRql(h\x1e)}qmh"h\x17sNtqnbh\x18h\x19h\x1e\x85qoRqp(h\x1e)}qqh"h\x18sNtqrbX\x06\x00\x00\x00sampleqsheubX\x07\x00\x00\x00threadsqtK\x08$
File "/home/moldach/wrappers/SUBSET/.snakemake/conda/1995398f/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/home/moldach/wrappers/SUBSET/.snakemake/conda/1995398f/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/home/moldach/wrappers/SUBSET/.snakemake/conda/1995398f/lib/python2.7/pickle.py", line 892, in load_proto
raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3
[Thu Sep 24 17:16:18 2020]
Error in rule strelka2:
jobid: 0
output: strelka/MTG324
log: logs/bowtie2/MTG324.log (check log file(s) for error message)
conda-env: /home/moldach/wrappers/SUBSET/.snakemake/conda/1995398f
RuleException:
CalledProcessError in line 453 of /home/moldach/wrappers/SUBSET/Snakefile:
Command ' singularity exec --home /home/moldach/wrappers/SUBSET --bind /home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages:/mnt/snakemake /home/moldach/wrappers/SUBSET/.snakemake/singula$
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/home/moldach/wrappers/SUBSET/Snakefile", line 453, in __rule_strelka2
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Minimal example
singularity: "docker://continuumio/miniconda3:4.5.11"
def getDeduppedBamsIndex(sample):
return(list(os.path.join(aligns_dict[sample],"{0}.sorted.dedupped.bam.bai".format(sample,pair)) for pair in ['']))
if (config["CALLING_TOOL"]=="strelka2"):
rule strelka2:
input:
fasta=os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
bam=lambda wildcards: getDeduppedBams(wildcards.sample),
bam_index=lambda wildcards: getDeduppedBamsIndex(wildcards.sample),
fasta_index=os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
output:
temp(directory("strelka/{sample}"))
log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
message: """--- Call germline variants with Strelka2."""
threads: 8
resources:
mem=4000,
time=100
params:
config_extra="",
run_extra=""
wrapper:
"0.65.0/bio/strelka/germline"
"""
Snakemake version
current snakemake-wrappers version 0.67.0 (bbd7c0f)
Describe the bug
The rownames of the resulting sequence table should have the names c("a","b") but have c("a",NA) instead.
Minimal example
The minimal example for reproducing the bug is in the test directory of the wrapper.
In the same spirit of having https://github.com/snakemake-workflows, is there any concern here about maintaining versions / containers for wrappers? I just cloned the repository locally and it wasn't tiny, but my larger concern is that if we also serve containers from here (that have automated builds for the wrappers) that will be more complicated than having a https://github.com/snakemake-wrappers organization, and storing wrappers under the complete uri. E.g., the current bio/arriba would be at https://github.com/snakemake-wrappers/bio/arriba. If some web interface (or similar) is then desired to show the wrappers, it could be rendered at snakemake-wrappers.github.io, and use a simple query to the API to update the listing.
Snakemake version
Snakemake wrappers version 0.64.0,
Snakemake version any (e.g. 5.26.1)
Describe the bug
FASTA or FASTQ files are often named like *.fq.gz. Starting from version 0.50.0, the fastqc wrapper doesn't support this naming convention and always fails for such files. It happens because the FastQC tool and the wrapper now expect different names for the result file, so the wrapper cannot find the report.
E.g. for foo_1.fq.gz, FastQC generates a foo_1_fastqc.html report. Wrappers < 0.50.0 used the same naming convention, but after the #14 fix the wrapper expects foo_1.fq_fastqc.html, which doesn't exist, so the mv command fails with a file-not-found error.
Logs
fastqc --quiet -t 1 --outdir /scratch1/fs1/martyomov/rcherniatchik/pilot_small/tmp/tmp7kg3b6vx bams/A01/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1.fq.gz 2> qc/bams_unmapped/A01/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1_fastqc.log
mv /scratch1/fs1/martyomov/rcherniatchik/pilot_small/tmp/tmp7kg3b6vx/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1.fq_fastqc.html qc/bams_unmapped/A01/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1_fastqc.html
mv: cannot stat '/scratch1/fs1/martyomov/rcherniatchik/pilot_small/tmp/tmp7kg3b6vx/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1.fq_fastqc.html': No such file or directory
Traceback (most recent call last):
File "/scratch1/fs1/martyomov/rcherniatchik/pilot_small/.snakemake/scripts/tmpgp95haer.wrapper.py", line 47, in <module>
shell("mv {html_path} {snakemake.output.html}")
File "/opt/conda/lib/python3.7/site-packages/snakemake/shell.py", line 176, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; mv /scratch1/fs1/martyomov/rcherniatchik/pilot_small/tmp/tmp7kg3b6vx/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1.fq_fastqc.html qc/bams_unmapped/A01/V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1_fastqc.html' returned non-zero exit status 1.
Work dir folder content:
$ ls /scratch1/fs1/martyomov/rcherniatchik/pilot_small/tmp/tmp7kg3b6vx/
V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1_fastqc.html
V300010918_L4_HUMvotMAAAAAAA-1_hg38_unmapped_reads_1_fastqc.zip
Minimal example
rule fastqc:
input:
"reads/{sample}.fq.gz"
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip"
log:
"logs/fastqc/{sample}.log"
threads: 1
wrapper:
"master/bio/fastqc"
Additional context
Before #14, the basename_without_ext method was:
split_ind = 2 if base.endswith(".gz") else 1
base = ".".join(base.split(".")[:-split_ind])
after:
split_ind = 2 if base.endswith(".fastq.gz") else 1
base = ".".join(base.split(".")[:-split_ind])
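Based on the two snippets above, one hedged fix would be to treat .fq.gz the same way as .fastq.gz when stripping the extension (illustrative only; the maintainers may prefer a more general approach):
# strip two extension components for either gzipped-FASTQ spelling,
# so foo_1.fq.gz -> foo_1, just like foo_1.fastq.gz -> foo_1
split_ind = 2 if base.endswith((".fastq.gz", ".fq.gz")) else 1
base = ".".join(base.split(".")[:-split_ind])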
Hi!
I've been submitting several wrappers for DADA2 recently and planned to submit a meta-wrapper as well. However, I wanted some advice on the preferred way to do so, seeing that the DADA2 wrappers are scattered across different branches. I see two options so far:
Option 1: I can merge all branches, squash the wrapper commits, code the meta-wrapper and submit the PR. But merging so many branches might get dirty.
Option 2: Wait for the wrappers' inclusion and build the meta-wrapper from the new state of the master branch.
I apologize if this message sounds like I want to speed up the reviews; that is not my intention. I just want to know the preferred way to contribute to this awesome project!
Thanks in advance,
Charlie
Is your feature request related to a problem? Please describe.
Right now, bwa samse and sampe are two different wrappers. I feel this is a bit unnecessary, since there is only one output and the input can just be an array of fastq files and another of sai files.
Describe the solution you'd like
A single wrapper for both, where the algorithm is chosen by a param?
params:
index="genome",
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none", # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
sort_order="queryname", # optional: Sort by 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
alg="sampe"
or
params:
alg = lambda wildcards, input: "samse" if len(input.sai) == 1 else "sampe",
or even have it auto-detected inside the wrapper.
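For illustration, that auto-detection inside the wrapper could look roughly like this (hypothetical snippet, assuming the rule passes input.sai and input.fastq as lists):
from snakemake.shell import shell

# hypothetical wrapper logic: choose samse vs. sampe from the number of .sai inputs
sai = snakemake.input.sai
fastq = snakemake.input.fastq
alg = "samse" if len(sai) == 1 else "sampe"

shell(
    "bwa {alg} {snakemake.params.extra} {snakemake.params.index} "
    "{sai} {fastq} > {snakemake.output[0]}"
)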
Does this make sense?
Snakemake-wrappers 0.66.0, snakemake 5.24.0
The code stopped being displayed after the addition of meta-wrappers, i.e. in the latest version of snakemake-wrappers.
Describe the bug
There is no code displayed anymore in any wrapper's documentation. The code is still displayed in meta-wrappers.
Minimal example
See missing documentation in:
https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bedtools/slop.html#code
See available documentation in:
https://snakemake-wrappers.readthedocs.io/en/stable/meta-wrappers/star_arriba.html#code
Describe the bug
While an extra variable is defined in the vep cache wrapper to read extra from params, this extra variable is not used in the shell command.
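Presumably the fix is just to interpolate the variable into the wrapper's shell() call; a hedged sketch follows (the vep_install arguments shown are placeholders, not the wrapper's literal command line):
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")  # defined today, but never interpolated
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# placeholder command skeleton: the point is only that {extra} has to appear
# somewhere in the shell() invocation for the param to take effect
shell(
    "vep_install --AUTO cf"
    " --SPECIES {snakemake.params.species}"
    " --ASSEMBLY {snakemake.params.build}"
    " --CACHE_VERSION {snakemake.params.release}"
    " --CACHEDIR {snakemake.output}"
    " {extra} {log}"
)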
Snakemake version
5.23.0
Describe the bug
Trying to get snakemake running on another server we have access to (this one has an LSF scheduler). I used the VEP CACHE wrapper example verbatim as a minimal example.
Logs
Taking a look at less /gpfs/home/moldach/projects/saliva/SIMPLE_TEST/.snakemake/log/2020-08-27T163012.417959.snakemake.log
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 1
Job counts:
count jobs
1 all
1 get_vep_cache
2
[Thu Aug 27 16:30:13 2020]
rule get_vep_cache:
output: resources/vep/cache
log: logs/vep/cache.log
jobid: 1
Submitted job 1 with external jobid '89368 logs/cluster/get_vep_cache/unique/jobid1_089efa0e-a6ea-47aa-9c02-c1ce6ad92cbe.out'.
[Thu Aug 27 16:30:33 2020]
Error in rule get_vep_cache:
jobid: 1
output: resources/vep/cache
log: logs/vep/cache.log (check log file(s) for error message)
conda-env: /gpfs/home/moldach/projects/saliva/SIMPLE_TEST/.snakemake/conda/60d0d409
cluster_jobid: 89368 logs/cluster/get_vep_cache/unique/jobid1_089efa0e-a6ea-47aa-9c02-c1ce6ad92cbe.out
Error executing rule get_vep_cache on cluster (jobid: 1, external: 89368 logs/cluster/get_vep_cache/unique/jobid1_089efa0e-a6ea-47aa-9c02-c1ce6ad92cbe.out, jobscript: /gpfs/home/moldach/projects/saliva/SIMPLE_TEST/.snakemake/tmp.a24u6iry/snakejob.get_vep_cache.1.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/saliva/SIMPLE_TEST/.snakemake/log/2020-08-27T163012.417959.snakemake.log
Minimal example
The Snakefile
:
rule all:
input:
"resources/vep/cache"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="saccharomyces_cerevisiae",
build="R64-1-1",
release="98"
log:
"logs/vep/cache.log"
cache: True
wrapper:
"0.64.0/bio/vep/cache"
The submit script:
bash -c "nohup snakemake --profile lsf --use-conda --jobs 1 &"
Additional context
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"