dillonl / bayescmg
License: MIT License
We probably don't want users to generate a data/ folder with ClinVar and other annotations in it for every run they do. Could it not just download to the bAyesCMG install dir once, with all future runs referencing it at that location?
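One way to picture the request is a helper that resolves the annotation files against the install directory and only fetches what is missing. This is a minimal sketch, not the project's code: `ensure_annotations`, the `data` subfolder under the install dir, and the `fetch` callable are all hypothetical names; the file names are taken from the wget log further down.

```python
from pathlib import Path
from typing import Callable, List

# File names as they appear in the wget log; the install-dir layout is an assumption.
CLINVAR_FILES = ("clinvar.grc37.vcf.gz", "clinvar.grc37.vcf.gz.tbi")


def ensure_annotations(install_dir: Path, fetch: Callable[[str, Path], None]) -> List[Path]:
    """Return annotation paths under <install_dir>/data, fetching only missing files."""
    data_dir = install_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in CLINVAR_FILES:
        target = data_dir / name
        if not target.exists():
            # `fetch` is a hypothetical downloader, e.g. a wrapper around wget.
            fetch(name, target)
        paths.append(target)
    return paths
```

Because the paths no longer depend on the current working directory, a second run from any folder would reuse the files downloaded by the first.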
To reduce run time, check if (only a few examples listed, there are likely more we could consider):
Add variant quality filtering with slivar up front. This will reduce the number of false positives in low-complexity and repeat regions, multiallelics, and other generally problematic regions. I can give you the specific slivar code, @dillonl.
Add a flag to keep intermediate files such as slivar.vep.vcf.gz, to reduce runtime when rerunning. Goes along with #11.
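A rough sketch of how such a flag could behave: each step is skipped when its output is already on disk, and intermediates are deleted at the end only if the flag is off. The function names and the `keep` parameter are hypothetical and not part of run.sh.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_step(output: Path, build: Callable[[Path], None]) -> bool:
    """Run build(output) unless a non-empty output already exists.

    Returns True if the step actually ran.
    """
    if output.exists() and output.stat().st_size > 0:
        return False
    build(output)
    return True


def cleanup(intermediates: Iterable[Path], keep: bool) -> None:
    """Delete intermediate files unless the (hypothetical) keep flag is set."""
    if keep:
        return
    for path in intermediates:
        if path.exists():
            path.unlink()
```

Checking only for existence is the simplest policy; it would not notice that the input VCF changed, so a real implementation might also compare timestamps or checksums.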
VarBayes.py should be bAyesCMG.py, etc. Fix everywhere.
Looks like you're expecting VarBayes.py to be in the current (run) directory, which is not expected behavior.
$ bash ~/bin/bAyesCMG/run.sh -v 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz -p 18-08-03_VAR-Marth-PND.ped -r /scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh37/human_g1k_v37_decoy_phix.fasta -c -g /scratch/ucgd/lustre/work/u0691312/reference/slivar.gnomad.hg37.zip -d /scratch/ucgd/lustre/work/u0691312/reference/ensembl/ -u /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/ -l /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/revel_all_chromosomes_vep.tsv.gz -f 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz
wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:01:57-- ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz
=> ‘data/clinvar.grc37.vcf.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz ... 25516616
==> PASV ... done. ==> RETR clinvar.vcf.gz ... done.
Length: 25516616 (24M) (unauthoritative)
100%[=====================================================================================================================================================================>] 25,516,616 20.6MB/s in 1.2s
2020-04-14 08:02:00 (20.6 MB/s) - ‘data/clinvar.grc37.vcf.gz’ saved [25516616]
wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:02:00-- ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz.tbi
=> ‘data/clinvar.grc37.vcf.gz.tbi’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz.tbi ... 284386
==> PASV ... done. ==> RETR clinvar.vcf.gz.tbi ... done.
Length: 284386 (278K) (unauthoritative)
100%[=====================================================================================================================================================================>] 284,386 915KB/s in 0.3s
2020-04-14 08:02:02 (915 KB/s) - ‘data/clinvar.grc37.vcf.gz.tbi’ saved [284386]
Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
gzip: stdout: Broken pipe
/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/bAyesCMG/run.sh: line 157: externals/slivar/slivar: No such file or directory
Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
-------------------- EXCEPTION --------------------
MSG: ERROR: File "data/slivar.tmp" does not exist
STACK Bio::EnsEMBL::VEP::Parser::file /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:231
STACK Bio::EnsEMBL::VEP::Parser::new /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:125
STACK Bio::EnsEMBL::VEP::Runner::get_Parser /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:791
STACK Bio::EnsEMBL::VEP::Runner::get_InputBuffer /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:818
STACK Bio::EnsEMBL::VEP::Runner::init /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:131
STACK Bio::EnsEMBL::VEP::Runner::run /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/bin/vep:224
Date (localtime) = Tue Apr 14 08:13:45 2020
Ensembl API version = 99
---------------------------------------------------
[bgzip] No such file or directory: data/slivar.tmp.vep.vcf
open: No such file or directory
[tabix] was bgzip used to compress this file? data/slivar.tmp.vep.vcf.gz
python VarBayes.py -v data/slivar.tmp.vep.vcf.gz -f 18-08-03_VAR-Marth-PND.ped -d 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz -c data/clinvar.grc37.vep.vcf.gz -e 2.0 -o 350 -p 0.1 -a 0.01 -r 0.6
python: can't open file 'VarBayes.py': [Errno 2] No such file or directory
The slivar comp het function requires the VCF to be annotated (to track gene names), so the VCF needs to be annotated with VEP before calling this function.
[slivar] evaluating on 1 trios
fatal.nim(39) sysFatal
Error: unhandled exception: comphet.nim(226, 12) `gene_fields.len > 0` [slivar] error! no gene-like field found in /dev/stdin [AssertionError]
The lines above are the slivar error.
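A pre-flight check could catch this before slivar is called. The sketch below scans the VCF header for an annotation INFO field. It assumes the gene-like field slivar looks for is one of CSQ, BCSQ or ANN, which should be confirmed against the slivar documentation; the function names are hypothetical.

```python
import gzip
from typing import Iterable

# Assumption: one of these INFO fields carries the gene annotation
# (CSQ from VEP, BCSQ from bcftools csq, ANN from snpEff).
GENE_FIELDS = ("CSQ", "BCSQ", "ANN")


def has_gene_annotation(header_lines: Iterable[str]) -> bool:
    """True if a ##INFO meta line declares one of GENE_FIELDS."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line or the records
        for field in GENE_FIELDS:
            if line.startswith(f"##INFO=<ID={field},"):
                return True
    return False


def vcf_is_annotated(path: str) -> bool:
    """Open a plain or gzip/bgzip-compressed VCF and check its header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return has_gene_annotation(handle)
```

Failing early with a clear message ("annotate with VEP first") would be friendlier than the assertion raised from comphet.nim.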
$ bash ~/bin/bAyesCMG/run.sh
Make Sure you provide out all required parameters
Usage: VarBayes [OPTION]
Description of VarBayes
-h, --help Print help instructions
-v, --vcf_file Input VCF File Path [REQUIRED]
-p, --ped_file Input PED File Path [REQUIRED]
-r, --reference_file Reference (FASTA) File Path [REQUIRED]
-c, --get_clinvar Download latest ClinVar file [if no ClinVar file available in the data directory this arg will be ignored and ClinVar will be downloaded automatically reguardless]
-g, --gnomad GNOMAD File Path [REQUIRED]
-d, --vep_cache_dir VEP Cache Directory Path [REQUIRED]
-u, --vep_plugin_dir VEP Plugin Directory Path [REQUIRED]
-l, --vep_revel_file VEP REVEL File Path [REQUIRED]
-t, --gnomad_af_threshold gnomAD_AF threshold (default value = 0.01)
-j, --revel_af_threshold REVEL threshold [Ask Matt] (default value = 0.6)
-y, --prior_probability Prior probability [Optional, default 0.1]
-o, --odds_pathogenic The odds of pathogenicity for 'Very Strong' [Optional, default 350]
-e, --exponent The exponent that sets the strength of Supporting/Moderate/Strong compared to 'Very Strong' [Optional, default 0.1]
-f, --finished_vcf_path File name of the output vcf [REQUIRED]
Should be a simple single command that does:
normalize, subset, decompose > slivar > vep > varbayes
AND make sure there are checks along the way: for example, if a user feeds in a VCF/PED combination where the PED samples don't match the VCF samples, or a VCF that is VEP annotated but not slivar annotated, etc.
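The PED/VCF sample check mentioned above can be sketched in a few lines. It assumes a standard PED layout with the sample ID in the second column and the usual VCF `#CHROM` header line with sample names after the FORMAT column; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def ped_samples(ped_lines: Iterable[str]) -> Set[str]:
    """Sample IDs from column 2 of a PED file, skipping blank and comment lines."""
    ids = set()
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split()[1])
    return ids


def vcf_samples(vcf_lines: Iterable[str]) -> List[str]:
    """Sample names from the #CHROM header line (columns after FORMAT)."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def missing_from_vcf(ped_lines: Iterable[str], vcf_lines: Iterable[str]) -> List[str]:
    """PED sample IDs that do not appear in the VCF header, sorted."""
    return sorted(ped_samples(ped_lines) - set(vcf_samples(vcf_lines)))
```

A non-empty result from `missing_from_vcf` would be a reason to stop before any annotation step starts.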
Double check only proband variants are being considered
When running multiple times, multiple ClinVar VCF .tbi files are generated:
clinvar.vcf.gz.tbi clinvar.vcf.gz.tbi.1 clinvar.vcf.gz.tbi.2
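The numbered suffixes are consistent with wget's default behavior of saving to a new name when the target already exists. One way to avoid the pile-up, sketched here under the assumption that a refresh should overwrite the old index, is to download to a temporary file and then replace the target in one step. `refresh_file` and its `fetch` callable are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def refresh_file(dest: Path, fetch: Callable[[Path], None]) -> Path:
    """Download to a temp file next to dest, then replace dest with it."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(tmp)             # hypothetical downloader writing to the temp path
        os.replace(tmp, dest)  # overwrites dest, so no .1/.2 copies accumulate
    finally:
        if tmp.exists():       # only true if fetch or replace failed
            tmp.unlink()
    return dest
```

Replacing only after a complete download also means an interrupted transfer leaves the previous file intact.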
VQSRTranche variants are being retained. Only PASS variants should be kept.
You need to add --pass-only to the slivar commands.
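To make the intended outcome concrete, here is a small sketch of what "only PASS variants" means at the VCF level: records are kept only when the FILTER column (the seventh column) is exactly PASS. This is an illustration, not slivar's implementation, and `pass_only` is a hypothetical name. It may be worth confirming in the slivar documentation whether `--pass-only` alone gives this result or whether a FILTER test in the `--info` expression is also needed.

```python
from typing import Iterable, Iterator


def pass_only(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only records whose FILTER is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) is FILTER; tranche labels and "." are dropped.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line
```

Running the final VCF through a check like this would show quickly whether any VQSRTranche records still slip through.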