bayescmg's Issues

Skip (pre) processing steps if possible

To reduce run time, check whether each (pre)processing step is actually needed before running it. A few examples (there are likely more we could consider):

  • The VCF is already decomposed, normalized, and subsetted to the correct samples from the PED; if so, skip this step
  • The VCF is already annotated (with VEP or otherwise) and contains the annotations we need; if so, skip this step
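
As a rough illustration of the annotation check, the sketch below inspects VCF header lines for a VEP `CSQ` declaration and confirms that the subfields we need are listed in it. The function names and the list of required subfields (`SYMBOL`, `REVEL`) are assumptions made for the example, not taken from run.sh; the only thing relied on is that VEP writes its subfield list after `Format:` in the `CSQ` INFO description.

```python
import re


def vep_csq_fields(header_lines):
    """Return the CSQ subfield names declared in a VCF header, or [] if absent."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # VEP lists the pipe-separated subfields after "Format: ".
            match = re.search(r'Format: ([^">]+)', line)
            return match.group(1).strip().split("|") if match else []
    return []


def can_skip_annotation(header_lines, required):
    """True when every required subfield is already declared in CSQ."""
    present = set(vep_csq_fields(header_lines))
    return bool(present) and set(required) <= present


# Hypothetical header, shortened for the example.
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene|REVEL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(can_skip_annotation(header, ["SYMBOL", "REVEL"]))
```

Only the header is read, so the check stays cheap even on large VCFs. It does not verify that individual records actually carry the annotation.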

Add variant quality filtering using slivar

Add variant quality filtering using slivar up front. This will reduce the number of false positives in low-complexity and repeat regions, at multiallelic sites, and in otherwise generally problematic regions. I can give you the specific slivar code, @dillonl.

run.sh crashes even with all necessary components in PATH

It looks like run.sh expects VarBayes.py to be in the current (run) directory, which is not expected behavior.

$ bash ~/bin/bAyesCMG/run.sh -v 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz -p 18-08-03_VAR-Marth-PND.ped -r /scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh37/human_g1k_v37_decoy_phix.fasta -c -g /scratch/ucgd/lustre/work/u0691312/reference/slivar.gnomad.hg37.zip -d /scratch/ucgd/lustre/work/u0691312/reference/ensembl/ -u /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/ -l /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/revel_all_chromosomes_vep.tsv.gz -f 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz
wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:01:57--  ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz
           => ‘data/clinvar.grc37.vcf.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz ... 25516616
==> PASV ... done.    ==> RETR clinvar.vcf.gz ... done.
Length: 25516616 (24M) (unauthoritative)

100%[=====================================================================================================================================================================>] 25,516,616  20.6MB/s   in 1.2s   

2020-04-14 08:02:00 (20.6 MB/s) - ‘data/clinvar.grc37.vcf.gz’ saved [25516616]

wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:02:00--  ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz.tbi
           => ‘data/clinvar.grc37.vcf.gz.tbi’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz.tbi ... 284386
==> PASV ... done.    ==> RETR clinvar.vcf.gz.tbi ... done.
Length: 284386 (278K) (unauthoritative)

100%[=====================================================================================================================================================================>] 284,386      915KB/s   in 0.3s   

2020-04-14 08:02:02 (915 KB/s) - ‘data/clinvar.grc37.vcf.gz.tbi’ saved [284386]

Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

gzip: stdout: Broken pipe
/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/bAyesCMG/run.sh: line 157: externals/slivar/slivar: No such file or directory
Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

-------------------- EXCEPTION --------------------
MSG: ERROR: File "data/slivar.tmp" does not exist

STACK Bio::EnsEMBL::VEP::Parser::file /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:231
STACK Bio::EnsEMBL::VEP::Parser::new /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:125
STACK Bio::EnsEMBL::VEP::Runner::get_Parser /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:791
STACK Bio::EnsEMBL::VEP::Runner::get_InputBuffer /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:818
STACK Bio::EnsEMBL::VEP::Runner::init /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:131
STACK Bio::EnsEMBL::VEP::Runner::run /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/bin/vep:224
Date (localtime)    = Tue Apr 14 08:13:45 2020
Ensembl API version = 99
---------------------------------------------------
[bgzip] No such file or directory: data/slivar.tmp.vep.vcf
open: No such file or directory
[tabix] was bgzip used to compress this file? data/slivar.tmp.vep.vcf.gz
python VarBayes.py -v data/slivar.tmp.vep.vcf.gz -f 18-08-03_VAR-Marth-PND.ped -d 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz -c data/clinvar.grc37.vep.vcf.gz -e 2.0 -o 350 -p 0.1 -a 0.01 -r 0.6
python: can't open file 'VarBayes.py': [Errno 2] No such file or directory
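
The failures above (`externals/slivar/slivar` and `VarBayes.py` not found) are consistent with paths being resolved against the caller's working directory. run.sh is a shell script, so the real fix belongs there; the idea is sketched here in Python only to keep the examples in one language. The helper name `bundled` and the `install_dir` parameter are hypothetical.

```python
from pathlib import Path


def bundled(relative, install_dir):
    """Resolve a file shipped with the pipeline against its install directory.

    The caller's current working directory is never consulted, so the
    pipeline can be launched from any run directory.
    """
    path = Path(install_dir) / relative
    if not path.exists():
        raise FileNotFoundError(f"expected bundled file at {path}")
    return path
```

In a Python entry point, `install_dir` would typically be `Path(__file__).resolve().parent`, so that `bundled("VarBayes.py", install_dir)` works no matter where the command is invoked.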

Need to annotate VCF before slivar comp het function

The slivar comp het function requires the VCF to be annotated (to track gene names), so we need to annotate with VEP before calling it.

[slivar] evaluating on 1 trios
fatal.nim(39)            sysFatal
Error: unhandled exception: comphet.nim(226, 12) `gene_fields.len > 0` [slivar] error! no gene-like field found in /dev/stdin [AssertionError]

This is the error slivar reports when the input is not annotated.

Fix run.sh message

$ bash ~/bin/bAyesCMG/run.sh 
Make Sure you provide out all required parameters
Usage: VarBayes [OPTION]
 	Description of VarBayes
 		-h, --help Print help instructions
 		-v, --vcf_file Input VCF File Path [REQUIRED]
 		-p, --ped_file Input PED File Path [REQUIRED]
 		-r, --reference_file Reference (FASTA) File Path [REQUIRED]
 		-c, --get_clinvar Download latest ClinVar file [if no ClinVar file available in the data directory this arg will be ignored and ClinVar will be downloaded automatically reguardless]
 		-g, --gnomad GNOMAD File Path [REQUIRED]
 		-d, --vep_cache_dir VEP Cache Directory Path [REQUIRED]
 		-u, --vep_plugin_dir VEP Plugin Directory Path [REQUIRED]
 		-l, --vep_revel_file VEP REVEL File Path [REQUIRED]
 		-t, --gnomad_af_threshold gnomAD_AF threshold (default value = 0.01)
 		-j, --revel_af_threshold REVEL threshold [Ask Matt] (default value = 0.6)
 		-y, --prior_probability Prior probability [Optional, default 0.1]
 		-o, --odds_pathogenic The odds of pathogenicity for 'Very Strong' [Optional, default 350]
 		-e, --exponent The exponent that sets the strength of Supporting/Moderate/Strong compared to 'Very Strong' [Optional, default 0.1]
 		-f, --finished_vcf_path File name of the output vcf [REQUIRED]

Suggested changes:

  • Remove mention of VarBayes
  • REVEL is not an allele frequency (AF); rename the option to revel_threshold
  • Remove all "[Ask Matt]" notes
  • Put all REQUIRED parameters together at the top, maybe in their own section so they are easier to scan
  • Do the same for all OPTIONAL parameters
  • Change "Optional" to "OPTIONAL" to be consistent with "REQUIRED"
  • Fix typos like "reguardless"
  • etc.
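
One way the reorganized help could look, sketched with Python's `argparse` argument groups. The short and long flags and the defaults are copied from the help text above; the program name, the description string, and the rename to `--revel_threshold` are assumptions that follow the suggestions in this issue rather than anything already implemented.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="run.sh",  # assumed name; the issue asks to drop "VarBayes"
        description="Placeholder description of the pipeline.",
    )

    req = parser.add_argument_group("REQUIRED")
    req.add_argument("-v", "--vcf_file", required=True, help="Input VCF file path")
    req.add_argument("-p", "--ped_file", required=True, help="Input PED file path")
    req.add_argument("-r", "--reference_file", required=True,
                     help="Reference (FASTA) file path")
    req.add_argument("-g", "--gnomad", required=True, help="gnomAD file path")
    req.add_argument("-d", "--vep_cache_dir", required=True,
                     help="VEP cache directory path")
    req.add_argument("-u", "--vep_plugin_dir", required=True,
                     help="VEP plugin directory path")
    req.add_argument("-l", "--vep_revel_file", required=True,
                     help="VEP REVEL file path")
    req.add_argument("-f", "--finished_vcf_path", required=True,
                     help="File name of the output VCF")

    opt = parser.add_argument_group("OPTIONAL")
    opt.add_argument("-c", "--get_clinvar", action="store_true",
                     help="Download the latest ClinVar file (downloaded "
                          "regardless if none exists in the data directory)")
    opt.add_argument("-t", "--gnomad_af_threshold", type=float, default=0.01,
                     help="gnomAD AF threshold (default: %(default)s)")
    opt.add_argument("-j", "--revel_threshold", type=float, default=0.6,
                     help="REVEL score threshold (default: %(default)s)")
    opt.add_argument("-y", "--prior_probability", type=float, default=0.1,
                     help="Prior probability (default: %(default)s)")
    opt.add_argument("-o", "--odds_pathogenic", type=float, default=350,
                     help="Odds of pathogenicity for 'Very Strong' "
                          "(default: %(default)s)")
    # The help text in this issue says 0.1, but the logged VarBayes.py call
    # passes -e 2.0, so the real default should be confirmed before use.
    opt.add_argument("-e", "--exponent", type=float, default=0.1,
                     help="Strength of Supporting/Moderate/Strong relative to "
                          "'Very Strong' (default: %(default)s)")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

`argparse` supplies `-h, --help` on its own, and marking arguments `required=True` produces a specific error naming the missing flags rather than a generic reminder.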

Processing on unannotated vcf in new environment

There should be a simple single command that does:
normalize, subset, decompose > slivar > vep > varbayes

It should also run checks along the way, for example when a user feeds in a VCF/PED combination where the PED samples don't match the VCF samples, or a VCF that is VEP annotated but not slivar annotated.
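
A minimal sketch of the PED/VCF sample check, assuming the usual layouts: the sample ID is the second whitespace-separated column of each PED row, and VCF sample names start at the tenth tab-separated column of the `#CHROM` line. The function names and the sample IDs in the example are hypothetical.

```python
def ped_samples(ped_lines):
    """Sample IDs from PED rows, skipping blank lines and '#' comments."""
    ids = []
    for line in ped_lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.append(line.split()[1])
    return ids


def vcf_samples(header_lines):
    """Sample names from the #CHROM line of a VCF header."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM line found in VCF header")


def missing_from_vcf(ped_lines, header_lines):
    """PED samples that do not appear in the VCF, in PED order."""
    in_vcf = set(vcf_samples(header_lines))
    return [s for s in ped_samples(ped_lines) if s not in in_vcf]


# Hypothetical trio where the father is absent from the VCF.
ped = [
    "#family sample father mother sex phenotype",
    "FAM1 kid dad mom 1 2",
    "FAM1 dad 0 0 1 1",
    "FAM1 mom 0 0 2 1",
]
vcf_header = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tkid\tmom"]
print(missing_from_vcf(ped, vcf_header))
```

Failing early with the list of missing samples is likely more useful to the user than a downstream slivar or VEP error.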

Getting multiple clinvar tbis

When run.sh is run multiple times, multiple ClinVar VCF .tbi files are generated:

clinvar.vcf.gz.tbi  clinvar.vcf.gz.tbi.1  clinvar.vcf.gz.tbi.2
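
The `.1` and `.2` suffixes are what wget produces by default when a file with the target name already exists. A sketch of a download step that always ends up with exactly one file is shown below: it writes to a temporary file in the destination directory and then replaces the target atomically. The function name `fetch` is hypothetical, and this is only one possible approach, not how run.sh currently downloads ClinVar.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch(url, dest):
    """Download url to dest, overwriting any previous copy of dest."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    # Write next to the target so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out)
        os.replace(tmp, dest)
    except BaseException:
        # Leave no partial file behind if the download fails.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return dest
```

Because the index is downloaded the same way as the VCF, a rerun refreshes both files together and cannot leave a stale `.tbi` paired with a newer `clinvar.vcf.gz`.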
