
bayescmg's Introduction

bAyesCMG

An applied Bayesian framework for the ACMG/AMP criteria

Purpose

Applying the ACMG/AMP criteria is often tedious, manual, and subject to human error. bAyesCMG automatically applies pathogenic and benign ACMG/AMP evidence codes to every variant record in a given VCF file. It then uses these evidence codes to assign each variant a Bayesian posterior probability of pathogenicity (0 to 1 scale), following Tavtigian et al., 2018, for downstream filtering and variant review. bAyesCMG dramatically reduces the number of variants for consideration in Mendelian disease studies and is capable of correctly prioritizing the diagnostic variant in whole exome and whole genome sequencing data.
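The posterior probability calculation can be sketched as follows. This is a minimal illustration of the combination rule published in Tavtigian et al., 2018, not necessarily the code bAyesCMG runs: the function and argument names are hypothetical, the defaults (prior 0.1, odds of pathogenicity 350 for Very Strong, exponent base 2) are the values from that paper, and the stand-alone BA1 code is not modeled.

```python
def posterior_probability(n_very_strong=0, n_strong=0, n_moderate=0, n_supporting=0,
                          n_benign_strong=0, n_benign_supporting=0,
                          prior=0.1, odds_very_strong=350.0, exponent=2.0):
    """Combine counts of positive evidence codes into a posterior probability.

    Each strength level is worth a fixed fraction of one Very Strong code:
    Strong = 1/exponent, Moderate = 1/exponent**2, Supporting = 1/exponent**3.
    Benign evidence subtracts the same amounts.
    """
    power = (n_very_strong
             + n_strong / exponent
             + n_moderate / exponent ** 2
             + n_supporting / exponent ** 3
             - n_benign_strong / exponent
             - n_benign_supporting / exponent ** 3)
    odds = odds_very_strong ** power
    # Bayes' rule expressed in terms of the combined odds of pathogenicity.
    return odds * prior / ((odds - 1.0) * prior + 1.0)


if __name__ == "__main__":
    # No evidence at all leaves the prior unchanged.
    print(posterior_probability())
    # One Very Strong plus one Moderate code (for example PVS1 and PM2).
    print(posterior_probability(n_very_strong=1, n_moderate=1))
```

With no evidence the odds are 1 and the posterior equals the prior, which is a quick sanity check on any reimplementation.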

Installation

Usage

Requires:

  1. a multi-sample merged (preferably joint-called) VCF with three samples: proband, mother, father. Pre-processing this VCF is recommended: normalize, decompose, and annotate with VEP
  2. a pedigree file describing the relatedness between the VCF samples

Flags

Output

Assertions:

-1 == evidence code evaluated, NEGATIVE assertion
0 == evidence code NOT evaluated
1 == evidence code evaluated, POSITIVE assertion
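As a hedged illustration of how these three assertion values might be consumed downstream, the sketch below counts positive assertions per evidence-code family. The input dictionary layout and the function name are hypothetical; how bAyesCMG actually stores assertions in its output is not described here, and treating -1 and 0 as contributing nothing is an assumption of this sketch.

```python
import re
from collections import Counter


def tally_positive(assertions):
    """Count codes asserted as 1, grouped by family prefix (PVS, PS, PM, PP, BA, BS, BP).

    `assertions` maps an evidence code such as "PM2" to -1, 0 or 1.
    Codes with -1 (negative) or 0 (not evaluated) are ignored here.
    """
    counts = Counter()
    for code, value in assertions.items():
        if value != 1:
            continue
        # Strip the trailing number: "PVS1" -> "PVS", "PM2" -> "PM".
        prefix = re.match(r"[A-Z]+", code).group(0)
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    example = {"PVS1": 1, "PS2": -1, "PS3": 0, "PM2": 1, "PM4": 1, "BP4": -1}
    print(dict(tally_positive(example)))
```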

Filtering

Evidence Codes

Evidence Code and Description | Implementation Description
PVS1 "Null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where LOF is a known mechanism of disease" | VEP IMPACT field is HIGH, gene with known LOF mechanism is not considered
PS1 "Same amino acid change as a previously established pathogenic variant regardless of nucleotide change" | Same amino acid change as annotated pathogenic variant in ClinVar VCF, disease/phenotype not considered
PS2 "De novo (both maternity and paternity confirmed) in a patient with the disease and no family history" | Genotype 0/1 in proband, genotypes 0/0 in both parents
PS3 "Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product" | Not currently implemented
PS4 "The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls" | Genotype segregates with affected status, regardless of genotype, 0/1 in proband and 0/0 in both parents, or 1/1 in proband and 0/0 or 0/1 in parents
PM1 "Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation" | Variant has a functional domain annotation in VEP --domains field
PM2 "Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium" | Less than 0.01 frequency in gnomAD (default), can also be specified by user
PM3 "For recessive disorders, detected in trans with a pathogenic variant" | Not currently implemented
PM4 "Protein length changes as a result of in-frame deletions/insertions in a non-repeat region or stop-loss variants" | VEP Consequence field is inframe_insertion, inframe_deletion or stop_lost, currently not considering repeat regions or surrounding bases
PM5 "Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before" | Where ClinVar VCF CLNSIG is Pathogenic, given variant VEP Protein_position matches ClinVar variant, but Amino_acids does not match
PM6 "Assumed de novo, but without confirmation of paternity and maternity" | Not currently implemented, requiring trios
PP1 "Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease" | Not currently implemented, requiring trios, not quartets or larger with multiple affected individuals
PP2 "Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease" | Not currently implemented
PP3 "Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.)" | VEP REVEL plugin score greater than 0.6 (default) or a user-specified value
PP4 "Patient’s phenotype or family history is highly specific for a disease with a single genetic etiology" | Not currently implemented
PP5 "Reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation" | Not currently implemented
BA1 "Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium" | gnomAD population max allele frequency is greater than 0.05
BS1 "Allele frequency is greater than expected for disorder" | gnomAD population max allele frequency is greater than 0.01 (default) or user-specified value
BS2 "Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age" | Variant does not segregate with affected status, for 1/1 (hom-alt), 1/1 genotype also in an unaffected parent, for 0/1 (het), 0/1 genotype also in an unaffected parent
BS3 "Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing" | Not currently implemented
BS4 "Lack of segregation in affected members of a family" | Variant does not segregate with affected status, for 1/1 (hom-alt), 1/1 genotype also in an unaffected parent, for 0/1 (het), 0/1 genotype also in an unaffected parent
BP1 "Missense variant in a gene for which primarily truncating variants are known to cause disease" | Not currently implemented
BP2 "Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern" | Not currently implemented
BP3 "In-frame deletions/insertions in a repetitive region without a known function" | No VEP DOMAINS field and Consequence field is inframe_insertion, inframe_deletion or stop_lost, currently not considering repeat regions or surrounding bases
BP4 "Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.)" | VEP REVEL plugin score less than 0.6 (default) or a user-specified value
BP5 "Variant found in a case with an alternate molecular basis for disease" | Not currently implemented
BP6 "Reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation" | Not currently implemented
BP7 "A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved" | VEP Consequence is synonymous and not splice_region, or annotated from SpliceRegion plugin
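A few of the rules above are simple enough to sketch directly. The example below illustrates PS2 (de novo genotype pattern), the gnomAD frequency codes (PM2, BA1, BS1) and the REVEL codes (PP3, BP4), returning the -1/0/1 assertions described earlier. All function and argument names are hypothetical, returning 0 for a missing genotype or annotation is an assumption of this sketch, and the table does not say whether PM2 uses the overall or the population-max gnomAD frequency (the overall frequency is assumed here).

```python
def _alleles(gt):
    """Split a GT string such as '0/1' or '1|0' into a sorted allele tuple; None if missing."""
    if gt is None or "." in gt:
        return None
    return tuple(sorted(gt.replace("|", "/").split("/")))


def ps2(proband_gt, mother_gt, father_gt):
    """PS2: heterozygous in the proband, homozygous reference in both parents."""
    gts = [_alleles(g) for g in (proband_gt, mother_gt, father_gt)]
    if any(g is None for g in gts):
        return 0  # assumption: cannot evaluate with a missing genotype
    de_novo = gts[0] == ("0", "1") and gts[1] == ("0", "0") and gts[2] == ("0", "0")
    return 1 if de_novo else -1


def _above(value, threshold):
    """1 if value > threshold, -1 if not, 0 if the annotation is missing."""
    if value is None:
        return 0
    return 1 if value > threshold else -1


def _below(value, threshold):
    """1 if value < threshold, -1 if not, 0 if the annotation is missing."""
    if value is None:
        return 0
    return 1 if value < threshold else -1


def frequency_and_revel_codes(gnomad_af, gnomad_popmax_af, revel,
                              af_threshold=0.01, revel_threshold=0.6):
    """Evaluate the threshold-based codes using the defaults listed in the table."""
    return {
        "PM2": _below(gnomad_af, af_threshold),
        "BA1": _above(gnomad_popmax_af, 0.05),
        "BS1": _above(gnomad_popmax_af, af_threshold),
        "PP3": _above(revel, revel_threshold),
        "BP4": _below(revel, revel_threshold),
    }


if __name__ == "__main__":
    print(ps2("0/1", "0/0", "0/0"))
    print(frequency_and_revel_codes(0.0001, 0.0004, 0.85))
```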

bayescmg's People

Contributors

mvelinder, dillonl


bayescmg's Issues

Fix run.sh message

$ bash ~/bin/bAyesCMG/run.sh 
Make Sure you provide out all required parameters
Usage: VarBayes [OPTION]
 	Description of VarBayes
 		-h, --help Print help instructions
 		-v, --vcf_file Input VCF File Path [REQUIRED]
 		-p, --ped_file Input PED File Path [REQUIRED]
 		-r, --reference_file Reference (FASTA) File Path [REQUIRED]
 		-c, --get_clinvar Download latest ClinVar file [if no ClinVar file available in the data directory this arg will be ignored and ClinVar will be downloaded automatically reguardless]
 		-g, --gnomad GNOMAD File Path [REQUIRED]
 		-d, --vep_cache_dir VEP Cache Directory Path [REQUIRED]
 		-u, --vep_plugin_dir VEP Plugin Directory Path [REQUIRED]
 		-l, --vep_revel_file VEP REVEL File Path [REQUIRED]
 		-t, --gnomad_af_threshold gnomAD_AF threshold (default value = 0.01)
 		-j, --revel_af_threshold REVEL threshold [Ask Matt] (default value = 0.6)
 		-y, --prior_probability Prior probability [Optional, default 0.1]
 		-o, --odds_pathogenic The odds of pathogenicity for 'Very Strong' [Optional, default 350]
 		-e, --exponent The exponent that sets the strength of Supporting/Moderate/Strong compared to 'Very Strong' [Optional, default 0.1]
 		-f, --finished_vcf_path File name of the output vcf [REQUIRED]
  • remove mention of VarBayes
  • revel is not an af, make it revel_threshold
  • remove all [Ask Matt]
  • put all REQUIRED parameters together at the top, maybe even in their own section for easier visuals
  • same as above for all OPTIONAL parameters
  • change "Optional" to OPTIONAL to be consistent with REQUIRED
  • fix typos like "reguardless"
  • etc
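One way to picture the requested layout is the sketch below, which groups the existing flags into REQUIRED and OPTIONAL sections and renames the REVEL flag. It uses Python's argparse purely for illustration (run.sh itself is a shell script), the description string is made up for this example, and the default for --exponent is an assumption: the help text above lists 0.1, while the VarBayes.py invocation logged in another issue passes -e 2.0.

```python
import argparse


def build_parser():
    """Build a parser whose help output lists REQUIRED and OPTIONAL flags separately."""
    parser = argparse.ArgumentParser(
        prog="bAyesCMG",
        description="Apply ACMG/AMP evidence codes to a trio VCF.",  # illustrative text
    )
    required = parser.add_argument_group("REQUIRED")
    required.add_argument("-v", "--vcf_file", required=True, help="Input VCF file path")
    required.add_argument("-p", "--ped_file", required=True, help="Input PED file path")
    required.add_argument("-r", "--reference_file", required=True,
                          help="Reference (FASTA) file path")
    required.add_argument("-g", "--gnomad", required=True, help="gnomAD file path")
    required.add_argument("-d", "--vep_cache_dir", required=True,
                          help="VEP cache directory path")
    required.add_argument("-u", "--vep_plugin_dir", required=True,
                          help="VEP plugin directory path")
    required.add_argument("-l", "--vep_revel_file", required=True,
                          help="VEP REVEL file path")
    required.add_argument("-f", "--finished_vcf_path", required=True,
                          help="File name of the output VCF")

    optional = parser.add_argument_group("OPTIONAL")
    optional.add_argument("-c", "--get_clinvar", action="store_true",
                          help="Download the latest ClinVar file")
    optional.add_argument("-t", "--gnomad_af_threshold", type=float, default=0.01,
                          help="gnomAD AF threshold (default: %(default)s)")
    optional.add_argument("-j", "--revel_threshold", type=float, default=0.6,
                          help="REVEL score threshold (default: %(default)s)")
    optional.add_argument("-y", "--prior_probability", type=float, default=0.1,
                          help="Prior probability (default: %(default)s)")
    optional.add_argument("-o", "--odds_pathogenic", type=float, default=350.0,
                          help="Odds of pathogenicity for 'Very Strong' "
                               "(default: %(default)s)")
    # Assumed default; see the note in the paragraph above.
    optional.add_argument("-e", "--exponent", type=float, default=2.0,
                          help="Exponent setting the strength of Supporting/Moderate/"
                               "Strong relative to 'Very Strong' (default: %(default)s)")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```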

run.sh crashes even with all necessary components in PATH

Looks like you're expecting VarBayes.py to be in the current (run) directory, which is not an expected behavior

$ bash ~/bin/bAyesCMG/run.sh -v 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz -p 18-08-03_VAR-Marth-PND.ped -r /scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh37/human_g1k_v37_decoy_phix.fasta -c -g /scratch/ucgd/lustre/work/u0691312/reference/slivar.gnomad.hg37.zip -d /scratch/ucgd/lustre/work/u0691312/reference/ensembl/ -u /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/ -l /scratch/ucgd/lustre/work/u0691312/reference/ensembl/Plugins/revel_all_chromosomes_vep.tsv.gz -f 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz
wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:01:57--  ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz
           => ‘data/clinvar.grc37.vcf.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz ... 25516616
==> PASV ... done.    ==> RETR clinvar.vcf.gz ... done.
Length: 25516616 (24M) (unauthoritative)

100%[=====================================================================================================================================================================>] 25,516,616  20.6MB/s   in 1.2s   

2020-04-14 08:02:00 (20.6 MB/s) - ‘data/clinvar.grc37.vcf.gz’ saved [25516616]

wget: /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/libuuid.so.1: no version information available (required by wget)
--2020-04-14 08:02:00--  ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz.tbi
           => ‘data/clinvar.grc37.vcf.gz.tbi’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/clinvar/vcf_GRCh37 ... done.
==> SIZE clinvar.vcf.gz.tbi ... 284386
==> PASV ... done.    ==> RETR clinvar.vcf.gz.tbi ... done.
Length: 284386 (278K) (unauthoritative)

100%[=====================================================================================================================================================================>] 284,386      915KB/s   in 0.3s   

2020-04-14 08:02:02 (915 KB/s) - ‘data/clinvar.grc37.vcf.gz.tbi’ saved [284386]

Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

gzip: stdout: Broken pipe
/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/bAyesCMG/run.sh: line 157: externals/slivar/slivar: No such file or directory
Possible precedence issue with control flow operator at /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

-------------------- EXCEPTION --------------------
MSG: ERROR: File "data/slivar.tmp" does not exist

STACK Bio::EnsEMBL::VEP::Parser::file /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:231
STACK Bio::EnsEMBL::VEP::Parser::new /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Parser.pm:125
STACK Bio::EnsEMBL::VEP::Runner::get_Parser /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:791
STACK Bio::EnsEMBL::VEP::Runner::get_InputBuffer /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:818
STACK Bio::EnsEMBL::VEP::Runner::init /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:131
STACK Bio::EnsEMBL::VEP::Runner::run /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/share/ensembl-vep-99.2-0/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/miniconda3/bin/vep:224
Date (localtime)    = Tue Apr 14 08:13:45 2020
Ensembl API version = 99
---------------------------------------------------
[bgzip] No such file or directory: data/slivar.tmp.vep.vcf
open: No such file or directory
[tabix] was bgzip used to compress this file? data/slivar.tmp.vep.vcf.gz
python VarBayes.py -v data/slivar.tmp.vep.vcf.gz -f 18-08-03_VAR-Marth-PND.ped -d 18-08-03_VAR-Marth-PND_Final_1533364009.vcf.gz_runme.vcf.gz -c data/clinvar.grc37.vep.vcf.gz -e 2.0 -o 350 -p 0.1 -a 0.01 -r 0.6
python: can't open file 'VarBayes.py': [Errno 2] No such file or directory
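The failures on externals/slivar/slivar and VarBayes.py both come from paths being resolved against the current working directory. A common remedy is to resolve bundled files against the location of the launcher script instead. The sketch below shows that idea in Python; the helper name is hypothetical, and the same approach would need to be translated to shell for run.sh itself.

```python
from pathlib import Path


def bundled_path(script_file, *parts):
    """Return a path inside the installation directory that holds `script_file`.

    Resolving against the script location (rather than the current working
    directory) lets the tool be launched from any directory.
    """
    install_dir = Path(script_file).resolve().parent
    return install_dir.joinpath(*parts)


if __name__ == "__main__":
    # Inside a real launcher, pass __file__ as the first argument.
    print(bundled_path(__file__, "VarBayes.py"))
    print(bundled_path(__file__, "externals", "slivar", "slivar"))
```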

Processing on unannotated vcf in new environment

Should be a simple single command that does:
normalize, subset, decompose > slivar > vep > varbayes

Also make sure there are checks along the way: for example, when a user supplies a VCF/PED combination where the PED samples don’t match the VCF samples, or a VCF that is VEP-annotated but not slivar-annotated.
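The PED/VCF sample check could look roughly like the sketch below. It assumes the standard layouts: sample names follow the nine fixed columns on the VCF #CHROM header line, and the individual ID is the second whitespace-separated column of each PED line. Function names are hypothetical.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM ... FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def ped_sample_names(ped_lines):
    """Return the individual IDs (second column) from PED lines."""
    names = []
    for line in ped_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        names.append(line.split()[1])
    return names


def check_samples(vcf_lines, ped_lines):
    """Raise ValueError if any PED sample is absent from the VCF."""
    missing = sorted(set(ped_sample_names(ped_lines)) - set(vcf_sample_names(vcf_lines)))
    if missing:
        raise ValueError("PED samples missing from VCF: " + ", ".join(missing))


if __name__ == "__main__":
    vcf = ["##fileformat=VCFv4.2\n",
           "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tkid\tmom\tdad\n"]
    ped = ["fam1 kid dad mom 1 2\n", "fam1 mom 0 0 2 1\n", "fam1 dad 0 0 1 1\n"]
    check_samples(vcf, ped)
    print("samples match")
```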

Need to annotate VCF before slivar comp het function

The slivar comp het function requires the VCF to be annotated (to track gene names), so the VCF needs to be annotated with VEP before this function is called. Without annotation, slivar fails with:

[slivar] evaluating on 1 trios
fatal.nim(39)            sysFatal
Error: unhandled exception: comphet.nim(226, 12) `gene_fields.len > 0` [slivar] error! no gene-like field found in /dev/stdin [AssertionError]

Skip (pre) processing steps if possible

To reduce run time, check whether each of the following already holds and skip the corresponding step if so (only a few examples are listed; there are likely more we could consider):

  • VCF is already decomposed, normalized, and subset to the correct samples from the PED
  • VCF is already annotated (with VEP or otherwise) and contains the annotations we need
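For the annotation check, one possible approach is to inspect the VCF header for the CSQ INFO line that VEP writes and compare its field list against what the evidence codes need. The sketch below does that; the list of required fields is an assumption drawn from the fields named in the Evidence Codes table, and the function names are hypothetical.

```python
import re

# Assumed set of VEP fields, taken from the Evidence Codes table in the README.
REQUIRED_CSQ_FIELDS = ("Consequence", "IMPACT", "Protein_position",
                       "Amino_acids", "DOMAINS", "REVEL")


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in a VEP-annotated VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # VEP lists the sub-fields after "Format: " in the Description.
            match = re.search(r'Format: ([^"]+)"', line)
            return match.group(1).strip().split("|") if match else []
    return []


def missing_vep_fields(header_lines, required=REQUIRED_CSQ_FIELDS):
    """Return required fields absent from the header; an empty list means VEP can be skipped."""
    present = set(csq_fields(header_lines))
    return [field for field in required if field not in present]


if __name__ == "__main__":
    header = ['##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
              'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|'
              'Protein_position|Amino_acids|DOMAINS">']
    print(missing_vep_fields(header))
```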

Getting multiple clinvar tbis

When running multiple times, multiple clinvar vcf tbis are generated

clinvar.vcf.gz.tbi  clinvar.vcf.gz.tbi.1  clinvar.vcf.gz.tbi.2
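The numbered copies appear because wget, by default, saves to a new name when the target file already exists. One way to avoid them is to always write to a fixed destination and replace it atomically, as in the sketch below. The function name is hypothetical and the `opener` parameter exists only so the example can be exercised without network access; within run.sh, an equivalent fix would be needed at the wget call.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_overwrite(url, dest, opener=urllib.request.urlopen):
    """Download `url` to exactly `dest`, replacing any existing file.

    The data is first written to a temporary file in the same directory and
    then moved into place, so an interrupted download never leaves a partial
    file at `dest` and repeated runs never create numbered copies.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, opener(url) as response:
            shutil.copyfileobj(response, out)
        os.replace(tmp_name, dest)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest
```

Calling this repeatedly with the same destination leaves a single clinvar.vcf.gz.tbi in the data directory.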

Add variant quality filtering using slivar

Add variant quality filtering with slivar up front; this will reduce the number of false positives in low-complexity and repeat regions, multiallelic sites, and other generally problematic regions. I can give you the specific slivar code, @dillonl
