vpot's Introduction

VPOT

VPOT - Variant Prioritisation Ordering Tool.

VPOT is a Python tool for prioritising variants in ANNOVAR-annotated VCF files. It provides the following functions to speed up variant discovery:

  • priority - priority tool
  • genef - gene filter
  • samplef - samples and inheritance model filtering
  • stats - general statistics on the VPOT priority output file
  • merge - merge multiple VPOT priority output files into a single VPOT priority output file
  • utility - VPOT utilities

Requirements

  • Python 3.6+ and NumPy
  • Linux environment, or access to Linux via ssh

Installation

  • Navigate to the desired install directory and clone this repository.

git clone https://github.com/VCCRI/VPOT.git VPOT

  • Ensure that the requirements are met.
  • Test that VPOT works using the test_data and the README.MD tutorial provided.
  • All done!

Usage - VPOT.py < option >

  • help - will return a help screen

  • priority - priority tool. This option performs the variant prioritisation process on the input samples' VCF files. It scores each variant found for the supplied samples based on the weighting assigned to the annotations in the Prioritisation Parameter File (PPF).

    Inputs :

    • location for output file+prefix
    • file of input VCF files and sample IDs
    • Prioritisation Parameter File (PPF)

    Output :

    • VPOT Priority Output List (VPOL)
  • genef - gene filter. This option filters the variants in the VPOL based on the genes supplied as input.

    Inputs :

    • location for output file+prefix
    • VPOT Priority Output List (VPOL)
    • gene list

    Output :

    • gene filtered VPOT Priority Output List (VPOL)
  • samplef - samples and inheritance model filtering. This option filters the VPOL based on a supplied ped format file. It can be used for simple case-control filtering or for inheritance model filtering of a family trio.

    Inputs :

    • location for output file+prefix
    • VPOT Priority Output List (VPOL)
    • sample selection file (ped format)
    • proband sample ID (for inheritance model filtering)
    • inheritance model (DN/AD/AR/CH for inheritance model filtering)

    Output :

    • sample filtered VPOT Priority Output List (VPOL)
  • stats - general statistics on the VPOT priority output file. This option returns a summary statistics file for the VPOL supplied. It provides a short report listing the number of variants (total, scoring and non-scoring), the number of genes and the number of samples. It also breaks down the number of variants found in each 10% percentile range of scores and lists the top 20 variants. A per-sample breakdown is provided as well, with a table containing the number of variants in genes found above the percentile value supplied.

    Inputs :

    • location for output file+prefix
    • VPOT Priority Output List (VPOL)
    • a percentile value [1-99] for quick summary statistic

    Output :

    • statistic VPOT Priority Output List (VPOL)
  • merge - merge multiple VPOT priority output list files into a single VPOT priority output list file. This option merges several VPOLs into a single VPOL. It allows a large cohort to be split into small groups to speed up prioritisation processing, with the outputs then re-consolidated into one large cohort VPOL for downstream analysis or filtering.

    Inputs :

    • location for output file+prefix
    • file containing a list of VPOT Priority Output List (VPOL)

    Output :

    • merged VPOT Priority Output List (VPOL)
  • utilities - VPOT utilities.

    Utility 1 : convertVEP - this utility converts a VEP annotated VCF into the standard VCF format.

    Inputs :

    • full pathname of input VEP annotated VCF file
    • full pathname of output VCF file, including directory path

    Output :

    • standard VCF
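
For the samplef option, the sample selection file is described only as "ped format". As a minimal sketch, assuming the conventional six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype) separated by tabs, a trio file could be generated as below. The sample IDs and the helper name are hypothetical, and the exact columns VPOT reads should be checked against README.MD in the test_data directory.

```python
import csv

# Conventional PED codes: sex 1 = male, 2 = female;
# phenotype 1 = unaffected, 2 = affected; "0" = parent not in the file.
# These IDs are hypothetical placeholders, not taken from the VPOT test data.
TRIO = [
    # family, individual, father, mother, sex, phenotype
    ("FAM1", "proband", "father", "mother", 1, 2),
    ("FAM1", "father", "0", "0", 1, 1),
    ("FAM1", "mother", "0", "0", 2, 1),
]


def write_ped(rows, path):
    """Write rows as a tab-separated PED file, one individual per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

Calling `write_ped(TRIO, "trio.ped")` would produce a three-line file in which the proband is the only affected individual, which is the usual starting point for a de novo or recessive trio analysis.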
    

See README.MD in the test_data directory for more details on each function.
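
The merge option is described as letting a large cohort be split into small groups that are prioritised separately and then re-consolidated. One way to prepare those groups, sketched under the assumption that each line of a sample list names one VCF and its sample ID, is to chunk the list into smaller list files. The function names, group size and file-naming pattern are illustrative and not part of VPOT.

```python
from pathlib import Path


def split_sample_list(lines, group_size):
    """Return consecutive groups of at most group_size non-empty lines."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Drop blank lines and strip line endings so groups can be rewritten cleanly.
    kept = [line.rstrip("\r\n") for line in lines if line.strip()]
    return [kept[i:i + group_size] for i in range(0, len(kept), group_size)]


def write_groups(groups, out_dir, stem="sample_list_part"):
    """Write each group to its own list file and return the paths created."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, group in enumerate(groups, start=1):
        path = out_dir / f"{stem}{number}.txt"
        path.write_text("\n".join(group) + "\n")
        paths.append(path)
    return paths
```

Each part file would then be passed to the priority option in its own run, and the resulting VPOLs listed in the file given to merge.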

Citation

Eddie Ip, Gavin Chapman, David Winlaw, Sally L. Dunwoodie, Eleni Giannoulatou, VPOT: A Customizable Variant Prioritization Ordering Tool for Annotated Variants, Genomics, Proteomics & Bioinformatics, Volume 17, Issue 5, 2019, Pages 540-545, ISSN 1672-0229, https://doi.org/10.1016/j.gpb.2019.11.001. (http://www.sciencedirect.com/science/article/pii/S1672022919301494)

vpot's Issues

PRIORITIZATION

Hi

I am really interested in using VPOT to prioritize variants. I first ran the example provided (the test data) and it worked perfectly.

I have multiple VCF files from multiple patient samples. VPOT worked with two of them but not the third.

  1. Could you please help me figure out why it didn't work with my third VCF file? The error message doesn't say why the VCF couldn't be loaded.

  2. When I create test_VCF_sample_list.txt in Linux, VPOT doesn't read the samples and gives me an error. How can I overcome this, given that I have checked the spacing and the spelling?
    error message : FileNotFoundError: [Errno 2] No such file or directory: 'PBG-653-21_S3.vcf PBG-653-21_S3'
    code used:
    [sturkistany@klogin1 project2]$ i = PBG-653-21_S3.vcf
    [sturkistany@klogin1 project2]$ j=$(echo $i | cut -f1 -d'.')
    [sturkistany@klogin1 project2]$ echo -e $i $j > test_VCF_sample_$j.txt
    [sturkistany@klogin1 project2]$ sed -i 's/\r$//g' test_VCF_sample_$j.txt
    [sturkistany@klogin1 project2]$ python3 ../../VPOT.py priority testout_ test_VCF_sample_$j.txt default_0.001_variants_parameters_PPF.txt
    #VPOT version 2 - 07/01/2021
    VPOT Prioritisation - Main
    Samples input files : test_VCF_sample_PBG-653-21_S3.txt
    vcf PBG-653-21_S3
    VPOT: Parameter file supplied
    QC MaxCOverage : 0
    QC Hete_balance : 0
    QC Genotype_Quality : 0
    VS Score Threshold : 0
    default_0.001_variants_parameters_PPF.txt
    Traceback (most recent call last):
    File "../../VPOT.py", line 177, in <module>
    main()
    File "../../VPOT.py", line 135, in main
    VPOT_1_prioritise.main() #
    File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_prioritise.py", line 299, in main
    error=VPOT_1_1_VCF.read_variant_source_file() #
    File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 164, in read_variant_source_file
    if ( setup_for_this_src_file(this_line) != 0 ): #
    File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 268, in setup_for_this_src_file
    with open(file_line[0],'r',encoding="utf-8") as source_vcf : #
    FileNotFoundError: [Errno 2] No such file or directory: 'PBG-653-21_S3.vcf PBG-653-21_S3'

  3. Can I add ClinVar to the PPF parameters? If so, how?

Thanks!
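
The FileNotFoundError in question 2 shows the whole line, 'PBG-653-21_S3.vcf PBG-653-21_S3', being used as the file name, which suggests the line was never split into separate fields. A likely cause, assuming VPOT splits each sample-list line on tabs, is that `echo -e $i $j` joins the two values with a space. A minimal sketch that writes the file with an explicit tab instead (the helper name is hypothetical, not part of VPOT):

```python
from pathlib import Path


def write_sample_list(vcf_paths, out_path):
    """Write one 'vcf<TAB>sample_id' line per VCF file."""
    lines = []
    for vcf in vcf_paths:
        # Sample ID = file name up to the first dot, like cut -f1 -d'.'
        sample_id = Path(vcf).name.split(".")[0]
        lines.append(f"{vcf}\t{sample_id}")
    # Assumption: VPOT expects tab-separated fields, one sample per line.
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

For the file above, `write_sample_list(["PBG-653-21_S3.vcf"], "test_VCF_sample_PBG-653-21_S3.txt")` writes a single line with the VCF name and the sample ID separated by a tab.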

VPOT with a txt of mine

Sorry, I want to use the VPOT tool to do variant prioritization with my own data. How can I do this?

VPOT Prioritisation - genotype IndexError: list index out of range

Hi!
I would like to use VPOT to prioritize variants in my dataset, which was annotated with VEP. I have successfully run the convertVEP step. However, when I submit the converted VCF to the VPOT Prioritisation step I get the following error:
#VPOT version 2 - 07/01/2021
VPOT Prioritisation - Main
Samples input files : sample_list.tsv
vcf
VPOT: Parameter file supplied
QC MaxCOverage : 0
QC Hete_balance : 0
QC Genotype_Quality : 0
VS Score Threshold : 0
ppf.txt
processing input file : ['data.vcf', 'sample1', '']
Traceback (most recent call last):
File "../VPOT.py", line 177, in <module>
main()
File "../VPOT.py", line 135, in main
VPOT_1_prioritise.main() #
File "~/VPOT/VPOT_1_prioritise.py", line 299, in main
error=VPOT_1_1_VCF.read_variant_source_file() #
File "~/VPOT/VPOT_1_1_VCF.py", line 178, in read_variant_source_file
work_this_src_file(this_line) #
File "~/VPOT/VPOT_1_1_VCF.py", line 401, in work_this_src_file
work_this_src_file_1(source_vcf,wrkf1) #
File "~/VPOT/VPOT_1_1_VCF.py", line 379, in work_this_src_file_1
check_this_variant(src_line, wrkf1) #
File "~/VPOT/VPOT_1_1_VCF.py", line 433, in check_this_variant
elif (GENOTYPE1[0] == GENOTYPE1[1]) : # a homozygous alt genotype
IndexError: list index out of range

It seems that the issue arises during the parsing of sample genotypes. I would greatly appreciate any hints on how to tackle this problem!

Best,

Anastasia
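
The failing line compares GENOTYPE1[0] with GENOTYPE1[1], so the error would occur whenever a genotype yields fewer than two alleles, for example a haploid call such as `1` or a lone `.`. That is an inference from the traceback, not a confirmed diagnosis. As a rough sketch, assuming the GT value is the first colon-separated sub-field of each sample column and alleles are separated by `/` or `|`, the following lists the records that would trip such a comparison. The function name is hypothetical and the code is independent of VPOT.

```python
import re


def find_short_genotypes(vcf_lines):
    """Yield (chrom, pos, sample_index, gt) for GT values with fewer than two alleles."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        for index, sample in enumerate(fields[9:]):
            gt = sample.split(":")[0]
            alleles = re.split(r"[/|]", gt)
            if len(alleles) < 2:
                yield fields[0], fields[1], index, gt
```

It can be run over an uncompressed VCF with `with open("data.vcf") as handle: print(list(find_short_genotypes(handle)))`. An empty list would point away from this explanation.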

IndexError: list index out of range

Hi

I am using VPOT and it's working very well, except with multiple VCFs. Please see the error message below.
[sturkistany@kdata1 test5]$ python3 ../VPOT.py priority testout_$vcf_f test_VCF_sample_Rabie_I_HD99_S28.txt default_0.001_variants_parameters_PPF.txt
#VPOT version 2 - 07/01/2021
VPOT Prioritisation - Main
Samples input files : test_VCF_sample_Rabie_I_HD99_S28.txt
vcf
VPOT: Parameter file supplied
QC MaxCOverage : 0
QC Hete_balance : 0
QC Genotype_Quality : 0
VS Score Threshold : 0
default_0.001_variants_parameters_PPF.txt
processing input file : ['Rabie_I_HD99_S28.vcf', 'Rabie_I_HD99_S28', '']
Traceback (most recent call last):
File "../VPOT.py", line 177, in <module>
main()
File "../VPOT.py", line 135, in main
VPOT_1_prioritise.main() #
File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_prioritise.py", line 299, in main
error=VPOT_1_1_VCF.read_variant_source_file() #
File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 178, in read_variant_source_file
work_this_src_file(this_line) #
File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 401, in work_this_src_file
work_this_src_file_1(source_vcf,wrkf1) #
File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 379, in work_this_src_file_1
check_this_variant(src_line, wrkf1) #
File "/fefs1/cegmr/sturkistany/VPOT-master/VPOT_1_1_VCF.py", line 433, in check_this_variant
elif (GENOTYPE1[0] == GENOTYPE1[1]) : # a homozygous alt genotype
IndexError: list index out of range

I'm not sure what the problem is here. Please let me know how I can overcome this.

Shereen
