
ensembl-vep's Introduction

ensembl-vep


  • VEP (Variant Effect Predictor) predicts the functional effects of genomic variants.
  • Haplosaurus uses phased genotype data to predict whole-transcript haplotype sequences.
  • Variant Recoder translates between different variant encodings.

Installation and requirements

The VEP package requires:

  • gcc, g++ and make
  • Perl (>=5.10 recommended, tested on 5.10, 5.14, 5.18, 5.22, 5.26)
  • Perl libraries Archive::Zip and DBI

The remaining dependencies can be installed using the included INSTALL.pl script. Basic instructions:

git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
perl INSTALL.pl

The installer may also be used to check for updates to this and co-dependent packages; simply re-run INSTALL.pl.

See documentation for full installation instructions.

Additional CPAN modules

The following modules are optional but most users will benefit from installing them. We recommend using cpanminus to install.

  • DBD::mysql - required for database access (--database or --cache without --offline)
  • Set::IntervalTree - required for Haplosaurus, also speeds up VEP
  • JSON - required for writing JSON output
  • PerlIO::gzip - faster compressed file parsing
  • Bio::DB::BigFile - required for reading custom annotation data from BigWig files
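
To see which of these optional modules your Perl can already load before installing anything, a small shell helper along the following lines may be useful. This is only a sketch: the function name check_perl_modules is made up for illustration and is not part of the VEP package. It relies solely on perl -M, the same mechanism this README uses further down to test Bio::Tools::dpAlign.

```shell
# Hypothetical helper (not part of ensembl-vep): report whether each named
# Perl module can be loaded by the perl found on PATH.
check_perl_modules() {
  for module in "$@"; do
    if perl -M"$module" -e 1 >/dev/null 2>&1; then
      printf '%s\tOK\n' "$module"
    else
      printf '%s\tMISSING\n' "$module"
    fi
  done
}

# Example call with the modules listed above:
# check_perl_modules DBD::mysql Set::IntervalTree JSON PerlIO::gzip Bio::DB::BigFile
```

Anything reported as MISSING can then be installed with cpanminus, for example cpanm Set::IntervalTree.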

Docker

A docker image for VEP is available from DockerHub.

See documentation for the Docker installation instructions.


VEP

Usage

./vep -i input.vcf -o out.txt -offline

See documentation for full command line instructions.

Please report any bugs or issues by contacting Ensembl or creating a GitHub issue.


Haplosaurus

haplo is a local tool implementation of the same functionality that powers the Ensembl transcript haplotypes view. It takes phased genotypes from a VCF and constructs a pair of haplotype sequences for each overlapped transcript; these sequences are also translated into predicted protein haplotype sequences. Each variant haplotype sequence is aligned and compared to the reference, and an HGVS-like name is constructed representing its differences to the reference.

This approach offers an advantage over VEP's analysis, which treats each input variant independently. By considering the combined change contributed by all the variant alleles across a transcript, the compound effects the variants may have are correctly accounted for.

haplo shares much of the same command line functionality with vep, and can use VEP caches, Ensembl databases, GFF and GTF files as sources of transcript data; all vep command line flags relating to this functionality work the same with haplo.

Usage

Input data must be a VCF containing phased genotype data for at least one individual, and the file must be sorted by chromosome and genomic position; no other formats are currently supported.
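
If your VCF is not yet sorted, one common way to sort it while keeping the header lines on top is sketched below. The function name sort_vcf is hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF. Chromosomes are ordered as plain text (1, 10, 11, ..., 2), which groups records by chromosome and orders them by position within each chromosome.

```shell
# Hypothetical helper: print a sorted copy of an uncompressed VCF.
# Header lines (starting with '#') are emitted first and left untouched;
# records are sorted by chromosome (text order), then by position (numeric).
sort_vcf() {
  grep '^#' "$1"
  grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Example:
# sort_vcf input.vcf > input.sorted.vcf
```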

When using a VEP cache as the source of transcript annotation, the first time you run haplo with a particular cache it will spend some time scanning transcript locations in the cache.

./haplo -i input.vcf -o out.txt -cache

Output

The default output format is a simple tab-delimited file reporting all observed non-reference haplotypes. It has the following fields:

  1. Transcript stable ID
  2. CDS haplotype name
  3. Comma-separated list of flags for CDS haplotype
  4. Protein haplotype name
  5. Comma-separated list of flags for protein haplotype
  6. Comma-separated list of frequency data for protein haplotype
  7. Comma-separated list of contributing variants
  8. Comma-separated list of sample:count that exhibit this haplotype
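
As an illustration of working with these columns, the sketch below prints the transcript ID, the protein haplotype name and the number of sample:count entries for each row. The function name haplo_summary is invented for this example. It assumes the eight tab-separated fields listed above and skips any line starting with '#', in case the file carries comment lines.

```shell
# Hypothetical helper: summarise haplo's default tab-delimited output.
# Prints: transcript ID, protein haplotype name, number of sample entries.
haplo_summary() {
  awk -F'\t' '
    /^#/ { next }                      # skip comment lines, if any
    NF >= 8 {
      n = split($8, samples, ",")      # field 8: sample:count,sample:count,...
      print $1 "\t" $4 "\t" n
    }
  ' "$@"
}

# Example:
# haplo_summary out.txt
```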

The altered haplotype sequences can be obtained by switching to JSON output using --json which will display them by default. Each transcript analysed is summarised as a JSON object written to one line of the output file.

The JSON output structure matches the format of the transcript haplotype REST endpoint.

You may exclude fields in the JSON from being exported with --dont_export field1,field2. This may be used, for example, to exclude the full haplotype sequence and aligned sequences from the output with --dont_export seq,aligned_sequences.

Note: JSON output does not currently include side-loaded frequency data.

REST service

The transcript haplotype REST endpoint returns arrays of protein_haplotypes and cds_haplotypes for a given transcript. The default haplotype record includes:

  • population_counts: the number of times the haplotype is seen in each population
  • population_frequencies: the frequency of the haplotype in each population
  • contributing_variants: variants contributing to the haplotype
  • diffs: differences between the reference and this haplotype
  • hex: the md5 hex of this haplotype sequence
  • other_hexes: the md5 hex of other related haplotype sequences (CDSHaplotypes that translate to this ProteinHaplotype or ProteinHaplotype representing the translation of this CDSHaplotype)
  • has_indel: does the haplotype contain insertions or deletions
  • type: the type of haplotype - cds, protein
  • name: a human readable name for the haplotype (sequence id + REF or a change description)
  • flags: flags for the haplotype
  • frequency: haplotype frequency in full sample set
  • count: haplotype count in full sample set

By default, the REST service does not return raw sequences, sample-haplotype assignments or the aligned sequences used to generate differences.

Flags

Haplotypes may be flagged with one or more of the following:

  • indel: haplotype contains an insertion or deletion (indel) relative to the reference.
  • frameshift: haplotype contains at least one indel that disrupts the reading frame of the transcript.
  • resolved_frameshift: haplotype contains two or more indels whose combined effect restores the reading frame of the transcript.
  • stop_changed: indicates either a STOP codon is gained (protein truncating variant, PTV) or the existing reference STOP codon is lost.
  • deleterious_sift_or_polyphen: haplotype contains at least one single amino acid substitution event flagged as deleterious (SIFT) or probably damaging (PolyPhen2).
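
These flags appear as comma-separated lists in the default tab-delimited output (field 5 for the protein haplotype), so rows can be selected by flag with a short awk filter. The sketch below is an assumption-laden example rather than part of the package: filter_by_protein_flag is a made-up name, and the column position is taken from the field list in the Output section. Flags are compared as whole items, so asking for frameshift does not also match resolved_frameshift.

```shell
# Hypothetical helper: keep rows whose protein haplotype flags (field 5)
# contain exactly the requested flag.
# Usage: filter_by_protein_flag FLAG [FILE...]
filter_by_protein_flag() {
  flag=$1
  shift
  awk -F'\t' -v flag="$flag" '
    /^#/ { next }                      # skip comment lines, if any
    {
      n = split($5, flags, ",")
      for (i = 1; i <= n; i++) {
        if (flags[i] == flag) { print; break }
      }
    }
  ' "$@"
}

# Example:
# filter_by_protein_flag stop_changed out.txt
```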

bioperl-ext

haplo can make use of a fast compiled alignment algorithm from the bioperl-ext package; this can speed up analysis, particularly in longer transcripts where insertions and/or deletions are introduced. The bioperl-ext package is no longer maintained and requires some tweaking to install. The following instructions install the package in $HOME/perl5; edit PREFIX=[path] to change this. You may also need to edit the export command to point to the path created for the architecture on your machine.

git clone https://github.com/bioperl/bioperl-ext.git
cd bioperl-ext/Bio/Ext/Align/
perl -pi -e"s|(cd libs.+)CFLAGS=\\\'|\$1CFLAGS=\\\'-fPIC |" Makefile.PL
perl Makefile.PL PREFIX=~/perl5
make
make install
cd -
export PERL5LIB=${PERL5LIB}:${HOME}/perl5/lib/x86_64-linux-gnu/perl/5.22.1/

If successful the following should print OK:

perl -MBio::Tools::dpAlign -e"print qq{OK\n}"
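
If the check fails because the architecture-specific directory differs on your machine, the following sketch may help locate the directory to put in PERL5LIB. It assumes that the build installs a file Bio/Ext/Align.pm somewhere below the chosen PREFIX; that file name is an assumption based on the package layout, and find_perl_lib_root is a hypothetical name.

```shell
# Hypothetical helper: given an install prefix, print the library root that
# contains Bio/Ext/Align.pm (assumed to be installed by bioperl-ext).
find_perl_lib_root() {
  find "$1" -type f -path '*/Bio/Ext/Align.pm' 2>/dev/null \
    | head -n 1 \
    | sed 's|/Bio/Ext/Align\.pm$||'
}

# Example:
# export PERL5LIB=${PERL5LIB}:$(find_perl_lib_root "$HOME/perl5")
```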

Variant Recoder

variant_recoder is a tool for translating between different variant encodings. It accepts as input any format supported by VEP (VCF, variant ID, HGVS), with extensions to allow for parsing of potentially ambiguous HGVS notations. For each input variant, variant_recoder reports all possible encodings including variant IDs from all sources imported into the Ensembl database and HGVS (genomic, transcript and protein), reported on Ensembl, RefSeq and LRG sequences.

Usage

variant_recoder depends on database access for identifier lookup and, unlike VEP, cannot be used in offline mode. The output format is JSON and the JSON Perl module is required.

./variant_recoder --id [input_data_string]
./variant_recoder -i [input_file] --species [species]

Output

Output is a JSON array of objects, one per input variant, with the following keys:

  • input: input string
  • id: variant identifiers
  • hgvsg: HGVS genomic nomenclature
  • hgvsc: HGVS transcript nomenclature
  • hgvsp: HGVS protein nomenclature
  • spdi: Genomic SPDI notation
  • vcf_string: VCF format (optional)
  • var_synonyms: Variation synonyms (optional)
  • mane_select: MANE Select transcripts (optional)
  • warnings: Warnings generated e.g. for invalid HGVS

Use --pretty to pre-format and indent JSON output.

Example output:

./variant_recoder --id "AGT:p.Met259Thr" --pretty
[
   {
     "warnings" : [
         "Possible invalid use of gene or protein identifier 'AGT' as HGVS reference; AGT:p.Met259Thr may resolve to multiple genomic locations"
      ],
     "C" : {
        "input" : "AGT:p.Met259Thr",
        "id" : [
           "rs699",
           "CM920010",
           "COSV64184214"
        ],
        "hgvsg" : [
           "NC_000001.11:g.230710048A>G"
        ],
        "hgvsc" : [
           "ENST00000366667.6:c.776T>C",
           "ENST00000679684.1:c.776T>C",
           "ENST00000679738.1:c.776T>C",
           "ENST00000679802.1:c.776T>C",
           "ENST00000679854.1:n.1287T>C",
           "ENST00000679957.1:c.776T>C",
           "ENST00000680041.1:c.776T>C",
           "ENST00000680783.1:c.776T>C",
           "ENST00000681269.1:c.776T>C",
           "ENST00000681347.1:n.1287T>C",
           "ENST00000681514.1:c.776T>C",
           "ENST00000681772.1:c.776T>C",
           "NM_001382817.3:c.776T>C",
           "NM_001384479.1:c.776T>C"
        ],
        "hgvsp" : [
           "ENSP00000355627.5:p.Met259Thr",
           "ENSP00000505981.1:p.Met259Thr",
           "ENSP00000505063.1:p.Met259Thr",
           "ENSP00000505184.1:p.Met259Thr",
           "ENSP00000506646.1:p.Met259Thr",
           "ENSP00000504866.1:p.Met259Thr",
           "ENSP00000506329.1:p.Met259Thr",
           "ENSP00000505985.1:p.Met259Thr",
           "ENSP00000505963.1:p.Met259Thr",
           "ENSP00000505829.1:p.Met259Thr",
           "NP_001369746.2:p.Met259Thr",
           "NP_001371408.1:p.Met259Thr"
        ],
        "spdi" : [
           "NC_000001.11:230710047:A:G"
        ]
     }
   }
]
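
Because each array element holds a warnings key alongside one key per allele, pulling a single field out of this structure takes a little care. The sketch below uses jq, a separate command-line JSON processor that is not mentioned in this README and is assumed to be installed; recoder_hgvsg is a hypothetical function name. It prints the input string and its genomic HGVS notations, one line per allele.

```shell
# Hypothetical helper: from variant_recoder JSON, print
# "<input>\t<comma-separated hgvsg>" for every allele entry.
# Requires jq (assumed to be available; not part of ensembl-vep).
recoder_hgvsg() {
  jq -r '
    .[]
    | to_entries[]
    | select(.key != "warnings")
    | [.value.input, ((.value.hgvsg // []) | join(","))]
    | @tsv
  ' "$@"
}

# Example:
# ./variant_recoder --id "AGT:p.Met259Thr" | recoder_hgvsg
```

Entries without an hgvsg key (for instance when output is limited with --fields) produce an empty second column rather than an error.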

Options

variant_recoder shares many of the same command line flags as VEP. Others are unique to variant_recoder.

  • -id|--input_data [input_string]: a single variant as a string.
  • -i|--input_file [input_file]: input file containing one or more variants, one per line. Mixed formats disallowed.
  • --species: species to use (default: homo_sapiens).
  • --grch37: use GRCh37 assembly instead of GRCh38.
  • --genomes: set database parameters for Ensembl Genomes species.
  • --pretty: write pre-formatted indented JSON.
  • --fields [field1,field2]: limit output fields. Comma-separated list, one or more of: id, hgvsg, hgvsc, hgvsp, spdi.
  • --vcf_string : report VCF
  • --var_synonyms : report variation synonyms
  • --mane_select : report MANE Select transcripts in HGVS format
  • --host [db_host]: change database host from default ensembldb.ensembl.org (UK); geographic mirrors are useastdb.ensembl.org (US East Coast) and asiadb.ensembl.org (Asia). --user, --port and --pass may also be set.
  • --pick, --per_gene, --pick_allele, --pick_allele_gene, --pick_order: set and customise transcript selection process, see VEP documentation

ensembl-vep's People

Contributors

alanhoyle, aparton, at7, brunocontrerasmoreira, danstaines, dbolser, dglemos, diegomscoelho, ens-lgil, gurpreet-ghattaoraya, helensch, ima23, jamie-m-a, juanesarango, likhitha-surapaneni, marcoooo, matthdsm, michaeljon, mkschuster, monsanto-pinheiro, nakib103, nuno-agostinho, olaaustine, pdl, sarahhunt, thomasmaurel, willmclaren


ensembl-vep's Issues

splicesite annotations

Hi Will,

As you know, there's been quite a lot of work done to integrate full VEP functionality into bcbio. Lately, I've been running into quite a few issues with splicesite annotations. I believe the greatest offenders are Genesplicer and MaxEntScan. Both these plugins rely on fairly ancient binaries, which are hugely inefficient.
With both plugins active, I keep getting forks killed off or running out of memory. Runtime for an exome goes from roughly 6 min to over 2 h.

I believe the main cause of these issues (apart from runtime) is MaxEntScan.
This plugin requires the --fasta flag to be set explicitly. When using a plain fasta file, it seems to work ok most of the time, but when using a compressed fasta, the memory requirements go through the roof, hence the process failure.

Do you have any idea what could be done to fix this? We're looking to use VEP results in diagnostics, and good splicesite annotations are essential. Do you know of any (more modern) alternatives?

Thanks for the help
Matthias

ping @chapmanb, so you know what's going on

installation failed

Installation (perl INSTALL.pl) fails with the following errors:

#   Failed test 'get current git branch'
#   at ./t/version.t line 31.
# Looks like you failed 1 test of 1.
./t/version.t ........................................ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 

Test Summary Report
-------------------
./t/version.t                                      (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=37, Tests=1441, 126 wallclock secs ( 0.36 usr  0.17 sys + 113.54 cusr  9.03 csys = 123.10 CPU)
Result: FAIL
Failed 1/37 test programs. 1/1441 subtests failed.

adding --NO_HTSLIB doesn't help.

Please can anyone suggest what may be the problem?

Thank you!

P.S. perl v5.20.2; DBI package installed

Performance of ensembl-vep 88

Hello.

I have a question regarding ensembl-vep performance. Would it be possible to obtain more details about the conditions under which the following result was obtained (number of processors/forks, memory/buffer, command-line flags)?
"Set up correctly, VEP is capable of processing around 3 million variants in 30 minutes." (from http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html)

I am asking because both with a locally-built ensembl-vep and with the provided Docker image, processing a VCF file with 30000 variants takes me around 2.5-3 minutes (with flags --dir --cache --offline --assembly --everything (hgvs gets disabled at runtime because I was just testing things and did not provide a fasta for it) --vcf --no_stats and --fork 8) on an Ubuntu 14.04 workstation with 8 cores and 8 GB of memory. With --fork 4, the same job takes 4 minutes. I have tabix-converted the cache files with convert_cache.pl, the VCF is sorted in chromosomal order, no plugins are used, and I followed "Getting VEP to run faster", except installing Ensembl::XS and copying fasta/cache to memory.

I don't have much experience with VEP so I could be missing something obvious.

Thank you for your time.

Feature request: output compressed vcf

Hi,

Would it be possible to add an option to output compressed vcf files? I realize this is already possible by outputting to stdout and piping to gzip, but it would be handy to have a built-in option.

Thanks!
M

test error in t/Runner.t

When trying to install, the test suite fails two tests; see below.

It looks internal and not related to our system or Perl installation.

Failed test 'get_stats_file_handle - extension handling 2'

at ./t/Runner.t line 292.

Failed test 'get_stats_file_handle - extension handling 3'

at ./t/Runner.t line 295.

-------------------- EXCEPTION --------------------
MSG: ERROR: Stats file ensembl-vep/t/testdata/user_file53593.txt.txt already exists. Specify a different output file with --stats_file or overwrite existing file with --force_overwrite

STACK Bio::EnsEMBL::VEP::BaseRunner::get_stats_file_handle ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::BaseRunner::dump_stats ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:191
STACK Bio::EnsEMBL::VEP::Runner::finish ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:97
STACK Bio::EnsEMBL::VEP::Runner::run ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:120
STACK toplevel ./t/Runner.t:315
Date (localtime) = Mon Dec 12 11:02:05 2016
Ensembl API version = 87

Tests were run but no plan was declared and done_testing() was not seen.

Looks like your test exited with 25 just after 29.

./t/Runner.t ......................................... Dubious, test returned 25 (wstat 6400, 0x1900)
Failed 2/29 subtests

Installing cache fails for old version (homo_sapiens 75 GRCh37)

I'm trying to install the cache for homo_sapiens 75 GRCh37, but it fails.

perl INSTALL.pl --AUTO c --SPECIES homo_sapiens --DESTDIR ~/vep_test/ --VERSION 75 --CACHEDIR ~/vep_test/cache/ --ASSEMBLY GRCh37
Getting list of available cache files
ERROR: No matching species found at INSTALL.pl line 1045.

I think that this happens because in

ftp://ftp.ensembl.org/pub/release-75/variation/VEP/

the vep file is formatted

homo_sapiens_vep_75.tar.gz

and not

homo_sapiens_vep_75_GRCh37.tar.gz

two issues

Hi,

Thanks for the good work you're doing! I tried the vep.pl beta and noticed some irregularities.

1:
When multiple accessions are present under the TREMBL field, they are separated using commas (,). This is not optimal for parsing since the whole annotation lines (in case multiple transcripts exist) are also separated using commas. For Existing_variation, the separator is "&", which is much better.

2:
It did not seem to fork runs to multiple threads?

Unit test failing when installing from different directory

A unit test fails when INSTALL.pl is run from a different directory than the installation folder:

sudo perl /usr/local/ensembl-vep/INSTALL.pl
.....
Testing VEP installation
.....
/usr/local/ensembl-vep/t/version.t ........................................ fatal: Not a git repository (or any of the parent directories): .git
/usr/local/ensembl-vep/t/version.t ........................................ 1/? 
#   Failed test 'get current git branch'
#   at /usr/local/ensembl-vep/t/version.t line 31.
# Looks like you failed 1 test of 1.
/usr/local/ensembl-vep/t/version.t ........................................ Dubious, test returned 1 (wstat 256, 0x100)
......
Failed 1/37 test programs. 1/1437 subtests failed.

Runs fine when I change directory to /usr/local/ensembl-vep/ and run the install.

SIFT not available

I'm currently running CADD locally which runs VEP internally:

Here is the section of the commands that CADD runs:

VEPpath=/home/ucbtmog/data/.progz/ensembl-vep/vep.pl
CADDpath=/home/ucbtmog/data/analysis/CADD
INPUT=test.vcf.gz
OUTPUT=out.tsv.gz

zcat ${INPUT} | \
${CADDpath}/bin/src/VCF2vepVCF.py |  \
sort -k1,1 -k2,2n -k3,3 -k4,4 |  \
uniq | \
${CADDpath}/bin/src/extract_scored.py -p ${CADDpath}/prescored/whole_genome_SNVs.tsv.gz --found_out=>( gzip -c > ${OUTPUT}_p2.tmp ) | \
${CADDpath}/bin/src/extract_scored.py -p ${CADDpath}/prescored/InDels.tsv.gz --found_out=>( gzip -c > ${OUTPUT}_p3.tmp ) | \
perl ${VEPpath} --quiet --cache --offline --dir ${CADDpath}/annotations/vep/ --buffer 1000 --no_stats --species homo_sapiens --db_version=75 --format vcf --regulatory --sift b --polyphen b --per_gene --ccds --domains --numbers --canonical --total_length --output_file >( awk 'BEGIN{ FS="\t"; OFS="\t"; }{ if ($1 ~ /^#/) { if ($1 ~ /^#Up/) { sub("#","",$1); print "#Chrom","Start","End",$0 } else { print } } else { split($2,a,":"); split(a[2],b,"-"); if (length(b) == 2) { print a[1],b[1],b[2],$0 } else { print a[1],b[1],b[1],$0 } }}' ) --force_overwrite 

The above works perfectly with ensembl-tools-release-87 release from the ensembl vep website.

However, with the github release/87 branch, the VEP line fails with the following:

-------------------- EXCEPTION --------------------
MSG: ERROR: SIFT not available

STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::check_sift_polyphen /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:96
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::new /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:70
STACK Bio::EnsEMBL::VEP::CacheDir::get_all_AnnotationSources /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:95
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:81
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:63
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:119
STACK Bio::EnsEMBL::VEP::Runner::init /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:77
STACK Bio::EnsEMBL::VEP::Runner::run /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:111
STACK toplevel /mnt/254b78b9-76b4-422d-84b1-cc632bff60f7/IsmailM/.progz/ensembl-vep/vep.pl:193
Date (localtime)    = Tue Mar 28 00:27:08 2017
Ensembl API version = 87
---------------------------------------------------

Is this because of changes to ensembl-vep or simply because CADD links to an older annotation db?

hg19 and --hgvsg

I am observing an issue using vep 89.3 together with the flag --hgvsg and hg19 (from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) as reference genome. With the downloadable GRCh37.75, it works fine but for some reason I get the following error when using hg19 (chromosomes are named with the prefix "chr" and the mitogenome is different).

$ perl /u/scratch/tools/vep/89.3/binary/linux/vep --offline --cache --dir /u/scratch/genomes/H_sapiens/UCSC/hg19/Annotation/variation/vep --fasta hg19/hg19.fa -i test.vcf.gz --vcf -o out.vcf.gz --refseq --species homo_sapiens --format vcf --hgvsg --force_overwrite
substr outside of string at /u/scratch/tools/vep/89.3/binary/linux/Bio/EnsEMBL/Variation/Utils/Sequence.pm line 517, <$fh> line 117.
Use of uninitialized value $ref_allele in string eq at /u/scratch/tools/vep/89.3/binary/linux/Bio/EnsEMBL/Variation/Utils/Sequence.pm line 520, <$fh> line 117.
Use of uninitialized value in concatenation (.) or string at /u/scratch/tools/vep/89.3/binary/linux/Bio/EnsEMBL/Variation/Utils/Sequence.pm line 648, <$fh> line 117.
Specifically, the error occurs when vep parses a variant entry of chrY and tries to do a substring of an empty string. However, also ref_slices ($ref_seq in function hgvs_variant_notation) of other chromosomes seem to be off (many Ns) and different to the ones I see with the GRCh37.75 genome.

It seems odd that up to now I did not have any troubles with hg19 (e.g. with --hgvs).

I have attached the test file containing only one variant on chrY (test.vcf.gz). To generate the hg19 reference, just execute the following commands.

mkdir hg19 && cd hg19 && wget -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz | tar -xzvf - && cat chr* > hg19.fa && rm chr* && cd ..

Installation issue: API versions not recognized

Hi,

was trying to install VEP as outlined here, but got stuck when the installation script tried to determine whether my EnsEMBL API was up-to-date:

Checking for installed versions of the Ensembl API...done
There are updates avaiable for these modules:
ensembl-io : installed = , available = 277fe7c
ensembl : installed = , available = b8ff470
ensembl-variation : installed = , available = ed4fefd
ensembl-funcgen : installed = , available = 3e63130

Both ensembl-vep and all other API components are at checkout "release/88" and their respective most recent commits. Running on CentOS 6, Perl 5.18.4.

If I skip to the next step, htslib etc is not built, so that is a problem of course.

Any advice?

fatal: Not a git repository (or any of the parent directories): .git

It seems like the test in /vep/vep/t/version.t checks the git revision using git rev-parse --abbrev-ref HEAD.

I don't know if some of the VEP code relies on this, but this test fails if:

  • VEP was installed without using git (e.g. by installing a release archive from github)
  • The INSTALL.pl script is run from a directory other than where VEP was cloned into (e.g. the parent directory)

In either of these cases the tests fail with:

/vep/vep/t/version.t ........................................ fatal: Not a git repository (or any of the parent directories): .git
#   Failed test 'get current git branch'
#   at /vep/vep/t/version.t line 31.
# Looks like you failed 1 test of 1.
/vep/vep/t/version.t ........................................ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 

Test Summary Report
-------------------
/vep/vep/t/version.t                                      (Wstat: 256 Tests: 1 Failed: 1)

Do we need this test? Should VEP be relying on git at all?

HGVS.p for synonymous variants

Hi,

I'm using VEP v87 together with GRCh37.75 and the merged cache.

Somehow there seems to be a problem with the character set or something. For all synonymous variants I get an annotation like for example "ENSP00000369497.3:p.Lys1132%3D" where the %3D normally should be a "=". Is this a bug or do I have to configure something?

Best regards and thanks in advance,

Sebastian

dbNSFP plugin producing empty columns in vep 88.8

I've just upgraded to the new VEP from the old Ensembl Tools one, and it seems that after the upgrade, the dbNSFP columns (e.g. CADD_phred | CADD_raw | CADD_raw_rankscore | COSMIC_CNT | COSMIC_ID | ESP6500_AA_AF | ESP6500_EA_AF) are being created, but not populated.

I used the following command:

vep
--allele_number
--assembly GRCh37
--cache
--canonical
--check_existing
--dir /hsm/VR0320/shared/production/2.5.0/data/vep_cache
--dir_plugins /hsm/VR0320/shared/production/2.5.0/tools/vep_plugins
--fasta /hsm/VR0320/shared/production/2.5.0/data/vep_cache/homo_sapiens_refseq/*/Homo_sapiens.*.dna.primary_assembly.fa.gz
--force_overwrite
--hgvs
-i variants/NA12878.individual.genotype.soi.vcf
--af_1kg
--af_esp
--af
-o variants/NA12878.individual.genotype.soi.vep.vcf
--offline
--plugin Condel,/hsm/VR0320/shared/production/2.5.0/tools/vep_plugins/config/Condel/config,s 
--plugin dbNSFP,/hsm/VR0320/shared/production/2.5.0/data/dbnsfp/dbNSFP.gz,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_rankscore,FATHMM_pred,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,Reliability_index,VEST3_score,VEST3_rankscore,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore,phyloP46way_primate,phyloP46way_primate_rankscore,phyloP46way_placental,phyloP46way_placental_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phastCons46way_primate,phastCons46way_primate_rankscore,phastCons46way_placental,phastCons46way_placental_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,SiPhy_29way_pi,SiPhy_29way_logOdds,SiPhy_29way_logOdds_rankscore,LRT_Omega,UniSNP_ids,ESP6500_AA_AF,ESP6500_EA_AF,clinvar_rs,clinvar_clnsig,clinvar_trait,COSMIC_ID,COSMIC_CNT
--plugin Grantham
--polyphen b
--protein
--pubmed
--refseq
--sift b
-species homo_sapiens
--symbol
--vcf
--vcf_info_field ANN
--verbose;

This is with vep 88.8, and commit 3be3889 of the vep_plugins git repository.

Is there any reason this might be happening?

Casting problem for HGVS (protein) notation having special characters like "="

Hi VEP developers,

For some variants with a modification at the protein level, the HGVS notation does not come out right.

For example, in my VCF I have got :

11    69462910    .    G    A    313.7    PASS    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=10;CIGAR=1X;DP=10;DPB=10;DPRA=0;EPP=3.87889;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=18.4681;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=375;QR=0;RO=0;RPL=8;RPP=10.8276;RPPR=0;RPR=2;RUN=1;SAF=4;SAP=3.87889;SAR=6;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.protocol=1;DBSNP=rs9344;CLINVAR=rs9344;COSMIC=COSM1179751;CSQ=A|splice_region_variant&synonymous_variant|LOW|CCND1|595|Transcript|NM_053056.2|protein_coding|4/5||NM_053056.2:c.723G>A|NP_444284.1:p.Pro241%3D|932|723|241|P|ccG/ccA

The ID NP_444284.1:p.Pro241%3D should instead be NP_444284.1:p.Pro241=. Is it done on purpose, or is it a casting bug?

I think it is the same for other special characters like "".

Thanks in advance.

Best,
Jérôme.

Undefined error (haplo)

I had this issue in the pre-release and see it is still present in the 88 release. This is printed out and then the tool dies.

Can't call method "transcript" on an undefined value at ensembl-vep/modules/Bio/EnsEMBL/VEP/Haplo/Runner.pm line 131, <$fh> line 245947.

I get around it by wrapping the logic in an eval clause. But I don't know how it arises. I have had this in both Human and zebrafish samples with RefSeq cache v85.

failed installation or running

default installation failed with this error message:
Compile didn't complete. No libhts.a library file found at INSTALL.pl line 748.

I tried installation with:
perl INSTALL.pl --NO_HTSLIB

Now the installation is working but when I tried to run my prediction command:
perl $HOME/ensembl-vep/vep.pl -i my.vcf -gff data.gff.gz -fasta $genome

I got this exception:
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

Offline annotation with GFF

Hi

I wondered if it would be possible to annotate a VCF using a GFF/GTF resource without connecting to ensembl?

`[ml@cvx-gen02 ~]$ /share/apps/vep-distros/ensembl-vep/vep -i /share/apps/vep-distros/ensembl-vep/examples/homo_sapiens_GRCh37.vcf -gff /data/db/human/refseq/ref_GRCh37.p13_top_level_canonical_b37_sorted.gff3.gz -fasta /data/db/human/gatk/2.8/b37/human_g1k_v37.fasta --offline

-------------------- EXCEPTION --------------------
MSG: ERROR: Cache directory /home/ml/.vep/homo_sapiens not found

STACK Bio::EnsEMBL::VEP::CacheDir::dir /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:182
STACK Bio::EnsEMBL::VEP::CacheDir::init /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:139
STACK Bio::EnsEMBL::VEP::CacheDir::new /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:68
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:75
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:65
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::init /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:77
STACK Bio::EnsEMBL::VEP::Runner::run /share/apps/vep-distros/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:111
STACK toplevel /share/apps/vep-distros/ensembl-vep/vep:209
Date (localtime) = Mon Apr 24 14:50:27 2017
Ensembl API version = 88
---------------------------------------------------`

Thanks
Matt

language suffix: vep.pl vs vep

From https://wiki.debian.org/UpstreamGuide#Language_extensions_in_scripts :

Debian policy says in section 10.4 Scripts

When scripts are installed into a directory in the system PATH, the script name should not include an extension such as .sh or .pl that denotes the scripting language currently used to implement it.

Here you can find good reasons for this:

Wikipedia about command names
The use of a filename extension in a command name appears occasionally, usually as a side effect of the command having been implemented as a script (in Bourne shell, Python, etc.) and the interpreter name being suffixed to the command name, a practice common on systems like Windows and Mac OS X, which rely on globally set associations between filename extension and interpreter, but sharply deprecated in UNIX-derived systems like Linux and Apple's Mac OS X, where the interpreter is normally specified as a header in the script. ...

Commandname extensions considered harmful
Commands should never have filename extensions. Rely on interpreter directives instead or some other paradigm that prevent the implementation from being exposed, or worse yet, lied about, within the very name of the command.

one out of several good mails on Debian policy list

Since you're coming out with this new independent version with a different executable name, now might be a good time to address this.

I'm proposing that vep.pl be renamed to just vep and marked executable. It looks like you have the proper shebang line already (#!/usr/bin/env perl). The same would apply to the other scripts.

Execution in the current directory would then be ./vep [...] rather than perl vep.pl [...]. Or, when we have it installed on our systems, we can simply execute it as vep.
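
As a minimal sketch of the proposed rename, the helper below strips a trailing .pl from a script path and sets the executable bit. The function name strip_suffix is hypothetical and not part of the VEP repository; it assumes the script already carries a shebang line such as #!/usr/bin/env perl.

```shell
# Hypothetical helper: drop a ".pl" suffix from a script and mark it executable.
strip_suffix() {
  src=$1
  dest=${src%.pl}                      # remove a trailing ".pl", if any
  [ "$src" != "$dest" ] || return 0    # no suffix: nothing to do
  mv -- "$src" "$dest" && chmod +x -- "$dest"
}

# Example:
#   strip_suffix vep.pl    # afterwards the script runs as ./vep
```

Inside a git checkout, git mv would be the more natural way to record the rename.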

VEP v88 - Does not output any PolyPhen prediction terms with "--humdiv"

I have used v88.8 with the 88 cache, and the PolyPhen prediction terms with "--humdiv" still seem to be missing from the cache (see below). I thought this was resolved in the 88 cache; am I doing something wrong?

See:

perl variant_effect_predictor.pl -v -i example_GRCh37.vcf --offline --assembly GRCh37 --dir_cache /mnt/hds/proj/cust003/develop/modules/miniconda2/envs/mip4.1/ensembl-tools-release-88.8/cache --force_overwrite --polyphen b --humdiv
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

version 88

By Will McLaren ([email protected])

Configuration options:

assembly           GRCh37
core_type          core
dir                /mnt/hds/proj/cust003/develop/modules/miniconda2/envs/mip4.1/ensembl-tools-release-88.8/cache
dir_cache          /mnt/hds/proj/cust003/develop/modules/miniconda2/envs/mip4.1/ensembl-tools-release-88.8/cache
dir_plugins        /home/henrik.stranneheim/.vep/Plugins
force_overwrite    1
host               ensembldb.ensembl.org
humdiv             1
input_file         example_GRCh37.vcf
offline            1
polyphen           b
port               3306
species            homo_sapiens
stats              HASH(0x58aa638)
verbose            1

--------------------

2017-05-16 13:28:13 - Read existing cache info
2017-05-16 13:28:13 - Auto-detected FASTA file in cache directory
2017-05-16 13:28:13 - Starting...
2017-05-16 13:28:13 - Detected format of input file as vcf
2017-05-16 13:28:13 - Read 173 variants into buffer
2017-05-16 13:28:13 - Reading transcript data from cache and/or database
[===============================================]  [ 100% ]
2017-05-16 13:28:14 - Retrieved 3097 transcripts (0 mem, 3162 cached, 0 DB, 65 duplicates)
2017-05-16 13:28:14 - Analyzing chromosome 21
2017-05-16 13:28:14 - Analyzing variants
[===============================================]  [ 100% ]
2017-05-16 13:28:14 - Calculating consequences
[===============================================]  [ 100% ]
2017-05-16 13:28:14 - Analyzing chromosome 22
2017-05-16 13:28:14 - Analyzing variants
[===============================================]  [ 100% ]
2017-05-16 13:28:14 - Calculating consequences
[===============================================]  [ 100% ]
2017-05-16 13:28:15 - Processed 173 total variants (86 vars/sec, 86 vars/sec total)
2017-05-16 13:28:15 - Wrote stats summary to variant_effect_output.txt_summary.html
2017-05-16 13:28:15 - Finished!
(mip4.1) [henrik.stranneheim@rastapopoulos variant_effect_predictor]$ grep -c PolyPhen variant_effect_output.txt
1

Input lines skipped when input is a pipe

If I run VEP with a pipe as input, like for instance:

perl vep.pl --format vcf -i <(generate_some_vcf_content) -o /dev/stdout

a bunch of the first lines from that pipe will be skipped.
The culprit is in this line.
The -B operator reads a chunk of data from the file to detect if it's binary data.
I'd suggest changing it to -f $file && -B $file.

Thanks.
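
The suggested guard can be illustrated in shell terms. This is a sketch, not the VEP code: looks_binary is a hypothetical helper, and its byte check is only a crude stand-in for Perl's -B heuristic. The point is the ordering: the regular-file test inspects metadata only, so a pipe or FIFO is never opened and no input lines are consumed from it.

```shell
# Hypothetical helper: report "binary" only for regular files.
# [ -f ] looks at file metadata and reads no data, so pipes and FIFOs
# are rejected before anything is consumed from them.
looks_binary() {
  [ -f "$1" ] || return 1
  # Crude stand-in for Perl's -B: count bytes in the first 512 that are
  # neither printable nor whitespace.
  n=$(dd if="$1" bs=512 count=1 2>/dev/null |
        LC_ALL=C tr -d '[:print:][:space:]' | wc -c | tr -d ' ')
  [ "$n" -gt 0 ]
}
```

With this ordering, input arriving through a pipe is treated as text and passed on untouched, which mirrors what -f $file && -B $file would do in Perl.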

Install VEP behind proxy

Hi,

I want to install VEP with all its dependencies using the INSTALL.pl script. Unfortunately, we have to connect to the internet via a proxy server in our medical center, and it seems the script cannot connect to the internet. Is there an option to give the script the IP address of our proxy server? Or is there a workaround?

Thanks in advance!
Stefan
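
One possible workaround, offered as an assumption rather than a confirmed INSTALL.pl feature: many download tools and Perl HTTP clients read the conventional http_proxy, https_proxy and ftp_proxy environment variables, so exporting them before running the installer may be enough. Whether INSTALL.pl honours them is not confirmed here. The helper name set_proxy and the proxy address are hypothetical placeholders.

```shell
# Hypothetical helper: export the conventional proxy variables for this shell.
set_proxy() {
  proxy_url=$1
  export http_proxy="$proxy_url"
  export https_proxy="$proxy_url"
  export ftp_proxy="$proxy_url"
}

# Example (placeholder address; substitute your site's proxy):
#   set_proxy "http://proxy.example.org:3128"
#   perl INSTALL.pl
```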

Installation errors/fail

Hi,

I'm trying to install VEP to run some SNP annotations, but run into several error messages when I run INSTALL.pl.

Below is the terminal output when I try to install. I'm running Ubuntu 16.04. I've bolded the error messages (it's a pretty long output):

perl INSTALL.pl

Hello! This installer is configured to install v88 of the Ensembl API for use by the VEP.
It will not affect any existing installations of the Ensembl API that you may have.

It will also download and install cache files from Ensembl's FTP server.

Checking for installed versions of the Ensembl API...done

Setting up directories

  • fetching BioPerl
  • unpacking ./Bio/tmp/release-1-6-924.zip
  • moving files
    Attempting to install Bio::DB::HTS and htslib.

If this fails, try re-running with --NO_HTSLIB

  • checking out HTSLib
    Cloning into 'htslib'...
    remote: Counting objects: 8815, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 8815 (delta 0), reused 0 (delta 0), pack-reused 8812
    Receiving objects: 100% (8815/8815), 6.42 MiB | 0 bytes/s, done.
    Resolving deltas: 100% (6093/6093), done.
    Checking connectivity... done.
    Note: checking out '6bed35a3eaefa3baa2c7e0166ceba442212f166b'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b

  • building HTSLIB in ./htslib
    In /home/conrad/Software/ensembl-vep/htslib
    echo '/* Empty config.h generated by Makefile */' > config.h
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o kfunc.o kfunc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o knetfile.o knetfile.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o kstring.o kstring.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o bgzf.o bgzf.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o faidx.o faidx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o hfile.o hfile.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o hfile_net.o hfile_net.c
    echo '#define HTS_VERSION "1.3.2-dirty"' > version.h
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o hts.o hts.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o md5.o md5.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o regidx.o regidx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o sam.o sam.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o synced_bcf_reader.o synced_bcf_reader.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o vcf_sweep.o vcf_sweep.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o tbx.o tbx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o vcf.o vcf.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o vcfutils.o vcfutils.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_codecs.o cram/cram_codecs.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_decode.o cram/cram_decode.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_encode.o cram/cram_encode.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_external.o cram/cram_external.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_index.o cram/cram_index.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_io.o cram/cram_io.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_samtools.o cram/cram_samtools.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/cram_stats.o cram/cram_stats.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/files.o cram/files.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/mFILE.o cram/mFILE.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/open_trace_file.o cram/open_trace_file.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/pooled_alloc.o cram/pooled_alloc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/rANS_static.o cram/rANS_static.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/sam_header.o cram/sam_header.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/string_alloc.o cram/string_alloc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/thread_pool.o cram/thread_pool.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/vlen.o cram/vlen.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o cram/zfio.o cram/zfio.c
    ar -rc libhts.a kfunc.o knetfile.o kstring.o bgzf.o faidx.o hfile.o hfile_net.o hts.o md5.o regidx.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/files.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/sam_header.o cram/string_alloc.o cram/thread_pool.o cram/vlen.o cram/zfio.o
    ranlib libhts.a
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o kfunc.pico kfunc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o knetfile.pico knetfile.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o kstring.pico kstring.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o bgzf.pico bgzf.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o faidx.pico faidx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o hfile.pico hfile.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o hfile_net.pico hfile_net.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o hts.pico hts.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o md5.pico md5.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o regidx.pico regidx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o sam.pico sam.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o synced_bcf_reader.pico synced_bcf_reader.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o vcf_sweep.pico vcf_sweep.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o tbx.pico tbx.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o vcf.pico vcf.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o vcfutils.pico vcfutils.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_codecs.pico cram/cram_codecs.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_decode.pico cram/cram_decode.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_encode.pico cram/cram_encode.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_external.pico cram/cram_external.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_index.pico cram/cram_index.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_io.pico cram/cram_io.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_samtools.pico cram/cram_samtools.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/cram_stats.pico cram/cram_stats.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/files.pico cram/files.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/mFILE.pico cram/mFILE.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/open_trace_file.pico cram/open_trace_file.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/pooled_alloc.pico cram/pooled_alloc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/rANS_static.pico cram/rANS_static.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/sam_header.pico cram/sam_header.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/string_alloc.pico cram/string_alloc.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/thread_pool.pico cram/thread_pool.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/vlen.pico cram/vlen.c
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -fpic -c -o cram/zfio.pico cram/zfio.c
    gcc -shared -Wl,-soname,libhts.so.1 -pthread -o libhts.so kfunc.pico knetfile.pico kstring.pico bgzf.pico faidx.pico hfile.pico hfile_net.pico hts.pico md5.pico regidx.pico sam.pico synced_bcf_reader.pico vcf_sweep.pico tbx.pico vcf.pico vcfutils.pico cram/cram_codecs.pico cram/cram_decode.pico cram/cram_encode.pico cram/cram_external.pico cram/cram_index.pico cram/cram_io.pico cram/cram_samtools.pico cram/cram_stats.pico cram/files.pico cram/mFILE.pico cram/open_trace_file.pico cram/pooled_alloc.pico cram/rANS_static.pico cram/sam_header.pico cram/string_alloc.pico cram/thread_pool.pico cram/vlen.pico cram/zfio.pico -lz -lm
    ln -sf libhts.so libhts.so.1
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o bgzip.o bgzip.c
    gcc -pthread -o bgzip bgzip.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o htsfile.o htsfile.c
    gcc -pthread -o htsfile htsfile.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o tabix.o tabix.c
    gcc -pthread -o tabix tabix.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/fieldarith.o test/fieldarith.c
    gcc -pthread -o test/fieldarith test/fieldarith.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/hfile.o test/hfile.c
    gcc -pthread -o test/hfile test/hfile.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/sam.o test/sam.c
    gcc -pthread -o test/sam test/sam.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/test-regidx.o test/test-regidx.c
    gcc -pthread -o test/test-regidx test/test-regidx.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/test_view.o test/test_view.c
    gcc -pthread -o test/test_view test/test_view.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/test-vcf-api.o test/test-vcf-api.c
    gcc -pthread -o test/test-vcf-api test/test-vcf-api.o libhts.a -lz
    gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I. -c -o test/test-vcf-sweep.o test/test-vcf-sweep.c
    gcc -pthread -o test/test-vcf-sweep test/test-vcf-sweep.o libhts.a -lz
  • unpacking ./Bio/tmp/biodbhts.zip to ./Bio/tmp/
    ./Bio/tmp/Bio-DB-HTS-master - moving files to ./biodbhts
  • making Bio::DB:HTS
    Checking prerequisites...
    requires:
    ! Bio::Root::Version is not installed

ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions
of the modules indicated above before proceeding with this installation

Run 'Build installdeps' to install missing prerequisites.

Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'Bio-DB-HTS' version '2.7'
Building Bio-DB-HTS
x86_64-linux-gnu-gcc -I/home/conrad/Software/ensembl-vep/htslib/htslib -I/usr/lib/x86_64-linux-gnu/perl/5.22/CORE -DVERSION="2.7" -DXS_VERSION="2.7" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o lib/Bio/DB/HTS.o lib/Bio/DB/HTS.c
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/HTS.bs')
x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong -o blib/arch/auto/Bio/DB/HTS/HTS.so lib/Bio/DB/HTS.o -L/home/conrad/Software/ensembl-vep/htslib -Wl,-rpath,/home/conrad/Software/ensembl-vep/htslib -lhts -lpthread -lz
x86_64-linux-gnu-gcc -I/home/conrad/Software/ensembl-vep/htslib/htslib -I/usr/lib/x86_64-linux-gnu/perl/5.22/CORE -DVERSION="2.7" -DXS_VERSION="2.7" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o lib/Bio/DB/HTS/Faidx.o lib/Bio/DB/HTS/Faidx.c
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.bs')
x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong -o blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.so lib/Bio/DB/HTS/Faidx.o -L/home/conrad/Software/ensembl-vep/htslib -Wl,-rpath,/home/conrad/Software/ensembl-vep/htslib -lhts -lpthread -lz

Downloading required Ensembl API files

  • fetching ensembl
  • unpacking ./Bio/tmp/ensembl.zip
  • moving files
  • getting version information
  • fetching ensembl-variation
  • unpacking ./Bio/tmp/ensembl-variation.zip
  • moving files
  • getting version information
  • fetching ensembl-funcgen
  • unpacking ./Bio/tmp/ensembl-funcgen.zip
  • moving files
  • getting version information
  • fetching ensembl-io
  • unpacking ./Bio/tmp/ensembl-io.zip
  • moving files
  • getting version information

Testing VEP installation
ERROR: Testing VEP script failed with the following error
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/conrad/Software/ensembl-vep/modules /home/conrad/Software/ensembl-vep ./Bio /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/DBSQL/DBConnection.pm line 74.

BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/DBSQL/DBConnection.pm line 74.
Compilation failed in require at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm line 69.
BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm line 69.
Compilation failed in require at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/Registry.pm line 137.
BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/Bio/EnsEMBL/Registry.pm line 137.
Compilation failed in require at /home/conrad/Software/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 46.
BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 46.
Compilation failed in require at /usr/share/perl/5.22/base.pm line 97.
...propagated at /usr/share/perl/5.22/base.pm line 106.
BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm line 46.
Compilation failed in require at /usr/share/perl/5.22/base.pm line 97.
...propagated at /usr/share/perl/5.22/base.pm line 106.
BEGIN failed--compilation aborted at /home/conrad/Software/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 46.
Compilation failed in require at ./vep line 20.
BEGIN failed--compilation aborted at ./vep line 20.

I'm not really sure where to begin troubleshooting this. I know nothing about Perl (only basic Python).

Any help is greatly appreciated!!
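
The final error names the missing piece: Perl could not find DBI.pm, and the README lists DBI as a required Perl library. As a small sketch, the hypothetical helper below pulls the module name out of a "Can't locate ... in @INC" message so it can be passed to an installer such as cpanminus, which the README recommends. The function name missing_module is an assumption for illustration only.

```shell
# Hypothetical helper: read Perl error output on stdin and print the first
# module named in a "Can't locate Foo/Bar.pm in @INC" message as Foo::Bar.
missing_module() {
  sed -n "s/^Can't locate \([^ ]*\)\.pm in @.*/\1/p" | sed 's|/|::|g' | head -n 1
}

# Example, assuming cpanm is installed:
#   module=$(./vep --help 2>&1 | missing_module)
#   [ -n "$module" ] && cpanm "$module"
```

The "Bio::Root::Version is not installed" notice earlier in the log is a prerequisite warning from the Bio::DB::HTS build; the run only aborts at the DBI error.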

tests failing on OSX

Hi,

When trying to install on OSX I get this. Can't figure out what to do next.

10:54 $ perl INSTALL.pl --NO_HTSLIB

Hello! This installer is configured to install v87 of the Ensembl API for use by the VEP.
It will not affect any existing installations of the Ensembl API that you may have.

It will also download and install cache files from Ensembl's FTP server.

Checking for installed versions of the Ensembl API...done
It looks like you have an older version (80) of the API installed.
This installer will install a limited set of the API v87 for use by the VEP only

Skip to the next step (n) to install cache files

Do you want to continue installing the API (y/n)? y

Setting up directories
Destination directory ./Bio already exists.
Do you want to overwrite it (if updating VEP this is probably OK) (y/n)? y
 - fetching BioPerl
 - unpacking ./Bio/tmp/release-1-6-924.zip
 - moving files

Downloading required Ensembl API files
 - fetching ensembl
 - unpacking ./Bio/tmp/ensembl.zip
 - moving files
 - getting version information
 - fetching ensembl-variation
 - unpacking ./Bio/tmp/ensembl-variation.zip
 - moving files
 - getting version information
 - fetching ensembl-funcgen
 - unpacking ./Bio/tmp/ensembl-funcgen.zip
 - moving files
 - getting version information
 - fetching ensembl-io
 - unpacking ./Bio/tmp/ensembl-io.zip
 - moving files
 - getting version information

Testing VEP installation
./t/AnnotationSource.t ............................... ok    
./t/AnnotationSource_BaseVariation.t ................. ok    
./t/AnnotationSource_Cache.t ......................... ok   
./t/AnnotationSource_Cache_RegFeat.t ................. 1/? Can't locate object method "_finish_annotation" via package "Bio::EnsEMBL::Variation::VariationFeature" at ./t/AnnotationSource_Cache_RegFeat.t line 165, <__ANONIO__> line 139.
# Tests were run but no plan was declared and done_testing() was not seen.
./t/AnnotationSource_Cache_RegFeat.t ................. Dubious, test returned 25 (wstat 6400, 0x1900)
All 44 subtests passed 
./t/AnnotationSource_Cache_Transcript.t .............. 62/? Can't locate object method "_finish_annotation" via package "Bio::EnsEMBL::Variation::VariationFeature" at ./t/AnnotationSource_Cache_Transcript.t line 250, <__ANONIO__> line 139.
# Tests were run but no plan was declared and done_testing() was not seen.
./t/AnnotationSource_Cache_Transcript.t .............. Dubious, test returned 25 (wstat 6400, 0x1900)
All 62 subtests passed 
./t/AnnotationSource_Cache_Variation.t ............... ok    
./t/AnnotationSource_Cache_VariationTabix.t .......... 16/? Can't locate object method "alt_alleles" via package "Bio::EnsEMBL::Variation::VariationFeature" at /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/BaseVariation.pm line 231.
# Tests were run but no plan was declared and done_testing() was not seen.
./t/AnnotationSource_Cache_VariationTabix.t .......... Dubious, test returned 255 (wstat 65280, 0xff00)
All 20 subtests passed 
./t/AnnotationSource_Database_RegFeat.t .............. ok    
./t/AnnotationSource_Database_StructuralVariation.t .. ok    
./t/AnnotationSource_Database_Transcript.t ........... ok    
./t/AnnotationSource_Database_Variation.t ............ ok    
./t/AnnotationSource_File.t .......................... ok    
./t/AnnotationSource_File_BED.t ...................... ok    
./t/AnnotationSource_File_BigWig.t ................... ok    
./t/AnnotationSource_File_GFF.t ...................... ok    
./t/AnnotationSource_File_GTF.t ...................... ok    
./t/AnnotationSource_File_VCF.t ...................... ok    
./t/AnnotationSourceAdaptor.t ........................ ok    
./t/BaseVEP.t ........................................ 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: ./t/BaseVEP.t:118
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/BaseVEP.t ........................................ Dubious, test returned 2 (wstat 512, 0x200)
All 24 subtests passed 
./t/CacheDir.t ....................................... ok    
./t/Config.t ......................................... ok    
./t/FilterSet.t ...................................... ok     
./t/InputBuffer.t .................................... 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::BaseVEP::get_slice /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:232
STACK: Bio::EnsEMBL::VEP::InputBuffer::finish_annotation /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/InputBuffer.pm:244
STACK: ./t/InputBuffer.t:280
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/InputBuffer.t .................................... Dubious, test returned 2 (wstat 512, 0x200)
All 44 subtests passed 
	(less 10 skipped subtests: 34 okay)
./t/OutputFactory.t .................................. 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: ./t/OutputFactory.t:51
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/OutputFactory.t .................................. Dubious, test returned 2 (wstat 512, 0x200)
All 4 subtests passed 
./t/OutputFactory_JSON.t ............................. ok    
./t/OutputFactory_Tab.t .............................. 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: main::get_annotated_buffer_runner ./t/OutputFactory_Tab.t:284
STACK: ./t/OutputFactory_Tab.t:102
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/OutputFactory_Tab.t .............................. Dubious, test returned 2 (wstat 512, 0x200)
All 6 subtests passed 
./t/OutputFactory_VCF.t .............................. 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: main::get_runner ./t/OutputFactory_VCF.t:556
STACK: ./t/OutputFactory_VCF.t:62
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/OutputFactory_VCF.t .............................. Dubious, test returned 2 (wstat 512, 0x200)
All 5 subtests passed 
./t/OutputFactory_VEP_output.t ....................... 3/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: main::get_annotated_buffer_runner ./t/OutputFactory_VEP_output.t:278
STACK: ./t/OutputFactory_VEP_output.t:101
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/OutputFactory_VEP_output.t ....................... Dubious, test returned 2 (wstat 512, 0x200)
All 8 subtests passed 
./t/Parser.t ......................................... 1/? # Tests were run but no plan was declared and done_testing() was not seen.
./t/Parser.t ......................................... Dubious, test returned 2 (wstat 512, 0x200)
All 50 subtests passed 
./t/Parser_HGVS.t .................................... ok    
./t/Parser_ID.t ...................................... ok   
./t/Parser_VCF.t ..................................... ok    
./t/Parser_VEP_input.t ............................... ok    
./t/Runner.t ......................................... 1/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: ./t/Runner.t:190
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/Runner.t ......................................... Dubious, test returned 2 (wstat 512, 0x200)
All 14 subtests passed 
./t/Stats.t .......................................... 8/? 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object did not provide a get_all_ids method
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::DB::SeqI::get_all_primary_ids /Users/kohb9m/opt/ensembl-api/src/bioperl-1.6.1/Bio/DB/SeqI.pm:153
STACK: Bio::EnsEMBL::VEP::Stats::log_fasta_chromosomes /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm:104
STACK: Bio::EnsEMBL::VEP::BaseVEP::fasta_db /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:291
STACK: Bio::EnsEMBL::VEP::Runner::init /Users/kohb9m/tmp/new_vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK: main::get_annotated_buffer_runner ./t/Stats.t:459
STACK: ./t/Stats.t:78
-----------------------------------------------------------
# Tests were run but no plan was declared and done_testing() was not seen.
./t/Stats.t .......................................... Dubious, test returned 2 (wstat 512, 0x200)
All 11 subtests passed 
./t/Utils.t .......................................... ok    
./t/version.t ........................................ ok   

Test Summary Report
-------------------
./t/AnnotationSource_Cache_RegFeat.t               (Wstat: 6400 Tests: 44 Failed: 0)
  Non-zero exit status: 25
  Parse errors: No plan found in TAP output
./t/AnnotationSource_Cache_Transcript.t            (Wstat: 6400 Tests: 62 Failed: 0)
  Non-zero exit status: 25
  Parse errors: No plan found in TAP output
./t/AnnotationSource_Cache_VariationTabix.t        (Wstat: 65280 Tests: 20 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
./t/BaseVEP.t                                      (Wstat: 512 Tests: 24 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/InputBuffer.t                                  (Wstat: 512 Tests: 44 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/OutputFactory.t                                (Wstat: 512 Tests: 4 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/OutputFactory_Tab.t                            (Wstat: 512 Tests: 6 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/OutputFactory_VCF.t                            (Wstat: 512 Tests: 5 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/OutputFactory_VEP_output.t                     (Wstat: 512 Tests: 8 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/Parser.t                                       (Wstat: 512 Tests: 50 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/Runner.t                                       (Wstat: 512 Tests: 14 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
./t/Stats.t                                        (Wstat: 512 Tests: 11 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=37, Tests=1054, 22 wallclock secs ( 0.19 usr  0.13 sys + 19.28 cusr  1.54 csys = 21.14 CPU)
Result: FAIL
Failed 12/37 test programs. 0/1054 subtests failed.

Installation Failure: cram/cram_io.c:60:10: fatal error: lzma.h: No such file or directory

While installing vep release 88, I get this error.

gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I.  -c -o cram/cram_encode.o cram/cram_encode.c
gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I.  -c -o cram/cram_external.o cram/cram_external.c
gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I.  -c -o cram/cram_index.o cram/cram_index.c
gcc -g -Wall -O2 -fPIC -Wno-unused -Wno-unused-result -I.  -c -o cram/cram_io.o cram/cram_io.c
cram/cram_io.c:60:10: fatal error: lzma.h: No such file or directory
 #include <lzma.h>
          ^~~~~~~~
compilation terminated.
make: *** [cram/cram_io.o] Error 1
Compile didn't complete. No libhts.a library file found at INSTALL.pl line 749.

This error goes away with the --NO_HTSLIB flag. Is there an alternative way to fix this?
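The compiler message above means that the header lzma.h could not be found on the include path. That header ships with the liblzma development files from XZ Utils (for example the liblzma-dev package on Debian/Ubuntu, or xz-devel on Fedora/RHEL). A minimal sketch for checking whether the header is present before re-running the installer could look like the following; the directory list and the helper name find_header are assumptions for illustration, not part of INSTALL.pl.

```python
import os

# Directories searched here are an assumption; adjust them for your system.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include", "/opt/local/include")


def find_header(name, include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first path where the header file exists, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_header("lzma.h")
    if path:
        print(f"lzma.h found at {path}")
    else:
        print("lzma.h not found; the liblzma development files are probably missing")
```

If the header is reported as missing, installing the development package and re-running the installer is worth trying before falling back to --NO_HTSLIB.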

VEP not matching COSMIC alleles

I am using release 88.8 with VEP cache v84 and noticed that it is annotating COSMIC variants; however, in the cache COSMIC does not have alleles present due to licensing changes. Why would VEP annotate this?!? I am not using the --no_check_alleles option!

Here is an example command.

/usr/bin/perl /mnt/SCRATCH/bin/ensembl_vep/ensembl-vep/vep \
--cache_version 84 --species homo_sapiens --assembly GRCh38 \
--cache --offline --no_progress --no_stats --sift b --ccds \
--uniprot --hgvs --symbol --numbers --domains --gene_phenotype \
--regulatory --canonical --protein --biotype --tsl --pubmed \
--variant_class --shift_hgvs 1 --check_existing --total_length \
--allele_number --no_escape --xref_refseq --vcf --minimal \
--failed 1 --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length \
--dir_cache /mnt/SCRATCH/vep_cache_84/cache \
--dir_plugin /home/ubuntu/NCI-GDC/vcf2maf-tool/vep-plugins \
--fasta /mnt/SCRATCH/vep_cache_84/vep_fasta/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
--input_file /mnt/SCRATCH/test/pyvcf2maf/new_test/new_vep/run/0577fdb0-909c-41e1-81a2-b9ab0d2b09f9.vep.vcf.gz \
--output_file /mnt/SCRATCH/test/pyvcf2maf/new_test/new_vep/run/0577fdb0-909c-41e1-81a2-b9ab0d2b09f9.vcf2maf.vcf \
--plugin GDC_entrez,/mnt/SCRATCH/vep_cache_84/custom/ensembl_entrez_names.json \
--polyphen b --af --af_1kg --af_esp --af_exac --vcf_info_field GDCCSQ --fork 4 \
--synonyms /mnt/SCRATCH/test/pyvcf2maf/new_test/new_vep/synonyms.txt

GeneSplicer Plugin not working

Hi,

I tried to use the GeneSplicer plugin with VEP. First I installed GeneSplicer and did a test run with its test data, which worked.
After running VEP with the following command, I do not get any annotation in the GeneSplicer column.

vep.pl --offline --dir '/media/Berechnungen/AnnotationDBs/vep' --fasta '/media/Berechnungen/Referenzgenom/HG19/HG19.karyo.fasta' --assembly GRCh37 -i '/media/Ergebnisse/xxxx-15_Exom_NB501654_0006/xxxx-15.vcf' -o '/media/Ergebnisse/xxxx-15_Exom_NB501654_0006/xxxx-15.annot_neu.vcf' --plugin GeneSplicer,/BioinfSoftware,/media/Berechnungen/AnnotationDBs/GeneSplicer/human,context=100,tmpdir=/media/Berechnungen/Temp --vcf --force_overwrite --fork 8

So I took a deeper look into GeneSplicer.pm, and with a few print commands I could narrow the error down to the part where the plugin tries to extract the upstream and downstream sequence. A print $ref_seq."\n"; after line 170 gives me only "N" bases + "variant" + "N" bases, like this:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Can you tell what I am doing wrong?

Thanks in advance,
Stefan Diederich

VEP terminates after 7-10 warnings

Hi,

Thanks again for getting back to me on my previous issues. I'm currently running VEP with HGVS notation as input. For about 5% of my data I receive the message WARNING: Unable to parse HGVS notation. The problem is that VEP terminates after 7-10 such warnings, and the program only continues once I manually remove the offending lines. Is this exit intended? Is there a way to complete a run that produces more than a handful of warnings, so that I can inspect them all at once?
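Until the behaviour is clarified, one workaround is to pre-screen the input so that all suspicious lines can be inspected in one pass. The sketch below is only a rough filter: the regular expression HGVS_LIKE and the function name split_hgvs_lines are assumptions made for illustration. It is not VEP's HGVS parser, so it will accept some strings that VEP rejects and vice versa.

```python
import re

# Rough, illustrative pattern: "<reference>:<type>.<description>".
# This is an assumption for pre-screening only, not VEP's own HGVS parser.
HGVS_LIKE = re.compile(r"^[A-Za-z0-9_.]+:[cgmnpr]\.\S+$")


def split_hgvs_lines(lines):
    """Return (plausible, suspicious); suspicious holds (line_number, text)."""
    plausible, suspicious = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if HGVS_LIKE.match(text):
            plausible.append(text)
        else:
            suspicious.append((number, text))
    return plausible, suspicious
```

The plausible lines can then be written to a new input file, and the suspicious ones reviewed together with their original line numbers.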

Cannot run VEP when specifying gtf

I'm attempting to run VEP, revision 6b712700089dc71996cea9a7283877406482dd87, by specifying my own GTF annotation with the command:

perl vep.pl -i ../alignment/R636012.vcf -gtf pao1.gtf.gz -fasta ../references/pao1.fa

However, I am getting the following error at the command line:

Can't locate object method "new" via package "Bio::EnsEMBL::VEP::AnnotationSource::File::GTF" at /home/<username>/git-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm line 119.

For some reason it doesn't seem to be calling the new method in the BaseGXF class when trying to instantiate a GTF object.

Unable to parse HGVS notation

Hi,

Congratulations on the new VEP project on GitHub. I'm having some trouble running it though. I'd like to run a file in HGVS notation as input, but the parser is giving me an error:

$ perl /usr/local/ensembl-vep/vep.pl -i test_HGVS-annotation.txt

WARNING: Unable to parse HGVS notation 'AGT:c.803T>C'
Can't call method "dnadb" on an undefined value at /usr/local/ensembl-vep/Bio/EnsEMBL/Variation/DBSQL/VariationFeatureAdaptor.pm line 1790, <__ANONIO__> line 2.

This input file contains the example from the online VEP tool:

AGT:c.803T>C
9:g.22125504G>C
ENST00000003084:c.1431_1433delTTC
19:g.110747_110748insT
LRG_101t1:c.1019T>C

Before the run, I installed VEP by following the instructions on the VEP project page. My Perl version is 5.18.2, the OS is Ubuntu 14.04, and I installed the DBI package with:

sudo cpan
install DBI

I tested the installation with an example input, running it both online (which gives an error) and offline (which runs fine):

perl /usr/local/ensembl-vep/vep.pl -i /usr/local/ensembl-vep/examples/homo_sapiens_GRCh37.vcf -o ~/test2

WARNING: Chromosome 21 not found in annotation sources or synonyms on line 1

WARNING: Chromosome 21 not found in annotation sources or synonyms on line 2

WARNING: Chromosome 21 not found in annotation sources or synonyms on line 3

This error goes on until WARNING: Chromosome 22 not found in annotation sources or synonyms on line 173

When I run it offline, it seems to run successfully (I installed hg19 in the cache):

perl /usr/local/ensembl-vep/vep.pl -i /usr/local/ensembl-vep/examples/homo_sapiens_GRCh37.vcf -o ~/test4 --offline -dir /usr/local/vep-cache

Polyphen & SIFT not available

Hello,

I did see the posting relating to this at: #33
Unfortunately I wasn't able to get a solution to this problem from that issue page.

I'm working on accommodating the new VEP methods in my code, but have run into the same error message for both SIFT and PolyPhen:

variant_effect_predictor.pl failed: exit status: 256; output: ; error: 
-------------------- EXCEPTION --------------------
MSG: ERROR: PolyPhen not available

STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::check_sift_polyphen /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:110
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::new /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:78
STACK Bio::EnsEMBL::VEP::CacheDir::get_all_AnnotationSources /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:95
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:81
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:63
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::init /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:77
STACK Bio::EnsEMBL::VEP::Runner::run /Users/jcook/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:111
STACK toplevel /Users/jcook/ensembl-vep/vep:209
Date (localtime)    = Wed Apr  5 10:33:29 2017
Ensembl API version = 88
---------------------------------------------------

Is there a plugin or anything else needed to rectify this?

I installed using 'perl INSTALL.pl --NO_HTSLIB'

Thanks!

-cookersjs

Feature request: most severe in filter script

I know the --most_severe flag exists for the primary VEP run, but it would be nice if something similar existed for the filter script.

That is, for a given "uploaded variation" with multiple entries, report which of them is the most severe.
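The requested behaviour can be sketched outside VEP as a small post-processing step. The code below is a minimal illustration, assuming the output has already been reduced to (uploaded variation, consequence) pairs. The SEVERITY_ORDER list is a short, illustrative subset and its ordering is an assumption to be checked against the consequence table in the VEP documentation. The function name is hypothetical.

```python
# Illustrative subset of consequence terms, most severe first.
# The ordering is an assumption for this sketch; check it against the
# consequence table in the VEP documentation before relying on it.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "intergenic_variant",
]
RANK = {term: index for index, term in enumerate(SEVERITY_ORDER)}


def most_severe_per_variant(rows):
    """rows: iterable of (uploaded_variation, consequence) pairs.

    Returns {uploaded_variation: most severe consequence seen}.
    """
    best = {}
    for variant, consequence in rows:
        # Terms missing from the list rank after every known term.
        rank = RANK.get(consequence, len(SEVERITY_ORDER))
        if variant not in best or rank < best[variant][0]:
            best[variant] = (rank, consequence)
    return {variant: pair[1] for variant, pair in best.items()}
```

For example, a variant with intron_variant, missense_variant and synonymous_variant entries would be reported as missense_variant under this ordering.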

Commas in annotation fields (TREMBL)

Hello,

Just a quick note: I was parsing a VEP-annotated VCF today and ran into a comma in the TREMBL field. The VEP documentation states that commas are replaced with ampersands to preserve the VCF format, so I thought I'd point this out.

For example, for this variant:
1 1139603 rs200158513 C T 100 PASS

I have the following output for the TREMBL field: J3KT02,A8WFP4 for one of the transcripts.

Annotation was done with ensembl-vep (pre-release), here's additional info from the VCF header:
##VEP="v87" time="2017-03-10 13:08:10" cache="/opt/ensembl-vep/cache/homo_sapiens/87_GRCh37" ensembl=87.f547798 ensembl-io=87.48cb128 ensembl-funcgen=87.0577dd0 ensembl-variation=87.661e72c 1000genomes="phase3" COSMIC="78" ClinVar="201610" ESP="20141103" ExAC="0.3" HGMD-PUBLIC="20162" assembly="GRCh37.p13" dbSNP="147" gencode="GENCODE 19" genebuild="2011-04" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"

The dbNSFP plugin also produced some whitespace and commas; I am posting examples there.

I'm still rather new to working with VCFs. I saw in the VCF spec that commas are allowed in the INFO field as list separators (although I don't think this works for CSQ, because transcript entries are comma-separated?), so please let me know if this is expected behaviour. Parsing this VCF with htsjdk failed.

Thank you for your time.
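To illustrate why an unescaped comma is a problem for CSQ, here is a minimal sketch of how a typical parser splits the value: first on commas (one entry per transcript), then on pipes (one value per field). The three-field layout and the consequence term are simplified assumptions for the example; a real CSQ header lists many more fields.

```python
def split_csq(csq_value, field_names):
    """Split a CSQ value into per-transcript dicts; collect malformed entries."""
    entries, malformed = [], []
    for entry in csq_value.split(","):   # one entry per transcript
        values = entry.split("|")        # one value per declared field
        if len(values) == len(field_names):
            entries.append(dict(zip(field_names, values)))
        else:
            malformed.append(entry)
    return entries, malformed


# Simplified, hypothetical field layout for demonstration purposes.
FIELDS = ["Allele", "Consequence", "TREMBL"]

with_comma = split_csq("T|missense_variant|J3KT02,A8WFP4", FIELDS)
with_ampersand = split_csq("T|missense_variant|J3KT02&A8WFP4", FIELDS)
```

With the comma, the second accession is split off as a separate, malformed entry and the TREMBL value of the first entry is silently truncated. With the ampersand, the entry stays intact and both accessions remain in the TREMBL field.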

Combining --merged and --custom results in duplicated SOURCE column in --tab output

When using both flags (--merged and --custom) for regular or tabbed output the header displays duplicate lines:

## SOURCE : Source of transcript
## SOURCE : Source of transcript

For regular output, there's no duplicated SOURCE key in the Extra column.
But when the --tab flag is given, the output file contains duplicated SOURCE columns.

Want to Use only GRCh37

Hi Folks,

I want to use only GRCh37 to process my VCF file. Will --offline give the same output for the command I'm using (see below)? It is currently picking up the GRCh38 cache files. How can I switch it to GRCh37?

variant_effect_predictor.pl --dir_cache /reference/software/vep/ --fork 20 -i 39bc83a8-b270-488b-b236-fa0b8728c4d2.vcf --format vcf --vcf -o /mnt/lustre/working/genomeinfo/share/temp/venkatA/Analysis/VEP/39bc83a8-b270-488b-b236-fa0b8728c4d2 --symbol --dir_plugins /mnt/lustre/reference/software/vep/Plugins/ --plugin Downstream --plugin Wildtype --coding_only --buffer_size 100000

excessive memory usage during analysis

Hi,

We're trying to annotate an exome VCF using the latest version of VEP, but we're seeing very high memory usage.

The following command running on a single core was using 16GB of memory before crashing.
Any idea what could be wrong? The README states the new vep should be quite memory efficient.

vep \
  --vcf \
  -o GiaB-vep88.vcf \
  -i Giab.vcf.gz \
  --force_overwrite \
  --species homo_sapiens \
  --fasta ./Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
  --cache \
  --no_stats \
  --offline \
  --dir ./vep/ \
  --hgvs \
  --shift_hgvs 1 \
  --total_length \
  --no_stats \
  --symbol --numbers --biotype --total_length --canonical --gene_phenotype --ccds --uniprot --domains --regulatory \
  --protein --tsl --appris --af --max_af --af_1kg --af_esp --af_exac --pubmed --variant_class

Thanks!
M

Stable release version

Hi,

Is there a version of the new vep that can be considered as "stable"? I'm looking into creating a recipe for the bioconda repository, so it can be used/tested in production pipelines

Thanks
M

VCF header malformed - branch release87

Hi,

The header does not seem to be dereferenced properly, and the contig entries are incorrect:

##fileformat=VCFv4.2
##ALT=ARRAY(0x4e366e8)
##ALT=ARRAY(0x4e366e8)
##ALT=ARRAY(0x4e366e8)
##ALT=ARRAY(0x4e366e8)
##ALT=ARRAY(0x4e366e8)
##FILTER=ARRAY(0x53baf98)
##FILTER=ARRAY(0x53baf98)
##FILTER=ARRAY(0x53baf98)
##FILTER=ARRAY(0x53baf98)
##FILTER=ARRAY(0x53baf98)
##FILTER=ARRAY(0x53baf98)
##FORMAT=ARRAY(0x334bbe8)
##FORMAT=ARRAY(0x334bbe8)
##FORMAT=ARRAY(0x334bbe8)
##FORMAT=ARRAY(0x334bbe8)
##FORMAT=ARRAY(0x334bbe8)
##FORMAT=ARRAY(0x334bbe8)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##INFO=ARRAY(0x334bbd0)
##bcftools_annotateCommand=annotate --header-lines references/vcfanno_headerLines.v1.0.txt /scratch/718358/118_sorted_md_brecal_comb_SV_vt_filt_genmod_filter_vcfanno.vcf
##bcftools_annotateVersion=1.3.1+htslib-1.3.1
##bcftools_concatCommand=concat -a /scratch/718358/118_pmanta.vcf.gz
##bcftools_concatVersion=1.3.1+htslib-1.3.1
##bcftools_viewCommand=view -f PASS /scratch/718358/118_sorted_md_brecal_comb_SV_vt.vcf
##bcftools_viewVersion=1.3.1+htslib-1.3.1
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##contig=<ID=hs37d5,length=35477943>
##fileDate=20161211
##reference=file:references/Homo_sapiens.GRCh37.d5.fasta
##source=GenerateSVCandidates 1.0.0
##VEP="v87" time="2016-12-13 08:41:39" cache="ensembl-tools-release-87/cache/homo_sapiens/87_GRCh37" ensembl-variation=87.f4bd0ac ensembl-funcgen=87.1514451 ensembl=87.f547798 ensembl-io=87.9a66f34 COSMIC="71" ClinVar="201507" ESP="20141103" HGMD-PUBLIC="20152" assembly="GRCh37.p13" dbSNP="144" gencode="GENCODE 19" genebuild="2011-04" polyphen="2.2.2" regbuild="13" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|LoFtool|LoF|LoF_filter|LoF_flags|LoF_info">
##LoFtool=LoFtool score for gene
##LoF=Loss-of-function annotation (HC = High Confidence; LC = Low Confidence)
##LoF_filter=Reason for LoF not being HC
##LoF_flags=Possible warning flags for LoF
##LoF_info=Info used for LoF annotation

Problems with mitochondrial variants annotation

Hi
I am using the latest version (v88) to annotate mitochondrial variants and have a question. When I use 'chrM', it does not work and reports 'WARNING: Chromosome chrM not found in annotation sources or synonyms on line 1'. After I changed the name to 'M', it worked. Is this a bug?
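As a stopgap, the chromosome name can be rewritten in the input before running VEP, which automates the manual change described above. This is a minimal sketch, assuming a tab-delimited VCF. The function name and the mapping are illustrative, and ##contig header lines are deliberately left untouched.

```python
def rename_chromosomes(vcf_lines, mapping):
    """Yield VCF lines with the CHROM column renamed according to mapping."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines (including ##contig) are passed through unchanged.
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        yield mapping.get(chrom, chrom) + sep + rest
```

For example, rename_chromosomes(lines, {"chrM": "M"}) rewrites only records on chrM and leaves every other record as it was.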

possible bug: slowdown and crashing when using uncompressed reference.

Hi Will,

We're experiencing the following:
using the command:

 unset PERL5LIB && \
vep \
--vcf -o stdout \
-i GiaB.vcf.gz \
--fork 8 \
--species homo_sapiens \
--no_stats \
--cache \
--merged \
--offline \
--dir /genomes/Hsapiens/hg38/vep \
--symbol --numbers --biotype --total_length --canonical --gene_phenotype --ccds --uniprot \
--domains --regulatory --protein --tsl --appris --af --max_af --af_1kg --af_esp --af_exac \
--pubmed --variant_class --sift b --polyphen b \
--plugin LoF,human_ancestor_fa:false \
--plugin MaxEntScan,/share/maxentscan-0_2004.04.21-0 \
--plugin GeneSplicer,/share/genesplicer-1.0-1/genesplicer,/genomes/Hsapiens/hg38/variation/genesplicer \
--hgvs --shift_hgvs 1 --fasta /genomes/Hsapiens/hg38/seq/hg38.fa \
| sed '/^#/! s/;;/;/g' \
| bgzip -c > GiaB-vepeffects.vcf.gz

I'm getting a gzip: stdout: Broken pipe error, which doesn't really do anything, but just looks ugly.
What's worse is that this command takes forever to run (8h+) and then crashes. The issue is solved by removing the --hgvs --shift_hgvs 1 --fasta /genomes/Hsapiens/hg38/seq/hg38.fa flags, but then we are left without some crucial annotations.
Since our vep installation has some HTSLIB issues, we're not able to test this with the default bgzipped fasta file.
I suspect that the issue stems from the indexing of the FASTA file. Instead of the usual hg38.fa.index, I'm getting hg38.fa.index.dir and hg38.fa.index.pag.

Another (smaller) annoyance is that neither GeneSplicer nor MaxEntScan works without specifying a FASTA file using the --fasta flag.

Any ideas on this?

Cheers
M

Bioconda recipe for ensembl vep

Hi Will,

I've taken the liberty of creating a conda recipe for vep. I (and others) have been using this to easily install vep.

I was wondering if you're interested in checking out the recipe and providing some comments and suggestions to make it better. It would also be very helpful if you could help keep this recipe up to date with the latest release of vep.

I'd love to hear your thoughts on this.
Thanks
Matthias

Missing declaration of $NO_BIOPERL variable.

Running the install script gives the following error:

$ perl INSTALL.pl 
Global symbol "$NO_BIOPERL" requires explicit package name (did you forget to declare "my $NO_BIOPERL"?) at INSTALL.pl line 146.
Global symbol "$NO_BIOPERL" requires explicit package name (did you forget to declare "my $NO_BIOPERL"?) at INSTALL.pl line 399.
Execution of INSTALL.pl aborted due to compilation errors.

Will fork, fix and submit PR.

Custom annotation with VCF files

Good morning,

I use the custom annotation option to annotate my newly found variants with information from our in-house variant database (allele frequencies etc.). To do this, I export all the variants from our in-house database to a VCF file that looks like this:

#CHROM POS ID REF ALT
chr1 12670 0.0151515151515152 G C
chr1 13061 0.0151515151515152 G C
chr1 13091 0.0151515151515152 G A
chr1 13273 0.166666666666667 G C
chr1 13302 0.196969696969697 C T

If I now have one position with two or more different variations, for example

chr1 160009163 0.0151515151515152 GACACACACACACACACAC G
chr1 160009163 0.0151515151515152 GACACACACACACACAC G
chr1 160009163 0.0151515151515152 G GAC

In that case, Variant Effect Predictor does not annotate a variant found at this position. Can I somehow fix this?

Thanks
Stefan

WARNING: Chromosome 17 not found in annotation sources or synonyms on line 167982

Hello,

I followed the manual and did the following steps:

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
perl vep.pl -i input.vcf -gff data.gff.gz -fasta genome.fa.gz

I got a long list of warnings like the following:

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591383

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591384

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591385

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591386

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591387

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591388

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591389

WARNING: Chromosome 11 not found in annotation sources or synonyms on line 591390

Did I do something wrong? I used a gff3 annotation file.

Thanking you!
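One possible cause of warnings like these is a mismatch between the chromosome names in the VCF and those in the annotation file (for example 11 versus chr11); the post does not say whether that is the case here. A minimal sketch for comparing the names, assuming both files are tab-delimited and already decompressed into lines, is shown below. The function names are illustrative.

```python
def sequence_names(lines):
    """Collect the first tab-separated column of every non-comment line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_annotation(vcf_lines, gff_lines):
    """Chromosome names used in the VCF that never occur in the GFF."""
    return sorted(sequence_names(vcf_lines) - sequence_names(gff_lines))
```

An empty result suggests the names agree and the cause lies elsewhere; a non-empty result lists the names that need to be reconciled.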

Real genomic position of the calculated HGVSc of an ambiguous indel

Hello

Using the latest GitHub pull

I have the following ambiguous indel (GRCh37)
... from samtools/bcftools
9 740769 . CTTTTTTTTTTTTTTTTTT CTTTTTTTTTTTT,CTTTTTTTTTTT

... and from freebayes
9 740769 . CTTTTTTTTTTTTTTTTTTA CTTTTTTTTTTTTA

Because I'd also like to parse the other information in the original VCF, I parse the VEP output in VCF format.
The TAB format doesn't contain the initial data, and the "#Uploaded_variation" is always a dot.
After 3' shifting of the ambiguous indel, the calculated location no longer reflects the location in the initial VCF. HGVS_OFFSET is always null.

After the VEP run, the "shifted" allele is
9:740770-740787 TTTTTTTTTTTT (samtools) and
9:740770-740788 TTTTTTTTTTTTA (freebayes)
HGVS_OFFSET: null

... but HGVSc is for both the same: ENST00000382286.1:c.311-10_311-5delTTTTTT

As you see, there are now only 6 T's involved instead of the shown allele (12 T's).

Could you please also report the real genomic position of the calculated HGVSc?
That would be the actual position of the variation to be saved into a database.

Thank you very much in advance
Regards
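The reason both calls end up with the same six-base deletion can be shown by trimming the bases that REF and ALT share. The sketch below is a generic allele-trimming routine, not VEP's implementation. The alleles are rebuilt from the lengths implied by the coordinates in the post (C followed by 18 T's, reduced to C followed by 12 T's), so treat the exact counts as an assumption.

```python
def trim_shared_bases(ref, alt):
    """Strip the shared suffix, then the shared prefix, of two alleles."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


# Alleles rebuilt from the lengths implied by the coordinates in the post;
# the exact T counts are an assumption.
samtools = trim_shared_bases("C" + "T" * 18, "C" + "T" * 12)
freebayes = trim_shared_bases("C" + "T" * 18 + "A", "C" + "T" * 12 + "A")
```

Both results reduce to a deleted TTTTTT with an empty alternative allele. Within a run of identical bases the deletion can be placed at any offset, which is why the minimal HGVS description covers fewer bases than the alleles shown in the VCF.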
