
vapid's Introduction

VAPiD

Viral Annotation Pipeline and iDentification

VAPiD is an ultra-lightweight script for quickly annotating human virus sequences and preparing them for submission to NCBI GenBank. VAPiD takes a fasta file of human viruses, an NCBI-generated author list, and sample metadata, and exports Sequin files that are ready to submit to NCBI as well as human-readable annotations that can be used for other analyses.

If you use this program for research (or something else where you can put citations) I'd love a citation!

BMC Bioinformatics Manuscript!

VAPiD can perform three different types of viral annotation:

  1. Rapid search of a local compressed viral database for any set of viruses with a reference genome (recommended)
  2. Annotation of a single viral species based on a preferred reference genome using a single Genbank accession number.
  3. Comprehensive web NT database search for all viral sequences (much slower)

VAPiD is currently tested and working on Windows 10, Ubuntu 10.4, and Mac OS X.

Viruses that VAPiD has been tested with:
2019-nCoV is tested and working as of 2/23/20. However, the prebuilt local BLAST databases are from late 2018 and so do not contain any references for this new virus. I recommend either using '--r NC_045512.2' or downloading novel sequences and adding them to your own local BLAST database. The releases page contains the fasta sequences used to generate the BLAST databases, so simply append the new sequences and rebuild the BLAST database.
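The append-and-rebuild step could be sketched in Python as follows. This is a minimal sketch, not part of VAPiD: it assumes BLAST+'s makeblastdb is on your path, and the helper names (ends_with_newline, append_fasta, makeblastdb_command, rebuild) are made up for illustration. By default makeblastdb names the database after its input file, so no output name is passed.

```python
import os
import shutil
import subprocess


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return True
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def append_fasta(db_fasta, new_fasta):
    """Append the records in new_fasta to the end of db_fasta."""
    # Avoid gluing the first new header onto the last sequence line.
    prefix = "" if ends_with_newline(db_fasta) else "\n"
    with open(new_fasta) as src:
        text = src.read()
    if text and not text.endswith("\n"):
        text += "\n"
    with open(db_fasta, "a") as dst:
        dst.write(prefix + text)


def makeblastdb_command(db_fasta):
    """Build the makeblastdb call for a nucleotide database."""
    return ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl"]


def rebuild(db_fasta, new_fasta):
    """Append new sequences, then rebuild the BLAST database in place."""
    append_fasta(db_fasta, new_fasta)
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb was not found on PATH")
    subprocess.run(makeblastdb_command(db_fasta), check=True)
```

Calling rebuild("all_virus.fasta", "new_sequences.fasta") would then update the fasta and regenerate the database files next to it; both file names here are placeholders.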

RSV, Parainfluenzas, Metapneumovirus, Coronaviruses, Enterovirus/Rhinoviruses, Hepatitis A-E, Nipah, Sendai, Measles, Mumps, Rubella, Ebola, West Nile Virus, HTLV, HIV (The references for HIV can be a little variable), Norovirus, JC, BK, HPV. However, any non-segmented virus that has been previously deposited on GenBank or for which you possess a .gbf file should work. If you would like to use VAPiD for a virus not listed, just go ahead and try running it - if you get bad annotations or errors, then send an email to [email protected] and I'll add support for your favorite virus.

VAPiD currently does not support segmented viral genomes, although theoretically one could run individual segments one at a time.

Your FASTA sequence names shouldn't be more than 23 characters. Longer names won't crash the program, but your output .gbf files will be corrupted. VAPiD will print a warning when this happens so you can edit the names and resubmit.
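If you'd like to catch long names before running anything, a check along these lines may help. It is a minimal sketch that assumes the 23-character limit applies to the whole header line after the >; the function name long_fasta_names is hypothetical and not part of VAPiD.

```python
MAX_NAME_LENGTH = 23  # limit stated in the paragraph above


def long_fasta_names(fasta_lines, limit=MAX_NAME_LENGTH):
    """Return the header names that are longer than the limit."""
    too_long = []
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].strip()
            if len(name) > limit:
                too_long.append(name)
    return too_long


if __name__ == "__main__":
    example = [">SC12309", "ACGT", ">a_name_that_is_longer_than_allowed", "ACGT"]
    print(long_fasta_names(example))
```

Any iterable of lines works, so an open file handle can be passed in directly.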

As of VAPiD v1.3 support for custom virus names has been added and restrictions on slashes have been removed. For more information see the section below titled "Custom Names".

Quickstart Installation Guide

  1. Ensure you have python with numpy and biopython, mafft, and blast+ installed locally and on your path.
  2. Download and install VAPiD
  3. Download VAPiD Viral Database and put it in the VAPiD folder
  4. Download and install tbl2asn; this needs to be put on your path.
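A quick way to confirm the pieces above are visible from your shell is sketched below. It assumes the executables are named mafft, blastn, and tbl2asn and that biopython is importable as Bio; the function names are made up for this example, and on Windows the bundled MAFFT may not show up under this name.

```python
import importlib.util
import shutil

REQUIRED_TOOLS = ("mafft", "blastn", "tbl2asn")  # assumed executable names
REQUIRED_MODULES = ("numpy", "Bio")  # biopython is imported as Bio


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the Python modules that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing executables:", missing_tools())
    print("Missing Python modules:", missing_modules())
```

Two empty lists mean everything was found; anything listed still needs to be installed or added to your path.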

Quickstart Run Guide

  1. Put your viral genomes in a single fasta file, preferably with the strain name for each sequence as the sequence header.
  2. Make an author metadata submission template file for your submission at https://submit.ncbi.nlm.nih.gov/genbank/template/submission/
  3. Run VAPiD. vapid --db all_virus.fasta <your_virus.fasta> <your_author_metadata.sbt>
  4. VAPiD will prompt you for sample collection dates, collection locations, and coverages for each sequence in your fasta file.
  5. Email .sqn files for each virus to [email protected]

Installation

Mac or Linux

  1. Install all dependencies (Shout-out to the wonderful people who wrote these!)

Python - tested almost exclusively on Python 2.7.14. Python 3 is now supported: follow these exact steps, but install the Python packages for Python 3 rather than 2, then at runtime simply execute vapid3.py (python3 vapid3.py etc.). https://www.python.org/downloads/

If you're running Python < 2.7.4 follow this guide to install pip (You may need administrator privileges.) https://pip.pypa.io/en/stable/installing/

Download the get-pip.py and run it from the command line by typing python get-pip.py

If you're using Python >= 2.7.4 or you've successfully installed pip: Open a command line and type

pip install numpy

pip install biopython

Now download and unzip this repository to your computer (anywhere will work)

Then you'll need to install tbl2asn and put it on your path. (Another option is to download tbl2asn and simply unpack it directly into the folder you've unpacked this repository into.) tbl2asn can be found at https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/

Windows

  1. Install Python 2.7.14 https://www.python.org/downloads/ Make sure that during installation, on the third step 'Customize your python installation', the box that says to put Python on your path is checked. This makes installing and running this annotator much easier. (For best results run the installer program as administrator.) If you don't see this option during Python installation, then after it's fully installed open a command prompt window and type:

set PATH=%PATH%;C:\python27\

or change C:\python27\ to wherever Python was installed (but C:\python27\ is the default install location).

  2. Install Numpy and Biopython

python -m pip install numpy

python -m pip install biopython

  3. Then download and unzip this repository (which can be downloaded with the green link on the right at the top of the page) to your computer (anywhere will do)

  4. Download tbl2asn for Windows https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/ and unzip it, then copy and paste every single file in the tbl2asn folder (there should be about 7 .dlls and an .exe that is simply called tbl2asn) into the folder that you just made in step 3.

After the above has completed there's still a bit of setup that needs to be done.

  1. Download a reference database over on the release tab, (https://github.com/rcs333/VAPiD/releases/). Download the .nhr .nsq and .nin file and place them in the VAPiD folder. You can download all of them to see which one works best for your case. Pick a different reference using the --db flag! (If you don't specify a database VAPiD will try all_virus, then compressed, then refseq automatically)

  2. (Mac and Linux only) Install MAFFT using your favorite package manager ('brew install mafft' or 'sudo apt-get install mafft') or download and install it from https://mafft.cbrc.jp/alignment/software/ for your system. (THIS IS ALREADY DONE FOR WINDOWS and is included under the GPL licence)

  3. Create a .sbt file that contains your name and information about the organization under which you wish to submit your sequences to NCBI. This .sbt file can be generated by filling out the form here: https://submit.ncbi.nlm.nih.gov/genbank/template/submission/ (The example.sbt file is included so that you can verify your installation; please don't use it for actual submissions)

  4. Put the newly generated .sbt file onto your computer. (Generally it's easiest to just put it in the VAPiD folder). If you'll be submitting multiple sequences from different people you can generate more than one, put them all in this folder, and choose which one you want to use at run time. (Although you can only select one .sbt per fasta input)

  5. (Optional) You can generate a .csv file with most metadata that you wish to associate with your sequences. The file should have across the top Strain (the name of the fasta sequences that you'll import) followed by columns with NCBI approved metadata https://www.ncbi.nlm.nih.gov/Sequin/modifiers.html

An important note is that the strain names in the metadata csv MUST match the sequence names in your fasta files. (i.e. the part after the >)

An example metadata file has been provided under the name example_metadata.csv - this metadata file will work with the example fasta file provided (example.fasta).
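To check the strain-name requirement before a run, something like the following could be used. It is a minimal sketch, not VAPiD's own logic: it assumes the metadata sheet has a header row with a column called Strain (compared case-insensitively) and that each fasta name is the whole header line after the >. The function names are hypothetical.

```python
import csv


def fasta_names(fasta_path):
    """Return the sequence names (the text after '>') in file order."""
    with open(fasta_path) as handle:
        return [line[1:].strip() for line in handle if line.startswith(">")]


def metadata_strains(csv_path):
    """Return the set of values in the strain column of the metadata sheet."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = [cell.strip().lower() for cell in next(reader, [])]
        if "strain" not in header:
            raise ValueError("no strain column found in " + csv_path)
        index = header.index("strain")
        return {row[index].strip() for row in reader if len(row) > index}


def unmatched_names(fasta_path, csv_path):
    """Return fasta names that have no matching row in the metadata sheet."""
    known = metadata_strains(csv_path)
    return [name for name in fasta_names(fasta_path) if name not in known]
```

For example, unmatched_names("example.fasta", "example_metadata.csv") lists the sequences that have no row in the sheet, so an empty list means every sequence is covered.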

If you don't have very many sequences at a time or you include a fasta not in your metadata sheet the program will automatically prompt you for strain name, collection date, country of collection and coverage (for ngs reads). You must fill out strain name, collection date and location or NCBI will not accept your submission. Coverage is not necessary and if skipped during the automatic prompting, will not create any issues.

Usage - vapid.py

Create your fasta file with all of the sequences that you would like to annotate. You can have as many sequences as you want, and you should name the sequences in your fasta file what you would like the strain name to be. (For the provided fasta file, example.fasta, this name would be 'test'). FASTA STRAIN NAMES SHOULD NOT HAVE SPECIAL CHARACTERS IN THEM!! (Things like ' " ? - # * etc.) This is still true for the base name as of v1.3. See the custom names section below for more information on this.

Then you would need to run the vapid.py script from the command line. First cd to the directory that VAPiD is living in - if you cloned from GitHub it'll be ../VAPiD/

Detailed help can be generated by running vapid.py -h or without any arguments.

Example CD Command Just change the path here to wherever you ended up unzipping the VAPiD folder on your computer.

cd C:\User\Downloads\VAPiD

General Usage

python vapid.py fasta_file_path author_template_path --metadata_loc metadata_info_path

Optional arguments - choose one of these

--db custom_database_location --online --r reference_accession_num --f

Secondary optional argument - pick any you want

--no_spell_check --all

Example Usage (With metadata in the sheet)

python vapid.py example.fasta example.sbt --metadata_loc example_metadata.csv

Example Usage (No metadata sheet)

python vapid.py example.fasta example.sbt

Example Usage (With metadata sheet AND specifying the reference)

python vapid.py example.fasta example.sbt --metadata_loc example_metadata.csv --r KF530268.1

Example Usage (No metadata sheet, specifying the reference, and no spellchecking)

python vapid.py example.fasta example.sbt --r KY45632.1 --no_spell_check

Example Usage (with a custom blast database)

python vapid.py example.fasta example.sbt --metadata_loc example_metadata.csv --db /User/my/path/to/a/working/blastdb/

You need to have BLAST+ available on the system path and the specified directory needs to be set up correctly or the program will crash

Example Usage (online with no metadata)

python vapid.py example.fasta example.sbt --online

Example Usage (default reference database, spellchecking, and transferring 'gene' annotations as well as 'CDS' annotations)

python vapid.py example.fasta example.sbt --spell_check --all

python vapid.py -h prints out a list of arguments and some help information

The metadata_loc argument is optional and if you don't provide a metadata location or if your sheet does not contain metadata for some sequences the program will prompt you for it.

You can just put relative paths to your sbt and fasta file, I find it is easiest to store everything in the VAPiD folder - that way you don't have to worry about typing paths.

Output

The program will run for a bit and generate a folder for each sequence in your fasta file. The folder will be named the same as what you named the strain in the fasta file. (So if the first line of your fasta file is >SC12309 then there will be a folder called SC12309 with all of the .gbf, .tbl, etc. files that you need and love.)

Here's a picture of all the files that should be present if you run VAPiD on the included example.fasta and example.sbt.

Example Output

tbl2asn will also spit out some errors and warnings onto the command line.

The program itself will also examine each sequence record for stop codons and notify you of which ones contain stops.

Inside each folder will be some files. You can examine .gbf files either in a text editor or in something like Geneious. To submit your sequence to NCBI, simply email the .sqn file to [email protected] and shortly afterwards your sequences will be deposited on NCBI.

Custom Names

I have added support to submit viruses with free form full names. For example >Sample 3 (USA/Human/2016). Also if you are using the manual metadata input feature where it prompts you for the minimum required metadata VAPiD automatically appends (country/col_year) to your name. This is in an attempt to improve the quality of sequence being sent to NCBI so they like us more.

There are two different ways to submit with free form full names.

Add a full_name column to your metadata.csv file

metadata.csv line 1: strain,collection-date,country,coverage,full_name

metadata.csv line 2: test,2017,USA,42.5,test (USA/2016/A)

Simply add a column that is named EXACTLY 'full_name' and put the full name into the metadata.csv file. If you are using this option you still need to have your fasta headers match. So for the example above your fasta file would still need to start with just >test. This is so that I can find the record corresponding to 'test' in the metadata file. In this case you do not need to put a space in the full_name. i.e. test-usa/2016/a would work using this method.

OR

Have your FASTA header be anything you want and use --slashes at runtime

In this option you would have your fasta header be >test (USA/Human/2016), and you would add --slashes to your python command running vapid.py. When using this option, internally your samples are still represented by test. Right now it is required that there be a space between >test and anything with slashes in it. So when using this option your metadata.csv file would still have just test, and you could associate collection year or any other metadata with this sample.

In either of these methods your results will be placed into a folder that is named test. VAPiD will break if you don't have a space between your fasta header and anything with slashes.
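To make the naming rule concrete, here is a small illustration of how a header splits into the base name and the free-form full name at the first space. This is a sketch of the rule as described above, not VAPiD's actual code, and split_header is a hypothetical helper.

```python
def split_header(header):
    """Split a fasta header into (base_name, full_name) at the first space."""
    text = header.lstrip(">").strip()
    base_name = text.split(" ", 1)[0]
    return base_name, text


if __name__ == "__main__":
    base, full = split_header(">test (USA/Human/2016)")
    print(base)  # name used for the output folder and the metadata lookup
    print(full)  # free-form name
```

Without a space, the whole header (slashes included) ends up as the base name, which is why the space is required.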

I do NOT recommend batching submissions that mix these options. Also, if my solution to this problem has introduced any new problems or simply isn't appropriate for your case, please let me know and I can add support for your specific case.

Implementation Details and Important Notes

A large problem is inconsistent spelling in GenBank sequence records, or sequence records that do not have every protein annotated. The ESpell utility from NCBI is currently being used to check spelling on protein names. However, this can result in certain protein names losing capitalization (i.e. IIIa3 will get changed to iiia3).

vapid's Issues

How similar do inputs need to be?

Do you have a sense of how similar an input should be to the nearest available reference in NT?
Is it more like 99% ANI or 90% ANI?
I couldn't figure it out from the code or the paper. Sorry if I missed something.

Thanks,

Mike

adjust() does not work properly

Minimally reproducible example:

>>> import vapid
>>> d = vapid.build_num_arrays("a--g-", "accgt")
>>> vapid.adjust(2, d[0], d[1], "accgt")
'-1'

Please help, I have problems

Hi everyone,
Hope you'll help me. I uploaded VAPiD and all necessary modules. Unfortunately, I couldn't start.

Here, the result with my own files.
PS C:\Users\shank\PycharmProjects\BioPython_Tree\VAPiD> python vapid.py C:\Users\shank\PycharmProjects\BioPython_Tree\BioPython_Tree\data\Measles_FORT_L16.fasta ### fasta file
C:\Users\shank\PycharmProjects\BioPython_Tree\BioPython_Tree\data\Measles_FORT-L16.sbt ### sbt file
--metadata_loc C:\Users\shank\PycharmProjects\BioPython_Tree\BioPython_Tree\data\Measles_FORT_L16.csv ### metadata file
--online ### because local DB doesn't work

metadata not found in provided .csv or .csv not created - time for minimal manual entry for sequence - Measles_FORT_L16
Traceback (most recent call last):
File "C:\Users\shank\PycharmProjects\BioPython_Tree\VAPiD\vapid.py", line 958, in <module>
metadata, coverage, full_name_list[x] = do_meta_data(virus_strain_list[x], True, full_name_list[x])
File "C:\Users\shank\PycharmProjects\BioPython_Tree\VAPiD\vapid.py", line 863, in do_meta_data
col = ' [collection-date=' + raw_input('Enter collection date in the format (23-Mar-2005, Mar-2005, or 2005): ').strip() + ']'
NameError: name 'raw_input' is not defined

Why there is no metadata in my csv file? Somebody, say something!
Regards, Andrei
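The NameError above is consistent with running the Python 2 script vapid.py under Python 3, where raw_input was renamed to input; the README points Python 3 users to vapid3.py instead. For a script that has to run under both versions, a common shim looks like the sketch below. This is generic Python, not taken from VAPiD, and read_line and prompt_collection_date are names made up for this example.

```python
try:
    read_line = raw_input  # Python 2 name
except NameError:
    read_line = input  # Python 3 renamed raw_input to input


def prompt_collection_date(reader=read_line):
    """Ask for a collection date and wrap it as a source modifier string."""
    answer = reader("Enter collection date in the format (23-Mar-2005, Mar-2005, or 2005): ")
    return " [collection-date=" + answer.strip() + "]"
```

The reader argument is only there so the prompt can be exercised without typing, for example prompt_collection_date(lambda prompt: "2005").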

Issues with final genbank-format file

Hi,

Thanks for the interesting tool. I've tried it out on a de novo assembly of an RSV isolate, but the resulting genbank-format file is incomplete with some minor errors.

Firstly, the file is missing the 'gene' features necessary for a genbank file, as well as some of the qualifiers for the CDS region ('/gene=...', '/product=...'):

image

Compare to the selected reference:

image

Secondly, my input sequence was reverse complemented with respect to the reference. In this circumstance, the genbank file should state that the feature occurs on the complementary strand:

image

If you are still working on this project and have time, could these issues please be fixed?

Cheers,
Charles

facing issue with custom database

d3w@d3w:~/VAPiD$ python vapid3.py reference.fasta template.sbt --db blastdb/dbase
metadata not found in provided .csv or .csv not created - time for minimal manual entry for sequence - NC_045512.2
Enter collection date in the format (23-Mar-2005, Mar-2005, or 2005): 2019
Enter country sample was collected in (example - USA): china
Enter strain name - if unknown just put NC_045512.2: NC_045512.2
Enter coverage as a number (example 42.3), if unknown just leave this blank and hit enter:
Searching local blast database at blastdb/dbase
Traceback (most recent call last):
File "vapid3.py", line 965, in <module>
meta_list[x], coverage_list[x], sbt_file_loc, full_name_list[x],nuc_acid_type)
File "vapid3.py", line 630, in annotate_a_virus
name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
File "vapid3.py", line 106, in blast_n_stuff
ref_seq_gb = line.split('|')[3]
IndexError: list index out of range

I created a database only with sars-cov-2 reference sequence and tried to annotate. But these errors are occurring.

However, when I use --r flag with accession number, it works. what should I do? I need to work with that reference sequence locally due to bandwidth limitation. please help

Spliced genes + ncRNAs

Hi Ryan,

Great tool! I've been using it for a couple of side projects and seems to work great (python3 version). However, now I want to submit some human cytomegalovirus genomes (HCMV) and I found myself in a bit of a mess (correcting everything manually).

Apparently VAPiD has trouble transferring ncRNA and repeats, which makes the gene name/count (i.e. UL2, UL3, UL4, etc.) shift by some positions. Additionally, I've found that it performs rather poorly with genes with multiple CDS that require splicing.

Both cases are quite relevant for dsDNA viruses and I hope you'll consider an update of VAPiD to handle those cases better. If you need some examples to work with, I recommend you use any of the available HCMV genomes.

EDIT: I recommend to run it back to back with RATT (which now can be installed through Conda without much hassle) + gbd2tbl + tbl2asn

VAPID run error

Hi,
I have downloaded VAPiD-master folder and running VAPID with the following command, it shows following error. Please help me to figure out the error.

run command:
python /VAPiD-master/vapid.py --db all_virus.fasta rndm_assembly.fasta template.sbt
error:

metadata not found in provided .csv or .csv not created - time for minimal manual entry for sequence - 1
Traceback (most recent call last):
File "/VAPiD-master/vapid.py", line 958, in <module>
metadata, coverage, full_name_list[x] = do_meta_data(virus_strain_list[x], True, full_name_list[x])
File "/home/mmk6053/Bcov_new/rndm-reads/vapid_run/VAPiD-master/vapid.py", line 863, in do_meta_data
col = ' [collection-date=' + raw_input('Enter collection date in the format (23-Mar-2005, Mar-2005, or 2005): ').strip() + ']'
NameError: name 'raw_input' is not defined

VAPiD not processing final sequence (of 5) and no sequences yield .sqn files

Hi,
I have a set of 5 novel picorna viruses that I am trying to annotate for submission to GenBank. I am attaching my fasta, template and metadata files. (NOTE: I had to add .txt to the end of the template and metadata files to attach them here)

Here is the command I used:
python3 vapid3.py /mnt/pathogen1/kathiem/simian_picornaviruses/novel_Picornaviridae.fasta /mnt/pathogen1/kathiem/simian_picornaviruses/template.sbt --metadata_loc /mnt/pathogen1/kathiem/simian_picornaviruses/novel_Picornaviridae_metadata.csv --spell_check --all

When the process starts, I get this message (5 times):
"Automatic strain naming failed but submission will proceed without metadata appended to the fasta header."

For the first 4 sequences, all finish with this message:
"/bin/sh: tbl2asn: command not found
Done with: NOLA1"

These are the files present in the NOLA1 folder:
assembly.cmt
NOLA1.ali
NOLA1_aligner.fasta
NOLA1.blastresults
NOLA1.fasta
NOLA1.fsa
NOLA1_ref.fasta
NOLA1_ref.gbk
NOLA1.tbl

The final sequence crashes with this message:
"Searching local blast database at all_virus.fasta
AF326754.2 was the selected reference
Simian enterovirus SV19 was the parsed name of the virus
Aligning reference and query...
Done alignment
Traceback (most recent call last):
File "vapid3.py", line 965, in <module>
meta_list[x], coverage_list[x], sbt_file_loc, full_name_list[x],nuc_acid_type)
File "vapid3.py", line 643, in annotate_a_virus
gene_loc_list, gene_product_list, all_loc_list, all_product_list, name_of_the_feature_list = pull_correct_annotations(strain, our_seq, ref_seq, genome)
File "vapid3.py", line 385, in pull_correct_annotations
all_loc_list[entry][y] = adjust(int(all_loc_list[entry][y]), our_seq_num_array, ref_seq_num_array, genome)
File "vapid3.py", line 300, in adjust
if our_num_array[given_num] == '-1':
IndexError: list index out of range"

Can you please tell me where my error lies? I have already replaced \r\n with \n in my files.
Your advice would be greatly appreciated,

Kathie
novel_Picornaviridae_metadata.csv

novel_Picornaviridae.fasta.txt
template.sbt.txt

running VAPiD throws IndexError: list index out of range - v2

Hi,

Thanks for creating vapid. I'm having a similar problem to issue #14 . However it's on a different line of code:

Searching local blast database at ref_seq_vir
Traceback (most recent call last):
  File "../VAPiD/vapid3.py", line 969, in <module>
    strain2species[virus_strain_list[x]] = annotate_a_virus(virus_strain_list[x], virus_genome_list[x],
  File ".../VAPiD/vapid3.py", line 635, in annotate_a_virus
    name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
  File "../VAPiD/vapid3.py", line 182, in blast_n_stuff
    ref_seq_gb = line.split('|')[3]

I am using the default ref_seq_vir database. Command I used:

python vapid3.py my.fasta example.sbt --metadata_loc meta.csv

I have confirmed all deps are installed (blast, mafft) and running/in path. What's interesting is this line:

for line in open(strain + SLASH + strain + '.blastresults'):
    ref_seq_gb = line.split('|')[3]

This is the output .blastresults content:

yfv	NC_002031.1	99.880	10861	13	0	2	10862	2	10862	0.0	19994

Reading the code, the blast output is in -outfmt 6 which is tab separated. The code above parses the blast output with | as a separator but the file is actually tab delimited. Another odd thing is that the code extracts the fourth element. I have to confess confusion here, why is the above code working for others?

Anyway I changed the offending line(s) to this:

ref_seq_gb = line.split('\t')[1]

there are two instances where the blast file is parsed.
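One possible explanation, offered as an assumption rather than something confirmed from the VAPiD source: databases built from older NCBI-style headers report subject IDs such as gi|12345|gb|KF530268.1|, where splitting the line on | and taking index 3 does give the accession, while databases built from newer headers report the bare accession in the second tab-separated column. A parser that tolerates both forms could be sketched like this; subject_accession is a hypothetical helper, and the gi number in the usage example is made up.

```python
def subject_accession(blast_line):
    """Extract the subject accession from one line of -outfmt 6 output."""
    fields = blast_line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("not a tab-separated BLAST line: " + repr(blast_line))
    subject = fields[1]  # second column is the subject sequence id
    if "|" not in subject:
        return subject  # bare accession, e.g. NC_002031.1
    parts = subject.split("|")
    if parts[0] == "gi" and len(parts) >= 4:
        return parts[3]  # gi|<number>|<db>|<accession>|
    return parts[1]  # <db>|<accession>|


if __name__ == "__main__":
    new_style = "yfv\tNC_002031.1\t99.880\t10861\t13\t0\t2\t10862\t2\t10862\t0.0\t19994"
    old_style = "query\tgi|12345|gb|KF530268.1|\t99.0\t100\t1\t0\t1\t100\t1\t100\t0.0\t180"
    print(subject_accession(new_style))
    print(subject_accession(old_style))
```

Both example lines yield a plain accession, so the same code path would work with either kind of database.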

I am running on MacOS ARM based, the OS is detected correctly as Darwin. Python version 3.9. All deps installed via conda (including blast).

UnboundLocalError: local variable 'ref_seq_gb' referenced before assignment

Greetings, I keep running into an issue with ref_seq_gb. I'm working with potential viral regions, and using the All_virus blast db as is (No additional sequences added), have you ran into this particular issue before?

Searching local blast database at all_virus.fasta
Warning: [blastn] Examining 5 or more matches is recommended
Traceback (most recent call last):
File "vapid.py", line 965, in <module>
meta_list[x], coverage_list[x], sbt_file_loc, full_name_list[x],nuc_acid_type)
File "vapid.py", line 630, in annotate_a_virus
name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
File "vapid.py", line 185, in blast_n_stuff
record = Entrez.read(Entrez.esearch(db='nucleotide', term=ref_seq_gb))
UnboundLocalError: local variable 'ref_seq_gb' referenced before assignment
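Going by the parsing loop quoted in another issue (for line in open(... '.blastresults')), one plausible cause is a .blastresults file with no hits: the loop body never runs, so ref_seq_gb is never assigned. This is an inference, not something verified against the VAPiD source. A guard that makes the no-hit case explicit could be sketched as below; first_hit_line is a hypothetical helper.

```python
def first_hit_line(results_path):
    """Return the first non-blank line of a BLAST results file, or None."""
    with open(results_path) as handle:
        for line in handle:
            if line.strip():
                return line.rstrip("\n")
    return None
```

A caller can then test for None and report that no reference was found for that sequence, rather than failing later with an UnboundLocalError.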

UnboundLocalError: local variable 'name_of_virus' referenced before assignment for Insect virus

Hi sir, I am using Vapid3 to annotate the insect virus following the command

"python vapid3.py /media/shanmu/shanmu_data/Sanjay_Viral_whole_genome/denovo_with_reference_alignment_27th_Nov_2021/Final_for_paper/NNG1/NNG1_Picard_VirgenA/out_NNg1/ragtag_scaffold.fasta IIPR05_template.sbt --online --r AP010907.1"

I have checked the availability of the reference virus (AP010907.1) in .nhr and .nin files/databases but it was not available.

when I run the command, it throws an error like below.

AP010907.1 was the selected reference
Traceback (most recent call last):
File "vapid3.py", line 968, in <module>
strain2species[virus_strain_list[x]] = annotate_a_virus(virus_strain_list[x], virus_genome_list[x],
File "vapid3.py", line 634, in annotate_a_virus
name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
File "vapid3.py", line 210, in blast_n_stuff
print(name_of_virus + ' was the parsed name of the virus')
UnboundLocalError: local variable 'name_of_virus' referenced before assignment

what to do if the selected reference are not available in .nin and .nhr?

pls help me to solve the issue

running VAPiD throws IndexError: list index out of range

Hi Ryan,

I ran the command below
python3 vapid3.py WUSTL_000226.consensus.fa template.sbt --spell_check --all

But it throws this Error:

Searching local blast database at all_virus.fasta
Warning: [blastn] Examining 5 or more matches is recommended
KJ473815.1 was the selected reference
BtRs-BetaCoV/GX2013, was the parsed name of the virus
Aligning reference and query...
Traceback (most recent call last):
  File "vapid3.py", line 969, in <module>
    meta_list[x], coverage_list[x], sbt_file_loc, full_name_list[x],nuc_acid_type)
  File "vapid3.py", line 634, in annotate_a_virus
    name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
  File "vapid3.py", line 253, in blast_n_stuff
    if '_R_' in ali_list[1]:
IndexError: list index out of range

It would be great to receive assistance on this, thanks!
-Ana

Create config.yaml for easy conda install

Hi,

I would like to recommend you include a yaml file in the github directory to allow for easy installation of VAPiD. There's a number of dependencies, which I think conda can install all of them.

To install from the yaml file:

conda env create -f conda.yaml

contents of conda.yaml:

name: vapid
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.9
  - pip
  - biopython
  - numpy
  - tbl2asn
  - mafft
  - blast
