avp's Issues

Question about avp prepare output

Hi @GDKO,

it's me again, @bheimbu. I have a question concerning the output of avp prepare:

query name	donor	ingroup	AI	HGTindex	query hits number	AHS	outg_pct
229769_8676	Uniref90|UniRef90_A0A4R7CTD7:14:38.4:8.48e-75:250	Uniref90|UniRef90_E7CIQ5:1:45.4:4.62e-100:311	-58.17194306956131	-61.0	500	26007.388157305424	99
229769_3470	::::	Uniref90|UniRef90_A0A834V977:15:42.8:9.47e-248:726	-460.51701859880916	-726.0	500	-41372.008650311625	0
229769_4622	Uniref90|UniRef90_A0A2S9JLU1:1:49.8:1.58e-130:395	Uniref90|UniRef90_E7CIQ5:252:43.2:2.87e-103:321	62.766684693571875	74.0	500	42835.27786623576	99
229769_7930	Uniref90|UniRef90_L8GME8:7:50.8:5.29e-130:387	Uniref90|UniRef90_A0A8J2WLW8:39:37.2:4.09e-64:218	151.7133428617915	169.0	500	2411.325978706731	94
229769_4131	Uniref90|UniRef90_UPI00093AA699:24:63.7:1.35e-210:615	Uniref90|UniRef90_A0AA38IBE4:2:77.3:2.96e-260:730	-1.3500311979441904e-10	-115.0	500	140346.83844536747	94
229769_22899	Uniref90|UniRef90_A0A7C3TB98:3:41.8:1.05e-38:149	Uniref90|UniRef90_A0A9Q0N8B2:240:33.5:5.51e-18:90.5	47.70947631887631	58.5	500	2152.0361571608355	99

Why is 229769_3470 excluded from downstream analyses? It's actually a well-annotated contig including several GH2s etc., see below.
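
For reference, the AI values in this table are consistent with the Alienness-style formula AI = ln(best ingroup e-value + 1e-200) - ln(best donor e-value + 1e-200), with a missing hit counted as e-value 1. Below is a minimal sketch (not AvP's actual code) that reproduces the table values; a query with no donor hit (the "::::" field) gets a strongly negative AI, which would put it below an ai_cutoff of 0 and out of the downstream analyses.

import math

def alien_index(donor_evalue, ingroup_evalue, floor=1e-200):
    # Missing hits (the "::::" fields above) are treated as e-value 1.
    donor = donor_evalue if donor_evalue is not None else 1.0
    ingroup = ingroup_evalue if ingroup_evalue is not None else 1.0
    # Positive AI: the best donor (non-ingroup) hit is much better than the best ingroup hit.
    return math.log(ingroup + floor) - math.log(donor + floor)

print(alien_index(8.48e-75, 4.62e-100))   # ~ -58.17, matches 229769_8676
print(alien_index(None, 9.47e-248))       # ~ -460.52, matches 229769_3470 (no donor hit)
print(alien_index(1.58e-130, 2.87e-103))  # ~  62.77, matches 229769_4622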

[Screenshot from 2024-09-11 showing the annotated contig]

Cheers @bheimbu

ps: This also happens for other species I'm studying.

`avp evaluate` runs forever with no progress

Hi @GDKO,

Really nice work! However, as soon as I run avp evaluate, the program runs forever, seemingly without any progress or end. I am trying to find HGT events for 845 proteins, using Oribatida as EGP and Metazoa as Ingroup.

I also have two questions:

  1. What does the abbreviation EGP stand for?

  2. Actually, I want to know whether proteins/genes have been acquired from fungi and/or bacteria. Which Ingroup should I use to account for this -- or is Metazoa just right?

Cheers Bastian

AttributeError: 'str' object has no attribute 'append'

Hello, when I run prepare.py I get the following error. How can I solve it?
Traceback (most recent call last):
  File "/share/home/bgi_cwangzf/softare/AvP/depot/prepare.py", line 34, in <module>
    from depot.PetIO import check_programs, get_outdir, progress, open_file
  File "/share/home/bgi_cwangzf/miniconda3/lib/python3.10/site-packages/depot/__init__.py", line 25, in <module>
    from .storage import StorageWrapper
  File "/share/home/bgi_cwangzf/miniconda3/lib/python3.10/site-packages/depot/storage.py", line 13, in <module>
    libcloud.security.CA_CERTS_PATH.append(os.path.join(_base_path, 'cacert.pem'))
AttributeError: 'str' object has no attribute 'append'
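
The paths in this traceback suggest that Python is importing an unrelated PyPI package that is also called depot (a storage library built on libcloud, per the storage.py frame above) from site-packages, instead of AvP's own depot directory. A quick, hypothetical check of which module wins:

import depot
print(depot.__file__)
# If this prints .../site-packages/depot/__init__.py rather than .../AvP/depot/__init__.py,
# the third-party "depot" package is shadowing AvP's local one. Launching the pipeline via
# the avp entry script in the AvP directory (rather than running depot/prepare.py directly),
# or removing the conflicting package, should let the local depot package be found first.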

Error: invalid taxonomic rank:

I am currently having an issue generating a database using Diamond. I have downloaded uniref90 and the taxdump from NCBI's website and run through the setup by copying the commands from the installation tutorial.

When I attempt to make the database using Diamond, I get an "invalid taxonomic rank" error. I have re-created the error here with a smaller database that contains only one protein sequence, yet it produces the same error.

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
MAX_SHAPE_LEN=19 SEQ_MASK STRICT_BAND
Database input file: uniref90-small.fasta
Opening the database file... [0.001s]
Loading sequences... Sequences = 1, letters = 540, average length = 540
[0.002s]
Masking sequences... [0.002s]
Writing sequences... [0s]
Writing accessions... [0s]
Hashing sequences... [0s]
Loading sequences... [0s]
Writing trailer... [0s]
Accession parsing rules triggered for database seqids (use --no-parse-seqids to disable):
UniRef prefix 0
gi|xxx| prefix 0
xxx| prefix 1
|xxx suffix 0
.xxx suffix 0
:PDB= suffix 0
Loading taxonomy names... [0.743s]
Loaded taxonomy names for 2575973 taxon ids.
Loading taxonomy mapping file... [116.034s]
Joining accession mapping... [26.592s]
Writing taxon id list... [0.002s]

Accession parsing rules triggered for mapping file seqids (use --no-parse-seqids to disable):
UniRef prefix 0
gi|xxx| prefix 0
xxx| prefix 0
|xxx suffix 0
.xxx suffix 0
:PDB= suffix 0
Error: Invalid taxonomic rank:

I am unsure if it is a problem with the taxdump files, the un.taxids file, or something else. Any help in this matter would be most wonderful.
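
One possible cause (an assumption, not a diagnosis) is a mismatch between the taxdump and the installed diamond version: makedb parses the rank column of nodes.dmp, and a rank string it does not recognise can produce this message. A hypothetical sanity check that lists the distinct ranks in the taxdump:

# Assumes the standard NCBI nodes.dmp layout: fields separated by "\t|\t", rank in field 3.
ranks = set()
with open("nodes.dmp") as fh:
    for line in fh:
        fields = line.split("\t|\t")
        if len(fields) > 2:
            ranks.add(fields[2])
print(sorted(ranks))

If unfamiliar rank names show up, updating diamond (or using a taxdump release that matches the diamond version) may be worth trying.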

how is the tree made after classification

Dear @GDKO ,

Thank you for the great tool. I suppose the tree I should be looking at is the one in the classification output directory with the extension .nexus. My question is: how is the tree made? Is it built from a multiple sequence alignment of the genes identified by BLAST and included in the HGT calculation? For example, if different bacterial strains of the same species appear on the tree, is that because their % identities to the query obtained from BLAST differ?
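
For context, the logs and config shown elsewhere on this page suggest each group's tree comes from a multiple sequence alignment of the query plus its selected BLAST/DIAMOND hits (mafft, optionally trimal), followed by FastTree or IQ-TREE. A rough sketch of those two steps, with hypothetical file names (the exact options AvP passes may differ):

import subprocess

group_fasta = "group_0001.fa"    # query protein plus the hits extracted for its group
aln_path = "group_0001.aln"
tree_path = "group_0001.tree"

# Align with mafft (config elsewhere on this page shows mafft_options: '--anysymbol --auto').
with open(aln_path, "w") as out:
    subprocess.run(["mafft", "--anysymbol", "--auto", group_fasta], stdout=out, check=True)

# Infer the gene tree with FastTree (the binary may be named FastTree on some systems).
with open(tree_path, "w") as out:
    subprocess.run(["fasttree", aln_path], stdout=out, check=True)

So the tree topology reflects the phylogeny inferred from the alignment, not the raw BLAST % identity values directly; several strains of the same species can appear as separate leaves simply because each kept hit is its own sequence.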

calculate ai output file empty

Hello,

Thank you for such a nice program. I am preparing the input files and tried to generate the ai.out file from my BLAST searches and the groups.yaml; however, the output file is essentially empty (no donor or ingroup hits). I get something like this:

query name donor ingroup AI HGTindex query hits number AHS outg_pct
gene_1 :::: :::: 0.0 0 500 0 0
gene_2 :::: :::: 0.0 0 115 0 0
gene_3 :::: :::: 0.0 0 608 0 0
gene_4 :::: :::: 0.0 0 511 0 0
gene_5 :::: :::: 0.0 0 43 0 0

Do you know what might be wrong?

Thanks

UnboundLocalError: local variable 'query_hit_id' referenced before assignment

Hello, I am writing to report an error.

avp prepare -a ACAB.blast_ai.out -o . -f ACAB_host_genome.faa -b ACAB.blast -x group.yaml -c config.yaml
[+] Setting up
[!] Selected 1072 HGT candidates
[+] Parsing Blast file and grouping similar queries
Traceback (most recent call last):
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/avp", line 6, in <module>
    main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/interface.py", line 29, in main
    prepare.main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/prepare.py", line 171, in main
    if "@" in query_hit_id:
UnboundLocalError: local variable 'query_hit_id' referenced before assignment

Sincerely,

Cong
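
For context, this is the classic Python failure mode where a variable is only assigned inside a conditional that never fires; an illustration of the mechanism (not AvP's actual prepare.py code):

def find_query_hit_id(lines):
    for line in lines:
        if "@" in line:           # hypothetical condition that assigns the name
            query_hit_id = line
    return query_hit_id           # UnboundLocalError if no line ever matched

def find_query_hit_id_safe(lines):
    query_hit_id = None           # initialise first, then handle the "no match" case
    for line in lines:
        if "@" in line:
            query_hit_id = line
    if query_hit_id is None:
        raise ValueError("no matching hit found for this query")
    return query_hit_id

In other words, the exception suggests prepare.py reached line 171 without the expected hit line ever being parsed for that query, e.g. a BLAST file whose identifiers do not match what the code expects.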

taxonomy_nexus.txt file empty

Hello,

Thanks for this tool. I am running the pipeline using blast and the nr database. The pipeline seems to work, but all my HGT candidates are classified as UNKNOWN. It seems there is an error in the avp prepare module, because the file taxonomy_nexus.txt is empty. Do you know what might be the issue?

Thanks

I don't want to exclude any species.

My target species is Saccharomyces cerevisiae, and I want to detect any horizontal gene transfer from other species, including other species in the genus Saccharomyces. I set groups.yaml to:

Ingroup:
  4932: S.cerevisiae

How should I fill in the EGP field? I don't want to exclude any species.

Clarification on HGT call

Hi,

I am interested in identifying HGT events in Dinophyceae, specifically non-eukaryotic gene transfers in Dinophyceae.

So, I set up groups.yaml as:


Ingroup:
  2759: Eukaryota
EGP:
  2864: Dinophyceae

While looking at the putative HGTs identified by AvP, I wasn't sure how AvP tagged a protein as HGT. I have included a screenshot of the tree generated by AvP.
[Attached image: AvP tree for NR gp4504]

As you can see in the tree, the query protein sequence "50_DN47782_c0_g1_i1" is in a clade with other Dinophyceae members, and the sister taxon to that clade is another eukaryote (YP_009033831), a streptophyte alga. All other members are bacterial species. So, since a homolog of that gene is already present in another eukaryote, how can it be an HGT specific to Dinophyceae? I am a little confused here.

Best regards,
Bikash

Evaluating intrakingdom HGT and adding lower taxonomic ranks to classification.txt

Hello,

Thanks again for your great software. I am wondering whether it is recommended for evaluating intrakingdom HGT events. For example, can I determine if there has been HGT between distantly related fungi? Is it sufficient to specify my fungal order or class as the ingroup rather than all of Fungi? I did try this and it seems to work OK... just wondering if you have any tips on this approach. Would it help to modify classification.txt?

Thanks,
Alden

Error running classify, IndexError: list index out of range

Hi, I am getting an error when running downstream analysis, classify.
I tried using both sample.classification_ingroup_Metazoa.txt and a custom classification file.
Evaluate works fine.

[+] Setting up
[+] Classifying Trees
Traceback (most recent call last):
  File "/home/y/yong-phua/AvP/avp", line 6, in <module>
    main()
  File "/home/y/yong-phua/AvP/depot/interface.py", line 35, in main
    classify.main()
  File "/home/y/yong-phua/AvP/depot/classify.py", line 108, in main
    final_nodes = classify_tree(tree[gene],gene+"@StudiedOrganism",full_ranks, complex_per_node)
  File "/home/y/yong-phua/AvP/depot/PetNexus.py", line 212, in classify_tree
    per_of_most_common_rank = (final_nodes.most_common(1)[0][1]/sum(final_nodes.values()))*100
IndexError: list index out of range

Cannot run program AvP - Uniref database in configuration file

Hello,

I have installed the required databases and programs, but I do not see any avp executable in the conda environment or in the subdirectories of the AvP folder. How does one then start the program (prepare mode)?

In addition, the config.yaml file only contains paths for the nr and swissprot databases, but I have used uniref50. How can I specify this in the config file? Does the nr database need to be set up if uniref50 was used?

Thanks

AvP AI index different from Alienness

Hi there,
Thank you for making this amazing tool to detect HGTs. I have a quick question about the AI index calculated by AvP. I'm looking for horizontally transferred genes in aphids. Potential donors of interest are bacteria, fungi, and viruses, so I set groups.yaml as follows: the ingroup is Metazoa, and I wanted to exclude all hits to arthropods.

---
Ingroup:
  33208: Metazoa

EGP:
  6656: Arthropoda

The resulting AI from AvP was 460.51701859880916, indicating it's likely a horizontally transferred gene.

Then I used the same input BLAST result and the same settings for Alienness as in AvP, i.e. the ingroup is Metazoa and all hits to arthropods are excluded. However, based on the Alienness results, the same gene is not an HGT.

Here are the parameters I used for Alienness.

Parameters
-Taxon group of interest: 33208-Metazoa;
-Taxon group(s) to exclude: 6656-Arthropoda;
-Taxon group(s) to classify: no taxa

I was wondering why the AI values from Alienness and AvP are different. Did I set up the AvP groups.yaml correctly?

Thank you for any suggestions!

YY

Error running avp detect: IndexError: list index out of range

Hi @GDKO,

it's me again, Bastian. I have new data and re-did all the genome annotations with MetaEuk. Now I'm encountering the following problem:

$ ../avp detect -i data/ACHCO@229769@1/mafftgroups/ -o data/ACHCO@229769@1 -g data/ACHCO@229769@1/groups.tsv -t data/ACHCO@229769@1/tmp/taxonomy_nexus.txt -c config.yaml
[+] Setting up
[!] Found 328 groups and 509 genes
[+] Reconstructing phylogenies with FastTree
[x] 	99%
[x] 	100%
[+] Analyzing fasttree results
Traceback (most recent call last):
  File "/scratch1/users/bheimbu/avp/latest/../avp", line 7, in <module>
    main()
  File "/scratch1/users/bheimbu/avp/depot/interface.py", line 32, in main
    detect.main()
  File "/scratch1/users/bheimbu/avp/depot/detect.py", line 227, in main
    result_tree = analyze_tree(phylogeny_file,gene+"@StudiedOrganism", node_support, complex_per)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/users/bheimbu/avp/depot/PetNexus.py", line 117, in analyze_tree
    return analysis(gene_tree, full_name_studied_gene, complex_per)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/users/bheimbu/avp/depot/PetNexus.py", line 136, in analysis
    sister_nodes_list = get_sister_classification(gene_tree,full_name_studied_gene,list_of_ranks)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/users/bheimbu/avp/depot/PetNexus.py", line 224, in get_sister_classification
    gene_node = gene_tree.get_leaves_by_name(full_name_studied_gene)[0]
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

Not sure what's wrong here…

Cheers Bastian
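
One hypothetical way to narrow this down (assuming an ete3-style tree object, which the get_leaves_by_name call in the traceback suggests): the IndexError means no leaf in that group's tree matches the expected "<gene>@StudiedOrganism" name, which can happen if MetaEuk-style headers contain characters (e.g. "|" or ":") that get altered when the Newick file is written or read.

from ete3 import Tree

t = Tree("group_0001.tree")              # hypothetical tree file from the run
wanted = "some_gene@StudiedOrganism"     # hypothetical expected leaf name
if not t.get_leaves_by_name(wanted):
    # Print the actual leaf names to spot mangled or renamed identifiers.
    for name in t.get_leaf_names():
        print(name)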

HGT candidates in M.incognita

Hi, thanks for developing this tool.
I predicted HGTs for about 40,000 proteins of M. incognita using AvP (1.0.0) and got about 4,000 proteins (1/10) tagged as candidates; all of the candidates are classified as UNKNOWN. The .ai file was generated using diamond and the uniref90 database.
Is this correct or credible?
And how many high-confidence HGTs are there in M. incognita?

Database download error [Swissprot]

Hello, I am writing because I am having issues downloading a database.

While downloading Swissprot, I encountered an error while performing the makedb task in Diamond: 'Error: Invalid option: taxonnames'.

Could you please advise me on how to resolve this issue?

Thank you.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 2: invalid continuation byte

Hello,

I am trying to run the prepare module but I get the following error:
$ ./avp prepare -a proteins.blastp.diamond.txt_ai.out -o results -f proteins.fa -b blastp.diamond.txt -x groups.yaml -c config.yaml

Output:
[+] Setting up
[!] Selected 6750 HGT candidates
[+] Parsing Blast file and grouping similar queries
[!] Formed 4056 groups
[+] Extracting hits from DB
Traceback (most recent call last):
  File "./avp", line 6, in <module>
    main()
  File "/media/urbe/MyCDrive1/Alex/AVP/AvP/depot/interface.py", line 29, in main
    prepare.main()
  File "/media/urbe/MyCDrive1/Alex/AVP/AvP/depot/prepare.py", line 229, in main
    for record in SeqIO.parse(handle,"fasta"):
  File "/home/urbe/anaconda3/envs/avp/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/home/urbe/anaconda3/envs/avp/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 198, in iterate
    for title, sequence in SimpleFastaParser(handle):
  File "/home/urbe/anaconda3/envs/avp/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 47, in SimpleFastaParser
    for line in handle:
  File "/home/urbe/anaconda3/envs/avp/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 2: invalid continuation byte

The content of the config.yaml file is the following:
max_threads: 10

# DB path
sp_fasta_path: /media/urbe/MyCDrive1/Alex/AVP/uniref50/uniref50.fasta.dmnd
nr_db_path: /media/urbe/MyCDrive1/Alex/09.BlobTools/blobtoolkit/nt

## Algorithm options
# prepare
ai_cutoff: 0
percent_identity: 100
cutoffextend: 20 # when toi hit is found, we take this hit + n hits
trimal: false
min_num_hits: 4 # select queries with at least that many blast hits
percentage_similar_hits: 0.7 # group queries based on this
mode: sp # use nr for nr database, use sp for swissprot database
# detect, clasify, evaluate
fastml: true # Use fasttree instead of IQTree
node_support: 0 # nodes below that number will collapse
complex_per_toi: 20 # if H/(H+T) smaller than this then node is considered T
complex_per_hgt: 80 # if H/(H+T) greater than this then node is considered H
complex_per_node: 90 # if node contains percent number of this category, it is assigned

# Program specific options
mafft_options: '--anysymbol --auto'
trimal_options: '-automated1'

# IQ-Tree
iqmodel: '-mset WAG,LG,JTT,DCMUT,JTTDCMUT -AICc -mrate E,I,G,I+G,R -madd LG4X'
ufbootstrap: 1000
iq_threads: 10
What might be the issue?
Thanks
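
The UnicodeDecodeError indicates that SeqIO is reading something that is not plain text; note that sp_fasta_path above points at uniref50.fasta.dmnd, which is a binary DIAMOND database rather than a FASTA file. A quick, hypothetical sniff of what a path actually contains:

def sniff(path):
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head.startswith(b">"):
        return "plain FASTA"
    if head == b"\x1f\x8b":
        return "gzip-compressed file"
    return "binary / not FASTA (e.g. a .dmnd DIAMOND database)"

print(sniff("/media/urbe/MyCDrive1/Alex/AVP/uniref50/uniref50.fasta.dmnd"))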

Error parsing GFF3 file for AvP hgt local score analysis

Your software has worked really well for me! I'm now trying to run the HGT local score script but am getting errors parsing the GFF3 file.

Traceback (most recent call last):
  File "/nfs/turbo/lsa-tyjames/mycology/Alden/software/AvP/aux_scripts/hgt_local_score.py", line 169, in <module>
    main()
  File "/nfs/turbo/lsa-tyjames/mycology/Alden/software/AvP/aux_scripts/hgt_local_score.py", line 86, in main
    gene_scaf = gene_location[gene]
                ~~~~~~~~~~~~~^^^^^^
KeyError: 'Hydnotrya_cerebriformis_MICH67763_scaffold_3601_FUN_011721'

I wasn't sure if the GFF3 file should contain the full annotations (gene, mRNA, exon, and CDS), just the lines annotated as "gene", or just those annotated as "mRNA". I've tried all the different combos and it gives the same error. When I grep that ID, it is found as a match in the file. My GFF3 has 9 columns in the specified format, so I'm not sure why the script can't find the right line.
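
A hypothetical way to compare the failing key against the IDs present in the GFF3 (I do not know exactly which attribute hgt_local_score.py keys on, so this only checks the ID= attribute of gene/mRNA lines and lists near-matches):

import re

gff_ids = set()
with open("annotation.gff3") as fh:              # hypothetical file name
    for line in fh:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in ("gene", "mRNA"):
            m = re.search(r"ID=([^;]+)", cols[8])
            if m:
                gff_ids.add(m.group(1))

missing = "Hydnotrya_cerebriformis_MICH67763_scaffold_3601_FUN_011721"
print(missing in gff_ids)
print([i for i in gff_ids if i.endswith("FUN_011721")][:5])   # any near-matches

A mismatch such as a scaffold prefix glued onto the gene ID (which the failing key above hints at) would explain why grep finds the ID in the file but the exact dictionary lookup does not.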

something wrong when using diamond with nr database

Thanks for your wonderful work on AvP. When I used diamond with the Uniprot database, everything went smoothly, but when I used diamond with the nr database, something seems to go wrong.
Here are some of my commands:
nohup diamond blastp -q ../Hrob.protein.fa -d /tools/diamond/nr_28July2023.dmnd --evalue 1e-5 --max-target-seqs 500 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids --out nr_similarity.out -p 20 > Hrob.log 2>&1 &
(the diamond result seems ok)

Then: nohup ../aux_scripts/calculate_ai.py -i Hrob_nr_similarity.out -x sample.groups.yaml &
(The log file showed: [!] Skipped 469494 hits) (When I used Uniprot database, no hits were skipped)

Then I tried to use avp prepare: nohup ../avp prepare -a Hrob_nr_similarity.out_ai.out -o Hrob_prepare -f Hrob.protein.fa -b Hrob_nr_similarity.out -x sample.groups.yaml -c sample.config.yaml &
The log file showed:
[+] Setting up
[!] Selected 1351 HGT candidates
[+] Parsing hits file and grouping similar queries
[!] Formed 980 groups
[+] Extracting hits from DB
[+] Writing fasta files
[!] Skipped 109465 hits and 0 taxids.
[+] Aligning fasta files
........
[!] Finished with 1351 HGT candidates in 980 groups
(When I used Uniprot database, no hits were skipped)

Then I “cd Hrob_prepare/tmp” :
-rw-rw-r-- 1 1416512 Sep 7 17:04 extract_id.txt
-rw-rw-r-- 1 0 Sep 7 17:04 setblast.fa
-rw-rw-r-- 1 162 Sep 7 17:04 setblast.log
-rw-rw-r-- 1 0 Sep 7 17:04 setblast.perf
-rw-rw-r-- 1 0 Sep 7 17:04 taxonomy_nexus.txt
Then I “less setblast.log” : BLAST Database error: No alias or index file found for protein database [/tools/diamond/nr_28July2023.dmnd] in search path [/home/zhen/AvP/Hrob_nr_input.files::]

Then I tried: nohup ../avp detect -i ./Hrob_prepare/mafftgroups/ -o Hrob_detect -g ./Hrob_prepare/groups.tsv -t ./Hrob_prepare/tmp/taxonomy_nexus.txt -c sample.config.yaml &
The file fasttree_general_results.txt showed:
Proteins analyzed : 1351
Proteins with no Ingroup : 0
Proteins with only Ingroup : 0

Unknown Topology : 1351
No HGT support : 0
Complex topology : 0
Strong HGT support : 0
(seems really weird)

Here’s part of my config file:

# DB path

blast_db_path: /tools/diamond/nr_28July2023.dmnd # blast: change to the local blast_db path
fasta_path: /tools/diamond/nr.gz # diamond: change to the local fasta path for sp, ur90, or custom database
mode: blast # use blast for blast database, use sp for swissprot database, ur90 for uniref90 or custom database
data_type: AA # data type DNA, AA

I found in your published paper that you also tested with the nr database. May I ask how you did that? Is it necessary to use blastp instead of diamond when using the nr database, as the documentation (GitHub wiki) says?

I also extracted a small part of the data from nr.gz to test with blastp instead of diamond; the commands are as follows:
makeblastdb -in nr.test.fa -dbtype prot -parse_seqids -out nr.test
blastp -query Hrob.proteins.fa -db nr.test -outfmt '6 std staxids' -seg no -evalue 1e-5 -out Hrob.similarity.out
But there was a problem with the blastp results: the staxids in column 13 are all 0.
So I would like to ask whether it is necessary to run "update_blastdb.pl --decompress nr" when building the nr database, rather than downloading the nr FASTA file and running makeblastdb manually.
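
For reference, the setblast.log error above is what blastdbcmd reports when it is pointed at something that is not a BLAST-formatted database: a .dmnd file is only usable by diamond, whereas blastdbcmd/blastp need the makeblastdb output (a set of companion files such as .pin/.phr/.psq next to the base name). A small, hypothetical check of what a configured database path refers to:

import glob, os

def db_kind(path):
    if os.path.isfile(path) and path.endswith(".dmnd"):
        return "DIAMOND database (diamond only; blastdbcmd cannot read it)"
    if glob.glob(path + ".p*") or glob.glob(path + ".*.p*"):
        return "BLAST protein database (usable by blastdbcmd/blastp)"
    return "no recognised database files found at this path"

print(db_kind("/tools/diamond/nr_28July2023.dmnd"))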

Thanks!

Best,

kenlly

Hi, I want to know the meaning of "Ingroup" and "EGP"

As you said, you want to find HGTs from "non-Metazoa" species into your species.
Your example:
Ingroup:
  33208: Metazoa
EGP:
  6300: Tylenchida
I checked that "Metazoa" has the rank "kingdom", while "Tylenchida" ranks lower than "kingdom".
But "Ingroup" specifies the target of the HGT, and "EGP" specifies which taxonomic groups to exclude from the calculations.
My question is: is the "Ingroup" the group under investigation or the group not under investigation?
My output suggests that "Ingroup" means the group not under investigation, which contradicts your explanation.

acc2taxid.py with uniref50 - ValueError: not enough values to unpack (expected 2, got 1)

Hello,

I am trying to create the tax ID file using the acc2taxid.py script and uniref50. I run:
$ python3 AvP/aux_scripts/acc2taxid.py -m uniref -i uniref50/uniref50.fasta.gz > Adineta_ricciae/sp.taxids

and get:
Traceback (most recent call last):
  File "AvP/aux_scripts/acc2taxid.py", line 50, in <module>
    main()
  File "AvP/aux_scripts/acc2taxid.py", line 43, in main
    tr, id = record.name.split("|")
ValueError: not enough values to unpack (expected 2, got 1)

When using the swissprot mode the script terminates normally.
$ python3 AvP/aux_scripts/acc2taxid.py -m swissprot -i swissprot/uniprot_sprot.fasta.gz > sp.taxids
$
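
For context, the two modes see differently shaped headers, which is consistent with the pipe-splitting line in the traceback failing only for UniRef input (hedged; the exact parsing in acc2taxid.py may differ). SwissProt records are named like sp|P12345|NAME_SPECIES, while current UniRef records are named like UniRef50_P12345, with no "|" at all and the taxid stored in a TaxID= field of the description. A hypothetical extraction for UniRef-style headers, using made-up identifiers:

import re

description = "UniRef50_EXAMPLE1 hypothetical protein n=5 Tax=SomeClade TaxID=12345 RepID=EX1_SOMSP"
accession = description.split()[0]               # "UniRef50_EXAMPLE1" -- no "|" to split on
m = re.search(r"TaxID=(\d+)", description)
taxid = m.group(1) if m else "0"
print(accession, taxid)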

Best

TypeError: can only concatenate str (not "list") to str

Hello,

Thank you for your quick response, and here is another error...

Traceback (most recent call last):
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/avp", line 6, in <module>
    main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/interface.py", line 29, in main
    prepare.main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/prepare.py", line 219, in main
    blastdbcmd_command = 'blastdbcmd -db '+ config_opts["nr_db_path"] + ' -dbtype prot -entry_batch ' +  extract_id_path + ' -target_only -outfmt ">%a@%T\n%s" -logfile ' + setnrlog_path + ' -out ' + setnrfa_path
TypeError: can only concatenate str (not "list") to str

and here is my config file:

---
max_threads: 64

# DB path
nr_db_path: [/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr]

## Algorithm options
# prepare
ai_cutoff: 0
percent_identity: 100
cutoffextend: 20    # when toi hit is found, we take this hit + n hits
trimal: false
min_num_hits: 4   # select queries with at least that many blast hits
percentage_similar_hits: 0.7  # group queries based on this
mode: nr    # use nr for nr database, use sp for swissprot database
# detect, clasify, evaluate
fastml: true  # Use fasttree instead of IQTree
node_support: 0  # nodes below that number will collapse
complex_per_toi: 20   # if H/(H+T) smaller than this then node is considered T
complex_per_hgt: 80   # if H/(H+T) greater than this then node is considered H
complex_per_node: 90  # if node contains percent number of this category, it is assigned

# Program specific options
mafft_options: '--anysymbol --auto'
trimal_options: '-automated1'

#IQ-Tree
iqmodel: '-mset WAG,LG,JTT -AICc -mrate E,I,G,R'
ufbootstrap: 1000
iq_threads: 4
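
The TypeError above is consistent with nr_db_path being parsed as a YAML list: a bracketed value such as [/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr] loads as a one-element Python list, and concatenating it to a string then fails exactly as in the traceback. A small demonstration of the generic YAML behaviour (not AvP-specific):

import yaml

print(yaml.safe_load("nr_db_path: [/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr]"))
# {'nr_db_path': ['/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr']}  <- a list

print(yaml.safe_load("nr_db_path: /apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr"))
# {'nr_db_path': '/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr'}    <- a plain string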

Also, I'd like to confirm what "database" means in the config file: is AvP expecting the FASTA sequences of nr as the database? I also do not understand why "blastdbcmd" appears in the error message. It would be great if AvP could take a pre-computed BLAST/DIAMOND database and do the similarity search with DIAMOND. On my school's HPC, individual users do not have enough storage to keep the nr FASTA, but the administrators maintain pre-computed nr databases for BLAST/DIAMOND.

Sincerely,

Cong Liu
