apcamargo / genomad
geNomad: Identification of mobile genetic elements
Home Page: https://portal.nersc.gov/genomad/
License: Other
Dear @apcamargo,
Thank you for developing such a great tool!
I used it for taxonomic assignment of viral genomes:
genomad end-to-end --min-score 0.8 --cleanup --splits 16 \
results/09.dereplicate/genomes/virome/representative/vMAGs_hmq.megahit.rep.fa.gz \
genomad_output ~/databases/ecogenomics/geNomad/genomad_db \
>genomad.log 2>&1
Here is the summary of results:
➤ zcat results/09.dereplicate/genomes/virome/representative/vMAGs_hmq.megahit.rep.fa.gz | rg -c "^>"
8439
➤ wc -l genomad_output/vMAGs_hmq.megahit.rep_summary/vMAGs_hmq.megahit.rep_plasmid_summary.tsv
483 genomad_output/vMAGs_hmq.megahit.rep_summary/vMAGs_hmq.megahit.rep_plasmid_summary.tsv
➤ wc -l genomad_output/vMAGs_hmq.megahit.rep_summary/vMAGs_hmq.megahit.rep_virus_summary.tsv
4933 genomad_output/vMAGs_hmq.megahit.rep_summary/vMAGs_hmq.megahit.rep_virus_summary.tsv
All vMAGs were identified by VirSorter2 and PHAMB, and CheckV rated them as complete, high quality, or medium quality.
Given that, here is what I don't currently understand:
Thanks a lot!
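When comparing the counts above, keep in mind that `wc -l` counts every line, including a header line if the TSV has one. Below is a minimal Python sketch for counting FASTA records and TSV data rows. It assumes each summary file starts with a single header line; the function names are illustrative and not part of geNomad.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_tsv_rows(lines, has_header=True):
    """Count non-empty TSV lines, excluding one header line when has_header is True."""
    n_lines = sum(1 for line in lines if line.strip())
    return max(n_lines - 1, 0) if has_header else n_lines
```

Each counter takes any iterable of lines, so it can be fed a handle from `open_text(...)`. Under the header assumption, a file for which `wc -l` reports 483 lines holds 482 data rows.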
Do you know of a smaller database, on the order of megabytes, that could be used to test this tool?
Thanks!
Hello, I got the following error: "con1.fa is either empty or contains multiple entries with the same identifier. Please check your input FASTA file and execute genomad annotate again."
The website says that geNomad can be used with metagenomic data, but when I use this file (con1.fa) as input, it doesn't work. As I understand it, read identifiers can be repeated. Should I change all the identifiers?
Could you advise me on how to solve this problem? I would really appreciate it. Best wishes!
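One way to see which identifiers are repeated, before renaming anything, is to count them. This is a minimal sketch. It assumes the identifier is the first whitespace-delimited token of each header line; how geNomad itself parses identifiers is not stated in this thread, and the function names are illustrative.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the identifier (first token after '>') of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            yield header.split()[0] if header else ""


def duplicated_ids(lines):
    """Return a dict mapping each repeated identifier to its number of occurrences."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

Calling `duplicated_ids(open("con1.fa"))` would return an empty dict when every identifier is unique.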
Hello! I am trying to use the genomad annotate module to annotate a .fna file of MEGAHIT-assembled contigs. After downloading the database and adding a unique identifier to each .fna header line (because they all started with k127), I ran the following command:
genomad annotate final_vOTUs_numbered.fna ./genomad_output ./genomad_db
I get this error directly from genomad:
Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 137, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'search', PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db'), PosixPath('genomad_db/genomad_db'), PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db'), PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp'), '--threads', '48', '-s', '4.2', '--cov-mode', '1', '-c', '0.2', '-e', '0.001', '--split', '0', '--split-mode', '0', '--max-seqs', '1000000', '--min-ungapped-score', '20', '--max-rejected', '225']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/genomad/bin/genomad", line 8, in <module>
sys.exit(cli())
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 140, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs search genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp --threads 48 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 0 --split-mode 0 --max-seqs 1000000 --min-ungapped-score 20 --max-rejected 225' failed.
Here is the output from mmseqs2:
Converting sequences
[=====
Time for merging to query_db_h: 0h 0m 0s 79ms
Time for merging to query_db: 0h 0m 0s 31ms
Database type: Aminoacid
Time for processing: 0h 0m 1s 380ms
search genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp --threads 48 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 0 --split-mode 0 --max-seqs 1000000 --min-ungapped-score 20 --max-rejected 225
MMseqs Version: 13.45111
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.2
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Max reject 225
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 48
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4.2
k-mer length 5
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 1000000
Split database 0
Split mode 0
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 20
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
prefilter genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp/4444936417411739143/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -s 4.2 -k 5 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 1000000 --split 0 --split-mode 0 --split-memory-limit 0 -c 0.2 --cov-mode 1 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 20 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 48 --compressed 0 -v 3
Query database size: 56046 type: Aminoacid
Estimated memory consumption: 1G
Target database size: 227897 type: Profile
Process prefiltering step 1 of 1
Index table k-mer threshold: 104 at k-mer size 5
Index table: counting k-mers
[=================================================================] 227.90K 10s 479ms
Index table: Masked residues: 0
No k-mer could be extracted for the database genomad_db/genomad_db.
Maybe the sequences length is less than 14 residues.
Error: Prefilter died
The .fna file, the genomad_output directory and the genomad_db directory are all in the same directory, and I am running the command from that directory as well. Any ideas how to fix this? Thanks!!
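When reporting a failure like this, it can help to pull the MMseqs2 version and the error lines out of the log. The sketch below does that with a regular expression. The function name is illustrative, and it only summarizes the log; it does not diagnose the cause.

```python
import re


def summarize_mmseqs_log(text):
    """Return (version, errors): the reported MMseqs2 version, or None if absent,
    and every log line that starts with 'Error:'."""
    version_match = re.search(r"MMseqs Version:\s*(\S+)", text)
    version = version_match.group(1) if version_match else None
    errors = [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("Error:")
    ]
    return version, errors
```

Applied to the log above, this would report version 13.45111 and the line "Error: Prefilter died", which are useful to quote alongside the geNomad version in a bug report.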
Hello. I've been reading the documentation and the use of --enable-score-calibration is not clear to me. When I apply it in the end-to-end pipeline, it tells me that I have less than 1000 sequences and that another option will be used by default if I do not apply an automatic option.
I am working with prokaryotic genome assemblies (fewer than 1000 contigs) and I would like to obtain the estimated probabilities. Which option do you recommend? Should I run genomad end-to-end without any extra flags?
Thanks in advance!
Hello,
Thank you for making this. I am looking forward to using it. I have received the following error:
Invalid value for 'INPUT': Path 'sample_1.fa.gz' does not exist.
I have entered the correct path and also tried moving the file to my working directory, to no avail. I am using Singularity to run this in Docker, in case that causes any issues.
Thank you for your time
I downloaded a multi-fasta file from Genbank and passed it as input to genomad. I only get expected results with genomad when I rename the fasta headers in the file.
Rename command: awk '/^>/{print ">Seq"++i; next}{print}' input.fasta > output.fasta
Original header:
>gi|29366675|ref|NC_000866.4| Enterobacteria phage T4, complete genome
New header:
>Seq1
If I don't rename the FASTA headers before running geNomad, the *taxonomy.tsv file created by 'genomad end-to-end' is empty. FASTA headers longer than 30 characters or containing whitespace can cause bugs in downstream processing, because some tools limit the length of the header lines they can handle and others use whitespace as a delimiter when parsing the header. Headers that exceed these limits may therefore cause errors or unexpected behavior downstream.
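The awk one-liner above discards the original headers. A Python sketch of the same renaming that also keeps a lookup table from new identifiers back to the original headers is shown here; the function name and the `Seq` prefix default are illustrative choices, not anything geNomad requires.

```python
def rename_fasta_headers(lines, prefix="Seq"):
    """Rename FASTA headers to <prefix>1, <prefix>2, ... in order of appearance.

    Returns (renamed_lines, mapping), where mapping goes from each new
    identifier to the original header text (without the leading '>').
    """
    renamed, mapping = [], {}
    for line in lines:
        if line.startswith(">"):
            new_id = f"{prefix}{len(mapping) + 1}"
            mapping[new_id] = line[1:].rstrip("\r\n")
            renamed.append(f">{new_id}\n")
        else:
            renamed.append(line)  # sequence lines are passed through unchanged
    return renamed, mapping
```

The mapping can be written to a two-column file so that results reported for `Seq1` can be traced back to the GenBank header afterwards.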
Hi,
following up on my previous issue, which involved a fairly large input file, I tried running a much smaller assembly (the input fa.gz file is about 65 MB), but this time there is a different error:
[14:10:19] Executing genomad annotate.
[14:10:19] Creating the ANT01_genomad_output/contigs_annotate directory.
Traceback (most recent call last):
File "/projappl/project_2006548/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
ctx.invoke(
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/opt/conda/lib/python3.10/site-packages/genomad/modules/annotate.py", line 178, in main
prodigal_obj.run_parallel_prodigal(threads)
File "/opt/conda/lib/python3.10/site-packages/genomad/prodigal.py", line 92, in run_parallel_prodigal
self._append_prodigal_fasta(current_file_path, protid_start)
File "/opt/conda/lib/python3.10/site-packages/genomad/prodigal.py", line 42, in _append_prodigal_fasta
match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Could you please help?
Best,
Tatiana
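For readers unfamiliar with this error type: the last frame calls `match.group(1)` on a value that is `None`, which is what Python's `re` functions return when a pattern does not match. The sketch below reproduces that failure mode and shows a more explicit check. The pattern is hypothetical and is NOT the one used in genomad/prodigal.py; it only assumes Prodigal-style protein headers of the form `>contig_N # start # end # ...`.

```python
import re

# Hypothetical pattern for illustration only; not taken from geNomad's source.
HEADER_PATTERN = re.compile(r"^>(\S+)_(\d+) #")


def parse_protein_header(line):
    """Split a Prodigal-style protein header into (contig_id, gene_number).

    Raises ValueError with the offending line instead of failing later with
    AttributeError: 'NoneType' object has no attribute 'group'.
    """
    match = HEADER_PATTERN.search(line)
    if match is None:
        raise ValueError(f"Unexpected header format: {line!r}")
    return match.group(1), int(match.group(2))
```

This does not identify why the header failed to match in the run above; it only shows where a `None` match would come from.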
I'm getting the following error when installing genomad in a fresh environment:
OSError: Could not find/load shared object file: libllvmlite.so
Error was: libstdc++.so.6: failed to map segment from shared object
I tried a couple of things, but the result was the same:
What ended up working was swapping the llvmlite install from the conda-forge channel to the numba channel.
Here's the log:
(base) [jespinoz@login02 jcl110]$ mamba create -n genomad_env -c bioconda genomad
__ __ __ __
/ \ / \ / \ / \
/ \/ \/ \/ \
███████████████/ /██/ /██/ /██/ /████████████████████████
/ / \ / \ / \ / \ \____
/ / \_/ \_/ \_/ \ o \__,
/ _/ \_____/ `
|/
███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗
████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
██╔████╔██║███████║██╔████╔██║██████╔╝███████║
██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝
mamba (1.2.0) supported by @QuantStack
GitHub: https://github.com/mamba-org/mamba
Twitter: https://twitter.com/QuantStack
█████████████████████████████████████████████████████████████
Looking for: ['genomad']
bioconda/linux-64 Using cache
bioconda/noarch Using cache
conda-forge/linux-64 Using cache
conda-forge/noarch Using cache
pkgs/r/linux-64 No change
pkgs/main/linux-64 No change
pkgs/main/noarch No change
pkgs/r/noarch No change
jolespin/linux-64 No change
qiime2/noarch No change
jolespin/noarch No change
qiime2/linux-64 No change
Transaction
Prefix: /expanse/projects/jcl110/anaconda3/envs/genomad_env
Updating specs:
- genomad
Package Version Build Channel Size
─────────────────────────────────────────────────────────────────────────────────────────────────────
Install:
─────────────────────────────────────────────────────────────────────────────────────────────────────
+ _libgcc_mutex 0.1 conda_forge conda-forge/linux-64 Cached
+ _openmp_mutex 4.5 2_gnu conda-forge/linux-64 Cached
+ _py-xgboost-mutex 2.0 cpu_0 conda-forge/linux-64 8kB
+ absl-py 1.4.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ aiohttp 3.8.3 py310h5764c6d_1 conda-forge/linux-64 Cached
+ aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ appdirs 1.4.4 pyh9f0ad1d_0 conda-forge/noarch Cached
+ aragorn 1.2.41 hec16e2b_0 bioconda/linux-64 Cached
+ aria2 1.23.0 0 bioconda/linux-64 29MB
+ astunparse 1.6.3 pyhd8ed1ab_0 conda-forge/noarch Cached
+ async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ attrs 22.2.0 pyh71513ae_0 conda-forge/noarch Cached
+ blinker 1.5 pyhd8ed1ab_0 conda-forge/noarch Cached
+ brotlipy 0.7.0 py310h5764c6d_1005 conda-forge/linux-64 Cached
+ bzip2 1.0.8 h7f98852_4 conda-forge/linux-64 Cached
+ c-ares 1.18.1 h7f98852_0 conda-forge/linux-64 Cached
+ ca-certificates 2022.12.7 ha878542_0 conda-forge/linux-64 Cached
+ cached-property 1.5.2 hd8ed1ab_1 conda-forge/noarch 4kB
+ cached_property 1.5.2 pyha770c72_1 conda-forge/noarch Cached
+ cachetools 5.3.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ certifi 2022.12.7 pyhd8ed1ab_0 conda-forge/noarch Cached
+ cffi 1.15.1 py310h255011f_3 conda-forge/linux-64 Cached
+ charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ click 8.1.3 unix_pyhd8ed1ab_2 conda-forge/noarch Cached
+ cryptography 39.0.0 py310h34c0648_0 conda-forge/linux-64 Cached
+ flatbuffers 22.12.06 hcb278e6_2 conda-forge/linux-64 Cached
+ frozenlist 1.3.3 py310h5764c6d_0 conda-forge/linux-64 Cached
+ gast 0.4.0 pyh9f0ad1d_0 conda-forge/noarch Cached
+ gawk 5.1.0 h7f98852_0 conda-forge/linux-64 Cached
+ genomad 1.3.3 pyhdfd78af_0 bioconda/noarch Cached
+ gettext 0.21.1 h27087fc_0 conda-forge/linux-64 Cached
+ giflib 5.2.1 h36c2ea0_2 conda-forge/linux-64 Cached
+ google-auth 2.16.0 pyh1a96a4e_1 conda-forge/noarch Cached
+ google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge/noarch Cached
+ google-pasta 0.2.0 pyh8c360ce_0 conda-forge/noarch Cached
+ grpcio 1.51.1 py310h4a5735c_1 conda-forge/linux-64 Cached
+ h5py 3.8.0 nompi_py310h0311031_100 conda-forge/linux-64 Cached
+ hdf5 1.12.2 nompi_h4df4325_101 conda-forge/linux-64 Cached
+ icu 70.1 h27087fc_0 conda-forge/linux-64 Cached
+ idna 3.4 pyhd8ed1ab_0 conda-forge/noarch Cached
+ importlib-metadata 6.0.0 pyha770c72_0 conda-forge/noarch Cached
+ joblib 1.2.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ jpeg 9e h166bdaf_2 conda-forge/linux-64 Cached
+ keras 2.11.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ keyutils 1.6.1 h166bdaf_0 conda-forge/linux-64 Cached
+ krb5 1.20.1 h81ceb04_0 conda-forge/linux-64 Cached
+ ld_impl_linux-64 2.40 h41732ed_0 conda-forge/linux-64 Cached
+ libabseil 20220623.0 cxx17_h05df665_6 conda-forge/linux-64 Cached
+ libaec 1.0.6 hcb278e6_1 conda-forge/linux-64 Cached
+ libblas 3.9.0 16_linux64_openblas conda-forge/linux-64 Cached
+ libcblas 3.9.0 16_linux64_openblas conda-forge/linux-64 Cached
+ libcurl 7.87.0 hdc1c0ab_0 conda-forge/linux-64 Cached
+ libedit 3.1.20191231 he28a2e2_2 conda-forge/linux-64 Cached
+ libev 4.33 h516909a_1 conda-forge/linux-64 Cached
+ libffi 3.4.2 h7f98852_5 conda-forge/linux-64 Cached
+ libgcc 7.2.0 h69d50b8_2 conda-forge/linux-64 Cached
+ libgcc-ng 12.2.0 h65d4601_19 conda-forge/linux-64 Cached
+ libgfortran-ng 12.2.0 h69a702a_19 conda-forge/linux-64 Cached
+ libgfortran5 12.2.0 h337968e_19 conda-forge/linux-64 Cached
+ libgomp 12.2.0 h65d4601_19 conda-forge/linux-64 Cached
+ libgrpc 1.51.1 h4fad500_1 conda-forge/linux-64 Cached
+ libiconv 1.17 h166bdaf_0 conda-forge/linux-64 Cached
+ libidn2 2.3.4 h166bdaf_0 conda-forge/linux-64 Cached
+ liblapack 3.9.0 16_linux64_openblas conda-forge/linux-64 Cached
+ libllvm11 11.1.0 he0ac6c6_5 conda-forge/linux-64 Cached
+ libnghttp2 1.51.0 hff17c54_0 conda-forge/linux-64 Cached
+ libnsl 2.0.0 h7f98852_0 conda-forge/linux-64 Cached
+ libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge/linux-64 Cached
+ libpng 1.6.39 h753d276_0 conda-forge/linux-64 Cached
+ libprotobuf 3.21.12 h3eb15da_0 conda-forge/linux-64 Cached
+ libsqlite 3.40.0 h753d276_0 conda-forge/linux-64 Cached
+ libssh2 1.10.0 hf14f497_3 conda-forge/linux-64 Cached
+ libstdcxx-ng 12.2.0 h46fd767_19 conda-forge/linux-64 Cached
+ libunistring 0.9.10 h7f98852_0 conda-forge/linux-64 Cached
+ libuuid 2.32.1 h7f98852_1000 conda-forge/linux-64 Cached
+ libxgboost 1.7.1 cpu_ha3b9936_0 conda-forge/linux-64 Cached
+ libxml2 2.10.3 h7463322_0 conda-forge/linux-64 Cached
+ libzlib 1.2.13 h166bdaf_4 conda-forge/linux-64 Cached
+ llvmlite 0.39.1 py310h58363a5_1 conda-forge/linux-64 Cached
+ markdown 3.4.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ markdown-it-py 2.1.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ markupsafe 2.1.2 py310h1fa729e_0 conda-forge/linux-64 Cached
+ mdurl 0.1.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ mmseqs2 14.7e284 pl5321hf1761c0_0 bioconda/linux-64 Cached
+ multidict 6.0.4 py310h1fa729e_0 conda-forge/linux-64 Cached
+ ncurses 6.3 h27087fc_1 conda-forge/linux-64 Cached
+ numba 0.56.4 py310ha5257ce_0 conda-forge/linux-64 Cached
+ numpy 1.23.5 py310h53a5b5f_0 conda-forge/linux-64 Cached
+ oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ openssl 3.0.7 h0b41bf4_2 conda-forge/linux-64 Cached
+ opt_einsum 3.3.0 pyhd8ed1ab_1 conda-forge/noarch Cached
+ packaging 23.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ perl 5.32.1 2_h7f98852_perl5 conda-forge/linux-64 Cached
+ pip 23.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ pooch 1.6.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ prodigal-gv 2.10.0 h7132678_0 bioconda/linux-64 Cached
+ protobuf 4.21.12 py310heca2aa9_0 conda-forge/linux-64 Cached
+ py-xgboost 1.7.1 cpu_py310hd1aba9c_0 conda-forge/linux-64 Cached
+ pyasn1 0.4.8 py_0 conda-forge/noarch Cached
+ pyasn1-modules 0.2.7 py_0 conda-forge/noarch Cached
+ pycparser 2.21 pyhd8ed1ab_0 conda-forge/noarch Cached
+ pygments 2.14.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ pyjwt 2.6.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ pyopenssl 23.0.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ pysocks 1.7.1 pyha2e5f31_6 conda-forge/noarch Cached
+ python 3.10.8 h4a9ceb5_0_cpython conda-forge/linux-64 Cached
+ python-crfsuite 0.9.8 py310hbf28c38_1 conda-forge/linux-64 Cached
+ python-flatbuffers 23.1.21 pyhd8ed1ab_0 conda-forge/noarch Cached
+ python_abi 3.10 3_cp310 conda-forge/linux-64 Cached
+ pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge/noarch Cached
+ re2 2023.02.01 hcb278e6_0 conda-forge/linux-64 Cached
+ readline 8.1.2 h0f457ee_0 conda-forge/linux-64 Cached
+ requests 2.28.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ rich 13.3.1 pyhd8ed1ab_1 conda-forge/noarch Cached
+ rich-click 1.6.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ rsa 4.9 pyhd8ed1ab_0 conda-forge/noarch Cached
+ scikit-learn 1.2.1 py310h209a8ca_0 conda-forge/linux-64 Cached
+ scipy 1.10.0 py310h8deb116_0 conda-forge/linux-64 Cached
+ setuptools 67.1.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ six 1.16.0 pyh6c4a22f_0 conda-forge/noarch Cached
+ snappy 1.1.9 hbd366e4_2 conda-forge/linux-64 Cached
+ sqlite 3.40.0 h4ff8645_0 conda-forge/linux-64 Cached
+ taxopy 0.11.0 pyhdfd78af_0 bioconda/noarch Cached
+ tensorboard 2.11.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ tensorboard-data-server 0.6.1 py310h600f1e7_4 conda-forge/linux-64 Cached
+ tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge/noarch Cached
+ tensorflow 2.11.0 cpu_py310hd1aba9c_0 conda-forge/linux-64 Cached
+ tensorflow-base 2.11.0 cpu_py310hc9b7e7f_0 conda-forge/linux-64 Cached
+ tensorflow-estimator 2.11.0 cpu_py310hfed9998_0 conda-forge/linux-64 Cached
+ termcolor 2.2.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge/noarch Cached
+ tk 8.6.12 h27826a3_0 conda-forge/linux-64 Cached
+ typing-extensions 4.4.0 hd8ed1ab_0 conda-forge/noarch Cached
+ typing_extensions 4.4.0 pyha770c72_0 conda-forge/noarch Cached
+ tzdata 2022g h191b570_0 conda-forge/noarch Cached
+ urllib3 1.26.14 pyhd8ed1ab_0 conda-forge/noarch Cached
+ werkzeug 2.2.2 pyhd8ed1ab_0 conda-forge/noarch Cached
+ wget 1.20.3 ha35d2d1_1 conda-forge/linux-64 Cached
+ wheel 0.38.4 pyhd8ed1ab_0 conda-forge/noarch Cached
+ wrapt 1.14.1 py310h5764c6d_1 conda-forge/linux-64 Cached
+ xgboost 1.7.1 cpu_py310hd1aba9c_0 conda-forge/linux-64 Cached
+ xz 5.2.6 h166bdaf_0 conda-forge/linux-64 Cached
+ yarl 1.8.2 py310h5764c6d_0 conda-forge/linux-64 Cached
+ zipp 3.12.0 pyhd8ed1ab_0 conda-forge/noarch Cached
+ zlib 1.2.13 h166bdaf_4 conda-forge/linux-64 Cached
Summary:
Install: 147 packages
Total download: 29MB
─────────────────────────────────────────────────────────────────────────────────────────────────────
Confirm changes: [Y/n] y
_py-xgboost-mutex 7.9kB @ 44.8kB/s 0.2s
cached-property 4.1kB @ 21.0kB/s 0.2s
aria2 29.3MB @ 38.9MB/s 0.8s
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
$ mamba activate genomad_env
To deactivate an active environment, use
$ mamba deactivate
(base) [jespinoz@login02 jcl110]$ conda activate genomad_env
(genomad_env) [jespinoz@login02 jcl110]$ genomad -h
Traceback (most recent call last):
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/llvmlite/binding/ffi.py", line 160, in <module>
lib = ctypes.CDLL(str(_lib_handle.__enter__()))
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libstdc++.so.6: failed to map segment from shared object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/bin/genomad", line 6, in <module>
from genomad.cli import cli
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/genomad/__init__.py", line 5, in <module>
from genomad.modules import (
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/genomad/modules/aggregated_classification.py", line 4, in <module>
from genomad import sequence, utils
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/genomad/sequence.py", line 9, in <module>
from numba import njit
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/numba/__init__.py", line 19, in <module>
from numba.core import config
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/numba/core/config.py", line 16, in <module>
import llvmlite.binding as ll
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/llvmlite/binding/__init__.py", line 4, in <module>
from .dylib import *
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/llvmlite/binding/dylib.py", line 3, in <module>
from llvmlite.binding import ffi
File "/expanse/projects/jcl110/anaconda3/envs/genomad_env/lib/python3.10/site-packages/llvmlite/binding/ffi.py", line 167, in <module>
raise OSError(msg)
OSError: Could not find/load shared object file: libllvmlite.so
Error was: libstdc++.so.6: failed to map segment from shared object
I got it to work by changing the llvmlite install from the conda-forge channel to the numba channel:
(genomad_env) [jespinoz@login02 jcl110]$ conda install -c numba llvmlite
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /expanse/projects/jcl110/anaconda3/envs/genomad_env
added / updated specs:
- llvmlite
The following packages will be downloaded:
package | build
---------------------------|-----------------
llvmlite-0.39.1 | py310he1b5a44_0 28.2 MB numba
------------------------------------------------------------
Total: 28.2 MB
The following packages will be SUPERSEDED by a higher-priority channel:
llvmlite conda-forge::llvmlite-0.39.1-py310h58~ --> numba::llvmlite-0.39.1-py310he1b5a44_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
I was wondering if there is a difference between:
(1) running geNomad 1000 times, once per WGS assembly, for 1000 genomes
or
(2) running geNomad once, after combining the 1000 WGS assemblies into a single FASTA file
Is there a difference in how it works? Will option (2) be treated like a 'metagenome' and hence run with different parameters? I have personally run (2), but I am concerned that the accuracy may be affected.
Also, if a contig is identified as a virus (either prophage or non-integrated), will geNomad output only the part it thinks is viral, or the whole contig? I have noticed that sometimes 'coordinates' is NA and it simply gave me the entire bacterial contig unchanged.
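For option (2), contig identifiers from different assemblies can collide (for example, several assemblies each containing `contig_1`). The sketch below combines assemblies while prefixing every identifier with its assembly name. The `__` separator and the function names are arbitrary choices for illustration, and the sketch says nothing about whether geNomad's results differ between the two options.

```python
def prefix_fasta_ids(lines, prefix, sep="__"):
    """Yield FASTA lines with `prefix + sep` prepended to every identifier."""
    for line in lines:
        if line.startswith(">"):
            yield f">{prefix}{sep}{line[1:]}"
        else:
            yield line


def combine_assemblies(assemblies, sep="__"):
    """Yield the lines of a combined FASTA.

    `assemblies` maps an assembly name to an iterable of its FASTA lines.
    """
    for name, lines in assemblies.items():
        yield from prefix_fasta_ids(lines, name, sep)
```

Because the assembly name is embedded in each identifier, rows in the output tables can be split back into per-genome results afterwards.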
First of all, thank you very much for developing this tool. Using this tool was a great experience.
I have four questions,
In 95% of cases, the taxonomy results only go as far as the order level; only in some cases is the family also reported.
Can this be improved? Is it possible to reach other levels, such as genus or even species? Or could all cases at least include the family?
The second question,
Most of the results (over 90%) are placed in the Unclassified group. Is there a way to classify them?
The third question,
At https://zenodo.org/record/7084650 you have also provided three other databases. What are they used for? Is it possible to use one of them, for example genomad_hmm_v1.1.tar.gz?
The fourth question,
Is it possible to use the IMG/VR 4 database (as you mentioned in its paper) along with this pipeline, or as a database for this pipeline, to achieve a more accurate taxonomy and fewer Unclassified sequences?
Thank you very much in advance,
NP
Dear @apcamargo!
Thanks a lot for your excellent work.
I noticed that the software uses ICTV's VMR number 19; however, ICTV has since updated its taxonomy. Could you update the database to the new taxonomy, or share the procedure used to build the database?
Thanks a lot.
Warm Regards
Jiandui Mi
Hi again @apcamargo!
I've been able to successfully run genomad on several datasets (metagenomes). During the nn-classification I always receive an error in the log that says:
2022-10-16 22:38:39.339717: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
But the software keeps running and was able to finish successfully in some cases. However, for some very large datasets, the log shows this error and the software keeps running for >20h without any additional info on whether or not something is happening in the background. I cannot find a way to see what process is being executed. I left them running for now, still hoping that they will finish like the others, but it would be great if there was more info in the log on what is going on.
Thanks!
Hello,
Some tools for synteny analysis, such as clinker, require a GFF or GenBank file as input. Is there a way to obtain these file formats from the outputs of genomad annotate? If not, do you recommend any script or program to convert the outputs to these formats?
Thank you very much!
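As a rough illustration of what such a conversion could look like, here is a sketch that turns gene rows into GFF3 lines. It rests on several assumptions that are not confirmed in this thread: that the genes table has columns named `gene`, `start`, `end` and `strand`, that gene identifiers follow the Prodigal convention `<contig>_<n>`, that strand is encoded as `1` or `-1`, and that coordinates are 1-based and inclusive. The column names are parameters so they can be adjusted to the actual file.

```python
def genes_to_gff3(rows, gene_col="gene", start_col="start", end_col="end", strand_col="strand"):
    """Convert gene rows (dicts) into a list of GFF3 lines with CDS features."""
    lines = ["##gff-version 3"]
    for row in rows:
        gene_id = row[gene_col]
        # Assumes Prodigal-style IDs: everything before the last '_' is the contig.
        contig = gene_id.rsplit("_", 1)[0]
        strand = "+" if str(row[strand_col]) in ("1", "+") else "-"
        fields = [
            contig,                # seqid
            "geNomad",             # source
            "CDS",                 # type
            str(row[start_col]),   # start
            str(row[end_col]),     # end
            ".",                   # score (not provided)
            strand,                # strand
            "0",                   # phase
            f"ID={gene_id}",       # attributes
        ]
        lines.append("\t".join(fields))
    return lines
```

The rows can come from `csv.DictReader(handle, delimiter="\t")`. Tools that need sequence data as well as features (GenBank input, for instance) would require more than this sketch provides.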
Hello, I have some metagenomic data and obtained some bins after MEGAHIT and MetaBAT2 processing. Can I run geNomad on the bins to identify viruses?
Which is the better input for geNomad: the contigs assembled by MEGAHIT or the bins produced by MetaBAT2?
I have a lot of memory and CPUs; can I speed things up?
Do you have any other suggestions? Thanks!
Hi,
Thanks for providing genomad, it is very useful.
I was wondering whether it would be OK to run geNomad directly on raw nanopore reads (>1000 bp, median Q19) instead of assembled contigs. Should the parameters be tuned in this case?
Dear Jamie Morton,
Thanks a lot for your excellent software. Previous studies found Siphoviridae to be the dominant family in the gut, but we did not find it in our results from this software. We checked the ICTV VMR_19 list and did not find it there either. Was Siphoviridae removed? Could you please tell me which order or family Siphoviridae was merged into?
Thanks a lot.
Warm Regards
Jiandui Mi
Hi, geNomad works well with most of my data overall. However, when I checked in detail whether the proviruses in the find_proviruses directory match exactly with those in the summary directory, I found some mismatches.
I guess some of those mismatches might come from post-classification filtering. However, I also found some proviruses in the summary that were not trimmed of bacterial sequence, unlike in the find_proviruses output.
For example, in contig_find_proviruses/contig_provirus.tsv, contig_1 was cut:
seq_name source_seq start end length n_genes v_vs_c_score in_seq_edge integrases
contig_1|provirus_5838_32914 contig_1|full 5838 32914 27077 41 24.1463 True NA
In contrast, in contig_summary/contig_summary.tsv, contig_1 was not cut:
seq_name length topology coordinates n_genes genetic_code virus_score fdr n_hallmarks marker_enrichment taxonomy
contig_1 32914 Linear NA 48 11 0.9121 NA 1 14.5901 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
This mismatch confused me a lot. Looking forward to the clarification. Thank you very much!
Best,
Menghao
Hello, thank you for maintaining this useful tool. When running Genomad end-to-end, every module up to the nn-classification step uses up to 30 CPUs, which I allow using --threads 30. However, when the nn-classification step runs, all available CPUs on the machine are used. Could you please help me limit the CPUs used by this step? Skipping this module avoids this problem, but I'd really like to use the neural network classification. Thank you!
genomad end-to-end --cleanup --splits 8 "/mnt/c/Users/DavidIbarra/OneDrive - Cemvita Factory Inc/Desktop/GCF001999325.1.fasta" genomad_output "/mnt/c/Users/DavidIbarra/genomad_db"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Executing geNomad annotate (v1.7.0). This will perform gene calling in the input sequences and annotate the predicted │
│ proteins with geNomad's markers. │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ Outputs: │
│ genomad_output/GCF001999325.1_annotate │
│ ├── GCF001999325.1_annotate.json (execution parameters) │
│ ├── GCF001999325.1_genes.tsv (gene annotation data) │
│ ├── GCF001999325.1_taxonomy.tsv (taxonomic assignment) │
│ ├── GCF001999325.1_mmseqs2.tsv (MMseqs2 output file) │
│ └── GCF001999325.1_proteins.faa (protein FASTA file) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[13:50:35] Executing genomad annotate.
[13:50:35] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option
to force the execution of all the steps again.
[13:50:35] GCF001999325.1_proteins.faa was found. Skipping gene prediction with prodigal-gv.
Traceback (most recent call last):
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 190, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'prefilter', PosixPath('genomad_output/GCF001999325.1_annotate/GCF001999325.1_mmseqs2/query_db/query_db'), PosixPath('/mnt/c/Users/DavidIbarra/genomad_db/genomad_db'), PosixPath('genomad_output/GCF001999325.1_annotate/GCF001999325.1_mmseqs2/search_db/prefilter_db'), '--threads', '12', '-s', '4.2', '--split', '8', '--split-mode', '0', '--max-seqs', '10000000', '--min-ungapped-score', '25', '-k', '5']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/david1ibarra/.local/bin/genomad", line 8, in <module>
sys.exit(cli())
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
rv = self.invoke(ctx)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1240, in end_to_end
ctx.invoke(
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 441, in annotate
genomad.annotate.main(
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 203, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/david1ibarra/.local/pipx/venvs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 193, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs prefilter genomad_output/GCF001999325.1_annotate/GCF001999325.1_mmseqs2/query_db/query_db /mnt/c/Users/DavidIbarra/genomad_db/genomad_db genomad_output/GCF001999325.1_annotate/GCF001999325.1_mmseqs2/search_db/prefilter_db --threads 12 -s 4.2 --split 8 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5' failed.
I'm not sure what went wrong or how to fix it.
Hi Antônio,
I really love this tool. It has really nice docs with beautiful charts and is effortless to use!
I downloaded the database from Zenodo and extracted them manually.
./: 32.66 GB
22.89 GB genomad_hmm_v1.3
3.70 GB genomad_msa_v1.3
3.27 GB genomad_hmm_v1.3.tar.gz
1.37 GB genomad_db
810.10 MB genomad_db_v1.3.tar.gz
653.91 MB genomad_msa_v1.3.tar.gz
6.49 MB genomad_metadata_v1.3.tsv.g
Then I tested with a genome from GTDB (GCA_000010645.1), which seemed to work as expected, successfully identifying the four plasmids in the file (only one when not using --relaxed).
genomad end-to-end --relaxed --cleanup --threads 40 GCA_000010645.1.fna.gz genomad ~/ws/db/genomad/genomad_db
I have one little question: are files other than genomad_db needed? Files including genomad_hmm_v1.3 and genomad_msa_v1.3 are outside of genomad_db.
-- EDIT --
Hmm, I think the answer is no. It still works after moving other files to other paths.
Hi,
I'm trying to run genomad for the first time! I'm using it on a compute cluster with shared resources, so I'm trying to control memory and threads.
I've tried several times, each time adjusting cores and memory resources, and also using the split option you indicated in the manual. But I always end up here:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jorap2/.conda/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1015, in end_to_end
ctx.invoke(
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 338, in annotate
genomad.annotate.main(
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 201, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs search all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/query_db/query_db genomad_db/genomad_db_v1.1/genomad_db all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/search_db/search_db all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/tmp --threads 128 -s 6.4 --cov-mode 1 -c 0.2 -e 0.001 --split 16 --split-mode 0' failed.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=15098211.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
It seems like mmseqs is using 128 threads and I don't know how to contain it. Do you think this is the issue?
Thanks!
Hello, Thanks for the impressive tools for provirus identification. I have two questions when running the tools:
Command:
genomad nn-classification --cleanup --threads 2 GCF_009025895.1.fa output
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Executing geNomad nn-classification (v1.7.0). This will classify the input sequences into chromosome, plasmid, or virus based on the nucleotide sequence. │
│ ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ Outputs: │
│ out_1/GCF_009025895.1_nn_classification │
│ ├── GCF_009025895.1_nn_classification.json (execution parameters) │
│ ├── GCF_009025895.1_encoded_sequences (directory containing encoded sequence data) │
│ ├── GCF_009025895.1_nn_classification.tsv (contig classification: tabular format) │
│ ├── GCF_009025895.1_nn_classification.npz (contig classification: binary format) │
│ ├── GCF_009025895.1_encoded_proviruses (directory containing encoded sequence data) │
│ ├── GCF_009025895.1_provirus_nn_classification.tsv (provirus classification: tabular format) │
│ └── GCF_009025895.1_provirus_nn_classification.npz (provirus classification: binary format) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[16:24:41] Executing genomad nn-classification.
[16:24:42] Creating the out_1/GCF_009025895.1_nn_classification/GCF_009025895.1_encoded_sequences directory.
[16:24:45] Encoded sequence data written to GCF_009025895.1_encoded_sequences.
[16:24:45] Creating the out_1/GCF_009025895.1_nn_classification/GCF_009025895.1_encoded_proviruses directory.
[16:24:46] Encoded provirus data written to GCF_009025895.1_encoded_proviruses.
Traceback (most recent call last):
File "/path/to/python3.9.6/bin/genomad", line 8, in <module>
sys.exit(cli())
File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/path/to/python3.9.6/lib/python3.9/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/path/to/python3.9.6/lib/python3.9/site-packages/genomad/cli.py", line 719, in nn_classification
genomad.nn_classification.main(
File "/path/to/python3.9.6/lib/python3.9/site-packages/genomad/modules/nn_classification.py", line 304, in main
TimeRemainingColumn(elapsed_when_finished=True),
TypeError: __init__() got an unexpected keyword argument 'elapsed_when_finished'
Software version I used:
The python version: 3.9.6
TensorFlow version: I tried 2.8.0, 2.10.0 and 2.13.0
GeNomad database version: v1.5
In response to Issue #23, geNomad's post-classification filters should be refined to take into account the circularity/completeness of input sequences. This could prevent the erroneous exclusion of legitimate plasmids due to the current stringent filtering criteria aimed at genomic island fragments.
Key Points:
I have enjoyed using geNomad and find it to be a very useful tool. When geNomad identifies a plasmid or virus on a particular contig, is it saying that entire contig likely makes up the plasmid? Because the annotated genes cover the length of the contig, so I wanted to make sure I am interpreting this correctly.
Is there additional documentation on the significance of assigning the three types of topology to plasmids in particular? I was told DTR plasmids are perhaps more likely to be closed than ITR, but it would be helpful to have some documentation or links to information about interpreting the topology.
Finally, is there a way to identify where geNomad found the direct or inverted terminal repeats in a contig?
Thank you!
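To make the question about locating terminal repeats concrete, here is a minimal sketch of how exact direct (DTR) and inverted (ITR) terminal repeats can be searched for in a single sequence. It is not geNomad's implementation: it assumes exact matches only, and the function names, the `min_len` and `max_len` parameters, and the test sequences are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; unknown symbols become N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def terminal_repeat(seq: str, min_len: int = 4, max_len: int = 1000):
    """Return (kind, length) for the longest exact terminal repeat, or None.

    "DTR": the first k bases equal the last k bases.
    "ITR": the first k bases equal the reverse complement of the last k bases.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the first hit is the longest repeat.
    for k in range(upper, min_len - 1, -1):
        prefix, suffix = seq[:k], seq[-k:]
        if prefix == suffix:
            return ("DTR", k)
        if prefix == reverse_complement(suffix):
            return ("ITR", k)
    return None


# Made-up sequences: one with a direct repeat, one with an inverted repeat.
print(terminal_repeat("ACGTAGGGGGGACGTA"))
print(terminal_repeat("AAGGCCCCCCCGCCTT"))
```

When a repeat of length k is found, it occupies positions 1 to k and the last k positions of the contig, which answers the "where" part of the question for exact repeats.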
mamba create -n genomad -c conda-forge -c bioconda genomad
Linux ubuntu 18 x86_64
There seems to be an encoding issue with the rich console output. I have encountered the following error when running the following commands:
- genomad end-to-end -h
- genomad download-database .
after the tarball is downloaded
- PYTHONIOENCODING="utf-8" genomad download-database .
error suggests I add an env var to specify encoding, but that did not work
Error stack:
Traceback (most recent call last):
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/download.py", line 93, in main
console.log(f"Database extracted to [green]{database_path}[/green].")
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/utils.py", line 99, in log
self.regular_console.log(*args, **kwargs)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 1940, in log
with self:
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 864, in __exit__
self._exit_buffer()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 822, in _exit_buffer
self._check_buffer()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 2060, in _check_buffer
self.file.write(text)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u280f' in position 280: ordinal not in range(256)
*** You may need to add PYTHONIOENCODING=utf-8 to your environment ***
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/storage1/data14/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 335, in download_database
genomad.download.main(destination, keep, verbose)
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/download.py", line 91, in main
with console.status(f"Extracting the database to [green]{database_path}[/green]."):
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/status.py", line 106, in __exit__
self.stop()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/status.py", line 91, in stop
self._live.stop()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/live.py", line 147, in stop
with self.console:
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 864, in __exit__
self._exit_buffer()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 822, in _exit_buffer
self._check_buffer()
File "/storage1/data14/miniconda3/envs/genomad/lib/python3.10/site-packages/rich/console.py", line 2060, in _check_buffer
self.file.write(text)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u280f' in position 280: ordinal not in range(256)
*** You may need to add PYTHONIOENCODING=utf-8 to your environment ***
Include functionality to export GFF and GBK files, as per Issue #28. This can be achieved via parsing the genes tabular output or by leveraging Pyrodigal's write_gff and write_genbank methods. A script to convert the tabular output into a GFF can be found here.
Key Points:
--write-gff and --write-gbk flags if there's a significant processing delay.
Is it possible to download the database manually instead of using genomad download-database? Thanks!
Hello! I am trying to install geNomad using mamba. However, it will not install because it says that aragorn does not exist
error: the following package could not be installed.
aragorn does not exist (perhaps a typo or a missing channel)
It is not an issue with my typing, and from what I have seen, my channels appear to be fine. That being said, I am new to bioinformatics and would appreciate any help. I would love to be able to start using geNomad :)
Hello, I am analyzing my dataset and have been reading through the documentation, but I would like some clarification so I can make the best judgement.
On the command line, I understand that the --splits parameter splits the search into chunks, but I am unsure exactly what the example means by "--splits 8". I have tried different numbers and do not notice a difference in my results. Does the number 8 in this case refer to the threads being used by the computer?
In my summary output file (the virus summary file in my case), one sequence has 7 predicted genes and a virus score of 0.9428, but the number of hallmarks is 0. The virus score looks promising (it is close to 1), but I am confused as to why the number of hallmarks would be zero if there are possibly 7 genes. I am using the default settings, and my next step would be to use the --conservative parameter, but I would like to hear your input. Thanks!
Hi,
Thanks for developing this tool. I use it to predict viral classification. However, I ran into a problem:
my command:
genomad annotate ../checkv_contigs.fa genomad /public/zycheng/database/virus.db/genomad_db_v1.3
And the log:
Executing geNomad annotate (v1.5.2). This will perform gene calling in the input sequences and annotate the predicted proteins with geNomad's markers.
[09:58:03] Executing genomad annotate.
[09:58:03] Creating the genomad/checkv_contigs_annotate directory.
Traceback (most recent call last):
File "/public/home/zycheng/anaconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 167, in main
database_obj = database.Database(database_path)
File "/public/home/zycheng/anaconda3/envs/genomad/lib/python3.10/site-packages/genomad/database.py", line 10, in __init__
with open(database_directory / "version.txt") as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/public/zycheng/database/virus.db/genomad_db/version.txt'
I downloaded the database from Zenodo and extracted it manually, and there are only four files, without version.txt.
Best,
Zhongyi
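The traceback above shows geNomad opening `version.txt` inside the database directory, so a manually extracted database can be checked for that file before a run is launched. The following is a minimal sketch under that single assumption; the helper name `missing_database_files` is hypothetical, and the `required` tuple lists only the file named in the traceback, not a complete inventory of the database.

```python
import tempfile
from pathlib import Path


def missing_database_files(database_dir, required=("version.txt",)):
    """Return the names in `required` that are absent from `database_dir`."""
    database_dir = Path(database_dir)
    return [name for name in required if not (database_dir / name).is_file()]


# Demonstration on a throwaway directory standing in for a database folder.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_database_files(tmp)
    (Path(tmp) / "version.txt").write_text("placeholder\n")
    after = missing_database_files(tmp)

print("missing before:", before)
print("missing after:", after)
```

A non-empty result suggests the path passed on the command line is not the directory that was actually extracted, or that the archive was incomplete.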
Thank you very much for making this amazing tool. This is becoming very useful in my research.
I performed a metaSPAdes assembly on paired-end 150 bp Illumina reads and subsequently used VirSorter2 to isolate viral sequences.
Currently, I am utilizing geNomad to further refine my dataset by excluding non-viral sequences, determining taxonomy, and identifying potential chimeric viral sequences.
I ran geNomad using the following command. My fasta file contains 143117 sequences.
genomad end-to-end --enable-score-calibration --cleanup --threads 10 --composition metagenome VirSorter_combined.fasta VirSorter genomad_db
I am treating sequences that contain genes from two distinct groups, such as "S03-NODE_15397_length_3234_cov_0.690698||full", as chimeric.
File: VirSorter_combined_virus_summary.tsv
seq_name length topology coordinates n_genes genetic_code virus_score fdr n_hallmarks marker_enrichment taxonomy
S03-NODE_15397_length_3234_cov_0.690698||full 3234 No terminal repeats NA 4 11 0.9507 0.005 1 3.4366 Viruses
S05-NODE_6353_length_8129_cov_0.841540||full 8129 No terminal repeats NA 12 11 0.8916 0.0092 0 2.9931 Viruses;Bicaudaviridae
File: VirSorter_combined_genes.tsv
gene start end length strand gc_content genetic_code rbs_motif marker evalue bitscore uscg plasmid_hallmark virus_hallmark taxid taxname annotation_conjscan annotation_amr annotation_accessions annotation_description
S03-NODE_15397_length_3234_cov_0.690698||full_1 3 89 87 -1 0.379 11 AGxAGG/AGGxGG NA NA NA 0 0 0 1 NA NA NA NA NA
S03-NODE_15397_length_3234_cov_0.690698||full_2 304 1230 927 -1 0.372 11 AGGAG GENOMAD.044416.VV 3.03E-05 48 0 0 1 2561 Caudoviricetes NA NA TIGR01537 NA
S03-NODE_15397_length_3234_cov_0.690698||full_3 1507 2265 759 1 0.461 11 AGGAG NA NA NA 0 0 0 1 NA NA NA NA NA
S03-NODE_15397_length_3234_cov_0.690698||full_4 2622 3233 612 1 0.451 11 AGGAG GENOMAD.096491.VV 5.21E-13 70 0 0 0 352 Marseilleviridae NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_1 96 419 324 -1 0.358 11 GGA/GAG/AGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_2 601 1605 1005 1 0.506 11 AGxAGG/AGGxGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_3 1620 2165 546 1 0.579 11 AGGAG GENOMAD.167727.VP 3.31E-05 46 0 0 0 2561 Caudoviricetes NA NA PF18306;COG4474;TIGR00725 Uncharacterized SPBc2 prophage-derived protein YoqJ
S05-NODE_6353_length_8129_cov_0.841540||full_4 2237 2680 444 -1 0.563 11 AGGAG/GGAGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_5 2673 3134 462 -1 0.552 11 GGAGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_6 3197 4777 1581 -1 0.593 11 GGAGG GENOMAD.184062.VV 1.80E-05 50 0 0 0 3654 Bicaudaviridae NA NA COG4245;K16630 Uncharacterized conserved protein YegL, contains vWA domain of TerY type
S05-NODE_6353_length_8129_cov_0.841540||full_7 4901 5332 432 -1 0.528 11 AGGAG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_8 5450 5896 447 -1 0.597 11 GGAGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_9 5915 7267 1353 -1 0.582 11 AGxAGG/AGGxGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_10 7281 7631 351 -1 0.553 11 GGA/GAG/AGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_11 7631 7813 183 -1 0.563 11 GGAGG NA NA NA 0 0 0 1 NA NA NA NA NA
S05-NODE_6353_length_8129_cov_0.841540||full_12 7810 8127 318 -1 0.513 11 None NA NA NA 0 0 0 1 NA NA NA NA NA
Now, I have a question about the "Viruses;Bicaudaviridae" taxonomy. Upon examining the "genes.tsv" file, I noticed the presence of two different genes from different families. Should I consider this sequence as chimeric as well?
Thanks and regards,
Bhim
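The manual check described above, looking for contigs whose marker genes point to different taxa, can be automated across a large genes table. The sketch below is an illustration rather than a geNomad feature: it assumes a tab-separated table with `gene` and `taxname` columns, treats `NA` as "no assignment", and derives the contig name by dropping the trailing `_<index>` from the gene ID. The function name and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def marker_taxa_per_contig(genes_tsv_text: str) -> dict:
    """Map each contig to the set of taxon names assigned to its marker genes."""
    taxa = defaultdict(set)
    reader = csv.DictReader(io.StringIO(genes_tsv_text), delimiter="\t")
    for row in reader:
        taxname = row["taxname"]
        if taxname in ("", "NA"):
            continue  # gene has no marker-based taxon
        contig = row["gene"].rsplit("_", 1)[0]
        taxa[contig].add(taxname)
    return dict(taxa)


# Made-up rows that mimic the two-column subset of a genes table.
example = (
    "gene\ttaxname\n"
    "contigA||full_1\tNA\n"
    "contigA||full_2\tCaudoviricetes\n"
    "contigA||full_4\tMarseilleviridae\n"
    "contigB||full_1\tCaudoviricetes\n"
)
taxa = marker_taxa_per_contig(example)
mixed = sorted(contig for contig, names in taxa.items() if len(names) > 1)
print(mixed)
```

Two different names do not prove a chimera on their own: names at different ranks can be nested within one lineage, and single low-scoring marker hits can be spurious, so flagged contigs are candidates for closer inspection rather than confirmed chimeras.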
When the output of annotate changes (due to a change in the sensitivity of the search, for instance), the marker-classification and nn-classification modules will still skip some steps.
In marker-classification, this will cause some features to be incompatible with the actual gene annotations (e.g., marker frequency remains the same, when it should have changed). In both marker-classification and nn-classification, the provirus outputs will remain intact, even if no provirus was detected in the second execution (leading to an error in summary, as exemplified below).
Run the end-to-end module twice to classify LC735414.1, first with -s 4.2 and then with -s 1. A provirus will be detected when running with -s 4.2 but not with -s 1, causing a bug in the summary module.
I have a large number of metagenomic assemblies (virus-enriched in the wet lab). When I concatenate the assemblies into one file and run geNomad end-to-end (with score calibration), I see a few proviruses that are not detected when I run geNomad on the individual assemblies, with the same settings. The proviruses appear in the find-proviruses folder as well as in the summary.
This might be an expected behaviour but it's not really clear to me. An explanation would be highly appreciated. Also, if this is expected, would the prophage prediction of the concatenated input be more accurate than that of the individual assemblies?
In case it helps to clarify what I mean, this is the summary from the concatenated input:
$ grep "NODE_A77_length_14406_cov_85.369321_CH_17692_sum_ile_d" ALL.SAMPLES_1kb.contigs_virus_summary.tsv
NODE_A77_length_14406_cov_85.369321_CH_17692_sum_ile_d|provirus_6327_14404 8078 Provirus 6327-14404 14 11 0.9952 0.0014 1 11.4948 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
And this is from the individual assembly:
$ grep "NODE_A77_length_14406_cov_85.369321_CH_17692_sum_ile_d" CH_17692_sum_ile_d_1kb.contigs_virus_summary.tsv
NODE_A77_length_14406_cov_85.369321_CH_17692_sum_ile_d 14406 No terminal repeats NA 20 11 0.9429 0.0109 1 9.7680 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
Cheers!
Version 1.6.1 worked fine for me.
After installing Genomad 1.7.1 (fresh install), I get the following error:
Traceback (most recent call last):
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/bin/genomad", line 6, in <module>
from genomad.cli import cli
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/lib/python3.10/site-packages/genomad/__init__.py", line 5, in <module>
from genomad.modules import (
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/lib/python3.10/site-packages/genomad/modules/annotate.py", line 4, in <module>
from genomad import database, mmseqs2, prodigal, sequence, taxonomy, utils
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/lib/python3.10/site-packages/genomad/prodigal.py", line 5, in <module>
import pyrodigal_gv
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/lib/python3.10/site-packages/pyrodigal_gv/__init__.py", line 11, in <module>
from .meta import METAGENOMIC_BINS, METAGENOMIC_BINS_VIRAL
File "/gpfs/work5/0/gusr0570/conda/genomad_1.7.1/lib/python3.10/site-packages/pyrodigal_gv/meta.py", line 12, in <module>
METAGENOMIC_BINS = pyrodigal.MetagenomicBins([
AttributeError: module 'pyrodigal' has no attribute 'MetagenomicBins'. Did you mean: 'MetagenomicBin'?
Hi,
I'm trying to parse the taxonomy column of the _virus_summary.tsv file, but unclassified taxonomic levels are omitted, which causes inconsistencies. In the table below, for example, Straboviridae in the first entry would end up in the Order column although it is a viral family.
seq_name | length | topology | coordinates | n_genes | genetic_code | virus_score | fdr | n_hallmarks | marker_enrichment | taxonomy |
---|---|---|---|---|---|---|---|---|---|---|
NODE_A75_length_5000_cov_17.505180_BlackFly28 | 5000 | No_terminal_repeats | NA | 4 | 11 | 0.9997 | 0.0003 | 1 | 6.8731 | Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Straboviridae |
NODE_A20_length_7615_cov_6.791589_BlackFly6 | 7615 | No_terminal_repeats | NA | 11 | 11 | 0.9997 | 0.0003 | 6 | 15.3910 | Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes |
NODE_A8_length_6639_cov_9.592350_BlackFly34 | 6639 | No_terminal_repeats | NA | 8 | 11 | 0.9997 | 0.0003 | 1 | 13.4767 | Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes |
NODE_A4_length_12404_cov_730.807901_BlackFly44 | 12404 | No_terminal_repeats | NA | 6 | 11 | 0.9996 | 0.0004 | 2 | 3.4366 | Viruses;Riboviria;Orthornavirae;Negarnaviricota;Monjiviricetes;Mononegavirales;Rhabdoviridae |
Would it be possible to output the taxonomy column as:
Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;;Straboviridae;;
or
Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;unclassified;Straboviridae
?
Thanks in advance!
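Until the output format changes, one workaround for the parsing problem described above is to assign each name to a rank by its standard ICTV suffix instead of by its position in the string. The sketch below assumes that every name in the lineage follows those suffix conventions and only covers the six principal ranks from realm to family; sub-ranks, genus, and species are left out. The names `RANK_SUFFIXES` and `split_lineage` are hypothetical.

```python
# Standard ICTV suffixes for the principal ranks handled in this sketch.
RANK_SUFFIXES = [
    ("viria", "realm"),
    ("virae", "kingdom"),
    ("viricota", "phylum"),
    ("viricetes", "class"),
    ("virales", "order"),
    ("viridae", "family"),
]
RANKS = [rank for _, rank in RANK_SUFFIXES]


def split_lineage(taxonomy: str) -> dict:
    """Place each name of a ';'-separated lineage into its rank by suffix.

    Ranks missing from the lineage are returned as empty strings.
    """
    ranks = {rank: "" for rank in RANKS}
    for name in taxonomy.split(";"):
        for suffix, rank in RANK_SUFFIXES:
            if name.endswith(suffix):
                ranks[rank] = name
                break
    return ranks


lineage = "Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Straboviridae"
parsed = split_lineage(lineage)
# Rebuild a fixed-width lineage with an empty field for each missing rank.
padded = ";".join(["Viruses"] + [parsed[rank] for rank in RANKS])
print(parsed)
print(padded)
```

Because the lookup is suffix-based, a name that does not follow the ICTV convention is silently skipped, so it is worth checking that the number of assigned ranks matches the number of names after the leading "Viruses".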
Thanks for the tool, it's been really easy to install and fast to run!
I was just wondering: if I have ~100 SAGs/MAGs to classify, would you suggest concatenating them into a single FASTA file for processing, so that the FDR correction is applied across all genomes rather than on a per-genome basis?
Thanks for making such a great tool! It's super easy to run and exactly what I need for my project.
Can you explain what the difference is between "Unclassified" and just "Viruses" in the taxonomy output? Does it mean that Unclassified hits are unknown if they're viruses at all?
For example, I have 292 'Unclassified' hits and 7668 'Viruses' hits across 2000 genomes, does this mean the unclassified could have possibly been plasmid/chromosome?
Hi everyone!
I am new to this field and currently in the process of installing Prokka on my MacBook M2. Despite my attempts using both Homebrew and Conda package managers, the result is the same: Aragorn, a dependency of Prokka, is not available. I would greatly appreciate any assistance in resolving this matter !!
Error: prokka: Failed to download resource "aragorn"
Failure while executing; /usr/bin/env /opt/homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Homebrew/4.1.14\ \(Macintosh\;\ arm64\ Mac\ OS\ X\ 13.6\)\ curl/8.1.2 --header Accept-Language:\ en --retry 3 --fail --location --silent --head http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/aragorn1.2.38.tgz
exited with 56. Here's the output:
curl: (56) Recv failure: Connection reset by peer
Hi,
I am trying to annotate virus contigs (5 kb and above) identified via VirSorter2 and DeepVirFinder. However, the mmseqs prefilter throws the following error:
[14:07:34] Executing genomad annotate.
[14:07:34] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.
[14:07:34] final.vcontigs.fixed_proteins.faa was found. Skipping gene prediction with prodigal-gv.
Traceback (most recent call last):
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/mmseqs2.py", line 190, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'prefilter', PosixPath('0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db'), PosixPath('/home/user/database/genomad-1.5/genomad_db'), PosixPath('0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db'), '--threads', '30', '-s', '4.2', '--split', '0', '--split-mode', '0', '--max-seqs', '10000000', '--min-ungapped-score', '25', '-k', '5']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/cli.py", line 441, in annotate
genomad.annotate.main(
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/modules/annotate.py", line 203, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/mmseqs2.py", line 193, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs prefilter 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db /home/user/database/genomad-1.5/genomad_db 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db --threads 30 -s 4.2 --split 0 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5' failed.
I checked mmseqs2.log, and it says that the input database has the wrong type (Generic):
Time for merging to query_db: 0h 0m 0s 8ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 124ms
prefilter 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db /home/user/database/genomad-1.5/genomad_db 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db --threads 30 -s 4.2 --split 0 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5
MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4.2
k-mer length 5
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max sequence length 65535
Max results per query 10000000
Split database 0
Split mode 0
Split memory limit 0
Coverage threshold 0
Coverage mode 0
Compositional bias 1
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 25
Selected taxa
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads 30
Compressed 0
Verbosity 3
Input database "/home/user/database/genomad-1.5/genomad_db" has the wrong type (Generic).
Allowed input:
- Index
- Nucleotide
- Profile
- Aminoacid
I tried re-downloading the database and changing the output directory, but got the same error.
The database files were manually downloaded and extracted to /home/user/database/genomad-1.5
Environment info
genomad --version
geNomad, version 1.7.0 (installed through conda)
mmseqs version
14.7e284
database =1.5
ls /home/user/database/genomad-1.5
genomad_db
genomad_hmm_v1.5
genomad_metadata_v1.5.tsv
genomad_msa_v1.5
mmseqs_vrefseq
version.txt
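One thing that may be worth checking (a guess, not a confirmed diagnosis): an MMseqs2 database is stored as a data file plus companion files such as `<name>.dbtype` and `<name>.index`, and the listing above shows only `genomad_db` itself. The sketch below reports which companion files are missing next to a database path. `missing_companions` is a hypothetical helper, and the list of suffixes is an assumption that covers only the two most basic files.

```python
import tempfile
from pathlib import Path

# Assumption: these two companion files must sit next to the database file.
COMPANION_SUFFIXES = (".dbtype", ".index")


def missing_companions(db_path):
    """Return the names of companion files that do not exist next to db_path."""
    db_path = Path(db_path)
    return [
        db_path.name + suffix
        for suffix in COMPANION_SUFFIXES
        if not Path(str(db_path) + suffix).exists()
    ]


# Demonstration on a temporary directory that contains only the data file.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "genomad_db"
    db.write_bytes(b"")
    print(missing_companions(db))  # ['genomad_db.dbtype', 'genomad_db.index']
```

If the function returns a non-empty list for the real database path, the archive was probably not extracted completely or the path passed to geNomad points one level too high or too low.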
Dear Developer,
I am trying to use geNomad to assign taxonomy to my viral contigs. I first assembled the reads with SPAdes and used the resulting contigs as input for geNomad, but I got the following error, which might be related to mmseqs2.py:
genomad end-to-end --min-score 0.7 --cleanup --splits 8 spade_lim1_1_old/contigs.fasta lim1_1_genomad_output genomad_db
Traceback (most recent call last):
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 131, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'search', PosixPath('lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db'), PosixPath('/dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/genomad_db/genomad_db'), PosixPath('lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/search_db/search_db'), PosixPath('lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp'), '--threads', '56', '-s', '4.2', '--cov-mode', '1', '-c', '0.2', '-e', '0.001', '--split', '8', '--split-mode', '0']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
ctx.invoke(
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs search lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/genomad_db/genomad_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/search_db/search_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp --threads 56 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 8 --split-mode 0' failed.
Any help would be appreciated.
Dear Developer,
I am using geNomad to annotate viruses from metagenomic sequencing data, and I ran into a problem in the MMseqs2 prefilter step (mmseqs2.py). I have read the FAQ and used --splits 8, but it still reports that there is not enough memory.
Environment:
Linux x86_64
1000G memory
8 threads
genomad: 1.5.1
MMseqs2 version: 14.7e284.
Input file: FASTA.fa (5G size)
My annotate command is:
$MY_PATH/genomad annotate --splits 8 --threads 8 --cleanup $MY_PATH/FASTA.fa $MY_PATH/demo $MY_PATH/genomad_db_v1.1
$MY_PATH stands for the actual working directory path.
The error is as follows:
prefilter $MY_PATH/FASTA_annotate/FASTA_mmseqs2/query_db/query_db $MY_PATH/genomad_db_v1.1/genomad_db $MY_PATH/FASTA_annotate/FASTA_mmseqs2/tmp/11571856592932011841/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 4.2 -k 5 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 8 --split-mode 0 --split-memory-limit 0 -c 0.2 --cov-mode 1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 20 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3
Query database size: 33938637 type: Aminoacid
Target split mode. Searching through 8 splits
Estimated memory consumption: 577M
Target database size: 227897 type: Profile
Process prefiltering step 1 of 8
Index table k-mer threshold: 89 at k-mer size 5
Index table: counting k-mers
[=================================================================] 28.46K 85h 53m 21s 171ms
Index table: Masked residues: 0
Can not allocate entries memory in IndexTable::initMemory
Error: Prefilter died
Looking forward to your reply. You can contact me by e-mail at [email protected]
Thank you!
I just found the answer:
coordinates: 1-indexed coordinates of the provirus region within host sequences. Will be NA for viruses that were not predicted to be integrated.
Normal prediction result.
seq_name length topology coordinates n_genes genetic_code virus_score fdr n_hallmarks marker_enrichment taxonomy
-------------------- ------ -------- ------------- ------- ------------ ----------- --- ----------- ----------------- --------------------
CP000388.1|provirus_ 52034 Provirus 774617-826650 58 11 0.9401 NA 11 45.1836 Viruses;
774617_826650 Duplodnaviria;
Heunggongvirae;
Uroviricota;
Caudoviricetes
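The provirus row above agrees with that definition: 826650 - 774617 + 1 = 52034, which is the value in the length column. A small sketch of how the coordinates string can be handled, assuming the `start-end` format shown in the table, with both ends inclusive (the helper names are hypothetical):

```python
def parse_coordinates(coordinates):
    """Return (start, end) as 1-indexed inclusive integers, or None for 'NA'."""
    if coordinates == "NA":
        return None
    start, end = (int(value) for value in coordinates.split("-"))
    return start, end


def region_length(coordinates):
    """Length of the region, counting both end positions."""
    parsed = parse_coordinates(coordinates)
    if parsed is None:
        return None
    start, end = parsed
    return end - start + 1


def extract_region(sequence, coordinates):
    """Slice a host sequence using 1-indexed inclusive coordinates."""
    start, end = parse_coordinates(coordinates)
    # Python slices are 0-indexed and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


print(region_length("774617-826650"))  # 52034
```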
Prediction without coordinates (genome is GCA_000166735.2, a MAG).
seq_name length topology coordinates n_genes genetic_code virus_score fdr n_hallmarks marker_enrichment taxonomy
-------------- ------ ------------------- ----------- ------- ------------ ----------- --- ----------- ----------------- --------------------
AEMJ01000831.1 698 No terminal repeats NA 1 11 0.9638 NA 0 1.7183 Viruses;
Duplodnaviria;
Heunggongvirae;
Uroviricota;
Caudoviricetes
AEMJ01000737.1 1746 No terminal repeats NA 2 11 0.9244 NA 0 1.7183 Viruses;
Duplodnaviria;
Heunggongvirae;
Uroviricota;
Caudoviricetes
AEMJ01000706.1 826 No terminal repeats NA 1 11 0.8908 NA 0 1.4495 Viruses;
Duplodnaviria;
Heunggongvirae;
Uroviricota;
Caudoviricetes
AEMJ01000847.1 3369 No terminal repeats NA 2 11 0.8785 NA 0 1.7183 Viruses;
Duplodnaviria;
Heunggongvirae;
Uroviricota;
Caudoviricetes
AEMJ01000526.1 288 No terminal repeats NA 2 11 0.8497 NA 0 0.0000 Unclassified
AEMJ01000792.1 2672 No terminal repeats NA 3 11 0.8414 NA 0 1.7183 Unclassified
AEMJ01000320.1 1885 No terminal repeats NA 2 11 0.8297 NA 0 1.7183 Unclassified
AEMJ01000546.1 283 No terminal repeats NA 2 11 0.8262 NA 0 0.0000 Unclassified
AEMJ01000712.1 349 No terminal repeats NA 2 11 0.8238 NA 0 0.0000 Unclassified
$ seqkit stats GCA_000166735.2.fna.gz
file format type num_seqs sum_len min_len avg_len max_len
GCA_000166735.2.fna.gz FASTA DNA 893 2,298,088 101 2,573.4 82,336
$ seqkit seq -n GCA_000166735.2.fna.gz | head -n 3
AEMJ01000893.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00909, whole genome shotgun sequence
AEMJ01000892.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00908, whole genome shotgun sequence
AEMJ01000891.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00907, whole genome shotgun sequence
I am encountering unexpected errors and incomplete output during the execution of the geNomad tool. The error messages indicate potential character encoding issues, and the output seems to be truncated or incomplete, making it difficult to interpret the results accurately. I have also tried setting the PYTHONIOENCODING environment variable to utf-8 to address potential character encoding problems.
Issue Details
Error Description:
When attempting to execute the geNomad tool, I receive an error related to character encoding ('charmap' codec) and difficulties in encoding specific characters. The error messages include:
UnicodeEncodeError: 'charmap' codec can't encode characters in position ...: character maps to <undefined>
These errors suggest that the tool might face challenges in handling file paths with special characters or in environments with non-UTF-8 character encoding.
Output Truncation:
Additionally, even in successful executions, the output seems to be truncated or cut off, making it challenging to obtain comprehensive and complete information from the tool's output.
Steps to Reproduce
Install the geNomad tool.
Execute the tool with a command that involves file paths with special characters or in a system with a non-UTF-8 character encoding environment.
Expected Behavior
The tool should execute smoothly without encountering character encoding errors.
The output should be complete and display all relevant information without truncation.
Additional Information
I have attempted to resolve the character encoding issue by setting the PYTHONIOENCODING environment variable to utf-8, but the error persists.
Executing geNomad annotate (v1.7.0). This will perform gene calling in the input sequences and annotate the predicted │
│ proteins with geNomad's markers. │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ Outputs: │
│ genomad_output\E. coli K12 MG1655 (NC_000913)_annotate │
│ ├── E. coli K12 MG1655 (NC_000913)_annotate.json (execution parameters) │
│ ├── E. coli K12 MG1655 (NC_000913)_genes.tsv (gene annotation data) │
│ ├── E. coli K12 MG1655 (NC_000913)_taxonomy.tsv (taxonomic assignment) │
│ ├── E. coli K12 MG1655 (NC_000913)_mmseqs2.tsv (MMseqs2 output file) │
│ └── E. coli K12 MG1655 (NC_000913)_proteins.faa (protein FASTA file) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Scripts\genomad.exe\__main__.py", line 7, in <module>
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\rich_click\rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\cli.py", line 1240, in end_to_end
ctx.invoke(
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\cli.py", line 441, in annotate
genomad.annotate.main(
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\modules\annotate.py", line 82, in main
utils.display_header(
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\utils.py", line 286, in display_header
console.print(
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\utils.py", line 96, in print
self.write_print(*args, **kwargs)
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\genomad\utils.py", line 80, in write_print
self.writer_console.print(*args, **kwargs)
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\rich\console.py", line 1673, in print
with self:
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\rich\console.py", line 865, in __exit__
self._exit_buffer()
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\rich\console.py", line 823, in _exit_buffer
self._check_buffer()
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\site-packages\rich\console.py", line 2039, in _check_buffer
write(text)
File "C:\Users\DavidIbarra\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 381-498: character maps to <undefined>
*** You may need to add PYTHONIOENCODING=utf-8 to your environment ***
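The last frame of the traceback is in `encodings\cp1252.py`, so the console stream is being encoded as cp1252, and the header panel is drawn with box-drawing characters. The sketch below only reproduces that mismatch in isolation. It is not a fix for geNomad, and `can_encode` is a hypothetical helper.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The same kind of characters that form the border of the header panel.
border = "╰──╯"

print("stdout encoding:", sys.stdout.encoding)
print("cp1252:", can_encode(border, "cp1252"))  # False
print("utf-8:", can_encode(border, "utf-8"))    # True
```

If the first line printed on the affected machine is `cp1252`, the stream encoding is still not UTF-8 at the point where the header is written, regardless of what was set in the environment.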
I installed geNomad on a Cori login node using conda, which ran without errors:
conda create -n genomad-conda -c conda-forge -c bioconda genomad
After activating the environment, running genomad produces errors:
source activate genomad-conda
genomad
Traceback (most recent call last):
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/bin/genomad", line 6, in <module>
from genomad.cli import cli
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/genomad/__init__.py", line 5, in <module>
from genomad.modules import (
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/genomad/modules/aggregated_classification.py", line 4, in <module>
from genomad import sequence, utils
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/genomad/sequence.py", line 9, in <module>
from numba import njit
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/numba/__init__.py", line 42, in <module>
from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
File "/global/homes/s/snayfach/.conda/envs/genomad-conda/lib/python3.10/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
Here's a GitHub issue thread on the problem: numba/numba#8615
From skimming it, the cause might be the latest version of numpy, and installing an older version (<1.24) may fix the problem.
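To check which numpy version ended up in the environment, something like the sketch below could be used. The 1.24 threshold is taken from the thread mentioned above and is not verified here, and the helper names are hypothetical.

```python
from importlib import metadata


def parse_version(version):
    """Return the leading (major, minor) pair of a version string like '1.24.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_below(version, limit="1.24"):
    """True if version is older than limit, comparing major and minor numerically."""
    return parse_version(version) < parse_version(limit)


try:
    installed = metadata.version("numpy")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("numpy is not installed in this environment")
else:
    print(f"numpy {installed}: below 1.24 = {is_below(installed)}")
```

The comparison is done on integer tuples rather than strings, so that 1.9 is correctly treated as older than 1.24.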
Hi,
I used geNomad to identify viruses in addition to VirSorter2 and Cenote-Taker2. Subsequently, I used geNomad's annotate module to assign taxonomy to all viral contigs, including those identified by other tools. I noticed that three of them received completely different taxonomic assignments when the set of input contigs was changed.
For instance, one of them was initially categorized as Algavirales (Varidnaviria › Bamfordvirae › Nucleocytoviricota › Megaviricetes) when the input consisted of all assembled contigs (approximately 70,000), but it was then classified as Caudoviricetes (Duplodnaviria › Heunggongvirae › Uroviricota) when only viral contigs were used as the input (around 5,000). Is this expected? I would think that taxonomic annotation should be more stable across different sets of input contigs.
Many thanks.
Hello,
Thank you for this great tool. I used both an old version (I don't remember the exact version number, but I used it in November) and the new version (1.5.0) of the tool on the same samples. The new version gives fewer hits (generally a 50% decrease) than the older version. I kept the virus_score at 0.7 for both versions. I am just wondering why there is such a dramatic change in the number of hits between the two versions. I am a bit confused.
Best,
Kadir
Hi
When running the end-to-end module, I got this error in the nn-classification step of geNomad (version 1.5.0).
genomad end-to-end --cleanup --threads 25 GFS_2469.fa GFS_2469_genomad ~/Desktop/Databases/Genomad/genomad_db/
I'm running this on an Ubuntu machine with 250 GB of RAM, and it stops without really using any of the memory.
[10:22:24] Executing genomad nn-classification.
[10:22:24] Creating the GFS_2469_genomad/GFS_2469_nn_classification directory.
[10:22:24] Creating the GFS_2469_genomad/GFS_2469_nn_classification/GFS_2469_encoded_sequences directory.
[10:22:26] Encoded sequence data written to GFS_2469_encoded_sequences.
Traceback (most recent call last):
File "/home/river/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1239, in end_to_end
ctx.invoke(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 694, in nn_classification
genomad.nn_classification.main(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/nn_classification.py", line 285, in main
contig_predictions = nn_model.predict(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:
Detected at node 'model_1/model/conv1d/Pad' defined at (most recent call last):
File "/home/river/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1239, in end_to_end
ctx.invoke(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 694, in nn_classification
genomad.nn_classification.main(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/nn_classification.py", line 285, in main
contig_predictions = nn_model.predict(
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 2350, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 2137, in predict_function
return step_function(self, iterator)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 2123, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in run_step
outputs = model.predict_step(data)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 2079, in predict_step
return self(x, training=False)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
return super().__call__(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/functional.py", line 511, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/functional.py", line 668, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
return super().__call__(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/functional.py", line 511, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/functional.py", line 668, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/river/miniconda3/envs/genomad/lib/python3.10/site-packages/keras/layers/convolutional/base_conv.py", line 276, in call
inputs = tf.pad(inputs, self._compute_causal_padding(inputs))
Node: 'model_1/model/conv1d/Pad'
OOM when allocating tensor with shape[128,6002,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_1/model/conv1d/Pad}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_predict_function_942]
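Note that the allocation in the OOM message is on `device:GPU:0`, so the limit being hit is GPU memory rather than the 250 GB of system RAM. The size of the tensor named in the message can be worked out directly from its shape. The sketch below assumes 4 bytes per element (TensorFlow's `float` is 32-bit), and `tensor_bytes` is a hypothetical helper.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for a dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


shape = (128, 6002, 257)  # shape reported in the OOM message
size = tensor_bytes(shape)
print(f"{size} bytes = {size / 2**20:.1f} MiB")  # 789767168 bytes = 753.2 MiB
```

A single tensor of this shape needs about 753 MiB, and it is only one of the buffers live at that point, so a GPU with little free memory can fail here even though the host has plenty of RAM. The first dimension (128) is presumably the batch size, which is an assumption based on the shape alone.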
I encountered the following error when running genomad score_calibration:
Traceback (most recent call last):
File "/home/remi/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1074, in end_to_end
ctx.invoke(
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 675, in score_calibration
genomad.score_calibration.main(input, output, composition, force_auto, verbose)
File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/score_calibration.py", line 316, in main
len(score_dict["contig_names"]) + len(score_dict["provirus_names"])
KeyError: 'provirus_names'
I was running the following command:
genomad end-to-end -t 30 --composition metagenome --enable-score-calibration ZSM005_contigs.filtered.sorted.fasta ZSM005_contigs_score_calibration genomad_db
I am using genomad version 1.3.2 installed using conda.
I can provide the fasta file ZSM005_contigs.filtered.sorted.fasta if it would be helpful in troubleshooting this error.
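To illustrate what the traceback is saying: the failing line adds `len(score_dict["contig_names"])` and `len(score_dict["provirus_names"])`, and the second key is missing. The sketch below is not geNomad's actual code. It assumes `score_dict` behaves like a plain Python dictionary and uses a hypothetical helper, `count_sequences`, to show how a missing `provirus_names` entry could be treated as an empty list instead of raising `KeyError`.

```python
def count_sequences(score_dict):
    """Count contigs plus proviruses, treating a missing key as empty."""
    # Assumption: score_dict is a plain dict; .get() returns the fallback
    # list when the key is absent instead of raising KeyError.
    contigs = score_dict.get("contig_names", [])
    proviruses = score_dict.get("provirus_names", [])
    return len(contigs) + len(proviruses)


# Hypothetical input in which no provirus entries were recorded
example = {"contig_names": ["contig_1", "contig_2"]}
print(count_sequences(example))
```

Whether a missing key should be tolerated this way, or whether it points to an upstream step that did not write its output, is something only the maintainers can confirm.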