Comments (11)
Thanks @vmikk. I will update the wiki page slightly, and I will build the new UNITE v7.2 databases so that amptk install -i ITS will download the newest version.
from amptk.
I also noticed a few weeks ago that Robert dropped support for UTAX in USEARCH10, so I'm not sure how long I will continue to support UTAX. I find that it sometimes does a better job than SINTAX, but it is hard to force people to stick with USEARCH9. That said, if there aren't major upgrades in v10, then maybe I will stay with v9 for a little while yet.
from amptk.
And let me know what they say about the developer vs. regular general FASTA release. Here are some quick stats on each of them. It looks like I probably won't use the developer release this time, since I want to use the least processed sequences so that primers are more likely to be included in the sequence:
fasta_stats.py sh_general_release_s_28/sh_general_release_dynamic_s_28.06.2017.fasta
Reads: 58,639
AvgLen: 731 bp
Shortest: 216 bp
Longest: 7,491 bp
Total: 42,886,966 bp
fasta_stats.py sh_general_release_s_28/developer/sh_general_release_dynamic_s_28.06.2017_dev.fasta
Reads: 58,639
AvgLen: 547 bp
Shortest: 140 bp
Longest: 2,764 bp
Total: 32,081,159 bp
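For reference, the figures above are simple functions of the per-sequence lengths (for example, 42,886,966 bp / 58,639 reads ≈ 731 bp). Below is a minimal Python sketch of how such a summary could be computed. It is not the actual fasta_stats.py script: the function name, the returned keys, and the rounding of AvgLen are assumptions made for illustration.

```python
def fasta_stats(lines):
    """Summarize sequence lengths from an iterable of FASTA lines.

    Hypothetical helper, not the real fasta_stats.py. Sequences may be
    wrapped over several lines; blank lines are ignored.
    """
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)

    if not lengths:
        return {"Reads": 0, "AvgLen": 0, "Shortest": 0, "Longest": 0, "Total": 0}

    total = sum(lengths)
    return {
        "Reads": len(lengths),
        "AvgLen": round(total / len(lengths)),  # assumption: rounded to nearest bp
        "Shortest": min(lengths),
        "Longest": max(lengths),
        "Total": total,
    }


# Toy input (not real UNITE data): one 4 bp record and one wrapped 6 bp record.
toy = [">seq1", "ACGT", ">seq2", "ACG", "TTT"]
for name, value in fasta_stats(toy).items():
    print("{}: {:,}".format(name, value))
```

To run it on a real file, pass an open file handle, e.g. fasta_stats(open(path)), where path points at one of the releases above.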
from amptk.
Also, there is an issue with duplicated sequence IDs (USEARCH fails with an error because of them):
# Count accession IDs (second "|"-delimited header field) and list those seen more than once
awk 'BEGIN {FS = "|"} /^>/ {print $2}' \
sh_general_release_dynamic_s_28.06.2017_dev.fasta \
| sort | uniq -c | sort -rn \
| awk '$1 > 1'
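The same check can be sketched in Python, which may be handier if the duplicates need further processing. This assumes UNITE-style headers of the form name|accession|SH|type|taxonomy, as in the release files; the function name is hypothetical, and the third header in the example is made up for illustration.

```python
from collections import Counter


def duplicated_accessions(lines):
    """Return {accession: count} for accessions that occur in more than one header.

    Assumes the accession is the second "|"-delimited field of each header line.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line.rstrip("\n").split("|")
            if len(fields) > 1:
                counts[fields[1]] += 1
    return {acc: n for acc, n in counts.items() if n > 1}


headers = [
    ">unidentified|KX909166|SH640154.07FU|reps|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__unidentified;f__unidentified;g__unidentified;s__unidentified",
    ">Nectria_dacryocarpa|KX909166|SH490415.07FU|reps_singleton|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Nectria;s__Nectria_dacryocarpa",
    # Made-up record, only here to show that unique accessions are not reported.
    ">toy_species|TOY000001|SH000000.00FU|reps|k__Fungi",
]
print(duplicated_accessions(headers))
```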
from amptk.
FYI: https://twitter.com/unite_sh/status/882846103846236160. I'm still working on getting these pre-installed versions released, but I'm running into a memory error with UTAX on full-length ITS (I don't have 64-bit USEARCH). We'll see if the "updates" to v7.2 are any better. I'm worried that SINTAX will have the same memory problem.
from amptk.
Pre-built ITS databases have been updated to UNITE v7.2.
from amptk.
Hello Jon!
Thanks for the update!
By the way, how did you handle duplicated sequences (e.g., KX909166)? In some cases they have different annotations, for example:
>unidentified|KX909166|SH640154.07FU|reps|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__unidentified;f__unidentified;g__unidentified;s__unidentified
>Nectria_dacryocarpa|KX909166|SH490415.07FU|reps_singleton|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Nectria;s__Nectria_dacryocarpa
P.S. The UNITE team didn't reply to my message.
from amptk.
I'm not sure what they are doing with duplicate sequences in their DB, but the sequences of these two are identical, so if you use the --derep_fulllength option of amptk database, it will remove the duplicates. But perhaps I should update that so it keeps the longer taxonomy string if sequences are identical.
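As a rough illustration of that idea (not how amptk database is implemented), dereplication could keep, for each identical sequence, the header with the most resolved taxonomy. In this sketch "longer" is interpreted as "more ranks that are not unidentified", with total header length as a tie-breaker; that interpretation and all function names are assumptions, and the sequence in the example is a placeholder rather than the real KX909166 sequence.

```python
def informative_ranks(header):
    """Count taxonomy ranks in a UNITE-style header that are not 'unidentified'."""
    taxonomy = header.split("|")[-1]
    return sum(
        1 for rank in taxonomy.split(";")
        if rank and not rank.endswith("__unidentified")
    )


def annotation_score(header):
    """Sort key: more identified ranks wins, then the longer header."""
    return (informative_ranks(header), len(header))


def derep_keep_best(records):
    """Collapse identical sequences, keeping the best-annotated header.

    records: iterable of (header, sequence) tuples.
    Returns (header, sequence) tuples in first-seen order, sequences uppercased.
    """
    best = {}  # uppercase sequence -> best header seen so far
    for header, seq in records:
        key = seq.upper()
        if key not in best or annotation_score(header) > annotation_score(best[key]):
            best[key] = header
    return [(header, seq) for seq, header in best.items()]


records = [
    ("unidentified|KX909166|SH640154.07FU|reps|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__unidentified;f__unidentified;g__unidentified;s__unidentified",
     "ACGTACGT"),  # placeholder sequence
    ("Nectria_dacryocarpa|KX909166|SH490415.07FU|reps_singleton|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Nectria;s__Nectria_dacryocarpa",
     "ACGTACGT"),  # same placeholder sequence, so the two records collapse
]
for header, seq in derep_keep_best(records):
    print(header.split("|")[0], seq)
```

Scoring by identified ranks rather than raw string length avoids preferring a header merely because its taxon names happen to be long.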
from amptk.
Yes, it would be great to preserve the longest taxonomy string. As I understand it, the first one is currently kept, and in the case of KX909166 the shorter string was selected.
However, it is not really urgent because there are not many duplicates in the database.
from amptk.
Hello @nextgenusfs
I'm trying to create a new ITS2 database from the UNITE reference with the primers used in my work, but I got this error message:
$ amptk database -i /mnt/data2/lbcb/projects/Bruno.Moda/databases/unite/v7.2/unite_insd/UNITE_public_01.12.2017.fasta -f GTGAATCATCGARTCTTTGAAC -r TATGCTTAAGTTCAGCGGGTA --primer_required none -o ITS2_unite_v7.2 --create_db usearch --install --source UNITE:7.2
usage: amptk-extract_region.py [options] -f
amptk-extract_region.py: error: unrecognized arguments: --primer_required none --install --source UNITE:7.2
I'm using amptk within a conda env, but with the repo from git (up to date).
I tried again without the unrecognized arguments, but got this other error:
$ amptk database -i /mnt/data2/lbcb/projects/Bruno.Moda/databases/unite/v7.2/unite_insd/UNITE_public_01.12.2017.fasta -f GTGAATCATCGARTCTTTGAAC -r TATGCTTAAGTTCAGCGGGTA -o ITS2_unite_v7.2 --create_db usearch
Traceback (most recent call last):
  File "/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/bin/amptk-extract_region.py", line 411, in <module>
    amptklib.setupLogging(log_name)
  File "/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/lib/amptklib.py", line 1637, in setupLogging
    fhnd = logging.FileHandler(LOGNAME)
  File "/mnt/data2/lbcb/conda/envs/amptk/lib/python2.7/logging/__init__.py", line 920, in __init__
    StreamHandler.__init__(self, self._open())
  File "/mnt/data2/lbcb/conda/envs/amptk/lib/python2.7/logging/__init__.py", line 950, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: u'/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/DB/ITS2_unite_v7.2.log'
What should I do?
Thanks!
from amptk.
You can open a new issue if you have a problem instead of appending to this closed one. You can try to uninstall v1.2.4 from conda, i.e. conda remove amptk, and then install the new version via pip, i.e. pip install amptk. I've been having many conda issues over the last few weeks and thus haven't pushed an update to conda, as it isn't working for me at the moment. Conda has been giving us some permissions errors as well, so it's hard to know whether this is related or not.
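The traceback earlier in the thread shows the log file being opened inside the DB directory of the amptk install, so one quick thing to rule out is whether that directory is writable by the current user. A minimal sketch of such a check follows; the function name is hypothetical, and the path to test is whatever DB directory the traceback reports on your system.

```python
import os


def can_write_in(directory):
    """Return True if `directory` exists and the current user may create files in it.

    Creating a file needs both write and search (execute) permission on the directory.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


# Example: check the current working directory; substitute the DB path from the traceback.
print(can_write_in(os.getcwd()))
```

If this returns False for the DB directory, the IOError is a plain filesystem permission problem rather than something specific to the database command.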
from amptk.