
Comments (11)

nextgenusfs commented on June 22, 2024

Thanks @vmikk. I will update the wiki page slightly and build the new UNITE v7.2 databases so that amptk install -i ITS downloads the newest version.

from amptk.

nextgenusfs commented on June 22, 2024

I also noticed a few weeks ago that Robert dropped support for UTAX in USEARCH10, so I'm not sure how long I will continue to support UTAX. I find that it sometimes does a better job than SINTAX, but it is hard to force people to stick with USEARCH9. That said, if there aren't major upgrades in v10, then I may stay with v9 for a little while yet.


nextgenusfs commented on June 22, 2024

And let me know what they say about the developer vs. regular version of the general FASTA release. Here are some quick stats on each of them (so yes, it looks like I probably won't use the developer version this time; I want the least processed sequences so that the primers are more likely to be included in the sequence):

fasta_stats.py sh_general_release_s_28/sh_general_release_dynamic_s_28.06.2017.fasta 
Reads:    58,639
AvgLen:   731 bp
Shortest: 216 bp
Longest:  7,491 bp
Total:    42,886,966 bp

fasta_stats.py sh_general_release_s_28/developer/sh_general_release_dynamic_s_28.06.2017_dev.fasta 
Reads:    58,639
AvgLen:   547 bp
Shortest: 140 bp
Longest:  2,764 bp
Total:    32,081,159 bp
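
The fasta_stats.py script itself is not shown in this thread. As a rough sketch only, assuming a plain multi-line FASTA file, the same five numbers could be computed like this (the function names and the demo records are made up for illustration and are not taken from amptk):

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def fasta_stats(handle):
    """Return read count, average, shortest, longest and total length."""
    lengths = list(fasta_lengths(handle))
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "avg_len": round(total / len(lengths)) if lengths else 0,
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
        "total": total,
    }


# Hypothetical two-record input, only to exercise the functions.
demo = StringIO(">a\nACGT\nAC\n>b\nACGTACGTAC\n")
stats = fasta_stats(demo)
print(stats)
```

For a real file, pass an open file handle instead of the StringIO demo, e.g. `with open(path) as fh: fasta_stats(fh)`.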


vmikk commented on June 22, 2024

There is also an issue with duplicated sequence IDs (USEARCH fails with an error because of them):

# Print the accession (2nd "|"-delimited header field), then list IDs seen more than once
awk 'BEGIN {FS = "|"} /^>/ {print $2}' \
  sh_general_release_dynamic_s_28.06.2017_dev.fasta \
  | sort | uniq -c | sort -rn \
  | awk '$1 > 1'
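
The same check can be sketched in Python, which may be easier to reuse inside a pipeline. This is only an illustration: it assumes the accession is the second "|"-delimited field of each header, as in the awk command above, and the function name and demo headers are invented for the example.

```python
from collections import Counter
from io import StringIO


def duplicated_accessions(handle, field=1, sep="|"):
    """Count header accessions and return those seen more than once.

    `field` is the zero-based index of the accession after splitting the
    header on `sep` (assumption: second field, matching awk's $2).
    """
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(sep)
            if len(parts) > field:
                counts[parts[field]] += 1
    return {acc: n for acc, n in counts.items() if n > 1}


# Invented headers: ACC0001 appears twice, ACC0002 once.
demo = StringIO(
    ">taxon_a|ACC0001|SH1|reps|k__Fungi\nACGT\n"
    ">taxon_b|ACC0002|SH2|reps|k__Fungi\nACGA\n"
    ">taxon_c|ACC0001|SH3|refs|k__Fungi\nACGT\n"
)
dups = duplicated_accessions(demo)
print(dups)
```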


nextgenusfs commented on June 22, 2024

FYI: https://twitter.com/unite_sh/status/882846103846236160. I'm still working on getting these pre-installed versions released, but I'm running into a memory error with UTAX on full-length ITS (I don't have 64-bit USEARCH). We'll see if the "updates" to v7.2 are any better. I'm worried that SINTAX will have the same memory problem.


nextgenusfs commented on June 22, 2024

Pre-built ITS databases have been updated to UNITE v7.2.


vmikk commented on June 22, 2024

Hello Jon!
Thanks for the update!

By the way, how did you handle duplicated sequences (e.g., KX909166)? In some cases they have different annotations, for example:

>unidentified|KX909166|SH640154.07FU|reps|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__unidentified;f__unidentified;g__unidentified;s__unidentified
>Nectria_dacryocarpa|KX909166|SH490415.07FU|reps_singleton|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Nectria;s__Nectria_dacryocarpa

PS. The UNITE team didn't reply to my message.


nextgenusfs commented on June 22, 2024

I'm not sure what they are doing with duplicate sequences in their DB, but the sequences of these two records are identical, so if you use the --derep_fulllength option of amptk database, it will remove the duplicates. Perhaps I should update that so it keeps the longer taxonomy string when sequences are identical.


vmikk commented on June 22, 2024

Yes, it would be great to preserve the longest taxonomy string. As I understand it, the first one is taken now, and in the case of KX909166 the shortest string was selected.
However, it is not really urgent because there are not many duplicates in the database.
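
One way the proposed behaviour could look, as a sketch only and not the actual amptk implementation: dereplicate on the full sequence and, among identical sequences, keep the header whose taxonomy is most informative. Raw string length is a weak proxy because "unidentified" is itself a long word, so this example first counts ranks that are not "unidentified" and uses string length only as a tie-breaker; that scoring rule is an assumption. The header layout is taken from the KX909166 example in this thread, while the sequences are placeholders rather than the real ones.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def taxonomy_score(taxonomy):
    """Score = (number of identified ranks, string length)."""
    ranks = [r for r in taxonomy.split(";") if r]
    identified = sum(
        1 for r in ranks if r.split("__", 1)[-1].lower() != "unidentified"
    )
    return (identified, len(taxonomy))


def derep_keep_best_taxonomy(records):
    """Collapse identical sequences, keeping the best-annotated header."""
    best = {}
    for header, seq in records:
        key = seq.upper()
        # Assumption: the taxonomy is the last "|"-delimited header field.
        score = taxonomy_score(header.rsplit("|", 1)[-1])
        if key not in best or score > best[key][0]:
            best[key] = (score, header, seq)
    return [(header, seq) for _, header, seq in best.values()]


# Headers from the thread; the sequences below are placeholders.
demo = StringIO(
    ">unidentified|KX909166|SH640154.07FU|reps|k__Fungi;p__Ascomycota;"
    "c__Sordariomycetes;o__unidentified;f__unidentified;g__unidentified;"
    "s__unidentified\nACGTACGT\n"
    ">Nectria_dacryocarpa|KX909166|SH490415.07FU|reps_singleton|k__Fungi;"
    "p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;"
    "g__Nectria;s__Nectria_dacryocarpa\nACGTACGT\n"
    ">other|ACC0002|SH0|reps|k__Fungi\nTTTTGGGG\n"
)
kept = derep_keep_best_taxonomy(read_fasta(demo))
for header, _ in kept:
    print(header.split("|")[0])
```

With this scoring, the Nectria_dacryocarpa header is kept for the duplicated sequence even though it appears second in the file.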


bsmoda commented on June 22, 2024

Hello @nextgenusfs
I'm trying to create a new ITS2 database from the UNITE reference with the primers used in my work, but I get this error message:

$ amptk database -i /mnt/data2/lbcb/projects/Bruno.Moda/databases/unite/v7.2/unite_insd/UNITE_public_01.12.2017.fasta -f GTGAATCATCGARTCTTTGAAC -r TATGCTTAAGTTCAGCGGGTA --primer_required none -o ITS2_unite_v7.2 --create_db usearch --install --source UNITE:7.2
usage: amptk-extract_region.py [options] -f
amptk-extract_region.py: error: unrecognized arguments: --primer_required none --install --source UNITE:7.2

I'm using amptk within a conda env, but with the up-to-date repo from git.
I've tried without the unrecognized arguments, but got a different error:

$ amptk database -i /mnt/data2/lbcb/projects/Bruno.Moda/databases/unite/v7.2/unite_insd/UNITE_public_01.12.2017.fasta -f GTGAATCATCGARTCTTTGAAC -r TATGCTTAAGTTCAGCGGGTA -o ITS2_unite_v7.2 --create_db usearch
Traceback (most recent call last):
  File "/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/bin/amptk-extract_region.py", line 411, in <module>
    amptklib.setupLogging(log_name)
  File "/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/lib/amptklib.py", line 1637, in setupLogging
    fhnd = logging.FileHandler(LOGNAME)
  File "/mnt/data2/lbcb/conda/envs/amptk/lib/python2.7/logging/__init__.py", line 920, in __init__
    StreamHandler.__init__(self, self._open())
  File "/mnt/data2/lbcb/conda/envs/amptk/lib/python2.7/logging/__init__.py", line 950, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: u'/mnt/data2/lbcb/conda/envs/amptk/opt/amptk-1.2.4/DB/ITS2_unite_v7.2.log'

What should I do?
Thanks!


nextgenusfs commented on June 22, 2024

You can open a new issue if you have a problem instead of appending to this closed one. You can try to uninstall v1.2.4 from conda (conda remove amptk) and then install the new version via pip (pip install amptk). I've been having many conda issues the last few weeks and thus haven't pushed an update to conda, as it isn't working for me at the moment. Conda has been giving us some permissions errors as well, so it's hard to know if this is related or not.

