Comments (6)
My primary use for environmental data is floristic work. 97% clustering is generally good for fungi, but if you are looking for species-level taxonomic resolution, many species will be clustered together that should not be, and some species will be split across separate OTUs. Examining the composition of an OTU is helpful when looking closely at individual OTUs.
from amptk.
I wouldn't be looking so much for all of the individual reads, but rather the representative sequences that make up the OTUs.
The OTU sequence in UPARSE is, by definition, the most abundant sequence in each cluster. It comes from dereplicating the data to find identical sequences, which are then sorted by abundance (the assumption being that the more abundant a sequence is, the less likely it is to be an error); the algorithm then moves from the most abundant to the least abundant sequence, defining clusters at the 97% threshold. So essentially all reads that map to the OTU at 100% identity are representative sequences. Other OTU-picking algorithms identify a "centroid" sequence, which can be interpreted as the representative sequence; UPARSE by default uses the representative (most abundant) sequence as the OTU.
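As a rough illustration of the dereplicate, sort-by-abundance, cluster-at-97% procedure described above, here is a minimal Python sketch. It is not UPARSE's implementation: it skips UPARSE's chimera filtering, and it approximates identity as 1 minus the edit distance divided by the longer sequence length instead of the alignment-based identity USEARCH reports. The function names (`levenshtein`, `identity`, `greedy_cluster`) are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Rough identity as a fraction: 1 - edit distance / longer length.
    Simplification (assumption); USEARCH derives identity from an alignment."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def greedy_cluster(reads, threshold: float = 0.97):
    """Abundance-sorted greedy clustering; returns (centroids, members)."""
    # 1. Dereplicate: collapse identical reads and count each unique sequence.
    uniques = Counter(reads)
    # 2. Sort uniques from most to least abundant (ties broken alphabetically).
    ordered = sorted(uniques.items(), key=lambda kv: (-kv[1], kv[0]))
    centroids = []  # OTU sequences: the most abundant member of each OTU
    members = {}    # centroid -> unique sequences assigned to that OTU
    for seq, _count in ordered:
        # 3. Join the most similar existing OTU if it is within the threshold...
        best = max(centroids, key=lambda c: identity(seq, c), default=None)
        if best is not None and identity(seq, best) >= threshold:
            members[best].append(seq)
        else:
            # ...otherwise this sequence founds a new OTU and becomes its centroid.
            centroids.append(seq)
            members[seq] = [seq]
    return centroids, members
```

Because the uniques are processed in order of decreasing abundance, every centroid is the most abundant sequence in its cluster, which is why the OTU sequence and the representative sequence coincide in this scheme.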
There are many reasons that 97% is used for establishing OTUs, and ITS is more complicated than other amplicons such as 16S because there are many examples of intra-isolate variation in ITS; that is, a single isolate can have multiple ITS sequences that are less than 97% identical to each other (this happens quite a bit). So getting species-level resolution isn't really possible for some groups, while for other groups ITS works well. I've also seen different species with the same ITS sequence, so the idea behind OTUs is really in the name, "operational taxonomic unit": it isn't a proxy for species (although it's easier to think of it that way).
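To put the 97% threshold in concrete terms, the arithmetic below shows how many differences two equal-length sequences can carry and still land in the same OTU. It uses integer percentages to avoid floating-point rounding and counts substitutions only (no indels). The 250 bp length is a hypothetical example, not a measured ITS length, and the helper names are made up for illustration.

```python
def max_differences(length: int, threshold_pct: int = 97) -> int:
    """Most differences two sequences of `length` bp can have and still be
    >= threshold_pct percent identical (integer arithmetic, no rounding error)."""
    return length * (100 - threshold_pct) // 100


def same_otu(differences: int, length: int, threshold_pct: int = 97) -> bool:
    """True if `differences` out of `length` positions keeps identity >= threshold."""
    return (length - differences) * 100 >= threshold_pct * length


if __name__ == "__main__":
    length = 250  # hypothetical amplicon length
    print(max_differences(length))  # 7
    print(same_otu(7, length))      # True  (97.2% identical)
    print(same_otu(8, length))      # False (96.8% identical)
```

So, for a 250 bp amplicon, two ITS copies from the same isolate that differ at 8 or more positions fall below 97% identity and can end up in separate OTUs.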
You can give the DADA2 or UNOISE2 algorithms a try; they are a "new breed" of algorithms that try to error-correct sequences rather than cluster them. DADA2 has been shown to be accurate to a single base pair (but currently requires all sequences to be trimmed to a set length), whereas UNOISE2 claims to be better than or equal to DADA2 (these are all author claims, of course, and all were tested with 16S data). Both algorithms are incorporated into ufits (the former name of AMPtk).
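For contrast with clustering, here is a minimal Python sketch of UNOISE-style denoising. It assumes the abundance-skew rule from Edgar's UNOISE2 description as I understand it: a less abundant unique sequence at edit distance d from an already accepted sequence is treated as an error of it when the ratio of their abundances (the skew) is at most β(d) = 1 / 2^(αd + 1), with α = 2. The `minsize` cutoff of 8 is also an assumption. It omits chimera filtering and is not the USEARCH, VSEARCH, or AMPtk code; the function names are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def beta(d: int, alpha: float = 2.0) -> float:
    """Largest skew at which a sequence d edits away from a more abundant one
    is treated as its error (assumed UNOISE2-style rule)."""
    return 1.0 / 2 ** (alpha * d + 1)


def unoise_like(reads, alpha: float = 2.0, minsize: int = 8):
    """Toy denoiser: returns (accepted sequences, unique -> accepted parent)."""
    uniques = Counter(reads)
    # Drop low-abundance uniques, then process the rest from most to least abundant.
    ordered = sorted(
        ((seq, n) for seq, n in uniques.items() if n >= minsize),
        key=lambda kv: (-kv[1], kv[0]),
    )
    accepted = []   # (sequence, abundance) of sequences judged to be real
    parent_of = {}  # unique sequence -> accepted sequence it was assigned to
    for seq, size in ordered:
        parent = None
        for zseq, zsize in accepted:
            skew = size / zsize
            if skew <= beta(levenshtein(seq, zseq), alpha):
                parent = zseq  # explained as an error of a more abundant sequence
                break
        if parent is None:
            accepted.append((seq, size))  # too abundant to be an error: keep it
            parent = seq
        parent_of[seq] = parent
    return [seq for seq, _ in accepted], parent_of
```

For example, with a dominant 40 bp sequence seen 100 times, a one-base variant seen 10 times (skew 0.1, at most β(1) = 0.125) is folded into it as an error, while a different one-base variant seen 40 times (skew 0.4) is kept as its own sequence. A 97% clustering threshold would have merged both variants with the dominant sequence, since 1 difference in 40 bp is 97.5% identity.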
I've done a bit of testing of different clustering methods with curated reference datasets, mostly using the centroid clustering approach. I am well aware that ITS OTU clusters are not a perfect proxy for species-level resolution, but they are very informative in most cases for macrofungi. I will check out the DADA2 and UNOISE algorithms.
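When testing against a curated reference dataset, one way to quantify the two failure modes from the first comment (different species lumped into one OTU, and one species split across several OTUs) is to tabulate which OTU each reference sequence of known species lands in. A minimal sketch, assuming you already have (species, OTU) pairs from mapping the reference sequences; the function name, input format, and placeholder species/OTU labels are hypothetical.

```python
from collections import defaultdict


def lumping_and_splitting(assignments):
    """Summarize OTU quality against a labelled reference set.

    `assignments` is an iterable of (species, otu_id) pairs, one per
    reference sequence (hypothetical input format). Returns (lumped, split):
      lumped: OTUs containing reference sequences from more than one species
      split:  species whose reference sequences fall into more than one OTU
    """
    otu_species = defaultdict(set)
    species_otus = defaultdict(set)
    for species, otu in assignments:
        otu_species[otu].add(species)
        species_otus[species].add(otu)
    lumped = {otu: sorted(spp) for otu, spp in otu_species.items() if len(spp) > 1}
    split = {sp: sorted(otus) for sp, otus in species_otus.items() if len(otus) > 1}
    return lumped, split


# Usage with placeholder labels:
pairs = [("species_A", "OTU1"), ("species_B", "OTU1"),
         ("species_C", "OTU2"), ("species_C", "OTU3")]
lumped, split = lumping_and_splitting(pairs)
# lumped == {"OTU1": ["species_A", "species_B"]}
# split  == {"species_C": ["OTU2", "OTU3"]}
```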
Just an FYI: the software automatically tries to install DADA2 in a location on the compute cluster that I do not have write permissions for. I am going to try it on a server where I have better control over the entire file structure.
You should install DADA2 manually; the software will then not try to install it automatically.
Related Issues (20)
- Issue installing AMPtk (Mac OS - M1 chip)
- getting NoneType vs int error in clustering step
- Error when run quick start
- usearch9 not found when generate UTAX database
- VSEARCH error on amptk -filter step
- Support Python 3.8 onwards
- SyntaxError in "duplicate ID in mapping file: XXX, exiting"
- Default for -p, --index_bleed documented as 0.005
- Typo "Bjerkandara adusta" --> "Bjerkandera adusta"
- Missing species names in amptk_mock1.fa
- Missing final new line in amptk_mock1.fa and amptk_synmock.fa
- Inconsistent primer trimming sequence in amptk_mock*.fa
- Matching MockA, MockB1 and MockB2 to FASTQ filenames
- platform.linux_distribution is removed since Python 3.8
- Species names in amptk_mock2.fa and amptk_mock3.fa vs Figure 4
- new users cannot install amptk properly, please help
- unoise3 clustering
- Problem with TypeError during AMPtk cluster
- Saw you started some prelim ONT methods
- Problematic unoise3 implementation with VSEARCH