Question was: One of the optional outputs with other software is a file that lists

Composition of each OTU? about amptk HOT 6 CLOSED

nextgenusfs commented on September 24, 2024

Composition of each OTU?


Comments (6)

MycoMap commented on September 24, 2024

My primary interest in environmental data is for floristic efforts. 97% clustering is good generally for fungi, but if you are looking for species-level taxonomic resolution, many species will be clustered together that should not be, and there will be separate OTUs for the same species. Further examination of OTU composition is helpful when looking closely at individual OTUs.


MycoMap commented on September 24, 2024

I wouldn't be looking so much for all of the individual reads, but rather the representative sequences that make up the OTUs.


nextgenusfs commented on September 24, 2024

The OTU sequence in UPARSE is by definition the most abundant sequence in each cluster. The data are first dereplicated to find identical sequences, which are then sorted by abundance (the assumption being that the more abundant a sequence is, the less likely it is to be an error). The algorithm then moves from most abundant to least abundant, defining clusters at the 97% threshold. So essentially all reads that map to the OTU at 100% are representative sequences. Other OTU picking algorithms identify a "centroid" sequence, which can be interpreted as a representative sequence; UPARSE by default uses the representative sequence as the consensus OTU.
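
To make the question about OTU composition concrete, the greedy procedure described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the UPARSE or amptk implementation: `identity` and `greedy_cluster` are hypothetical names, identity is counted position by position on equal-length sequences rather than from an alignment, and chimera filtering is left out entirely.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions (toy stand-in for alignment-based identity).

    Assumption: sequences of different length are treated as non-matching.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Dereplicate, sort by abundance, then cluster greedily around seed sequences."""
    # Dereplicate: count identical sequences.
    counts = Counter(reads)
    # Most abundant first; ties broken by sequence so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    otus = []  # each OTU: {"seed": sequence, "members": {sequence: read count}}
    for seq, n in ordered:
        # Find the existing seed most similar to this unique sequence.
        best = max(otus, key=lambda o: identity(seq, o["seed"]), default=None)
        if best is not None and identity(seq, best["seed"]) >= threshold:
            best["members"][seq] = n
        else:
            # Nothing close enough: this sequence seeds a new OTU.
            otus.append({"seed": seq, "members": {seq: n}})
    return otus
```

Each OTU's `members` dictionary lists the unique sequences, with read counts, that fell into that cluster, which is the kind of per-OTU composition listing the question asks about.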

There are lots of reasons that 97% is used for establishing OTUs, and ITS is more complicated than other amplicons such as 16S, as there are many examples of intragenomic variation in ITS; that is, a single isolate has multiple ITS sequences that may be less than 97% identical (this happens quite a bit). So getting species-level resolution isn't really possible for some groups, while for other groups ITS works well. I've also seen different species with the same ITS sequence, so really the idea behind OTUs is in the name, "operational taxonomic unit": it isn't really a proxy for species (although it's easier to think that it is).

You can give the DADA2 or UNOISE2 algorithm a try; they are a "new breed" of algorithms that try to error-correct sequences as opposed to clustering them. DADA2 has been shown to be accurate to a single base pair (but currently requires that all sequences be trimmed to a set length), whereas UNOISE2 claims to be better than or equal to DADA2 (all are author claims, of course, and all were tested with 16S data). Both algorithms are incorporated into ufits.
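
As a rough illustration of what "trimmed to a set length" means in practice, here is a minimal sketch. The helper name `truncate_reads` is hypothetical and is not part of DADA2, UNOISE2, or ufits; it also assumes that reads shorter than the target length are simply discarded, which may differ from what a real pipeline does.

```python
def truncate_reads(reads, length):
    """Cut every read to exactly `length` bases.

    Assumption: reads shorter than `length` are dropped rather than padded.
    """
    return [read[:length] for read in reads if len(read) >= length]
```

For example, `truncate_reads(["ACGTAC", "ACG", "ACGT"], 4)` keeps the first and third reads, both cut to four bases, and drops the second.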


MycoMap commented on September 24, 2024

I've done a bit of testing of different clustering methods with curated reference datasets. Most of this involved the centroid clustering methodology. I am well aware that ITS OTU clusters are not a perfect proxy for species-level resolution, but they are very informative in most cases for macrofungi. I will check out the DADA2 and UNOISE algorithms.


MycoMap commented on September 24, 2024

Just an FYI, the software automatically looks to install DADA2 in a location on the compute cluster that I do not have write permissions for. I am going to try it out on a server where I have better control over the entire file structure.


nextgenusfs commented on September 24, 2024

You should install DADA2 manually; the software will then not try to install it automatically.

