Is there anything in the structure of amptk that would prohibit using Nanopore amplico

nextgenusfs commented on June 17, 2024

Nanopore

from amptk.

Comments (11)

nextgenusfs commented on June 17, 2024

I was thinking the same thing when I saw your tweet this morning. It should work with a few mods, but I'm less sure it will work right out of the box until I have some data to test with. Quality trimming with expected errors won't work as-is because it is too stringent, and AMPtk currently uses it for all "clustering" steps. You could of course set the expected errors value really high to bypass quality trimming altogether. I thought I saw a paper on bioRxiv about a nanopore pipeline for 16S (this seems really rough around the edges though: https://github.com/umerijaz/nanopore) and I think Schloss recently published one for PacBio reads, which means it's in mothur and is horrible to use....
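To illustrate why an expected-errors filter tuned for Illumina is too strict here: expected errors (EE) is the sum of the per-base error probabilities, 10^(-Q/10). A minimal Python sketch, assuming Phred+33 quality strings; the function name is made up for illustration and is not part of AMPtk:

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


# A 250 bp read at Q30 throughout vs. a 1500 bp read at Q10 throughout.
short_read_quals = "?" * 250    # '?' encodes Q30 in Phred+33
long_read_quals = "+" * 1500    # '+' encodes Q10 in Phred+33

print(round(expected_errors(short_read_quals), 2))  # 0.25
print(round(expected_errors(long_read_quals), 2))   # 150.0
```

So a cutoff around EE 1.0 that passes the short high-quality read would reject essentially every long, low-quality read, which is why the threshold has to be raised a lot or switched off.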

I'm not sure how to deal with the "clustering", or whether you need to keep each sequence separate. But basically there isn't a reason you couldn't run normal clustering at, say, 97% and see what happens. The reference-based clustering could be useful in AMPtk as well. If you have some data and want to share, I can see if it works and whether I need to make some tweaks. It would probably only take a few hours to get something that works well enough to get through some testing.

nextgenusfs commented on June 17, 2024

In the past I've used Porechop (https://github.com/rrwick/Porechop) for demuxing and adapter trimming (Ryan writes really nice tools). I actually think I would not quality trim the data at all; it would be better to leave the ends intact and make sure you can find primers (if there are any) -- then you know what a full-length read looks like.
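A rough Python sketch of the "find primers to define a full-length read" idea. Assumptions to note: it only counts substitutions (no indels, which Nanopore reads have plenty of, so a real tool would need a proper alignment), it checks one read orientation only, and the function names are hypothetical rather than anything in AMPtk or Porechop:

```python
# IUPAC codes so degenerate primers can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, IUPAC-aware."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(read, primer, max_mismatch=2):
    """Return the start of the first window within max_mismatch of primer, or -1."""
    n = len(primer)
    for start in range(len(read) - n + 1):
        mismatches = 0
        for base, code in zip(read[start:start + n], primer):
            if base not in IUPAC.get(code, ""):
                mismatches += 1
                if mismatches > max_mismatch:
                    break
        else:  # inner loop finished without exceeding the mismatch budget
            return start
    return -1


def trim_full_length(read, fwd, rev, max_mismatch=2):
    """Return the sequence between both primers, or None if either is missing."""
    f = find_primer(read, fwd, max_mismatch)
    if f < 0:
        return None
    after_fwd = f + len(fwd)
    r = find_primer(read[after_fwd:], revcomp(rev), max_mismatch)
    if r < 0:
        return None
    return read[after_fwd:after_fwd + r]


# Toy example with made-up 6 bp primers.
read = "TT" + "ACGTAC" + "GATTACAGATTACA" + "GGCCTT" + "AA"
insert = trim_full_length(read, "ACGTAC", "AAGGCC", max_mismatch=0)
```

Reads where `trim_full_length` returns None would be treated as not full length and set aside.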

nextgenusfs commented on June 17, 2024

If I remember correctly, both Porechop and now Albacore demux files into separate folders. So something like how amptk illumina works should be fairly easy to write: basically lift the sample name from the folder the read resides in. FASTQ headers need to have ;barcodelabel=sample_name; in them to properly get through the AMPtk steps, and that script could additionally look for primer sites to anchor to. After that, I think you could just run amptk cluster on that dataset and set a very high expected errors value, or I can write a switch to turn that off. Let me know if this would be helpful; I'm always trying to make AMPtk as useful as possible.
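Something along these lines is what I mean by lifting the sample name from the folder. This is only a sketch: it assumes plain (uncompressed) 4-line FASTQ records in a layout like root/sample/*.fastq, the exact header layout beyond the ;barcodelabel=...; tag is a guess, and the function names are hypothetical:

```python
from pathlib import Path


def relabel_fastq(lines, sample):
    """Yield FASTQ lines with ';barcodelabel=<sample>;' appended to each read ID."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:                      # header line of a 4-line record
            read_id = line[1:].split()[0]   # drop '@' and any description text
            yield f"@{read_id};barcodelabel={sample};"
        else:
            yield line


def relabel_demuxed_dir(root, out_path):
    """Combine root/<sample>/*.fastq into one FASTQ labelled by folder name.

    Returns the number of reads written.
    """
    line_count = 0
    with open(out_path, "w") as out:
        for fastq in sorted(Path(root).glob("*/*.fastq")):
            sample = fastq.parent.name      # folder name becomes the sample name
            with open(fastq) as handle:
                for line in relabel_fastq(handle, sample):
                    out.write(line + "\n")
                    line_count += 1
    return line_count // 4
```

For example, `relabel_demuxed_dir("demuxed", "combined.fastq")` would walk demuxed/barcode01/, demuxed/barcode02/, and so on (both paths are placeholders).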

nextgenusfs commented on June 17, 2024

Actually I think this already exists: the amptk SRA demuxing script takes a folder as input and processes the FASTQ files in that folder, i.e. the file name is used as the sample name.

myfolder:
    barcode01.fastq
    barcode02.fastq

So in the above folder, if you ran this command it should label everything:

amptk SRA -i myfolder -l 1500 --min_len 1200 -o output \
    -f ATACCGGGAGA -r AGAGATTAGAGAG --require_primer off

This would then relabel all sequences in barcode01.fastq as ;barcodelabel=barcode01; and so on. It would find and trim primers (if needed), then drop sequences shorter than 1200 bp and trim reads to 1500 bp if they are longer than that.
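The length handling described above amounts to something like this Python sketch. It mirrors the description only, not AMPtk's actual code, and the function name and record layout are assumptions:

```python
def filter_and_trim(records, min_len=1200, trim_len=1500):
    """Drop reads shorter than min_len; truncate longer reads to trim_len.

    records is an iterable of (header, sequence, quality) tuples.
    """
    for header, seq, qual in records:
        if len(seq) < min_len:
            continue                        # too short: discard
        yield header, seq[:trim_len], qual[:trim_len]


reads = [
    ("@a;barcodelabel=barcode01;", "A" * 1000, "I" * 1000),  # dropped
    ("@b;barcodelabel=barcode01;", "C" * 1300, "I" * 1300),  # kept as-is
    ("@c;barcodelabel=barcode01;", "G" * 1800, "I" * 1800),  # trimmed to 1500
]
kept = list(filter_and_trim(reads))
```

Reads between the two values (1200 to 1500 bp) pass through at their original length in this sketch.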

You could then take the resulting output.demux.fq.gz file into amptk cluster, i.e.:

amptk cluster -i output.demux.fq.gz --minsize 1 -o output -e 100

Note this will keep singletons which you probably want to do (default is --minsize 2). You could get a better idea about what expected error value to use by running the following command on your input reads and investigating a little bit:

vsearch --fastq_eestats2 test.fastq --output test.txt \
    --ee_cutoffs 1,2,5,10,50,100 --length_cutoffs 500,2000,100

This will tell you how many reads would be retained at various EE values and lengths.
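If it helps to see what that table means, here is a small Python sketch of the same idea as I understand it: for each length cutoff, truncate the read to that length and count how many reads fall at or under each EE cutoff. It assumes Phred+33 qualities and is not a substitute for the vsearch output; the function name is made up:

```python
def ee_retention_table(quals, ee_cutoffs, lengths, offset=33):
    """Count reads that, truncated to each length, have EE <= each cutoff.

    Returns {length: [count for each cutoff in ee_cutoffs]}.
    """
    table = {length: [0] * len(ee_cutoffs) for length in lengths}
    for qual in quals:
        for length in lengths:
            if len(qual) < length:
                continue                    # read is shorter than this length cutoff
            ee = sum(10 ** (-(ord(c) - offset) / 10) for c in qual[:length])
            for i, cutoff in enumerate(ee_cutoffs):
                if ee <= cutoff:
                    table[length][i] += 1
    return table


# Two toy reads: 1000 bp at Q10 and 600 bp at Q30.
toy_quals = ["+" * 1000, "?" * 600]
table = ee_retention_table(toy_quals, ee_cutoffs=[1, 60, 120], lengths=[500, 1000])
```

Reading across a row shows how quickly reads are lost as the EE cutoff tightens at a given length, which is the number to look at when picking a value for -e.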

devonorourke commented on June 17, 2024

Thanks Jon,
I'm generating these data in the next couple of weeks. I'll let you know how testing goes soon. I'll send you some test data if you'd like?
Devon

nextgenusfs commented on June 17, 2024

Yeah, that would be great. I have a nanopore, but haven't used it for amplicons. I probably got one a little too early, when the data wasn't as good as what I read comes off the instrument now. Seems like the kits/technology change overnight...

druvus commented on June 17, 2024

@devonorourke @nextgenusfs Any updates on using amptk with nanopore?

I have been playing around a little with my nanopore amplicons (16S, 16S23S, ITS, 18S) but I am not able to get nice clustering, so I thought you might have some further recommendations.

nextgenusfs commented on June 17, 2024

@druvus I haven't seen any data yet, so I haven't looked at it specifically. I should be able to come up with a method if a mock community was sequenced. Does anybody know if that data is public somewhere? Reference-based clustering should work in theory, although the best aligner for that would probably be minimap2 (not currently in AMPtk). For a de novo approach, I would think something like: quality trimming (find forward/reverse primers, some sort of Q-filter), find uniques, then some sort of pre-clustering to find "centroids" (I would think there are too many errors for 100% dereplication to be very effective in determining quality), and then mapping reads to those sequences using minimap2?
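To make the "pre-clustering to find centroids" step concrete, here is a toy greedy centroid clustering in Python. Big caveats: identity is computed from plain edit distance, reads are processed longest first (my assumption, real tools may order by abundance instead), and the quadratic dynamic programme is far too slow for real 1.5 kb datasets, so treat it as an illustration of the idea rather than what vsearch or AMPtk do:

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def identity(a, b):
    """Fractional identity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid at >= threshold identity.

    Sequences that match no existing centroid become new centroids.
    """
    centroids, members = [], []
    for seq in sorted(seqs, key=len, reverse=True):
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                members[idx].append(seq)
                break
        else:
            centroids.append(seq)
            members.append([seq])
    return centroids, members


centroids, members = greedy_cluster(
    ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"], threshold=0.85
)
```

With noisy reads, a read error that lands in the first sequence of a cluster ends up in the centroid too, which is part of why mapping back with a separate aligner afterwards is attractive.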

devonorourke commented on June 17, 2024

I generated a tiny bit of 16S data a few months ago; it was a totally failed experiment I was trying as part of a one-week high school workshop (apparently bad reagents killed 3 flow cells in a day... ouch). It generated maybe 2000 total reads, so probably not enough to really flesh out how well amptk can handle these kinds of data.
@nextgenusfs - I'll be in London at the Nanopore conference starting Wednesday and can ask around for public data; nothing comes to mind at the moment.
I'm guessing the workflow could look something like you proposed:
sequence --> Albacore --> Porechop --> Minimap2.
In my tiny dataset I just used USEARCH to do everything and it seemed to generate an output that was expected (we swabbed the mouths of animals and got back bacteria commonly found in mouths of animals). I scribbled down that pretty standard code here.

If you wanted to do a de novo approach first, you could try miniasm on the front end. You won't be looking for forward/reverse primers, I don't think, will you? If you're base-calling with nanopore data, your first task is converting the raw signal from a .fast5 file to a .fastq; at that point you'll have your demultiplexed dataset. Porechop will also demultiplex if you want to; nevertheless you should probably have already split your reads before assembling or mapping.

Cheers,
Devon

nextgenusfs commented on June 17, 2024

Well, let me know if you find some data that has a mock community... While it certainly depends on your experimental goal, I'm assuming here that we are talking about PCR amplicons, and I very much would use the conserved forward/reverse priming regions to enforce "full-length" sequences for "OTU-picking" (to use classical terminology). What length amplicons are you talking about here? If 1.5 kb or so, it should be easy for Nanopore to sequence across the entire length of these amplicons. While Porechop would be good for removing adapter sequence, I'm assuming that your initial region-specific PCR primers would still be intact. So then: 1) pick out only sequences that are full-length, 2) run dereplication, 3) cluster, 4) map reads to "OTUs" using minimap2. I would need to write a PAF/SAM to OTU_table script, but I wouldn't think that would be too difficult.
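A first pass at the PAF to OTU table step could look roughly like the Python below. This script does not exist in AMPtk; it is a sketch that assumes the standard 12 mandatory PAF columns (query name in column 1, target name in column 6, matching bases in column 10, alignment block length in column 11), that read names carry the ;barcodelabel=sample; tag, and that the best hit per read is the one with the most matching bases:

```python
import re
from collections import defaultdict

LABEL = re.compile(r"barcodelabel=([^;]+);")


def paf_to_otu_table(paf_lines, min_identity=0.0):
    """Count reads per OTU and sample, using each read's best PAF hit."""
    best = {}  # read name -> (matching bases, OTU name)
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue                        # not a complete PAF record
        query, target = cols[0], cols[5]
        matches, block_len = int(cols[9]), int(cols[10])
        if block_len == 0 or matches / block_len < min_identity:
            continue
        if query not in best or matches > best[query][0]:
            best[query] = (matches, target)

    table = defaultdict(lambda: defaultdict(int))
    for query, (_, otu) in best.items():
        found = LABEL.search(query)
        sample = found.group(1) if found else "unknown"
        table[otu][sample] += 1
    return {otu: dict(samples) for otu, samples in table.items()}


example_paf = [
    "r1;barcodelabel=barcode01;\t1500\t0\t1500\t+\tOTU1\t1500\t0\t1500\t1400\t1500\t60",
    "r1;barcodelabel=barcode01;\t1500\t0\t900\t+\tOTU2\t1500\t0\t900\t700\t900\t10",
    "r2;barcodelabel=barcode02;\t1500\t0\t1500\t-\tOTU1\t1500\t0\t1500\t1350\t1500\t60",
    "r3;barcodelabel=barcode01;\t1400\t0\t1400\t+\tOTU2\t1500\t0\t1400\t1300\t1400\t60",
]
otu_table = paf_to_otu_table(example_paf)
```

The nested dictionary (OTU name to per-sample counts) would still need to be written out in whatever tab-delimited layout the downstream steps expect.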

I would be concerned about using something like miniasm, as there should be many sequences in a community that are 95-97% identical yet are unique OTUs, so I'm not sure collapsing/assembling those reads would yield the desired result.

nextgenusfs commented on June 17, 2024

I should add -- we have a MinION, but I have only used it for long reads for genome assembly and have not run any of the PCR/amplicon procedures -- so I'm not very familiar with the adapters/primers/etc.
