nextgenusfs avatar nextgenusfs commented on June 17, 2024

I was thinking the same thing when I saw your tweet this morning. It should probably work with a few mods, but I'm less sure it would work right out of the box unless I had some data to test with. Quality trimming with expected errors won't work as it is too stringent, and currently AMPtk uses that for all "clustering" steps. But you could of course set that really high to bypass quality trimming altogether. I thought I saw a paper on bioRxiv about a nanopore pipeline for 16S (this seems really rough around the edges though https://github.com/umerijaz/nanopore) and I think Schloss has one for PacBio reads recently - which means it's in mothur and is horrible to use....

I'm not sure how to deal with the "clustering" or whether you need to keep each sequence separate or not. But basically there isn't a reason you couldn't run normal clustering at say 97% and see what happens. The reference based clustering could be useful in AMPtk as well. If you have some data and want to share, I can see if it works and whether I need to make some tweaks. It would probably only take a few hours to get something that works at least well enough to get through some testing.

from amptk.

nextgenusfs avatar nextgenusfs commented on June 17, 2024

In the past I've used PoreChop https://github.com/rrwick/Porechop for demuxing and adapter trimming (Ryan writes really nice tools). I think I would not quality trim the data at all actually, it would be better to leave the ends intact and make sure you can find primers (if there are any) -- then you know what a full length read looks like.
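
To make the "find primers so you know what a full length read looks like" idea concrete, here is a minimal Python sketch. It is not AMPtk code: the function names and the mismatch limit are hypothetical, and it only counts substitutions. Nanopore reads also carry insertions and deletions, so a real implementation would likely need an edit-distance search; degenerate IUPAC primer bases are not handled either.

```python
def revcomp(seq):
    """Reverse complement; anything other than A/C/G/T becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_primer(read, primer, max_mismatch=2):
    """Return the start of the first window within max_mismatch substitutions
    of the primer, or -1 if no such window exists."""
    k = len(primer)
    for i in range(len(read) - k + 1):
        mismatches = sum(a != b for a, b in zip(read[i:i + k], primer))
        if mismatches <= max_mismatch:
            return i
    return -1


def is_full_length(read, fwd, rev, max_mismatch=2):
    """True if the forward primer is found and the reverse complement of the
    reverse primer is found somewhere after it."""
    start = find_primer(read, fwd, max_mismatch)
    if start < 0:
        return False
    rest = read[start + len(fwd):]
    return find_primer(rest, revcomp(rev), max_mismatch) >= 0
```

Reads that pass a check like this could be kept untrimmed, as suggested above, with the primers serving as anchors for the amplicon ends.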

nextgenusfs avatar nextgenusfs commented on June 17, 2024

If I remember correctly, both PoreChop and now Albacore demux files into separate folders. So something like how amptk illumina works should be somewhat easy to write: basically lift the sample name from the folder the read resides in. FASTQ headers need to have ;barcodelabel=sample_name; in them to properly get through the AMPtk steps, and additionally that script could look for primer sites to anchor to. After that, I think you could just run amptk cluster on that dataset and set a very high expected errors value, or I can write a switch to turn that off. Let me know if this is something that would be helpful, always trying to make AMPtk as useful as possible.
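
A rough Python sketch of the relabeling step described above, under a few assumptions: one sub-folder per sample, plain 4-line FASTQ records, and a header layout of read ID followed by ;barcodelabel=sample_name;. The function names are hypothetical and this is not an existing AMPtk script; only the ;barcodelabel=...; tag itself comes from the comment above.

```python
from pathlib import Path


def relabel_fastq(lines, sample):
    """Yield FASTQ lines with ';barcodelabel=<sample>;' added to each header.

    Assumes simple 4-line records (no wrapped sequence lines).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header line: keep only the read ID, then append the label.
            read_id = line.split()[0]
            yield f"{read_id};barcodelabel={sample};"
        else:
            yield line


def relabel_demuxed_dir(root):
    """Collect relabeled lines from every <root>/<sample>/*.fastq file,
    taking the sample name from the folder each file sits in."""
    out = []
    for fq in sorted(Path(root).glob("*/*.fastq")):
        sample = fq.parent.name
        with fq.open() as handle:
            out.extend(relabel_fastq(handle, sample))
    return out
```

Writing the result of relabel_demuxed_dir() to a single file would give one combined FASTQ in which every read carries its sample label.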

nextgenusfs avatar nextgenusfs commented on June 17, 2024

Actually I think this already exists, the amptk SRA demuxing script takes a folder as input and fastq files in that folder get processed, i.e. the file name is used as sample name.

myfolder:
    barcode01.fastq
    barcode02.fastq

So in the above folder, if you ran this command it should label everything:

amptk SRA -i myfolder -l 1500 --min_len 1200 -o output \
    -f ATACCGGGAGA -r AGAGATTAGAGAG --require_primer off 

This would then relabel all sequences in barcode01.fastq as ;barcodelabel=barcode01; and so on, it would find and trim primers (if needed) and then drop sequences shorter than 1200 bp and trim reads to 1500 bp if they are longer than that.

You could then take the resulting output.demux.fq.gz file into amptk cluster, i.e.:

amptk cluster -i output.demux.fq.gz --minsize 1 -o output -e 100 

Note this will keep singletons which you probably want to do (default is --minsize 2). You could get a better idea about what expected error value to use by running the following command on your input reads and investigating a little bit:

vsearch --fastq_eestats2 test.fastq --output test.txt \
    --ee_cutoffs 1,2,5,10,50,100 --length_cutoffs 500,2000,100

This will tell you how many reads would be retained at various EE values and lengths.
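
For anyone unfamiliar with the expected error (EE) value being discussed: it is the sum of the per-base error probabilities implied by the quality scores, where a Phred score Q corresponds to an error probability of 10^(-Q/10). The Python sketch below shows that calculation and a simple count of reads retained at several cutoffs. It assumes Phred+33 encoded quality strings, ignores the length cutoffs, and uses hypothetical function names; it is an illustration of the idea rather than a reimplementation of the vsearch report.

```python
def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(char) - offset) / 10) for char in qual)


def count_retained(quals, ee_cutoffs):
    """For each cutoff, count the reads whose expected errors are <= cutoff."""
    ees = [expected_errors(q) for q in quals]
    return {cutoff: sum(ee <= cutoff for ee in ees) for cutoff in ee_cutoffs}
```

A 1500 bp read with a uniform quality of Q10 has about 150 expected errors, which shows why cutoffs tuned for short high-quality reads discard nearly all long, noisy reads.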

devonorourke avatar devonorourke commented on June 17, 2024

Thanks Jon,
I'm generating these data in the next couple of weeks. I'll let you know how testing goes soon. I'll send you some test data if you'd like?
Devon

nextgenusfs avatar nextgenusfs commented on June 17, 2024

Yeah that would be great. I have a nanopore, but haven't used it for amplicons. I probably got one a little too early, when the data wasn't as good as what I read comes off it now. Seems like the kits/technology change overnight...

druvus avatar druvus commented on June 17, 2024

@devonorourke @nextgenusfs Any updates on using amptk with nanopore?

I have been playing around a little with my nanopore amplicons (16S, 16S23S, ITS, 18S) but I am not able to get nice clustering, so I thought you might have some further recommendations.

nextgenusfs avatar nextgenusfs commented on June 17, 2024

@druvus I haven't seen any data yet, so I haven't looked at it specifically. I should be able to come up with a method if a mock community was sequenced - anybody know if that data is public somewhere? Reference based clustering in theory should work, although probably the best aligner for that would be minimap2 (not currently in AMPtk). For a de novo approach I would think something like this: quality trimming (find forward/reverse primers, some sort of Q-filter), find uniques, then some sort of pre-clustering to find "centroids" (I would think there are too many errors for 100% dereplication to be very effective in determining quality), and then mapping to those sequences using minimap2?
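
A small Python sketch of the "find uniques" step and of why exact dereplication may say little about quality on noisy reads. The function names are hypothetical and this is not how AMPtk or vsearch implement dereplication internally; it only collapses sequences that are 100% identical over their full length.

```python
from collections import Counter


def dereplicate(seqs):
    """Collapse identical sequences; return (sequence, count) pairs sorted by
    decreasing abundance."""
    return Counter(seqs).most_common()


def singleton_fraction(seqs):
    """Fraction of unique sequences that were seen exactly once."""
    uniques = dereplicate(seqs)
    if not uniques:
        return 0.0
    return sum(1 for _, count in uniques if count == 1) / len(uniques)
```

If nearly every long read carries at least one error, singleton_fraction() will be close to 1.0, so abundance after dereplication gives little signal and some looser pre-clustering would be needed first.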

devonorourke avatar devonorourke commented on June 17, 2024

I generated a tiny bit of 16S data a few months ago; it was a totally failed experiment I was trying as part of a 1-week high school workshop (apparently bad reagents killed 3 flow cells in a day... ouch). It generated maybe 2000 total reads, so probably not enough to really flesh out how well amptk can handle these kinds of data.
@nextgenusfs - I'll be in London at the Nanopore conference starting Wednesday and can ask around for public data; nothing comes to mind at the moment.
I'm guessing the workflow could look something like you proposed:
sequence --> Albacore --> Porechop --> Minimap2.
In my tiny dataset I just used USEARCH to do everything and it seemed to generate an output that was expected (we swabbed the mouths of animals and got back bacteria commonly found in mouths of animals). I scribbled down that pretty standard code here.

If you wanted to do a de novo approach first, you could try miniasm on the front end. You won't be looking for forward/reverse primers, I don't think, will you? If you're base-calling with nanopore data, your first task is converting the raw signal from a .fast5 file to a .fastq; at that point you'll have your demultiplexed dataset. Porechop will also demultiplex if you want to; nevertheless you should probably have already split your reads before assembling or mapping.

Cheers,
Devon

nextgenusfs avatar nextgenusfs commented on June 17, 2024

Well, let me know if you find some data that has a mock community... While it certainly depends on your experimental goal, I'm assuming here that we are talking about PCR amplicons, and I very much would use the forward/reverse conserved priming regions to enforce "full-length" sequences for "OTU-picking" (to use classical terminology). What length amplicons are you talking about here? If 1.5 kb or so, it should be easy for Nanopore to sequence across the entire length of these amplicons. While Porechop would be good for removing adapter sequence, I'm assuming that your initial PCR region-specific primers would still be intact. So then: 1) pick out only sequences that are full-length, 2) run dereplication, 3) cluster, 4) map reads to "OTUs" using minimap2. I would need to write a PAF/SAM to OTU_table script, but that shouldn't be too difficult I wouldn't think.
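
One possible shape for the PAF to OTU_table script mentioned above, as a Python sketch. It is not an existing AMPtk script. It assumes read names carry the ;barcodelabel=sample; tag, that the OTU sequences were used as the minimap2 reference (so the OTU name is PAF column 6), and that the hit with the most matching bases (PAF column 10) is a reasonable "best" hit per read; the function name is hypothetical.

```python
import re
from collections import defaultdict

LABEL = re.compile(r";barcodelabel=([^;]+);")


def paf_to_otu_table(paf_lines):
    """Count reads per OTU and sample from PAF lines.

    Keeps one hit per read: the one with the most matching bases.
    Returns {otu_name: {sample_name: read_count}}.
    """
    best = {}  # read name -> (matching bases, OTU name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        read, otu, n_match = fields[0], fields[5], int(fields[9])
        if read not in best or n_match > best[read][0]:
            best[read] = (n_match, otu)

    table = defaultdict(lambda: defaultdict(int))
    for read, (_, otu) in best.items():
        match = LABEL.search(read)
        if match is None:
            continue  # read without a sample label
        table[otu][match.group(1)] += 1
    return {otu: dict(samples) for otu, samples in table.items()}
```

A real version would probably also filter on alignment length or identity so that short partial hits do not count toward an OTU.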

I would be concerned about using something like miniasm, as there should be many sequences in a community that are 95-97% identical yet are unique OTUs, so I'm not sure collapsing/assembling those reads would yield the desired result.

nextgenusfs avatar nextgenusfs commented on June 17, 2024

I should add -- we have a MinION, but I have only tried to use it for long reads for genome assembly and have not run any of the PCR/amplicon procedures -- so I'm not very familiar with the adapters/primers/etc.
