Comments (14)

onordesjo commented on September 24, 2024

Hi @MortenEneberg, could you try setting PCR primers here:

https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L36

then, run with the "PCR" setting like the command below:

duplex_tools split_on_adapter <fastq_directory> <output_directory> PCR

It may give you the following, in which case we would need to think about how to retain the tail barcode in the left read and the head barcode in the right read.

5'-ADAPTER-Y-TOP-BARCODEX-PRIMER-SEQ-PRIMER-3' and 5'-PRIMER-SEQ-PRIMER-BARCODEY-3'

Cheers

from duplex-tools.

MortenEneberg commented on September 24, 2024

Hi @onordesjo

So I tried with just a few reads where I knew the structure.

One was a read where nothing was supposed to happen, with the following structure:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
This read was not split.

The next one was a read with the following structure, which was supposed to be split into 3 reads, as I set --allow-multiple-splits:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'

And this was split into 2 reads with the following structures:
Read 1: 5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-3'

Read 2: 5'-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'

meaning that the following was discarded:
5'-PRIMER1-SEQ-PRIMER2-3'

Interestingly, the beginning of read 2 contains the end of PRIMER2, meaning that not all of the primer was in the cut out part of the read.

I would have liked these 3 reads to be the output instead:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
5'-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-3'
5'-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'
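
To make the expected output concrete, here is a minimal sketch that works on the symbolic structure only, not on real sequence. It cuts wherever two BARCODE labels sit next to each other. The function name `split_at_barcode_junctions` is hypothetical and not part of duplex_tools.

```python
def split_at_barcode_junctions(segments):
    """Split a list of segment labels wherever two BARCODE labels are adjacent."""
    reads, current = [], []
    for i, seg in enumerate(segments):
        current.append(seg)
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        # A junction is a barcode label immediately followed by another one.
        if seg.startswith("BARCODE") and nxt is not None and nxt.startswith("BARCODE"):
            reads.append(current)
            current = []
    if current:
        reads.append(current)
    return reads


concatemer = (
    "ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-"
    "BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-"
    "BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ"
)
split_reads = ["-".join(r) for r in split_at_barcode_junctions(concatemer.split("-"))]
for read in split_reads:
    print(f"5'-{read}-3'")
```

A read with a single BARCODEX pair has no barcode-to-barcode junction, so this sketch leaves it unsplit.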

I get the same splits if I run without --allow-multiple-splits.

I used the following primers in the split_on_adapter.py file:

pcr_primers=(
            'ACACTCTTTCCCTACACGACGCTCTTCCGATCT',
            'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'),

I ran it on the following 4 reads, of which I have only described the first 2 here:
input_seq1.fastq.gz

With this command: duplex_tools split_on_adapter $input_seq $output_seq PCR --allow-multiple-splits

Cheers!

onordesjo commented on September 24, 2024

Hi @MortenEneberg,

Getting there it seems!

If the last SEQ is rather short, it may be the case that it's masked (to not accidentally split reads right at the end).

You could try to use the additional options --trim_end 10 --trim_start 10 (which would reduce trimming down to 10bp on either side).
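
As a rough illustration of why a match near the end of a read can be missed, here is a minimal sketch. It assumes that masking simply excludes the first `trim_start` and last `trim_end` bases from the search; the actual duplex_tools logic may differ. The helper name and the toy read are hypothetical, and exact string matching stands in for whatever aligner the tool uses.

```python
def find_outside_masked_ends(read, target, trim_start, trim_end):
    """Return the start of `target` in `read`, ignoring the masked ends, or -1.

    Assumption: masked bases are simply excluded from the search window.
    """
    end = len(read) - trim_end
    if end <= trim_start:
        return -1  # the whole read is masked
    hit = read[trim_start:end].find(target)
    # Translate the hit back into full-read coordinates.
    return -1 if hit == -1 else hit + trim_start


# Toy read: the target sits 30 bases from the 3' end.
read = "A" * 50 + "GATTACA" + "C" * 30
print(find_outside_masked_ends(read, "GATTACA", trim_start=10, trim_end=10))
print(find_outside_masked_ends(read, "GATTACA", trim_start=10, trim_end=40))
```

With the smaller mask the target is visible to the search; with the larger mask it falls inside the excluded region and is not reported.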

MortenEneberg commented on September 24, 2024

Hi @onordesjo,

I just updated the splits for the second example (7/9 at 10AM), as I had made a small error.

I tried your suggested settings.

The read where nothing was supposed to happen, with the structure:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
now splits into:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-3'
and
5'-BARCODEX-3'

thus discarding: 5'-SEQ-PRIMER2-3'.
As with the examples in the previous comment, small remnants of the adapters are left on each side of the cut. The tool does not identify the whole adapter, which would be crucial for the subsequent demultiplexing even if the cut were in the right place.

The next one was a read with the following structure, supposed to be split into 3 reads:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'

With the new settings, this was split into 2 reads with the following structures:
Read 1: 5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-3'
and
Read 2: 5'-SEQ-PRIMER2-BARCODEZ-3'

meaning that the following was discarded:
5'-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-3'

Do you have any idea how to solve this?

Cheers!

MortenEneberg commented on September 24, 2024

Hi @onordesjo,

I appreciate your help!

Did you have a chance to look at it yet?

Kind regards

onordesjo commented on September 24, 2024

Hi @MortenEneberg, sorry, I don't have much bandwidth to look at this at the moment.

Would you be able to use a debugger to step through split_on_adapter and see where the decisions are made? I'd suggest starting on this line, which is where all results are found for matches against the subsequence:

https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L142

MortenEneberg commented on September 24, 2024

Hi @onordesjo,

Have you had the time to give it a look?

Kind regards,
Morten

onordesjo commented on September 24, 2024

Hi @MortenEneberg,

I have started to look at it, but I will probably need to add the barcode sequences to this plot to get a better view of what should be written out. I've added imperfect matches to both of your primer sequences (using seqkit locate), but it's not entirely clear yet:
image
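
For anyone who wants to reproduce a similar view without seqkit, here is a minimal sketch of locating imperfect primer matches in pure Python. It is not how seqkit works internally: it counts substitutions only (no insertions or deletions) and searches the forward strand only, so for real nanopore reads an edit-distance aligner would be a better fit. The function name and the toy read are hypothetical; the primer is the first one quoted earlier in this thread.

```python
def locate_with_mismatches(read, pattern, max_mismatches):
    """Return (start, mismatches) for every window within the mismatch budget."""
    hits = []
    for start in range(len(read) - len(pattern) + 1):
        window = read[start:start + len(pattern)]
        # Hamming distance between the window and the pattern.
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


primer = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Toy read: the primer with one substitution, flanked by filler bases.
mutated = primer[:10] + "G" + primer[11:]
toy_read = "GGGG" + mutated + "TTTT"
hits = locate_with_mismatches(toy_read, primer, max_mismatches=2)
print(hits)
```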

MortenEneberg commented on September 24, 2024

Hi @onordesjo,

Thank you for looking into it!

I have attached the barcodes here, along with the sequencing adapter: Single_barcodes_rev_for.txt

Note that the sequences in the attached file are the primer sequences. When reading one strand, the barcodes at the 5' and 3' ends will be the same.

Kind regards,
Morten

onordesjo commented on September 24, 2024

Thanks Morten!

I'll add those in and try to get it straightened out. My feeling is that it'll be easiest in this use case to use a standalone tool (since the front adapter is not actually expected to be in the middle).

By the way, I forgot to point this out previously, but the targets being matched are defined here, which is obviously relevant if you're not actually expecting part of the adapter to be between the primers:
https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L58

So basically what's being searched for is <primer-x-rc><adapter-rc><variable-length-N><adapter><primer-y>, where primer-x and primer-y may be the same or different.

        'PCR': [
            (rev_comp(x)
                + tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
                + middle_sequence + head_adapter[n_bases_to_mask_head:] + y)
            for x in pcr_primers for y in pcr_primers]
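
To see what that comprehension expands to, here is a self-contained sketch. The primers are the two quoted earlier in this thread. The adapter sequences, `middle_sequence`, and the two mask lengths are made-up placeholders for illustration, not the values used in duplex_tools, and `rev_comp` is reimplemented here as an assumption about what the helper does.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def rev_comp(seq):
    """Reverse complement of a DNA string (assumed behaviour of the helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


pcr_primers = (
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
)

# Hypothetical placeholder values, for illustration only.
tail_adapter = "TTTTGGGG"
head_adapter = "CCCCAAAA"
middle_sequence = "NNNN"
n_bases_to_mask_tail = 2
n_bases_to_mask_head = 2

targets = [
    (rev_comp(x)
        + tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
        + middle_sequence + head_adapter[n_bases_to_mask_head:] + y)
    for x in pcr_primers for y in pcr_primers
]

# One target per ordered (x, y) primer pair.
for target in targets:
    print(len(target), target)
```

Every target contains adapter sequence between the two primers, so a junction without any adapter in the middle would not be expected to match well.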

onordesjo commented on September 24, 2024

In principle, you should have better luck with something like this, which detects a reverse-complemented barcode directly followed by a barcode:

        'BARCODES': [
            (rev_comp(x) + y)
            for x in barcodes for y in barcodes]

I've marked up the regions (red bars right under the tick marks) that I understand you'd want written out as separate reads, is this correct?

image

MortenEneberg commented on September 24, 2024

Dear @onordesjo ,

Yes it looks correct! I have attached a paint image (not pretty..) just to make sure we are on the same page :)

Thank you!
image

Morten

MortenEneberg commented on September 24, 2024

Dear @onordesjo,

Thanks for your help! Did you have a chance to look at it yet?

Kind regards,
Morten

onordesjo commented on September 24, 2024

Hi Morten!

Sorry, I wasn't clear in the last message. I don't think it's something we're planning to support, since it's a rather special use case.

You could definitely give it a go and replace:

        'PCR': [
            (rev_comp(x)
                + tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
                + middle_sequence + head_adapter[n_bases_to_mask_head:] + y)
            for x in pcr_primers for y in pcr_primers]

with:

        'BARCODES': [
            (rev_comp(x) + y)
            for x in barcodes for y in barcodes]

and see if you get the right matches then.
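
As a standalone illustration of the idea, here is a minimal sketch that builds the `rev_comp(x) + y` targets and cuts a read in the middle of each match, so the tail barcode stays with the left read and the head barcode with the right read. The barcodes and the toy read are made up, the function names are hypothetical, and exact string matching is used for simplicity; real reads contain errors, so an approximate matcher would be needed in practice.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def rev_comp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_on_barcode_junctions(read, barcodes):
    """Cut `read` between rev_comp(x) and y for every exact junction match."""
    cut_points = set()
    for x in barcodes:
        for y in barcodes:
            target = rev_comp(x) + y
            pos = read.find(target)
            while pos != -1:
                # Cut after the tail barcode, before the head barcode.
                cut_points.add(pos + len(x))
                pos = read.find(target, pos + 1)
    cuts = [0] + sorted(cut_points) + [len(read)]
    return [read[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical barcodes and a toy three-unit concatemer.
bc_x, bc_y, bc_z = "AACCGGTTAC", "GATTACAGTC", "CTGACTGAAC"
insert = "T" * 12
units = [bc + insert + rev_comp(bc) for bc in (bc_x, bc_y, bc_z)]
pieces = split_on_barcode_junctions("".join(units), [bc_x, bc_y, bc_z])
for piece in pieces:
    print(piece)
```

In this toy case each piece starts with its own barcode and ends with that barcode's reverse complement, which is the layout a downstream demultiplexer would need.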

Again, sorry for not being clear and for not being able to put more resources into this!
