Comments (14)
Hi @MortenEneberg, could you try setting PCR primers here:
https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L36
Then run with the "PCR" setting, as in the command below:
duplex_tools split_on_adapter <fastq_directory> <output_directory> PCR
It may give you the following, in which case we would have to think about how to retain the tail barcode in the left read and the head barcode in the right read:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER-SEQ-PRIMER' and 5'PRIMER-SEQ-PRIMER-BARCODEY-3'
Cheers
from duplex-tools.
Hi @onordesjo
So I tried with just a few reads where I knew the structure.
One was a read that should not have been split, with the structure:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
This read was not split.
The next one was a read with the following structure, which should have been split into 3 reads, as I set --allow-multiple-splits:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'
And this was split into 2 reads with the following structures:
Read 1: 5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-3'
Read 2: 5'-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'
meaning that the following was discarded:
5'-PRIMER1-SEQ-PRIMER2-3'
Interestingly, the beginning of read 2 contains the end of PRIMER2, meaning that not all of the primer was in the cut out part of the read.
I would have liked these 3 reads to be the output instead:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
5'-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-3'
5'-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'
I get the same splits when running without --allow-multiple-splits.
I used the following primers in the split_on_adapter.py file:
pcr_primers=(
'ACACTCTTTCCCTACACGACGCTCTTCCGATCT',
'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'),
I ran it on the following 4 reads, of which I only described the first 2 here:
input_seq1.fastq.gz
With this command: duplex_tools split_on_adapter $input_seq $output_seq PCR --allow-multiple-splits
Cheers!
Hi @MortenEneberg,
Getting there it seems!
If the last SEQ is rather short, it may be the case that it's masked (to not accidentally split reads right at the end).
You could try to use the additional options --trim_end 10 --trim_start 10
(which would reduce trimming down to 10bp on either side).
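To illustrate why a short final SEQ can be missed, here is a minimal sketch. It assumes the masking simply excludes a fixed number of bases at each end of the read from the search. The function names and the 200 bp value are hypothetical and not taken from duplex_tools; only the idea of trimming the searched region comes from the comment above.

```python
def searchable_window(read_length, trim_start, trim_end):
    """Return (start, end) of the region that is searched for a split site.

    Assumption: bases outside this window are masked and never split on.
    """
    start = min(trim_start, read_length)
    end = max(start, read_length - trim_end)
    return start, end


def is_searchable(hit_start, hit_end, read_length, trim_start, trim_end):
    """True if a hit at [hit_start, hit_end) lies fully inside the window."""
    start, end = searchable_window(read_length, trim_start, trim_end)
    return start <= hit_start and hit_end <= end


# A junction 60 bp from the end of a 1000 bp read:
# hidden with a (hypothetical) 200 bp mask, visible with a 10 bp mask.
print(is_searchable(900, 940, 1000, trim_start=200, trim_end=200))  # False
print(is_searchable(900, 940, 1000, trim_start=10, trim_end=10))    # True
```

Under this assumption, reducing the mask makes junctions near the read ends visible, at the cost of allowing splits very close to the ends.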
Hi @onordesjo,
I just updated the splits for the second example (7/9 at 10AM), as I had made a small error.
I tried your suggested settings.
The read that should not have been split, with the structure:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-3'
now splits into:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-3'
and
5'-BARCODEX-3'
thus discarding: 5'-SEQ-PRIMER2-3'
...
As with the examples in the previous comment, small remnants of the adapters are left on each side of the cut. The tool does not identify the whole element, which would be crucial for the subsequent demultiplexing once the cuts are placed correctly.
The next one was a read with the following structure, which should have been split into 3 reads:
5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-SEQ-PRIMER2-BARCODEZ-3'
With the new settings, this was split into 2 reads with the following structures:
Read 1: 5'-ADAPTER-Y-TOP-BARCODEX-PRIMER1-SEQ-PRIMER2-BARCODEX-BARCODEY-PRIMER1-SEQ-3'
and
Read 2: 5'-SEQ-PRIMER2-BARCODEZ-3'
meaning that the following was discarded:
5'-PRIMER2-BARCODEY-BARCODEZ-PRIMER1-3'
Do you have any clue on how to solve this?
Cheers!
Hi @onordesjo,
I appreciate your help!
Did you have a chance to look at it yet?
Kind regards
Hi @MortenEneberg, sorry, I don't have much bandwidth to look at this at the moment.
Would you be able to use a debugger to step through split_on_adapter and see where the decisions are made? I'd suggest starting on this line, which is where all results are found for matches against the subsequence:
https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L142
Hi @onordesjo,
Have you had the time to give it a look?
Kind regards,
Morten
Hi @MortenEneberg,
I have started to look at it, but I will probably need to add the barcode sequences to this plot to get a better view of what should be written out. I've added imperfect matches to both of your primer sequences (using seqkit locate), but it's not entirely clear yet.
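For readers who want to reproduce this kind of imperfect primer matching without external tools, here is a minimal Python sketch. It is only a stand-in for the idea and not how seqkit works: it allows substitutions only (no insertions or deletions), and the function names and toy sequences are made up for illustration.

```python
def rev_comp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def locate(pattern, read, max_mismatch=2):
    """Yield (start, end, strand, mismatches) for every window of `read`
    within `max_mismatch` substitutions of `pattern` ("+") or of its
    reverse complement ("-")."""
    for strand, pat in (("+", pattern), ("-", rev_comp(pattern))):
        for start in range(len(read) - len(pat) + 1):
            window = read[start:start + len(pat)]
            mismatches = sum(a != b for a, b in zip(pat, window))
            if mismatches <= max_mismatch:
                yield start, start + len(pat), strand, mismatches


# Toy data: the primer once as-is, and once reverse-complemented
# with a single substitution.
primer = "GATTACAGG"
read = "TTTT" + "GATTACAGG" + "AAAA" + "CCTGTTATC" + "TT"
hits = list(locate(primer, read, max_mismatch=1))
for hit in hits:
    print(hit)
```

Real nanopore reads also contain insertions and deletions, so an edit-distance based aligner would be a better fit than this substitution-only scan.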
Hi @onordesjo,
Thank you for looking into it!
I have attached the barcodes here, where also the sequencing adapter is: Single_barcodes_rev_for.txt
Note that the attached file contains the primer sequences. When reading one strand, the barcodes at the 5' and 3' ends will be the same.
Kind regards,
Morten
Thanks Morten!
I'll add those in and try to get it straightened out. My feeling is that it'll be easiest in this use case to use a standalone tool (since the front adapter is not actually expected to be in the middle).
By the way, note that the targets being matched against are the ones defined here. I forgot to point that out previously, but it is obviously relevant if you're not actually expecting part of the adapter to be between the primers:
https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L58
So basically what's being searched for is <primer-x-rc><adapter-rc><variable-length-N><adapter><primer-y>, where primer-x and primer-y may be the same or different.
'PCR': [
(rev_comp(x)
+ tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
+ middle_sequence + head_adapter[n_bases_to_mask_head:] + y)
for x in pcr_primers for y in pcr_primers]
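To make the shape of these targets concrete, here is a self-contained sketch that expands the comprehension above. The primers are the ones posted earlier in this thread. The adapter strings, the mask sizes and the middle sequence are short placeholders chosen for illustration, and the rev_comp helper is a simple re-implementation; none of these are the actual values or code from split_on_adapter.py.

```python
def rev_comp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Primers from earlier in this thread.
pcr_primers = (
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
)

# Placeholders (assumptions), NOT the real adapters or mask sizes.
tail_adapter = "TTTTGGGG"
head_adapter = "CCCCAAAA"
n_bases_to_mask_tail = 2
n_bases_to_mask_head = 2
middle_sequence = "NNN"

# One target per ordered (x, y) primer pair:
# <primer-x-rc><tail adapter, end masked><N...><head adapter, start masked><primer-y>
pcr_targets = [
    rev_comp(x)
    + tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
    + middle_sequence
    + head_adapter[n_bases_to_mask_head:]
    + y
    for x in pcr_primers
    for y in pcr_primers
]

print(len(pcr_targets))  # 4
for target in pcr_targets:
    print(target)
```

Every target carries adapter fragments between the two primers, which is why a read whose junctions contain only barcodes and primers is not expected to match these targets well.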
Basically, in principle you should have better luck with something like this, detecting a reverse-complemented barcode directly followed by another barcode:
'BARCODES': [
(rev_comp(x) + y)
for x in barcodes for y in barcodes]
I've marked up the regions (red bars right under the tick marks) that I understand you'd want written out as separate reads. Is this correct?
Dear @onordesjo ,
Yes it looks correct! I have attached a paint image (not pretty..) just to make sure we are on the same page :)
Morten
Dear @onordesjo,
Thanks for your help! Did you have a chance to look at it yet?
Kind regards,
Morten
Hi Morten!
Sorry, I wasn't clear on the last message. I don't think it's something we're planning to support since it's a rather special use case.
You could definitely give it a go and replace:
'PCR': [
(rev_comp(x)
+ tail_adapter[:len(tail_adapter) - n_bases_to_mask_tail]
+ middle_sequence + head_adapter[n_bases_to_mask_head:] + y)
for x in pcr_primers for y in pcr_primers]
with:
'BARCODES': [
(rev_comp(x) + y)
for x in barcodes for y in barcodes]
and see if you get the right matches then.
Again, sorry for not being clear and not being able to put more resource on this!
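A way to check the idea on toy data before touching duplex_tools is sketched below. It builds the rev_comp(x) + y junction targets from the suggestion above and cuts a read between the two barcodes of each junction. This is a standalone illustration, not part of duplex_tools: the function names, barcodes and inserts are made up, the 3' barcode is assumed to appear as the reverse complement of the 5' one, and matching is exact, whereas real reads would need error-tolerant matching.

```python
def rev_comp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def split_on_barcode_junctions(read, barcodes):
    """Cut `read` between rev_comp(x) and y wherever the junction
    rev_comp(x) + y occurs exactly, for any barcodes x and y."""
    cut_points = set()
    for x in barcodes:
        left = rev_comp(x)
        for y in barcodes:
            target = left + y
            pos = read.find(target)
            while pos != -1:
                cut_points.add(pos + len(left))  # cut between the barcodes
                pos = read.find(target, pos + 1)
    bounds = [0] + sorted(cut_points) + [len(read)]
    return [read[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical barcodes and inserts (primers omitted for brevity).
bx, by, bz = "AAGGTC", "CCTTGA", "GTGTAC"
amplicons = [
    bx + "ATATATAT" + rev_comp(bx),
    by + "TTAATTAA" + rev_comp(by),
    bz + "ATTTATTT" + rev_comp(bz),
]
read = "".join(amplicons)  # three amplicons concatenated into one read

pieces = split_on_barcode_junctions(read, [bx, by, bz])
print(len(pieces))  # 3
for piece in pieces:
    print(piece)
```

Cutting in the middle of the junction keeps both barcodes of each amplicon in the same output read, which is what later demultiplexing would need.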