Comments (4)
Hi @eyesmo,
In most cases where you're sequencing only a single sequence, it's going to be difficult to decide whether a read is a follow-on. If there are at least a few different sequences in the sample, then these options can be used to adjust the stringency of the match:
bases_to_align: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/filter_pairs.py#L39
align_threshold: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/filter_pairs.py#L40
You're right that one option would be to use UMIs or something similar to identify actual pairs in these cases.
I would instead suggest spiking in some amount of other sequences in these cases to make it easier to identify correct follow-ons. Lambda is typically a pretty good spike-in, for example.
Technically, filter_pairs is quite quick-and-dirty: it checks just whether one read is the reverse complement (rc) of the other. The core alignment happens here. Note that seq2 has already been reverse-complemented earlier, which is why this works:
https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/filter_pairs.py#L220
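To make the idea concrete, here is a minimal sketch of that kind of check. It is not the code in filter_pairs.py: the function names are hypothetical, the similarity score comes from Python's difflib rather than whatever aligner the tool uses, and the default values for bases_to_align and align_threshold are placeholders, not necessarily the tool's real defaults. It also assumes the comparison is between the end of the template and the reverse complement of the start of the follow-on read.

```python
from difflib import SequenceMatcher

# Translation table for complementing DNA bases (other characters pass through).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def looks_like_pair(
    template: str,
    complement: str,
    bases_to_align: int = 250,      # placeholder default (assumption)
    align_threshold: float = 0.6,   # placeholder default (assumption)
) -> bool:
    """Hypothetical helper: does `complement` look like the rc of `template`?

    Compares the last `bases_to_align` bases of the template with the
    reverse complement of the first `bases_to_align` bases of the
    follow-on read, and accepts the pair if the similarity ratio reaches
    `align_threshold`.
    """
    n = min(bases_to_align, len(template), len(complement))
    if n == 0:
        return False
    template_end = template[-n:]
    complement_start_rc = reverse_complement(complement[:n])
    # difflib's ratio stands in for a real alignment score here.
    score = SequenceMatcher(
        None, template_end, complement_start_rc, autojunk=False
    ).ratio()
    return score >= align_threshold
```

A check like this also shows why a single-sequence sample is hard: if every molecule has the same sequence, any read from the opposite strand passes, whether or not it is a true follow-on. Raising bases_to_align or align_threshold makes the match stricter, but cannot separate identical molecules.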
from duplex-tools.
Better yet, do the sequencing adapters have any UMIs/stretches of random nucleotides embedded within them?
I think I initially used filter_pairs incorrectly. I specified the directory I intended as the output for duplex basecalling, instead of the directory where all the simplex fastq reads were stored.
Running a second time with the simplex fastq reads directory specified, it now says roughly 30% of my reads are good pairs. Still higher than I'd expect for Kit 12, but lower than 90%.
Hi @eyesmo, just wanted to check whether you had any luck increasing bases_to_align. It should help a fair amount in making sure the complement really is the rc of the template.