Comments (10)
Hi,
Thanks for the question. It's not yet possible, but I suspect it would be useful. We hope to release an improved version of template/complement splitting today, which should work better than adapter splitting for duplex.
from duplex-tools.
Thanks for your quick reply. I will try it out when it is released.
from duplex-tools.
Hi @jagos01, v0.2.20 is now out, and you can use it to recover reads that were not split.
To try it out:
- run simplex basecalling (a fast model is fine):
$ dorado basecaller [email protected] pod5s/ --emit-moves > unmapped_reads_with_moves.sam
- run split_pairs like this:
$ duplex_tools split_pairs unmapped_reads_with_moves.sam pod5s/ pod5s_splitduplex/
This should give you new pod5 files in the pod5s_splitduplex directory (with new read_ids), together with the pair_ids that correspond to the new read_ids.
Let me know how it works for you.
from duplex-tools.
Hello @onordesjo, I followed the directions outlined in the README for duplex calling with dorado. I generated the pair_id files for both step 2a and 2b. They contained 4667 and 7867 pairs respectively. When stereo basecalling those reads, dorado only basecalled 4114 and 1338 reads. Why is the number of stereo basecalled reads lower than the number of read pairs?
Thanks
from duplex-tools.
Hi @jagos01. Can I ask what type of data you have been looking at? Whole genome? Any amplification? Dorado does some filtering to ensure that bad pairs don't get through, so a lower count is to be expected. I would expect fewer pairs generated in step 2b than in 2a, but greater retention of good pairs. 2a would also necessarily have to be generated without a subset (or alternatively with a selection of channels).
Any of this information would help to explain what you are seeing.
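To see which pairs were filtered out, rather than only comparing counts, a small shell sketch along these lines may help. It rests on two assumptions that are not confirmed in this thread: that the pair_ids file holds one "template_id complement_id" pair per line, and that duplex reads in the Dorado output are named "template_id;complement_id". The function name missing_pairs and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Print every pair from a pair_ids file that has no duplex read in the output.
# Assumptions (illustrative, not confirmed by the thread):
#   - pair_ids file: one "template_id complement_id" pair per line
#   - duplex read ids look like "template_id;complement_id"
missing_pairs() {
  local pair_ids="$1"   # e.g. the pair_ids file passed to stereo basecalling
  local read_ids="$2"   # one read id per line, extracted from the duplex output
  awk '
    # First file: remember every read id seen in the output.
    FILENAME == ARGV[1] { seen[$1] = 1; next }
    # Second file: report pairs whose combined id never appeared.
    NF >= 2 {
      key = $1 ";" $2
      if (!(key in seen)) print $1, $2
    }
  ' "$read_ids" "$pair_ids"
}
```

One possible way to use it, assuming samtools is installed and the stereo basecalls are in a file called duplex.bam: write the read ids with `samtools view duplex.bam | cut -f1 > read_ids.txt`, then run `missing_pairs pair_ids.txt read_ids.txt | wc -l` to count the pairs that did not produce a duplex read.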
from duplex-tools.
Hello @onordesjo. This is bacterial whole genome sequence data. No amplification was carried out. The data is split over two runs (I had to restart the sequencer a couple of hours into the run). I was also expecting fewer pairs from 2b. 2a was generated from the complete data set.
from duplex-tools.
I inspected the pod5 reads for each run and the unmapped BAM file contains reads from both runs.
from duplex-tools.
Thanks, I have emailed a link to the bam file.
from duplex-tools.