Comments (6)
Hi Shane
The decoder, as it is coded today, requires all indexes within a group to be the same length. Across groups there is no such constraint: you can have 4 different sets of barcodes in different locations with different lengths.
The issue is mostly that if some barcodes are a different length, it is not well defined how to compute the match. I assume that means some have a few extra bases. Are those bases required to tell the barcodes apart? If so, isn't it possible that the sequence flanking a shorter barcode happens to be similar to a longer one?
Is the set of 20 barcodes described in the paper, with lengths of 5-8, what you are after? Are they anchored at the same 5' position?
You can think of only the first 5 bases as being the barcode. Since you can't tell what will be downstream of the short ones, extra bases on the long ones don't actually contribute that much.
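To make the ambiguity concrete, here is a small Python sketch using made-up barcodes (not the set from the paper). It only illustrates the point above: a short barcode followed by whatever happens to be downstream can read exactly like a longer barcode.

```python
# Hypothetical barcodes of unequal length, for illustration only.
short_barcode = "ACGTA"
long_barcode = "ACGTACG"

# Two reads: one carries the short barcode followed by insert sequence that
# happens to start with "CG", the other carries the long barcode.
read_with_short = short_barcode + "CGTTGA"
read_with_long = long_barcode + "TTGA"

# Comparing the first 7 bases cannot tell the two reads apart.
window = len(long_barcode)
ambiguous = read_with_short[:window] == read_with_long[:window]
print(ambiguous)  # True
```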
I might be able to allow Ns as fillers to make the barcodes the same length and adjust the posterior calculations accordingly.
from pheniqs.
Hi,
Thanks for the quick reply.
In practice the sequences will look something like this:
Forward
p5   i5   Sequencing primer                 spacer index 16S locus primers
---- ---- --------------------------------- ------ ----- -----------------
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT        GGTAC CCTACGGGNGGCWGCAG
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT c      AACAC CCTACGGGNGGCWGCAG
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT at     CGGTT CCTACGGGNGGCWGCAG
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT tcg    GTCAA CCTACGGGNGGCWGCAG
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT        AAGCG CCTACGGGNGGCWGCAG
          ACACTCTTTCCCTACACGACGCTCTTCCGATCT g      CCACA CCTACGGGNGGCWGCAG
Reverse
p7   i7   Sequencing primer                  spacer index 16S locus primers
---- ---- ---------------------------------- ------ ----- ---------------------
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT        AGGAA GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT g      AGTGG GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT cc     ACGTC GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ttc    TCAGC GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT        CTAGG GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT t      GCTTA GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT gc     GAAGT GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT aat    CCTAT GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT        ATCTG GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT g      AGACT GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT cg     ATTCC GACTACHVGGGTATCTAATCC
          GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT tct    CAATC GACTACHVGGGTATCTAATCC
So the actual index used to multiplex is always the same length (5 bp); however, there is a 1-3 bp spacer in front of some of the indexes to create extra diversity at those positions in the amplicon pool. The custom indexes will be anchored at the same 3' position after the 16S locus primer.
The p5|7 sequence is the same for everything; however, we will also need to demultiplex on the i5|7 Illumina indexes. Preferably, could we do this in one pass using pheniqs?
Is the set of 20 barcodes described in the paper, with lengths of 5-8, what you are after?
Yes, that is exactly what I want to use to demultiplex.
Thanks again,
-shane
Oh,
Pheniqs tokens follow the Python array slicing syntax. They can take either positive or negative offsets. In your case, just ignore the 0-3 bp used for spacing and anchor the custom index to the 3' end. I assume those 5 bases have sufficient entropy for you to tell them apart. Either way, PAMLD, which takes quality into account and computes the posterior, will be able to make the most of whatever entropy you have.
Please let me know if you need help defining those tokens. Example 2.7 on the configuration page has examples of negative offsets and how they select bases anchored to the 3' end of a read segment.
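As a rough illustration of the slicing semantics only, here is the same idea in plain Python. This is not Pheniqs configuration syntax, and the segments are hypothetical strings built from the spacers and indexes listed earlier in the thread.

```python
# Hypothetical read segment: 2 bp spacer + 5 bp index (plain Python strings,
# not Pheniqs token syntax).
segment = "atCGGTT"

# A negative start offset counts from the 3' end, so this selects the last
# 5 bases regardless of how long the spacer is.
index = segment[-5:]
print(index)  # CGGTT

# Same slice on a segment with no spacer and on one with a 3 bp spacer.
no_spacer = "GGTAC"[-5:]
long_spacer = "tcgGTCAA"[-5:]
print(no_spacer)    # GGTAC
print(long_spacer)  # GTCAA
```

Anchoring from the 3' end is what makes the variable-length spacer irrelevant, provided the index really is the last thing in the segment.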
To answer your questions: Pheniqs can decode all 4 barcodes in one run. You simply define 4 different decoders. The only limitation is that only one of the decoders can be used to split the reads into different files.
Hi,
Sorry for the late reply. I was incorrect about the indexes being at a fixed position from the 3' end. What I ultimately did was follow the suggestion from your first reply. In the first step, I demultiplexed on the Illumina index pairs following your Illumina tutorial. In the second step, I demultiplexed on an 8 bp index at the first 8 positions of the read. For the shorter 5-7 bp indexes, I used the first 1-3 bp of the adjacent amplicon primer sequence to ensure that all indexes were the same length. Using this approach, everything works as expected.
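A minimal Python sketch of that padding step, assuming the 8 bp barcode is the spacer plus the 5 bp index, extended with the leading bases of the 16S primer. The helper name `pad_barcode` is made up for illustration; the sequences are the first four forward entries from the table earlier in the thread.

```python
def pad_barcode(spacer: str, index: str, primer: str, length: int = 8) -> str:
    """Extend spacer + index with leading primer bases up to a fixed length."""
    barcode = (spacer + index).upper()
    missing = length - len(barcode)
    if missing < 0:
        raise ValueError("spacer + index is longer than the target length")
    return barcode + primer[:missing]


FORWARD_PRIMER = "CCTACGGGNGGCWGCAG"
forward = [("", "GGTAC"), ("c", "AACAC"), ("at", "CGGTT"), ("tcg", "GTCAA")]

padded = [pad_barcode(spacer, index, FORWARD_PRIMER) for spacer, index in forward]
for barcode in padded:
    print(barcode)
# GGTACCCT
# CAACACCC
# ATCGGTTC
# TCGGTCAA
```

The first three primer bases contain no degenerate positions, so the padded barcodes stay unambiguous.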
It took me a while to fully understand the workflow from the online documentation, which I think was likely the cause of my first question. But now that I have it mostly figured out it is pretty straightforward... One source of confusion was that I initially installed this from conda (bioconda channel) and the installed version was 2.0.4. As a result, the options for the software didn't actually correspond to the online documentation, which I am assuming is for the most recent version. After building from source the documentation fully corresponded to the software version I had built and everything made a lot more sense.
thanks,
-shane
Hi Shane
Great that you figured it out!
If you want I can post your configuration on the site, it sounds like quite an interesting use case :)
I also think you can do all steps in one run. If you show me your configs, I can help with that.
Lior.
Hi Lior,
Sure - I made a simple git repo, to house scripts and config files to help the bioinformatics students in the group - https://github.com/slhogle/hambiDemultiplex.
Please have a look there, and yes, if you have a more efficient way to do this (like doing it all in one run) then please let me know! I am basically following the Fluidigm vignette on the webpage and doing the demultiplexing in two separate steps. In each step, I am estimating priors.
Thanks much,
-shane