morris-lab / biddyetalworkflow
This repository contains our CellTag workflow, as deployed in our 2018 Biddy et al., Nature paper.
Hi!
Thanks for the great workflow! For the custom reference used in the cellranger count function, do you add the entire celltag plasmid sequence or only a small portion containing the celltag barcode and perhaps the GFP sequence?
Thanks!
Hello!
I've created the parsed CellTag list for both the v1 and v3 libraries. When I run matrix.count.celltags.R, I get the following error for the v1 CellTag library, but not for the v3 library:
Error in CJ(1:178363, 1:35378) :
Cross product of elements provided to CJ() would result in 6310126214 rows which exceeds .Machine$integer.max == 2147483647
Calls: dcast -> dcast.data.table -> do.call -> CJ
The v1 parsed file is much larger than the v3 one, so given the error I suspect the problem is the file size. I have two samples, one with ~18,000 cells and one with ~6,000 cells, and both throw this error for the v1 library.
Have you seen this error with larger datasets?
In case it is relevant, I also get this warning about data.table:
Warning message:
package ‘data.table’ was built under R version 3.5.2
I'm using R 3.5.1.
Thanks!
Sincerely,
Lauren
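A note on the arithmetic behind the error above: 178363 x 35378 is 6,310,126,214, which exceeds the 2,147,483,647 limit quoted in the message, so any step that materializes every row/column pair will fail no matter how few pairs actually carry reads. The Python sketch below only illustrates that point and is not the workflow's own code. It checks the product and then counts (cell barcode, CellTag) pairs sparsely, storing only the pairs that occur. The toy records and the function names `dense_grid_size` and `count_pairs_sparse` are hypothetical.

```python
from collections import Counter

INT32_MAX = 2**31 - 1  # same value as R's .Machine$integer.max


def dense_grid_size(n_rows, n_cols):
    """Number of entries a full rows x columns grid would hold."""
    return n_rows * n_cols


def count_pairs_sparse(records):
    """Count UMIs per (cell barcode, CellTag) pair, storing observed pairs only.

    records: iterable of (cell_barcode, umi, celltag) tuples (hypothetical toy input).
    Duplicate (cell, umi, celltag) triples are collapsed before counting.
    """
    unique_triples = set(records)
    return Counter((cell, tag) for cell, _umi, tag in unique_triples)


# Dimensions taken from the error message above.
grid = dense_grid_size(178363, 35378)
print(grid, grid > INT32_MAX)

toy = [
    ("AAAC", "UMI1", "TAGA"),
    ("AAAC", "UMI1", "TAGA"),  # duplicate read, same UMI
    ("AAAC", "UMI2", "TAGA"),
    ("CCCG", "UMI3", "TAGB"),
]
counts = count_pairs_sparse(toy)
print(counts[("AAAC", "TAGA")], counts[("CCCG", "TAGB")], len(counts))
```

Note that 178363 is roughly ten times the ~18,000 cells reported for the larger sample. If that dimension corresponds to cell barcodes (an assumption, since the error message does not say), restricting the parsed file to the barcodes called as cells before casting may shrink the grid considerably.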
Hi,
I noticed that in the drawSubnet function, you've chosen to use tag = "CellTagV1_2". Was there a specific reason for this choice? Additionally, does the figure in the paper also use "CellTagV1_2"?
drawSubnet(tag = "CellTagV1_2", overlay = "Cluster", linkList = linkList, Nodes = Nodes )
From what I understand, "CellTagV1_1" is the largest and most diverse clone. I'm curious as to why "CellTagV1_2" was selected instead.
Thank you.
I followed the protocol step by step and got blocked by this error. Can somebody help me overcome it?
Hi
I noticed that, in the step that parses the BAM output, you mention that the gene information is also written out:
With the CellTag reads extracted we use a custom gawk script to parse the file and retain only the information we need. This script identifies and extracts the CellTag, Cell Barcode, and UMI sequences associated with each CellTag read. Furthermore, the read ID, read sequence, and any genes the read aligned to are extracted as well. This allows us to accurately quantify each CellTag and associate the CellTags with the correct cells. The output of this script is a tab-delimited file with the following columns: Read.ID, Read.Seq, Cell.BC, UMI, Cell.Tag, Gene. We will use this file to quantify and filter the CellTag data. Note that the regular expression identifying CellTagMEF has two bases added to the beginning. This is a "stricter" CellTag motif which helps filter out some erroneous CellTag reads.
As I understand it, the only reads useful for CellTag analysis are those aligned to the CellTag.UTR or GFP chromosome, not to other chromosomes.
(Please correct me if I have misunderstood something.)
So is it necessary to keep the reads from other chromosomes for downstream analysis?
Best wishes
Guandong Shang
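For readers unfamiliar with the parsing step quoted above, here is a rough Python sketch of what such a parse could look like. It is not the repository's gawk script. It assumes SAM-formatted text lines carrying the Cell Ranger tags `CB:Z:` (cell barcode), `UB:Z:` (UMI) and `GN:Z:` (gene name), and it uses an illustrative motif, `GGT[ACTG]{8}GAATTC`, as a stand-in for the CellTag pattern; check the workflow for the exact regular expression for each library. The function name `parse_sam_line` and the example read are made up.

```python
import re

# Illustrative CellTag motif (an assumption): 8 variable bases between fixed flanks.
CELLTAG_MOTIF = re.compile(r"GGT([ACTG]{8})GAATTC")


def parse_sam_line(line):
    """Return (Read.ID, Read.Seq, Cell.BC, UMI, Cell.Tag, Gene), or None.

    None is returned when the read has no motif match or lacks a CB or UB tag.
    """
    fields = line.rstrip("\n").split("\t")
    read_id, read_seq = fields[0], fields[9]  # SAM columns 1 (QNAME) and 10 (SEQ)
    # Optional SAM fields look like TAG:TYPE:VALUE and start at column 12.
    tags = {}
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        tags[name] = value
    match = CELLTAG_MOTIF.search(read_seq)
    if match is None or "CB" not in tags or "UB" not in tags:
        return None
    # The gene tag is optional; an empty string marks reads without one.
    return (read_id, read_seq, tags["CB"], tags["UB"], match.group(1), tags.get("GN", ""))


# Made-up read: 4 leading bases, the motif with tag ACGTACGT, 4 trailing bases.
seq = "ACGT" + "GGT" + "ACGTACGT" + "GAATTC" + "TTTT"
example = "\t".join([
    "read1", "0", "CellTag.UTR", "100", "255", "25M", "*", "0", "0",
    seq, "I" * len(seq),
    "CB:Z:AAACCCAAGT-1", "UB:Z:ACGTACGTAC", "GN:Z:GFP",
])
parsed = parse_sam_line(example)
print("\t".join(parsed))
```

In this sketch the Gene field is simply carried along with each read, so a later step could keep or drop reads based on it. Whether the downstream scripts actually depend on that column is not something this example settles.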
Hello,
I have a single-cell RNA-seq dataset run with the CellTag-V1 library. I am running the workflow and can get through it successfully up to the Clone Calling step. However, I run into an error at the Lineage and Network Visualization step, for which the input CellTag data should be an Nx3 matrix. Since I only have results for V1, how should I proceed at this step?
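One way to picture the shape problem, offered only as a sketch and not as the maintainers' recommended fix: if the visualization step expects one clone column per CellTag version, a V1-only result could be laid out as an N x 3 table whose V2 and V3 columns are empty. The Python below illustrates that layout. The column names `CellTagV1`, `CellTagV2`, `CellTagV3` and the helper `pad_to_three_versions` are assumptions, and whether the workflow's functions accept empty columns is not confirmed here.

```python
VERSIONS = ("CellTagV1", "CellTagV2", "CellTagV3")  # assumed column names


def pad_to_three_versions(clone_by_cell, version="CellTagV1"):
    """Build rows of (cell, V1, V2, V3), using None for versions that were not used.

    clone_by_cell: dict mapping cell barcode -> clone ID for the one version available.
    The three clone columns form the N x 3 part; the cell barcode acts as the row label.
    """
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version}")
    rows = []
    for cell, clone in sorted(clone_by_cell.items()):
        clone_columns = [clone if v == version else None for v in VERSIONS]
        rows.append((cell, *clone_columns))
    return rows


# Hypothetical V1-only clone calls for three cells.
v1_only = {"AAAC-1": 2, "CCCG-1": 1, "GGGT-1": 2}
table = pad_to_three_versions(v1_only)
for row in table:
    print(row)
```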