cahanlab / cellnet
CellNet: network biology applied to stem cell engineering
License: MIT License
Hi, I'm running CellNet following the Nature Protocols paper. After downloading the example data from SRP059670, I tried the cn_salmon(stQuery) command, but an error occurred as follows:
determining read length.
Trimming reads.
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 21, 22
Then I read the cn_salmon function in your code, but I still couldn't fix the bug. It seems to me that the program reports this error when the number of samples in the sample table does not match the files in the working directory, which is not the case for me. What should I do next? Thanks!
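For anyone hitting this mismatch, a minimal sanity check is to compare the sample ids in the sample table against the fastq files actually present. This sketch uses mock data so it runs on its own; the column name `sra_id` and the `.fastq` naming pattern are assumptions to adjust to your setup:

```r
# Hypothetical check: do the sample ids in the sample table line up
# with the fastq files in the working directory?
stQuery <- data.frame(sra_id = c("SRR001", "SRR002", "SRR003"))
fqFiles <- c("SRR001.fastq", "SRR003.fastq")   # stand-in for list.files(".")
fqIds   <- sub("\\.fastq$", "", basename(fqFiles))

missingFiles  <- setdiff(stQuery$sra_id, fqIds)  # samples with no fastq file
orphanedFiles <- setdiff(fqIds, stQuery$sra_id)  # fastq files with no sample row
missingFiles   # "SRR002"
```

Any name printed by either `setdiff` is a candidate cause for the differing row counts.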
I'm trying to build a cnProc with my own data, which has one cell type and the same description for all samples.
At the step to construct C/T-specific GRNs, it throws an error.
The command that I'm using:
grnProp<-cn_make_grn(stQuery, expList, species='Hs', tfs=hsTFs)
The output I get:
`healthy_liver : 24
Number of samples per CT: 24
Error in expDat[, rownames(stGRN)] : incorrect number of dimensions
In addition: Warning message:
In if (is.na(tfs)) { :
the condition has length > 1 and only the first element will be used`
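The "condition has length > 1" warning usually means a vector reached a scalar `if`; `is.na()` is vectorized, so `if (is.na(tfs))` only tests the first TF. A minimal reproduction (the TF symbols are placeholders, and note that in R >= 4.2 this situation is an error rather than a warning):

```r
tfs <- c("POU5F1", "SOX2", "NANOG")   # a character vector of TF symbols
# is.na() returns one logical per element, so a scalar `if` on it
# only inspects tfs[1] and warns (or errors in recent R versions).
length(is.na(tfs))                    # 3
# A length-one condition avoids the warning entirely:
if (any(is.na(tfs))) stop("tfs contains NA")
```

The warning itself is likely harmless here; the dimension error above it is the real failure.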
I keep running into a problem when I try to run cn_make_processor. The error is
Error in aMat[gene, ] <- zzs : incorrect number of subscripts on matrix
Calls: cn_make_processor -> cn_trainNorm -> cn_score -> cn_netScores
I'm not sure why this is happening. I am able to create a grnProp object and the classifier performance is good. I also do not run into this issue running cn_make_processor on the data used to make the original cellnet classifier. Clearly this is a problem having to do with my data, but I am at a loss for what would be causing this. Any insight would be appreciated!
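A common way to get "incorrect number of subscripts on matrix" in R, independent of CellNet, is a single-row subset that silently drops to a vector; it may be worth checking whether some gene set or cell type in your data collapses a matrix to one row somewhere. A self-contained illustration:

```r
aMat <- matrix(0, nrow = 2, ncol = 3,
               dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
sub1 <- aMat["g1", ]                  # dims dropped: now a plain named vector
is.matrix(sub1)                       # FALSE
# sub1["g1", ] <- 1:3                 # would fail: incorrect number of subscripts
sub2 <- aMat["g1", , drop = FALSE]    # drop = FALSE keeps a 1 x 3 matrix
sub2["g1", ] <- c(1, 2, 3)            # two-subscript assignment now works
```

If the error comes from inside cn_trainNorm, the fix is usually upstream: keep the inputs two-dimensional rather than patching the package code.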
Hi,
I've been trying to use CellNet following the protocol from the Nature article. I am using the mouse training data. However, I seem to be stuck on the same step, using expList<-cn_salmon(stQuery), and it generates this error:
In file(file, "rt") :
cannot open file './salmonRes_SRR2070926/quant.sf': No such file or directory
I also tried this with the human data and it generated a similar message.
Thanks
@darlinghuer
When using the "cn_make_grn" function:
esc : 5
liver : 3
neuron : 5
VE : 1
Number of samples per CT: 1
Error in expDat[, rownames(stGRN)] : incorrect number of dimensions
Calls: cn_make_grn
In addition: Warning message:
In if (is.na(tfs)) { :
the condition has length > 1 and only the first element will be used
Can you check the rownames of the sample table that you made from the csv, and make sure that they match the colnames of the expression matrix?
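A minimal version of that check, with placeholder object names standing in for your own sample table and expression matrix:

```r
# Mock data; replace with your own stTrain / expTrain objects.
expDat  <- matrix(rnorm(6), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
sampTab <- data.frame(description = rep("liver", 3),
                      row.names = c("s1", "s2", "s3"))

identical(rownames(sampTab), colnames(expDat))  # TRUE when they line up
setdiff(rownames(sampTab), colnames(expDat))    # offending sample ids, if any
```

`expDat[, rownames(stGRN)]` fails with "incorrect number of dimensions" exactly when this correspondence is broken, or when the subset leaves a single column that drops to a vector.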
Hi, I got the following error when I ran cn_apply. Thanks.
Error in predict.randomForest(classList[[ctt]], t(expDat[xgenes, ]), type = "prob") :
number of variables in newdata does not match that in the training data
Hi Patrick,
First, I really hope you can help me.
For now, I am working on a research topic and want to use CellNet to assess transcriptome similarity.
But I cannot use the pre-built GRNs you provided in the web-based tool, so I have to build new GRNs from home-made data.
I followed the instructions in the Nature Protocols paper and Platform-Agnostic CellNet, and successfully classified the query samples.
But all GRN status values are NaN. I've tried many times, but I can't solve the problem.
Could you check the files and re-try computing the GRN status? Our files are provided on my GitHub. (https://github.com/AIBio/Pictures_for_Markdown/blob/master/CellNet_files.zip)
If you wish, we can list you as a co-author on the paper.
Thank you again!
Best wishes,
Hanwen Yu
Thank you for developing CellNet. I am trying to apply CellNet to some of my RNAseq data following your Nature Protocol paper. I have a stable pipeline on my cluster to run salmon, and I was wondering if there's any function within cellnet to take the quant.sf files and go from there. In other words, I was wondering if we can just compute the last part of cn_salmon locally?
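For anyone in the same situation, the quantification half of cn_salmon can be approximated by reading the quant.sf files directly; the column layout below follows salmon's documented output format. A tiny quant.sf is faked here so the sketch is self-contained:

```r
# Write a stand-in quant.sf so the snippet runs anywhere; in practice,
# point read.table at each salmonRes_*/quant.sf produced by your pipeline.
qf <- file.path(tempdir(), "quant.sf")
writeLines(c("Name\tLength\tEffectiveLength\tTPM\tNumReads",
             "ENST0001\t1000\t800\t12.5\t100",
             "ENST0002\t2000\t1800\t3.1\t50"), qf)

quant <- read.table(qf, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
tpm <- setNames(quant$TPM, quant$Name)   # per-transcript TPM vector
tpm["ENST0001"]                          # 12.5
```

From there the remaining CellNet-specific step is summing transcript TPM to gene level using the transcript-to-gene table the package ships with.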
Hi,
I was trying to run the procedure locally (mouse example). I encountered the following error at step 8:
> cnRes1<-cn_apply(expList[['normalized']], stQuery, cnProc)
119 21
1715 21
94 21
141 21
451 21
273 21
236 21
2122 21
221 21
705 21
985 21
165 21
189 21
272 21
955 21
1113 21
Error in predict.randomForest(classList[[ctt]], t(expDat[xgenes, ]), type = "prob") :
missing values in newdata
I noticed that expList[['normalized']] is empty. Also, at step 6 the following warning was issued multiple times:
[2021-04-06 11:57:18.354] [jointLog] [warning] Only 0 fragments were mapped, but the number of burn-in fragments was set to 5000000.
The effective lengths have been computed using the observed mappings.
I am using conda with the following packages installed:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
boost 1.60.0 py35_3 conda-forge
bz2file 0.98 py_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
certifi 2018.8.24 py35_1001 conda-forge
cutadapt 1.18 py35_0 bioconda
icu 56.1 4 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgomp 9.3.0 h2828fa1_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
openssl 1.0.2u h516909a_0 conda-forge
parallel 20170422 pl5.22.0_0 bioconda
perl 5.22.0.1 0 conda-forge
pigz 2.6 h27826a3_0 conda-forge
pip 20.3.4 pyhd8ed1ab_0 conda-forge
python 3.5.5 h5001a0f_2 conda-forge
readline 7.0 hf8c457e_1001 conda-forge
salmon 0.7.2 boost1.60_3 bioconda
setuptools 40.4.3 py35_0 conda-forge
sqlite 3.28.0 h8b20d00_0 conda-forge
tbb 2020.2 h4bd325d_4 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xopen 0.7.3 py_0 bioconda
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
Any help is greatly appreciated!
Here is the error. Salmon and cutadapt work fine, but when I follow your protocol I keep getting this error. Please help me resolve this issue.
determining read length
Trimming reads
This is cutadapt 4.2 with Python 3.10.8
Command line parameters: -m 30 -u 18 -u -17 -o ./subset_SRR12820270_trimmed.fq subset_SRR12820270.fastq
Processing single-end reads on 1 core ...
Finished in 103.257 s (2.988 µs/read; 20.08 M reads/minute).
=== Summary ===
Total reads processed: 34,554,336
== Read fate breakdown ==
Reads that were too short: 778,286 (2.3%)
Reads written (passing filters): 33,776,050 (97.7%)
Total basepairs processed: 2,555,642,890 bp
Total written (filtered): 1,335,316,421 bp (52.2%)
Salmon
Version Info: This is the most recent version of salmon.
[2023-02-28 11:42:38.364] [jointLog] [info] done
[2023-02-28 11:42:38.896] [jointLog] [info] Index contained 177,651 targets
[2023-02-28 11:42:39.060] [jointLog] [info] Number of decoys : 195
[2023-02-28 11:42:39.060] [jointLog] [info] First decoy index : 177,412
processed 500,005 fragments
[... per-500k progress lines omitted; "hits per frag" held near 3.49 throughout, garbled by carriage returns in the paste ...]
processed 33,500,004 fragments
hits: 116,993,925; hits per frag: 3.49287
[2023-03-02 14:09:05.703] [jointLog] [info] Computed 342,989 rich equivalence classes for further processing
[2023-03-02 14:09:05.706] [jointLog] [info] Counted 20,405,708 total reads in the equivalence classes
[2023-03-02 14:09:05.778] [jointLog] [warning] 0.106504% of fragments were shorter than the k used to build the index.
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 30.
[2023-03-02 14:09:05.781] [jointLog] [info] Number of mappings discarded because of alignment score : 6,166,684
[2023-03-02 14:09:05.781] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 549,711
[2023-03-02 14:09:05.781] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 538,143
[2023-03-02 14:09:05.781] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 0
[2023-03-02 14:09:05.781] [jointLog] [info] Mapping rate = 60.4147%
[2023-03-02 14:09:05.781] [jointLog] [info] finished quantifyLibrary()
[2023-03-02 14:09:05.793] [jointLog] [info] Starting optimizer
[2023-03-02 14:09:21.937] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2023-03-02 14:09:21.966] [jointLog] [info] iteration = 0 | max rel diff. = 6800.09
[2023-03-02 14:09:23.764] [jointLog] [info] iteration = 100 | max rel diff. = 18.8774
[2023-03-02 14:09:25.676] [jointLog] [info] iteration = 200 | max rel diff. = 12.666
[2023-03-02 14:09:27.747] [jointLog] [info] iteration = 300 | max rel diff. = 5.09729
[2023-03-02 14:09:30.090] [jointLog] [info] iteration = 400 | max rel diff. = 1.4944
[2023-03-02 14:09:32.487] [jointLog] [info] iteration = 500 | max rel diff. = 0.441993
[2023-03-02 14:09:34.927] [jointLog] [info] iteration = 600 | max rel diff. = 1.02472
[2023-03-02 14:09:37.425] [jointLog] [info] iteration = 700 | max rel diff. = 0.682443
[2023-03-02 14:09:39.923] [jointLog] [info] iteration = 800 | max rel diff. = 9.14076
[2023-03-02 14:09:42.654] [jointLog] [info] iteration = 900 | max rel diff. = 3.03266
[2023-03-02 14:09:45.419] [jointLog] [info] iteration = 1,000 | max rel diff. = 0.527561
[2023-03-02 14:09:48.060] [jointLog] [info] iteration = 1,100 | max rel diff. = 0.0268382
[2023-03-02 14:09:50.716] [jointLog] [info] iteration = 1,200 | max rel diff. = 2.66146
[2023-03-02 14:09:53.844] [jointLog] [info] iteration = 1,300 | max rel diff. = 1.04132
[2023-03-02 14:09:56.847] [jointLog] [info] iteration = 1,400 | max rel diff. = 0.199233
[2023-03-02 14:09:59.859] [jointLog] [info] iteration = 1,500 | max rel diff. = 0.131681
[2023-03-02 14:10:02.856] [jointLog] [info] iteration = 1,600 | max rel diff. = 0.0744955
[2023-03-02 14:10:05.860] [jointLog] [info] iteration = 1,700 | max rel diff. = 0.028298
[2023-03-02 14:10:08.580] [jointLog] [info] iteration = 1,800 | max rel diff. = 0.0700855
[2023-03-02 14:10:11.169] [jointLog] [info] iteration = 1,900 | max rel diff. = 0.036263
[2023-03-02 14:10:13.713] [jointLog] [info] iteration = 2,000 | max rel diff. = 0.0731979
[2023-03-02 14:10:15.059] [jointLog] [info] iteration = 2,055 | max rel diff. = 0.00890234
[2023-03-02 14:10:15.065] [jointLog] [info] Finished optimizer
[2023-03-02 14:10:15.065] [jointLog] [info] writing output
[2023-03-02 14:10:15.401] [jointLog] [warning] NOTE: Read Lib [[ subset_SRR12820270_trimmed.fq ]] :
Detected a potential strand bias > 1% in an unstranded protocol check the file: salmonRes_SRR12820270/lib_format_counts.json for details
./salmonRes_SRR12820270/quant.sf
Error in geneIndexList[[i]] : subscript out of bounds
In the NIS plot, what does the colour range of the boxplot represent? P-value?
Also, what is the range of the NIS? In the RNA-seq paper (Radley et al. 2017), the range in figure 5a is much larger than in 5b; does that mean the dysregulation in 5a is stronger than in 5b?
Thank you for your help.
Hi,
I have been trying to run CellNet locally on my Mac, but there's an error, "non-numeric matrix extent", at the step "Estimation of expression levels".
Here's my code:
expList<-cn_salmon(stQuery, refDir="ref/", salmonIndex=iFileMouse,salmonPath=pathToSalmon)
When I trace it back, it's the line "ansTPM<-matrix(0, nrow=length(eids), ncol=ncol(expTPM))" in the function "gene_expr_sum" in "cn_salmon".
I'm not familiar with R, so would you please help with this?
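For context, "non-numeric matrix extent" is what `matrix()` raises when `nrow` or `ncol` is NULL, and `ncol()` returns NULL for anything without a `dim` attribute; so the likely culprit is `expTPM` having ended up as something other than a matrix or data frame. A self-contained reproduction:

```r
expTPM <- matrix(1:6, nrow = 3)
ncol(expTPM)                 # 2: fine, expTPM has dims

v <- 1:6                     # a plain vector, e.g. after a subset dropped dims
is.null(ncol(v))             # TRUE: no dim attribute

# The same call pattern as in gene_expr_sum then fails:
res <- try(matrix(0, nrow = 2, ncol = ncol(v)), silent = TRUE)
inherits(res, "try-error")   # TRUE: "non-numeric matrix extent"
```

Checking `class(expTPM)` and `dim(expTPM)` just before that line should reveal where the dimensions were lost, often an upstream read failure that left zero samples.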
Thank you for developing CellNet. I am trying to apply CellNet to some of my RNA-seq data following your Nature Protocols paper, and the R file named 'geneToTrans_Homo_sapiens.GRCh38.80.exo_Jul_04_2015.R' is needed for the parameter geneTabfname.
But I can't find where to download it and don't know how to generate it. Could you tell me how to get the file?
Hi,
Thank you for developing CellNet.
I wanted to ask your opinion on using this tool to generate NIS scores from scRNA-seq data, basically by replacing expList with its per-cell normalized-counts equivalent:
expList[['normalized']]
I think it can be set up and NIS scores could be calculated. However, I wanted to confirm whether you already have tutorials for scRNA-seq data in other repositories, e.g. singleCellNet.
Thank you!
Ignacio
Hi Patrick,
I am trying to apply my trained cnProc to my query data with the following piece of code:
cnRes <- cn_apply(expList, stQuery, cnProc)
The error message I get is the following; I cannot figure out what it is related to:
"Error in predict.randomForest(classList[[ctt]], t(expDat[xgenes, ]), type = "prob") :
variables in the training data missing in newdata"
I have used this function successfully on other query data, but now I get this error. I believe it must be about my processor; however, training the processor went without problems.
Do you have any suggestions on how I could avoid this error and what exactly this is about?
I appreciate your help.
Rebeka
I am trying to run CellNet for RNA-seq data locally, but at step 6.B of your Nature Protocols paper I get an "unused argument" error for 'geneTabFname'.
expList<-cn_salmon(stQuery, refDir="ref/", salmonIndex=iFileHuman, salmonPath=pathToSalmon, geneTabFname="geneToTrans_Homo_sapiens.GRCh38.80.exo_Jul_04_2015.R")
Error in cn_salmon(stQuery, refDir = "ref/", salmonIndex = iFileHuman, : unused argument (geneTabFname = "geneToTrans_Homo_sapiens.GRCh38.80.exo_Jul_04_2015.R")
Then, just to check, I tried running it without geneTabFname; all the other steps ran, but then it reports that a file was not found:
expList<-cn_salmon(stQuery, refDir="ref/", salmonIndex=iFileHuman, salmonPath=pathToSalmon)
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'ref//geneToTrans_Mus_musculus.GRCm38.80.exo_Jun_02_2015.R', probable reason 'No such file or directory'
I checked the help document and the cn_salmon function file from bin, but couldn't find any help there either.
I also tried giving the full path for the file, but it made no difference.
Another point: though I am using the human example, the file-not-found error is for Mus musculus.
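The Mus musculus path in the error suggests the installed version of cn_salmon hard-wires a mouse default and no longer exposes geneTabFname; `formals()` shows what a function actually accepts and defaults to. Sketched here on a stand-in function so it runs anywhere (the argument names and the file-name default are placeholders):

```r
# Stand-in for the installed cn_salmon; run formals(cn_salmon) on the
# real function after library(CellNet) to see its true signature.
f <- function(stQuery, refDir = "ref/",
              geneTabFname = "geneToTrans_Mus_musculus.R") NULL

names(formals(f))            # which arguments this version accepts
formals(f)$geneTabFname      # the default used when the argument is omitted
```

If the default is the mouse table, placing (or symlinking) the human file under that default name in refDir is a workable stopgap until the argument is exposed.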
Hi Patrick,
I would like to ask where the TF list for Homo sapiens is from (data/hsTFs.rda)? Where did you retrieve this list of genes?
I am comparing your list with the CisBP database: the number of Homo sapiens transcription factor genes there is 1,639, while the list in this tutorial (data/hsTFs.rda) contains 2,139 genes. It would be tempting to use your TF list since it has more genes, but CisBP seems to be a more reliable source.
Thanks for the clarification!
Beka
The error is below. I want to know how to solve it. Thank you.
determining read length
Trimming reads
sh: 1: parallel: not found
Salmon
sh: 1: parallel: not found
rm: cannot remove '*trimmed.fq': No such file or directory
./salmonRes_SRR1501367/quant.sf
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In system(cmd) : error in running command
2: In system(cmd) : error in running command
3: In file(file, "rt") :
cannot open file './salmonRes_SRR1501367/quant.sf': No such file or directory.
I am trying to get more information on the training data used for humans. The R object you provided gives the SRA ids, but I am not able to get more information from them computationally, and I do not want to spend my time searching for each id manually.
It would be really helpful if you could provide more insight into the data used to train the system.
In particular, I'm interested in the disease states of the models and whether in vitro samples or biopsies were used.
Thanks
Hi Patrick,
I'm trying to get CellNet to work locally on my workstation, but I get an error at the cn_salmon(stQuery) step saying the parallel command was not found. So I figured this step calls salmon via GNU parallel. I then installed parallel using conda, but the problem persisted. On top of that, I'm not sure if I configured the paths properly; all parameter values default to what they are supposed to be in the AMI. Would you be able to provide a working example for a local setup so I can understand it better? Thanks a lot!
Regards,
Max
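One thing worth ruling out in both "parallel: not found" reports above: whether the external tools are on the PATH that R's `system()` sees. A conda environment activated in one terminal is often invisible to a GUI R session; `Sys.which()` reports this from inside R:

```r
# "" in the result means the tool is not on R's PATH, even if it works
# in your interactive shell after `conda activate`.
Sys.which(c("parallel", "salmon", "cutadapt"))

nzchar(Sys.which("parallel"))   # FALSE when GNU parallel is missing
```

If a tool is missing, starting R from the shell where the conda environment is active, or prepending its bin directory via `Sys.setenv(PATH = ...)`, usually resolves it.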
I get an error while running:
> cn_barplot_grnSing(cnRes,cnProc,"fibroblast", c("fibroblast","kidney"), bOrder, sidCol="sra_id", dlevel="description1")
Error in data.frame(sample_id = sample_ids, description = descriptions, :
arguments imply differing number of rows: 0, 1
The error occurs when cnRes contains only one row.
It looks like the fix should be applied to cn_extract_SN_DF function:
cn_extract_SN_DF <- function(
  scores,
  sampTab,
  dLevel,
  rnames = NULL,
  sidCol = "sample_id"
){
  if(is.null(rnames)){
    rnames <- rownames(scores)
  }
  tss <- scores[rnames, ]
  if(length(rnames) == 1){
    tss <- t(as.matrix(scores[rnames, ]))
    rownames(tss) <- rnames
  }
  colnames(tss) <- colnames(scores)  # <---- the bug fix
  nSamples <- ncol(tss)
  stTmp <- sampTab[colnames(tss), ]
  snNames <- rownames(tss)
  num_subnets <- length(snNames)
  snNames <- unlist(lapply(snNames, rep, times = nSamples))
  sample_ids <- rep(as.vector(stTmp[, sidCol]), num_subnets)
  descriptions <- rep(as.vector(stTmp[, dLevel]), num_subnets)
  scores <- as.vector(t(tss))
  data.frame(sample_id = sample_ids,
             description = descriptions,
             subNet = snNames,
             score = scores)
}