Supplementary codes for my Honours thesis.
- Four mixture models:
- Gamma-Normal (GN):
gammaNormMix()
- Constrained Gamma-Normal (cGN):
gammaNormFix()
- Gamma-Gamma (GG):
gammaMix()
- Constrained Gamma-Gamma (cGG):
gammaMixFix()
- Gamma-Normal (GN):
- Model selection:
- Select the best model using BIC:
modelSelect()
- Select the best model using BIC:
Load the R packages
require(distr)
require(scales)
require(cvTools)
require(edgeR)
Download and read the mixtureModels.R file.
source("mixtureModels.R")
We start from a count matrix, where each row represents a gene, and each column represents a cell. Here, we simulate a count matrix from a negative binomial distribution with 20000 genes and 250 cells as an example.
nGene<-20000
nCell<-250
gene_mean <- 2^runif(nGene, -2, 8)
count <- matrix(rnbinom(nGene*nCell, mu=gene_mean, size=10), nrow=nGene)
Then we need to transform the raw count matrix into a log-scale counts per million mapped reads (CPM) matrix. Note that, one can also use log-scale FPKM, RPKM or TPM matrix or other normalisation output as the input of mixture model functions.
#Transform the raw count matrix
log2cpm<-log2(cpm(count)+1)
GN_gene1<-gammaNormMix(log2cpm[1,])
GG_gene1<-gammaMix(log2cpm[1,])
Note that the fittings for the entire dataset need a few minutes...
GN_dataset<-gammaNormMix(log2cpm,plot=FALSE)
GG_dataset<-gammaMix(log2cpm,plot=FALSE)
cGN_gene1<-gammaNormFix(log2cpm[1,],alpha_fix=GN_dataset$alpha,beta_fix=GN_dataset$beta)
cGG_gene1<-gammaMixFix(log2cpm[1,],alpha_fix=GG_dataset$alpha1,beta_fix=GG_dataset$beta1)
modelSelect(log2cpm[1,],alpha_fix=GG_dataset$alpha1,beta_fix=GG_dataset$beta1,
alpha_fix_norm=GN_dataset$alpha,beta_fix_norm =GN_dataset$beta,parameter=TRUE)
Yingxin Lin
gammaNormMix()
function is modified from the codes provided by Shila Ghazanfar.