bretigea's Introduction

BRETIGEA

The goal of BRETIGEA (BRain cEll Type specIfic Gene Expression Analysis) is to estimate and/or deconvolute relative cell type proportions from bulk gene expression data.

BRETIGEA removes the need to define your own set of brain cell type marker genes by providing a well-validated set of cell type-specific marker genes derived from multiple types of experiments, as described in our manuscript (McKenzie, Wang, et al. 2018). For brain tissue data sets, marker genes are available for astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte precursor cells, derived from human, mouse, and combined human/mouse data sets. We also provide markers from an alternative source that leveraged variation among intact tissue samples (Kelley et al. 2018). If you have your own marker genes, the functions can be applied to bulk gene expression data from any tissue.

BRETIGEA also implements multiple options for relative cell type proportion estimation using these marker genes, adapting and expanding on approaches from the 'CellCODE' R package described in Chikina et al 2015. The number of cell type marker genes used in a given analysis can be increased or decreased based on your preferences and the data set. Finally, BRETIGEA provides functions to use the estimates to adjust for variability in the relative proportion of cell types across samples (i.e., deconvolute) prior to downstream analyses.

Installation

You can install BRETIGEA from CRAN with:

install.packages("BRETIGEA")

You can install the development version of BRETIGEA from GitHub, which is recommended if you want the most up-to-date version, with:

# install.packages("devtools")
devtools::install_github("andymckenzie/BRETIGEA")

Example

This example uses data from the Allen Brain Atlas, a subset of which is available in the package.

library(BRETIGEA)
library(knitr) #only for visualization
str(aba_marker_expression, list.len = 10) #input data format
str(aba_pheno_data) #input data format

ct_res = brainCells(aba_marker_expression, nMarker = 50)
kable(head(ct_res)) #output data format

cor_mic = cor.test(ct_res[, "mic"], as.numeric(aba_pheno_data$ihc_iba1_ffpe), method = "spearman")
print(cor_mic)
cor_ast = cor.test(ct_res[, "ast"], as.numeric(aba_pheno_data$ihc_gfap_ffpe), method = "spearman")
print(cor_ast)

Vignette

See the basic vignette for help with getting started here: https://github.com/andymckenzie/BRETIGEA/blob/master/inst/doc/BRETIGEA_basic.pdf

Applications

You can view the manuscript describing BRETIGEA in detail as well as several applications here:

https://www.nature.com/articles/s41598-018-27293-5

References

  1. McKenzie AT, Wang M, Hauberg ME, et al. Brain Cell Type Specific Gene Expression and Co-expression Network Architectures. Sci Rep. 2018;8(1):8868. See also: http://celltypes.org/

  2. Chikina M, Zaslavsky E, Sealfon SC. CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations. Bioinformatics. 2015;31(10):1584-91.

  3. Miller JA, Guillozet-Bongaarts A, Gibbons LE, et al. Neuropathological and transcriptomic characteristics of the aged brain. eLife. 2017;6. Available from: http://aging.brain-map.org/

  4. Kelley KW, Nakao-Inoue H, Molofsky AV, Oldham MC. Variation among intact tissue samples reveals the core transcriptional features of human CNS cell classes. Nat Neurosci. 2018;21(9):1171-1184. Alternative marker data from: http://oldhamlab.ctec.ucsf.edu/

bretigea's Issues

RNA-seq input values

The input values for the RNA-seq data are raw counts, is that right? As opposed to TPM, etc. I can't find it stated explicitly in the documentation.

Error when only one cell type is specified

When only one cell type is specified for the functions brainCells and adjustBrainCells, the following error is returned:

Error in `colnames<-`(`*tmp*`, value = cell_types) :
  attempt to set 'colnames' on an object with less than two dimensions
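
The message above is what base R raises when `colnames<-` is applied to a plain vector. One way this can happen, sketched below with made-up numbers, is that subsetting a matrix down to a single column drops it to a vector unless `drop = FALSE` is used. This is an illustration of base R behaviour under that assumption, not a trace of the BRETIGEA source.

```r
# Hypothetical 3-sample x 2-cell-type matrix of estimates (made-up numbers).
est <- matrix(c(0.1, -0.2, 0.3, 0.0, 0.5, -0.4), nrow = 3,
              dimnames = list(paste0("s", 1:3), c("ast", "mic")))

one_dropped <- est[, "ast"]                 # drops to a named numeric vector
one_kept    <- est[, "ast", drop = FALSE]   # stays a 3 x 1 matrix

is.null(dim(one_dropped))   # no dimensions left on the dropped version
dim(one_kept)

# Setting column names only works on the object that kept its dimensions.
colnames(one_kept) <- "ast"
err <- tryCatch({ colnames(one_dropped) <- "ast"; NULL },
                error = function(e) conditionMessage(e))
err
```

Until this is handled in the package, requesting at least two cell types and then keeping only the column of interest may be a workable way around it.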

Question about gene expression quantization method

Hello,

Thanks a lot for compiling this resource. I am wondering if the method of gene quantization in my expression matrix will have an impact on the results of the brainCells function. I read your supplementary methods and saw that you obtained raw RNA-seq data, aligned it, and normalized gene counts using a quantile approach. My data is available as RSEM expected counts, TPM, and FPKM but I do not have access to raw data. Would a matrix based on FPKM gene quantization, for example, be valid with this pipeline?

Thank you for your time

Marker gene list for each mouse brain cell type

Hi,
As per the paper: https://doi.org/10.1038/s41598-021-97284-6
it states the following in Supplementary section
"BRETIGEA also offers principal component analysis (PCA) as an option for cell type relative proportion estimation, along with sets of 1000 top consensus cell-specific genes for each cell type. "

I'm doing a single cell analysis of Brain tissues, with cell types as listed in your paper using scType to annotate clusters. Using your tool BRETIGEA, I was trying to extract 1000 top consensus marker genes for each cell type but was unsuccessful in doing so. Cell types I'm providing in the reference marker list: Astrocyte, Endothelial, Microglia, Neuron, Oligodendrocyte, OPC.

So far, with your tool, I'm only able to extract 20 genes with the command

> ct_res = brainCells(aba_marker_expression, nMarker = 1000, celltypes = "ast")
    markers cell
1      AQP4  ast
2   ALDH1L1  ast
3    BMPR1B  ast
4   SLC14A1  ast
5      MLC1  ast
6     FGFR3  ast
7  SLC25A18  ast
8      GLI3  ast
9      GFAP  ast
10   ACSBG1  ast
11   SLC4A4  ast
12     GJA1  ast
13     GJB6  ast
14 SLC39A12  ast
15      AGT  ast
16   CHRDL1  ast
17   SLC1A2  ast
18   CLDN10  ast
19     SOX9  ast
20  PPP1R3C  ast

With the error reported at the end of the command run
Error in `colnames<-`(`*tmp*`, value = cell_types) :
  attempt to set 'colnames' on an object with less than two dimensions

Could you please let us know how to obtain 1000 marker list per cell type using your BRETIGEA tool?

Results interpretation

Dear developer,

I have a bulk RNA-seq data set that I would like to analyze using BRETIGEA. The steps to produce the results are very simple, but I have difficulty understanding the output. In the vignette section 4 ('Relative cell type proportion estimation') you run the brainCells function to obtain the 'surrogate proportion variables' for each sample and each cell type. How should they be interpreted? Why are some proportions negative? Are results for different samples (rows in the vignette table) comparable?

Thank you very much for clarifying these points
regards
r
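
One way to see why such estimates can be negative and do not sum to 1: if the scores come from a singular value decomposition of marker expression that has been centered per gene (in the spirit of the 'CellCODE' approach the README refers to), they are relative scores centered on zero, not fractions. The sketch below uses random data and plain `svd()`; it is a simplified illustration under that assumption, not BRETIGEA's implementation.

```r
set.seed(1)
# Made-up marker matrix: 20 marker genes (rows) x 8 samples (columns).
markers <- matrix(rexp(160), nrow = 20,
                  dimnames = list(paste0("g", 1:20), paste0("s", 1:8)))

# Center and scale each gene across samples, then take the first right
# singular vector as a per-sample score.
scaled <- t(scale(t(markers)))
score  <- svd(scaled)$v[, 1]
names(score) <- colnames(markers)

round(score, 3)
sum(score)   # numerically zero, so some scores must be negative
```

Under this reading, a negative value means a sample sits below the data set average for that cell type's markers, and values are comparable between samples within one run but are not absolute proportions.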

Error in svd(data) : a dimension is zero

Hi developers of BRETIGEA,
I tried to deconvolute my hippocampus bulk tissue RNA-seq data with BRETIGEA using the following command:
brainCells(normalizedcounts, nMarker = 50, species = "mouse")
but I got the error

Error in svd(data) : a dimension is zero

Is there anything I could do to resolve this, as it is quite peculiar?
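
Base R's `svd()` stops with this message when the matrix it receives has zero rows or zero columns. A plausible cause, offered as an assumption and not a diagnosis, is that none of the marker symbols matched the row names of the input, leaving an empty marker subset. The sketch below uses made-up identifiers and a hypothetical `marker_symbols` vector to show the check.

```r
set.seed(1)
# Made-up expression matrix: genes in rows, samples in columns, with row
# names that are NOT gene symbols (placeholder IDs).
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("ID_", 1:4), paste0("s", 1:3)))
# Hypothetical marker list given as gene symbols.
marker_symbols <- c("Aqp4", "Gfap", "Sox9")

shared <- intersect(rownames(expr), marker_symbols)
length(shared)               # identifiers and symbols do not overlap

sub <- expr[shared, , drop = FALSE]   # matrix with zero rows
res <- tryCatch(svd(sub), error = function(e) conditionMessage(e))
res
```

If the overlap is empty in real data, converting the row names to the same symbol convention as the marker list (identifier type and letter case) before calling `brainCells()` would be the first thing to try.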

Applicability to proteomic datasets

Hello,

Very cool package/paper!

I know this package was written with the goal of deconvoluting bulk RNA-seq data in mind. However, I was wondering if it could be applied to proteomic datasets as long as there are gene symbols to reference. Or am I missing something that would make this approach incompatible with proteomic data?

Thanks in advance. I know this is potentially a silly question.

adding immune cell types to existing brain cell types marker gene df

Hi,

I would like to use this tool to identify relative cell-proportions of cell types in brain and immune cell types. Specifically interested in monocytes so I tried to add ~300 marker genes for monocyte from literature review and xCell and run findCells function on all samples with marker_gene_df=the existing 6000 marker genes for brain cells provided + ~300 marker gene for monocyte.

I do get some interesting results so I was wondering if this is a valid use of the findCells function to look for relative cell proportions of additional cell types along with brain cells?

Thank you,
Krutika
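
A minimal sketch of how such a combined marker table could be assembled and sanity-checked, assuming the two-column layout (`markers`, `cell`) printed in an earlier issue. The small data frames, the `mono` label, and the `GENE_X`/`GENE_Y` symbols are placeholders, not real package data or curated monocyte markers.

```r
# Stand-in for the packaged brain marker table (columns `markers` and `cell`);
# only a few rows, for illustration.
brain_markers <- data.frame(markers = c("AQP4", "GFAP", "SOX9"),
                            cell = "ast", stringsAsFactors = FALSE)
# Hypothetical extra markers for a new cell type label "mono".
mono_markers <- data.frame(markers = c("GENE_X", "GENE_Y", "GFAP"),
                           cell = "mono", stringsAsFactors = FALSE)

combined <- rbind(brain_markers, mono_markers)

# Symbols claimed by more than one cell type are worth reviewing, since a
# shared marker cannot separate the two cell types.
shared <- unique(combined$markers[duplicated(combined$markers)])
table(combined$cell)
shared
```

Whether the resulting estimates are meaningful depends on how specific the added markers are in brain tissue, which this check cannot establish on its own.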

brainCells()

Hello! I am using BRETIGEA to annotate a single-cell expression matrix from brain organoid data and I am unable to run the brainCells() function. My input matrix does have the same structure as your example one.

I would strongly appreciate some technical support on that!

Thank you,
Ros

str(aba_marker_expression, list.len = 5)

'data.frame':	395 obs. of  377 variables:
 $ X488395315: num  0.6557 4.5264 0 0 0.0397 ...
 $ X496100277: num  0.0951 8.8558 0 0 0.0165 ...
 $ X496100278: num  0 4.87 0 0 0 ...
 $ X496100279: num  0 4.85 0 0 0.17 ...
 $ X496100281: num  0 3.6 0 0 0 ...
  [list output truncated]

str(aggregatedMarkerExpression, list.len = 5)

'data.frame':	827 obs. of  7700 variables:
 $ AAACCTGAGAGTCTGG-1: num  0 0 0 1.5 4.25 ...
 $ AAACCTGAGATGCCTT-1: num  0 0 0 1.75 4.58 ...
 $ AAACCTGAGCAATATG-1: num  0 0 0.754 1.338 3.976 ...
 $ AAACCTGAGTACATGA-1: num  0 0 1.62 1.62 3.62 ...
 $ AAACCTGAGTCGATAA-1: num  0 0 0 0.967 3.942 ...

ct_res = brainCells(aggregatedMarkerExpression, nMarker = 50)

Error in if (sum(inputMat[gene, ] > 0)) { :
  argument is not interpretable as logical
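
In base R, `sum(x > 0)` returns `NA` as soon as `x` contains a missing value, and `if ()` cannot evaluate an `NA` condition. That makes missing values in the input a candidate cause of the error above; this is an assumption about the reporter's data, which is only partly shown. A small check with made-up numbers:

```r
# Made-up matrix with one missing value in the row for "GeneA".
m <- matrix(c(1, NA, 0, 2, 3, 4), nrow = 2, byrow = TRUE,
            dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:3)))

n_pos <- sum(m["GeneA", ] > 0)   # NA, because one comparison is NA
is.na(n_pos)

# Rows that contain NA, and a count that ignores the missing entries.
rows_with_na <- rownames(m)[rowSums(is.na(m)) > 0]
n_pos_clean  <- sum(m["GeneA", ] > 0, na.rm = TRUE)
rows_with_na
n_pos_clean
```

Running a check like `anyNA()` on the input before calling `brainCells()` would confirm or rule this out.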

transcript/gene quantification

I have quantification per transcript, not per gene as the input format requires. Do you have advice on how I should handle this? Should I re-quantify at the gene level or find a tool that will aggregate the transcripts?
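
If re-quantifying is not practical, one simple option is to sum transcript-level values within each gene using base R's `rowsum()`. The sketch below assumes values on a linear scale (counts or TPM, not log-transformed) and uses a made-up matrix and a hypothetical `tx2gene` mapping.

```r
# Made-up transcript-level matrix: transcripts in rows, samples in columns.
tx_expr <- matrix(c(5, 1, 2, 0, 3, 4, 7, 2), nrow = 4, byrow = TRUE,
                  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2")))
# Hypothetical transcript-to-gene map, keyed by transcript ID.
tx2gene <- c(tx1 = "GFAP", tx2 = "GFAP", tx3 = "AQP4", tx4 = "AQP4")

# Sum transcripts belonging to the same gene; rows come back sorted by gene.
gene_expr <- rowsum(tx_expr, group = tx2gene[rownames(tx_expr)])
gene_expr
```

The result has gene symbols as row names and samples as columns, which matches the orientation of the example data in the README.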

extracting cell-specific expression from bulk tissue matrix through BRETIGEA/complete deconvolution

Hi developers of BRETIGEA,
I have a bulk tissue brain expression matrix which I want to deconvolute down to the expression of my cell type of interest.

With mycelltype containing a character vector with that cell type's specific gene signature,

I used the myadjcells<-adjustcells (bulkmatrix,svd,mycelltype,addMeans = FALSE,formula = NULL, verbose = TRUE) function and it ran successfully.

Does this mean complete deconvolution has been performed, so that myadjcells[["expression"]] refers to the expression matrix of specifically my cell type?

Negative estimations in BRETIGEA

Hi,

I am running BRETIGEA on a couple of RNA-seq data sets, and I’ve noticed that the brainCells() function often returns negative values for the estimates. This can also be observed in the vignette for the R package. What is the appropriate interpretation in this case?

Relatedly, what is the scale of the values returned by this function? The manual would indicate that it is an estimated proportion, though the numbers do not come close to summing to 1 (even if resetting negative values to 0).

Thanks,
Mark

———
Mark Maienschein-Cline, PhD
Director, Research Informatics Core
Research Resources Center
University of Illinois at Chicago

FindCells.R line 37: sum(inputMat[gene, ] > 0)

Hi,

I would like to ask why you assume that the total expression of a gene across the samples must be greater than 0 for it to be considered a marker. I'm a bit confused because you used normalized RPKM which has been adjusted for RNA integrity number (RIN) and batch. The expression values can be negative too, since they are already adjusted. If you used raw RPKM values (without adjustment or log transformation), the assumption would be correct. I don't understand why this value needs to be greater than zero if inputMat contains negative values. My last question: are the RPKM values log2 transformed?

Thanks,
Ersoy

Input data format and comparing between experimental groups

Hi Andy.

I'm going through your tutorial to use BRETIGEA to deconvolute cell type populations from mouse brain seq data from two different experimental groups, similar to Fig 3A in this paper.

Apologies, I'm still new to using this tool and I'm not entirely clear on the format of the input seq data. Do we use raw counts or normalized counts? And what does it mean when you have negative values, as you do in the proportion analysis table on page 6 of your manual? Lastly, is there a way to directly compare cell type proportions between the two groups within the BRETIGEA framework?

Thanks,

Anand

Input data

I'm trying to use the R BRETIGEA package and can't seem to load my data. No matter how I read it, it comes back with:

Error in brainCells(data, nMarker = 50, species = "combined", celltypes = c("ast", :
At least one marker gene symbol must be present in the rownames of the input matrix.

I'm sure it's something silly but I just can't find the problem, would someone be able to help? I'm using mouse RNAseq data, first row mgi symbols as identifiers, then every row is a different sample from the experiment.

Thanks!
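
The error text says the marker gene symbols must appear in the row names of the input matrix. A minimal sketch of reshaping a table into that orientation follows; it assumes, as one reading of the description above, that samples ended up in rows with gene symbols as column headers. The data and the `marker_symbols` vector are made up.

```r
# Made-up table as it might be read from file: samples in rows, genes in columns.
raw <- data.frame(sample = c("s1", "s2", "s3"),
                  Gfap = c(4.1, 3.9, 5.0),
                  Aqp4 = c(2.2, 2.8, 2.5),
                  stringsAsFactors = FALSE)

# Drop the ID column, transpose so genes are rows, and label columns by sample.
expr <- t(as.matrix(raw[, -1]))
colnames(expr) <- raw$sample

rownames(expr)
# Hypothetical marker symbols; check the overlap before calling brainCells().
marker_symbols <- c("Gfap", "Aqp4", "Sox9")
sum(rownames(expr) %in% marker_symbols)
```

If the overlap is still zero after reshaping, the next thing to compare is the symbol convention (for example letter case) between the row names and the marker table for the chosen `species`.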
