giannimonaco / abis

R 100.00%

abis's Introduction

ABsolute Immune Signal (ABIS) deconvolution

This Shiny app performs absolute deconvolution on RNA-Seq and microarray data. It also contains a Gene Viewer page where the expression of a gene can be visualized across 29 immune cell types.

INSTALLATION

You can run the app in several ways:

Run it on the web!

This Shiny app is hosted online at https://giannimonaco.shinyapps.io/ABIS/. It requires no R installation and can be used immediately by clicking the link. Uploads are limited to files no bigger than 5 MB.

We suggest using the web version of ABIS only for testing. For larger analyses, we encourage you to install the Shiny app locally.

Run it locally without installation

This option downloads the app from GitHub through R and runs it locally. However, as soon as you close R the app is no longer available, and you need to download it again. All the packages and dependencies have to be installed first.

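# preprocessCore is distributed through Bioconductor rather than CRAN
install.packages(c("shiny", "MASS", "BiocManager"), dependencies = TRUE)
BiocManager::install("preprocessCore")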

shiny::runGitHub("ABIS", user="giannimonaco")

Run it locally with installation

Save the repository on your local machine. Open either the ui.R or the server.R file with RStudio, then click on "Run App".

ABSOLUTE DECONVOLUTION

Input file

The gene expression matrix of your PBMC samples must be in tab-delimited format. The gene names must be gene symbols, and there must be no duplicates. If you use Excel to check or modify your file, be aware that Excel automatically converts some gene names to dates (for example, MARCH1 becomes 1-Mar and SEPT2 becomes 2-Sep). This can also create duplicated names when you export the file as TXT (for example, both MARC1 and MARCH1 become 1-Mar). See the file TPMPBMC.txt in the data folder for an example.
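The two pitfalls described above (duplicated symbols and Excel date conversion) can be screened for before uploading a file. The following is only a minimal sketch: `check_gene_names` is a hypothetical helper, not part of ABIS, and the date pattern is an assumption about how Excel typically renders converted names.

```r
# Hypothetical helper (not part of ABIS): report duplicated gene names and
# names that look like Excel-converted dates, e.g. "1-Mar" or "2-Sep".
check_gene_names <- function(gene_names) {
  months <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Sept|Oct|Nov|Dec"
  date_pattern <- paste0("^[0-9]{1,2}-(", months, ")$")
  date_like <- grepl(date_pattern, gene_names, ignore.case = TRUE)
  list(
    duplicated = unique(gene_names[duplicated(gene_names)]),
    date_like  = unique(gene_names[date_like])
  )
}

# Toy example: "1-Mar" appears twice, as it would after MARC1 and MARCH1
# were both converted by Excel.
genes <- c("A1BG", "1-Mar", "CD4", "1-Mar", "2-Sep")
res <- check_gene_names(genes)
print(res)
```

For a real file, the gene names would first have to be read in, for instance with `read.delim(..., check.names = FALSE)`; whether they end up in the first column or in the row names depends on how the file was written.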

RNA-Seq deconvolution

For RNA-Seq deconvolution the gene expression values must be TPM values. RNA-Seq deconvolution has been implemented using data from Illumina HiSeq 2000.
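Since the input must be TPM, a quick sanity check is to look at the column totals: TPM values computed over all genes sum to one million per sample. The sketch below assumes a numeric matrix with genes in rows and samples in columns; `looks_like_tpm` and its tolerance are hypothetical choices, and a matrix that has already been filtered to a subset of genes will legitimately sum to less.

```r
# Hypothetical check (not part of ABIS): does each sample column sum to
# roughly one million, as expected for unfiltered TPM values?
looks_like_tpm <- function(expr, tol = 0.05) {
  totals <- colSums(expr)
  abs(totals - 1e6) / 1e6 <= tol
}

# Toy matrix: S1 sums to 1e6 (TPM-like), S2 clearly does not.
expr <- matrix(
  c(2e5, 8e5,
    10,  30),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("S1", "S2"))
)
ok <- looks_like_tpm(expr)
print(ok)
```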

Microarray deconvolution

For microarray deconvolution, the expression value of each gene should be the maximum expression value among the probes that map to that gene. Microarray deconvolution has been implemented using data from Illumina HT-12 v4.

Please be aware that platform-specific and pre-processing-specific effects can occur.
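One way to collapse probes to genes is sketched below. It assumes a probe-by-sample numeric matrix and a vector of gene symbols in the same row order, and it takes the maximum separately within each sample; that per-sample reading of "maximum expression value" is an assumption, and `collapse_probes_max` is a hypothetical helper rather than the code used by ABIS.

```r
# Hypothetical helper (not part of ABIS): collapse a probe-level matrix to
# gene level by taking, for each sample, the maximum over a gene's probes.
collapse_probes_max <- function(probe_expr, gene_symbols) {
  # Drop probes without a gene symbol
  keep <- !is.na(gene_symbols) & gene_symbols != ""
  probe_expr <- probe_expr[keep, , drop = FALSE]
  gene_symbols <- gene_symbols[keep]

  # Group row indices by gene, then take column-wise maxima within each group
  groups <- split(seq_along(gene_symbols), gene_symbols)
  do.call(rbind, lapply(groups, function(idx) {
    apply(probe_expr[idx, , drop = FALSE], 2, max)
  }))
}

# Toy example: probes p1 and p2 both map to CD4, p3 maps to CD8A.
probe_expr <- matrix(
  c(5, 7, 3,
    6, 2, 9),
  nrow = 3,
  dimnames = list(c("p1", "p2", "p3"), c("S1", "S2"))
)
gene_expr <- collapse_probes_max(probe_expr, c("CD4", "CD4", "CD8A"))
print(gene_expr)
```

The result has one row per gene symbol, so it also satisfies the requirement that gene names are not duplicated.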

Output

The output values are percentages of each immune cell type and should therefore fall within the range 0-100. If you use PBMC data, the scores for each sample should ideally sum to a value close to 100. Note also that the method is unconstrained, so you are likely to obtain some negative values. In that case, consider which of these two situations applies:

Negative values close to zero -- they are likely due to technical or biological variability. Simply set them to zero or scale all the values up so that the minimum value is zero.
Very low negative values -- there might be strong biological or technical variability for the cell type with such values. Exclude the cell type from the analysis.
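The two cases above could be handled along these lines. This is a sketch under stated assumptions: the output is taken to be a numeric matrix with cell types in rows and samples in columns, `clean_abis_output` is a hypothetical helper, and the cut-off of -5 separating "close to zero" from "very low" is an arbitrary illustration, not a value recommended by the authors.

```r
# Hypothetical post-processing (not part of ABIS).
# Assumes rows = cell types, columns = samples, values in percent.
clean_abis_output <- function(props, drop_below = -5) {
  # Exclude cell types whose most negative estimate is below the cut-off
  keep <- apply(props, 1, min) >= drop_below
  props <- props[keep, , drop = FALSE]
  # Set the remaining small negative values to zero
  props[props < 0] <- 0
  props
}

# Toy output: mDCs is slightly negative in S1, T_CD8_Naive is strongly negative.
props <- matrix(
  c(60, 3, -0.4, -25,
    55, 5,  1.2,   2),
  nrow = 4,
  dimnames = list(
    c("Neutrophils_LD", "NK", "mDCs", "T_CD8_Naive"),
    c("S1", "S2")
  )
)
cleaned <- clean_abis_output(props)
print(cleaned)
print(colSums(cleaned))  # per-sample totals, ideally close to 100 for PBMC data
```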

GENE VIEWER

The Gene Viewer panel shows the median expression value of a gene across the 29 immune cell types contained in our dataset. It is straightforward to use: simply enter the name of a gene in one of these formats: gene symbol, Ensembl ID or Entrez ID.

You could also use a standalone-html gene viewer which is available from: https://doi.org/10.5281/zenodo.2649355.

This software is released under the GPL v2 license, "which guarantees end users (individuals, organizations, companies) the freedoms to use, study, share (copy), and modify the software".

abis's Issues

Input data from Affymetrix arrays

I tried to use the ABIS tool via the web, but got an error: "An error has occurred. Check your logs or contact the app author for clarification." It works fine for the TPMPBMC.txt file, so it is not an issue of my browser, I assume.
My file contains RMA-normalized counts from human Affymetrix PrimeView arrays, has 19464 rows and 4 columns (gene symbols and 3 samples). It looks like this:

head test123.txt
"SCH1"	"SCH2"	"SCH3"
"1060P11.3"	3.5967598624897	3.93685508289532	3.87821946945781
"A1BG"	6.45565863125128	5.82575471633686	5.79043259640592
"A1CF"	5.34300130822687	5.68253584972171	5.30336643058087
"A2M"	4.15193049248871	4.3784780972333	3.94194736201775
"A2ML1"	3.49557726410993	3.6238503706137	3.51239887716308
"A3GALT2"	3.4001284612413	3.53329672413759	3.64071911619115
"A4GALT"	4.65199708894221	5.19605851776019	4.9648856544884
"A4GNT"	4.41943768837953	4.39662210867146	4.12529843617137
"AAAS"	5.73657599875531	5.8650512477877	5.86236578077812

I would be grateful for suggestions on what may be wrong here.
Thanks and best regards.
PS.: Are there some genes that "have to" be there?
PPS.: I just noticed that when selecting "RNA-Seq" as input type, it does run and output a table, but not when selecting "Microarray". Your example file works with both settings, though, and outputs different numbers of rows with each.

How to receive TPM values scaled by RLM for input into other deconvolution methods

Hello,

Thank you for the amazing work on ABIS!

I was wondering how you obtained the TPM values scaled by mRNA abundance with RLM like you did in the original publication, Figure 7B.

In particular, I would like to try using your signature matrix, scaled by mRNA abundance, as input to other deconvolution methods such as FARDEEP.

The aim is to force the cell populations onto a non-negative scale (i.e., no negative values, with 0 as the lower bound).

Thank you,
Philipp

Unable to reproduce your paper results

Hi.
We are trying to reproduce your deconvolution method for RNA-seq using the matrix of mixes you provide on your github (TPMPBMC.txt). Your method returns a matrix of 'beta' coefficients that, theoretically, corresponds to the estimated proportions of each cell type in each sample.
In the supplementary to your paper (https://pubmed.ncbi.nlm.nih.gov/30726743/) you provide in a file the true matrix of proportions of each cell type in each sample.
Our problem is that, when comparing both matrices, we obtain neither the correlations nor the concordance correlation coefficients that you show in the paper. The correlations and concordances are very far from what you present in the paper.
Also, we do not understand how you perform the 'tilde' TPM_RLM normalization, with which the estimated proportions are supposed to look more like the real ones.
Could you clarify these doubts? It would also help us if you could provide the R code for the TPM_RLM normalization.
Thank you.

PBMC data does not add up to approximately 100 percent

Dear author,

I recently tried your deconvolution method on PBMC RNA-seq data (TPM). However, none of the columns (samples) in the result table added up to 100%, and most did not even come close. In some cases the percentage was even significantly above 100%. Below is a subset of the data as I got it (colSums(data)):

X7053.08.001.005 X7053.08.001.008 X7053.08.001.010 X7053.08.001.011 X7053.08.001.012 X7053.08.001.013 X7053.08.001.014
47.658550 73.662520 44.081200 58.190400 63.597400 68.691400 51.867800
X7053.08.001.015 X7053.08.001.016 X7053.08.001.017 X7053.08.001.018 X7053.08.001.019 X7053.08.001.020 X7053.08.001.021
59.207400 79.147700 70.589700 56.311700 58.408340 62.325400 37.441700
X7053.08.001.022 X7053.08.001.023 X7053.08.001.024 X7053.08.001.025 X7053.08.001.026 X7053.08.001.027 X7053.08.001.028
46.511880 58.139300 64.109700 61.042100 52.493300 80.300900 59.569000
X7053.08.001.029 X7053.08.001.030 X7053.08.001.031 X7053.08.001.006 X7053.08.001.007 X7053.08.001.032 X7053.08.001.033
84.131000 64.686200 127.991800 74.345300 88.099000 99.438800 58.528200
X7053.08.001.034 X7053.08.001.035 X7053.08.001.036 X7053.08.001.037 X7053.08.001.038 X7053.08.001.039 X7053.08.001.040
53.186800 37.064300 60.761600 63.212230 44.397900 67.132800 61.01230

I know from trying a different deconvolution tool (quanTIseq) that neutrophils highly bias the result as many genes specific for this immune cell are highly expressed (also using your deconvolution tool, ranging from 40 to 110%). I don't know if this could impact your deconvolution method in such a way that I obtain such results as above? Generally, do you have any idea what could cause such results, and moreover, how to potentially solve the issue?

Thank you!

Run server.R from RStudio

Hi,
I was trying to use the ABIS tool locally from RStudio, but when I opened the 'server.R' file and ran it with 'Run App' I received the following error message:

runApp('/Applications/ABIS-master')
Listening on http://127.0.0.1:7160
Warning in if (check.names) col.names <- make.names(col.names, unique = TRUE) :
the condition has length > 1 and only the first element will be used
Warning: Error in if: argument is not interpretable as logical
51: read.table
50: read.delim
Error in if (check.names) col.names <- make.names(col.names, unique = TRUE) :
argument is not interpretable as logical

Thanks a lot for any help.
Best regards
Francesco

How to combine similar cell types

For the sake of using the proportions generated by ABIS in cell type aware differential expression (TOAST package), I was recommended to merge similar cell types into 5-8 types only.

Is there a smarter way of doing that than simply summing the ABIS output proportions?
The summary of ABIS generated proportions is

  Monocytes_C           NK           T_CD8_Memory   
 Min.   : 2.949   Min.   :-3.9268   Min.   : 1.674  
 1st Qu.: 6.622   1st Qu.: 0.9884   1st Qu.: 5.747  
 Median : 7.677   Median : 2.2476   Median : 8.797  
 Mean   : 8.137   Mean   : 2.4485   Mean   :10.647  
 3rd Qu.: 9.610   3rd Qu.: 3.7038   3rd Qu.:13.069  
 Max.   :15.787   Max.   : 9.7480   Max.   :48.491  
  T_CD4_Naive      T_CD8_Naive      
 Min.   :-2.411   Min.   :-25.8218  
 1st Qu.: 4.159   1st Qu.: -3.7227  
 Median : 6.610   Median : -0.4212  
 Mean   : 7.563   Mean   : -1.3039  
 3rd Qu.: 9.806   3rd Qu.:  2.6580  
 Max.   :25.564   Max.   :  9.5094  
    B_Naive         T_CD4_Memory         MAIT       
 Min.   : 0.5531   Min.   :-4.018   Min.   :-0.359  
 1st Qu.: 3.1769   1st Qu.: 1.671   1st Qu.: 2.001  
 Median : 4.6171   Median : 4.162   Median : 3.772  
 Mean   : 5.3058   Mean   : 4.119   Mean   : 3.942  
 3rd Qu.: 7.2463   3rd Qu.: 6.179   3rd Qu.: 5.590  
 Max.   :15.4163   Max.   :12.769   Max.   :11.857  
    T_gd_Vd2      Neutrophils_LD   T_gd_non_Vd2   
 Min.   :-2.938   Min.   :16.87   Min.   :-9.113  
 1st Qu.: 2.382   1st Qu.:41.87   1st Qu.:-4.120  
 Median : 3.447   Median :50.85   Median :-2.589  
 Mean   : 3.644   Mean   :50.70   Mean   :-2.560  
 3rd Qu.: 4.760   3rd Qu.:59.91   3rd Qu.:-1.491  
 Max.   :12.175   Max.   :80.54   Max.   : 8.840  
  Basophils_LD     Monocytes_NC_I   
 Min.   : 0.6517   Min.   :-1.2083  
 1st Qu.: 2.6431   1st Qu.: 0.2907  
 Median : 4.6014   Median : 0.9748  
 Mean   : 6.4039   Mean   : 1.2425  
 3rd Qu.: 8.5597   3rd Qu.: 1.8747  
 Max.   :46.8233   Max.   : 6.1912  
    B_Memory            mDCs         
 Min.   :-7.6440   Min.   :-0.07160  
 1st Qu.:-1.7759   1st Qu.: 0.08102  
 Median :-0.8056   Median : 0.13453  
 Mean   :-1.0176   Mean   : 0.14015  
 3rd Qu.:-0.0758   3rd Qu.: 0.19162  
 Max.   : 3.5879   Max.   : 0.46562  
      pDCs          Plasmablasts   
 Min.   :0.02063   Min.   :0.0358  
 1st Qu.:0.14775   1st Qu.:0.1508  
 Median :0.20857   Median :0.2183  
 Mean   :0.22740   Mean   :0.3574  
 3rd Qu.:0.28736   3rd Qu.:0.3611  
 Max.   :0.62854   Max.   :3.9839

As you can see, the median for three cell types (T_CD8_Naive, T_gd_non_Vd2, and B_Memory) is negative. It seems reasonable to set all negative values to zero (and remove T_CD8_Naive from the analysis, due to its low minimum values).

However, an additional source of proportions (based on methylation data) is available to me for comparison's sake. I summed together similar cell types (T_CD8_Naive with T_CD8_Memory, B_Naive with B_Memory) and looked at the correlation between the external source of proportions and the ABIS-generated ones.

The correlation (between ABIS merged cell types to the external source) is actually better if I do not set all negative values to zero.
Hence, my question is :

Does it make sense to sum together negative and positive proportions, when merging similar cell types into one?

On a different subject: I am performing deconvolution on microarray data; the values range from 0.25 to 13. Is that an acceptable range? Is it preferable to filter out the low values (which may simply indicate noise), or to apply some other procedure to values that are too low or too high?

Microarrays data, often above 100%

Dear Gianni,

Thank you for the wonderful work, very useful for present and future immunological research.
I am running ABIS in R on PBMC microarray data. Unfortunately, more often than not I obtain a cumulative abundance higher than 100 (range 88-151). I have read how to deal with negative values, but I am wondering whether these results can be trusted and how to deal with them. I could scale all the samples to 100 (but I guess I would lose some information) or just compare the relative abundances for each cell type. Any suggestions?

Thanks in advance,

Edoardo

ABIS for tissue RNA-seq data?

Hi Gianni,

I was wondering whether one can use the ABIS tool for deconvolution of tissue RNA-seq data to estimate the abundance of individual immune cells?

Thank you for your feedback.
Best, AnneCa

Install issue

Hi,

I want to install ABIS locally, and I followed the instructions by opening the server.R file with RStudio and clicking on "Run App".
But R showed:
Error in library(preprocessCore) :
there is no package called ‘preprocessCore’

Please kindly advise how to fix.

Thank you.

Preparing TPM files with gene symbol

Hi Gianni,

I am very interested in using your tool. Could you let me know which program(s) you used to prepare your TPM tab file with gene symbols?
I have generated count tables before, however they usually are e.g. ensembl ids and not TPM.

I am working with paired-end fastq files from whole PBMCs.

I already have gene count table generated through quantmode in STAR...

Best wishes,

Jason

RLM Warning message when running ABIS locally on a large dataset

I have cloned the repo (git clone https://github.com/giannimonaco/ABIS) and ran the Shiny app locally (leaving all files unchanged) on 400 human samples.

I get a message

Warning in rlm.default(as.matrix(sigRNAseq[genes, ]), x, maxit = 100) :
  'rlm' failed to converge in 100 steps

Is it important? Does it invalidate the deconvolution results? How can this be fixed?

Deconvolution of the Whole blood cells transcriptome data using ABIS.

Hi,
I want to use ABIS for deconvolution of RNA-Seq profiles generated from whole blood (only RBCs are removed with RBC lysis buffer; all the remaining cells are there). I have run a preliminary analysis and am finding a high neutrophil percentage (for some samples greater than 100%). I am unsure whether ABIS can be used to deconvolute RNA-Seq profiles generated from whole blood cells. Is it OK to get such a high percentage of neutrophils in some samples? Please share your views.

Further, I have generated TPM values using the "STAR -> StringTie" pipeline. Is that OK, or should I generate TPM values using Salmon pseudoalignment, as you have suggested in some of your posts?
Thank you..

Incomplete mapping of transcript IDs to the gene names within ABIS-seq matrix

Hi there,
thanks so much for generating and sharing this tool!
I have been using ABIS on published RNA-seq datasets, and the transcript IDs in the TPM tables are Ensembl transcript stable IDs with version. I have been using BioMart to map these transcript IDs to gene names to generate the ABIS input table. With this method, ~195 genes from the ABIS-Seq matrix (based on SuppTable5 from your paper) are not represented in the input table. Some of these gene names are listed below:

AP001171.1
AP001434.2
AP003774.6
C12orf74
C9orf47
CD8BP
CH17-296N19.1
CH17-373J23.1
CTA-286B10.7
CTB-113D17.1
CTB-114C7.4
CTB-50L17.14
CTB-61M7.1
CTC-205M6.1
FAM153C
FAM198B
FLJ27354
RP11-290F5.1
RP11-291B21.2
RP11-295P22.2
RP11-297B17.3
RP11-305L7.1

Are you also using the Ensembl Transcript stable ID version for your transcript identification? What method are you using to convert transcript IDs to gene names? If we can't represent all the genes from ABIS-Seq matrix within the gene list within the input table, will this compromise the cell deconvolution results?
Thank you!

Column names in Gene Browser

Just a small issue: In the Gene Browser, when downloading the data in tabular format, the column names for sample and expression are switched I believe:
gene_name cell_type expression sample
EGFR B_Ex RHH5218 -9.1371
EGFR B_Ex RHH5247 -9.1371
EGFR B_Ex RHH5276 -9.1371
EGFR B_Ex RHH5310 -9.1371
EGFR B_naive RHH5216 -9.1371
EGFR B_naive RHH5245 -9.1371

Best,
Sophia

Estimated frequencies are problematic

Hello,
First - awesome tool! Thank you very much.
I ran into a problem which I'm not sure how to solve, and it would be great to have your advice.
I'm using RNA-seq data (PBMCs) in which the reported values are presumably FPK (unfortunately, I can't know for sure). I performed deconvolution using your R tool and the results weren't that promising (attached here as Result 1).
Then I tried to transform the counts into TPM values by dividing the counts for each subject by the sum of the counts in that subject and multiplying by one million. I ran your algorithm again and the performance was much better but still not perfect (attached here as Result 2).
The problems in Result 2 are easy to see:

Negative values
Values over 100
Summing up the frequencies to values much higher than 100

Would be great to have your advice

number of input reads

Dear Gianni,
I would like to know how many reads you used for the analysis in the paper, or how many reads you would consider the minimum number, a good number, or more than enough for the analysis.
Thanks and best,
Sophia

negative proportions

Hi, I wonder if there is any other procedure required after TPM normalization,

I attempted to use your package using the formula below over a data-set and obtained negative proportions in some cases.

https://support.bioconductor.org/p/91218/

tpm3 <- function(counts, len) {
  # Divide counts by gene length, then rescale each sample (column) to sum to one million
  x <- counts / len
  return(t(t(x) * 1e6 / colSums(x)))
}

Best,

What exactly is the output?

I can infer that the program attempts to output the proportion of each population, but it's not clearly stated what the output is supposed to be. For instance, I ran this on a number of tumor samples with TPM values, and not only did I get some negative values, but no sample added up to more than 47% in total. Does this package not report the proportion of each population in the sample? I can confidently state that the samples I ran are 99% of hematopoietic origin.

Additionally, does this program compute the RLM normalization? Or is this something that needs to be computed externally?

Just trying to get a sense for what this data actually means. I appreciate any insight you have.
Excellent paper!

Best,
J

could the deconvolution results be compared across samples and cell types?

Hi Gianni,

Thank you for your excellent work. I am very interested in ABIS.

I was wondering if the results of ABIS represent the absolute proportion of cell types and could be compared across samples and across cell types or not.

The reason of my question is that some other similar methods, like MCP-counter(DOI 10.1186/s13059-016-1070-5) and xCell(DOI 10.1186/s13059-017-1349-1), only generate cell type enrichment scores and these scores don't represent the percentage of the corresponding cell types.

Thanks again
Yi Han

To use ABIS shiny app for large TCGA data

Hi,
I am interested in using your ABIS app. Would it be possible to run ABIS (web version) on a large RNA-seq dataset from TCGA? Up to how many samples is the web version usable, or is there no limit? My sample size is approximately 500.
Could you also attach your "TPMPBMC.txt" example file?
Best regards:
Imran

information about the output file

Dear Gianni,

I am interested in using your program to perform deconvolution of bulk RNA-seq data.

I have two questions about the output that I can obtain from the software:

Will I identify the cell types that contributed (most) to the bulk data and the abundance of the different cell types that made up the PBMCs from which the bulk data are derived?
Will I obtain transformed bulk RNAseq data, for example, can I obtain different expression tables (gene vs expression level) for each cell type that constitutes my bulk RNA-seq data?

Thank you.

Concetta

Running the algorithm locally

Hi,
I would like to run my microarray gene expression data through the algorithm on my PC. I have saved the repository but couldn't work out what to run in either the ui.R or the server.R file. I can see that in server.R the variables sigMicroarray, target, MedianTPM and Annotation need file locations on disk, which I changed to the correct paths. What next? How do I generate the output file, and where will it be saved?

Thanks
