uc-bd2k / grein

GREIN : GEO RNA-seq Experiments Interactive Navigator

Home Page: https://shiny.ilincs.org/grein

License: GNU General Public License v2.0

R 15.27% CSS 0.59% HTML 83.87% JavaScript 0.15% Dockerfile 0.12%

bioinformatics-analysis bioinformatics-pipeline gene-expression gene-expression-omnibus geo geo-data-processing graphical-interface grein interactive-analysis interactive-visualizations processing r rna-seq rna-seq-analysis rna-seq-data rna-seq-pipeline shiny-apps

grein's People

zhilongjia naimmahi htnani jyotsanamehra bbyun28 buihoangnam1988 metabdel ldweinstock thudxz phagehunter iiiime standardgalactic nilesh-iiita changxin-wang rnaimehaom zacrasca nsmc11 bioinf-lab

grein's Issues

Count table at gene level

Hi,
I am trying to run DEA for some datasets using the gene-level raw count table. I see that some gene symbols are mapped to multiple IDs: for example, AADACL2 is mapped to 2 different Ensembl IDs and LST1 to 8 different Ensembl IDs. Is there a way to get one row per gene symbol?
Thanks,
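One common way to get one row per gene symbol is to sum the counts of all Ensembl IDs that share that symbol. The minimal Python sketch below assumes a CSV with hypothetical columns `ensembl_id` and `gene_symbol` followed by one column per sample; the actual GREIN file layout may differ, so the column names would need adjusting.

```python
import csv
import io


def collapse_by_symbol(csv_text, id_col="ensembl_id", symbol_col="gene_symbol"):
    """Sum the counts of every Ensembl ID that maps to the same gene symbol.

    Assumes (hypothetically) one ID column, one symbol column, and one numeric
    column per sample. Returns the sample names and a dict mapping each symbol
    to its per-sample summed counts, in order of first appearance.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = [c for c in reader.fieldnames if c not in (id_col, symbol_col)]
    collapsed = {}
    for row in reader:
        counts = [float(row[s]) for s in samples]
        previous = collapsed.get(row[symbol_col])
        if previous is None:
            collapsed[row[symbol_col]] = counts
        else:
            # Add this ID's counts to those already seen for the symbol.
            collapsed[row[symbol_col]] = [a + b for a, b in zip(previous, counts)]
    return samples, collapsed


if __name__ == "__main__":
    # Toy table: two IDs share the symbol LST1 (IDs and counts are made up).
    example = (
        "ensembl_id,gene_symbol,S1,S2\n"
        "ID_A,LST1,3,4\n"
        "ID_B,LST1,1,0\n"
        "ID_C,AADACL2,5,6\n"
    )
    print(collapse_by_symbol(example))
```

Summing is only one policy; keeping a single representative ID per symbol is another.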

GSE2666 example in container, Create Signature, not working

Hi,
I installed GREIN using Docker, and the interface shows up as advertised. However, when I try to analyze the example dataset GSE2666 (Analyze -> Create Signature -> Variable of Interest: Cell Line, Type of Comparison: Two group without covariate, Experimental Group: H9, Control Group: HeLa) and click Generate Signature, it appears to be computing but then shows no output. When I download the signature, it is also empty.

I think your work is great, and I would like to use it locally. (By the way, on the public server dataset GSE2666 has not been analyzed yet; I expected it to be there, since you chose it as the example in the container.)

I would appreciate any help; it may be something simple (I hope). I can ssh into the container, but I don't know which log files would help diagnose the issue.

Server down error?

Hey there, I've tried unsuccessfully to load this page http://www.ilincs.org/apps/GRIN/ several times today and yesterday, and I keep seeing this error.

Help?

Site not loading

Hi,
I was wondering whether the website is still active.
When I try to use it, the elements on the screen keep showing the loading animation and the screen is greyed out.

Best regards,
Sebastian

Combine multiple dataset from GREIN

I have a question about combining results. As far as I know, combining different datasets introduces batch effects. How can I avoid batch effects when combining results from GREIN? Specifically, I want to combine gene counts or normalized counts.
Thank you

new datasets

Hi guys,
I was wondering why there are more in-progress datasets than processed ones (plot on the first page). It seems something has changed and the pipeline can no longer download data from GEO. I checked several recently published GEO datasets (e.g., GSE125422, GSE159067): when I tried to analyze them, either someone else had already tried and the processing had failed, or, when I submitted them myself, processing failed at the download step shortly after the metadata were extracted. It seems many dataset names and metadata have been added to the database list, but the datasets failed to download and process. Any idea?

All counts are zero

Hello
Thank you for your academic contributions.

I downloaded the count data for GSE110390, which has already been analyzed,
from http://www.ilincs.org/apps/grein/?gse= .
However, all counts for all genes in the dataset were zero.

Could you please look into this problem?

Regards,

Analysis not done

Hello

Thank you for making a great program.
I tried to analyze GSE144269 at http://www.ilincs.org/apps/grein/?gse=.
The processing console showed that the analysis had started, but
it did not get past the second step
and went back to the beginning.
I tried more than 4 times, and it was interrupted in the same way every time.

Could you look into this problem?
Is this interruption related to the size of the dataset?

Regards,

website down?

Hello,
I haven't been able to access the GREIN website for a few days. A blue circle rotates constantly, and the background turns light grey after a few seconds. The problem is the same in Chrome, Opera and Firefox.
Thank you.

Christophe

Not all samples are included in the analysis

I am trying to analyse GSE131705 and GSE134900 with GREIN. When I searched for these two series (in section 2), GREIN showed that both have been processed. NCBI GEO lists more than 90 samples for each series, but GREIN includes only 20 samples.
GREIN also does not accept a request to re-analyze these series.

Suggestions for improvements

I think GREIN is a really useful tool, but I have two suggestions for improvement:

Make abundances downloadable as well. Many people would be interested in them, and improved abundance estimates are one of the main features of Salmon quantification.
If you re-run the pipeline, I would suggest running Salmon with bias correction (--seqBias and --gcBias), since these are among Salmon's main features.

Post your comments or suggestions here. Thanks!

API for download and TPM calculation

Hello,

GREIN looks great! I am interested in downloading expression files for blood samples from GEO.
What would be the best way to do so? Also, if I have transcript counts and the genome reference version, how can I convert the measurements to TPM?

Many thanks,
Eila
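Transcripts per million (TPM) can be computed from counts and transcript lengths as TPM_i = (c_i / l_i) / sum_j (c_j / l_j) × 10^6, where c_i is the count and l_i the length of transcript i. The Python sketch below applies this to a single sample. It assumes the lengths come from an annotation that matches the genome reference version used for quantification; ideally they are effective lengths (Salmon reports these in the `EffectiveLength` column of `quant.sf`).

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert one sample's transcript counts to TPM.

    counts and lengths_bp are parallel sequences; lengths are in base pairs
    and must be positive (assumed to match the reference annotation used for
    quantification). Returns TPM values that sum to 1e6, or all zeros if
    every count is zero.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Length-normalized rate per transcript (reads per kilobase).
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the rates sum to one million within the sample.
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    print(counts_to_tpm([10, 20, 0], [1000, 2000, 500]))
```

Because TPM is normalized within each sample, the function would be applied to each sample column separately.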

Fast download of Count Matrix?

Thank you for creating this very powerful database. You have made outstanding contributions to GEO data mining. I would like to ask whether there is a faster way to download the count matrix, as the website is not well suited to access by web crawlers. Please let me know if there are any further developments. Thank you once again.

Site not loading (bis)

The site is still not loading today (May 23, 2022).

Any possibility for mass download or API for dataset

Hi, I really appreciate the webserver and its high-quality RNA-seq datasets. However, I am working on a project that requires LOTS of GEO RNA-seq datasets, and I found downloading manually, or with crawler tools like selenium, inefficient for GREIN. Is there any possibility for the team to offer access to the database files, such as an archive of the RNA-seq rawcount.csv files, or to provide an API like getGEO in R? Thanks a lot.

Gene numbers differ between datasets

I downloaded results for several datasets; however, the number of genes in the count tables differs.
For example, GSE55807 has 28,089 genes and GSE126669 has 28,125 genes.
Can I combine results from multiple datasets for further analysis?
Thank you.
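A simple way to line up count tables whose gene lists differ is to keep only the genes present in every dataset. The Python sketch below does this for tables held as hypothetical dicts of `{gene_id: [counts per sample]}`. It only aligns rows; it does not correct for batch effects between studies.

```python
def merge_on_common_genes(tables):
    """Align several count tables on the genes they all share.

    tables: list of dicts mapping gene ID -> list of per-sample counts
    (one dict per dataset). Returns the sorted shared gene IDs and a dict
    mapping each shared gene to its counts concatenated across datasets.
    """
    if not tables:
        return [], {}
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)  # keep only genes found in every dataset
    genes = sorted(common)
    merged = {g: [v for table in tables for v in table[g]] for g in genes}
    return genes, merged


if __name__ == "__main__":
    # Made-up toy tables for illustration.
    dataset_a = {"GENE1": [5, 7], "GENE2": [0, 3]}
    dataset_b = {"GENE2": [4], "GENE3": [9]}
    print(merge_on_common_genes([dataset_a, dataset_b]))
```

Matching on stable gene IDs (such as Ensembl IDs) rather than gene symbols avoids problems with duplicated symbols.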

Header in downloadable data is partially wrong

I found this in the transcript-level data (not sure about the gene-level data). The header looks like this:

",V1,Sample1,Sample2"

The data is structured as:

"rowNumber,transcriptId,Sample1"

This means that a header is missing, which causes R's read.csv to skip the transcriptId column.

Is there a particular reason to include row numbers in the data?
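Until the header is fixed, the file can be parsed by position. The Python sketch below assumes the layout quoted in this issue: an unnamed first column of row numbers, a second column headed `V1` holding transcript IDs, and one column per sample after that.

```python
import csv
import io


def read_transcript_table(csv_text):
    """Parse a table with header ',V1,<samples...>'.

    Assumes column 0 holds row numbers (discarded) and column 1 holds
    transcript IDs, as described in the issue. Returns the sample names and
    a dict mapping transcript ID -> list of per-sample counts.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[2:]  # skip the blank row-number header and "V1"
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        table[row[1]] = [float(x) for x in row[2:]]
    return samples, table


if __name__ == "__main__":
    # Made-up example in the layout described above.
    print(read_transcript_table(",V1,Sample1,Sample2\n1,TX1,5,7\n2,TX2,0,3\n"))
```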

Gene names do not correspond to human Ensembl v91

Hi team,

Recently I downloaded gene expression files for GSE58375 from GREIN, but I found something contradictory. In GSE58375_GeneLevel_Raw_data.csv, the genes named "COX1", "ND5", "COX3", "CYTB", "COX2" and others are not in the human Ensembl v91 GTF files. However, these genes can be found on the GREIN website and in the downloaded files, with corresponding expression values. Could you please explain how this happens? Thank you.

Best,
Regards
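One possible explanation, not confirmed here, is that the symbol column came from an annotation source other than the Ensembl GTF. COX1, COX2, COX3, CYTB and ND5 are widely used short names for mitochondrially encoded genes, which Ensembl human GTFs list under the HGNC symbols MT-CO1, MT-CO2, MT-CO3, MT-CYB and MT-ND5. If that is the cause, a small lookup table maps them back; the hypothetical Python sketch below covers only the five genes named in the question.

```python
# Hypothetical lookup from short mitochondrial gene names to the HGNC "MT-"
# symbols used as gene_name in Ensembl human GTFs. Only the five genes named
# in the question are included; extend as needed.
MITO_SHORT_TO_HGNC = {
    "COX1": "MT-CO1",
    "COX2": "MT-CO2",
    "COX3": "MT-CO3",
    "CYTB": "MT-CYB",
    "ND5": "MT-ND5",
}


def to_ensembl_gene_name(symbol):
    """Return the MT- symbol for a known short mitochondrial name,
    otherwise return the symbol unchanged."""
    return MITO_SHORT_TO_HGNC.get(symbol, symbol)


if __name__ == "__main__":
    print([to_ensembl_gene_name(s) for s in ["COX1", "CYTB", "TP53"]])
```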

Duplicated gene symbol

Hi, I downloaded the raw gene counts from GREIN, but the gene symbol column contains many duplicated genes (around 2,700) with different Ensembl IDs. If I want to use gene symbols for further analysis, how can I remove the duplicated rows? Thank you

Example:
ENSG00000204574 ABCF1
ENSG00000206490 ABCF1
ENSG00000225989 ABCF1
ENSG00000231129 ABCF1
ENSG00000232169 ABCF1
ENSG00000236149 ABCF1
ENSG00000236342 ABCF1
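Besides summing duplicated rows, another common policy is to keep, for each symbol, only the Ensembl ID with the highest total count across samples and drop the rest. The Python sketch below assumes the table has already been read into (Ensembl ID, symbol, counts) tuples; how those are loaded from the GREIN CSV is left open.

```python
def keep_top_id_per_symbol(records):
    """For each gene symbol, keep the Ensembl ID with the largest total count.

    records: iterable of (ensembl_id, symbol, counts) tuples, where counts is
    a list of per-sample values. Ties keep the ID seen first. Returns a dict
    mapping symbol -> (ensembl_id, counts).
    """
    best = {}
    for ensembl_id, symbol, counts in records:
        total = sum(counts)
        if symbol not in best or total > best[symbol][2]:
            best[symbol] = (ensembl_id, counts, total)
    return {sym: (eid, counts) for sym, (eid, counts, _) in best.items()}
```

Unlike summing, this never merges counts from different loci, at the cost of discarding the reads assigned to the other IDs.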

download metadata

hello naimmahi,
I can't load the page http://www.ilincs.org/tmp/GREIN_metadata.h5, and I can't download data from https://shiny.ilincs.org/grein. Could you give me some advice on downloading GREIN_metadata? Thanks very much.

The following datasets are not complete in terms of sample size.

GREIN is a fantastic tool for exploring RNA-seq data, and I greatly appreciate it. However, certain datasets include only 20 samples each, which is inconsistent with the sample size listed in GEO, so there might be a bug. Could the following datasets be re-processed?
GSE184941
GSE190504
GSE180280
GSE183947
GSE189757

GSE146009
GSE162960
GSE165255
GSE183984
GSE107422

GSE179746
GSE158420
GSE171415
GSE142441
GSE172356

GSE181273
GSE133626
GSE147493
GSE179252
GSE184336

GSE113255
GSE126304
GSE127165
GSE142083
GSE173855

GSE112026
GSE179351
Thank you very much!

ERROR: GSE174836 cannot be processed

Hi!

Thank you for this great tool!

One dataset that I would like to analyse is not yet in GREIN so I am trying to process it. Its GEO accession number is GSE174836.

However, it gives the following error. Does this mean that single-cell RNA-seq data cannot be processed? I find that strange, because many pre-processed datasets in GREIN are single-cell RNA-seq data.

How can I preprocess and analyse this dataset on GREIN? Thank you in advance!

Incomplete Dataset and New Datasets

Respected developers and authors,

I sincerely thank you for hosting and maintaining this website, which analyzes and provides raw data for a great many RNA-seq studies in GEO. I have also recently used the processing request to get some datasets analyzed, and it's great to see such a quick turnaround on your end.

I have some queries regarding certain human datasets I am investigating that are hosted on GEO. The details with GSE IDs are as follows:

GSE135743 - It seems the dataset is analyzed for only 20 samples while the total samples in the GEO is 59. Can it be redone on your end with all the samples?

GSE144254 and GSE78928 - It seems these datasets are not available on your backend. Last I checked GSE144254 has 42 samples of only bulk mRNA-seq data.

GSE78928 has bulk mRNA-seq samples from both Homo sapiens and Mus musculus. Additionally, it has ncRNA-seq data as well. I guess the pipeline cannot differentiate multiple organisms/RNA library types in a single dataset?

At least GEO shows that raw SRA data are available for both of these datasets. Is there a way to request that the GREIN authors/developers add specific datasets and samples to your backend for analysis?

GSE182866 - This dataset has no accessible raw SRA in the GEO database. So I assume this cannot be processed?

Thank you.

GEO_data_processing error

Hi, I tried to run GREP2 on a new dataset by following GEO_data_processing.R, but I get the errors below.
In the folder "data/user_geo_request/" I only see the test.txt file, which is empty. How can I fix this?
For example, I want to process the dataset GSE22666:

library(GREP2)

logdir <- "data/user_geo_request/"
destdir <- "data/user_geo_request/"
cat(paste("STEP 1: Processing starts... ","\n",sep=""),file=paste0(logdir,"/",geo_series_acc,"/log.txt"))
Error in paste0(logdir, "/", geo_series_acc, "/log.txt") :
object 'geo_series_acc' not found

process_geo_rnaseq (geo_series_acc=GSE22666,destdir="data/user_geo_request",
ascp=TRUE,prefetch_workspace="path_to_prefetch_workspace",
ascp_path="path_to_aspera",get_sra_file=FALSE,trim_fastq=FALSE,
trimmomatic_path=NULL,index_dir="path_to_indexDir",
species=species,countsFromAbundance="lengthScaledTPM",n_thread=2)
Error in process_geo_rnaseq(geo_series_acc = GSE22666, destdir = "data/user_geo_request", :
unused arguments (get_sra_file = FALSE, trimmomatic_path = NULL)

Header in data and meta data file does not match

In the metadata, the id column uses the GSE ID. In the transcript count file (not sure about the gene-level file), the sample columns have the ID structure paste0(SRR_id, '_', 'GSE_id').
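Assuming every sample column in the count file really has the form `<SRR run ID>_<metadata ID>` as described, the metadata ID can be recovered by dropping everything up to the first underscore (SRR run accessions contain no underscores). A minimal Python sketch:

```python
def metadata_id_from_column(column):
    """Strip a leading '<SRR run ID>_' prefix from a count-table column name.

    Assumes the '<SRR id>_<metadata id>' naming described in the issue;
    columns that do not look like that are returned unchanged.
    """
    run_id, sep, rest = column.partition("_")
    if sep and run_id.startswith("SRR") and rest:
        return rest
    return column


if __name__ == "__main__":
    print(metadata_id_from_column("SRR0000001_SAMPLE_A"))  # hypothetical name
```

Applying it to every column header gives names that can be matched against the metadata id column.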

Hi. I downloaded raw counts for GSE83577, but the data appear to be normalized counts. Is this a problem with just this dataset, or a platform bug? Thanks

GEO_data_processing.R get error

library(GREP2)

logdir <- "data/user_geo_request/"
destdir <- "data/user_geo_request/"
cat(paste("STEP 1: Processing starts... ","\n",sep=""),file=paste0(logdir,"/",geo_series_acc,"/log.txt"))
Error in paste0(logdir, "/", geo_series_acc, "/log.txt") :
object 'geo_series_acc' not found

process_geo_rnaseq (geo_series_acc=GSE22666,destdir="data/user_geo_request",
ascp=TRUE,prefetch_workspace="path_to_prefetch_workspace",
ascp_path="path_to_aspera",get_sra_file=FALSE,trim_fastq=FALSE,
trimmomatic_path=NULL,index_dir="path_to_indexDir",
species=species,countsFromAbundance="lengthScaledTPM",n_thread=2)
Error in process_geo_rnaseq(geo_series_acc = GSE22666, destdir = "data/user_geo_request", :
unused arguments (get_sra_file = FALSE, trimmomatic_path = NULL)

Some datasets are not loading completely

Hi guys,
I was trying to look at some datasets (e.g., GSE133317); they appear to be processed, but the metadata, count table, etc. do not load.
There is probably a bug somewhere along the way!

Thanks,
Mehdi