uc-bd2k / grein
GREIN : GEO RNA-seq Experiments Interactive Navigator
Home Page: https://shiny.ilincs.org/grein
License: GNU General Public License v2.0
Hi team,
I recently downloaded the gene expression files for GSE58375 from GREIN and found something contradictory. In GSE58375_GeneLevel_Raw_data.csv, genes named "COX1", "ND5", "COX3", "CYTB", "COX2" and others do not appear in the human Ensembl v91 GTF file. Yet these genes are listed on the GREIN website and in the downloaded files, with corresponding expression values. Could you please explain how this happens? Thank you.
Best,
Regards
Hi, I really appreciate the web server and its high-quality RNA-seq datasets. However, I am working on a project that requires a large number of GEO RNA-seq datasets, and I found that downloading them manually, or with crawler tools like selenium, is inefficient for GREIN. Is there any possibility for the team to offer access to the database files, such as an archive of the RNA-seq raw count .csv files, or an API like getGEO in R? Thanks a lot.
Hi, I tried to run GREP2 on a new dataset by following GEO_data_processing.R, but I get the errors below.
In the folder "data/user_geo_request/" I only see a test.txt file with empty content. How can I fix this?
For example, I want to process the dataset GSE22666:
library(GREP2)
logdir <- "data/user_geo_request/"
destdir <- "data/user_geo_request/"
cat(paste("STEP 1: Processing starts... ","\n",sep=""),file=paste0(logdir,"/",geo_series_acc,"/log.txt"))
**Error in paste0(logdir, "/", geo_series_acc, "/log.txt") :
object 'geo_series_acc' not found**
process_geo_rnaseq (geo_series_acc=GSE22666,destdir="data/user_geo_request",
ascp=TRUE,prefetch_workspace="path_to_prefetch_workspace",
ascp_path="path_to_aspera",get_sra_file=FALSE,trim_fastq=FALSE,
trimmomatic_path=NULL,index_dir="path_to_indexDir",
species=species,countsFromAbundance="lengthScaledTPM",n_thread=2)
**Error in process_geo_rnaseq(geo_series_acc = GSE22666, destdir = "data/user_geo_request", :
unused arguments (get_sra_file = FALSE, trimmomatic_path = NULL)**
Hi, I got the gene raw counts from GREIN, but the gene symbol column contains many duplicated genes (around 2,700) that have different Ensembl IDs. If I want to use gene symbols for further analysis, how can I remove the duplicated rows? Thank you.
Example:
ENSG00000204574 ABCF1
ENSG00000206490 ABCF1
ENSG00000225989 ABCF1
ENSG00000231129 ABCF1
ENSG00000232169 ABCF1
ENSG00000236149 ABCF1
ENSG00000236342 ABCF1
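One common way to get a single row per gene symbol is to sum the raw counts of all Ensembl IDs that share the symbol. The sketch below illustrates that idea only; it is not GREIN's own method. The function name `collapse_by_symbol`, the row layout, and the counts are assumptions made for illustration. Another option would be to keep only the Ensembl ID with the highest total count.

```python
def collapse_by_symbol(rows):
    """Sum raw counts over all Ensembl IDs that share a gene symbol.

    `rows` is an iterable of (ensembl_id, symbol, [count per sample]) tuples.
    Returns {symbol: [summed count per sample]}.
    """
    totals = {}
    for _ensembl_id, symbol, counts in rows:
        if symbol not in totals:
            totals[symbol] = list(counts)
        else:
            # Add this ID's counts to the running total, sample by sample.
            totals[symbol] = [a + b for a, b in zip(totals[symbol], counts)]
    return totals


# The counts below are made up; only the IDs and symbols are real.
example = [
    ("ENSG00000204574", "ABCF1", [10, 0]),
    ("ENSG00000206490", "ABCF1", [0, 3]),
    ("ENSG00000000003", "TSPAN6", [5, 7]),
]
collapsed = collapse_by_symbol(example)
print(collapsed)
```

Summing keeps every read assigned to the symbol, which suits count-based differential expression tools. It does assume that the duplicated IDs really represent the same gene.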
In the metadata, the id column uses the GSE id. In the transcript count file (not sure about the gene-level file), the sample columns have the id structure: paste0( SRR_id, '_', 'GSE_id')
Hello,
I haven't been able to access the GREIN website for a few days. A blue circle is constantly rotating and the background turns light grey after a few seconds. The problem remains the same with Chrome, Opera and Firefox.
Thank you.
Christophe
I am trying to analyse GSE131705 and GSE134900 through GREIN. When I searched for these two series in section 2, it showed that the data has been processed for both. NCBI GEO shows that both series have more than 90 samples, but GREIN includes only 20 samples.
GREIN also does not accept a re-analysis request for these series.
Hi,
I installed GREIN using Docker, and the interface shows as advertised. However, when I tried to run the analysis of the example dataset GSE2666 (Analyze -> Create Signature -> Variable of Interest: Cell Line, Type of Comparison: Two group without covariate, Experimental Group: H9, Control Group: HeLa, then Generate Signature), it looks like it is computing but then shows no output. When I download the signature, it is also empty.
I think your work is great, and I would like to use it locally. (By the way, when I go to the public server, the dataset GSE2666 has not been analyzed yet. I thought it would be there, since you chose it as the example in the container.)
I would appreciate any help; it may be something simple (I hope). I can ssh into the container, but I don't know which log files could help to understand the issue.
Hello
I thank you for your academic contributions.
I downloaded the count data for GSE110390, which has already been analyzed, from http://www.ilincs.org/apps/grein/?gse= .
However, in the downloaded dataset the counts for all genes are zero.
Could you please look into this problem?
Regards,
Hello,
GREIN looks great! I am interested in downloading expression files for blood samples from GEO.
What would be the best way to do so? Also, please let me know how I can convert the measurements to TPM, given that I have the transcript counts and the genome reference version.
Many thanks,
Eila
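For the TPM part of the question, here is a minimal sketch of the standard formula: divide each count by the transcript length, then rescale so that the values in a sample sum to one million. The function name `counts_to_tpm` is hypothetical. The sketch assumes transcript lengths (in bases) taken from the annotation that matches the reference version. It uses annotated length rather than the effective length that some quantifiers report, so the values can differ slightly from theirs.

```python
def counts_to_tpm(counts, lengths):
    """Convert the counts of one sample to TPM.

    counts  -- read counts per transcript
    lengths -- transcript lengths in bases (all > 0), in the same order
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    # Reads per base for each transcript.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Rescale so that the values sum to one million.
    return [r / total * 1e6 for r in rates]


# Made-up numbers for illustration only.
tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 1000])
print(tpm)
```

The first two transcripts get the same TPM here because they have the same number of reads per base, even though their raw counts differ.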
I found this in the transcript-level data (not sure about the gene-level data). The header looks like this:
",V1,Sample1,Sample2"
The data is structured as:
"rowNumber,transcriptId,Sample1"
This means that a header is missing, which will cause R's read.csv to skip the transcriptId column.
Is there a particular reason you would include rowNumbers in the data?
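Until the file layout changes, one workaround is to parse the file by position instead of relying on the header names. The sketch below assumes the header has the same number of fields as the data rows, as in the quoted header: an empty row-number label, then "V1", then the sample names. If the real file has one header field fewer, the indices would need to shift. The function name `read_transcript_counts` and the IDs `TX_A` and `TX_B` are made up.

```python
import csv
import io


def read_transcript_counts(handle):
    """Parse a transcript table whose first column is a row number.

    Returns (sample_names, {transcript_id: [values as floats]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    # header[0] is the empty row-number label, header[1] is the ID label ("V1").
    samples = header[2:]
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        table[row[1]] = [float(value) for value in row[2:]]
    return samples, table


# Hypothetical two-row file in the layout described above.
text = ",V1,Sample1,Sample2\n1,TX_A,12,0\n2,TX_B,3.5,8\n"
samples, table = read_transcript_counts(io.StringIO(text))
print(samples, table)
```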
I have a question about combining results. As far as I know, combining different datasets introduces batch effects. How can I avoid batch effects when combining results from GREIN? Specifically, I want to combine the gene count or normalized count results.
Thank you
Respected developers and authors,
I sincerely thank you for hosting and maintaining this website, which analyzes and provides raw data for a large number of RNA-seq studies in GEO. I have also recently used the processing request to get some datasets analyzed, and it's great to see such quick processing on your end.
I have some queries regarding certain human datasets I am investigating that are hosted on GEO. The details with GSE IDs are as follows:
GSE135743 - It seems the dataset is analyzed for only 20 samples while the total samples in the GEO is 59. Can it be redone on your end with all the samples?
GSE144254 and GSE78928 - It seems these datasets are not available on your backend. Last I checked GSE144254 has 42 samples of only bulk mRNA-seq data.
GSE78928 has bulk mRNA seq samples in both Homo Sapiens and Mus musculus. Additionally it has ncRNA-seq data as well. I guess the pipeline cannot differentiate multiple organisms/RNA library types in a single dataset?
At least GEO shows that raw SRA data is available for both of these datasets. Is there a way to ask the authors/developers of GREIN to add datasets and specific samples to your backend for analysis?
GSE182866 - This dataset has no accessible raw SRA in the GEO database. So I assume this cannot be processed?
Thank You .
The site is still not loading today (May 23, 2022).
Hey there, I've tried to load this page http://www.ilincs.org/apps/GRIN/ a few times today and yesterday without success, and I keep seeing this error.
Help?
Hi,
I was wondering if the website is still active?
When I try to use it, the elements on the screen keep showing the loading animation and the screen is greyed out.
Best regards,
Sebastian
Hello naimmahi,
I can't load the page http://www.ilincs.org/tmp/GREIN_metadata.h5, and I can't download data from https://shiny.ilincs.org/grein. Can you give me some advice on how to download GREIN_metadata? Thanks very much.
I got results from several different datasets. However, the number of genes in the count tables differs between them.
For example, GSE55807 has 28,089 genes and GSE126669 has 28,125 genes.
Can I combine results from multiple datasets for further analysis?
Thank you.
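When count tables list different numbers of genes, one simple way to put them side by side is to keep only the gene IDs present in every table. The sketch below illustrates that under assumptions: the function name `merge_on_shared_genes`, the dictionary layout, and the toy gene IDs are all made up. It only aligns the rows and does nothing about batch effects between datasets.

```python
def merge_on_shared_genes(tables):
    """Combine count tables on the gene IDs that all of them share.

    tables -- {dataset_name: {gene_id: [count per sample]}}
    Returns {gene_id: [counts of all datasets, concatenated in dataset order]}.
    """
    names = list(tables)
    if not names:
        return {}
    # Intersect the gene sets of all datasets.
    shared = set(tables[names[0]])
    for name in names[1:]:
        shared &= set(tables[name])
    merged = {}
    for gene in sorted(shared):
        row = []
        for name in names:
            row.extend(tables[name][gene])
        merged[gene] = row
    return merged


# Toy tables with made-up gene IDs and counts.
tables = {
    "dataset_a": {"G1": [1, 2], "G2": [3, 4], "G3": [5, 6]},
    "dataset_b": {"G2": [7], "G3": [8], "G4": [9]},
}
merged = merge_on_shared_genes(tables)
print(merged)
```

Genes missing from any one dataset (G1 and G4 here) are dropped, so the merged table can be smaller than each input.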
Hello
Thank you for making a great program.
I tried to analyze GSE144269 on http://www.ilincs.org/apps/grein/?gse=.
The processing console showed that the analysis had started, but it did not get further than 2 steps before going back to the beginning.
I tried more than 4 times, and the interruption was the same every time.
Could you solve this problem?
Is this interruption related to the dataset being large?
Regards,
Hi guys,
I was wondering why the number of in-progress datasets is larger than the number of processed ones (first page plot). It looks like something has changed and the pipeline can no longer download data from GEO. I checked several recently published datasets on GEO. When I tried to analyze them, either someone else had already tried and failed, or my own submission failed at the download step shortly after extracting the metadata (e.g. GSE125422, GSE159067...). It seems that many dataset names and their metadata have been added to the database list, but that the download and processing failed. Any idea?
Thank you for creating this very powerful database. You have made outstanding contributions to GEO data mining. I would like to ask if there is a faster way to download the count matrices, as the website is not well suited to web crawlers. Please contact me if there are any further developments. Thank you once again.
I think GREIN is a really useful tool, but I have two suggestions for improvements:
Hi,
I am trying to run DEA for some datasets using the raw count table at gene level. I see that some gene symbols are mapped to multiple IDs, such as AADACL2, which is mapped to 2 different Ensembl IDs, and LST1, which is mapped to 8 different Ensembl IDs. Is there a way to get one row for each gene symbol?
Thanks,