uc-bd2k / grein

Stars: 47 · Watchers: 7 · Forks: 19 · Size: 39.09 MB

GREIN : GEO RNA-seq Experiments Interactive Navigator

Home Page: https://shiny.ilincs.org/grein

License: GNU General Public License v2.0

R 15.27% CSS 0.59% HTML 83.87% JavaScript 0.15% Dockerfile 0.12%
rna-seq geo bioinformatics-pipeline bioinformatics-analysis rna-seq-pipeline rna-seq-data rna-seq-analysis graphical-interface r shiny-apps

grein's People

Contributors

michalkouril


grein's Issues

Gene names do not correspond to human Ensembl v91

Hi team,

Recently I downloaded the gene expression files for GSE58375 from GREIN, but I found something contradictory. When I checked GSE58375_GeneLevel_Raw_data.csv, I found that genes named "COX1", "ND5", "COX3", "CYTB", "COX2" and others are not in the human Ensembl v91 GTF files. However, these genes can be found on the GREIN website and in the downloaded files, with corresponding expression values. Could you please explain how this happens? Thank you.

Best,
Regards

Any possibility of bulk download or an API for datasets?

Hi, I really appreciate the web server and its high-quality RNA-seq datasets. However, I am working on a project that requires a LOT of GEO RNA-seq datasets, and I have found downloading them manually, or with crawler tools like Selenium, inefficient for GREIN. Is there any possibility that the team could offer access to the database files, such as an archive of the RNA-seq rawcount.csv files, or an API like getGEO in R? Thanks a lot.

GEO_data_processing error

Hi, I tried to run GREP2 on a new dataset by following GEO_data_processing.R, but I get the errors below.
In the folder "data/user_geo_request/" I only see the test.txt file, which is empty. How can I fix this?
For example, I want to process the dataset GSE22666:

library(GREP2)

logdir <- "data/user_geo_request/"
destdir <- "data/user_geo_request/"
cat(paste("STEP 1: Processing starts... ","\n",sep=""),file=paste0(logdir,"/",geo_series_acc,"/log.txt"))
Error in paste0(logdir, "/", geo_series_acc, "/log.txt") :
object 'geo_series_acc' not found

process_geo_rnaseq (geo_series_acc=GSE22666,destdir="data/user_geo_request",
ascp=TRUE,prefetch_workspace="path_to_prefetch_workspace",
ascp_path="path_to_aspera",get_sra_file=FALSE,trim_fastq=FALSE,
trimmomatic_path=NULL,index_dir="path_to_indexDir",
species=species,countsFromAbundance="lengthScaledTPM",n_thread=2)
Error in process_geo_rnaseq(geo_series_acc = GSE22666, destdir = "data/user_geo_request", :
unused arguments (get_sra_file = FALSE, trimmomatic_path = NULL)
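Both errors appear to come from the script rather than the data. First, geo_series_acc is used to build the log path before it is defined, and GSE22666 is passed unquoted, so R would look for an object with that name (species = species has the same problem). Second, the installed version of process_geo_rnaseq does not declare get_sra_file or trimmomatic_path. A minimal R sketch of both fixes follows; the helper names start_geo_log and call_with_supported_args are hypothetical, and the species value in the commented example is an assumption to check against ?process_geo_rnaseq.

```r
# Fix 1: build the log path from a defined, quoted accession and make sure
# the per-series folder exists before cat() writes into it.
start_geo_log <- function(logdir, geo_series_acc) {
  series_dir <- file.path(logdir, geo_series_acc)
  dir.create(series_dir, recursive = TRUE, showWarnings = FALSE)
  log_file <- file.path(series_dir, "log.txt")
  cat("STEP 1: Processing starts...\n", file = log_file)
  log_file
}

# Fix 2 (hypothetical helper): pass only the arguments that the installed
# version of a function declares in its formals, reporting anything dropped.
call_with_supported_args <- function(fun, args) {
  accepted <- names(formals(fun))
  if (!"..." %in% accepted) {
    dropped <- setdiff(names(args), accepted)
    if (length(dropped) > 0) {
      message("Dropping unsupported arguments: ", paste(dropped, collapse = ", "))
    }
    args <- args[names(args) %in% accepted]
  }
  do.call(fun, args)
}

# Example (not run: needs GREP2 plus the external download/index tools):
# library(GREP2)
# geo_series_acc <- "GSE22666"
# start_geo_log("data/user_geo_request", geo_series_acc)
# call_with_supported_args(process_geo_rnaseq, list(
#   geo_series_acc = geo_series_acc, destdir = "data/user_geo_request",
#   ascp = TRUE, prefetch_workspace = "path_to_prefetch_workspace",
#   ascp_path = "path_to_aspera", get_sra_file = FALSE, trim_fastq = FALSE,
#   trimmomatic_path = NULL, index_dir = "path_to_indexDir",
#   species = "human",  # assumption: check ?process_geo_rnaseq for valid values
#   countsFromAbundance = "lengthScaledTPM", n_thread = 2))
```

Running args(process_geo_rnaseq) shows which arguments the installed GREP2 actually accepts. Dropping trimmomatic_path is presumably harmless while trim_fastq = FALSE, but silently dropping other options could change what the pipeline does, which is why the helper reports them.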

Duplicated gene symbol

Hi, I got the raw gene counts from GREIN, but the gene symbol column contains many duplicated genes (around 2,700) with different Ensembl IDs. If I want to use gene symbols for further analysis, how can I remove the duplicated rows? Thank you.

For example:
ENSG00000204574 ABCF1
ENSG00000206490 ABCF1
ENSG00000225989 ABCF1
ENSG00000231129 ABCF1
ENSG00000232169 ABCF1
ENSG00000236149 ABCF1
ENSG00000236342 ABCF1
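One common way to get one row per symbol from raw counts is to sum the counts of all Ensembl IDs that share a symbol. A minimal R sketch using base R's rowsum(), assuming counts is a numeric matrix with one row per Ensembl ID and symbols is the matching gene symbol column; the function name collapse_by_symbol is hypothetical.

```r
# Collapse duplicated gene symbols by summing raw counts across Ensembl IDs.
collapse_by_symbol <- function(counts, symbols) {
  stopifnot(nrow(counts) == length(symbols))
  # drop rows without a usable symbol before grouping
  keep <- !is.na(symbols) & symbols != ""
  # rowsum() sums rows that share a group label; rows come back sorted by label
  rowsum(counts[keep, , drop = FALSE], group = symbols[keep], reorder = TRUE)
}
```

Summing is only one choice; whether it is appropriate depends on what the extra Ensembl IDs for a symbol represent in the annotation used.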

website down?

Hello,
I haven't been able to access the GREIN website for a few days. A blue circle keeps spinning, and the background turns light grey after a few seconds. The problem is the same in Chrome, Opera and Firefox.
Thank you.

Christophe

Not all samples are included in the analysis

I am trying to analyze GSE131705 and GSE134900 through GREIN. When I searched for these two series in section 2, it showed that both had been processed. NCBI GEO shows that both series have more than 90 samples, but in GREIN only 20 samples are included.
GREIN also does not accept a reanalysis request for these series.

GSE2666 example in container, Create Signature, not working

Hi,
I installed GREIN using Docker, and the interface appears as advertised. However, when I try to run the analysis of the example dataset GSE2666 (Analyze -> Create Signature -> Variable of Interest: Cell Line, Type of Comparison: Two group without covariate, Experimental Group: H9, Control Group: HeLa, then click Generate Signature), it looks like it is computing, but then shows no output. When I download the signature, it is also empty.

I think your work is great, and I would like to use it locally. (By the way, on the public server the dataset GSE2666 has not been analyzed yet; I thought it would be there, since you chose it as the example in the container.)

I would appreciate any help; it may be something simple (I hope). I can ssh into the container, but I don't know which log files could help me understand the issue.

Some datasets are not loading completely

Hi guys,
I was trying to look at some datasets (e.g. GSE133317). They appear to be processed, but the metadata, count table, etc. cannot be loaded.
There is probably a bug somewhere!

Thanks,
Mehdi


All counts are zero

Hello
Thank you for your academic contributions.

I downloaded the count data for GSE110390, which had already been analyzed,
from http://www.ilincs.org/apps/grein/?gse= .
However, the dataset shows that all counts for all genes are zero.

Could you please fix this problem?

Regards,

API for download and TPM calculation

Hello,

GREIN looks great! I am interested in downloading expression files for blood samples from GEO.
What would be the best way to do so? Also, if I have transcript counts and the genome reference version, how can I convert the measurements to TPM?

Many thanks,
Eila
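TPM can be computed from transcript counts once each transcript's length is known from the same reference annotation: divide each count by its transcript length, then rescale each sample so the values sum to one million, i.e. TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. A minimal R sketch, assuming counts is a transcripts-by-samples matrix and lengths holds transcript lengths in the same row order (Salmon itself uses effective lengths, so those are preferable if available); counts_to_tpm is a hypothetical name.

```r
# Convert a transcripts x samples count matrix to TPM.
counts_to_tpm <- function(counts, lengths) {
  counts <- as.matrix(counts)
  stopifnot(nrow(counts) == length(lengths), all(lengths > 0))
  rate <- counts / lengths                  # divide row i by lengths[i]
  sweep(rate, 2, colSums(rate), "/") * 1e6  # each sample (column) sums to 1e6
}
```

The length unit cancels out in the rescaling, so bases or kilobases give the same TPM values.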

Header in downloadable data is partially wrong

I found this in the transcript-level data (I am not sure about the gene-level data). The header looks like this:

",V1,Sample1,Sample2"

The data is structured as:

"rowNumber,transcriptId,Sample1"

This means that a header is missing, which will cause R's read.csv to skip the transcriptId column.

Is there a particular reason to include row numbers in the data?
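Until the file layout changes, such a table can still be read positionally. A minimal R sketch, assuming the layout described above (an optional unnamed row-number column, then the ID column, then one column per sample); read_grein_counts is a hypothetical name, and extra arguments are passed straight to utils::read.csv.

```r
# Read a count table whose header may look like ",V1,Sample1,Sample2".
read_grein_counts <- function(...) {
  df <- utils::read.csv(..., check.names = FALSE, stringsAsFactors = FALSE)
  # drop a leading row-number column (1, 2, 3, ...) if one is present
  if (is.numeric(df[[1]]) && isTRUE(all(df[[1]] == seq_len(nrow(df))))) {
    df <- df[-1]
  }
  ids <- df[[1]]                  # transcript (or gene) identifiers
  counts <- as.matrix(df[-1])     # remaining columns are samples
  rownames(counts) <- ids
  counts
}
```

For a downloaded file, read_grein_counts("path/to/file.csv") should work the same way; the text = argument is convenient for quick checks.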

Combine multiple dataset from GREIN

I have a question about combining results. As I understand it, combining different datasets introduces batch effects. How can I avoid batch effects when combining results from GREIN? Specifically, I want to combine gene counts or normalized counts.
Thank you
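To illustrate what a batch adjustment does, here is a crude R sketch that converts counts to log2-CPM and then shifts each gene within each dataset so that its per-dataset mean equals its overall mean. It assumes the matrices already share the same genes and that the biological groups of interest are not confounded with dataset; if they are, this removes real signal. For actual analyses, including dataset as a covariate in the differential expression model, or tools such as limma's removeBatchEffect or sva's ComBat_seq, are the usual choices. Function names are hypothetical.

```r
# log2 counts-per-million with a small prior to avoid log(0)
log_cpm <- function(counts, prior = 1) {
  lib_sizes <- colSums(counts)
  log2(sweep(counts, 2, lib_sizes, "/") * 1e6 + prior)
}

# Crude per-batch mean-centering of log-scale expression (illustration only).
center_batches <- function(logexpr, batch) {
  batch <- factor(batch)
  stopifnot(ncol(logexpr) == length(batch))
  grand <- rowMeans(logexpr)
  out <- logexpr
  for (b in levels(batch)) {
    idx <- which(batch == b)
    batch_mean <- rowMeans(logexpr[, idx, drop = FALSE])
    # shift each gene so its mean within this batch equals its overall mean
    out[, idx] <- logexpr[, idx, drop = FALSE] - batch_mean + grand
  }
  out
}
```

This only removes differences in per-gene means between datasets; it does not model differences in variance, which is one reason the dedicated tools above are preferred.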

Incomplete Dataset and New Datasets

Respected developers and authors,

I sincerely thank you for hosting and maintaining this website, which analyzes and provides raw data for tons of RNA-seq studies in GEO. I have also recently used the processing request to get some datasets analyzed, and it's great to see such a quick turnaround on your end.

I have some questions about certain human datasets hosted on GEO that I am investigating. The details, with GSE IDs, are as follows:

GSE135743 - It seems the dataset was analyzed for only 20 samples, while GEO lists 59 samples in total. Could it be reprocessed on your end with all the samples?

GSE144254 and GSE78928 - It seems these datasets are not available on your backend. Last I checked, GSE144254 has 42 samples, all bulk mRNA-seq data.

GSE78928 has bulk mRNA-seq samples from both Homo sapiens and Mus musculus. It also has ncRNA-seq data. I guess the pipeline cannot handle multiple organisms or RNA library types in a single dataset?

At least GEO shows that raw SRA data are available for both of these datasets. Is there a way to request that the GREIN developers add datasets and specific samples to your backend for analysis?

GSE182866 - This dataset has no accessible raw SRA data in GEO, so I assume it cannot be processed?

Thank you.

Site not loading

Hi,
I was wondering if the website is still active?
When I try to use it, the elements on the screen keep showing the loading animation and the screen is greyed out.

Best regards,
Sebastian

Number of genes differs between datasets

I got results from several different datasets. However, the number of genes in the count tables differs.
For example, GSE55807 has 28,089 genes and GSE126669 has 28,125 genes.
Can I combine results from multiple datasets for further analysis?
Thank you.
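A simple way to put count tables with different gene sets side by side is to keep only the genes present in every table before binding columns. A minimal R sketch, assuming each table is a matrix whose row names are the same kind of identifier (e.g. Ensembl gene IDs) from the same annotation; combine_on_shared_genes is a hypothetical name. Combining this way does not address batch effects between datasets.

```r
# Keep genes shared by all count matrices, then bind their columns.
combine_on_shared_genes <- function(...) {
  mats <- list(...)
  shared <- Reduce(intersect, lapply(mats, rownames))
  if (length(shared) == 0) stop("no gene IDs shared by all inputs")
  # index every matrix by the same shared IDs so rows line up before cbind
  do.call(cbind, lapply(mats, function(m) m[shared, , drop = FALSE]))
}
```

Genes missing from one table are simply dropped, so it is worth checking how many genes are lost before going further.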

Analysis not done

Hello

Thank you for making a great program.
I tried to analyze GSE144269 at http://www.ilincs.org/apps/grein/?gse=.
The processing console showed that the analysis had started, but it never progressed beyond the second step and went back to the beginning.
I tried more than four times, and it was interrupted at the same point each time.

Could you please solve this problem?
Could this interruption be caused by the size of the dataset?

Regards,

new datasets

Hi guys,
I was wondering why the number of in-progress datasets is greater than the number of processed ones (plot on the first page). I suspect something has changed and the pipeline can no longer download data from GEO. I checked several recently published datasets on GEO and tried to analyze them: either someone else had already tried and the processing had failed, or, when I submitted them myself, they failed at the download step shortly after the metadata was extracted (e.g. GSE125422, GSE159067...). It seems many dataset names and metadata have been added to the database list but failed to download and process. Any idea?

Fast download of Count Matrix?

Thank you for creating this very powerful database; you have made outstanding contributions to GEO data mining. I would like to ask whether there is a faster way to download the count matrix, as the website is not well suited to access by web crawlers. Please let me know if there are any further developments. Thank you once again.

Suggestions for improvements

I think GREIN is a really useful tool, but I have two suggestions for improvements:

  • Make abundances downloadable as well: many people would be interested in them, and improved abundance estimates are one of the main features of Salmon quantification.
  • If you re-run the pipeline, I would like Salmon to be run with bias correction (--seqBias and --gcBias), since these are some of Salmon's main features.
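The two bias-correction flags named above are options of salmon quant. A minimal R sketch that assembles a salmon quant argument vector with them, for paired-end (-1/-2) or single-end (-r) FASTQ input with automatic library-type detection (-l A). This is not GREIN's pipeline code; the function name, index path and file names are hypothetical.

```r
# Build arguments for `salmon quant` with sequence- and GC-bias correction.
salmon_quant_args <- function(index, r1, r2 = NULL, out_dir, threads = 4) {
  args <- c("quant", "-i", index, "-l", "A")
  if (is.null(r2)) {
    args <- c(args, "-r", r1)                # single-end (unmated) reads
  } else {
    args <- c(args, "-1", r1, "-2", r2)      # paired-end mates
  }
  c(args, "--seqBias", "--gcBias", "-p", as.character(threads), "-o", out_dir)
}

# Example (not run; requires salmon on the PATH and a built index):
# system2("salmon", salmon_quant_args("salmon_index", "s_1.fq.gz", "s_2.fq.gz", "quant_out"))
```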

Count table at gene level

Hi,
I am trying to run DEA on some datasets using the raw gene-level count table. I see that some gene symbols are mapped to multiple Ensembl IDs; for example, AADACL2 maps to 2 different Ensembl IDs and LST1 maps to 8. Is there a way to get one row per gene symbol?
Thanks,
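Besides summing, another common way to get one row per symbol is to keep only the Ensembl ID with the highest total count for each symbol and discard the others. A minimal R sketch, assuming counts is a numeric matrix with one row per Ensembl ID and symbols is the matching gene symbol vector; keep_top_id_per_symbol is a hypothetical name.

```r
# Keep, for each gene symbol, the row (Ensembl ID) with the largest total count.
keep_top_id_per_symbol <- function(counts, symbols) {
  stopifnot(nrow(counts) == length(symbols))
  totals <- rowSums(counts)
  ord <- order(totals, decreasing = TRUE)     # highest-count rows first
  first <- ord[!duplicated(symbols[ord])]     # first (top) row per symbol
  out <- counts[first, , drop = FALSE]
  rownames(out) <- symbols[first]
  out[order(rownames(out)), , drop = FALSE]   # sort by symbol for readability
}
```

Unlike summing, this keeps counts that come from a single feature, at the cost of ignoring reads assigned to the other IDs.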
