
rf_sample_data's Introduction

Sample Data for RiboFlow

A complete set of sample files for testing and trying out the RiboFlow pipeline.

All files and references herein come from the human genome.

The files in this repo are random subsets of the originally published data. Each sample contains 1 million raw reads, which is a fraction of the original data.
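For readers who want to build a similar subset from their own data, the following is a minimal sketch of uniform random subsampling of FASTQ records using reservoir sampling. It is not necessarily how the files in this repo were produced; the function names `read_fastq` and `subsample` and the toy reads are hypothetical and only illustrate the idea.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def subsample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Reservoir sampling: keep k records chosen uniformly at random."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


# Toy input (hypothetical): 10 reads, keep 3.
toy: List[str] = []
for n in range(10):
    toy += [f"@read{n}", "ACGT", "+", "IIII"]
picked = subsample(read_fastq(toy), k=3)
```

Reservoir sampling reads the input once and holds only `k` records in memory, which is convenient when the original FASTQ file is too large to load at once.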

Required and Optional Files

Not all files are required by RiboFlow.

Required File Types:

  • Fastq files from ribosome profiling experiments
  • Annotation
  • Transcriptome Reference
  • Filter

Optional File Types:

  • Fastq files from RNA-Seq experiments
  • Metadata
  • Genome Reference
  • Post-Genome Reference

Fastq

Includes raw reads for ribosome profiling and RNA-Seq data. Each sample has two fastq files. All fastq files are obtained by taking a subset of reads from the publicly available data.

  1. Single-cell ribosome profiling data with UMIs: (1cell-2, 1cell-4)
    NCBI GEO accession number GSE185732, published in Ozadam, Tonn, Han, et al.
    This dataset contains UMIs, which need to be removed prior to alignment.
  2. Bulk ribosome profiling and RNA-Seq data: (GSM1606107 and GSM1606108)
    NCBI GEO accession number GSE65778, published in Sidrauski et al.
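To illustrate what removing UMIs before alignment can look like, here is a minimal sketch that cuts a fixed-length UMI from the 5' end of a read and appends it to the read name. The UMI position and length are assumptions made for the example (this README does not specify them), and `extract_umi` is a hypothetical helper, not part of RiboFlow.

```python
from typing import Tuple


def extract_umi(header: str, seq: str, qual: str, umi_len: int) -> Tuple[str, str, str]:
    """Move the first umi_len bases of a read into its name.

    Assumes (hypothetically) that the UMI sits at the 5' end of the read.
    Returns the new header, the trimmed sequence, and the trimmed qualities.
    """
    if len(seq) < umi_len:
        raise ValueError("read is shorter than the UMI length")
    umi = seq[:umi_len]
    # Append the UMI to the read name, before any description after a space.
    name, sep, rest = header.partition(" ")
    new_header = f"{name}_{umi}{sep}{rest}"
    return new_header, seq[umi_len:], qual[umi_len:]


# Toy read with an assumed 4-nt UMI.
trimmed = extract_umi("@r1 extra", "AACCGGTTACGT", "IIIIIIIIIIII", 4)
```

Keeping the UMI in the read name means it survives alignment and remains available for later deduplication, while the aligner only sees the biological part of the read.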

Note that RNA-Seq data is optional for RiboFlow and .ribo files.

Annotation

The tsv file contains transcript lengths. The bed file contains region boundaries: CDS, 5'UTR, and 3'UTR.
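The two annotation files must agree with each other: every region in the bed file should fall within the length given for that transcript in the tsv file. The sketch below shows one way to check this. It assumes a two-column tsv (transcript name, length) and standard BED columns (name, start, end, region label) with 0-based, half-open coordinates; the toy data, the region labels, and the function names are hypothetical.

```python
from typing import Dict, List


def load_lengths(tsv_text: str) -> Dict[str, int]:
    """Parse 'name<TAB>length' lines into a dictionary."""
    lengths: Dict[str, int] = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def find_out_of_bounds(bed_text: str, lengths: Dict[str, int]) -> List[str]:
    """Return a message for each bed region that does not fit its transcript."""
    problems: List[str] = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name, start, end, region = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if name not in lengths:
            problems.append(f"{name}: no length entry")
        elif not (0 <= start < end <= lengths[name]):
            problems.append(f"{name} {region}: {start}-{end} outside 0-{lengths[name]}")
    return problems


# Toy annotation (hypothetical transcript and labels).
lengths = load_lengths("tx1\t100\n")
good_bed = "tx1\t0\t20\tUTR5\ntx1\t20\t80\tCDS\ntx1\t80\t100\tUTR3\n"
bad_bed = "tx1\t80\t120\tUTR3\n"
good_problems = find_out_of_bounds(good_bed, lengths)
bad_problems = find_out_of_bounds(bad_bed, lengths)
```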

Metadata

Contains metadata for the ribo files in yaml format.

Metadata is optional for RiboFlow and ribo files.

Transcriptome Reference

Bowtie2 index files for the transcriptome. The actual output of the RiboFlow pipeline, i.e., ribo files, is obtained using the reads that are mapped to the transcriptome reference.

Filter

Bowtie2 index built from the filter sequences, which are mainly ribosomal RNAs and tRNAs.

Genome

A mock Hisat2 reference in place of the entire genome. For actual data analysis, users should download a complete human genome reference such as hg38. Links are available at the Hisat2 website.

Note that this reference has no effect on the output ribo files, since ribo files are generated from the reads mapped to the transcriptome. Reads that do not map to the transcriptome are mapped to the genome.

The genome reference is an optional parameter for RiboFlow.

Post-Genome

A sample Bowtie2 reference that serves as the post-genome reference.

Reads that are not mapped to the genome are mapped to the post-genome reference.

As with the genome reference, the post-genome parameter is optional and has no effect on the output ribo files.
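The order in which reads pass through the references can be pictured as a cascade, where each stage only sees the reads left over from the previous one. The toy sketch below models each reference as a set of read names that would align to it. It is only a conceptual illustration, not RiboFlow code, and it assumes that the filter step runs first and that its hits are set aside, which this README does not state explicitly. The function name `cascade` and the read names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cascade(reads: Iterable[str], stages: List[Tuple[str, Set[str]]]) -> Dict[str, List[str]]:
    """Assign each read to the first stage it aligns to.

    Each stage is (name, set of read names that align to that reference).
    Reads that align nowhere end up under the key "unaligned".
    """
    remaining = list(reads)
    result: Dict[str, List[str]] = {}
    for name, hits in stages:
        result[name] = [r for r in remaining if r in hits]
        remaining = [r for r in remaining if r not in hits]
    result["unaligned"] = remaining
    return result


reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
filter_ref = ("filter", {"r1"})
transcriptome = ("transcriptome", {"r2", "r3"})
genome = ("genome", {"r3", "r4"})  # r3 never reaches this stage
post_genome = ("post_genome", {"r5"})

with_optional = cascade(reads, [filter_ref, transcriptome, genome, post_genome])
required_only = cascade(reads, [filter_ref, transcriptome])
```

In both runs the transcriptome stage receives the same reads, which mirrors the statement above that the genome and post-genome references do not change the output ribo files.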
