bulk_rnaseq_fusion_nf's Introduction

STAR-fusion NF

STAR-fusion run using the Nextflow workflow manager.

This workflow requires:

  1. Sample sheet, comma delimited
  2. The location of the CTAT resource library
  3. The location of CICERO genome resource library

The outputs of the STAR aligner, STAR-Fusion, and FusionInspector are uploaded to an S3 bucket. These include the most relevant output files, such as SJ.out.tab, aligned.bam, and chimeric.junctions.tab, as well as the FusionInspector HTML report. In addition, the fastq files undergo quality control checks, and a MultiQC report is generated and uploaded to an S3 bucket. The workflow also includes the CICERO fusion detection algorithm, which is run on the aligned.bam from the STAR aligner output.

The output files will be put into a directory that is named after the sample ID provided in the sample sheet file.

The STAR-Fusion Docker image can be updated by selecting the latest image from either the CTAT Docker Hub or quay.io, then updating fusion-processes.nf with the appropriate image and tag.

The prebuilt STAR-Fusion genome references can be found here.

If a new fusion resource library must be created, for example for a non-human species, there is detailed documentation available here.

[09/26/2023] STAR-fusion --denovo_reconstruct currently does not work.

To Run

Install Nextflow

Install nextflow using the conda env yaml file.

conda env create -f env/nextflow.yaml

First Step: Sample Sheet

First, create a sample manifest for the fastq files that are hosted in an S3 bucket. The manifest file is a simple 3-column, tab-delimited text file with the column names "Sample", "R1", and "R2", where R1 and R2 are the paired-end fastq files.
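As a rough illustration of the manifest layout described above, the following Python sketch parses a tab-delimited manifest and checks that the "Sample", "R1", and "R2" columns are present and filled in. The function name `read_manifest`, the example bucket, and the example file names are hypothetical and are not part of this repository.

```python
import csv
import io

# Column names taken from the manifest description above.
REQUIRED_COLUMNS = ["Sample", "R1", "R2"]


def read_manifest(handle, delimiter="\t"):
    """Parse a sample manifest and return a list of row dicts.

    Raises ValueError if a required column is missing or a field is empty.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                raise ValueError(f"line {line_no}: empty value in column {col}")
        rows.append({col: row[col].strip() for col in REQUIRED_COLUMNS})
    return rows


# Hypothetical manifest content; the bucket and file names are placeholders.
EXAMPLE = (
    "Sample\tR1\tR2\n"
    "sampleA\ts3://my-s3-bucket/Fastq/sampleA_R1.fastq.gz"
    "\ts3://my-s3-bucket/Fastq/sampleA_R2.fastq.gz\n"
)

if __name__ == "__main__":
    for row in read_manifest(io.StringIO(EXAMPLE)):
        print(row["Sample"], row["R1"], row["R2"])
```

The delimiter is a parameter, so the same check can be pointed at a comma-delimited sheet if that is what the workflow is configured to read.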

I have created a sample manifest file that can be used to select the appropriate files, with a demonstration in Nextflow_AWS_Sample_Sheets_from_Manifest.ipynb that uses the associated create_sample_sheet.py. Below is a very simple example directly on the command line.

python3 create_sample_sheet.py "my-s3-bucket" "Fastq" --prefix "TARGET_AML/RNAseq_Illumina_Data/Fastq/"

This can also be accomplished using R with

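library(aws.s3)
library(aws.signature)
library(dplyr)    # mutate(), case_when(), %>%
library(stringr)  # str_split_fixed(), str_remove()
library(tidyr)    # pivot_wider()

# Load AWS credentials from the "default" profile and export them for aws.s3
creds <- aws.signature::use_credentials(profile = "default")
Sys.setenv("AWS_ACCESS_KEY_ID" = creds$default$AWS_ACCESS_KEY_ID,
           "AWS_SECRET_ACCESS_KEY" = creds$default$AWS_SECRET_ACCESS_KEY,
           "AWS_DEFAULT_REGION" = "us-west-2")
BUCKET <- "my-s3-bucket"
PREFIX <- "TARGET_AML/RNAseq_Illumina_Data/Fastq"

# List the fastq objects, label each file as R1 or R2, then spread to one row per sample
fastqs <- get_bucket_df(bucket = BUCKET,
                        prefix = PREFIX,
                        max = Inf) %>%
           mutate(filename = str_split_fixed(Key, pattern = "/", n = 4)[, 4],
                  # Assumes the sample ID is the part of the file name before the r1/R1 or r2/R2 token,
                  # so that both mates share one Sample value
                  Sample = str_remove(basename(filename), "[_.][rR][12][_.].*$"),
                  Read = case_when(
                       grepl("_r1\\.fq\\.gz|_R1_.+|r1\\.fastq\\.gz", filename) ~ "R1",
                       grepl("_r2\\.fq\\.gz|_R2_.+|r2\\.fastq\\.gz", filename) ~ "R2")) %>%
           pivot_wider(id_cols = Sample,
                       names_from = Read,
                       values_from = filename)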
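The pairing step can also be sketched in plain Python, independent of how the object keys are listed from the bucket. This is a minimal sketch, not the logic of create_sample_sheet.py: the function name `pair_fastqs` and the assumed naming convention (a sample ID followed by `_R1`/`_r1` or `_R2`/`_r2`, an optional suffix, and `.fq.gz` or `.fastq.gz`) are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1 / <sample>_r2, an optional "_suffix",
# then .fq.gz or .fastq.gz. The greedy sample group picks the LAST read token.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_[rR](?P<read>[12])(?:_[^/]*)?\.(?:fq|fastq)\.gz$"
)


def pair_fastqs(keys):
    """Group object keys into {sample: {"R1": key, "R2": key}}.

    Keys that do not look like fastq files, and samples that lack one
    of the two mates, are left out of the result.
    """
    pairs = defaultdict(dict)
    for key in keys:
        filename = key.rsplit("/", 1)[-1]  # drop the prefix ("directory") part
        match = READ_PATTERN.match(filename)
        if match is None:
            continue
        pairs[match["sample"]]["R" + match["read"]] = key
    return {
        sample: reads
        for sample, reads in sorted(pairs.items())
        if "R1" in reads and "R2" in reads
    }


if __name__ == "__main__":
    # Hypothetical keys under the prefix used in the examples above.
    prefix = "TARGET_AML/RNAseq_Illumina_Data/Fastq/"
    keys = [
        prefix + "sampleA_R1_001.fastq.gz",
        prefix + "sampleA_R2_001.fastq.gz",
        prefix + "sampleB_r1.fq.gz",  # no mate, so it is skipped
    ]
    for sample, reads in pair_fastqs(keys).items():
        print(sample, reads["R1"], reads["R2"], sep="\t")
```

Dropping unpaired samples is a design choice for the sketch; a real sample sheet script might prefer to report them instead.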

Second Step: Execute the Workflow

Edit the nextflow.config file to contain the correct output directories and point to the sample sheet in sample_sheet, and the references in fasta_file, gtf, star_genome_lib, cicero_genome_lib.
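Before launching, it may be useful to confirm that none of the settings named above were overlooked. The sketch below scans the text of a config file for a `name = value` assignment for each parameter. It is a plain text scan, not a Groovy parser, and the `params { ... }` layout and paths in the example are assumptions about how nextflow.config might look, not a copy of the file in this repository.

```python
import re

# Parameter names taken from the text above.
REQUIRED_PARAMS = [
    "sample_sheet",
    "fasta_file",
    "gtf",
    "star_genome_lib",
    "cicero_genome_lib",
]


def missing_params(config_text, required=REQUIRED_PARAMS):
    """Return the required parameter names that have no assignment."""
    # Skip whole-line // comments so commented-out settings are not counted.
    # Inline "//" is left alone because it also appears in s3:// URLs.
    active = "\n".join(
        line for line in config_text.splitlines()
        if not line.lstrip().startswith("//")
    )
    missing = []
    for name in required:
        pattern = rf"^\s*(?:params\.)?{re.escape(name)}\s*=\s*\S"
        if not re.search(pattern, active, flags=re.MULTILINE):
            missing.append(name)
    return missing


# Hypothetical config text; all paths are placeholders.
EXAMPLE_CONFIG = """
params {
    sample_sheet = "sample_sheets/example.txt"
    fasta_file = "/path/to/genome.fa"
    // gtf = "/path/to/annotation.gtf"
    star_genome_lib = "/path/to/ctat_genome_lib"
}
"""

if __name__ == "__main__":
    print(missing_params(EXAMPLE_CONFIG))  # prints ['gtf', 'cicero_genome_lib']
```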

Then call the wrapper script, main_run.sh, on the command line. The Nextflow config file can be specified by updating the variable NFX_CONFIG, so that the workflow runs locally using Singularity/Apptainer on an HPC system.

To select a different executor or container engine, use one of the Nextflow profiles ('local_apptainer', 'PBS_apptainer', 'local_singularity', 'PBS_singularity') by updating NFX_PROFILE in main_run.sh.

conda activate nextflow
./main_run.sh
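The contents of main_run.sh are not reproduced here. As a hedged sketch of the kind of command such a wrapper might assemble from NFX_CONFIG and NFX_PROFILE, the Python below validates the profile name against the four profiles listed above and builds a `nextflow run` invocation. The function name `build_nextflow_command` and the entry script name `main.nf` are assumptions.

```python
import shlex

# Profile names taken from the text above.
PROFILES = (
    "local_apptainer",
    "PBS_apptainer",
    "local_singularity",
    "PBS_singularity",
)


def build_nextflow_command(config, profile, script="main.nf"):
    """Return the argument list for a `nextflow run` call.

    `script` defaults to "main.nf", which is an assumed entry point.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    return ["nextflow", "-c", config, "run", script, "-profile", profile]


if __name__ == "__main__":
    cmd = build_nextflow_command("nextflow.config", "local_apptainer")
    print(shlex.join(cmd))
    # prints: nextflow -c nextflow.config run main.nf -profile local_apptainer
```

Rejecting an unknown profile early gives a clear error message before any job is submitted.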

CICERO Fusion Detection

This repo also provides a NF workflow for the CICERO fusion detection algorithm. The genome references for CICERO can be found here and here.

For GRCh38 fasta

Quality Control

In addition, the workflow runs FastQC and MultiQC on the input fastq files.

DSL2 Resources

Resources on Automated Builds

bulk_rnaseq_fusion_nf's People

Contributors

jennylsmith
