Scramble

Repository Layout

cluster_identifier/ - contains the part of the application that identifies soft-clipped clusters. See the Build section for build instructions.
cluster_analysis/ - contains code for analyzing and interpreting soft-clipped clusters.
validation/ - sample BAM and outputs for testing SCRAMble.
README - repository description and notes.

Build

Install dependencies (Ubuntu 20.04):

apt-get update
apt-get install -y \
    autoconf \
    autogen \
    build-essential \
    curl \
    libbz2-dev \
    libcurl4-openssl-dev \
    libhts-dev \
    liblzma-dev \
    libncurses5-dev \
    libnss-sss \
    libssl-dev \
    libxml2-dev \
    ncbi-blast+ \
    r-base \
    r-bioc-biostrings \
    r-bioc-rsamtools \
    r-cran-biocmanager \
    r-cran-devtools \
    r-cran-stringr \
    r-cran-optparse \
    zlib1g-dev

Install R package dependencies:

Rscript -e "library(devtools); install_github('mhahsler/rBLAST')"

To build the cluster_identifier (estimated install time <5 minutes):

$ cd cluster_identifier/src
$ make

This creates an executable named build/cluster_identifier, relative to cluster_identifier/src.

Running

SCRAMble runs as a two-step process. First, cluster_identifier generates consensus sequences for soft-clipped read clusters. Second, SCRAMble.R analyzes the cluster file for likely MEIs and deletions. Running SCRAMble on the test BAM in the validation directory should take <1 minute for each step.

To run SCRAMble cluster_identifier:

$ /path/to/scramble/cluster_identifier/src/build/cluster_identifier \
    /path/to/scramble/validation/test.bam > /path/to/output/test.clusters.txt

To run SCRAMble-MEIs and SCRAMble-dels (with default settings):

$ Rscript --vanilla /path/to/scramble/cluster_analysis/bin/SCRAMble.R \
    --out-name /path/to/output/test \
    --cluster-file /path/to/output/test.clusters.txt \
    --install-dir /path/to/scramble/cluster_analysis/bin \
    --mei-refs /path/to/scramble/cluster_analysis/resources/MEI_consensus_seqs.fa \
    --ref /path/to/scramble/validation/test.fa \
    --eval-meis \
    --eval-dels
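
The two steps above can also be chained from a script. The following is a minimal Python sketch, not part of SCRAMble itself: the function names build_commands and run_pipeline are hypothetical, and the sketch assumes the repository layout and flags shown in the commands above. It redirects the standard output of cluster_identifier to <out_prefix>.clusters.txt and then passes that file to SCRAMble.R.

```python
import subprocess
from pathlib import Path


def build_commands(scramble_dir, bam, ref, out_prefix):
    """Build argument lists for both steps (hypothetical helper).

    Uses only the paths and flags shown in the commands above.
    """
    root = Path(scramble_dir)
    cluster_file = f"{out_prefix}.clusters.txt"
    identifier = [
        str(root / "cluster_identifier/src/build/cluster_identifier"),
        str(bam),
    ]
    analysis = [
        "Rscript", "--vanilla",
        str(root / "cluster_analysis/bin/SCRAMble.R"),
        "--out-name", str(out_prefix),
        "--cluster-file", cluster_file,
        "--install-dir", str(root / "cluster_analysis/bin"),
        "--mei-refs",
        str(root / "cluster_analysis/resources/MEI_consensus_seqs.fa"),
        "--ref", str(ref),
        "--eval-meis",
        "--eval-dels",
    ]
    return identifier, analysis, cluster_file


def run_pipeline(scramble_dir, bam, ref, out_prefix):
    """Run step 1, capture its stdout as the cluster file, then run step 2."""
    identifier, analysis, cluster_file = build_commands(
        scramble_dir, bam, ref, out_prefix
    )
    with open(cluster_file, "w") as out:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(identifier, stdout=out, check=True)
    subprocess.run(analysis, check=True)
```

For example, run_pipeline("/path/to/scramble", "/path/to/scramble/validation/test.bam", "/path/to/scramble/validation/test.fa", "/path/to/output/test") would mirror the two commands above.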

Running with Docker

SCRAMble is also distributed with a Dockerfile. To run SCRAMble using Docker (estimated install time <10 minutes):

$ git clone https://github.com/GeneDx/scramble.git
$ cd scramble
$ docker build -t scramble:latest .
$ docker run -it --rm scramble:latest bash
# cluster_identifier \
    /app/validation/test.bam > /app/validation/test.clusters.txt
# Rscript --vanilla /app/cluster_analysis/bin/SCRAMble.R \
    --out-name ${PWD}/test \
    --cluster-file /app/validation/test.clusters.txt \
    --install-dir /app/cluster_analysis/bin \
    --mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
    --ref /app/validation/test.fa \
    --eval-dels \
    --eval-meis

Output

The output of cluster_identifier is a tab-delimited text file with clipped cluster consensus sequences. The columns are as follows:


1.	Coordinate
2.	Side of read where soft-clipping occurred
3.	Number of reads in cluster
4.	Clipped read consensus
5.	Anchored read consensus

Calling SCRAMble.R with --eval-meis produces a tab-delimited file. If a reference .fa file is provided, a VCF is produced as well. The <out-name>_MEIs.txt output is a tab-delimited text file with MEI calls. If no MEIs are present, an output file containing only the header is still produced. The columns are as follows:


1.	Insertion	Coordinate where MEI insertion occurs (zero-based)
2.	MEI_Family	The consensus sequence to which the clipped sequence aligned best
3.	Insertion_Direction	Whether MEI is on fwd or rev strand relative to bam reference
4.	Clipped_Reads_In_Cluster	Number of supporting reads in cluster
5.	Alignment_Score	Pairwise alignment score of clipped read consensus to MEI reference sequence
6.	Alignment_Percent_Length	Percent of clipped read consensus sequence involved in alignment to MEI reference sequence
7.	Alignment_Percent_Identity	Percent identity of alignment of clipped read consensus sequence with MEI reference sequence
8.	Clipped_Sequence	Clipped cluster consensus sequences
9.	Clipped_Side	Left or right, side of read where soft-clipping occurred
10.	Start_In_MEI	Left-most position of alignment to MEI reference sequence
11.	Stop_In_MEI	Right-most position of alignment to MEI reference sequence
12.	polyA_Position	Position of polyA clipped read cluster if found
13.	polyA_Seq	Clipped cluster consensus sequences of polyA clipped read cluster if found
14.	polyA_SupportingReads	Number of supporting reads in polyA clipped read cluster if found
15.	TSD	Target site duplication sequence if polyA clipped read cluster found
16.	TSD_length	Length of target site duplication if polyA clipped read cluster found
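
The MEI table can be loaded by column name. The sketch below is a minimal Python example under stated assumptions: load_mei_calls is a hypothetical helper, the min_reads and min_score thresholds are illustrative values rather than SCRAMble defaults, and the header is assumed to use the column names listed above. A header-only file yields an empty list.

```python
import csv


def load_mei_calls(handle, min_reads=5, min_score=0.0):
    """Return MEI rows (as dicts) that pass two illustrative filters.

    handle: an open text file or any iterable of lines from <out-name>_MEIs.txt.
    min_reads, min_score: hypothetical thresholds, not SCRAMble defaults.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    calls = []
    for row in reader:
        reads = int(row["Clipped_Reads_In_Cluster"])
        score = float(row["Alignment_Score"])
        if reads >= min_reads and score >= min_score:
            calls.append(row)
    return calls
```

Every value in a returned row is still a string, so numeric columns need converting before further arithmetic.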

Calling SCRAMble.R with --eval-dels produces a VCF and a tab-delimited file. The <out-name>_PredictedDeletions.txt output is a tab-delimited text file with deletion calls. If no deletions are present, an output file containing only the header is still produced. The columns are as follows:


1.	CONTIG	Chromosome
2.	DEL.START	Deletion start coordinate (0-based)
3.	DEL.END	Deletion end coordinate (0-based)
4.	REF.ANCHOR.BASE	Reference base at deletion start
5.	DEL.LENGTH	Deletion length
6.	RIGHT.CLUSTER	Name of right cluster
7.	RIGHT.CLUSTER.COUNTS	Number of supporting reads in right cluster
8.	LEFT.CLUSTER	Name of left cluster
9.	LEFT.CLUSTER.COUNTS	Number of supporting reads in left cluster
10.	LEN.RIGHT.ALIGNMENT	Length of right-clipped consensus sequence involved in alignment
11.	SCORE.RIGHT.ALIGNMENT	BLAST alignment bitscore for right-clipped consensus
12.	PCT.COV.RIGHT.ALIGNMENT	Percent length of right-clipped consensus involved in alignment
13.	PCT.IDENTITY.RIGHT.ALIGNMENT	Percent identity of right-clipped consensus in alignment
14.	LEN.LEFT.ALIGNMENT	Length of left-clipped consensus sequence involved in alignment
15.	SCORE.LEFT.ALIGNMENT	BLAST alignment bitscore for left-clipped consensus
16.	PCT.COV.LEFT.ALIGNMENT	Percent length of left-clipped consensus involved in alignment
17.	PCT.IDENTITY.LEFT.ALIGNMENT	Percent identity of left-clipped consensus in alignment
18.	INS.SIZE	Length of insert within deleted sequence (for two-end deletions only)
19.	INS.SEQ	Inserted sequence (for two-end deletions only)
20.	RIGHT.CLIPPED.SEQ	Clipped consensus sequence for right-clipped cluster
21.	LEFT.CLIPPED.SEQ	Clipped consensus sequence for left-clipped cluster
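
A similar approach works for the deletion table. The sketch below is a minimal Python example, with hypothetical helper names (to_int, total_support, load_deletions) and illustrative thresholds. It assumes that a deletion supported by only one cluster may have missing values such as NA in the other cluster's count column, and it counts those as zero; that handling is an assumption, not documented SCRAMble behavior.

```python
import csv


def to_int(value, default=0):
    """Convert a field to int, returning default for missing or non-numeric values."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return default


def total_support(row):
    """Sum the supporting reads from the right and left clusters."""
    return to_int(row.get("RIGHT.CLUSTER.COUNTS")) + to_int(
        row.get("LEFT.CLUSTER.COUNTS")
    )


def load_deletions(handle, min_length=0, min_support=0):
    """Return deletion rows (as dicts) passing illustrative length/support filters."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if to_int(row.get("DEL.LENGTH")) >= min_length
        and total_support(row) >= min_support
    ]
```

As with the MEI table, a header-only file produces an empty list.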

Disclaimers

In theory, SCRAMble should work well on any MEI reference FASTA sequences; however, it has only been tested on the sequences provided in /path/to/scramble/cluster_analysis/resources/MEI_consensus_seqs.fa.
