VisualMSI

VisualMSI is a tool to detect and visualize microsatellite status from NGS data by simulating PCR behavior. VisualMSI extracts the PCR adapters from the reference genome and tries to map them to the sequencing reads. If the adapters are successfully mapped to a read or read pair, the length of the insert flanked by the adapters is calculated. VisualMSI then computes statistics on the resulting inserted length distribution. This approach is very similar to PCR-based MSI detection, which is usually considered the gold standard for clinical use.
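The PCR simulation idea can be illustrated with a small Python sketch. This is not VisualMSI's actual implementation: the function names are hypothetical, the locus is assumed to sit in the middle of the insert, and the adapters are matched exactly, whereas a real tool would likely tolerate mismatches.

```python
def extract_adapters(reference, center, adapter_len=12, inserted_len=100):
    """Cut the two adapters that flank `inserted_len` bases around `center`.

    Assumption: the insert is centered on the locus position (0-based).
    """
    insert_start = center - inserted_len // 2
    insert_end = insert_start + inserted_len
    left = reference[insert_start - adapter_len:insert_start]
    right = reference[insert_end:insert_end + adapter_len]
    return left, right


def inserted_length(read, left, right):
    """Return the number of bases between the adapters in `read`.

    Returns None when either adapter cannot be found (exact match only).
    """
    left_pos = read.find(left)
    if left_pos < 0:
        return None
    insert_start = left_pos + len(left)
    right_pos = read.find(right, insert_start)
    if right_pos < 0:
        return None
    return right_pos - insert_start
```

A read that carries a shortened repeat yields a smaller inserted length than the reference, and these lengths are what the statistics are built on.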

For each MSI target locus, VisualMSI computes the information entropy of its inserted length distribution. The entropy value is an indicator of MSI status: the higher the entropy, the higher the probability that the locus is unstable.
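As a rough illustration, the entropy of a list of inserted lengths can be computed as below. This is a minimal sketch: the function name is hypothetical, and the base-2 logarithm is an assumption, since the text above does not state which base VisualMSI uses.

```python
import math
from collections import Counter


def length_entropy(lengths):
    """Shannon entropy (in bits, an assumed unit) of an inserted length list."""
    if not lengths:
        return 0.0
    counts = Counter(lengths)
    total = len(lengths)
    # Sum p * log2(1/p) over the observed lengths.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A locus where every read gives the same inserted length has zero entropy, while a locus whose reads spread over several lengths scores higher.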

VisualMSI can run in tumor-normal paired mode or case-only mode. The tumor-normal mode is recommended if the paired normal sample is available. When the paired normal sample is given, VisualMSI evaluates the earth mover's distance (EMD) between the inserted length distributions of the tumor data and the normal data. Since the normal data is usually considered MSI-stable, the EMD value indicates how unstable the tumor data is compared to the normal data. The higher the EMD value, the higher the probability that the locus is unstable.
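For one-dimensional length distributions, the earth mover's distance reduces to the summed absolute difference of the two cumulative distributions. The sketch below shows that textbook form. The function name is hypothetical, and normalizing each sample to total mass 1 is an assumption, not necessarily how VisualMSI scales its EMD values.

```python
from collections import Counter


def earth_movers_distance(tumor_lengths, normal_lengths):
    """1-D EMD between two inserted length samples, each normalized to mass 1."""
    if not tumor_lengths or not normal_lengths:
        raise ValueError("both samples need at least one inserted length")
    tumor = Counter(tumor_lengths)
    normal = Counter(normal_lengths)
    lo = min(min(tumor), min(normal))
    hi = max(max(tumor), max(normal))
    carried = 0.0   # surplus probability mass still to be moved to the next bin
    distance = 0.0
    for length in range(lo, hi + 1):
        carried += tumor[length] / len(tumor_lengths)
        carried -= normal[length] / len(normal_lengths)
        distance += abs(carried)
    return distance
```

With this definition, a tumor sample whose lengths are all 2 bp away from the normal sample gives a distance of 2, and identical distributions give 0.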

Take a quick glance at the informative report

A quick example

  • Tumor-normal paired mode:
visualmsi -i tumor.sorted.bam -n normal.sorted.bam -r hg19.fasta -t targets/msi.tsv
  • Case-only mode (no paired normal data given):
visualmsi -i tumor.sorted.bam -r hg19.fasta -t targets/msi.tsv

Get visualmsi program

download binary

This binary is only for Linux systems: http://opengene.org/VisualMSI/visualmsi

# this binary was compiled on CentOS, and tested on CentOS/Ubuntu
wget http://opengene.org/VisualMSI/visualmsi
chmod a+x ./visualmsi

or compile from source

# step 1: download and compile htslib from: https://github.com/samtools/htslib
# step 2: get VisualMSI source (you can also use browser to download from master or releases)
git clone https://github.com/OpenGene/VisualMSI.git

# build
cd VisualMSI
make

# Install
sudo make install

Usage

You should provide the following arguments to run visualmsi:

  • the reference genome fasta file, specified by -r or --ref=
  • the target setting file, specified by -t or --target=
  • the input BAM file, specified by -i or --in=. If the normal data is available, specify it by -n or --normal=
  • the plain text result is printed directly to STDOUT; you can redirect it to a file using >
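When VisualMSI is called from a script, the arguments listed above can be assembled programmatically. The following is a minimal sketch using Python's standard subprocess module. The helper names and the report file name are hypothetical; only the -i, -n, -r and -t flags come from this README.

```python
import subprocess


def build_command(tumor_bam, reference, targets, normal_bam=None):
    """Assemble the visualmsi command line from the required arguments."""
    cmd = ["visualmsi", "-i", tumor_bam, "-r", reference, "-t", targets]
    if normal_bam is not None:
        # Supplying a normal BAM switches to tumor-normal paired mode.
        cmd += ["-n", normal_bam]
    return cmd


def run_visualmsi(cmd, text_report):
    """Run visualmsi and capture the plain text result from STDOUT in a file."""
    with open(text_report, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, run_visualmsi(build_command("tumor.sorted.bam", "hg19.fasta", "targets/msi.tsv"), "msi.txt") would run the case-only mode and store the text result in msi.txt, assuming visualmsi is on the PATH.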

Reference genome

The reference genome should be a single FASTA file containing all chromosome data. This file should not be compressed. For human data, the hg19/GRCh37 or hg38/GRCh38 assembly is typically used, which can be downloaded from the following sites:

Target file

The target file is a TSV (tab-separated values) file listing the MSI loci. To add an MSI target locus at chr:position, add a row with the values (chrom, position, name). See the example in targets/msi.tsv:

#CHROM  POSITION  NAME
chr4  55598216  BAT25
chr2  47641568  BAT26
chr14 23652365  NR-21
chr11 102193518 NR-27
chr2  95849372  NR-24

Please note that this file is based on hg19 coordinates.
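To check a custom target file before a run, it can be parsed with a few lines of Python. This is a sketch only: the function name is hypothetical, it assumes the three-column layout shown in the example above, and it splits on any whitespace so that both tab-separated and space-aligned files are accepted.

```python
def read_targets(lines):
    """Parse (chrom, position, name) rows, skipping blank and '#' header lines."""
    targets = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError("expected chrom, position and name: " + line)
        targets.append((fields[0], int(fields[1]), fields[2]))
    return targets
```

Passing an open file handle, for example read_targets(open("targets/msi.tsv")), returns one tuple per locus and raises an error on a row with a missing column or a non-numeric position.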

Reports

VisualMSI reports results in HTML, JSON, and text formats. The following examples show what the reports look like:

Tumor-normal paired mode

[image: example report in tumor-normal paired mode]

For each MSI locus, the entropy values of tumor and normal data are shown, as well as the earth mover's distance (EMD) value.

Case-only mode

[image: example report in case-only mode]

For each MSI locus, only the entropy value of tumor data is shown.

Cite VisualMSI

Please cite VisualMSI as follows:

Chen, S., Qu, H., Yang, B., Huang, T., Zhang, X., Liu, Y., ... & Gu, J. (2019). Detect and visualize tumor microsatellite instability status from next-generation sequencing data by simulating PCR techniques. Journal of Clinical Oncology, 37(15_suppl), e13052.

All options

options:
  -i, --in                     input sorted bam/sam file for the case (tumor) sample. STDIN will be read from if it's not specified (string [=-])
  -n, --normal                 input sorted bam/sam file for the paired normal sample (tumor-normal mode). If not specified, VisualMSI will run in case-only mode. (string [=])
  -t, --target                 the TSV file (chr, start, end, name) to give the MSI targets (string)
  -r, --ref                    reference fasta file name (should be an uncompressed .fa/.fasta file) (string)
  
  # options for setting thresholds
  -a, --adapter_len            set the length of the adapter for PCR simulation (5~30). Default 12 means the left and right adapter both have 12 bp. (int [=12])
  -l, --target_inserted_len    set the distance on reference of the two adapters for PCR simulation (20~200). Default 100 means: <left adapter><100 bp inserted><right adapter> (int [=100])
  -d, --depth_req              set the minimum depth requirement for each MSI locus (1~1000). Default 10 means 10 supporting reads/pairs are required. (int [=10])
  
  # options for specifying the file names of the reports
  -j, --json                   the json format report file name (string [=msi.json])
  -h, --html                   the html format report file name (string [=msi.html])

  # other options
      --debug                  output some debug information to STDERR.
  -?, --help                   print this message
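The threshold options have documented ranges, so a wrapper script can check them before launching a run. The sketch below is illustrative only. The helper names are hypothetical, while the ranges and defaults are copied from the option list above.

```python
# Allowed ranges and defaults, taken from the option descriptions above.
RANGES = {
    "adapter_len": (5, 30),
    "target_inserted_len": (20, 200),
    "depth_req": (1, 1000),
}
DEFAULTS = {"adapter_len": 12, "target_inserted_len": 100, "depth_req": 10}


def check_thresholds(**overrides):
    """Merge overrides into the defaults and reject out-of-range values."""
    unknown = set(overrides) - set(RANGES)
    if unknown:
        raise KeyError("unknown option(s): " + ", ".join(sorted(unknown)))
    settings = {**DEFAULTS, **overrides}
    for name, value in settings.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}~{high}")
    return settings


def simulated_amplicon_length(settings):
    """Reference span of <left adapter><inserted><right adapter>."""
    return 2 * settings["adapter_len"] + settings["target_inserted_len"]
```

With the defaults, the simulated amplicon spans 12 + 100 + 12 = 124 bp of the reference.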

visualmsi's People

Contributors

sfchen

visualmsi's Issues

how to determine the status of MSI, as snapshot method using tumor/normal samples

Hi,

I am very interested in your VisualMSI tool, since it is quite simple to use and very fast at finding the microsatellites.

However, I am confused about how to determine the MSI status (e.g. MSS, MSI-L, MSI-H), since the results for the five loci do not directly give the status. Do you have any thresholds for determining it?

Thanks

QC failed for dataset provided for testing

I've followed the steps to install VisualMSI and then tried to test it with the dataset provided.

visualmsi -i tumor.sorted.bam -r hg19.fasta -t msi.tsv

It loads the reference data, parses the TSV file, then processes the provided tumour BAM file; however, for each microsatellite location it says QC failed, zero reads (screenshot attached).

Screenshot from 2020-01-28 11-14-21

terminate called after throwing an instance of 'std::out_of_range'

Hi Prof. Chen,
When I use the VisualMSI software, I encounter the following problem:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr

the command is like this:
visualmsi -i *case.bam -n *ctrl.bam -r hg19.fa -t visual_msi.tsv -j *json -h *html
