
msisensor's Introduction

Note: For questions and discussion about msisensor, please visit the repository of msisensor-pro at https://github.com/xjtu-omics/msisensor-pro.

MSIsensor

MSIsensor is a C++ program to detect replication slippage variants at microsatellite regions and classify them as somatic or germline. Given paired tumor and normal sequence data, it builds a distribution of expected (normal) and observed (tumor) repeat lengths for each microsatellite and compares them using Pearson's chi-squared test. Comprehensive testing indicates MSIsensor is an efficient and effective tool for deriving microsatellite instability (MSI) status from standard tumor-normal paired sequence data. MSIsensor is published in Bioinformatics. Please click here to see more details about MSIsensor. If you have any questions about MSIsensor, please contact one or more of the following people: Beifang Niu ([email protected]), Kai Ye ([email protected]) or Li Ding ([email protected]).
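To make the comparison step concrete, here is a minimal Python sketch of a Pearson chi-squared statistic computed on two repeat-length histograms. It is an illustration only, not MSIsensor's C++ implementation; the function name `chi_squared_statistic`, the histogram layout, and the example counts are assumptions made for this sketch.

```python
def chi_squared_statistic(normal, tumor):
    """Pearson chi-squared statistic for a 2 x k table of read counts.

    normal, tumor: equal-length lists; element i is the number of reads
    supporting repeat-length bin i (hypothetical layout for this sketch).
    Returns (statistic, degrees_of_freedom).
    """
    if len(normal) != len(tumor):
        raise ValueError("histograms must have the same number of bins")
    # Drop bins that are empty in both samples; they carry no information.
    cols = [(n, t) for n, t in zip(normal, tumor) if n + t > 0]
    n_total = sum(n for n, _ in cols)
    t_total = sum(t for _, t in cols)
    if n_total == 0 or t_total == 0:
        raise ValueError("each sample needs at least one read")
    grand = n_total + t_total
    stat = 0.0
    for n, t in cols:
        col_total = n + t
        for observed, row_total in ((n, n_total), (t, t_total)):
            expected = row_total * col_total / grand
            stat += (observed - expected) ** 2 / expected
    # (rows - 1) * (columns - 1) with two rows
    return stat, len(cols) - 1


# Illustrative counts only: tumor reads spread over more repeat lengths.
stat, dof = chi_squared_statistic([2, 90, 3, 0], [10, 50, 25, 15])
print(f"chi2 = {stat:.2f}, dof = {dof}")
```

A large statistic relative to the degrees of freedom suggests that the tumor distribution differs from the normal one. Turning it into a p-value needs the chi-squared distribution, for example `scipy.stats.chi2.sf` if SciPy is available.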

If you use these tools for your work, please cite the following papers:
[1] Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015-1016, doi:10.1093/bioinformatics/btt755 (2014).
[2] Jia, P. et al. MSIsensor-pro: Fast, Accurate, and Matched-normal-sample-free Detection of Microsatellite Instability. Genomics, Proteomics & Bioinformatics, doi:10.1016/j.gpb.2020.02.001 (2020).

MSIsensor-pro is a new MSI detection method developed by Kai Ye et al. It is fast, accurate, and does not require a matched normal sample. It accepts whole-genome sequencing, whole-exome sequencing and targeted-region (panel) sequencing data as input. MSIsensor-pro introduces a multinomial distribution model to quantify polymerase slippage for each tumor sample and a discriminative site selection method to enable MSI detection without matched normal samples. MSIsensor-pro is now published in Genomics, Proteomics & Bioinformatics. If you have any questions about MSIsensor-pro, please open an issue on MSIsensor-pro's homepage or contact Kai Ye ([email protected]) directly.

MSIsensor2 is also an MSI detection method, designed specifically for tumor-only sequencing data. MSIsensor2 was developed independently by Beifang Niu's lab ([email protected]). Please try MSIsensor2 here: https://github.com/niu-lab/msisensor2 or find further details here: http://niulab.scgrid.cn/msisensor2/index.html.

msisensor's People

Contributors

alexpenson, beifang, ckandoth, liangkaiye

msisensor's Issues

Single tumor BAM doesn't work

Hello,

Thank you for sharing such a nice tool. I have tumor BAM files and I would like to do MSI analysis on them using the msisensor tool. But I am unable to run the tool using a tumor BAM only. It says "Please provide valid format normal bam file !". Can I run the tool using just a tumor BAM?

I have also tried on test dataset provided. But got the same error.
msisensor msi -d test/example.microsate.sites -t test/example.tumor.bam -o output.tumor.prefix

Many thanks.

How to prepare correct bed file for msisensor msi?

I am trying to use the bed file carried by msisensor which is under test/all_CDS_and_ncRNA_24Chroms_Contigs_1BasedStart_2bpFlanks_ForMusic. Seems like test/example.bed used in the example script is part of this file. However, when I replace example.bed with the big bed file:

../msisensor msi -d example.microsate.sites -n example.normal.bam -t example.tumor.bam -e all_CDS_and_ncRNA_24Chroms_Contigs_1BasedStart_2bpFlanks_ForMusic -o test -l 1 -q 1

It shows the following error:
loading bed regions ...
loading homopolymer and microsatellite sites ...
Segmentation fault (core dumped)

The same error happens when I try to use the big bed file with my own bam files. So I am wondering whether there is any specific requirement for preparing the bed file.

Thank you very much!

Running on WGS

Hi,
Thank you for providing a comprehensive tool. I am currently running MSIsensor on WGS and WES done on the same sample.
While I do get some % value for the exome, for the WGS it only detects 1 somatic site. I am running the command exactly as provided in the README.

Any help with understanding what is going on is much appreciated.

msisensor msi -c [int] does not work

Hi,

My name is Manuel LEBEURRIER and I am testing MSIsensor on cancer datasets at Gustave ROUSSY.

Your work is really nice and totally coherent with clinical needs.

But your parameter -c isn't working when a non-default value is given.

The problem comes from a small mistake in homo.cpp: on lines 192 and 194, param.covCutoff must be replaced by paramd.covCutoff.

If someone is responsible for the development of this tool, I have some suggestions:

A normalisation step would be a good idea. In some cases the sequencing depths can differ between normal and tumor.

I have found among the MSI calls an SSR with a corresponding read count table like this:
N: 0 0 0 0 5 80 5 0
T: 0 0 0 0 0 90 0 0
It can't be an MSI, because the SSR length distributions have the same mode but the variability is higher in normal than in tumor, whereas we are looking for cases where length variability is higher in tumor than in normal (given the same mode).
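The suggestion above can be sketched in Python using the counts from the table. This is a minimal illustration, not part of MSIsensor: `length_profile` and the filter rule are hypothetical, and the bin index is used as a stand-in for repeat length.

```python
def length_profile(counts):
    """Summarise a read-count vector indexed by repeat-length bin.

    Returns (fractions, mode_index, variance). The fractions sum to 1, so
    samples with different sequencing depths become comparable.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at this site")
    fractions = [c / total for c in counts]
    mode_index = max(range(len(counts)), key=counts.__getitem__)
    mean = sum(i * f for i, f in enumerate(fractions))
    variance = sum(f * (i - mean) ** 2 for i, f in enumerate(fractions))
    return fractions, mode_index, variance


normal = [0, 0, 0, 0, 5, 80, 5, 0]
tumor = [0, 0, 0, 0, 0, 90, 0, 0]

_, n_mode, n_var = length_profile(normal)
_, t_mode, t_var = length_profile(tumor)

# Hypothetical filter: when both samples share the same modal length, keep
# the site only if the tumor is the more variable sample.
tumor_more_variable = not (n_mode == t_mode and t_var <= n_var)
print(n_mode, t_mode, round(n_var, 3), round(t_var, 3), tumor_more_variable)
```

For this site the modes agree and the tumor variance is zero, so the hypothetical filter would discard it.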

Best regards,

LEBEURRIER manuel

query about bam

When you use bwa to generate bam input for msisensor, what is your specific bwa command? bwa mem or bwa mem -M or bwa aln or other specific parameters? Thanks very much.

5-marker or 7-marker

Hi, I'm wondering how to assess 5-marker/7-marker positivity from the output of msisensor, as indicated in the supplementary table of the paper.
Also, could you tell me what the number under each marker column in the supplementary table means? Thank you!

Kevin

MSI score?

MSIsensor classifies samples as MSI-H, MSI-L or MSS by MSI score: >3.5 is MSI and <3.5 is MSS.
I want to know whether the MSI score is equal to the % in the result below.
Total_Number_of_Sites Number_of_Somatic_Sites %
1314 15 1.14
And does Total_Number_of_Sites mean the total number of MS sites in my bed file?
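The arithmetic behind the % column in the example above can be checked with a short Python sketch. The helper names are hypothetical, and the 3.5 cutoff is simply the value quoted in this question, not a validated threshold.

```python
def msi_score(total_sites, somatic_sites):
    """Percentage of somatic sites among all sites, rounded to 2 decimals."""
    if total_sites <= 0:
        return 0.0
    return round(100.0 * somatic_sites / total_sites, 2)


def classify(score, cutoff=3.5):
    """Label a sample using the cutoff quoted in the question.

    The cutoff is an assumption taken from this thread; how a score of
    exactly 3.5 is handled is also a choice made for this sketch.
    """
    return "MSI" if score >= cutoff else "MSS"


score = msi_score(1314, 15)
print(score, classify(score))
```

With 15 somatic sites out of 1314 this gives 1.14, which matches the % column shown in the question.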

msisensor2 requires write permission to the models directory

Hi,
thanks for the great work with msisensor2.
It appears that msisensor2 requires write permission to the models directory.
If it is not provided, the tool does not exit with an error but instead produces output files with 0 MSI. This makes it hard to work in environments where the installed tool is on a read-only filesystem.
Is it possible to use a different directory (e.g. ${TMPDIR}) for writing, or at least to issue an error and exit with a non-zero exit code when it cannot?

Error: 'scan' keeps getting 'killed'

When I try to make the initial microsatellites list inside my Docker container, it fails every time with the following error:

Step 35/37 : RUN cd /opt/bin/msisensor && ./msisensor scan -d "${HG19_FA}" -o "${HG19_MICROSATELLITES}"
 ---> Running in 0942bacffdc0
scan -d /opt/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -o /opt/bin/msisensor/hg19_microsatellites.list Start at:  Wed Jan 31 01:52:09 2018

scanning chomosome chrM done. 1 secs passed
scanning chomosome chr1 done. 12 secs passed
scanning chomosome chr2 done. 23 secs passed
scanning chomosome chr3 done. 32 secs passed
scanning chomosome chr4 done. 51 secs passed
scanning chomosome chr5 done. 60 secs passed
scanning chomosome chr6 done. 67 secs passed
scanning chomosome chr7 done. 75 secs passed
Killed
The command '/bin/sh -c cd /opt/bin/msisensor && ./msisensor scan -d "${HG19_FA}" -o "${HG19_MICROSATELLITES}"' returned a non-zero code: 137
make: *** [build] Error 137
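The exit code in the log above can be decoded with a small Python sketch. POSIX shells report 128 + N when a process is terminated by signal N; `describe_exit_code` is a hypothetical helper written for this illustration.

```python
import signal


def describe_exit_code(code):
    """Translate a shell exit status into a signal name when it encodes one."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(137))
print(describe_exit_code(139))
```

Code 137 corresponds to SIGKILL. During a Docker build this is often the kernel's out-of-memory killer, so raising the memory available to the build is a reasonable first thing to try. That is a common cause, not a confirmed diagnosis for this report.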

Microsatellite loci for hg38

Hello -

I am trying to use MSISensor on WES paired tumor-normal data aligned to hg38. I downloaded the FASTA file from UCSC and ran the scan command to generate the list of microsatellite loci for hg38. The program has been stalled for hours. Is this normal? Is there a way to thread/fork this so that it runs faster?

msisensor scan -d /mnt/hg38.fa.gz -o /mnt/msisensor_files/microsatellites_hg38.list

Tagging release 0.6

I am going to tag the current master branch as release 0.6. The recent commit 2c62cc9 seems important, and we want to start using it in our research pipeline. I will include a Dockerfile. Let me know if you have any concerns.

cannot 'make' installation with samtools 1.3.1

When I try to compile msisensor, I get the following error:

$ make
g++ -O2 -fopenmp -I/opt/bin/samtools-1.3.1 -c cmds.cpp -o cmds.o
g++ -O2 -fopenmp -I/opt/bin/samtools-1.3.1 -c scan.cpp -o scan.o
In file included from bamreader.h:26:0,
                 from refseq.h:41,
                 from scan.cpp:41:
/opt/bin/samtools-1.3.1/bam.h:48:25: fatal error: htslib/bgzf.h: No such file or directory
 #include "htslib/bgzf.h"
                         ^
compilation terminated.
make: *** [scan.o] Error 1

Any suggestions?

How does msisensor deal with soft-clipped reads?

Hi,
I read your article, and it does not mention whether the counted reads include soft-clipped reads. Because soft-clipping may add potentially unwanted alignments in repetitive regions, I would like to know whether soft-clipped reads have an impact on the msisensor results.
Many thanks.

msi fails with exit code 139 when encountering contigs not in bam/bai

When running msi using a microsatellite list generated by MSIsensor scan on UCSC hg19, msi fails (Segmentation fault (core dumped)) when hitting contigs such as chrM, not present in the bam/bai used as input. If these are manually removed from the microsatellite list, the issue is resolved.
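The manual workaround described above can be automated. Below is a minimal Python sketch that keeps only the sites whose contig is present in the BAM. It assumes a whitespace-separated list with the contig name in the first column and a single header line; the header text in the example is illustrative. The contig names would have to come from the BAM header or index, for example from the first column of `samtools idxstats`.

```python
def filter_sites_by_contigs(lines, contigs, has_header=True):
    """Yield only microsatellite-list lines whose first column is in `contigs`."""
    allowed = set(contigs)
    for index, line in enumerate(lines):
        if has_header and index == 0:
            yield line  # keep the header untouched
            continue
        fields = line.split()
        if fields and fields[0] in allowed:
            yield line


# Illustrative data only; a real list has more columns.
example_sites = [
    "chromosome location repeat_unit_length\n",
    "chrM 66 1\n",
    "chr1 10485 1\n",
    "chr2 12000 2\n",
]
kept_sites = list(filter_sites_by_contigs(example_sites, ["chr1", "chr2"]))
print("".join(kept_sites), end="")
```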

msisensor on whole exome

Hi,

Should we not create the microsatellites list only for the regions covered by the exome we are sequencing, instead of the whole genome?
If yes, could you please update the README to show how to filter the microsatellites.list using a WES target bed file?

Thanks,
Rajesh
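A filter of this kind could look like the Python sketch below. It is not from the MSIsensor README: the function names are hypothetical, the list is assumed to have the contig in column 1 and the position in column 2 after one header line, and BED intervals are treated as 0-based half-open. Whether the list's location column uses the same convention should be checked before relying on the boundaries.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED-like lines into {chrom: (starts, ends)} with merged intervals."""
    raw = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip blank lines, headers and track lines
        raw[fields[0]].append((int(fields[1]), int(fields[2])))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlapping or adjacent
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def filter_sites_by_bed(site_lines, bed_lines, has_header=True):
    """Yield site lines whose position falls inside a BED interval."""
    regions = load_bed(bed_lines)
    for index, line in enumerate(site_lines):
        if has_header and index == 0:
            yield line
            continue
        fields = line.split()
        if len(fields) < 2 or fields[0] not in regions:
            continue
        starts, ends = regions[fields[0]]
        pos = int(fields[1])
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < ends[i]:
            yield line


# Illustrative data only.
bed = ["18\t53161197\t53161313\tMSI1\n", "18\t53161497\t53161611\tMSI2\n"]
sites = [
    "chromosome location repeat_unit_length\n",
    "18 53161200 1\n",
    "18 53161400 1\n",
    "1 100 1\n",
    "18 53161313 1\n",
]
kept = list(filter_sites_by_bed(sites, bed))
print("".join(kept), end="")
```

Merging the intervals first means each position needs only one binary search, which matters when the list holds millions of sites.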

The MSI loci in output.prefix_germline file

What's the difference between the output.prefix_germline file and the output.prefix_somatic file, and why are some of their MSI loci the same? How did you get the MSI loci in the output.prefix_germline file?

whether to use deduplicated bam

When I ran MSIsensor, I found that the results are quite different between a deduplicated bam and a non-deduplicated bam. Which bam should be used: the deduplicated one or the non-deduplicated one?

not_dedeplicated.bam

Total_Number_of_Sites	Number_of_Somatic_Sites	%
9739	1501	15.41

dedeplicated.bam

Total_Number_of_Sites	Number_of_Somatic_Sites	%
8798	122	1.39

error while loading shared libraries

Hi,
I installed your package using conda with conda install msisensor
But when I try to use it, I get this error message:
msisensor: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
Do you know where the problem comes from? All requested packages already installed.
Thank you in advance for your help,
-Charlotte

Reference genome issues in msisensor and number of somatic sites prediction in msisensor2

Hi ,
I have normal and tumor bam files from a patient sample. I would like to view the microsatellites in the sample. I tried scanning the hg38 and hg19 reference genomes from the UCSC genome browser.
I can create the microsatellites list file, but with that file I am not able to proceed to the scoring step, and I repeatedly get the following error:
Same reference genome file should be used in both 'msisensor scan' and 'msisensor msi' steps!!!
I have questions like:

  1. Is there any specific reference genome to use?
  2. I could run scan and the msi score on the data with the genome models in msisensor2, but I couldn't find any information on the threshold value for MSI status. What is the threshold for MSI status in msisensor2?
  3. Also, please explain the difference in the number of somatic sites and the msi score between the matched tumor-normal and tumor-only options. Which one should I use for a reliable result?
    Please reply.

No matched normal

Hi, and thanks for all the hard work on this. We are attempting to use your tool, but are working with smaller targeted panel data, without a matched normal. The approach we are taking involves creating a pooled normal sample out of 20 random "normal" samples we have available. I then down sample that file to make the BAM roughly the same size as the tumor samples we are working with. The result is elevated MSI scores across the board (30%-70% range). Most of the samples tend to be in the 40-45% range using your program, so I'm thinking with additional normal samples to add to our pool, we may be able to set MSI-H/L/S cutoffs. Could you share any of your expertise as to whether you think this is plausible? What issues do you think we might encounter?

virtual memory error - scan ref genome

I am trying to run msisensor on a few samples from targeted sequencing data. I get an error at the scan step.

Reference genome – b37 GATK version
Command - msisensor scan -d GATKresources/b37/human_g1k_v37.fasta -o msisensor.b37.microsatellites.list

Error –
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

The virtual memory limit is 8 GB.
The log shows it going through all 22 chromosomes.

scanning chomosome 20 done. 152 secs passed
scanning chomosome 21 done. 154 secs passed
scanning chomosome 22 done. 156 secs passed

However, the output of scan, the file msisensor.b37.microsatellites.list, stops at chromosome 1.
1 46462340 3 57 3 138 202 TGC AGAGG ATAGG
1 46462400 1 0 5 179 539 A AGTAT GACGT
1 46462417

I was able to run the test sample correctly. Any advice will be helpful!
In case you have uploaded the microsatellites list for b37, please let me know.

Thanks!

Error "Same reference genome file should be used in both 'msisensor scan' and 'msisensor msi' steps!!!"

I have tested the example data in the test folder with the msi scoring function, and I noticed that if I specify any of the following arguments, even with the default values, the error message in the subject line, "Same reference genome file should be used in both 'msisensor scan' and 'msisensor msi' steps!!!", shows up and empty output files are generated.

-m50 -q3 -s5 -w40

When I don't specify them in the command, the tool seems to work as expected (and have the four non-zero output files generated). But aren't those above arguments with the default values automatically applied if not specified in the command line?

(Example) This one generates the error:
msi -d example.microsate.sites -n example.normal.bam -t example.tumor.bam -e example.bed -f0.05 -i1 -c20 -z0 -m50 -w40 -o output

any input is appreciated.

Bioconda package

Would anybody be interested in assisting in creating a Bioconda recipe for MSIsensor? I have no experience with C++, and so far my attempts have failed miserably. I'm sure that I am just missing something obvious (for people familiar with compiling C++ code):

bioconda/bioconda-recipes#9965

the relationship of -l -p -m

I don't understand the relationship between -l, -p and -m. With the defaults, does it mean that homopolymers of size 5-10 will not be calculated? Thanks for your reply.
-l minimal homopolymer size, default=5
-p minimal homopolymer size for distribution analysis, default=10
-m maximal homopolymer size for distribution analysis, default=50

Zero scoring returns empty files

Running an analysis that finds no MSI results should still return files with values, not empty files.

For example, an analysis with no results looks like:

 $ cat /path/<sample>.msi.output

But it should return:

 $ cat /path/<sample>.msi.output

Total_Number_of_Sites	Number_of_Somatic_Sites	%
0	0	0.00

Otherwise, from a pipeline setting, it is difficult to distinguish an analysis that has failed from an analysis that simply didn't find anything.
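Until the tool writes a zero row itself, a pipeline can at least make the empty case explicit when reading the summary. The Python sketch below assumes the three-column layout shown above; `read_msi_summary` is a hypothetical helper, not part of MSIsensor. An empty file still cannot tell a crash apart from "no sites", so the caller would also need to check the exit code.

```python
def read_msi_summary(text):
    """Parse the three-column summary written by the scoring step.

    Returns (total_sites, somatic_sites, percent), or None when the text
    is empty or has no data row, so callers can flag that case explicitly.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        return None
    total, somatic, percent = rows[1][:3]
    return int(total), int(somatic), float(percent)


header = "Total_Number_of_Sites\tNumber_of_Somatic_Sites\t%\n"
print(read_msi_summary(""))
print(read_msi_summary(header + "0\t0\t0.00\n"))
```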

can't find the description of the result file : output.prefix: msi score output

The list of microsatellites is output in "scan" step. The MSI scoring step produces 4 files:
output.prefix
output.prefix_dis_tab
output.prefix_germline
output.prefix_somatic

output.prefix: msi score output

Total_Number_of_Sites Number_of_Somatic_Sites %
640 75 11.72

I can see that Number_of_Somatic_Sites matches the file output.prefix_somatic, but Total_Number_of_Sites (640) does not match any of the files. Is the list of total sites not written to an output file? Does anyone know where I can find the sites counted in Total_Number_of_Sites, or a description of this value?

feature request: ignore duplicate-marked reads

This issue is to request ignoring duplicate reads as an option. GATK best practice is to keep all duplicates in the bam, and I have read in a few issues here and here that a deduplicated (not just duplicate-marked) bam is best for msisensor msi. Therefore, we have to run an extra step and use almost twice as much storage to run msisensor on a deduplicated bam. Ignoring duplicate reads in a marked bam would avoid the trouble while hopefully not adding to the run time of msisensor.

only part of BED file loaded/analysed

When I try to restrict the analysis with a bed file, not all regions of the file get loaded. Are there special naming or sorting requirements? I have tried numeric and lexical sorting, as well as changing the intervals' names, to no avail.

$ ./msisensor msi -d human_g1k_v37_decoy.microsat -t msi.bam -e test.bed -o test
msi -d human_g1k_v37_decoy.microsat -t msi.bam -e test.bed -o test Start at:  Thu Dec  6 11:20:25 2018

loading bed regions ...
loading homopolymer and microsatellite sites ...

Total loading windows:  1 


Total loading homopolymer and microsatellites:  1 

window: 0 done...:18:53160526-53162526

*** Summary information ***

Number of total sites: 1
Number of sites with enough coverage: 1
Number of MSI sites: 1

Total time consumed:  28 secs

$ cat test.bed
18	53161197	53161313	MSI1
18	53161497	53161611	MSI2
18	53842297	53842413	MSI3
18	53842610	53842723	MSI4
18	57426007	57426123	MSI5
18	57426308	57426424	MSI6
18	61873331	61873445	MSI7
18	61873677	61873783	MSI8
18	67435963	67436082	MSI9
18	67436304	67436415	MSI10
$

bedfile.zip

The bam file has over 100x coverage in all the target regions.

MSI score greater than 1

I ran MSIsensor-pro with paired samples successfully and found that some samples have an MSI score greater than 1. As I understand it, the MSI score is the percentage of somatic sites, which cannot be 1 or greater than 1, so I am confused by the result. I hope you can give me some suggestions.

MSIsensor using variable numbers of positions

Hi there.
I made a number of random samplings of the full microsatellite list created with "scan". I am downsampling it to test if using a subset of the positions is still able to differentiate between high and low MSI samples. I'm seeing that even using a small number (10 thousand) positions I can easily differentiate the two groups of samples.
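The kind of downsampling described here can be sketched in Python as follows. This is a minimal illustration, not the script used in the post: `subsample_sites` is a hypothetical helper, and it assumes the list has one header line. The original order is kept in case downstream steps expect a sorted list, which is also an assumption.

```python
import random


def subsample_sites(lines, n, seed=0, has_header=True):
    """Return the header (if any) plus `n` randomly chosen site lines.

    Original order is preserved; a fixed seed makes the draw repeatable.
    """
    header = list(lines[:1]) if has_header else []
    body = list(lines[1:]) if has_header else list(lines)
    if n >= len(body):
        return header + body
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(body)), n))
    return header + [body[i] for i in chosen]


# Illustrative data only.
all_sites = ["header\n"] + [f"chr1 {i} 1\n" for i in range(1000)]
subset = subsample_sites(all_sites, 10, seed=42)
print(len(subset))
```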

I have discovered something unusual though. For some reason, the high MSI cases get lower % MSI when I used more than 5000000 positions, and much higher when I use fewer positions.

Do you have any ideas of why higher numbers of positions for high MSI cases would not be in line with the lower counts of positions?

MSIsensor: how to classify MSI-high and MSI-low?

Dear beifang,
Sorry to disturb you. I'm Hongsen Qu from China, and I am new to MSIsensor. I recently installed MSIsensor and ran the command "bash run.sh" as a test, and I got an MSI score of 100% (the % column after Number_of_Somatic_Sites), which indicates microsatellite instability and would mean MSI-high. How do I classify MSI-high and MSI-low? Looking forward to your reply.

Tumor-only method

Hi,
Thank you for your improvement for tumor-only analysis. Could you give us more information about the method used, please?
Best,
Elodie

bam files after realignment & output_dis file

Hello,
Thank you for the msisensor tool.
I have 2 questions:

  1. I am using bam files after the GATK workflow. Is this a valid input for msisensor?
  2. I have a question regarding the output_dis file (I am using v 0.6).
    I thought that this file has the lengths of the microsatellites. However, it always has 100 values for T and for N.
    What are these values?

Thank you

Get location of total sites

Hi, I could find the locations of all the somatic sites in the file output.prefix_somatic. Could I get the locations of total sites analyzed somewhere? They should be a part of sites in the file microsatellites.list.
Thanks a lot!

read count distribution

Hello,

I am new to MSI analysis. I would like to get the length of the microsatellite and the number of reads. Can I get this information from MSI output?

The MSI_dis file contains counts, but I don't know whether I should sum all counts to get the read count for that sample. Also, how do I get the length of the microsatellite?

Many thanks,

failed to open output files to write

I'm trying to pipe the output of msisensor with a bash script and get this error with the following command:
for bam in BAMs/*.bam ; do msisensor msi -d GRCh37.msi.list -t $bam -e CDHS-15963Z-17266.GRCh37.roi.bed -o msisensor/${bam%.bam}.msi.txt -b 4 & ; done
The directories for BAMs and msisensor already exist before running the command.
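One possible cause, not confirmed in this thread: in the loop above, `$bam` is `BAMs/<name>.bam`, so `${bam%.bam}` keeps the `BAMs/` prefix and the output prefix becomes `msisensor/BAMs/<name>.msi.txt`. That points into a directory `msisensor/BAMs`, which may not exist. The Python sketch below shows the difference between that expansion and a prefix built from the file name alone; `output_prefix` and `sample1` are hypothetical names.

```python
from pathlib import PurePosixPath


def output_prefix(bam_path, out_dir, suffix=".msi.txt"):
    """Build an output prefix from the BAM file name only, dropping its directory."""
    stem = PurePosixPath(bam_path).stem  # "BAMs/sample1.bam" -> "sample1"
    return str(PurePosixPath(out_dir) / (stem + suffix))


bam = "BAMs/sample1.bam"
# Equivalent of the shell expansion msisensor/${bam%.bam}.msi.txt:
naive = "msisensor/" + bam[: -len(".bam")] + ".msi.txt"
fixed = output_prefix(bam, "msisensor")
print(naive)
print(fixed)
```

In the shell loop itself, taking the file name with `basename` before building the `-o` argument would have the same effect.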

what is homopolymer size?

hi,
I am currently testing MSIsensor and am confused about one parameter of msisensor scan and msi: homopolymer size. Can anyone give me some tips?

minimal comentropy threshold

Hi, I read the documentation about how comentropy values are used for tumor-only calling.
How does the '-i' parameter, the minimal comentropy threshold, get used? Is the default normally a reasonable value? When would it need to be changed, and what would you suggest?

Cheers,

Jim

Bed file not loading

Hi,

I am running the msisensor msi command as below:

msisensor msi -d microsatellites.list -n example_normal_bam.bam -t example_tumour_bam.bam –e my_targetedregions.bed -o target_region_msi.tsv

However, it seems like the bed file is not loading, as according to some of the answers that I have seen here, I would expect something like:

loading bed regions ...
loading homopolymer and microsatellite sites ...
Total loading windows: n

However, mine goes straight to:

loading homopolymer and microsatellite sites ...
Total loading windows: 5810
Total loading homopolymer and microsatellites: 1766632

Why is the command not just taking the windows in my bed file? Any help would be very much appreciated.
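One thing worth checking in the command above: the option before my_targetedregions.bed appears to be typed with an en dash (–e) rather than an ASCII hyphen (-e), which often happens when a command is copied from a word processor or a web page. If so, the option parser would probably not recognise it as -e, which would be consistent with the bed file not being loaded. This is a likely explanation, not a confirmed one. The Python sketch below flags such characters in a command string; `find_dash_lookalikes` is a hypothetical helper.

```python
# Characters that look like "-" but are not the ASCII hyphen-minus.
DASH_LOOKALIKES = {
    "\u2010": "hyphen",
    "\u2011": "non-breaking hyphen",
    "\u2012": "figure dash",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}


def find_dash_lookalikes(command):
    """Return (token, character name) pairs for tokens containing a non-ASCII dash."""
    hits = []
    for token in command.split():
        for char in token:
            if char in DASH_LOOKALIKES:
                hits.append((token, DASH_LOOKALIKES[char]))
                break
    return hits


cmd = "msisensor msi -d microsatellites.list \u2013e my_targetedregions.bed -o out"
print(find_dash_lookalikes(cmd))
```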

Thanks.

MSI score Plot

Hello,

I have the output files for tumor-only BAMs. I would like to run the Plot.R script, but I don't know which file I should use as input to Plot.R.

Here are my 3 output files:
tumorBam.MSI
tumorBaM.MSI_dis
tumorBam.MSI_somatic

Many thanks,
