oicr-gsi / sequenza

Workflow for Sequenza, cellularity and ploidy

Shell 2.95% R 67.80% WDL 29.25%

rnaseq copy-number-variation cnv cnv-detection cnv-analysis

sequenza's Introduction

sequenza

Sequenza workflow. Given a pair of cellularity and ploidy parameters, the function returns the most likely allele-specific copy numbers with the corresponding log-posterior probability of the fit, for given values of B-allele frequency and depth ratio.

Overview

Dependencies

Usage

Cromwell

java -jar cromwell.jar run sequenza.wdl --inputs inputs.json

Inputs

Required workflow parameters:

Parameter	Value	Description
`snpFile`	File	Data file with SNV calls from Varscan.
`cnvFile`	File	Data file with CNV calls from Varscan.
`outputFileNamePrefix`	String	Prefix for output file names.
`reference`	String	Version of genome reference
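
As an illustration, a minimal `inputs.json` covering the required parameters could be generated as sketched below. The `sequenza.` key prefix assumes the workflow is named `sequenza` inside `sequenza.wdl`; the file paths, the prefix, and the reference value are placeholders, not values taken from this repository.

```python
import json

# Hypothetical values: replace with real paths and the reference name
# expected by your installation.
required_inputs = {
    "sequenza.snpFile": "/path/to/sample.snp",
    "sequenza.cnvFile": "/path/to/sample.copynumber",
    "sequenza.outputFileNamePrefix": "SAMPLE_01",
    "sequenza.reference": "hg38",
}


def write_inputs(inputs, path="inputs.json"):
    """Write a Cromwell inputs file and return the serialized text."""
    text = json.dumps(inputs, indent=2, sort_keys=True)
    with open(path, "w") as handle:
        handle.write(text + "\n")
    return text
```

Calling `write_inputs(required_inputs)` writes `inputs.json` to the current directory, where the Cromwell command shown under Usage can pick it up.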

Optional workflow parameters:

Parameter	Value	Default	Description
`gammaRange`	Array[String]	["50", "100", "200", "300", "400", "500", "600", "700", "800", "900", "1000", "1250", "1500", "2000"]	List of gamma parameters for tuning the Sequenza segmentation step, used by the copynumber package.

Optional task parameters:

Parameter	Value	Default	Description
`preprocessInputs.rScript`	String	"$RSTATS_CAIRO_ROOT/bin/Rscript"	Path to Rscript
`preprocessInputs.preprocessScript`	String	"$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaPreProcess_v2.2.R"	Path to the preprocessing .R script
`preprocessInputs.modules`	String	"sequenza/2.1.2m sequenza-scripts/2.1.5m"	Modules needed to run the preprocessing step
`preprocessInputs.timeout`	Int	20	Timeout for this step, in hours
`preprocessInputs.jobMemory`	Int	38	Memory allocated for this job
`runSequenza.rScript`	String	"$RSTATS_CAIRO_ROOT/bin/Rscript"	Path to Rscript
`runSequenza.sequenzaScript`	String	"$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaProcess_v2.2.R"	Sequenza wrapper script, instructions for running the pipeline
`runSequenza.modules`	String	"sequenza/2.1.2m sequenza-scripts/2.1.5m sequenza-res/2.1.2"	Names and versions of modules
`runSequenza.female`	String?	None	Logical, TRUE or FALSE. Default is TRUE
`runSequenza.cancerType`	String?	None	Acronym for cancer type (from ploidy table)
`runSequenza.minReadsNormal`	Float?	None	Threshold for the minimum number of depth ratio observations in a segment
`runSequenza.minReadsBaf`	Int?	None	Threshold for the minimum number of B-allele frequency observations in a segment
`runSequenza.windowSize`	Int	100000	Window size used for segmentation
`runSequenza.timeout`	Int	20	Timeout in hours, needed to override imposed limits
`runSequenza.jobMemory`	Int	24	Memory allocated for this job
`formatJson.jobMemory`	Int	8	Memory allocated for this job
`formatJson.width`	Int	1200	Width of the summary plot
`formatJson.height`	Int	400	Height of the summary plot
`formatJson.modules`	String	"sequenza-scripts/2.1.5m rmarkdown/0.1m"	Names and versions of modules
`formatJson.summaryPlotScript`	String	"$SEQUENZA_SCRIPTS_ROOT/bin/plot_gamma_solutions.R"	Helper script that draws the summary plot from the gamma solutions file
`formatJson.sequenzaRmd`	String	"$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaSummary.Rmd"	Path to rmarkdown file for producing a .pdf report
`formatJson.rScript`	String	"$RSTATS_CAIRO_ROOT/bin/Rscript"	Path to Rscript
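
Optional parameters can be overridden in the same inputs file. The sketch below assumes Cromwell's usual `workflow.task.parameter` key convention and that the workflow is named `sequenza`; the helper name `with_overrides` and the override values are arbitrary illustrations, not recommendations.

```python
WORKFLOW = "sequenza"  # assumed workflow name; check sequenza.wdl


def with_overrides(base, overrides):
    """Return a copy of `base` with overrides added under the workflow prefix."""
    merged = dict(base)
    for name, value in overrides.items():
        merged[f"{WORKFLOW}.{name}"] = value
    return merged


# Example overrides; the types follow the parameter tables above.
overrides = {
    "gammaRange": ["50", "200", "1000"],  # Array[String], so values stay strings
    "runSequenza.female": "FALSE",        # String?, passed through as text
    "runSequenza.windowSize": 50000,      # Int
    "runSequenza.jobMemory": 32,          # Int
}
```

The dictionary returned by `with_overrides(required, overrides)` can be serialized with `json.dump` alongside the required parameters.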

Outputs

Output	Type	Description
`resultZip`	File	All results from sequenza runs using gamma sweep.
`resultJson`	File?	Combined json file with ploidy and contamination data.
`gammaSummaryPlot`	File	PNG summary plot showing the effect of different gamma values
`gammaMarkdownPdf`	File	R Markdown PDF with all gamma-specific panels along with the gamma effect summary plot
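
To check what a finished run put into `resultZip`, the archive can be listed without unpacking it. This is a small sketch using only the standard library; the function names are illustrative and nothing here is specific to this workflow beyond the archive being a regular zip file.

```python
import zipfile


def list_results(zip_path):
    """Return the sorted member names of a results archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def top_level_entries(names):
    """Collapse member names to their first path component."""
    return sorted({name.split("/", 1)[0] for name in names})
```

For example, `top_level_entries(list_results("PREFIX_results.zip"))` gives a quick overview of the files and directories at the top of the archive.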

Commands

This section lists the commands run by the sequenza workflow.

Running sequenza

Sequenza produces the most likely allele-specific copy numbers for given values of B-allele frequency and depth ratio

Preprocessing:

  Rscript PREPROCESS_SCRIPT -s VARSCAN_SNP_FILE -c VARSCAN_CNV_FILE -y TRUE -p PREFIX

Running Sequenza on the data file prepared from the Varscan results:

 set -euo pipefail
 Rscript SEQUENZA_SCRIPT -s SEQZ_FILE -r REFERENCE -z GENOME_SIZE 
            -w WINDOW_SIZE 
            -g GAMMA 
            -p PREFIX 
            -l PLOIDY_FILE (Optional) 
            -f FEMALE_FLAG (Optional) 
            -t CANCER_TYPE (Optional) 
            -n MIN_READS_NORMAL (Optional) 
            -a MIN_READS_BAF (Optional)

 zip -qr PREFIX_results.zip sol* PREFIX*
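
The optional flags in the command above are only meaningful when a value is supplied. One way to express that is to build the argument list programmatically, as in the sketch below. The flag letters are the ones listed above; the function name and argument names are hypothetical, and this is not how the WDL task itself assembles the command.

```python
def build_sequenza_command(script, seqz_file, reference, genome_size,
                           window_size, gamma, prefix,
                           ploidy_file=None, female=None, cancer_type=None,
                           min_reads_normal=None, min_reads_baf=None):
    """Build the Rscript argument list; optional flags are added only when set."""
    command = ["Rscript", script,
               "-s", seqz_file, "-r", reference, "-z", str(genome_size),
               "-w", str(window_size), "-g", str(gamma), "-p", prefix]
    optional = [("-l", ploidy_file), ("-f", female), ("-t", cancer_type),
                ("-n", min_reads_normal), ("-a", min_reads_baf)]
    for flag, value in optional:
        # Skip anything the caller left unset so the script uses its own default
        if value is not None:
            command += [flag, str(value)]
    return command
```

The returned list can be passed to `subprocess.run`, or turned into a shell-quoted string with `shlex.join` for logging.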

Running analysis:

 ...
 
 After Sequenza has run for a range of gamma values, the best solution for each gamma is collected (fragment shown):

 # Per-gamma summaries. `gammas`, `os` and `pd` (pandas) are assumed to be
 # defined in the elided code above.
 cellularity = []
 ploidy = []
 no_segments = []

 for g in gammas:
   print(g)
   solutions = pd.read_table(os.path.join("gammas", g, "~{prefix}" + '_alternative_solutions.txt'))
   # Keep the solution with the highest SLPP for this gamma
   row = solutions.loc[solutions['SLPP'].idxmax()]
   cellularity.append(float(row['cellularity']))
   ploidy.append(float(row['ploidy']))
   path_seg = os.path.join("gammas", g, "~{prefix}" + '_Total_CN.seg')
   # Number of segments = line count minus the header line
   with open(path_seg) as seg_file:
     no_segments.append(len(seg_file.readlines()) - 1)
 
 # One row per gamma, written for the summary plot
 gamma_solutions = pd.DataFrame({"gamma": gammas,
                                 "cellularity": cellularity,
                                 "ploidy": ploidy,
                                 "no_segments": no_segments})
 gamma_solutions.to_csv('gamma_solutions.csv', index=False)

 ...
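
The selection rule used in the fragment (take the row with the largest `SLPP`, count segment lines minus the header) can be tried in isolation. The sketch below uses only the standard library instead of pandas, and the table in `EXAMPLE` is invented for illustration; only the column names `cellularity`, `ploidy` and `SLPP` come from the fragment.

```python
import csv
import io

# Toy stand-in for PREFIX_alternative_solutions.txt; the numbers are invented.
EXAMPLE = (
    "cellularity\tploidy\tSLPP\n"
    "0.62\t2.1\t-1520.4\n"
    "0.31\t4.0\t-1498.7\n"
    "0.90\t1.9\t-1611.0\n"
)


def best_solution(handle):
    """Return (cellularity, ploidy) from the row with the largest SLPP."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    if not rows:
        raise ValueError("no solutions found")
    best = max(rows, key=lambda row: float(row["SLPP"]))
    return float(best["cellularity"]), float(best["ploidy"])


def count_segments(handle):
    """Count data rows in a .seg file: total lines minus the header line."""
    return sum(1 for _ in handle) - 1
```

With the toy table, `best_solution(io.StringIO(EXAMPLE))` picks the second row, since -1498.7 is the largest `SLPP` value.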

Support

For support, please file an issue on the GitHub project or send an email to [email protected].

Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)

sequenza's Issues

Is Sequenza suitable for panel sequencing data?

Hi developers,
We are analyzing the purity and ploidy of a set of tumor samples from panel sequencing (over 700 genes). The Sequenza results were obtained using version 3.0.0 with default parameters; the input was generated by the Python script sequenza-utils.py version 3.0.0 with the default bin size of 50 bases. We found that the purity of most samples was 1 or close to 1, while other software gave different results.
We would like to know whether Sequenza is suitable for estimating purity from panel sequencing data, or whether we need to adjust some parameters to make it fit.
Could you please give us some guidance? Thank you!

best,

issues with Processing chrY

[mpileup] 1 samples in 1 input files
(line repeated 48 times)
Collecting GC information ............................................................................... done

Processing chr1:
2694 variant calls.
355 copy-number segments.
174377 heterozygous positions.
3535919 homozygous positions.
Processing chr2:
2675 variant calls.
434 copy-number segments.
185981 heterozygous positions.
3462374 homozygous positions.
Processing chr3:
2002 variant calls.
636 copy-number segments.
161175 heterozygous positions.
3032613 homozygous positions.
Processing chr4:
2061 variant calls.
238 copy-number segments.
164571 heterozygous positions.
2642838 homozygous positions.
Processing chr5:
1914 variant calls.
429 copy-number segments.
140817 heterozygous positions.
2664844 homozygous positions.
Processing chr6:
54216 variant calls.
315 copy-number segments.
94488 heterozygous positions.
2571966 homozygous positions.
Processing chr7:
2472 variant calls.
293 copy-number segments.
134383 heterozygous positions.
3178129 homozygous positions.
Processing chr8:
1986 variant calls.
637 copy-number segments.
120658 heterozygous positions.
2342417 homozygous positions.
Processing chr9:
1364 variant calls.
257 copy-number segments.
104164 heterozygous positions.
1760848 homozygous positions.
Processing chr10:
1119 variant calls.
281 copy-number segments.
113969 heterozygous positions.
1883859 homozygous positions.
Processing chr11:
906 variant calls.
326 copy-number segments.
107182 heterozygous positions.
1737474 homozygous positions.
Processing chr12:
1272 variant calls.
336 copy-number segments.
104163 heterozygous positions.
1829704 homozygous positions.
Processing chr13:
740 variant calls.
174 copy-number segments.
76344 heterozygous positions.
1230465 homozygous positions.
Processing chr14:
1337 variant calls.
156 copy-number segments.
70741 heterozygous positions.
1614775 homozygous positions.
Processing chr15:
913 variant calls.
107 copy-number segments.
67094 heterozygous positions.
1268897 homozygous positions.
Processing chr16:
35014 variant calls.
46 copy-number segments.
34378 heterozygous positions.
1459538 homozygous positions.
Processing chr17:
796 variant calls.
207 copy-number segments.
65068 heterozygous positions.
1193466 homozygous positions.
Processing chr18:
729 variant calls.
151 copy-number segments.
63508 heterozygous positions.
1174898 homozygous positions.
Processing chr19:
615 variant calls.
61 copy-number segments.
54666 heterozygous positions.
893063 homozygous positions.
Processing chr20:
899 variant calls.
67 copy-number segments.
61025 heterozygous positions.
1021553 homozygous positions.
Processing chr21:
271 variant calls.
25 copy-number segments.
37636 heterozygous positions.
443041 homozygous positions.
Processing chr22:
609 variant calls.
131 copy-number segments.
40940 heterozygous positions.
720292 homozygous positions.
Processing chrX:
25209 variant calls.
7 copy-number segments.
2224 heterozygous positions.
1669982 homozygous positions.
Warning message:
In (function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1523)
Error in `[<-.data.frame`(`*tmp*`, tmp[, 8] > tmp[, 7], 7, value = c(1L, :
missing values are not allowed in subscripted assignments of data frames
Calls: scar_score -> preprocess.hrd -> [<- -> [<-.data.frame
Execution halted
