sequenza's Introduction

sequenza

Sequenza workflow. Given a pair of cellularity and ploidy parameters, the function returns the most likely allele-specific copy numbers, with the corresponding log-posterior probability of the fit, for given values of B-allele frequency and depth ratio.
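The idea in the description above can be illustrated with a small sketch. This is not Sequenza's implementation: it assumes a normal copy number of 2, a simplified mixing model for the expected depth ratio and B-allele frequency, and a plain squared distance in place of the log-posterior probability. The function names are hypothetical.

```python
def expected_signal(cellularity, ploidy, cn_total, cn_minor, cn_normal=2):
    """Expected (depth ratio, BAF) under a simplified purity/ploidy mixing model."""
    tumor = cellularity * cn_total / cn_normal + (1 - cellularity)
    baseline = cellularity * ploidy / cn_normal + (1 - cellularity)
    depth_ratio = tumor / baseline
    # Assumption: normal cells carry exactly one copy of the B allele.
    numerator = cellularity * cn_minor + (1 - cellularity)
    denominator = cellularity * cn_total + (1 - cellularity) * cn_normal
    baf = numerator / denominator if denominator > 0 else 0.0
    return depth_ratio, baf


def best_copy_number(depth_ratio, baf, cellularity, ploidy, max_cn=6):
    """Return the (total, minor) copy numbers whose expected signal is closest."""
    best, best_dist = None, float("inf")
    for cn_total in range(max_cn + 1):
        # The minor allele can have at most half of the total copies.
        for cn_minor in range(cn_total // 2 + 1):
            exp_ratio, exp_baf = expected_signal(
                cellularity, ploidy, cn_total, cn_minor
            )
            dist = (exp_ratio - depth_ratio) ** 2 + (exp_baf - baf) ** 2
            if dist < best_dist:
                best, best_dist = (cn_total, cn_minor), dist
    return best
```

For example, `best_copy_number(1.4, 0.357, cellularity=0.8, ploidy=2.0)` picks a total copy number of 3 with 1 minor-allele copy under these assumptions.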

Overview

sequenza outputs

Dependencies

Usage

Cromwell

java -jar cromwell.jar run sequenza.wdl --inputs inputs.json

Inputs

Required workflow parameters:

Parameter | Value | Description
--- | --- | ---
snpFile | File | Data file with SNP calls from Varscan.
cnvFile | File | Data file with CNV calls from Varscan.
outputFileNamePrefix | String | Prefix for output file names.
reference | String | Version of the genome reference.
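As a rough illustration, the `inputs.json` passed to Cromwell could be generated as below. The keys assume Cromwell's `<workflow name>.<input name>` convention with a workflow named `sequenza` (inferred from `sequenza.wdl`, not confirmed here); all paths and values are placeholders.

```python
import json

# Assumption: the workflow is named "sequenza"; every value is a placeholder.
inputs = {
    "sequenza.snpFile": "/path/to/sample.snp",
    "sequenza.cnvFile": "/path/to/sample.copynumber",
    "sequenza.outputFileNamePrefix": "SAMPLE_01",
    "sequenza.reference": "hg38",
}

# Write the file that is handed to Cromwell via --inputs.
with open("inputs.json", "w") as handle:
    json.dump(inputs, handle, indent=2)
```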

Optional workflow parameters:

Parameter | Value | Default | Description
--- | --- | --- | ---
gammaRange | Array[String] | ["50", "100", "200", "300", "400", "500", "600", "700", "800", "900", "1000", "1250", "1500", "2000"] | List of gamma parameters for tuning the Sequenza segmentation step, used by the copynumber package.

Optional task parameters:

Parameter | Value | Default | Description
--- | --- | --- | ---
preprocessInputs.rScript | String | "$RSTATS_CAIRO_ROOT/bin/Rscript" | Path to Rscript
preprocessInputs.preprocessScript | String | "$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaPreProcess_v2.2.R" | Path to the preprocessing .R script
preprocessInputs.modules | String | "sequenza/2.1.2m sequenza-scripts/2.1.5m" | Modules needed to run the preprocessing step
preprocessInputs.timeout | Int | 20 | Timeout for this step in hours
preprocessInputs.jobMemory | Int | 38 | Memory allocated for this job
runSequenza.rScript | String | "$RSTATS_CAIRO_ROOT/bin/Rscript" | Path to Rscript
runSequenza.sequenzaScript | String | "$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaProcess_v2.2.R" | Sequenza wrapper script, instructions for running the pipeline
runSequenza.modules | String | "sequenza/2.1.2m sequenza-scripts/2.1.5m sequenza-res/2.1.2" | Names and versions of modules
runSequenza.female | String? | None | Logical, TRUE or FALSE. Default is TRUE
runSequenza.cancerType | String? | None | Acronym for cancer type (from ploidy table)
runSequenza.minReadsNormal | Float? | None | Threshold for the minimum number of depth ratio observations in a segment
runSequenza.minReadsBaf | Int? | None | Threshold for the minimum number of B-allele frequency observations in a segment
runSequenza.windowSize | Int | 100000 | Window size for segmentation
runSequenza.timeout | Int | 20 | Timeout in hours, needed to override imposed limits
runSequenza.jobMemory | Int | 24 | Memory allocated for this job
formatJson.jobMemory | Int | 8 | Memory allocated for this job
formatJson.width | Int | 1200 | Width of the summary plot
formatJson.height | Int | 400 | Height of the summary plot
formatJson.modules | String | "sequenza-scripts/2.1.5m rmarkdown/0.1m" | Names and versions of modules
formatJson.summaryPlotScript | String | "$SEQUENZA_SCRIPTS_ROOT/bin/plot_gamma_solutions.R" | Service script for plotting data from the gamma solutions file (summary plot)
formatJson.sequenzaRmd | String | "$SEQUENZA_SCRIPTS_ROOT/bin/SequenzaSummary.Rmd" | Path to the rmarkdown file for producing a .pdf report
formatJson.rScript | String | "$RSTATS_CAIRO_ROOT/bin/Rscript" | Path to Rscript

Outputs

Output | Type | Description
--- | --- | ---
resultZip | File | All results from Sequenza runs using the gamma sweep.
resultJson | File? | Combined JSON file with ploidy and contamination data.
gammaSummaryPlot | File | PNG summary plot showing the effect of different gamma values.
gammaMarkdownPdf | File | Rmarkdown PDF with all gamma-specific panels along with the gamma effect summary plot.

Commands

This section lists the commands run by the sequenza workflow.

  • Running sequenza

Sequenza produces the most likely allele-specific copy numbers for given values of B-allele frequency and depth ratio.

Preprocessing:

  Rscript PREPROCESS_SCRIPT -s VARSCAN_SNP_FILE -c VARSCAN_CNV_FILE -y TRUE -p PREFIX

Preparing data file using Varscan results:

 set -euo pipefail
 Rscript SEQUENZA_SCRIPT -s SEQZ_FILE -r REFERENCE -z GENOME_SIZE 
            -w WINDOW_SIZE 
            -g GAMMA 
            -p PREFIX 
            -l PLOIDY_FILE (Optional) 
            -f FEMALE_FLAG (Optional) 
            -t CANCER_TYPE (Optional) 
            -n MIN_READS_NORMAL (Optional) 
            -a MIN_READS_BAF (Optional)

 zip -qr PREFIX_results.zip sol* PREFIX*
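The command template above mixes required and optional flags. A minimal sketch of how such an argument list could be assembled, so that optional flags are passed only when a value is set, is shown below. The function name and its parameter names are hypothetical; only the flags themselves come from the template.

```python
def build_sequenza_args(script, seqz_file, reference, genome_size, window_size,
                        gamma, prefix, ploidy_file=None, female=None,
                        cancer_type=None, min_reads_normal=None,
                        min_reads_baf=None):
    """Assemble the Rscript argument list, adding optional flags only when set."""
    args = ["Rscript", script,
            "-s", seqz_file, "-r", reference, "-z", str(genome_size),
            "-w", str(window_size), "-g", str(gamma), "-p", prefix]
    optional = [("-l", ploidy_file), ("-f", female), ("-t", cancer_type),
                ("-n", min_reads_normal), ("-a", min_reads_baf)]
    for flag, value in optional:
        # Skip any optional flag that was not provided.
        if value is not None:
            args += [flag, str(value)]
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)`; building a list rather than a single string avoids shell quoting issues.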

Running analysis:

 ...
 
 In this section Sequenza runs for a range of gamma values (fragment shown):

 cellularity = []
 ploidy = []
 no_segments = []

 for g in gammas:
   print(g)
   solutions = pd.read_table(os.path.join("gammas", g, "~{prefix}" + '_alternative_solutions.txt'))
   row = solutions.loc[solutions['SLPP'].idxmax()]
   cellularity.append(float(row['cellularity']))
   ploidy.append(float(row['ploidy']))
   path_seg = os.path.join("gammas", g, "~{prefix}" + '_Total_CN.seg')
   no_segments.append(len(open(path_seg).readlines()) - 1)
 
 gamma_solutions = pd.DataFrame({"gamma": gammas,
                                 "cellularity": cellularity,
                                 "ploidy": ploidy,
                                 "no_segments": no_segments})
 gamma_solutions.to_csv('gamma_solutions.csv', index=False)

 ...
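The selection step in the fragment above (taking the row with the highest SLPP for each gamma) can be tried in isolation. The sketch below uses only the standard library instead of pandas, and the table contents are made-up values standing in for one `_alternative_solutions.txt` file; the column names follow the fragment.

```python
import csv
import io

# Made-up rows standing in for one PREFIX_alternative_solutions.txt file.
example = (
    "cellularity\tploidy\tSLPP\n"
    "0.45\t2.1\t-1205.3\n"
    "0.80\t3.9\t-1190.7\n"
    "0.30\t2.0\t-1230.1\n"
)


def best_solution(handle):
    """Return the row with the highest SLPP from a tab-separated table."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return max(rows, key=lambda row: float(row["SLPP"]))


best = best_solution(io.StringIO(example))
print(best["cellularity"], best["ploidy"])
```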

Support

For support, please file an issue on the GitHub project or send an email to [email protected].

Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)

sequenza's People

Contributors

alkabenawra, gavin-peng, hannahdriver, mlaszloffy, prisnir-zz, pruzanov, torchij

Forkers

g3chen qindan2008

sequenza's Issues

Is Sequenza suitable for panel sequencing data?

Hi, developers
We are analyzing the purity and ploidy of a set of tumor samples from panel sequencing (over 700 genes). The Sequenza results were obtained using version 3.0.0 with default parameters; the input was generated by the Python script sequenza-utils.py version 3.0.0 with the default bin size of 50 bases. We found that the estimated purity of most samples was 1 or close to 1, while other software did not give such values.
We would like to know whether Sequenza is suitable for estimating purity from panel sequencing data, or whether we need to adjust some parameters to make it fit.
Could you please give us some guidance? Thank you!

best,

issues with Processing chrY

[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
Collecting GC information ............................................................................... done

Processing chr1:
2694 variant calls.
355 copy-number segments.
174377 heterozygous positions.
3535919 homozygous positions.
Processing chr2:
2675 variant calls.
434 copy-number segments.
185981 heterozygous positions.
3462374 homozygous positions.
Processing chr3:
2002 variant calls.
636 copy-number segments.
161175 heterozygous positions.
3032613 homozygous positions.
Processing chr4:
2061 variant calls.
238 copy-number segments.
164571 heterozygous positions.
2642838 homozygous positions.
Processing chr5:
1914 variant calls.
429 copy-number segments.
140817 heterozygous positions.
2664844 homozygous positions.
Processing chr6:
54216 variant calls.
315 copy-number segments.
94488 heterozygous positions.
2571966 homozygous positions.
Processing chr7:
2472 variant calls.
293 copy-number segments.
134383 heterozygous positions.
3178129 homozygous positions.
Processing chr8:
1986 variant calls.
637 copy-number segments.
120658 heterozygous positions.
2342417 homozygous positions.
Processing chr9:
1364 variant calls.
257 copy-number segments.
104164 heterozygous positions.
1760848 homozygous positions.
Processing chr10:
1119 variant calls.
281 copy-number segments.
113969 heterozygous positions.
1883859 homozygous positions.
Processing chr11:
906 variant calls.
326 copy-number segments.
107182 heterozygous positions.
1737474 homozygous positions.
Processing chr12:
1272 variant calls.
336 copy-number segments.
104163 heterozygous positions.
1829704 homozygous positions.
Processing chr13:
740 variant calls.
174 copy-number segments.
76344 heterozygous positions.
1230465 homozygous positions.
Processing chr14:
1337 variant calls.
156 copy-number segments.
70741 heterozygous positions.
1614775 homozygous positions.
Processing chr15:
913 variant calls.
107 copy-number segments.
67094 heterozygous positions.
1268897 homozygous positions.
Processing chr16:
35014 variant calls.
46 copy-number segments.
34378 heterozygous positions.
1459538 homozygous positions.
Processing chr17:
796 variant calls.
207 copy-number segments.
65068 heterozygous positions.
1193466 homozygous positions.
Processing chr18:
729 variant calls.
151 copy-number segments.
63508 heterozygous positions.
1174898 homozygous positions.
Processing chr19:
615 variant calls.
61 copy-number segments.
54666 heterozygous positions.
893063 homozygous positions.
Processing chr20:
899 variant calls.
67 copy-number segments.
61025 heterozygous positions.
1021553 homozygous positions.
Processing chr21:
271 variant calls.
25 copy-number segments.
37636 heterozygous positions.
443041 homozygous positions.
Processing chr22:
609 variant calls.
131 copy-number segments.
40940 heterozygous positions.
720292 homozygous positions.
Processing chrX:
25209 variant calls.
7 copy-number segments.
2224 heterozygous positions.
1669982 homozygous positions.
Warning message:
In (function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1523)
Error in [<-.data.frame(*tmp*, tmp[, 8] > tmp[, 7], 7, value = c(1L, :
missing values are not allowed in subscripted assignments of data frames
Calls: scar_score -> preprocess.hrd -> [<- -> [<-.data.frame
Execution halted
