raerose01 / deconstructsigs Goto Github PK


deconstructSigs

R 64.56% HTML 28.63% JavaScript 6.04% CSS 0.78%

deconstructsigs's Issues

error in converting vcf to sigs.input

I get the following error when I try to convert a vcf file into the correct input format for deconstructSigs via the command
vcf.to.sigs.input(vcf = "path_to_file/file1.vcf")
Error: scanVcf: invalid class “VCFHeader” object: 'info(VCFHeader)' must be a 3 column DataFrame with names Number, Type, Description
path: path_to_file/file1.vcf

Use with non-human genome

Hi,

It's not clear to me whether setting the bsg option to another genome in mut.to.sigs.input() (e.g. mut.to.sigs.input(bsg = BSgenome.Dmelanogaster.UCSC.dm6)) is sufficient to apply the correct normalisation in whichSignatures(tri.counts.method = 'genome').

Your README says:

Included with the package are tri.counts.exome, which contains the trinucleotide counts for an exome and tri.counts.genome, which contains the trinucleotide counts for the hg19 genome

Does this change if we supply a different genome? Or will it always normalise to counts in hg19?

Thanks,

Nick

Many Samples in my Maf files

Hi, I have around 90 samples (taken from the same tissue in different patients) in my .maf files. How should I determine the signatures contributing to all 90 samples in one go? It is very tedious to provide sample.id every time. It would be really helpful if you could help me with this.

For example:

sample_1 = whichSignatures(tumor.ref = sigs.input,
signatures.ref = signatures.nature2013,
sample.id = 1,
contexts.needed = TRUE,
tri.counts.method = 'default')

Should I manually change this every time and generate a graph, or can I run these 90 samples together and generate the graphs at once? Running every sample manually is very confusing. Please help.

Thanks a ton,
Zac
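
A minimal sketch of looping over all samples instead of editing sample.id by hand. It assumes sigs.input is the data frame produced by mut.to.sigs.input, with one row per sample and the sample IDs as row names. `fit_all_samples` is a hypothetical helper; the commented lines at the end show how it might wrap the whichSignatures call from the post (not run here).

```r
# Hypothetical helper: run a per-sample fitting function on every row of a
# samples-by-contexts table and return the results as a list named by sample ID.
fit_all_samples <- function(tumor.ref, fit_one) {
  ids <- rownames(tumor.ref)
  results <- lapply(ids, function(id) fit_one(tumor.ref, id))
  setNames(results, ids)
}

# Possible use with deconstructSigs (not run here):
# all_fits <- fit_all_samples(sigs.input, function(ref, id) {
#   whichSignatures(tumor.ref = ref, signatures.ref = signatures.nature2013,
#                   sample.id = id, contexts.needed = TRUE,
#                   tri.counts.method = 'default')
# })
# pdf("all_samples.pdf")
# for (id in names(all_fits)) plotSignatures(all_fits[[id]])
# dev.off()
```

Keeping the results in a named list means each sample's fit can be looked up later by its ID.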

The ESCA data

Hi,

I am working on implementing this tool in our pipeline and wanted to compare it with other available tools.
I see you used the TCGA-ESCA data in the paper, but the signature predictions are not in the Supplementary tables.
Please let me know if they are in another file, or if you could provide me with the table so I can use it as a benchmark (just as the other datasets are provided in Additional File 6 of the paper).

Whole Genome normalization question

Hi,

I am a bit confused about when to set tri.counts.method to 'genome'.
For the 70-30 simulation dataset, tri.counts.method should be 'default', right?

So if I want to test how the tool performs with genome normalization, which simulated dataset should I use? Or is there a way to make the 70-30 simulation dataset based on tri.counts.genome?

"Check ref bases -- not all match context" warning message

I was testing out this package with a public dataset as per this script here, and got this error message:

Warning message:
    In mut.to.sigs.input(mut.ref = ccle, sample.id = "Tumor_Sample_Barcode",  :
                             Check ref bases -- not all match context:
                             22RV1_PROSTATE:chrM:9477:G:A, 22RV1_PROSTATE:chrM:14323:G:A, 253JBV_URINARY_TRACT:chrM:10589:G:A, 253JBV_URINARY_TRACT:chrM:13768:T:C, AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:12720:A:G, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:6179:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:8684:C:T, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:14470:T:C, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15148:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15355:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15487:A:T, BICR31_UPPER_AERODIGESTIVE_TRACT:chrM:14793:A:G, BICR31_UPPER_AERODIGESTIVE_TRACT:chrM:15218:A:G, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:8697:G:A, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:13928:G:C, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:14905:G:A, BICR6_UPPER_AERODIGESTIVE_TRACT:chrM:6365:T:C, BL41_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:11251:A:G, BL41_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:12612:A:G, BL41_HAEMATOPOIETIC_AND_LY [... truncated]

I found the origin of this message in the package source code here, but I am not clear on its significance. What exactly does it mean? And is there a way I can clean the data to avoid the issue? Thanks.
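
One possible clean-up, assuming the mismatches are concentrated on the mitochondrial chromosome as in the excerpt: the hg19 UCSC chrM sequence is not the rCRS sequence that many pipelines call mitochondrial variants against, so reference bases there can disagree with the BSgenome context. The sketch simply drops mitochondrial rows before calling mut.to.sigs.input; the column name `Chromosome` and the helper name `drop_chrM` are assumptions.

```r
# Hypothetical helper: remove mitochondrial variants before mut.to.sigs.input().
drop_chrM <- function(mut, chr_col = "Chromosome") {
  chr <- as.character(mut[[chr_col]])
  # Cover both UCSC-style and NCBI-style names for the mitochondrial genome
  is_mt <- chr %in% c("chrM", "M", "MT", "chrMT")
  message(sum(is_mt), " mitochondrial variant(s) removed")
  mut[!is_mt, , drop = FALSE]
}

# ccle <- drop_chrM(ccle)
```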

More normalization

Hi,
I am still struggling to understand which normalization procedure would make sense for WES versus WGS, both from your previous discussions, and also from the documentation that goes with the deconstructSigs R package. According to the package docs: " For exome data, the 'exome2genome' method is appropriate for the signatures included in this package. For whole genome data, use the 'default' method to obtain consistent results.". From the deconstructSigs GitHub page it is however stated that "For exome data, the default method is appropriate for the signatures included in this package.". I am using the 'signatures.cosmic' as the set of known signatures.

best,
Sigve

vcf.to.sigs.input() size of alt1 is empty

If a sample in the VCF has all of its variants with the same genotype, '0/1', then alt1 in vcf.to.sigs.input() ends up empty and the process breaks.

I think it would be better to check the length of alt1 before combining it into mut, to prevent this bug.
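
A sketch of the suggested check, written as a standalone helper because the package internals are not reproduced here. `combine_alleles` is a hypothetical name; its arguments stand in for per-allele tables such as alt1 that get combined into mut, and empty or NULL pieces are skipped instead of being combined.

```r
# Hypothetical helper: row-bind only the non-empty pieces, so a sample whose
# variants all share one genotype does not break the combination step.
combine_alleles <- function(...) {
  pieces <- list(...)
  keep <- vapply(pieces, function(p) !is.null(p) && nrow(p) > 0, logical(1))
  if (!any(keep)) {
    return(NULL)  # nothing to combine for this sample
  }
  do.call(rbind, pieces[keep])
}
```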

Mutation Limit

The mutation limit of the "mut.to.sigs.input" tool is 2,000, but I need to analyze more. What can I do?

Thanks.

normalization question for whole genome

Hi,

This package seems great! Thank you!

I have some whole-genome sequencing data that I wanted to try, but I am a bit confused about whether I have to set tri.counts.method to 'genome' or not. I want to compare them to the signatures.nature2013 signatures.

Any help would be great!

Thanks

makePie fails with only 1 signature weight non-zero and/or with unknown=0

If makePie() is called when only one signature weight is non-zero, it fails because of a missing ", drop=FALSE". If it is then called, after that fix is added, with only one signature and with unknown=0, it also fails in the palette() function call (which can be fixed by adding a dummy second color to colors.sigs.present).
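
A small base-R illustration of both failure modes described above, using toy weights. Nothing here calls makePie(); it only shows why `drop = FALSE` is needed when a single column survives, and why a one-element color vector needs padding before it reaches palette(), which treats a single string as the name of a built-in palette.

```r
# Toy weights for one sample where only one signature is non-zero.
w <- matrix(c(0.8, 0), nrow = 1,
            dimnames = list("sample1", c("Signature.1", "Signature.2")))
present <- w[1, ] > 0

# Without drop = FALSE, keeping a single column collapses the result to a
# bare number, so code that expects a matrix (columns, colnames) breaks.
dropped <- w[, present]
kept <- w[, present, drop = FALSE]

# A single color would be read by palette() as a palette name, so pad it
# with a dummy second color.
colors.sigs.present <- "#1B9E77"
if (length(colors.sigs.present) == 1) {
  colors.sigs.present <- c(colors.sigs.present, "white")
}
```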

Error on installation

Hi,

I am getting the error below when attempting to install (R version 3.4.4).
Is there another recommended way of installing?

> install.packages("deconstructSigs")
Installing package into ‘/home/lmose/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
Warning: dependencies ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘GenomeInfoDb’ are not available
trying URL 'https://cloud.r-project.org/src/contrib/deconstructSigs_1.8.0.tar.gz'
Content type 'application/x-gzip' length 211160 bytes (206 KB)
==================================================
downloaded 206 KB

ERROR: dependencies ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘GenomeInfoDb’ are not available for package ‘deconstructSigs’
* removing ‘/home/lmose/R/x86_64-pc-linux-gnu-library/3.4/deconstructSigs’

The downloaded source packages are in
	‘/tmp/RtmpGcYflL/downloaded_packages’
Warning message:
In install.packages("deconstructSigs") :
  installation of package ‘deconstructSigs’ had non-zero exit status

Filtering exome SNVs

Hi,
I want to use deconstructSigs on my WES data. However, there are a lot of SNVs in introns or UTRs. Should I filter these SNVs and keep only those in the CDS? Thanks!
Yang

normalization question for whole exome sequencing data

Hi,

It is a useful tool for mutation feature analysis.

I have some whole-exome/genome sequencing data that I want to compare to the signatures.nature2013 (or signatures.cosmic) signatures. I am a bit confused about which tri.counts.method I should set: 'exome' or 'exome2genome' for WES, and 'genome' for WGS?

thanks

Trinucleotide Counts for More Genomes

Could you add counts for hg38 and mm10, which are more modern and more commonly used today?

chr name problem for GRCh38

I'm using deconstructSigs with the GRCh38 genome. I found that the chromosome names in the BSgenome object are 1, 2, 3, etc., not 'chr1', 'chr2', ...

However, in the mut.to.sigs.input function there's one line:
levels(mut[, chr]) <- sub("^([0-9XY])", "chr\\1", levels(mut[, chr]))
that adds the 'chr' prefix and leads to the error:
Check chr names -- not all match BSgenome.Hsapiens.NCBI.GRCh38 object:
chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX

Is there any way to get rid of that line?
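
A sketch of what the quoted line does and how it could be undone, assuming the chromosome column is a factor as in that line. `add_chr_prefix` mirrors the quoted regex and `strip_chr_prefix` is a hypothetical inverse. Stripping only helps if it runs after the prefix has been added, so passing a UCSC-style BSgenome (whose sequence names carry the "chr" prefix) may be the simpler workaround.

```r
# Mirrors the quoted package line: prefix names that start with a digit, X or Y.
add_chr_prefix <- function(x) {
  x <- factor(x)
  levels(x) <- sub("^([0-9XY])", "chr\\1", levels(x))
  x
}

# Hypothetical inverse for NCBI-style names such as those in
# BSgenome.Hsapiens.NCBI.GRCh38.
strip_chr_prefix <- function(x) {
  sub("^chr", "", as.character(x))
}
```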

Unknown error with my data

Hi,

I keep getting this error:

sigs.input <- mut.to.sigs.input(mut.ref = a,
                                sample.id = "Sample",
                                chr = "chr",
                                pos = "pos",
                                ref = "ref",
                                alt = "alt")

Error in `[.data.frame`(mut.full, , c(sample.id, chr, pos, ref, alt)) :
undefined columns selected

This is a link to my data:
https://www.dropbox.com/s/1rf9cps97znae03/myfile.txt?dl=0
Could you please have a look and see why it is failing?
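
An "undefined columns selected" error generally means that one of the names passed as sample.id, chr, pos, ref or alt is not an exact column name of mut.ref; differences in case, stray whitespace, or a header line that was not read as a header are common causes. A hypothetical pre-flight check such as `check_columns` reports exactly which names are missing.

```r
# Hypothetical helper: stop with an informative message if any requested
# column name is absent from the data frame.
check_columns <- function(mut, cols) {
  missing_cols <- setdiff(cols, colnames(mut))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "),
         ". Available: ", paste(colnames(mut), collapse = ", "))
  }
  invisible(TRUE)
}

# a <- read.delim("myfile.txt", header = TRUE)
# check_columns(a, c("Sample", "chr", "pos", "ref", "alt"))
```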

change the colors of the plotSignatures function

Could you change the colors to the standard ones used by COSMIC?
Thanks in advance.

Weighting of mutations

Hi,

Thanks for a great package!

I'm using it for tumor-only sequencing with our https://github.com/lima1/PureCN tool. PureCN essentially predicts somatic status by adjusting allele frequencies for tumor purity and allele-specific copy number. This works pretty well, and I get a decent correlation of deconstructSigs scores with matched tumor/normal results. But I was wondering if you see an easy way to incorporate the uncertainty of the somatic prediction. Essentially, weighting mutations by their somatic posterior probability would be perfect. (For example, copy number losses in particular can result in cases where germline and somatic variants have very similar expected allele frequencies.)

I understand if you don't have the bandwidth for this. But if you see an easy solution, I'm happy to dig into the code and method.

Thanks in advance,
Markus
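
A minimal sketch of the weighting idea, assuming each mutation already has a context label (written here in the A[C>T]G style) and a somatic posterior: sum the posteriors per context instead of counting mutations, then rescale to fractions. `weighted_profile` is hypothetical, and whether whichSignatures accepts such a non-integer profile unchanged is an assumption.

```r
# Hypothetical helper: posterior-weighted mutation profile over a fixed
# set of contexts, returned as fractions summing to 1.
weighted_profile <- function(context, posterior, all_contexts) {
  # Sum posteriors within each context; contexts with no mutations give NA
  w <- tapply(posterior, factor(context, levels = all_contexts), sum)
  w[is.na(w)] <- 0
  out <- as.vector(w) / sum(w)
  names(out) <- all_contexts
  out
}
```

With all posteriors equal to 1 this reduces to the ordinary count profile divided by its total.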

"signature weights" no longer displayed in graphics

I have encountered two problems while using deconstructSigs.
I used deconstructSigs a while ago and everything worked fine, but lately the title of the graph and the weights of the different signatures are no longer displayed in the graphics (after running plotSignatures). Additionally, in some cases makePie() fails and shows a white pie labelled "1".

Does anyone have an idea how I can resolve these problems?

Thanks

Possible to save `plotSignatures` object and print later?

If I run

p <- plotSignatures(signatures, sub = 'signatures.cosmic')

The plot outputs to the graphics device, and p becomes NULL with warnings:

Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cin" cannot be set
2: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cra" cannot be set
3: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "csi" cannot be set
4: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cxy" cannot be set
5: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "din" cannot be set
6: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "page" cannot be set
7: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cin" cannot be set
8: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cra" cannot be set
9: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "csi" cannot be set
10: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "cxy" cannot be set
11: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "din" cannot be set
12: In doTryCatch(return(expr), name, parentenv, handler) :
  graphical parameter "page" cannot be set

I want to be able to render the plot later, in a different location, potentially on a different system that does not have deconstructSigs installed. Passing around a pre-rendered PDF or PNG is not desirable, since it won't scale to fit the final location for rendering. Is there a way to do this?

hg38 support

Hi there!

I am attempting to use your R tool to analyze some VCF files and deconstruct the signatures within them. I am using hg38 as a reference, and when I set bsg to BSgenome.Hsapiens.UCSC.hg38, I get the error "undefined columns selected" while running mut.to.sigs.input.

Is there support for hg38? I have been trying to find a workaround, but with the reference tied up in the BSgenome library, it's hard to develop one.

Re: trinucleotide frequency of exome and genome

Why does the trinucleotide frequency data (tri.counts.exome or tri.counts.genome) in the deconstructSigs package have only 32 values instead of 96?
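
A likely explanation, sketched below: the 96 mutation types are 32 trinucleotide contexts times 3 possible substitutions of the middle base, where each trinucleotide is merged with its reverse complement so that the middle base is a pyrimidine (C or T). A normalization table then needs only one count per context. The code only enumerates the contexts in base R; it does not read the package's data.

```r
bases <- c("A", "C", "G", "T")
# All 4^3 = 64 trinucleotides
g <- expand.grid(b1 = bases, b2 = bases, b3 = bases, stringsAsFactors = FALSE)
tri <- paste0(g$b1, g$b2, g$b3)

revcomp <- function(s) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  vapply(strsplit(s, ""), function(ch) paste(rev(comp[ch]), collapse = ""),
         character(1))
}

# Merge each trinucleotide with its reverse complement, keeping the strand
# whose middle base is a pyrimidine (C or T).
middle <- substr(tri, 2, 2)
collapsed <- ifelse(middle %in% c("C", "T"), tri, revcomp(tri))
contexts <- unique(collapsed)

# Each context's middle base can change to 3 other bases.
n_types <- sum(vapply(contexts,
                      function(ctx) length(setdiff(bases, substr(ctx, 2, 2))),
                      integer(1)))
```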

A suspected bug leading to "Error in `[.data.frame`(x, i, j) : undefined columns selected" in "mut.to.sigs.input"

When using mut.to.sigs.input on a data.frame as below:

      sample.id  Chr     Start Ref Alt
1: P001A_Tumor1 chr1  22304831   C   A
2: P001A_Tumor1 chr3  56835761   G   A
3: P001A_Tumor1 chr3 155197843   G   A
4: P001A_Tumor1 chr3 195978246   T   C
5: P001A_Tumor1 chr4   9706544   G   C
6: P001A_Tumor1 chr4  39975008   C   G


input <- mut.to.sigs.input(
    a.mut.ref,
    sample.id='sample.id',
    chr='Chr',
    pos='Start',
    ref='Ref',
    alt='Alt'
)

I get an error:

Error in `[.data.frame`(x, i, j) : undefined columns selected

This error can be fixed by renaming the "sample.id" column of the data.frame to something else.

It seems that a column of the data.frame cannot be named "sample.id".

My session info is:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.11.4     dplyr_0.7.6           stringr_1.3.1         deconstructSigs_1.8.0
[5] BiocInstaller_1.30.0  RevoUtils_11.0.1      RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18                      pillar_1.3.0
 [3] compiler_3.5.1                    GenomeInfoDb_1.16.0
 [5] bindr_0.1.1                       XVector_0.20.0
 [7] zlibbioc_1.26.0                   bitops_1.0-6
 [9] tools_3.5.1                       lattice_0.20-35
[11] tibble_1.4.2                      BSgenome_1.48.0
[13] pkgconfig_2.0.2                   rlang_0.2.2
[15] Matrix_1.2-14                     DelayedArray_0.6.5
[17] rstudioapi_0.7                    yaml_2.2.0
[19] parallel_3.5.1                    bindrcpp_0.2.2
[21] GenomeInfoDbData_1.1.0            rtracklayer_1.40.6
[23] Biostrings_2.48.0                 S4Vectors_0.18.3
[25] IRanges_2.14.11                   grid_3.5.1
[27] stats4_3.5.1                      tidyselect_0.2.4
[29] Biobase_2.40.0                    glue_1.3.0
[31] R6_2.2.2                          BSgenome.Hsapiens.UCSC.hg19_1.4.0
[33] XML_3.98-1.16                     BiocParallel_1.14.2
[35] purrr_0.2.5                       magrittr_1.5
[37] matrixStats_0.54.0                GenomicAlignments_1.16.0
[39] Rsamtools_1.32.3                  BiocGenerics_0.26.0
[41] GenomicRanges_1.32.6              SummarizedExperiment_1.10.1
[43] assertthat_0.2.0                  stringi_1.1.7
[45] RCurl_1.95-4.11                   crayon_1.3.4

Thanks a lot.

subscript out of bounds

Hi,
I want to use deconstructSigs to analyze the mutational signatures of lung cancer patients from their whole-exome sequencing data.
I ran the commands following the instructions on the page "https://github.com/raerose01/deconstructSigs".

After loading the data:
> str(dat4)
'data.frame': 9623 obs. of 5 variables:
$ X : int 13 13 13 13 13 13 13 13 13 13 ...
$ Chr: chr "chr1" "chr1" "chr1" "chr1" ...
$ End: int 2461397 2461418 2461421 3328849 3328852 3328858 3328861 3328873 3328876 3328889 ...
$ Ref: chr "G" "G" "G" "C" ...
$ Alt: chr "A" "C" "A" "T" ...

I ran:

sigs.input <- mut.to.sigs.input(mut.ref = dat4,
                                sample.id = "X",
                                chr = "Chr",
                                pos = "End",
                                ref = "Ref",
                                alt = "Alt")

I got this:
Error in `[<-`(`*tmp*`, i, trimer, value = 21L) : subscript out of bounds
What does "subscript out of bounds" mean?
Can you tell me what is wrong?
Thank you!
Zack
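
A plausible cause, sketched in base R: the sample.id column X holds integers (13), and if those values are used to index rows of a per-sample result matrix, R treats 13 as a row position rather than a row name. The toy `result` matrix is a stand-in, not the package's internal object; converting the ID column to character before calling mut.to.sigs.input is the suggested workaround.

```r
# Toy result matrix: one row per sample, rows named by sample ID.
result <- matrix(0, nrow = 1, ncol = 2,
                 dimnames = list("13", c("ACA", "ACC")))

id_numeric <- 13L
id_character <- as.character(id_numeric)

# A numeric index is a row position: row 13 does not exist in a 1-row matrix.
err <- tryCatch({
  result[id_numeric, "ACA"] <- 21L
  NULL
}, error = function(e) conditionMessage(e))

# A character index is matched against the row names, so this works.
result[id_character, "ACA"] <- 21L

# Suggested fix on the input before calling mut.to.sigs.input():
# dat4$X <- as.character(dat4$X)
```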

Normalization method for RNA-Seq data?

I was able to get the package to work with variants called from RNA-Seq data. However, I am not sure what the optimal normalization method for this is. The package description only mentions exome and whole genome data. Any suggestions? Are there details on how to determine & create your own normalization method to use?

How to make signatures for multiple samples at once?

I have .vcf files for many samples. I would like to produce a single plot that shows the signatures for all of them. I am not clear how to accomplish this, since the whichSignatures function seems to only take one sample at a time. Suggestions?
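
One way to get a single figure, sketched under the assumption that each whichSignatures() result is a list whose `weights` element is a one-row data frame of signature weights (worth checking against your installed version). `stack_weights` and `plot_weights` are hypothetical helpers; the plot is a base-graphics stacked bar chart with one bar per sample.

```r
# Hypothetical helper: stack the one-row weights of several fits into a
# samples-by-signatures matrix, with rows named by sample.
stack_weights <- function(fits) {
  rows <- lapply(fits, function(f) as.matrix(f$weights))
  w <- do.call(rbind, rows)
  rownames(w) <- names(fits)
  w
}

# Hypothetical helper: one stacked bar per sample, written to a PDF file.
plot_weights <- function(w, file) {
  pdf(file)
  on.exit(dev.off())
  barplot(t(w), legend.text = colnames(w), las = 2, ylab = "Weight")
  invisible(file)
}
```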

problem with mut.to.sigs.input

Dear raerose01,
I am using deconstructSigs for the first time and having difficulty. I have a data frame of somatic mutations in the format:
sample.id chr pos ref alt
sample1 6 32157652 C G
When I try to import this using mut.to.sigs.input I get an error

sig<-mut.to.sigs.input(ds,sample.id="sample.id",chr="chr",pos="pos",ref="ref",alt="alt",bsg=BSgenome.Hsapiens.UCSC.hg19)

Error in .local(x, ...) :
'start', 'end' and 'width' can only be specified when 'names' is either missing, a character vector/factor, a character-Rle, or a factor-Rle

Please help,
Juan
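
One likely issue: the chr column holds bare numbers (6), while BSgenome.Hsapiens.UCSC.hg19 names its sequences "chr6" and so on. The hypothetical helper below coerces the column to character and adds the UCSC prefix where it is missing. It does not map mitochondrial names ("MT" would become "chrMT", not the UCSC "chrM").

```r
# Hypothetical helper: turn 6 / "6" / "chr6" into "chr6".
to_ucsc_chr <- function(chr) {
  chr <- as.character(chr)
  needs_prefix <- !grepl("^chr", chr)
  chr[needs_prefix] <- paste0("chr", chr[needs_prefix])
  chr
}

# ds$chr <- to_ucsc_chr(ds$chr)
```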

mut.to.sigs.input Error in GrCH38

test4.txt

Certain genomic profiles generate the error:
Error in if (trimer %in% all.tri) { : argument is of length zero

I'm using the appropriate BSgenome object, and I've attached a sample input file. I tried adding a check beforehand for whether trimer was NULL, but that ended up creating a blank matrix.

Error in FUN(left, right) : non-numeric argument to binary operator

Hi, I got an error when running mut.to.sigs.input().
The following is an excerpt of the TCGA mutation MAF file:

  Tumor_Sample_Barcode         Start_Position Reference_Allele Tumor_Seq_Allele2
  <chr>                        <chr>          <chr>            <chr>            
1 TCGA-DD-AACT-01A-11D-A40R-10 1955799        T                A                
2 TCGA-DD-AACT-01A-11D-A40R-10 30877160       C                T                
3 TCGA-DD-AACT-01A-11D-A40R-10 31697974       G                A                
4 TCGA-DD-AACT-01A-11D-A40R-10 39966012       A                G                
5 TCGA-DD-AACT-01A-11D-A40R-10 63323401       G                A                
6 TCGA-DD-AACT-01A-11D-A40R-10 113829981      G                C

After I run

sigs.input <- mut.to.sigs.input(mut.ref = sample.mut.ref, 
                                sample.id = "Tumor_Sample_Barcode", 
                                chr = "Chromosome", 
                                pos = "Start_Position", 
                                ref = "Reference_Allele", 
                                alt = "Tumor_Seq_Allele2",
                                bsg = BSgenome.Hsapiens.UCSC.hg38)

I got this error: Error in FUN(left, right) : non-numeric argument to binary operator
Could you give me some hints?
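
Two things in the excerpt are worth ruling out, sketched below under the assumption that the MAF was read with readr (hence the <chr> column types): Start_Position is stored as character, and arithmetic on a character vector fails with exactly "non-numeric argument to binary operator"; and a tibble returns a one-column tibble, not a vector, for df[, col]. `prepare_mut` is a hypothetical helper. Note also that chr = "Chromosome" requires that column to exist, and the excerpt does not show it.

```r
# Hypothetical helper: normalize a mutation table before mut.to.sigs.input().
prepare_mut <- function(mut, pos_col) {
  # Drop tibble/data.table classes so that mut[, col] returns a plain vector
  mut <- as.data.frame(mut)
  # Positions read as text (<chr>) must be numeric for coordinate arithmetic
  mut[[pos_col]] <- as.numeric(mut[[pos_col]])
  if (anyNA(mut[[pos_col]])) {
    warning("Some positions could not be converted to numbers")
  }
  mut
}

# sample.mut.ref <- prepare_mut(sample.mut.ref, "Start_Position")
```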

non-default genomes

Thanks for the great package!

I'm trying to run on mouse data, and having trouble with it. I'm using
bsg = BSgenome.Mmusculus.UCSC.mm10 to provide a BSgenome object for mut.to.sigs.input, but I keep getting the following error:

Error in as.character.default(<S4 object of class "BSgenome">) :
no method for coercing this S4 class to a vector

BSgenome.Mmusculus.UCSC.mm10 is a BSgenome object. I also tried manually setting bsg = BSgenome.Hsapiens.UCSC.hg19 for some human data, but I get the same error, regardless of whether I append ::Hsapiens to the end of the object name. The function works perfectly if I use the default bsg, but that should be the same thing. Is there something obvious that I'm missing here?

Thanks,
Rob

signature matrix for new Alexandrov SBS signatures?

Hi,

Is there a signature matrix for the expanded signature set detailed here? https://www.biorxiv.org/content/biorxiv/early/2018/05/15/322859.full.pdf

Thanks,
Emily

Error in validObject(.Object) : invalid class “DNAStringSet” object: undefined class for slot "elementMetadata" ("DataTableORNULL")

head(d)
Sample chr pos ref alt
1 SRR630583 chr1 735141 A T
2 SRR630583 chr1 786021 A T
3 SRR630583 chr1 794397 C A
4 SRR630583 chr1 813831 A T
5 SRR630583 chr1 848467 C T
6 SRR630583 chr1 920860 A G

sigs.input <- mut.to.sigs.input(mut.ref = d,sample.id="Sample",chr = "chr", pos = "pos", ref = "ref", alt = "alt")

Error in validObject(.Object) :
invalid class “DNAStringSet” object: undefined class for slot "elementMetadata" ("DataTableORNULL")

Why does this error happen?

Trinucleotide frequency for exome data

I saw the option of using the triFreq function, which counts the number of times each trinucleotide is found in a supplied genome.
But how do I generate counts of every trinucleotide at the exome level?

Problem with plotSignatures and makePie

I'm having trouble opening the PDFs created by plotSignatures and makePie in any PDF reader. Any ideas what might be causing this?

mut.to.sigs.input out of memory error

Hello, I've detected strange behaviour in the mut.to.sigs.input function: it generates an out-of-memory error (even with 54 GB!) when parsing big files in the beep loop.

This is the affected code:

  for (i in unique(mut[, sample.id])) {
    tmp = subset(mut, mut[, sample.id] == i) #Failing line
    beep = table(tmp$tricontext)
    for (l in 1:length(beep)) {
      trimer = names(beep[l])
      if (trimer %in% all.tri) {
        final.matrix[i, trimer] = beep[trimer]
      }
    }
  }

What I've seen is that, when executing the subset line, the number of selected rows was squared. For example, when subsetting 100 rows (and 10 columns), the tmp matrix dimensions were 10000x10 (!) instead of the expected 100x10.

I've checked three different ways to perform the same operation, and all of them behave as expected.
I suggest implementing the "tmp2" or "tmp4" solution.


    i= 'PDX102.bam'    
    
    tmp = subset(mut, mut[, sample.id] == i)
    tmp2 =  mut[mut[,sample.id] == i,]    
    tmp3 = subset(mut, c(rep(TRUE,100)))
    inSubset = mut[, sample.id] == i
    tmp4 = subset(mut, inSubset)
    
    dim(tmp)   # 10000    10
    dim(tmp2)  # 100    10
    dim(tmp3)  # 100    10
    dim(tmp4)  # 100    10
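
The tmp2 approach can be wrapped in a small helper that uses standard evaluation only. One property of subset() is worth knowing here: it evaluates its condition inside the data frame, so a column whose name matches a variable used in the condition (for example a column called sample.id) silently shadows that variable. Whether that is what inflates the row count in this case is not verified; `rows_for_sample` is a hypothetical name.

```r
# Hypothetical helper: rows belonging to one sample, by standard evaluation.
# mut[[sample.id]] always means "the column whose name is stored in
# sample.id", even if mut also has a column literally called sample.id.
rows_for_sample <- function(mut, sample.id, i) {
  mut[mut[[sample.id]] == i, , drop = FALSE]
}
```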

Thank you!

Minor bug in arbitrary bsg implementation.

Thanks for all of your hard work on deconstructSigs. I found a small bug in the handling of arbitrary bsg objects. The line below fails because paste can't coerce the bsg object to a character string.

warning(paste('Check chr names -- not all match', bsg, 'object:\n', unknown.regions, sep = ' '))

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
[2] BSgenome_1.38.0
[3] rtracklayer_1.30.4
[4] deconstructSigs_1.8.0
[5] RColorBrewer_1.1-2
[6] VariantAnnotation_1.16.4
[7] Rsamtools_1.22.0
[8] Biostrings_2.38.4
[9] XVector_0.10.0
[10] SummarizedExperiment_1.0.2
[11] Biobase_2.30.0
[12] GenomicRanges_1.22.4
[13] GenomeInfoDb_1.6.3
[14] IRanges_2.4.8
[15] S4Vectors_0.8.11
[16] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 AnnotationDbi_1.32.3 magrittr_1.5
[4] GenomicAlignments_1.6.3 zlibbioc_1.16.0 BiocParallel_1.4.3
[7] stringr_1.0.0 plyr_1.8.4 tools_3.2.0
[10] DBI_0.4-1 lambda.r_1.1.9 futile.logger_1.4.3
[13] reshape2_1.4.1 futile.options_1.0.0 bitops_1.0-6
[16] RCurl_1.95-4.8 biomaRt_2.26.1 RSQLite_1.0.0
[19] stringi_1.1.1 GenomicFeatures_1.22.13 XML_3.98-1.4
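
A sketch of one fix: put the name of the object the caller passed into the message instead of the BSgenome object itself, using base R's deparse(substitute()). `check_chr_names` is a hypothetical stand-in for the package's check, not its actual code. The same coercion failure ("no method for coercing this S4 class to a vector") may explain the custom-bsg errors reported elsewhere on this page.

```r
# Hypothetical stand-in for the chromosome-name check.
check_chr_names <- function(chr, bsg, known) {
  # Capture the expression the caller used, e.g. "BSgenome.Hsapiens.NCBI.GRCh38",
  # without ever coercing the S4 object itself to character.
  bsg_name <- deparse(substitute(bsg))
  unknown.regions <- setdiff(unique(chr), known)
  if (length(unknown.regions) > 0) {
    warning(paste("Check chr names -- not all match", bsg_name, "object:\n",
                  paste(unknown.regions, collapse = ", ")))
  }
  invisible(unknown.regions)
}
```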

vcf.to.sigs.input breaks when file contains annotations (ID field)

Hi,

When processing VCF files with a partially (or fully) annotated ID field using the readGT function, you get back a matrix like this:

> gt <- readGT(vcf, nucleotides = TRUE)
> head(gt)
               tumor
1:6610695_G/A  "G/A"
1:15654752_C/T "C/T"
1:16248740_A/T "A/T"
1:40928079_A/T "A/T"
rs2596251      "G/A"
rs75758917     "G/T"

Some row names are dbSNP IDs, which makes it impossible to obtain chromosome, position or reference information the way it is done in the vcf.to.sigs.input function.

    chr <- sub(":.+", "", rownames(gt))
    pos <- as.numeric(sub("_.+", "", sub(".+:", "", rownames(gt))))
    ref <- sub("[/|].+", "", sub(".+_", "", rownames(gt)))

Is there a reason, other than readGT being lightweight, that a more general solution like the one below is not used?

    auxVcf <- readVcf(vcf)
    chr <- as.character(seqnames(auxVcf))
    pos <- start(auxVcf)
    ref <- as.character(ref(auxVcf))

Thanks for all the work!

Marko
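
Until something like the readVcf() approach is adopted, a defensive option is to parse only row names that follow the chr:pos_ref/alt pattern and flag the rest (such as dbSNP IDs) as NA. `parse_gt_rownames` is a hypothetical base-R helper; rows it cannot parse would still need coordinates from elsewhere, for example from the VCF body.

```r
# Hypothetical helper: split "chr:pos_ref/alt" row names, leaving NA for
# rows (such as rs IDs) that do not follow that pattern.
parse_gt_rownames <- function(ids) {
  pattern <- "^([^:]+):([0-9]+)_([ACGTN]+)[/|]([ACGTN,]+)$"
  ok <- grepl(pattern, ids)
  out <- data.frame(id = ids, chr = NA_character_, pos = NA_real_,
                    ref = NA_character_, stringsAsFactors = FALSE)
  parsable <- ids[ok]
  out$chr[ok] <- sub(pattern, "\\1", parsable)
  out$pos[ok] <- as.numeric(sub(pattern, "\\2", parsable))
  out$ref[ok] <- sub(pattern, "\\3", parsable)
  out
}

# p <- parse_gt_rownames(rownames(gt)); p[is.na(p$pos), ]  # rows needing another source
```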

Using variants from chromosome X and Y

Hi ,

Do you recommend including or excluding mutations on the X and Y chromosomes in the input?
The opportunity for mutations on chromosome X will differ between male and female samples, right?

Thank you!

Thank You Rachel

Just wanted to thank you for your very useful and kind answer. All good now. Will stick to the 6% cutoff.
Many thanks,
Pedro

SSE threshold

Hi,

Do you have a suggestion for an SSE threshold above which the value indicates that something else is happening in the sample (probably calling for de novo signature prediction, etc.)?

Even in Supplementary Figure 2(b) of the paper, there are samples with enough mutations but still a high SSE.

I am just trying to see if I can use the SSE value to provide some confidence.

Thanks,
Rashesh
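
For reference, a sketch of how such an error can be computed, assuming SSE is the sum of squared differences between the sample's normalized mutation profile and the weighted combination of signatures (worth checking against the paper's definition). `sse` is a hypothetical helper, and the sketch does not suggest any threshold.

```r
# Hypothetical helper: SSE between an observed profile and its reconstruction.
# tumor:      observed profile over contexts (fractions)
# weights:    one weight per signature
# signatures: signatures-by-contexts matrix
sse <- function(tumor, weights, signatures) {
  product <- as.vector(weights %*% signatures)
  sum((tumor - product)^2)
}
```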

Weights assigned to each signature do not always add up to 1

Dear deconstructSigs developers,

I recently used your software. While everything went smoothly, I noticed that, even in the examples you provide, the probabilities/weights assigned to each signature, using either nature or cosmic references, DO NOT always add up to 1. Why is that? Does that mean that a set of SNVs are not explained by any of the signatures?

I would highly appreciate your input.

Regards,

Pedro
Sloan Kettering Institute
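
A toy illustration of why the weights can sum to less than 1, assuming whichSignatures discards contributions below its signature cutoff (the 6% cutoff mentioned on this page) and reports the unassigned remainder as `unknown`. The weights are invented and `apply_cutoff` is hypothetical; it does not reproduce the actual fitting.

```r
# Hypothetical helper: zero out small weights and report the remainder.
apply_cutoff <- function(weights, cutoff = 0.06) {
  weights[weights < cutoff] <- 0
  list(weights = weights, unknown = 1 - sum(weights))
}

w <- c(Signature.1 = 0.55, Signature.3 = 0.40, Signature.5 = 0.05)
res <- apply_cutoff(w)
```

Here Signature.5 falls below the cutoff, so the remaining weights sum to 0.95 and about 0.05 is left as unknown.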

Question about tri.counts.method

Hello,

Would it be possible to describe how the observed trinucleotide frequencies in the genome were counted?

For example, was it a stepwise window with a step of 3, starting at the first nucleotide of the chromosome?

Or were all possible triplets in a sequence counted, so that for a given string the triplets would be counted from positions 1-3, 2-4, 3-5, and so on?

I am struggling to decide which is the best way. Thank you!
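
A sketch of both counting schemes in base R. The overlapping version (step 1) matches the default of Biostrings::trinucleotideFrequency(); which scheme was used to build tri.counts.genome is not stated here. `count_trinucleotides` is hypothetical and ignores N bases and strand collapsing.

```r
# Hypothetical helper: count trinucleotides in one sequence string.
# step = 1 counts overlapping windows (1-3, 2-4, 3-5, ...);
# step = 3 counts non-overlapping windows (1-3, 4-6, ...).
count_trinucleotides <- function(s, step = 1) {
  n <- nchar(s)
  if (n < 3) {
    return(table(character(0)))
  }
  starts <- seq(1, n - 2, by = step)
  table(substring(s, starts, starts + 2))
}

overlapping <- count_trinucleotides("ACGACGT")
stepwise <- count_trinucleotides("ACGACGT", step = 3)
```

For the 7-base string above, the overlapping scheme yields 5 triplets and the stepwise scheme 2.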

Mutational Signatures for mouse mutation data?

Hi,

I have MAF files from mouse tumours, and I tried to extract mutational signatures from them.

The trinucleotideMatrix function does not recognise the BSgenome.Mmusculus.UCSC.mm10 genome.

Is it possible to compare the mutational spectrum of mouse tumours to the mutational signatures derived from human tumours? After all, mouse tumour models are supposed to resemble human tumours to a certain extent (?).

Any idea about this?

Thanks,
Alejandro

issue with BSgenome.Hsapiens.UCSC.hg38 in mut.to.sigs.input function

Hello.
I'm running the following command:

sigs.input = mut.to.sigs.input(mut.ref = "/path/to/myfile.txt", sample.id = "Sample", chr = "chr", pos = "pos", ref = "ref", alt = "alt", bsg = BSgenome.Hsapiens.UCSC.hg38)

it gives me the following error:
Error in as.character.default(<S4 object of class "BSgenome">) :
no method for coercing this S4 class to a vector

Any suggestions?

Thank you very much for your work!

Save plots to a file

Hi,

Could you please add a parameter to save the plots to a file?
I tried
pdf("mutationSignature2.pdf")
plotSignatures(sample_1, sub="ex")
makePie(sample_1,sub="ex")
dev.off()
Would it be possible to add an argument to send plots to PDF/PNG files?

Thanks,
Rajesh
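
The pdf() ... dev.off() pattern in the post is the usual base-R way to write plots to a file. A small hypothetical wrapper such as `save_plots` can pick the device from the file extension and close it even if a plotting call fails; whether plotSignatures and makePie render correctly on each device is not verified here.

```r
# Hypothetical wrapper: open a PDF or PNG device, run the plotting code,
# and always close the device afterwards.
save_plots <- function(file, plot_fun, width = 7, height = 7) {
  ext <- tolower(tools::file_ext(file))
  if (ext == "pdf") {
    pdf(file, width = width, height = height)
  } else if (ext == "png") {
    png(file, width = width, height = height, units = "in", res = 150)
  } else {
    stop("Unsupported extension: ", ext)
  }
  on.exit(dev.off())
  plot_fun()
  invisible(file)
}

# save_plots("mutationSignature2.pdf", function() {
#   plotSignatures(sample_1, sub = "ex")
#   makePie(sample_1, sub = "ex")
# })
```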

mut.to.sigs.input function does not work even on sample data

When I run mut.to.sigs.input either on my own data or on the provided data, I get errors similar to the following:

Error in `[.data.frame`(mut, , sample.id) : undefined columns selected

It does not matter whether I use the default genome or specify another one. Colleagues who use the program regularly tell me that they simply don't use mut.to.sigs.input as it is perceived to be broken....

hg38 support for deconstructSigs

Hello,

I'd like to create some signatures using your software. So far I have performed two tests. I used the GATK pipeline with hg19 as the reference, and everything went fine.

However, when I ran the same pipeline with hg38, the following error occurred (when running mut.to.sigs.input):

Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  : 
  solving row 136: 'allow.nonnarrowing' is FALSE and the supplied start (181234917) is > refwidth + 1

Please could you give me a clue how to fix this issue?

Thank you,
Adam

custom panel normalization

Hi,
I am using whichSignatures on variants that I've found in specific genomic regions (about 1 Gb in total).
I'm using mut.to.sigs.input to create the input data frame and signatures.cosmic as the reference.

My question is how I should normalize the input.

I've already derived the trinucleotide counts table (assigned to tri.counts.targeted) using trinucleotideFrequency on an hg19 FASTA intersected with my BED file.

I've thought about two options:

Option 1: using tri.counts.method = tri.counts.genome/tri.counts.targeted
Option 2: instead of using the absolute counts of the tables above, using normalized tables, so that my data are normalized by the ratio of each trinucleotide's count fraction in my panel regions compared to the genome, i.e.:
tri.counts.targeted.norm <- tri.counts.targeted/sum(tri.counts.targeted)
tri.counts.genome.norm <- tri.counts.genome/sum(tri.counts.genome)
and then using: tri.counts.method = tri.counts.genome.norm/tri.counts.targeted.norm

Any advice on this issue would be appreciated!
Greetings,
Idan
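
A toy check of how the two options relate, assuming the scaled profile is renormalized to sum to one before fitting (how whichSignatures treats a custom table has not been checked here). The ratio of normalized tables differs from the ratio of raw counts only by the constant sum(targeted) / sum(genome), so after renormalization both options give the same profile. All counts are invented.

```r
# Toy counts for three trinucleotides (invented numbers).
genome   <- c(ACA = 100, ACC = 50, ACG = 10)
targeted <- c(ACA = 20,  ACC = 30, ACG = 5)

ratio_raw  <- genome / targeted                                    # option 1
ratio_norm <- (genome / sum(genome)) / (targeted / sum(targeted))  # option 2

# The two factors differ only by a constant.
scale_factor <- ratio_norm / ratio_raw

# Scale a mutation-count profile by a factor and rescale it to fractions.
norm_profile <- function(counts, f) {
  x <- counts * f
  x / sum(x)
}
profile <- c(ACA = 12, ACC = 7, ACG = 3)
```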

Normalisation to specific TF context?

Hi,

Thank you for contributing science with this package.

~~Is there a way I can tweak the randomisation for my case, where I want to normalise the trinucleotide context taking the TF base context into account?~~

~~For example, let's say CTCF. The sequence context of the binding regions might not be the same as the whole genome, so the signature we get will be biased towards the enriched bases.~~

~~Previously I have attempted to simulate random BED files with a similar context and then get their signatures. If the random signatures are not the same as the CTCF one, then my signature from CTCF is real.~~

Any thoughts will be helpful.

Best regards,

Tunc.

EDIT: Sorry for asking before reading the article in more detail. Please disregard my previous comment (I struck it through). Let me rephrase my question.
I found that I can supply a custom trinucleotide context to the whichSignatures() function. For each transcription factor region, I will calculate the trinucleotide context from its FASTA and try to normalise based on it. This is going to be my method to normalise the sequence context of TF regions. Is this right? I think it is better than no normalisation.

However, I have a question related to "genome" normalisation. I get different mutation signature results when I use the "genome" and "default" normalisation types, although my data are whole-genome breast cancer data (like single whole-genome patient data). Could you give me some information about the basis of this normalisation? Why is there a "genome" option, and in which situations should I use it? The article states that whole-genome tumor samples need no normalisation, but then shouldn't I get the same result even if I normalise to the genome?

To sum up, I am looking at whether the mutation signature changes after a TF binding event occurs, and I want to remove the effect of the sequence context of TF binding regions.

Sorry for the confusion.
