raerose01 / deconstructSigs
I get the following error when I try to convert a vcf file into the correct input format for deconstructSigs via the command
vcf.to.sigs.input(vcf = "path_to_file/file1.vcf")
Error: scanVcf: invalid class “VCFHeader” object: 'info(VCFHeader)' must be a 3 column DataFrame with names Number, Type, Description
path: path_to_file/file1.vcf
Hi,
It's not clear to me whether setting the bsg option to another genome in mut.to.sigs.input() (e.g. mut.to.sigs.input(bsg = BSgenome.Dmelanogaster.UCSC.dm6)) is sufficient to apply the correct normalisation in whichSignatures(tri.counts.method = 'genome').
Your README says:
Included with the package are tri.counts.exome, which contains the trinucleotide counts for an exome and tri.counts.genome, which contains the trinucleotide counts for the hg19 genome
Does this change if we supply a different genome? Or will it always normalise to counts in hg19?
Thanks,
Nick
Hi, I have around 90 samples (taken from the same tissues of different patients) in my .maf files. How should I determine the signatures contributing to all 90 samples in one go? It's very tedious to provide sample.id every time. It would be really helpful if you could help me with this.
For example:
sample_1 = whichSignatures(tumor.ref = sigs.input,
signatures.ref = signatures.nature2013,
sample.id = 1,
contexts.needed = TRUE,
tri.counts.method = 'default')
Should I manually change this every time and generate a graph, or can I run these 90 samples together and generate the graphs at once? It's very confusing to run every sample manually. Please help.
Thanks a ton,
Zac
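One way to avoid editing sample.id by hand is to loop over the row names of the input matrix. The sketch below is only an illustration: fit_all_samples and its fit_one callback are hypothetical names, and the toy data stands in for a real sigs.input so the code runs without the package. The commented-out call shows how the whichSignatures call from the question could be plugged in.

```r
# Apply a per-sample fitting function to every row of the input matrix.
# `fit_one` is a hypothetical callback taking (sigs.input, sample_id).
fit_all_samples <- function(sigs.input, fit_one) {
  ids <- rownames(sigs.input)
  results <- lapply(ids, function(id) fit_one(sigs.input, id))
  names(results) <- ids
  results
}

# With deconstructSigs loaded, the callback could wrap the call shown above:
# results <- fit_all_samples(sigs.input, function(x, id) {
#   whichSignatures(tumor.ref = x, signatures.ref = signatures.nature2013,
#                   sample.id = id, contexts.needed = TRUE,
#                   tri.counts.method = 'default')
# })

# Stand-in data and fit so the sketch runs without the package installed.
toy_input <- data.frame(A = c(3, 1), B = c(1, 5), row.names = c("s1", "s2"))
toy_fit <- function(x, id) x[id, , drop = FALSE] / sum(x[id, ])
toy_results <- fit_all_samples(toy_input, toy_fit)
print(names(toy_results))
```

The result is a named list with one entry per sample, which can then be passed one at a time to the plotting functions inside another loop.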
Hi,
I am working on implementing this tool in the pipeline and wanted to compare it with other tools available.
I see you have used the TCGA-ESCA data in the paper, but the Signature predictions are not present in the Supplementary tables.
Please let me know if they are present in any other file or if you could please provide me the table, so I can use that as a benchmark. (Just like other datasets are present in Additional File 6 in the paper)
Hi,
I am a bit confused about when to set tri.counts.method to 'genome'.
For the 70-30 simulation dataset, tri.counts.method should be 'default', right?
So if I want to test how the tool does with genome normalization, which simulated dataset should I use? Or is there a way to make the 70-30 simulation dataset based on tri.counts.genome?
I was testing out this package with a public dataset as per this script here, and got this error message:
Warning message:
In mut.to.sigs.input(mut.ref = ccle, sample.id = "Tumor_Sample_Barcode", :
Check ref bases -- not all match context:
22RV1_PROSTATE:chrM:9477:G:A, 22RV1_PROSTATE:chrM:14323:G:A, 253JBV_URINARY_TRACT:chrM:10589:G:A, 253JBV_URINARY_TRACT:chrM:13768:T:C, AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:12720:A:G, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:6179:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:8684:C:T, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:14470:T:C, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15148:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15355:G:A, AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:15487:A:T, BICR31_UPPER_AERODIGESTIVE_TRACT:chrM:14793:A:G, BICR31_UPPER_AERODIGESTIVE_TRACT:chrM:15218:A:G, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:8697:G:A, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:13928:G:C, BICR56_UPPER_AERODIGESTIVE_TRACT:chrM:14905:G:A, BICR6_UPPER_AERODIGESTIVE_TRACT:chrM:6365:T:C, BL41_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:11251:A:G, BL41_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE:chrM:12612:A:G, BL41_HAEMATOPOIETIC_AND_LY [... truncated]
I found the origin of this error message in the package source code here, however I am not clear on the significance of it. What exactly does this mean? And is there a way I can clean the data to avoid the issue? Thanks.
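Every mutation visible in the warning above sits on chrM. One plausible cause, which the report does not confirm, is that the variants were called against a different mitochondrial reference sequence than the one in the BSgenome object, so the reference bases disagree. If mitochondrial calls are not needed, a minimal sketch of dropping them before calling mut.to.sigs.input could look like this. The table and its column names are made up for illustration.

```r
# Hypothetical MAF-like table; the column names are assumptions.
maf <- data.frame(
  Tumor_Sample_Barcode = c("22RV1_PROSTATE", "22RV1_PROSTATE", "AMO1"),
  Chromosome = c("chrM", "chr1", "MT"),
  Start_Position = c(9477, 12345, 6179),
  stringsAsFactors = FALSE
)

# Flag mitochondrial rows under the common naming variants, then drop them.
is_mito <- maf$Chromosome %in% c("chrM", "chrMT", "M", "MT")
maf_nuclear <- maf[!is_mito, , drop = FALSE]
print(maf_nuclear)
```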
Hi,
I am still struggling to understand which normalization procedure would make sense for WES versus WGS, both from your previous discussions, and also from the documentation that goes with the deconstructSigs R package. According to the package docs: " For exome data, the 'exome2genome' method is appropriate for the signatures included in this package. For whole genome data, use the 'default' method to obtain consistent results.". From the deconstructSigs GitHub page it is however stated that "For exome data, the default method is appropriate for the signatures included in this package.". I am using the 'signatures.cosmic' as the set of known signatures.
best,
Sigve
If a VCF contains a sample in which all variants have the same genotype '0/1',
alt1 in vcf.to.sigs.input() ends up empty
and the process breaks.
I think it would be better to check the length of alt1 before combining it into mut, to prevent this bug.
The mutation limit of the "mut.to.sigs.input" tool is 2,000, but I need to analyze more. What can I do?
Thanks.
Hi,
This package seems great, thank you!
I have some whole-genome sequencing data that I wanted to try, but I am a bit confused about whether I have to set tri.counts.method to genome or not. I want to compare them to the signatures.nature2013 signatures.
Any help would be great!
Thanks
If makePie() is called when only one signature weight is non-zero, it fails due to a needed ", drop=FALSE". If furthermore it is called after that fix is added, with only one signature and with unknown=0, it also fails in the palette() function call (which can be fixed by adding a dummy second color to colors.sigs.present).
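The first failure described here is standard R behaviour: selecting a single column from a data frame returns a bare vector unless drop = FALSE is given. A small base-R illustration with made-up weights (not the package's own code):

```r
# One-row weights table in which only one signature is non-zero.
weights <- data.frame(Signature.1 = 0, Signature.2 = 1, Signature.3 = 0)
nonzero <- which(weights[1, ] > 0)

dropped <- weights[, nonzero]                # bare number, column name lost
kept <- weights[, nonzero, drop = FALSE]     # still a one-column data frame

print(class(dropped))
print(class(kept))
```

Code that later calls colnames() or ncol() on the selection only works with the second form.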
Hi,
I am getting the below error when attempting to install (R version is 3.4.4)
Is there another recommended way of installing?
> install.packages("deconstructSigs")
Installing package into ‘/home/lmose/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
Warning: dependencies ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘GenomeInfoDb’ are not available
trying URL 'https://cloud.r-project.org/src/contrib/deconstructSigs_1.8.0.tar.gz'
Content type 'application/x-gzip' length 211160 bytes (206 KB)
==================================================
downloaded 206 KB
ERROR: dependencies ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘GenomeInfoDb’ are not available for package ‘deconstructSigs’
* removing ‘/home/lmose/R/x86_64-pc-linux-gnu-library/3.4/deconstructSigs’
The downloaded source packages are in
‘/tmp/RtmpGcYflL/downloaded_packages’
Warning message:
In install.packages("deconstructSigs") :
installation of package ‘deconstructSigs’ had non-zero exit status
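The three packages named in the error are distributed through Bioconductor rather than CRAN, so install.packages() cannot find them by default. A hedged sketch of checking which ones are missing before installing them is below; missing_deps is a hypothetical helper, and the install call is left commented out because it needs network access. BiocManager requires R 3.5 or later; on R 3.4 the older biocLite script was the usual route.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
missing_deps <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

deps <- c("BSgenome", "BSgenome.Hsapiens.UCSC.hg19", "GenomeInfoDb")
to_install <- missing_deps(deps)

if (length(to_install) > 0) {
  message("Missing Bioconductor packages: ", paste(to_install, collapse = ", "))
  # install.packages("BiocManager")
  # BiocManager::install(to_install)
}
# Once the dependencies are present:
# install.packages("deconstructSigs")
```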
Hi,
I want to use deconstructSigs on my WES data. However, there are a lot of SNVs in introns or UTRs. Should I filter these SNVs and keep only those in the CDS? Thanks!
Yang
Hi,
It is a useful tool for mutation feature analysis.
I have some whole exome/genome sequencing data that I wanted to compare to the signatures.nature2013 (or signatures.cosmic) signatures. I am a bit confused about which tri.counts.method to set: exome or exome2genome for WES, genome for WGS?
thanks
Can you add counts for hg38 and mm10, which are more recent and more commonly used today?
I'm using deconstructSigs with the GRCh38 genome. I find that the chromosome names in the BSgenome object are 1, 2, 3, etc., and not 'chr1', 'chr2', ...
However, in the mut.to.sigs.input function there's one line:
levels(mut[, chr]) <- sub("^([0-9XY])", "chr\\1", levels(mut[, chr]))
that adds the 'chr' prefix and leads to the error:
Check chr names -- not all match BSgenome.Hsapiens.NCBI.GRCh38 object:
chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX
Is there any way to get rid of that line?
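To see what the quoted line does to different naming styles, the substitution can be tried on its own in base R. The helper name add_chr_prefix is made up for this illustration; the regular expression is the one quoted above.

```r
# Same substitution as in the quoted line, wrapped in a helper for testing.
add_chr_prefix <- function(x) sub("^([0-9XY])", "chr\\1", x)

ncbi_style <- c("1", "10", "X", "Y", "MT")
prefixed <- add_chr_prefix(ncbi_style)          # names starting with a digit, X or Y gain "chr"
already_ucsc <- add_chr_prefix(c("chr1", "chrX"))  # left untouched

print(prefixed)
print(already_ucsc)
```

Because the prefix is always added, the resulting names only match a genome object that also uses 'chr'-style names. Using a UCSC-named genome such as BSgenome.Hsapiens.UCSC.hg38 may sidestep the mismatch, though that is a suggestion rather than something confirmed in this thread.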
Hi,
I keep getting this error
sigs.input <- mut.to.sigs.input(mut.ref = a,
sample.id = "Sample",
chr = "chr",
pos = "pos",
ref = "ref",
alt = "alt")
Error in `[.data.frame`(mut.full, , c(sample.id, chr, pos, ref, alt)) :
undefined columns selected
This is link to my data
https://www.dropbox.com/s/1rf9cps97znae03/myfile.txt?dl=0
Could you please have a look to see why this is failing?
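"undefined columns selected" means at least one of the names passed in is not a column of the data frame, and matching is case-sensitive. A quick base-R check before calling mut.to.sigs.input can show which argument is at fault. The helper missing_columns and the example data are hypothetical.

```r
# Report which of the requested column names are absent from `df`.
missing_columns <- function(df, sample.id, chr, pos, ref, alt) {
  needed <- c(sample.id = sample.id, chr = chr, pos = pos, ref = ref, alt = alt)
  needed[!needed %in% colnames(df)]
}

# Toy table whose sample column is spelled "sample", not "Sample".
a <- data.frame(sample = "s1", chr = "chr1", pos = 100, ref = "C", alt = "T")
absent <- missing_columns(a, "Sample", "chr", "pos", "ref", "alt")
print(absent)
```

An empty result means all five names exist, so the cause would have to be looked for elsewhere.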
Can you change the colors to the usual ones used in cosmic?
thanks in advance
Hi,
thanks for a great package!
I'm using it for tumor-only sequencing with our https://github.com/lima1/PureCN tool. PureCN essentially predicts somatic status by adjusting allele frequencies for tumor purity and allele-specific copy number. This works pretty well and I get a decent correlation of deconstructSigs scores with matched tumor/normal. But I was wondering if you see an easy way to incorporate the uncertainty of the somatic prediction. Essentially weighting mutations by somatic posterior probability would be perfect. (for example especially copy number losses can result in cases where germline and somatic have very similar expected allele frequencies).
I understand if you don't have the bandwidth for this. But if you see an easy solution, I'm happy to dig into the code and method.
Thanks in advance,
Markus
I have encountered two problems while using deconstructSigs.
I used deconstructSigs a while ago and everything worked fine, but lately the title of the graph and the weights of the different signatures are no longer displayed in the graphics (after running plotSignatures). Additionally, in some cases makePie() fails and shows a white pie called "1".
Does anyone have an idea how I can resolve these problems?
Thanks
If I run
p <- plotSignatures(signatures, sub = 'signatures.cosmic')
The plot outputs to the graphics device, and p becomes NULL, with warnings:
Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cin" cannot be set
2: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cra" cannot be set
3: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "csi" cannot be set
4: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cxy" cannot be set
5: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "din" cannot be set
6: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "page" cannot be set
7: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cin" cannot be set
8: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cra" cannot be set
9: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "csi" cannot be set
10: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "cxy" cannot be set
11: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "din" cannot be set
12: In doTryCatch(return(expr), name, parentenv, handler) :
graphical parameter "page" cannot be set
I want to be able to render the plot later, in a different location, potentially on a different system that does not have deconstructSigs installed. Passing around a pre-rendered PDF or PNG output is not desirable since it won't scale to fit the final location for rendering. Is there a way to do this?
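The six parameters named in the warnings (cin, cra, csi, cxy, din, page) are exactly the read-only graphical parameters in base R. Warnings like these typically appear when code restores a full par() snapshot instead of one taken with no.readonly = TRUE. Whether plotSignatures does this is an assumption; the base-R sketch below only reproduces the pattern.

```r
pdf(NULL)  # null device so the sketch runs without a display

all_pars <- par()                      # includes read-only parameters
settable <- par(no.readonly = TRUE)    # only parameters that can be set
read_only <- setdiff(names(all_pars), names(settable))
print(read_only)

par(settable)  # restoring only the settable parameters raises no warnings
invisible(dev.off())
```

The warnings themselves are harmless for the rendered plot. The NULL return value is expected for functions that draw with base graphics as a side effect.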
Hi there!
I am attempting to use your R tool to analyze some VCF files and deconstruct the signatures within them. I am using hg38 as a reference, and when I change bsg to BSgenome.Hsapiens.UCSC.hg38 I get the error "undefined columns selected". This happens while running mut.to.sigs.input.
Is there support for hg38? I have been trying to find a workaround, but with the reference tied up in the BSgenome library it's hard to develop one.
Why do the trinucleotide frequency data tri.counts.exome and tri.counts.genome in the deconstructSigs package have only 32 values instead of 96?
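The arithmetic behind the two numbers can be written out in base R. With the mutated base reported as a pyrimidine (C or T), there are 4 x 2 x 4 = 32 possible reference trinucleotides, and each can mutate to three other bases, giving 32 x 3 = 96 mutation contexts. Normalisation counts describe how often each reference trinucleotide occurs, so 32 values are enough. The bracket notation below is illustrative.

```r
bases <- c("A", "C", "G", "T")
pyrimidines <- c("C", "T")

# 5' base + pyrimidine centre, then + 3' base: 4 * 2 * 4 = 32 trinucleotides.
tri <- as.vector(outer(bases, pyrimidines, paste0))
tri <- as.vector(outer(tri, bases, paste0))

# Each centre base can change to three alternatives: 32 * 3 = 96 contexts.
alts <- list(C = c("A", "G", "T"), T = c("A", "C", "G"))
contexts <- unlist(lapply(tri, function(tn) {
  centre <- substr(tn, 2, 2)
  paste0(substr(tn, 1, 1), "[", centre, ">", alts[[centre]], "]", substr(tn, 3, 3))
}))

print(length(tri))
print(length(contexts))
```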
When using mut.to.sigs.input on a data.frame as below:
sample.id Chr Start Ref Alt
1: P001A_Tumor1 chr1 22304831 C A
2: P001A_Tumor1 chr3 56835761 G A
3: P001A_Tumor1 chr3 155197843 G A
4: P001A_Tumor1 chr3 195978246 T C
5: P001A_Tumor1 chr4 9706544 G C
6: P001A_Tumor1 chr4 39975008 C G
input <- mut.to.sigs.input(
a.mut.ref,
sample.id='sample.id',
chr='Chr',
pos='Start',
ref='Ref',
alt='Alt'
)
It gives an error:
Error in `[.data.frame`(x, i, j) : undefined columns selected
This error can be fixed by renaming the "sample.id" column of the data.frame to something else.
It seems that a column of the data.frame cannot be named "sample.id".
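A plausible explanation, assuming the function filters rows with a call of the form subset(mut, mut[, sample.id] == i) as quoted in another report in this thread: inside subset(), names are looked up in the data frame first. If a column is literally called sample.id, the argument sample.id resolves to that column's values instead of the column name. The base-R sketch below reproduces the effect with toy data.

```r
mut <- data.frame(sample.id = c("a", "a", "b"), pos = 1:3,
                  stringsAsFactors = FALSE)
sample.id <- "sample.id"   # name of the ID column, as a caller would pass it

# Inside subset(), `sample.id` resolves to the column values c("a", "a", "b"),
# so mut[, sample.id] asks for columns named "a", "a", "b" and fails.
nse_result <- try(subset(mut, mut[, sample.id] == "a"), silent = TRUE)

# Computing the logical index outside subset() avoids the name clash.
in_subset <- mut[, sample.id] == "a"
safe_result <- mut[in_subset, , drop = FALSE]

print(inherits(nse_result, "try-error"))
print(nrow(safe_result))
```

Renaming the column, as suggested above, works for the same reason: the name no longer collides with the function argument.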
My session info is
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4 dplyr_0.7.6 stringr_1.3.1 deconstructSigs_1.8.0
[5] BiocInstaller_1.30.0 RevoUtils_11.0.1 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 pillar_1.3.0
[3] compiler_3.5.1 GenomeInfoDb_1.16.0
[5] bindr_0.1.1 XVector_0.20.0
[7] zlibbioc_1.26.0 bitops_1.0-6
[9] tools_3.5.1 lattice_0.20-35
[11] tibble_1.4.2 BSgenome_1.48.0
[13] pkgconfig_2.0.2 rlang_0.2.2
[15] Matrix_1.2-14 DelayedArray_0.6.5
[17] rstudioapi_0.7 yaml_2.2.0
[19] parallel_3.5.1 bindrcpp_0.2.2
[21] GenomeInfoDbData_1.1.0 rtracklayer_1.40.6
[23] Biostrings_2.48.0 S4Vectors_0.18.3
[25] IRanges_2.14.11 grid_3.5.1
[27] stats4_3.5.1 tidyselect_0.2.4
[29] Biobase_2.40.0 glue_1.3.0
[31] R6_2.2.2 BSgenome.Hsapiens.UCSC.hg19_1.4.0
[33] XML_3.98-1.16 BiocParallel_1.14.2
[35] purrr_0.2.5 magrittr_1.5
[37] matrixStats_0.54.0 GenomicAlignments_1.16.0
[39] Rsamtools_1.32.3 BiocGenerics_0.26.0
[41] GenomicRanges_1.32.6 SummarizedExperiment_1.10.1
[43] assertthat_0.2.0 stringi_1.1.7
[45] RCurl_1.95-4.11 crayon_1.3.4
Thanks a lot.
Hi,
I want to use deconstructSigs to analyze the mutation signature of lung cancer patients with their whole exome sequencing data.
I ran the commands following the instructions on the page "https://github.com/raerose01/deconstructSigs".
After I loaded the data,
(str(dat4)
'data.frame': 9623 obs. of 5 variables:
$ X : int 13 13 13 13 13 13 13 13 13 13 ...
$ Chr: chr "chr1" "chr1" "chr1" "chr1" ...
$ End: int 2461397 2461418 2461421 3328849 3328852 3328858 3328861 3328873 3328876 3328889 ...
$ Ref: chr "G" "G" "G" "C" ...
$ Alt: chr "A" "C" "A" "T" ...)
I run
sigs.input <- mut.to.sigs.input(mut.ref = dat4,
sample.id = "X",
chr = "Chr",
pos = "End",
ref = "Ref",
alt = "Alt")
I got this:
Error in `[<-`(`*tmp*`, i, trimer, value = 21L) : subscript out of bounds
What does "subscript out of bounds" mean?
Can you tell me what is wrong with it?
Thank you!
Zack
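"Subscript out of bounds" means an index pointed outside the matrix. In the str() output above, the sample column X is an integer (13). If the function builds a count matrix with one row per sample and then indexes it with the raw ID, as the loop quoted in another report in this thread suggests, an integer ID is read as a row number rather than a row name. This is an assumption about the cause; the sketch only demonstrates the R behaviour.

```r
# Count matrix with a single row named after the sample ID "13".
final.matrix <- matrix(0, nrow = 1, ncol = 2,
                       dimnames = list("13", c("A[C>A]A", "A[C>A]C")))
i <- 13L   # sample ID stored as an integer, as in `$ X : int 13 13 ...`

# Integer index: R looks for row number 13, which does not exist.
by_number <- try(final.matrix[i, "A[C>A]A"] <- 21L, silent = TRUE)

# Character index: R looks for the row named "13".
final.matrix[as.character(i), "A[C>A]A"] <- 21L

print(inherits(by_number, "try-error"))
print(final.matrix)
```

Converting the sample column to character before calling mut.to.sigs.input (for example dat4$X <- as.character(dat4$X)) may therefore be worth trying.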
I was able to get the package to work with variants called from RNA-Seq data. However, I am not sure what the optimal normalization method for this is. The package description only mentions exome and whole genome data. Any suggestions? Are there details on how to determine & create your own normalization method to use?
I have .vcf files for many samples. I would like to produce a single plot that shows the signatures for all of them. I am not clear how to accomplish this, since the whichSignatures function seems to only take one sample at a time. Suggestions?
Dear raerose01,
I am using deconstructSigs for the first time and having difficulty. I have a data frame of somatic mutations in the format:
sample.id chr pos ref alt
sample1 6 32157652 C G
When I try to import this using mut.to.sigs.input I get an error
sig<-mut.to.sigs.input(ds,sample.id="sample.id",chr="chr",pos="pos",ref="ref",alt="alt",bsg=BSgenome.Hsapiens.UCSC.hg19)
Error in .local(x, ...) :
'start', 'end' and 'width' can only be specified when 'names' is either missing, a character vector/factor, a character-Rle, or a factor-Rle
Please help,
Juan
Certain genomic profiles are generating the error:
Error in if (trimer %in% all.tri) { : argument is of length zero
I'm using the appropriate BSgenome object. I've attached a sample input file. I tried adding a check to see whether trimer was not null beforehand, but that ended up creating a blank matrix.
Hi, I got an error when running mut.to.sigs.input().
The following is a demo of the TCGA mutation MAF file.
Tumor_Sample_Barcode Start_Position Reference_Allele Tumor_Seq_Allele2
<chr> <chr> <chr> <chr>
1 TCGA-DD-AACT-01A-11D-A40R-10 1955799 T A
2 TCGA-DD-AACT-01A-11D-A40R-10 30877160 C T
3 TCGA-DD-AACT-01A-11D-A40R-10 31697974 G A
4 TCGA-DD-AACT-01A-11D-A40R-10 39966012 A G
5 TCGA-DD-AACT-01A-11D-A40R-10 63323401 G A
6 TCGA-DD-AACT-01A-11D-A40R-10 113829981 G C
After I run
sigs.input <- mut.to.sigs.input(mut.ref = sample.mut.ref,
sample.id = "Tumor_Sample_Barcode",
chr = "Chromosome",
pos = "Start_Position",
ref = "Reference_Allele",
alt = "Tumor_Seq_Allele2",
bsg = BSgenome.Hsapiens.UCSC.hg38)
I just got this error: Error in FUN(left, right) : non-numeric argument to binary operator
Could you give me some hints?
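In the table above, Start_Position is shown with type <chr>, that is, as text. Any arithmetic on the position (for instance to find the flanking bases) then fails with "non-numeric argument to binary operator". Assuming this is the cause here, converting the column to numeric first should help. A base-R illustration with toy data:

```r
sample.mut.ref <- data.frame(
  Tumor_Sample_Barcode = "TCGA-DD-AACT-01A-11D-A40R-10",
  Start_Position = c("1955799", "30877160"),   # positions stored as text
  stringsAsFactors = FALSE
)

# Arithmetic on character positions fails.
arith <- try(sample.mut.ref$Start_Position - 1, silent = TRUE)

# After conversion the same arithmetic works.
sample.mut.ref$Start_Position <- as.numeric(sample.mut.ref$Start_Position)
flank_start <- sample.mut.ref$Start_Position - 1

print(inherits(arith, "try-error"))
print(flank_start)
```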
Thanks for the great package!
I'm trying to run on mouse data, and having trouble with it. I'm using bsg = BSgenome.Mmusculus.UCSC.mm10 to provide a BSgenome object for mut.to.sigs.input, but I keep getting the following error:
Error in as.character.default(<S4 object of class "BSgenome">) :
no method for coercing this S4 class to a vector
BSgenome.Mmusculus.UCSC.mm10 is a BSgenome object. I also tried, for some human data, to manually set bsg = BSgenome.Hsapiens.UCSC.hg19, but I get the same error, regardless of whether I append ::Hsapiens to the end of the object name. The function works perfectly if I use the default bsg, but this should be the same thing. Is there something obvious that I'm missing here?
Thanks,
Rob
Hi,
Is there a signature matrix for the expanded signature set detailed here? https://www.biorxiv.org/content/biorxiv/early/2018/05/15/322859.full.pdf
Thanks,
Emily
head(d)
Sample chr pos ref alt
1 SRR630583 chr1 735141 A T
2 SRR630583 chr1 786021 A T
3 SRR630583 chr1 794397 C A
4 SRR630583 chr1 813831 A T
5 SRR630583 chr1 848467 C T
6 SRR630583 chr1 920860 A G
sigs.input <- mut.to.sigs.input(mut.ref = d,sample.id="Sample",chr = "chr", pos = "pos", ref = "ref", alt = "alt")
####error
sigs.input <- mut.to.sigs.input(mut.ref = d,sample.id="Sample",chr = "chr", pos = "pos", ref = "ref", alt = "alt")
Error in validObject(.Object) :
invalid class “DNAStringSet” object: undefined class for slot "elementMetadata" ("DataTableORNULL")
Why does this error happen?
I saw the option of using the triFreq function, which counts the number of times each trinucleotide is found in a supplied genome, at the genome level.
But how can I generate counts of every trinucleotide at the exome level?
I'm having trouble opening PDFs created by plotSignatures and makePie in any PDF reader. Any ideas what might be causing this?
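One common cause of unreadable PDFs in R in general is a graphics device that was never closed, which leaves the file incomplete. Whether that applies here is an assumption. A minimal sketch of a wrapper that always closes the device is below; save_plot_pdf is a hypothetical helper, and the drawing function is a stand-in for a call such as plotSignatures(sample_1).

```r
# Open a PDF device, draw, and always close the device, even on error.
save_plot_pdf <- function(file, draw) {
  pdf(file)
  on.exit(dev.off(), add = TRUE)
  draw()
  invisible(file)
}

out <- save_plot_pdf(tempfile(fileext = ".pdf"), function() plot(1:10))
print(file.size(out) > 0)
```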
Hello, I've detected a strange behaviour in the mut.to.sigs.input function. It generates an out-of-memory error (even with 54GB!) when parsing big files in the beep loop.
This is the affected code:
for (i in unique(mut[, sample.id])) {
  tmp = subset(mut, mut[, sample.id] == i) #Failing line
  beep = table(tmp$tricontext)
  for (l in 1:length(beep)) {
    trimer = names(beep[l])
    if (trimer %in% all.tri) {
      final.matrix[i, trimer] = beep[trimer]
    }
  }
}
What I've seen is that, when executing the subset line, the number of selected rows is squared. For example, when performing a subset of 100 samples (and 10 columns), the tmp matrix dimensions were 10000x10 (!) instead of the expected 100x10.
I've checked 3 different ways to perform the same operation and all of them behave as expected.
I suggest you try implementing the "tmp2" or "tmp4" solutions.
i= 'PDX102.bam'
tmp = subset(mut, mut[, sample.id] == i)
tmp2 = mut[mut[,sample.id] == i,]
tmp3 = subset(mut, c(rep(TRUE,100)))
inSubset = mut[, sample.id] == i
tmp4 = subset(mut, inSubset)
dim(tmp) # 10000 10
dim(tmp2) # 100 10
dim(tmp3) # 100 10
dim(tmp4) # 100 10
Thank you!
Thanks for all of your hard work on deconstructSigs. I found a small bug in the implementation of arbitrary bsg objects. The below line fails because paste can’t coerce the bsg object to a character.
warning(paste('Check chr names -- not all match ', bsg, ' object:\n', unknown.regions, sep = ' '))
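The failure can be reproduced without BSgenome: paste() has to convert each argument to character, and an S4 object with no such method cannot be converted. One possible fix, offered as a sketch rather than the maintainers' solution, is to paste the name of the argument instead of the object itself. ToyGenome and describe_arg are made-up names.

```r
# A toy S4 class standing in for a BSgenome object.
setClass("ToyGenome", representation(name = "character"))
bsg <- new("ToyGenome", name = "toy")

# paste() tries to coerce the S4 object to character and fails.
pasted <- try(paste("not all match", bsg, "object"), silent = TRUE)

# Capturing the expression the caller supplied gives a printable label.
describe_arg <- function(bsg) deparse(substitute(bsg))
label <- describe_arg(bsg)
msg <- paste("Check chr names -- not all match", label, "object")

print(inherits(pasted, "try-error"))
print(msg)
```

The same coercion error text appears in other reports in this thread where a non-default bsg was supplied.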
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
[2] BSgenome_1.38.0
[3] rtracklayer_1.30.4
[4] deconstructSigs_1.8.0
[5] RColorBrewer_1.1-2
[6] VariantAnnotation_1.16.4
[7] Rsamtools_1.22.0
[8] Biostrings_2.38.4
[9] XVector_0.10.0
[10] SummarizedExperiment_1.0.2
[11] Biobase_2.30.0
[12] GenomicRanges_1.22.4
[13] GenomeInfoDb_1.6.3
[14] IRanges_2.4.8
[15] S4Vectors_0.8.11
[16] BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 AnnotationDbi_1.32.3 magrittr_1.5
[4] GenomicAlignments_1.6.3 zlibbioc_1.16.0 BiocParallel_1.4.3
[7] stringr_1.0.0 plyr_1.8.4 tools_3.2.0
[10] DBI_0.4-1 lambda.r_1.1.9 futile.logger_1.4.3
[13] reshape2_1.4.1 futile.options_1.0.0 bitops_1.0-6
[16] RCurl_1.95-4.8 biomaRt_2.26.1 RSQLite_1.0.0
[19] stringi_1.1.1 GenomicFeatures_1.22.13 XML_3.98-1.4
Hi,
when processing VCF files with partially (or fully) annotated ID field using the readGT function, what you get back is a matrix looking like this:
> gt <- readGT(vcf, nucleotides = TRUE)
> head(gt)
tumor
1:6610695_G/A "G/A"
1:15654752_C/T "C/T"
1:16248740_A/T "A/T"
1:40928079_A/T "A/T"
rs2596251 "G/A"
rs75758917 "G/T"
Some row names are dbSNP IDs, which makes it impossible to obtain chromosome, position or reference information the way it is done in the vcf.to.sigs.input function.
chr <- sub(":.+", "", rownames(gt))
pos <- as.numeric(sub("_.+", "", sub(".+:", "", rownames(gt))))
ref <- sub("[/|].+", "", sub(".+_", "", rownames(gt)))
Is there a reason other than readGT being lightweight, that a more general solution like this below is not used?
auxVcf <- readVcf(vcf)
chr <- as.character(seqnames(auxVcf))
pos <- start(auxVcf)
ref <- as.character(ref(auxVcf))
Thanks for all the work!
Marko
Hi ,
Do you recommend including or excluding the mutations on the X and Y chromosomes in the input?
The opportunity for mutations on chromosome X will be different in male and female samples, right?
Thank you!
Just wanted to thank you for your very useful and kind answer. All good now. Will stick to the 6% cutoff.
Many thanks,
Pedro
Hi,
Do you have a suggestion for an SSE threshold above which something else is likely happening in the sample (so that a de novo signature prediction, for example, would be more appropriate)?
Even in the Supplementary figure 2 (b) from the paper, there are samples with enough mutations but still higher SSE.
I am just trying to see if I can use the SSE value to provide some confidence.
Thanks,
Rashesh
Dear deconstructSigs developers,
I recently used your software. While everything went smoothly, I noticed that, even in the examples you provide, the probabilities/weights assigned to each signature, using either nature or cosmic references, DO NOT always add up to 1. Why is that? Does that mean that a set of SNVs are not explained by any of the signatures?
I would highly appreciate your input.
Regards,
Pedro
Sloan Kettering Institute
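One way the weights can sum to less than 1 is through a contribution cutoff. As far as I understand, whichSignatures discards signatures below a minimum weight (the 6% cutoff mentioned elsewhere in this thread) and reports the remainder as an unknown fraction. The numbers below are invented purely to show the arithmetic.

```r
# Hypothetical raw weights for one sample; they sum to 1.
raw <- c(Signature.1 = 0.50, Signature.2 = 0.30,
         Signature.3 = 0.15, Signature.4 = 0.05)
cutoff <- 0.06

# Signatures below the cutoff are set to zero ...
kept <- ifelse(raw < cutoff, 0, raw)

# ... and what they explained is left unassigned.
unknown <- 1 - sum(kept)

print(kept)
print(unknown)
```

Under this reading, the missing fraction corresponds to mutations that are not attributed to any retained signature.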
Hello,
would it be possible to describe how the observed frequencies in the genome were counted?
e.g.
a stepwise window with a step of 3, starting at the first nucleotide of the chromosome
counting all possible triplets in a sequence: given a string, the triplets would be counted from positions 1-3, 2-4, 3-5 and so on...
I am struggling to think what is the best way. Thank you!
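The second option (overlapping windows at positions 1-3, 2-4, 3-5, ...) can be sketched in base R as below. This is only an illustration of that option. The thread does not state how the package's own tables were produced, although Biostrings::trinucleotideFrequency also counts overlapping windows by default. The function name count_trinucleotides is made up.

```r
# Count every overlapping trinucleotide in a DNA string (window step of 1).
count_trinucleotides <- function(dna) {
  n <- nchar(dna)
  if (n < 3) return(table(character(0)))
  starts <- seq_len(n - 2)
  table(substring(dna, starts, starts + 2))
}

counts <- count_trinucleotides("ACGTACG")
print(counts)
```

For "ACGTACG" the windows are ACG, CGT, GTA, TAC and ACG, so ACG is counted twice. A step of 3 would see only ACG and TAC and miss the other reading frames.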
Hi,
I have MAF files from mice tumours, and I tried to extract mutational signatures from that.
The trinucleotideMatrix function does not recognise the BSgenome.Mmusculus.UCSC.mm10 genome.
Is it possible to compare the mutational spectrum of mouse tumours to the mutational signatures derived from human tumours? After all, mouse tumour models are supposed to resemble to a certain extent human tumours. (?)
Any idea about this?
Thanks,
Alejandro
Hello.
I'm running the following command:
sigs.input = mut.to.sigs.input(mut.ref = "/path/to/myfile.txt", sample.id = "Sample", chr = "chr", pos = "pos", ref = "ref", alt = "alt", bsg = BSgenome.Hsapiens.UCSC.hg38)
it gives me the following error:
Error in as.character.default(<S4 object of class "BSgenome">) :
no method for coercing this S4 class to a vector
any suggestion?
Thank you very much for your work!
Hi,
Could you please add a parameter to save the plots to a file?
I tried
pdf("mutationSignature2.pdf")
plotSignatures(sample_1, sub="ex")
makePie(sample_1,sub="ex")
dev.off()
would it be possible to add argument to send plots to a pdf/png files?
Thanks,
Rajesh
When I run mut.to.sigs.input either on my own data or on the provided data, I get errors similar to the following:
Error in `[.data.frame`(mut, , sample.id) : undefined columns selected
It does not matter whether I use the default genome or specify another one. Colleagues who use the program regularly tell me that they simply don't use mut.to.sigs.input as it is perceived to be broken....
Hello,
I'd like to create some signatures using your software. So far I have performed two tests. I used the GATK pipeline with hg19 as reference and everything went fine.
However, when I ran the same pipeline with hg38, the following error occurred (after mut.to.sigs.input).
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
solving row 136: 'allow.nonnarrowing' is FALSE and the supplied start (181234917) is > refwidth + 1
Please could you give me a clue how to fix this issue?
Thank you,
Adam
Hi,
I am using whichSignatures on variants that I've found on specific genome regions (about 1 gb total).
I'm using mut.to.sigs.input to create the input data frame and using signatures.cosmic as the ref.
My question is about how should I normalize the input.
I've already derived the trinucleotide counts table (assigned it to tri.counts.targeted) using trinucleotideFrequency on a hg19 fasta intersected with my BED file.
I've thought about two options:
any advice regarding that issue will be appreciated!
greetings,
Idan
Hi,
Thank you for contributing science with this package.
Is there a way I can tweak the randomisation to fit my case, where I want to normalise the tri-nucleotide context while taking the TF base context into account?
For example, let's say CTCF. The sequence context of its binding regions might not be the same as that of the whole genome, so when we get the signature there will be a bias towards the enriched bases.
Previously I have attempted to simulate random BED files that have a similar context and then get the signature. If the random signatures are not the same as the CTCF ones, then my signature from CTCF is true.
Any thoughts will be helpful.
Best regards,
Tunc.
EDIT: Sorry for asking before reading the article in more detail. Please forget my previous comment (I have struck it through). Let me rephrase my question.
I found that I can supply a custom tri-nucleotide context to the whichSignatures() function. For each transcription factor region, I will calculate the tri-nucleotide context from its FASTA and try to normalise based on it. This is going to be my method to normalise the sequence context of TF regions. Is this right? I think it is better than no normalisation.
However, I have a question related to "genome" normalisation. I get different mutation signature results when I use the "genome" and "default" normalisation types. However, my data is whole-genome breast cancer data (a single whole-genome patient sample). Could you give me some information about the basis of this normalisation? Why is there a "genome" option, and in which situations should I use it? The article states that for whole-genome tumor samples there is no need for normalisation, but isn't it supposed to give the same result even if I normalise to the genome?
To sum up, I am looking at whether the mutation signature changes after a TF binding event occurs, and I want to remove the effect of the sequence context of TF binding regions.
Sorry for confusion.