reimandlab / activedriverwgsr
ActiveDriverWGSR is an R package for discovery of cancer driver genes and non-coding elements in whole genome sequencing data
Hello,
I just wanted to clarify whether it was essential for the ref/alt alleles for indel mutations to be represented solely by A/C/T/G bases or whether mutation representations containing "-" were acceptable (as long as pos1/pos2 are consistent with these)?
In the documentation (https://htmlpreview.github.io/?https://github.com/reimandlab/ActiveDriverWGSR/blob/master/doc/ActiveDriverWGSR.html#mutations) it is suggested that allele representations should only include A/T/C/G, but the example cll_mutations data contains mutations where the ref/alt allele is represented by "-":
Thanks very much
Best wishes
Ben
Hi there,
I ran ActiveDriverWGS using this command:
drivers <- ActiveDriverWGS(mutations = unique_mutations,
                           elements = elements,
                           ref_genome = "hg38",
                           window_size = 50000,
                           filter_hyper_MB = 30,
                           mc.cores = 16,
                           recovery.dir = recovery_dir)
> head(unique_mutations)
chr pos1 pos2 ref alt patient
1: chr1 16965728 16965728 G G BCCA1T
2: chr1 20345468 20345468 G G BCCA1T
3: chr1 21823627 21823627 C C BCCA1T
4: chr1 109192794 109192794 A A BCCA1T
5: chr1 111766331 111766331 T T BCCA1T
6: chr1 112913924 112913924 A A BCCA1T
> head(elements)
chr start end id
1: chr1 11869 14409 ENSG00000223972.5
2: chr1 11869 14409 ENST00000456328.2
3: chr1 11869 12227 exon:ENST00000456328.2:1
4: chr1 12613 12721 exon:ENST00000456328.2:2
5: chr1 13221 14409 exon:ENST00000456328.2:3
6: chr1 12010 13670 ENST00000450305.2
However, I keep getting this error:
7 remove hypermut, n= 865012 , 29 %
hypermuted samples: BCCA15T BCCA22T BCCA38T BCCA43T BCCA9T DKFZ-KZWST1 NYGC7T
reversing 0 positions
Removing 0 invalid SNVs & indels
Number of Elements with 0 Mutations: 1759191
Tests to do: 447249
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
In addition: Warning message:
In parallel::mclapply(1:length(not_done), function(i) { :
all scheduled cores encountered errors in user code
Not sure what's causing the error, would you be able to help troubleshoot? Many thanks!
Al
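One thing visible in the head(unique_mutations) output above is that ref and alt are identical in every row shown. Whether that is related to the rbind error is not established here, but it is cheap to count such rows before running the analysis. The following is a minimal sketch; check_mutations is a hypothetical helper and not part of ActiveDriverWGS, and the example table only copies the first rows printed above.

```r
# Hypothetical helper (not part of ActiveDriverWGS): count rows in a
# six-column mutation table that look suspicious before running the analysis.
check_mutations <- function(mutations) {
  required <- c("chr", "pos1", "pos2", "ref", "alt", "patient")
  missing_cols <- setdiff(required, colnames(mutations))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Work on a plain data.frame so data.table input behaves the same way.
  mutations <- as.data.frame(mutations, stringsAsFactors = FALSE)
  ref <- as.character(mutations$ref)
  alt <- as.character(mutations$alt)
  c(
    n_rows = nrow(mutations),
    # Rows where the alternate allele equals the reference allele.
    ref_equals_alt = sum(ref == alt, na.rm = TRUE),
    # Rows where the end coordinate precedes the start coordinate.
    pos2_before_pos1 = sum(mutations$pos2 < mutations$pos1, na.rm = TRUE),
    # Rows with an NA in any required column.
    rows_with_na = sum(!stats::complete.cases(mutations[, required]))
  )
}

# Example rows copied from the head() output in the report above.
example <- data.frame(
  chr = c("chr1", "chr1", "chr1"),
  pos1 = c(16965728, 20345468, 21823627),
  pos2 = c(16965728, 20345468, 21823627),
  ref = c("G", "G", "C"),
  alt = c("G", "G", "C"),
  patient = "BCCA1T",
  stringsAsFactors = FALSE
)
check_result <- check_mutations(example)
print(check_result)
```

A non-zero ref_equals_alt count would suggest checking how the alt column was derived upstream.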
Hello,
Thank you for this very useful software package! I was just wondering if there are plans to add support for GRCh38, or if it would be possible to provide details for how other users could create GRCh38 reference files suitable for use with ActiveDriverWGSR?
Thanks very much
Best wishes
Ben
Hi there!
I've been wondering whether there is a way to retrieve the IDs of the patients in which the observed mutations occurred when running ActiveDriverWGS.
Thanks!
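Independently of what ActiveDriverWGS itself returns, the patients behind each element can be recovered from the input tables by overlapping mutations with element coordinates. The sketch below assumes the mutation and element column layouts used elsewhere on this page (chr, pos1, pos2, patient and chr, start, end, id); patients_per_element is a hypothetical helper, not a package function.

```r
# Hypothetical helper, independent of ActiveDriverWGS: list the patients
# whose mutations overlap each element (elements may span several rows per id).
patients_per_element <- function(mutations, elements) {
  mutations <- as.data.frame(mutations, stringsAsFactors = FALSE)
  elements <- as.data.frame(elements, stringsAsFactors = FALSE)
  ids <- unique(as.character(elements$id))
  hits <- lapply(ids, function(id) {
    segs <- elements[elements$id == id, , drop = FALSE]
    keep <- rep(FALSE, nrow(mutations))
    for (k in seq_len(nrow(segs))) {
      # A mutation overlaps a segment if the two intervals intersect.
      keep <- keep |
        (mutations$chr == segs$chr[k] &
           mutations$pos1 <= segs$end[k] &
           mutations$pos2 >= segs$start[k])
    }
    sort(unique(as.character(mutations$patient[keep])))
  })
  names(hits) <- ids
  hits
}

# Toy data, invented for illustration only.
toy_elements <- data.frame(
  chr = c("chr1", "chr2"), start = c(100, 50), end = c(200, 60),
  id = c("geneA", "geneB"), stringsAsFactors = FALSE
)
toy_mutations <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  pos1 = c(150, 180, 500, 55),
  pos2 = c(150, 180, 500, 55),
  ref = c("A", "C", "G", "T"),
  alt = c("G", "T", "A", "C"),
  patient = c("p1", "p2", "p1", "p3"),
  stringsAsFactors = FALSE
)
hits <- patients_per_element(toy_mutations, toy_elements)
print(hits)
```

This loop scales with the number of elements times the number of mutations, so for genome-wide element sets an interval-overlap tool such as findOverlaps from GenomicRanges would likely be a better fit.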
Hi,
I'm interested in using this tool for one of our datasets. I have one VCF file per chromosome. Is there any code or script that can be used to prepare the input mutation data (a table with six columns: chr, pos1, pos2, ref allele, alt allele and patient name) from a VCF file?
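A minimal base R sketch of such a conversion is shown below. It is not taken from the package: vcf_lines_to_mutations is a hypothetical helper, the patient name is supplied by the caller, multi-allelic records are simply skipped, and pos2 is set to the last reference base covered by REF. VCF indels carry a leading anchor base, and whether ActiveDriverWGS expects that representation is not settled on this page, so the allele columns may need further normalization.

```r
# Hypothetical helper, not part of ActiveDriverWGS: turn the text lines of a
# single-sample VCF into a six-column mutation table.
vcf_lines_to_mutations <- function(vcf_lines, patient) {
  # Drop meta-information lines, the header line and empty lines.
  body <- vcf_lines[!startsWith(vcf_lines, "#") & nzchar(vcf_lines)]
  fields <- strsplit(body, "\t", fixed = TRUE)
  chr <- vapply(fields, `[`, character(1), 1)  # CHROM
  pos <- as.numeric(vapply(fields, `[`, character(1), 2))  # POS
  ref <- vapply(fields, `[`, character(1), 4)  # REF
  alt <- vapply(fields, `[`, character(1), 5)  # ALT
  # Assumption: records with several ALT alleles are skipped, not split.
  keep <- !grepl(",", alt, fixed = TRUE)
  data.frame(
    chr = chr[keep],
    pos1 = pos[keep],
    # Assumption: pos2 is the last reference base covered by REF.
    pos2 = pos[keep] + nchar(ref[keep]) - 1,
    ref = ref[keep],
    alt = alt[keep],
    patient = rep(patient, sum(keep)),
    stringsAsFactors = FALSE
  )
}

# Toy VCF content, invented for illustration only.
toy_vcf <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
  "chr2\t2000\t.\tCTT\tC\t.\tPASS\t.",
  "chr3\t3000\t.\tG\tA,T\t.\tPASS\t."
)
toy_table <- vcf_lines_to_mutations(toy_vcf, patient = "patient_1")
print(toy_table)
```

For per-chromosome files, the same function can be applied to readLines() of each file and the resulting tables combined with rbind.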
Excuse me:
Given the name of the software, ActiveDriverWGSR, I wonder whether it can also be used to analyze WES data. I hope to hear from you.
Hi,
First of all, thank you so much for sharing such an amazing analysis tool.
I believe ActiveDriverWGS can help move my project forward.
I think I can prepare equivalents of the "cancer_genes" and "cll_mutations" files myself.
However, I have no idea how to generate the "cancer_gene_sites" file, and I do not understand why it is needed.
Could you please explain what cancer_gene_sites is and how I can create this file for my analysis?
I am interested in analyzing non-coding regions such as UTR, intron, promoter and enhancer regions.
Thank you.
I got the following error:
> results = ActiveDriverWGS(mutations = wgs_variants,
+ elements = peak_ac,
+ ref_genome = "hg38")
0 remove hypermut, n= 0 , 0 %
hypermuted samples:
reversing 0 positions
Removing 0 invalid SNVs & indels
Number of Elements with 0 Mutations: 149712
Tests to do: 1035
.Error in glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL, :
object 'fit' not found
In addition: Warning messages:
1: In glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL, :
no observations informative at iteration 1
2: glm.fit: algorithm did not converge
is violated when the same candidate driver includes indels and SNVs in the same patient
Hello, we are very interested in testing this algorithm on our dataset of more than 1000 matched tumor-normal WGS samples. Could you let us know where we can get the PTM sites that the algorithm suggests using? For example, I looked at https://activedriverdb.org/download/, but none of the files there include genomic positions (in GRCh38) for the 552,068 PTM sites. Is there another website or link from which I can download them? Thanks so much.
install.packages("ActiveDriverWGS")
library(ActiveDriverWGS)
# example elements
testEle <- data.frame("chr" = c("chr1", "chr1"), "start" = c(3195984, 3204562),
                      "end" = c(3205713, 3661579), "id" = c("uc007aet.1", "uc007aeu.1"),
                      stringsAsFactors = FALSE)
# example insertion-only mutation data
testMutIn <- data.frame("chr" = c("chr18", "chr8"), "pos1" = c(70287915, 32273077),
                        "pos2" = c(70287915, 32273077), "ref" = c("T", "T"),
                        "alt" = c("GA", "GA"), "patient" = c("10167", "10167"),
                        stringsAsFactors = FALSE)
results = ActiveDriverWGS(mutations = testMutIn,
                          elements = testEle)
When ActiveDriverWGS is run with mutation data that consists only of insertions, it returns the following error:
Error in ActiveDriverWGS(mutations = testMutIn, elements = testEle) :
Reference and alternate alleles must be A, T, C or G
This error message comes from line 128 which checks if the reference and alternative columns are "legal_DNA".
The test examples used are indeed legal DNA, as they are sequences consisting of "G" and "A". However, the current check only allows single nucleotides.
# what line 128 is checking for
legal_dna = c('A', 'T', 'C', 'G')
> any(testMutIn$alt %in% legal_dna)
[1] FALSE
A possible correction to accommodate insertion data would be a grepl() call with a regular expression.
if (!(any(mutations$ref %in% legal_dna) && grepl("^[ATCG]+$", mutations$alt))) stop("Reference and alternate alleles must be A, T, C or G")
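Note that the condition quoted above combines any() on the ref column with an unreduced grepl() on the alt column, so it passes as soon as a single ref value is valid, and for tables with more than one row it hands a vector to &&, which recent R versions reject. A stricter variant reduces both columns with all(). The following is only a sketch of the check in isolation; is_valid_allele and alleles_ok are hypothetical names, not package functions, and the sketch does not address the later SNV requirement.

```r
# Hypothetical helpers (not package code): accept alleles made of one or more
# A/T/C/G characters, and require every row to pass in both columns.
is_valid_allele <- function(x) grepl("^[ATCG]+$", x)

alleles_ok <- function(mutations) {
  all(is_valid_allele(mutations$ref)) && all(is_valid_allele(mutations$alt))
}

# The insertion-only table from the report above.
testMutIn <- data.frame(
  chr = c("chr18", "chr8"),
  pos1 = c(70287915, 32273077),
  pos2 = c(70287915, 32273077),
  ref = c("T", "T"),
  alt = c("GA", "GA"),
  patient = c("10167", "10167"),
  stringsAsFactors = FALSE
)
insertion_ok <- alleles_ok(testMutIn)

# A variant with a "-" allele, to show that such rows are rejected by this check.
with_dash <- transform(testMutIn, alt = c("-", "GA"))
dash_ok <- alleles_ok(with_dash)

print(c(insertion_ok = insertion_ok, dash_ok = dash_ok))
```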
When ActiveDriverWGS is run with this quick fix, the following error message is returned:
Error in .get_3n_context_of_mutations(mutations, reference) :
Dataset must contain SNVs
This error message comes from line 31 in format_muts.R.
The current version of ActiveDriverWGS does not run with insertion-only mutation data.
How does ADWGS_test handle the case when an adjacent element sits in the background window? It seems like these are not excluded and if the adjacent element is itself hypermutated, it could inflate the background mutation rate. Does this happen in practice?
Is this software only suitable for whole genome sequencing data? What about whole exome sequencing data?
Error: Cannot load http://raw.githubusercontent.com/reimandlab/ActiveDriverWGS/master/inst/doc/ActiveDriverWGS.html: 404 Not Found
Line 258: if(length(unmutated_elements) > 9)
Just a reminder to remove this in the next release.
Hi @reimand0 ,
I am getting an error when trying to run ActiveDriverWGS using the following command:
ActiveDriverWGS.res[[i]] = ActiveDriverWGS(mutations = ActiveDriverWGSInfo[[i]], elements = elements, sites = NULL, window_size = 50000, recovery.dir = paste(outDir, "ActiveDriverWGS_recovery", sep="/"), mc.cores = 4, ref_genome = paste0("hg", params$ucsc_genome_assembly))
The error is:
0 remove hypermut, n= 0 , 0 %
hypermuted samples:
reversing 0 positions
Removing 0 invalid SNVs & indels
Number of Elements with 0 Mutations: 14
Tests to do: 1195
Tests recovered: 928
100 elements completed
200 elements completed
300 elements completed
400 elements completed
500 elements completed
600 elements completed
700 elements completed
800 elements completed
900 elements completed
.Error in glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL, :
object 'fit' not found
Data
> head(ActiveDriverWGSInfo)
$pdac
chr pos1 pos2 ref alt patient
1: chr3 120002137 120002137 T C p010_tumor-52fccd-somatic.pcgr.vcf
2: chr4 125450687 125450687 G T p010_tumor-52fccd-somatic.pcgr.vcf
3: chr5 38502681 38502681 C A p010_tumor-52fccd-somatic.pcgr.vcf
4: chr6 89951675 89951675 C T p010_tumor-52fccd-somatic.pcgr.vcf
5: chr7 82822603 82822603 G T p010_tumor-52fccd-somatic.pcgr.vcf
---
196698: chrX 15795073 15795073 A C SA533811_SP125786
196699: chrX 15800132 15800132 T A SA569276_SP133702
196700: chrX 15803607 15803607 G T SA558660_SP125807
196701: chr9 14398633 14398633 C G CGPA229
196702: chr1 186680291 186680291 C T CGPA234
> str(ActiveDriverWGSInfo)
List of 1
$ pdac:Classes 'data.table' and 'data.frame': 196702 obs. of 6 variables:
..$ chr : chr [1:196702] "chr3" "chr4" "chr5" "chr6" ...
..$ pos1 : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
..$ pos2 : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
..$ ref : chr [1:196702] "T" "G" "C" "C" ...
..$ alt : chr [1:196702] "C" "T" "A" "T" ...
..$ patient: chr [1:196702] "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" ...
..- attr(*, ".internal.selfref")=<externalptr>
> head(elements)
chr start end id GENEID
20 chr12 912077 990053 RAD52 ENSG00000002016
30 chr17 38869859 38921770 LASP1 ENSG00000002834
56 chr12 21468911 21501669 RECQL ENSG00000004700
66 chr7 96120220 96322147 SLC25A13 ENSG00000004864
74 chr19 18831938 18868236 UPF1 ENSG00000005007
78 chr7 27181510 27185223 HOXA11 ENSG00000005073
I have also done some sanity checking on both mutation and elements data and found no issues with having NA's or empty values.
Can you please help figure out why this might be producing an error?
Many thanks.