reimandlab / activedriverwgsr

ActiveDriverWGSR is an R package for discovering cancer driver genes and non-coding elements in whole-genome sequencing data.

activedriverwgsr's Issues

indel ref and alt alleles query

Hello,

I just wanted to clarify whether it was essential for the ref/alt alleles for indel mutations to be represented solely by A/C/T/G bases or whether mutation representations containing "-" were acceptable (as long as pos1/pos2 are consistent with these)?

In the documentation (https://htmlpreview.github.io/?https://github.com/reimandlab/ActiveDriverWGSR/blob/master/doc/ActiveDriverWGSR.html#mutations) it is suggested that allele representations should only include A/T/C/G, but the example cll_mutations data contains mutations where the ref/alt allele is represented by "-":

Thanks very much

Best wishes

Ben

numbers of columns of arguments do not match

Hi there,

I ran ActiveDriverWGS using this command:

drivers <- ActiveDriverWGS(mutations = unique_mutations,
                elements = elements,  
                ref_genome = "hg38",
                window_size = 50000,
                filter_hyper_MB = 30,
                mc.cores = 16,
                recovery.dir = recovery_dir)

> head(unique_mutations)
    chr      pos1      pos2 ref alt patient
1: chr1  16965728  16965728   G   G  BCCA1T
2: chr1  20345468  20345468   G   G  BCCA1T
3: chr1  21823627  21823627   C   C  BCCA1T
4: chr1 109192794 109192794   A   A  BCCA1T
5: chr1 111766331 111766331   T   T  BCCA1T
6: chr1 112913924 112913924   A   A  BCCA1T

> head(elements)
    chr start   end                       id
1: chr1 11869 14409        ENSG00000223972.5
2: chr1 11869 14409        ENST00000456328.2
3: chr1 11869 12227 exon:ENST00000456328.2:1
4: chr1 12613 12721 exon:ENST00000456328.2:2
5: chr1 13221 14409 exon:ENST00000456328.2:3
6: chr1 12010 13670        ENST00000450305.2

However, I keep getting this error:

7 remove hypermut, n= 865012 ,  29 %
hypermuted samples:  BCCA15T BCCA22T BCCA38T BCCA43T BCCA9T DKFZ-KZWST1 NYGC7T 

reversing 0 positions
Removing  0  invalid SNVs & indels

Number of Elements with 0 Mutations:  1759191 
Tests to do:  447249 
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match
In addition: Warning message:
In parallel::mclapply(1:length(not_done), function(i) { :
  all scheduled cores encountered errors in user code

I'm not sure what's causing the error. Would you be able to help troubleshoot? Many thanks!
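One detail visible in the head(unique_mutations) output above is that ref and alt are identical in every row shown. Whether this is related to the rbind error is not confirmed here. The sketch below is a base-R sanity check that counts such rows along with a few other basic problems. The helper name `check_mutations` is hypothetical and not part of ActiveDriverWGS.

```r
# Hypothetical helper (not part of ActiveDriverWGS): basic sanity checks on a
# mutation table with columns chr, pos1, pos2, ref, alt, patient.
check_mutations <- function(mutations) {
  required <- c("chr", "pos1", "pos2", "ref", "alt", "patient")
  missing_cols <- setdiff(required, colnames(mutations))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  m <- as.data.frame(mutations, stringsAsFactors = FALSE)
  list(
    n_rows = nrow(m),
    # Rows where the alternate allele equals the reference allele
    n_ref_equals_alt = sum(m$ref == m$alt, na.rm = TRUE),
    # Rows where the end coordinate precedes the start coordinate
    n_pos2_before_pos1 = sum(m$pos2 < m$pos1, na.rm = TRUE),
    # Rows with an NA in any required column
    n_missing = sum(!stats::complete.cases(m[, required]))
  )
}

# Toy data mirroring the first rows shown in the issue
toy <- data.frame(
  chr = c("chr1", "chr1", "chr1"),
  pos1 = c(16965728, 20345468, 21823627),
  pos2 = c(16965728, 20345468, 21823627),
  ref = c("G", "G", "C"),
  alt = c("G", "G", "C"),
  patient = "BCCA1T",
  stringsAsFactors = FALSE
)
report <- check_mutations(toy)
print(report)
```

A non-zero `n_ref_equals_alt` would suggest re-checking how the alt column was derived before re-running the analysis.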

GRCh38 reference files

Hello,

Thank you for this very useful software package! I was just wondering if there are plans to add support for GRCh38, or if it would be possible to provide details on how other users could create GRCh38 reference files suitable for use with ActiveDriverWGSR?

Thanks very much

Best wishes

Ben

Retrieving Patient information?

Hi there!

I've been wondering whether there is a way to retrieve the IDs of the patients in which the observed mutations occurred when running ActiveDriverWGS.

Thanks!
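Independently of what the package itself reports, patient IDs per element can be recovered from the input tables by overlapping mutation coordinates with element coordinates. The following is a minimal base-R sketch under that assumption. The helper name `patients_per_element` is hypothetical, and the loop is only practical for a modest number of elements.

```r
# Hypothetical helper (not part of ActiveDriverWGS): for each element, list the
# patients that carry at least one mutation overlapping it.
patients_per_element <- function(mutations, elements) {
  out <- lapply(seq_len(nrow(elements)), function(i) {
    hit <- mutations$chr == elements$chr[i] &
      mutations$pos2 >= elements$start[i] &
      mutations$pos1 <= elements$end[i]
    sort(unique(mutations$patient[hit]))
  })
  names(out) <- elements$id
  out
}

# Toy data for illustration only
toy_mutations <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr3"),
  pos1 = c(100, 150, 155, 100),
  pos2 = c(100, 150, 155, 100),
  ref = c("A", "C", "G", "T"),
  alt = c("G", "T", "A", "C"),
  patient = c("p1", "p2", "p1", "p3"),
  stringsAsFactors = FALSE
)
toy_elements <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(90, 1),
  end = c(160, 50),
  id = c("e1", "e2"),
  stringsAsFactors = FALSE
)
res <- patients_per_element(toy_mutations, toy_elements)
print(res)
```

If an element ID spans several rows (for example, multiple exons), each row gets its own list entry here, so the entries would need to be merged per ID afterwards.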

how to create input mutation data

Hi,
I'm interested in using this tool for one of my datasets. I have one VCF file per chromosome. Is there any code or script that can be used to prepare the input mutation data (tabular, six columns: chr, pos1, pos2, ref allele, alt allele and patient name) from a VCF file?
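As a rough starting point, the fixed VCF columns (CHROM, POS, REF, ALT) can be reshaped into the six-column layout with base R alone. The sketch below makes several assumptions that should be checked against the package documentation: each VCF holds a single sample, so the patient ID is supplied by the caller; pos2 is taken as the last reference base covered by REF; and VCF-style indel alleles (with the leading anchor base) are left unchanged. The helper name `vcf_lines_to_mutations` is hypothetical.

```r
# Hypothetical helper (not part of ActiveDriverWGS): convert VCF text lines to
# a table with columns chr, pos1, pos2, ref, alt, patient.
vcf_lines_to_mutations <- function(vcf_lines, patient) {
  # Drop header lines (starting with "#") and empty lines
  body <- vcf_lines[!startsWith(vcf_lines, "#") & nzchar(vcf_lines)]
  fields <- strsplit(body, "\t", fixed = TRUE)
  chr <- vapply(fields, `[`, character(1), 1)
  pos <- as.integer(vapply(fields, `[`, character(1), 2))
  ref <- vapply(fields, `[`, character(1), 4)
  alt <- vapply(fields, `[`, character(1), 5)
  # Split multi-allelic records (ALT such as "A,T") into one row per allele
  alt_list <- strsplit(alt, ",", fixed = TRUE)
  n_alt <- lengths(alt_list)
  data.frame(
    chr = rep(chr, n_alt),
    pos1 = rep(pos, n_alt),
    # Assumption: pos2 is the last reference base covered by REF
    pos2 = rep(pos + nchar(ref) - 1L, n_alt),
    ref = rep(ref, n_alt),
    alt = as.character(unlist(alt_list)),
    patient = rep(patient, sum(n_alt)),
    stringsAsFactors = FALSE
  )
}

# Toy VCF content for illustration only
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
  "chr1\t2000\t.\tCT\tC\t.\tPASS\t.",
  "chr2\t3000\t.\tG\tA,T\t.\tPASS\t."
)
muts <- vcf_lines_to_mutations(vcf_lines, patient = "patient_1")
print(muts)
```

For real files, `readLines()` on each per-chromosome VCF followed by `rbind()` of the results would give one table per patient. Chromosome names may also need a "chr" prefix added if the VCF omits it.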

can ActiveDriverWGSR be used to analyze WES data?

Excuse me:
Given the name of the software, ActiveDriverWGSR, I wonder whether it can also be used to analyze WES data. I hope to hear from you.

Question_cancer_gene_sites

Hi,
First of all, thank you so much for sharing such an amazing analysis tool with us.
I believe ActiveDriverWGS can help and move my project forward.

I think I can prepare equivalent files of "cll_mutations" and "cancer_genes" by myself.
However, I have no idea how to generate the "cancer_gene_sites" file, and I do not understand why it is needed.
Could you please explain what cancer_gene_sites is and how I can create this file for my analysis?
I am interested in analyzing non-coding regions such as UTR, intron, promoter and enhancer regions.

Thank you.

object 'fit' not found

I got the following error:

> results = ActiveDriverWGS(mutations = wgs_variants,
+                 elements = peak_ac, 
+                 ref_genome = "hg38")
0 remove hypermut, n= 0 ,  0 %
hypermuted samples:   

reversing 0 positions
Removing  0  invalid SNVs & indels

Number of Elements with 0 Mutations:  149712 
Tests to do:  1035 
.Error in glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL,  : 
  object 'fit' not found
In addition: Warning messages:
1: In glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL,  :
  no observations informative at iteration 1
2: glm.fit: algorithm did not converge

one-mutation-per-patient requirement

The one-mutation-per-patient requirement is violated when the same candidate driver includes indels and SNVs in the same patient.
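To see whether this situation occurs in a given dataset, the input mutations can be counted per patient within an element. The sketch below only diagnoses the condition on the input table and says nothing about how the package handles it internally. The helper name `mutations_per_patient` is hypothetical.

```r
# Hypothetical helper (not part of ActiveDriverWGS): return the patients that
# have more than one mutation overlapping a given element.
mutations_per_patient <- function(mutations, chr, start, end) {
  hit <- mutations$chr == chr & mutations$pos2 >= start & mutations$pos1 <= end
  counts <- table(mutations$patient[hit])
  counts[counts > 1]
}

# Toy data: patient p1 has an SNV and a deletion in the same element
toy_mutations <- data.frame(
  chr = c("chr1", "chr1", "chr1"),
  pos1 = c(100, 120, 130),
  pos2 = c(100, 122, 130),
  ref = c("A", "CTG", "G"),
  alt = c("G", "C", "T"),
  patient = c("p1", "p1", "p2"),
  stringsAsFactors = FALSE
)
dup <- mutations_per_patient(toy_mutations, chr = "chr1", start = 50, end = 200)
print(dup)
```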

PTM sites

Hello, we are very interested in testing this algorithm on our dataset of >1000 WGS tumor-normal matched samples. Could you let us know where we can get the PTM sites that the algorithm uses? For example, I looked at
https://activedriverdb.org/download/, but none of the files there include genome positions (in GRCh38) for the 552,068 PTM sites. Is there another website or link from which I can download them? Thanks so much.

ActiveDriver not compatible with mutation data that only contains insertions

install.packages("ActiveDriverWGS") 
library(ActiveDriverWGS)

# Example elements
testEle <- data.frame("chr" = c("chr1", "chr1"), "start" = c(3195984, 3204562),
                      "end" = c(3205713, 3661579), "id" = c("uc007aet.1", "uc007aeu.1"), stringsAsFactors = FALSE)

# Example insertion-only mutation data
testMutIn <- data.frame("chr" = c("chr18", "chr8"), "pos1" = c(70287915, 32273077),
                        "pos2" = c(70287915, 32273077), "ref" = c("T", "T"),
                        "alt" = c("GA", "GA"), "patient" = c("10167", "10167"), stringsAsFactors = FALSE)

results = ActiveDriverWGS(mutations = testMutIn,
                          elements = testEle)

When ActiveDriverWGS is run with mutation data that consists only of insertions, it returns the following error:

Error in ActiveDriverWGS(mutations = testMutIn, elements = testEle) : 
   Reference and alternate alleles must be A, T, C or G

This error message comes from line 128, which checks whether the reference and alternate columns are "legal_DNA".

The test examples used are indeed legal DNA, as they are sequences consisting of "G" and "A"; however, the current check only allows single nucleotides.

# what line 128 is checking for
legal_dna = c('A', 'T', 'C', 'G')
> any(testMutIn$alt %in% legal_dna)
[1] FALSE

A possible correction to accommodate insertion data would be a grepl() call with a regular expression.

if (!(any(mutations$ref %in% legal_dna) && grepl("^[ATCG]+$", mutations$alt))) stop("Reference and alternate alleles must be A, T, C or G")

When ActiveDriverWGS is run with this quick fix, the following error message is returned.

 Error in .get_3n_context_of_mutations(mutations, reference) : 
   Dataset must contain SNVs

This error message comes from line 31 in format_muts.R.

The current ActiveDriverWGS does not run with insertion-only mutation data.
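A side note on the proposed check: it combines `any()` on the ref column with an unreduced `grepl()` vector on the alt column. With `any()`, one valid ref allele is enough to pass, and in R 4.3.0 and later `&&` raises an error when an operand has length greater than one. A vectorized alternative, sketched below, validates every allele and also labels each row as SNV or indel. The names `is_valid_allele` and `mutation_type` are hypothetical, and this is not the package's own check. It does not address the later "Dataset must contain SNVs" error.

```r
# Hypothetical validator (not the package's own check): TRUE for alleles made
# up only of one or more A, C, G or T characters.
is_valid_allele <- function(x) grepl("^[ACGT]+$", x)

# Alleles from the insertion-only example in this issue
testMutIn <- data.frame(
  ref = c("T", "T"),
  alt = c("GA", "GA"),
  stringsAsFactors = FALSE
)

# all() reduces each vector to a single TRUE/FALSE, so && is safe here
all_valid <- all(is_valid_allele(testMutIn$ref)) &&
  all(is_valid_allele(testMutIn$alt))

# Label each row: single-base ref and alt is an SNV, anything else an indel
mutation_type <- ifelse(
  nchar(testMutIn$ref) == 1 & nchar(testMutIn$alt) == 1,
  "SNV", "indel"
)
print(all_valid)
print(mutation_type)
```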

Adjacent elements falling within background flanks

How does ADWGS_test handle the case when an adjacent element sits in the background window? It seems like these are not excluded and if the adjacent element is itself hypermutated, it could inflate the background mutation rate. Does this happen in practice?
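To gauge how often this situation arises in a given element set, one can list, for each element, the other elements that overlap it or its flanking windows. The sketch below assumes flanks of `window_size` bases on each side, with the default of 50000 chosen to match the value used in the calls elsewhere on this page. It does not reflect how ADWGS_test treats such cases internally. The helper name `elements_in_flanks` is hypothetical.

```r
# Hypothetical helper (not part of ActiveDriverWGS): for each element, list
# the IDs of other elements overlapping the element or its flanking windows.
elements_in_flanks <- function(elements, window_size = 50000) {
  out <- lapply(seq_len(nrow(elements)), function(i) {
    flank_start <- elements$start[i] - window_size
    flank_end <- elements$end[i] + window_size
    hit <- elements$chr == elements$chr[i] &
      elements$end >= flank_start &
      elements$start <= flank_end
    hit[i] <- FALSE  # an element is not its own neighbour
    elements$id[hit]
  })
  names(out) <- elements$id
  out
}

# Toy data for illustration only
toy_elements <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100000, 120000, 500000, 100000),
  end = c(101000, 121000, 501000, 101000),
  id = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)
neighbours <- elements_in_flanks(toy_elements, window_size = 50000)
print(neighbours)
```

Combining this list with per-element mutation counts would show whether any neighbouring element is heavily mutated enough to matter for the background estimate.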

ActiveDriverWGS.res[[i]] = ActiveDriverWGS(mutations = ActiveDriverWGSInfo[[i]], elements = elements, sites = NULL, window_size = 50000, recovery.dir = paste(outDir, "ActiveDriverWGS_recovery", sep="/"), mc.cores = 4, ref_genome = paste0("hg", params$ucsc_genome_assembly))

The error is:

0 remove hypermut, n= 0 ,  0 %
hypermuted samples:   

reversing 0 positions
Removing  0  invalid SNVs & indels

Number of Elements with 0 Mutations:  14 
Tests to do:  1195 
Tests recovered:  928 
100  elements completed
200  elements completed
300  elements completed
400  elements completed
500  elements completed
600  elements completed
700  elements completed
800  elements completed
900  elements completed
.Error in glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL,  : 
  object 'fit' not found

Data

>  head(ActiveDriverWGSInfo)
$pdac
         chr      pos1      pos2 ref alt                            patient
     1: chr3 120002137 120002137   T   C p010_tumor-52fccd-somatic.pcgr.vcf
     2: chr4 125450687 125450687   G   T p010_tumor-52fccd-somatic.pcgr.vcf
     3: chr5  38502681  38502681   C   A p010_tumor-52fccd-somatic.pcgr.vcf
     4: chr6  89951675  89951675   C   T p010_tumor-52fccd-somatic.pcgr.vcf
     5: chr7  82822603  82822603   G   T p010_tumor-52fccd-somatic.pcgr.vcf
    ---                                                                    
196698: chrX  15795073  15795073   A   C                  SA533811_SP125786
196699: chrX  15800132  15800132   T   A                  SA569276_SP133702
196700: chrX  15803607  15803607   G   T                  SA558660_SP125807
196701: chr9  14398633  14398633   C   G                            CGPA229
196702: chr1 186680291 186680291   C   T                            CGPA234

> str(ActiveDriverWGSInfo)
List of 1
 $ pdac:Classes 'data.table' and 'data.frame':	196702 obs. of  6 variables:
  ..$ chr    : chr [1:196702] "chr3" "chr4" "chr5" "chr6" ...
  ..$ pos1   : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
  ..$ pos2   : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
  ..$ ref    : chr [1:196702] "T" "G" "C" "C" ...
  ..$ alt    : chr [1:196702] "C" "T" "A" "T" ...
  ..$ patient: chr [1:196702] "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" ...
  ..- attr(*, ".internal.selfref")=<externalptr>
  
  > head(elements)
     chr    start      end       id          GENEID
20 chr12   912077   990053    RAD52 ENSG00000002016
30 chr17 38869859 38921770    LASP1 ENSG00000002834
56 chr12 21468911 21501669    RECQL ENSG00000004700
66  chr7 96120220 96322147 SLC25A13 ENSG00000004864
74 chr19 18831938 18868236     UPF1 ENSG00000005007
78  chr7 27181510 27185223   HOXA11 ENSG00000005073

I have also done some sanity checking on both the mutation and elements data and found no NAs or empty values.

Can you please help figure out why this might be producing an error?
Many thanks.
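Two things stand out in the data shown above: the elements table carries an extra GENEID column beyond chr, start, end and id, and the position columns are stored as numeric rather than integer. Whether either contributes to the glm.fit error is not established here. As a cautious pre-flight step, the sketch below reduces an elements table to the four expected columns, coerces the coordinates to integer, and warns about missing or inverted coordinates. The helper name `prepare_elements` is hypothetical.

```r
# Hypothetical pre-flight helper (not part of ActiveDriverWGS): keep only the
# four expected element columns and normalise their types.
prepare_elements <- function(elements) {
  required <- c("chr", "start", "end", "id")
  stopifnot(all(required %in% colnames(elements)))
  e <- as.data.frame(elements, stringsAsFactors = FALSE)[, required]
  e$chr <- as.character(e$chr)
  e$id <- as.character(e$id)
  e$start <- as.integer(e$start)
  e$end <- as.integer(e$end)
  rownames(e) <- NULL
  bad <- is.na(e$start) | is.na(e$end) | e$start > e$end
  if (any(bad)) {
    warning(sum(bad), " element(s) have missing or inverted coordinates")
  }
  e
}

# Toy data mirroring the first rows of the elements table shown in the issue
toy_elements <- data.frame(
  chr = c("chr12", "chr17"),
  start = c(912077, 38869859),
  end = c(990053, 38921770),
  id = c("RAD52", "LASP1"),
  GENEID = c("ENSG00000002016", "ENSG00000002834"),
  stringsAsFactors = FALSE
)
clean_elements <- prepare_elements(toy_elements)
str(clean_elements)
```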