maxwellsh / digdriver

Flexible and efficient tests for evidence of positive selection anywhere in the cancer genome.

License: BSD 3-Clause "New" or "Revised" License

Python 96.63% R 3.28% Shell 0.09%

digdriver's Issues

Error when using Dig to annotate the mutation bed file

Hello,

I am getting an error when using Dig to annotate the mutation bed file created in step 1. Could you please help me find out what's wrong?

Best regards,
Gabriela

DigPreprocess.py annotMutationFile ./input.bed ./reference_hg19_Homo_sapiens_assembly19.fasta output

Adding mutation function
dyld[49721]: Library not loaded: @rpath/libgfortran.3.dylib
Referenced from: <34EA1D6C-7BD3-38A2-9869-5FCB9627BA35> /Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libRblas.dylib
Reason: tried: '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file)
Adding mutation context
Reading in mutation file
Traceback (most recent call last):
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 366, in <module>
    args.func(args)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 117, in annotMutationFile
    addMutationContext(args)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 82, in addMutationContext
    df_mut = mutation_tools.read_mutation_file(args.fmut, drop_duplicates=False)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/lib/python3.7/site-packages/DIGDriver/data_tools/mutation_tools.py", line 48, in read_mutation_file
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'output.h5'
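
A possible first diagnostic, offered as a sketch rather than a confirmed fix: the dyld message says that R's libRblas.dylib could not load libgfortran.3.dylib from anywhere in the conda environment. The later FileNotFoundError for 'output.h5' may simply follow from that first step failing to write its output, but that is an assumption the log alone does not confirm. The helper below (the name `find_library` is hypothetical, not part of Dig) searches an environment's lib directory for the missing file, so you can tell whether the library is absent or merely in an unexpected location.

```python
import os
from pathlib import Path


def find_library(env_prefix, lib_name="libgfortran.3.dylib"):
    """Return all files named lib_name found anywhere under <env_prefix>/lib."""
    lib_dir = Path(env_prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    # rglob also matches files that sit directly in lib_dir.
    return sorted(lib_dir.rglob(lib_name))


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the current directory.
    prefix = os.environ.get("CONDA_PREFIX", ".")
    hits = find_library(prefix)
    if hits:
        for path in hits:
            print("found:", path)
    else:
        print("libgfortran.3.dylib not found under", Path(prefix) / "lib")
```

Run it with the digdriver environment activated. An empty result would be consistent with the dyld message above, in which case the R installation inside the environment is the thing to repair before re-running annotMutationFile.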

Request for a qualified DigPreprocess.py file

The mutation file cannot be converted into a Dig file because of the numerous bugs in your code! We hope you can take responsibility for your work and stay faithful to the accuracy of scientific work.

Documentation Uncertainties

... test for positive selection anywhere in the genome in any cohort of sequenced tumor samples.

Does that include cohorts that are not among the 37 PCAWG cancer types?

Genic Model
... Details are provided here (LINK).
Elements Model
... Details are provided here (LINK).
Sites Model
... Details are provided here (LINK).

What should these links point to?

You will need the following files from our data portal:
hg19.fasta or an equivalent hg19 reference fasta file.

What if the user has BAM files aligned to hg38?

DigDriver.py geneDriver Analyzes 19,210 autosomal genes for burdens of mutations in the following categories: synonymous SNVs, missense SNVs, nonsense (stop-gained) SNVs, canonical splice SNVs, truncating SNVs (nonsense & canonical splice), nonsynonymous SNVs (missense, nonsense, & canonical splice), and indels.
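
The groupings in that description overlap, so it may help to write them out explicitly. The sketch below only restates the quoted sentence as a data structure; the labels and the function name `categories_for` are illustrative and are not identifiers taken from DigDriver.py.

```python
# Base mutation classes named in the quoted documentation.
BASE_CLASSES = {"synonymous", "missense", "nonsense", "splice", "indel"}

# Composite categories and the base classes they pool, as quoted.
COMPOSITE = {
    "truncating": {"nonsense", "splice"},
    "nonsynonymous": {"missense", "nonsense", "splice"},
}


def categories_for(base_class):
    """List every burden category that a mutation of base_class counts toward."""
    if base_class not in BASE_CLASSES:
        raise ValueError(f"unknown mutation class: {base_class}")
    cats = {base_class}
    for name, members in COMPOSITE.items():
        if base_class in members:
            cats.add(name)
    return sorted(cats)


print(categories_for("nonsense"))
print(categories_for("synonymous"))
```

Under this reading, a nonsense SNV contributes to three of the listed categories (nonsense, truncating, nonsynonymous), while a synonymous SNV or an indel contributes to exactly one.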

How about lncRNA?

Why use CNN + GP?

This might be a naive question, but I'm wondering about your rationale behind using the (CNN + GP) module to obtain estimates for mean regional mutation rates and their associated regional standard deviations.

Why didn't you just estimate these directly from your mutation data via simple calculation and use these estimates to derive your generative model? Why go through the trouble of predicting these from epigenetic data when you can just directly estimate them by computing the sample mean and standard error per region?
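
For concreteness, the direct estimate described in this question could be sketched as below. This illustrates the asker's proposed alternative, not the method Dig uses; the input layout (a dict mapping a region name to per-sample mutation counts) and the function name `direct_region_estimates` are assumptions made for the example.

```python
from math import isnan, sqrt
from statistics import mean, stdev


def direct_region_estimates(counts_by_region):
    """Per-region sample mean and standard error of the mean.

    counts_by_region maps a region name to a list of per-sample mutation counts.
    """
    estimates = {}
    for region, counts in counts_by_region.items():
        n = len(counts)
        mu = mean(counts)
        # The sample standard deviation needs at least two observations.
        se = stdev(counts) / sqrt(n) if n > 1 else float("nan")
        estimates[region] = (mu, se)
    return estimates


example = {"chr1:0-10000": [0, 2, 4], "chr1:10000-20000": [1]}
for region, (mu, se) in direct_region_estimates(example).items():
    label = "undefined" if isnan(se) else f"{se:.3f}"
    print(region, "mean =", mu, "standard error =", label)
```

Note that the standard error is undefined for a region with a single sample and is itself imprecise when only a handful of samples are available. That is a general property of small-sample estimates, not a statement about Dig.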

How did you make predictions for genes of length 1-1.5kb?

Super interesting work!

I have a question about how you made predictions for 1-1.5kb genes.

In your paper, you said:

Compared to existing methods designed specifically to analyze tiled regions, coding sequence, and non-coding elements in which synonymous mutations cannot be used to calibrate mutation rate models (for example, enhancers and non-coding RNAs), Dig explained the most variation of SNV counts within 10-kb regions in 14 of 16 cohorts, of non-synonymous SNV counts in 16 of 16 cohorts and of enhancer and non-coding RNA SNV counts in 15 of 16 cohorts, respectively (Fig. 1d, Table 1, Supplementary Fig. 2 and Supplementary Tables 4–6).

In Table 1, you say that you evaluated the performance of your deep learning model on genes with coding sequences 1-1.5kb in length.

As I understand it, your model is trained to predict mutation rates/counts in 10kb (or 1Mb) regions. So how exactly did you obtain predictions for 1-1.5kb regions?

Thanks in advance!

Which matrix did the columns that you used for dimensionality reduction come from?

When you constructed feature maps, what did you consider the "attention matrix" to be? That is, which matrix did you use for dimensionality reduction: the original epigenetic feature matrix, the attention weights matrix, or the matrix resulting from their element-wise multiplication? This is not clear to me from the supplementary material.

pretrain_key

Hello,

I'm trying to run DigDriver.py elementDriver using the Digestive tract tumors bed file and the UTR5 with splice annotation bed file. According to the help page of the command, I should specify a 'pretrain_key' (Name of key used to store pretrained model). What should I fill in for this input argument? I assume this key would be known if I had pretrained the model myself, but I'm using the pretrained model that is provided through the data portal.

I'm sorry if this is explained somewhere in a relevant Wiki section but I couldn't find it.

Kind regards,
Tom
