maxwellsh / digdriver
Flexible and efficient tests for evidence of positive selection anywhere in the cancer genome.
License: BSD 3-Clause "New" or "Revised" License
Hello,
I am getting an error when using Dig to annotate the mutation bed file created in step 1. Could you please help me find out what's wrong?
Best regards,
Gabriela
DigPreprocess.py annotMutationFile ./input.bed ./reference_hg19_Homo_sapiens_assembly19.fasta output
Adding mutation function
dyld[49721]: Library not loaded: @rpath/libgfortran.3.dylib
Referenced from: <34EA1D6C-7BD3-38A2-9869-5FCB9627BA35> /Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libRblas.dylib
Reason: tried: '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file)
Adding mutation context
Reading in mutation file
Traceback (most recent call last):
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 366, in <module>
    args.func(args)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 117, in annotMutationFile
    addMutationContext(args)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 82, in addMutationContext
    df_mut = mutation_tools.read_mutation_file(args.fmut, drop_duplicates=False)
  File "/Users/gv9/opt/anaconda3/envs/digdriver/lib/python3.7/site-packages/DIGDriver/data_tools/mutation_tools.py", line 48, in read_mutation_file
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'output.h5'
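The log above reports two separate problems: dyld cannot load libgfortran.3.dylib from the conda environment, and Python later cannot open 'output.h5'. A minimal pre-flight sketch that checks for both conditions before running the command might look like the following. The function name `preflight` is hypothetical and not part of DIGDriver; the two library directories are the ones dyld lists in the log. It only reports what is missing and does not attempt to explain why.

```python
from pathlib import Path


def preflight(env_prefix, library_name, input_paths):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper, not part of DIGDriver.
    """
    problems = []
    # Directories taken from the dyld search list in the log above.
    lib_dirs = [
        Path(env_prefix) / "lib",
        Path(env_prefix) / "lib" / "R" / "lib",
    ]
    if not any((d / library_name).is_file() for d in lib_dirs):
        problems.append(f"{library_name} not found under {env_prefix}")
    # Every file the command is expected to read must already exist.
    for p in input_paths:
        if not Path(p).is_file():
            problems.append(f"missing file: {p}")
    return problems


if __name__ == "__main__":
    # Values copied from the log; adjust them to your own environment.
    for problem in preflight(
        "/Users/gv9/opt/anaconda3/envs/digdriver",
        "libgfortran.3.dylib",
        ["output.h5"],
    ):
        print(problem)
```

An empty result means only that the library file and the listed inputs exist; it does not guarantee that the annotation step will succeed.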
The mutation file cannot be translated into a Dig file because of the numerous bugs in your code! We hope you can be responsible for your work and faithful to the accuracy of scientific work.
... test for positive selection anywhere in the genome in any cohort of sequenced tumor samples.
Does that even include cohorts which are not any of the 37 PCAWG cancer types?
Genic Model
... Details are provided here (LINK).
Elements Model
... Details are provided here (LINK).
Sites Model
... Details are provided here (LINK).
What should the links be to?
You will need the following files from our data portal:
hg19.fasta or an equivalent hg19 reference fasta file.
What if the user has BAM files aligned to hg38?
DigDriver.py
geneDriver Analyzes 19,210 autosomal genes for burdens of mutations in the following categories: synonymous SNVs, missense SNVs, nonsense (stop-gained) SNVs, canonical splice SNVs, truncating SNVs (nonsense & canonical splice), nonsynonymous SNVs (missense, nonsense, & canonical splice), and indels.
How about lncRNA?
This might be a naive question, but I'm wondering about your rationale behind using the (CNN + GP) module to obtain estimates for mean regional mutation rates and their associated regional standard deviations.
Why didn't you just estimate these directly from your mutation data via simple calculation and use these estimates to derive your generative model? Why go through the trouble of predicting these from epigenetic data when you can just directly estimate them by computing the sample mean and standard error per region?
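For concreteness, the direct calculation described in this question can be sketched as below. This is only an illustration of the arithmetic being proposed (per-region sample mean and standard error across samples), using made-up toy counts; the function name and data layout are assumptions, and it does not represent the authors' method or their rationale.

```python
import math
import statistics


def per_region_mean_and_se(counts_by_sample):
    """Compute (mean, standard error) of mutation counts for each region.

    counts_by_sample: list of per-sample lists, one count per region.
    Hypothetical layout chosen for illustration only.
    """
    n_samples = len(counts_by_sample)
    n_regions = len(counts_by_sample[0])
    results = []
    for r in range(n_regions):
        column = [sample[r] for sample in counts_by_sample]
        mean = statistics.mean(column)
        # Standard error = sample standard deviation / sqrt(number of samples).
        se = statistics.stdev(column) / math.sqrt(n_samples)
        results.append((mean, se))
    return results


# Toy data: 3 samples (rows) by 3 regions (columns); values are invented.
toy_counts = [
    [2, 0, 5],
    [4, 0, 3],
    [3, 0, 7],
]
estimates = per_region_mean_and_se(toy_counts)
for region, (mean, se) in enumerate(estimates):
    print(f"region {region}: mean={mean:.3f}, se={se:.3f}")
```

In this toy input, the middle region has no observed mutations in any sample, so the direct estimate of both its mean and its standard error is exactly zero.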
Super interesting work!
I have a question about how you made predictions for 1-1.5kb genes.
In your paper, you said:
Compared to existing methods designed specifically to analyze tiled regions, coding sequence, and non-coding elements in which synonymous mutations cannot be used to calibrate mutation rate models (for example, enhancers and non-coding RNAs), Dig explained the most variation of SNV counts within 10-kb regions in 14 of 16 cohorts, of non-synonymous SNV counts in 16 of 16 cohorts and of enhancer and non-coding RNA SNV counts in 15 of 16 cohorts, respectively (Fig. 1d, Table 1, Supplementary Fig. 2 and Supplementary Tables 4–6).
In Table 1, you say that you evaluated performance of your deep learning model on genes with coding sequence 1-1.5kb in length.
Based on what I understand, your model is trained to predict mutation rates/counts in 10kb (or 1Mb) regions. So how exactly did you obtain predictions for 1-1.5 kb regions?
Thanks in advance!
When you constructed feature maps, what did you consider the "attention matrix" to be? That is, which matrix did you use for dimensionality reduction? Was it the original epigenetic feature matrix, the attention weights matrix, or the matrix resulting from their element-wise multiplication? This is not clear to me from the supplementary material.
Hello,
I'm trying to run DigDriver.py elementDriver using the Digestive tract tumors bed file and the UTR5 with splice annotation bed file. According to the help page of the command, I should specify a 'pretrain_key' (Name of key used to strore pretrained model). What should I fill in for this input argument? I assume this 'key' would be known if I had pretrained the model myself, but I'm using the pretrained model that is provided through the data portal.
I'm sorry if this is explained somewhere in a relevant Wiki section but I couldn't find it.
Kind regards,
Tom