digdriver's Introduction

Welcome to Dig

Dig builds genome-wide maps of somatic mutation rates in cancer genomes and allows any set of candidate mutations to be tested for an excess of observed mutations compared to the number expected based on the neutral mutation rate.
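As a rough illustration of what "an excess of observed mutations" means, the sketch below computes an upper-tail p-value for an observed mutation count against an expected count. It assumes a plain Poisson model, which is a simplification chosen for this example and not necessarily the distribution Dig itself uses. The function name poisson_excess_pvalue and the example counts are hypothetical.

```python
import math


def poisson_excess_pvalue(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Illustrative assumption: mutation counts under neutrality are Poisson
    with mean `expected`. This is a simplified stand-in, not Dig's model.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical element: 10 mutations observed where 2 are expected.
p_value = poisson_excess_pvalue(observed=10, expected=2.0)
print(f"upper-tail p-value: {p_value:.3g}")
```

A small p-value here means the element carries more mutations than the assumed neutral rate would explain; how the expected count is obtained is the part Dig's mutation maps are meant to supply.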

Web-browseable mutation maps

Want to visually explore somatic mutation rates across the genome? Check out our genome browser with maps of predicted and observed mutation counts for 37 types of cancer.

Getting started

See our wiki for installation instructions and tutorials.

Data files

All necessary data files are available from our data portal.

Citation

Want to learn more about Dig and its biological applications? Check out our preprint Sherman et al. 2021.

Really want to get into the weeds of the deep-learning model? Check out our ICLR paper.

Please cite both papers if you make use of our resources.

digdriver's Issues

Why use CNN + GP?

This might be a naive question, but I'm wondering about your rationale behind using the (CNN + GP) module to obtain estimates for mean regional mutation rates and their associated regional standard deviations.

Why didn't you just estimate these directly from your mutation data via simple calculation and use these estimates to derive your generative model? Why go through the trouble of predicting these from epigenetic data when you can just directly estimate them by computing the sample mean and standard error per region?
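For concreteness, the direct estimate described in this question can be sketched as follows. This is only an illustration of "sample mean and standard error per region" under the assumption that each region has one mutation count per tumor sample; the function name and the example counts are hypothetical and do not come from the Dig code base.

```python
import math


def region_mean_and_se(counts_per_sample):
    """Return (sample mean, standard error of the mean) for one region.

    `counts_per_sample` holds one mutation count per tumor sample
    (an assumed input layout for this sketch).
    """
    n = len(counts_per_sample)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    mean = sum(counts_per_sample) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((c - mean) ** 2 for c in counts_per_sample) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical counts for one region across four samples.
mean, se = region_mean_and_se([0, 1, 0, 3])
print(f"mean={mean:.3f}, standard error={se:.3f}")
```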

pretrain_key

Hello,

I'm trying to run DigDriver.py elementDriver using the Digestive tract tumors bed file and the UTR5 with splice annotation bed file. According to the help page of the command, I should specify a 'pretrain_key' (Name of key used to store pretrained model). What should I fill in for this input argument? I assume this 'key' is known if I pretrain the model myself, but I'm using the pretrained model that is provided through the data portal.

I'm sorry if this is explained somewhere in a relevant Wiki section but I couldn't find it.

kind regards,
Tom

How did you make predictions for genes of length 1-1.5kb

Super interesting work!

I have a question about how you made predictions for 1-1.5kb genes.

In your paper, you said:

Compared to existing methods designed specifically to analyze tiled regions, coding sequence, and non-coding elements in which synonymous mutations cannot be used to calibrate mutation rate models (for example, enhancers and non-coding RNAs), Dig explained the most variation of SNV counts within 10-kb regions in 14 of 16 cohorts, of non-synonymous SNV counts in 16 of 16 cohorts and of enhancer and non-coding RNA SNV counts in 15 of 16 cohorts, respectively (Fig. 1d, Table 1, Supplementary Fig. 2 and Supplementary Tables 4–6).

In Table 1, you say that you evaluated the performance of your deep learning model on genes with coding sequences 1-1.5kb in length.

Based on what I understand, your model is trained to predict mutation rates/counts in 10kb (or 1Mb) regions. So how exactly did you obtain predictions for 1-1.5 kb regions?

Thanks in advance!

Which matrix did the columns that you used for dimensionality reduction come from?

When you constructed feature maps, what did you consider the "attention matrix" to be? That is, which matrix did you use for dimensionality reduction? Was it the original epigenetic feature matrix, the attention weights matrix, or the matrix resulting from their element-wise multiplication? This isn't clear to me from the supplementary material.

Error when using Dig to annotate the mutation bed file

Hello,

I am getting an error when using Dig to annotate the mutation bed file created in step 1. Could you please help me find out what's wrong?

Best regards,
Gabriela

DigPreprocess.py annotMutationFile ./input.bed ./reference_hg19_Homo_sapiens_assembly19.fasta output

Adding mutation function
dyld[49721]: Library not loaded: @rpath/libgfortran.3.dylib
Referenced from: <34EA1D6C-7BD3-38A2-9869-5FCB9627BA35> /Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libRblas.dylib
Reason: tried: '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/conda/conda-bld/r-base_1536076838216/work/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/bin/exec/../../../libgfortran.3.dylib' (no such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/R/lib/libgfortran.3.dylib' (no 
such file), '/Users/gv9/opt/anaconda3/envs/digdriver/lib/libgfortran.3.dylib' (no such file)
Adding mutation context
Reading in mutation file
Traceback (most recent call last):
File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 366, in <module>
args.func(args)
File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 117, in annotMutationFile
addMutationContext(args)
File "/Users/gv9/opt/anaconda3/envs/digdriver/bin/DigPreprocess.py", line 82, in addMutationContext
df_mut = mutation_tools.read_mutation_file(args.fmut, drop_duplicates=False)
File "/Users/gv9/opt/anaconda3/envs/digdriver/lib/python3.7/site-packages/DIGDriver/data_tools/mutation_tools.py", line 48, in read_mutation_file
with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'output.h5'

Documentation Uncertainties

... test for positive selection anywhere in the genome in any cohort of sequenced tumor samples.

Does that even include cohorts which are not any of the 37 PCAWG cancer types?

Genic Model
... Details are provided here (LINK).
Elements Model
... Details are provided here (LINK).
Sites Model
... Details are provided here (LINK).

What should the links be to?

You will need the following files from our data portal:
hg19.fasta or an equivalent hg19 reference fasta file.

What if the user has BAM files aligned to hg38?

DigDriver.py geneDriver Analyzes 19,210 autosomal genes for burdens of mutations in the following categories: synonymous SNVs, missense SNVs, nonsense (stop-gained) SNVs, canonical splice SNVs, truncating SNVs (nonsense & canonical splice), nonsynonymous SNVs (missense, nonsense, & canonical splice), and indels.

How about lncRNA?
