xinhe-lab / mirage

Mixture model based Rare variant Analysis on Genes

Home Page: https://xinhe-lab.github.io/mirage

R 35.08% HTML 64.92%

mirage's People

crerecombinase xsun1229

mirage's Issues

Annotation group documentation

I got this email from a user:

In my data analysis, I got the result by using annovar anova. I found that you annotate with several popular programs including PolyPhen, CADD and SIFT. Should I do this step before the mirage analysis? I read your mirage code; the input data has four columns (1: variant, 2: No. of variant in cases, 3: No. of variant in controls, 4: variant group index), but I do not know where these columns come from. Can you teach me?

@han16 we wrote on this page "Variant groups can be user defined, usually depending on its annotations." but it seems too vague to be helpful to users. Is it correct if we change it to the following:

"Variant groups can be user defined, usually depending on its annotations. For example, in Han et al (2019+), we label as group 2 those variants with PolyPhen score greater than 0.957, CADD score in the top 10%, or SIFT score < 0.05; other variants are labelled group 1."

Is this correct?
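
To make the proposed wording concrete, here is a minimal R sketch of how such a group index could be derived from annotation scores. The thresholds are the ones proposed in this issue and have not been confirmed here. The table `annot`, its column names, and the choice to define "top 10%" as the 90th percentile of the CADD scores in the data are all assumptions for illustration; none of them are part of the mirage package.

```r
# Hypothetical annotation table: one row per variant. Column names are
# illustrative and not part of the mirage package.
annot <- data.frame(
  variant  = c("v1", "v2", "v3", "v4", "v5"),
  polyphen = c(0.99, 0.10, 0.50, 0.20, 0.30),
  cadd     = c(12, 35, 8, 15, 10),
  sift     = c(0.20, 0.30, 0.01, 0.40, 0.50),
  stringsAsFactors = FALSE
)

# Assumption: "top 10%" means at or above the 90th percentile of the
# CADD scores observed in this table.
cadd_cutoff <- quantile(annot$cadd, probs = 0.9, na.rm = TRUE)

# A variant is flagged if any one of the three criteria is met.
damaging <- annot$polyphen > 0.957 |
  annot$cadd >= cadd_cutoff |
  annot$sift < 0.05

# A variant with missing scores and no passing score goes to group 1.
damaging[is.na(damaging)] <- FALSE

# Group 2 = flagged by at least one annotation, group 1 = everything else.
annot$group <- ifelse(damaging, 2L, 1L)
```

The resulting `annot$group` column is what would go into the fourth column of the input data.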

Is it true to obtain the NO.case and NO.control?

#NO.case
#grep 0/1
count_case_01 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('0/1',x))))
rownames(count_case_01) <- tmp_vcf_case_data[,1]
colnames(count_case_01) <- "count_01"
#grep 1/2
count_case_12 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/2',x))))
rownames(count_case_12) <- tmp_vcf_case_data[,1]
colnames(count_case_12) <- "count_12"

#grep 1/1

count_case_11 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('1/1',x))))
rownames(count_case_11) <- tmp_vcf_case_data[,1]
colnames(count_case_11) <- "count_11"

#grep 2/2
count_case_22 <- data.frame(apply(tmp_vcf_case_data,1,function(x) length(grep('2/2',x))))
rownames(count_case_22) <- tmp_vcf_case_data[,1]
colnames(count_case_22) <- "count_22"

#combine four data for case
count_case <- cbind(count_case_01,count_case_11,count_case_12,count_case_22)
count_case[,5] <- 2*rowSums(count_case[,2:4])+1*count_case[,1]
colnames(count_case)[5] <- "N.case"

#NO.control
#grep 0/1
count_contro_01 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('0/1',x))))
rownames(count_contro_01) <- tmp_vcf_control_data[,1]
colnames(count_contro_01) <- "count_01"

#grep 1/2
count_contro_12 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/2',x))))
rownames(count_contro_12) <- tmp_vcf_control_data[,1]
colnames(count_contro_12) <- "count_12"

#grep 1/1

count_contro_11 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('1/1',x))))
rownames(count_contro_11) <- tmp_vcf_control_data[,1]
colnames(count_contro_11) <- "count_11"

#grep 2/2
count_contro_22 <- data.frame(apply(tmp_vcf_control_data,1,function(x) length(grep('2/2',x))))
rownames(count_contro_22) <- tmp_vcf_control_data[,1]
colnames(count_contro_22) <- "count_22"
#combine four data for control
count_cantro <- cbind(count_contro_01,count_contro_11,count_contro_12,count_contro_22)
count_cantro[,5] <- 2*rowSums(count_cantro[,2:4])+1*count_cantro[,1]
colnames(count_cantro)[5] <- "N.control"
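
For comparison, the same counting rule can be written as one helper that is applied to cases and controls alike. This is only a sketch that reproduces the weighting used in the code above (a 0/1 genotype adds 1; 1/1, 1/2 and 2/2 each add 2); it does not settle whether that weighting is what mirage expects. The function name `count_alleles` and the layout of `geno` (variant ID in column 1, one genotype field per sample afterwards) are assumptions. Unlike `grep('0/1', x)`, the patterns here are anchored at the start of each genotype field and the ID column is excluded from matching.

```r
# Count alleles per variant with the same weighting as the code above.
# `geno`: hypothetical data frame, column 1 = variant ID, remaining
# columns = genotype fields such as "0/1:12,3:15".
count_alleles <- function(geno) {
  gt <- as.matrix(geno[, -1, drop = FALSE])
  # grepl() drops dimensions, so rebuild a matrix of the same shape.
  has_gt <- function(pattern) matrix(grepl(pattern, gt), nrow = nrow(gt))
  n_het <- rowSums(has_gt("^0/1"))            # genotypes weighted 1
  n_two <- rowSums(has_gt("^(1/1|1/2|2/2)"))  # genotypes weighted 2
  setNames(n_het + 2 * n_two, geno[[1]])
}

# Small made-up example with three samples and two variants.
geno <- data.frame(
  id = c("varA", "varB"),
  s1 = c("0/1:10", "0/0:8"),
  s2 = c("1/1:9", "1/2:7"),
  s3 = c("0/0:5", "./."),
  stringsAsFactors = FALSE
)
n_example <- count_alleles(geno)
```

Calling `count_alleles(tmp_vcf_case_data)` and `count_alleles(tmp_vcf_control_data)` would then give the two count vectors, assuming both tables follow the layout described above.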

how to obtain the NO.case and NO.control

Dear,
I am still confused about how to obtain No.case and No.control. You said: "(No.case, how many times the variant appears in cases; No.contr, how many times the variant appears in controls; you can compute these quantities from your data)". From which file can I get this information? Can you give me an example? Thank you!
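
As a rough illustration of how the four columns fit together, the sketch below assembles them into one data frame once the per-variant counts and group labels are available. All values are made up, and the column names `ID` and `group.index` are hypothetical; only `No.case` and `No.contr` follow the names quoted in this thread. The counts themselves would come from the user's own genotype data (for example, a VCF split into case and control samples).

```r
# Hypothetical per-variant counts and group labels, named by variant ID.
case_counts    <- c(varA = 3, varB = 0, varC = 1)
control_counts <- c(varA = 1, varB = 2, varC = 0)
group_index    <- c(varA = 2L, varB = 1L, varC = 1L)

# Align everything on the same variant IDs before combining.
ids <- names(case_counts)
input_data <- data.frame(
  ID          = ids,                          # column 1: variant
  No.case     = unname(case_counts[ids]),     # column 2: count in cases
  No.contr    = unname(control_counts[ids]),  # column 3: count in controls
  group.index = unname(group_index[ids]),     # column 4: variant group
  stringsAsFactors = FALSE
)
```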

Example of gene level FDR & multiple testing in genome-wide scan

@han16 I was asked offline by @linnanqia, who has fixed a bug in her code and got what seem to be encouraging results (log(BF) of about 5 for some genes, which seems to make sense). However, our tutorial does not explain how to interpret the results; in particular, how multiple testing is handled: how the gene-level posterior probability should be interpreted in terms of FDR, and what threshold to use.

Could you kindly update the tutorial adding a section on interpreting the results? Thanks!
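
One common way to turn gene-level posterior probabilities into an FDR-style threshold is the Bayesian FDR: rank genes by posterior probability and, for each cutoff, average the posterior probabilities of being a false discovery among the genes selected so far. The R sketch below shows that generic calculation. It is not confirmed to be the procedure the tutorial will recommend, and `bayesian_fdr`, `post_prob`, and the gene names are hypothetical.

```r
# Estimate a Bayesian FDR for each gene from posterior probabilities.
# `post_prob`: hypothetical named vector, P(gene is a risk gene | data).
bayesian_fdr <- function(post_prob) {
  ord <- order(post_prob, decreasing = TRUE)
  # Among the top-k genes, the expected fraction of false discoveries
  # is the running mean of (1 - posterior probability).
  fdr_sorted <- cumsum(1 - post_prob[ord]) / seq_along(ord)
  fdr <- numeric(length(post_prob))
  fdr[ord] <- fdr_sorted          # put values back in the input order
  names(fdr) <- names(post_prob)
  fdr
}

post_prob <- c(geneA = 0.99, geneB = 0.90, geneC = 0.60, geneD = 0.10)
fdr <- bayesian_fdr(post_prob)

# Genes reported at an estimated FDR below 5%.
significant <- names(fdr)[fdr < 0.05]
```

With this approach the threshold is set on the estimated FDR rather than on the posterior probability or Bayes factor directly, so the cutoff adapts to how many genes carry strong evidence.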
