gwasglue's Introduction

gwasglue

Lifecycle: experimental Codecov test coverage R build status

Under development

This R package serves as a conduit between packages that can read or query GWAS summary data, and packages that can analyse GWAS summary data. Here is where it lies in the general ecosystem of GWAS data and analysis:

[Figure: schematic of where gwasglue sits in the ecosystem of GWAS data and analysis packages]

The figure above depicts a set of packages that we plan to connect to. Here is a list of what has been done and what is still to do:

Data sources

Finemapping

Colocalisation

Mendelian randomization

Visualisation

Installation

You can install the development version of gwasglue with:

devtools::install_github("mrcieu/gwasglue")

Usage

See the vignettes and other documentation at https://mrcieu.github.io/gwasglue.

Reference datasets

Example GWAS VCF (GIANT 2010 BMI):

Updated 1000 genomes LD reference panels (multiple populations):

1kg European reference panel for LD (legacy):

1kg vcf harmonised against human genome reference:

Contributing to the resource

For any <analysis> package we create a new file called R/<analysis>.r which contains two functions:

  • gwasvcf_to_<analysis>
  • ieugwasr_to_<analysis>

For an example, see the R/TwoSampleMR.r file, which contains the functions gwasvcf_to_TwoSampleMR and ieugwasr_to_TwoSampleMR.
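
As a rough illustration of that layout, the sketch below shows what a hypothetical R/myanalysis.r could contain. Everything in it is an assumption made for illustration: the package name "myanalysis", the target columns (snp, beta, se), the helper format_myanalysis, and the choice to accept plain data frames. The gwasvcf-style column names (ID, ES, SE) and ieugwasr-style column names (rsid, beta, se) are also assumed. The real functions in R/TwoSampleMR.r take VCF objects and dataset IDs rather than data frames.

```r
# Hypothetical contents of R/myanalysis.r; "myanalysis" is a placeholder name.

# Shared formatter: the column set expected by the imaginary analysis package.
format_myanalysis <- function(snp, beta, se) {
  data.frame(snp = snp, beta = beta, se = se, stringsAsFactors = FALSE)
}

# Input assumed to be a data frame with gwasvcf-style columns ID, ES, SE.
gwasvcf_to_myanalysis <- function(dat) {
  stopifnot(all(c("ID", "ES", "SE") %in% names(dat)))
  format_myanalysis(dat$ID, dat$ES, dat$SE)
}

# Input assumed to be a data frame with ieugwasr-style columns rsid, beta, se.
ieugwasr_to_myanalysis <- function(dat) {
  stopifnot(all(c("rsid", "beta", "se") %in% names(dat)))
  format_myanalysis(dat$rsid, dat$beta, dat$se)
}

# Toy inputs standing in for the two data sources.
vcf_like <- data.frame(ID = c("rs1", "rs2"), ES = c(0.1, -0.2),
                       SE = c(0.01, 0.02), stringsAsFactors = FALSE)
api_like <- data.frame(rsid = c("rs1", "rs2"), beta = c(0.1, -0.2),
                       se = c(0.01, 0.02), stringsAsFactors = FALSE)
a <- gwasvcf_to_myanalysis(vcf_like)
b <- ieugwasr_to_myanalysis(api_like)
```

Routing both entry points through one formatter keeps the output identical regardless of where the summary data came from.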

gwasglue's People

Contributors

explodecomputer

gwasglue's Issues

function gwasvcf_to_coloc small typo bug.

Hi,
in the function gwasvcf_to_coloc,
the code chunk :
else if (type2 == "cc") {
out2$s <- mean(tab1$NC/tab1$SS, na.rm = TRUE)
}

should be replaced by
if (type2 == "cc") {
out2$s <- mean(tab2$NC/tab2$SS, na.rm = TRUE)
}

#1 Removing the else statement allows colocalisation between two case-control traits.
#2 Replacing tab1 with tab2 takes the information from the correct object.
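
The proposed fix can be checked on toy data. The sketch below is not the body of gwasvcf_to_coloc; the names out1 and type1 are assumed by symmetry with out2 and type2, and the tables are made up. It only shows that two independent if statements let both traits receive a case fraction, each from its own table.

```r
# Toy stand-ins for the per-trait tables; NC = number of cases, SS = sample size.
tab1 <- data.frame(NC = c(100, 100), SS = c(1000, 1000))
tab2 <- data.frame(NC = c(300, 300), SS = c(1000, 1000))
type1 <- "cc"
type2 <- "cc"
out1 <- list(type = type1)
out2 <- list(type = type2)

# Independent checks: each trait gets its own case fraction from its own table.
if (type1 == "cc") {
  out1$s <- mean(tab1$NC / tab1$SS, na.rm = TRUE)
}
if (type2 == "cc") {
  out2$s <- mean(tab2$NC / tab2$SS, na.rm = TRUE)
}
```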

Error in gwasvcf::vcf_to_granges(.) : Not all inherits(vcf, c("ExpandedVCF", "CollapsedVCF")) are TRUE.

vcfRT <- vcfR::read.vcfR("01_raw_data/confounder/ukb-b-19953.vcf.gz")
Scanning file to determine attributes.
File attributes:
meta lines: 107
header_line: 108
variant count: 9851866
column count: 10
Meta line 107 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 9851866
Character matrix gt cols: 10
skip: 0
nrows: 9851866
row_num: 0
Processed variant: 9851866
All variants processed

data = gwasvcf_to_TwoSampleMR(vcf = vcfRT, type="exposure")
Error in gwasvcf::vcf_to_granges(.) :
Not all inherits(vcf, c("ExpandedVCF", "CollapsedVCF")) are TRUE.
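
The error message indicates that the input must inherit from ExpandedVCF or CollapsedVCF, which are VariantAnnotation classes, whereas vcfR::read.vcfR returns an object of a different class. A minimal sketch of a check that could be run before conversion follows. The helper name check_vcf_class is hypothetical, the mock objects only carry a class attribute, and the commented lines assume that reading the file with VariantAnnotation::readVcf gives a suitable object.

```r
# Fail early with a readable message if the object is not a VariantAnnotation VCF.
check_vcf_class <- function(vcf) {
  ok <- inherits(vcf, c("ExpandedVCF", "CollapsedVCF"))
  if (!ok) {
    stop("Expected an ExpandedVCF or CollapsedVCF object, got class: ",
         paste(class(vcf), collapse = ", "))
  }
  invisible(TRUE)
}

# Mock objects that only carry a class attribute, for demonstration.
good <- structure(list(), class = "CollapsedVCF")
bad  <- structure(list(), class = "vcfR")

res_good <- check_vcf_class(good)
res_bad  <- tryCatch(check_vcf_class(bad), error = function(e) conditionMessage(e))

# With the real packages installed, something like this should work (not run here):
# vcf  <- VariantAnnotation::readVcf("01_raw_data/confounder/ukb-b-19953.vcf.gz")
# data <- gwasglue::gwasvcf_to_TwoSampleMR(vcf = vcf, type = "exposure")
```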

SSimp

Speak to Zoltan about this

problem in gwasvcf::query_gwas

Dear authors,
I get the following error. Are there any hints for resolving it?

# Extract the instruments for HDL cholesterol
gwasvcf::set_bcftools(
  NULL
)
HDL_exp <- gwasvcf::query_gwas(
  "/Users/zhangzhongheng/Documents/2022/GWAS_sepsis/DATA/ieu-b-109.vcf.gz", 
  pval = 5e-10)
HDL_exp <- gwasglue::gwasvcf_to_TwoSampleMR(
  HDL_exp, type="exposure")
#sepsis outcome 
sepsis_out <- gwasvcf::query_gwas(
  "/Users/zhangzhongheng/Documents/2022/GWAS_sepsis/DATA/ieu-b-4982.vcf.gz", 
  rsid = HDL_exp$SNP
    )
'tools_bcftools' option is not set, using native read which may be substantially slower. See 'set_bcftools' for information.
Note, this is much slower than searching by chromosome/position (e.g. see query_chrompos_file)
starting prefilter
prefiltering 12243372 records
Warning: TRE pattern compilation error 'Out of memory'Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'i' in selecting a method for function 'extractROWS': Filter 'fil' failed: invalid regular expression 'rs1191478940|rs1269057460|rs386627425|rs2298214|rs1936360|rs61703480|rs2749141|rs113177974|rs1291065947|rs2144734|rs1457642705|rs664755|rs10737472|rs1558013721|rs2298632|rs649458|rs1198443|rs1212034|rs10753556|rs11340914|rs72654647|rs1370916257|rs147266247|rs961976684|rs60659886|rs1203827760|rs34232231|rs34789477|rs34728853|rs113520491|rs34175245|rs12753981|rs148268445|rs12759594|rs71640337|rs35989952|rs71007882|rs34217609|rs60551503|rs71640338|rs12746810|rs12754468|rs35917448|rs35161452|rs12738345|rs1254174435|rs35738294|rs71636769|rs71636770|rs114165349|rs34585496|rs4579782|rs57217461|rs12743862|rs12744003|rs6666121|rs77408691|rs71514277|rs71007886|rs34423276|rs57532083|rs12737407|rs12737912|rs190077222|rs12729444|rs12726081|rs12730386|rs79568175|rs71581091|rs12726287|rs12756154|rs35407375|rs78533482|rs566260414|rs34517168|rs1369054892|rs800
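
The log suggests that all rsIDs are joined into a single regular expression, which can exhaust memory when thousands of IDs are passed at once. One possible workaround, sketched below, is to query in smaller batches. The helper chunk_vector, the batch size of 1000 and the variable vcf_path are assumptions for illustration, and how best to combine the per-batch results is not covered.

```r
# Split a vector into consecutive batches of at most `size` elements.
chunk_vector <- function(x, size) {
  stopifnot(size >= 1)
  if (length(x) == 0) return(list())
  unname(split(x, ceiling(seq_along(x) / size)))
}

rsids <- paste0("rs", 1:2500)   # placeholder IDs
batches <- chunk_vector(rsids, 1000)
lengths(batches)

# Hypothetical use with the files from the report above (not run here):
# res <- lapply(batches, function(b) gwasvcf::query_gwas(vcf_path, rsid = b))
```

As the log itself notes, searching by chromosome and position is expected to be faster than searching by rsID, so that may be the better route when positions are available.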

Error in coloc example

> out <- ieugwasr_to_coloc(id1='ieu-a-300', id2='ieu-a-7', chrompos=chrpos)
Error in .$unit %in% c("logOR", "log odds") : object '.' not found

gwasvcf_to_finemapr error: Input is not detected as bcf or vcf format #40

  1. Using your vignette, downloading ieu-a-300.vcf.gz from https://gwas.mrcieu.ac.uk/datasets/ieu-a-300/ (also using your vignette's region, vcffile, bfile and plink_bin)

dat <- gwasvcf_to_finemapr(region = region$region, vcf=vcffile, bfile=bfile, plink_bin=plink_bin)

Extracting data from vcf
Failed to open /Users/dr/Desktop/ieu-a-300.vcf.gz: could not load index
[E::COMPAT_bcf_hdr_read] Input is not detected as bcf or vcf format
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqinfo': no 'header' line "#CHROM POS ID..."?

  2. Using a vcf file created from a GWAS txt file

vv %$% create_vcf(chrom=seqnames, pos=start, nea=REF, ea=ALT, snp=ID, ea_af=AF, effect=ES, se=SE, pval=10^-LP, n=SS, name="a")

With same gwasvcf_to_finemapr, following error occurred:

Extracting data from vcf
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'samples': [internal] _hts_rewind() failed

Any help would be appreciated! Thanks
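
Messages such as "could not load index" and "Input is not detected as bcf or vcf format" can appear when a file is not bgzip-compressed or has no tabix index next to it, although that is not confirmed as the cause here. The sketch below is a base R pre-flight check under that assumption. The helper names looks_bgzipped and has_tabix_index are hypothetical, and the check relies on BGZF files starting with the bytes 1f 8b 08 04.

```r
# TRUE if the file starts with the BGZF magic bytes 1f 8b 08 04.
looks_bgzipped <- function(path) {
  hdr <- readBin(path, what = "raw", n = 4)
  length(hdr) == 4 && identical(hdr, as.raw(c(0x1f, 0x8b, 0x08, 0x04)))
}

# TRUE if a tabix (.tbi) index sits next to the file.
has_tabix_index <- function(path) {
  file.exists(paste0(path, ".tbi"))
}

# Demonstration on two temporary files: one with a BGZF-like header,
# one written with ordinary gzip compression.
bgzf_like <- tempfile(fileext = ".vcf.gz")
writeBin(as.raw(c(0x1f, 0x8b, 0x08, 0x04, 0x00)), bgzf_like)

plain_gz <- tempfile(fileext = ".vcf.gz")
con <- gzfile(plain_gz, "w")
writeLines("#CHROM\tPOS\tID", con)
close(con)

looks_bgzipped(bgzf_like)
looks_bgzipped(plain_gz)
has_tabix_index(bgzf_like)
```

If the index turns out to be missing, `tabix -p vcf` from htslib or Rsamtools::indexTabix() are the usual ways to create one.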

vcf file error

I made 2 vcf files using the gwas2vcf web interface.

When I try and use them with gwasglue, for example:

gtc <- gwasvcf_to_coloc(vcf1, vcf2, chrompos)

I get the error:

Error in unlist_as_integer(x@subscript) : object 'fancy_mseq' not found

This is the same error when I try and do anything with the pkg with these files - do you know what could be causing this?

Thanks!

Installation issue

Trying to do a first-ever install of this package. I'm using R-4.0.0

I did the recommended command: devtools::install_github("mrcieu/gwasglue")
But it failed with:

Error: Failed to install 'gwasglue' from GitHub:
  Failed to install 'gwasvcf' from GitHub:
  HTTP error 404.
  No commit found for the ref master
  
Did you spell the repo owner (`mrcieu`) and repo name (`gwasglue2`) correctly?

I can't help but note that the final message reads as if I had typed 'gwasglue2' in the command, but I did not include that '2'.

FWIW, I earlier had a "Bad credentials" error, which I got around by creating a GITHUB PAT, so this new error does not seem to be permissions related, as far as I can tell.

Please advise.
Thanks

remotes::install_github error

Installing the package does not work. I have had multiple different issues now:

ERROR: lazy loading failed for package ‘Biostrings’
* removing ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Biostrings’
Error: Failed to install 'gwasglue' from GitHub:
  Failed to install 'gwasvcf' from GitHub:
  Failed to install 'Biostrings' from Bioc:
  (converted from warning) installation of package ‘/var/folders/15/564m2q6n7bvdtl7jn8clyxzr0000gn/T//RtmpN2F7eX/file1189c736aeb34/Biostrings_2.56.0.tar.gz’ had non-zero exit status
#10 680.8 ** byte-compile and prepare package for lazy loading
#10 683.5 Error: package or namespace load failed for ‘IRanges’:
#10 683.5  (converted from warning) multiple methods tables found for ‘mstack’
#10 683.5 Error: package ‘IRanges’ could not be loaded
#10 683.5 Execution halted
#10 683.5 ERROR: lazy loading failed for package ‘DelayedArray’
#10 683.5 * removing ‘/usr/local/lib/R/site-library/DelayedArray’
#10 683.9 Error: Failed to install 'gwasglue' from GitHub:
#10 683.9   Failed to install 'gwasvcf' from GitHub:
#10 683.9   Failed to install 'SummarizedExperiment' from Bioc:
#10 683.9   (converted from warning) installation of package ‘DelayedArray’ had non-zero exit status
#10 683.9 Execution halted
#10 ERROR: executor failed running [/bin/sh -c Rscript -e 'devtools::install_github(c("mrcieu/TwoSampleMR", "mrcieu/gwasglue"))']: runc did not terminate sucessfully

MAF missing in several datasets

MAF information is missing in several datasets, and colocalisation cannot be performed.
An example:

GWAS <- "ebi-a-GCST005063"
chrpos <- "11:61396322-61796322"
out <- ieugwasr_to_coloc(id1=GWAS, id2=GWAS, chrompos=chrpos)
res <- coloc::coloc.abf(out[[1]], out[[2]])
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In sdY.est(d$varbeta, d$MAF, d$N) :
estimating sdY from maf and varbeta, please directly supply sdY if known
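
A quick way to spot this situation before calling coloc.abf is to count missing values in the per-SNP fields. The sketch below works on a toy list shaped like one element of the ieugwasr_to_coloc output. The helper count_missing and the chosen field list are assumptions, and the toy values are made up.

```r
# Count missing values in the per-SNP fields that the analysis relies on.
count_missing <- function(dataset,
                          fields = c("MAF", "beta", "varbeta", "pvalues")) {
  present <- intersect(fields, names(dataset))
  vapply(dataset[present], function(x) sum(is.na(x)), integer(1))
}

# Toy dataset mimicking the reported situation: MAF entirely missing.
d <- list(
  snp = c("rs1", "rs2", "rs3"),
  beta = c(0.1, 0.2, NA),
  varbeta = c(0.01, 0.02, 0.03),
  MAF = c(NA_real_, NA_real_, NA_real_),
  N = 1000,
  type = "quant"
)
miss <- count_missing(d)
all_maf_missing <- miss[["MAF"]] == length(d$snp)
```

When every MAF value is missing, filtering SNPs cannot help, and the allele frequencies would have to come from another source.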

coloc only use 50 snps

Coloc recommends only running tests on loci with 50 SNPs or more, so I included a filter.

out <- gwasvcf_to_coloc(
  vcf1 = paste0(path2, eqtl.traits[tr], "/", eqtl.traits[tr], ".vcf.gz"),
  vcf2 = paste0(path, trait[i], ".harmonized.vcf.gz"),
  chrompos = chrpos
)

# Indices of SNPs with infinite varbeta in either dataset
w1 <- which(out[[1]]$varbeta == Inf | out[[2]]$varbeta == Inf)
n_snp <- length(out[[1]]$snp)

# Drop those SNPs from every per-SNP vector. Skip when w1 is empty,
# because x[-integer(0)] would drop every element.
if (length(w1) > 0) {
  for (d in 1:2) {
    for (k in seq_along(out[[d]])) {
      if (length(out[[d]][[k]]) == n_snp) {
        out[[d]][[k]] <- out[[d]][[k]][-w1]
      }
    }
  }
}

length(out[[1]]$snp)

# Only run coloc when both datasets keep more than 50 SNPs
if (length(out[[1]]$snp) > 50 && length(out[[2]]$snp) > 50) {
  res <- coloc::coloc.abf(out[[1]], out[[2]])
}

bug with ieugwasr_to_coloc

top <- ieugwasr::tophits('ieu-a-300',p=5e-8) %>% arrange(p)
chrpos <- paste0(top$chr[1], ":", top$position[1] - 90000, "-", top$position[1] + 90000)
out <- ieugwasr_to_coloc(id1='ieu-a-300', id2='eqtl-a-ENSG00000270172', chrompos=chrpos)
Error in if (type1 == "cc") { : missing value where TRUE/FALSE needed

new software suggest

Hi, there:

I am very glad to see that you guys are building an ecosystem for VCF formatted GWAS files.

There are 4 new tools, all published this year. I tested them and found them very easy to use and quite important for functional analyses. It would be great if you could consider adding them.

  1. For genetic architecture, iCPAGdb, published in Genome Medicine: "An atlas connecting shared genetic architecture of human diseases and molecular phenotypes provides insight into COVID-19 susceptibility".

  2. For colocalization, hyprcoloc, published in Nature Communications: "A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits".

  3. For drug target analysis (maybe add as a new category), Trans-Phar, published in Human Molecular Genetics: "Integration of genetically regulated gene expression and pharmacological library provides therapeutic drug candidates".

  4. For enrichment analysis (maybe add as a new category "enrichment"), clusterProfiler, published in The Innovation: "clusterProfiler 4.0: A universal enrichment tool for interpreting omics data".

Finally, I feel it is very important to add a new category for (cross-trait) meta-analysis. Many researchers will want to do this when a lot of GWAS are available. This includes the traditional MTAG software and the new PLEIO software, published this year in AJHG: "a method to map and interpret pleiotropic loci with GWAS summary statistics".

Best regards,
Jie

missing values for coloc

In eQTLGen, some of the z-scores/varbeta values in the out list were NA, for example for eqtl-a-ENSG00000002834. This resulted in NA in the coloc test. This happens quite often, and I am not sure why.

GTEX eQTL

Is it possible to add GTEX eQTL to MRBase?

installing gwasglue without finemapr?

Dear Gib,

is there a way to install the gwasglue package without the finemapr package? I have issues installing the finemapr one:

Error: Failed to install 'gwasglue' from GitHub:
Failed to install 'finemapr' from GitHub:
(converted from warning) installation of package ‘/var/folders/4l/wx5nz34j4k5064xcw4wf0vk80000gp/T//RtmpdUQ8Po/file54b310ffff77/finemapr_0.1.0.tar.gz’ had non-zero exit status

Many thanks,
Verena

gwasvcf_to_ldsc

Hey,

I'm interested in using GWASVCF files to run LDSC. Seems like the best way to do it would be to create a gwasvcf_to_ldsc function here, right?

Wondering if you've had a go at doing this yet? Got any tips on setting it up? LDSC will expect the gwasvcf to be converted to a standard sumstats format (SNP CHR BP A1 A2 P Z): have you got any example code for doing this?

Thanks

Tagging in: @Al-Murphy @roxyisat-rex
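
A conversion along those lines might look like the sketch below. It assumes the GWAS VCF has already been flattened into a table with gwasvcf-style columns (ID, seqnames, start, REF, ALT, ES, SE, LP), that ALT is the effect allele, and that LP holds -log10(p). The function name gwasvcf_tbl_to_ldsc is hypothetical and is not part of gwasglue.

```r
# Convert a gwasvcf-style table to the sumstats columns SNP CHR BP A1 A2 P Z.
# Assumes ALT is the effect allele and LP is -log10(p).
gwasvcf_tbl_to_ldsc <- function(dat) {
  needed <- c("ID", "seqnames", "start", "REF", "ALT", "ES", "SE", "LP")
  stopifnot(all(needed %in% names(dat)))
  data.frame(
    SNP = dat$ID,
    CHR = dat$seqnames,
    BP  = dat$start,
    A1  = dat$ALT,          # effect allele
    A2  = dat$REF,          # non-effect allele
    P   = 10^(-dat$LP),     # back-transform from -log10(p)
    Z   = dat$ES / dat$SE,  # z-score from effect size and standard error
    stringsAsFactors = FALSE
  )
}

# Toy input with made-up values.
toy <- data.frame(
  ID = c("rs1", "rs2"), seqnames = c("1", "1"), start = c(100L, 200L),
  REF = c("A", "C"), ALT = c("G", "T"),
  ES = c(0.2, -0.1), SE = c(0.1, 0.05), LP = c(2, 3),
  stringsAsFactors = FALSE
)
sumstats <- gwasvcf_tbl_to_ldsc(toy)
```

The resulting data frame could then be written out with write.table() for downstream tools.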

coloc version 5

Dear Team,

I am wondering whether coloc version 5 has been implemented in the workflow, or whether coloc version 4 is still used.

Thanks.

Shicheng
