- R
- GNU Make (for running the complete analysis pipeline)
- curl (for downloading files)
- tar (for extracting
tar
archives) - PLINK 1.9 (for clumping genetic summary data)
- Depending on your platform, you may need of the following to install some of the required R packages:
- (Windows) Rtools
- (Mac) Xcode or Xcode Command Line Tools
- (Linux) a C/C++ compiler (in Debian/Ubuntu, the packages
build-essential
orr-base-dev
should contain everything necessary)
The most convenient way to install these tools might be by using a package manager appropriate for your system, e.g. apt on Debian/Ubuntu, Scoop on Windows, or Homebrew on OS X.
If your PLINK executable is in a non-standard location, or is named something other than plink1.9.exe
, you may need to edit the plink_executable
variable at the top of the Makefile accordingly.
- Install renv:
- In R:
install.packages("renv")
- From the command line:
Rscript -e 'install.packages("renv")'
- In R:
- Install other required R packages:
- In R:
renv::restore()
- From the command line:
make packages
- In R:
- Most data sets are publicly available, and are automatically downloaded as part of the analyses.
- Within-family (sibship-adjusted) income GWAS data were obtained directly from the Within Family Consortium. These data should eventually be publicly available, but in the meantime, users who want to replicate the within-family analyses would need to request them from the authors and create the following files:
input/data/siblinggwas/Income_WS_mtag_meta.txt
input/data/siblinggwas/income-study-summary.txt
- The analyses can be run from the command line using
make
; this runs the required R scripts in the right order as defined inMakefile
. - Run all analyses:
make all
- Run MR analyses only:
make mr
- Run CAUSE analyses only (time-consuming):
make cause
- Run multivariable MR analyses only (time-consuming):
make mvmr
After running the analyses, the results are reported in CSV format in the output/results/
directory:
cause_results.csv
contains results from the CAUSE analysismr_estimates.csv
contains results from the IVW, median, mode-based, and MR-Egger analysesmvmr_estimates.csv
contains results from the multivariable MR analysesmvmr_stats.csv
contains information about the instruments used for the exposures in the multivariable MR analyses (number of SNPs and conditional F-statistics)
Exposures and outcomes in these files are denoted using the identifiers specified in input/metadata.csv
.
input/
: contains GWAS data (needed for the analyses) and linkage disequilibrium reference panel data (needed for clumping GWAS data)output/
: contains results of the analyses:output/analysis/
: raw output from each analysis (not human-readable)output/data/
: pre-processed and clumped datasetsoutput/results/
: formatted analysis results (see above)
renv/
: contains installed R packagesscripts/
: contains all R code for data processing and analysis- The subdirectories correspond to the subdirectories in
output/
. - There is no "master" script which runs the entire analysis. Instead, the order in which these scripts are run is defined in
Makefile
, and the analyses are run using themake
command.
- The subdirectories correspond to the subdirectories in
After running the analyses, the input
and output
folders will contain a number of very large files. Some of these can be deleted if necessary to conserve disk space. In particular:
input/data/
input/ld_ref_panel/
output/data/