Decline_Package

1. Background

The optimal approaches to studying lung function decline, particularly in the setting of omic-scale predictors, are not known. The goal of this research project is to test different statistical models (e.g. slope, GEE, GLM) in simulated and real-world cohorts (population and case-control) in order to compare Type 1 and Type 2 error, identify factors leading to heterogeneity, and provide a foundation for modeling future omics data.

Aims

  1. Develop a population of simulated spirometry data with a) autocorrelation, estimated from real-world data; b) four groups representing normal, lower baseline, rapid decline, and both lower baseline and rapid decline; and c) both linear and quadratic decline.
  2. (This code package) Test a set of models in real-world data: a) simple slope models; b) LMM with random intercept, slope, or both; c) GEE; d) age, time; e) quadratic or linear terms (for underlying linear or quadratic decline). We will look at the effects of smoking (as a positive control) and of selected SNPs.

Authorship

We anticipate up to 4 authors per cohort, with additional authors as required for writing / analysis.

Methods

Prepare the cohort dataset as requested below. We require pre_fev1 in mL, age, sex, height, race, smoking status, pack-years, SNP allele freq, fev1pp (GLI global, or whatever has previously been calculated for your cohort), fev1/fvc ratio, follow-up time, and number of visits. We anticipate that the dataset will be clean, i.e. with minimal missingness, restricted to subjects with existing longitudinal data (>= 2 time points, smoking data), and with erroneous data removed (QC'd spirometry, spurious outliers identified). We will assess large discrepancies in sample size across models or in baseline characteristics and, if necessary, ask you to re-prepare the dataset. We selected 12 SNPs: 6 from prior GWAS (COPD, lung function, or lung function decline) and 6 proxies (null control). If you have related individuals, please reach out to Jingwen. Errors are known to occur, particularly with GEE model 13; the software will continue to run.

Results

All result files will be added to the source folder. Please compress them (zip, tar, etc.) with a filename of the form cohort_date.zip and send to Jingwen ([email protected]) and Matt Moll ([email protected]). See the project document for further details on the project.

2. Columns for data set

  • IID: Unique individual ID (numeric). Note: two different individuals CANNOT have the same IID
  • FID: family ID (numeric). For unrelated individuals, create an "FID" column filled with the same number, e.g. a column of 1s
  • pre_fev1: FEV1 (in mL)
  • SNPs: SNP information; each column name MUST start with lower-case "rs", e.g. rs507211, rsChrPosRefAlt
  • timefactor_spiro: time since the baseline exam (in YEARS). At baseline, timefactor_spiro=0
  • age: time-varying age
  • smoking_status: time-varying smoking status (never=0, former=1, current=2)

Baseline

  • age_baseline: baseline age
  • ht_baseline: baseline height (in cm)
  • smoking_packyears_base: pack-years at baseline
  • sex: biological sex (female=0, male=1)
  • smoking_status_base: baseline smoking status (never=0, former=1, current=2). This will be used as the grouping variable for glmmkin.

Other cohort-specific variables

  • PCs (PC1, PC2, ... ...)
  • equipchange ......

Variables for summary not for analysis

  • pre_fev1fvc: ratio of fev1 and fvc (fev1/fvc)
  • fev1_pp: fev1 percent predicted

  • Kinship matrix (for related data): both row names and column names MUST be IID
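
The column requirements above can be checked before running the analysis. The following is a minimal sketch in base R, not part of the package: `check_decline_columns` is a hypothetical helper, and it assumes that every column named in this section (including the baseline variables) must be present. Adjust the `required` vector if your cohort's setup differs.

```r
# Hypothetical helper (not part of the package): check a long-format data
# frame against the column requirements listed in this section.
check_decline_columns <- function(dat) {
  required <- c("IID", "FID", "pre_fev1", "timefactor_spiro", "age",
                "smoking_status", "age_baseline", "ht_baseline",
                "smoking_packyears_base", "sex", "smoking_status_base")
  problems <- character(0)

  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # SNP column names must start with lower-case "rs"
  if (!any(grepl("^rs", names(dat)))) {
    problems <- c(problems, "no SNP column starting with 'rs'")
  }

  # Every individual should have a baseline row with timefactor_spiro == 0
  if (all(c("IID", "timefactor_spiro") %in% names(dat))) {
    first_time <- tapply(dat$timefactor_spiro, dat$IID, min)
    if (any(first_time != 0, na.rm = TRUE)) {
      problems <- c(problems, "some individuals lack a timefactor_spiro = 0 row")
    }
  }

  # Coded variables should stay within the documented values
  if ("smoking_status" %in% names(dat) &&
      !all(dat$smoking_status %in% c(0, 1, 2))) {
    problems <- c(problems, "smoking_status outside 0/1/2")
  }
  if ("sex" %in% names(dat) && !all(dat$sex %in% c(0, 1))) {
    problems <- c(problems, "sex outside 0/1")
  }

  problems
}

# Toy data: height, pack-years, and sex values are made up for illustration
toy <- data.frame(
  IID = c(1, 1, 2, 2), FID = 1,
  pre_fev1 = c(2771, 2500, 3510, 3450),
  timefactor_spiro = c(0, 4, 0, 9),
  age = c(50, 54, 38, 47),
  smoking_status = c(1, 1, 0, 0),
  age_baseline = c(50, 50, 38, 38),
  ht_baseline = c(170, 170, 165, 165),
  smoking_packyears_base = c(20, 20, 0, 0),
  sex = c(1, 1, 0, 0),
  smoking_status_base = c(1, 1, 0, 0),
  rs507211 = c(0, 0, 0, 0)
)
problems <- check_decline_columns(toy)
```

An empty `problems` vector means none of these checks failed; it does not guarantee that the analysis script will accept the data.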

3. Important Note

  • Cohorts with multiple racial groups should conduct race-stratified analyses.
  • If a cohort has only 3 or fewer repeated measurements per individual, you may encounter an error message for GEE model 13 (e.g. something related to contrasts for variables with fewer than 2 levels). If this error occurs only in GEE model 13, please ignore it, but kindly notify us and provide results for the remaining models.
  • Please check the files in the output folder "decline_package_output" to make sure that all the requested files and plots are present and that the plots display correctly.
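
To see in advance whether the GEE model 13 note is likely to apply, you can count the repeated measurements per individual. This is a small base R sketch under the assumption that the data are in long format with one row per visit; `visits_per_individual` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: number of rows (visits) per IID in long-format data
visits_per_individual <- function(dat) {
  n_visits <- table(dat$IID)
  data.frame(IID = names(n_visits),
             n_visits = as.integer(n_visits),
             stringsAsFactors = FALSE)
}

# IID and time values taken from the example data in section 5
toy_visits <- data.frame(
  IID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  timefactor_spiro = c(0, 4, 10, 0, 9, 12, 17, 0, 6, 8)
)
counts <- visits_per_individual(toy_visits)

# TRUE would mean no individual has more than 3 measurements,
# which is the situation described in the note above
few_visits_only <- all(counts$n_visits <= 3)
```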

4. Required packages

If the current version does not work, you can try the previous version. Both versions should give the SAME output.

R version: R/4.2.1 (current version)

  • geepack_1.3.9
  • GMMAT_1.4.0
  • dplyr_1.1.2
  • readxl_1.4.3
  • ggplot2_3.4.2
  • kinship2_1.9.6 (optional) for kinship matrix

R version: R/4.0.2 (previous version)

  • geepack_1.3-2
  • GMMAT_1.3.1
  • dplyr_1.0.2
  • readxl_1.3.1
  • ggplot2_3.3.2
  • kinship2 (optional)
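
To compare your installation with the versions listed above, something like the following base R sketch may help. `report_versions` is a hypothetical helper; the `expected` vector simply restates the current-version list from this section, and a mismatch is only a hint, not proof that the analysis will fail.

```r
# Versions listed above for R/4.2.1 (kinship2 is optional and omitted here)
expected <- c(geepack = "1.3.9", GMMAT = "1.4.0", dplyr = "1.1.2",
              readxl = "1.4.3", ggplot2 = "3.4.2")

# Hypothetical helper: installed version of each package, NA if not installed
report_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = names(expected),
             expected = unname(expected),
             installed = unname(installed),
             stringsAsFactors = FALSE)
}

version_report <- report_versions(expected)
print(version_report)
```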

5. For analysis, use "Analysis_MAIN.R"

Example code is included inside Analysis_MAIN.R as comments. Please make sure your dataset has all the columns listed in section 2 (Columns for data set).

Example data:

IID  FID  pre_fev1  timefactor_spiro  age  smoking_status  age_baseline  PC1      rs507211  rs4077833
1    1    2771      0                 50   1               50            -0.0293  0         1
1    1    2500      4                 54   1               50            -0.0293  0         1
1    1    2450      10                60   1               50            -0.0293  0         1
2    11   3510      0                 38   0               38            -0.0038  0         2
2    11   3450      9                 47   0               38            -0.0038  0         2
2    11   3320      12                50   0               38            -0.0038  0         2
2    11   3220      17                55   0               38            -0.0038  0         2
3    24   2570      0                 42   2               42            0.0071   1         0
3    24   2600      6                 48   2               42            0.0071   1         0
3    24   2540      8                 50   2               42            0.0071   1         0
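
As an illustration of what a "simple slope" summary of data in this layout could look like, the sketch below fits an ordinary least-squares line of pre_fev1 on timefactor_spiro for each individual in the example data. This is an assumption about one common way to compute per-person slopes; it is not taken from Analysis_MAIN.R and may differ from the slope models the package actually fits.

```r
# Three columns from the example data above
example_dat <- data.frame(
  IID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  pre_fev1 = c(2771, 2500, 2450, 3510, 3450, 3320, 3220, 2570, 2600, 2540),
  timefactor_spiro = c(0, 4, 10, 0, 9, 12, 17, 0, 6, 8)
)

# Per-individual least-squares slope of FEV1 on time, in mL per year
slopes <- sapply(split(example_dat, example_dat$IID), function(d) {
  unname(coef(lm(pre_fev1 ~ timefactor_spiro, data = d))["timefactor_spiro"])
})

print(round(slopes, 1))
```

A negative slope indicates a decline in FEV1 over follow-up. With only three or four visits per person these estimates are noisy, which is one motivation for the mixed-model and GEE approaches listed in the aims.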

Example Kinship matrix for 5 individuals (IID = 11,20,31,42,50):

IID  11     20     31     42     50
11   0.5    0      0.25   0.125  0
20   0      0.5    0.25   0.125  0
31   0.25   0.25   0.5    0.25   0.25
42   0.125  0.125  0.25   0.5    0
50   0      0      0.25   0      0.5
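
The example matrix above can be built and sanity-checked in base R as sketched below. `check_kinship` is a hypothetical helper, not part of the package; it only tests the stated requirement that row and column names are IIDs, plus symmetry and coverage of the IIDs in the data, which are assumptions about what a usable kinship matrix should satisfy.

```r
ids <- c("11", "20", "31", "42", "50")
kin <- matrix(c(
  0.5,   0,     0.25, 0.125, 0,
  0,     0.5,   0.25, 0.125, 0,
  0.25,  0.25,  0.5,  0.25,  0.25,
  0.125, 0.125, 0.25, 0.5,   0,
  0,     0,     0.25, 0,     0.5
), nrow = 5, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical check: same IID labels on rows and columns, symmetric values,
# and every IID in the dataset present in the matrix
check_kinship <- function(kin, iid) {
  stopifnot(is.matrix(kin),
            identical(rownames(kin), colnames(kin)),
            isSymmetric(kin),
            all(as.character(unique(iid)) %in% rownames(kin)))
  invisible(TRUE)
}

kin_ok <- check_kinship(kin, iid = c(11, 20, 31, 42, 50))
```

`check_kinship` stops with an error if any condition fails, so it can be called once right after loading the matrix.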

Contributors

zjwlucy, mikecho95
