EPS-ML

Author: Cai John

This repo contains the code from the paper "Extreme Phenotype Sampling Improves LASSO and Random Forest Marker Selection in Complex Traits".

Pipelines

Both pipelines from the manuscript are available here as lasso_pipeline.R and rf_pipeline.R. They both use the same input files.

The LASSO pipeline contains more post-processing and visualizations than the RF pipeline. To create the same PCA plots for the RF pipeline, first run a PCA on all your markers (see the code in lasso_pipeline.R if you are unfamiliar with how to do this). Then take the set of selected features output by the RF pipeline and run a PCA using only those features. Note that the {}_transcriptImportance.txt file output by the RF pipeline contains variable importance values for all features, so you need to filter it using the significance cut-off you want for your data.
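The filtering step can be done in any language. Below is a minimal sketch in Python, not part of the repo. The layout of the {}_transcriptImportance.txt file is not described here, so the sketch assumes a tab-separated file with a header row, feature names in the first column and importance values in the second, and it assumes that features at or above the cut-off are the ones to keep. The function name, the marker names and the cut-off value are all hypothetical.

```python
import csv
import io


def select_features(handle, cutoff):
    """Return names of features whose importance is at or above cutoff.

    Assumes (hypothetically) a tab-separated file with a header row,
    feature names in column 1 and importance values in column 2.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return [row[0] for row in reader if row and float(row[1]) >= cutoff]


# Made-up contents standing in for a {}_transcriptImportance.txt file.
importance_text = (
    "feature\timportance\n"
    "markerA\t0.31\n"
    "markerB\t0.02\n"
    "markerC\t0.17\n"
)
selected = select_features(io.StringIO(importance_text), cutoff=0.1)
print(selected)
```

The resulting list of feature names is what you would pass to the second PCA. Check the actual column layout of your output file before relying on the column positions used here.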

The example directory contains data for a small example analysis that demonstrates all output statistics and visualizations. The data are a random subset of the Poplar data presented in the paper, so do not expect to obtain the same results as in the paper.

Malaria Data

The metadata used for both the Zhu et al. and Mok et al. datasets is included in the malaria data directory. We have included it to make things easier for future investigators, as some of the data was generated via text-scraping.

The paper describing the Mok et al. data can be found here: https://pubmed.ncbi.nlm.nih.gov/25502316/. Their transcriptome data can be downloaded from here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59097.

The paper describing the Zhu et al. data can be found here: https://www.nature.com/articles/s41467-018-07588-x. Their transcriptome data can be downloaded from here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121505.

Using With Your Own Data

To use the code with your own data, open the pipeline file in your favorite editor and simply change the strings in the first lines:

samples - The path to the samples file

expr - The path to the expression file

outname - Desired name of output directory to save stats and visualizations

subtit - The string to use as a subtitle on each plot. Not used in the RF pipeline as no plotting is done.

Once these are changed, run the pipeline with the Rscript command, for example: Rscript lasso_pipeline.R

Required files

The pipeline requires two main files:

  1. The samples file: this is a tab-separated text file with two columns. Headers must be "Genotype_ID" and "Pheno". The Genotype_ID column should contain genotype codes to match samples to the expression data. The Pheno column should contain phenotypic class values for each sample.

  2. The expression data file: this is a tab-separated text file. The first column must be named "Genotype_ID" and its values should match those in the samples file. Each subsequent column should contain one marker to test for association with the phenotype.
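Before running a pipeline it can help to confirm that the two files follow the layout described above. The following is a minimal sketch in Python, not part of the repo; the function names, the example IDs, phenotype labels and marker names are all hypothetical, and the pipelines themselves may apply additional checks.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file into (header, rows), skipping blank lines."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]
    return header, rows


def check_inputs(samples_handle, expr_handle):
    """Return a list of problems found in the two input files (empty if none)."""
    problems = []
    s_header, s_rows = read_tsv(samples_handle)
    e_header, e_rows = read_tsv(expr_handle)

    if s_header != ["Genotype_ID", "Pheno"]:
        problems.append("samples header must be Genotype_ID, Pheno")
    if not e_header or e_header[0] != "Genotype_ID":
        problems.append("first expression column must be Genotype_ID")
    if len(e_header) < 2:
        problems.append("expression file has no marker columns")

    # Every sample should have a matching row in the expression data.
    sample_ids = {row[0] for row in s_rows}
    expr_ids = {row[0] for row in e_rows}
    missing = sorted(sample_ids - expr_ids)
    if missing:
        problems.append("samples without expression data: " + ", ".join(missing))
    return problems


# Tiny made-up example data (IDs, phenotypes and markers are hypothetical).
samples_text = "Genotype_ID\tPheno\nG1\thigh\nG2\tlow\n"
expr_text = "Genotype_ID\tmarkerA\tmarkerB\nG1\t0.5\t1.2\nG2\t0.7\t0.9\n"
problems = check_inputs(io.StringIO(samples_text), io.StringIO(expr_text))
print(problems or "inputs look consistent")
```

To check real files, pass open file handles instead of the io.StringIO objects, for example open(path, newline="") for each of the two paths.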
