age-prediction

Human skin, oral, and gut microbiomes predict chronological age

This study performed Random Forest (RF) regression analyses of human microbiota from multiple body sites (gut, mouth and skin). This repository provides all source data and code used to generate the results in the manuscript. In the output directories, we also provide additional exploratory analysis results to help readers better understand our microbiota-based age-prediction models.

  • Huang S, Haiminen N, Carrieri A-P, Hu R, Jiang L, Parida L, Russell B, Allaband C, Zarrinpar A, Vázquez-Baeza Y, Belda-Ferre P, Zhou H, Kim H-C, Swafford AD, Knight R, Xu ZZ. 2020. Human skin, oral, and gut microbiomes predict chronological age. mSystems 5:e00630-19. https://doi.org/10.1128/mSystems.00630-19.

Data source

Qiita study IDs involved in the meta-analysis:

  • Gut microbiota:
| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 2770 |
| 11757 | PRJEB18535 | GGMP regional variation | Regional variation greatly limits application of healthy gut microbiome reference ranges and disease models | 1609 |
  • Oral microbiota:
| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 547 |
| 1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 642 |
| 550 | ERP021896 | Moving pictures of the human microbiome | Moving pictures of the human microbiome | 508 |
| 1774 | ERP016472 | Puerto Rico and Plantanal | NA | 48 |
| 2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 72 |
| 2024 | ERP016621 | TZ_probiotic_pregnancy_study | Microbiota at Multiple Body Sites during Pregnancy in a Rural Tanzanian Population and Effects of Moringa-Supplemented Probiotic Yogurt | 254 |
| 2202 | PRJEB6518 | mit_daily_timeseries | Host lifestyle affects human microbiota on daily timescales | 285 |
| 10052 | ERP008799, ERP008694 | Yanomani 2008 | The microbiome of uncontacted Amerindians | 16 |
| 11052 | ERP021896 | Knight_ABTX | NA | 178 |
  • Skin microbiota:
| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 440 |
| 11052 | ERP021896 | Knight_ABTX | NA | 177 |
| 2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 65 |
| 1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 1293 |

The age distribution of all samples in the gut, oral and skin datasets:

[Figure: age distribution of samples in the gut, oral and skin datasets]

Although the skewed age distributions of the skin and oral datasets may reduce prediction accuracy for older adults, they do not affect the conclusions about the relative ability of different human microbiomes to predict age.
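One way to check how a skewed age distribution affects accuracy is to look at the number of samples and the prediction error within age bins. The following is a minimal base-R sketch under stated assumptions: the helper `mae_by_age_bin` and its default 20-year bins are hypothetical illustrations, not part of this repository or of crossRanger.

```r
# Hypothetical helper (not part of this repository): number of samples and
# mean absolute error (MAE) per age bin.
# ASSUMPTION: 20-year bins from 0 to 100; adjust `breaks` to the data at hand.
mae_by_age_bin <- function(observed, predicted, breaks = seq(0, 100, by = 20)) {
  bins <- cut(observed, breaks = breaks, right = FALSE, include.lowest = TRUE)
  abs_err <- abs(observed - predicted)
  data.frame(
    age_bin = levels(bins),
    n = as.vector(table(bins)),                   # samples per bin (0 if empty)
    mae = as.vector(tapply(abs_err, bins, mean))  # NA for empty bins
  )
}

# Usage with made-up numbers (not results from this study).
observed  <- c(5, 15, 25, 45, 85)
predicted <- c(7, 15, 30, 40, 60)
mae_by_age_bin(observed, predicted)
```

Bins with few samples (typically the older age groups in the skin and oral datasets) are where a larger MAE would be expected.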

R scripts

This repository also contains the R scripts and files used to prepare the manuscript. The main ones are described below.

Usage requirements and dependencies

This meta-analysis depends on crossRanger, an R package we developed, which can be installed from GitHub as follows:

## install.packages('devtools') # if devtools not installed
devtools::install_github('shihuang047/crossRanger')

What analyses were done by the R script Age.crossRF_reg.ranger.R?

The R script Age.crossRF_reg.ranger.R performs the meta-analysis of microbiota data for predicting chronological age. For each dataset (gut, mouth or skin), the script performs the following analyses:

  • Data trimming (such as removing samples with NA values in the metadata).
  • RF modeling and performance evaluation for the whole dataset.
  • RF modeling and performance evaluation for the sub-datasets. To test whether confounders (such as sex) affect the modeling, we first trained an age model within each sub-dataset stratified by a confounder and then applied it to all the other sub-datasets. For both model training and testing, we evaluated regression performance using the mean absolute error (MAE).
  • Cross-application of the RF models built on the sub-datasets, with performance evaluated using MAE.
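The train-on-one, test-on-all scheme described in the list above can be sketched in base R as follows. This is an illustration under stated assumptions, not the crossRanger implementation: the helper `cross_apply_mae` is hypothetical, and a linear model (`lm`) stands in for Random Forest so that the sketch runs without extra packages. The toy data are simulated.

```r
# Mean absolute error (MAE) between observed and predicted ages.
mae <- function(observed, predicted) {
  mean(abs(observed - predicted))
}

# Hypothetical helper (not part of crossRanger): train one model per
# sub-dataset and apply it to every sub-dataset. Returns an MAE matrix
# with training subsets as rows and test subsets as columns.
# ASSUMPTION: lm() stands in for Random Forest so the sketch runs in base R.
cross_apply_mae <- function(data, target, strata,
                            fit_fun = function(f, d) lm(f, data = d),
                            predict_fun = function(m, d) predict(m, newdata = d)) {
  # Data trimming: drop samples with NA in the target or the stratifying column.
  keep <- !is.na(data[[target]]) & !is.na(data[[strata]])
  data <- data[keep, , drop = FALSE]

  # Every remaining column except target and strata is used as a predictor.
  predictors <- setdiff(names(data), c(target, strata))
  f <- reformulate(predictors, response = target)
  groups <- split(data, data[[strata]], drop = TRUE)

  out <- matrix(NA_real_, nrow = length(groups), ncol = length(groups),
                dimnames = list(train = names(groups), test = names(groups)))
  for (i in names(groups)) {
    model <- fit_fun(f, groups[[i]])
    for (j in names(groups)) {
      out[i, j] <- mae(groups[[j]][[target]], predict_fun(model, groups[[j]]))
    }
  }
  out
}

# Usage on simulated toy data (not data from this study).
set.seed(1)
n <- 200
toy <- data.frame(
  taxon_a = runif(n),
  taxon_b = runif(n),
  sex = sample(c("female", "male"), n, replace = TRUE)
)
toy$age <- 20 + 40 * toy$taxon_a + rnorm(n, sd = 5)
toy$age[1:5] <- NA  # samples with missing age are trimmed before modeling
res <- cross_apply_mae(toy, target = "age", strata = "sex")
print(round(res, 2))
```

In this sketch the diagonal entries are in-sample (training) errors and are therefore optimistic; the off-diagonal entries show how well a model trained on one sub-dataset transfers to another. If the ranger package is installed, an RF model can be plugged in with `fit_fun = function(f, d) ranger::ranger(f, data = d)` and `predict_fun = function(m, d) predict(m, data = d)$predictions`.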

All of these analyses can be run with this script, typically in RStudio or the R console.

What inputs does this R script require?

| Input | gut_data | oral_data | skin_data | Description |
|---|---|---|---|---|
| datafile | gut_data/gut_4434.biom | oral_data/oral_4014.biom | skin_data/skin_4168.biom | BIOM table file |
| sample_metadata | gut_data/gut_4434_map.txt | oral_data/oral_2550_map.txt | skin_data/skin_1975_map.txt | Sample metadata file |
| feature_metadata | gut_data/gut_taxonomy.txt | oral_data/oral_taxonomy.txt | skin_data/skin_taxonomy.txt | Feature metadata file |
| prefix_name | gut_4434 | oral_2550 | skin_1975 | Prefix of the dataset |
| s_category | c("cohort", "sex") | "qiita_host_sex" | c("body_site","qiita_host_sex") | Metadata categories used to split the dataset into sub-datasets |
| c_category | "age" | "qiita_host_age" | "qiita_host_age" | Target metadata category for RF modeling |
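The inputs in the table above can be collected into one named list per dataset, which makes it easy to check that a metadata file actually contains the categories the script expects. This is a minimal sketch under stated assumptions: the list layout and the helpers `read_sample_metadata` and `missing_categories` are hypothetical, the mapping files are assumed to be tab-delimited with a header row, and the paths are assumed to be relative to the Input/ folder.

```r
# Per-dataset inputs, transcribed from the input table.
inputs <- list(
  gut = list(
    datafile = "gut_data/gut_4434.biom",
    sample_metadata = "gut_data/gut_4434_map.txt",
    feature_metadata = "gut_data/gut_taxonomy.txt",
    prefix_name = "gut_4434",
    s_category = c("cohort", "sex"),
    c_category = "age"
  ),
  oral = list(
    datafile = "oral_data/oral_4014.biom",
    sample_metadata = "oral_data/oral_2550_map.txt",
    feature_metadata = "oral_data/oral_taxonomy.txt",
    prefix_name = "oral_2550",
    s_category = "qiita_host_sex",
    c_category = "qiita_host_age"
  ),
  skin = list(
    datafile = "skin_data/skin_4168.biom",
    sample_metadata = "skin_data/skin_1975_map.txt",
    feature_metadata = "skin_data/skin_taxonomy.txt",
    prefix_name = "skin_1975",
    s_category = c("body_site", "qiita_host_sex"),
    c_category = "qiita_host_age"
  )
)

# Hypothetical loader. ASSUMPTIONS: *_map.txt files are tab-delimited with a
# header row, and the paths above are relative to the Input/ folder.
read_sample_metadata <- function(cfg, base_dir = "Input") {
  read.delim(file.path(base_dir, cfg$sample_metadata),
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical check: which s_category/c_category columns are missing
# from a loaded metadata table?
missing_categories <- function(metadata, cfg) {
  setdiff(c(cfg$s_category, cfg$c_category), colnames(metadata))
}

# Usage with a toy metadata table (not real study data).
toy_meta <- data.frame(qiita_host_sex = c("female", "male"),
                       qiita_host_age = c(34, 51))
missing_categories(toy_meta, inputs$oral)  # character(0)
missing_categories(toy_meta, inputs$skin)  # "body_site"
```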

About the Input/ folder

This folder includes all the input files (biom table, sample metadata and feature metadata files) necessary for the RF regression analysis.

About the Output/ folder

This folder contains all of the output files from the main R script Age.crossRF_reg.ranger.R.

About the Figures/ folder

This folder contains figures selected from the Output/ folder that were used to generate the final figures in our manuscript.

Microbiome age prediction for new datasets

Acknowledgements

This work is supported by IBM Research AI through the AI Horizons Network. For more information, visit the IBM AI Horizons Network website.