DCAP: Integrating multi-omics data with deep learning for predicting cancer prognosis

DCAP integrates multi-omics data to predict the prognosis risk of cancer patients. The multi-omics data are fed into an autoencoder to obtain representative composite features that best reconstruct all of the input data. These features are then passed to a Cox proportional hazards (Cox-PH) model to estimate each patient's prognosis risk. Finally, to reduce the number of features needed for cancer prognosis prediction, XGBoost is used for feature selection and the cancer prognosis model is reconstructed.
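To make the Cox-PH step more concrete, here is a minimal sketch of the log partial likelihood that a Cox model maximizes. It is a generic illustration, not the code in this repository: the function name `cox_partial_log_likelihood` and its arguments are hypothetical, ties in survival time are not treated specially, and the risk scores stand in for whatever linear predictor is computed from the autoencoder features.

```python
import math


def cox_partial_log_likelihood(risk_scores, times, events):
    """Log partial likelihood of a Cox-PH model (illustrative sketch).

    risk_scores: linear predictor per patient (assumed to be beta . x)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            # Censored patients contribute only through other patients' risk sets.
            continue
        # Risk set: every patient still under observation at time t_i.
        denom = sum(
            math.exp(risk_scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += risk_scores[i] - math.log(denom)
    return total


# Toy data: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
good = cox_partial_log_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_partial_log_likelihood([0.0, 1.0, 2.0], times, events)
print(good > bad)  # higher risk for earlier events fits better
```

Risk scores that rank early events as high risk yield a larger log partial likelihood, which is the quantity a Cox-PH fit pushes upward.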
Requirements: TensorFlow, Python 3, R
In this study, we used cancer datasets from the TCGA portal (https://tcga-data.nci.nih.gov/tcga/). All datasets were downloaded with the R package “TCGA-assembler” (v1.0.3; Wei et al., 2018) and comprise four types of multi-omics data: mRNA, miRNA, DNA methylation, and copy number variation (CNV). Here, “mRNA” is RNA sequencing data generated by UNC Illumina HiSeq_RNASeq V2 (Level 3), “miRNA” is miRNA sequencing data obtained by BCGSC Illumina HiSeq miRNASeq, DNA methylation data were generated by USC HumanMethylation450, and CNV data were generated by BROAD-MIT Genome wide SNP_6.
DCAP is a framework that combines three methods. First, run autoencoder_DCAP.py to obtain representative composite features that best reconstruct the input multi-omics data; the resulting table, fea.csv, is saved in the file folder. Next, construct a table yourself (for example, brca_cox.csv) and pass it to uni_cox.R. The file that uni_cox.R generates (for example, brca_cox2.csv) is then processed by Cox_DCAP to produce the final results. CI_Cox$concordance is the C-index value obtained by DCAP_Cox, and CI-XGB is the C-index value obtained by XGBoost. (If the data need preprocessing, Data_preprocessing.R can be used.)
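For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only: the function name `concordance_index` is hypothetical, and the R routines used by the scripts may treat ties and censoring somewhat differently, so values need not match exactly.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch).

    A pair (i, j) is comparable when patient i had an observed event
    and a shorter follow-up time than patient j. The pair is concordant
    when patient i also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data: shorter survival paired with higher predicted risk.
print(concordance_index([1, 2, 3], [1, 1, 1], [3, 2, 1]))  # 1.0
print(concordance_index([1, 2, 3], [1, 1, 1], [1, 2, 3]))  # 0.0
```

A value of 0.5 corresponds to random ranking, and values closer to 1 mean the predicted risks order the patients more consistently with their observed survival.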
Before running the scripts, please check the file folder paths.
For ease of use, we provide four example data files: brca_multitest.csv for Data_preprocessing.R, brcatest_go.csv for autoencoder_DCAP.py, brca_cox.csv for uni_cox.R, and brca_cox2.csv for Cox_DCAP. Because of file size limits, we uploaded only a small brca-omics example (brcatest_go.csv, covering 10 patients and 10 genes). Users can build their own data following the format of the examples.
This method is still in progress; any questions can be sent to [email protected]