SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data
This repository contains the work for SurvBenchmark (updated May 2022). The associated paper is:
Zhang, Yunwei & Wong, Germaine & Mann, Graham & Muller, Samuel & Yang, Jean. (2021). SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data. 10.1101/2021.07.11.451967.
Paper GitHub page: https://github.com/SydneyBioX/SurvBenchmark

Please cite this paper if you use the curated data.
We develop a benchmarking design, SurvBenchmark, that evaluates a diverse collection of survival models on both clinical and omics datasets. SurvBenchmark covers classical approaches such as the Cox model as well as state-of-the-art machine learning survival models. There are 16 datasets (https://github.com/SydneyBioX/SurvBenchmark/blob/main/tables/table1.docx)
Table 1. Summary of datasets
Dataset (name used in this paper) | Number of observations | Number of variables | Type of data | Censoring rate (rounded to 4 decimal places) | Reference |
---|---|---|---|---|---|
Melanoma_itraq | 41 | 643 | Omics | 0.4146 | Mactier, Swetlana et al. “Protein signatures correspond to survival outcomes of AJCC stage III melanoma patients.” Pigment cell & melanoma research vol. 27,6 (2014): 1106-16. doi:10.1111/pcmr.12290 |
Melanoma_nano | 45 | 207 | Omics | 0.4222 | Wang, K.Y.X. et al. Cross-Platform Omics Prediction procedure: a game changer for implementing precision medicine in patients with stage-III melanoma. bioRxiv 2020.12.09.415927; doi: https://doi.org/10.1101/2020.12.09.415927 |
Ovarian_2 | 58 | 19818 | Omics | 0.3793 | Ganzfried, B.F. et al. (2013) curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database, 2013. |
GE_5 | 78 | 4753 | Omics | 0.5641 | van 't Veer, L.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536. |
GE_3 | 86 | 6288 | Omics | 0.7209 | Bullinger, L. et al. (2004) Use of Gene-Expression Profiling to Identify Prognostic Subclasses in Adult Acute Myeloid Leukemia. New England Journal of Medicine, 350, 1605–1616. |
Melanoma_clinical | 77 | 16 | Clinical | 0.3939 | Wang, K.Y.X. et al. Cross-Platform Omics Prediction procedure: a game changer for implementing precision medicine in patients with stage-III melanoma. bioRxiv 2020.12.09.415927; doi: https://doi.org/10.1101/2020.12.09.415927 |
GE_1 | 115 | 551 | Omics | 0.6670 | Sorlie, T. et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. U.S.A., 100, 8418–8423. |
GE_4 | 116 | 6285 | Omics | 0.5641 | van de Vijver, M.J. et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med., 347, 1999–2009. |
Veteran | 137 | 8 | Clinical | 0.0657 | Kalbfleisch, J.D. and Prentice, R.L. (2002) The Statistical Analysis of Failure Time Data. Wiley Series in Probability and Statistics. |
Ovarian_1 | 194 | 16050 | Omics | 0.7062 | Ganzfried, B.F. et al. (2013) curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database, 2013. |
Lung | 228 | 9 | Clinical | 0.2763 | Loprinzi, C.L. et al. (1994) Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol., 12, 601–607. |
GE_6 | 240 | 7401 | Omics | 0.4250 | Van Houwelingen, H.C. (2004) The Elements of Statistical Learning, Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, New York, 2001. No. of pages: xvi 533. ISBN 0-387-95284-5. Statistics in Medicine, 23, 528–529. |
GE_2 | 295 | 4921 | Omics | 0.7322 | Beer, D.G. et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med., 8, 816–824. |
PBC | 312 | 7 | Clinical | 0.5994 | Fleming, T.R. and Harrington, D.P. (2005) Counting Processes and Survival Analysis. Wiley Series in Probability and Statistics. |
UNOS_Kidney | 3000 | 101 | Clinical | 0.7350 | OPTN data (https://optn.transplant.hrsa.gov/) |
ANZ | 3323 | 40 | Clinical | 0.8739 | ANZDATA (https://www.anzdata.org.au/) |
and 20 survival methods (https://github.com/SydneyBioX/SurvBenchmark/blob/main/tables/table2.docx)
Table 2. Summary of methods used in this study
Method name | Method name in this paper | R function name | R package name | Parameters (default) |
---|---|---|---|---|
Cox | Cox | coxph | survival | NA |
Cox with backward elimination using AIC | Cox_bw_AIC | cph,fastbw | rms | rule="aic",sls=.05,k.aic=2 |
Cox with backward elimination using pvalue | Cox_bw_p | cph,fastbw | rms | rule="p",sls=.05 |
Cox with backward elimination using BIC | Cox_bw_BIC | cph,fastbw | rms | rule="aic",sls=.05,k.aic=log(as.numeric(table(train$status)[2])) |
Lasso Cox (for clinical datasets) | Lasso_Cox | penalized | penalized | lambda1=1, lambda2=0 |
Ridge Cox (for clinical datasets) | Ridge_Cox | penalized | penalized | lambda1=0, lambda2=1 |
Elastic net Cox (for clinical datasets) | EN_Cox | penalized | penalized | lambda1=1, lambda2=1 |
Lasso Cox (for omics datasets) | Lasso_Cox | glmnet | glmnet | alpha=1, nfolds=5, type.measure="C" |
Ridge Cox (for omics datasets) | Ridge_Cox | glmnet | glmnet | alpha=0, nfolds=5, type.measure="C" |
Elastic net Cox (for omics datasets) | EN_Cox | glmnet | glmnet | alpha=0.5, nfolds=5, type.measure="C" |
Random survival forest | RSF | rfsrc | randomForestSRC | Default: ntree=1000, mtry=10 |
Multitask logistic regression method | MTLR | mtlr | MTLR | C1=1 |
DNNSurv (deep learning survival model) | DNNSurv | multiple functions, as in the GitHub code | DNNSurv | Default: no parameter arguments to be changed by users |
Boosting Cox model | CoxBoost | CoxBoost | CoxBoost | stepno=10, penalty=100 |
Cox model with genetic algorithm as feature selection method | Cox (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
Multitask logistic regression model with genetic algorithm as feature selection method | MTLR (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
Boosting Cox model with genetic algorithm as feature selection method | CoxBoost (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
Multitask logistic regression model with ranking-based method as feature selection method | MTLR (DE) | lmFit, eBayes | limma | n.features=10 (for omics), n.features=4 (for clinical) |
Boosting Cox model with ranking-based method as feature selection method | CoxBoost (DE) | lmFit, eBayes | limma | n.features=10 (for omics), n.features=4 (for clinical) |
Survival support vector machine | SurvivalSVM | survivalsvm | survivalsvm | Default: sgf.sv = 5, sigf = 7, maxiter = 20, margin = 0.05, bound = 10, eig.tol = 1e-06, conv.tol = 1e-07, posd.tol = 1e-08 |
DeepSurv (deep learning survival model) | DeepSurv | deepsurv | survivalmodels | Default: frac=0.3, activation="relu", num_nodes=c(4L,8L,4L,2L), dropout=0.1, early_stopping=TRUE, epochs=100L, batch_size=32L |
DeepHit (deep learning survival model) | DeepHit | deephit | survivalmodels | Default: frac=0.3, activation="relu", num_nodes=c(4L,8L,4L,2L), dropout=0.1, early_stopping=TRUE, epochs=100L, batch_size=32L |
benchmarked in this study.
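As a minimal sketch (not the pipeline used in the study), the censoring rates in Table 1 and the baseline Cox row in Table 2 can be tried out with the survival package alone. This assumes that the Veteran and Lung datasets correspond to `veteran` and `lung` as shipped with survival. The single 80/20 split, the seed and the variable names below are illustrative assumptions only.

```r
library(survival)

# Censoring rate = proportion of observations whose event was not observed.
# survival::veteran codes status as 1 = dead, 0 = censored.
veteran_censoring <- mean(survival::veteran$status == 0)
# survival::lung codes status as 1 = censored, 2 = dead.
lung_censoring <- mean(survival::lung$status == 1)
# Rounded to 4 decimal places: 0.0657 and 0.2763, matching Table 1.
print(round(c(Veteran = veteran_censoring, Lung = lung_censoring), 4))

# Illustrative hold-out split (an assumption, not the study's evaluation design).
set.seed(1)
dat <- survival::veteran
test_idx <- sample(nrow(dat), size = round(0.2 * nrow(dat)))
train <- dat[-test_idx, ]
test <- dat[test_idx, ]

# Baseline Cox model ("Cox" in Table 2) fitted with coxph() and default settings.
fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = train)

# Linear predictor as a risk score; a higher score means a higher hazard.
test$risk <- predict(fit, newdata = test, type = "lp")

# Harrell's C-index on the held-out rows. reverse = TRUE because a higher
# risk score should go with a shorter survival time.
cindex <- concordance(Surv(time, status) ~ risk, data = test,
                      reverse = TRUE)$concordance
print(cindex)
```

A C-index of 0.5 corresponds to random ranking and 1 to perfect ranking. The value from a single split on 137 observations varies considerably with the seed.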
To install the SurvBenchmark package from GitHub and load it:
library(devtools)
devtools::install_github("SydneyBioX/SurvBenchmark_package")
library(SurvBenchmark)
You may need to install the following dependencies first:
library(dplyr)
library(survival)
library(glmnet)
library(rms)
library(tidyverse)
library(caret)
library(pec)
library(coefplot)
library("survAUC")
library(gridExtra)
library(ggplot2)
library(survminer)
library(randomForestSRC)
library(ggRandomForests)
library(penalized)
library(DMwR)
library(randomForest)
library(riskRegression)
library(pROC)
library(ROCR)
library(cvTools)
library(parallel)
library(pbmcapply)
library(MTLR)
library(profmem)
library(keras)
library(pseudo)
library(survivalROC)
library(survcomp)
library(CoxBoost)
library(limma)
library(partykit)
library(coin)
library(compound.Cox)
library(GenAlgo)
library(survivalsvm)
library(rmatio)
library(survivalmodels)
library(reticulate)
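Before loading everything, it can help to see which of these packages are missing from the local R library. The following is a small sketch using base R only. The vector `deps` is simply the package names from the `library()` calls above with duplicates dropped, and the variable names are illustrative.

```r
# Package names taken from the library() calls listed above (duplicates dropped).
deps <- c(
  "dplyr", "survival", "glmnet", "rms", "tidyverse", "caret", "pec",
  "coefplot", "survAUC", "gridExtra", "ggplot2", "survminer",
  "randomForestSRC", "ggRandomForests", "penalized", "DMwR", "randomForest",
  "riskRegression", "pROC", "ROCR", "cvTools", "parallel", "pbmcapply",
  "MTLR", "profmem", "keras", "pseudo", "survivalROC", "survcomp",
  "CoxBoost", "limma", "partykit", "coin", "compound.Cox", "GenAlgo",
  "survivalsvm", "rmatio", "survivalmodels", "reticulate"
)

# Compare against the installed packages without loading any of them.
installed <- rownames(installed.packages())
missing_deps <- setdiff(deps, installed)

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

How each missing package is installed depends on where it is hosted. For example, limma is distributed through Bioconductor rather than CRAN, so `install.packages()` alone may not cover every entry.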
Copyright 2022 Yunwei Zhang
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.