SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data

This repository contains the work for SurvBenchmark (updated 2022-05). The associated paper is:

Zhang, Yunwei & Wong, Germaine & Mann, Graham & Muller, Samuel & Yang, Jean. (2021). SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data. doi: 10.1101/2021.07.11.451967.

Please cite this paper if you would like to use the curated data.

Introduction

We develop a benchmarking design, SurvBenchmark, that evaluates a diverse collection of survival models on both clinical and omics datasets. SurvBenchmark covers not only classical approaches such as the Cox model but also state-of-the-art machine learning survival models. There are 16 datasets (https://github.com/SydneyBioX/SurvBenchmark/blob/main/tables/table1.docx)

Table 1. Datasets summary

| Dataset (name used in this paper) | Number of observations | Number of variables | Type of data | Censoring rate (rounded to 4 decimal places) | Reference |
|---|---|---|---|---|---|
| Melanoma_itraq | 41 | 643 | Omics | 0.4146 | Mactier, Swetlana et al. "Protein signatures correspond to survival outcomes of AJCC stage III melanoma patients." Pigment cell & melanoma research vol. 27,6 (2014): 1106-16. doi:10.1111/pcmr.12290 |
| Melanoma_nano | 45 | 207 | Omics | 0.4222 | Wang, K.Y.X. et al. Cross-Platform Omics Prediction procedure: a game changer for implementing precision medicine in patients with stage-III melanoma. bioRxiv 2020.12.09.415927; doi: https://doi.org/10.1101/2020.12.09.415927 |
| Ovarian_2 | 58 | 19818 | Omics | 0.3793 | Ganzfried, B.F. et al. (2013) curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database, 2013. |
| GE_5 | 78 | 4753 | Omics | 0.5641 | van 't Veer, L.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536. |
| GE_3 | 86 | 6288 | Omics | 0.7209 | Bullinger, L. et al. (2004) Use of Gene-Expression Profiling to Identify Prognostic Subclasses in Adult Acute Myeloid Leukemia. New England Journal of Medicine, 350, 1605–1616. |
| Melanoma_clinical | 77 | 16 | Clinical | 0.3939 | Wang, K.Y.X. et al. Cross-Platform Omics Prediction procedure: a game changer for implementing precision medicine in patients with stage-III melanoma. bioRxiv 2020.12.09.415927; doi: https://doi.org/10.1101/2020.12.09.415927 |
| GE_1 | 115 | 551 | Omics | 0.6670 | Sorlie, T. et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. U.S.A., 100, 8418–8423. |
| GE_4 | 116 | 6285 | Omics | 0.5641 | van de Vijver, M.J. et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med., 347, 1999–2009. |
| Veteran | 137 | 8 | Clinical | 0.0657 | Kalbfleisch, J.D. and Prentice, R.L. (2002) The Statistical Analysis of Failure Time Data. Wiley Series in Probability and Statistics. |
| Ovarian_1 | 194 | 16050 | Omics | 0.7062 | Ganzfried, B.F. et al. (2013) curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database, 2013. |
| Lung | 228 | 9 | Clinical | 0.2763 | Loprinzi, C.L. et al. (1994) Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol., 12, 601–607. |
| GE_6 | 240 | 7401 | Omics | 0.4250 | Van Houwelingen, H.C. (2004) The Elements of Statistical Learning, Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, New York, 2001. No. of pages: xvi 533. ISBN 0-387-95284-5. Statistics in Medicine, 23, 528–529. |
| GE_2 | 295 | 4921 | Omics | 0.7322 | Beer, D.G. et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med., 8, 816–824. |
| PBC | 312 | 7 | Clinical | 0.5994 | Fleming, T.R. and Harrington, D.P. (2005) Counting Processes and Survival Analysis. Wiley Series in Probability and Statistics. |
| UNOS_Kidney | 3000 | 101 | Clinical | 0.7350 | OPTN data (https://optn.transplant.hrsa.gov/) |
| ANZ | 3323 | 40 | Clinical | 0.8739 | ANZDATA (https://www.anzdata.org.au/) |
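
The censoring rate in Table 1 is the fraction of observations whose event was not observed. The following is a minimal sketch of that calculation, not code from this repo. `censoring_rate` is a hypothetical helper, and the sketch assumes that the `veteran` and `lung` data shipped with the survival package correspond to the Veteran and Lung rows of the table.

```r
# Censoring rate: share of observations whose event was not observed.
# `event` is the status code that marks an observed event. The caller must
# supply it, because datasets code the status column differently.
censoring_rate <- function(status, event = 1) {
  mean(status != event)
}

# Toy input: two of the four observations are censored.
print(censoring_rate(c(1, 0, 0, 1)))

if (requireNamespace("survival", quietly = TRUE)) {
  # Assumed to correspond to the Veteran and Lung rows of Table 1.
  veteran <- survival::veteran  # status: 1 = event, 0 = censored
  lung <- survival::lung        # status: 2 = event, 1 = censored
  print(round(censoring_rate(veteran$status, event = 1), 4))
  print(round(censoring_rate(lung$status, event = 2), 4))
}
```

If the assumption about the two datasets holds, the printed values can be compared directly with the Veteran and Lung rows of the censoring rate column.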

The study also benchmarks 20 survival methods (https://github.com/SydneyBioX/SurvBenchmark/blob/main/tables/table2.docx)

Table 2. Summary of methods used in this study

| Method name | Method name in this paper | R function name | R package name | Parameters (default) |
|---|---|---|---|---|
| Cox | Cox | coxph | survival | NA |
| Cox with backward elimination using AIC | Cox_bw_AIC | cph, fastbw | rms | rule="aic", sls=.05, k.aic=2 |
| Cox with backward elimination using p-value | Cox_bw_p | cph, fastbw | rms | rule="p", sls=.05 |
| Cox with backward elimination using BIC | Cox_bw_BIC | cph, fastbw | rms | rule="aic", sls=.05, k.aic=log(as.numeric(table(train$status)[2])) |
| Lasso Cox (for clinical datasets) | Lasso_Cox | penalized | penalized | lambda1=1, lambda2=0 |
| Ridge Cox (for clinical datasets) | Ridge_Cox | penalized | penalized | lambda1=0, lambda2=1 |
| Elastic net Cox (for clinical datasets) | EN_Cox | penalized | penalized | lambda1=1, lambda2=1 |
| Lasso Cox (for omics datasets) | Lasso_Cox | glmnet | glmnet | alpha=1, nfolds=5, type.measure="C" |
| Ridge Cox (for omics datasets) | Ridge_Cox | glmnet | glmnet | alpha=0, nfolds=5, type.measure="C" |
| Elastic net Cox (for omics datasets) | EN_Cox | glmnet | glmnet | alpha=0.5, nfolds=5, type.measure="C" |
| Random survival forest | RSF | rfsrc | RandomSurvivalForest | Default: ntree=1000, mtry=10 |
| Multitask logistic regression method | MTLR | mtlr | MTLR | C1=1 |
| DNNSurv (deep learning survival model) | DNNSurv | multiple functions, as in the GitHub code | DNNSurv | Default: no parameter arguments to be changed by users |
| Boosting Cox model | CoxBoost | coxboost | CoxBoost | stepnumber=10, penalty number=100 |
| Cox model with genetic algorithm as feature selection method | Cox (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
| Multitask logistic regression model with genetic algorithm as feature selection method | MTLR (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
| Boosting Cox model with genetic algorithm as feature selection method | CoxBoost (GA) | GenAlg | GenAlgo | n.features=10 (for omics), n.features=4 (for clinical), generation_num=20 |
| Multitask logistic regression model with ranking-based method as feature selection method | MTLR (DE) | lmFit, eBayes | limma | n.features=10 (for omics), n.features=4 (for clinical) |
| Boosting Cox model with ranking-based method as feature selection method | CoxBoost (DE) | lmFit, eBayes | limma | n.features=10 (for omics), n.features=4 (for clinical) |
| Survival support vector machine | SurvivalSVM | survivalsvm | survivalsvm | Default: sgf.sv = 5, sigf = 7, maxiter = 20, margin = 0.05, bound = 10, eig.tol = 1e-06, conv.tol = 1e-07, posd.tol = 1e-08 |
| DeepSurv (deep learning survival model) | DeepSurv | deepsurv | survivalmodels | Default: frac=0.3, activation="relu", num_nodes=c(4L,8L,4L,2L), dropout=0.1, early_stopping=TRUE, epochs=100L, batch_size=32L |
| DeepHit (deep learning survival model) | DeepHit | deephit | survivalmodels | Default: frac=0.3, activation="relu", num_nodes=c(4L,8L,4L,2L), dropout=0.1, early_stopping=TRUE, epochs=100L, batch_size=32L |
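
In Table 2, the Cox_bw_AIC and Cox_bw_BIC rows use the same rule="aic" setting and differ only in the penalty multiplier k.aic: 2 for AIC, and the log of the number of observed events for BIC. The sketch below illustrates that relationship. It is an illustration under stated assumptions rather than the repo's code: it uses `extractAIC()` on a plain `coxph` fit instead of `rms::fastbw`, the `veteran` data stands in for the `train` data frame named in the table, and the three covariates are chosen arbitrarily.

```r
library(survival)

# veteran ships with the survival package. It stands in for the `train`
# data frame named in Table 2 (an assumption made for this sketch).
train <- survival::veteran

# Number of observed events, written as in Table 2. With status coded 0/1,
# the second entry of the table counts the events.
n_events <- as.numeric(table(train$status)[2])
k_aic <- 2
k_bic <- log(n_events)

# Arbitrary covariates, chosen only for illustration.
fit <- coxph(Surv(time, status) ~ age + karno + trt, data = train)

# extractAIC() returns c(edf, -2 * logLik + k * edf); only k differs.
aic <- extractAIC(fit, k = k_aic)
bic <- extractAIC(fit, k = k_bic)
print(rbind(AIC = aic, BIC = bic))
```

Because `k_bic` exceeds 2 once there are more than about 7 events, the BIC variant penalises each extra coefficient more heavily and tends to keep fewer variables during backward elimination.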

##############################################################################

Files in this repo

All high-resolution figures from the paper are in the "figures" folder.

The folder "functions" contains functions to run all methods.

The folder "datasets" contains all datasets benchmarked in our paper.

The folder "figures_data" contains all figure data used to generate the figures in our paper.

The github_example.R file gives an example to get the results using methods in "functions" on the Ovarian dataset.

For the datasets we used, see Table 1 in our paper, also available as table1 under "tables".

For the survival methods we benchmarked, see Table 2 in our paper, also available as table2 under "tables".

The R package is available at https://github.com/SydneyBioX/SurvBenchmark_package and will be updated as work continues.

###############################################################################

Installation

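# install.packages("devtools")  # run first if devtools is not installed
library(devtools)
# Install the SurvBenchmark package from GitHub, then load it
devtools::install_github("SydneyBioX/SurvBenchmark_package")
library(SurvBenchmark)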

Requirements

You may need to install the following dependencies first. Once installed, they are loaded with:

library(dplyr)
library(survival)
library(glmnet)
library(rms)
library(tidyverse)
library(caret)
library(pec)
library(coefplot)
library(survAUC)
library(gridExtra)
library(ggplot2)
library(survminer)
library(randomForestSRC)
library(ggRandomForests)
library(penalized)
library(DMwR)
library(randomForest)
library(riskRegression)
library(pROC)
library(ROCR)
library(cvTools)
library(parallel)
library(pbmcapply)
library(MTLR)
library(profmem)
library(keras)
library(pseudo)
library(survivalROC)
library(survcomp)
library(CoxBoost)
library(limma)
library(partykit)
library(coin)
library(compound.Cox)
library(GenAlgo)
library(survivalsvm)
library(rmatio)
library(survivalmodels)
library(reticulate)
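
The `library()` calls above only load packages that are already installed. The following is a minimal sketch of one way to find out which ones are missing before loading them. `missing_packages` is a hypothetical helper and not part of SurvBenchmark, and the `required` vector is only a subset of the list above.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# A subset of the dependency list above; extend as needed.
required <- c("survival", "glmnet", "rms", "randomForestSRC", "penalized",
              "MTLR", "CoxBoost", "GenAlgo", "survivalsvm", "survivalmodels")
todo <- missing_packages(required)
print(todo)

# Install what is missing (uncomment to run; needs network access).
# Some packages may no longer be on CRAN and need another source.
# install.packages(todo)
# Bioconductor packages such as limma and survcomp are installed with:
# BiocManager::install(c("limma", "survcomp"))
```

The install commands are left commented out so that running the snippet only reports what is missing.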

Visualise the results

The comparison of survival models can be visualised as a heatmap.
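
A minimal sketch of such a heatmap in base R is given here. It is not the plotting code used for the paper. The score matrix is filled with random placeholder values and does not contain benchmark results, and the method and dataset names are a small subset taken from Table 1 and Table 2.

```r
# Placeholder scores only: random numbers, NOT results from the paper.
# Higher is assumed to mean better for this hypothetical score.
set.seed(1)
methods <- c("Cox", "Lasso_Cox", "RSF", "MTLR", "CoxBoost")
datasets <- c("Veteran", "Lung", "PBC", "GE_1")
scores <- matrix(runif(length(methods) * length(datasets), 0.5, 0.8),
                 nrow = length(methods),
                 dimnames = list(methods, datasets))

# Rank methods within each dataset (1 = best) so columns are comparable.
ranks <- apply(-scores, 2, rank)

# Draw the ranks without clustering or rescaling.
heatmap(ranks, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Dataset", ylab = "Method", margins = c(8, 8))
```

Ranking within each dataset before plotting keeps datasets with uniformly high or low scores from dominating the colour scale.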

Reference

Zhang, Yunwei & Wong, Germaine & Mann, Graham & Muller, Samuel & Yang, Jean. (2021). SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data. doi: 10.1101/2021.07.11.451967.

License

Copyright [2022] [Yunwei Zhang]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
