Git Product home page Git Product logo

autonlab / auton-survival Goto Github PK

View Code? Open in Web Editor NEW
315.0 9.0 74.0 28.21 MB

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

Home Page: http://autonlab.github.io/auton-survival

License: MIT License

Python 100.00%
survival-analysis reliability-analysis python data-science deep-learning machine-learning time-to-event counterfactual-inference regression causal-inference

auton-survival's Introduction

codecov     License: MIT     GitHub Repo stars     CI

The auton-survival Package


The python package auton-survival is repository of reusable utilities for projects involving censored Time-to-Event Data. auton-survival provides a flexible APIs allowing rapid experimentation including dataset preprocessing, regression, counterfactual estimation, clustering and phenotyping and propensity adjusted evaluation.

For complete details on auton-survival see:

Contents

What is Survival Analysis?

Survival Analysis involves estimating when an event of interest, ( T ) would take places given some features or covariates ( X ). In statistics and ML these scenarious are modelled as regression to estimate the conditional survival distribution, ( P(T>t|X) ). As compared to typical regression problems, Survival Analysis differs in two major ways:

  • The Event distribution, ( T ) has positive support, ( T in [0, \infty) ).
  • There is presence of censoring (ie. a large number of instances of data are lost to follow up.)

Figure 1. Illustration of Censoring

The Auton Survival Package

The package auton_survival is repository of reusable utilities for projects involving censored Time-to-Event Data. auton_survival allows rapid experimentation including dataset preprocessing, regression, counterfactual estimation, clustering and phenotyping and propensity-adjusted evaluation.

Survival Regression

auton_survival.models

Currently supported Survival Models include:

  • auton_survival.models.dsm.DeepSurvivalMachines
  • auton_survival.models.dcm.DeepCoxMixtures
  • auton_survival.models.cph.DeepCoxPH
  • auton_survival.models.cmhe.DeepCoxMixturesHeterogenousEffects

Training a Deep Cox Proportional Hazards Model with auton-survival:

from auton_survival import datasets, preprocessing, models 

# Load the SUPPORT Dataset
outcomes, features = datasets.load_dataset("SUPPORT")

# Preprocess (Impute and Scale) the features
features = preprocessing.Preprocessor().fit_transform(features)

# Train a Deep Cox Proportional Hazards (DCPH) model
model = models.cph.DeepCoxPH(layers=[100])
model.fit(features, outcomes.time, outcomes.event)

# Predict risk at specific time horizons.
predictions = model.predict_risk(features, t=[8, 12, 16])

Figure 2. Violation of the Proportional Hazards Assumption

auton_survival.estimators [Demo Notebook]

This module provides a wrapper auton_survival.estimators.SurvivalModel to model survival datasets with standard survival (time-to-event) analysis methods. The use of the wrapper allows a simple standard interface for multiple different survival regression methods.

auton_survival.estimators also provides convenient wrappers around other popular python survival analysis packages to experiment with Random Survival Forests and Weibull Accelerated Failure Time regression models.

from auton_survival import estimators

# Train a Deep Survival Machines model using the SurvivalModel class.
model = estimators.SurvivalModel(model='dsm')
model.fit(features, outcomes)

# Predict risk at time horizons.
predictions = model.predict_risk(features, times=[8, 12, 16])

auton_survival.experiments [Demo Notebook]

Modules to perform standard survival analysis experiments. This module provides a top-level interface to run auton_survival style experiments of survival analysis, involving options for cross-validation and nested cross-validation style experiments with multiple different survival analysis models.

The module supports multiple model peroformance evaluation metrics and further eases evaluation by automatically computing the censoring adjusted estimates, such as Time Dependent Concordance Index and Brier Score with IPCW adjustment.

# auton_survival cross-validation experiment.
from auton_survival.datasets import load_dataset

outcomes, features = load_dataset(dataset='SUPPORT')
cat_feats = ['sex', 'income', 'race']
num_feats = ['age', 'resp', 'glucose']

from auton_survival.experiments import SurvivalRegressionCV
# Instantiate an auton_survival Experiment 
experiment = SurvivalRegressionCV(model='cph', num_folds=5, 
                                    hyperparam_grid=hyperparam_grid)

# Fit the `experiment` object with the specified Cox model.
model = experiment.fit(features, outcomes, metric='ibs',
                       cat_feats=cat_feats, num_feats=num_feats)

Phenotyping and Knowledge Discovery

auton_survival.phenotyping [Demo Notebook]

auton_survival.phenotyping allows extraction of latent clusters or subgroups of patients that demonstrate similar outcomes. In the context of this package, we refer to this task as phenotyping. auton_survival.phenotyping provides the following phenotyping utilities:

  • Intersectional Phenotyping: Recovers groups, or phenotypes, of individuals over exhaustive combinations of user-specified categorical and numerical features.
from auton_survival.phenotyping import IntersectionalPhenotyper

# ’ca’ is cancer status. ’age’ is binned into two quantiles.
phenotyper = IntersectionalPhenotyper(num_vars_quantiles=(0, .5, 1.0),
cat_vars=['ca'], num_vars=['age'])
phenotypes = phenotyper.fit_predict(features)
  • Unsupervised Phenotyping: Identifies groups of individuals based on structured similarity in the fature space by first performing dimensionality reduction of the input covariates, followed by clustering. The estimated probability of an individual to belong to a latent group is computed as the distance to the cluster normalized by the sum of distance to other clusters.
from auton_survival.phenotyping import ClusteringPhenotyper

# Dimensionality reduction using Principal Component Analysis (PCA) to 8 dimensions.
dim_red_method, = 'pca', 
# We use a Gaussian Mixture Model (GMM) with 3 components and diagonal covariance.
clustering_method, n_clusters = 'gmm', 

# Initialize the phenotyper with the above hyperparameters.
phenotyper = ClusteringPhenotyper(clustering_method=clustering_method,
                                  dim_red_method=dim_red_method,
                                  n_components=n_components,
                                  n_clusters=n_clusters)
# Fit and infer the phenogroups.
phenotypes = phenotyper.fit_predict(features)
  • Supervised Phenotyping: Identifies latent groups of individuals with similar survival outcomes. This approach can be performed as a direct consequence of training the Deep Survival Machines and Deep Cox Mixtures latent variable survival regression estimators using the predict latent z method.
from auton_survival.models.dcm import DeepCoxMixtures

# Instantiate a DCM Model with 3 phenogroups and a single hidden layer with size 100.
model = DeepCoxMixtures(k = 3, layers = [100])
model.fit(features, outcomes.time, outcomes.event, iters = 100, learning_rate = 1e-4)

# Infer the latent Phenotpyes
latent_z_prob = model.predict_latent_z(features)
phenotypings = latent_z_prob.argmax(axis=1)
  • Counterfactual Phenotyping: Identifies groups of individuals that demonstrate heterogenous treatment effects. That is, the learnt phenogroups have differential response to a specific intervention. Relies on the specially designed auton_survival.models.cmhe.DeepCoxMixturesHeterogenousEffects latent variable model.
from auton_survival.models.cmhe DeepCoxMixturesHeterogenousEffects

# Instantiate the CMHE model
model = DeepCoxMixturesHeterogenousEffects(random_seed=random_seed, k=k, g=g, layers=layers)

model = model.fit(features, outcomes.time, outcomes.event, intervention)
zeta_probs = model.predict_latent_phi(x_tr)
zeta = np.argmax(zeta_probs, axis=1)
  • Virtual Twins Phenotyping: Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
from auton_survival.phenotyping import SurvivalVirtualTwins

# Instantiate the Survival Virtual Twins
model = SurvivalVirtualTwins(horizon=365)
# Infer the estimated counterfactual phenotype probability.
phenotypes = model.fit_predict(features, outcomes.time, outcomes.event, interventions)

DAG representations of the unsupervised, supervised, and counterfactual probabilitic phenotypers in auton-survival are shown in the below figure. X represents the covariates, T the time-to-event and Z is the phenotype to be inferred.

A. Unsupervised Phenotyping    B. Supervised Phenotyping    C. Counterfactual Phenotyping

Figure 3. DAG Representations of the Phenotypers in auton-survival

Evaluation and Reporting

auton_survival.metrics

Helper functions to generate standard reports for common Survival Analysis tasks with support for bootstrapped confidence intervals.

  • Regression Metric: Metrics for survival model performance evaluation:
    • Brier Score
    • Integrated Brier Score
    • Area under the Receiver Operating Characteristic (ROC) Curve
    • Concordance Index
from auton_survival.metrics import survival_regression_metric

# Infer event-free survival probability from model
predictions = model.predict_survival(features, times)
# Compute Brier Score, Integrated Brier Score
# Area Under ROC Curve and Time Dependent Concordance Index
metrics = ['brs', 'ibs', 'auc', 'ctd']
score = survival_regression_metric(metric='brs', outcomes_train, 
                                   outcomes_test, predictions_test,
                                   times=times)
  • Treatment Effect: Used to compare treatment arms by computing the difference in the following metrics for treatment and control groups:
    • Time at Risk (TaR)
    • Risk at Time
    • Restricted Mean Survival Time (RMST)

A. Time at Risk                                                  B. RMST                                                  C. Risk at Time

Figure 4. Graphical representation of the Treatment Effect Metrics

from auton_survival.metrics import survival_diff_metric

# Compute the difference in RMST, Risk at Time, and TaR between treatment and control groups
metrics = ['restricted_mean', 'survival_at', 'tar']
effect = survival_diff_metric(metric='restricted_mean', outcomes=outcomes
                              treatment_indicator=treatment_indicator, 
                              weights=None, horizon=120, n_bootstrap=500)
  • Phenotype Purity: Used to measure a phenotyper’s ability to extract subgroups, or phenogroups, with differential survival rates by fitting a Kaplan-Meier estimator within each phenogroup followed by estimating the Brier Score or Integrated Brier Score within each phenogroup.
from auton_survival.metrics import phenotype_purity

# Measure phenotype purity using the Brier Score at event horizons of 1, 2 and 5 years.
phenotype_purity(phenotypes, outcomes, strategy='instantaneous', 
                 time=[365,730,1825])
# Measure phenotype purity using the Integrated Brier score at an event horizon of 5 years.
phenotype_purity(phenotypes, outcomes, strategy='integrated', time=1825)

auton_survival.reporting

Helper functions to generate plots for Survival Analysis tasks.

# Plot separate Kaplan Meier survival estimates for phenogroups.
auton_survival.reporting.plot_kaplanmeier(outcomes, groups=phenotypes)

# Plot separate Nelson-Aalen estimates for phenogroups.
auton_survival.reporting.plot_nelsonaalen(outcomes, groups=phenotypes)

Dataset Loading and Preprocessing

Helper functions to load and preprocess various time-to-event data like the popular SUPPORT, FRAMINGHAM and PBC dataset for survival analysis.

auton_survival.datasets

# Load the SUPPORT Dataset
from auton_survival.datasets import load_dataset
datasets = ['SUPPORT', 'PBC', 'FRAMINGHAM', 'MNIST', 'SYNTHETIC']
features, outcomes = datasets.load_dataset('SUPPORT')

auton_survival.preprocessing

This module provides a flexible API to perform imputation and data normalization for downstream machine learning models. The module has 3 distinct classes, Scaler, Imputer and Preprocessor. The Preprocessor class is a composite transform that does both Imputing and Scaling with a single function call.

# Preprocessing loaded Datasets
from auton_survival import datasets
features, outcomes = datasets.load_topcat()

from auton_survival.preprocessing import Preprocessing
features = Preprocessor().fit_transform(features,
                    cat_feats=['GENDER', 'ETHNICITY', 'SMOKE'],
                    num_feats=['height', 'weight'])

# The `cat_feats` and `num_feats` lists would contain all the categorical and numerical features in the dataset.

Citing and References

Please cite the following paper if you use the auton-survival package:

[1] auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. arXiv (2022)

  @article{nagpal2022auton,
    title={auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data},
    author={Nagpal, Chirag and Potosnak, Willa and Dubrawski, Artur},
    journal={arXiv preprint arXiv:2204.07276},
    year={2022}
  }

Additionally, auton-survival implements the following methodologies:

[2] Counterfactual Phenotyping with Censored Time-to-Events. ACM Conference on Knowledge Discovery and Data Mining (KDD) 2022

  @article{nagpal2022counterfactual,
  title={Counterfactual Phenotyping with Censored Time-to-Events},
  author={Nagpal, Chirag and Goswami, Mononito and Dufendach, Keith and Dubrawski, Artur},
  journal={arXiv preprint arXiv:2202.11089},
  year={2022}
  }

[3] Deep Cox Mixtures for Survival Regression. Conference on Machine Learning for Healthcare (2021)

  @inproceedings{nagpal2021dcm,
  title={Deep Cox mixtures for survival regression},
  author={Nagpal, Chirag and Yadlowsky, Steve and Rostamzadeh, Negar and Heller, Katherine},
  booktitle={Machine Learning for Healthcare Conference},
  pages={674--708},
  year={2021},
  organization={PMLR}
  }

[4] Deep Parametric Time-to-Event Regression with Time-Varying Covariates. AAAI Spring Symposium (2021)

  @InProceedings{pmlr-v146-nagpal21a,
  title={Deep Parametric Time-to-Event Regression with Time-Varying Covariates},
  author={Nagpal, Chirag and Jeanselme, Vincent and Dubrawski, Artur},
  booktitle={Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021},
  series={Proceedings of Machine Learning Research},
  publisher={PMLR},
  }

[5] Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks. IEEE Journal of Biomedical and Health Informatics (2021)

  @article{nagpal2021dsm,
  title={Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks},
  author={Nagpal, Chirag and Li, Xinyu and Dubrawski, Artur},
  journal={IEEE Journal of Biomedical and Health Informatics},
  volume={25},
  number={8},
  pages={3163--3175},
  year={2021},
  publisher={IEEE}
  }

Compatibility and Installation

auton_survival requires python 3.5+ and pytorch 1.1+.

To evaluate performance using standard metrics auton_survival requires scikit-survival.

To install auton_survival, clone the following git repository:

foo@bar:~$ git clone https://github.com/autonlab/auton-survival.git
foo@bar:~$ pip install -r requirements.txt

Contributing

auton_survival is on GitHub. Bug reports and pull requests are welcome.

License

MIT License

Copyright (c) 2022 Carnegie Mellon University, Auton Lab

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.











auton-survival's People

Contributors

chiragnagpal avatar chufangao avatar github-actions[bot] avatar haivanle avatar jeanselme avatar kishanmaharaj avatar matteo4diani avatar potosnakw avatar shikhareddy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

auton-survival's Issues

DSM / SUPPORT - Paper Results

Hi!

Thanks for creating this great repo with such clean code and documentation! It has been very helpful with understanding the papers, building benchmarks, and using nice survival datasets, easily!

Got a question about DSM / SUPPORT dataset: I've been using your code to reproduce the results of https://arxiv.org/pdf/2003.01176.pdf, but the scores that I get (on C-index, Brier, etc..) are a little different than the paper. I would appreciate any help/tip (preprocessing, model usage - hyperparameters, etc..)!

Best!

Measure harrell C score

Hello
Is there a way to evaluate the model using harrell c concordance index rather than time dependent concordance?
What would be the code for that?

AUC time horizon

Thank you so much for the hard work
I have one question regarding the AUC metric.
And the time horizon.
The AUC is reported as an array and the times are reported as an array.

For example, if the time intervals are 0 to 1 , 1 to 2 and 2 to 3.

And the array is ( 0,1,2,3)

While the AUC is 0.6, 0.7, 0.8

So the array is (0.6,0.7.0.8)

Does this mean that the AUC for the period 0 till 3 is 0.8?
Or the AUC for the period 2 to 3 is 0.8?

Error evaluating AUC

Dear

Thank you very much for the hard work.
I am trying to evaluate the AUC
It is working very well for the training and validation dataset , but not work for the test dataset and I am getting the following error:

censoring survival function is zero at on or more time points

I wonder how to fix this problem?

Where is dsm_api?

I can't find dsm_api. It would be helpful for debugging object instantiation of the recurrent deep survival machine class.

Add usage example notebooks

Currently we do not have notebooks that compare performance of DSM with other models.

We would need to compare against:
-DeepHit
-DeepSurv
-Random Survival Forest
-Cox PH
on Time Dependent CI and Brier Score.

mat1 and mat2 shapes cannot be multiplied (10424x58 and 57x100)

Dear

When I run the deep cox mixtures and deep cox proportional hazard on my data, I get the following error:
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_14400\3101778186.py in
23
24 # Obtain survival probabilities for validation set and compute the Integrated Brier Score
---> 25 predictions_val = model.predict_survival(x_val, times)
26 metric_val = survival_regression_metric('ibs', y_val, predictions_val, times, y_tr)
27 models.append([metric_val, model])

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\estimators.py in predict_survival(self, features, times)
701 return _predict_dsm(self._model, features, times)
702 elif self.model == 'dcph':
--> 703 return _predict_dcph(self._model, features, times)
704 elif self.model == 'dcm':
705 return _predict_dcm(self._model, features, times)

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\estimators.py in _predict_dcph(model, features, times)
232 times = times.ravel().tolist()
233
--> 234 return model.predict_survival(x=features.values, t=times)
235
236 def _fit_cph(features, outcomes, val_data, random_seed, **hyperparams):

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph_init_.py in predict_survival(self, x, t)
231 t = [t]
232
--> 233 scores = predict_survival(self.torch_model, x, t)
234 return scores
235

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph\dcph_utilities.py in predict_survival(model, x, t)
144
145 model, breslow_spline = model
--> 146 lrisks = model(x).detach().cpu().numpy()
147
148 unique_times = breslow_spline.baseline_survival_.x

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph\dcph_torch.py in forward(self, x)
27 def forward(self, x):
28
---> 29 return self.expert(self.embedding(x))
30
31 class DeepRecurrentCoxPHTorch(DeepCoxPHTorch):

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\container.py in forward(self, input)
202 def forward(self, input):
203 for module in self:
--> 204 input = module(input)
205 return input
206

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10424x58 and 57x100)

I wonder if you can kindly help me with that

License issue

Hello,

I noticed that you use the MIT license for your library; however, in the dependencies, you have GPL3 libraries.

More exactly, the scikit-survival dependency is distributed under the GPL-3.0 License.

If you incorporate this kind of license, you also need to be GPL-3 (reference)

This can be problematic in the open-source community, for anyone using your library.

Don't get me wrong, I want to use your library, but your depends would force me to also distribute under the GPL-3 license.
Please remove the GPL-3 dependencies or update your license accordingly.

metrics.py indexes problem

Hi,
I am using your package for the development of a dcph model. In order to see the performances on a validation set, I call the functions from metrics.py; as an example:

brier = survival_regression_metric('brs', outcomes = validation_outcome,
                                      predictions = out_survival,
                                      times=times, outcomes_train=training_outcome)

However, it throws me the following error:

_File "/Users/micheleatzeni/PycharmProjects/brainteaser/auton-survival/auton_survival/metrics.py", line 235, in survival_regression_metric
return _metric(survival_train, survival_test, predictions, times)
File "/Users/micheleatzeni/PycharmProjects/brainteaser/auton-survival/auton_survival/metrics.py", line 265, in cumulative_dynamic_auc
return metrics.cumulative_dynamic_auc(survival_train, survival_test[idx], 1-predictions[idx], times)[0]
IndexError: index 628 is out of bounds for axis 0 with size 628

Since python lists as well as numpy arrays are 0-based, the largest index in an array of size 628 would be 627.
So looking at the specific metrics.py functions, I think the possible problem is in the return statement (idx):

def _brier_score(survival_train, survival_test, predictions, times, random_seed=None):

  idx = np.arange(len(predictions))
  if random_seed is not None:
    np.random.seed(random_seed)
    idx = np.random.choice(idx, len(predictions), replace=True)

  return metrics.brier_score(survival_train, survival_test[idx], predictions[idx], times)[-1]

It should be idx-1?

Predict Score Function for Deep Cox Mixtures (PyTorch Implementation)

Hi! Really inspired by this work and tried to implement the DCM model in our project. Noticed that there was a predict_scores function in the tensorflow implementation of dcm but unable to find the same function in PyTorch. Would really appreciate your help here.
Thank you!

The parameter of shape and scale become NaN

hello, I use to DeepRecurrentSurvivalMachines to model the staying length in ICU, the parameter of shape and scale become NaN after the first back propagation. how to deal with input data?

test_pbc_dataset and test_framingham_dataset unit tests are wrongly indented

test_pbc_dataset and test_framingham_dataset unit tests are wrongly indented, so they fail to execute when running pytest. Fixing the indentation allows all 3 unit tests to run.

def test_pbc_dataset(self):
"""Test function to load and test the PBC dataset.
"""
x, t, e = datasets.load_dataset('PBC')
t_median = np.median(t[e==1])
self.assertIsInstance(x, np.ndarray)
self.assertIsInstance(t, np.ndarray)
self.assertIsInstance(e, np.ndarray)
self.assertEqual(x.shape, (1945, 25))
self.assertEqual(t.shape, (1945,))
self.assertEqual(e.shape, (1945,))
model = DeepSurvivalMachines()
self.assertIsInstance(model, DeepSurvivalMachines)
model.fit(x, t, e, iters=10)
self.assertIsInstance(model.torch_model,
DeepSurvivalMachinesTorch)
risk_score = model.predict_risk(x, t_median)
survival_probability = model.predict_survival(x, t_median)
np.testing.assert_equal((risk_score+survival_probability).all(), 1.0)
def test_framingham_dataset(self):
"""Test function to load and test the Framingham dataset.
"""
x, t, e = datasets.load_dataset('FRAMINGHAM')
t_median = np.median(t)
self.assertIsInstance(x, np.ndarray)
self.assertIsInstance(t, np.ndarray)
self.assertIsInstance(e, np.ndarray)
self.assertEqual(x.shape, (11627, 18))
self.assertEqual(t.shape, (11627,))
self.assertEqual(e.shape, (11627,))
model = DeepSurvivalMachines()
self.assertIsInstance(model, DeepSurvivalMachines)
model.fit(x, t, e, iters=10)
self.assertIsInstance(model.torch_model,
DeepSurvivalMachinesTorch)
risk_score = model.predict_risk(x, t_median)
survival_probability = model.predict_survival(x, t_median)
np.testing.assert_equal((risk_score+survival_probability).all(), 1.0)

Usage of GPU

Hello. I'm jamie from Yonsei Grad school.

I've been researching for Deep survival models using SEER data and i found out yours!
I've tried to make the use of DSM(DeepSruvivalMachines) but, it takes so long to have results due to my heavy data. So, i tried gpu_support branch but it didn't work at all (It was working in only CPU) :(
As far as i'm concerned, DSM is an installed package based on pytorch, which GPU can be supported. I'm wondering whether all branches including 'GPU_support' work on only in CPU?
I'm looking forward to hearing you. Thanks for your sincere efforts.

Link with SHAP value

Hello. This is Jamie from South Korea.

https://github.com/slundberg/shap <- This is what i'd like to link with Deep Survival Machines. I've tried it with KernelExplainer to see SHAP value but it failed. As a publisher of DSM, do you think one of explainers support for DSM with Survival data?

Always thank you for your efforts!
Best,
Jamie.

install problem

hello:
I want to install the package,please tell mo how to install by conda or pip?
Thank you!

Instantiating DeepRecurrentSurvivalMachines object without parameters breaks many things

Instantiating DeepRecurrentSurvivalMachines object without parameters does not raise errors/warnings, but it breaks when calling fit.

This is currently valid:

model = DeepRecurrentSurvivalMachines()

But breaks when calling fit:

model.fit(x, t, e)

Generates the following error with python 3.7:

File "test.py", line 13, in <module>
    model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 256, in fit
    model = self._gen_torch_model(inputdim, optimizer, risks=maxrisk)
File "auton_survival/models/dsm/__init__.py", line 535, in _gen_torch_model
risks=risks)
File "auton_survival/models/dsm/dsm_torch.py", line 271, in __init__
    self._init_dsm_layers(hidden)
File "auton_survival/models/dsm/dsm_torch.py", line 164, in _init_dsm_layers
    ) for r in range(self.risks)})
File "auton_survival/models/dsm/dsm_torch.py", line 164, in <dictcomp>
    ) for r in range(self.risks)})
File "anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 76, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: new(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2

Error with python 3.9:

File "test.py", line 13, in <module>
    model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 256, in fit
    model = self._gen_torch_model(inputdim, optimizer, risks=maxrisk)
File "auton_survival/models/dsm/__init__.py", line 526, in _gen_torch_model
    return DeepRecurrentSurvivalMachinesTorch(inputdim,
File "auton_survival/models/dsm/dsm_torch.py", line 271, in __init__
    self._init_dsm_layers(hidden)
File "auton_survival/models/dsm/dsm_torch.py", line 162, in _init_dsm_layers
    self.gate = nn.ModuleDict({str(r+1): nn.Sequential(
File "auton_survival/models/dsm/dsm_torch.py", line 163, in <dictcomp>
    nn.Linear(lastdim, self.k, bias=False)
File "/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 81, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
TypeError: empty(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2

A quick fix could be raising an error in the DeepRecurrentSurvivalMachines constructor based on required parameters or using appropriate default parameters.

Format of time varying dataset

Hello,

I want to use RDSM, but I would like to confirm some things regarding the formatting:

  1. Why does the time variable decreases as you move along the observations for a particular ID? Normally, when I use a survival package time is strictly increasing. For example:
id status2 time drug_D-penicil drug_placebo sex_female
1 1 1.095170 1.0 0.0 1.0
1 1 0.569489 1.0 0.0 1.0
2 0 14.152338 1.0 0.0 1.0
2 0 13.654036 1.0 0.0 1.0
2 0 13.152995 1.0 0.0 1.0
2 0 12.049611 1.0 0.0 1.0
2 0 9.251451 1.0 0.0 1.0
2 0 8.263060 1.0 0.0 1.0
2 0 7.266455 1.0 0.0 1.0
2 0 6.261636 1.0 0.0 1.0
2 0 5.319790 1.0 0.0 1.0
3 1 2.770781 1.0 0.0 0.0
3 1 2.288906 1.0 0.0 0.0
3 1 1.774176 1.0 0.0 0.0
3 1 0.736502 1.0 0.0 0.0
4 1 5.270507 1.0 0.0 1.0
4 1 4.755777 1.0 0.0 1.0
4 1 4.251999 1.0 0.0 1.0
4 1 3.274559 1.0 0.0 1.0
4 1 1.837148 1.0 0.0 1.0
  1. Does each record per ID represent a change in one of the covariates or each record it is just an increment in time regardless whether a covariate changed?

  2. The format you always have to feed the model is a list that contains a separate numpy matrix containing each record per ID?

Thank you :)

Amazing package!

Time varying

Hi!

Thank you for this work. Are there any possibility of use this library for survival analysis with time varying using deep learning approaches???

Thanks
Pablo

ValueError: Input estimate contains NaN

Hello! Thanks for such unique package. I am trying to use DeepSurvivalMachines (note: for example, on the same dataset DeepCoxMixtures work without any issues), here is the error log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [24], line 12
      8 et_val = np.array([(e_val[i], t_val[i]) for i in range(len(e_val))],
      9                  dtype = [('e', bool), ('t', float)])
     11 for i, _ in enumerate(times):
---> 12     cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])
     13 #brs.append(brier_score(et_train, et_test, out_survival, times)[1])
     14 roc_auc = []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/metrics.py:324, in concordance_index_ipcw(survival_train, survival_test, estimate, tau, tied_tol)
    321     mask = test_time < tau
    322     survival_test = survival_test[mask]
--> 324 estimate = _check_estimate_1d(estimate, test_time)
    326 cens = CensoringDistributionEstimator()
    327 cens.fit(survival_train)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/metrics.py:36, in _check_estimate_1d(estimate, test_time)
     35 def _check_estimate_1d(estimate, test_time):
---> 36     estimate = check_array(estimate, ensure_2d=False, input_name="estimate")
     37     if estimate.ndim != 1:
     38         raise ValueError(
     39             'Expected 1D array, got {:d}D array instead:\narray={}.\n'.format(
     40                 estimate.ndim, estimate))

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input estimate contains NaN.

Some more details:

  1. I convert all df to the float64, and to int64:
features = df_features.copy().astype('float64')

outcomes = pd.DataFrame()
outcomes['event'] = pd.DataFrame(data_y)['Status'].astype('int64')
outcomes['time'] = pd.DataFrame(data_y)['Survival_in_days'].astype('int64')

features_val = df_features_val.copy().astype('float64')
outcomes_val = pd.DataFrame()
outcomes_val['event'] = pd.DataFrame(data_y_val)['Status'].astype('int64')
outcomes_val['time'] = pd.DataFrame(data_y_val)['Survival_in_days'].astype('int64')
  1. Then training the model:
from auton_survival.models.dsm import DeepSurvivalMachines
from sklearn.model_selection import ParameterGrid

param_grid = {'k' : [3, 4, 6],
              'distribution' : ['LogNormal', 'Weibull'],
              'learning_rate' : [ 1e-4, 1e-3],
              'layers' : [ [], [100], [100, 100] ]
             }
params = ParameterGrid(param_grid)

models = []
for param in params:
    model = DeepSurvivalMachines(k = param['k'],
                                 distribution = param['distribution'],
                                 layers = param['layers'])
    
    # The fit method is called to train the model
    model.fit(x, outcomes.time, outcomes.event, iters = 100, learning_rate = param['learning_rate'])
    models.append([[model.compute_nll(x_val, outcomes_val.time, outcomes_val.event), model]])
best_model = min(models)
model = best_model[0][1]
  1. And then it fails on the evaluation step:
cis = []
brs = []

et_train = np.array([(e_train[i], t_train[i]) for i in range(len(e_train))],
                 dtype = [('e', bool), ('t', float)])
et_test = np.array([(e_test[i], t_test[i]) for i in range(len(e_test))],
                 dtype = [('e', bool), ('t', float)])
et_val = np.array([(e_val[i], t_val[i]) for i in range(len(e_val))],
                 dtype = [('e', bool), ('t', float)])
times = np.quantile(outcomes.time[outcomes.event==1], [0.25, 0.5, 0.6]).tolist()
for i, _ in enumerate(times):
    cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])

When I check out_risk[:, I] that was created by the model.predict_risk(x_val, times) its all nans:

[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]

Does that mean that the model did not converge? Any tips are appreciated!

Problems in DeepCoxMixture with possible solutions

When I used DeepCoxMixture in my synthetic data I found some problems:

  1. Patience is too low. It cannot be controlled by the fit method, so I had to manually changed it to 50. I also change the code a little bit in that part, so the patience only take into account the best result:
var add2 = function(number) {
     if valcn > valc:
     patience_ += 1
   else:
     patience_ = 0
     valc = valcn
}
  1. In some execution I get a SIGSEGV error. After some digging I found the error is caused by the UnivariateSpline module. I found the use of this spline causes some undesirable effects, like placing negative values. I have changed this module to the more stable Pchipinterpolator, which is able to preserve the monotony of the curve. I have obtained mode stable results with this approach. I have only changed the function:
def fit_spline(t, surv, s=1e-4):
  # return UnivariateSpline(t, surv, s=s, ext=3, k=1)
  return PchipInterpolator(t, surv)
  1. I also suggest to change the repair_probs function to prevent some infinite values that can appear:
def repair_probs(probs):
  probs[torch.isnan(probs)] = -10
  probs[probs>10] = 10
  probs[probs<-10] = -10
  return probs

Perform some validation of input to models

Some basic validation should be performed on the input, i.e. checking for NaN or proper datatype.

For example, a nan value in the time (event duration) array generates an obscure error (I discovered it because of a bug in my data preprocessing).

To replicate:

x, t, e = datasets.load_dataset('PBC')
model = DeepSurvivalMachines()
t[-1] = np.nan
model.fit(x, t, e)

Generates the following error:

File "test.py", line 17, in <module>
model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 257, in fit
model, _ = train_dsm(model,
File "auton_survival/models/dsm/utilities.py", line 132, in train_dsm
premodel = pretrain_dsm(model,
File "auton_survival/models/dsm/utilities.py", line 73, in pretrain_dsm
loss += unconditional_loss(premodel, t_train, e_train, str(r+1))
File "auton_survival/models/dsm/losses.py", line 121, in unconditional_loss
return _weibull_loss(model, t, e, risk)
File "auton_survival/models/dsm/losses.py", line 113, in _weibull_loss
ll += f[uncens].sum() + s[cens].sum()
IndexError: index 1653 is out of bounds for dimension 0 with size 1653

Run demonstration data error

Hello, I want to apply DSM to the construction of clinical prediction models. When running the demonstration data, an error occurred. You see where the problem lies. Thank you for your reply.

捕获

Need Better Test Coverage

Currently dsm does not have adequate test coverage. We would need more unit tests to improve coverage.

Early stopping?

Hi,

I was wondering if there's an early stopping parameter for dsm that's already implemented, or an ideal work around that you would use now to get it going. I've been digging through the source code and can't seem to find any. I imagine its not too complicated as you're extending pytorch, but I'm also not sure what the best way to make it extend your functions would be.

Loss function and the pretrained model (DSM)

Thank you for this wonderful job.

I have some questions about the loss function.

According to the original paper of DSM (Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks. IEEE Journal of Biomedical & Health Informatics (2021)), the loss function consists of three parts, i.e., uncensored, censoring and prior.

I have found the uncensored loss and censoring loss in the code losses.py, but I haven't seen anything related to the prior loss. In the code

premodel = pretrain_dsm(model,

I noticed that during the training of DSM model, a pretrained model is trained first to fill the shape and scale value of DSM. (I think these two parameters correspond to and ). And this pre training process is not mentioned in the paper.

So my questions are:

  1. Is the prior loss implemented in the code? If so, which part of code is about the prior loss?

  2. Is the pretrained model related to the prior loss? (Because in the generative story of paper, it says

"the set of parameters {\tilde{\beta}k}{k=1}^{K} and {\tilde{\eta}k}{k=1}^{K} are drawn from the prior and ."

  1. (This one is not about the loss) Why does follow ? Could you give me some hints?

Thank you very much.

random_seed is not defined in DeepRecurrentSurvivalMachines class

NameError: name 'random_seed' is not defined

https://github.com/autonlab/auton-survival/blob/4af6ebe2bdb24e8840c50de86c1864b8fa3c18a/auton_survival/models/dsm/__init__.py#L517

Possible fix:

  def __init__(self, k=3, layers=None, hidden=None,
               distribution="Weibull", temp=1000., discount=1.0, typ="LSTM", random_seed=0):
    super(DeepRecurrentSurvivalMachines, self).__init__(k=k,
                                                        layers=layers,
                                                        distribution=distribution,
                                                        temp=temp,
                                                        discount=discount,
                                                        random_seed=random_seed)

Implementation for time-varying survival analysis

Hello,

Thanks for all the good work put into this package, it is definitely a big contribution to the community. I am currently looking into this package to model moving dates in a real state context.

I wanted to know about the time-varying implementation of the RDSM model. Currently working with time series data and time to event, so I wanted to know what are the caviats that we need to keep in mind when implementing the RDSM in comparison to the baseline DSM. Are there any particular steps that we need to be careful of during the pre-processing (apart from the obvious expanded dataset) such as high dimensionality, categorical variables, missing values or any other ?

For anyone wondering about the time-varying implementation, there is an example added:
https://github.com/autonlab/auton-survival/blob/master/examples/RDSM%20on%20PBC%20Dataset.ipynb

and a publication:
http://proceedings.mlr.press/v146/nagpal21a/nagpal21a.pdf

C-index calculation for RDSM

Hi there,

In your impressive work, you compared the performance of the longitudinal model RDSM with several time-independent models. And RDSM achieved the best performance in most cases.

In the example notebook demonstrating the usage of RDSM, I notice that in calculating the C-index, by using something like:

et_train = np.array([(e_train[i][j], t_train[i][j]) for i in range(len(e_train)) for j in range(len(e_train[i]))], dtype = [('e', bool), ('t', float)])

the input essentially treats each time step of a patient as an individual sample. This is rather different from the evaluation of DSM, and the interpretation of the resulting C-index should be different as well.

So I'm wondering how do you evaluate the performances of RDSM and DSM in your paper?
Thank you very much.

ValueError: optimizer got an empty parameter list

param_grid = {'k' : [3, 4, 6],
              'distribution' : ['LogNormal', 'Weibull'],
              'learning_rate' : [1e-4, 1e-3],
              'batch_size': [64, 128],
              'hidden': [50, 100],
              'layers': [3, 2, 1],
              'typ': ['LSTM', 'GRU', 'RNN'],
              'optim': ['Adam', 'SGD'],
             }
params = ParameterGrid(param_grid)

models = []
for param in params:
    model = DeepRecurrentSurvivalMachines(k = param['k'],
                                          distribution = param['distribution'],
                                          hidden = param['hidden'], 
                                          typ = param['typ'],
                                          layers = param['layers'])
    # The fit method is called to train the model
    model.fit(x_train, t_train, e_train, iters = 1, learning_rate=param['learning_rate'], 
             batch_size=param['batch_size'], optimizer=param['optim'])
    models.append([[model.compute_nll(x_valid, t_valid, e_valid), model]])

best_model = min(models)
model = best_model[0][1]

As soon as I ran above script, I got below error. what should i do to solve this problem?

`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
8 # The fit method is called to train the model
9 model.fit(x_train, t_train, e_train, iters = 1, learning_rate=param['learning_rate'],
---> 10 batch_size=param['batch_size'], optimizer=param['optim'])
11 models.append([[model.compute_nll(x_valid, t_valid, e_valid), model]])
12

~/data/nas125/hepa/codes/auton_survival/models/dsm/init.py in fit(self, x, t, e, vsize, val_data, iters, learning_rate, batch_size, elbo, optimizer)
265 elbo=elbo,
266 bs=batch_size,
--> 267 random_seed=self.random_seed)
268
269 self.torch_model = model.eval()

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in train_dsm(model, x_train, t_train, e_train, x_valid, t_valid, e_valid, n_iter, lr, elbo, bs, random_seed)
137 n_iter=10000,
138 lr=1e-2,
--> 139 thres=1e-4)
140
141 for r in range(model.risks):

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in pretrain_dsm(model, t_train, e_train, t_valid, e_valid, n_iter, lr, thres)
61 premodel.double()
62
---> 63 optimizer = get_optimizer(premodel, lr)
64
65 oldcost = float('inf')

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in get_optimizer(model, lr)
43
44 if model.optimizer == 'Adam':
---> 45 return torch.optim.Adam(model.parameters(), lr=lr)
46 elif model.optimizer == 'SGD':
47 return torch.optim.SGD(model.parameters(), lr=lr)

~/anaconda3/envs/ml/lib/python3.7/site-packages/torch/optim/adam.py in init(self, params, lr, betas, eps, weight_decay, amsgrad)
40 defaults = dict(lr=lr, betas=betas, eps=eps,
41 weight_decay=weight_decay, amsgrad=amsgrad)
---> 42 super(Adam, self).init(params, defaults)
43
44 def setstate(self, state):

~/anaconda3/envs/ml/lib/python3.7/site-packages/torch/optim/optimizer.py in init(self, params, defaults)
44 param_groups = list(params)
45 if len(param_groups) == 0:
---> 46 raise ValueError("optimizer got an empty parameter list")
47 if not isinstance(param_groups[0], dict):
48 param_groups = [{'params': param_groups}]

ValueError: optimizer got an empty parameter list`

RuntimeError: expected scalar type Float but found Double

Hi! Thanks for the great package!
Any ideas on why the same dataset works for all models, except the dsm one?

here is the error log:

At hyper-param {'distribution': 'Weibull', 'k': 2, 'layers': [100, 100], 'learning_rate': 1e-05}
At fold: 0
100%|███████████████████████████████████| 10000/10000 [00:05<00:00, 1678.94it/s]
100%|██████████████████████████████████████████| 50/50 [00:00<00:00, 222.40it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [126], line 34
     24 # Instantiate an auton_survival Experiment 
     25 #dsm  cph
     26 #Survival model choices include:
   (...)
     30 # |      - 'rsf' : Random Survival Forests [1] model
     31 # |      - 'cph' : Cox Proportional Hazards [2] model
     32 experiment = SurvivalRegressionCV(model='dsm', num_folds=6, 
     33                                     hyperparam_grid=param_grid)
---> 34 model = experiment.fit(x, outcomes, metric='ibs',horizons=times)
     36 times = np.quantile(outcomes.time[outcomes.event==1], [0.25, 0.5, 0.6]).tolist()
     38 # Fit the `experiment` object with the specified Cox model.
     39 #experiment = estimators.SurvivalModel(model='dsm')
     40 #model = experiment.fit(x, outcomes)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/experiments.py:164, in SurvivalRegressionCV.fit(self, features, outcomes, horizons, metric)
    162 model = SurvivalModel(self.model, random_seed=self.random_seed, **hyper_param)
    163 model.fit(features.loc[self.folds!=fold], outcomes.loc[self.folds!=fold])
--> 164 predictions = model.predict_survival(features.loc[self.folds==fold], times=horizons)
    166 score = survival_regression_metric(metric=self.metric, 
    167                                    outcomes=outcomes.loc[self.folds==fold],
    168                                    predictions=predictions,
    169                                    times=horizons,
    170                                    outcomes_train=outcomes.loc[self.folds!=fold])
    171 fold_scores.append(np.mean(score))

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/estimators.py:701, in SurvivalModel.predict_survival(self, features, times)
    699   return _predict_rsf(self._model, features, times)
    700 elif self.model == 'dsm':
--> 701   return _predict_dsm(self._model, features, times)
    702 elif self.model == 'dcph':
    703   return _predict_dcph(self._model, features, times)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/estimators.py:420, in _predict_dsm(model, features, times)
    400 def _predict_dsm(model, features, times):
    402   """Predict survival at specified time(s) using the Deep Survival Machines.
    403 
    404   Parameters
   (...)
    417 
    418   """
--> 420   survival_predictions = model.predict_survival(x=features.values, t=times)
    421   survival_predictions = pd.DataFrame(survival_predictions, columns=times).T
    423   return __interpolate_missing_times(survival_predictions, times)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/__init__.py:415, in DSMBase.predict_survival(self, x, t, risk)
    413   t = [t]
    414 if self.fitted:
--> 415   scores = losses.predict_cdf(self.torch_model, x, t, risk=str(risk))
    416   return np.exp(np.array(scores)).T
    417 else:

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/losses.py:518, in predict_cdf(model, x, t_horizon, risk)
    516 torch.no_grad()
    517 if model.dist == 'Weibull':
--> 518   return _weibull_cdf(model, x, t_horizon, risk)
    519 if model.dist == 'LogNormal':
    520   return _lognormal_cdf(model, x, t_horizon, risk)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/losses.py:335, in _weibull_cdf(model, x, t_horizon, risk)
    331 def _weibull_cdf(model, x, t_horizon, risk='1'):
    333   squish = nn.LogSoftmax(dim=1)
--> 335   shape, scale, logits = model.forward(x, risk)
    336   logits = squish(logits)
    338   k_ = shape

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/dsm_torch.py:204, in DeepSurvivalMachinesTorch.forward(self, x, risk)
    196 def forward(self, x, risk='1'):
    197   """The forward function that is called when data is passed through DSM.
    198 
    199   Args:
   (...)
    202 
    203   """
--> 204   xrep = self.embedding(x)
    205   dim = x.shape[0]
    206   return(self.act(self.shapeg[risk](xrep))+self.shape[risk].expand(dim, -1),
    207          self.act(self.scaleg[risk](xrep))+self.scale[risk].expand(dim, -1),
    208          self.gate[risk](xrep)/self.temp)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/container.py:139, in Sequential.forward(self, input)
    137 def forward(self, input):
    138     for module in self:
--> 139         input = module(input)
    140     return input

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: expected scalar type Float but found Double

and the code:

# auton_survival cross-validation experiment.
import numpy as np

from auton_survival.datasets import load_dataset
from auton_survival.preprocessing import Preprocessor
from auton_survival.metrics import survival_regression_metric
from auton_survival.experiments import SurvivalRegressionCV

param_grid = {'k' : [2],
              'distribution' : ['Weibull'],
              'learning_rate' : [1e-5],
              'layers' : [[100, 100]]}

# `features`/`outcomes` and `features_val`/`outcomes_val` are assumed to be
# predefined train/validation splits, e.g. drawn from:
# outcomes, features = load_dataset(dataset='SUPPORT')
cat_feats = []
num_feats = list(features.columns)

# Fit the preprocessor on the training features only...
preprocessor = Preprocessor(cat_feat_strat='ignore', num_feat_strat='mean')
x = preprocessor.fit_transform(features, cat_feats=cat_feats, num_feats=num_feats,
                               one_hot=True, fill_value=-1)

# ...and reuse the *fitted* preprocessor for the validation features,
# rather than refitting it on validation data.
x_val = preprocessor.transform(features_val)

# Evaluation horizons: quantiles of the observed (uncensored) event times.
# These must be computed before they are passed to `fit` below.
times = np.quantile(outcomes.time[outcomes.event == 1], [0.25, 0.5, 0.6]).tolist()

# Instantiate an auton_survival experiment.
# Survival model choices include:
#   - 'dsm'  : Deep Survival Machines [3] model
#   - 'dcph' : Deep Cox Proportional Hazards [2] model
#   - 'dcm'  : Deep Cox Mixtures [4] model
#   - 'rsf'  : Random Survival Forests [1] model
#   - 'cph'  : Cox Proportional Hazards [2] model
experiment = SurvivalRegressionCV(model='dsm', num_folds=6,
                                  hyperparam_grid=param_grid)
model = experiment.fit(x, outcomes, metric='ibs', horizons=times)

times_val = np.quantile(outcomes_val.time[outcomes_val.event == 1],
                        [0.25, 0.5, 0.6]).tolist()
out_risk = model.predict_risk(x_val, times)
out_survival = model.predict_survival(x_val, times)

print("Times:", times_val)
print("Brier scores")
print(survival_regression_metric('brs', outcomes_val, out_survival,
                                 times=times_val))

print("Time Dependent Concordance Index")
print(survival_regression_metric('ctd', outcomes_val, out_survival,
                                 times=times_val))
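
The `RuntimeError: expected scalar type Float but found Double` in the traceback above is a dtype mismatch: torch initializes `nn.Linear` weights in float32, while pandas/NumPy pipelines typically produce float64 arrays. A minimal sketch of the usual workaround, assuming `x` and `x_val` are the preprocessed feature frames from the snippet above, is to downcast at the data boundary before fitting:

import numpy as np

# Minimal sketch, assuming `x` and `x_val` come from the Preprocessor above.
# torch.nn.Linear weights default to float32, so float64 features trigger
# "expected scalar type Float but found Double" inside the forward pass;
# downcasting before fitting avoids the mismatch.
x = x.astype(np.float32)
x_val = x_val.astype(np.float32)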

Increasing validation loss in RDSM with time-varying data

Hi auton-survival community @chiragnagpal @Jeanselme @chufangao @salvaRC, I appreciate your contributions to time-varying survival analysis, and thank you for making this library open to the public.

I am having an issue training RDSM on my own custom dataset. I made sure my dataset looks like the one used in the Jupyter notebook tutorials: the T column is the remaining time to event, and E is an event indicator. However, during training I consistently see decreasing training loss but increasing validation loss.

I then tried training on the PBC dataset used in the demo notebooks and observed the same pattern there: decreasing training loss and increasing validation loss.

I haven't made any changes to the methodology. Is this behavior intrinsic to RDSM, or am I doing something wrong? These are the logs from the model:

Figure. Training vs. validation loss curves (train_val_losses).
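
A minimal diagnostic sketch, not a fix: the DSM-family `fit` is assumed here to accept an explicit validation set via `val_data`, so comparing validation NLL across training budgets can show where the divergence sets in. The split names (`x_train`, `t_train`, `e_train`, `x_val`, `t_val`, `e_val`) stand in for the poster's own data, and the standard DSM class is used for brevity.

from auton_survival.models.dsm import DeepSurvivalMachines

# Hedged sketch: retrain with increasing budgets and compare validation NLL
# to locate the point where the validation loss starts to climb.
# `val_data` is assumed to be fit()'s argument for an explicit validation set.
for iters in (10, 25, 50, 100):
    model = DeepSurvivalMachines(k=3, distribution='Weibull', layers=[100])
    model.fit(x_train, t_train, e_train,
              val_data=(x_val, t_val, e_val),
              iters=iters, learning_rate=1e-4)
    print(iters, model.compute_nll(x_val, t_val, e_val))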

Applying DSM on competing risks data

Hello!

I am trying to apply Deep Survival Machines to my dataset, which has 23 competing events, and the results are not as good as I expected.

In the DSM paper the DSM and DeepHit c-index results on SEER data are fairly comparable, but on my data DeepHit's c-index is about 10% better than DSM's.

I based my code on your DSM example notebook, changing only the evaluation part, since the notebook covers a single event.

Is it possible that DSM is unsuitable for data with this many events, or is there a problem with my code?

Here is my code.


import numpy as np

from process_severance import make_data
from auton_survival.preprocessing import Preprocessor
from sklearn.model_selection import train_test_split, KFold, ParameterGrid
from auton_survival.models.dsm import DeepSurvivalMachines
from sksurv.metrics import concordance_index_ipcw, brier_score

data_path = './data/dummy_data.csv'
raw_data, Y, feature_names = make_data(data_path)

cat_feats = ["SEX1"]
num_feats = [f for f in feature_names if f != 'SEX1']

features = Preprocessor().fit_transform(raw_data, cat_feats=cat_feats, num_feats=num_feats)

horizons = [0.25, 0.5, 0.75]
times = np.nanquantile(Y["event_time"], horizons).tolist()

# Use the *preprocessed* features, not the raw inputs.
x, t, e = features.to_numpy(), Y["event_time"].to_numpy(), Y["label"].to_numpy()

kf = KFold(n_splits=5, shuffle=True, random_state=1234)

for fold, (train_index, test_index) in enumerate(kf.split(x)):
    print(f"=== Fold {fold} ===")
    x_train, t_train, e_train = x[train_index], t[train_index], e[train_index]
    x_test, t_test, e_test = x[test_index], t[test_index], e[test_index]

    # Carve a validation split out of the training fold for model selection.
    (x_train, x_val,
     t_train, t_val,
     e_train, e_val) = train_test_split(x_train, t_train, e_train,
                                        test_size=0.20, random_state=1234)

    param_grid = {'k' : [3, 4, 6],
                  'distribution' : ['LogNormal', 'Weibull'],
                  'learning_rate' : [1e-4, 1e-3],
                  'layers' : [[], [100], [100, 100]]}
    params = ParameterGrid(param_grid)

    models = []
    for param in params:
        model = DeepSurvivalMachines(k=param['k'],
                                     distribution=param['distribution'],
                                     layers=param['layers'])
        # Train the model and record its validation negative log-likelihood.
        model.fit(x_train, t_train, e_train, iters=100,
                  learning_rate=param['learning_rate'])
        models.append((model.compute_nll(x_val, t_val, e_val), model, param))

    # Select the model with the lowest validation NLL (not the last one trained).
    best_model = min(models, key=lambda m: m[0])[1]

    out_risk = best_model.predict_risk(x_test, times)
    out_survival = best_model.predict_survival(x_test, times)

    for ev in range(23):
        cis = []
        brs = []
        # Treat event `ev + 1` as the event of interest; everything else
        # counts as censored for the cause-specific evaluation.
        e_train_new = (e_train == ev + 1)
        e_test_new = (e_test == ev + 1)

        et_train = np.array([(e_train_new[i], t_train[i]) for i in range(len(e_train_new))],
                            dtype=[('e', bool), ('t', float)])
        et_test = np.array([(e_test_new[i], t_test[i]) for i in range(len(e_test_new))],
                           dtype=[('e', bool), ('t', float)])

        for i, _ in enumerate(times):
            try:
                cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])
            except Exception:
                cis.append(np.nan)
        try:
            brs.append(brier_score(et_train, et_test, out_survival, times)[1])
        except Exception:
            brs.append([np.nan, np.nan, np.nan])

        for i, horizon in enumerate(horizons):
            print(f"For {horizon} quantile,")
            print("TD Concordance Index:", cis[i])
            print("Brier Score:", brs[0][i])

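One detail worth checking in the snippet above, sketched under the assumption that DSM's `predict_risk` and `predict_survival` expose a `risk` argument for selecting a competing event: with 23 causes, predictions should be requested per cause, so that each cause-specific c-index and Brier score is computed against matching estimates.

# Hedged sketch: request cause-specific predictions for each competing risk.
# Assumes predict_risk/predict_survival take a `risk` argument; `best_model`,
# `x_test`, and `times` are as in the snippet above.
for ev in range(23):
    out_risk_ev = best_model.predict_risk(x_test, times, risk=ev + 1)
    out_survival_ev = best_model.predict_survival(x_test, times, risk=ev + 1)
    # ...evaluate concordance_index_ipcw / brier_score per cause as above...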
