ahmedmalaa / autoprognosis

Codebase for "AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization", ICML 2018.

Python 86.28% Jupyter Notebook 13.38% R 0.34%

automl automated-machine-learning bayesian-optimization

autoprognosis's Introduction

AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization

AutoPrognosis is a system for automating the design of ensembles of predictive modeling pipelines tailored to clinical prognosis applications. Each pipeline combines three kinds of algorithms:

Imputation and data processing algorithms.
Feature processing algorithms.
Classification algorithms.

The system operates using a Bayesian optimization algorithm that relies on structured kernel learning to solve the high-dimensional pipeline optimization problem. Technical details can be found in our ICML paper. An explanation of our algorithm can also be found in this video presentation.
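To give a feel for why the pipeline search space grows quickly, here is a minimal sketch that enumerates every combination of one algorithm per stage. The stage names mirror the three stages listed above; the candidate algorithm names are hypothetical placeholders (only missForest appears elsewhere on this page), and exhaustive enumeration is a stand-in for illustration, not the Bayesian optimization procedure that AutoPrognosis uses.

```python
from itertools import product

# Hypothetical candidate algorithms per pipeline stage (illustrative names only,
# not the list of algorithms shipped with AutoPrognosis).
STAGES = {
    "imputation": ["mean", "missForest"],
    "feature_processing": ["none", "PCA"],
    "classification": ["logistic_regression", "random_forest"],
}


def enumerate_pipelines(stages):
    """Return every pipeline as a dict mapping stage name to the chosen algorithm."""
    names = list(stages)
    return [dict(zip(names, combo)) for combo in product(*(stages[n] for n in names))]


pipelines = enumerate_pipelines(STAGES)
print(len(pipelines))  # 8 = 2 * 2 * 2
print(pipelines[0])
```

With more candidates per stage, plus hyperparameters for each algorithm, the number of configurations multiplies, which is why a guided search is used instead of enumeration.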

Installation

Please refer to /doc/install.md for installation instructions.

Usage

You can use AutoPrognosis through its command line interface as follows:

$ python3 autoprognosis.py -i <data.csv> --target <response variable> -o <outdir>  [ -n <num_sample> --it <num_iterations> ]

Once the above command has finished, the results can be found in two JSON files in the output directory: result.json and report.json. They can be displayed with:

$ python3 autoprognosis_report.py -i <outdir>
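If you would rather inspect the two JSON files from Python, a minimal sketch follows. It assumes only that result.json and report.json are valid JSON files inside the output directory; their schema is not documented here, so the helper (a hypothetical name, not part of AutoPrognosis) simply loads whatever is present.

```python
import json
from pathlib import Path


def load_results(outdir):
    """Load result.json and report.json from outdir, skipping files that are missing."""
    loaded = {}
    for name in ("result.json", "report.json"):
        path = Path(outdir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                loaded[name] = json.load(handle)
    return loaded
```

Calling load_results with the same directory that was passed to -o returns a dict keyed by file name, which can then be explored interactively.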

A tutorial on how to use AutoPrognosis API can also be found in this Jupyter notebook.

Known issues

Acquisition function LCB generates excessive warnings

$ The set cost function is ignored! LCB acquisition does not make sense with cost.

This issue results from interfacing with GPyOpt's acquisition functions. The issue can be ignored.
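If the repeated message clutters your output, one option is to filter it out. The sketch below assumes the message is written to Python's sys.stdout with print; if it is instead emitted through the warnings or logging modules, a warnings filter or a logging filter would be needed. The names LineFilter and suppress_lines are hypothetical helpers, not part of AutoPrognosis or GPyOpt.

```python
import contextlib
import sys

NOISY = "The set cost function is ignored!"


class LineFilter:
    """Forward complete lines to `target` unless they contain `needle`."""

    def __init__(self, target, needle):
        self._target = target
        self._needle = needle
        self._pending = ""

    def write(self, text):
        self._pending += text
        # Forward every complete line; keep a trailing partial line buffered.
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            if self._needle not in line:
                self._target.write(line + "\n")
        return len(text)

    def flush(self):
        self._target.flush()

    def finish(self):
        """Emit any trailing partial line (unless it matches), then flush."""
        if self._pending and self._needle not in self._pending:
            self._target.write(self._pending)
        self._pending = ""
        self.flush()


@contextlib.contextmanager
def suppress_lines(needle, target=None):
    """Within the block, drop stdout lines that contain `needle`."""
    target = sys.stdout if target is None else target
    filt = LineFilter(target, needle)
    with contextlib.redirect_stdout(filt):
        try:
            yield
        finally:
            filt.finish()


with suppress_lines(NOISY):
    print("The set cost function is ignored! LCB acquisition does not make sense with cost.")
    print("iteration 1 done")  # only this line is shown
```

Wrapping the call that runs the optimization in suppress_lines(NOISY) keeps all other output intact. Output written directly by native code, bypassing sys.stdout, is not affected.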

Citation

If you use our code in your research, please cite:

@inproceedings{alaa2018autoprognosis,
  title={AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning},
  author={Alaa, Ahmed and van der Schaar, Mihaela},
  booktitle={International Conference on Machine Learning},
  pages={139--148},
  year={2018}
}

References

[1] A. M. Alaa and M. van der Schaar, AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning, ICML 2018.

[2] A. M. Alaa and M. van der Schaar, Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning, Nature Scientific Reports, 2018.

[3] A. M. Alaa and M. van der Schaar, Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants, PLOS ONE, 2019.

autoprognosis's Issues

Error with Installation of AutoPrognosis

Hi, I'm trying to install AutoPrognosis on my CentOS 7 system. I have installed all the dependencies listed in requirements.txt and have followed the installation instructions. However, I get import errors for initpath_ap and utilmlab: it seems neither module can be found. Any thoughts? Many thanks in advance. Cheers!

Spam base data set

Not able to find the spambase data

R runtime error: dim(X) must have a positive length

Hi Ahmed,

I have managed to install and run AutoPrognosis on the sample data that you used for the tutorial, but it gives me an error on my own dataset. The error is shown below. Do you have any suggestions?

I would also like to know if your UCLA email address is still valid. I sent you an email to that address around a month ago. Have you seen it?

Best,
Mojgan

RRuntimeError Traceback (most recent call last)
in
6 acquisition_type=acquisition_type)
7
----> 8 AP_mdl.fit(X_, Y_)

~/.../AutoPrognosis-master/alg/autoprognosis/model.py in fit(self, X, Y)
461 rval_model = self.get_model(self.domains_[u],u,self.compons_,x_next[u])
462
--> 463 y_next_, modb_, eva_prp = self.evaluate_CV_objective(X.copy(), Y.copy(), rval_model)
464
465 eva_prp['iter'] = current_iter

~/.../AutoPrognosis-master/alg/autoprognosis/model.py in evaluate_CV_objective(self, X_in, Y_in, modraw_)
343 #mod_back.fit(X_in, Y_in)
344
--> 345 rval_eva = evaluate_clf(X_in.copy(), Y_in.copy(), copy.deepcopy(modraw_), n_folds = self.CV)
346 logger.info('CV_objective:{}'.format(rval_eva))
347 f = -1*rval_eva[0][0]

~/.../AutoPrognosis-master/alg/autoprognosis/model.py in evaluate_clf(X, Y, model_input, n_folds, visualize)
936 if is_pred_proba:
937 logger.info('+fit {} {}'.format(X_train.shape,list(set(np.ravel(Y_train)))))
--> 938 model.fit(X_train, Y_train)
939 preds = model.predict(X_test)
940 nnan = sum(np.ravel(np.isnan(preds)))

~/.../AutoPrognosis-master/alg/autoprognosis/pipelines/basePipeline.py in fit(self, X, Y, **kwargs)
108 if hasattr(self.model_list[u], 'fit_transform'): # This should be just a transform
109
--> 110 X_temp = np.array(self.model_list[u].fit_transform(X_temp)).copy()
111
112 else:

~/.../AutoPrognosis-master/alg/autoprognosis/models/imputers.py in fit_transform(self, X)
294 def fit_transform(self, X):
295
--> 296 return self.model.fit(X)
297
298 def get_hyperparameter_space(self):

~/.../AutoPrognosis-master/alg/autoprognosis/models/imputers.py in fit(self, X)
240 self.init_r_sytem()
241
--> 242 r(r_command)
243 X = r.X
244

~/.../lib/python3.7/site-packages/rpy2/robjects/__init__.py in __call__(self, string)
387 def __call__(self, string):
388 p = _rparse(text=StrSexpVector((string,)))
--> 389 res = self.eval(p)
390 return conversion.rpy2py(res)
391

~/.../lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
190 kwargs[r_k] = v
191 return (super(SignatureTranslatedFunction, self)
--> 192 .__call__(*args, **kwargs))
193
194

~/.../lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
119 else:
120 new_kwargs[k] = conversion.py2rpy(v)
--> 121 res = super(Function, self).__call__(*new_args, **new_kwargs)
122 res = conversion.rpy2py(res)
123 return res

~/.../lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py in _(*args, **kwargs)
26 def _cdata_res_to_rinterface(function):
27 def _(*args, **kwargs):
---> 28 cdata = function(*args, **kwargs)
29 # TODO: test cdata is of the expected CType
30 return _cdata_to_rinterface(cdata)

~/.../lib/python3.7/site-packages/rpy2/rinterface.py in __call__(self, *args, **kwargs)
783 error_occured))
784 if error_occured[0]:
--> 785 raise embedded.RRuntimeError(_rinterface._geterrmessage())
786 return res
787

RRuntimeError: Error in apply(is.na(xmis), 2, sum) : dim(X) must have a positive length
Calls: -> -> missForest -> apply
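
No resolution is recorded in this thread. For context, R's apply raises "dim(X) must have a positive length" when its argument has no dim attribute, that is, when it receives a plain vector rather than a matrix. One possible cause, which is an assumption and not confirmed here, is that the feature data reaching the missForest imputer is one-dimensional (for example a single column). A quick check before calling fit could look like the sketch below; as_2d_float_array is a hypothetical helper, not part of AutoPrognosis.

```python
import numpy as np


def as_2d_float_array(X):
    """Return X as a 2-D float array, turning a 1-D input into a single column."""
    arr = np.asarray(X, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    if arr.ndim != 2 or arr.size == 0:
        raise ValueError(
            "expected a non-empty 2-D feature matrix, got shape {}".format(arr.shape)
        )
    return arr


print(as_2d_float_array([1.0, 2.0, 3.0]).shape)  # (3, 1)
```

If the shape printed for your own data is already two-dimensional, the cause lies elsewhere and this check will not help.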

calibration algorithms

Hi Ahmed,

in your 2019 paper "Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants", you mention "ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms)".

I cannot find where in your code you chose the calibration algorithms. Can you help me with this?

I have also asked a few questions at the end of my previous issue, but I did not reopen it. I am not sure whether you have seen them.

Best regards,
Mojgan

ModuleNotFoundError: No module named 'initpath_ap'

Hello ahmedmalaa,

Thank you for putting this together.

I saw this same issue posted on July 3, but I do not see the resolution posted: #1

After installing on Windows and following the 'AutoPrognosis API Tutorial' I receive the error: ModuleNotFoundError: No module named 'initpath_ap'

Do you have any suggestions?
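
No resolution is recorded in this thread. In general, a ModuleNotFoundError means that the directory containing the module is not on Python's import path. The sketch below is a generic workaround under the assumption, not confirmed here, that a file named initpath_ap.py exists somewhere inside the cloned repository; add_module_dir and the repository path are hypothetical.

```python
import sys
from pathlib import Path


def add_module_dir(repo_root, module_name):
    """Find <module_name>.py under repo_root and put its directory first on sys.path."""
    matches = sorted(Path(repo_root).rglob(module_name + ".py"))
    if not matches:
        raise FileNotFoundError(
            "{}.py not found under {}".format(module_name, repo_root)
        )
    module_dir = str(matches[0].parent.resolve())
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    return module_dir
```

Running add_module_dir with the path to your local clone and the name 'initpath_ap' in the first notebook cell, before the import statements, makes the module importable if the file is found. If the helper raises FileNotFoundError, the file is missing from the checkout and the problem is with the installation itself.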

'Interpreter' module

Dear Sir or Madam,

Could you tell me how to use the 'Interpreter' module mentioned in the paper "AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning"?

Regards,
Beato

Installation and use of Autoprognosis

Dear ahmedmalaa,

Thanks for sharing your code.
I have two questions about getting started:

As to installation, is it necessary to install the tool under Linux with Anaconda, or can I install and use it on Windows with Anaconda?
When I open the 'AutoPrognosis API Tutorial' (AutoPrognosis/alg/autoprognosis/tutorial_autoprognosis_api.ipynb), it stops at 'import initpath_ap' and 'import utilmlab'. The first error is "ModuleNotFoundError: No module named 'initpath_ap'".
Could you give some suggestions?