Ensemble batch prediction intervals (EnbPI)

Important Notes

  • The code and related material in this main branch accompany our ICML 2021 oral paper, Conformal Prediction Interval for Dynamic Time-series (Xu et al. 2021a). Please use the code in this repository, since the version downloadable from PMLR is not the most recent. Direct any inquiries to Chen Xu ([email protected]) or Yao Xie ([email protected]). The work is regularly updated to incorporate new feedback and ideas.
  • We have significantly revised and extended the ICML 2021 work, and the extended version is now under revision at the Journal of Machine Learning Research. The most recent version is available on arXiv. The JMLR_code branch contains updated code that follows essentially the same structure as the code in this branch. Feel free to message us with questions about either branch.
  • We are excited that the work is being integrated into MAPIE, a scikit-learn-compatible module for predictive inference.
  • If you find this work interesting or useful, please cite it using either of the entries below. As explained above, the arXiv version is more extensive and recent but is still under revision.
@InProceedings{pmlr-v139-xu21h,
  title = 	 {Conformal prediction interval for dynamic time-series},
  author =       {Xu, Chen and Xie, Yao},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {11559--11569},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR}
}
@misc{xu2021conformal,
      title={Conformal prediction for dynamic time-series}, 
      author={Chen Xu and Yao Xie},
      year={2021},
      eprint={2010.09107},
      archivePrefix={arXiv},
      primaryClass={stat.ME}
}

How to use

  • Required Dependency:
    • Basic modules: numpy, pandas, sklearn, scipy, matplotlib, seaborn.
    • Additional modules: statsmodels for ARIMA, keras for building neural networks and recurrent neural networks, and pyod for the competing anomaly detection methods.
  • General Info and Tests: This repository reproduces all experiments in Conformal Prediction Interval for Dynamic Time-series (Xu et al. 2021a). In particular,
    • tests_paper.ipynb illustrates how to generate the main figures (Figures 1-4) in the paper. Its code is nearly identical to that in tests_paper.py.
    • tests_paper+supp.py reproduces all figures, including the additional ones in the Appendix. It is written in Jupyter notebook style, so it is meant to be executed line by line.
  • EnbPI implementation:
    • PI_class_EnbPI.py implements the class that contains EnbPI (line), Jackknife+-after-bootstrap (line, paper), Split/Inductive Conformal (line, paper), and Weighted Inductive Conformal (line, paper). We used ARIMA as another competing method.
    • Because conditional coverage (Figure 3) and anomaly detection (Figure 4) require problem-specific modifications, the code changes are not contained in "PI_class_EnbPI.py" but in their respective sections within tests_paper.py/tests_paper+supp.py.
  • Other Function Files:
    • utils_EnbPI.py primarily contains plotting functions for all figures except Figure 4.
    • PI_class_ECAD.py implements ECAD based on EnbPI (see [Xu et al. 2021a, Section 8.5, Algorithm 2]) and utils_ECAD.py contains helpers for anomaly detection.
  • Additional Files
    • The Data repository contains all datasets in our paper except the money laundering one used for Figure 4 (omitted due to its size).
    • The Results repository is provided so that plots can be reproduced conveniently, since some experiments with neural networks and recurrent neural networks can take a while to run. It contains the .csv result files for all datasets.
  • Broad Usage: To wrap EnbPI around other regression models and/or use on other data, one should:
    • For other regression models: Make sure the model has the methods .fit(X_train, Y_train) to train a predictor and .predict(X_predict) to make predictions on new data. Most models in sklearn, and deep learning models built with keras or pytorch, can do so.
    • For other data: We have assumed that all datasets are saved as pandas.DataFrame and are convertible to numpy.array. These assumptions are purely computational, so feel free to adjust the data format as long as it can be processed by the regression model of your choice.
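
The interface requirement above can be illustrated with a small sketch. The class LeastSquaresRegressor, the helper has_fit_predict, and the synthetic data are hypothetical names introduced here for illustration only; they are not part of this repository. The sketch shows a custom model that exposes .fit(X_train, Y_train) and .predict(X_predict) on NumPy arrays, which is the only contract the text above asks for.

```python
import numpy as np


class LeastSquaresRegressor:
    """Hypothetical model exposing the .fit / .predict interface (illustration only)."""

    def __init__(self):
        self.coef_ = None

    def fit(self, X_train, Y_train):
        # Append a column of ones so the model also learns an intercept.
        X = np.column_stack([np.asarray(X_train, dtype=float), np.ones(len(X_train))])
        self.coef_, *_ = np.linalg.lstsq(X, np.asarray(Y_train, dtype=float), rcond=None)
        return self

    def predict(self, X_predict):
        X = np.column_stack([np.asarray(X_predict, dtype=float), np.ones(len(X_predict))])
        return X @ self.coef_


def has_fit_predict(model):
    """Return True if the model offers callable .fit and .predict methods."""
    return all(callable(getattr(model, name, None)) for name in ("fit", "predict"))


# Synthetic, noiseless data for illustration: y = 2*x0 - x1 + 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = 2.0 * X[:, 0] - X[:, 1] + 0.5

model = LeastSquaresRegressor()
assert has_fit_predict(model)
predictions = model.fit(X[:40], Y[:40]).predict(X[40:])
print(predictions.shape)
```

If the data start out as a pandas.DataFrame, calling its to_numpy() method on the feature and target columns yields arrays that a model like this can consume directly.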

Poster Talk and Slides

  • The poster for our work is available via this link.
  • We were fortunate to pre-record a long presentation and to be selected for an oral presentation at the 38th International Conference on Machine Learning (ICML 2021). The long presentation is available on SlidesLive, and the oral presentation will be given at the conference once the date is finalized.
  • The slides for the talk are available here.

Extension Works and Ideas

  • Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data (Xu et al. 2021c) is our recent applied work that adopts EnbPI for detecting anomalous traffic flows. Our method significantly outperforms competing methods (Table 1). It has been accepted by the Distribution-free Uncertainty Quantification Workshop at ICML 2021 (DFUQ 2021), and the poster is available here.
  • We are also actively exploring ways to further improve EnbPI for classification, so that the resulting prediction sets work well for correlated categorical observations.

FAQ

  1. Encountering "NotImplementedError: Cannot convert a symbolic Tensor (lstm_2/strided_slice:0) to a numpy array" when using an RNN as the regression model:
  • See this GitHub answer to resolve the problem, which is primarily due to NumPy and Python version issues.
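
Since the error above is attributed to version issues, it can help to record the installed versions before applying a fix. The following is a minimal sketch using only the standard library; the helper name installed_versions and the default package list are assumptions made here for illustration, not part of this repository.

```python
import sys
from importlib import metadata


def installed_versions(packages=("numpy", "tensorflow", "keras")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this report with the versions discussed in the linked answer should indicate whether a downgrade or upgrade is needed.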

References

  • Xu, Chen and Yao Xie (2021a). Conformal prediction interval for dynamic time-series. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11559-11569.
  • Xu, Chen and Yao Xie (2021b). Conformal prediction for dynamic time-series. Under review by the Journal of Machine Learning Research. arXiv: 2010.09107 [stat.ME].
  • Xu, Chen and Yao Xie (2021c). Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data. arXiv: 2105.11886 [stat.AP].
