
set_functions_for_time_series's Introduction

SeFT - Set functions for Time Series

This is the main source code for the submission Set Functions for Time Series. It depends on two further packages: keras-transformer (a fork with support for sequences of different lengths) and medical-ts-datasets (containing the implementations of the datasets used).

Citing our work

@InProceedings{pmlr-v119-horn20a,
  title = {Set Functions for Time Series},
  author = {Horn, Max and Moor, Michael and Bock, Christian and Rieck, Bastian and Borgwardt, Karsten},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {4353--4363},
  year = {2020},
  editor = {Hal Daumé III and Aarti Singh},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
}

Installation

The project requires Python 3.7 (newer versions of Python are unfortunately incompatible with tensorflow 1.15.x) and uses the poetry packaging utility. The easiest way to get started is to create a virtual Python environment and install the package inside it.

git clone https://github.com/BorgwardtLab/Set_Functions_for_Time_Series.git
cd Set_Functions_for_Time_Series
# Install poetry
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
# Install the `SeFT` package with its dependencies
poetry install

Quickstart

One can quickly test the models on one of the publicly available datasets. If a dataset has not yet been downloaded, it will be downloaded automatically and stored in the directory ~/tensorflow_datasets.
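
If you want to trigger the download yourself before launching a training run, a minimal sketch along the following lines should work; it assumes that importing medical_ts_datasets registers the dataset names used by the --dataset flag with tensorflow_datasets (the tfds name may differ from the CLI flag, adjust accordingly):

import tensorflow_datasets as tfds
import medical_ts_datasets  # noqa: F401  (registers the medical datasets with tfds)

# The first call downloads and prepares the data under ~/tensorflow_datasets;
# later calls reuse the cached copy.
tfds.load('physionet2019')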

Example usage (in SeFT subdirectory):

$ poetry run seft_fit_model --dataset physionet2019 --balance --log_dir test_transformer TransformerModel 

Recreate run using following command:
seft_fit_model --random_seed 982927477 --dataset physionet2019 --balance \
  --max_epochs 300 --early_stopping 30 --log_dir test_transformer \
  TransformerModel --learning_rate 0.001 --batch_size 64 --warmup_steps 1000 \
  --n_dims 128 --n_heads 4 --n_layers 1 --dropout 0.0 --attn_dropout 0.0 \
  --aggregation_fn mean --max_timescale 100.0
Train on 176 steps, validate on 101 steps
Epoch 1/300
  5/176 [..............................] - ETA: 8:18 - loss: 0.1676 - acc: 0.1056 

Usage

$ poetry run seft_fit_model --help
usage: seft_fit_model [-h] [--max_epochs MAX_EPOCHS]
                      [--random_seed RANDOM_SEED] [--debug]
                      [--log_dir LOG_DIR] [--early_stopping EARLY_STOPPING]
                      [--dataset {physionet2012,physionet2019,mimic3_mortality,mimic3_phenotyping}]
                      [--balance] [--hypersearch]
                      {config,GRUSimpleModel,PhasedLSTMModel,InterpolationPredictionModel,GRUDModel,TransformerModel,DeepSetAttentionModel}
                      ...

positional arguments:
  {config,GRUSimpleModel,PhasedLSTMModel,InterpolationPredictionModel,GRUDModel,TransformerModel,DeepSetAttentionModel}

optional arguments:
  -h, --help            show this help message and exit
  --max_epochs MAX_EPOCHS
  --random_seed RANDOM_SEED
  --debug
  --log_dir LOG_DIR     Where to log results. If ends on backslash assume we
                        need to create a directory
  --early_stopping EARLY_STOPPING
  --dataset {physionet2012,physionet2019,mimic3_mortality,mimic3_phenotyping}
  --balance        Balance the dataset
  --hypersearch    Sample hyperparameters

For each model, all hyperparameters can be set via the command line:

poetry run seft_fit_model GRUDModel --help
usage: seft_fit_model GRUDModel [-h] [--learning_rate LEARNING_RATE]
                                [--batch_size BATCH_SIZE]
                                [--warmup_steps WARMUP_STEPS]
                                [--n_units N_UNITS] [--dropout DROPOUT]
                                [--recurrent_dropout RECURRENT_DROPOUT]

optional arguments:
  -h, --help            show this help message and exit
  --learning_rate LEARNING_RATE
  --batch_size BATCH_SIZE
  --warmup_steps WARMUP_STEPS
  --n_units N_UNITS
  --dropout DROPOUT
  --recurrent_dropout RECURRENT_DROPOUT

Available datasets

While the physionet2012 and physionet2019 datasets are publicly available and thus can be automatically downloaded, this is not the case for the mimic3_mortality dataset. Here, after fulfilling the access requirements, the data needs to be downloaded manually and provided in the form of a compressed file. We are planning to make this file available in the MIMIC-III preprocessed data repository.
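
How exactly the compressed file is picked up is defined in medical-ts-datasets; as a rough, generic sketch (not the documented interface of this repository), tensorflow_datasets usually expects manually obtained archives in a manual_dir that can be passed via a DownloadConfig. The dataset name resolution and the path below are assumptions:

import tensorflow_datasets as tfds
import medical_ts_datasets  # noqa: F401  (registers the medical datasets with tfds)

# Hypothetical location of the manually downloaded, compressed MIMIC-III file.
download_config = tfds.download.DownloadConfig(manual_dir='/path/to/manual_data')

tfds.load(
    'mimic3_mortality',
    download_and_prepare_kwargs={'download_config': download_config},
)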

Adding new datasets

The datasets used in this code are implemented in a separate package, medical_ts_datasets. In total, the following steps are required:

  1. Implement the dataset in a fork of medical_ts_datasets, or create your own package with the implementation (a minimal builder skeleton is sketched after this list). The easiest way is probably to adapt one of the existing readers to fit your format (see the medical_ts_datasets package). For further information, I recommend consulting the tfds documentation. In the end, the following code should run:
    import tensorflow_datasets as tfds
    import medical_ts_datasets      # this registers your dataset, or any other dataset, with tensorflow datasets
    import my_package_with_dataset  # alternatively, if you implement your datasets in a separate package
    tfds.load('<your_dataset_name>')
  2. Add an entry to the dictionary here defining which type of endpoint your dataset provides.
  3. (Optional, if you decide to implement the dataset in a separate package) add import statements to your package here.
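
To give step 1 some concrete shape, here is a minimal, generic tfds builder skeleton. It is not taken from medical_ts_datasets; the class name, feature layout (10 measurement channels), and file path are placeholders to adapt to your own data:

import tensorflow as tf
import tensorflow_datasets as tfds


class MyMedicalDataset(tfds.core.GeneratorBasedBuilder):
    """Hypothetical irregularly sampled time series dataset (placeholder)."""

    VERSION = tfds.core.Version('1.0.0')

    def _info(self):
        # The feature layout below is a placeholder; adapt it to your data.
        return tfds.core.DatasetInfo(
            builder=self,
            features=tfds.features.FeaturesDict({
                'time': tfds.features.Sequence(tf.float32),
                'values': tfds.features.Sequence(
                    tfds.features.Tensor(shape=(10,), dtype=tf.float32)),
                'label': tfds.features.ClassLabel(num_classes=2),
            }),
        )

    def _split_generators(self, dl_manager):
        # Point gen_kwargs at your own files; the path here is a placeholder.
        return [
            tfds.core.SplitGenerator(
                name=tfds.Split.TRAIN,
                gen_kwargs={'csv_path': '/path/to/train.csv'},
            ),
        ]

    def _generate_examples(self, csv_path):
        # Replace this with parsing of your own files; a single dummy example
        # is yielded here so the skeleton stays self-contained.
        yield 'example-0', {
            'time': [0.0, 1.5, 3.0],
            'values': [[0.0] * 10, [1.0] * 10, [2.0] * 10],
            'label': 0,
        }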

Then you should be able to run all models in this codebase on your data. On a final note, as I assume you are working with medical data (which is usually not publicly accessible on the internet), this section of the tfds documentation might come in handy.

Available models

The following models are supported: GRUSimpleModel, PhasedLSTMModel, InterpolationPredictionModel, GRUDModel, TransformerModel, DeepSetAttentionModel

It was brought to my attention that the GRU-D implementation is not entirely in line with the original paper. For further information, please refer to this issue.

Copyright

This collective work is copyright (c) 2020 by Max Horn. Individual portions may be copyright by individual contributors, and are included in this collective work with permission of the copyright owners.


set_functions_for_time_series's Issues

Tensorflow_datasets version

Hi,

If I install SeFT using poetry, the environment contains tensorflow 1.15 and tensorflow_datasets 4.2; thus, when I try to generate my own dataset using the generated environment, I run into a dependency error (as tfds 4.2 requires tf 2.1).

Maybe the solution is to add tfds == 2.0 explicitly to the SeFT pyproject.toml file?

Thank you and best regards.

Thank you for your work! A few questions

Hi there,

Thank you for both the medical_ts_datasets repo and this repo. Very reproducible work, which is super exciting for healthcare research.

I was wondering if you can help me reconcile your work and this paper: https://arxiv.org/abs/2101.10318

Both have a MIMIC-III table using AUROC as the point of comparison. However, the numbers and ordering of methods are quite different. I understand that this work followed the mimic3-benchmarks train-test-val splits and IP-Net/mTAND followed a different data split. However, do you suppose that would contribute to such a difference, or could it also be some data-processing/normalization step that differs between the two works?

Mainly, I'm also interested in the authors' take on the current research progress for irregular time-series classification. There have been many papers demonstrating superior performance over GRU-D, yet in this work GRU-D is quite strong, if not the best, for MIMIC-III, in addition to being very strong for PhysioNet 2012 and 2019. (I also followed up on the issue about different masking for x_t and m_t, and setting them to be the same doesn't degrade performance that much, which implies that GRU-D is still strong.) Is it still SOTA?

Running on MIMIC-III Dataset

Can you please elaborate on how to run the models on the mimic3 dataset? I have access to the dataset, but I am not sure in which directory I should put the data.

Issues when installing the repository

Hi, I followed the instructions, and when I reach the point of running the model I get the following error:

File "<string>", line 1, in <module> File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1006, in _gcd_import File "<frozen importlib._bootstrap>", line 983, in _find_and_load File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 677, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 728, in exec_module File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "/opt/seft-hypoxia/seft/cli/fit_model.py", line 13, in <module> from seft.training_routine import TrainingLoop File "/opt/seft-hypoxia/seft/training_routine.py", line 12, in <module> from .callbacks import ( File "/opt/seft-hypoxia/seft/callbacks.py", line 8, in <module> import tensorflow_datasets as tfds File "/opt/virtualenvs/seft-NyYu6qKP-py3.7/lib/python3.7/site-packages/tensorflow_datasets/__init__.py", line 51, in <module> from tensorflow_datasets import __init__py3 as api File "/opt/virtualenvs/seft-NyYu6qKP-py3.7/lib/python3.7/site-packages/tensorflow_datasets/__init__py3.py", line 43, in <module> from tensorflow_datasets.core import tf_compat File "/opt/virtualenvs/seft-NyYu6qKP-py3.7/lib/python3.7/site-packages/tensorflow_datasets/core/__init__.py", line 21, in <module> tf_compat.ensure_tf_install() File "/opt/virtualenvs/seft-NyYu6qKP-py3.7/lib/python3.7/site-packages/tensorflow_datasets/core/tf_compat.py", line 63, in ensure_tf_install "This version of TensorFlow Datasets requires TensorFlow " ImportError: This version of TensorFlow Datasets requires TensorFlow version >= 2.1.0; Detected an installation of version 1.15.3. Please upgrade TensorFlow to proceed.
My environment is a Singularity container based on an Ubuntu Docker image with Python 3.7.4.

I installed poetry using pip3 install, as it was the only way I found to make poetry persistent in the container.

Installation problem with keras-transformer

Hi!
I am trying to run the quickstart following your installation guide, but I am having a problem installing keras-transformer with poetry. It looks like there is some version problem. Any help to solve this would be much appreciated!

Thanks for the amazing work!

Installing keras-transformer (0.1.dev27+g347cb21 /workspace/inkyung/Seminar/Set_Functions_for_Time_Series/repos/keras-transformer): Failed

EnvCommandError

Command ['/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/bin/pip', 'install', '--no-deps', '-U', '/workspace/inkyung/Seminar/Set_Functions_for_Time_Series/repos/keras-transformer'] errored with the following return code 1, and output:
Processing ./repos/keras-transformer
    ERROR: Command errored out with exit status 1:
     command: /root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-h6aratez/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-h6aratez/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-dk1gxjae
         cwd: /tmp/pip-req-build-h6aratez/
    Complete output (28 lines):
    WARNING: The wheel package is not available.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-h6aratez/setup.py", line 35, in <module>
        include_package_data=True,
      File "/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/lib/python3.7/site-packages/setuptools/__init__.py", line 163, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.7/distutils/core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/lib/python3.7/site-packages/setuptools/dist.py", line 430, in __init__
        k: v for k, v in attrs.items()
      File "/usr/lib/python3.7/distutils/dist.py", line 292, in __init__
        self.finalize_options()
      File "/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/lib/python3.7/site-packages/setuptools/dist.py", line 721, in finalize_options
        ep(self)
      File "/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/lib/python3.7/site-packages/setuptools/dist.py", line 728, in _finalize_setup_keywords
        ep.load()(self, ep.name, value)
      File "/tmp/pip-req-build-h6aratez/.eggs/setuptools_scm-5.0.1-py3.7.egg/setuptools_scm/integration.py", line 26, in version_keyword
        dist.metadata.version = _get_version(config)
      File "/tmp/pip-req-build-h6aratez/.eggs/setuptools_scm-5.0.1-py3.7.egg/setuptools_scm/__init__.py", line 173, in _get_version
        parsed_version = _do_parse(config)
      File "/tmp/pip-req-build-h6aratez/.eggs/setuptools_scm-5.0.1-py3.7.egg/setuptools_scm/__init__.py", line 142, in _do_parse
        "use git+https://github.com/user/proj.git#egg=proj" % config.absolute_root
    LookupError: setuptools-scm was unable to detect version for '/tmp/pip-req-build-h6aratez'.

    Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

    For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.2.2; however, version 20.3.3 is available.
You should consider upgrading via the '/root/.cache/pypoetry/virtualenvs/seft-ih-2Jpha-py3.7/bin/python -m pip install --upgrade pip' command.

at ~/.poetry/lib/poetry/utils/env.py:1074 in run
    1070│     output = subprocess.check_output(
    1071│         cmd, stderr=subprocess.STDOUT, **kwargs
    1072│     )
    1073│ except CalledProcessError as e:
  → 1074│     raise EnvCommandError(e, input=input)
    1075│
    1076│ return decode(output)
    1077│
    1078│ def execute(self, bin, *args, **kwargs):

Issue with running this project on M3 Macbook Pro

When trying to run the following command:

$ poetry run seft_fit_model --dataset physionet2019 --balance --log_dir test_transformer TransformerModel

I get the following error on my M3 Macbook Pro:

zsh: illegal hardware instruction poetry run seft_fit_model --dataset physionet2019 --balance --log_dir

Issue on Masking Transformer Model

Dear BorgwardtLab,

I ran the Transformer model with the code in this repo. I assumed that this code is the Transformer model without masking. However, I am curious about how to run this model with masking. I would appreciate your help! Thank you!

Question about GRU-D implementation

Hi there, I have a question about calculating dp_mask for x_t and m_dp_mask for m_t in your GRU-D implementation (file gru_d.py).

First, the dp_mask is generated from the GRUCell built-in function get_dropout_mask_for_cell: code
Then, the dropout mask m_dp_mask for the masking vector m_t is generated by calling _generate_dropout_mask: code
By doing so, dp_mask and m_dp_mask zero out different elements of the two inputs x_t and m_t. I can reproduce your result; however, I think that the dropout masks should be the same for x_t and m_t. Can you please clarify this for me? Did I misunderstand something in the core TensorFlow implementation or your implementation?
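
For illustration only (this is not the implementation in gru_d.py, which relies on Keras' internal dropout-mask helpers), one generic way to make x_t and m_t share a dropout pattern is to sample a single keep mask and apply it to both tensors:

import tensorflow as tf

def shared_dropout(x_t, m_t, rate, training):
    # Illustrative sketch only: sample one keep mask with inverted-dropout
    # scaling and reuse it for both the input and its masking vector.
    if not training or rate == 0.0:
        return x_t, m_t
    keep = tf.cast(tf.random.uniform(tf.shape(x_t)) >= rate, x_t.dtype) / (1.0 - rate)
    return x_t * keep, m_t * keep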

Thanks for the great work!

Running on different data sets

Hello, I am interested in running your algorithm and your code on my data (irregularly sampled time series). Do you have documentation or advice on how I can do that?

Thank you and best regards.

Inquiry

Hi Max, in the "SF for TS" paper you called your novel model "SeFT-ATTN". I was expecting to see the exact same name in this repo to see how you implemented it, but I can't find it anywhere. Does the "deep_set_attention.py" file contain the SeFT-ATTN implementation?
