constellation

Overview

Constellation contains a set of functions for applying multidimensional, time window based logic to time series data frames of arbitrary length. Constellation was developed to enable rapid and flexible identification of series of events that occur in hospitalized patients. The functions have been abstracted for general purpose use with time series data. Constellation extends and provides a friendly API to rolling joins and overlap joins implemented in data.table. Three datasets (labs, vitals, and orders) with randomly synthesized time series data for a cohort of 100 patients are included to facilitate testing of functions.

There are five functions included in constellation to build complex features from time series data:

  • value_change(): identifies increases or decreases in a value within a given time window
  • constellate(): identifies time stamps when a series of events occurs within a given time window
  • constellate_criteria(): identifies which events occur within a given time window for every measurement time stamp
  • bundle(): identifies which events occur within a given time window of a given event
  • incidents(): identifies distinct, incident episodes that must be separated in time by at least a given time window

The constellate_criteria() and bundle() functions are similar, but bundle() is anchored to a specific event table. The first data frame passed to bundle() serves as the anchor, and the function searches the subsequent data frames for events that occur within a given time window of each anchor event. The order of the data frames therefore matters: passing a different data frame as the first argument generates different results. By contrast, constellate_criteria() identifies events that occur within a given time window of any event data frame supplied to it, so the order of the data frames does not matter and passing them in a different order generates equivalent results.
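The difference between an anchored and an unanchored search can be sketched in base R without the package. This is only an illustration of the idea, not constellation's implementation: the helper names anchored_search() and pooled_search(), the toy event times (hours since admission), and the symmetric window are all assumptions made for the example.

```r
# Toy event times for one patient, in hours since admission (made-up values).
antibiotics <- c(2, 30)
cultures    <- c(1, 50)
fluids      <- c(8)

# Hypothetical helper: for each anchor time, flag whether each other event
# type has at least one occurrence within +/- window_hours of that time.
anchored_search <- function(anchor, others, window_hours) {
  hits <- vapply(others, function(times) {
    vapply(anchor, function(t0) any(abs(times - t0) <= window_hours), logical(1))
  }, logical(length(anchor)))
  hits <- matrix(hits, nrow = length(anchor), dimnames = list(NULL, names(others)))
  data.frame(anchor_time = anchor, hits)
}

# Hypothetical helper: pool the time stamps of all event types and search
# around each one, so that no single table is privileged.
pooled_search <- function(events, window_hours) {
  all_times <- sort(unique(unlist(events, use.names = FALSE)))
  out <- anchored_search(all_times, events, window_hours)
  out[, c("anchor_time", sort(names(events)))]
}

# Anchored: the result depends on which table is used as the anchor.
by_antibiotics <- anchored_search(antibiotics, list(cultures = cultures, fluids = fluids), 3)
by_cultures    <- anchored_search(cultures, list(antibiotics = antibiotics, fluids = fluids), 3)

# Pooled: the order in which the tables are supplied does not matter.
pooled_a <- pooled_search(list(antibiotics = antibiotics, cultures = cultures, fluids = fluids), 3)
pooled_b <- pooled_search(list(fluids = fluids, cultures = cultures, antibiotics = antibiotics), 3)
```

In this sketch, by_antibiotics and by_cultures contain different rows because each is indexed by its own anchor times, while pooled_a and pooled_b are equal regardless of the order of the input list.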

Constellation can be used to build point-based scores for time series data (via constellate_criteria()), identify particular sequences of events that occur near each other (via constellate()), identify when specific changes occur for a given parameter (via value_change()), identify individual events that occur around a specified time stamp (via bundle()), and distinguish between events that are separated by a specified time window (via incidents()).
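As an illustration of the last idea, separating events into distinct episodes, here is a minimal base R sketch. It is not the package's incidents() implementation: the helper name incident_times(), the numeric hour scale, and the choice to measure the gap from the most recently kept episode are assumptions made for the example.

```r
# Hypothetical helper: keep an event only if it occurs at least window_hours
# after the most recently kept event, so that kept events mark new episodes.
incident_times <- function(times, window_hours) {
  times <- sort(times)
  keep <- logical(length(times))
  last_kept <- -Inf
  for (i in seq_along(times)) {
    if (times[i] - last_kept >= window_hours) {
      keep[i] <- TRUE
      last_kept <- times[i]
    }
  }
  times[keep]
}

# Made-up event times for one patient, in hours since admission.
event_hours <- c(0, 5, 20, 30, 80)
episodes <- incident_times(event_hours, window_hours = 24)
episodes
#> [1]  0 30 80
```

The events at hours 5 and 20 fall within 24 hours of the episode that starts at hour 0, so they are treated as part of it rather than as new incidents.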

If you are new to constellation, the best place to start is vignette("identify_sepsis", package = "constellation"). You can also view the sepsis vignette on CRAN.

Installation

You can install constellation from CRAN with:

install.packages("constellation")
library(constellation)

You can install the development version of constellation from GitHub with:

devtools::install_github("marksendak/constellation")

If you have any questions, comments, or feedback, please email [email protected].

Example

Below are several variations of finding systolic blood pressure drops of 40 or more within a 6-hour window.

Examine systolic blood pressure data:

library(constellation)

systolic_bp <- vitals[VARIABLE == "SYSTOLIC_BP"]
systolic_bp[, RECORDED_TIME := as.POSIXct(RECORDED_TIME, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
head(systolic_bp)
#>    PAT_ID       RECORDED_TIME    VALUE    VARIABLE
#> 1: 108546 2010-02-25 05:36:15 110.6677 SYSTOLIC_BP
#> 2: 108546 2010-02-25 08:41:56 116.0423 SYSTOLIC_BP
#> 3: 108546 2010-02-25 10:30:53 119.2235 SYSTOLIC_BP
#> 4: 108546 2010-02-25 11:05:43 102.9899 SYSTOLIC_BP
#> 5: 108546 2010-02-25 11:48:29 122.1348 SYSTOLIC_BP
#> 6: 108546 2010-02-25 12:14:18 119.7529 SYSTOLIC_BP

Identify the first systolic blood pressure drop per patient:

systolic_bp_drop <- value_change(systolic_bp, value = 40, direction = "down",
    window_hours = 6, join_key = "PAT_ID", time_var = "RECORDED_TIME", 
    value_var = "VALUE", mult = "first")
head(systolic_bp_drop)
#>    PAT_ID PRIOR_RECORDED_TIME PRIOR_VALUE CURRENT_RECORDED_TIME
#> 1: 108546 2010-02-25 15:45:29    139.9967   2010-02-25 20:42:35
#> 2: 112374 2010-03-09 18:18:13    160.4919   2010-03-09 20:48:09
#> 3: 113163 2010-07-27 15:50:35    170.2034   2010-07-27 19:21:58
#> 4: 124042 2010-11-24 21:34:57    163.8912   2010-11-25 02:03:14
#> 5: 135995 2010-11-21 01:51:09    157.9432   2010-11-21 03:26:00
#> 6: 146478 2010-08-27 16:07:47    179.3603   2010-08-27 21:03:05
#>    CURRENT_VALUE
#> 1:      80.07446
#> 2:     107.87212
#> 3:     116.22419
#> 4:     116.66625
#> 5:     111.55469
#> 6:     132.99234

Identify the last systolic blood pressure drop per patient:

systolic_bp_drop <- value_change(systolic_bp, value = 40, direction = "down",
    window_hours = 6, join_key = "PAT_ID", time_var = "RECORDED_TIME", 
    value_var = "VALUE", mult = "last")
head(systolic_bp_drop)
#>    PAT_ID PRIOR_RECORDED_TIME PRIOR_VALUE CURRENT_RECORDED_TIME
#> 1: 108546 2010-07-01 15:31:31    164.9851   2010-07-01 21:03:04
#> 2: 112374 2010-03-15 15:15:53    164.1634   2010-03-15 17:30:06
#> 3: 113163 2010-07-30 19:12:15    160.1682   2010-07-30 22:33:10
#> 4: 124042 2010-12-04 18:34:18    167.2564   2010-12-04 22:46:57
#> 5: 135995 2010-11-27 04:47:15    127.5603   2010-11-27 06:43:05
#> 6: 146478 2010-09-03 15:14:43    182.1690   2010-09-03 16:18:28
#>    CURRENT_VALUE
#> 1:     114.67968
#> 2:     115.95783
#> 3:     111.89387
#> 4:     118.81151
#> 5:      81.90537
#> 6:     138.28222

Identify all systolic blood pressure drops per patient:

systolic_bp_drop <- value_change(systolic_bp, value = 40, direction = "down",
    window_hours = 6, join_key = "PAT_ID", time_var = "RECORDED_TIME", 
    value_var = "VALUE", mult = "all")
head(systolic_bp_drop)
#>    PAT_ID PRIOR_RECORDED_TIME PRIOR_VALUE CURRENT_RECORDED_TIME
#> 1: 108546 2010-02-25 15:45:29    139.9967   2010-02-25 20:42:35
#> 2: 108546 2010-03-01 15:57:24    136.8654   2010-03-01 16:07:00
#> 3: 108546 2010-03-02 19:59:20    129.0167   2010-03-03 00:46:35
#> 4: 108546 2010-03-02 20:49:00    110.1830   2010-03-03 00:46:35
#> 5: 108546 2010-03-04 00:18:41    137.8095   2010-03-04 04:23:54
#> 6: 108546 2010-03-04 02:13:39    130.3280   2010-03-04 04:23:54
#>    CURRENT_VALUE
#> 1:      80.07446
#> 2:      88.88972
#> 3:      69.94551
#> 4:      69.94551
#> 5:      82.16874
#> 6:      82.16874
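The pairing logic behind these three variations can be illustrated with a small base R sketch. It is not how value_change() is implemented (the package builds on data.table joins), and the toy data, the drops_within_window() helper, and the reading of "first" and "last" as the earliest and latest qualifying rows per patient are assumptions made for the example.

```r
# Made-up readings: HOUR is hours since admission, VALUE is systolic BP.
bp <- data.frame(
  PAT_ID = c(1, 1, 1, 1, 1, 1, 2, 2),
  HOUR   = c(0, 2, 5, 9, 20, 23, 0, 7),
  VALUE  = c(150, 145, 100, 95, 150, 105, 160, 110)
)

# Hypothetical helper: pair every reading with every later reading of the
# same patient, then keep pairs inside the window with a large enough drop.
drops_within_window <- function(data, value, window_hours) {
  pairs <- merge(data, data, by = "PAT_ID", suffixes = c("_PRIOR", "_CURRENT"))
  gap  <- pairs$HOUR_CURRENT - pairs$HOUR_PRIOR
  drop <- pairs$VALUE_PRIOR - pairs$VALUE_CURRENT
  out <- pairs[gap > 0 & gap <= window_hours & drop >= value, ]
  out <- out[order(out$PAT_ID, out$HOUR_CURRENT, out$HOUR_PRIOR), ]
  rownames(out) <- NULL
  out
}

# All qualifying pairs; one current reading can match several prior readings.
all_drops <- drops_within_window(bp, value = 40, window_hours = 6)

# Earliest and latest qualifying row per patient.
first_drop <- all_drops[!duplicated(all_drops$PAT_ID), ]
last_drop  <- all_drops[!duplicated(all_drops$PAT_ID, fromLast = TRUE), ]
```

In this toy data, the reading at hour 5 is paired with both hour 0 and hour 2, which mirrors the repeated CURRENT_RECORDED_TIME values in the output above. Patient 2 drops by 50 but over 7 hours, so no row is returned for that patient.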

Why constellation?

In clinical medicine, there is a subset of conditions that are defined by a sequence of related events that unfold over time. These conditions are described as a “constellation of signs and symptoms.”

Another piece of medical jargon that made it into the package is the concept of a treatment bundle. The bundle() function was originally designed to calculate the time stamp at which a group of treatments is delivered for every patient within a specified amount of time of developing a condition.

Duke Institute for Health Innovation

constellation was originally developed to support a machine learning project at the Duke Institute for Health Innovation to predict sepsis.

constellation's Issues

Constellate Without Time Window

  • Handle case where you pass an arbitrary number of time series data frames and you identify the first instant at which a variety of events take place
  • Use case: sepsis treatment bundle, want to identify the first instant at which all components of the treatment bundle have been delivered

Add bundle function

  • Ingest an arbitrary number of bundle time series data frames and identify instances where bundle items were administered within a specified window_hours of events

by EachI warning

Update mult option implementation to remove dependency on "by = join_key"
