
ricu's Introduction


ricu facilitates working with ICU datasets in R, especially publicly available ones provided by PhysioNet. It provides data access, a level of abstraction for encoding clinical concepts in a data-source-agnostic way, and classes and utilities for working with the resulting time series datasets.

Citation

To cite ricu, please use the following:

@article{bennett2023ricu,
  title={ricu: R’s interface to intensive care data},
  author={Bennett, Nicolas and Ple{\v{c}}ko, Drago and Ukor, Ida-Fong and Meinshausen, Nicolai and B{\"u}hlmann, Peter},
  journal={GigaScience},
  volume={12},
  pages={giad041},
  year={2023},
  publisher={Oxford University Press}
}

Installation

Currently, installation is only possible directly from GitHub, using the remotes package if it is installed

remotes::install_github("eth-mds/ricu")

or by sourcing the required code for installation from github by running

rem <- source(
  paste0("https://raw.githubusercontent.com/r-lib/remotes/main/",
         "install-github.R")
)
rem$value("eth-mds/ricu")

To make sure that some useful utility packages are installed as well, consider also installing the packages marked as Suggests by running

remotes::install_github("eth-mds/ricu", dependencies = TRUE)

instead, or by installing some of the utility packages (relevant for downloading and preprocessing PhysioNet datasets)

install.packages("xml2")

and demo dataset packages

install.packages(c("mimic.demo", "eicu.demo"),
                 repos = "https://eth-mds.github.io/physionet-demo")

explicitly.

Data access

Out of the box (provided the two data packages mimic.demo and eicu.demo are available), ricu provides access to the demo datasets corresponding to the PhysioNet Clinical Databases eICU and MIMIC-III. Tables are available as

mimic_demo$admissions
#> # <mimic_tbl>: [129 ✖ 19]
#> # ID options:  subject_id (patient) < hadm_id (hadm) < icustay_id (icustay)
#> # Defaults:    `admission_type` (val)
#> # Time vars:   `admittime`, `dischtime`, `deathtime`, `edregtime`, `edouttime`
#>     row_id subject_id hadm_id admittime           dischtime
#>      <int>      <int>   <int> <dttm>              <dttm>
#> 1    12258      10006  142345 2164-10-23 21:09:00 2164-11-01 17:15:00
#> 2    12263      10011  105331 2126-08-14 22:32:00 2126-08-28 18:59:00
#> 3    12265      10013  165520 2125-10-04 23:36:00 2125-10-07 15:13:00
#> 4    12269      10017  199207 2149-05-26 17:19:00 2149-06-03 18:42:00
#> 5    12270      10019  177759 2163-05-14 20:43:00 2163-05-15 12:00:00
#> …
#> 125  41055      44083  198330 2112-05-28 15:45:00 2112-06-07 16:50:00
#> 126  41070      44154  174245 2178-05-14 20:29:00 2178-05-15 09:45:00
#> 127  41087      44212  163189 2123-11-24 14:14:00 2123-12-30 14:31:00
#> 128  41090      44222  192189 2180-07-19 06:55:00 2180-07-20 13:00:00
#> 129  41092      44228  103379 2170-12-15 03:14:00 2170-12-24 18:00:00
#> # ℹ 124 more rows
#> # ℹ 14 more variables: deathtime <dttm>, admission_type <chr>,
#> #   admission_location <chr>, discharge_location <chr>, insurance <chr>,
#> #   language <chr>, religion <chr>, marital_status <chr>, ethnicity <chr>,
#> #   edregtime <dttm>, edouttime <dttm>, diagnosis <chr>,
#> #   hospital_expire_flag <int>, has_chartevents_data <int>

and data can be loaded into an R session for example using

load_ts("labevents", "mimic_demo", itemid == 50862L,
        cols = c("valuenum", "valueuom"))
#> # A `ts_tbl`: 299 ✖ 4
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>     icustay_id charttime valuenum valueuom
#>          <int> <drtn>       <dbl> <chr>
#> 1       201006   0 hours      2.4 g/dL
#> 2       203766 -18 hours      2   g/dL
#> 3       203766   4 hours      1.7 g/dL
#> 4       204132   7 hours      3.6 g/dL
#> 5       204201   9 hours      2.3 g/dL
#> …
#> 295     298685 130 hours      1.9 g/dL
#> 296     298685 154 hours      2   g/dL
#> 297     298685 203 hours      2   g/dL
#> 298     298685 272 hours      2.2 g/dL
#> 299     298685 299 hours      2.5 g/dL
#> # ℹ 294 more rows

which returns time series data as a ts_tbl object.

Acknowledgments

This work was supported by grant #2017-110 of the Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain for the SPHN/PHRT Driver Project “Personalized Swiss Sepsis Study”.

ricu's People

Contributors

dplecko, mlondschien, nbenn, prockenschaub

ricu's Issues

`admittime` stored as `difftime`, not `datetime`

I'd like to write a concept retrieving the day of the week at admission.

    "adm_dow": {
        "description": "Admission day of the week",
        "target": "id_tbl",
        "type": "fct_cncpt",
        "levels": [
            "Monday",
            "Tuesday",
            "Wednesday",
            "Thursday",
            "Friday",
            "Saturday",
            "Sunday"
        ],
        "sources": {
            "mimic": [
                {
                    "table": "admissions",
                    "val_var": "admittime",
                    "class": "col_itm",
                    "callback": "transform_fun(weekdays)"
                }
            ],
            "miiv": [
                {
                    "table": "admissions",
                    "val_var": "admittime",
                    "class": "col_itm",
                    "callback": "transform_fun(weekdays)"
                }
            ],
            "hirid": [
                {
                    "table": "general",
                    "val_var": "admissiontime",
                    "class": "col_itm",
                    "callback": "transform_fun(weekdays)"
                }
            ]
        }
    },

However,

concepts = load_concepts("adm_dow", "mimic")

fails with

Error in UseMethod("weekdays") : 
  no applicable method for 'weekdays' applied to an object of class "difftime"
In addition: Warning message:
Ignoring argument passed as `...` 

Some monkey-debugging yields that admittime (and admissiontime in hirid) is internally transformed to a difftime (relative to hospital admission?) somewhere. Why? How can I implement the above concept?
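
The error itself can be reproduced in base R without ricu. The following is a minimal sketch, assuming only that the column arrives as a difftime: weekdays() has methods for Date and POSIXt objects but none for difftime, so a relative time has to be added back to an absolute origin first. The origin reuses the first admittime from the mimic_demo output shown earlier purely as an example value, and the UTC time zone is an assumption.

```r
# weekdays() dispatches on Date/POSIXt; a difftime has no method.
rel_time <- as.difftime(5, units = "hours")
err <- tryCatch(weekdays(rel_time), error = function(e) conditionMessage(e))
print(err)

# Example origin (time zone assumed to be UTC); an absolute timestamp is
# needed before a weekday can be derived.
origin <- as.POSIXct("2164-10-23 21:09:00", tz = "UTC")
abs_time <- origin + rel_time
dow <- weekdays(abs_time)  # weekday name, language depends on the locale
print(dow)
```

Whether ricu exposes the required origin to a callback is not something this sketch answers; it only shows why transform_fun(weekdays) cannot work on the relative times.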

SOFA scores calculation in eICU

Dear ricu team, I am curious about how you calculate the SOFA component scores for the eICU database.

Looks like there is no valid method and derived table for eICU Sofa score calculation.

Error opening MIMIC IV

I get an error when opening MIMIC IV using the ricu package.

The code I'm running is the following:

remotes::install_github("eth-mds/ricu", dependencies = TRUE)

Sys.setenv(RICU_PHYSIONET_USER = "xxxxxx",
           RICU_PHYSIONET_PASS = "xxxxxxx")
library(ricu)
download_src("miiv")

Error in stop_ricu():
! Access to the requested resource was denied. Please set up an
account at https://physionet.org/ and apply for data access.
Backtrace:

  1. ricu::download_src("miiv")
  2. ricu:::download_src.character("miiv")
  3. base::Map(...)
  4. base::mapply(FUN = f, ..., SIMPLIFY = FALSE)
  5. ricu:::download_src.src_cfg(...)
  6. ricu:::download_check_data(...)
  7. ricu:::get_sha256(url, user, pass)
  8. ricu:::download_pysionet_file(...)
  9. ricu:::stop_ricu(...)

###########################################
###########################################

I don't have problems with eicu_demo and mimic_demo.
I have credentials for MIMIC IV.
My computer meets the requirements.

I'm sure I'm doing something (stupid) wrong.

Odd Sepsis Labels with eICU

Hi there,

and first of all, thanks for the great work, I just noticed some oddities with regard to Sepsis 3 labels

generated by

sepsis_data <- sep3(sofa_data, si_data, si_window = "any", si_lwr = hours(48L),
                    si_upr = hours(24L), keep_components = TRUE,
                    interval = mins(15L)
)

for the eICU dataset.

There are some assigned labels for which the SI-window seems to be violated. e.g. 144 hours between the last SI-event and the determining SOFA score increase, considering the Sepsis-3 requirements in the documentation.

Is this an expected artefact due to the way the labels are generated for eICU under the hood or is there something else off that might need to be taken care of?

See the plot down below for an example:

odd_sepsis_label

Thanks again!

Rounding of time variables

When trying to calculate some of the event times by hand, I noticed that my results were sometimes ever so slightly off. For example, instead of -52 minutes I would get -51 minutes. I did some digging and think it is due to an inconsistent use of rounding when changing time precision.

round is, for example, used when calculating minutes in id_win_helper for MIMIC.

as_dt_min <- function(x, y) round(difftime(x, y, units = "mins"))

On the other hand, trunc is used when loading MIMIC or when re_timeing.

ricu/R/utils-misc.R

Lines 78 to 80 in 9e76c07

round_to <- function(x, to = 1) {
  if (all_equal(to, 1)) trunc(x) else to * trunc(x / to)
}

I think all time rounding in ricu should use the same rounding rule. I would further argue that neither round nor trunc is the optimal choice. Both group events that happened before the nominal time into this bucket. For example, round will put both -0.5 hours and 0.5 hours into bucket 0. However, if the reference date is ICU admission, this pools pre-admission and post-admission info. Of course, this often may not matter because of imprecision in recording, but nevertheless. trunc is even worse, as it would put both -0.99 hours and 0.99 hours into bucket 0, effectively creating a two-hour bucket.

I would instead opt for the use of floor throughout ricu. This will never include pre-admission information in a non-negative bucket number and keeps all buckets at the same duration.
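
The difference between the three rules can be checked directly in base R. The sketch below uses illustrative offsets in hours relative to ICU admission; it does not call any ricu function.

```r
# Offsets in hours relative to ICU admission (illustrative values).
offsets <- c(-0.99, -0.5, 0.5, 0.99)

buckets <- data.frame(
  offset = offsets,
  round  = round(offsets),  # -1,  0, 0, 1: bucket 0 mixes -0.5 and 0.5
  trunc  = trunc(offsets),  #  0,  0, 0, 0: bucket 0 covers almost two hours
  floor  = floor(offsets)   # -1, -1, 0, 0: bucket 0 holds only offsets >= 0
)
print(buckets)
```

Only floor keeps every pre-admission offset out of bucket 0 while giving all buckets the same width.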

import_src,'All required tables have already been imported '

Hi thanks for the great package!

I want to load a local dataset again, e.g. MIMIC-IV. But when I run
import_src('miiv', <miiv path>)
again after
detach('miiv')
and
unlink(<miiv path>),
I find that it doesn't load the data again.

The console output is

> import_src('miiv', './local_data/miiv')
Warning message:
All required tables have already been imported

We appreciate any help you may be able to give.

Memory Requirement of Importing MIMIC-IV

Hi,

In section 5 of vignette https://cran.r-project.org/web/packages/ricu/vignettes/jss.pdf , it is written that memory requirements are kept low around 8-16 GB.

However, when I am importing MIMIC-IV on my laptop with 16 GB memory, my R session was terminated due to insufficient memory.

Later, I could only import MIMIC-IV on another device where the process used ~30 GB memory at peak time.

I just wanted to confirm if this is an expected behavior. If yes, maybe you can update the vignette in that regard.

Best,

Release of MIMIC IV version 2.0 and database versioning

Version 2.0 of MIMIC IV was released on June 12th 2022: https://physionet.org/content/mimiciv/2.0/

Problems

So far, this new version has led to the following issues for me:

  1. Warning messages related to the expected number of rows, e.g.:
# 1: expected 523740 rows but got 454324 rows for table `admissions` 
  2. Omission of the tables admissions, patients, and transfers when running import_src('miiv'), as they have been moved from the core module into the hosp module and are thus not recognised.

Question

A quick fix for this is updating the row numbers and paths in the config file.

However, this raises a bigger point about whether ricu should allow for versioning of databases:

  • In its simplest form, this can be solved with a separate config file for each version (with the latest version as the default).
  • If concepts/items become available or cease to exist in a newer version, this would also require versioning concepts. A sensible default here could be to treat concepts without version info as applying to every version of the database (thus making every currently available concept available by default) and only apply versioning if a concept is actively marked as only being available for certain versions.

How to cite?

Hi,

I'd like to cite ricu. I couldn't find an associated paper or any other associated DOI. Did I miss it? If not, could you please add one, for example using Zenodo? Cheers

Best practice for col names when combining datasets

Problem

When combining databases (say MIMIC III and eICU), the names of the ID variables and the time variable depend on the order in which sources are passed to load_concepts. See the following reprex inspired by the quick start guide:

library(ricu)

src <- c("mimic_demo", "eicu_demo")

load_concepts("alb", src, verbose = FALSE)
#> # A `ts_tbl`: 6,657 ✖ 4
#> # Id vars:    `source`, `icustay_id`
#> # Units:      `alb` [g/dL]
#> # Index var:  `charttime` (1 hours)
#>       source     icustay_id charttime   alb
#>       <chr>           <int> <drtn>    <dbl>
#>     1 eicu_demo      141765  -2 hours   3.7
#>     2 eicu_demo      144815  -3 hours   4.2
#>     3 eicu_demo      144815   8 hours   3.6
#>     4 eicu_demo      145427  -6 hours   3.7
#>     5 eicu_demo      147307  -6 hours   3.5
#>     …
#> 6,653 mimic_demo     298685 130 hours   1.9
#> 6,654 mimic_demo     298685 154 hours   2
#> 6,655 mimic_demo     298685 203 hours   2
#> 6,656 mimic_demo     298685 272 hours   2.2
#> 6,657 mimic_demo     298685 299 hours   2.5

load_concepts("alb", rev(src), verbose = FALSE)
#># A `ts_tbl`: 6,657 ✖ 4
#># Id vars:    `source`, `patientunitstayid`
#># Units:      `alb` [g/dL]
#># Index var:  `labresultoffset` (1 hours)
#>      source     patientunitstayid labresultoffset   alb
#>      <chr>                  <int> <drtn>          <dbl>
#>    1 eicu_demo             141765  -2 hours         3.7
#>    2 eicu_demo             144815  -3 hours         4.2
#>    3 eicu_demo             144815   8 hours         3.6
#>    4 eicu_demo             145427  -6 hours         3.7
#>    5 eicu_demo             147307  -6 hours         3.5
#>    …
#>6,653 mimic_demo            298685 130 hours         1.9
#>6,654 mimic_demo            298685 154 hours         2
#>6,655 mimic_demo            298685 203 hours         2
#>6,656 mimic_demo            298685 272 hours         2.2
#>6,657 mimic_demo            298685 299 hours         2.5
#># … with 6,647 more rows

As you can see, although the information is exactly the same, the names depend on the order of src. This prevents me for example from simply appending two concepts from different databases:

bind_rows(
  load_concepts("alb", "mimic_demo", verbose = FALSE),
  load_concepts("alb", "eicu_demo", verbose = FALSE)
)

#> # A `ts_tbl`: 6,657 ✖ 5
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>       icustay_id charttime   alb patientunitstayid labresultoffset
#>            <int> <drtn>    <dbl>             <int> <drtn>
#>     1         NA  NA hours   3.4           3352333   2 hours
#>     2         NA  NA hours   3.3           3352333  11 hours
#>     3         NA  NA hours   3.1           3352333  36 hours
#>     4         NA  NA hours   3.4           3353113 -36 hours
#>     5         NA  NA hours   3.6           3353113  10 hours
#>     …
#> 6,653     201006   0 hours   2.4                NA  NA hours
#> 6,654     203766 -18 hours   2                  NA  NA hours
#> 6,655     203766   4 hours   1.7                NA  NA hours
#> 6,656     204132   7 hours   3.6                NA  NA hours
#> 6,657     204201   9 hours   2.3                NA  NA hours
#> # … with 6,647 more rows

Question

Am I missing something obvious here and am I supposed to do something differently? I did find the helper functions id_vars and index_var that can help me recover what the names are, but this seems cumbersome and does not allow me to merge only on a specific ID level (e.g. admissions) without remembering what this column was called in the first database I passed to load_concepts.

What was the reasoning underlying this design choice and would it be more practical to rename them directly to patient, hadm, and icustay, as returned e.g. by as_id_cfg(mimic_demo)?
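
A possible workaround, sketched here with plain data frames rather than ricu's ts_tbl objects, is to rename the ID and index columns to shared names before binding. The helper harmonise and the target names icustay and time are hypothetical; with real ricu objects the current names could be looked up with id_vars and index_var instead of being typed out. The values are copied from the output above.

```r
# Toy stand-ins for the two load_concepts() results; plain data frames,
# not ricu ts_tbl objects.
mimic <- data.frame(icustay_id = c(201006L, 203766L),
                    charttime = c(0, -18), alb = c(2.4, 2.0))
eicu <- data.frame(patientunitstayid = c(141765L, 144815L),
                   labresultoffset = c(-2, -3), alb = c(3.7, 4.2))

# Hypothetical helper: rename the ID and index columns to shared names and
# record which source each row came from.
harmonise <- function(df, id, index, source) {
  names(df)[names(df) == id] <- "icustay"
  names(df)[names(df) == index] <- "time"
  cbind(source = source, df)
}

combined <- rbind(
  harmonise(mimic, "icustay_id", "charttime", "mimic_demo"),
  harmonise(eicu, "patientunitstayid", "labresultoffset", "eicu_demo")
)
print(combined)
```

With shared column names, binding no longer produces the NA-padded duplicate columns shown in the bind_rows output.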

Problems with MIMIC IV download and setup

Hi,

I am trying to download MIMIC IV through ricu on a Macbook M1 2020. When I use setup_src_data() I get the following error message (below), which, as I understand from previous posts on the same topic, has something to do with a new version of the dataset. Earlier replies say that the new version of ricu (0.6.0) should have fixed this, but I am still getting this error.

> setup_src_data("miiv")
Error in `assert_that()`:
! hosp/caregiver.csv.gz was not found among the provided options. Did you possibly mean
icu/caregiver.csv.gz instead?
Run `rlang::last_trace()` to see where the error occurred.

I have also downloaded all the files and tried to import them by setting my directory to the files and using import_src(). The following code says that I seem to have imported all the tables (see below), but when I try to access the files it says "Data for miiv is missing. Setup now?".

> import_src("miiv", verbose = T)
Tables `admissions`, `patients`, `transfers`, `d_hcpcs`, `diagnoses_icd`, `d_icd_diagnoses`,
`d_icd_procedures`, `d_labitems`, `drgcodes`, `emar_detail`, `emar`, `hcpcsevents`,
`labevents`, `microbiologyevents`, `pharmacy`, `poe_detail`, `poe`, `prescriptions`, …,
`provider`, and `ingredientevents` have already been imported and will be skipped
Warning message:
All required tables have already been imported 
> miiv$chartevents
Data for `miiv` is missing
Setup now (Y/n)? 

As you can see here, I have ricu 0.6.0, and it says that 0 of 31 tables of miiv are available, despite the above code saying that all required tables have been imported.

── ricu 0.6.0 ─────────────────────────────────────────────────────────────────────────

The following data sources are configured to be attached:
(the environment variable `RICU_SRC_LOAD` controls this)

✖ miiv: 0 of 31 tables available

Any help would be much appreciated.

Age in MIMIC-IV

Hi everyone,

I was just wondering whether the age in MIMIC-IV should be adapted for patients with multiple ICU stays. As far as I understood the age is taken from the "anchor_age" column in the "patients" table and not modified for later ICU stays where the year in the "intime" column is not the same anymore as the "anchor_year" from the "patients" table. This can result in age differences of up to 12 years.

Am I correct and this should be adapted or did I miss something here?
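
One way to express the adjustment in question, assuming the usual MIMIC-IV convention that anchor_age is the patient's age in anchor_year, is to add the years elapsed between anchor_year and the year of ICU admission. The helper adjust_age is hypothetical and not part of ricu, and the values below are made up for illustration.

```r
# Hypothetical helper: shift anchor_age by the number of years between
# anchor_year and the year of the ICU admission time.
adjust_age <- function(anchor_age, anchor_year, intime) {
  admit_year <- as.integer(format(as.Date(intime), "%Y"))
  anchor_age + (admit_year - anchor_year)
}

# Made-up example: one patient with two ICU stays 12 years apart.
stays <- data.frame(
  anchor_age  = c(60L, 60L),
  anchor_year = c(2150L, 2150L),
  intime      = c("2150-03-01", "2162-07-15"),
  stringsAsFactors = FALSE
)
stays$age_at_icu <- adjust_age(stays$anchor_age, stays$anchor_year,
                               stays$intime)
print(stays)
```

Because only the year is used, the result is accurate to within one year, which matches the precision of anchor_age itself.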

Best,
Simon

`expand` on `win_tbl` arbitrarily sets negative end times to zero

When expand is called on a win_tbl, values are repeated for all time-steps that fall into the interval starting at index_var and lasting until index_var + dur_var. This works when index_var + dur_var is a positive number but whenever it's negative, it is just set to 0 by the following code.

ricu/R/utils-ts.R

Lines 140 to 148 in 7f2cc42

if (is_win_tbl(x) && !end_var %in% colnames(x)) {
  on.exit(rm_cols(x, end_var, by_ref = TRUE))
  dura_var <- dur_var(x)
  x <- x[, c(end_var) := re_time(get(start_var) + get(dura_var), interval)]
  x <- x[get(end_var) < 0, c(end_var) := as.difftime(0, units = time_unit)]
}

The effect of this can be seen with the following example from gcs. Here, ett_gcs is processed with expand, but this leads to false results for some patients.

sed = load_concepts("ett_gcs", src = "mimic_demo")
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • ett_gcs
#> ────────────────────────────────────────────────────────────────────────────────

sed[icustay_id == 234989]
#> # A `win_tbl`:  11 ✖ 4
#> # Id var:       `icustay_id`
#> # Index var:    `charttime` (1 hours)
#> # Duration var: `dur_var`
#>    icustay_id charttime dur_var ett_gcs
#>         <int> <drtn>    <drtn>  <lgl>
#> 1      234989 -2 hours  1 mins  TRUE       <---- negative `index_var + dur_var`
#> 2      234989  7 hours  1 mins  TRUE
#> 3      234989 14 hours  1 mins  TRUE
#> 4      234989 18 hours  1 mins  TRUE
#> 5      234989 24 hours  1 mins  TRUE
#> 6      234989 35 hours  1 mins  TRUE
#> 7      234989 39 hours  1 mins  TRUE
#> 8      234989 43 hours  1 mins  TRUE
#> 9      234989 47 hours  1 mins  TRUE
#> 10     234989 52 hours  1 mins  TRUE
#> 11     234989 55 hours  1 mins  TRUE

sed = expand(sed, aggregate = "any")
sed[icustay_id == 234989]
#> # A `ts_tbl`: 13 ✖ 3
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime ett_gcs
#>         <int> <drtn>    <lgl>
#> 1      234989 -2 hours  TRUE
#> 2      234989 -1 hours  TRUE    <--- artificially added
#> 3      234989  0 hours  TRUE    <--- artificially added
#> 4      234989  7 hours  TRUE
#> 5      234989 14 hours  TRUE
#> 6      234989 18 hours  TRUE
#> 7      234989 24 hours  TRUE
#> 8      234989 35 hours  TRUE
#> 9      234989 39 hours  TRUE
#> 10     234989 43 hours  TRUE
#> 11     234989 47 hours  TRUE
#> 12     234989 52 hours  TRUE
#> 13     234989 55 hours  TRUE

Created on 2024-04-12 with reprex v2.1.0

It is not entirely clear to me why end_var would need to be set to zero in the code below.

ricu/R/utils-ts.R

Lines 146 to 147 in 7f2cc42

x <- x[, c(end_var) := re_time(get(start_var) + get(dura_var), interval)]
x <- x[get(end_var) < 0, c(end_var) := as.difftime(0, units = time_unit)]

Maybe the intent was to prevent negative dur_vars? In that case, the following code would be needed instead.

x <- x[get(dura_var) < 0, c(dura_var) := as.difftime(0, units = time_unit)]
x <- x[, c(end_var) := re_time(get(start_var) + get(dura_var), interval)]
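
The effect of clamping a negative end time to zero can be illustrated with plain numbers. This is a simplified sketch in base R, not ricu's implementation: times are numeric hours, and floor is used to snap the end onto the hourly grid, which is an assumption and may differ from what re_time does.

```r
start_h <- -2       # index_var of the first row above, in hours
dur_h   <- 1 / 60   # dur_var of 1 minute, expressed in hours

# End of the window on an hourly grid (floor is an assumption, see above).
end_raw <- floor(start_h + dur_h)

# What the quoted line does to negative end times: set them to zero.
end_clamped <- if (end_raw < 0) 0 else end_raw

grid_clamped <- seq(start_h, end_clamped)  # hours -2, -1 and 0
grid_plain   <- seq(start_h, end_raw)      # hour -2 only
print(grid_clamped)
print(grid_plain)
```

The clamped grid reproduces the two artificially added rows at -1 and 0 hours seen in the reprex, while the unclamped grid keeps only the observed time step.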

Bug in `merge_ranges`

Problem

vaso60 uses merge_ranges to look for overlaps in vasopressor times of the same ICU stay, which in turn uses data.table's foverlaps. The relevant line of code is here:

x <- reclass_tbl(data.table::foverlaps(x, x, mult = "first"), as_ptype(x))

I noticed that the output for dobu60 on the mimic demo data wasn't quite what I was expecting.

reclass_tbl(data.table::foverlaps(x, x, mult = "first"), as_ptype(x))
#> # A `ts_tbl`: 7 ✖ 5
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 mins)
#>   icustay_id charttime dobu_dur  i.charttime i.dobu_dur
#>        <int> <drtn>    <drtn>    <drtn>      <drtn>
#> 1     203766 1543 mins 1578 mins 1543 mins   1578 mins
#> 2     217724  590 mins  765 mins  590 mins    765 mins
#> 3     251573 1069 mins 1099 mins 1069 mins   1099 mins
#> 4     264446   22 mins  327 mins   22 mins    327 mins
#> 5     298685  482 mins  517 mins  482 mins    517 mins
#> 6     298685  482 mins  517 mins  582 mins   1548 mins
#> 7     298685  482 mins  517 mins 1527 mins   2422 mins

Note how in line 6 the ranges aren't actually overlapping. Instead, charttime and dobu_dur in lines 6 and 7 should read 582 mins and 1548 mins. This appears to be a weird interaction due to the ts_tbl class. If I downcast it to a vanilla data.table, everything works as expected.

reclass_tbl(data.table::foverlaps(as.data.table(x), as.data.table(x), mult = "first"), as_ptype(x))
#> # A `ts_tbl`: 7 ✖ 5
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 mins)
#>   icustay_id charttime dobu_dur  i.charttime i.dobu_dur
#>        <int> <drtn>    <drtn>    <drtn>      <drtn>
#> 1     203766 1543 mins 1578 mins 1543 mins   1578 mins
#> 2     217724  590 mins  765 mins  590 mins    765 mins
#> 3     251573 1069 mins 1099 mins 1069 mins   1099 mins
#> 4     264446   22 mins  327 mins   22 mins    327 mins
#> 5     298685  482 mins  517 mins  482 mins    517 mins
#> 6     298685  582 mins 1548 mins  582 mins   1548 mins
#> 7     298685  582 mins 1548 mins 1527 mins   2422 mins

Proposed fix

Change to reclass_tbl(data.table::foverlaps(as.data.table(x), as.data.table(x), mult = "first"), as_ptype(x)) or identify the underlying root cause in ts_tbl.

Inconsistent origin for `patient`

While working on integrating SICdb #31, I noticed that the time origin for the ID level patient is inconsistently defined between datasets. For MIMIC, the origin is the date of birth whereas for AUMCdb, it is the earliest admission. I understand that this is due to the fact that not all databases allow calculation of the date of birth, but I was wondering, whether this should be unified (or documented) to avoid confusion? A consistent (although maybe not always helpful) origin for the patient could be the start of active observation for that patient. The end of observation would be loss of follow-up (if known, e.g., because the patient died) or NA if it is unclear if the patient is still under "observation".

Function `trunc_time` does not actually truncate

The function create_intervals inside expand_intervals used by mimic_rate_cv calls the function trunc_time in order to truncate times after 6 hours. There are two issues here:

  1. I am unsure whether a hard-coded truncation after 6 hours is sensible for a function that isn't MIMIC-specific. There might be something I am missing here but I think it would make sense to have expand_intervals take a max_len argument that then defaults to 6 hours.

ricu/R/callback-itm.R

Lines 414 to 415 in 7f2cc42

x <- create_intervals(x, c(id_vars(x), grp_var), overhang = hours(1L),
                      max_len = hours(6L), end_var = "endtime")

  2. The above is moot in any case, as trunc_time doesn't actually truncate: the result of replace is never assigned back to the vector x.

ricu/R/utils-ts.R

Lines 643 to 654 in 7f2cc42

trunc_time <- function(x, min, max) {
  if (not_null(min)) {
    replace(x, x < min, min)
  }
  if (not_null(max)) {
    replace(x, x > max, max)
  }
  x
}

This can be easily checked by running

debugonce(trunc_time)
load_concepts("norepi_rate", "mimic_demo")

which has a max length of 14 even after "truncating".
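
A corrected version only needs to assign the result of replace back to x. The following is a minimal sketch, not a patch against ricu: the name trunc_time_fixed is hypothetical, and base R's is.null stands in for ricu's internal not_null helper so that the example runs on its own.

```r
# Sketch of a truncating variant: the result of replace() is assigned
# back to x, which the quoted version omits.
trunc_time_fixed <- function(x, min = NULL, max = NULL) {
  if (!is.null(min)) {
    x <- replace(x, x < min, min)
  }
  if (!is.null(max)) {
    x <- replace(x, x > max, max)
  }
  x
}

# Illustrative durations in hours, truncated to the range [0, 6].
lens <- as.difftime(c(-2, 3, 14), units = "hours")
res <- trunc_time_fixed(lens,
                        min = as.difftime(0, units = "hours"),
                        max = as.difftime(6, units = "hours"))
print(res)
```

The example keeps all values in the same unit (hours); mixing units between x and the bounds is not covered by this sketch.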

Incorrect time grid in `urine24`

urine24 maps all observed urine output onto a grid of interval-sized steps. If there was no output observed for a given interval period, the function uses fill_gaps to fill in those periods. Unfortunately, fill_gaps does not account for win_tbl and produces an incorrect result. See the following reprex.

library(ricu)
#> ℹ Loading ricu
#> 
#> ── ricu 0.6.0 ──────────────────────────────────────────────────────────────────
#> 
#> The following data sources are configured to be attached:
#> (the environment variable `RICU_SRC_LOAD` controls this)
#> 
#> ✔ mimic: 26 of 26 tables available
#> ✔ mimic_demo: 25 of 25 tables available
#> ✔ eicu: 31 of 31 tables available
#> ✔ eicu_demo: 31 of 31 tables available
#> ✔ hirid: 5 of 5 tables available
#> ✔ aumc: 7 of 7 tables available
#> ✔ miiv: 31 of 31 tables available
#> ✖ sic: 7 of 8 tables available
#> 
#> ────────────────────────────────────────────────────────────────────────────────

res = load_concepts("urine", src = "mimic_demo")
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • urine  ◯ removed 20 (0.22%) of rows due to `NA` values  ◯ removed 1 (0.01%) of rows due to out of range entries  ◯ not all units are in [mL]: NA (0.08%)
#> ────────────────────────────────────────────────────────────────────────────────

# Code as currently used in  `urine24` function --------------------------
limits = collapse(res)
class(limits) # <-- is win_tbl
#> [1] "win_tbl"    "ts_tbl"     "id_tbl"     "data.table" "data.frame"

# Take the example of patient 204201. Their observations range from hour -1 to 
# hour 14. 
res[icustay_id == 204201]
#> # A `ts_tbl`: 14 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  4 hours    360
#> 6      204201  5 hours    100
#> 7      204201  6 hours    100
#> 8      204201  7 hours    260
#> 9      204201  9 hours    350
#> 10     204201 10 hours     60
#> 11     204201 11 hours    160
#> 12     204201 12 hours    160
#> 13     204201 13 hours    160
#> 14     204201 14 hours    200

# In limits, this is encoded as a start of observation at hour -1 and a duration of 
# observation of 15 hours (-1 + 15 = 14). This duration is confusingly named "end".
limits[icustay_id == 204201]
#> # A `win_tbl`:  1 ✖ 3
#> # Id var:       `icustay_id`
#> # Index var:    `start` (1 hours)
#> # Duration var: `end`
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     204201 -1 hours 15 hours

# However, in `fill_gaps`, end is not interpreted as a duration but as an 
# absolute point in time (as can be seen in the artificially added row at charttime == 15).
filled_cur = fill_gaps(res, limits = limits)
filled_cur[icustay_id == 204201]
#> # A `ts_tbl`: 17 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  3 hours     NA
#> 6      204201  4 hours    360
#> 7      204201  5 hours    100
#> 8      204201  6 hours    100
#> 9      204201  7 hours    260
#> 10     204201  8 hours     NA
#> 11     204201  9 hours    350
#> 12     204201 10 hours     60
#> 13     204201 11 hours    160
#> 14     204201 12 hours    160
#> 15     204201 13 hours    160
#> 16     204201 14 hours    200
#> 17     204201 15 hours     NA    <--- this row was artificially added by fill_gaps



# This should be fixed, which can be done as follows ----------------------

limits = collapse(res, as_win_tbl = FALSE) 
class(limits) # <-- is returned as ts_tbl
#> [1] "id_tbl"     "data.table" "data.frame"
#> attr(,"previous")
#> [1] "ts_tbl"     "id_tbl"     "data.table" "data.frame"
limits = as_ts_tbl(limits, index_var = "start", interval = interval(res)) # <-- turn into ts_tbl

# The `end` column in limits now encodes an absolute time and not a duration
limits[icustay_id == 204201]
#> # A `ts_tbl`: 1 ✖ 3
#> # Id var:     `icustay_id`
#> # Index var:  `start` (1 hours)
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     204201 -1 hours 14 hours

# `fill_gaps` now also works correctly and does not add artificial rows
filled_upd = fill_gaps(res, limits = limits)
filled_upd[icustay_id == 204201]
#> # A `ts_tbl`: 16 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  3 hours     NA
#> 6      204201  4 hours    360
#> 7      204201  5 hours    100
#> 8      204201  6 hours    100
#> 9      204201  7 hours    260
#> 10     204201  8 hours     NA
#> 11     204201  9 hours    350
#> 12     204201 10 hours     60
#> 13     204201 11 hours    160
#> 14     204201 12 hours    160
#> 15     204201 13 hours    160
#> 16     204201 14 hours    200

Created on 2024-04-11 with reprex v2.1.0

The problem is not just additional rows but also complete omission of patients. This happens if observation starts late and duration of observation is short, which makes it look as if the patient was observed backward in time. See patient 228977 for an example.

res[icustay_id == 228977]
#> # A `ts_tbl`: 3 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>   icustay_id charttime urine
#>        <int> <drtn>    <dbl>
#> 1     228977 13 hours     25
#> 2     228977 16 hours     25
#> 3     228977 19 hours     25

limits[icustay_id == 228977]
#> # A `win_tbl`:  1 ✖ 3
#> # Id var:       `icustay_id`
#> # Index var:    `start` (1 hours)
#> # Duration var: `end`
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     228977 13 hours 6 hours

filled_cur[icustay_id == 228977]
#> # A `ts_tbl`: 0 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#> # ℹ 3 variables: icustay_id <int>, charttime <drtn>, urine <dbl>

filled_upd[icustay_id == 228977]
#> # A `ts_tbl`: 7 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>   icustay_id charttime urine
#>        <int> <drtn>    <dbl>
#> 1     228977 13 hours     25
#> 2     228977 14 hours     NA
#> 3     228977 15 hours     NA
#> 4     228977 16 hours     25
#> 5     228977 17 hours     NA
#> 6     228977 18 hours     NA
#> 7     228977 19 hours     25
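
The omission of patient 228977 can be reproduced without ricu. The following is a minimal base R sketch, assuming that gap filling builds an hourly grid from `start` to `end`; the helper `hourly_grid` is hypothetical and not part of ricu.

```r
# Hypothetical helper: hourly grid between two absolute times (in hours).
hourly_grid <- function(start, end) {
  if (end < start) return(numeric(0))  # window appears to run backwards
  seq(start, end, by = 1)
}

start    <- 13  # first observation of patient 228977 (hours)
duration <- 6   # length of the observation window (hours)

# Duration misread as an absolute end time: empty grid, patient is dropped
misread <- hourly_grid(start, duration)

# Duration converted to an absolute end time first: 13, 14, ..., 19
fixed <- hourly_grid(start, start + duration)

length(misread)  # 0
length(fixed)    # 7
```

The two lengths match the row counts of `filled_cur` and `filled_upd` shown above for this patient.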

Discrepancies with preprint numbers

Problem

I am trying to replicate the MIMIC III numbers given in the preprint Moor et al. (2021), which relies on this package for preprocessing. Unfortunately, I am unable to replicate their reported patient attrition (Extended Data Figure 1) and the overlap between different definitions of suspected infection (Supplementary Figure 14).

I appreciate that this question isn't directly related to ricu and rather belongs in the paper repo. However, that repo unfortunately isn't public yet. Since there is an overlap in authors (@nbenn), I hoped to get some early help here.

Patient attrition

Let's start with patient attrition. I start with the same number of patients in MIMIC III and remove the same number of patients due to age < 14 years, leaving me with 53,432 patients.

nrow(load_id(mimic$icustays))
#> [1] 61532

load_concepts("age", src = "mimic", verbose = FALSE) %>% 
   filter(age < 14) %>%
   nrow()
#> [1] 8100

However, that's when the agreement stops. Moving on to the requirement of sepsis onset in the ICU, I get a lower number of exclusions when filtering for sepsis cases inside the ICU:

load_concepts("sep3", src = "mimic", verbose = FALSE) %>% 
  inner_join(icustays_adults, by = "icustay_id") %>%  # icustays_adults == load_id(mimic$icustays) with age >= 14
  filter(charttime < intime | outtime < charttime) %>%
  nrow()
#> [1] 3062

This misses almost 2,000 patients that were excluded in the preprint. I played around with different definitions of "inside the ICU", but to no avail. Checking for sepsis onset between the 4th and 168th hour also returns a lower number than reported, at 10,051:

load_concepts("sep3", src = "mimic", verbose = FALSE) %>% 
  semi_join(icustays_inside, by = "icustay_id") %>% # icustays_inside== icustays_adults without  sepsis < intime or outtime < sepsis
  filter(charttime < 4 | charttime > 168)

There are also some discrepancies with the remaining exclusion criteria but those are lesser and I am unsure whether this is just because I messed up the previous exclusions. As a side note, the total number of exclusions 8100+4861+10359+486+472+681=24959 mentioned in the paper also does not agree with the final cohort size of 36591 (61532-24959=36573).
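
The arithmetic behind that side note can be checked directly. This short sketch uses only the numbers quoted above; the variable names are illustrative.

```r
# Exclusion counts and cohort sizes as quoted from the preprint
excluded      <- c(8100, 4861, 10359, 486, 472, 681)
total_stays   <- 61532
reported_size <- 36591

n_excluded    <- sum(excluded)              # 24959
implied_size  <- total_stays - n_excluded   # 36573
unexplained   <- reported_size - implied_size  # 18 stays unaccounted for
```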

Suspicion of infection

Similar to the above, I arrive at different numbers when attempting to replicate the Venn diagram in Supplementary Figure 14. According to the diagram, there should be 8652+23556=32208 ICU stays with antibiotics and fluid sampling according to Sepsis-3 criteria in the dataset. As far as I understand, this is implemented in the susp_inf concept. However, loading this concept gives

load_concepts("susp_inf", src = "mimic", verbose = FALSE) %>% 
   distinct(icustay_id) %>% 
   nrow()
#> [1] 31070

Obtaining the multi-Abx subset 1844+23556=25400 is further complicated by the fact that the preprint does not specify what abx_min_count and abx_count_win were used. Assuming abx_min_count=2 and abx_count_win=hours(24L), we get

ricu::susp_inf(
    list(abx = abx, samp = samp),
    si_mode = "abx",
    abx_min_count = 2,
    abx_count_win = hours(24L)
  ) %>% 
  filter(susp_inf) %>% 
  distinct(icustay_id) %>%
  nrow()
#> [1] 21481

This is again smaller than what we find in the preprint. Expanding the timeframe to abx_count_win=hours(48L), we get a closer matching but still lower number at 23,243.

Question

I appreciate how hard it is to present full replication when working with EHR data and complicated definitions such as Sepsis-3. I'd be grateful for any suggestions as to where I might be going wrong or what might explain the above differences.

SaO2 and SpO2 are conflated

The definition of oxygen saturation (concept o2sat) currently does not distinguish between oxygen saturation measured peripherally via pulse oximetry (SpO2) and directly via blood gas analysis (SaO2). While SpO2 is measured continuously at the bedside, SaO2 must be ordered from the lab.

Source    | SpO2                                      | SaO2
AUMC      | numericitems: 6709, 8903                  | numericitems: 12311
eICU      | vitalperiodic: sao2                       | lab: O2 Sat (%) ?
HiRID     | observation: 4000, 8280                   | observation: 20000800
MIMIC III | chartevents: 646, 220277, 226253+, 50817^ | chartevents: 834*, 220227*
MIMIC IV  | chartevents: 220277, 226253+              | labevents: 220227*, 50817

? unclear if SpO2 or SaO2 - assigned to the most likely category based on table of origin
* not yet included in any ricu concept
+ abnormal flag for lower limit rather than an actual measured value
^ does not seem to exist in the data

Problem

While SpO2 is a good approximation of SaO2 when saturation is above 90%, this is not necessarily the case at lower saturations (see for example Lebecque, 1991). It is also treated separately in the PhysioNet 2019 challenge, a common benchmark task for sepsis prediction.

Proposed solution

Split this concept into two separate concepts that may be combined if this differentiation isn't needed. While there needs to be a balance between granularity and usability when defining the concepts, I believe these are important enough and included in many studies, warranting a more granular definition.
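
As a rough illustration of how two separate concepts could still be combined when the distinction does not matter, here is a base R sketch. The column names `spo2` and `sao2` and all values are made up for the example; this is not how ricu currently defines any concept.

```r
# Hypothetical hourly measurements for one patient (values are made up)
obs <- data.frame(
  hour = 0:3,
  spo2 = c(97, 95, NA, 88),   # pulse oximetry
  sao2 = c(NA, 94, 91, NA)    # blood gas analysis
)

# Combined saturation: prefer SaO2 where available, fall back to SpO2
obs$o2sat <- ifelse(is.na(obs$sao2), obs$spo2, obs$sao2)

obs$o2sat  # 97 94 91 88
```

Preferring SaO2 is only one possible rule; a study could equally prefer SpO2 or keep both columns.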

Edit: fixed link

Finding dosage treatments for patients in MIMIC-III

Hi, I'm a doctoral student interested in using this database.

My PI and I are particularly interested in looking at different treatments for patients, and in particular those treatments should have different dosages.

Does the ricu/mimic-iii data have anything on that?

Thanks!

Comorbidity measures across datasets

In order to analyze data across datasets (MIMIC, eICU, HiRID, and Amsterdam), are there equivalent comorbidity measures across these datasets, i.e., classification schemas and scores using ICD-9/10 codes as defined by Charlson and Elixhauser?

Elixhauser index

Hi,
Is there a way to calculate the Elixhauser comorbidity index using the ricu package?

-Arkaprava

Recursively passing arguments to lower-level concepts

Some callbacks rely on additional arguments depending on the database. For example, eICU does not contain reliable fluid sampling, so sepsis definitions may need to be based on antibiotics alone. susp_inf provides the si_mode parameter for this purpose.

library(ricu)
so <- load_concepts("sofa", "eicu_demo", verbose = FALSE)
si <- load_concepts("susp_inf", "eicu_demo", si_mode = "abx", verbose = FALSE)
se <- sep3(sofa = so, susp_inf = si)

This will lead to a different result than the default sepsis-3 calculation:

se_std <- load_concepts("sep3", "eicu_demo", verbose = FALSE)
setequal(se, se_std)
#> [1] FALSE

Problem

Currently, the above cannot be loaded directly

se_dir <- load_concepts("sep3", "eicu_demo", si_mode = "abx", verbose = FALSE)
#> Error in `assert_that()`:
#> ! setequal(x = names(dots), y = concepts) is not TRUE

The reason for this is twofold:

  1. collect_dots currently only allows for concepts to be passed via ..., which leads to the above error when the sep3 callback is executed. I may be missing something here but to me this seems unnecessarily restrictive.
  2. Even if collect_dots accepted other parameters, those parameters would not be passed down to the lower-level concepts.

ricu/R/concept-load.R

Lines 496 to 498 in 4a64d6b

ext <- list(patient_ids = patient_ids, id_type = id_type,
interval = coalesce(x[["interval"]], interval),
progress = progress)

(Potential) Solution

Change the behaviour of collect_dots and add ... to ext.
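
The general idea of forwarding extra arguments through `...` can be sketched in plain R. The two functions below are hypothetical stand-ins for a high-level and a low-level callback; they are not ricu functions, and the default `si_mode = "and"` is an assumption made for the example.

```r
# Hypothetical stand-in for a low-level concept callback
low_level <- function(x, si_mode = "and", ...) {
  paste0("susp_inf(", x, ", si_mode = ", si_mode, ")")
}

# Hypothetical stand-in for a high-level concept that depends on it
high_level <- function(x, ...) {
  # Forward any extra arguments unchanged to the lower-level callback
  low_level(x, ...)
}

default_res <- high_level("eicu_demo")
custom_res  <- high_level("eicu_demo", si_mode = "abx")

default_res  # "susp_inf(eicu_demo, si_mode = and)"
custom_res   # "susp_inf(eicu_demo, si_mode = abx)"
```

In a real implementation the forwarded arguments would probably need filtering, so that each lower-level concept only receives the parameters it understands.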

Strange error when I download miiv dataset

Description:
I am encountering an unusual error while attempting to download the miiv dataset. The error message I am receiving is as follows:

Error in `assert_that()`:
! core/omr.csv.gz was not found among the provided options. Did you
possibly mean hosp/drgcodes.csv.gz instead?

Code Snippet:
Here is the relevant code snippet that is causing the error:

admissions_df <- as.data.frame(miiv$admissions)
icustays_df <- as.data.frame(miiv$icustays)
patients_df <- as.data.frame(miiv$patients)

Issue Background:
I have tried to use the above code to download the miiv dataset, but I am consistently encountering this error. Strangely, I have successfully used a similar approach to download the MIMIC III dataset without any issues.

Question:
Could this error possibly be caused by a version difference in the miiv dataset compared to MIMIC III? I am using the latest versions of the packages and libraries, but this issue persists.

Request for Assistance:
I am new to R and find it difficult to resolve the error by reading the source code. I would greatly appreciate any guidance or insights regarding this error. If anyone has encountered a similar problem or has suggestions on how to resolve it, please share your thoughts.

Thank you for your help!

How to add `caregiver` concept?

I am interested in variables that describe possible heterogeneity in the data. One such variable is cgid (caregiver ID) in the chartevents table of MIMIC III:

CGID is the identifier for the caregiver who validated the given measurement

(ref).

I would like to add a concept that returns, for each hour where at least one measurement was entered into the system, the list of caregivers involved. I tried the following:

  "caregiver": {
    "category": "misc",
    "sources": {
      "mimic": {
        "table": "chartevents",
        "val_var": "cgid",
        "target": "ts_tbl",
        "class": "col_itm"
      }
    }
  },

However, this raises

Error in eval(assertion, env) : 
  argument "ids" is missing, with no default

How should I go about adding such a concept?

CSV datafiles are already downloaded how to connect with ricu package (Windows)

First of all, great package. I have already downloaded all the data from PhysioNet as compressed CSV files for MIMIC III and MIMIC IV, but I am finding it very hard to tell ricu where they are located. I tried load_src_cfg to configure my CSV files by passing the path of the folder containing all the files, but that did not work. I can re-download all the files if that simplifies everything and connects them to the package automatically. However, I could not work out that part either: which function to use to re-download the files, how to provide the credentials, and how to tell the function which dataset I want to download.

-Arkaprava

! Access to the requested resource was denied.

I have obtained the relevant access rights and entered the correct account password, but I still cannot access the data.

Data for miiv is missing
Setup now (Y/n)? Y
username: XXXXX
password: XXXXX
Error in stop_ricu():
! Access to the requested resource was denied. Please set up an account at https://physionet.org/ and apply for data
access.
Run rlang::last_error() to see where the error occurred.

Dextrose

HiRID:

Glucose 20%/100ml Pflege   1000746
Glucose 20% /100ml         1000544
Glucose 40%                1000545
Glucose 50%                1000567
Glucose 30%                1000060
Glucose 20% /500ml         1000835
Glucose 10%                1000022
Glucose 10%                1000690
Glucose 20%                1000689

AUMC:

Glucose 5%    7257
Glucose 40%   8940
Glucose 5%    9569
Glucose 40%   9571
Glucose 20%   7255
Glucose 10%   7254
Glucose 30%   7256

MIMIC-III:

Dextrose 5%    220949
Dextrose 50%   220952
Dextrose 10%   220950
Dextrose 20%   228140

Dextrose 10%   30016
Dextrose 20%   30017
Dextrose 5%

Warning about missing rows upon importing MIMIC-IV dataset using ricu

Hello, I noticed the following warnings after downloading and importing the datasets.

Although the download log reports that the checksums were verified successfully, executing import_src("miiv") displays the following warnings.

Successfully imported 31 tables

... 
1: expected 26850359 rows but got 13294903 rows for table `emar` 
2: Encountered parsing problems for file poe.csv.gz:
  • [33719551, NA]: got '1 columns' instead of 12 columns 
3: expected 39366291 rows but got 33719550 rows for table `poe` 

win_tbl interval determination

(@nbenn)
Below is a minimal reproducible example:

> res <- as_id_tbl(
+   data.table(id = 1, start = hours(1), end = hours(2)),
+   id_vars = "id"
+ )
> as_win_tbl(res, index_var = "start", dur_var = "end", by_ref = TRUE)
Error in `assert_that()`:
! x is not compatible with an interval of Inf hours
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In min(numeric(0), na.rm = TRUE) :
 no non-missing arguments to min; returning Inf

It seems that the error comes from L378 in tbl-class.R, in which interval(index_col(res)) is called, which throws this error.
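
The warning in the output hints at the mechanism. Assuming the interval is inferred from the smallest gap between consecutive index values (an assumption about the internals, not confirmed here), a table with a single row leaves nothing to compare. A base R sketch:

```r
times <- 1             # a single start time (in hours)
steps <- diff(times)   # numeric(0): no consecutive pairs to compare

n_steps <- length(steps)  # 0

# min() of an empty vector warns and returns Inf
smallest <- suppressWarnings(min(steps, na.rm = TRUE))
is.infinite(smallest)  # TRUE

# One possible guard: fall back to a default when no gap can be computed
default_interval <- 1  # hypothetical default of 1 hour
interval <- if (n_steps == 0L) default_interval else smallest
```

Whether a fixed default is the right fallback is a design decision for the package; the sketch only shows where the `Inf` comes from.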

Error while importing in `setup_src_data("mimic")`

Hi, thanks for the great package!

I verified that I have the data in data_dir().

Here's a reprex:

ricu::setup_src_data("mimic")
#> The requested tables have already been downloaded
#> ── Importing 23 tables for `mimic` ─────────────────────────────────────────────
#> Warning: Encountered parsing problems for file CHARTEVENTS.csv.gz:
#>   • [7746655, 9]: got '' instead of closing quote at end of file
#>   • [7746655, NA]: got '9 columns' instead of 15 columns
#> • chartevents chunk 1
#> Error in setorderv(dat, sort_col): some columns are not in the data.table: ITEMID

Created on 2023-02-15 with reprex v2.0.2

ENH: make aggregate allow for arbitrary functions from the .json

Problem

concept-dict.json allows specifying a standard aggregation for concepts. However, this currently only works for functions that are known to dt_gforce, as aggregation functions specified as strings are passed directly to dt_gforce (see also #36).

Solution

We could simply check for any function that is known to dt_gforce and pass those on. If we get another function, we try to parse it.
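
A minimal sketch of this dispatch in base R follows. The vector `fast_funs` is a hypothetical stand-in for the names dt_gforce knows about, and `resolve_aggregate` is an illustrative helper, not an existing ricu function.

```r
# Hypothetical set of names handled by the fast path (stand-in for dt_gforce)
fast_funs <- c("mean", "median", "min", "max", "sum", "first", "last")

resolve_aggregate <- function(spec) {
  if (spec %in% fast_funs) {
    return(list(type = "fast", fun = spec))
  }
  # Otherwise parse the string and require that it evaluates to a function
  fun <- eval(parse(text = spec), envir = baseenv())
  stopifnot(is.function(fun))
  list(type = "custom", fun = fun)
}

a <- resolve_aggregate("median")
b <- resolve_aggregate("function(x) max(x) - min(x)")

a$type              # "fast"
b$type              # "custom"
b$fun(c(3, 10, 4))  # 7
```

Evaluating strings from a configuration file executes arbitrary code, so a real implementation may want to restrict what is accepted.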

Identifying available time stamped variables

I came across your paper on ricu, the R interface to intensive care data. I am not a statistician; most of my work is administrative and clinical. I am, however, interested in exploring research with critical care databases, and I have a PhysioNet account. As a non-statistician, how can I find out which variables have time stamps and which outcomes are available in each database (MIMIC, Amsterdam, HiRID, Berlin)? My hope is to develop a data analysis plan before finding a statistician.

Incorrect times when downgrading ID

Downgrading the ID from icustay to patient leads to incorrect negative times.

x <- load_concepts('death', 'mimic_demo')
x
#> # A `ts_tbl`: 40 ✖ 3
#> # Id var:     `icustay_id`
#> # Index var:  `index_var` (1 hours)
#>    icustay_id index_var death
#>         <int> <drtn>    <lgl>
#>  1     201006 151 hours TRUE
#>  ...

change_id(x, 'subject_id', 'mimic_demo')[icustay_id == 201006]
#> # An `id_tbl`: 1 ✖ 4
#> # Id var:      `subject_id`
#>   subject_id icustay_id index_var      death
#>        <int>      <int> <drtn>         <lgl>
#> 1      10076     201006 -36212946 mins TRUE

The above should be positive, as it would otherwise imply negative lifetime. I believe the issue is the following subtraction in change_id_helper, which works fine for upgrade_id but is incorrect for downgrade_id.

res <- res[, c(cols) := lapply(.SD, `-`, get(sft)), .SDcols = cols]

For downgrade_id, this should be an addition.
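
The sign convention can be illustrated with made-up numbers. In this sketch, `icu_offset` is the hypothetical time of ICU admission measured on the coarser (hospital or patient) time axis; the function names are illustrative and do not mirror ricu internals.

```r
icu_offset <- 48   # hypothetical: ICU admission 48 h after the coarser origin
t_icu      <- 151  # event time relative to ICU admission (hours)

# ICU-relative -> coarser axis: add the offset
downgrade <- function(t, offset) t + offset
# Coarser axis -> ICU-relative: subtract the offset
upgrade   <- function(t, offset) t - offset

t_coarse  <- downgrade(t_icu, icu_offset)   # 199
roundtrip <- upgrade(t_coarse, icu_offset)  # 151, recovers the input
```

Applying the subtraction in both directions breaks this round trip, which is consistent with the negative time reported above.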

Error when trying to `upgrade_id` during `load_win`

Problem

When trying to load an itm, it may be necessary to upgrade the ids, e.g., when loading prescriptions (hadm) in miiv with id_type='icustay'. This leads to an error if the target is a win_tbl

az_wi <- new_itm(
   src = "miiv", 
   table = "prescriptions", 
   sub_var = "drug", 
   regex = "aztreonam", 
   dur_var = "stoptime", 
   class = "rgx_itm", 
   target = "win_tbl")
res <- do_itm_load(az_wi)
#> Error in colnamesInt(x, names(on), check_dups = FALSE) :
#>  argument specifying columns specify non existing column(s): cols[3]='stoptime'

This error originates in the following lines because stoptime is part of meta_vars(x)

ricu/R/data-utils.R

Lines 681 to 684 in 09902bd

map <- id_map(src, id_vars(x), target_id, sft, idx)
res <- map[x, on = meta_vars(x), roll = -Inf, rollends = TRUE]
res <- res[, c(cols) := lapply(.SD, `-`, get(sft)), .SDcols = cols]

but stoptime was removed previously in

ricu/R/data-load.R

Lines 440 to 442 in 09902bd

time_vars <- setdiff(intersect(time_vars, colnames(res)), dur_var)
res <- change_id(res, id_var, x, cols = time_vars, keep_old_id = FALSE)

Expected behaviour

No error with the following result

az_ts <- new_itm(
   src = "miiv", 
   table = "prescriptions", 
   sub_var = "drug", 
   regex = "aztreonam", 
   dur_var = "stoptime", 
   class = "rgx_itm", 
   target = 'ts_tbl')
res <- do_itm_load(az_ts)
res[, "stoptime" := get("stoptime") - get("starttime")]
as_win_tbl(res, dur_var="stoptime")
#> # A `win_tbl`:  1,363 ✖ 5
#> # Id var:       `stay_id`
#> # Index var:    `starttime` (1 hours)
#> # Duration var: `stoptime`
#>        stay_id starttime stoptime  drug      dose_val_rx
#>          <int> <drtn>    <drtn>    <chr>     <chr>
#>     1 30025040   1 hours  34 hours Aztreonam 1000
#> ...

Solution

Add a dedicated function upgrade_id.win_tbl.

Floating point issues in `sofa_cardio`

I stumbled over a minor inconsistency in sofa_cardio that assigns incorrect scores, most likely due to a floating point issue. See the following reprex with the latest main branch.

Local .Rprofile detected at /Users/patrick/projects/ricu-package/.Rprofile

devtools::load_all()
#> ℹ Loading ricu
#> 
#> ── ricu 0.6.0 ──────────────────────────────────────────────────────────────────
#> 
#> The following data sources are configured to be attached:
#> (the environment variable `RICU_SRC_LOAD` controls this)
#> 
#> ✔ mimic: 26 of 26 tables available
#> ✔ mimic_demo: 25 of 25 tables available
#> ✔ eicu: 31 of 31 tables available
#> ✔ eicu_demo: 31 of 31 tables available
#> ✔ hirid: 5 of 5 tables available
#> ✔ aumc: 7 of 7 tables available
#> ✔ miiv: 31 of 31 tables available
#> ✖ sic: 7 of 8 tables available
#> 
#> ────────────────────────────────────────────────────────────────────────────────

src = "mimic_demo"

map = load_concepts("map", src = src)
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • map  ◯ removed 22 (0.14%) of rows due to `NA` values  ◯ removed 13 (0.08%) of rows due to out of range entries
#> ────────────────────────────────────────────────────────────────────────────────
dopa60 = load_concepts("dopa60", src = src)
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • dopa60• dopa_rate  ◯ removed 2788 (5.21%) of rows due to `NA` values• dopa_dur
#> ────────────────────────────────────────────────────────────────────────────────
norepi60 = load_concepts("norepi60", src = src)
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • norepi60• norepi_rate  ◯ removed 3482 (2.62%) of rows due to `NA` values  ◯ removed 12 (0.01%) of rows due to out of range entries• norepi_dur
#> ────────────────────────────────────────────────────────────────────────────────
dobu60 = load_concepts("dobu60", src = src)
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • dobu60• dobu_rate  ◯ removed 240 (6.77%) of rows due to `NA` values• dobu_dur
#> ────────────────────────────────────────────────────────────────────────────────
epi60 = load_concepts("epi60", src = src)
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • epi60• epi_rate• epi_dur
#> ────────────────────────────────────────────────────────────────────────────────

# Get sofa_cardio as currently implemented ----------------------------------------
input = mget(c("map", "dopa60", "norepi60", "dobu60", "epi60"))
cur = sofa_cardio(input)

# Recalculate with a function that adjusts for floating point imprecision ---------
sofa_cardio_upd <- function(..., interval = NULL) {
  score_calc <- function(map, dopa, norepi, dobu, epi) {
    eps <- .Machine$double.eps ^ 0.5
    fifelse(
      is_true(dopa > 15 + eps | epi > 0.1 + eps | norepi > 0.1 + eps), 4L, fifelse(
        is_true(dopa > 5 + eps | (epi > 0 + eps &    epi <= 0.1 + eps) |
                  (norepi > 0 + eps & norepi <= 0.1 + eps)), 3L, fifelse(
                    is_true((dopa > 0 + eps & dopa <= 5 + eps) | dobu > 0 + eps), 2L, fifelse(
                      is_true(map < 70), 1L, 0L
    ))))
  }
  cnc <- c("map", "dopa60", "norepi60", "dobu60", "epi60")
  dat <- collect_dots(cnc, interval, ..., merge_dat = TRUE)
  dat <- dat[, c("sofa_cardio") := score_calc(
    get("map"), get("dopa60"), get("norepi60"), get("dobu60"), get("epi60")
  )]
  #dat <- rm_cols(dat, cnc, by_ref = TRUE) <---- keep variables to investigate diffs
  dat
}

upd = sofa_cardio_upd(input)


# Compare the results obtained with each method -------------------------------------

# We can see that the results are not equal
setequal(cur, upd)
#> [1] FALSE

# There is a difference in 6 rows, which should have been classified as 3 but were 4.
# The updated function gets it right. 
comb = merge(upd, cur, by = c("icustay_id", "charttime"), suffixes = c(".upd", ".cur"))
setnames(comb, c("sofa_cardio.cur", "sofa_cardio.upd"), c("cur", "upd"))
as.data.frame(comb[cur != upd])
#>   icustay_id charttime  map dopa60 norepi60 dobu60 epi60 upd cur
#> 1     203766  82 hours 81.0      6      0.1     NA    NA   3   4
#> 2     223177  29 hours 75.5      4      0.1     NA    NA   3   4
#> 3     241562  -1 hours 78.5     NA      0.1     NA    NA   3   4
#> 4     248755  49 hours 65.0     NA      0.1     NA    NA   3   4
#> 5     248755 104 hours 63.0     NA      0.1     NA    NA   3   4
#> 6     248755 105 hours 61.0     NA      0.1     NA    NA   3   4

Created on 2024-04-11 with reprex v2.1.0
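
The underlying effect is easy to show in isolation. This base R sketch uses the same tolerance as `sofa_cardio_upd` above, applied to a textbook floating point example rather than to ricu data.

```r
x <- 0.1 + 0.2

# Mathematically x equals 0.3, but the stored double is slightly larger
naive <- x > 0.3          # TRUE

# Comparing against the threshold plus a small tolerance avoids this
eps      <- .Machine$double.eps ^ 0.5
tolerant <- x > 0.3 + eps  # FALSE
```

A rate that is stored as a value marginally above 0.1 would exceed a strict `> 0.1` threshold in the same way.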

Dataset is not attached

Thanks for your amazing work! I downloaded the MIMIC-IV dataset from PhysioNet myself and then tried to import it using ricu.
Here are my steps:
I used import_src to load the data and attach_src to attach the source.
However, when I ran the last line, cohort <- load_concepts(static_features, "miiv", verbose = TRUE), I got an error:

> cohort <- load_concepts(static_features, "miiv", verbose = TRUE)
Error in `assert_that()`:
! miiv was not found among the provided options. Did you possibly mean mimic_demo instead?
Backtrace:
    ▆
 1. ├─ricu::load_concepts(static_features, "miiv", verbose = TRUE)
 2. └─ricu:::load_concepts.character(static_features, "miiv", verbose = TRUE)
 3.   ├─ricu::load_concepts(...)
 4.   └─ricu::load_dictionary(src, x, name = dict_name, cfg_dirs = dict_dirs)
 5.     └─ricu:::assert_that(are_in(src, avail))

Could you help check if I missed some steps? Thanks!

Here's my code:

library(ricu)
library(dplyr)
import_src("miiv", 'physionet.org/files/mimiciv/2.2')
attach_src("miiv")
table_concepts <- explain_dictionary()
static_features <- table_concepts$name[table_concepts$category == "demographics"]
cohort <- load_concepts(static_features, "miiv", verbose = TRUE)

Inquiry on Calculating Different Types of Death and Out of Range Entries in ricu Package

I hope this message finds you well. I am currently utilizing the ricu package from GitHub to analyze clinical data, and I have encountered a couple of points where I seek your expert guidance.

  1. Differentiating Types of Death:
    I am particularly interested in distinguishing between different types of death outcomes using the death variable. Specifically, I would like to categorize and calculate the following:

  • ICU Death: Patients who passed away during their ICU stay.
  • In-Hospital Death: Patients who passed away in the hospital but not necessarily within the ICU.
  • Post-Discharge Death: Patients who passed away within 30 days after being discharged from the hospital.

Could you please provide guidance or recommend a methodology on how I can compute these distinct outcomes using the available data in ricu? Are there specific fields or transformations that I should consider for accurate categorization?

  2. Understanding "Out of Range Entries":
    After downloading data using the load_concepts function, I noticed that the output includes warnings about "out of range entries," as illustrated in the attached image.

For example:

  • po2: Removed rows due to NA values and out of range entries.
  • fio2: Removed rows due to NA values and out of range entries.
  • plt: Removed rows due to NA values and out of range entries.

I am keen to understand the criteria or standards used to determine these out-of-range entries. Could you please elaborate on the methods or thresholds applied to classify data as out of range? Is there a way to access or customize these criteria?

Thank you very much for your assistance. Your insights will be highly valuable for my research, and I appreciate any documentation or resources you can provide to help clarify these aspects.

Best regards,

How to unify time axes within the same dataset ?

Hello, I have a question regarding the handling of time axes in ricu.

p1
p2

How could we recreate p1 from p2 (see the ricu PDF, pp. 6-7)? That is, we have multiple sets of features, each with its own time axis.

This varies per dataset, since each has multiple time axes (5 for MIMIC and 7 for eICU) and the features are grouped differently along each. Is there a guideline for unifying time axes? Should we use specific functions for some APIs?
(I load all concepts individually, for size and speed reasons.)
ricu::load_concepts(column_name, src_name, verbose = FALSE)

Have a nice day and thank you for your help!

exceed 2^31-1 bytes

Hello, I am trying to use ricu with the sic dataset, but I am running into the issue below. Any ideas?

sic$laboratory
Data for `sic` is missing
Setup now (Y/n)? Y
The requested tables have already been downloaded
── Importing 8 tables for `sic` ───────────────────────────────────────────────────
Error in paste(do.call("c", msg), collapse = "\n") : 
  result would exceed 2^31-1 bytes
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Problem loading HIRID dataset

Hi, I am trying to load the HIRID dataset.
I run the code:
library(ricu)
attach_src("hirid", data_dir = 'PATH/TO/physionet.org/files/hirid/1.1.1/raw_stage')
ricu::src_data_avail("hirid")

This gives no error, but the dataset still seems to be unattached:
   name available tables total
1 hirid     FALSE      0     5

The folder architecture is like this:
physionet.org/
└── files
    └── hirid
        └── 1.1.1
            ├── imputed_stage
            ├── merged_stage
            ├── raw_stage
            │   ├── observation_tables
            │   │   └── parquet
            │   ├── observation_tables_parquet
            │   ├── pharma_records
            │   │   └── parquet
            │   └── pharma_records_parquet
            └── reference_data

Thanks in advance!

[AUMC] Error in difftimes for second ICU stays

AUMCdb, like eICU, does not provide dates but instead reports times since admission. For this reason, both databases are loaded in ricu via the load_eiau function.

Problem

The calculation of times -- while similar -- is not equal between eICU and AUMCdb. While in eICU times denote the number of minutes from unit admit time, in AUMCdb they represent milliseconds since the first ICU admission. That is, while eICU provides the time since the current admission, AUMCdb gives the time since the first admission. load_eiau, however, treats them both the same, which leads to incorrect times for AUMCdb.

Example

library(ricu)

# Two admissions for one patient. Note that the first admittedat is always 0, 
# whereas the second is relative to the first
aumc$admissions[aumc$admissions$patientid == 38, c("admissionid", "admittedat", "dischargedat")]
#>    admissionid admittedat dischargedat
#> 1:          40          0     56820000
#> 2:          41  164280000    325620000

# For heart rates during the first admission, everything works correctly
aumc$numericitems[aumc$numericitems$admissionid == 40 & aumc$numericitems$itemid == 6640, c("admissionid", "measuredat", "value")][1]
#>    admissionid measuredat value
#> 1:          40     300000    86
load_concepts("hr", "aumc", patient_ids = 40, interval = mins(1L))[1]
#>   admissionid measuredat    hr
#>         <int> <drtn>     <dbl>
#> 1          40 5 mins        86
ricu:::ms_as_mins(300000 - 0) # <-- correct
#> Time difference of 5 mins

# For heart rates during the second admission, ricu calculates difftimes since *first* admission 
aumc$numericitems[aumc$numericitems$admissionid == 41 & aumc$numericitems$itemid == 6640, c("admissionid", "measuredat", "value")][1]
#>    admissionid measuredat value
#> 1:          41  167280000   118
load_concepts("hr", "aumc", patient_ids = 41, interval = mins(1L))[1]
#>   admissionid measuredat    hr
#>         <int> <drtn>     <dbl>
#> 1          41 2788 mins    118
ricu:::ms_as_mins(167280000 - 0) # <-- incorrect
#> Time difference of 2788 mins
ricu:::ms_as_mins(167280000 - 164280000) # <-- should be this
#> Time difference of 50 mins

Note that the output above has been shortened a little for readability.
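
The correction amounts to subtracting the admission's own `admittedat` before converting to minutes. A base R sketch using the values from the example; `ms_to_mins` is a hypothetical helper written for this illustration, not the internal `ricu:::ms_as_mins`.

```r
# Hypothetical helper: milliseconds to minutes
ms_to_mins <- function(ms) ms / 60000

admittedat <- 164280000  # start of admission 41 (ms since first admission)
measuredat <- 167280000  # first heart rate recorded during admission 41

since_first   <- ms_to_mins(measuredat)               # 2788 (current result)
since_current <- ms_to_mins(measuredat - admittedat)  # 50 (expected result)
```

For first admissions `admittedat` is 0, so both calculations agree, which is why the problem only appears for later ICU stays.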

namespace conflict with units package

First off, this package is amazing! This is exactly what I have been looking for and this will save me some major headaches down the line. Thank you for doing all the hard work for me and creating this package.

Unfortunately, while exploring the package I've run into an (apparent) conflict when using the units package with ricu.

Problem

After installing units=0.8-8 and restarting the R session, loading ricu fails with the following error message:

library(ricu)
#> Error: package or namespace load failed for 'ricu':
#>  .onLoad failed in loadNamespace() for 'ricu', details:
#>   call: ud_map_symbols(symbol, ut_unit)
#>   error: Unit already maps to "h"

This problem does not occur if I install units while ricu is already loaded. All bits that I've tried so far (mainly the examples in the JSS paper) work in that case --- until I restart the R session, after which the above error occurs again.

Session info

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.28   withr_2.4.2     magrittr_2.0.1  reprex_2.0.1   
#>  [5] evaluate_0.14   highr_0.9       stringi_1.7.5   rlang_0.4.12   
#>  [9] cli_3.1.0       rstudioapi_0.13 fs_1.5.0        rmarkdown_2.11 
#> [13] tools_4.1.2     stringr_1.4.0   glue_1.5.0      xfun_0.28      
#> [17] yaml_2.2.1      fastmap_1.1.0   compiler_4.1.2  htmltools_0.5.2
#> [21] knitr_1.36

New database available: Salzburg Intensive Care database (SICdb)

Recently, a new database from Austria with ~30,000 patients was published on PhysioNet.

I think it would be great to include SICdb in ricu. To that end, I already contacted the team behind SICdb and @nrodemund kindly agreed to advise on the mapping of clinical concepts. I already started working on this in #30 and was able to load all tables and provide a first proof-of-concept mapping for crea (to be confirmed clinically).

The following two questions have come up so far:

  • Currently, adding a new database requires edits in many places, including core ricu files. It might be worth thinking about how all functionality that concerns only a single database could be pooled in a single file or subdirectory, or maybe even in a separate add-on package.
  • SICdb uses a stream of IEEE floats to succinctly provide 1-minute resolution on some measurements. This needs some additional preprocessing, which can either be done in the form of do_callback.sic_itm or once during import_src. The latter option would require the currently non-existing ability to provide callbacks that are called during data import.
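The decoding step mentioned in the second point can be sketched in base R as follows. This is only an illustration under stated assumptions: that each value is a 4-byte single-precision float, that the byte order is little-endian, and that the values are one minute apart starting at offset 0. None of this is confirmed here, and decode_float_stream is a hypothetical helper, not an existing ricu function. The values in the round trip are made up.

```r
# Hypothetical helper: decode a raw vector holding consecutive IEEE 754
# single-precision values into a numeric vector.
# Assumptions (not confirmed by the source): 4 bytes per value, little-endian.
decode_float_stream <- function(bytes, endian = "little") {
  stopifnot(is.raw(bytes), length(bytes) %% 4L == 0L)
  readBin(bytes, what = "numeric", n = length(bytes) %/% 4L,
          size = 4L, endian = endian)
}

# Round trip with made-up values: encode three measurements, then decode them.
vals <- c(36.5, 37, 38.25)
blob <- writeBin(vals, raw(), size = 4L, endian = "little")
decoded <- decode_float_stream(blob)

# Attach minute offsets, assuming one value per minute starting at offset 0.
res <- data.frame(minute = seq_along(decoded) - 1L, value = decoded)
print(res)
```

Whether a function like this would run inside a per-item callback or once at import time is exactly the design question raised above; the sketch is the same in either case.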
