
dcarte's Introduction

DCARTE logo

DCARTE: UK-DRI CAre Research & TEchnology data ingestion tools. It is currently the simplest way to access both the ongoing data collected by the UKDRI-CRT project and the legacy data collected by the TIHM project.

The tools were developed by Dr Eyal Soreq to standardize data-driven analysis across data domains (IoT, behavioral, physiological, etc.). The datasets are collected as part of the ongoing UKDRI study, which aims to advance our understanding of the different manifestations of dementia. The package runs either on the Imperial College London RDS cluster or on your own computer with Python >3.9.

Installation Options

  1. Install with pip
    • $ pip install -U dcarte

Usage

See example Jupyter notebooks
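For instance, a minimal usage sketch based on the load() calls that appear in the issues below (the dataset and domain names are taken from this page):

import dcarte

# load a registered dataset by name and domain; reads the cached parquet if present
motion = dcarte.load('Motion', 'base')
# pass update=True to refresh the dataset from the server
motion = dcarte.load('Motion', 'base', update=True)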

Inputs

UKDRI-CRT legacy data sets

| Filename | Description | Duration |
| --- | --- | --- |
| tihm15.zip | Data collected by the TIHM IoT system during the TIHM 1.5 project (extension) | 2018-2019 |
| tihmdri.zip | Data collected by the TIHM IoT system during the DRI project | 2019-2021 |

The TIHM system has undergone numerous iterations during this time. These exported datasets are extracted from historic backups taken at the end of each project. The original databases are MongoDB and do not have a consistent schema. The CSV exports harmonise these into a simplified, consistent tabular format.

How to Contribute

  1. Clone the repo and create a new branch: $ git clone https://github.com/esoreq/dcarte && cd dcarte && git checkout -b name_for_new_branch.
  2. Make your changes and test them
  3. Submit a pull request with a comprehensive description of the changes

dcarte's People

Contributors

esoreq, esoreqmp, francescapalermo, mwoodbri


dcarte's Issues

Integrate door status data into activity domain

Eyal, as discussed: to get a more complete timeline of subject activity and transitions, it would be great if the door status information were included in the activity data. Currently only the door open event is included; the close event is not included separately.

Sleep dailies missing data

sleep_periods.dropna(subset=['time_in_bed','DEEP','hr_max'])

The line above removes a month of consecutive days from the sleep_dailies data.
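For context, dropna(subset=[...]) drops a row when any of the listed columns is NaN, so a gap in a single stream removes the whole day. A small self-contained sketch (the values are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({'time_in_bed': [7.5, np.nan], 'DEEP': [1.2, 1.0], 'hr_max': [88, 90]})
# default how='any': the second row is dropped because time_in_bed is NaN
print(df.dropna(subset=['time_in_bed', 'DEEP', 'hr_max']))
# how='all' would keep partially observed days instead
print(df.dropna(subset=['time_in_bed', 'DEEP', 'hr_max'], how='all'))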

doors is not a registered dataset in base domain in dcarte

When running the base.py script, the following exception is raised:
Exception: Sorry, doors is not a registered dataset in base domain in dcarte
This seems to be because, when looping over the parent_datasets in the create_base_datasets() function, the script searches for the doors dataset in the base domain configuration (because of 'Entryway':[['doors','base']] in load.py) and raises the exception, even though the doors.parquet file has been created in the data/base folder; a sketch of the registration check follows the traceback below.

Traceback (most recent call last):
  File "**test_dcarte.py**", line 5, in <module>
    base.create_base_datasets()
  File "**base.py**", line 181, in create_base_datasets
    p_datasets = {d[0]:dcarte.load(*d) for d in parent_datasets[dataset]} 
  File "**base.py**", line 181, in <dictcomp>
    p_datasets = {d[0]:dcarte.load(*d) for d in parent_datasets[dataset]} 
  File "**utils.py**", line 123, in wrapped
    out = fun(*fun_args, **fun_kwargs)
  File "**load.py**", line 41, in load
    raise Exception(f"Sorry, {dataset} is not a registered dataset in {domain} domain in dcarte")
Exception: Sorry, doors is not a registered dataset in base domain in dcarte
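For reference, a minimal sketch of the registration check that raises this exception, adapted from the load.py excerpt visible in the traceback of a later issue on this page (the config contents here are illustrative):

import numpy as np
import pandas as pd

cfg = {'domains': {'dataset': ['motion'], 'domain': ['base']}}  # illustrative config
dataset, domain = 'doors', 'base'
datasets = pd.DataFrame(cfg['domains'])
# the check consults only the registered domains config, not files on disk,
# which is why an existing doors.parquet does not prevent the exception
if not (datasets == np.array([dataset, domain])).all(axis=1).any():
    raise Exception(f"Sorry, {dataset} is not a registered dataset in {domain} domain in dcarte")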

An error appears when using create_weekly_profile()

An error appears when using create_weekly_profile().
The problem is that when a column is passed as 'on' to the resample() function, it automatically becomes the index and is no longer in the column list. I worked around it by temporarily using a duplicate of the 'start_date' column, as in the sketch below; there may be a nicer solution.
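A minimal sketch of that workaround, trimmed down from resample_sleep_metrics in the traceback below ('start_date_copy' is a hypothetical name):

import pandas as pd

def resample_with_on_column(habits: pd.DataFrame) -> pd.DataFrame:
    # resample(on='start_date') consumes 'start_date' as the new index, so a named
    # aggregation that references it raises KeyError; aggregate a duplicate instead
    habits = habits.assign(start_date_copy=habits['start_date'])
    return (habits.groupby('patient_id')
                  .resample('1D', offset='12h', on='start_date')
                  .agg(start_time=('start_date_copy', 'min'),
                       time_in_bed=('time_in_bed', 'sum')))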

The code that reproduces the error is:

from dcarte.derived import create_base_datasets,create_weekly_profile
create_base_datasets()
create_weekly_profile()

Python version is 3.8.9, dcarte version is 0.3.37, pandas version is 1.5.1.

The output is as below:

Finished Loading motion in:                    0.7 seconds   
Finished Loading activity_dailies in:          0.0 seconds   
Finished Loading sleep in:                     1.0 seconds   
Finished Loading bed_occupancy in:             0.0 seconds   

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:448, in Resampler._groupby_and_aggregate(self, how, *args, **kwargs)
    447     else:
--> 448         result = grouped.aggregate(how, *args, **kwargs)
    449 except DataError:
    450     # got TypeErrors on aggregation

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/generic.py:894, in DataFrameGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    893 op = GroupByApply(self, func, args, kwargs)
--> 894 result = op.agg()
    895 if not is_dict_like(func) and result is not None:

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:169, in Apply.agg(self)
    168 if is_dict_like(arg):
--> 169     return self.agg_dict_like()
    170 elif is_list_like(arg):
    171     # we require a list, but not a 'str'

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:478, in Apply.agg_dict_like(self)
    476     selection = obj._selection
--> 478 arg = self.normalize_dictlike_arg("agg", selected_obj, arg)
    480 if selected_obj.ndim == 1:
    481     # key only used for output

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:601, in Apply.normalize_dictlike_arg(self, how, obj, func)
    600         cols_sorted = list(safe_sort(list(cols)))
--> 601         raise KeyError(f"Column(s) {cols_sorted} do not exist")
    603 aggregator_types = (list, tuple, dict)

KeyError: "Column(s) ['start_date'] do not exist"

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1558, in GroupBy.apply(self, func, *args, **kwargs)
   1557 try:
-> 1558     result = self._python_apply_general(f, self._selected_obj)
   1559 except TypeError:
   1560     # gh-20949
   1561     # try again, with .apply acting as a filtering
   (...)
   1565     # fails on *some* columns, e.g. a numeric operation
   1566     # on a string grouper column

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1610, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
   1582 """
   1583 Apply function f in python space
   1584 
   (...)
   1608     data after applying f
   1609 """
-> 1610 values, mutated = self.grouper.apply(f, data, self.axis)
   1611 if not_indexed_same is None:

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/ops.py:839, in BaseGrouper.apply(self, f, data, axis)
    838 group_axes = group.axes
--> 839 res = f(group)
    840 if not mutated and not _is_indexed_like(res, group_axes, axis):

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:1208, in _GroupByMixin._apply.<locals>.func(x)
   1206     return getattr(x, f)(**kwargs)
-> 1208 return x.apply(f, *args, **kwargs)

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:355, in Resampler.aggregate(self, func, *args, **kwargs)
    354     how = func
--> 355     result = self._groupby_and_aggregate(how, *args, **kwargs)
    357 result = self._apply_loffset(result)

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:460, in Resampler._groupby_and_aggregate(self, how, *args, **kwargs)
    452 except (AttributeError, KeyError):
    453     # we have a non-reducing function; try to evaluate
    454     # alternatively we want to evaluate only a column of the input
   (...)
    458     #  on Series, raising AttributeError or KeyError
    459     #  (depending on whether the column lookup uses getattr/__getitem__)
--> 460     result = grouped.apply(how, *args, **kwargs)
    462 except ValueError as err:

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1543, in GroupBy.apply(self, func, *args, **kwargs)
   1541             return func(g, *args, **kwargs)
-> 1543 elif hasattr(nanops, "nan" + func):
   1544     # TODO: should we wrap this in to e.g. _is_builtin_func?
   1545     f = getattr(nanops, "nan" + func)

TypeError: can only concatenate str (not "NoneType") to str

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:448, in Resampler._groupby_and_aggregate(self, how, *args, **kwargs)
    447     else:
--> 448         result = grouped.aggregate(how, *args, **kwargs)
    449 except DataError:
    450     # got TypeErrors on aggregation

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/generic.py:894, in DataFrameGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    893 op = GroupByApply(self, func, args, kwargs)
--> 894 result = op.agg()
    895 if not is_dict_like(func) and result is not None:

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:169, in Apply.agg(self)
    168 if is_dict_like(arg):
--> 169     return self.agg_dict_like()
    170 elif is_list_like(arg):
    171     # we require a list, but not a 'str'

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:478, in Apply.agg_dict_like(self)
    476     selection = obj._selection
--> 478 arg = self.normalize_dictlike_arg("agg", selected_obj, arg)
    480 if selected_obj.ndim == 1:
    481     # key only used for output

File ~/py38/lib/python3.8/site-packages/pandas/core/apply.py:601, in Apply.normalize_dictlike_arg(self, how, obj, func)
    600         cols_sorted = list(safe_sort(list(cols)))
--> 601         raise KeyError(f"Column(s) {cols_sorted} do not exist")
    603 aggregator_types = (list, tuple, dict)

KeyError: "Column(s) ['start_date'] do not exist"

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In [4], line 1
----> 1 create_weekly_profile()

File ~/py38/lib/python3.8/site-packages/dcarte/derived/weekly_profile.py:326, in create_weekly_profile()
    324 for dataset in parent_datasets.keys():
    325     p_datasets = {d[0]:dcarte.load(*d) for d in parent_datasets[dataset]} 
--> 326     LocalDataset(dataset_name = dataset,
    327                     datasets = p_datasets,
    328                     pipeline = [f'process_{dataset.lower()}'],
    329                     domain = domain,
    330                     module = module,
    331                     module_path = module_path,
    332                     dependencies = parent_datasets[dataset])

File <string>:20, in __init__(self, dataset_name, datasets, pipeline, domain, module_path, module, dependencies, since, until, delay, reapply, reload, update, home, compression, data_folder, data)

File ~/py38/lib/python3.8/site-packages/dcarte/local.py:90, in LocalDataset.__post_init__(self)
     86 self.metadata = {'since': self.since,
     87                  'until': self.until,
     88                  'Mac': cfg['mac']}
     89 self.register_dataset()
---> 90 self.load_dataset()
     91 self.data = read_table(self.local_file)

File ~/py38/lib/python3.8/site-packages/dcarte/local.py:126, in LocalDataset.load_dataset(self)
    124 if not path_exists(self.local_file) or self.reload:
    125     set_path(self.local_file)
--> 126     self.process_dataset()
    127 elif self.update:
    128     self.update_dataset()

File ~/py38/lib/python3.8/site-packages/dcarte/local.py:149, in LocalDataset.process_dataset(self)
    144 """process_dataset [summary]
    145 
    146 [extended_summary]
    147 """
    148 for func in self.pipeline:
--> 149     self.data = getattr(self._module, func)(self)
    150 # domains = pd.DataFrame(cfg['domains'])
    151 # dataset = np.array([self.domain,self.dataset_name])
    152 # # dataset_exist = (domains == dataset).all(axis=1).any()
    153 # # if not dataset_exist:
    154 # #     self.register_dataset()
    155 self.save_dataset()

File ~/dcarte/recipes/profile/weekly_profile.py:137, in process_sleep_dailies(obj)
    135 sleep_periods = sleep_vitals_.join(habits_).join(sleep_states_).round(2)
    136 sleep_periods = sleep_periods.dropna(subset=['time_in_bed','DEEP','hr_max'])
--> 137 sleep_metrics = resample_sleep_metrics(sleep_periods)
    138 diurnal_habits = resample_sleep_metrics(sleep_periods,'Diurnal')
    139 diurnal_habits = diurnal_habits.assign(nap_ibp = diurnal_habits.time_in_bed)

File ~/dcarte/recipes/profile/weekly_profile.py:149, in resample_sleep_metrics(sleep_periods, period_type)
    147 def resample_sleep_metrics(sleep_periods,period_type:str="Nocturnal"):
    148     habits = sleep_periods.query('period_type == @period_type').drop(columns=['period_type'])
--> 149     habits = (habits.
    150               reset_index().
    151               groupby('patient_id').
    152               resample('1D',offset='12h',on = 'start_date').agg(
    153                 start_time = ('start_date', 'min'),
    154                 end_time = ('end_date', 'max'),
    155                 nb_awakenings = ('awake_events' ,lambda x: x if x.shape[0]==1 else x.sum()+x.shape[0]-1),
    156                 time_in_bed = ('time_in_bed' ,'sum'),
    157                 period_obs = ('period_obs' ,'sum'),
    158                 minutes_snoring = ('minutes_snoring' ,'sum'),
    159                 heart_rate = ('heart_rate', 'mean'),
    160                 hr_min = ('hr_min' ,'min'),
    161                 hr_max = ('hr_max' ,'max'),
    162                 respiratory_rate = ('respiratory_rate' ,'mean'),
    163                 rr_min = ('rr_min' ,'min'),
    164                 rr_max = ('rr_max' ,'max'),
    165                 AWAKE = ('AWAKE' ,'sum'),
    166                 DEEP = ('DEEP' ,'sum'),
    167                 OTHER = ('OTHER' ,'sum')
    168               ).dropna())
    169     habits = habits.assign(bed_time_period=(habits.end_time - habits.start_time)/np.timedelta64(1, 'h'))
    170     habits = habits.assign(time_out_of_bed=(habits.bed_time_period - habits.time_in_bed))

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:355, in Resampler.aggregate(self, func, *args, **kwargs)
    353 if result is None:
    354     how = func
--> 355     result = self._groupby_and_aggregate(how, *args, **kwargs)
    357 result = self._apply_loffset(result)
    358 return result

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:1210, in _GroupByMixin._apply(self, f, *args, **kwargs)
   1206         return getattr(x, f)(**kwargs)
   1208     return x.apply(f, *args, **kwargs)
-> 1210 result = self._groupby.apply(func)
   1211 return self._wrap_result(result)

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1569, in GroupBy.apply(self, func, *args, **kwargs)
   1559     except TypeError:
   1560         # gh-20949
   1561         # try again, with .apply acting as a filtering
   (...)
   1565         # fails on *some* columns, e.g. a numeric operation
   1566         # on a string grouper column
   1568         with self._group_selection_context():
-> 1569             return self._python_apply_general(f, self._selected_obj)
   1571 return result

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1610, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
   1573 @final
   1574 def _python_apply_general(
   1575     self,
   (...)
   1580     is_agg: bool = False,
   1581 ) -> NDFrameT:
   1582     """
   1583     Apply function f in python space
   1584 
   (...)
   1608         data after applying f
   1609     """
-> 1610     values, mutated = self.grouper.apply(f, data, self.axis)
   1611     if not_indexed_same is None:
   1612         not_indexed_same = mutated or self.mutated

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/ops.py:839, in BaseGrouper.apply(self, f, data, axis)
    837 # group might be modified
    838 group_axes = group.axes
--> 839 res = f(group)
    840 if not mutated and not _is_indexed_like(res, group_axes, axis):
    841     mutated = True

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:1208, in _GroupByMixin._apply.<locals>.func(x)
   1205 if isinstance(f, str):
   1206     return getattr(x, f)(**kwargs)
-> 1208 return x.apply(f, *args, **kwargs)

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:355, in Resampler.aggregate(self, func, *args, **kwargs)
    353 if result is None:
    354     how = func
--> 355     result = self._groupby_and_aggregate(how, *args, **kwargs)
    357 result = self._apply_loffset(result)
    358 return result

File ~/py38/lib/python3.8/site-packages/pandas/core/resample.py:460, in Resampler._groupby_and_aggregate(self, how, *args, **kwargs)
    451     result = grouped.apply(how, *args, **kwargs)
    452 except (AttributeError, KeyError):
    453     # we have a non-reducing function; try to evaluate
    454     # alternatively we want to evaluate only a column of the input
   (...)
    458     #  on Series, raising AttributeError or KeyError
    459     #  (depending on whether the column lookup uses getattr/__getitem__)
--> 460     result = grouped.apply(how, *args, **kwargs)
    462 except ValueError as err:
    463     if "Must produce aggregated value" in str(err):
    464         # raised in _aggregate_named
    465         # see test_apply_without_aggregation, test_apply_with_mutated_index

File ~/py38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py:1543, in GroupBy.apply(self, func, *args, **kwargs)
   1540         with np.errstate(all="ignore"):
   1541             return func(g, *args, **kwargs)
-> 1543 elif hasattr(nanops, "nan" + func):
   1544     # TODO: should we wrap this in to e.g. _is_builtin_func?
   1545     f = getattr(nanops, "nan" + func)
   1547 else:

TypeError: can only concatenate str (not "NoneType") to str

Updating domain removes participants

Using dcarte.update_domain() updates all of the data the domain depends on; however, when loading a dataset from another domain, it does not contain all participants. To work around this I used reapply on each individual dataset from every domain, in the order of their cross-dependencies, as sketched below.
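A minimal sketch of that workaround, with illustrative dataset/domain pairs taken from elsewhere on this page (the reapply keyword appears in the load.py excerpts below):

import dcarte

# reapply each dataset individually, parents before children
for dataset, domain in [('Motion', 'base'), ('activity_dailies', 'profile')]:
    dcarte.load(dataset, domain, reapply=True)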

Dailies, Update=True problem

I am using dcarte version 0.4.08.

I am calling:
phys = dcarte.load('Physiology_Dailies','PROFILE', update=True)

This runs for ~11 minutes but then fails with the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Untitled-1.ipynb Cell 3 in <cell line: 1>()
----> [1](vscode-notebook-cell:Untitled-1.ipynb?jupyter-notebook#W2sdW50aXRsZWQ%3D?line=0) phys = dcarte.load('Physiology_Dailies','PROFILE', reload=True)
      [2](vscode-notebook-cell:Untitled-1.ipynb?jupyter-notebook#W2sdW50aXRsZWQ%3D?line=1) phys = phys.reset_index()

File ~\Anaconda3\envs\sandbox\lib\site-packages\dcarte\utils.py:229, in timer.<locals>.wrapper.<locals>.wrapped(*fun_args, **fun_kwargs)
    227 else:
    228     prefix = f'Finished {desc} in:'
--> 229 out = fun(*fun_args, **fun_kwargs)
    230 elapsed = time.perf_counter() - start
    231 dur = f'{np.round(elapsed,1)}'

File ~\Anaconda3\envs\sandbox\lib\site-packages\dcarte\load.py:60, in load(dataset, domain, **kwargs)
     58     dflt['reapply'] = False
     59     Path(local_file).unlink()
---> 60     return load(dataset,domain,**dflt)
     62 if path_exists(local_file):
     63     if not (dflt['reload'] or dflt['reapply'] or dflt['update']):

File ~\Anaconda3\envs\sandbox\lib\site-packages\dcarte\utils.py:229, in timer.<locals>.wrapper.<locals>.wrapped(*fun_args, **fun_kwargs)
    227 else:
    228     prefix = f'Finished {desc} in:'
--> 229 out = fun(*fun_args, **fun_kwargs)
    230 elapsed = time.perf_counter() - start
...
    695         "Casting to unit-less dtype 'datetime64' is not supported. "
    696         "Pass e.g. 'datetime64[ns]' instead."
    697     )

TypeError: Cannot use .astype to convert from timezone-aware dtype to timezone-naive dtype. Use obj.tz_localize(None) or obj.tz_convert('UTC').tz_localize(None) instead.

The same error appears for sleep_dailies and for loading from base
sleep_dailies = dcarte.load('sleep_dailies','profile', update=True)
phys = dcarte.load('Physiology','BASE', update=True)
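The conversion suggested in the error message drops the timezone explicitly before casting; a minimal sketch (the series content is illustrative):

import pandas as pd

s = pd.Series(pd.to_datetime(['2022-01-01T00:00:00Z']))  # timezone-aware (UTC)
naive = s.dt.tz_convert('UTC').dt.tz_localize(None)      # timezone-naive copy
# s.astype('datetime64[ns]') on the tz-aware series raises the TypeError above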

tokenless usage

Is there a way to add a token into my code somewhere, or save it in a file, so that on startup I don't get the token prompt but use the already provided token directly?

JSONDecodeError

Motion = dcarte.load('Motion','base',update=True) runs for a very long time and then crashes with the following error; the same error occurs with dcarte.load('Activity_Dailies','profile',update=True):

JSONDecodeError Traceback (most recent call last)
File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
970 try:
--> 971 return complexjson.loads(self.text, **kwargs)
972 except JSONDecodeError as e:
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/miniconda3/envs/python3.8/lib/python3.8/json/__init__.py:357, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
354 if (cls is None and object_hook is None and
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:

File ~/miniconda3/envs/python3.8/lib/python3.8/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of s (a str instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()

File ~/miniconda3/envs/python3.8/lib/python3.8/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError Traceback (most recent call last)
Cell In[26], line 2
1 update = True
----> 2 Motion = dcarte.load('Motion','base',update=update)
3 Entryway = dcarte.load('Entryway','Base',update=update)
4 Physiology = dcarte.load('Physiology','Base',update=update)

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/utils.py:230, in timer.<locals>.wrapper.<locals>.wrapped(*fun_args, **fun_kwargs)
228 else:
229 prefix = f'Finished {desc} in:'
--> 230 out = fun(*fun_args, **fun_kwargs)
231 elapsed = time.perf_counter() - start
232 dur = f'{np.round(elapsed,1)}'

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/load.py:91, in load(dataset, domain, **kwargs)
89 dflt['reapply'] = False
90 for _,row in dependencies.iterrows():
---> 91 parent_datasets[row.dataset] = load(row.dataset,row.domain, **dflt)
92 input = {'dataset_name':dataset,
93 'datasets':parent_datasets,
94 'pipeline':info[dataset]['pipeline'],
(...)
98 'dependencies':info[dataset]['domains'],
99 'domain':domain}
100 input = merge_dicts(input,dflt)

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/utils.py:230, in timer.<locals>.wrapper.<locals>.wrapped(*fun_args, **fun_kwargs)
228 else:
229 prefix = f'Finished {desc} in:'
--> 230 out = fun(*fun_args, **fun_kwargs)
231 elapsed = time.perf_counter() - start
232 dur = f'{np.round(elapsed,1)}'

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/load.py:84, in load(dataset, domain, **kwargs)
78 input = {'dataset_name':dataset,
79 'datasets':info[dataset]['datasets'],
80 'columns':info[dataset]['columns'],
81 'dtypes':info[dataset]['dtype'],
82 'domain':domain}
83 input = merge_dicts(input,dflt)
---> 84 output = MinderDataset(**input)
85 else:
86 dependencies = pd.DataFrame(info[dataset]['domains'])

File <string>:23, in __init__(self, dataset_name, datasets, columns, domain, dtypes, since, until, log_level, delay, organizations, headers, server, home, compression, data_folder, data, request_id, reload, reapply, update)

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/minder.py:127, in MinderDataset.__post_init__(self)
125 self.data = read_table(self.local_file)
126 elif self.update:
--> 127 self.update_dataset()
128 else:
129 self.data = read_table(self.local_file)

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/minder.py:267, in MinderDataset.update_dataset(self)
265 self.data_request['until'] = self.until
266 self.post_request()
--> 267 self.process_request()
268 if 'url' in self.csv_url.columns:
269 self.download_data()

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/minder.py:169, in MinderDataset.process_request(self, sleep_time)
167 while request_output.empty:
168 sleep(sleep_time)
--> 169 request_output = self.get_output()
170 self.csv_url = request_output
171 print('')

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/dcarte/minder.py:176, in MinderDataset.get_output(self)
174 try:
175 with requests.get(f'{self.server}/{self.request_id}/', auth=self.auth) as request:
--> 176 request_elements = pd.DataFrame(request.json())
177 output = pd.DataFrame()
178 if request_elements.status.iat[0] == 202:

File ~/miniconda3/envs/python3.8/lib/python3.8/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
971 return complexjson.loads(self.text, **kwargs)
972 except JSONDecodeError as e:
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
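The crash happens in get_output() when request.json() is called on a response whose body is not valid JSON (for example an empty body or an HTML error page). A defensive sketch of that call, using a placeholder URL (requests >= 2.27 exposes requests.JSONDecodeError):

import requests

resp = requests.get('https://example-server/request-id/')  # placeholder endpoint
try:
    payload = resp.json()
except requests.JSONDecodeError:
    # keep the raw text around for debugging instead of crashing
    print(f'Non-JSON response (HTTP {resp.status_code}): {resp.text[:200]}')
    payload = None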

Unable to limit data requests to specific cohorts

Given an access token that provides access to data from multiple cohorts it would be very helpful to be able to retrieve data for a subset of those cohorts. The server provides the organizations specifier for this purpose:

{
        "since": "2023-02-28T00:00:00.000Z",
        "until": "2023-03-01T00:00:00.000Z",
        "datasets": {
            "raw_activity_pir": {
                "columns": [
                    "start_date",
                    "patient_id",
                    "home_id",
                    "location_id",
                    "location_name"
                ]
            }
        },
        "organizations": [
            "F3iU1mKLzpJWWG7BEQViAZ",
            "2zx3cAh81xxTuKTzXCxYDF"
        ]
    }

If the user leaves it unspecified, the library can continue to omit organizations from the request, retaining the current behaviour of returning data for all cohorts the token has access to; see the sketch below.
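A minimal sketch of the proposed behaviour, which includes the organizations key only when the user supplies it (the endpoint and the trimmed payload are illustrative, not dcarte's actual API):

import requests

body = {
    "since": "2023-02-28T00:00:00.000Z",
    "until": "2023-03-01T00:00:00.000Z",
    "datasets": {"raw_activity_pir": {"columns": ["start_date", "patient_id"]}},
}
organizations = ["F3iU1mKLzpJWWG7BEQViAZ", "2zx3cAh81xxTuKTzXCxYDF"]
if organizations:
    # omit the key entirely when unspecified, preserving the current
    # behaviour of returning data for all cohorts the token can access
    body["organizations"] = organizations
# requests.post('https://example-server/export', json=body, auth=('user', 'token'))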

Possible bug in sleep data

I am investigating the analysis of the sleep data to add to the fall detection algorithm.
When analysing Sleep Dailies (PROFILE) for patient 'B3cr4Euax91r63NSvVhrxs' on 2021-08-27, there is a profile for that day, but no data exists in Sleep_Mat (RAW), Sleep_Event (RAW), or Sleep (BASE) for the time the person could have been in bed.
There is no data corresponding to the night of 2021-08-26 or to the morning of 2021-08-27.

In Sleep - BASE, the last entry around that period is for 2021-08-26 at 07:22:00.
In Sleep_Event - RAW, the last entry is on 2021-08-26 at 11:15:30 and it then returns at 2021-08-27 23:10:48
In Sleep_Mat - RAW, the last entry is on 2021-08-26 at 06:22:00

'sleep is not a registered dataset in base domain in dcarte'

When running

update = False
Motion = dcarte.load('Motion','base',update=update)
Entryway = dcarte.load('Entryway','Base',update=update)
Physiology = dcarte.load('Physiology','Base',update=update)
Sleep = dcarte.load('Sleep','Base',update=update)

The package returns 'motion is not a registered dataset in base domain in dcarte'.
This is returned when trying to access any of the datasets, i.e. motion, entryway, physiology, sleep.

Full traceback:

Input In [8], in <cell line: 4>()
      1 update = False
      2 # Motion = dcarte.load('Motion','base',update=update)
      3 # Entryway = dcarte.load('Entryway','Base',update=update)
----> 4 Physiology = dcarte.load('Physiology','Base',update=update)
      5 Sleep = dcarte.load('Sleep','Base',update=update)

File ~/anaconda3/lib/python3.9/site-packages/dcarte/utils.py:156, in timer.<locals>.wrapper.<locals>.wrapped(*fun_args, **fun_kwargs)
    154 else:
    155     prefix = f'Finished {desc} in:'
--> 156 out = fun(*fun_args, **fun_kwargs)
    157 elapsed = time.perf_counter() - start
    158 dur = f'{np.round(elapsed,1)}'

File ~/anaconda3/lib/python3.9/site-packages/dcarte/load.py:46, in load(dataset, domain, **kwargs)
     42 datasets = pd.DataFrame(cfg['domains'])
     45 if not (datasets == np.array([dataset,domain])).all(axis=1).any():
---> 46     raise Exception(f"Sorry, {dataset} is not a registered dataset in {domain} domain in dcarte")
     50 local_file = f'{data_folder}{sep}{domain}{sep}{dataset}.parquet'
     51 if (dflt['reapply'] or dflt['reload']) and path_exists(local_file):

Exception: Sorry, physiology is not a registered dataset in base domain in dcarte

However, running base.py allows me to access some of the datasets (physiology with the base module, sleep, and doors).
