cv19index's Introduction

Join us for our webinar on the CV19 Index on Wednesday, April 8th, 2020 from 2:00 – 3:00pm CDT.

With the 1.1.0 release, the CV19 Index can now make predictions for any adult. It is no longer restricted to Medicare populations.

The COVID-19 Vulnerability Index (CV19 Index)

Install | Data Prep | Running The Model | Interpreting Results | Model Performance | Contributing | Release Notes

The COVID-19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19 (commonly referred to as “The Coronavirus”). The CV19 Index is intended to help hospitals, federal / state / local public health agencies and other healthcare organizations in their work to identify, plan for, respond to, and reduce the impact of COVID-19 in their communities.

Full information on the CV19 Index, including the links to a full FAQ, User Forums, and information about upcoming Webinars is available at http://cv19index.com

Data Requirements

This repository provides information for those interested in running the COVID-19 Vulnerability Index on their own data. We provide the index as a pretrained model implemented in Python, along with the source code, models, and example usage of the CV19 Index.

The CV19 Index utilizes only a few fields which can be extracted from administrative claims or electronic medical records. The data requirements have intentionally been kept very limited in order to facilitate rapid implementation while still providing good predictive power. ClosedLoop is also offering a free, hosted version of the CV19 Index that uses additional data and provides better accuracy. For more information, see http://cv19index.com

Install

The CV19 Index can be installed from PyPI:

pip install cv19index

Note for Windows users: Some Microsoft Windows users have gotten errors when running pip related to installing the SHAP and XGBoost dependencies. For these users we have provided prebuilt wheel files. To use these, download the wheel for SHAP and/or XGBoost to your machine. Then, from the directory where you downloaded the files, run:

pip install xgboost-1.0.2-py3-none-win_amd64.whl
pip install shap-0.35.0-cp37-cp37m-win_amd64.whl

These wheel files are for Python 3.7. If you have a different Python version and would like prebuilt binaries, try https://www.lfd.uci.edu/~gohlke/pythonlibs/ . If you still have trouble, please create a GitHub issue.

Data Prep

The CV19 Index requires 2 data files, a demographics file and a claims file. They can be comma-separated value (CSV) or Excel files. The first row is a header row, and the remaining rows contain the data. In each file, only certain columns are used; any extra columns will be ignored.

The model requires at least 6 months of claims history, so only those members with at least 6 months of prior history should be included. It is not necessary that they have any claims during this period.

Sample input files, demographics.xlsx and claims.xlsx, are in the examples directory.

Demographics File

The demographics file should contain one row for each person on whom you want to run a prediction.

There are 3 required fields in the demographics file (a sample file follows the list):

  • personId - An identifier for each person. It is only used as a link into the claims table and to identify individuals in the output file. Each row in the demographics file must have a unique personId.
  • age - Current age in years, specified as an integer
  • gender - This can either be a string column containing the values 'male' or 'female', or it can be an integer column containing the numbers 0 and 1. For integer columns, 0 is female and 1 is male. Only binary genders are currently supported.
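For illustration, a minimal demographics CSV might look like the following. The personIds match the sample output later in this README; the ages and genders are invented:

    personId,age,gender
    772775338f7ee353,67,female
    d45d10ed2ec861c4,45,male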

Claims File

The claims file contains a summary of medical claims for each patient. There can be multiple rows for each patient, one per claim. Both inpatient and outpatient claims should be included in the one file. If a patient has no claims, that patient should have no corresponding rows in this file.

There are 6 required fields and several optional fields in the claims file:

  • personId - An identifier that should match the personId from the demographics file.
  • admitDate - For inpatient claims, this is the date of admission to the hospital. For outpatient claims this should be the date of service. This field should always be filled in. Dates in CSV files should be in the form YYYY-MM-DD.
  • dischargeDate - For inpatient claims, this is the date of discharge from the hospital. For outpatient claims this should be left empty.
  • erVisit - Flag indicating whether this claim was for an emergency room visit. Values which are empty, 0, or false will be considered false. Other values will be considered true.
  • inpatient - Flag indicating whether this was an inpatient claim. If true, then dischargeDate should be set. Values which are empty, 0, or false will be considered false. Other values will be considered true.
  • dx1 - This field contains the primary ICD-10 diagnosis code for the claim. The code should be a string and can be entered with or without the period. e.g. Z79.4 or Z794
  • dx2-dx15 - These are optional fields that can contain additional diagnosis codes for the claim. The ordering of diagnosis codes is not important.

Note: if a patient first goes to the emergency room and is later admitted, both the erVisit and inpatient flags should be set to true.

If you need to enter more than 15 diagnosis codes for a claim, you can repeat the row, set the erVisit and inpatient flags to false, and then add in the additional diagnosis codes on the new row.
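For illustration, here are a few hypothetical claims rows in CSV form (extra dx columns omitted; the diagnosis codes are real ICD-10 codes, but the rows themselves are invented). The first row shows an ER visit that led to an admission, so both flags are true and dischargeDate is set; the second is an ordinary outpatient visit; the third is an ER visit without an admission:

    personId,admitDate,dischargeDate,erVisit,inpatient,dx1,dx2
    772775338f7ee353,2018-03-02,2018-03-08,true,true,J18.9,I10
    772775338f7ee353,2018-06-14,,false,false,Z79.4,
    d45d10ed2ec861c4,2018-09-21,,true,false,J18.9,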

Running the model

If you have installed the CV19 Index from PyPI, it will create an executable that you can run. The following command, run from the root directory of the GitHub checkout, will generate predictions on the example data and put the results at examples/predictions.csv.

Note: The -a 2018-12-31 is only needed because the example data is from 2018. If you are using current data you can omit this argument.

cv19index -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv

We also provide a run_cv19index.py script you can use to generate predictions from Python directly:

python run_cv19index.py -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv

For full details on all of the available options, use the help flag:

python run_cv19index.py -h
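If you would rather call the package from your own Python code, the issue reports further down show predictions being generated through cv19index.predict.do_run_claims. A minimal sketch based on the signature that appears in those reports (treat the argument order as illustrative rather than a stable API):

    from cv19index.predict import do_run_claims

    # demographics file, claims file, output file, model name, as-of date,
    # and an optional feature file (None to skip).
    do_run_claims("examples/demographics.csv", "examples/claims.csv",
                  "examples/predictions.csv", "xgboost", "2018-12-31",
                  feature_file=None)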

Interpreting the results

The output file created by the CV19 Index contains the predictions along with explanations of the factors that influenced those predictions.

If you simply want a list of the most vulnerable people, sort the file based on descending prediction. This will give you the population sorted by vulnerability, with the most vulnerable person first.
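For example, a short pandas snippet that produces that ordering (assuming the predictions file from the run above):

    import pandas as pd

    preds = pd.read_csv("examples/predictions.csv")
    # Highest predicted vulnerability first.
    ranked = preds.sort_values("prediction", ascending=False)
    print(ranked[["personId", "prediction", "risk_score"]].head(10))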

If you'd like to do more analysis, the predictions file also contains other information, including explanations of which factors most influenced the risk, both positively and negatively.

Here is a sample of the predictions output:

personId          prediction  risk_score  pos_factors_1           pos_patient_values_1  pos_shap_scores_1  ...
772775338f7ee353  0.017149    100         Diagnosis of Pneumonia  True                  0.358
d45d10ed2ec861c4  0.008979    98          Diagnosis of Pneumonia  True                  0.264

In addition to the personId, the output contains the following fields (a short example of extracting them follows the list):

  • prediction - The raw output of the model. It should not be interpreted as the probability that the patient will have complications related to COVID-19, due to several factors, including the use of a proxy endpoint and details of how the model was trained. It does scale with risk, however: a doubling of the prediction value indicates a doubling of the person's risk.
  • risk_score - A percentile indicating where this prediction lies in the distribution of predictions on the test set. A value of 95 indicates that the prediction was higher than 95% of the test population, which was designed to be representative of the overall US population.
  • pos_factors_1-10 - These are the names of the Contributing Factors which most increased the risk for this person. Factor 1 had the largest effect and 10 had the least. Not everyone will have 10 positive factors.
  • pos_patient_values_1-10 - The feature value that this person had for the associated Contributing Factor. For example, if factor 1 is "Diagnosis of Diseases of the circulatory system in the previous 12 months" and the value is "TRUE", that means the most important variable which increased this person's risk is that they were diagnosed with a circulatory disease in the last 12 months. All of the diagnosis categories are available in the CCSR.
  • pos_shap_scores_1-10 - Contributing factors are calculated using SHAP scores. These are the SHAP scores associated with the factors.
  • neg_factors_1-10 - These are equivalent to the pos_factors, except these are features that decreased the person's risk.
  • neg_patient_values_1-10 - These are equivalent to the pos_patient_values, except these are features that decreased the person's risk.
  • neg_shap_scores_1-10 - These are equivalent to the pos_shap_scores, except these are features that decreased the person's risk.
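As a sketch of how you might pull the single strongest risk-increasing factor for each person out of these columns (column names as in the sample above):

    import pandas as pd

    preds = pd.read_csv("examples/predictions.csv")
    # The top contributing factor, the person's value for it, and its SHAP score.
    top = preds[["personId", "pos_factors_1", "pos_patient_values_1",
                 "pos_shap_scores_1"]]
    print(top.sort_values("pos_shap_scores_1", ascending=False).head())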

Model Performance

There are 3 different versions of the CV19 Index, each a different predictive model representing a different tradeoff between ease of implementation and overall accuracy. A full description of the creation of these models is available in the accompanying MedRxiv paper, "Building a COVID-19 Vulnerability Index" (http://cv19index.com).

The 3 models are:

  • Simple Linear - A simple linear logistic regression model that uses only 14 variables. An implementation of this model is included in this package. This model had a 0.731 ROC AUC on our test set. A pickle file containing the parameters for this model is available in the lr.p file.

  • Open Source ML - An XGBoost model, packaged with this repository, that uses Age, Gender, and 500+ features defined from the CCSR categorization of diagnosis codes. This model had a 0.810 ROC AUC on our test set.

  • Free Full - An XGBoost model that fully utilizes all the data available in Medicare claims, along with geographically linked public and Social Determinants of Health data. This model provides the highest accuracy of the 3 CV19 Indexes but requires additional linked data and transformations that preclude a straightforward open-source implementation. ClosedLoop is making a free, hosted version of this model available to healthcare organizations. For more information, see http://cv19index.com.

We evaluate the models using a full train/test split. The models are tested on 369,865 individuals. We express model performance using standard ROC curves, as well as the following metrics (a sketch of how to compute them follows the table):

Model                             ROC AUC   Sensitivity at 3% Alert Rate   Sensitivity at 5% Alert Rate
Logistic Regression               .731      .214                           .314
XGBoost, Diagnosis History + Age  .810      .234                           .324
XGBoost, Full Features            .810      .251                           .336
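Sensitivity at an X% alert rate is the fraction of all true positives captured when you alert on the highest-risk X% of the population. A sketch of how these metrics could be computed with scikit-learn, using synthetic stand-ins for the outcome labels and model predictions (y_true and y_score here are invented, not the study data):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def sensitivity_at_alert_rate(y_true, y_score, alert_rate):
        """Fraction of actual positives among the top `alert_rate` of predictions."""
        n_alerts = int(len(y_score) * alert_rate)
        top = np.argsort(y_score)[::-1][:n_alerts]
        return y_true[top].sum() / y_true.sum()

    rng = np.random.default_rng(0)
    y_score = rng.random(10000)                                # hypothetical predictions
    y_true = (rng.random(10000) < y_score * 0.1).astype(int)   # synthetic outcomes

    print("ROC AUC:", roc_auc_score(y_true, y_score))
    print("Sensitivity at 3%:", sensitivity_at_alert_rate(y_true, y_score, 0.03))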

Contributing to the CV19 Index

We are not able to share the data used to train the models, even with collaborators, but there are many ways you can help. If you are interested in participating, pick up one of the issues marked with the GitHub "help wanted" tag or contact us at [email protected]

cv19index's People

Contributors

ben-tuttle-cl, davedecaprio, hglman, kdrobnyh, meagansasse, thadeusb


cv19index's Issues

No such file when running cv19index executable

Got this error when running the executable:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/daved/Desktop/python-virtual-environments/env/lib/python3.8/site-packages/cv19index/resources/demographics.schema.json'

Errors installing python dependencies

Several people have had issues installing the python dependencies on windows. Often these involve errors asking them to install Visual C++ libraries.

Running the CV19 Index Predictor

do_run(input_fpath, input_schema, model, output)

Traceback (most recent call last):

File "", line 1, in
do_run(input_fpath, input_schema, model, output)

File "..\cv19index\predict.py", line 360, in do_run
model = read_model(model_fpath)

File "..\cv19index\io.py", line 19, in read_model
return pickle.load(fobj)

File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 981, in setstate
_check_call(_LIB.XGBoosterLoadModelFromBuffer(handle, ptr, length))

File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 176, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [15:34:02] C:\Jenkins\workspace\xgboost-win64_release_0.90\src\gbm\gbm.cc:20: Unknown gbm type

Error running inference, Python version: 3.6.9

Edit: Stack Trace below
Error : UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 341: invalid start byte

INFO:/content/cv19index/cv19index/predict.py:Reading model from /content/cv19index/cv19index/resources/xgboost_all_ages/model.pickle.  Writing results to examples/predictions.csv
DEBUG:cv19index.preprocess:Beginning claims data frame preprocessing, raw data frame as follows.
DEBUG:cv19index.preprocess:           personId  ... dx15
0  001ef63fe5cb0cc5  ...     
1  001ef63fe5cb0cc5  ...     
2  001ef63fe5cb0cc5  ...     
3  001ef63fe5cb0cc5  ...     
4  001ef63fe5cb0cc5  ...     

[5 rows x 21 columns]
DEBUG:cv19index.preprocess:Filtered claims to just those within the dates 2017-12-31 to 2018-12-31.  Claim count went from 68481 to 35735
DEBUG:cv19index.preprocess:Cleaning diagnosis codes.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
DEBUG:cv19index.preprocess:Computing diagnosis flags.
DEBUG:cv19index.preprocess:Preprocessing complete.
INFO:/content/cv19index/cv19index/predict.py:Reordering test inputs to match training.
INFO:/content/cv19index/cv19index/predict.py:Scale pos weight is 24.882477847848648. Rescaling predictions to probabilities
WARNING:/content/cv19index/cv19index/shap_top_factors.py:Computing SHAP scores.  Approximate = False
Setting feature_perturbation = "tree_path_dependent" because no background data was given.
Traceback (most recent call last):
  File "run_cv19index.py", line 6, in <module>
    cv19index.predict.main()
  File "/content/cv19index/cv19index/predict.py", line 454, in main
    do_run_claims(args.demographics_file, args.claims_file, args.output_file, args.model, args.as_of_date, args.feature_file)
  File "/content/cv19index/cv19index/predict.py", line 431, in do_run_claims
    predictions = run_xgb_model(input_df, model, quote = quote)
  File "/content/cv19index/cv19index/predict.py", line 376, in run_xgb_model
    **kwargs,
  File "/content/cv19index/cv19index/predict.py", line 161, in perform_predictions
    df_cutoff, model, outcome_column, mapping, **kwargs
  File "/content/cv19index/cv19index/shap_top_factors.py", line 127, in generate_shap_top_factors
    shap_values = shap.TreeExplainer(model).shap_values(
  File "/usr/local/lib/python3.6/dist-packages/shap/explainers/tree.py", line 121, in __init__
    self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
  File "/usr/local/lib/python3.6/dist-packages/shap/explainers/tree.py", line 726, in __init__
    xgb_loader = XGBTreeModelLoader(self.original_model)
  File "/usr/local/lib/python3.6/dist-packages/shap/explainers/tree.py", line 1326, in __init__
    self.name_obj = self.read_str(self.name_obj_len)
  File "/usr/local/lib/python3.6/dist-packages/shap/explainers/tree.py", line 1456, in read_str
    val = self.buf[self.pos:self.pos+size].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 341: invalid start byte
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-9-4aabaf703f58> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'cd cv19index/\npython run_cv19index.py -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv -f examples/features.csv\n# cv19index -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv')

2 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    136     if self.returncode:
    137       raise subprocess.CalledProcessError(
--> 138           returncode=self.returncode, cmd=self.args, output=self.output)
    139 
    140   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

Add BlueButton support

We currently have a CSV file format for getting in claims data. Another useful format would be to take in data in FHIR JSON format. BlueButton is an example of a source like this. Example files are available for the BlueButton developer sandbox.

Script crashes if there are 0 inpatient records w/out discharge date

Hi,

Thanks for the script!

When I run a claims file that contains only outpatient data (all inpatient = FALSE), or an inpatient claim without a discharge date, the script produces this error:

Traceback (most recent call last):
  File "run_cv19index.py", line 6, in <module>
    cv19index.predict.main()
  File "c:\Users\chris-pickering\Documents\Projects\cv19index\cv19index\cv19index\predict.py", line 445, in main
    do_run_claims(args.demographics_file, args.claims_file, args.output_file, args.model, args.as_of_date, args.feature_file)
  File "c:\Users\chris-pickering\Documents\Projects\cv19index\cv19index\cv19index\predict.py", line 416, in do_run_claims
    input_df = preprocess_xgboost(claim_df, demo_df, asOfDate)
  File "c:\Users\chris-pickering\Documents\Projects\cv19index\cv19index\cv19index\preprocess.py", line 79, in preprocess_xgboost
    inpatient_days = pd.Series((inpatient_rows['dischargeDate'].dt.date - inpatient_rows['admitDate'].dt.date).dt.days, index=claim_df['personId'])
  File "C:\Users\chris-pickering\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\chris-pickering\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\accessor.py", line 133, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\chris-pickering\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\accessors.py", line 325, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike "
AttributeError: Can only use .dt accessor with datetimelike values

To resolve it, I changed one claim to be inpatient=TRUE and added a discharge date of today.

Can we update the script for this scenario?

Thanks!
Christopher
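One possible defensive fix (a sketch, not the project's actual patch): coerce the date columns before using the .dt accessor, so an all-empty dischargeDate column becomes NaT-valued datetimes instead of object dtype:

    # In preprocess.py: coerce so .dt works even when every value is missing;
    # errors='coerce' turns blanks into NaT rather than leaving object dtype.
    admit = pd.to_datetime(inpatient_rows['admitDate'], errors='coerce')
    discharge = pd.to_datetime(inpatient_rows['dischargeDate'], errors='coerce')
    inpatient_days = (discharge - admit).dt.days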

Age cap?

Does the model have an intrinsic age cap? We have a population of patients ranging from age 22 to 107. After running the xgboost model using the tutorial, we are seeing all patients aged 98 - 107 with a risk score of only 30, while patients aged 92 all have the highest risk score at 73. Our sample sizes are large enough that we would expect to see at least some of the oldest patients in the highest risk category. This made us wonder if there is some kind of intrinsic age cap in the model.

Questions to feature mapping?

How are the questions asked here mapped to the 15 features (dx1 to dx15) and ER visits?

How to run inference on the simpler regression model? What inputs/features to give etc?

A script to get ROC AUC results without cv19index package

It would be great if you could just provide an end-to-end working Python script that simply outputs the ROC AUC on the data you have in this repo. This way people could quickly replace their models with your xgboost model and try new things out.

Requiring cv19index to be installed as a package should not be necessary, as we are really fighting against time as humanity.

ICD 10 Cleaning Code Removing Valid Codes

Hey guys, I love this tool and am working with it right now, what a great contribution.

One thing I noticed: the function to clean ICD-10 codes from the Tutorial seems to only return codes without dots and more than 3 characters. It was not 100% clear that codes supplied with dots would not make it through, so I got stuck on that for a while.

The bigger problem, though, is that there are valid ICD-10 codes with only 3 characters, F99, R17, ...; they appear in the CCSR docs but will never make it into a dataset, from what I have tested. Thanks again, just wanted to document these speed bumps.

I was able to resolve simply by adding another return after the else: since I did some cleaning on my ICD10s (too much :D) already. But I could see how for a more general case you'd want to add some other validation.

    def cleanICD10Syntax(code):
        if len(code) > 3 and '.' not in code:
            return code[:3] + '.' + code[3:]
        else:
            code  # note the missing return described above
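For reference, the fix the reporter describes (adding the missing return) would look like:

    def cleanICD10Syntax(code):
        # Insert the dot into codes longer than 3 characters that lack one;
        # pass everything else (dotted or 3-character codes) through unchanged.
        if len(code) > 3 and '.' not in code:
            return code[:3] + '.' + code[3:]
        else:
            return code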

Error with reorder_inputs

Hello,
I tried running the prediction as followed, and got this error. This seems to be because model["model"].feature_names is empty, resulting in the 'NoneType' error in the reorder_inputs function. Can you please help fix it? Thank you.

    input_schema = resource_filename("cv19index", "resources/xgboost/input.csv.schema.json")
    model = resource_filename("cv19index", "resources/xgboost/model.pickle")
    model = resource_filename("cv19index", "resources/model_simple/model.pickle")

    asOfDate = '2020-01-31'
    fclaim = "data/claim_test.csv"
    fdemo = "data/demo_test.csv"
    output = "data/prediction_test.csv"
    model_name = "xgboost"

    do_run_claims(fdemo, fclaim, output, model_name, asOfDate, feature_file = None)

Traceback (most recent call last):

File "", line 12, in
do_run_claims(fdemo, fclaim, output, model_name, asOfDate, feature_file = None)

File "C:\Users..\Python\Python37\site-packages\cv19index\predict.py", line 431, in do_run_claims
predictions = run_xgb_model(input_df, model, quote = quote)

File "C:\Users..\Python\Python37\site-packages\cv19index\predict.py", line 359, in run_xgb_model
df_inputs = reorder_inputs(df_inputs, predictor)

File "C:\Users..\Python\Python37\site-packages\cv19index\predict.py", line 342, in reorder_inputs
if set(predictor["model"].feature_names) == set(df_inputs.columns) and predictor[

TypeError: 'NoneType' object is not iterable

from cv19index.io import read_model

model_name = "xgboost"
schema_fpath = resource_filename("cv19index", f"resources/{model_name}/input.csv.schema.json")
model_fpath = resource_filename("cv19index",f"resources/{model_name}/model.pickle")
model = read_model(model_fpath)

print(f'model["model"].feature_names: {model["model"].feature_names}')

Output:
model["model"].feature_names: None

Version:
cv19index: 1.1.4
xgboost: 1.4.0

Error with _NA_VALUES

This issue has been reported:

I'm running this through the Anaconda environment and getting dependency issues:

Traceback (most recent call last):
  File "run_cv19index.py", line 3, in <module>
    import cv19index.predict
  File "C:\Users\eugene.nguyen\Desktop\cv19index-master\cv19index\predict.py", line 14, in <module>
    from .io import read_frame, read_model, write_predictions, read_claim, read_demographics
  File "C:\Users\eugene.nguyen\Desktop\cv19index-master\cv19index\io.py", line 9, in <module>
    from pandas.io.common import _NA_VALUES
ImportError: cannot import name '_NA_VALUES' from 'pandas.io.common' (C:\Users\eugene.nguyen\Anaconda3\lib\site-packages\pandas\io\common.py)
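The private _NA_VALUES name was removed from pandas.io.common in later pandas releases. One workaround seen for this error (hedged: it still depends on a private pandas module, so pinning an older pandas version is the safer option) is to fall back to the default NA strings that newer pandas keeps elsewhere:

    # io.py: replace the failing import with a guarded fallback.
    try:
        from pandas.io.common import _NA_VALUES
    except ImportError:
        # Newer pandas keeps the default NA strings here.
        from pandas._libs.parsers import STR_NA_VALUES as _NA_VALUES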

urllib.parse.quote: improper usage?

I got a problem running the model:
ValueError: feature_names mismatch

The problem is that df_inputs in predict.py:253 has columns with wrong names, like

Diagnosis%20of%20Nephritis_%20nephrosis_%20renal%20sclerosis%20in%20the%20previous%2012%20months

This is because of the use of urllib.parse.quote here:
df_inputs.columns = [urllib.parse.quote(col) for col in df_inputs.columns]

Python 3.7.7 and 3.8.1, Windows 10.
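The mismatch is easy to reproduce: quote percent-encodes spaces, so the encoded column names no longer match the unencoded feature names stored in the model:

    import urllib.parse

    col = "Diagnosis of Nephritis_ nephrosis_ renal sclerosis in the previous 12 months"
    print(urllib.parse.quote(col))
    # Diagnosis%20of%20Nephritis_%20nephrosis_%20renal%20sclerosis%20in%20the%20previous%2012%20months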

inpatient days mismatch

Hello, when running the results, I found that the value of inpatient days is not aligned with what I observed in the original claims input file, e.g. patients having no inpatient visits but an inpatient days value of 24, or vice versa. Upon debugging, the problem seems to lie in the part where inpatient_days is created with an index taken from claim_df: this selects only the values of date_diff whose index happens to equal a personId.

    preprocessed_df['# of Admissions (12M)'] = inpatient_rows.groupby('personId').admitDate.nunique()
    date_diff = pd.to_timedelta(inpatient_rows['dischargeDate'].dt.date - inpatient_rows['admitDate'].dt.date)
    inpatient_days = pd.Series(date_diff.dt.days, index=claim_df['personId'])
    preprocessed_df['Inpatient Days'] = inpatient_days.groupby('personId').sum()

Example of date_diff:
date_diff.dt.days
10 8
29 2
53 2
56 9
60 2
..
1333281 3
1333325 2 --> if there were a personId == 1333325, then their inpatient days would be 2, but this is the index of claim_df, not related to personId.
1333336 10
1333337 5
1333340 5
Length: 74609, dtype: int64


The claim_df and demo_df were set up as suggested:

  • demo_df has unique row for each patient with age and gender
  • claim_df has one or multiple rows for each patient (only patients with claims are included).

Please let me know if you have any suggestions. Thank you.
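A sketch of the fix the reporter is pointing at (not the project's actual patch): group the per-claim day counts by the personId of the same inpatient rows, instead of re-indexing by claim_df['personId']:

    # Each claim's length of stay, still indexed by the inpatient row it came from.
    date_diff = pd.to_timedelta(inpatient_rows['dischargeDate'].dt.date
                                - inpatient_rows['admitDate'].dt.date)
    # Group by the personId of those same rows so the days sum per person correctly.
    preprocessed_df['Inpatient Days'] = date_diff.dt.days.groupby(
        inpatient_rows['personId']).sum()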

Add Conda package for windows

Several people have had problems installing xgboost on windows. We should provide directions for installing using Anaconda on Windows. This will be easier than pip.

TypeError: ('Timestamp subtraction must have the same timezones or no timezones', 'occurred at index 92')

File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 6928, in apply
return op.get_result()

File "/usr/local/lib/python3.6/dist-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()

File "/usr/local/lib/python3.6/dist-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()

File "/usr/local/lib/python3.6/dist-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)

File "", line 16, in
inpatient_df['Inpatient Days'] = inpatient_df[['dischargeDate','admitDate']].apply(lambda x: (pd.to_datetime(x.dischargeDate) - pd.to_datetime(x.admitDate)).days, axis=1)

File "pandas/_libs/tslibs/c_timestamp.pyx", line 295, in pandas._libs.tslibs.c_timestamp._Timestamp.sub

TypeError: ('Timestamp subtraction must have the same timezones or no timezones', 'occurred at index 92')
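This error occurs when one of the two timestamp columns is timezone-aware and the other is naive. A hedged workaround: normalize both columns to timezone-naive values before subtracting (inpatient_df as in the traceback above):

    import pandas as pd

    # Convert everything to UTC, then drop the timezone so subtraction is legal.
    discharge = pd.to_datetime(inpatient_df['dischargeDate'], utc=True).dt.tz_localize(None)
    admit = pd.to_datetime(inpatient_df['admitDate'], utc=True).dt.tz_localize(None)
    inpatient_df['Inpatient Days'] = (discharge - admit).dt.days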

Support Python 3.5

Currently we only support Python 3.6 because of the type hinting that we have. We should check whether there is a backwards compatible way to support this in Python 3.5. Otherwise we should look at whether we should remove those hints to increase the availability of the code.
