yu-group / covid19-severity-prediction
Extensive and accessible COVID-19 data + forecasting for counties and hospitals.
Home Page: https://arxiv.org/abs/2005.07882
License: MIT License
When I try to run predict_all_death.py, it fails with No such file or directory: 'all_deaths_preds_6_21.pkl'.
I am not sure where this file is supposed to come from.
The last 2 lines of the processed nytimes_infections file begin with "City1" and "City2" in the first field. I believe City1 corresponds to the "New York City" line in the raw file (with no fips code) and City2 to Kansas City,Missouri (also no fips code).
$ tail -2 nytimes_infections.csv |less -SX
City1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, ...
City2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, ...
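One way the cleaning step could avoid the opaque "City1"/"City2" labels is to map these two known no-FIPS rows to explicit placeholder codes. A sketch with a toy frame (the column names and placeholder codes are illustrative, not the repo's):

```python
import pandas as pd

# Toy frame mimicking the raw NYT layout; "fips" is missing for the two
# city aggregates (column names are assumptions, not the repo's).
raw = pd.DataFrame({
    "county": ["New York City", "Kansas City", "Cook"],
    "state": ["New York", "Missouri", "Illinois"],
    "fips": [None, None, "17031"],
})

# Assign recognizable placeholder codes to the known no-FIPS city rows
# instead of emitting opaque "City1"/"City2" labels.
city_fips = {
    ("New York City", "New York"): "NYC",
    ("Kansas City", "Missouri"): "KC-MO",
}
raw["fips"] = raw.apply(
    lambda r: city_fips.get((r["county"], r["state"]), r["fips"]), axis=1
)
```

This keeps the two aggregate rows identifiable downstream rather than anonymizing them.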
The maximum value in the array under the deaths column does not match the total number of deaths for some counties. In total, I found 16 such instances.
The FIPS codes of the problematic counties are as follows:
['01031', '01077', '02110', '05031', '05061', '08069', '08097', '13085', '13269', '28005', '39027', '39113', '45023', '49005', '53037', '54055']
I am using the 'abridged' version of the dataset.
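A check along these lines reproduces the comparison described above (the "deaths", "tot_deaths", and "countyFIPS" column names are assumptions based on the dataset's description):

```python
import pandas as pd

# Toy frame mimicking the abridged county dataset: "deaths" holds each
# county's cumulative time series, "tot_deaths" the reported total.
df = pd.DataFrame({
    "countyFIPS": ["01031", "17031"],
    "deaths": [[0, 1, 3], [0, 2, 5]],
    "tot_deaths": [5, 5],
})

# Flag counties whose series maximum disagrees with the stated total.
mismatch = df[df["deaths"].apply(max) != df["tot_deaths"]]
print(mismatch["countyFIPS"].tolist())  # -> ['01031']
```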
Hi, thanks for sharing this dataset. I'm trying to load the safegraph_socialdistancing data from this git repository. However, as shown below, the data are stored in a separate "covid-19-private-data" repository. Is there any way I could get access to the safegraph_socialdistancing data?
def load_safegraph_socialdistancing(data_dir='../../../../../covid-19-private-data'):
    ''' Load in SafeGraph Social Distancing data (automatically updated)

    Parameters
    ----------
    data_dir : str; path to the data directory to find safegraph_socialdistancing.gz (private data)

    Returns
    -------
    data frame
    '''
    orig_dir = os.getcwd()
    os.chdir(data_dir)
    # refresh and load in data
    os.system("git pull")
    raw = pd.read_pickle("safegraph_socialdistancing.gz", compression="gzip")
If a goal is to put county-level data from the various sources in a common format, then consider:
Sorting the columns so that the columns in usafacts_infections and nytimes_infections are in the same order. Currently, the #Cases_ columns come before the #Deaths_ columns in usafacts_infections, and the reverse is true for nytimes_infections.
All the numbers in nytimes_infections end in ".0", e.g. 0.0,0.0,1.0,1.0,..., whereas they are integers in usafacts_infections. I suggest removing the .0 in nytimes_infections.
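The trailing ".0" suggests the counts were upcast to float somewhere in the pipeline; casting the count columns back to integers before writing the CSV would fix it. A sketch (column names follow the #Cases_/#Deaths_ pattern mentioned above):

```python
import pandas as pd

# Toy frame standing in for the processed nytimes_infections table.
df = pd.DataFrame({"countyFIPS": ["17031"], "#Deaths_04-19-2020": [1.0]})

# Select the date-stamped count columns and cast them back to integers
# so the CSV matches usafacts_infections.
count_cols = [c for c in df.columns if c.startswith(("#Cases_", "#Deaths_"))]
df[count_cols] = df[count_cols].astype(int)
print(df["#Deaths_04-19-2020"].iloc[0])  # 1, not 1.0
```

If the frame can contain missing values, pandas' nullable "Int64" dtype avoids the float upcast that a plain `astype(int)` would choke on.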
When I tried to run predict_all_deaths.py, it raised this KeyError. What should I do in this case?
KeyError: 'all_deaths_pred_6_17_advanced_shared_model_21'
Your dataset was added to the CoronaWhy (https://www.coronawhy.org/) Data Lake on Dataverse as part of a common COVID-19 data frame: http://datasets.coronawhy.org/dataset.xhtml?persistentId=doi:10.5072/FK2/WB3UE8
Would you be willing to help with the maintenance of your dataset in Dataverse, e.g. adding the relevant metadata and keeping the dataset up to date? That will help make the dataset findable and accessible to the medical science community.
The line in usafacts_infections with FIPS code 00001 corresponds to this line in the raw file:
1,New York City Unallocated/Probable,NY
Do you really want it in the processed file?
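If these aggregate rows are unwanted, a cleaning step could drop any row whose code is not a plausible 5-digit county FIPS. A sketch with a toy frame ("countyFIPS" as the identifier column is an assumption):

```python
import pandas as pd

# Toy frame: "00001" is the New York City Unallocated/Probable row,
# "36061" a real county code (New York County).
df = pd.DataFrame({
    "countyFIPS": ["00001", "36061"],
    "name": ["NYC Unallocated", "New York County"],
})

# Keep only rows with a 5-digit code in the valid county range
# (real county FIPS codes start at 01001).
is_county = df["countyFIPS"].str.fullmatch(r"\d{5}") & (
    df["countyFIPS"].astype(int) >= 1000
)
df = df[is_county]
```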
Following quickstart.ipynb, the add_preds function raises a "date out of range" error. The delta in the function keeps increasing with no stopping criterion other than the existence of the cached frame, which causes the overflow.
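A bounded date loop would avoid the overflow. A minimal sketch, assuming the cache lookup iterates over candidate dates (find_latest_cached and its arguments are hypothetical, not the repo's API):

```python
import datetime

def find_latest_cached(cached_dates, start=datetime.date(2020, 6, 1)):
    """Scan forward for a cached frame, but never past today's date."""
    day = start
    today = datetime.date.today()
    while day <= today:            # explicit stopping criterion
        if day in cached_dates:    # stand-in for the cached-file check
            return day
        day += datetime.timedelta(days=1)
    return None                    # no cached frame found; fail cleanly
```

With a bound like this, a missing cache file produces None (which the caller can report) instead of letting the date offset grow until datetime overflows.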
Looks like there's a problem with the hrsa data, as below.
FileNotFoundError Traceback (most recent call last)
in
15 import load_data
16
---> 17 df = load_data.load_county_level()
18 df = df.sort_values('tot_deaths', ascending=False)
19 important_vars = load_data.important_keys(df)
~/load_data.py in load_county_level(data_dir, cached_file, cached_file_abridged, ahrf_data, diabetes, voting, icu, heart_disease_data, stroke_data, dir_mod)
50 heart_disease_data=heart_disease_data,
51 stroke_data=stroke_data,
---> 52 diabetes=diabetes) # also cleans usafacts data
53
54 # basic preprocessing
~/functions/merge_data.py in merge_data(ahrf_data, diabetes, voting, icu, heart_disease_data, stroke_data, medicare_group, resp_group)
18
19 # read in data
---> 20 facts = pd.read_pickle(ahrf_data)
21 facts = facts.rename(columns={'Blank': 'id'})
22
/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
168 if not isinstance(fp_or_buf, str) and compression == "infer":
169 compression = None
--> 170 f, fh = get_handle(fp_or_buf, "rb", compression=compression, is_text=False)
171
172 # 1) try standard library Pickle
/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
432 else:
433 # Binary mode
--> 434 f = open(path_or_buf, mode)
435 handles.append(f)
436
FileNotFoundError: [Errno 2] No such file or directory: 'data/hrsa/data_AHRF_2018-2019/processed/df_renamed.pkl'
Per the instructions, I cloned the repo and started a program to run in the root directory:
import data
# unabridged
df_unabridged = data.load_county_data(data_dir = "data", cached = False, abridged = False)
Running this code produces an error: FileNotFoundError: [Errno 2] No such file or directory: 'File ../../raw/ahrf_health/ahrf_health.csv does not exist'. I have been able to reproduce this error in a totally separate environment. The problem is likely in clean.py.
Not sure if I'm missing something obvious here.
Other info: MacOS
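One likely cause is that clean.py resolves '../../raw/ahrf_health/ahrf_health.csv' against the current working directory, so the load only works when launched from one specific folder. A sketch of a helper that anchors paths to the calling module instead (data_path is hypothetical, not part of the repo):

```python
import os

def data_path(relative, anchor_file):
    """Resolve a path relative to a module file rather than the cwd.

    anchor_file is typically the caller's __file__, so the result is the
    same no matter which directory the script is launched from.
    """
    base = os.path.dirname(os.path.abspath(anchor_file))
    return os.path.normpath(os.path.join(base, relative))
```

Inside clean.py this would be called as, e.g., `data_path("../../raw/ahrf_health/ahrf_health.csv", __file__)`.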
FIPS code 20069 has one case in the processed usafacts_infections column #Cases_04-19-2020 but none in the raw file. The raw file I'm looking at has entries up to and including 4/20/20.
Excuse me, I had trouble while running the script and got a similar error to #16, so I couldn't generate the hospital-level dataset.
Could you please upload the processed, clean hospital-level dataset? Thank you!
The line in usafacts_infections with FIPS code 06000 corresponds to this line in the raw file:
6000,Grand Princess Cruise Ship,CA
Do you really want it in the processed county-level file?