
j535d165 / coronawatchnl


Numbers concerning COVID-19 disease cases in The Netherlands by RIVM, LCPS, NICE, ECML, and Rijksoverheid.

License: Creative Commons Zero v1.0 Universal

Python 10.26% R 6.17% Shell 0.17% HTML 6.51% Jupyter Notebook 76.90%
coronavirus coronavirus-tracking coronavirus-globaloutbreak coronavirus-real-time rivm netherlands covid-19 utrecht-university dataset covid19-data

coronawatchnl's Introduction

Hi, I'm Jonathan //J535D165

Ever wondered what the ideal (scientific) workflow would look like? And what kind of tools you need for it? It's maybe an impossible question to answer, but many will say the workflow should be efficient, transparent, and reproducible. I don't know the answer either, but I fully support these principles. Over the past years, I've used my GitHub profile to share and collaborate on projects aimed at developing the ideal academic workflow. The following projects are my top interests at the moment:

  • Data access: If you're looking for an easy way to download scientific data, be sure to check out Datahugger - the easiest way to download scientific data! I'm also involved in projects like pyalex (new!), cbsodata, and rispy.

  • Superfast reading: Can we make systematic reviews fun to work on by using AI for the boring 💤 parts? With ASReview and asreview.ai, we speed up systematic reviewing. I'm the lead of ASReview's development team.

  • Transparent workflows: I'm experimenting with projects like scitree and scisort, which help and promote the use of reproducible project folder structures.

  • Data linkage: I work on projects like recordlinkage and List of data matching software. Although my attention may sometimes waver from these projects, they are still close to my heart ❤️.

In addition to this, you can also find me at Utrecht University (in the Netherlands) as the project lead for the Open and FAIR Data and Software movement.


coronawatchnl's People

Contributors

blogem, ghostleyjim, goedzo, henriterhofte, hungrxyz, j535d165, japhir, jeroenr1, jpvandervelden, rvoor, sebastiaanbekker, shadkam, sikerdebaard, timvosch, userlandkernel, vega-s, vmenger


coronawatchnl's Issues

Contact with RIVM

Hi all,

First of all, thank you all for contributing to this repository. It is a challenge to keep up to date with all the daily changes on the RIVM website. Without your contributions, it wouldn't be possible to keep this up to date and add new datasets.

There are quite a few questions regarding our contacts with RIVM and our offer to help them set up a proper data repository. We, CoronaWatchNL and Utrecht University, have been in contact with several people at RIVM. Unfortunately, there is no progress yet. More and more (prominent) researchers are teaming up with this project. They help us convince RIVM to set up a data repository and use our/your expertise.

If there is anything to share, please drop it in this thread or send me an email at [email protected].

Fit a logistic regression as well?

Hi Jonathan,

After watching this epic video by 3blue1brown https://www.youtube.com/watch?v=Kas0tIxDvrg I thought you could perhaps fit a sigmoid line as well as the exponential.

Or even better (will probably only make sense if you've seen the video)

  1. calculate the change per time step (current value - previous value)
  2. calculate the growth rate (ratio of the current and previous change)
  3. estimate when the above will be 1 (with another kind of fit?)
  4. fit the sigmoid with that time as the inflection point 😲?
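
Steps 1 and 2 above can be sketched in a few lines of plain Python. This is only an illustration: the function names are made up for this example and the input numbers are not RIVM data. A growth factor that drops to about 1 would mark the candidate inflection point mentioned in step 3.

```python
def daily_changes(cumulative):
    """Step 1: change per time step (current value - previous value)."""
    return [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]


def growth_factors(changes):
    """Step 2: ratio of the current change to the previous change."""
    # Steps where the previous change is zero are skipped to avoid dividing by zero.
    return [cur / prev for prev, cur in zip(changes, changes[1:]) if prev != 0]


# Illustrative cumulative counts (made up, not RIVM data).
cumulative = [10, 20, 40, 70, 100, 120]
changes = daily_changes(cumulative)
factors = growth_factors(changes)
print(changes)
print(factors)
```

In this toy series the factor passes through 1.0 at the third step, which is where the daily change stops growing.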

Importing error Python

Hi all,

When I try to import the data into Python with Pandas, I keep hitting the same error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 2

Any suggestions on how I can bypass this? It might be an easy question to answer, but I haven't been able to find a solution that actually works :(
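
One way to narrow this down, before touching the Pandas call, is to count the fields on every line and list the lines that differ from the header. The sketch below uses only the standard library; `find_irregular_rows` is a name invented for this example. A guess, not a diagnosis: a message like "Expected 1 fields" often means the first line has no delimiter at all, for instance when an HTML page was saved instead of the raw CSV, or when the file uses a different separator.

```python
import csv
import io


def find_irregular_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows that differ from the header."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    expected = len(header)
    # Line numbers start at 2 because line 1 is the header.
    # They match the file only if no quoted field spans several lines.
    return [
        (line_no, len(row))
        for line_no, row in enumerate(rows, start=2)
        if len(row) != expected
    ]


# Illustrative content: an HTML page saved instead of the raw CSV.
sample = "<!DOCTYPE html>\n<html>\n<title>a, b</title>\n"
print(find_irregular_rows(sample))
```

If the first reported line looks like HTML, downloading from the raw file URL instead of the repository page is worth trying.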

Reconstruct data of March 2

Since 3 March, RIVM reports the number of diagnoses with the coronavirus and their municipality of residence on a daily basis. This issue is used to reconstruct the data of March 1 based on RIVM media reports.

Result:

  • 1 person from Tilburg
  • 1 person from Diemen
  • ...

Concerns about missing locations

On 30 March, we had 310 missing municipalities. Based on the RIVM report, there are no missing provinces today. Is this possible? Are they imputing with the GGD region? Or do they no longer include missing locations? That would be interesting. Any thoughts on this?

12595 - (126 + 166 + 130 + 1475 + 161 + 1426 + 3412 + 1845 + 698 + 1046 + 161 + 1949) = 0
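
The same check as a small script, using the twelve figures from the line above (the variable names are only for this example):

```python
# National total and the twelve per-province counts quoted above.
total = 12595
per_province = [126, 166, 130, 1475, 161, 1426, 3412, 1845, 698, 1046, 161, 1949]

# Cases that are not attributed to any province.
unknown = total - sum(per_province)
print(unknown)
```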

image

difference lcps and nice data

Does anyone know what the difference is between the intakeCount in the NICE dataset and the number of cases in the LCPS dataset? In the media they use the LCPS data and say it is the number of corona patients in the ICU. The RIVM daily epidemiological report calls it the confirmed COVID-19 cases. Does this mean that the LCPS data also includes suspected cases?

How to interpret the number of hospitalizations?

The maps for hospitalizations and infections look very different. With hospitalizations the big cities light up:

Screenshot from 2020-04-16 09-21-07

With infections the areas where Corona is most widespread ATM light up:

Screenshot from 2020-04-16 09-21-15

I first thought that the hospitalizations are recorded for the municipality in which the hospital is located. But this is not possible, because there are municipalities with no hospital that have a non-zero number.

What is the best way to interpret the number of hospitalizations?

Include data with reporting lag per province

The RIVM reports nicely show the data that is added per day, to make it clear that there is a significant lag in the reporting. However I did not find the source data published anywhere, a set that shows the reported cases/deaths/hospitalizations per reporting date and per date.

I took the liberty of extracting the data from the pdf reports. Data is here

Please note that no individual reports are available before March 27th, and this shows in the graphs below.

[interactive graph]

[interactive graph]

So now I made this, and I don't really know what to do with it. I can make a PR of course, but I'm not sure how to structure the data for this repository. Updating this requires a few minutes to export the data from the pdf graphs. Also, I only did this for hospitalizations, not for the deaths/confirmed cases yet.

test data not updated

I noticed that the test data was last updated on the 20th and has not been updated since then. I also noticed that the format in the pdf was changed slightly to weekly, so I'm not sure if this was the cause, but it might be related. Thanks!

Improve sigmoid simulations

Do you have any suggestions how to improve the simulations of the sigmoid fitting? Currently it samples known data points (with replacement), but the range is almost certainly way too low.

I have also tried inferring data for the next two days, sampled from the distribution of the growth factor of the last five days. But I doubt the resulting image is still useful/readable:

sigmoid-with-two-inferred-days

Any suggestions are very welcome.
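
For discussion, here is a minimal sketch of the second approach described above, under stated assumptions: the growth factor is taken as the ratio of consecutive daily changes, and "sampled from the distribution" is read as drawing one of the last five observed factors at random. The function name, parameters, and input numbers are invented for this example and are not the code used in the repository.

```python
import random


def project_cumulative(cumulative, days=2, window=5, rng=None):
    """Extend a cumulative series by sampling recent growth factors."""
    rng = rng or random.Random()
    changes = [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]
    factors = [cur / prev for prev, cur in zip(changes, changes[1:]) if prev != 0]
    recent = factors[-window:]  # growth factors of the last `window` days
    projected = list(cumulative)
    last_change = changes[-1]
    for _ in range(days):
        # Assumption: each inferred day uses one randomly drawn recent factor.
        last_change *= rng.choice(recent)
        projected.append(projected[-1] + last_change)
    return projected


# Illustrative series that doubles every day (made up, not RIVM data).
series = [10, 20, 40, 80, 160, 320, 640]
print(project_cumulative(series, rng=random.Random(0)))
```

Repeating the projection many times and fitting the sigmoid to each run would give a spread that reflects uncertainty in the coming days rather than only in the known points.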

Reconstruct data of February 29

Since 3 March, RIVM reports the number of diagnoses with the coronavirus and their municipality of residence on a daily basis. This issue is used to reconstruct the data of February 29 based on RIVM media reports.

Result:

  • 3 persons from Tilburg
  • 3 persons from Diemen
  • 1 from Delft

Fix rendering of the maps in Actions

In the GitHub Actions workflow, the rendering of the maps fails. The error is unknown to me; it works locally. It might be an idea to replace this with Python code.

Extract info from pdf maps

I think it is quite doable to extract the color values from the maps in an automated way.

Outline:

  • Screenshot/image the relevant pdf page.
  • Compute the center of mass of each municipality with CBS 2019 Wijk en Buurtkaart maps
  • Make this an overlay for the image.
  • Extract the color values and map them to the corresponding values.
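
The last step of the outline could look roughly like this: take the pixel colour sampled at a municipality's center and pick the closest legend colour. This is a sketch only. The legend colours and labels below are hypothetical, and reading the pixel from the page image is left out.

```python
def nearest_legend_value(pixel, legend):
    """Return the legend value whose RGB colour is closest to the pixel."""
    def distance_sq(a, b):
        # Squared Euclidean distance in RGB space.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    colour = min(legend, key=lambda c: distance_sq(c, pixel))
    return legend[colour]


# Hypothetical legend: RGB colour -> label shown in the map legend.
legend = {
    (255, 255, 255): "0",
    (255, 200, 200): "1-9",
    (200, 0, 0): "10+",
}

print(nearest_legend_value((250, 205, 198), legend))
```

Matching to the nearest colour rather than requiring an exact match leaves some room for anti-aliasing and compression artefacts in the screenshot.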

Is someone interested in giving this one a try?

coronawatch website

Would it be an idea to generate a website from all the data/graphs in this repo?
For example, something like http://covid19.healthdata.org/

I was thinking of a Gatsby site hosted on S3,
with the datasets, graphs, maps, etc. on it.

We happen to have the domain caard.nl to spare, which could surely be turned into an acronym. But another domain is of course possible too.

If there is interest, I can of course send a PR.

Provide Geocoded data

Compliments on this project!

For many applications that produce maps and/or apply spatial analysis, geocoded COVID-19 data (data with coordinate attributes/columns) would be very helpful. This sounds more complex than it actually is:

  • many of the data CSVs produced here have columns like Municipality (Gemeente)/Province code and/or name
  • the Dutch government, via Kadaster-PDOK, provides open datasets for Administrative Borders (Bestuurlijke Grenzen) with those same names/codes.
  • GeoJSON is an ideal format for supplying geospatial data
  • there is ample open source software to convert/simplify these (GML) datasets. A quick search turned up, e.g., this script
  • "geocoding" is mainly a matter of JOIN-ing on Municipality/Province names/codes (instead of using geocoding backends like Nominatim); possibly GeoPandas can be of help, e.g. https://geopandas.org/mergingdata.html
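
To make the JOIN idea concrete, here is a minimal standard-library sketch. It assumes the column names used in the municipality CSVs (Datum, Gemeentecode, Gemeentenaam, Aantal) and a lookup from municipality code to a point. The `centroids` dictionary, the function name, the approximate coordinate, and the count of 100 are placeholders for this example; real geometries would come from the Administrative Borders dataset.

```python
import csv
import io
import json


def to_geojson(csv_text, centroids):
    """Join CSV rows to coordinates on Gemeentecode and build a FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon_lat = centroids.get(row["Gemeentecode"])
        if lon_lat is None:
            continue  # no geometry known, e.g. code -1 (unknown municipality)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": list(lon_lat)},
            "properties": {
                "Datum": row["Datum"],
                "Gemeentenaam": row["Gemeentenaam"],
                "Aantal": int(row["Aantal"]),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder rows: the Amsterdam count is invented for illustration.
sample_csv = (
    "Datum,Gemeentecode,Gemeentenaam,Provincienaam,Provinciecode,Aantal\n"
    "2020-04-16,363,Amsterdam,Noord-Holland,27,100\n"
    "2020-04-16,-1,,,,431\n"
)
# Hypothetical lookup: municipality code -> approximate (longitude, latitude).
centroids = {"363": (4.90, 52.37)}

print(json.dumps(to_geojson(sample_csv, centroids), indent=2))
```

Joining on the code rather than the name avoids problems with spelling variants of municipality names.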

I hope this triggers interest. Eventually I could foresee some extra/derived GeoJSON files generated from the CSVs under /data/, for example under /data/geocoded. Is all data here generated via GitHub Workflows/Actions? Then contributors could also add the required geocoding steps there.

A next step would be OpenAPI endpoints for this GeoJSON data. This could be served directly from GitHub. We have a project based on OGC OpenAPI REST standards, pygeoapi, where we are working on providing an open endpoint for COVID-19 data: https://demo.pygeoapi.io/covid-19/collections?f=html. Some collections there already serve directly from GitHub repos, like Italy. For NL we use/proxy ESRI endpoints, but we would rather serve directly from a/this GH repo.

In theory I could do the work on this issue, but I am already quite occupied with the pygeoapi part...

Add row with unknown location

A couple of records don't have a location (or the cases per location don't add up to the reported total). We might want to add an additional row/column to the data for unknown locations.

Create overview page with dashboards/website using CoronaWatchNL data

Thank you all for sending us emails with dashboards, visualizations, and gifs made with CoronaWatchNL data. We love to see them.

At the moment, we are receiving a lot of emails on this and are having a hard time answering them. It might be an idea to make a separate page with an overview of these initiatives (and integrate the Interesting links section of the README). Is there anyone interested in setting this up?

Our website uses data from CoronaWatchNL

For our website (covid-analytics.nl), RIVM and LCPS data obtained through CoronaWatchNL is used. How would you like me to reference CoronaWatchNL? Currently all charts that use RIVM and LCPS data through CoronaWatchNL point to RIVM and LCPS directly. Would a mention on the charts explanation page or on some sort of "about us" page be acceptable? Or perhaps some sort of CoronaWatchNL logo somewhere on the page?

Error in dataset on e.g. row 18

In the file rivm_corona_in_nl.csv I notice some lines which don't specify a "gemeente" (municipality), e.g. row 18:
2020-03-02 | (empty) | -1 | (empty) | 8

There are a few more of these.

How should I interpret these lines, please?

Thanks
Patrick

Amsterdam and Eindhoven always have the same number of infected

In https://raw.githubusercontent.com/J535D165/CoronaWatchNL/master/data/rivm_NL_covid19_municipality_range.csv the number of infections for Amsterdam and Eindhoven is always the same. Even though these numbers are reported in ranges, this still seems a bit suspicious.

These are the rows for the last couple of days, the values in the last two columns are always the same:

Datum,Gemeentenaam,Gemeentecode,Provincienaam,Type,Aantal_min,Aantal_max
2020-04-07,Amsterdam,363,Noord-Holland,Totaal,70.0,115.0
2020-04-07,Eindhoven,772,Noord-Brabant,Totaal,70.0,115.0
2020-04-08,Eindhoven,772,Noord-Brabant,Totaal,70.0,115.0
2020-04-09,Amsterdam,363,Noord-Holland,Totaal,70.0,115.0
2020-04-09,Eindhoven,772,Noord-Brabant,Totaal,70.0,115.0
2020-04-10,Amsterdam,363,Noord-Holland,Totaal,115.0,185.0
2020-04-10,Eindhoven,772,Noord-Brabant,Totaal,115.0,185.0
2020-04-11,Amsterdam,363,Noord-Holland,Totaal,115.0,185.0
2020-04-11,Eindhoven,772,Noord-Brabant,Totaal,115.0,185.0
2020-04-12,Amsterdam,363,Noord-Holland,Totaal,115.0,185.0
2020-04-12,Eindhoven,772,Noord-Brabant,Totaal,115.0,185.0
2020-04-13,Amsterdam,363,Noord-Holland,Totaal,90.0,160.0
2020-04-13,Eindhoven,772,Noord-Brabant,Totaal,90.0,160.0
2020-04-14,Amsterdam,363,Noord-Holland,Totaal,90.0,160.0
2020-04-14,Eindhoven,772,Noord-Brabant,Totaal,90.0,160.0
2020-04-15,Amsterdam,363,Noord-Holland,Totaal,90.0,160.0
2020-04-15,Eindhoven,772,Noord-Brabant,Totaal,90.0,160.0
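
A quick way to confirm the observation is to group the rows by date and compare the two ranges. The sketch below uses a few of the rows listed above; the function names are invented for this example. One possible explanation, offered only as a guess, is that both cities simply fall into the same range (bin) on the source map.

```python
import csv
import io
from collections import defaultdict

# A subset of the rows quoted above.
rows_text = """Datum,Gemeentenaam,Gemeentecode,Provincienaam,Type,Aantal_min,Aantal_max
2020-04-07,Amsterdam,363,Noord-Holland,Totaal,70.0,115.0
2020-04-07,Eindhoven,772,Noord-Brabant,Totaal,70.0,115.0
2020-04-08,Eindhoven,772,Noord-Brabant,Totaal,70.0,115.0
2020-04-10,Amsterdam,363,Noord-Holland,Totaal,115.0,185.0
2020-04-10,Eindhoven,772,Noord-Brabant,Totaal,115.0,185.0
"""


def ranges_by_date(text):
    """Map each date to {municipality: (Aantal_min, Aantal_max)}."""
    by_date = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        by_date[row["Datum"]][row["Gemeentenaam"]] = (
            float(row["Aantal_min"]),
            float(row["Aantal_max"]),
        )
    return by_date


def dates_with_identical_ranges(by_date, first, second):
    """Dates on which both municipalities are present with the same range."""
    return [
        date
        for date, ranges in sorted(by_date.items())
        if first in ranges and second in ranges and ranges[first] == ranges[second]
    ]


by_date = ranges_by_date(rows_text)
print(dates_with_identical_ranges(by_date, "Amsterdam", "Eindhoven"))
```

Dates where only one of the two municipalities is listed, such as 2020-04-08 in this subset, are skipped by the comparison.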

Exclude 'buitenland' in total stats

Seems like RIVM excludes the (single?) person living abroad in their statistics. Therefore, our totals are too high. Need to remove this one from our counts imo.

Another attempt at animating new cases / help with collecting data?

Hi there,
I created https://corona-map-nl.web.app/
I've separated the new hospital intakes and the new cases; it looks a bit like what you guys made.
You can also click on a municipality for historic data.
And if you click on the country icon, you see the complete national history.

I've also got quite a lot more structured data by parsing the RIVM JavaScript; that way I don't have to deal with the PDF files. I also process the data from NICE and LCPS.

At the moment it takes me about 10 minutes to extract all the data and convert it to the format my app uses. Maybe I can help with collecting part of the data?

Also, has anybody noticed that the chart "Aantal overledenen naar datum van overlijden" at https://www.rivm.nl/coronavirus-covid-19/grafieken has the value 109 for the 25th of March, while the pdf shows 108 for that date? The total used by RIVM is the total in the PDF, so I assume that's the correct total. I noticed they have an off-by-one error every day for that date...

Let me know if I can be of help,

Best regards,
Jeroen

Data changes by RIVM - 31 March 2020

Hello Jonathan,

Quick heads up! As you will find on the RIVM website, they changed the data from confirmed cases to the number of people hospitalized in the municipalities.

Kind regards,
Jim

Missing daterange rivm_NL_covid19_total_municipality.csv

Hi Jonathan,

First of all: fantastic effort pulling all of this together. It helps many people from many industries (like mine: publishing) to make the data accessible.

Question about rivm_NL_covid19_total_municipality.csv: the dataset seems to jump from 31 March to 8 April. Do you know a way to get the missing data? I tried to copy/paste from different datasets, but there always seems to be a problem (daily count vs. cumulative count, etc.)

Intensive Care data from NICE

First of all, thank you so much for the effort! It is the best effort at getting all the numbers together that I have seen, and thank you for that.

Regarding the IC data from NICE... I understand you said it was experimental, but I noticed that it has not been updated for 4 days. What's the plan for this? Are there temporary issues with getting the data, or is this dataset being deprecated?

thanks!

Data missing in rivm_NL_covid19_hosp_municipality.csv and rivm_NL_covid19_total_municipality.csv?

Hi,

Previously, in 'rivm_NL_covid19_hosp_municipality.csv' and 'rivm_NL_covid19_total_municipality.csv', data for unknown municipalities/provinces was reported in a row with blank names.

For example:
Datum,Gemeentecode,Gemeentenaam,Provincienaam,Provinciecode,Aantal
2020-04-16,-1,,,,431

I noticed that in today's extract (28 April) this is no longer reported for either table.
Datum,Gemeentecode,Gemeentenaam,Provincienaam,Provinciecode,Aantal
2020-04-28,-1,,,,

Is this missing data, or will this be the new data format, in which unknown municipalities are no longer reported?
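
For anyone who wants to check other dates, a small sketch that lists the rows with an empty Aantal, using the two example rows from this issue (the function name is invented for this example):

```python
import csv
import io

# The two example rows quoted in this issue.
extract = """Datum,Gemeentecode,Gemeentenaam,Provincienaam,Provinciecode,Aantal
2020-04-16,-1,,,,431
2020-04-28,-1,,,,
"""


def rows_with_missing_count(text):
    """Return the rows whose Aantal column is empty."""
    return [
        row
        for row in csv.DictReader(io.StringIO(text))
        if row["Aantal"].strip() == ""
    ]


for row in rows_with_missing_count(extract):
    print(row["Datum"], row["Gemeentecode"])
```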

Thanks in advance!

Sigmoid plotting fails

The plotting of the sigmoid fails with the latest data @vmenger

Sys.setlocale(category="LC_ALL", locale = "en_US.UTF-8")
Inflection expected after 15.6 days, at date 13/03/2020 13:00
/Users/jonathan/.pyenv/versions/anaconda3-2018.12/lib/python3.7/site-packages/pandas/plotting/_matplotlib/converter.py:103: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
	>>> from pandas.plotting import register_matplotlib_converters
	>>> register_matplotlib_converters()
  warnings.warn(msg, FutureWarning)
Traceback (most recent call last):
  File "/Users/jonathan/Dropbox/Projects/Corona/CoronaWatchNL/python_plots.py", line 287, in <module>
    inflection_y = compute_inflection_cases(df, inflection_x)
  File "/Users/jonathan/Dropbox/Projects/Corona/CoronaWatchNL/python_plots.py", line 158, in compute_inflection_cases
    upper_bound = df[df['Dag'] == math.ceil(inflection_x)].iloc[0]['Aantal']
  File "/Users/jonathan/.pyenv/versions/anaconda3-2018.12/lib/python3.7/site-packages/pandas/core/indexing.py", line 1424, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/jonathan/.pyenv/versions/anaconda3-2018.12/lib/python3.7/site-packages/pandas/core/indexing.py", line 2157, in _getitem_axis
    self._validate_integer(key, axis)
  File "/Users/jonathan/.pyenv/versions/anaconda3-2018.12/lib/python3.7/site-packages/pandas/core/indexing.py", line 2088, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
[Finished in 3.5s with exit code 1]
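
A possible reading of the traceback: the inflection is expected after 15.6 days, so the lookup for day `math.ceil(inflection_x)` finds no row in the data yet and `.iloc[0]` raises the IndexError. The sketch below shows one way to guard against that. It is not the code from python_plots.py: a plain dictionary stands in for the DataFrame, the function name is hypothetical, and the linear interpolation between the two neighbouring days is an assumption about what the original function computes.

```python
import math


def inflection_cases_or_none(cases_by_day, inflection_x):
    """Interpolate the case count at a fractional day, or None if out of range."""
    lower_day = math.floor(inflection_x)
    upper_day = math.ceil(inflection_x)
    # Guard: the inflection may lie beyond the last day that has data.
    if lower_day not in cases_by_day or upper_day not in cases_by_day:
        return None
    lower = cases_by_day[lower_day]
    upper = cases_by_day[upper_day]
    return lower + (upper - lower) * (inflection_x - lower_day)


# Illustrative counts per day (made up, not RIVM data).
cases = {14: 100, 15: 200}
print(inflection_cases_or_none(cases, 14.5))
print(inflection_cases_or_none(cases, 15.6))
```

With a guard like this, the plotting code could skip drawing the inflection marker when the result is None instead of failing.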
