rki-vaccination-scraper's Introduction

home: true
title: Guide
sidebar: auto
lang: en-US
footer: Made with ❤️ in Hamburg/Zurich

RKI Vaccination Monitoring

⚠️ Currently undergoing refactoring due to a major change in the data source; data updates will resume soon. See #13 for details.

A project to archive and track the state-level vaccination data published daily by the Robert-Koch-Institut (RKI).

🏗️ A work in progress.

Experimental visualization: Observable Notebook (https://observablehq.com/@n0rdlicht/vaccination-tracker-germany)

The API

A simple Vercel-hosted API is available at api.vaccination-tracker.app/v1/.

A plain request returns the full de-vaccinations dataset, paginated at 1000 entries per page.

Resources

Data Package resources: the resources listed in datapackage.json are exposed and can be accessed via

GET https://api.vaccination-tracker.app/v1/<resource-name>

For example, GET https://api.vaccination-tracker.app/v1/de-vaccinations. Currently available resources:

  • de-vaccinations: historical data as JSON
  • de-vaccination-current: currently published version as JSON
  • de-population-current: population data from Destatis
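A minimal sketch of requesting one of these resources with Python's standard library. It assumes the API is reachable (given the refactoring notice at the top, it may currently be offline) and that the response body is JSON, as in the example further down; the helper names are hypothetical.

```python
import json
import urllib.request

# Base URL as given in this README; resource names come from datapackage.json.
BASE_URL = "https://api.vaccination-tracker.app/v1/"


def resource_url(resource_name: str) -> str:
    """Build the GET URL for a resource such as "de-vaccinations"."""
    return BASE_URL + resource_name


def fetch_resource(resource_name: str) -> dict:
    """GET a resource and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(resource_url(resource_name)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_resource(...) to actually hit the API.
    print(resource_url("de-vaccinations"))
```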

Changelog

  • Jan 4, 2021: Adds quote and population fields for comparison between different geos
  • Jan 18, 2021: Complete refactor due to overhauled Excel structure and additional data

Pagination

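Add the query parameters page and per_page, e.g. ?page=2&per_page=100 (defaults: 1 and 1000).

Adjust these to your request: when a result set is larger than per_page, increment page to retrieve the remaining entries.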
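The following sketch iterates over all pages of a resource. It assumes, since the README documents neither a total count nor a "next" link, that a page with fewer than per_page entries is the last one, and it starts at page 1, the documented default (the sample response below reports "page": 0, so check the start index against the live API). The fetch parameter is a hypothetical hook so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://api.vaccination-tracker.app/v1/"


def page_url(resource: str, page: int, per_page: int) -> str:
    """Build a paginated URL using the documented page/per_page parameters."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{BASE_URL}{resource}?{query}"


def http_get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(
    resource: str,
    per_page: int = 1000,
    fetch: Callable[[str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every record of a resource, page by page.

    Assumption: a short (or empty) page marks the end of the data.
    """
    page = 1
    while True:
        records = fetch(page_url(resource, page, per_page)).get("data", [])
        yield from records
        if len(records) < per_page:
            break
        page += 1
```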

Filter / Values for de-vaccinations and de-vaccinations-current

Filter by column with <column>=<value>, e.g. ?key=sum&geo=Hamburg to get only the summary values for the state of Hamburg.

From Jan 18 onward, most columns can be suffixed with _initial or _booster for more detailed values on initial vaccinations and booster shots.
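Filters are ordinary query parameters, so they can be built with urllib.parse.urlencode, which also percent-encodes state names containing non-ASCII characters. This is a sketch; filtered_url is a hypothetical helper, and it takes a mapping rather than keyword arguments because some column names (such as iso-cc) are not valid Python identifiers.

```python
import urllib.parse
from typing import Mapping

BASE_URL = "https://api.vaccination-tracker.app/v1/"


def filtered_url(resource: str, filters: Mapping[str, str]) -> str:
    """Build a request URL with <column>=<value> filters as query parameters."""
    query = urllib.parse.urlencode(filters)  # keeps insertion order, percent-encodes values
    return f"{BASE_URL}{resource}?{query}" if query else f"{BASE_URL}{resource}"


# Summary values for Hamburg, as in the filter example above.
print(filtered_url("de-vaccinations", {"key": "sum", "geo": "Hamburg"}))
```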

Values for key

  • sum: All vaccinations
  • sum_initial_biontech / sum_initial_moderna: Number of initial-round vaccinations with the BioNTech or Moderna vaccine, respectively
  • delta_vortag: Change from the previous reporting day
  • quote_initial / quote_booster: Vaccination rate per 100 residents, only on entries where key is sum
  • ind_alter: Indication by age
  • ind_med: Indication by medical condition
  • ind_prof: Indication by profession
  • ind_pflege: Indication as a resident of a nursing home

Other filters

  • geo: German state name or Germany for national data
  • geotype: either state (all states) or nation (entries for Germany as a whole)
  • population: number of residents in geo
  • Note: filtering by date is not supported yet; dates are only available in de-vaccinations
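Until date filtering is supported server-side, records can be filtered client-side after download. A minimal sketch, assuming the ISO 8601 date format seen in the example response (e.g. "2020-12-27T00:00:00.000Z"); only the calendar date is compared, and filter_by_date is a hypothetical name.

```python
from datetime import date
from typing import Iterable, List


def filter_by_date(records: Iterable[dict], start: date, end: date) -> List[dict]:
    """Keep records whose "date" falls within [start, end], both inclusive."""
    selected = []
    for record in records:
        day = date.fromisoformat(record["date"][:10])  # "2020-12-27T00:..." -> 2020-12-27
        if start <= day <= end:
            selected.append(record)
    return selected
```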

Example

Example request and response for all vaccinations by medical condition and publishing date in the state of Bavaria:

curl --request GET 'https://api.vaccination-tracker.app/v1/de-vaccinations?key=ind_med&geo=Bayern'

{
  "dataset": "de-vaccinations",
  "time": "2021-01-04T12:49:20",
  "last_update": "2021-01-03T11:32",
  "last_published": "2021-01-04T11:32",
  "applied_filter": [{
    "geo": "Bayern"
  }, {
    "key": "ind_med"
  }],
  "per_page": 1000,
  "page": 0,
  "data": [{
    "date": "2020-12-27T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 68,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2020-12-28T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 91,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2020-12-29T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 214,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2020-12-30T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 424,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2020-12-31T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 718,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2021-01-01T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 718,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2021-01-02T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 1091,
    "population": 13124737,
    "quote": null
  }, {
    "date": "2021-01-03T00:00:00.000Z",
    "geo": "Bayern",
    "key": "ind_med",
    "iso-cc": "DE",
    "geotype": "state",
    "value": 1280,
    "population": 13124737,
    "quote": null
  }]
}
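The values in this response rise day by day, which suggests they are running totals. Assuming that, and assuming the response is filtered to a single geo/key pair as here, a short sketch (daily_increments is a hypothetical helper) converts them into day-over-day increments:

```python
from typing import Dict, List, Tuple


def daily_increments(response: Dict) -> List[Tuple[str, int]]:
    """Convert cumulative "value" fields into (date, increment) pairs.

    Assumes one geo/key pair per response and that "value" is a running total.
    """
    records = sorted(response["data"], key=lambda record: record["date"])
    return [
        (current["date"][:10], current["value"] - previous["value"])
        for previous, current in zip(records, records[1:])
    ]
```

For the Bayern ind_med series above this yields increments of 23, 123, 210, 294, 0, 373 and 189 for 2020-12-28 through 2021-01-03; the 0 on 2021-01-01 may simply mean no new figure was reported that day.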

The Data Package

  1. A daily GitHub Action runs a Frictionless Data Package Pipeline as defined in pipeline-spec.yaml. It is currently triggered manually and is to be automated based on changes to the RKI website
  2. This results in a combined CSV and a current-day CSV
  3. Metadata can be checked and the package validated by running data validate on datapackage.json
  4. A D3.js visualization of the data
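The real merge is defined in pipeline-spec.yaml. Purely as a hypothetical illustration of how a current-day CSV can be folded into a combined CSV, here is a standard-library sketch that identifies rows by (date, geo, key) and lets current-day rows replace older ones. The column names are an assumption, taken from the JSON fields of the API response and the column order of the CSV rows quoted in the issues below; check them against datapackage.json.

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumed column order; verify against datapackage.json before relying on it.
FIELDS = ["geo", "iso-cc", "geotype", "key", "value", "population", "quote", "date"]


def merge_rows(combined: List[dict], current_day: List[dict]) -> List[dict]:
    """Merge current-day rows into historical rows.

    Rows are identified by (date, geo, key); a current-day row replaces an
    existing row with the same identity, all other rows are kept.
    """
    merged: Dict[Tuple[str, str, str], dict] = {}
    for row in combined + current_day:
        merged[(row["date"], row["geo"], row["key"])] = row
    return sorted(merged.values(), key=lambda r: (r["date"], r["geo"], r["key"]))


def to_csv(rows: List[dict]) -> str:
    """Serialize rows with a header line in the assumed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```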

Updating the data

Either trigger the GitHub Action "Update Data Package" or run the pipeline locally:

# Get the code
git clone https://github.com/n0rdlicht/rki-vaccination-scraper.git
cd rki-vaccination-scraper

# Activate a virtual environment and install dependencies
virtualenv env
. env/bin/activate
pip install -Ur requirements.txt

# Run all pipelines
make

# or
make update # Fetch today's data and merge it with the existing data

# Validate Data Package (requires goodtables)
make validate

# To deactivate virtual environment
deactivate

rki-vaccination-scraper's People

Contributors

n0rdlicht

rki-vaccination-scraper's Issues

Numbers for 2021-04-26 seem to be wrong

For example, on 2021-04-26:
Germany,DE,nation,sum_initial_moderna,3597780,83166711,7.260192121821434,2021-04-26
while on 2021-04-25:
Germany,DE,nation,sum_initial_moderna,1069222,83166711,7.166621029416445,2021-04-25
which is about 2.5 million less. The RKI Excel file reports 1099371 for 2021-04-26.
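A simple plausibility check could catch jumps like this before they are published. The sketch below flags a day when a cumulative series decreases or grows by more than a factor of max_ratio; the function name and the 1.5 threshold are arbitrary assumptions, not an RKI rule, and would need tuning per key.

```python
from typing import List, Tuple


def suspicious_jumps(series: List[Tuple[str, int]], max_ratio: float = 1.5) -> List[str]:
    """Return dates whose cumulative value looks implausible.

    A day is flagged if its value is below the previous day's value, or more
    than max_ratio times it. The series must be sorted by date.
    """
    flagged = []
    for (_, previous), (day, current) in zip(series, series[1:]):
        if current < previous or (previous > 0 and current > previous * max_ratio):
            flagged.append(day)
    return flagged
```

With the values quoted above, ("2021-04-25", 1069222) followed by ("2021-04-26", 3597780) is flagged, while the RKI figure of 1099371 for 2021-04-26 passes.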

Migrate to frictionless-py framework

To future-proof and simplify the code, it should be migrated to frictionless-py Pipelines/Transformations. Since the API already uses frictionless-py, this would greatly simplify dependencies.

Keys in use from / until

Publish a document showing when each key was added or deprecated, and hence for which period it has data associated with it.
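Such a document could be generated from the data itself. A minimal sketch, assuming each record carries "key" and "date" fields as in the API response (key_date_ranges is a hypothetical name); note that the last date of a deprecated key is only an upper bound on when it stopped being published.

```python
from typing import Dict, Iterable, Tuple


def key_date_ranges(records: Iterable[dict]) -> Dict[str, Tuple[str, str]]:
    """Map each key to the first and last date on which it has data.

    Dates are truncated to YYYY-MM-DD, so ISO 8601 strings sort chronologically.
    """
    ranges: Dict[str, Tuple[str, str]] = {}
    for record in records:
        key, day = record["key"], record["date"][:10]
        if key in ranges:
            first, last = ranges[key]
            ranges[key] = (min(first, day), max(last, day))
        else:
            ranges[key] = (day, day)
    return ranges
```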

Migration from Goodtables to Frictionless Repository

Hi @n0rdlicht,

Goodtables.io is going to be deprecated in 2022; we therefore recommend migrating to the new Frictionless Repository (https://repository.frictionlessdata.io/) continuous data validation system provided by Frictionless Data. The core difference between the two projects is that Frictionless Repository doesn't rely on any hosted infrastructure except for GitHub Actions, which makes it more sustainable. It also uses the newer Frictionless Framework under the hood, which brings many improvements over the old goodtables-py library in terms of validation quality and performance.

As usual, if you have any doubts or questions, please come and ask in our Discord chat or in the GitHub Discussion.

Failing pipeline runner

Due to the completely new structure of the underlying RKI Excel file, the current pipeline is broken.

  • There are now two relevant tabs instead of one: one for sums and one for the different indications
  • New: Daily numbers for initial and booster vaccinations
  • New: Daily numbers for different vaccine types (currently BioNTech and Moderna)

Backfill missing values

Some values are not yet in de-vaccinations due to missing keys. Backfilling will be done from git history.
