
Comments (16)

jgehrcke avatar jgehrcke commented on June 14, 2024

Thank you @mathiasflick for the report.

I had a quick look into logs and found

Traceback (most recent call last):
  File "tools/build-rki-csvs.py", line 499, in <module>
    main()
  File "tools/build-rki-csvs.py", line 52, in main
    df_by_lk, df_berlin_cases_sum, df_berlin_deaths_sum = fetch_and_clean_data()
  File "tools/build-rki-csvs.py", line 176, in fetch_and_clean_data
    assert lacking_wrt_ref == set([11000, 3152])
AssertionError

Looks like the set of Amtliche Gemeindeschlüssel (AGS, official municipality keys) changed in the RKI data set once again. In the past, that has always been a human error somewhere in the pipeline. The code might be overly strict. I might be able to precisely understand and fix this tomorrow. Hopefully.
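A less strict variant of that check could log unexpected gaps instead of aborting the whole build. This is only a sketch, not the code in tools/build-rki-csvs.py: the names `EXPECTED_MISSING` and `check_ags_coverage` are hypothetical, and it assumes that `lacking_wrt_ref` means "AGS present in the reference set but absent from the fetched RKI data".

```python
import logging

log = logging.getLogger(__name__)

# AGS known to be absent from the RKI data, taken from the assertion above.
EXPECTED_MISSING = {11000, 3152}


def check_ags_coverage(reference_ags, observed_ags, expected_missing=EXPECTED_MISSING):
    """Return the AGS that are missing unexpectedly, and log them.

    Hypothetical helper: warns instead of raising AssertionError.
    """
    # Assumption: "lacking with respect to reference" = reference minus observed.
    lacking_wrt_ref = set(reference_ags) - set(observed_ags)
    unexpected = lacking_wrt_ref - set(expected_missing)
    if unexpected:
        log.warning("AGS unexpectedly missing from RKI data: %s", sorted(unexpected))
    return unexpected
```

The caller can then decide whether a non-empty result should stop the pipeline or merely be reported.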

from covid-19-germany-gae.

jgehrcke avatar jgehrcke commented on June 14, 2024

Data for this Landkreis have recently gone missing:

  "16056": {
    "name": "SK Eisenach",
    "state": "Thüringen",
    "lat": 50.9833,
    "lon": 10.3167,
    "population": 42250
  },


jgehrcke avatar jgehrcke commented on June 14, 2024

I may want to remove the lacking_wrt_ref check and update csv-epsilon-merge.py so that the base set may contain more columns than the extension set, and then forward-fill those columns.
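The forward-fill idea could look roughly like the following. It is a standard-library sketch, not the actual csv-epsilon-merge.py logic: `merge_with_forward_fill` is a hypothetical name, rows are modeled as plain dicts, and the dates in the usage example are illustrative.

```python
def merge_with_forward_fill(base_rows, ext_rows):
    """Append ext_rows to base_rows, keeping the base set's columns.

    A column that exists in the base set but is missing from an
    extension row is filled with the last known value (forward-fill).
    Columns that exist only in the extension set are dropped.
    """
    if not base_rows:
        return [dict(row) for row in ext_rows]

    columns = list(base_rows[0].keys())
    merged = [dict(row) for row in base_rows]
    for row in ext_rows:
        last = merged[-1]
        merged.append({col: row.get(col, last[col]) for col in columns})
    return merged


# Usage: the extension set no longer has a column for AGS 16056.
base = [{"time": "2021-09-11", "16056": 1975, "16063": 8579}]
ext = [{"time": "2021-09-12", "16063": 10572}]
merged = merge_with_forward_fill(base, ext)
print(merged[-1])
```

Forward-filling keeps the column shape stable for downstream consumers, at the cost of carrying a stale value forward.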


jgehrcke avatar jgehrcke commented on June 14, 2024

On vacation. Didn't get to this yet. Sorry about that :/


jgehrcke avatar jgehrcke commented on June 14, 2024

I have addressed this in #1827.


jgehrcke avatar jgehrcke commented on June 14, 2024

I have looked at the data more closely to better understand what happened. The fact that 16056 disappeared from the RKI data set made me 'hope' that reporting for this Landkreis was merged with another Landkreis.

Indeed, there is a pretty suspicious jump in the case number for Landkreis 16063 at the time when the case count for Landkreis 16056 stopped changing:

Screenshot from 2021-10-20 13-44-07

That jump is specifically from 8579 to 10572:

>>> 10572 - 8579
1993

The last reported case count value for Landkreis 16056 was 1975.

I think we can safely conclude that on September 12, reporting for Landkreise 16056 and 16063 was merged, and reported together under AGS 16063.
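The reasoning above can be written down as a small plausibility check: if the jump in the absorbing Landkreis is about as large as the last value of the vanished one, a merge is a likely explanation. This is a sketch only. The function name `looks_like_merge` and the 5 % tolerance are assumptions for illustration, not something the pipeline uses.

```python
def looks_like_merge(absorbing_before, absorbing_after, vanished_last, tolerance=0.05):
    """Heuristic: does the jump roughly equal the vanished series' last value?

    The remainder (jump minus vanished_last) is attributed to ordinary new
    cases; it must be non-negative and small relative to vanished_last.
    """
    jump = absorbing_after - absorbing_before
    remainder = jump - vanished_last
    return 0 <= remainder <= tolerance * vanished_last


# Numbers from this thread: 16063 jumped from 8579 to 10572, 16056 ended at 1975.
print(looks_like_merge(8579, 10572, 1975))
```

With these numbers the jump is 1993, which leaves a remainder of 18 cases that would count as regular growth in the combined area.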


jgehrcke avatar jgehrcke commented on June 14, 2024

With the solution from #1827 I have now retained Landkreis 16056 in the CSV files, simply carrying forward the last known value (1975). That's incorrect: the value should drop to 0 so that the sum over all Landkreise does not count these cases twice. Given the relatively small number, though, I think I will just leave this as-is. Feedback appreciated.


jgehrcke avatar jgehrcke commented on June 14, 2024

I have just looked at the columns 16056 and 16063 in the RL data set. They have seemingly been synced a while ago: they contain the same values for the entire time range of interest. (That is, the sum is wrong there, too.)


jgehrcke avatar jgehrcke commented on June 14, 2024

The two Landkreise in question:

  "16056": {
    "name": "SK Eisenach",
    "state": "Thüringen",
    ...
  },
  "16063": {
    "name": "LK Wartburgkreis",
    "state": "Thüringen",
    ...
  },

on a map:
Screenshot from 2021-10-20 13-57-44

(from https://www.bik-gmbh.de/download/Gebietsreform_Thueringen_zum_GS1906.pdf)


jgehrcke avatar jgehrcke commented on June 14, 2024

So, I think it's fair to say that the case numbers for Eisenach (a kreisfreie Stadt, i.e. a city that does not belong to a district) are now reported as part of Wartburgkreis, which might make sense geographically and organizationally.


mathiasflick avatar mathiasflick commented on June 14, 2024

Some research into local reporting of corona-related indicators (e.g. for Eisenach and Wartburgkreis) clearly supports your assumption, although I was not able to find any kind of official confirmation. Probably it is a politically motivated move in order to get "better" (i.e. lower) numbers by averaging the high one out ... But that is just my personal opinion!
Anyway, this kind of "summarization" does create problems for data processing in dependent systems, leaving zero values and/or grey areas, as in the RKI dashboard:

Screenshot 2021-10-23 at 15-23-28 RKI COVID-19 Germany

By the way, the zero for Ludwigslust-Parchim is caused by a hacking incident: they are currently not able to deliver data ...
Source: https://www.kreis-lup.de/corona/

Greetings from Cologne
Mathias


jgehrcke avatar jgehrcke commented on June 14, 2024

Thank you Mathias for the additional insight! Huh. :)


jgehrcke avatar jgehrcke commented on June 14, 2024

RL did drop the data columns for Landkreis 16056, and that required further patches, done in #1842.

Both the RL and RKI heatmaps now show 16056 and 16063 together, both using the data from 16063.
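One simple way to express "16056 uses the data from 16063" is an alias table that is consulted before any lookup. This is a sketch of the idea, not the patch from #1842; `AGS_ALIASES` and `resolve_ags` are hypothetical names.

```python
# Map an AGS that is no longer reported to the AGS that now carries its data.
AGS_ALIASES = {"16056": "16063"}


def resolve_ags(ags):
    """Return the AGS whose data column should be used for the given AGS."""
    return AGS_ALIASES.get(ags, ags)


# Both map regions end up reading the same data column.
for region in ("16056", "16063"):
    print(region, "->", resolve_ags(region))
```

Keeping the mapping in one place makes it easy to extend if further Landkreise are merged.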


mathiasflick avatar mathiasflick commented on June 14, 2024

Perfect! Thank you so much for your work!
Now I need to start my own upstream patching ...
Greetings from Cologne
Mathias


mathiasflick avatar mathiasflick commented on June 14, 2024

After a little bit of research I have probably found the reason for the unexpected change:
According to information provided by the state of Thüringen, Eisenach was officially made part of the Wartburgkreis (effective as of 2021-07-01).
Source: https://statistik.thueringen.de/datenbank/gemauswahl.asp
A remaining problem for me (I just do not remember ...) is where we get the population from (ags.json), whether the change is already incorporated there (important for the 7di computation), and when officially updated maps (shapefiles) will be available.
Thank you again and greetings from Cologne
Mathias


jgehrcke avatar jgehrcke commented on June 14, 2024

> A problem remaining for me (I just do not remember ...) is, where we get the population from (ags.json) and whether the change is already incorporated there

Hey Mathias. Ouch. Thank you for that reminder. I will have to double-check, but it's likely that the 7di numbers have been a little off for 16063 because I didn't think this through before. Thank you!
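To illustrate why the population matters: the 7-day incidence (7di) is the number of new cases over the last seven days per 100,000 inhabitants, so merged case counts divided by the unmerged population come out too high. The sketch below assumes a list of daily cumulative case counts. The population of 16056 (42250) is the value shown earlier in this thread; the population used for 16063 and the case series are made-up placeholders, not real data.

```python
def seven_day_incidence(cumulative_cases, population):
    """New cases over the last 7 days per 100,000 inhabitants.

    cumulative_cases: daily cumulative counts, oldest first (>= 8 values).
    """
    if len(cumulative_cases) < 8:
        raise ValueError("need at least 8 daily values")
    new_cases = cumulative_cases[-1] - cumulative_cases[-8]
    return new_cases / population * 100_000


POP_16056 = 42_250               # from the ags.json excerpt in this thread
POP_16063_PLACEHOLDER = 120_000  # hypothetical value, for illustration only

# Hypothetical merged case series reported under AGS 16063.
cases = [10_000, 10_020, 10_040, 10_060, 10_080, 10_100, 10_120, 10_140]

too_high = seven_day_incidence(cases, POP_16063_PLACEHOLDER)
corrected = seven_day_incidence(cases, POP_16063_PLACEHOLDER + POP_16056)
print(round(too_high, 1), round(corrected, 1))
```

Whether ags.json already reflects the merged population is exactly the open question raised above.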

Keeping track of this topic here: https://github.com/opstrace/opstrace/issues/1472
