Comments (16)
Thank you @mathiasflick for the report.
I had a quick look at the logs and found:
Traceback (most recent call last):
  File "tools/build-rki-csvs.py", line 499, in <module>
    main()
  File "tools/build-rki-csvs.py", line 52, in main
    df_by_lk, df_berlin_cases_sum, df_berlin_deaths_sum = fetch_and_clean_data()
  File "tools/build-rki-csvs.py", line 176, in fetch_and_clean_data
    assert lacking_wrt_ref == set([11000, 3152])
AssertionError
Looks like the set of Amtliche Gemeindeschlüssel (AGS, official municipality keys) in the RKI data set changed once again -- in the past, that has always been caused by a human error somewhere in the pipeline. The code might be overly strict. I might be able to precisely understand and fix this tomorrow. Hopefully.
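To illustrate what a less strict check could look like, here is a minimal sketch. It is not the code from tools/build-rki-csvs.py; the function name `check_ags_sets` and its parameters are hypothetical, and only the expected-missing set `{11000, 3152}` is taken from the assertion in the traceback. Instead of failing hard, it reports which AGS are unexpectedly missing so that the caller can decide how to proceed.

```python
import logging

log = logging.getLogger(__name__)


def check_ags_sets(reference_ags, current_ags, expected_missing):
    """Compare the AGS found in the current data set against a reference set.

    Returns a tuple (unexpected_missing, unexpected_new):
    - unexpected_missing: AGS in the reference but not in the current data,
      excluding those that are known and expected to be missing.
    - unexpected_new: AGS in the current data but not in the reference.
    """
    reference = set(reference_ags)
    current = set(current_ags)
    lacking_wrt_ref = reference - current
    unexpected_missing = lacking_wrt_ref - set(expected_missing)
    unexpected_new = current - reference
    if unexpected_missing:
        # Warn instead of asserting, so that the pipeline can keep running.
        log.warning("AGS missing in current data: %s", sorted(unexpected_missing))
    if unexpected_new:
        log.warning("AGS not in reference: %s", sorted(unexpected_new))
    return unexpected_missing, unexpected_new


# Illustrative input only: a tiny reference set and a data set lacking 16056.
missing, new = check_ags_sets(
    reference_ags=[11000, 3152, 16056, 16063],
    current_ags=[16063],
    expected_missing=[11000, 3152],
)
```

Whether a warning is acceptable here, or whether the update should still be aborted, is a design decision: a hard assertion protects against silently publishing damaged data, while a warning keeps the data updates flowing.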
from covid-19-germany-gae.
Data for this Landkreis (district) has recently gone missing:
"16056": {
  "name": "SK Eisenach",
  "state": "Thüringen",
  "lat": 50.9833,
  "lon": 10.3167,
  "population": 42250
},
I may want to remove the lacking_wrt_ref check and update csv-epsilon-merge.py to allow for the base set to contain more columns than the extension set -- and then to forward-fill those columns.
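The forward-fill idea can be sketched as follows. This is not the actual csv-epsilon-merge.py logic; `merge_with_forward_fill` is a hypothetical helper that works on plain lists of dicts (one dict per row), and the row labels and the first-row values are made up for illustration. Columns that exist in the base set but are absent from an extension row get the most recent known value.

```python
def merge_with_forward_fill(base_rows, ext_rows):
    """Append ext_rows to base_rows.

    The base set may contain more columns than the extension set. For every
    column missing in an extension row, the last known value is carried
    forward. Columns unknown to the base set are rejected.
    """
    if not base_rows:
        raise ValueError("base set must contain at least one row")
    columns = list(base_rows[0].keys())
    merged = [dict(row) for row in base_rows]
    last_known = dict(merged[-1])
    for row in ext_rows:
        unexpected = set(row) - set(columns)
        if unexpected:
            raise ValueError(f"columns not in base set: {sorted(unexpected)}")
        # Take the new value if present, else forward-fill the previous one.
        new_row = {col: row.get(col, last_known[col]) for col in columns}
        merged.append(new_row)
        last_known = new_row
    return merged


# Illustrative rows: 16056 is absent from the extension row.
base = [
    {"day": "d1", "16056": 1970, "16063": 8570},
    {"day": "d2", "16056": 1975, "16063": 8579},
]
ext = [{"day": "d3", "16063": 10572}]
merged = merge_with_forward_fill(base, ext)
```

With this input, the last merged row keeps 1975 for column 16056 and takes 10572 for column 16063.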
On vacation. Didn't get to this yet. Sorry about that :/
I have addressed this in #1827.
I have looked at the data more closely to better understand what happened. The fact that 16056 disappeared from the RKI data set made me 'hope' that reporting for this Landkreis was merged with another Landkreis.
Indeed, there is a pretty suspicious jump in the case number for Landkreis 16063 at the time when the case count for Landkreis 16056 stopped changing.
That jump is specifically from 8579 to 10572:
>>> 10572 - 8579
1993
The last reported case count value for Landkreis 16056 was 1975.
I think we can safely conclude that on September 12, reporting for Landkreise 16056 and 16063 was merged, and reported together under AGS 16063.
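The reasoning above can be written down as a small plausibility check. This is only a sketch: the function name and the tolerance of 50 cases are assumptions chosen for illustration, while the three case counts are the ones quoted in this comment. The idea is that the jump in 16063 should roughly equal the last count of 16056, plus a small number of genuinely new cases.

```python
def jump_matches_absorbed_count(before, after, absorbed_last, tolerance):
    """Check whether a jump in one time series is explained by absorbing
    the last known value of another time series.

    Returns (jump, residual, matches), where residual is the part of the
    jump not explained by the absorbed count.
    """
    jump = after - before
    residual = jump - absorbed_last
    return jump, residual, abs(residual) <= tolerance


# 16063 jumped from 8579 to 10572; the last value for 16056 was 1975.
jump, residual, matches = jump_matches_absorbed_count(
    before=8579, after=10572, absorbed_last=1975, tolerance=50
)
print(jump, residual, matches)
```

The residual of 18 is small compared to the jump of 1993, which is consistent with the merge hypothesis if those 18 are ordinary new cases for that day.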
With the solution from #1827 I have now retained Landkreis 16056 in the CSV files, simply forward-filling the last known value (1975). Strictly speaking, that's incorrect: the value should drop to 0 so that the sum over all Landkreise stays correct. Given the relatively small number, though, I think I will just leave this as-is. Feedback appreciated.
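To make the size of the error concrete, here is a small sketch. It assumes, as concluded earlier in this thread, that the count reported under 16063 already includes the Eisenach cases; the helper name `total_cases` is made up, and only the two Landkreise discussed here are summed.

```python
def total_cases(per_landkreis):
    """Sum the case counts over all Landkreise in the given mapping."""
    return sum(per_landkreis.values())


# Forward-filled: 16056 keeps its last known value.
forward_filled = {"16056": 1975, "16063": 10572}
# Zeroed: 16056 drops to 0 once its cases are reported under 16063.
zeroed = {"16056": 0, "16063": 10572}

overcount = total_cases(forward_filled) - total_cases(zeroed)
print(overcount)
```

Under that assumption, the forward-filled variant counts the 1975 Eisenach cases twice in any sum over the Landkreis columns.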
I have just looked at the columns 16056 and 16063 in the RL (Risklayer) data set. They have seemingly been synced a while ago: they contain the same values for the entire time range of interest (that is, the sum is also wrong).
The two Landkreise in question:
"16056": {
  "name": "SK Eisenach",
  "state": "Thüringen",
  ...
},
"16063": {
  "name": "LK Wartburgkreis",
  "state": "Thüringen",
  ...
}
(from https://www.bik-gmbh.de/download/Gebietsreform_Thueringen_zum_GS1906.pdf)
So, I think it's fair to say that the case numbers for Eisenach (a kreisfreie Stadt, i.e. a district-free city) are reported as part of Wartburgkreis, which geographically and organizationally might make sense.
Some research regarding local reporting of corona-related indicators (e.g. for Eisenach and Wartburgkreis) clearly supports your assumption - although I was not able to find any kind of official confirmation. Probably it is a politically motivated move in order to get "better" (i.e. lower) numbers by averaging the high one out ... But that is just my personal opinion!
Anyway - this kind of "summarization" does create problems for data processing in dependent systems, leaving zero values and/or grey areas, e.g. in the RKI dashboard.
By the way, the zero for Ludwigslust-Parchim is caused by a hacking incident - they are not able to deliver ...
Source: https://www.kreis-lup.de/corona/
Greetings from Cologne
Mathias
Thank you Mathias for the additional insight! Huh. :)
RL did drop the data columns for Landkreis 16056, and that required further patches -- done in #1842.
The RL and RKI heatmaps now both show 16056 and 16063 using the data from 16063.
Perfect! Thank you so much for your work!
Now I need to start my own upstream patching ...
Greetings from Cologne
Mathias
After a little bit of research I probably found the reason for the unexpected change:
According to information provided by the state of Thüringen, Eisenach was officially made part of the Wartburgkreis (effective as of 2021-07-01).
Source: https://statistik.thueringen.de/datenbank/gemauswahl.asp
A problem remaining for me (I just do not remember ...) is where we get the population from (ags.json) and whether the change is already incorporated there (important for the 7-day incidence (7di) computation), and when officially updated maps (shapefiles) will be available.
Thank you again and greetings from Cologne
Mathias
> A problem remaining for me (I just do not remember ...) is where we get the population from (ags.json) and whether the change is already incorporated there
Hey Mathias. Ouch. Thank you for that reminder. I will have to double-check, but it's likely that the 7di numbers have been a little off for 16063 because I didn't think this through before. Thank you!
Keeping track of this topic here: https://github.com/opstrace/opstrace/issues/1472
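For context on why the population count matters, here is a minimal sketch of a 7-day incidence calculation. It is not the code used in this repository. The Eisenach population (42250) is the value quoted earlier in this thread; the Wartburgkreis population and the case series are hypothetical placeholders, not real data.

```python
def seven_day_incidence(cumulative_cases, population):
    """New cases over the last 7 days per 100,000 inhabitants.

    cumulative_cases: daily cumulative case counts, oldest first.
    """
    if len(cumulative_cases) < 8:
        raise ValueError("need at least 8 daily cumulative values")
    new_cases = cumulative_cases[-1] - cumulative_cases[-8]
    return new_cases * 100_000 / population


POP_EISENACH = 42250  # from the ags.json excerpt quoted in this thread
POP_WARTBURGKREIS = 117750  # hypothetical placeholder, NOT the real value

# Hypothetical cumulative counts reported under AGS 16063 after the merge:
# 160 new cases within 7 days.
cases_16063 = [10572, 10590, 10615, 10640, 10661, 10690, 10712, 10732]

# Denominator without Eisenach: incidence is overestimated.
incidence_old_pop = seven_day_incidence(cases_16063, POP_WARTBURGKREIS)
# Denominator including Eisenach: matches what the case counts cover.
incidence_new_pop = seven_day_incidence(
    cases_16063, POP_WARTBURGKREIS + POP_EISENACH
)
```

As long as the cases of both areas are reported under 16063 while the denominator only covers the old Wartburgkreis, the computed incidence for 16063 comes out too high.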