Comments (10)
Also @DTPOTO ... why would there be more total positive tests in the reporting date methodology? Is that just due to the time of day the results are generated?
I agree that the NYC Health Dept should supply both data sets. The time of day has a minor impact, more so when you are using the Report-Date methodology. The Reporting-Date methodology shows higher numbers because it is focused on the current date (today). The data files are being restated by BACK-DATING. It's a little like the government revising last month's unemployment number. The TOTAL number of cases is identical; the only difference is the date to which they are assigned. @joansobo demonstrated that the total cases were the same, and was able to calculate a new REPORTED cases figure by comparing Case-Hosp-Deaths.csv over two different days. The issue is Daily Restatement. Getting the latest version of Case-Hosp-Deaths.csv may be your best bet in terms of predictive modeling. I don't like either method, but we may get that clarity or better information in a timely way.
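For anyone who wants to try the two-day comparison, here is a minimal sketch of the idea, not necessarily how @joansobo did it. It assumes each day's copy of the file has been reduced to a mapping from diagnosis date to case count; the function name `newly_reported` and all the numbers are made up for illustration and are not NYC data.

```python
def newly_reported(older, newer):
    """Compare two daily snapshots keyed by diagnosis date.

    Returns the per-date revisions (back-dated additions) and their sum,
    which approximates the cases REPORTED between the two snapshots.
    """
    revisions = {}
    for date in sorted(set(older) | set(newer)):
        delta = newer.get(date, 0) - older.get(date, 0)
        if delta != 0:
            revisions[date] = delta
    return revisions, sum(revisions.values())


# Made-up snapshots: diagnosis date -> cases published for that date.
snapshot_day1 = {"2020-03-28": 100, "2020-03-29": 80, "2020-03-30": 40}
snapshot_day2 = {"2020-03-28": 110, "2020-03-29": 100,
                 "2020-03-30": 70, "2020-03-31": 30}

revisions, reported_between = newly_reported(snapshot_day1, snapshot_day2)
print(revisions)          # how much each diagnosis date was restated
print(reported_between)   # 90
```

In this toy case only 30 of the 90 newly reported cases land on the newest diagnosis date; the rest are back-dated onto earlier days, which is the restatement effect described above.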
from coronavirus-data.
They don't seem to match for days, though.
This is NYState's historical estimates vs NYC historical estimates.
They will not match until after 7 pm, and only for a moment; then they will need to be corrected again the next day.
I don't think this is a timing issue. The data shown on the NYS site usually matches what is presented during a Governor briefing. For the past 3 days, the number shown for NYC during the noonish briefing has been ~2000 higher than the evening NYC number.
I'm with @psylum - the afternoon numbers seem higher, and the implied growth rates are very different. Here's a bar chart from the NYC data (I took the last three points and added them to the 31st).
Whereas on the wiki, NYState has an implied growth rate of 13-10% over the last few days. There seem to be big differences.
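To make "implied growth rate" concrete, here is a small sketch that assumes it means the day-over-day percentage change in cumulative cases. The helper name `daily_growth_rates` and the counts are invented for illustration and are not taken from either data source.

```python
def daily_growth_rates(cumulative):
    """Day-over-day growth of a cumulative series, as fractions."""
    return [(today - yesterday) / yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative case counts for three consecutive days.
cumulative_cases = [1000, 1130, 1243]

rates = daily_growth_rates(cumulative_cases)
for rate in rates:
    print(f"{rate:.0%}")
```

Running the same calculation on both the city and state series for the same dates would show how far the two implied rates diverge.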
Hello all, please review the Issue thread started when the NYC Health Dept began using GitHub as data storage for their web page ("Counts vary differently from Yesterday"). At the same time as the switch to GitHub, the Health Dept changed the reporting methodology, using "Diagnosis Date" instead of "Reporting Date". I am sure that the State Health Department is stuck with just the "Reporting Date" because they are collecting from too many different sources. The City is now attempting to show the NEW cases as of the date of diagnosis. The original Diagnosis occurs when the doctor suspects the patient has the virus and orders the TEST. The Lab provides data on the Reporting Date, and the Lab results may take 3 to 14 days (OUCH). I have looked at the LAG time between Diagnosis Date and Reporting Date, see here
I am using the level of Back-Dating Revisions as of the Diagnosis Date as a surrogate for Lab-Results Lag time. The cumulative graph suggests that 3 days back is under-reported by half and that 4 days back is under-reported by a third. At the current lab turn-around rate, it takes a week before you have a handle on today's real number.
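As a rough sketch of how the under-reporting by days back could be computed (one possible approach, not necessarily the one behind the graph): take the count shown for a single diagnosis date in snapshots taken 3, 4 and 7 days later, and compare each with the settled count. The function name `underreporting_by_lag` is hypothetical, and the counts are invented, chosen only to match the one-half and one-third figures above.

```python
def underreporting_by_lag(counts_by_lag, final_count):
    """Fraction of the eventual cases still missing at each lag.

    counts_by_lag maps "days back" to the count published for one
    diagnosis date in the snapshot taken that many days later.
    """
    return {lag: 1 - count / final_count
            for lag, count in counts_by_lag.items()}


# Made-up counts for one diagnosis date, seen 3, 4 and 7 days later.
counts_by_lag = {3: 150, 4: 200, 7: 300}

missing = underreporting_by_lag(counts_by_lag, final_count=300)
for lag, fraction in sorted(missing.items()):
    print(f"{lag} days back: {fraction:.0%} still missing")
```

Averaging these fractions over many diagnosis dates would give a steadier estimate than any single date.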
This can be unsettling if you are only focused on yesterday's new cases. All new cases being reported are OLD news (whether coming from the state or the city). The reality is that using Diagnosis Date may be the better method for predicting the APEX. But changing the reporting methodology without an adequate explanation sows the seeds of distrust and certainly undermines everyone's predictive models.
@madeka @ptulin @psylum @mmontesanonyc
This is causing a whole lot of confusion. It's the responsibility of NYC to make these data differences crystal clear, and to provide both sets of data.
Data from NYC and NYS will always be different for a number of reasons, including the time of day the dataset is cut, de-duplication procedures that differ between the agencies, and data cleaning and QA procedures.