Comments (14)
@wilschmidtt, I suggest that, in addition to safety_measures, you include a safety_measures_start_date column, so that the model stays useful as new countries adopt measures, given that so many countries have different measures.
Also, we can make a poll where we ask individuals from all countries to participate and provide information so that we can fill out this data easily. If you write a Google Forms poll, I can send it to friends in Nepal, India, US, Colombia, Chile, Mexico, Australia, Peru, Belgium, and France, and I can also translate the poll to Spanish in order to share it with people from the Latin American region.
All you need is one nerd per country and you are set; this person can become an information source. The poll should also ask people what the source of their data is.
If the problem is data collection, I think we can find the people to help.
Just send the questions in English, and I'll send back the data in CSV or whatever format you want.
Remember, the fewer the questions, the more datapoints.
from data.
Hey William, thanks for sharing -- this is pretty cool! I think that adding all the columns you propose might make the main dataset a bit bloated, but some of them I'd love to add if we can find a reliable source for them. Specifically, I'd like to get a better understanding of where you got the SafetyMeasures data from. If we can get a reliable source for that, we could add a column to the dataset for:
- Unknown (null)
- No measures ("none")
- International travel restricted ("international_travel")
- Local travel restricted ("local_travel")
- Shelter in place enacted ("shelter_in_place")
If you want to, you can open a PR and edit the relevant metadata_*.csv files and fill the Population and SafetyMeasures columns. Unless I missed something, you can infer the other columns that you mentioned from the data itself.
The SafetyMeasures column wasn't fetched from any online source. I looked online for a site that reported this information but I couldn't find anything useful. I simply populated this column by dividing the number of confirmed cases by the population, and when the number of confirmed cases exceeded 0.002% of the population, I changed the SafetyMeasures column from 0 to 1. This method is a bit arbitrary, so I could see why it might not be the best feature to include. I simply chose 0.002% based on observing at what point different locations started to take action. From what I observed, this came right around 0.002% of a location's population being infected by the virus.
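For reference, the threshold rule described above could be sketched roughly like this. This is only a minimal illustration and not the actual code used to build the column: the function name, the field names, and the sample numbers are all hypothetical, and it assumes the comparison is a strict "greater than" on confirmed cases divided by population.

```python
from typing import Optional

# Threshold taken from the comment above: 0.002% of the population.
THRESHOLD_FRACTION = 0.002 / 100


def safety_measures_flag(confirmed: int, population: Optional[int]) -> Optional[int]:
    """Return 1 once confirmed cases exceed the threshold share of the population, else 0.

    Returns None when the population is unknown (or zero), so that a missing
    value is not mistaken for "no measures".
    """
    if not population:
        return None
    return 1 if confirmed / population > THRESHOLD_FRACTION else 0


# Hypothetical numbers for illustration only, not real case counts.
rows = [
    {"date": "2020-03-01", "confirmed": 10, "population": 1_000_000},
    {"date": "2020-03-02", "confirmed": 25, "population": 1_000_000},
]
for row in rows:
    row["safety_measures"] = safety_measures_flag(row["confirmed"], row["population"])
```

With a population of 1,000,000 the threshold works out to 20 confirmed cases, so the first row stays at 0 and the second flips to 1.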
I agree that international_travel, local_travel, and shelter_in_place would all be much more reliable features. The only problem is that I am not sure where such data would be available.
I will open a PR to edit the metadata populations in the meantime.
Actually, I just noticed there is a date on the dataset, so never mind, my suggestion doesn't make sense.
@dataf3l I think your idea is still valid. We can put the safety measures in their own CSV table and then merge it in during the data processing stage. In my opinion the biggest difficulty would be keeping it up to date, since measures are changing very fast across different countries.
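As a rough sketch of what that merge could look like, assuming a separate measures table keyed by country code with a start date column: the column names, the `merge_measures` helper, and the case counts below are hypothetical and are not the repo's actual schema or pipeline. The two start dates are the ones quoted later in this thread.

```python
import csv
import io

# Hypothetical separate safety-measures table (column names are assumptions).
MEASURES_CSV = """key,start_date
CO,2020-03-19
CL,2020-03-22
"""

# Hypothetical excerpt of the main table; the counts are placeholders.
DATA_CSV = """key,date,confirmed
CO,2020-03-18,100
CO,2020-03-20,200
BR,2020-03-20,300
"""


def merge_measures(data_rows, measures_rows):
    """Left-join the measures table onto the data rows by country key."""
    start_by_key = {m["key"]: m["start_date"] for m in measures_rows}
    merged = []
    for row in data_rows:
        start = start_by_key.get(row["key"])  # None when no measures are recorded
        out = dict(row)
        out["safety_measures_start_date"] = start
        # ISO dates (YYYY-MM-DD) sort correctly when compared as strings.
        out["measures_active"] = None if start is None else row["date"] >= start
        merged.append(out)
    return merged


data_rows = list(csv.DictReader(io.StringIO(DATA_CSV)))
measures_rows = list(csv.DictReader(io.StringIO(MEASURES_CSV)))
merged = merge_measures(data_rows, measures_rows)
```

Keeping the measures in their own table means only that small file needs updating when a country changes its policy; the join is recomputed on every processing run.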
@dataf3l this could still be a good idea. Like I said, the 'SafetyMeasures' column is pretty arbitrarily chosen at this point. I couldn't find a good source of data indicating when each location started issuing quarantines. I had to search all over the web, and each bit of information that I found was exclusive to one location, so trying to fill it in for every location would take far too long.
From what I observed, it seemed that right around 0.002% confirmed is when governments started to feel the pressure and issue warnings to the public. I tried to use this information to infer the date on which preventative measures were put into place, but if there were actual sources that could verify this date then I think that would be even better.
@dataf3l there is also the problem of keeping it up to date. The nice thing about the 0.002 % threshold is that it automates the process and doesn't require any manipulation of the data by the user.
I think that's interesting, what about renaming the column HasPassed2PercentSoWeGuesstimateMeasureHaveBeenTakenButHaveNoRealDataSoIt'sJustAGuess :p
I'm merely joking; I see that having no data is clearly an issue. Having up-to-date data will also be an issue.
@dataf3l this is a decent suggestion. But I was thinking something more along the lines of ArbitrarilyChosen2PercentBecauseImTooLazyToFindRealSourcesAndUpdateTheDataEachDaySoThisIsAllWeGot
here is what the dataset could look like:
CO:2020-03-19:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Colombia
PE:2020-03-22:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Bolivia
BR:????:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Brazil
CL:2020-03-22:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Chile
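Lines in this shape could be read with something like the sketch below. It assumes the format is `code:date:url`, with `????` (or anything else that is not an ISO date) meaning the start date is unknown; `MeasureRecord` and `parse_measure_line` are hypothetical names used only for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional


class MeasureRecord(NamedTuple):
    country_code: str
    start_date: Optional[date]  # None when the date is unknown
    source_url: str


def parse_measure_line(line: str) -> MeasureRecord:
    """Parse one 'code:date:url' line into a MeasureRecord."""
    # Split on the first two colons only, because the URL itself contains colons.
    code, raw_date, url = (part.strip() for part in line.split(":", 2))
    try:
        start = date.fromisoformat(raw_date)
    except ValueError:
        start = None  # e.g. "????" for a date that has not been found yet
    return MeasureRecord(code, start, url)


record = parse_measure_line(
    "CL:2020-03-22:https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Chile"
)
```

Keeping the source URL next to each date makes it possible to re-check an entry later.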
here is where I got the data from:
Other countries:
https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_South_America#Argentina
Other continents:
https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic_by_country_and_territory
I think as people spend more time on it, it is likely that we'll be able to improve the dataset.
Let's make this happen.
If you make a Google Forms doc, I'll send it around :)
@dataf3l thank you for those links. That makes me wonder if a better approach would be to propose the creation of a new table in the Wikipedia page rather than trying to collect that data in this repo. That way, the data will be made available to a lot more people and we can still scrape it from Wikipedia ourselves.
Personally, I would prefer to keep the efforts in this repo focused towards (automated) data aggregation rather than the creation of crowd-sourced data -- even though crowd-sourced data was the original intent of this repo!
Should mankind make an app to track movements and self-report if one has symptoms so that people can avoid paths with people with symptoms?
FYI I have added mobility and government measures datasets which are relevant to this discussion.