
sociepy / covid19-vaccination-subnational


๐ŸŒ๐Ÿ’‰ Global COVID-19 vaccination data at the regional level.

Home Page: https://sociepy.org/covid19-vaccination-subnational

License: GNU General Public License v3.0

Python 97.95% Shell 1.12% HTML 0.92%
covid-19 covid19-data open-data coronavirus-tracking coronavirus vaccination regional covid-vaccination api covid-api

covid19-vaccination-subnational's People

Contributors

lucasrodes, marcros, mathiasbynens, sanyam-git, zzulu


covid19-vaccination-subnational's Issues

Reorganize repository

  • Move API folder within data folder
  • Add a folder with templates to generate updated docs, e.g. _templates.

Dashboard

Working on a simple, minimal dashboard based on Plotly.

TEST: Implement tests

Implement some tests to verify the reliability of data.

Some ideas:

  • Check that the fields total_vaccinations, people_vaccinated and people_fully_vaccinated are monotonically increasing.
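A minimal sketch of such a check, using only the standard library. It assumes rows are plain dictionaries with `region_iso` and `date` (ISO format) keys, as in the API examples further down this page; the function name `find_decreases` and the sample values are hypothetical, not part of the repository.

```python
# Cumulative fields named in this issue.
CUMULATIVE_FIELDS = ("total_vaccinations", "people_vaccinated", "people_fully_vaccinated")


def find_decreases(rows, fields=CUMULATIVE_FIELDS):
    """Return (region, date, field, previous, current) for every drop in a cumulative field.

    Assumes each row has "region_iso" and an ISO-formatted "date"; missing
    or None field values are skipped rather than treated as zero.
    """
    last_seen = {}  # (region, field) -> last non-missing value
    problems = []
    # ISO dates sort correctly as strings, so no date parsing is needed.
    for row in sorted(rows, key=lambda r: (r["region_iso"], r["date"])):
        for field in fields:
            value = row.get(field)
            if value is None:
                continue
            key = (row["region_iso"], field)
            previous = last_seen.get(key)
            if previous is not None and value < previous:
                problems.append((row["region_iso"], row["date"], field, previous, value))
            last_seen[key] = value
    return problems


# Made-up sample: the second day is lower than the first, so it gets flagged.
sample = [
    {"region_iso": "ES-MD", "date": "2021-02-01", "people_fully_vaccinated": 100},
    {"region_iso": "ES-MD", "date": "2021-02-02", "people_fully_vaccinated": 90},
]
for problem in find_decreases(sample):
    print(problem)
```

A test could then simply assert that `find_decreases(rows)` is empty for each country file.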

Not adding dates with zero vaccinations

Comparing India.csv and state_timeline.csv, I have observed that the script leaves out dates with zero vaccinations (I've checked, and it seems to be the case for all other countries too).
For example: on 20th January 2021, the union territory of AN in India had zero vaccination doses administered, so that date is not present in India.csv.

I'm relatively new to this, so please don't mind if I'm wrong here. Won't this create issues when using the API to plot visualizations directly, or when using the data for analysis?
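One possible way for a data consumer to work around the gaps is to carry the last cumulative value forward. The sketch below assumes that a missing date really means "no new doses that day", which may not hold for every source; the function name `fill_missing_dates` and the sample values are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_dates(records):
    """Fill calendar gaps in one region's cumulative series.

    `records` is a list of {"date": "YYYY-MM-DD", "total_vaccinations": int}.
    Assumption: a missing date means zero new doses, so the previous
    cumulative total is carried forward.
    """
    if not records:
        return []
    by_date = {date.fromisoformat(r["date"]): r["total_vaccinations"] for r in records}
    day, last_day = min(by_date), max(by_date)
    current = by_date[day]
    filled = []
    while day <= last_day:
        # Use the reported value if there is one, else keep the previous total.
        current = by_date.get(day, current)
        filled.append({"date": day.isoformat(), "total_vaccinations": current})
        day += timedelta(days=1)
    return filled


# Made-up sample with 2021-01-20 missing.
series = [
    {"date": "2021-01-19", "total_vaccinations": 10},
    {"date": "2021-01-21", "total_vaccinations": 25},
]
for row in fill_missing_dates(series):
    print(row)
```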

Denmark bug in tables

Denmark provides data in PDF format. To read it, we use tabula-py. It appears that, depending on the day, the tables move around the document.

OWID uses the same source; follow their approach.

Adding total_vaccinations and population field at a national level

Currently, the country-wise latest and all API files have the following structure:

{
    "country": "India",
    "country_iso": "IN",
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Two more fields could be added, total_vaccinations and population:

  • total_vaccinations:

    • One can loop over the data for all the regions of a country and compute the cumulative total, but I think it would be better if it were provided pre-calculated.
    • Another reason is that some countries (I'm currently only aware of India, but it may well be the case elsewhere too) report some vaccinations under the heading Miscellaneous. These cannot be attributed to any region, but they should be reflected in the national total.
  • population:
    would be helpful for normalizing data per capita. (I'm not sure which source should be used here, maybe https://www.worldometers.info/world-population/)

The updated structure would be:

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations": 7017114,
    "population": 1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}
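As a rough sketch of how the proposed fields could be pre-calculated: the function below assumes the "latest" layout, where each entry in `data` is one region with its own `total_vaccinations`. The `unassigned` parameter stands for doses that cannot be attributed to a region (the Miscellaneous heading mentioned above). The function name, its parameters and the sample values are hypothetical.

```python
def add_national_fields(country, population, unassigned=0):
    """Return a copy of a country payload with national-level fields added.

    Assumptions: `country["data"]` holds one latest entry per region, each
    with "total_vaccinations"; `unassigned` counts doses not attributed to
    any region; `population` comes from an external source.
    """
    regional_total = sum(region["total_vaccinations"] for region in country["data"])
    enriched = dict(country)  # shallow copy, the input is left untouched
    enriched["total_vaccinations"] = regional_total + unassigned
    enriched["population"] = population
    return enriched


# Made-up values for illustration only.
payload = {
    "country": "Exampleland",
    "country_iso": "XX",
    "data": [
        {"region_iso": "XX-A", "total_vaccinations": 500},
        {"region_iso": "XX-B", "total_vaccinations": 700},
    ],
}
print(add_national_fields(payload, population=100000, unassigned=30))
```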

Add Russia

Add data from Russia.

source: gogov.ru

Doc generation

Documentation should be generated separately from data updates.

Chilean data

The number of vaccinations in Chile is not monotonically increasing. Check!

Same dates appear twice for some states in the API for India

The same dates occur twice for states such as IN-BR, IN-GJ and IN-KA in the API for India (not sure about other countries).

Example

{
    "region_iso": "IN-KA",
    "region_name": "Karnataka",
    "data": [
        {
            "date": "2021-01-16",
            "total_vaccinations": 13594,
            "total_vaccinations_per_100": 0.02
        },
        {
            "date": "2021-01-16",
            "total_vaccinations": 13594,
            "total_vaccinations_per_100": 0.02
        },
        {
            "date": "2021-01-17",
            "total_vaccinations": 29504,
            "total_vaccinations_per_100": 0.05
        },
        {
            "date": "2021-01-17",
            "total_vaccinations": 29504,
            "total_vaccinations_per_100": 0.05
        },
        {
            "date": "2021-01-18",
            "total_vaccinations": 66392,
            "total_vaccinations_per_100": 0.11
        },
        {
            "date": "2021-01-18",
            "total_vaccinations": 66392,
            "total_vaccinations_per_100": 0.11
        }
    ]
}
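Until the root cause is fixed, duplicates like these could be dropped when the API files are generated or consumed. The sketch below keeps one entry per date, letting the later entry win; that tie-breaking rule and the function name `drop_duplicate_dates` are assumptions, not the repository's behaviour.

```python
def drop_duplicate_dates(entries):
    """Keep a single entry per date, sorted by date.

    When a date appears more than once, the later entry in the list wins
    (an arbitrary choice; in the example above the duplicates are identical).
    """
    by_date = {}
    for entry in entries:
        by_date[entry["date"]] = entry
    # ISO dates sort correctly as plain strings.
    return [by_date[day] for day in sorted(by_date)]


# First four entries of the IN-KA example above.
entries = [
    {"date": "2021-01-16", "total_vaccinations": 13594, "total_vaccinations_per_100": 0.02},
    {"date": "2021-01-16", "total_vaccinations": 13594, "total_vaccinations_per_100": 0.02},
    {"date": "2021-01-17", "total_vaccinations": 29504, "total_vaccinations_per_100": 0.05},
    {"date": "2021-01-17", "total_vaccinations": 29504, "total_vaccinations_per_100": 0.05},
]
for entry in drop_duplicate_dates(entries):
    print(entry["date"], entry["total_vaccinations"])
```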

2 dose reporting updates

Some countries have started administering second doses. This requires adding two new columns: people_vaccinated and people_fully_vaccinated.

Current 2-dose reporting

  • Argentina
  • Austria
  • Belgium
  • Brazil
  • Bulgaria
  • Canada
  • Chile
  • Czechia
  • Denmark
  • France
  • Germany
  • Italy
  • Liechtenstein
  • Norway
  • Poland
  • Slovakia
  • Spain
  • Sweden
  • Switzerland
  • United Kingdom
  • United States

MISSING DATA: Subnational entities missing (report here)

This issue is to keep track of those regions that are not present in our data, either in the CSV or API files.

How to report?


This was initially reported by a user, in particular, the regions were:

which were missing across all CSV and API files

Review dates

Check the dates in the data. Verify that they all refer to the same kind of date (e.g. execution date or report date).

Move country scraping logic to src

In an attempt to simplify the logic and remove redundancy, the scraping logic now lives in a library module.

Current status:

  • Austria
  • Belgium
  • Brazil
  • Canada
  • Chile
  • Czechia
  • Germany
  • India
  • Italy
  • Slovakia
  • Spain
  • Sweden
  • Switzerland
  • Liechtenstein
  • United Kingdom
  • United States
  • Argentina
  • Bulgaria
  • Denmark
  • France
  • Norway
  • Poland

Cumulative quantities not really cumulative?

Hi - Taking Canada as an example in the screenshot below: the time series is not continuously increasing, as one would expect from a cumulative time series. Nor does it correspond to a daily (or "new") number, as the latest figures are pretty close to the total vaccinations (just below 1 million people).

I observed this for a lot of countries: USA, Italy, Czechia, ...

image

Add Sweden data

Sweden can be easily added; regional data is available here.

OWID uses this source; their script could be reused with the corresponding attribution.

BUG: people_fully_vaccinated is not consistent

There seems to be an issue in how this field is obtained or computed. It should be monotonically increasing, since it is a cumulative sum. However, some values for Italy and Spain (there might be others) do not follow this.

Enhanced reporting France

The initial source is discontinued and current data is found here: https://www.data.gouv.fr/fr/datasets/donnees-relatives-aux-personnes-vaccinees-contre-la-covid-19-1/#_

Specifically:

Watch out for the column names: they are the same, but the descriptions of the files mention 1st and 2nd.

The current source has otherwise been discontinued.

NaN returns invalid JSON

I am building out a GraphQL server with StepZen, and the query https://sociepy.org/covid19-vaccination-subnational/data/api/v1/latest/country_by_iso/FR.json breaks due to a NaN value rather than a null value in total_vaccinations_per_100.

Response from the REST API

        {
            "region_name": "Corse",
            "region_iso": "FR-COR",
            "date": "2021-02-22",
            "total_vaccinations": 27059,
            "total_vaccinations_per_100": NaN
        },

GraphQL Query

  vaccines(countryISO: "FR") {
    country
  }

Response Error

{
  "data": {
    "vaccines": null
  },
  "errors": [
    {
      "message": "Invalid json content.: invalid character 'N' looking for beginning of value",
      "locations": [
        {
          "line": 58,
          "column": 3
        }
      ],
      "path": [
        "vaccines"
      ]
    }
  ]
}
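The JSON grammar has no NaN literal, which is why strict parsers reject the response above. Python's `json.dumps` emits `NaN` by default, so one possible fix on the generation side is to replace non-finite floats with `None` (serialized as `null`) and pass `allow_nan=False` as a safety net. This is a sketch only: the helper names are hypothetical, and the repository may serialize its files differently.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace NaN and infinite floats with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value


def dumps_strict(payload):
    """Serialize to standards-compliant JSON.

    allow_nan=False makes json.dumps raise ValueError if a non-finite
    float slips through, instead of silently writing NaN.
    """
    return json.dumps(nan_to_none(payload), allow_nan=False)


# The record from the response above, with NaN as a Python float.
record = {
    "region_name": "Corse",
    "region_iso": "FR-COR",
    "date": "2021-02-22",
    "total_vaccinations": 27059,
    "total_vaccinations_per_100": float("nan"),
}
print(dumps_strict(record))
```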

Uruguay, alternative data source

In the case of Uruguay, you can take the information from the following repository:
https://github.com/3dgiordano/covid-19-uy-vacc-data/blob/main/data/Uruguay.csv

This is the same repository that OWID uses for its data.
Over the last few weeks, I have added columns to reflect the sub-national data.

If you want to migrate, the current column format is total_[sub_id], where [sub_id] is the region identifier from the ISO code, in lowercase.
Example: total_ar is the total_vaccinations of UY-AR

In the next week I hope to resolve the data for people_vaccinated and people_fully_vaccinated.
I will notify you when it is available.
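A sketch of how the wide total_[sub_id] columns could be reshaped into one row per region. It assumes the CSV also has a `date` column and that regional columns are exactly `total_` followed by a two-letter code; both are assumptions about the file, and the function name and sample values are hypothetical.

```python
import csv
import io


def wide_to_long(csv_text, country_iso="UY", prefix="total_"):
    """Turn total_[sub_id] columns into one record per (date, region).

    Assumptions: a "date" column exists, and regional columns are the
    prefix followed by a two-letter lowercase code (e.g. total_ar).
    Other columns are ignored.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column, value in record.items():
            if not column.startswith(prefix) or not value:
                continue
            sub_id = column[len(prefix):]
            if len(sub_id) != 2 or not sub_id.isalpha():
                continue  # not a regional column
            rows.append({
                "date": record["date"],
                "region_iso": f"{country_iso}-{sub_id.upper()}",
                "total_vaccinations": int(value),
            })
    return rows


# Made-up sample in the described column format.
sample = "date,total_ar,total_mo\n2021-03-01,100,200\n"
for row in wide_to_long(sample):
    print(row)
```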

Belgium

The source Excel file for Belgium now includes 2nd-dose data.

TODO: change incremental to batch update.

Review types of vaccines

Review which vaccines are being administered in each region and take the following actions:

  • Track these in file data/country_info.csv
  • Pay special attention to countries using 1-dose vaccines, as computation of some fields might be different than with 2-dose vaccines.
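To illustrate why 1-dose vaccines need special care, here is a sketch under one common convention: a single-dose vaccine counts once towards total_vaccinations, but the recipient counts as both vaccinated and fully vaccinated. Whether this project adopts that convention is an assumption, and the function and parameter names are hypothetical.

```python
def people_counts(first_doses, second_doses, single_doses=0):
    """Derive the three cumulative fields from dose counts.

    Assumed convention: first/second doses belong to 2-dose vaccines;
    a single-dose vaccine adds one dose, one vaccinated person and one
    fully vaccinated person.
    """
    return {
        "total_vaccinations": first_doses + second_doses + single_doses,
        "people_vaccinated": first_doses + single_doses,
        "people_fully_vaccinated": second_doses + single_doses,
    }


# Made-up numbers: with only 2-dose vaccines, the total is the sum of the two people fields.
print(people_counts(first_doses=100, second_doses=40))
# With single-dose vaccines in the mix, that identity no longer holds.
print(people_counts(first_doses=100, second_doses=40, single_doses=10))
```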
