

Influenza-Like-Illness prediction

This project aims to implement a prediction model for Influenza-like illness in Italy.

Brief history of the literature

Detecting epidemics from digital data is a hot topic in the modern literature. For influenza epidemics, the paper "Detecting influenza epidemics using search engine query data" by Ginsberg, Mohebbi, Patel, Brammer, Smolinski & Brilliant, published in Nature (2009), was a milestone that led the CDC to adopt Google Flu Trends (GFT) as a reference for Influenza-Like Illness. Things went well until "GFT overestimated the prevalence of flu in the 2012–2013 season and overshot the actual level in 2011–2012 by more than 50%", as Lazer, Kennedy, King & Vespignani reported in "The Parable of Google Flu: Traps in Big Data Analysis" (Science, 2014). This failure pushed the research community to look further into predicting Influenza-Like Illness from digital data.

Why is it important to estimate influenza?

It might seem that influenza epidemics are well controlled in developed countries, but the real figures say otherwise: "Seasonal influenza epidemics result in an estimated 3 to 5 million cases of severe illness and 250,000 to 500,000 deaths worldwide each year" (WHO, https://www.who.int/en/news-room/fact-sheets/detail/influenza-(seasonal)).

General pattern for digital disease detection

  1. Ground truth data from an official source; this will be your gold standard.
  2. Proxy data from a digital source.
  3. Problem: predict the ground truth data (target) from the proxy data (features).
  4. Train a statistical/learning model to solve the problem.
  5. Validate model performance.
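The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not the project's pipeline: it assumes a single proxy feature, an ordinary least-squares line as the model, and a chronological train/validation split (so the model never sees "future" weeks). The function names are hypothetical; Pearson correlation and RMSE are used as example validation metrics.

```python
from math import sqrt
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y ~ a + b * x (illustrative model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(u: List[float], v: List[float]) -> float:
    """Pearson correlation between predictions and ground truth."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def rmse(pred: List[float], truth: List[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def train_and_validate(proxy: List[float], target: List[float],
                       train_fraction: float = 0.7):
    """Train on the earlier weeks, validate on the later ones."""
    cut = int(len(proxy) * train_fraction)
    a, b = fit_line(proxy[:cut], target[:cut])          # step 4: train
    pred = [a + b * x for x in proxy[cut:]]             # predict held-out weeks
    truth = target[cut:]
    return pred, pearson_r(pred, truth), rmse(pred, truth)  # step 5: validate
```

With real data, `proxy` would hold weekly page views and `target` the official weekly ILI incidence, aligned on the same weeks.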

Addressed problem: predict Influenza-like illness in Italy

  1. Ground truth: the data come from the influenza surveillance system run by the ISS (Istituto Superiore di Sanità). The data are available here.
  2. Proxy data: the digital data come from the page view counts of Wikipedia pages related to influenza. The data were scraped from the Pageviews service on Wikimedia Toolforge.
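The project scraped the Toolforge page with Selenium. As a hedged alternative, the public Wikimedia REST API exposes per-article daily page views, and the sketch below builds such a request URL, sums the daily counts into ISO weeks, and aligns them with a weekly ground-truth series. Assumptions: the article titles, the choice of `user` agent and `all-access`, and that the ISS data have been parsed into a dict keyed by (ISO year, ISO week); the helper names are hypothetical. No network call is made here.

```python
from collections import defaultdict
from datetime import datetime
from urllib.parse import quote

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article: str, start: str, end: str,
                  project: str = "it.wikipedia") -> str:
    """URL for daily user page views of one article; start/end are YYYYMMDD."""
    title = quote(article.replace(" ", "_"), safe="")  # Wikipedia titles use underscores
    return f"{API}/{project}/all-access/user/{title}/daily/{start}/{end}"


def weekly_totals(items: list) -> dict:
    """Sum daily 'views' into (ISO year, ISO week) buckets.

    `items` mirrors the 'items' list of the API's JSON response, where each
    'timestamp' looks like '2018010100' (YYYYMMDDHH).
    """
    weeks = defaultdict(int)
    for item in items:
        day = datetime.strptime(item["timestamp"][:8], "%Y%m%d").date()
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])] += item["views"]
    return dict(weeks)


def align(proxy_weeks: dict, truth_weeks: dict):
    """Keep only the weeks present in both series, in chronological order."""
    common = sorted(set(proxy_weeks) & set(truth_weeks))
    return [proxy_weeks[w] for w in common], [truth_weeks[w] for w in common]
```

Weekly aggregation matters because surveillance bulletins report incidence per week, while page views are daily.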

The adopted methodology is inspired by the work of D. J. McIver & J. S. Brownstein (2014), "Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time", PLoS Comput Biol 10(4): e1003581 http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003581
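The McIver & Brownstein approach regresses ILI on Wikipedia page views with a generalized linear model. The sketch below assumes a Poisson family with log link (check the paper for its exact specification) and a single predictor, for example log(1 + weekly page views), which is an assumption of this example. It fits the model by Newton-Raphson, which for the canonical log link coincides with Fisher scoring/IRLS. The function names are hypothetical; in H2O, a comparable model is an `H2OGeneralizedLinearEstimator` with `family="poisson"`, though that is not necessarily the configuration used in the notebook.

```python
from math import exp, log
from typing import List, Tuple


def fit_poisson_glm(x: List[float], y: List[float],
                    max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood.

    y may be a non-integer rate (quasi-Poisson); only the mean structure is used.
    """
    b0, b1 = log(sum(y) / len(y)), 0.0  # start from the intercept-only model
    for _ in range(max_iter):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood w.r.t. (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information X^T W X with W = diag(mu), a 2x2 matrix.
        s0 = sum(mu)
        s1 = sum(mi * xi for mi, xi in zip(mu, x))
        s2 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = s0 * s2 - s1 * s1
        # Newton step: solve the 2x2 system I * d = g.
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (s0 * g1 - s1 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def predict(b0: float, b1: float, x: List[float]) -> List[float]:
    """Expected ILI level for each predictor value."""
    return [exp(b0 + b1 * xi) for xi in x]
```

Restricting the sketch to one predictor keeps the Newton step a closed-form 2x2 solve; with several Wikipedia articles as features, a library GLM such as the H2O one listed below is the practical choice.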

Further discussion can be found in the Jupyter Notebook inside the repository.

Built With

  • Selenium - Used to scrape dynamic web data
  • Scikit-learn - For the setup of most of the machine learning models
  • H2O - For creating the GLM used in the McIver & Brownstein paper.

Final Results

All the results can be found in the Jupyter Notebook in the repository.

Authors

  • Valerio Guarrasi - M.Sc. in Data Science student - guarrasi1995

  • Andrea Marcocchia - M.Sc. in Data Science student - andreamarco

  • Marco Minici - M.Sc. in Data Science student - mminici

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Acknowledgments

  • All the work is inspired by the course lectures in "Digital Epidemiology" held by Professor Ciro Cattuto.
