
covid-19-crawler

If you'd like to help the effort, email [email protected]. Thanks!

The main code is in crawler.rb, which contains the crawler and the parsers for all 50 US states and DC. The focus is collecting officially published COVID-19 statistics. The bin/ folder contains useful commands to run the crawl and parse on all states or on specific ones.

Only the main fields are currently captured, so more work is needed to capture additional fields. County-level data is also a big to-do item.

The crawled data is hosted at https://coronavirusapi.com/. That project could also use help!

For Windows users, the following is required to get up and running:

  • Install the latest Ruby 2.6.x from https://rubyinstaller.org/downloads/
  • After installing Ruby, open the Ruby command line and run "gem install <name>" for each of these gems: ffi, selenium-webdriver, nokogiri, byebug
  • Install Firefox
  • Install the latest Visual C++ runtime redistributable from https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads
  • Download geckodriver from https://github.com/mozilla/geckodriver/releases, copy it to a location of your choice, and add that location to your PATH

Thanks, and stay safe!

Setting up this repo

  • Install Bundler (gem install bundler). Bundler provides a consistent environment for Ruby projects by tracking and installing the exact gems and versions that are needed.
  • Once Bundler is installed, run bundle install to install the project's dependencies.
  • Run ./bin/crawl CA to run it for California.
  • Run ./bin/crawl_auto to run the crawl on all states that can be handled fully automatically; it skips the few states that need manual guidance. The script reads states.csv, which contains a URL to a coronavirus webpage for each US state plus DC, crawls those pages, and saves the collected data for each state in the data/ dir.
  • Run ruby parse_log.rb to collect the crawled and parsed data. It compares the previously scraped data with the current scrape and saves everything into all.csv.
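
The crawl_auto flow driven by states.csv can be sketched roughly as follows. The state/url column names and the sample rows here are my assumption for illustration; check states.csv in the repo for the actual layout:

```ruby
require 'csv'

# Hypothetical states.csv contents; the real file lives in the repo root
# and has one row per US state plus DC.
csv_text = <<~DATA
  state,url
  CA,https://example.ca.gov/covid19
  NY,https://example.ny.gov/covid19
DATA

rows = CSV.parse(csv_text, headers: true)

# For each state, the crawler fetches the page and hands it to that
# state's parser; here we just show the loop structure.
rows.each do |row|
  puts "would crawl #{row['state']} from #{row['url']}"
end
```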

Contributors

adnauseum, btazi, huuep, kawsay, morissetcl, rbuchberger, rdsida, rickytux, rudi-cilibrasi, stevelatta, timmywheels, winstonma
