
visa_dates's Introduction

Status badge: Update Graphs

Visa Bulletin Scraper

Figures: India, China, Mexico, and Philippines visa wait times

This codebase scrapes the employment-based visa bulletin data from the U.S. Department of State's website and processes it into a clean CSV file for each of four countries: India, China, Mexico, and the Philippines. The scraper script is scrape_visa_bulletins.py, and the scraped data, separated by country, is in data/.

The scraper is re-run every Sunday at midnight so that the data files and figures stay up to date. The status badge at the top of this README indicates whether the most recent data update succeeded or failed. The result is an aggregated view of historical visa bulletin data in both CSV and visual form, which is easier to consume than the outdated tabular PDF aggregations on the website.

Visualizing visa wait-times

I've included a basic time series visualization of the EB-1 through EB-4 visa wait times for each country mentioned above in figures/. You can also take the .csv files from data/ and upload them to ChatGPT-4 (with the advanced data analysis extension enabled) and ask it to make any figures you want. Good luck!

Dependencies

The scraper and visualization code require the following Python libraries, which can be installed using pip:

  • requests
  • pandas
  • tqdm
  • matplotlib
  • beautifulsoup4

In your Python virtual environment, install all dependencies using the following command:

pip install -r requirements.txt

How it Works

The script works by first extracting links to monthly visa bulletins from the main page. Each link is then visited to extract employment-based visa tables (specifically the first occurrence, which contains final action dates). For each country, this tabular data is then cleaned and processed, including converting date strings to datetime objects, calculating the backlog period, and renaming columns for clarity.
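The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrape_visa_bulletins.py: the function names are hypothetical, and it assumes table cells look like 01JAN12 (day, three-letter month, two-digit year), with C meaning "current" and U meaning "unavailable". Treating C as zero backlog is also an assumption.

```python
from datetime import date
from typing import Optional

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_final_action_date(cell: str, bulletin_date: date) -> Optional[date]:
    """Convert one table cell into a date, or None if no date applies.

    Hypothetical helper; the cell format is an assumption (see lead-in).
    """
    text = cell.strip().upper()
    if text == "C":  # "current": assumed here to mean no backlog
        return bulletin_date
    if text == "U":  # "unavailable": nothing to compute a wait time from
        return None
    day = int(text[0:2])
    month = MONTHS[text[2:5]]
    year = 2000 + int(text[5:7])
    # Two-digit years are ambiguous; a cutoff year cannot be later than
    # the year of the bulletin that publishes it, so roll back a century.
    if year > bulletin_date.year:
        year -= 100
    return date(year, month, day)


def wait_time_years(final_action: Optional[date], bulletin_date: date) -> Optional[float]:
    """Backlog in years between the bulletin date and the final action date."""
    if final_action is None:
        return None
    return (bulletin_date - final_action).days / 365.25


bulletin = date(2023, 9, 1)
cutoff = parse_final_action_date("01JAN12", bulletin)
print(cutoff, wait_time_years(cutoff, bulletin))
```

A plain month lookup is used instead of strptime so that parsing does not depend on the system locale.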

Scraped Data Output

The output is one CSV file per country, named {country}_visa_backlog_timecourse.csv, with the following columns:

  • EB_level: The employment-based visa level (integers 1, 2, 3, 4).
  • final_action_dates: The final action date for the visa.
  • visa_bulletin_date: The date of the monthly visa bulletin.
  • visa_wait_time: The calculated wait time for the visa in years.
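To show how these columns might be consumed, here is a small sketch that reads rows in this layout using only the standard library. It assumes the two date columns are stored in ISO format (YYYY-MM-DD), which is not confirmed by this README, and load_backlog is a hypothetical helper. The sample row is made up for illustration and is not real bulletin data.

```python
import csv
import io
from datetime import date
from typing import Dict, Iterable, List, Optional


def _to_date(value: str) -> Optional[date]:
    """Parse an ISO date (assumed format); empty cells become None."""
    value = value.strip()
    return date.fromisoformat(value[:10]) if value else None


def load_backlog(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read backlog rows and convert each column to a Python type."""
    rows = []
    for raw in csv.DictReader(lines):
        wait = raw["visa_wait_time"].strip()
        rows.append({
            "EB_level": int(raw["EB_level"]),
            "final_action_dates": _to_date(raw["final_action_dates"]),
            "visa_bulletin_date": _to_date(raw["visa_bulletin_date"]),
            "visa_wait_time": float(wait) if wait else None,
        })
    return rows


# Made-up row for illustration only, not real bulletin data.
sample = io.StringIO(
    "EB_level,final_action_dates,visa_bulletin_date,visa_wait_time\n"
    "2,2012-01-01,2023-09-01,11.67\n"
)
for row in load_backlog(sample):
    print(row["EB_level"], row["visa_bulletin_date"], row["visa_wait_time"])
```

To read a real file, pass an open file handle for one of the CSVs in data/ (opened with newline="") in place of the in-memory sample.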

Running the Script

To run the web scraping script, execute scrape_visa_bulletins.py. The script runs as a standalone program and takes about 2 minutes, depending on the speed of your internet connection:

python scrape_visa_bulletins.py

The CSV files are written to data/.

To run the visualization script, execute visualize_visa_wait_times.py:

python visualize_visa_wait_times.py

The figures are written to figures/.

Note

As with any web scraper, this script is written specifically for the U.S. Department of State website's HTML structure as of the time of writing (Sep. 24, 2023). If the site's structure changes, the script may need to be updated.
