skillselect-scraper's Introduction

SkillSelect Latest Invitation Round Scraper

My first attempt at using BeautifulSoup4 and Selenium to scrape a website. It scrapes the occupations and the minimum points needed for the latest available invitation round of SkillSelect.

Running the program creates current_list.txt, which contains the latest invitation round date and details of the pro rata occupations. Every time ./ss_scrape.py is executed, it cross-checks the date on the live site against the date inside current_list.txt. If they differ, it updates the text file.
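The cross-check described above could be sketched roughly as follows. This is a minimal illustration, not the code in ss_scrape.py: the function names `read_stored_date` and `update_if_changed` are hypothetical, and it assumes the round date is the first line of current_list.txt, as in the example content.

```python
from pathlib import Path


def read_stored_date(path):
    """Return the first line of the file (assumed to be the round date), or None."""
    p = Path(path)
    if not p.exists():
        return None
    lines = p.read_text(encoding="utf-8").splitlines()
    return lines[0].strip() if lines else None


def update_if_changed(path, live_date, table_text):
    """Rewrite the file only when the live date differs from the stored one.

    Returns True if the file was written, False if it was already current.
    """
    live_date = live_date.strip()
    if read_stored_date(path) == live_date:
        return False
    # Hypothetical layout: date, blank line, then the table text.
    Path(path).write_text(f"{live_date}\n\n{table_text}\n", encoding="utf-8")
    return True
```

Here `live_date` and `table_text` stand in for whatever the scraper extracts from the live page. Returning a boolean gives the caller a simple signal for follow-up actions such as sending a notification.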

Example Content of current_list.txt
11 September 2018

                               name                           minimum_point    date_of_effect
id
2211  Accountants                                                  80        25/05/2018 9:59am
2212  Auditors, Company Secretaries and Corporate Treasurers       80        1/05/2018 10:54am
2334  Electronics Engineer                                         70        15/11/2017 10:32am
2335  Industrial, Mechanical and Production Engineers              70        18/01/2018 9:55pm
2339  Other Engineering Professionals                              75        3/07/2018 6:37pm
2611  ICT Business and System Analysts                             75        28/05/2018 6:25pm
2613  Software and Applications Programmers                        75        20/08/2018 3:13pm
2631  Computer Network Professionals                               70        17/01/2018 11:36am


The data in this table is licenced under a Creative Commons attribution 3.0 Australia licence,
attributed to Australian Government Department of Home Affairs
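If the file needs to be read back programmatically, the rows can be recovered with a regular expression. The sketch below is an assumption based only on the example above (a numeric id, then columns separated by two or more spaces); `parse_rows` is a hypothetical helper, not part of the script.

```python
import re

# Assumed row layout: id, name, minimum_point, date_of_effect,
# with columns separated by at least two spaces.
ROW = re.compile(r"^(\d+)\s{2,}(.+?)\s{2,}(\d+)\s{2,}(\S.*?)\s*$")


def parse_rows(text):
    """Return one dict per occupation row; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "id": int(match.group(1)),
                "name": match.group(2),
                "minimum_point": int(match.group(3)),
                "date_of_effect": match.group(4),
            })
    return rows
```

The date line, the column header, and the attribution text do not match the pattern, so they are ignored rather than raising an error.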

How to Run The Script

  1. Install chromedriver using your OS's package manager (e.g., brew).
  2. Install Python 3.
  3. Create a virtual environment: virtualenv env -p python3
  4. Activate it: source ./env/bin/activate
  5. Install the dependencies: pip install -r requirements.txt
  6. Run the script: ./ss_scrape.py

TODO

  • automatically send me an email if there are changes
  • add this script to crontab
  • [ ] find a better way to send email

Contributors

asinggih, dependabot[bot]
