aistudio-uk-school-predictions

Ofsted publishes reports on school performances across the UK. This repo contains 1) a scraper to pull reports from Ofsted's site, 2) Jupyter notebooks that contain exploratory analysis to predict whether or not a school may be in danger of closing, and 3) the code for an app where you can play with a live version of our classifier. This project was made by the Quartz AI Studio in collaboration with BBC data journalist Paul Bradshaw.

Prerequisites

Tika, the dependency that converts PDF files into text, requires Java to run. You can download the latest version of Java from Oracle. Alternatively, on a Mac with Homebrew, you can run brew cask install java (recent Homebrew releases have dropped the brew cask subcommand in favor of brew install --cask).
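As a rough illustration of the PDF-to-text step, here is a minimal sketch that assumes the tika Python package and its parser.from_file call. The helper names extract_text and normalize_whitespace, and the sample string, are hypothetical and are not taken from this repo's scraper.

```python
import re


def extract_text(pdf_path):
    """Return the raw text Tika extracts from one PDF (requires Java)."""
    # Imported inside the function so the rest of this sketch runs
    # even on a machine where the tika package is not installed.
    from tika import parser

    parsed = parser.from_file(pdf_path)
    return parsed.get("content") or ""


def normalize_whitespace(raw_text):
    """Collapse the blank lines and repeated spaces that PDF extraction leaves behind."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw_text.splitlines())
    return "\n".join(line for line in lines if line)


# Made-up sample standing in for extracted text; not a real Ofsted report.
sample = "\n\n  Inspection   report \n\n\n Overall effectiveness:  Good \n"
cleaned = normalize_whitespace(sample)
print(cleaned)
```

With Java installed, normalize_whitespace(extract_text("some_report.pdf")) would chain the two steps; the file name here is only a placeholder.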

To install the Python dependencies, run pip install -r requirements.txt for the Render app and pip install -r scraper_requirements.txt for the scraper.

Download the school report data with the AWS CLI by running:

mkdir schools
aws s3 sync s3://qz-aistudio-jbfm-scratch/schools schools

Scraper

Run python scraper/scraper.py if you'd like to scrape a new set of reports. You can also scrape the addresses of schools and the publication dates of their inspection reports with scraper/get_addresses.py and scraper/get_dates.py, respectively.

Machine Learning Analysis

Our exploratory analysis can be found in Jupyter notebooks in the /nbs directory. If you'd like to get started quickly, we've provided sample CSVs with a total of 2,000 reports under the /data directory. Here's a high-level overview of the machine learning approaches we took:

  • uk-school-predictions.ipynb: a scikit-learn approach using a bag-of-words model and a Naive Bayes classifier
  • ukschools-fastai-tabtext.ipynb: we tried accounting for additional metadata (dates and school names) in our corpus by combining tabular and text neural nets. This approach was inspired by, and used code from, this public repository.
  • last-report-final.ipynb: this approach used fast.ai's library for NLP problems. We used our corpus of reports to predict whether or not a given report may be the final report before the school closes. The dataset for this notebook is different because it required a different target label: the final reports of closed schools were labeled last and all other reports were labeled not_last. You can view this dataset in app/last_report_test_sample.csv.
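
To make the first approach concrete, here is a minimal sketch of a bag-of-words model feeding a multinomial Naive Bayes classifier. The notebook uses scikit-learn; this version is written from scratch with the standard library so it runs without extra dependencies, and it is not the notebook's code. The class name, the open/closed labels, and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, made-up snippets and labels; not real Ofsted report text.
train_texts = [
    "Pupils make good progress and teaching is effective",
    "Leaders have improved teaching and pupils achieve well",
    "Inadequate safeguarding and falling pupil numbers",
    "Serious weaknesses in leadership and inadequate teaching",
]
train_labels = ["open", "open", "closed", "closed"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("Inadequate leadership and serious weaknesses")
print(prediction)
```

In scikit-learn, the same idea is usually expressed by chaining CountVectorizer and MultinomialNB in a pipeline, which also handles sparse matrices and larger vocabularies efficiently.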

Render App

To view the app locally, run python app/server.py serve.

Contributors

jkeefe, vcabales
