

Predicting Housing Code Violations

Maxwell Austensen 2017-05-06

Overview

The purpose of this project is to predict serious housing code violations in multi-family rental buildings in New York City. All data are taken from publicly available sources and organized at the borough-block-lot (BBL) level. The plan is to use all data available in 2015 to predict violations in 2016.
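Because every source is joined on BBL, it may help to see how that key is formed. Under the standard NYC convention, a BBL is a 10-digit identifier: a 1-digit borough code, a 5-digit tax block, and a 4-digit tax lot. The sketch below uses a hypothetical helper, `make_bbl()`, which is not taken from this repo's ./functions directory. The scripts in ./munge may build the key differently.

```r
# Hypothetical helper (not from this repo): build a 10-digit BBL from its parts.
# Assumes borough is coded 1-5 (1 = Manhattan, 2 = Bronx, 3 = Brooklyn,
# 4 = Queens, 5 = Staten Island), block has up to 5 digits, lot up to 4 digits.
make_bbl <- function(borough, block, lot) {
  stopifnot(all(borough %in% 1:5))
  # Zero-pad block to 5 digits and lot to 4 digits; vectorized over inputs
  sprintf("%d%05d%04d", as.integer(borough), as.integer(block), as.integer(lot))
}

make_bbl(1, 1234, 56)
#> [1] "1012340056"
```

Returning a character string keeps the leading zeros in the block and lot fields visible and makes the key safe to use in joins across sources.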

All prediction results can be visualized in this Shiny app: https://maxwell-austensen.shinyapps.io/violations-app/

Repository Organization

Directory Description
./ Pseudo-makefiles for data and analysis/maps
./analysis R Notebook files for main analysis
./violations-app Shiny app files for visualizing model predictions
./maps R scripts to create maps and final map images
./munge R scripts to download raw files, clean data, and prepare all sources for joining
./data-raw Raw data files and cleaned individual data sets, including crosswalks (git-ignored due to file size)
./data-documentation Documentation files downloaded for data sources
./data Final cleaned and joined data sets (git-ignored, except for data samples)
./functions R functions used throughout the project
./presentations Slide presentations for class made with xaringan, including the final presentation PDF
./packrat Files for packrat R package management system (do not edit)

Reproducibility Instructions

  1. Clone repo and open the RStudio project file edsp17proj-austensen.Rproj

    • The packrat package will be installed automatically from source files in the repository, and all other packages used in this project will then be installed from instructions saved in this repo. All installed packages are saved in the packrat sub-directories of this repo, so you can get every package needed to reproduce this project without disturbing your own local package library (e.g., by changing package versions).
  2. Run source("make_data.R") to download and prepare all the data necessary to reproduce all the analysis.

  3. Run source("make_analysis_maps.R") to run all the analysis scripts, rendering .nb.html files and generating map images.

To-Do

  • Improve the logit model by using MASS::stepAIC() for model selection

  • Plot decision tree (look at rpart.plot package)

  • Consider switching from classification to regression, using an adjusted count of serious violations as the outcome

  • Deal with missing data problems

    • Impute missing data:
      • simple mean imputation
      • mean by zip code and/or building type
      • check whether data are missing not at random
      • look for values in past years of data (older PLUTO/RPAD versions)
      • regressions using other variables
  • Extend the model evaluation with the tests recommended in the Dietrich (1997) reading
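The "mean by zip code" imputation idea in the list above could look roughly like the following base-R sketch. `impute_group_mean()` and the example columns `zipcode` and `year_built` are hypothetical and not taken from this project's data. When every value in a zip code is missing, the sketch falls back to the overall mean.

```r
# Hypothetical sketch: fill missing values of a numeric variable with the mean
# of its group (e.g. zip code), falling back to the overall mean when the
# whole group is missing. Names are illustrative, not from this repo.
impute_group_mean <- function(x, group) {
  # ave() returns a vector the same length as x holding each row's group mean
  group_means <- ave(x, group, FUN = function(v) mean(v, na.rm = TRUE))
  overall_mean <- mean(x, na.rm = TRUE)
  out <- ifelse(is.na(x), group_means, x)
  # An all-NA group yields NaN, which is.na() also catches
  out[is.na(out)] <- overall_mean
  out
}

bldgs <- data.frame(
  zipcode = c("10001", "10001", "10001", "10002", "11201"),
  year_built = c(1920, NA, 1940, 1990, NA)
)
bldgs$year_built_imputed <- impute_group_mean(bldgs$year_built, bldgs$zipcode)
bldgs$year_built_imputed
#> [1] 1920 1930 1940 1990 1950
```

Because the data may not be missing at random, it may also be worth keeping `is.na(bldgs$year_built)` as a separate indicator variable next to the imputed column.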

Data source wish-list
