
mapusaurus's Introduction

Mapusaurus


Description

This repository provides data and scripts to set up an API endpoint for serving Home Mortgage Disclosure Act (HMDA) data, as well as the front-end and back-end application components that consume this data. Financial institution data is loaded from raw HMDA files and joined with National Information Center data to allow for more robust analysis in the front-end application.

The Mapusaurus back-end is a Python/Django application. Additional requirements are defined below.

Data

The data you can load is:

  • HMDA Transmittal Sheet
  • HMDA Reporter Panel

Both are available from the FFIEC.

Here are the 2013 files:

Transmittal sheet: http://www.ffiec.gov/hmdarawdata/OTHER/2013HMDAInstitutionRecords.zip

Reporter panel: http://www.ffiec.gov/hmdarawdata/OTHER/2013HMDAReporterPanel.zip
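The download-and-unzip step can also be scripted. The following is a minimal sketch, not part of the repository: it is a standalone Python 3 helper (separate from the Python 2.7 application), and the function names and the hmda_raw directory are hypothetical. Only the two URLs come from the list above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the list above.
HMDA_2013_FILES = {
    "transmittal_sheet": "http://www.ffiec.gov/hmdarawdata/OTHER/2013HMDAInstitutionRecords.zip",
    "reporter_panel": "http://www.ffiec.gov/hmdarawdata/OTHER/2013HMDAReporterPanel.zip",
}


def extract_zip(zip_path, dest_dir):
    """Unzip an archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(dest))
        return [dest / name for name in archive.namelist()]


def fetch_hmda_files(dest_dir="hmda_raw"):  # hypothetical default directory
    """Download both 2013 archives and unzip them (requires network access)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = {}
    for label, url in HMDA_2013_FILES.items():
        zip_path = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(zip_path))
        extracted[label] = extract_zip(zip_path, dest / label)
    return extracted
```

Calling fetch_hmda_files() would return a dictionary mapping each label to the files unpacked from its archive; those paths are what the load commands described later expect.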

Requirements

This currently uses:

  • Django 1.7
  • Python 2.7.x

You will also need:

  • PostgreSQL 9.3
  • PostGIS 2.1.x
  • Elasticsearch

There's also a requirements.txt file in the repository root directory that can be installed with pip.
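The PostgreSQL/PostGIS requirement implies a database that Django can reach before any of the commands below will work. As a rough sketch only: the ENGINE value is GeoDjango's PostGIS backend, while the environment variable names and default values are placeholder assumptions, and the repository's own settings files may already define this differently.

```python
import os

# Sketch of a Django DATABASES setting for a PostGIS-enabled database.
# Everything except ENGINE is a placeholder assumption.
DATABASES = {
    "default": {
        # GeoDjango's PostGIS backend
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": os.environ.get("MAPUSAURUS_DB_NAME", "mapusaurus"),  # hypothetical
        "USER": os.environ.get("MAPUSAURUS_DB_USER", "mapusaurus"),  # hypothetical
        "PASSWORD": os.environ.get("MAPUSAURUS_DB_PASSWORD", ""),    # hypothetical
        "HOST": os.environ.get("MAPUSAURUS_DB_HOST", "localhost"),
        "PORT": os.environ.get("MAPUSAURUS_DB_PORT", "5432"),
    }
}
```

Reading credentials from the environment keeps them out of version control; the database itself still has to be created, with the PostGIS extension enabled, before running migrate.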

Loading the data

To create the tables, you need to run:

    python manage.py migrate respondents

You also need to load a fixture:

    python manage.py loaddata agency

This loads static regulator agency data.

Download the transmittal sheet and reporter panel flat files (see Data above).

Two management commands load this data; run them in the following order:

1. python manage.py load_transmittal <path/to/transmittal_sheet>
2. python manage.py load_reporter_panel <path/to/reporter_panel>
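Because the order matters, the whole respondents setup can be wrapped in a small script. This is a sketch under assumptions: the function names are hypothetical, and it assumes it is run from the directory containing manage.py with python on the PATH. The management command names are the ones listed above.

```python
import subprocess


def build_respondent_commands(transmittal_path, reporter_panel_path):
    """Return the manage.py invocations in the order the steps above require."""
    manage = ["python", "manage.py"]
    return [
        manage + ["migrate", "respondents"],
        manage + ["loaddata", "agency"],
        manage + ["load_transmittal", transmittal_path],
        manage + ["load_reporter_panel", reporter_panel_path],
    ]


def load_respondents(transmittal_path, reporter_panel_path):
    """Run each command in turn, stopping at the first failure."""
    for command in build_respondent_commands(transmittal_path, reporter_panel_path):
        subprocess.check_call(command)
```

subprocess.check_call raises an exception on a non-zero exit status, so the reporter panel is never loaded if the transmittal sheet load fails.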

GEO

The 'geo' application requires GeoDjango and PostGIS. Follow the instructions for installing GeoDjango.

Once GeoDjango is installed, create the tables for the geo application:

    python manage.py migrate geo

Currently, we load census tract, county, CBSA, and metropolitan division files. You can download them from the census.gov FTP site:

ftp://ftp2.census.gov/geo/tiger/TIGER2013/TRACT/
ftp://ftp2.census.gov/geo/tiger/TIGER2013/COUNTY/
ftp://ftp2.census.gov/geo/tiger/TIGER2013/CBSA/
ftp://ftp2.census.gov/geo/tiger/TIGER2013/METDIV/

This is how you load the data:

    # This example only loads census tracts from IL (FIPS code: 17); repeat
    # for other states as needed
    python manage.py load_geos_from /path/to/tl_2013_17_tract.shp
    python manage.py load_geos_from /path/to/tl_2013_us_county.shp
    python manage.py load_geos_from /path/to/tl_2013_us_cbsa.shp
    python manage.py load_geos_from /path/to/tl_2013_us_metdiv.shp
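When several states are needed, the list of load_geos_from calls can be generated rather than typed by hand. A minimal sketch, assuming all shapefiles have been unzipped into one directory; the function names are hypothetical, and the file name patterns are taken from the example above.

```python
import os

# National files from the example above.
NATIONAL_SHAPEFILES = [
    "tl_2013_us_county.shp",
    "tl_2013_us_cbsa.shp",
    "tl_2013_us_metdiv.shp",
]


def tract_shapefile(state_fips, directory):
    """Path of the 2013 census tract shapefile for one state FIPS code."""
    return os.path.join(directory, "tl_2013_{0}_tract.shp".format(state_fips))


def build_geo_commands(state_fips_codes, directory):
    """One load_geos_from command per state tract file, then the national files."""
    paths = [tract_shapefile(fips, directory) for fips in state_fips_codes]
    paths += [os.path.join(directory, name) for name in NATIONAL_SHAPEFILES]
    return [["python", "manage.py", "load_geos_from", path] for path in paths]
```

Each returned list can be passed to subprocess.check_call; for example, build_geo_commands(["17"], "/path/to") reproduces the four commands shown above.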

These import scripts are set up to update geos in place -- no need to delete records manually.

Once census tracts and counties are loaded, run the following command to associate census tracts with their CBSAs.

    python manage.py set_tract_csa_cbsa

Census Data

The 'censusdata' app loads census data to the census tracts found in the 'geo' application. As such, 'censusdata' relies on 'geo'.

First, run migrate to create the appropriate tables:

    python manage.py migrate censusdata

You'll then want to import census data related to the tracts you've loaded while setting up the 'geo' app. Go to

http://www2.census.gov/census_2010/04-Summary_File_1/

and select the state you care about. Download the associated *.sf1.zip file, which you should then unzip.

Loading the data looks like this:

    python manage.py load_summary_one /path/to/XXgeo2010.sf1

Warning: currently, data will not be updated in place; to re-import, you'll need to delete everything from the censusdata_census2010* tables.
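One way to prepare for a re-import is to generate the DELETE statements from the list of table names. The helper below is a hypothetical sketch, not something the repository provides. It assumes the table names are obtained elsewhere, for example from Django's connection.introspection.table_names(), and it does not account for foreign keys between the tables, which may require a particular order.

```python
def census2010_delete_statements(table_names, prefix="censusdata_census2010"):
    """Build one DELETE statement per table whose name starts with prefix."""
    return [
        'DELETE FROM "{0}";'.format(name)
        for name in sorted(table_names)
        if name.startswith(prefix)
    ]
```

Review the generated statements before running them against the database, since they remove every row in the matching tables.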

HMDA Data

The 'hmda' app loads HMDA data to the census tracts found in the 'geo' application. As such, 'hmda' relies on 'geo'. In fact, 'hmda' will only store data for states that are loaded via the 'geo' app.

First, run migrate to create the appropriate tables:

    python manage.py migrate hmda

Next, go to

http://www.ffiec.gov/hmda/hmdaflat.htm

and download the zip file containing the flat file of all HMDA LAR data. Unzip it and then run:

    python manage.py load_hmda /path/to/2013HMDALAR\ -\ National.csv

Note that this process takes several minutes (though you will receive progress notifications). This import can be run repeatedly (if additional geos are added later, for example).

Warning: At the moment, the import assumes a single year of information. Support for multiple years is still to be done.

Alternatively, the load_hmda script can read a directory of CSV files and load them one by one. There is also the option of removing these files after they are processed.

    # The target directory must already exist; chunks are written into it
    split -l 50000 -d "/path/to/2013HMDALAR - National.csv" /path/to/2013HMDALAR/hmda_csv_
    python manage.py load_hmda /path/to/2013HMDALAR/ delete_file:true
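Where the split utility is unavailable, the same chunking can be done in Python. This is a sketch, not the project's tooling: split_csv and its parameters are hypothetical, and the default prefix and chunk size simply mirror the command above. Like split -l, it cuts purely on line boundaries, so a quoted field containing a newline would be divided across chunks.

```python
import os


def split_csv(source_path, dest_dir, lines_per_chunk=50000, prefix="hmda_csv_"):
    """Copy source_path into numbered chunks of at most lines_per_chunk lines."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    chunk_paths = []
    out = None
    try:
        # newline="" keeps the original line endings untouched
        with open(source_path, "r", newline="") as source:
            for line_number, line in enumerate(source):
                if line_number % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    name = "{0}{1:02d}".format(prefix, len(chunk_paths))
                    chunk_paths.append(os.path.join(dest_dir, name))
                    out = open(chunk_paths[-1], "w", newline="")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

The returned list holds the chunk paths in order; the directory passed as dest_dir is the one to hand to load_hmda.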

You will most likely want to pre-calculate the median number of loans for a particular lender X city pair -- this speeds up map loading quite a bit.

    python manage.py calculate_loan_stats

Styles

While the base application attempts to appear "acceptable", you will likely wish to provide your own icons, colors, etc. We provide an example app (basestyle), which you can modify directly or copy into a separate Django app. If you go the latter route, remember to activate your new app and deactivate basestyle.

mapusaurus's People

Contributors

amymok, cmc333333, contolini, danny8000, darthdippy, grapesmoker, higs4281, imuchnik, khandelwal, marcesher, mehtadev17, micheletepper, ohsk, sephcoster, sleitner, theresaanna, virginiacc, virtix, wpears


mapusaurus's Issues

Gzip API calls

If this is being used on shady networks, gzip will definitely be worth the compute power spent.

Security Findings

Hello Team,
I have reviewed the source code and found a few security issues that I would like to discuss with anyone available. Please let me know if you wish to screen share via hangout or screenhero.

Secret_Key in Setting.py

Let's make sure that the secret_key under mapusaurus/mapusaurus/settings/settings.py is not being used in production. If it is, let's be certain that it is taken out.

Readme changes

The README needs quite a few updates.

  • Setup needs to be clarified, specifically with regard to setting up postgres/postgis. The readme doesn't say anything about getting a database up and running, but this is a requirement for running the data setup scripts.
  • The purpose/scope of this repository needs to be clarified. It's unclear from the readme what the code/data in the repository does. Running through the setup will get the data set up for API calls, but nothing is said about how to set up the app itself (both front and back-end).
  • See proposed changes in https://github.com/wpears/mapusaurus/blob/f56cf87c86ec4105fa09b5e0f3af8ea74798ca6d/notes.txt (mostly incorporated)

502 Error

During a dynamic test on the FLC demo site, I noted a 502 Bad Gateway error on race_summary_csv.
