Armslist analysis

What is this?

Armslist-analysis was made to clean and summarize data from Armslist.com, a site used as a marketplace for buying and selling guns. It can be used with the data scraped by NPR or in conjunction with the Armslist scraper.

Assumptions

The following things are assumed to be true in this documentation.

  • You are running OSX.
  • You are using Python 2.7. (Probably the version that came with OSX.)
  • You have virtualenv and virtualenvwrapper installed and working.
  • You have postgres installed and running.

For more details on the technology stack used with the app-template, see our development environment blog post.

This code should work fine in most recent versions of Linux, but package installation and system dependencies may vary.

Installation

Clone the project:

git clone git@github.com:nprapps/armslist-analysis.git
cd armslist-analysis

Get the data

The data was scraped from the Armslist.com website in a separate repo. The filename includes the date when the scraper was run:

Dataset as of June 16th

Place the dataset into the data folder.

Run the project

Create a virtual environment and install the requirements:

mkvirtualenv armslist-analysis
pip install -r requirements.txt

The next script tries to geocode the data based on the city and state of each listing. We use the Nominatim geocoding service, accessed through the geopy library, to perform that task.

Run the script to clean and geocode the data:

./clean.py

Note: The supplied dataset has about 80,000 records, so cleaning and geocoding can take some time. Patience is a virtue... or so they say.

Sometimes the geocoding service is not accessible, so we always cache the geocoded locations and persist them to data/geocoded-cache-nominatim.csv to avoid repeating lookups.

Because some cities on the original website were not actually cities but are better thought of as regions, we manually updated some geolocations, such as West PA, Pennsylvania (15-20 manually updated).

Note: For the final map we cleaned some place names by hand to make them more consistent.

What to expect

The script will create an in-memory geocode cache to minimize the hits to the Nominatim geocoding service API.
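The idea of an in-memory cache in front of the geocoder can be sketched as follows. This is a minimal illustration and not the code in clean.py: the name `make_cached_geocoder` and the lower-cased `(city, state)` cache key are assumptions made for the example, and the geocoding function is passed in so the sketch does not depend on any particular service.

```python
def make_cached_geocoder(geocode, cache=None):
    """Wrap `geocode` so each distinct (city, state) pair is looked up once.

    `geocode` is any callable that takes a query string such as
    "Austin, Texas" and returns a result (for example a (lat, lng) tuple).
    The function name and key layout are hypothetical, for illustration only.
    """
    if cache is None:
        cache = {}

    def lookup(city, state):
        # Normalize the key so "Austin" and " austin " share one entry.
        key = (city.strip().lower(), state.strip().lower())
        if key not in cache:
            # Only a cache miss reaches the underlying geocoding service.
            cache[key] = geocode("%s, %s" % (city, state))
        return cache[key]

    return lookup
```

With geopy, the callable passed in could be a small function that calls `Nominatim(user_agent=...).geocode(query)` and returns the `latitude` and `longitude` of the result. Note that in this sketch a lookup that returns no result is cached as well, so the same unknown place is not requested twice.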

Running the script will create two CSV files:

  • data/listings-clean-nominatim.csv is the bulk of the data with geolocation included. Each row represents a listing and the associated details.
  • data/geocoded-cache-nominatim.csv is the geocoded cache persisted to disk for future runs of the script.
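Persisting a geocode cache to a CSV file and reloading it on a later run might look like the sketch below. The column names (`city`, `state`, `lat`, `lng`) and the function names are assumptions made for illustration. The actual layout of data/geocoded-cache-nominatim.csv is not documented here and may differ. The sketch is written for Python 3, whereas the project itself targets Python 2.7.

```python
import csv
import os

# Hypothetical column layout; the real cache file may use other names.
FIELDS = ["city", "state", "lat", "lng"]


def load_cache(path):
    """Read a cache CSV into a dict keyed by (city, state)."""
    cache = {}
    if not os.path.exists(path):
        # First run: nothing persisted yet, start with an empty cache.
        return cache
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["city"], row["state"])
            cache[key] = (float(row["lat"]), float(row["lng"]))
    return cache


def save_cache(cache, path):
    """Write the cache dict back to disk, one row per location."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for (city, state), (lat, lng) in sorted(cache.items()):
            writer.writerow(
                {"city": city, "state": state, "lat": lat, "lng": lng}
            )
```

Loading the file at startup and saving it at the end means an interrupted or repeated run only needs to query the service for places it has not seen before.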

Import to DB and summarize

Start your postgres server in case you have forgotten. If you have followed our development environment setup, then:

$ pgup

We created a script to insert the cleaned data into a Postgres database for further analysis:

./import.sh

After the script has successfully created the database tables, run the script that generates the output data used in our own articles:

./process.sh

Running this script will create an output folder with all the CSVs that we used for our analysis.
