
benefice's Issues

add friendly instructions for installing PostGIS

Based on what environment people have, give friendly instructions for

  • mac (homebrew or macports or postgis?)
  • linux (ubuntu)
  • windows

I ultimately wound up installing PostGIS 2.0.2 from source — is there a better way?

On my slightly decrepit MacBook (OS X 10.6.8) the whole path for getting Edifice-ready was:

$ sudo port install postgresql92-server

Then, to initialize a default database with Unicode (this creates template0 and template1, I think):

$ sudo su postgres -c '/opt/local/lib/postgresql92/bin/initdb --locale=en_US.UTF-8 --encoding=UNICODE -D /opt/local/var/db/postgresql92/defaultdb'

Finally, to get the local Postgres database to start and stop on boot:

$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.postgresql92-server.plist 

Then my successful compile of the PostGIS 2.0.2 source code looked like this:

~/src/postgis-2.0.2$ ./configure --with-geosconfig=/Library/Frameworks/GEOS.framework/unix/bin/geos-config  --with-projdir='/Library/Frameworks/PROJ.framework//Versions/4/unix' --with-pgconfig=/opt/local/lib/postgresql92/bin/pg_config

This was after installing the GEOS and PROJ frameworks from the http://www.kyngchaos.com/software/frameworks page (very handy -- especially if you want to get QGIS working).

If you have taken notes on other workflows, please post them and we can be as generic and/or specific as possible in the docs.

Address matching algorithm

Determine a quick, accurate way of matching addresses across multiple datasets.

This library seems like a potentially useful start: https://github.com/jjensenmike/python-streetaddress

From Forest:
From what I remember reading in this area, there is no better approach than using a gazetteer (if available). For Chicago, we know all the street names and their address ranges: https://data.cityofchicago.org/Transportation/Chicago-Street-Names/i6bp-fvbx

Taking that as the gazetteer, the task is to find the standardized street name that is most similar to our query address.

That would standardize the street name, and often the direction.

If we had a source of trusted addresses or finer-resolution address ranges (maybe the building footprints?), then matching against that gazetteer is the best way to go.

For comparing the similarity of a query address to a target address I would recommend the Levenshtein distance or a modification like the affine-gap distance we use for dedupe.

This is more flexible and will tend to be much more accurate than regexp or similar tricks.
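A minimal sketch of that matching step, using plain Levenshtein distance against a tiny in-memory gazetteer. The affine-gap variant mentioned above would just swap out the distance function, and the sample street names are illustrative, not drawn from the actual dataset:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

def standardize_street(query, gazetteer):
    """Return the gazetteer street name closest to the query address."""
    return min(gazetteer, key=lambda name: levenshtein(query.upper(), name))

# Hypothetical mini-gazetteer; the real one would come from the
# Chicago Street Names dataset linked above.
GAZETTEER = ["N MICHIGAN AVE", "S MICHIGAN AVE", "W MADISON ST"]
standardize_street("N. Michgan Ave", GAZETTEER)  # -> "N MICHIGAN AVE"
```

A brute-force scan like this is fine for Chicago's few thousand street names; a larger gazetteer would want a blocking step first.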

Detecting modified files on the data portal (for setup_edifice.py)

This seems like a useful thing to do. Figuring out how to store and provide views on temporal change from non-temporal datasets in the edifice database is our own problem, but for users who merely want an up-to-date dataset, it would be nice not to have to re-download everything every night.

Any strategies for this? wget --spider will return the ultimately resolved URL and the file length without downloading the file. It seems plausible that a changed file on the data portal might also resolve to a URL with a new string — i.e. when I do:

wget --spider --no-check-certificate -O 'City Boundary.zip' http://data.cityofchicago.org/download/q38j-zgre/application/zip

That gets resolved to https://data.cityofchicago.org/api/file_data/9OVgki_a-MytpymEU2LRxpx0fsvbAE6MmYS8iDWm4xs?filename=City%2520Boundary.zip.

I'm guessing that maybe when a new zip file gets put up there, that long string "9OVgki_a-MytpymEU2LRxpx0fsvbAE6MmYS8iDWm4xs" will be changed. Can anyone confirm this?
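If that guess holds, one way to check is to extract the opaque token from the resolved URL and compare it between runs. A minimal sketch (the helper name is hypothetical, and the URL shape is taken from the example above; whether the token actually changes on update is exactly the unconfirmed part):

```python
from urllib.parse import urlparse

def file_token(resolved_url):
    """Pull the opaque identifier out of a resolved data-portal URL,
    e.g. .../api/file_data/<token>?filename=... -> <token>.
    Returns None if the URL doesn't match that shape."""
    parts = urlparse(resolved_url).path.split('/')
    if 'file_data' in parts and parts.index('file_data') + 1 < len(parts):
        return parts[parts.index('file_data') + 1]
    return None

url = ("https://data.cityofchicago.org/api/file_data/"
       "9OVgki_a-MytpymEU2LRxpx0fsvbAE6MmYS8iDWm4xs?filename=City%2520Boundary.zip")
file_token(url)  # -> "9OVgki_a-MytpymEU2LRxpx0fsvbAE6MmYS8iDWm4xs"
```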

(The file length — 120943 bytes — is also displayed when you use wget --spider, but file length alone is obviously an insufficient criterion for detecting data modification.)

We verified last night that for CSV files the re-resolved URL does not contain the long string mentioned above, so we can't use it to detect new versions.


One alternative approach may be using the API to find the date/time of when the file was updated, i.e. 'updated_at' in the Socrata SODA API? Has anyone used this successfully before?
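Once you have the dataset metadata in hand, the staleness check itself is a timestamp comparison. A sketch, assuming the metadata carries a Unix timestamp under rowsUpdatedAt (the field name is an assumption; verify it against a live response from /api/views/<dataset-id>.json before relying on it):

```python
import json

def needs_refresh(view_metadata, last_pulled):
    """True if the portal's last-modified time is newer than the time
    we last pulled the dataset.  `rowsUpdatedAt` is assumed to be a
    Unix timestamp in the Socrata views metadata."""
    return view_metadata.get('rowsUpdatedAt', 0) > last_pulled

# Trimmed-down stand-in for what /api/views/q38j-zgre.json might return
sample = json.loads('{"id": "q38j-zgre", "rowsUpdatedAt": 1357000000}')
needs_refresh(sample, last_pulled=1356000000)  # -> True
```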

The question then is how best to locally store the dates/times when the client last pulled down a given dataset. I could see an argument for making this a special table, but it may be better as a local flat file, since the main script is also proficient at completely dropping your database.
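The flat-file bookkeeping could be as small as this sketch (the file name and helper functions are hypothetical):

```python
import json
import os
import time

STATE_FILE = '.edifice_pull_times.json'  # hypothetical location

def record_pull(dataset_id, state_file=STATE_FILE):
    """Remember when we last pulled a dataset, in a flat JSON file
    that survives the script dropping and recreating the database."""
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    state[dataset_id] = time.time()
    with open(state_file, 'w') as f:
        json.dump(state, f)

def last_pull(dataset_id, state_file=STATE_FILE):
    """Return the Unix time we last pulled this dataset, or None."""
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        return json.load(f).get(dataset_id)
```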

setup script assumes a postgres user exists

I'm not sure what privileges are expected for the user, so I just created a superuser for now. I used the following steps:

psql -d postgres
create role postgres;
alter role postgres SUPERUSER;

Without the postgres role, you'll just see a stream of psql: FATAL: role "postgres" does not exist errors when running the script.

implement CSV import function

Implement a function that can:

  • point to a flat file on the data portal
  • download it as a csv
  • create a new table based on the data fields
  • import the raw data into PostgreSQL
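The table-creation step could start from the CSV header alone, typing every column as TEXT on a first pass. A sketch (the function name is hypothetical; refining column types and the actual load, e.g. via PostgreSQL's COPY, would be later steps):

```python
import csv
import io
import re

def create_table_sql(table_name, csv_text):
    """Generate a CREATE TABLE statement from a CSV header row.
    Every column is typed as TEXT; inferring numeric/date/geometry
    types would need a second pass over the data."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = []
    for name in header:
        # Lower-case and collapse non-word characters to a safe identifier
        ident = re.sub(r'\W+', '_', name.strip().lower())
        cols.append('%s TEXT' % ident)
    return 'CREATE TABLE %s (%s);' % (table_name, ', '.join(cols))

sample = 'PIN,Property Address,Zip Code\n1234,100 N State St,60602\n'
create_table_sql('assessments', sample)
# -> 'CREATE TABLE assessments (pin TEXT, property_address TEXT, zip_code TEXT);'
```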
