censusreporter / census-api

The home for the API that powers the Census Reporter project.

License: MIT License

Python 95.59% Dockerfile 0.23% PLpgSQL 4.07% Procfile 0.11%

census data

census-api's Introduction

Census Reporter API

The API queries a census-postgres database and generates JSON output that other clients can read. One such client is censusreporter.org.

API Specification

Learn more about the API and how you can use it at API.md.

Installation

Documentation for installing and setting up the API parts of Census Reporter can be found at INSTALL.md.

census-api's Issues

Scale down the RDS instance

I made our RDS instance larger than it needed to be during loading. We should scale it down to something smaller and also scale down the size of the EBS volume that backs it (if we can).

ints instead of floats where possible

like when a number represents a count of people. whole people.
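
A minimal sketch of the idea, assuming values arrive as Python floats and that a helper (here called as_count, a hypothetical name) runs before JSON serialization. Only floats that hold a whole number are converted; everything else passes through untouched.

```python
def as_count(value):
    """Return an int when a float holds a whole number, else the value unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Hypothetical row: a count of people, a true decimal, and a missing value.
row = {"total_population": 2695598.0, "median_age": 33.1, "name": None}
cleaned = {key: as_count(val) for key, val in row.items()}
print(cleaned)
```

In practice the conversion would probably be limited to columns known to be counts, since a median or percentage can also land on a whole number by chance.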

It looks like the "application/json" header is not being attached to the response. I don't have a good understanding of how this all works, but my initial guess is that the header is not being set when uploading to S3.

For instance:

curl -I http://api.censusreporter.org/1.0/latest/14000US27053003800/profile
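
If the missing header does come from the S3 upload step, as guessed above, one possible fix is to pass the content type explicitly when the object is written. The sketch below assumes a boto3-style S3 client is passed in; the upload_json helper, bucket, and key are hypothetical and not taken from the project's code.

```python
import json


def upload_json(s3_client, bucket, key, payload):
    """Upload a JSON document and set its Content-Type explicitly."""
    body = json.dumps(payload).encode("utf-8")
    # ContentType is the put_object parameter that S3 echoes back as the
    # Content-Type response header when the object is later served.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
    )
```

Running the curl -I command again after re-uploading an object would show whether the header is now present.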

getting all data for an id?

Only thing I found was https://embed.censusreporter.org/test/04000US02.json which only works on some IDs.

Could be a new endpoint, or make table_ids optional on this one https://api.censusreporter.org/1.0/data/show/latest?geo_ids=04000US01

Right now to pull this you have to make thousands of requests per ID.

Basically all I'm trying to do is take a FIPS code from TIGER data and look up the data for it, but the way the API is structured makes this extremely difficult.
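
Until such an endpoint exists, a client-side workaround is to request tables in batches rather than one at a time. This sketch assumes the table_ids parameter accepts a comma-separated list, which the plural name suggests but which should be checked against API.md; the batch size of 10 and the batched_urls helper are arbitrary, hypothetical choices.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.censusreporter.org/1.0/data/show/latest"


def batched_urls(geo_id, table_ids, batch_size=10):
    """Build one request URL per batch of table IDs for a single geography."""
    urls = []
    for start in range(0, len(table_ids), batch_size):
        batch = table_ids[start:start + batch_size]
        # safe="," keeps the commas between table IDs readable in the URL.
        query = urlencode(
            {"table_ids": ",".join(batch), "geo_ids": geo_id}, safe=","
        )
        urls.append(f"{BASE_URL}?{query}")
    return urls


for url in batched_urls("04000US01", ["B01001", "B01003", "B13016"], batch_size=2):
    print(url)
```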

Readme truncated?

It seems this DEVELOPMENT readme may have been cut off, rather than simply being a work in progress.

https://github.com/censusreporter/census-api/blob/master/DEVELOPMENT.md

Rotate logs to S3

Rotate logs to S3
- Use an S3 workflow to delete logs after 90(?) days
- Point embed.censusreporter.org S3 website logs to this same S3 bucket
Use papertrail for live logs from the app server

Search for table columns

What about a quick auto-complete style search for column names and related metadata?

Come up with standard metadata to include on each request

release it came from
table it came from
maximum value?

Add scaling rules to the autoscaling group

Right now we don't have any auto-scaling rules.

add land area to "geography" dict in API response

Consider SSL for api.censusreporter.org

CR API causes mixed content warnings when used from SSL-served pages, which are becoming increasingly common.

Normally this would be a ticket from @konklone.

Issue processing values that end in "+"?

As an example, table B25077 provides "Median value of owner-occupied housing units", but for any place where the value is over $1,000,000, the census just lists "$1,000,000+". In Census Reporter, that shows up as "$1,000,001," when it seems like the plus sign should be preserved to indicate it's not an exact value.

I'd try to submit a PR for this but I'm not sure how to fix it. Seems like the exact point that the switch occurs is in value_rpn_calc() in census_extractomatic/api.py.
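
One way the display side could preserve the plus sign is sketched below. It assumes that any stored value above the top code (such as the 1,000,001 seen above) stands for the open-ended "$1,000,000+" category. The format_median_value helper and its top_code parameter are hypothetical and are not part of value_rpn_calc().

```python
def format_median_value(value, top_code=1000000):
    """Format a dollar value, keeping a trailing '+' for top-coded values."""
    if value > top_code:
        # Assumption: anything above the top code is a placeholder for
        # "top_code or more", not an exact figure.
        return f"${top_code:,}+"
    return f"${value:,.0f}"


print(format_median_value(1000001))
print(format_median_value(250000))
```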

use multiple layers for different sumlevels

When downloading a shapefile that is, e.g. block groups in a city, they are all in one layer. We should use one layer per sumlev.
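
A rough sketch of the grouping step, relying on the fact that a geoid begins with its three-digit summary level (for example, 16000US2148000 is sumlevel 160). The group_by_sumlevel helper is hypothetical; writing each group out as its own shapefile layer is left out.

```python
from collections import defaultdict


def group_by_sumlevel(geoids):
    """Group geoids by their leading three-character summary level."""
    layers = defaultdict(list)
    for geoid in geoids:
        layers[geoid[:3]].append(geoid)
    return dict(layers)


layers = group_by_sumlevel(
    ["16000US2148000", "14000US27053003800", "16000US0644000"]
)
print(layers)
```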

Geo Elastic Search

When using the Geo Elastic Search API I've found that it typically returns better results when the place name is in lower case. Putting some cities (Savannah, GA, for example) in proper case in the URL produces different results from using lower case:

title case
lower case

Just something to note for the docs, I think.
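
Until the docs mention it, a client can sidestep the casing difference by lower-casing the query before building the URL. This is only a sketch of that workaround; the elasticsearch_url helper is a hypothetical name.

```python
from urllib.parse import urlencode


def elasticsearch_url(place_name):
    """Build a geo search URL with the place name normalized to lower case."""
    query = urlencode({"q": place_name.strip().lower()})
    return f"http://api.censusreporter.org/1.0/geo/elasticsearch?{query}"


print(elasticsearch_url("Savannah, GA"))
```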

Add a memcache check as part of healthcheck

Related to #50.

The health_check endpoint should return a non-200 status code when memcache is down.
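
A minimal sketch of such a check, written as a plain function so it does not assume any particular web framework. It expects a cache client with set and get methods; the probe key and the InMemoryCache and DownCache stand-ins are hypothetical and exist only to make the example runnable.

```python
def health_check(cache_client):
    """Return (body, status); status is 503 when the cache round trip fails."""
    probe_key = "health_check_probe"
    try:
        cache_client.set(probe_key, "ok")
        # Some memcache clients return bytes, so accept either form.
        healthy = cache_client.get(probe_key) in ("ok", b"ok")
    except Exception:
        healthy = False
    return ("OK", 200) if healthy else ("memcache unavailable", 503)


class InMemoryCache:
    """Stand-in for a working memcache client (demo only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


class DownCache:
    """Stand-in for a client whose server is unreachable (demo only)."""

    def set(self, key, value):
        raise ConnectionError("memcache is down")

    def get(self, key):
        raise ConnectionError("memcache is down")


print(health_check(InMemoryCache()))
print(health_check(DownCache()))
```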

"Suggest" example in docs does not work

The docs on 'suggest' include a non-functional example. The snippet

http://api.censusreporter.org/1.0/table/suggest?q=pover

results in:

Context for similar geometries

Ryan:

return some contextual statistics about sibling geographies, contained by various levels of parent geographies

I fetch a county, and I also get context about all counties in the same state, and all counties nationwide

I fetch a block group, and I also get context about all block groups in the same county, same state, and nation

Joe:

some research is in order about how to handle the margins of error

for states, we may want to do 'region' and 'division' also

Issue with data retrieval from API

When I curl the example in the Census Reporter API documentation I get the following response:

$ curl "https://api.censusreporter.org/1.0/data/show/latest?table_ids=B13016&geo_ids=04000US55"
{"error": "The ACS 2015 5-year release doesn't include GeoID(s) 04000US55."}

Am I doing something wrong or is there an issue with the API? Thanks!

getting census data from vector tiles

Hi!

Have you considered making census data (population, etc) available in the vector tiles?

Also, returning a centroid would be handy for labels, since the polygons get clipped at high zoom levels (e.g. above z15).

Tables B992704 and B992705 Return same value

Calling B992704 and B992705 returns the same value for "Total".

I struggle to believe it's a coincidence for every geoID considered: B992704 is employer-provided healthcare and B992705 is self-purchased healthcare.

Amazon S3 404 for Tiles Redirect Broken

Back when we switched to SSL, the URL we ended up using goes straight to the S3 bucket and skips the S3 "Static Website Hosting" process, which is what does the browser redirect to the API when a tile is missing. As a result, for a couple of months the website's tiles have been failing whenever we haven't already generated them.

The Amazon Static Website Hosting system doesn't support SSL, so we need something else to terminate the SSL for us.

My first thought here is to use Cloudflare, but I'm going to think about it a bit more.

along with each data dict, a sibling "metadata" dict

So if we're reporting data in a dict called marital_status, we might have a marital_status_metadata dict with:

- universe_label, e.g. "Total population over 15"
- universe_value
- max_value
- min_value
- state_median
- national_median

Update full text API endpoint to return topics

Update the full_text_search() function in api.py to return topics, in addition to profiles and tables.

Handle the population limits for geographies better

@JoeGermuska says:

ultimately, we probably want to allow people to switch between 1-year and less places

Attach geo stuff to shapefile or geojson as an output

As part of the comparison API we could offer geojson download with the ACS columns attached.

Missing Louisville?

Just noticed that when one searches for Louisville, KY, one gets an error: http://censusreporter.org/profiles/16000US2148000-louisville-ky/

Digging a bit deeper, the /data/show/latest API endpoint returns a message saying:

{ 
    "error": "The ACS 2013 5-year release doesn't include GeoID(s) 16000US2148000." 
}

when one searches for data for that GeoID.

Build geoheader parent/child relationships

It's easy to handle the county-state-country line, but what about other types of geographies and sumlevels?
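
For reference, the easy case can be derived from the geoid alone, as sketched below. It assumes the standard layout in which a county geoid is 05000US followed by a two-digit state FIPS code and a three-digit county code; the county_parents helper and the example county are hypothetical. Other sumlevels do not nest this neatly, which is the open question here.

```python
def county_parents(county_geoid):
    """Return the state and nation geoids that contain a county geoid."""
    sumlevel = county_geoid[:3]
    fips = county_geoid.split("US", 1)[1]
    if sumlevel != "050" or len(fips) != 5:
        raise ValueError(f"not a county geoid: {county_geoid}")
    # The first two digits of a county FIPS code identify its state.
    return [f"04000US{fips[:2]}", "01000US"]


print(county_parents("05000US55025"))
```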

State Leg districts use wrong maps

We use TIGER2013 for Congress, but state legislative districts (SLDU/610 and SLDL/620) were also redrawn. We need to update those geographies and clear any cached geotiles for those sumlevels.

LICENSE?

I'm looking to use this census data stack in personal and professional projects. What is the license on this API code?

geo search broken for zipcodes

due to integers being stripped out in search

e.g. api.censusreporter.org/1.0/geo/search?q=60611&sumlevs=850,860

Move extraction into API so arbitrary geoid data can be built

Make the simple_api show stuff based on a user-passed geoid instead of running it on a hand-picked geoid.

If there's no data, fall back to 3- or 5-year data.
"Best" is smallest and newest

encapsulate each top-level category in its own "object" in the API response

So for a geography profile, an API response might look something like:

{
    "geography": {
        "sumlevel": 50,
        "name": ...
        ...
    },
    "demographics": {
        "population_by_age_gender": {
            ...
        },
        ...
    },
    "economics": {
        ...
    },
    ...
}

with the actual data dicts tucked inside each relevant category object.

Bulk data access

Hiya! Over at the Dat project we wanna feature the Census Reporter data as an example dataset. The goal is we could have all of your data exposed as a bulk data feed that analysts could easily pull into different research workflows.

I was wondering if you offer any sort of bulk data endpoint, such as how CA Civic Data has their bulk download ZIPs http://calaccess.californiacivicdata.org/downloads/latest/ that we can use to create a version controlled history of their whole archive.

We could also potentially do it through your API, but I don't see a way through the current API to get a changes feed we could subscribe to.

consider adding geom=false to show/tiger endpoint

A user might want only the properties of a set of geoids, not the whole GeoJSON representation. One can pass geom=false to this tiger API call, but that doesn't have the convenience of getting various summary levels for a given geoid.

here's an example use case:
https://github.com/tombuckley/census-pandas/commit/7034982152af51737f19d2e934c5d45d7d5b3dbf#diff-e1eada26a892b59872fe0612cf14b204R57

Support SSL on embed.censusreporter.org

e.g. https://censusreporter.org/profiles/16000US0644000-los-angeles-ca/ tries to load geo tiles from http://embed.censusreporter.org/... and fails because they're not secure (but the origin is). Let's add SSL there.

API examples from code failing in production

I'm trying to pull census data using the /geo/elasticsearch API endpoint, but I'm running into trouble (specifically {"error": "'NoneType' object is not callable"}). To make sure I was doing things right, I copied in the exact example queries from api.py, and found that both returned the same error.

The examples I tried were the following:
http://api.censusreporter.org/1.0/geo/elasticsearch?q=new+york+city
http://api.censusreporter.org/1.0/geo/elasticsearch?q=chicago,+il

I'm a bit stumped. Could the endpoint be broken, or have these examples fallen out of date?

Error output for API includes JSON in a string

http://api.censusreporter.org/1.0/geo/show/tiger2013

Come up with a way to convert geography names to something more useful

e.g. "Evanston city, Illinois" is not helpful

get-10georeleasegeoidparents API example/endpoint broken

This example:
$ curl "http://api.censusreporter.org/1.0/geo/tiger2013/04000US55/parents"
returns 301 (Moved Permanently).

thanks for your product--it's great!

Previous years of ACS data still supported?

Hello!

When I curl for previous years of ACS data, I'm getting a not found error like the following:

curl "https://api.censusreporter.org/1.0/data/show/acs2015_5yr?table_ids=B01001&geo_ids=16000US5367000"
...
<h1>Not Found</h1>
<p>The acs2015_5yr release isn't supported.</p>

Is the data for previous years no longer available, or am I trying to access it incorrectly?

Thank you for this amazing tool!

Places without maps

Places that have sumlevels without maps do not appear in search results, but they do have automatically generated profile pages.

There are three sumlevels for which maps do not exist: 355 (NECTA Divisions), 067 (subbarrios), and 258 (Tribal Block Groups). We can see this by looking at a table that @Joonpark13 has created with metadata about profiles and tables. In this table, text2 is the sumlevel and text3 is the sumlevel name.

census=# select text2, count(*) from search_metadata where text3 is null group by text2;
 text2 | count 
-------+-------
 355   |    10
 067   |   145
 258   |   915
(3 rows)

The sumlevel names are populated by a series of update statements, all of which can be viewed in full-text-search/metadata_script.sql (on the branch full-text-search). These update statements are in turn taken from census_extractomatic/api.py, which has a dictionary at the top mapping sumlevels to names:

SUMLEV_NAMES = {
    "010": {"name": "nation", "plural": ""},
    "020": {"name": "region", "plural": "regions"},
    "030": {"name": "division", "plural": "divisions"},
    ...

Notably, the three sumlevels listed above are absent from this dictionary.
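
A small sketch of how the gap could be detected in code rather than by inspecting the table. It uses only the three dictionary entries quoted above; the sumlevel_name helper is hypothetical and is not part of api.py.

```python
# Excerpt of the dictionary quoted above; the real one has more entries.
SUMLEV_NAMES = {
    "010": {"name": "nation", "plural": ""},
    "020": {"name": "region", "plural": "regions"},
    "030": {"name": "division", "plural": "divisions"},
}


def sumlevel_name(sumlevel, default=None):
    """Look up a sumlevel name, returning default when it is not mapped."""
    entry = SUMLEV_NAMES.get(sumlevel)
    return entry["name"] if entry else default


# The three sumlevels reported as having no name in search_metadata.
missing = [code for code in ("355", "067", "258") if sumlevel_name(code) is None]
print(missing)
```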

These pages are excluded from search results. An API call to api.censusreporter.org/1.0/geo/search?q=puerto r returns six results, one of which is the place "Puerto Real Subbarrio." But the same search from the Census Reporter home page does not include this place in the autocomplete suggestions.

Puerto Real Subbarrio still has an automatically generated profile page with meaningful data. Other such places also exist, with profile pages that have data but no map. See the queries at the bottom for more examples.

The maps in these pages' headers are all world maps.

A resolution to this issue will involve a design decision on what to do with these places (should they appear in search results? should they have profile pages?), and the implementation of this decision.

census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '067';
        display_name        |       full_geoid       
----------------------------+------------------------
 Valencia subbarrio         | 06700US721278407984771
 Melilla subbarrio          | 06700US721277969352900
 Pueblo Norte subbarrio     | 06700US720230981864776
 Pueblo Sud subbarrio       | 06700US720230981865005
 Pueblo Nuevo subbarrio     | 06700US720230981864819
...


census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '355';
                       display_name                        |    full_geoid     
-----------------------------------------------------------+-------------------
 Boston-Cambridge-Newton, MA NECTA Division                | 35500US7165071654
 Brockton-Bridgewater-Easton, MA NECTA Division            | 35500US7165072104
 Framingham, MA NECTA Division                             | 35500US7165073104
 Haverhill-Newburyport-Amesbury Town, MA-NH NECTA Division | 35500US7165073604
 Lawrence-Methuen Town-Salem, MA-NH NECTA Division         | 35500US7165074204
 Lowell-Billerica-Chelmsford, MA-NH NECTA Division         | 35500US7165074804
 Lynn-Saugus-Marblehead, MA NECTA Division                 | 35500US7165074854
 Nashua, NH-MA NECTA Division                              | 35500US7165075404
 Peabody-Salem-Beverly, MA NECTA Division                  | 35500US7165076524
 Taunton-Middleborough-Norton, MA NECTA Division           | 35500US7165078254
(10 rows)

census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '258' limit 15;
     display_name     |     full_geoid     
----------------------+--------------------
 Tribal Block Group A | 25800US1185T00100A
 Tribal Block Group B | 25800US1185T00100B
 Tribal Block Group C | 25800US1185T00100C
 Tribal Block Group A | 25800US2430T03500A
...

How can we store static ACS data outside of the database?

Problem: we have 100s of GB of ACS data sitting around in a database that's running 24/7 and a very small amount of it is queried.

Idea: Could we store data in S3 in such a way that we could do a range request for data without having to query the database?
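
To make the idea concrete, here is a sketch of the byte-range arithmetic. It assumes a layout that nothing here prescribes: each table stored as one object of fixed-width rows, with a separate lookup that maps a geoid to its row index. The range_header helper and the row size are hypothetical.

```python
def range_header(row_index, row_size):
    """Build an HTTP Range header covering exactly one fixed-width row."""
    start = row_index * row_size
    # HTTP byte ranges are inclusive at both ends.
    end = start + row_size - 1
    return {"Range": f"bytes={start}-{end}"}


# Row 3 of a file with 128-byte rows.
print(range_header(3, 128))
```

The resulting header could be sent with an ordinary GET for the object, so only the requested row is transferred.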

The API is down

Sorry, I'm reporting that the api is down, e.g.,

http://api.censusreporter.org/2.0/table/latest/B01001

SQL to update full text search has a bad query

This line in the full text search update SQL doesn't execute:

census-api/full-text-search/metadata_script.sql

Line 167 in 93305fa

 UPDATE search_metadata SET document = document || to_tsvector('simple', coalesce(sub.code, ' ')) WHERE text4 = sub.geoidlower(text2) = '500' AND type = 'profile'; 

psql:/home/ubuntu/census-api/full-text-search/metadata_script.sql:167: ERROR:  missing FROM-clause entry for table "sub"
LINE 1: ...ment = document || to_tsvector('simple', coalesce(sub.code, ...
                                                             ^

It's not clear to me what sub should be referring to here.

/data/show/ doesn't honor release when testing geoid count

census=> select count(*) from acs2013_3yr.geoheader where sumlevel = 160;
 count 
-------
  2185

response for http://api.censusreporter.org/1.0/data/show/acs2013_3yr?table_ids=B01003&geo_ids=160|01000US:

{  "error": "You requested 29509 geoids. The maximum is 3500. Please contact us for bulk data." }

That's the number of places in the 5 year.
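
The intended behavior can be sketched as a limit check keyed by release. The counts below are the two figures quoted in this issue; the check_geoid_limit helper and the in-memory dictionary are hypothetical stand-ins for a query against the requested release's geoheader table.

```python
MAX_GEOIDS = 3500

# Number of sumlevel-160 geographies per release, as reported above.
COUNTS_BY_RELEASE = {
    "acs2013_3yr": {"160": 2185},
    "acs2013_5yr": {"160": 29509},
}


def check_geoid_limit(release, sumlevel):
    """Count geoids in the requested release and enforce the maximum."""
    count = COUNTS_BY_RELEASE[release][sumlevel]
    if count > MAX_GEOIDS:
        raise ValueError(
            f"You requested {count} geoids. The maximum is {MAX_GEOIDS}."
        )
    return count


print(check_geoid_limit("acs2013_3yr", "160"))
```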

Improve place and table search

Go back to using full text search (via Elasticsearch or Postgres).

Here are some ideas for place searches and the expected outputs:

new york should include New York City (the place)
spokane schools should include 97000US5308250 (Spokane Public Schools)

Add documentation for the API

Now that the API is semi-stable we should document it.

502 Bad Gateway errors on Shapefile requests

This may not be limited to shapefiles, but we had a Uservoice report of errors trying to get all congressional districts in the US (https://api.censusreporter.org/1.0/data/download/latest?table_ids=B05006&geo_ids=500|01000US&format=shp)

On the hypothesis that it was related to the size of the download, I tried a few other groupings (districts in a specific region, division, state) and for each, at least occasionally, got the error, even with all districts in Alaska, which should not be a large file.

Of course it may not be limited to Shapefiles, but that's all I tested.

(this report also flushed out censusreporter/censusreporter#213 )

Clean up old tickets!

Hey, Ian, when you have time can you clean up the year-old tix in here?

Understand Comparability

Absorb the Census information on comparability for ACS geographies
Census Designated Places and County Divisions are probably the most important to understand