census-api's Introduction

Census Reporter API

The home for the API that powers the Census Reporter project.

It queries a census-postgres database and generates JSON output that can be read by other clients. One such client is censusreporter.org.

API Specification

Learn more about the API and how you can use it at API.md.

Installation

Documentation for installing and setting up the API parts of Census Reporter can be found at INSTALL.md.

census-api's People

Contributors

charlesreid1, cliftonmcintosh, dependabot[bot], iandees, jhonatanaptitude, joegermuska, ryanpitts, scott2b, tomschenkjr, tuchandra

census-api's Issues

Scale down the RDS instance

I made our RDS instance larger than it needed to be during loading. We should scale it down to something smaller and also scale down the size of the EBS volume that backs it (if we can).

JSON Headers

It looks like the "application/json" Content-Type header is not being attached to the response. I don't have a good understanding of how this all works, but my initial guess is that the header is not being set when uploading to S3.

For instance:

curl -I http://api.censusreporter.org/1.0/latest/14000US27053003800/profile

getting all data for an id?

Only thing I found was https://embed.censusreporter.org/test/04000US02.json which only works on some IDs.

Could be a new endpoint, or make table_ids optional on this one https://api.censusreporter.org/1.0/data/show/latest?geo_ids=04000US01

Right now to pull this you have to make thousands of requests per ID.

Basically all I'm trying to do is take a FIPS code from TIGER data and look up the data for it, but the way the API is structured makes this extremely difficult.
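
For readers coming from TIGER data, here is a minimal sketch of how a FIPS code maps onto the GeoIDs used in the URLs above: a three-digit summary level, the "00US" separator, then the FIPS code. The helper names `geoid_from_fips` and `data_show_url` are hypothetical, and joining multiple IDs with commas is an assumption; only the endpoint and its `table_ids` and `geo_ids` parameters come from the URLs quoted in this issue.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.censusreporter.org/1.0"


def geoid_from_fips(sumlevel: str, fips: str) -> str:
    """Build a GeoID such as 04000US01 from a summary level and a FIPS code."""
    return f"{sumlevel}00US{fips}"


def data_show_url(geo_ids, table_ids, release="latest"):
    """Build a data/show URL. Comma-joining the IDs is an assumption."""
    query = urlencode(
        {"table_ids": ",".join(table_ids), "geo_ids": ",".join(geo_ids)},
        safe=",",  # keep commas readable instead of percent-encoding them
    )
    return f"{API_ROOT}/data/show/{release}?{query}"


# State FIPS "01" at summary level 040 (state)
state_geoid = geoid_from_fips("040", "01")
url = data_show_url([state_geoid], ["B01001"])
```

This only builds the request; it does not remove the need to name each table explicitly, which is the limitation described above.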

Readme truncated?

The DEVELOPMENT readme appears to have been cut off, rather than simply being a work in progress.

https://github.com/censusreporter/census-api/blob/master/DEVELOPMENT.md

Issue processing values that end in "+"?

As an example, table B25077 provides "Median value of owner-occupied housing units", but for any place where the value is over $1,000,000, the census just lists "$1,000,000+". In Census Reporter, that shows up as "$1,000,001," when it seems like the plus sign should be preserved to indicate it's not an exact value.

I'd try to submit a PR for this but I'm not sure how to fix it. Seems like the exact point that the switch occurs is in value_rpn_calc() in census_extractomatic/api.py.
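
To illustrate the behavior being asked for, here is a rough sketch that keeps the open-ended marker alongside the number instead of folding it into the value. It is not how value_rpn_calc() works; the function names and the assumption that the raw value arrives as a string like "$1,000,000+" are hypothetical.

```python
def parse_estimate(raw: str):
    """Return (value, is_open_ended) for strings like '$1,000,000+'."""
    text = raw.strip()
    open_ended = text.endswith("+")
    digits = text.rstrip("+").replace("$", "").replace(",", "")
    return int(digits), open_ended


def format_dollars(value: int, open_ended: bool) -> str:
    """Format a dollar amount, restoring the '+' for open-ended values."""
    return f"${value:,}" + ("+" if open_ended else "")


value, open_ended = parse_estimate("$1,000,000+")
label = format_dollars(value, open_ended)
```

Carrying the flag separately means any arithmetic still sees a plain integer, while the display layer can show that the figure is a lower bound.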

Geo Elastic Search

When using the Geo Elastic Search API I've found that it typically returns better results when the place name is in lower case. Putting some cities (Savannah, GA, for example) in proper case in the URL produces different results from using lower case:

title case
lower case

Just something to note for the docs, I think.
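
Until the docs mention it, a client can work around this by lower-casing the query before sending it. A small sketch, assuming the /1.0/geo/elasticsearch endpoint with a `q` parameter; the helper name `elasticsearch_url` is made up for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.censusreporter.org/1.0/geo/elasticsearch"


def elasticsearch_url(place_name: str) -> str:
    """Build a search URL with the place name trimmed and lower-cased."""
    query = urlencode({"q": place_name.strip().lower()})
    return f"{ENDPOINT}?{query}"


url = elasticsearch_url("Savannah, GA")
```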

Context for similar geometries

Ryan:

return some contextual statistics about sibling geographies, contained by various levels of parent geographies

  • I fetch a county, and I also get context about all counties in the same state, and all counties nationwide
  • I fetch a block group, and I also get context about all block groups in the same county, same state, and nation

Joe:

  • some research is in order about how to handle the margins of error
  • for states, we may want to do 'region' and 'division' also
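
As a starting point for the parent lookups described above, here is a sketch that derives the containing county, state, and nation from a GeoID by slicing its FIPS digits. It covers only counties (050) and block groups (150); the function name is hypothetical, and regions and divisions would need a separate lookup table because they are not encoded in the GeoID.

```python
def parent_geoids(geoid: str) -> list[str]:
    """Return parent GeoIDs, nearest first, for a county or block group."""
    sumlevel = geoid[:3]
    code = geoid.split("US", 1)[1]  # FIPS digits after the "US" marker
    parents = []
    if sumlevel == "150":
        # Block group code: state (2) + county (3) + tract (6) + block group (1)
        parents.append("05000US" + code[:5])
    if sumlevel in ("150", "050"):
        parents.append("04000US" + code[:2])
    if sumlevel != "010":
        parents.append("01000US")
    return parents
```

Each parent GeoID could then be paired with the original summary level to fetch the sibling geographies for comparison.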

Issue with data retrieval from API

When I curl the example in the Census Reporter API documentation I get the following response:

$ curl "https://api.censusreporter.org/1.0/data/show/latest?table_ids=B13016&geo_ids=04000US55"
{"error": "The ACS 2015 5-year release doesn't include GeoID(s) 04000US55."}

Am I doing something wrong or is there an issue with the API? Thanks!

getting census data from vector tiles

Hi!

Have you considered making census data (population, etc) available in the vector tiles?

Also, returning a centroid would be handy for labels, since the polygons get clipped at high zoom levels (e.g. above z15).

Tables B992704 and B992705 Return same value

Calling B992704 and B992705 returns the same value for "Total".

I struggle to believe it's a coincidence for every GeoID considered: B992704 is employer-provided healthcare and B992705 is self-purchased healthcare.

Amazon S3 404 for Tiles Redirect Broken

Back when we switched to SSL, the URL we ended up using goes straight to the S3 bucket and skips the S3 "Static Website Hosting" process, which is what does the browser redirect to the API when a tile is missing. As a result, for a couple of months the website's tiles have been failing whenever we haven't already generated them.

The Amazon Static Website Hosting system doesn't support SSL, so we need something else to terminate the SSL for us.

My first thought here is to use Cloudflare, but I'm going to think about it a bit more.

along with each data dict, a sibling "metadata" dict

So if we're reporting data in a dict called marital_status, we might have a marital_status_metadata dict with:

- universe_label, e.g. "Total population over 15"
- universe_value
- max_value
- min_value
- state_median
- national_median
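
A rough sketch of what assembling such a sibling dict might look like. The function name is hypothetical, and taking max_value and min_value over the values in the data dict is an assumption, since the proposal does not say what they range over.

```python
def build_metadata(universe_label, universe_value, values,
                   state_median, national_median):
    """Assemble the proposed *_metadata dict for one data dict."""
    return {
        "universe_label": universe_label,
        "universe_value": universe_value,
        # Assumption: max/min are taken over the values being reported
        "max_value": max(values),
        "min_value": min(values),
        "state_median": state_median,
        "national_median": national_median,
    }
```

A response would then carry both `marital_status` and `marital_status_metadata` side by side.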

State Leg districts use wrong maps

We use TIGER2013 for Congress, but state legislative districts (SLDU/610 and SLDL/620) were also redrawn. We need to update those geographies and clear any cached geotiles for those sumlevels.

LICENSE?

I'm looking to use this census data stack in personal and professional projects. What is the license on this API code?

Bulk data access

Hiya! Over at the Dat project we wanna feature the Census Reporter data as an example dataset. The goal is we could have all of your data exposed as a bulk data feed that analysts could easily pull into different research workflows.

I was wondering if you offer any sort of bulk data endpoint, such as how CA Civic Data has their bulk download ZIPs http://calaccess.californiacivicdata.org/downloads/latest/ that we can use to create a version controlled history of their whole archive.

We could also potentially do it through your API, but I don't see a way through the current API to get a changes feed we could subscribe to.

API examples from code failing in production

I'm trying to pull census data using the /geo/elasticsearch API endpoint, but running into trouble (specifically {"error": "'NoneType' object is not callable"}). To make sure I was doing things right, I copied in the exact example queries from api.py, and found that both returned the same error.

The examples I tried were the following:
http://api.censusreporter.org/1.0/geo/elasticsearch?q=new+york+city
http://api.censusreporter.org/1.0/geo/elasticsearch?q=chicago,+il

I'm a bit stumped. Could the endpoint be broken, or have these examples fallen out of date?

Previous years of ACS data still supported?

Hello!

When I curl for previous years of ACS data, I'm getting a not found error like the following:

curl "https://api.censusreporter.org/1.0/data/show/acs2015_5yr?table_ids=B01001&geo_ids=16000US5367000"
...
<h1>Not Found</h1>
<p>The acs2015_5yr release isn't supported.</p>

Is the previous years' data no longer available, or am I trying to access it incorrectly?

Thank you for this amazing tool!

Places without maps

Places that have sumlevels without maps do not appear in search results, but they do have automatically generated profile pages.

There are three sumlevels for which maps do not exist: 355 (NECTA Divisions), 067 (subbarrios), and 258 (Tribal Block Groups). We can see this by looking at a table that @Joonpark13 has created with metadata about profiles and tables. In this table text2 is sumlevel, and text3 is sumlevel name.

census=# select text2, count(*) from search_metadata where text3 is null group by text2;
 text2 | count 
-------+-------
 355   |    10
 067   |   145
 258   |   915
(3 rows)

The sumlevel names are populated by a series of update statements, all of which can be viewed in full-text-search/metadata_script.sql (on the branch full-text-search). These update statements are in turn taken from census_extractomatic/api.py, which has a dictionary at the top mapping sumlevels to names:

SUMLEV_NAMES = {
    "010": {"name": "nation", "plural": ""},
    "020": {"name": "region", "plural": "regions"},
    "030": {"name": "division", "plural": "divisions"},
    ...

Notably, the three sumlevels listed above are absent from this dictionary.
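
A quick way to confirm which sumlevels lack a name entry is to compare the codes found in the database against the dictionary keys. The sketch below uses only the three entries visible in the excerpt, so the real dictionary will contain more; `missing_sumlevels` is a hypothetical helper.

```python
# Only the entries shown in the excerpt; the real dictionary is longer.
SUMLEV_NAMES = {
    "010": {"name": "nation", "plural": ""},
    "020": {"name": "region", "plural": "regions"},
    "030": {"name": "division", "plural": "divisions"},
}


def missing_sumlevels(sumlevels, names=SUMLEV_NAMES):
    """Return the sumlevel codes that have no entry in the names mapping."""
    return sorted(code for code in sumlevels if code not in names)


absent = missing_sumlevels(["020", "355", "067", "258"])
```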

These pages are excluded from search results. An API call to api.censusreporter.org/1.0/geo/search?q=puerto r returns six results, one of which is the place "Puerto Real Subbarrio." But the same search from the Census Reporter home page does not include this place in the autocomplete suggestions.

Puerto Real Subbarrio still has an automatically generated profile page with meaningful data. Other such places also exist, with profile pages that have data but no map. See the queries at the bottom for more examples of such places.

The maps in these pages' headers are all world maps.

A resolution to this issue will involve a design decision on what to do with these places (should they appear in search results? should they have profile pages?), and the implementation of this decision.

census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '067';
        display_name        |       full_geoid       
----------------------------+------------------------
 Valencia subbarrio         | 06700US721278407984771
 Melilla subbarrio          | 06700US721277969352900
 Pueblo Norte subbarrio     | 06700US720230981864776
 Pueblo Sud subbarrio       | 06700US720230981865005
 Pueblo Nuevo subbarrio     | 06700US720230981864819
...


census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '355';
                       display_name                        |    full_geoid     
-----------------------------------------------------------+-------------------
 Boston-Cambridge-Newton, MA NECTA Division                | 35500US7165071654
 Brockton-Bridgewater-Easton, MA NECTA Division            | 35500US7165072104
 Framingham, MA NECTA Division                             | 35500US7165073104
 Haverhill-Newburyport-Amesbury Town, MA-NH NECTA Division | 35500US7165073604
 Lawrence-Methuen Town-Salem, MA-NH NECTA Division         | 35500US7165074204
 Lowell-Billerica-Chelmsford, MA-NH NECTA Division         | 35500US7165074804
 Lynn-Saugus-Marblehead, MA NECTA Division                 | 35500US7165074854
 Nashua, NH-MA NECTA Division                              | 35500US7165075404
 Peabody-Salem-Beverly, MA NECTA Division                  | 35500US7165076524
 Taunton-Middleborough-Norton, MA NECTA Division           | 35500US7165078254
(10 rows)

census=# select display_name, full_geoid from tiger2014.census_name_lookup where sumlevel = '258' limit 15;
     display_name     |     full_geoid     
----------------------+--------------------
 Tribal Block Group A | 25800US1185T00100A
 Tribal Block Group B | 25800US1185T00100B
 Tribal Block Group C | 25800US1185T00100C
 Tribal Block Group A | 25800US2430T03500A
...

How can we store static ACS data outside of the database?

Problem: we have 100s of GB of ACS data sitting around in a database that's running 24/7 and a very small amount of it is queried.

Idea: Could we store data in S3 in such a way that we could do a range request for data without having to query the database?
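
One way this could work, sketched under heavy assumptions: if each table were written to S3 as a file of fixed-size records in a known geography order, the byte range for one geography follows from arithmetic alone. The record layout, sizes, and function name here are all hypothetical.

```python
def range_header(row_index: int, record_size: int, header_size: int = 0) -> dict:
    """Return the HTTP Range header selecting one fixed-size record."""
    start = header_size + row_index * record_size
    end = start + record_size - 1  # HTTP byte ranges are inclusive
    return {"Range": f"bytes={start}-{end}"}


# Fourth record (index 3) of 64-byte records after a 16-byte file header
headers = range_header(3, 64, header_size=16)
```

The lookup from GeoID to row index would still need a small index somewhere, but that is far smaller than the data itself.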

SQL to update full text search has a bad query

This line in the full text search update SQL doesn't execute:

UPDATE search_metadata SET document = document || to_tsvector('simple', coalesce(sub.code, ' ')) WHERE text4 = sub.geoidlower(text2) = '500' AND type = 'profile';

psql:/home/ubuntu/census-api/full-text-search/metadata_script.sql:167: ERROR:  missing FROM-clause entry for table "sub"
LINE 1: ...ment = document || to_tsvector('simple', coalesce(sub.code, ...
                                                             ^

It's not clear to me what sub should be referring to here.

Improve place and table search

Go back to using full text search (via Elasticsearch or Postgres).

Here are some ideas for place searches and the expected outputs:

  • new york should include New York City (the place)
  • spokane schools should include 97000US5308250 (Spokane Public Schools)

502 Bad Gateway errors on Shapefile requests

This may not be limited to shapefiles, but we had a Uservoice report of errors trying to get all congressional districts in the US (https://api.censusreporter.org/1.0/data/download/latest?table_ids=B05006&geo_ids=500|01000US&format=shp)

On the hypothesis that it was related to the size of the download, I tried a few other groupings (districts in a specific region, division, state) and for each, at least occasionally got the error, even with all districts in Alaska, which should not be a large file.

Of course it may not be limited to Shapefiles, but that's all I tested.

(this report also flushed out censusreporter/censusreporter#213 )
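
For anyone trying to reproduce this with other groupings, here is a small sketch that builds download URLs of the same shape as the one reported, where geo_ids takes the form "child sumlevel|parent GeoID". The helper name is hypothetical; the endpoint and parameters are copied from the URL above.

```python
from urllib.parse import urlencode

DOWNLOAD_ROOT = "https://api.censusreporter.org/1.0/data/download/latest"


def download_url(table_id: str, child_sumlevel: str, parent_geoid: str,
                 fmt: str = "shp") -> str:
    """Build a download URL for all child geographies within a parent."""
    query = urlencode(
        {
            "table_ids": table_id,
            "geo_ids": f"{child_sumlevel}|{parent_geoid}",
            "format": fmt,
        },
        safe="|",  # leave the pipe unescaped, as in the reported URL
    )
    return f"{DOWNLOAD_ROOT}?{query}"


# All congressional districts (500) in the nation (01000US)
url = download_url("B05006", "500", "01000US")
```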

Understand Comparability

  • Absorb the Census information on comparability for ACS geographies
  • Census Designated Places and County Divisions are probably the most important to understand
