jtleider / censusdata Goto Github PK

View Code? Open in Web Editor NEW

139.0 139.0 29.0 24.88 MB

Download data from Census API

License: MIT License

Python 100.00%

censusdata's People

Contributors

Stargazers

Watchers

censusdata's Issues

issues pulling zip code

Having issues pulling data for zip code tabulation area. I want to get data for all zip codes in the U.S. I am sure its simple, but can't seem to figure it out. thanks

import censusdata as acs
acs.download('acs5', 2019, acs.censusgeo([('zip code tabulation area','*')]), ['B25007_014E','B25007_012E'],key=key)
KeyError: 'state> zip code tabulation area'

I have also tried the below and receive the same error

acs.download('acs5', 2019, acs.censusgeo([('state','*'),('zip code tabulation area','*')]), ['B25007_014E','B25007_012E'],key=key)

KeyError: 'state> zip code tabulation area'

However, this code works. I recive a dictionary of all census zips (33,120)

acs.geographies(acs.censusgeo([('zip code tabulation area', '*')]), 'acs5', 2019)

Using API Key in Queries

Could you provide an example of a query using an API key? Thanks!

Cannot find Variables that start with S

Hello Pythoneers,

as stated in the Title, I am having troubles accessing variables that start with S.
Using the American FactFinder download center (https://factfinder.census.gov/faces/nav/jsf/pages/download_center.xhtml)
the "AGE AND SEX" datatable with its corresponding Number S0101 is available for download, however, the following code cannot find any tables:
cd.printtable(cd.censustable("acs5",2017,"S0101")) Table not found!
This behaviour occurred for all tables that start with S.

In particular, I am interested in the age distribution of certain census tracts which is provided in the above mentioned "AGE AND SEX" table. Using the search command of censusData, I was only able to find the "SEX BY AGE" table (B01001) which is not quite the same as I would have to add males and females together for all age groups.

Any help appreciated!
Thank you,

Matthias

Discrepancy between API results and downloaded table from Census data.gov

I am noticing that there is a discrepancy with the data that is obtained when using this API than the data that is obtained directly from the Census data website (via their table download feature).

The datafield that I am looking at specifically is: B18108 (Disability-related data), but from the look of it, this affects other datafield as well. And I am using county-level data for ASC-1-year 2019.

The following code is used:

result = censusdata.download('acs1', 2019, censusdata.censusgeo([('county', '*')]), datafields)
censusdata.export.exportcsv('censusdata-api.csv', result)

(The datafields are all the variables/values related to B18108)

The data itself is not incorrect, but the values seem to corresponds to the wrong location/county. The first couple rows are correct, but then the subsequent rows are not. One example: the data for Los Angeles county obtained from the API matches the Napa County from the downloaded table. Rock County, WI data from API matches to the Scioto County, OH.

Wrong Python version requirement

The following is used in setup.py:
python_requires='>=2.7, >=3',
The comma is equivalent to an and operator (see here), so this effectively implies python >= 3.

The correct way would simply be:
python_requires='>=2.7',

Support for 2015-2019 ACS 5-Year Estimates

Thanks for developing this package, which has made my data pulls so much easier.

The Census Bureau has justed released the 2015-2019 5-year estimates. Is it possible to add support for that?

https://www.census.gov/programs-surveys/acs/news/updates.2020.html

Download not working for censusgeo function with four tuples

This library seems to be working really well! Thanks for creating this.

I did run into an issue when trying to download data with 'censusgeo' with four tuples though. When I run the same 'censusgeo' to get the geographical information for this area using 'censusdata.geographies' this seems to work just fine but not when inserting this into the download method.

I also verified that the API call was working with these parameters using postman, so it seems not to be an issue with the endpoints api.census.gov.

Working API Get Request: https://api.census.gov/data/2018/acs/acs5?get=NAME,B01001_007E&for=place/remainder+(or+part):*&in=state:12+county:057+county+subdivision:93367

Here is the line that I tried to run:
censusdata.download('acs5', 2018, censusdata.censusgeo([('state','12'), ('county', '057'), ('county subdivision', '93367'), ('place/remainder (or part)', '*')]),['B01001_007E'])

The error that I receive: KeyError: 'state> county> county subdivision> place/remainder (or part)'

censusdata.download('acs5', 2018, censusdata.censusgeo([('state','12'), ('county', '057'), ('county subdivision', '93367'), ('place/remainder (or part)', '*')]),['B01001_010E'],endpt='new')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/core/frame.py in __repr__(self)
    685             line_width=width,
    686             max_colwidth=max_colwidth,
--> 687             show_dimensions=show_dimensions,
    688         )
    689 

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, min_rows, max_cols, show_dimensions, decimal, line_width, max_colwidth, encoding)
    818                 line_width=line_width,
    819             )
--> 820             return formatter.to_string(buf=buf, encoding=encoding)
    821 
    822     # ----------------------------------------------------------------------

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/format.py in to_string(self, buf, encoding)
    912         encoding: Optional[str] = None,
    913     ) -> Optional[str]:
--> 914         return self.get_result(buf=buf, encoding=encoding)
    915 
    916     def to_latex(

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/format.py in get_result(self, buf, encoding)
    519         """
    520         with self.get_buffer(buf, encoding=encoding) as f:
--> 521             self.write_result(buf=f)
    522             if buf is None:
    523                 return f.getvalue()

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/format.py in write_result(self, buf)
    821         else:
    822 
--> 823             strcols = self._to_str_columns()
    824             if self.line_width is None:  # no need to wrap around just print
    825                 # the whole frame

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/format.py in _to_str_columns(self)
    717         # may include levels names also
    718 
--> 719         str_index = self._get_formatted_index(frame)
    720 
    721         if not is_list_like(self.header) and not self.header:

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/format.py in _get_formatted_index(self, frame)
   1057             )
   1058         else:
-> 1059             fmt_index = [index.format(name=self.show_row_idx_names, formatter=fmt)]
   1060 
   1061         fmt_index = [

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/core/indexes/base.py in format(self, name, formatter, **kwargs)
    964             return header + list(self.map(formatter))
    965 
--> 966         return self._format_with_header(header, **kwargs)
    967 
    968     def _format_with_header(self, header, na_rep="NaN", **kwargs):

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/core/indexes/base.py in _format_with_header(self, header, na_rep, **kwargs)
    978 
    979         if is_object_dtype(values.dtype):
--> 980             result = [pprint_thing(x, escape_chars=("\t", "\r", "\n")) for x in values]
    981 
    982             # could have nans

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/core/indexes/base.py in (.0)
    978 
    979         if is_object_dtype(values.dtype):
--> 980             result = [pprint_thing(x, escape_chars=("\t", "\r", "\n")) for x in values]
    981 
    982             # could have nans

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/printing.py in pprint_thing(thing, _nest_lvl, escape_chars, default_escapes, quote_strings, max_seq_items)
    228         result = "'{thing}'".format(thing=as_escaped_string(thing))
    229     else:
--> 230         result = as_escaped_string(thing)
    231 
    232     return result

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/pandas/io/formats/printing.py in as_escaped_string(thing, escape_chars)
    204             escape_chars = escape_chars or tuple()
    205 
--> 206         result = str(thing)
    207         for c in escape_chars:
    208             result = result.replace(c, translate[c])

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/censusdata/censusgeo.py in __str__(self)
    128                         return 'Summary level: ' + self.sumlevel() + ', ' + '> '.join([geo[0]+':'+geo[1] for geo in self.geo])
    129                 else:
--> 130                         return self.name + ': Summary level: ' + self.sumlevel() + ', ' + '> '.join([geo[0]+':'+geo[1] for geo in self.geo])
    131 
    132         def params(self):

~/.local/share/virtualenvs/RCVFlorida-m3qDY6vb/lib/python3.7/site-packages/censusdata/censusgeo.py in sumlevel(self)
    155                 Returns:
    156 			str: String representing the summary level code for this object's geographic level, e.g., '050' for 'state> county'."""
--> 157                 return self.sumleveldict[self.hierarchy()]
    158 
    159         def request(self):

KeyError: 'state> county> county subdivision> place/remainder (or part)'```

Censusdis as a possible replacement option

I was a very happy user of censusdata. When support ended, I rewrote part of my personal code
that used to use it and have now open sourced this as censusdis.

You may want to have a look if you are looking for a new way to access US Census data in python.

Possible to add 1-year estimates for all possible years?

I'm trying to get get data from as far back as possible, but the package only goes back to 2012 for 1-year estimates.

If it's not possible, is there a particular reason? For example, was the data from before 2012 not high quality?

Thank you so much!

pip install fails: no README.rst

This is the error message i get when

Collecting censusdata
Using cached CensusData-1.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/21/klwvzx956qg2wyh1dtnd86fw00bgqc/T/pip-build-y2g9v4d3/censusdata/setup.py", line 7, in
long_description=open('README.rst').read(),
FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'

pip 9.0.1
python 3.6 (also fails on 2.7)

[feature] Display error messages with raise statements, instead of print()

When errors are raised, the accompanying messages are output via print statements. This means that messages are printed at the top of the stack trace, where they're relatively hidden. Rather, error messages should be printed with the raise statement, as in raise SomeError('Some error message')

In my experience, this is especially confusing to less-experienced Python users, who aren't in the habit of scrolling through a stack trace, and so are unlikely to find these error messages.

The current approach:

print('Some error message')
raise SomeError

The suggested approach:

raise SomeError('Some error message')

2020 redistricting data

Will this package work for the 2020 Census data coming out today?

-222222222, or -666666666 Variable Values

Question rather than an issue:

What do values such as -222222222, or -666666666 for variables mean? I've run a web search and browsed US Census website but haven't located documentation on this. Any help would be appreciated.

Collecting 2000 SF1 & SF3 Data

Hi Julien,
Is there a reason why this tool is unable to collect 2000 or 1990 census data? Looks like the issue is with the api calling link where /dec/ is used in the address to collect SF1 2010 data whereas this section of the address is missing from the the address to collect SF1 2000 data.
broken link: https://api.census.gov/data/2000/dec/sf1?get=NAME,H001001&for=county:005&in=state:36):
working link: https://api.census.gov/data/2000/sf1/variables/H001001.json
It would be much appreciated if you could update the code to handle earlier census data. Thanks!

Extracting FIPS and name from censusgeo index?

I am looking to do a join between the censusdata dataframe and a geodataframe containing associated geometry indexed by the FIPS code.

The following extracts the FIPS and NAME from the result's censusgeo index and reindexes by the FIPS. I've reviewed https://jtleider.github.io/censusdata/api.html#module-censusdata.censusgeo, but did not see an easier way to do this. Can you provide any insights on a simpler way than below, or am I missing how to use the censusgeo index altogether? Thank you.

import censusdata as cd

# query for census variables
df = cd.download('acs5', 2019, cd.censusgeo([('state', '*')]), ['C17002_001E', 'B17001_002E'])

# display censusdata result
display(df.head(2))

# convert index to FIPS and Name
df['STATEFP'] = [item.params()[0][1] for item in df.index.to_list()]
df['NAME'] = [str(item).split(':')[0] for item in df.index.to_list()]

# rearrange columns and set FIPS as index
df.set_index('STATEFP', inplace=True)
columns = df.columns.to_list()
columns.remove("NAME")
df = df[["NAME"] + columns]

# display reindexed df
df.head(2)

Result

	C17002_001E	B17001_002E
Alabama: Summary level: 040, state:01	4754288	795989
Alaska: Summary level: 040, state:02	719376	76933

Reindexed to FIPS

STATEFP	NAME	C17002_001E	B17001_002E
01	Alabama	4754288	795989
02	Alaska	719376	76933

Download returning None for ACS5 variable

I'm trying to pull block group level information for a county on gross medium rent and the returned dataframe values are all 'None'. I'm interested in ACS5 data from multiple years (I've tried 2013 through 2018) using the variable B25031_004E and I'm not sure why it is returning 'None' because I know that this data exists.

I've run into the same problem for B17026_001E, B27001_001E, and B03001_001E. I am alomst certain that there should be data for these variables.

Thanks for any help and suggestions!

Downloading places table/data

I'm trying to download census data specific to places and I'm getting an unsupported hierarchy error.

It looks like, from the documentation, that the hierarchy is states>places and this works:

census_places = censusdata.geographies(censusdata.censusgeo([('state', '36'), ('place', '*')]), 'acs5', 2015)

But this does not

censusdata.download('acs5', 2016, censusdata.censusgeo([('state', '36'), ('places', '01011')]),["B01002_001E", "B01002_002E", "B01002_003E", "B01002A_001E", "B01002A_002E", "B01002A_003E", "B01002B_001E", "B01002B_002E", "B01002B_003E", "B01002C_001E", "B01002C_002E", "B01002C_003E", "B01002D_001E", "B01002D_002E", "B01002D_003E", "B01002E_001E", "B01002E_002E", "B01002E_003E", "B01002F_001E", "B01002F_002E", "B01002F_003E"])

I'm not entirely sure where I'm going wrong. Any suggestions?

Creating Queries with More than one Geography Type

Hi Julien,
Could you provide an example of how one would select both a particular county and a census tract? For example, run a query to collect data for Richmond County, NY (FIPS 36085) and Census Tract 3 in Richmond County (FIPS (36085000300)? Additionally, how would run a query for all counties in two separate states?
In the example below I tried to select all counties from two states but didn't have any luck.

import pandas as pd
import censusdata
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
cookbg = censusdata.download('acs5', 2015,
censusdata.censusgeo([('state', '36', '33'), ('county', '*')]),

Issue with TableID S0101

I have version 1.13 of CensusData. When I try to run censusdata.censustable("acs1", 2016, "S0101") for subject table S0101 (https://data.census.gov/cedsci/table?q=sex%20by%20age&g=0500000US51059&y=2016&tid=ACSST1Y2016.S0101), I get the following error:

Traceback (most recent call last):
File "", line 1, in
File "/home/matt/Projects/NAACP/.venv/lib/python3.8/site-packages/censusdata/variable_info.py", line 110, in censustable
raise ValueError(u'Table not found!')
ValueError: Table not found!

It seems to break the assumption in line 105 that the table ID will be the variable up to the last underscore (''.join(k.split('')[:-1]) == table).

I tried to use table "S0101_C01" instead based on looking at your acs1_2016_subject_variables.json file since using that as the table should and did work

tableInfo = censusdata.censustable("acs1", 2016, "S0101_C01")
fullTable = censusdata.download("acs1", 2016, censusdata.censusgeo([('state', '51'), ('county', '059')]),
list(tableInfo.keys()))

This gave me the error:

Traceback (most recent call last):
File "", line 1, in
File "/home/matt/Projects/NAACP/.venv/lib/python3.8/site-packages/censusdata/download.py", line 116, in download
data.update(_download(src + tabletype, year, params, endpt=endpt))
File "/home/matt/Projects/NAACP/.venv/lib/python3.8/site-packages/censusdata/download.py", line 38, in _download
raise ValueError('Unexpected response (URL: {0.url}): {0.text} '.format(r))
ValueError: Unexpected response (URL: https://api.census.gov/data/2016/acs/acs1?get=NAME,S0101_C01_001E,S0101_C01_002E,S0101_C01_003E,S0101_C01_004E,S0101_C01_005E,S0101_C01_006E,S0101_C01_007E,S0101_C01_008E,S0101_C01_009E,S0101_C01_010E,S0101_C01_011E,S0101_C01_012E,S0101_C01_013E,S0101_C01_014E,S0101_C01_015E,S0101_C01_016E,S0101_C01_017E,S0101_C01_018E,S0101_C01_019E,S0101_C01_020E,S0101_C01_021E,S0101_C01_022E,S0101_C01_023E,S0101_C01_024E,S0101_C01_025E,S0101_C01_026E,S0101_C01_027E,S0101_C01_028E,S0101_C01_029E,S0101_C01_030E,S0101_C01_031E,S0101_C01_032E,S0101_C01_033E,S0101_C01_034E,S0101_C01_035E,S0101_C01_036E&for=county:059&in=state:51): error: error: unknown variable 'S0101_C01_001E'

sf1 variables

Been having an issue downloading SF1 data. Seems like variable names have changed since last commit on sf1_2010_variables.json (e.g. P0030001 is now P003001 per https://api.census.gov/data/2010/dec/sf1/variables.json). Would it be possible to update that file?

Thanks for all your work here Julien!

Calling 50 or more variables in a single query

The current Census API does not allow calling 50 or more variables in a single query. Is there a solution to takes a list of variable names, and collect them into chunks with no more than 50 variables, thereby calling 50+ variables in a single query? An example of a solution in R for this issue is described at: https://rstudio-pubs-static.s3.amazonaws.com/19337_2e7f827190514c569ea136db788ce850.html

Plan to include 2020 data

Hey there, big fan of the package. Let me know if I can help implement the version for the most recent year

Unknown variables error: Unemployment Rates

Variables listed below can be looked up with variable_info(), but not downloaded. Other required variables can be downloaded. Data.census.gov will also show all variables as valid on the website.
Similar issue: #30 was resolved by updating source Census Bureau data due to changes they've made.

Having trouble accessing:
'S2301_C02_008E': 'Estimate!!Labor Force Participation Rate!!Population 16 years and over!!AGE!!55 to 59 years'
'S2301_C02_001E': 'Estimate!!Labor Force Participation Rate!!Population 16 years and over'

import censusdata as cd

# define census variable
variables = ['S2301_C02_008E']

# get variable info
display( cd.variable_info.censusvar('acs1', 2019, variables) )

# define geo index
geo = cd.censusgeo([('state', '06')])

# download data
cd.download('acs1', 2019, geo, variables)

'S2303_C02_008E: Estimate!!Percent!!Population 16 to 64 years!!WEEKS WORKED!!Did not work'
'S2301_C02_001E: Estimate!!Labor Force Participation Rate!!Population 16 years and over'
ValueError: Unexpected response (URL: https://api.census.gov/data/2019/acs/acs1?get=NAME,S2301_C02_008E&for=state:06): error: error: unknown variable 'S2301_C02_008E'

As well as:
'DP03_0009PE: Percent!!EMPLOYMENT STATUS!!Civilian labor force!!Unemployment Rate'

import censusdata as cd

# define census variable
variables = ['DP03_0009PE']

# get variable info
display( cd.variable_info.censusvar('acs5', 2019, variables) )

# define geo index
geo = cd.censusgeo([('state', '06')])

# download data
cd.download('acs5', 2019, geo, variables)

{'DP03_0009PE': ['SELECTED ECONOMIC CHARACTERISTICS',
  'Percent!!EMPLOYMENT STATUS!!Civilian labor force!!Unemployment Rate',
  'float']}
ValueError: Unexpected response (URL: https://api.census.gov/data/2019/acs/acs5?get=NAME,DP03_0009PE&for=state:06): error: error: unknown variable 'DP03_0009PE'

Indexing error

Hello,
First off, I love this package and the excellent documentation.
I am running into a small error when I try to download data from the 2017 acs5 where the index column is entirely populated with null entries. When I reset_index the null entries are converted to the string "Display Error!". I am not sure if the null index data is a problem from the package or from the Census website but I wanted to reach out and ask. The problem only occurs for 2017 acs5 not the 2016 or 2018 years.

Sample Code:
temp17 = censusdata.download(src='acs5', year=2017, geo=censusdata.censusgeo([('zip code tabulation area', '')]), var=['B01001_001E'], key=api_key).reset_index()
temp18 = censusdata.download(src='acs5', year=2018, geo=censusdata.censusgeo([('zip code tabulation area', '')]), var=['B01001_001E'], key=api_key).reset_index()

In this example the index column of temp18 is fine but temp17 is all null.
All the best.

instructions

Are there any instructions on how to use this package?

Census API Key

Hello All,

This is not really an issue but I am trying to get understanding. The "key" argument is optional for census.download. Why would you need a key? Are you limited to the amount of data you can download without a Census API key?

Thanks in advance.

Regards,

Ryan

Possible to add 2019 data?

Would it be possible to add the 2019 ACS data? It looks like the one year data was published September 17th and the supplemental data was published on October 15th. https://www.census.gov/programs-surveys/acs/news/data-releases/2019/release.html

Thanks!

sf1_2010_detail_variables.json missing

I'm new to the package, but have gotten a FileNotFoundError one two completely different machines. I checked and the file isn't there, but there is a file called sf1_2010_variables.json.

Is the file missing, or misnamed, or something else?

Collecting 2010 SF1 Block Group Data Using Wildcard Values

Would it be possible to revise the code to allow the use of wild card values (*) for calling block group data for the 2010 SF1? Currently one must prepare separate queries for each set of block groups located in a particular tract. It is fairly time consuming to do this for extracting all block groups within a county. I understand the problem is with the Census Bureau's system which allows the use of wild card queries for census tracts and census blocks but not block groups.

Unknown table type for variable C24010 when using censusvar

Hello,

I put the wrong title: Title should be: Unknown table type for variable C24010!

Thank you for this package. I am having trouble getting data for the table C24010 when trying to use:

censusdata.censusvar('acs5','2018', ['C24010'])

When I look at the code, I see there is no way to pass that validation section. Is there a way to do it?

Is rental data available?

Hi,

I've looked through the documentation but I can't seem to find if the data shown in the link below is available via the API:

https://data.census.gov/table?q=Renter+Costs&g=0100000US$04000$001_040XX00US01&y=2022

Any ideas?

Thanks

using censusdata.variable_info.censustable with full subject tables

I am attempting to search for variable information on an ACS 5-year subject table with this code:

censusdata.variable_info.censustable('acs5',2018,'S1810')

which I thought should work, since you can use the census API to query this group in full: https://api.census.gov/data/2018/acs/acs5/subject?get=group(S1810)&for=county:*

I found that adding sub-table information will work, e.g.

censusdata.variable_info.censustable('acs5',2018,'S1810_C01')

Meanwhile, a similar API Query does not work: https://api.census.gov/data/2018/acs/acs5/subject?get=group(S1810_C01)&for=county:*

I wonder if a new feature could use the first query to find all the S1810 variables?

Are people interested in having this work past 2019?

Hi! I'm considering building the variable files necessary to enable this to run on recent years (right now it only works through 2019), specifically (at least to start) the ones for the acs5 data. But before setting out to tackle that mountain, I wanted to check if other people would be interested in that. Go ahead and just comment/like/whatever if you think you would use that. Thanks!

SSLError: HTTPSConnectionPool(host='api.census.gov', port=443): Max retries exceeded with url:

I am getting this error now when I am trying to download data with your api:

`SSLError: HTTPSConnectionPool(host='api.census.gov', port=443): Max retries exceeded with url: /data/2012/acs/acs5?get=NAME,B24123_377E,B24123_378E,B24123_379E,B24123_380E,B24123_381E,B24123_382E,B24123_383E,B24123_384E,B24123_385E,B24123_386E,B24123_387E,B24123_388E,B24123_389E,B24123_390E,B24123_391E,B24123_392E,B24123_393E,B24123_394E,B24123_395E,B24123_396E,B24123_397E,B24123_398E,B24123_399E,B24123_400E,B24123_401E,B24123_402E,B24123_403E,B24123_404E,B24123_405E,B24123_406E,B24123_407E,B24123_408E,B24123_409E,B24123_410E,B24123_411E,B24123_412E,B24123_413E,B24123_414E,B24123_415E,B24123_416E,B24123_417E,B24123_418E,B24123_419E,B24123_420E,B24123_421E,B24123_422E,B24123_423E,B24123_424E,B24123_425E&for=tract:&in=state:01+county:&key=e39a53c23358c749629da6f31d8f03878d4088d6 (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

I tried the same code on another computer and it worked fine but I need it to work on this computer. How do I fix this?

Here is my entire code:

import pandas as pd
import censusdata
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
#import statsmodels.formula.api as sm
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
import censusgeocode as cg
import numpy as np
from numbers import Number
import plotly
import matplotlib.pyplot as plt
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
import requests
import pandas
import geopandas
import json
import math
from haversine import haversine
from ipfn import ipfn
import networkx
from matplotlib import pyplot
from matplotlib import patheffects
from shapely.geometry import LineString, MultiLineString


variable_list1 = [
'B24123_377E',
'B24123_378E',
'B24123_379E',
'B24123_380E',
'B24123_381E',
'B24123_382E',
'B24123_383E',
'B24123_384E',
'B24123_385E',
'B24123_386E',
'B24123_387E',
'B24123_388E',
'B24123_389E',
'B24123_390E',
'B24123_391E',
'B24123_392E',
'B24123_393E',
'B24123_394E',
'B24123_395E',
'B24123_396E',
'B24123_397E',
'B24123_398E',
'B24123_399E',
'B24123_400E',
'B24123_401E',
'B24123_402E',
'B24123_403E',
'B24123_404E',
'B24123_405E',
'B24123_406E',
'B24123_407E',
'B24123_408E',
'B24123_409E',
'B24123_410E',
'B24123_411E',
'B24123_412E',
'B24123_413E',
'B24123_414E',
'B24123_415E',
'B24123_416E',
'B24123_417E',
'B24123_418E',
'B24123_419E',
'B24123_420E',
'B24123_421E',
'B24123_422E',
'B24123_423E',
'B24123_424E',
'B24123_425E',
'B24123_426E',
'B24123_427E',
'B24123_428E',
'B24123_429E',
'B24123_430E',
'B24123_431E',
'B24123_432E',
'B24123_433E',
'B24123_434E',
'B24123_435E',
'B24123_436E',
'B24123_437E',
'B24123_438E',
'B24123_439E',
'B24123_440E',
'B24123_441E',
'B24123_442E',
'B24123_443E',
'B24123_444E',
'B24123_445E',
'B24123_446E',
'B24123_447E',
'B24123_448E',
'B24123_449E',
'B24123_450E',
'B24123_451E',
'B24123_452E',
'B24123_453E',
'B24123_454E',
'B24123_455E',
'B24123_456E',
'B24123_457E',
'B24123_458E',
'B24123_459E',
'B24123_460E',
'B24123_461E',
'B24123_462E',
'B24123_463E',
'B24123_464E',
'B24123_465E',
'B24123_466E',
'B24123_467E',
'B24123_468E',
'B24123_469E',
'B24123_470E',
'B24123_471E',
'B24123_472E',
'B24123_473E',
'B24123_474E',
'B24123_475E',
'B24123_476E',
'B24123_477E',
'B24123_478E',
'B24123_479E',
'B24123_480E',
'B24123_481E',
'B24123_482E',
'B24123_483E',
'B24123_484E',
'B24123_485E',
'B24123_486E',
'B24123_487E',
'B24123_488E',
'B24123_489E',
'B24123_490E',
'B24123_491E',
'B24123_492E',
'B24123_493E',
'B24123_494E',
'B24123_495E',
'B24123_496E',
'B24123_497E',
'B24123_498E',
'B24123_499E',
'B24123_500E',
'B24123_501E',
'B24123_502E',
'B24123_503E',
'B24123_504E',
'B24123_505E',
'B24123_506E',
'B24123_507E',
'B24123_508E',
'B24123_509E',
'B24123_510E',
'B24123_511E',
'B24123_512E',
'B24123_513E',
'B24123_514E',
'B24123_515E',
'B24123_516E',
'B24123_517E',
'B24123_518E',
'B24123_519E',
'B24123_520E',
'B24123_521E',
'B24123_522E',
'B24123_523E',
'B24123_524E',
'B24123_525E',
'B24123_526E',
'B24124_001E',
'B24124_002E',
'B24124_003E',
'B24124_004E',
'B24124_005E',
'B24124_006E',
'B24124_007E',
'B24124_008E',
'B24124_009E',
'B24124_010E',
'B24124_011E',
'B24124_012E',
'B24124_013E',
'B24124_014E',
'B24124_015E',
'B24124_016E',
'B24124_017E',
'B24124_018E',
'B24124_019E',
'B24124_020E',
'B24124_021E',
'B24124_022E',
'B24124_023E',
'B24124_024E',
'B24124_025E',
'B24124_026E',
'B24124_027E',
'B24124_028E',
'B24124_029E',
'B24124_030E',
'B24124_031E',
'B24124_032E',
'B24124_033E',
'B24124_034E',
'B24124_035E',
'B24124_036E',
'B24124_037E',
'B24124_038E',
'B24124_039E',
'B24124_040E',
'B24124_041E',
'B24124_042E',
'B24124_043E',
'B24124_044E',
'B24124_045E',
'B24124_046E',
'B24124_047E',
'B24124_048E',
]
variable_list2 = [
'B24124_049E',
'B24124_050E',
'B24124_051E',
'B24124_052E',
'B24124_053E',
'B24124_054E',
'B24124_055E',
'B24124_056E',
'B24124_057E',
'B24124_058E',
'B24124_059E',
'B24124_060E',
'B24124_061E',
'B24124_062E',
'B24124_063E',
'B24124_064E',
'B24124_065E',
'B24124_066E',
'B24124_067E',
'B24124_068E',
'B24124_069E',
'B24124_070E',
'B24124_071E',
'B24124_072E',
'B24124_073E',
'B24124_074E',
'B24124_075E',
'B24124_076E',
'B24124_077E',
'B24124_078E',
'B24124_079E',
'B24124_080E',
'B24124_081E',
'B24124_082E',
'B24124_083E',
'B24124_084E',
'B24124_085E',
'B24124_086E',
'B24124_087E',
'B24124_088E',
'B24124_089E',
'B24124_090E',
'B24124_091E',
'B24124_092E',
'B24124_093E',
'B24124_094E',
'B24124_095E',
'B24124_096E',
'B24124_097E',
'B24124_098E',
'B24124_099E',
'B24124_100E',
'B24124_101E',
'B24124_102E',
'B24124_103E',
'B24124_104E',
'B24124_105E',
'B24124_106E',
'B24124_107E',
'B24124_108E',
'B24124_109E',
'B24124_110E',
'B24124_111E',
'B24124_112E',
'B24124_113E',
'B24124_114E',
'B24124_115E',
'B24124_116E',
'B24124_117E',
'B24124_118E',
'B24124_119E',
'B24124_120E',
'B24124_121E',
'B24124_122E',
'B24124_123E',
'B24124_124E',
'B24124_125E',
'B24124_126E',
'B24124_127E',
'B24124_128E',
'B24124_129E',
'B24124_130E',
'B24124_131E',
'B24124_132E',
'B24124_133E',
'B24124_134E',
'B24124_135E',
'B24124_136E',
'B24124_137E',
'B24124_138E',
'B24124_139E',
'B24124_140E',
'B24124_141E',
'B24124_142E',
'B24124_143E',
'B24124_144E',
'B24124_145E',
'B24124_146E',
'B24124_147E',
'B24124_148E',
'B24124_149E',
'B24124_150E',
'B24124_151E',
'B24124_152E',
'B24124_153E',
'B24124_154E',
'B24124_155E',
'B24124_156E',
'B24124_157E',
'B24124_158E',
'B24124_159E',
'B24124_160E',
'B24124_161E',
'B24124_162E',
'B24124_163E',
'B24124_164E',
'B24124_165E',
'B24124_166E',
'B24124_167E',
'B24124_168E',
'B24124_169E',
'B24124_170E',
'B24124_171E',
'B24124_172E',
'B24124_173E',
'B24124_174E',
'B24124_175E',
'B24124_176E',
'B24124_177E',
'B24124_178E',
'B24124_179E',
'B24124_180E',
'B24124_181E',
'B24124_182E',
'B24124_183E',
'B24124_184E',
'B24124_185E',
'B24124_186E',
'B24124_187E',
'B24124_188E',
'B24124_189E',
'B24124_190E',
'B24124_191E',
'B24124_192E',
'B24124_193E',
'B24124_194E',
'B24124_195E',
'B24124_196E',
'B24124_197E',
'B24124_198E',
'B24124_199E',
'B24124_200E',
'B24124_201E',
'B24124_202E',
'B24124_203E',
'B24124_204E',
'B24124_205E',
'B24124_206E',
'B24124_207E',
'B24124_208E',
'B24124_209E',
'B24124_210E',
'B24124_211E',
'B24124_212E',
'B24124_213E',
'B24124_214E',
'B24124_215E',
'B24124_216E',
'B24124_217E',
'B24124_218E',
'B24124_219E',
'B24124_220E',
'B24124_221E',
'B24124_222E',
'B24124_223E',
'B24124_224E',
'B24124_225E',
'B24124_226E',
'B24124_227E',
'B24124_228E',
'B24124_229E',
'B24124_230E',
'B24124_231E',
'B24124_232E',
'B24124_233E',
'B24124_234E',
'B24124_235E',
'B24124_236E',
'B24124_237E',
'B24124_238E',
'B24124_239E',
'B24124_240E',
'B24124_241E',
'B24124_242E',
'B24124_243E',
'B24124_244E',
'B24124_245E',
'B24124_246E',
'B24124_247E',
'B24124_248E',
'B24124_249E',
'B24124_250E',
'B24124_251E',
]

all_variable_lists = [variable_list1, variable_list2]
print(len(all_variable_lists[0]))
#2) For each year, download the relevant variables for each tract
def download_year(year,variable_list,State,County,Tract):
    df = censusdata.download('acs5', year, censusdata.censusgeo([('state',State),('county',County),('tract',Tract)]), variable_list, key = 'e39a53c23358c749629da6f31d8f03878d4088d6')
    df['Year']=str(year)
    return df
#3) Define function to download for a single year and state 
def callback_arg(i,variable_list,year):
    try:        
        print('Downloading - ',year,'State', i,' of 57')
        if i<10:
            df = download_year(year,variable_list,'0'+str(i),'*','*')
            return df
        if i==51:
            df = download_year(year,variable_list,str(i),'*','*')
            return df
        else:
            df = download_year(year,variable_list,str(i),'*','*')
            return df
    except:
        pass

#3) Function to download for all states and all years, do some slight formatting
def download_all_data(variable_list,max_year):
    df=download_year(2012,variable_list,'01','*','*')
    for year in range(2012,max_year+1):
        if year == 2012:
            for i in range(0,57):
                df=df.append(callback_arg(i,variable_list,year))
        else: 
            for i in range(0,57):
                df=df.append(callback_arg(i,variable_list,year))
    df2=df.reset_index()
    df2=df2.rename(columns = {"index": "Location+Type"}).astype(str)
    df2['state']=df2["Location+Type"].str.split(':').str[0].str.split(', ').str[2]
    df2['Census_tract']=df2["Location+Type"].str.split(':').str[0].str.split(',').str[0].str.split(' ').str[2][0]
    df2['County_name']=df2["Location+Type"].str.split(':').str[0].str.split(', ').str[1]
    return(df2)
#4) Some slight formatting
def write_to_csv(df2,name = 'Employment Data Part 9'):
    df2.to_csv(name)
#5) The line below is commented out, but should run the entire download sequence

list_of_dfs = []
for var_list in all_variable_lists:
    list_of_dfs.append(download_all_data(var_list, 2018))
x1 = list_of_dfs[0].reset_index()
x2 = list_of_dfs[1].reset_index()
x3 = pd.merge(x1,x2, on=['index','Location+Type','Year','state','Census_tract','County_name'])
write_to_csv(x3)
`

CensusData request takes too long

I'm getting some census data by using the CensusData API. I'm looking for all variables related to each zip code from a list. However, it is taking too long, like 22601.822241067886 seconds per request. Would there be any workaround for this?

[Feature Request] ACS summary files?

I'm trying to get ACS feature summary files, for example, table S1701 (https://data.census.gov/cedsci/table?q=percent%20poverty&hidePreview=false&tid=ACSST5Y2015.S1701&t=Poverty&vintage=2018) which has summary statistics on poverty rates. I seem to be unable to pull this via censusdata however; I've tried searching in a variety of ways and am unable to find this via the library.

I initially thought this was an error, but looking on the main page of the repo I don't see mention of ACS summary tables making me think the summary tables simply aren't included yet. Thus I'm writing here to ask if there's any plan to do so and when that might happen?

Or alternatively, am I doing something wrong in not being able to pull them?

Wrong url(?)

Hi I'm not sure if this is an issue about the module itself, but when I try to retrieve 2013 ACS 5-year estimates data, I used a following code:

g = censusdata.censusgeo([('state', '25'), ('county', '017'), ('block group', '*')])
d = censusdata.download('acs5', year, g, features)

However I got an value error unknown/unsupported geography heirarchy

I notice the url requested is

https://api.census.gov/data/2013/acs/acs5?get=NAME,B25001_001E,B19013_001E&for=block+group:*&in=state:25+county:017

And if I remove /acs in the url, it actually worked:

https://api.census.gov/data/2013/acs5?get=NAME,B25001_001E,B19013_001E&for=block+group:*&in=state:25+county:017

[Feature Request] Search variables on multiple criterion

Right now, it's only possible to search variables on a single criterion. It'd be convenient to specify multiple criteria on multiple fields. And it looks like it would be pretty quick to implement.

Queries for all us census tracts timing out

Hi, I am trying to run queries for acs5 2015 for all census tracts in all US states. I had already run this in the past (a few months ago) for 2016, so my code was set up and I was hoping to just rerun the whole scrip once. However, just trying to run my same code for 2016 again yields the following error:

Unexpected response (URL: https://api.census.gov/data/2015/acs/acs5?get=NAME,B23025_003E,B23025_005E,B15003_001E,B15003_002E,B15003_003E,B15003_004E,B15003_005E,B15003_006E,B15003_007E,B15003_008E,B15003_009E,B15003_010E,B15003_011E,B15003_012E,B15003_013E,B15003_014E,B15003_015E,B15003_016E&for=block+group:*&in=state:08+county:*): Sorry, the system is currently undergoing maintenance or is busy. Please try again later.
My code:
I basically write a loop to go through a list of states and use a query very similar to the Example 1 in https://jtleider.github.io/censusdata/example1.html.

For instance, using your Example 1 for all states, the code below makes the error pop up.

for i in range(len(state_lst)):
      cookbg = censusdata.download('acs5', 2015,
                               censusdata.censusgeo([('state', state_lst[i]), ('county', '*'), ('block group', '*')]),
                               ['B23025_003E', 'B23025_005E', 'B15003_001E', 'B15003_002E', 'B15003_003E',
                                'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E',
                                'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E',
                                'B15003_014E', 'B15003_015E', 'B15003_016E'])

I basically have 20 or 30 of queries like this one and none work with the whole list of states, although they worked perfectly fine some months ago...

I can run it state by state and, for some of my queries, multiple states in a loop work, but I just wanted to do a loop for all states as i had done in the past. Is there a scheduled maintenance that will soon be complete? I have been having this error for the past 3 days.

Thanks!

How to get census data from a specific zip code?

I'm trying to get census information form a specific zip code. I have used this code below but it doesn't work:

sensu = censusdata.download(
        src="acs5", 
        year=2019,
        geo=censusdata.censusgeo([("zip code tabulation area", "92101")]),
        var=["GEO_ID"],
        tabletype="profile"
    )

How can I get information from the specific zipcode 92101?

jtleider / censusdata Goto Github PK

censusdata's People

Contributors

Stargazers

Watchers

Forkers

censusdata's Issues

Recommend Projects

Recommend Topics

Recommend Org