
oturns / geosnap

229 stars · 17 watchers · 32 forks · 455.28 MB

The Geospatial Neighborhood Analysis Package

Home Page: https://oturns.github.io/geosnap-guide

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 94.51%, Jupyter Notebook 5.49%
Topics: spatial-analysis, urban-modeling, geodemographics, neighborhood-dynamics, spatial-data-science, urban-analytics, urban-data-science, urban-informatics

geosnap's People

Contributors

angwar26, jgaboardi, knaaptime, ljwolf, renanxcortes, sjsrey, suhanmappingideas, weikang9009



geosnap's Issues

No module named 'libpysal.api' found when I tried to import

Hello,

When I tried to import the function read_ltdb using the following command:

from osnap.data import read_ltdb

I got this error:

[screenshot: ModuleNotFoundError: No module named 'libpysal.api']

In this case I needed to manually change an import line in heuristics.py, but it would be nice if this intermediate step were not necessary.
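(For reference, the fix is swapping the removed flat-namespace import for its new module path; the exact symbol in heuristics.py may differ — W below is just illustrative.)

    # old import, removed in recent libpysal releases:
    # from libpysal.api import W
    # the same class now lives in a submodule:
    from libpysal.weights import W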

Kind regards,
Renan

check contiguity of shape data

we need to ensure the integrity of the shape data included in the package to make sure the spatial weights matrices are built correctly
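a minimal sketch of one such check, assuming the shapes load into a GeoDataFrame (path illustrative): build Queen contiguity with libpysal and flag islands, i.e. observations with no neighbors, which usually indicate broken or stray geometries

    import geopandas as gpd
    from libpysal.weights import Queen

    # illustrative path; point it at the packaged tract shapes
    tracts = gpd.read_file("data/tracts.shp")

    # islands (observations with no neighbors) are usually a sign of
    # invalid or disconnected geometry
    w = Queen.from_dataframe(tracts)
    print("islands:", w.islands)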

User authentication

The OSNAP web interface will need to allow users to create accounts via the web, and to isolate user-created data.

This is related to #40.

Creating a database fails due to bad file name

The following code, taken from the example notebook, failed to run on my machine 😞:

import osnap
import libpysal as lps
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_context('notebook')

va = gpd.read_file(lps.examples.get_path('virginia.shp'))
virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)

This code raises:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-dbd75e222aaa> in <module>
----> 1 virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)

~/Desktop/osnap/osnap/data/data.py in __init__(self, name, source, states, counties, add_indices, boundary, **kwargs)
    391         if source == "ltdb":
    392             _df = pd.read_parquet(
--> 393                 os.path.join(_package_directory, "ltdb.parquet.gzip"))
    394         elif source == "ncdb":
    395             _df = pd.read_parquet(

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
    286 
    287     impl = get_engine(engine)
--> 288     return impl.read(path, columns=columns, **kwargs)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read(self, path, columns, **kwargs)
    129             kwargs['use_pandas_metadata'] = True
    130             result = self.api.parquet.read_table(path, columns=columns,
--> 131                                                  **kwargs).to_pandas()
    132         if should_close:
    133             try:

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in read_table(source, columns, use_threads, metadata, use_pandas_metadata, nthreads)
   1072         return fs.read_parquet(source, columns=columns,
   1073                                use_threads=use_threads, metadata=metadata,
-> 1074                                use_pandas_metadata=use_pandas_metadata)
   1075 
   1076     pf = ParquetFile(source, metadata=metadata)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/filesystem.py in read_parquet(self, path, columns, metadata, schema, use_threads, nthreads, use_pandas_metadata)
    180         use_threads = _deprecate_nthreads(use_threads, nthreads)
    181         dataset = ParquetDataset(path, schema=schema, metadata=metadata,
--> 182                                  filesystem=self)
    183         return dataset.read(columns=columns, use_threads=use_threads,
    184                             use_pandas_metadata=use_pandas_metadata)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads)
    858          self.common_metadata_path,
    859          self.metadata_path) = _make_manifest(
--> 860             path_or_paths, self.fs, metadata_nthreads=metadata_nthreads)
    861 
    862         if self.common_metadata_path is not None:

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in _make_manifest(path_or_paths, fs, pathsep, metadata_nthreads)
   1033             if not fs.isfile(path):
   1034                 raise IOError('Passed non-file path: {0}'
-> 1035                               .format(path))
   1036             piece = ParquetDatasetPiece(path)
   1037             pieces.append(piece)

OSError: Passed non-file path: /Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip

A quick filesystem check indicates that this is because the file the code looks for is not among those populated during the download process:

ipdb> path
'/Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip'
ipdb> import os
ipdb> os.listdir('/Users/alex/Desktop/osnap/osnap/data/')
['cenpy_fetch.py', '__init__.py', '__pycache__', 'README.md', '.gitignore', 'counties.parquet.gzip', 'tracts.parquet.gzip', 'variables.csv', 'msas.parquet.gzip', 'data.py', 'states.parquet.gzip']

fix cluster tests

since we're no longer rounding the ltdb data, some of the cluster assignments have changed, causing tests to fail

consider GADM interface

as we continue to abstract osnap into cases beyond the US, it might be useful to provide an API that gives access to international data sources like GADM.

I'm not sure how good the indices are for each country (e.g. FIPS codes aren't included as an attribute for the USA counties there), so it might cause headaches when users have to attach their own census data, but it's something to consider

trying to run the python program in a different directory, but it fails

I am trying to run the python program in a different directory, but it fails as shown below:
http://173.255.192.133/~suhan/STARS/python/testsr.py
[screenshot of the failure]

The same works in the directory Serge set up: http://173.255.192.133/~suhan/cgi-bin/testsr.py

It looks like I need to change the path to python in the first line of the script, but I am not sure which path to enter. Any thoughts?

I read that some potential problems are:

  • The Python script is not marked as executable. When CGI scripts are not executable, most web servers let the user download the script instead of running it and sending the output to the user. For CGI scripts to run properly on Unix-like operating systems, the +x bit needs to be set; chmod a+x your_script.py may solve this problem.
  • On a Unix-like system, the line endings in the program file must be Unix-style. This is important because the web server checks the first line of the script (the shebang) and tries to run the program specified there. It gets easily confused by Windows line endings (Carriage Return & Line Feed, CRLF), so the file must be converted to Unix line endings (Line Feed only, LF). This can be done automatically by uploading the file via FTP in text mode instead of binary mode, but the preferred way is simply to tell your editor to save files with Unix line endings; most editors support this.
  • The web server must be able to read the file, so make sure the permissions are correct. On Unix-like systems the server often runs as user and group www-data, so it might be worth changing the file ownership, or making the file world-readable with chmod a+r your_script.py.
  • The web server must know that the file you're trying to access is a CGI script. Check the configuration of your web server, as it may expect a specific file extension for CGI scripts.
  • On Unix-like systems, the path to the interpreter in the shebang (#!/usr/bin/env python) must be correct. This line calls /usr/bin/env to find Python, but it will fail if there is no /usr/bin/env, or if Python is not in the web server's path. If you know where your Python is installed, you can use that full path instead; the commands whereis python and type -p python can help you find it. Once you know the path, change the shebang accordingly: #!/usr/bin/python.
  • The file must not contain a BOM (Byte Order Mark). The BOM is meant for determining the byte order of UTF-16 and UTF-32 encodings, but some editors also write it into UTF-8 files. The BOM interferes with the shebang line, so be sure to tell your editor not to write one.
  • If the web server is using mod_python, mod_python may be having problems. It can handle CGI scripts by itself, but it can also be a source of issues.

it works in my user directory, so one other thing to check would be whether eli can add a cgi-bin directory, put a script in there, and test it.

bug: installation error on networkx

Trying to install on the UCR cluster, I'm hitting:

Processing dependencies for osnap==0.1.0
error: networkx 2.2 is installed but networkx<2.0.0 is required by {'region'}

Will investigate, but wanted to flag it here in case others have hit this.

move all shape data to quilt

the package is currently too big to be hosted on PyPI (must be under 60 MB), so we should probably move all spatial data to the spatialucr quilt account

catch error and update quilt package

we recently moved all shape data into quilt and out of local parquet files. osnap works as normal on fresh installs, but when updating the codebase, nothing triggers a refresh of the quilt package, so osnap ends up asking for a df that isn't there

[screenshot of the resulting error]
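one possible shape for the fix, sketched with the quilt 2.x calls already used in data.py (the package handle here is assumed; the codebase has used both spatialucr and knaaptime):

    import quilt

    try:
        from quilt.data.spatialucr import census
    except ImportError:
        # stale or missing local package: force a refresh, then retry
        quilt.install("spatialucr/census", force=True)
        from quilt.data.spatialucr import census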

add tests

We need to add tests for the following (a sample sketch follows the list):

- [x] Community plotting: plotting directly is a lot easier with the new structure since geoms are attached, but should we still try to implement top-level plotting?

  • geolytics importer
  • from_lodes constructor
  • from_geodataframes constructor
  • from_census constructor
  • harmonize functionality
  • Transition class (using Community as input)
  • alternative distance calcs for Sequence class (markov, arbitrary, trans)
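As a starting point, a sketch of what one of these tests might look like (pytest style; the from_census signature and the .gdf attribute are assumed here, not confirmed):

    from geosnap import Community

    def test_from_census():
        # build a Community for DC from the packaged census data
        dc = Community.from_census(state_fips="11")
        assert not dc.gdf.empty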

example notebook throws import error on clean install

During handling of the above exception, another exception occurred:

ModuleNotFoundError Traceback (most recent call last)
<ipython-input> in <module>
----> 1 import osnap
2 import libpysal as lps
3 import geopandas as gpd
4 import matplotlib.pyplot as plt
5 import seaborn as sns

~/Dropbox/o/osnap/osnap/__init__.py in
2
3 from . import analytics
----> 4 from . import data
5 from .data import metros

~/Dropbox/o/osnap/osnap/data/__init__.py in
----> 1 from .data import Dataset, metros, read_ltdb, read_ncdb

~/Dropbox/o/osnap/osnap/data/data.py in
13 quilt.install("spatialucr/census")
14 quilt.install("spatialucr/census_cartographic")
---> 15 from quilt.data.knaaptime import census
16 import matplotlib.pyplot as plt
17 import pandas as pd

ModuleNotFoundError: No module named 'quilt.data.knaaptime'

time series plotting method

it would be really useful to have a .tsplot() method on the Dataset class that arranges plots for each available time period on the same figure
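a rough sketch of what such a method might look like, assuming the class holds a long-form dataframe in self.data with a year column (all names here are illustrative, not the actual API):

    import matplotlib.pyplot as plt

    def tsplot(self, column, **kwargs):
        """Plot `column` for each available time period on one figure."""
        years = sorted(self.data.year.unique())
        # assumes at least two periods, so subplots returns an array of axes
        fig, axes = plt.subplots(1, len(years), figsize=(4 * len(years), 4))
        for ax, year in zip(axes, years):
            self.data[self.data.year == year].plot(column=column, ax=ax, **kwargs)
            ax.set_title(str(year))
            ax.set_axis_off()
        return fig, axes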

investigate moving from CGI to django

one of the performance bottlenecks on the web interface is that we're using the old CGI interface, which is unsustainable for two reasons:

  1. it is a script-based interface, so each time a script is called it spawns a new python interpreter (which means programs can't really be dynamic)

  2. it is slow

instead, we should consider a modern web framework like django

dependencies missing in environment

We need to keep the environment.yml updated for dependencies so devs have a unified env.

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support

If the install needs more than conda, we should document that as well.

place api

we could use fuzzywuzzy or similar to add a named-place based api similar to what cenpy and osmnx do
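a minimal sketch of the idea, assuming a dataframe of metro definitions with a name column (the function name and cutoff are illustrative; fuzzywuzzy's process.extractOne does the matching):

    from fuzzywuzzy import process

    def get_place(query, msas):
        """Return the rows of `msas` whose name best matches `query`."""
        name, score = process.extractOne(query, msas["name"].tolist())
        if score < 80:  # arbitrary cutoff for rejecting weak matches
            raise ValueError("no close match for %r" % query)
        return msas[msas["name"] == name]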

refactor Dataset to Community

The data.Dataset.data namespace is a bad one (for which I'm responsible). I propose renaming the Dataset class to Community and restructuring it for clarity.

A community is a collection of neighborhoods with different properties describing its various boundaries (tracts, counties, states, over several time periods) and compositional attributes, such as data from surveys, sensors, or miscellaneous geocoded sources.

goals:

  • make the nomenclature more intuitive
  • abstract away from US-only data
  • allow for multiple spatial representations with non-compatible indices (e.g. zipcodes vs tracts or 1990 vs 2000 tracts)

something like

from osnap import Community

# attribs are pd.DataFrames
Community.attrib.census 
Community.attrib.osm
Community.attrib.misc

# boundaries are gpd.GeoDataFrames
Community.boundaries.units # the currently-set primitive units like geopandas.geometry
Community.boundaries.zipcodes
Community.boundaries.counties
Community.boundaries.tracts_1990
Community.boundaries.tracts_2010

I'm thinking this would simplify class instantiation, and the source argument would move to a method (e.g. Community().from_ltdb(...)). I'm also imagining methods to collapse to a single wide-form geodataframe and to instantiate from one

I'm not particularly wedded to any of the names specifically, but soliciting input on whether folks think this would be a useful direction
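and a sketch of how instantiation and round-tripping might look under that proposal (method names and signatures are placeholders for discussion, nothing settled):

    from osnap import Community

    # hypothetical constructor replacing the `source` argument
    chicago = Community.from_ltdb(msa="16980")

    # collapse everything to a single wide-form GeoDataFrame...
    gdf = chicago.to_gdf()

    # ...and build a Community back up from one
    chicago2 = Community.from_gdf(gdf)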

Anaconda3 in the installation instruction in osnap

I think it is better to mention that "we suggest using Anaconda3 for osnap, since it has only been fully tested with Anaconda3".

Currently the instructions say, "The recommended method for installing OSNAP is with anaconda," but they do not specify either 2 or 3.

When I tried to install osnap using the .yml file you provide with Anaconda2, I hit errors that I could not resolve by myself.

create dataset from list of counties

current behavior selects every combination of (2-digit) states + (3-digit) counties. It should also allow lists of 5-digit state-county fips for finer-grained selection
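a sketch of the finer-grained selection, assuming tract geoids begin with the 5-digit state+county fips (the path and column name are illustrative):

    import geopandas as gpd

    # illustrative: load the packaged tract shapes
    tracts = gpd.read_file("data/tracts.shp")

    # San Diego County, CA and the District of Columbia as 5-digit fips
    counties = ["06073", "11001"]
    subset = tracts[tracts["geoid"].str[:5].isin(counties)]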

importing geosnap (KeyError: 'states'?)

I forked geosnap and installed the develop version, and everything went smoothly, but when I try to import it, I get this:

[screenshot: KeyError: 'states']

Is there a way to fix this? Do I need to change something in the data.py?

census attributes missing

Just a reminder: please add the census attributes of osnap.data.Community to doc_build\html\api.html the next time you make changes.

add metadata to existing datasets

It would be helpful to document the original source for the cbsa definitions as these can change over time and we may need to cross-reference different time slices.

For example, we don't have parquet files for the following:

  • 41900 San Germán-Cabo Rojo, PR Metropolitan Statistical Area
  • 41980 San Juan-Caguas-Guaynabo, PR Metropolitan Statistical Area
  • 39140 Prescott, AZ Metropolitan Statistical Area
  • 38660 Ponce, PR Metropolitan Statistical Area
  • 10380 Aguadilla-Isabela-San Sebastián, PR Metropolitan Statistical Area
  • 11640
  • 32420 Mayagüez, PR Metropolitan Statistical Area
  • 25020 Guayama, PR Metropolitan Statistical Area
  • 19380 Dayton, OH Metropolitan Statistical Area

It makes sense that ltdb does not cover PR areas, but it is less clear why the AZ, OH, and 11640 areas are not in the data.

Migrate code from gitlab to here

  • remove large private files from the repository (store on private google drive)
  • set up to test private files locally
  • make travis tests public

Importing OSNAP in Web-enabled python gives PermissionError: [Errno 13] Permission denied: '/var/www/.local'

I am trying to run the example python code below to test whether it is possible to use osnap in web-enabled python.

Here is the code (osnap_example.py) that I am testing:
///////////////////////////////////////////////////////////////////////////////////////////////
#!/home/suhan/public_html/cgi-bin/anaconda3/envs/osnap/bin/python
import cgitb
cgitb.enable()

import platform
import numpy

pltform = platform.platform()

print("Content-type: text/html\n\n")
print("\n")
print("<div style="width: 100%; font-size: 40px; font-weight: bold; text-align: center;">")
print("Python Script Test Page")
print("CGI Works!")
print("

\n\nPlatform: %s

"%pltform)
print("\n\n")

import osnap
//////////////////////////////////////////////////////////////////////////////////////////////////////

I tried to run this code both in Terminal (image on the left) and on the Web (image on the right).
[screenshots: terminal run (left) and web run (right)]

The code run in the Terminal did not give me any error when importing OSNAP (the left image). But when I run the same code on the Web (http://173.255.192.133/~suhan/cgi-bin/osnap/osnap_example.py), it gives a PermissionError: [Errno 13], so the OSNAP modules cannot be used.

I think this is related to file permission issues; it seems like I need to change the permission level of some files, but I am not sure exactly what to change. It seems to be related to something like "rwxrwxrwx". Is anyone familiar with permission issues in Linux?

get san diego & DC data from census with `variables.csv`

Working notes

  1. most of the variables are in the sf3, not the sf1. So, this means the connections are to cenpy.base.Connection('2000sf3') and cenpy.base.Connection('DecennialSF31990'), not 2000sf1 or DecennialSF11990
  2. the ACS table column codes are not the SF3 column codes. So, B03001_004E doesn't exist in the sf3. This means we must match by "category," called concept in the census API.
  3. sf3 for 2000 from the API only goes down to block groups, not blocks.

currently, this means building up analogue columns by querying into the pandas dataframe, connection.variables:

import cenpy
c2000sf3 = cenpy.base.Connection('2000sf3')
has_ancestry = c2000sf3.variables['concept'].str.lower().apply(lambda x: 'ancestry' in x)
c2000sf3.variables[has_ancestry]['concept'] 

So, if we can get the column names that correspond, then:

c2000sf3.query(columns=columns, geo_unit='block group', 
               geo_filter=dict(state='06',county='073', tract='*'))

should work.

Reading LTDB data - inconsistent paths for full count and sample files?

I'm not sure why the relative path of the LTDB full count file is different from that of the sample file.

For instance, the path to the full count 1980 is "LTDB_Std_1980_fullcount.csv":

    fullcount80 = _ltdb_reader(
        fullcount_zip, "LTDB_Std_1980_fullcount.csv", year=1980)

while the path for the sample file lives in another folder "ltdb_std_all_sample/ltdb_std_1980_sample.csv":

    sample80 = _ltdb_reader(
        sample_zip,
        "ltdb_std_all_sample/ltdb_std_1980_sample.csv",
        dropcols=["pop80sf3", "pop80sf4", "hu80sp", "ohu80sp"],
        year=1980,
    )

It would make more sense to adopt one form or the other for consistency.

cenpy interface

the old decennial census downloaders need to be rewritten to support the new and improved cenpy API. We also need to add downloaders for ACS (which will be easier now)

LTDB reader: adjust for inflation?

I might be missing something, but based on this line: inflate_cols = ["mhmval", "mrent", "hinc"], I think only these three variables are adjusted for inflation, while other economic variables like per capita income (incpc) are not?
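If the intent is to adjust the other dollar-denominated variables too, the fix might be as small as extending that list (a sketch; incpc comes from the question above, and any further columns would need checking against the LTDB codebook):

    # the current list plus per capita income; other dollar-valued columns
    # would be added only after checking the LTDB codebook
    inflate_cols = ["mhmval", "mrent", "hinc", "incpc"]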
