
oturns / geosnap

229 stars · 17 watchers · 32 forks · 455.28 MB

The Geospatial Neighborhood Analysis Package

Home Page: https://oturns.github.io/geosnap-guide

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 94.51%, Jupyter Notebook 5.49%
Topics: spatial-analysis, urban-modeling, geodemographics, neighborhood-dynamics, spatial-data-science, urban-analytics, urban-data-science, urban-informatics

geosnap's People

Contributors

angwar26, jgaboardi, knaaptime, ljwolf, renanxcortes, sjsrey, suhanmappingideas, weikang9009



geosnap's Issues

No module named 'libpysal.api' found when I tried to import

Hello,

When I tried to import the function read_ltdb using the following command:

from osnap.data import read_ltdb

I got this error:

[screenshot: ModuleNotFoundError: No module named 'libpysal.api']

In this case I needed to manually change an import line in heuristics.py, but it would be nice if this intermediate step were not necessary.
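(For reference, the fix is swapping the removed flat-namespace import for its new module path; the exact symbol in heuristics.py may differ — W below is just illustrative.)

    # old import, removed in recent libpysal releases:
    # from libpysal.api import W
    # the same class now lives in a submodule:
    from libpysal.weights import W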

Kind regards,
Renan

check contiguity of shape data

we need to ensure the integrity of the shape data included in the package to make sure the spatial weights matrices are built correctly
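a minimal sketch of one such check, assuming the shapes load into a GeoDataFrame (path illustrative): build Queen contiguity with libpysal and flag islands, i.e. observations with no neighbors, which usually indicate broken or stray geometries

    import geopandas as gpd
    from libpysal.weights import Queen

    # illustrative path; point it at the packaged tract shapes
    tracts = gpd.read_file("data/tracts.shp")

    # islands (observations with no neighbors) are usually a sign of
    # invalid or disconnected geometry
    w = Queen.from_dataframe(tracts)
    print("islands:", w.islands)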

User authentication

The OSNAP web interface will need to allow users to create accounts via the web, and to isolate user-created data.

This is related to #40.

Creating a database fails due to bad file name

The following code, taken from the example notebook, failed to run on my machine 😞:

import osnap
import libpysal as lps
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_context('notebook')

va = gpd.read_file(lps.examples.get_path('virginia.shp'))
virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)

This code raises:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-dbd75e222aaa> in <module>
----> 1 virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)

~/Desktop/osnap/osnap/data/data.py in __init__(self, name, source, states, counties, add_indices, boundary, **kwargs)
    391         if source == "ltdb":
    392             _df = pd.read_parquet(
--> 393                 os.path.join(_package_directory, "ltdb.parquet.gzip"))
    394         elif source == "ncdb":
    395             _df = pd.read_parquet(

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
    286 
    287     impl = get_engine(engine)
--> 288     return impl.read(path, columns=columns, **kwargs)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read(self, path, columns, **kwargs)
    129             kwargs['use_pandas_metadata'] = True
    130             result = self.api.parquet.read_table(path, columns=columns,
--> 131                                                  **kwargs).to_pandas()
    132         if should_close:
    133             try:

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in read_table(source, columns, use_threads, metadata, use_pandas_metadata, nthreads)
   1072         return fs.read_parquet(source, columns=columns,
   1073                                use_threads=use_threads, metadata=metadata,
-> 1074                                use_pandas_metadata=use_pandas_metadata)
   1075 
   1076     pf = ParquetFile(source, metadata=metadata)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/filesystem.py in read_parquet(self, path, columns, metadata, schema, use_threads, nthreads, use_pandas_metadata)
    180         use_threads = _deprecate_nthreads(use_threads, nthreads)
    181         dataset = ParquetDataset(path, schema=schema, metadata=metadata,
--> 182                                  filesystem=self)
    183         return dataset.read(columns=columns, use_threads=use_threads,
    184                             use_pandas_metadata=use_pandas_metadata)

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads)
    858          self.common_metadata_path,
    859          self.metadata_path) = _make_manifest(
--> 860             path_or_paths, self.fs, metadata_nthreads=metadata_nthreads)
    861 
    862         if self.common_metadata_path is not None:

~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in _make_manifest(path_or_paths, fs, pathsep, metadata_nthreads)
   1033             if not fs.isfile(path):
   1034                 raise IOError('Passed non-file path: {0}'
-> 1035                               .format(path))
   1036             piece = ParquetDatasetPiece(path)
   1037             pieces.append(piece)

OSError: Passed non-file path: /Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip

A quick filesystem check indicates that this is because the file the code looks for is not among those populated during the download process:

ipdb> path
'/Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip'
ipdb> import os
ipdb> os.listdir('/Users/alex/Desktop/osnap/osnap/data/')
['cenpy_fetch.py', '__init__.py', '__pycache__', 'README.md', '.gitignore', 'counties.parquet.gzip', 'tracts.parquet.gzip', 'variables.csv', 'msas.parquet.gzip', 'data.py', 'states.parquet.gzip']

fix cluster tests

since we're no longer rounding the ltdb data, some of the cluster assignments have changed, causing tests to fail

consider GADM interface

as we continue to abstract osnap into cases beyond the US, it might be useful to provide an API that gives access to international data sources like GADM.

I'm not sure how good the indices are for each country (e.g. FIPS codes aren't included as an attribute for the USA counties there), so it might cause headaches when users have to attach their own census data, but it's something to consider

trying to run the python program in a different directory, but it fails

I am trying to run the python program in a different directory, but it fails as shown below:
http://173.255.192.133/~suhan/STARS/python/testsr.py
[screenshot of the failure]

The same works in the directory Serge set up: http://173.255.192.133/~suhan/cgi-bin/testsr.py

It looks like I need to change the path to python in the first line of the script, but I am not sure which path to enter. Any thoughts?

I read that some potential problems are:

  • The Python script is not marked as executable. When CGI scripts are not executable, most web servers let the user download the script instead of running it and sending the output to the user. For CGI scripts to run properly on Unix-like operating systems, the +x bit needs to be set; chmod a+x your_script.py may solve this problem.
  • On a Unix-like system, the line endings in the program file must be Unix-style. This is important because the web server checks the first line of the script (the shebang) and tries to run the program specified there. It gets easily confused by Windows line endings (Carriage Return & Line Feed, CRLF), so the file must be converted to Unix line endings (Line Feed only, LF). This can be done automatically by uploading the file via FTP in text mode instead of binary mode, but the preferred way is simply to tell your editor to save files with Unix line endings; most editors support this.
  • The web server must be able to read the file, so make sure the permissions are correct. On Unix-like systems the server often runs as user and group www-data, so it might be worth changing the file ownership, or making the file world-readable with chmod a+r your_script.py.
  • The web server must know that the file you're trying to access is a CGI script. Check the configuration of your web server, as it may expect a specific file extension for CGI scripts.
  • On Unix-like systems, the path to the interpreter in the shebang (#!/usr/bin/env python) must be correct. This line calls /usr/bin/env to find Python, but it will fail if there is no /usr/bin/env, or if Python is not in the web server's path. If you know where your Python is installed, you can use that full path instead; the commands whereis python and type -p python can help you find it. Once you know the path, change the shebang accordingly: #!/usr/bin/python.
  • The file must not contain a BOM (Byte Order Mark). The BOM is meant for determining the byte order of UTF-16 and UTF-32 encodings, but some editors also write it into UTF-8 files. The BOM interferes with the shebang line, so be sure to tell your editor not to write one.
  • If the web server is using mod_python, mod_python may be having problems. It can handle CGI scripts by itself, but it can also be a source of issues.

it works in my user directory, so one other thing to check would be whether eli can add a cgi-bin directory, put a script in there, and test it.

bug: installation error on networkx

Trying to install on the UCR cluster, I'm hitting:

Processing dependencies for osnap==0.1.0
error: networkx 2.2 is installed but networkx<2.0.0 is required by {'region'}

Will investigate, but wanted to flag it here in case others have hit this.

move all shape data to quilt

the package is currently too big to be hosted on PyPI (must be under 60 MB), so we should probably move all spatial data to the spatialucr quilt account

catch error and update quilt package

we recently moved all shape data into quilt and out of local parquet files. osnap works as normal on fresh installs, but when updating the codebase, nothing triggers a refresh of the quilt package, so osnap ends up asking for a df that isn't there

[screenshot of the resulting error]
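one possible shape for the fix, sketched with the quilt 2.x calls already used in data.py (the package handle here is assumed; the codebase has used both spatialucr and knaaptime):

    import quilt

    try:
        from quilt.data.spatialucr import census
    except ImportError:
        # stale or missing local package: force a refresh, then retry
        quilt.install("spatialucr/census", force=True)
        from quilt.data.spatialucr import census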

add tests

We need to add tests for the following (a sample sketch follows the list):

- [x] Community plotting: plotting directly is a lot easier with the new structure since geoms are attached, but should we still try to implement top-level plotting?

  • geolytics importer
  • from_lodes constructor
  • from_geodataframes constructor
  • from_census constructor
  • harmonize functionality
  • Transition class (using Community as input)
  • alternative distance calcs for Sequence class (markov, arbitrary, trans)
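As a starting point, a sketch of what one of these tests might look like (pytest style; the from_census signature and the .gdf attribute are assumed here, not confirmed):

    from geosnap import Community

    def test_from_census():
        # build a Community for DC from the packaged census data
        dc = Community.from_census(state_fips="11")
        assert not dc.gdf.empty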

example notebook throws import error on clean install

During handling of the above exception, another exception occurred:

ModuleNotFoundError Traceback (most recent call last)
<ipython-input> in <module>
----> 1 import osnap
2 import libpysal as lps
3 import geopandas as gpd
4 import matplotlib.pyplot as plt
5 import seaborn as sns

~/Dropbox/o/osnap/osnap/__init__.py in
2
3 from . import analytics
----> 4 from . import data
5 from .data import metros

~/Dropbox/o/osnap/osnap/data/__init__.py in
----> 1 from .data import Dataset, metros, read_ltdb, read_ncdb

~/Dropbox/o/osnap/osnap/data/data.py in
13 quilt.install("spatialucr/census")
14 quilt.install("spatialucr/census_cartographic")
---> 15 from quilt.data.knaaptime import census
16 import matplotlib.pyplot as plt
17 import pandas as pd

ModuleNotFoundError: No module named 'quilt.data.knaaptime'

time series plotting method

it would be really useful to have a .tsplot() method on the Dataset class that arranges plots for each available time period on the same figure
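a rough sketch of what such a method might look like, assuming the class holds a long-form dataframe in self.data with a year column (all names here are illustrative, not the actual API):

    import matplotlib.pyplot as plt

    def tsplot(self, column, **kwargs):
        """Plot `column` for each available time period on one figure."""
        years = sorted(self.data.year.unique())
        # assumes at least two periods, so subplots returns an array of axes
        fig, axes = plt.subplots(1, len(years), figsize=(4 * len(years), 4))
        for ax, year in zip(axes, years):
            self.data[self.data.year == year].plot(column=column, ax=ax, **kwargs)
            ax.set_title(str(year))
            ax.set_axis_off()
        return fig, axes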

investigate moving from CGI to django

one of the performance bottlenecks on the web interface is that we're using the old CGI interface, which is unsustainable for two reasons:

  1. it is a script-based interface, so each time a script is called it spawns a new python interpreter (which means programs can't really be dynamic)

  2. it is slow

instead, we should consider a modern web framework like django

dependencies missing in environment

We need to keep the environment.yml updated for dependencies so devs have a unified env.

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support

If the install needs more than conda, we should document that as well.

place api

we could use fuzzywuzzy or similar to add a named-place based api similar to what cenpy and osmnx do
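a minimal sketch of the idea, assuming a dataframe of metro definitions with a name column (the function name and cutoff are illustrative; fuzzywuzzy's process.extractOne does the matching):

    from fuzzywuzzy import process

    def get_place(query, msas):
        """Return the rows of `msas` whose name best matches `query`."""
        name, score = process.extractOne(query, msas["name"].tolist())
        if score < 80:  # arbitrary cutoff for rejecting weak matches
            raise ValueError("no close match for %r" % query)
        return msas[msas["name"] == name]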

refactor Dataset to Community

The data.Dataset.data namespace is a bad one (for which I'm responsible). I propose renaming the Dataset class to Community and restructuring it for clarity.

A community is a collection of neighborhoods with different properties describing its various boundaries (tracts, counties, states, over several time periods) and compositional attributes, such as data from surveys, sensors, or miscellaneous geocoded sources.

goals:

  • make the nomenclature more intuitive
  • abstract away from US-only data
  • allow for multiple spatial representations with non-compatible indices (e.g. zipcodes vs tracts or 1990 vs 2000 tracts)

something like

from osnap import Community

# attribs are pd.DataFrames
Community.attrib.census 
Community.attrib.osm
Community.attrib.misc

# boundaries are gpd.GeoDataFrames
Community.boundaries.units # the currently-set primitive units like geopandas.geometry
Community.boundaries.zipcodes
Community.boundaries.counties
Community.boundaries.tracts_1990
Community.boundaries.tracts_2010

I'm thinking this would simplify class instantiation, and the source argument would move to a method (e.g. Community().from_ltdb(...)). I'm also imagining methods to collapse to a single wide-form geodataframe and to instantiate from one

I'm not particularly wedded to any of the names specifically, but soliciting input on whether folks think this would be a useful direction
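and a sketch of how instantiation and round-tripping might look under that proposal (method names and signatures are placeholders for discussion, nothing settled):

    from osnap import Community

    # hypothetical constructor replacing the `source` argument
    chicago = Community.from_ltdb(msa="16980")

    # collapse everything to a single wide-form GeoDataFrame...
    gdf = chicago.to_gdf()

    # ...and build a Community back up from one
    chicago2 = Community.from_gdf(gdf)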

Anaconda3 in the installation instruction in osnap

I think it is better to mention that "we suggest using Anaconda3 for osnap, since it has only been fully tested with Anaconda3".

Currently the instructions say, "The recommended method for installing OSNAP is with anaconda," but they do not specify either 2 or 3.

When I tried to install osnap using the .yml file you provide with Anaconda2, I hit errors that I could not resolve by myself.

create dataset from list of counties

current behavior selects every combination of (2-digit) states + (3-digit) counties. It should also allow lists of 5-digit state-county fips for finer-grained selection
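a sketch of the finer-grained selection, assuming tract geoids begin with the 5-digit state+county fips (the path and column name are illustrative):

    import geopandas as gpd

    # illustrative: load the packaged tract shapes
    tracts = gpd.read_file("data/tracts.shp")

    # San Diego County, CA and the District of Columbia as 5-digit fips
    counties = ["06073", "11001"]
    subset = tracts[tracts["geoid"].str[:5].isin(counties)]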

importing geosnap (KeyError: 'states'?)

I forked geosnap and installed the develop version, and everything went smoothly, but when I try to import it, I get this:

[screenshot: KeyError: 'states']

Is there a way to fix this? Do I need to change something in the data.py?

census attributes missing

Just a reminder: please add the census attributes of osnap.data.Community to doc_build\html\api.html the next time you make changes.

add metadata to existing datasets

It would be helpful to document the original source for the cbsa definitions as these can change over time and we may need to cross-reference different time slices.

For example, we don't have parquet files for the following:

  • 41900 San Germán-Cabo Rojo, PR Metropolitan Statistical Area
  • 41980 San Juan-Caguas-Guaynabo, PR Metropolitan Statistical Area
  • 39140 Prescott, AZ Metropolitan Statistical Area
  • 38660 Ponce, PR Metropolitan Statistical Area
  • 10380 Aguadilla-Isabela-San Sebastián, PR Metropolitan Statistical Area
  • 11640
  • 32420 Mayagüez, PR Metropolitan Statistical Area
  • 25020 Guayama, PR Metropolitan Statistical Area
  • 19380 Dayton, OH Metropolitan Statistical Area

It makes sense that ltdb does not cover PR areas, but it is less clear why the AZ, OH, and 11640 areas are not in the data.

Migrate code from gitlab to here

  • remove large private files from the repository (store on private google drive)
  • set up to test private files locally
  • make travis tests public

Importing OSNAP in Web-enabled python gives PermissionError: [Errno 13] Permission denied: '/var/www/.local'

I am trying to run the example python code below to test whether it is possible to use osnap in web-enabled python.

Here is the code (osnap_example.py) that I am testing:
///////////////////////////////////////////////////////////////////////////////////////////////
#!/home/suhan/public_html/cgi-bin/anaconda3/envs/osnap/bin/python
import cgitb
cgitb.enable()

import platform
import numpy

pltform = platform.platform()

print("Content-type: text/html\n\n")
print("\n")
print("<div style="width: 100%; font-size: 40px; font-weight: bold; text-align: center;">")
print("Python Script Test Page")
print("CGI Works!")
print("

\n\nPlatform: %s

"%pltform)
print("\n\n")

import osnap
//////////////////////////////////////////////////////////////////////////////////////////////////////

I tried to run this code both in Terminal (image on the left) and on the Web (image on the right).
[screenshots: terminal run (left) and web run (right)]

The code run in the Terminal did not give me any error when importing OSNAP (the left image). But when I run the same code on the Web (http://173.255.192.133/~suhan/cgi-bin/osnap/osnap_example.py), it gives a PermissionError: [Errno 13], so the OSNAP modules cannot be used.

I think this is related to file permission issues; it seems like I need to change the permission level of some files, but I am not sure exactly what to change. It seems to be related to something like "rwxrwxrwx". Is anyone familiar with permission issues in Linux?

get san diego & DC data from census with `variables.csv`

Working notes

  1. most of the variables are in the sf3, not the sf1. So, this means the connections are to cenpy.base.Connection('2000sf3') and cenpy.base.Connection('DecennialSF31990'), not 2000sf1 or DecennialSF11990
  2. the ACS table column codes are not the SF3 column codes. So, B03001_004E doesn't exist in the sf3. This means we must match by "category," called concept in the census API.
  3. sf3 for 2000 from the API only goes down to block groups, not blocks.

currently, this means building up analogue columns by querying into the pandas dataframe, connection.variables:

import cenpy
c2000sf3 = cenpy.base.Connection('2000sf3')
has_ancestry = c2000sf3.variables['concept'].str.lower().apply(lambda x: 'ancestry' in x)
c2000sf3.variables[has_ancestry]['concept'] 

So, if we can get the column names that correspond, then:

c2000sf3.query(columns=columns, geo_unit='block group', 
               geo_filter=dict(state='06',county='073', tract='*'))

should work.

Reading LTDB data - inconsistent paths for full count and sample files?

I'm not sure why the relative path of the LTDB full count file is different from that of the sample file.

For instance, the path to the full count 1980 is "LTDB_Std_1980_fullcount.csv":

    fullcount80 = _ltdb_reader(
        fullcount_zip, "LTDB_Std_1980_fullcount.csv", year=1980)

while the path for the sample file lives in another folder "ltdb_std_all_sample/ltdb_std_1980_sample.csv":

    sample80 = _ltdb_reader(
        sample_zip,
        "ltdb_std_all_sample/ltdb_std_1980_sample.csv",
        dropcols=["pop80sf3", "pop80sf4", "hu80sp", "ohu80sp"],
        year=1980,
    )

It would make more sense to adopt one form or the other for consistency.

cenpy interface

the old decennial census downloaders need to be rewritten to support the new and improved cenpy API. We also need to add downloaders for ACS (which will be easier now)

LTDB reader: adjust for inflation?

I might be missing something, but based on this line: inflate_cols = ["mhmval", "mrent", "hinc"], I think only these three variables are adjusted for inflation, while other economic variables like per capita income (incpc) are not?
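If the intent is to adjust the other dollar-denominated variables too, the fix might be as small as extending that list (a sketch; incpc comes from the question above, and any further columns would need checking against the LTDB codebook):

    # the current list plus per capita income; other dollar-valued columns
    # would be added only after checking the LTDB codebook
    inflate_cols = ["mhmval", "mrent", "hinc", "incpc"]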
