
brokenspoke-analyzer's Introduction

Brokenspoke-analyzer


The Brokenspoke Analyzer is a tool that allows the user to run a “Bicycle Network Analysis” locally.

Requirements

Quickstart

We recommend using Poetry for installing the tool and working in a virtual environment. Once you have Poetry set up:

git clone git@github.com:PeopleForBikes/brokenspoke-analyzer.git
cd brokenspoke-analyzer
poetry install

Activate the virtual environment in the cloned folder by using:

source .venv/bin/activate

The simplest way to run an analysis is to use docker compose.

bna run-with compose usa "santa rosa" "new mexico" 3570670

This command takes care of starting and stopping the PostgreSQL/PostGIS server, running all the analysis commands, and exporting the results.

The data required to perform the analysis will be saved in data/santa-rosa-new-mexico-usa, and the results exported in results/usa/new mexico/santa rosa/23.11.

For more details about the different ways to run an analysis and how to adjust the options, please refer to the full documentation.

brokenspoke-analyzer's People

Contributors

dependabot[bot], lalver1, rgreinho

brokenspoke-analyzer's Issues

Add a compare sub-command

Feature request

Current Behavior

This sub-command does not exist.

Expected Behavior

The compare command will add the ability to compare the results from the original BNA and the brokenspoke-analyzer in an automated fashion.

This command could potentially be hidden, or marked as experimental.

Cannot run analysis for "spain valencia"

Bug report

Trying to run an analysis for the city of Valencia, in Spain, in the Valenciana community.

Current Behavior

It seems like the analysis is failing because the city is composed of several polygons.

❯ poetry run bna run spain valencia
/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/.venv/lib/python3.10/site-packages/geopandas/_compat.py:112: UserWarning: The Shapely GEOS version (3.10.2-CAPI-1.16.0) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
  warnings.warn(
2022-08-07 10:24:00.967 | DEBUG    | brokenspoke_analyzer.core.analysis:retrieve_city_boundaries:125 - Query used to retrieve the boundaries: valencia, spain
/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/brokenspoke_analyzer/core/analysis.py:130: UserWarning: Column names longer than 10
characters will be truncated when saved to ESRI Shapefile.
  city_gdf.to_file(output / f"{slug}.shp")
[10:24:01] Boundary files ready.                                                                                                             cli.py:100
           OSM Region file downloaded.                                                                                                       cli.py:106
           OSM file for valencia ready.                                                                                                      cli.py:115
2022-08-07 10:24:01.834 | DEBUG    | brokenspoke_analyzer.core.processhelper:run:13 - cmd='docker run --rm -e PFB_SHPFILE="/data/valencia-spain.shp" -e PFB_OSM_FILE="/data/valencia-spain.osm" -e PFB_STATE=al -e PFB_STATE_FIPS=91 -e NB_OUTPUT_DIR=/data -e PFB_DEBUG=1 -v "/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/data":/data azavea/analyzer:13-3.1'
"docker run --rm -e PFB_SHPFILE="/data/valencia-spain.shp" -e PFB_OSM_FILE="/data/valencia-spain.osm" -e PFB_STATE=al -e PFB_STATE_FIPS=91 -e
NB_OUTPUT_DIR=/data -e PFB_DEBUG=1 -v "/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/data":/data azavea/analyzer:13-3.1" failed to
execute with error code 1 for the following reason:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
NOTICE:  empty string is not a valid password, clearing password
2022-08-07 15:24:36.862 UTC [14] LOG:  starting PostgreSQL 13.7 (Debian 13.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit
2022-08-07 15:24:36.868 UTC [14] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-08-07 15:24:36.868 UTC [14] LOG:  listening on IPv6 address "::", port 5432
2022-08-07 15:24:36.872 UTC [14] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-07 15:24:36.900 UTC [276] LOG:  database system was shut down at 2022-08-07 15:24:36 UTC
2022-08-07 15:24:36.921 UTC [14] LOG:  database system is ready to accept connections
+ '[' ./scripts/import.sh = ./scripts/import.sh ']'
+ '[' /data/valencia-spain.shp = --help ']'
+ '[' -z /data/valencia-spain.shp ']'
+ ../import/import_neighborhood.sh /data/valencia-spain.shp 91
+ '[' ../import/import_neighborhood.sh = ../import/import_neighborhood.sh ']'
+ '[' /data/valencia-spain.shp = --help ']'
+ '[' -z /data/valencia-spain.shp ']'
+ NB_BOUNDARY_FILE=/data/valencia-spain.shp
+ NB_STATE_FIPS=91
++ mktemp -d
+ NB_TEMPDIR=/tmp/tmp.KM5HwdqedT/import_neighborhood
+ mkdir -p /tmp/tmp.KM5HwdqedT/import_neighborhood
+ NB_BOUNDARY_BUFFER=2680
+ update_status IMPORTING 'Importing boundary shapefile'
+ echo 'Updating job status: IMPORTING' 'Importing boundary shapefile'
+ '[' -n '' ']'
+ import_and_transform_shapefile /data/valencia-spain.shp neighborhood_boundary 4326
+ IMPORT_FILE=/data/valencia-spain.shp
+ IMPORT_TABLENAME=neighborhood_boundary
+ IMPORT_SRID=4326
+ echo 'START: Importing neighborhood_boundary'
+ shp2pgsql -I -p -D -s 4326 /data/valencia-spain.shp neighborhood_boundary
+ psql -h localhost -U gis -d pfb
Field bbox_north is an FTDouble with width 24 and precision 15
Field bbox_south is an FTDouble with width 24 and precision 15
Field bbox_east is an FTDouble with width 24 and precision 15
Field bbox_west is an FTDouble with width 24 and precision 15
Field place_id is an FTDouble with width 18 and precision 0
Field osm_id is an FTDouble with width 18 and precision 0
Field lat is an FTDouble with width 24 and precision 15
Field lon is an FTDouble with width 24 and precision 15
Field importance is an FTDouble with width 24 and precision 15
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
+ shp2pgsql -I -d -D -s 4326 /data/valencia-spain.shp neighborhood_boundary
+ psql -h localhost -U gis -d pfb
Field bbox_north is an FTDouble with width 24 and precision 15
Field bbox_south is an FTDouble with width 24 and precision 15
Field bbox_east is an FTDouble with width 24 and precision 15
Field bbox_west is an FTDouble with width 24 and precision 15
Field place_id is an FTDouble with width 18 and precision 0
Field osm_id is an FTDouble with width 18 and precision 0
Field lat is an FTDouble with width 24 and precision 15
Field lon is an FTDouble with width 24 and precision 15
Field importance is an FTDouble with width 24 and precision 15
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
Unable to convert data value to UTF-8 (iconv reports "Invalid or incomplete multibyte or wide character"). Current encoding is "UTF-8". Try "LATIN1"
(Western European), or one of the values described at http://www.postgresql.org/docs/current/static/multibyte.html.
+ psql -h localhost -U gis -d pfb -c 'ALTER TABLE neighborhood_boundary ALTER COLUMN geom             TYPE geometry(MultiPolygon,32630) USING
ST_Force2d(ST_Transform(geom,32630));'
2022-08-07 15:24:43.351 UTC [353] ERROR:  relation "neighborhood_boundary" does not exist
2022-08-07 15:24:43.351 UTC [353] STATEMENT:  ALTER TABLE neighborhood_boundary ALTER COLUMN geom             TYPE geometry(MultiPolygon,32630) USING
ST_Force2d(ST_Transform(geom,32630));
ERROR:  relation "neighborhood_boundary" does not exist
2022-08-07 15:24:43.556 UTC [14] LOG:  received fast shutdown request
2022-08-07 15:24:43.559 UTC [14] LOG:  aborting any active transactions
2022-08-07 15:24:43.570 UTC [14] LOG:  background worker "logical replication launcher" (PID 285) exited with exit code 1
2022-08-07 15:24:43.573 UTC [278] LOG:  shutting down
2022-08-07 15:24:43.678 UTC [14] LOG:  database system is shut down

Cache downloads

Feature request

Current Behavior

The files are re-downloaded every time we delete the prepared files or run the tests.

Expected Behavior

We should cache the downloads instead of re-downloading the same files unnecessarily.

Possible Solution

  • We should create a .bna_cache folder which would contain the successfully downloaded files.
  • When preparing the files required for an analysis, we would check the cache folder first (see the sketch after this list).
    • For example, the water_census_blocks file or the speed_limits file are always the same for any US city. The OSM files should be cached as well in case we re-run an analysis for a city, or for several cities in the same state/region.
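
A minimal sketch of the caching logic, assuming a retrieve_file wrapper around the existing download code (download is a hypothetical name for the existing function):

from pathlib import Path
import shutil

CACHE_DIR = Path(".bna_cache")

def retrieve_file(url: str, destination: Path) -> None:
    # Serve the file from the cache when possible, otherwise download it
    # and populate the cache for the next run.
    CACHE_DIR.mkdir(exist_ok=True)
    cached = CACHE_DIR / destination.name
    if not cached.exists():
        download(url, cached)  # hypothetical: the existing download function
    shutil.copy(cached, destination)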

Use SQLFluff to validate SQL scripts.

Feature request

Current Behavior

The SQL scripts were created by hand and do not follow any kind of automated formatting/convention.

Expected Behavior

Use SQLFluff as part of the linting process in the CI.
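
For instance, assuming the SQL scripts live in a scripts/sql folder (the path is a guess), the CI step could boil down to:

# Lint the SQL scripts with the PostgreSQL dialect.
sqlfluff lint --dialect postgres scripts/sql/
# Optionally, auto-fix formatting violations locally.
sqlfluff fix --dialect postgres scripts/sql/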

Cannot specify the output directory for the prepare command

Bug report

Running the prepare command with an output directory fails if the state is not specified.

Current Behavior

bna prepare arizona flagstaff /tmp/data

fails with the following error:

ValueError: Nominatim geocoder returned 0 results for query 'flagstaff, /tmp/data, arizona'

Expected Behavior

The prepare command should complete and store the files in /tmp/data.

Possible Solution

The prepare command has the following syntax:

 Usage: bna prepare [OPTIONS] COUNTRY CITY [STATE] [OUTPUT_DIR]

This makes it impossible to specify an output_dir without also specifying a state.

A possible solution would be to make output_dir an option instead of an argument: -o / --output_dir.
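
A sketch of the proposed fix with Typer (the default value for the option is an assumption):

from pathlib import Path

import typer

app = typer.Typer()

@app.command()
def prepare(
    country: str,
    city: str,
    state: str = typer.Argument(None),
    output_dir: Path = typer.Option(Path("data"), "--output-dir", "-o"),
):
    # output_dir is now an option, so it can be set without a state:
    # bna prepare arizona flagstaff --output-dir /tmp/data
    ...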

Rewrite the export script in Python

Feature request

Current Behavior

The script is written in Bash.

Expected Behavior

The following scripts must be rewritten in Python:

  • 40-export-export_connectivity.sh

The export must implement the logic to support multiple export formats. The first two options required are local export and S3 export.

The export MUST NOT use UUIDs, but instead must use CalVer to identify the different BNA run results.
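
For instance, a YY.MM CalVer identifier, like the 23.11 seen in the results path of the README, could be generated as follows (a sketch; the final scheme may differ):

from datetime import datetime

def bna_run_id(now: datetime | None = None) -> str:
    # YY.MM, e.g. "23.11" for a run performed in November 2023.
    now = now or datetime.now()
    return now.strftime("%y.%m")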

Reorganize the cli

Feature request

Expected Behavior

Create a dedicated package for the CLI, and a dedicated module for each sub-command:

cli
├── __init__.py
├── root.py
├── prepare.py
├── import.py
├── compute.py
├── export.py
└── run.py 

Refer to the Typer documentation to see how to proceed to organize the sub-commands.

The commands are the following ones:

  • Prepare: prepare the necessary files to run the analysis
  • Import: import the files into PostgreSQL/PostGIS
  • Compute: perform the analysis
  • Export: export the results.
  • Run: a combination of all the previous steps
    • With or without managing docker-compose
    • Keep the simplicity: bna run <country> <city> [<state> [<city_fips>]]
      • US cities must specify the state and the city FIPS code.
      • For non-US cities:
        • The FIPS code is always ignored and set to 0000000.
        • The state is optional, and useful only if there are several cities with the same name in different parts of the country or to retrieve a specific (sub)region (it is used only to retrieve the boundary file and the region file).
      • We want to keep the ability to run the original BNA as well

Other requirements:

  • The CLI should be just a thin wrapper passing parameters to core functions and orchestrating them.
  • Use smart default values
    • Better to let it fail than to produce incorrect results.
  • The CLI must display user-friendly error messages by default. If opted in, it should show the stack trace.
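
A minimal sketch of the wiring, following the layout above (module contents are assumptions):

# cli/root.py
import typer

from brokenspoke_analyzer.cli import compute, export, prepare, run

app = typer.Typer()
app.add_typer(prepare.app, name="prepare")
app.add_typer(compute.app, name="compute")
app.add_typer(export.app, name="export")
app.add_typer(run.app, name="run")

# Caveat: "import" is a Python keyword, so cli/import.py cannot be loaded
# with a regular import statement; importlib, or a different module name
# (e.g. importer.py), would be needed.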

Add the ability to define synthetic population square size

Feature request

Current Behavior

The size of a synthetic population square is currently defined as a constant in the code with the value of 1000 x 1000 meters.

Expected Behavior

As a user I would like to be able to pass the square size on the command line.

Possible Solution

Add a new flag to the prepare command. For instance --square-size.

Add progressive verbosity to the CLI

Feature request

Current Behavior

The logging verbosity is currently hardcoded.

Expected Behavior

As a user I would like to be able to pass -v/-vv/-vvv on the CLI to increase the verbosity level.
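
Typer can count repeated flags out of the box, so the feature could be a thin layer on top of it (a sketch; configure_logging is a hypothetical helper):

import typer

app = typer.Typer()

@app.callback()
def main(verbose: int = typer.Option(0, "--verbose", "-v", count=True)):
    # -v => 1, -vv => 2, -vvv => 3.
    configure_logging(verbose)  # hypothetical helper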

Partial downloads should be deleted

Bug report

Current Behavior

When running bna prepare or bna run and the download of one of the input files fails, the file stays on disk. If the command is run a second time, the file will be considered as downloaded even though it is incomplete, causing the step to be skipped.

Expected Behavior

If the file was not downloaded entirely, it should be deleted from the disk.

Possible Solution
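
A sketch of the cleanup logic, wrapping the existing download function (here called fetch, a hypothetical name):

from pathlib import Path

def download(url: str, destination: Path) -> None:
    # On any failure (network error, CTRL+C, ...), remove the partial
    # file so the next run does not mistake it for a complete download.
    try:
        fetch(url, destination)  # hypothetical: the existing download code
    except BaseException:
        destination.unlink(missing_ok=True)
        raise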

Steps to Reproduce

Run bna prepare arizona flagstaff, and cancel the process with CTRL+C in the middle of the download (or turn off the internet or wifi).

Cannot run analysis for "lorraine metz"

Bug report

Trying to run an analysis for the city of Metz, in France, in the Lorraine region.

Current Behavior

❯ poetry run bna run lorraine metz
/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/.venv/lib/python3.10/site-packages/geopandas/_compat.py:112: UserWarning: The Shapely GEOS version (3.10.2-CAPI-1.16.0) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
  warnings.warn(
2022-08-07 09:15:19.581 | DEBUG    | brokenspoke_analyzer.core.analysis:retrieve_city_boundaries:125 - Query used to retrieve the boundaries: metz, lorraine
/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/brokenspoke_analyzer/core/analysis.py:130: UserWarning: Column names longer than 10
characters will be truncated when saved to ESRI Shapefile.
  city_gdf.to_file(output / f"{slug}.shp")
[09:15:19] Boundary files ready.                                                                                                             cli.py:100
           OSM Region file downloaded.                                                                                                       cli.py:106
           OSM file for metz ready.                                                                                                          cli.py:115
2022-08-07 09:15:20.010 | DEBUG    | brokenspoke_analyzer.core.processhelper:run:13 - cmd='docker run --rm -e PFB_SHPFILE="/data/metz-lorraine.shp" -e PFB_OSM_FILE="/data/metz-lorraine.osm" -e PFB_STATE=al -e PFB_STATE_FIPS=91 -e NB_OUTPUT_DIR=/data -e PFB_DEBUG=1 -v "/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/data":/data azavea/analyzer:13-3.1'
"docker run --rm -e PFB_SHPFILE="/data/metz-lorraine.shp" -e PFB_OSM_FILE="/data/metz-lorraine.osm" -e PFB_STATE=al -e PFB_STATE_FIPS=91 -e
NB_OUTPUT_DIR=/data -e PFB_DEBUG=1 -v "/Users/rgreinhofer/projects/PeopleForBikes/brokenspoke-analyzer/data":/data azavea/analyzer:13-3.1" failed to
execute with error code 3 for the following reason:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
NOTICE:  empty string is not a valid password, clearing password
2022-08-07 14:15:55.403 UTC [14] LOG:  starting PostgreSQL 13.7 (Debian 13.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit
2022-08-07 14:15:55.408 UTC [14] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-08-07 14:15:55.408 UTC [14] LOG:  listening on IPv6 address "::", port 5432
2022-08-07 14:15:55.412 UTC [14] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-07 14:15:55.440 UTC [275] LOG:  database system was shut down at 2022-08-07 14:15:55 UTC
2022-08-07 14:15:55.462 UTC [14] LOG:  database system is ready to accept connections
+ '[' ./scripts/import.sh = ./scripts/import.sh ']'
+ '[' /data/metz-lorraine.shp = --help ']'
+ '[' -z /data/metz-lorraine.shp ']'
+ ../import/import_neighborhood.sh /data/metz-lorraine.shp 91
+ '[' ../import/import_neighborhood.sh = ../import/import_neighborhood.sh ']'
+ '[' /data/metz-lorraine.shp = --help ']'
+ '[' -z /data/metz-lorraine.shp ']'
+ NB_BOUNDARY_FILE=/data/metz-lorraine.shp
+ NB_STATE_FIPS=91
++ mktemp -d
+ NB_TEMPDIR=/tmp/tmp.9ObXVP2Q62/import_neighborhood
+ mkdir -p /tmp/tmp.9ObXVP2Q62/import_neighborhood
+ NB_BOUNDARY_BUFFER=2680
+ update_status IMPORTING 'Importing boundary shapefile'
+ echo 'Updating job status: IMPORTING' 'Importing boundary shapefile'
+ '[' -n '' ']'
+ import_and_transform_shapefile /data/metz-lorraine.shp neighborhood_boundary 4326
+ IMPORT_FILE=/data/metz-lorraine.shp
+ IMPORT_TABLENAME=neighborhood_boundary
+ IMPORT_SRID=4326
+ echo 'START: Importing neighborhood_boundary'
+ shp2pgsql -I -p -D -s 4326 /data/metz-lorraine.shp neighborhood_boundary
+ psql -h localhost -U gis -d pfb
Field bbox_north is an FTDouble with width 24 and precision 15
Field bbox_south is an FTDouble with width 24 and precision 15
Field bbox_east is an FTDouble with width 24 and precision 15
Field bbox_west is an FTDouble with width 24 and precision 15
Field place_id is an FTDouble with width 18 and precision 0
Field osm_id is an FTDouble with width 18 and precision 0
Field lat is an FTDouble with width 24 and precision 15
Field lon is an FTDouble with width 24 and precision 15
Field importance is an FTDouble with width 24 and precision 15
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
+ shp2pgsql -I -d -D -s 4326 /data/metz-lorraine.shp neighborhood_boundary
+ psql -h localhost -U gis -d pfb
Field bbox_north is an FTDouble with width 24 and precision 15
Field bbox_south is an FTDouble with width 24 and precision 15
Field bbox_east is an FTDouble with width 24 and precision 15
Field bbox_west is an FTDouble with width 24 and precision 15
Field place_id is an FTDouble with width 18 and precision 0
Field osm_id is an FTDouble with width 18 and precision 0
Field lat is an FTDouble with width 24 and precision 15
Field lon is an FTDouble with width 24 and precision 15
Field importance is an FTDouble with width 24 and precision 15
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
+ psql -h localhost -U gis -d pfb -c 'ALTER TABLE neighborhood_boundary ALTER COLUMN geom             TYPE geometry(MultiPolygon,32632) USING
ST_Force2d(ST_Transform(geom,32632));'
+ echo 'DONE: Importing neighborhood_boundary'
+ update_status IMPORTING 'Downloading water blocks'
+ echo 'Updating job status: IMPORTING' 'Downloading water blocks'
+ '[' -n '' ']'
+ psql -h localhost -U gis -d pfb -c '
        CREATE TABLE IF NOT EXISTS "water_blocks" (
            "STATEFP10" integer,
            "COUNTYFP10" integer,
            "TRACTCE10" integer,
            "BLOCKCE10" integer,
            GEOID varchar(15),
            "NAME10" char(10),
            "MTFCC10" char(5),
            "UR10" char(1),
            "UACE10" integer,
            "UATYP10" char(1),
            "FUNCSTAT10" char(1),
            "ALAND10" integer,
            "AWATER10" bigint,
            "INTPTLAT10" decimal,
            "INTPTLON10" decimal
        );'
+ WATER_FILENAME=censuswaterblocks
+ WATER_DOWNLOAD=/tmp/tmp.9ObXVP2Q62/import_neighborhood/censuswaterblocks.zip
+ wget -nv -O /tmp/tmp.9ObXVP2Q62/import_neighborhood/censuswaterblocks.zip https://s3.amazonaws.com/pfb-public-documents/censuswaterblocks.zip
2022-08-07 14:16:04 URL:https://s3.amazonaws.com/pfb-public-documents/censuswaterblocks.zip [13306081/13306081] ->
"/tmp/tmp.9ObXVP2Q62/import_neighborhood/censuswaterblocks.zip" [1]
+ unzip /tmp/tmp.9ObXVP2Q62/import_neighborhood/censuswaterblocks.zip -d /tmp/tmp.9ObXVP2Q62/import_neighborhood
+ psql -h localhost -U gis -d pfb -c '\copy water_blocks FROM /tmp/tmp.9ObXVP2Q62/import_neighborhood/censuswaterblocks.csv delimiter '\'','\'' csv
header'
+ echo 'DONE: Importing water blocks'
+ update_status IMPORTING 'Downloading census blocks'
+ echo 'Updating job status: IMPORTING' 'Downloading census blocks'
+ '[' -n '' ']'
+ NB_BLOCK_FILENAME=tabblock2010_91_pophu
+ S3_PATH=s3:///data/tabblock2010_91_pophu.zip
+ '[' -f /data/tabblock2010_91_pophu.zip ']'
+ echo 'Using local census blocks file'
+ BLOCK_DOWNLOAD=/data/tabblock2010_91_pophu.zip
+ unzip /data/tabblock2010_91_pophu.zip -d /tmp/tmp.9ObXVP2Q62/import_neighborhood
+ update_status IMPORTING 'Loading census blocks'
+ echo 'Updating job status: IMPORTING' 'Loading census blocks'
+ '[' -n '' ']'
+ import_and_transform_shapefile /tmp/tmp.9ObXVP2Q62/import_neighborhood/tabblock2010_91_pophu.shp neighborhood_census_blocks 4326
+ IMPORT_FILE=/tmp/tmp.9ObXVP2Q62/import_neighborhood/tabblock2010_91_pophu.shp
+ IMPORT_TABLENAME=neighborhood_census_blocks
+ IMPORT_SRID=4326
+ echo 'START: Importing neighborhood_census_blocks'
+ shp2pgsql -I -p -D -s 4326 /tmp/tmp.9ObXVP2Q62/import_neighborhood/tabblock2010_91_pophu.shp neighborhood_census_blocks
+ psql -h localhost -U gis -d pfb
Field pop10 is an FTDouble with width 18 and precision 0
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
+ psql -h localhost -U gis -d pfb
+ shp2pgsql -I -d -D -s 4326 /tmp/tmp.9ObXVP2Q62/import_neighborhood/tabblock2010_91_pophu.shp neighborhood_census_blocks
Field pop10 is an FTDouble with width 18 and precision 0
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
+ psql -h localhost -U gis -d pfb -c 'ALTER TABLE neighborhood_census_blocks ALTER COLUMN geom             TYPE geometry(MultiPolygon,32632) USING
ST_Force2d(ST_Transform(geom,32632));'
+ echo 'DONE: Importing neighborhood_census_blocks'
+ update_status IMPORTING 'Applying boundary buffer'
+ echo 'Updating job status: IMPORTING' 'Applying boundary buffer'
+ '[' -n '' ']'
+ echo 'START: Removing blocks outside buffer with size 2680'
+ psql -h localhost -U gis -d pfb -c 'DELETE FROM neighborhood_census_blocks AS blocks USING neighborhood_boundary                 AS boundary WHERE
NOT ST_DWithin(blocks.geom, boundary.geom,                 2680);'
+ echo 'DONE: Finished removing blocks outside buffer'
+ update_status IMPORTING 'Removing water blocks'
+ echo 'Updating job status: IMPORTING' 'Removing water blocks'
+ '[' -n '' ']'
+ echo 'START: Removing blocks that are 100% water from analysis'
+ psql -h localhost -U gis -d pfb -c 'DELETE FROM neighborhood_census_blocks AS blocks USING water_blocks                 AS water WHERE
blocks.BLOCKID10 = water.geoid;'
+ echo 'DONE: FINISHED removing blocks that are 100% water'
++ psql -h localhost -U gis -d pfb -t -c 'SELECT count(*) as total_census_blocks FROM neighborhood_census_blocks;'
+ BLOCK_COUNT='                   0'
+ echo 'Census Blocks in analysis:                    0'
+ set_job_attr census_block_count '                   0'
+ '[' -n '' ']'
+ rm -rf /tmp/tmp.9ObXVP2Q62/import_neighborhood
+ ../import/import_jobs.sh al
++ dirname ../import/import_jobs.sh
+ source ../import/../scripts/utils.sh
+ '[' ../import/import_jobs.sh = ../import/import_jobs.sh ']'
+ '[' al = --help ']'
+ '[' -z al ']'
+ NB_STATE_ABBREV=al
+ update_status IMPORTING 'Importing jobs data'
+ echo 'Updating job status: IMPORTING' 'Importing jobs data'
+ '[' -n '' ']'
+ import_job_data al main
++ mktemp -d
+ ROOT_TEMPDIR=/tmp/tmp.xsIOzjVHGK
+ NB_TEMPDIR=/tmp/tmp.xsIOzjVHGK/import_jobs
+ mkdir -p /tmp/tmp.xsIOzjVHGK/import_jobs
+ chmod -R 775 /tmp/tmp.xsIOzjVHGK
+ NB_STATE_ABBREV=al
+ NB_DATA_TYPE=main
+ NB_JOB_FILENAME=al_od_main_JT00_2018.csv
+ S3_PATH=s3:///data/al_od_main_JT00_2018.csv.gz
+ '[' -f /data/al_od_main_JT00_2018.csv.gz ']'
+ '[' '' ']'
+ JOB_DOWNLOAD=/tmp/tmp.xsIOzjVHGK/import_jobs/al_od_main_JT00_2018.csv.gz
+ set +e
+ wget -nv -O /tmp/tmp.xsIOzjVHGK/import_jobs/al_od_main_JT00_2018.csv.gz
http://lehd.ces.census.gov/data/lodes/LODES7/al/od/al_od_main_JT00_2018.csv.gz
2022-08-07 14:16:31 URL:https://lehd.ces.census.gov/data/lodes/LODES7/al/od/al_od_main_JT00_2018.csv.gz [10203041/10203041] ->
"/tmp/tmp.xsIOzjVHGK/import_jobs/al_od_main_JT00_2018.csv.gz" [1]
+ WGET_STATUS=0
+ set -e
+ [[ 0 -eq 8 ]]
+ '[' '' ']'
+ gunzip -c /tmp/tmp.xsIOzjVHGK/import_jobs/al_od_main_JT00_2018.csv.gz
+ psql -h localhost -U gis -d pfb -c '
CREATE TABLE IF NOT EXISTS "state_od_main_JT00" (
    w_geocode varchar(15),
    h_geocode varchar(15),
    "S000" integer,
    "SA01" integer,
    "SA02" integer,
    "SA03" integer,
    "SE01" integer,
    "SE02" integer,
    "SE03" integer,
    "SI01" integer,
    "SI02" integer,
    "SI03" integer,
    createdate VARCHAR(32)
);'
+ psql -h localhost -U gis -d pfb -c 'TRUNCATE TABLE "state_od_main_JT00";'
+ psql -h localhost -U gis -d pfb -c '\copy "state_od_main_JT00"(w_geocode, h_geocode, "S000", "SA01", "SA02", "SA03", "SE01", "SE02", "SE03", "SI01",
"SI02", "SI03", createdate) FROM '\''/tmp/tmp.xsIOzjVHGK/import_jobs/al_od_main_JT00_2018.csv'\'' DELIMITER '\'','\'' CSV HEADER;'
+ rm -rf /tmp/tmp.xsIOzjVHGK/import_jobs
+ import_job_data al aux
+ ROOT_TEMPDIR=/tmp/tmp.xsIOzjVHGK/import_jobs
+ NB_TEMPDIR=/tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs
+ mkdir -p /tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs
+ chmod -R 775 /tmp/tmp.xsIOzjVHGK/import_jobs
+ NB_STATE_ABBREV=al
+ NB_DATA_TYPE=aux
+ NB_JOB_FILENAME=al_od_aux_JT00_2018.csv
+ S3_PATH=s3:///data/al_od_aux_JT00_2018.csv.gz
+ '[' -f /data/al_od_aux_JT00_2018.csv.gz ']'
+ '[' '' ']'
+ JOB_DOWNLOAD=/tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs/al_od_aux_JT00_2018.csv.gz
+ set +e
+ wget -nv -O /tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs/al_od_aux_JT00_2018.csv.gz
http://lehd.ces.census.gov/data/lodes/LODES7/al/od/al_od_aux_JT00_2018.csv.gz
2022-08-07 14:16:57 URL:http://lehd.ces.census.gov/data/lodes/LODES7/al/od/al_od_aux_JT00_2018.csv.gz [665598/665598] ->
"/tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs/al_od_aux_JT00_2018.csv.gz" [1]
+ WGET_STATUS=0
+ set -e
+ [[ 0 -eq 8 ]]
+ '[' '' ']'
+ gunzip -c /tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs/al_od_aux_JT00_2018.csv.gz
+ psql -h localhost -U gis -d pfb -c '
CREATE TABLE IF NOT EXISTS "state_od_aux_JT00" (
    w_geocode varchar(15),
    h_geocode varchar(15),
    "S000" integer,
    "SA01" integer,
    "SA02" integer,
    "SA03" integer,
    "SE01" integer,
    "SE02" integer,
    "SE03" integer,
    "SI01" integer,
    "SI02" integer,
    "SI03" integer,
    createdate VARCHAR(32)
);'
+ psql -h localhost -U gis -d pfb -c 'TRUNCATE TABLE "state_od_aux_JT00";'
+ psql -h localhost -U gis -d pfb -c '\copy "state_od_aux_JT00"(w_geocode, h_geocode, "S000", "SA01", "SA02", "SA03", "SE01", "SE02", "SE03", "SI01",
"SI02", "SI03", createdate) FROM '\''/tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs/al_od_aux_JT00_2018.csv'\'' DELIMITER '\'','\'' CSV HEADER;'
+ rm -rf /tmp/tmp.xsIOzjVHGK/import_jobs/import_jobs
+ ../import/import_osm.sh /data/metz-lorraine.osm
NOTICE:  table "neighborhood_ways" does not exist, skipping
NOTICE:  table "neighborhood_ways_intersections" does not exist, skipping
NOTICE:  table "neighborhood_relations_ways" does not exist, skipping
NOTICE:  table "neighborhood_osm_nodes" does not exist, skipping
NOTICE:  table "neighborhood_osm_relations" does not exist, skipping
NOTICE:  table "neighborhood_osm_way_classes" does not exist, skipping
NOTICE:  table "neighborhood_osm_way_tags" does not exist, skipping
NOTICE:  table "neighborhood_osm_way_types" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_ways" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_ways_vertices_pgr" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_relations_ways" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_osm_nodes" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_osm_relations" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_osm_way_classes" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_osm_way_tags" does not exist, skipping
NOTICE:  table "neighborhood_cycwys_osm_way_types" does not exist, skipping
osmconvert Error: use border format:  -b="x1,y1,x2,y2"
2022-08-07 14:17:10.633 UTC [14] LOG:  received fast shutdown request
2022-08-07 14:17:10.635 UTC [14] LOG:  aborting any active transactions
2022-08-07 14:17:10.647 UTC [14] LOG:  background worker "logical replication launcher" (PID 284) exited with exit code 1
2022-08-07 14:17:10.690 UTC [277] LOG:  shutting down
2022-08-07 14:17:10.843 UTC [14] LOG:  database system is shut down

Confusing output_dir option for the bna run command

Bug report

Current Behavior

The bna run command provides the output_dir option, which is used to specify both the input file directory (where the shapefile, census files, etc. are located) and the directory where the analysis results will be stored. This behavior is confusing and prevents the user from actually separating the input and output data.

Expected Behavior

There should be a way to specify the location of the input data and the location of the output results independently.

The tool should also allow the user to run the bna prepare command to fetch the data, and then the bna run command using this data:

bna prepare arizona flagstaff --output-dir ~/my-data-directory
bna run arizona flagstaff --data_dir ~/my-data-directory --output-dir ~/my-result-directory

Possible Solution

An idea could be to add a new input-dir flag (or similar), and use output-dir only to specify where to write the results.

Remove dependency on GDAL

Feature request

Current Behavior

We currently use GDAL to determine the UTM zone of a shapefile. However, it is very complicated to install in an unattended fashion, and the Python bindings depend on C libraries which are different on every system, making the setup experience even worse.

Expected Behavior

The setup should be simpler, and the end user should not have to go through so many steps just to be able to determine the UTM zone of a shapefile.

Possible Solution

Use the GeoPandas estimate_utm_crs function instead (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.estimate_utm_crs.html).

>>> import geopandas as gpd
>>> gdf = gpd.read_file("santa-rosa-new-mexico-usa.shp")
>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

>>> gdf.estimate_utm_crs()
<Projected CRS: EPSG:32613>
Name: WGS 84 / UTM zone 13N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 108°W and 102°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Northwest Territories (NWT); Nunavut; Saskatchewan. Mexico. United States (USA).
- bounds: (-108.0, 0.0, -102.0, 84.0)
Coordinate Operation:
- name: UTM zone 13N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

>>> utm = gdf.estimate_utm_crs()
>>> utm.utm_zone
'13N'
>>> utm.to_string()
'EPSG:32613'
>>> utm.to_string()[5:]
'32613'

Add the ability to define synthetic population amount per square

Feature request

Current Behavior

The amount of the synthetic population per square is currently defined as a constant in the code with the value of 100 people.

Expected Behavior

As a user I would like to be able to pass the population amount per square on the command line.

Possible Solution

Add a new flag to the prepare command. For instance --square-population.

Validate downloads with checksums

Feature request

When possible, it would be nice to validate the downloads with their checksums.

Current Behavior

The files are just downloaded and written on disk. There is no validation.

Expected Behavior

Once the download is complete, we could use the checksum file to ensure that it did not get corrupted.

Possible Solution

For example, Geofabrik provides checksums for its files: https://download.geofabrik.de/north-america/us.html
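
A sketch of the verification step, assuming the checksum is published as an MD5 hex digest (Geofabrik publishes .md5 files next to its extracts):

import hashlib
from pathlib import Path

def verify_md5(path: Path, expected: str) -> None:
    # Hash the file in chunks to keep memory usage flat on large files.
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise ValueError(f"checksum mismatch for {path}")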

Create a dependency graph of the SQL scripts

Feature request

Current Behavior

We know that the scripts must be run in order, but this order is currently not documented.

Expected Behavior

The graph would allow us to batch the queries and run them asynchronously.

Use latest version of the analyzer

Feature request

Current Behavior

Currently, only version 0.16 of the analyzer is supported.

Expected Behavior

The latest version includes new features which should make it easier to run analyses for non-US cities. We should be able to run this version.

The docker container runs as root

Bug report

Current Behavior

The docker container runs as root.

Expected Behavior

The user should have the ability to run the docker container using an unprivileged user.

Possible Solution

Add a CLI flag to specify the docker user, like --docker-user or something similar.

See the official documentation for more details regarding the possible implementation.

No GitHub link from the documentation

Bug report

Current Behavior

There is no link to the GitHub Repository from the documentation site.

Expected Behavior

As a user I would like to be able to navigate between the GitHub repository and the documentation simply by clicking a link.

Add subcommands to the "Prepare" command

Feature request

It may be useful to add subcommands to the prepare command so that retrieving the city's boundaries, downloading the OSM region file, reducing the OSM file, and downloading census data can each be run separately, and not only as a group as part of prepare.

Current Behavior

The prepare command runs several preparation steps in sequence as one group.

Expected Behavior

The prepare command would, for example, look like bna prepare city-boundaries to only download the city boundaries file, or bna prepare osm-region to only download the OSM region file.

Possible Solution

Use the SubCommands concept from Typer.

Incorrect state info and FIPS code for non-US cities

Bug report

Current Behavior

The analyzer requires a FIPS code and state information to work properly.

When non-US cities are being analyzed, the state information defaults to Alabama/AL and the FIPS code to 91. This may cause the results to be incorrect, since the analyzer will then download US census datasets matching the US state of Alabama to compute some results, like job opportunities for instance.

Expected Behavior

When analyzing non-US cities, we should be able to skip the parts which are not relevant.

Possible Solution

A fake state could be used, for instance "ZZ", with a fake FIPS code of 100.

Add the ability to define standard speed limit

Feature request

Current Behavior

The standard speed limit is currently defined as a constant in the code with the value of 50 km/h.

Expected Behavior

As a user I would like to be able to pass the standard speed limit on the command line.

Possible Solution

Add a new flag to the prepare command. For instance --speed-limit.

Ensure the documentation is up to date

Feature request

Expected Behavior

A couple of new features and breaking changes were added. We must ensure they are reflected in the documentation. Both the user and contributor sections must be updated accordingly.

Package the client side as a Docker image

Feature request

Current Behavior

There is only a Python setup available.

Expected Behavior

Since the installation of all the dependencies is not straightforward, especially across the multiple platforms we support, we should provide a packaged image containing all the components.

Possible Solution

  • The image should be multi-platform.

Log download retries

Feature request

Current Behavior

When the Retryer kicks in and starts a new attempt to download the file, there is no direct indication of this behavior in the logs.

Expected Behavior

We should be able to read the logs and see when there were download issues and multiple attempts were needed to download a file.

Possible Solution

Tenacity offers a before_log feature to log a message before retrying.
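
A sketch of the decorator (the retry policy values are placeholders):

import logging

from tenacity import before_log, retry, stop_after_attempt, wait_fixed

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(2),
    # Log a message before each attempt, so retries show up in the logs.
    before=before_log(logger, logging.DEBUG),
)
def download_file(url, destination):
    ...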

Multipolygon cities are not handled properly

Bug report

Current Behavior

When a city is composed of several detached polygons, like Valencia, Spain, only the main (i.e. the largest) one is taken into account.

Expected Behavior

While the results should not change drastically, it would still be more correct to account for all the polygons of a city.

Retry on failure from a prepare step

Feature request

Current Behavior

If a prepare step fails, it does not retry automatically.

Expected Behavior

If a step from prepare fails, the brokenspoke-analyzer should retry automatically, as this is most likely a network error.

This should work for both the prepare and the run command.

Possible Solution

Use tenacity to configure some retry logic around the download functions.

The number of attempts should be configurable at runtime (https://tenacity.readthedocs.io/en/latest/#changing-arguments-at-run-time), from the command line. Possibly the delay as well. For instance:

bna --retry 2 --retry-delay 10 prepare arizona flagstaff
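
Tenacity's retry_with makes the decorator arguments overridable at run time, which would let the CLI flags flow down to the download functions (a sketch; the flag plumbing is an assumption):

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(2), wait=wait_fixed(10))
def download_file(url, destination):
    ...

# Somewhere in the CLI layer, with `retries` and `retry_delay` parsed
# from --retry/--retry-delay:
download_file.retry_with(
    stop=stop_after_attempt(retries),
    wait=wait_fixed(retry_delay),
)(url, destination)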

Brokenspoke-analyzer error message

Help request

After initial install of the brokenspoke-analyzer and running a test analysis, I get an error.

Problem

After installing from GitHub directly and running a test command bna run massachusetts pittsfield I get this error:
AttributeError: module 'brokenspoke_analyzer.core.analysis' has no attribute 'download_lodes_data'
Originating from line 291 of cli.py.
Being new to this analyzer I am probably missing something basic, but thought I would ask to see if this has a simple solution.

I can alternatively try to do a Docker install, though I don't yet have experience with Docker (I'm not a software developer) so am curious if it should still be possible to run an analysis using the simpler install without this particular error.

Thanks for helping a newbie.

Explain how to use tagged versions

Feature request

Current Behavior

The documentation only explains how to use the main branch from GitHub.

Expected Behavior

It should also explain how to use the tagged versions.

Make PgAdmin optional in compose.

Feature request

Current Behavior

Currently, docker compose runs PgAdmin even when it is not required, for instance if one just wants to run an analysis without inspecting the database.

Expected Behavior

There should be two compose files: one to run only PostGIS, and one to run both PostGIS and PgAdmin.

Possible Solution

Refer to this blog article to get an overview of the ability to extend compose files: https://www.docker.com/blog/improve-docker-compose-modularity-with-include/

Rewrite the prepare script in Python

Feature request

Current Behavior

The script is written in Bash.

Expected Behavior

The following scripts must be rewritten in Python:

  • 01-better_setup_database.sh

Add a command to clean generated files

Feature request

Current Behavior

Currently, when an analysis is run, the generated files end up in the data folder. These generated files MUST be removed before running a new analysis, otherwise the next run will either fail or be erroneous.

Expected Behavior

As a user I would like to have the ability to clean the data folder either partially (i.e. only generated files), or fully (i.e. remove the folder).

Possible Solution

# Clean up the generated files
bna clean

# Remove all the data
bna clean --full

Rewrite the compute scripts in Python

Feature request

Current Behavior

The scripts are written in Bash.

Expected Behavior

The following scripts must be rewritten in Python:

  • 30-compute-features.sh
  • 31-compute-stress.sh
  • 32-compute-run-connectivity.sh

The SQL queries must be executed using asyncio.
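
A sketch of what the concurrent execution could look like, assuming an async driver like asyncpg (the driver choice is not settled):

import asyncio

import asyncpg

async def run_batch(dsn: str, queries: list[str]) -> None:
    # Only queries with no dependency on each other may share a batch.
    async with asyncpg.create_pool(dsn) as pool:

        async def execute(sql: str) -> None:
            async with pool.acquire() as conn:
                await conn.execute(sql)

        await asyncio.gather(*(execute(q) for q in queries))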

Adjust log output according to the log level

Feature request

Current Behavior

The log output displays a lot of information that is not always necessary, based on the verbosity level the user wants.

Expected Behavior

The log output should adjust dynamically based on the verbosity level.

Possible Solution

  • -v: INFO level / basic logging information / message
  • -vv: DEBUG level / more detailed information / {time:YYYY-MM-DDTHH:mm:ssZZ} {level:.3} {module}:{line} {message}
  • -vvv: TRACE level / trace information / {time:YYYY-MM-DDTHH:mm:ssZZ} {level:.3} {name}:{line} {message}
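
The formats above match Loguru's placeholder syntax (the project already logs through Loguru, as visible in the bug report logs), so the wiring could look like this sketch:

import sys

from loguru import logger

FORMATS = {
    1: ("INFO", "{message}"),
    2: ("DEBUG", "{time:YYYY-MM-DDTHH:mm:ssZZ} {level:.3} {module}:{line} {message}"),
    3: ("TRACE", "{time:YYYY-MM-DDTHH:mm:ssZZ} {level:.3} {name}:{line} {message}"),
}

def setup_logging(verbosity: int) -> None:
    # Replace the default sink with one matching the requested verbosity.
    logger.remove()
    level, fmt = FORMATS.get(min(verbosity, 3), ("WARNING", "{message}"))
    logger.add(sys.stderr, level=level, format=fmt)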

Failing "prepare" step does not halt the process

Bug report

Current Behavior

If a step in the prepare command fails, it continues to the next one instead of halting.

Expected Behavior

If a step fails, the program should halt and a clear error message should be displayed to the user.

We must also ensure that the process is left in a clean state, for instance by cleaning up incomplete files.

Explore removing the need for Geofabrik/BBBike

Feature request

Current Behavior

Currently, we retrieve the osm.pbf files from either Geofabrik or BBBike. This works fine, but is not always optimal. For instance, when analyzing a Spanish city, we retrieve the data for the full country instead of the specific area, since PyrOSM does not yet handle the communities (work in progress: pyrosm/pyrosm#203).

Expected Behavior

We should always download the most specific area available. This would speed up the "prepare" part of the process.

Possible Solution

A solution would be to leverage OSMNX to download the specific area in .osm, and then convert it to .osm.pbf with osmium.
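
A sketch of that pipeline (the exact OSMnx calls and the osmium invocation would need validating):

import subprocess

import osmnx as ox

def fetch_area_pbf(place: str, output: str) -> None:
    # Download only the network for the requested area as OSM XML...
    graph = ox.graph_from_place(place, network_type="all", retain_all=True)
    ox.save_graph_xml(graph, filepath=f"{output}.osm")
    # ...then convert it to the .osm.pbf format expected downstream.
    subprocess.run(
        ["osmium", "cat", f"{output}.osm", "-o", f"{output}.osm.pbf"],
        check=True,
    )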

Use SQLAlchemy ORM when applicable

Feature request

Current Behavior

We use external SQL scripts to create some tables or perform some basic operations.

Expected Behavior

Instead, we could leverage SQLAlchemy and its ORM.

Here are the candidates for the conversion:

  • clip_osm.sql
  • create_jobs_table.sql
  • create_us_water_blocks_table.sql
  • speed_tables.sql

There are also some tables created via psql in the code:

  • the jobs table in the ingestor.load_jobs() function
  • saving the speed limits into the residential_speed_limit table in the ingestor.manage_speed_limits() function
  • validating the speed limit data in the ingestor.manage_speed_limits() function

And the last piece would be to fix the dbcore.import_csv_file_with_header() function. An idea for this one would be to keep a version using psql for the import, but creating another version which would use the ORM to create the table if needed and insert the data into it.
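
As an illustration, the jobs table could be declared once with the ORM and reused for both creation and inserts (a sketch based on the columns visible in the import logs; only a subset of the columns is shown):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class StateJob(Base):
    __tablename__ = "state_od_main_JT00"

    id = Column(Integer, primary_key=True)  # ORM mappings need a primary key
    w_geocode = Column(String(15))
    h_geocode = Column(String(15))
    S000 = Column(Integer)
    createdate = Column(String(32))
    # ...remaining SA/SE/SI columns elided...

# Base.metadata.create_all(engine) then replaces the CREATE TABLE script.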

Invalid arguments for bna prepare

Bug report

The documentation states that we can run the following command:

bna prepare usa flagstaff arizona

However, this command fails when executed.

Current Behavior

The command fails with the following error message:

ValueError: The dataset 'usa, arizona' is not available.

Expected Behavior

This command should run identically to:

bna prepare arizona flagstaff 

Possible Solution

The command fails as it cannot find a dataset for the pair "country, state". This is due to the fact that Geofabrik does not have a dataset for this combination: it must be either USA or Arizona.

Therefore a solution could be to ignore the country if the state is specified. In this case, it would look up only the arizona dataset.
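
A sketch of the fallback (names are hypothetical):

def geofabrik_dataset(country: str, state: str | None) -> str:
    # Geofabrik has datasets for a country or for a state/region, never
    # for the "country, state" pair, so prefer the state when provided.
    return state if state else country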

Add the run subcommand

Feature request

Expected Behavior

Add a run sub-command which would execute all the analysis steps in a row (i.e. prepare + import + compute + export) from a single command.

Run the integration tests in the CI.

Feature request

Expected Behavior

This PR depends on #297.

We want to be able to run some integration tests in the CI.

Possible Solution

Given the length of the integration tests, we could configure pytest to run x tests chosen randomly. The tests would come from the modular BNA, and only the tests marked as "XS" should be selected (https://github.com/rgreinho/modular-bna/wiki/Test-cities).

For example, the CI could choose to run the analysis for 3 random XS cities with each PR.

If need be, the full test suite could be run on a schedule, for instance weekly.

Add CLI flags to configure some osmnx settings

Feature request

Current Behavior

There is currently no way to configure osmnx.

Expected Behavior

As a user, I would like to be able to define some osmnx settings from the CLI and/or environment variables.

Another use case is that the Docker container does not need to use the osmnx cache, since the container will be discarded at the end of its run.

Possible Solution

Here is the osmnx settings reference: https://osmnx.readthedocs.io/en/stable/internals-reference.html#osmnx-settings-module
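
For instance, the cache could be toggled through an environment variable (the variable name is an assumption):

import os

import osmnx as ox

# e.g. BNA_OSMNX_CACHE=0 disables the on-disk cache, which is useless in
# a container that is discarded at the end of the run.
ox.settings.use_cache = os.environ.get("BNA_OSMNX_CACHE", "1") != "0"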

Add README badges

Feature request

Current Behavior

There are no badges on the README.md.

Expected Behavior

As a user I would like to see relevant badges on the README.md.

Possible Solution

Rewrite the import scripts in Python

Feature request

Current Behavior

The scripts are written in Bash.

Expected Behavior

The following import scripts must be rewritten in Python:

  • 21-import_neighborhood.sh
  • 22-import_jobs.sh
  • 23-import_osm.sh

Describe what happens during the prepare steps

Feature request

Current Behavior

The prepare command automates a lot of tasks behind the scenes, which is great in terms of simplifying the process, but may be too magical if a user needs to understand or modify them.

Expected Behavior

As a user I would like to be able to find a documentation page detailing the steps that happen when the prepare command runs.

Update instructions to use the previous version of the docker image

Bug report

Current Behavior

Azavea released a new breaking update on August 5th, 2022 (azavea/pfb-network-connectivity@7b40bf4), preventing us from using the latest image with the brokenspoke-analyzer.

Expected Behavior

The instructions should be updated to ensure the image is built with the version from July 1st 2022 (3acceb977ff76b5d01985b78447623bfd4a97bb9).

Possible Solution

A better solution would be to have Azavea release tagged images.
