richardbeare / geospatialstroke

HTML 85.28% R 0.04% Makefile 0.01% Jupyter Notebook 13.69% TeX 0.98%

geospatialstroke's Issues

googleway & madpeck

@richardbeare I've updated the googleway branch with some code and examples. I'm geocoding 10,000 x 3 sets of directions (from random addresses to the rehab centres), so will leave this running this evening and finish up the plot and examples tomorrow.

Catchment basins

@mpadge : I've placed a rough skeleton of the catchment basin example. The plan is

3 Rehabilitation centres
Postcodes within 10km (as with choropleth example)
Generate sample addresses within each postcode (5 at the moment, but increase when it is working)
Compute road distance and travel time from each address to each rehab hospital
Generate the catchment areas for each rehab hospital
Generate estimates of case load.
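
As a rough illustration of the "generate the catchment areas" step, here is a minimal Python sketch in which each sampled address is assigned to the rehab centre with the shortest travel time. The address labels and travel times are made up for illustration; in the real example they would come from the routing step, and the actual R code in rehab_catchment_areas.R may do this differently.

```python
# Hypothetical travel times (minutes) from sampled addresses to each centre.
# The values are invented; only the centre names appear in the project.
travel_minutes = {
    "addr_1": {"CaseyHospital": 12.0, "DandenongHospital": 18.5, "KingstonHospital": 25.0},
    "addr_2": {"CaseyHospital": 30.0, "DandenongHospital": 14.0, "KingstonHospital": 16.5},
    "addr_3": {"CaseyHospital": 41.0, "DandenongHospital": 22.0, "KingstonHospital": 9.5},
}


def nearest_centre(times):
    """Return the centre name with the smallest travel time."""
    return min(times, key=times.get)


# The catchment of a centre is the set of addresses assigned to it.
catchment = {addr: nearest_centre(times) for addr, times in travel_minutes.items()}

for addr, centre in sorted(catchment.items()):
    print(addr, "->", centre)
```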

It will be good if you can take over this one now. The road distance travel time estimates sound ideal for some of the work you've been doing recently. We should also clean up the geocoding for the rehab centers. I'm not sure how to do better with osm.

The file is rehab_catchment_areas.R in the CatchmentAreas branch

Data sets for examples

@richardbeare - do you have any specific data sets you want us to use when creating the visuals?

Merge branches sooner rather than later

Yo all (@richardbeare @gboeing @njtierney @SymbolixAU), with my latest branch we are now up to 5 branches and counting. Merging these is only going to get more difficult, so I suggest we merge sooner rather than later. My suggestion would be to simply replace each branch with a folder (as with the current py branch), and put all the work in the main README.md of that folder, so everything can still be kept separate yet directly seen on github. This will allow for much easier collation into one or two single docs, because everything can be opened at the same time (both locally and on gh) without needing to switch branches.

Please thumbs up if you're in agreement, and let's say as soon as we have 4 or so, then I'd ask @SymbolixAU to please do the merging.

The web site

@SymbolixAU the other major job - assembling this stuff into a sensible website. I only know the basics, so I'm hoping you have some ideas on how to mash it into a github hosted version, while also allowing someone to clone and play around. Feel free to change names etc. I presume we'll have to run some R code by hand somewhere to produce the html.

RehabCatchment - repeated nodes

Hi @mpadge ,

generally looking good. I've made some changes in commit

3e00133

I should have done a PR and requested a review, but please check it over.

Another question - how are the map images created? is it mapview::mapshot ?

Python instructions for linux (mint/ubuntu)

Notes for testing the python notebooks under linux

My old Mint Qiana Python 3.5 installation appears broken - will do further testing.

Setup on a newer mint 18.

sudo apt-get install python3-dev libspatialindex-dev
virtualenv --python=python3.5 venv
. venv/bin/activate
pip install -r requirements.txt
jupyter-notebook

@gboeing

Currently getting the following errors with example 1 - am I installing the wrong version of something?

gdf = gdf.to_crs(crs=ox.settings.default_crs)
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-5d14e48a51a9> in <module>
----> 1 gdf = gdf.to_crs(crs=ox.settings.default_crs)

~/Projects/GeospatialStroke/python/venv/lib/python3.5/site-packages/geopandas/geodataframe.py in to_crs(self, crs, epsg, inplace)
    441         else:
    442             df = self.copy()
--> 443         geom = df.geometry.to_crs(crs=crs, epsg=epsg)
    444         df.geometry = geom
    445         df.crs = geom.crs

~/Projects/GeospatialStroke/python/venv/lib/python3.5/site-packages/geopandas/geoseries.py in to_crs(self, crs, epsg)
    302             except TypeError:
    303                 raise TypeError('Must set either crs or epsg for output.')
--> 304         proj_in = pyproj.Proj(self.crs, preserve_units=True)
    305         proj_out = pyproj.Proj(crs, preserve_units=True)
    306         project = partial(pyproj.transform, proj_in, proj_out)

~/Projects/GeospatialStroke/python/venv/lib/python3.5/site-packages/pyproj/__init__.py in __new__(self, projparams, preserve_units, **kwargs)
    360         # on case-insensitive filesystems).
    361         projstring = projstring.replace('EPSG','epsg')
--> 362         return _proj.Proj.__new__(self, projstring)
    363 
    364     def __call__(self, *args, **kw):

_proj.pyx in _proj.Proj.__cinit__()

RuntimeError: b'no arguments in initialization list'

RehabCatchments

@mpadge

Hi,
Reviewing the latest changes of RehabCatchments/README.Rmd

Let's get rid of all references to tmap. Stick with mapview only to keep it simple. If you can use mapview to do the geocoding, then replace the tmaptools::geocode_osm as well. Definitely get rid of the tmap display calls.
Sections 3 and 4 need to be reorganised a bit to match the methods (4 is empty at present)
There's something wrong with the figures in the last calculation - the number of strokes is very small, and I notice there is repeated use of the 1/100000 factor.
I don't think the calculation is quite right. At some point we need a per-postcode calculation of the proportion of sampled locations going to each destination. Those proportions get multiplied by the postcode's stroke count to give a count per destination per postcode, and from there we sum per destination. This will need a group by postcode somewhere.
Confirm that the postcodes being simulated are the same as the python.
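
To make the intended calculation concrete, here is a minimal Python sketch of the per-postcode proportion step described above. The postcode labels, destination labels and stroke counts are invented for illustration, and `case_load` is a hypothetical helper, not a function from the README.

```python
from collections import Counter, defaultdict


def case_load(assignments, strokes_by_postcode):
    """Estimate the case load per destination.

    assignments: iterable of (postcode, destination) pairs, one per sampled address.
    strokes_by_postcode: dict mapping postcode -> estimated stroke count.
    """
    # Group by postcode: count how many sampled addresses go to each destination.
    per_postcode = defaultdict(Counter)
    for postcode, destination in assignments:
        per_postcode[postcode][destination] += 1

    # Proportion per destination within a postcode, times that postcode's
    # stroke count, summed over postcodes.
    totals = defaultdict(float)
    for postcode, counts in per_postcode.items():
        n_samples = sum(counts.values())
        for destination, k in counts.items():
            totals[destination] += strokes_by_postcode[postcode] * k / n_samples
    return dict(totals)


# Made-up example: two postcodes, four sampled addresses each.
assignments = [
    ("P1", "A"), ("P1", "A"), ("P1", "B"), ("P1", "B"),
    ("P2", "B"), ("P2", "B"), ("P2", "B"), ("P2", "C"),
]
strokes = {"P1": 40.0, "P2": 20.0}
loads = case_load(assignments, strokes)
print(loads)
```

A useful sanity check on any implementation is that the destination totals add up to the total stroke count over all postcodes.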

Final steps

@njtierney @mpadge @SymbolixAU @mdsumner @gboeing

Hi Everyone,
We're running into the end of our extra time, I'm afraid. We need to knock this off. I've done a reorganisation of the paper text and I have a set of tasks that will give us the final pieces. I'm taking an executive decision to assign tasks as follows:

Those tasks that involve writing, don't worry too much about the flow, get the ideas down in a coherent form in a section at the end of the document. Use Nick's morgue section for any new sections I assign.

@mpadge Finish off the catchment basin example by using the incidence ideas from the choropleth to compute per-postcode stroke rates, then compute strokes per service centre based on the proportion of randomly sampled cases that are assigned to each service centre, i.e. if 30% of the random cases from a postcode are assigned to service centre A, then 30% of the predicted stroke cases are assigned to it.

Ensure that neither example uses API keys.

Have a test section at the beginning that checks for required packages.
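
For the Python versions, the package check at the top could look something like the sketch below. The package list is an assumption based on the libraries named elsewhere in these issues (geopandas, networkx, osmnx, pyproj); the real list should mirror requirements.txt.

```python
import importlib.util


def missing_packages(required):
    """Return the names in `required` that cannot be found for import."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Assumed list of top-level import names; adjust to match requirements.txt.
REQUIRED = ["geopandas", "networkx", "osmnx", "pyproj"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install before running:", ", ".join(missing))
else:
    print("All required packages found.")
```

`importlib.util.find_spec` only looks the module up; it does not import it, so the check stays fast even for heavy packages.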

@SymbolixAU A discussion paragraph on API keys, what they are needed for and which common platforms need them.

Next, modify both examples to use only keys - specifically google for the various geocoding and distance calculations (you might need to drop the number of samples per suburb to avoid getting charged), and replace all of the visualization steps with mapdeck

@njtierney Write a section on curated data. Ideally a US and/or Canadian and European example to match what we've already mentioned, i.e. census data, including boundaries, address/position databases + anything else that you can think of.
@gboeing Python versions of both examples without API keys. Also the versions with keys once available, if it makes sense. Send any ideas you have about the curated data sets, especially python friendly forms to @njtierney.

Still to do - once the examples are running we'll edit the text around them to suit

@mdsumner off the hook for now, but we'll be leaning heavily on you for proof reading, I think.

dodgr automatically removing impassable routes for given wt_profile?

Thanks for this excellent resource. I am currently working through a similar workflow for analysing the effects of relocating a fire station in Queensland.

In Catchment Zones, Step 8 the following statement is made:

The resultant network has a d_weighted column which preferentially weights the distances for the nominated mode of transport. Those parts of the network which are unsuitable for vehicular transport have values of .Machine$double.xmax. Because we want to align our random points to the routable component of the network, these need to be removed.

net <- net [which (net$d_weighted < .Machine$double.xmax), ]
nrow (net)
#> [1] 293853

I've found that on a different street network in Queensland, this step had no effect, as weight_streetnet() has dropped edges which were impassable for "motorvehicle". I've verified this by comparing the values of way_id present in the weighted network with the osm_id in the streetnet downloaded using dodgr_streetnet().

Is this step no longer required? I couldn't find any reference to it in the dodgr NEWS file.

R versus python

To avoid junking up #12, I've created a new document at RehabCatchments/rvspy (in the README so can be directly viewed on github) that attempts to examine reasons for the different results generated by @gboeing in python and myself in R. To repeat the table in #12, these differences were in terms of final estimated case loads on each rehab centre (here just the relative percentages):

Destination	R	python
CaseyHospital	19.4	12.8
DandenongHospital	29.4	37.9
KingstonHospital	51.2	49.4

I had hypothesised there that differences could be due to

Thanks to input of @richardbeare, the R code uses the PSMA::fetch_postcode() function which generates random samples of actual street addresses within a postcode, where the python code uses a simple sample of network nodes

I've been uncertain for a while whether osmx.network_from_xxx -> networkx.shortest_path_length actually does the same thing as dodgr_dists(), and suspect in fact not. @gboeing Your wisdom greatly appreciated here, but in my potentially uninformed view, osmx.network_from_xxx extracts the specified part of the network (here, network_type = "drive"), but does not actually weight that to generate weighted route preferences. The networkx.shortest_path_length calculation is then in linear km (or whatever), but absent a preferential weighting scheme. In contrast, dodgr calculates a dual graph with a weighted version used for preferential routing, and an unweighted version used for distance calculations. That could be the source of some discrepancy here?
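
To illustrate the second hypothesis only, here is a toy Python sketch of the difference between routing on plain distance and routing on a preference-weighted distance while still reporting the physical length of the chosen route. It is not how dodgr or osmnx are implemented; the graph, the edge values and the helper names `route` and `path_length` are all made up, and the key name `d_weighted` is borrowed from the column name used in the catchment example.

```python
import heapq


def route(graph, source, target, key):
    """Dijkstra minimising edge[key]; assumes target is reachable from source."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, edge in graph.get(node, {}).items():
            new_cost = cost + edge[key]
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    # Walk back from target to source to recover the node sequence.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def path_length(graph, path, key="d"):
    """Sum edge[key] along consecutive nodes of a path."""
    return sum(graph[a][b][key] for a, b in zip(path, path[1:]))


# Toy network: A-B-D is physically shorter but penalised (say, a minor road);
# A-C-D is longer but preferred for driving.
graph = {
    "A": {"B": {"d": 1.0, "d_weighted": 3.0}, "C": {"d": 1.5, "d_weighted": 1.5}},
    "B": {"D": {"d": 1.0, "d_weighted": 3.0}},
    "C": {"D": {"d": 1.5, "d_weighted": 1.5}},
    "D": {},
}

shortest = route(graph, "A", "D", key="d")
preferred = route(graph, "A", "D", key="d_weighted")
print("unweighted route:", shortest, path_length(graph, shortest))
print("weighted route:  ", preferred, path_length(graph, preferred))
```

In this toy case the two approaches report different distances for the same origin and destination, which is the kind of systematic difference that could shift addresses between catchments.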

The new document examines both of those possible causes. In both cases it generates quite robust estimates that match the previous R values, but it still fails to recreate the python values. Any insights, help, solutions appreciated here!

Markers in tmap

@mpadge @mdsumner @SymbolixAU

I have a problem with the marker on the choropleth at the end of the choropleth example. The marker shows up fine in the Rstudio viewer, and on the website when viewed with safari, but not with chrome or firefox. Same problem when I open locally with a browser from rstudio.

Visualization consistency

@mpadge
Quick note - I had a failure in the catchment basin example because I have mapdeck installed but hadn't set up keys.

Let's keep the use of keys in entirely separate documents so there's no danger of confusing people via failures.

I especially like tmap's ggplot style colouring for regions. If it doesn't scale to the number of points we have using markers either, try another type of point display or test map view for that example or downsample our addresses for display purposes.

Python setup for windows

@gboeing

I'm writing the supplementary section to describe how to get set up with python - essentially a duplication of what will be on the web site. A query - where do windows users type the commands? And does the miniconda setup set paths etc.?

Can you elaborate the readme for windows users and I'll put it into the .tex.

Thanks

Choropleth - stroke by postcode

(I'll make some numbered suggestions for discussion, if you agree just thumbs-up and I will modify the example accordingly, otherwise please discuss!)

How about this for the stroke_count_estimate?

It simplifies the code somewhat by putting the array of incidence constants in directly against the right age-specific column, and doesn't require an indirect use of apply, or the extra function ComputeStrokeCount.

The current code https://github.com/SymbolixAU/GeospatialStroke/blob/ChoroplethMMC/mmc_surrounds.R#L54-L77 then becomes:

#################
## --- 1) 
basicDemographicsVIC <- mutate(basicDemographicsVIC, 
                       Age_0_24_yr_P = Age_0_4_yr_P + Age_5_14_yr_P + 
                         Age_15_19_yr_P + Age_20_24_yr_P)
basicDemographicsVIC <- mutate(basicDemographicsVIC, stroke_count_estimate = ( 
                   Age_0_24_yr_P  * 5 +  
                   Age_25_34_yr_P * 30 +   
                   Age_35_44_yr_P * 44 +  
                   Age_45_54_yr_P * 111 +  
                   Age_55_64_yr_P * 299 +  
                   Age_65_74_yr_P * 747 + 
                   Age_75_84_yr_P * 1928 +  
                   Age_85ov_P     * 3976) / 100000)

The other "tidyverse" way to map those values to the right column would require a named vector of the constants, and conversion to long form - which I think is less clear. The text might say that though, "in a database context these columns would be stored in long-form, with incidence as a join-able second table" - we can move from interactive exploratory mode to production mode.
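
For comparison, the "long form" idea can be sketched in Python as a lookup table of incidence rates keyed by age-band column, joined against the population counts. The rates (per 100,000) are the constants from the code above; the population figures and the helper name are invented for illustration.

```python
# Incidence per 100,000 population, keyed by the age-band column name.
INCIDENCE_PER_100K = {
    "Age_0_24_yr_P": 5,
    "Age_25_34_yr_P": 30,
    "Age_35_44_yr_P": 44,
    "Age_45_54_yr_P": 111,
    "Age_55_64_yr_P": 299,
    "Age_65_74_yr_P": 747,
    "Age_75_84_yr_P": 1928,
    "Age_85ov_P": 3976,
}


def stroke_count_estimate(population):
    """Expected strokes for one postcode given population counts per age band."""
    total = sum(population[band] * rate for band, rate in INCIDENCE_PER_100K.items())
    return total / 100000


# Made-up postcode with 1000 people in every age band.
example_postcode = {band: 1000 for band in INCIDENCE_PER_100K}
estimate = stroke_count_estimate(example_postcode)
print(round(estimate, 1))
```

Keeping the rates in one table means a revised incidence study only changes the lookup, not the calculation.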

API keys query

Can we use a google private key and client ID with googleway/mapbox?

critical eye

@mdsumner Final countdown is on. Not sure of your status now. The draft is getting close to submission. We're short on discussion at the moment, but will get some ideas from Thanh. Would value any suggestions on the article as it stands. LaTeX code is in article/geospatial-stroke.tex. Also, please add your affiliations.