We should find a way of exposing sindex attribute eac

ENH: expose spatial index and `query_bulk` (dask-geopandas, open issue)

geopandas commented on September 26, 2024

ENH: expose spatial index and `query_bulk`

Comments (2)

martinfleis commented on September 26, 2024

I am looking at some other examples I use, and I guess that most of them can be done via overlay in the end, but likely with avoidable overhead.

Network orientation deviation example

Given a street network (LineString geometry), I am interested in the mean deviation of orientation between each LineString and its neighbours (the ones it intersects).

# query geometries
inp, res = gdf.sindex.query_bulk(gdf.geometry, predicate="intersects")

# remove self
itself = inp == res
inp = inp[~itself]
res = res[~itself]

# get orientation values based on the query (no geometry involved anymore)
left = gdf["orientation"].take(inp).reset_index(drop=True)
right = gdf["orientation"].take(res).reset_index(drop=True)

# get a difference
deviations = (left - right).abs()

# get mean deviation
results = deviations.groupby(inp).mean()

This can be done via overlay as well, but there's a lot of overhead in computing intersections that are not needed. If you want a network to play with, there's one in momepy: geopandas.read_file(momepy.datasets.get_path('bubenec'), layer="streets").
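To make the non-geometric part of the example above concrete, here is a minimal sketch that runs it on hand-written data. The `inp` and `res` arrays and the orientation values are made up for illustration and only stand in for what the spatial query would return. The grouped mean uses `np.bincount` instead of pandas, which is a substitution made here to keep the sketch dependency-light, not something taken from the comment.

```python
import numpy as np

# Hypothetical stand-in for the query output: street 0 touches streets 1 and 2,
# streets 1 and 2 do not touch each other. Every street also matches itself.
inp = np.array([0, 0, 0, 1, 1, 2, 2])
res = np.array([0, 1, 2, 0, 1, 0, 2])

# Made-up orientation values, one per street.
orientation = np.array([10.0, 30.0, 20.0])

# Remove self-matches.
keep = inp != res
inp, res = inp[keep], res[keep]

# Absolute orientation difference for every remaining pair.
deviations = np.abs(orientation[inp] - orientation[res])

# Mean deviation per street: sum of deviations divided by number of neighbours.
sums = np.bincount(inp, weights=deviations, minlength=len(orientation))
counts = np.bincount(inp, minlength=len(orientation))
mean_deviation = np.full(len(orientation), np.nan)  # NaN where no neighbours
np.divide(sums, counts, out=mean_deviation, where=counts > 0)

print(mean_deviation)
```

Once the index pairs are known, everything after the query is plain array arithmetic, which is why an overlay with its geometric intersections is more work than this use case needs.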

Mark polygons to work with

This example has an enclosures GeoDataFrame with block-like polygons and a GeoDataFrame of building footprints. For further steps, I am interested in which enclosures contain exactly one building, which contain more than one, and which contain none. Again, there's no geometric operation after the query, which makes overlay sub-optimal (though it would be possible to do it that way).

import numpy as np

# determine which polygons should be split
# (inp indexes enclosures, res indexes buildings)
inp, res = buildings.sindex.query_bulk(
    enclosures.geometry, predicate="intersects"
)
unique, counts = np.unique(inp, return_counts=True)
splits = unique[counts > 1]
single = unique[counts == 1]
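The snippet above covers enclosures with one building or with several, but enclosures with no building never appear in `inp`, so `np.unique` cannot report them. One possible way to get all three groups, sketched on made-up data (the `inp` array and `n_enclosures` are hypothetical stand-ins for the query output and `len(enclosures)`), is to count hits per position with `np.bincount`:

```python
import numpy as np

# Hypothetical stand-in for the query output: one entry per
# (enclosure, building) hit, holding the enclosure's integer position.
n_enclosures = 5  # assumed to equal len(enclosures)
inp = np.array([0, 0, 2, 3, 3, 3])

# Number of intersecting buildings per enclosure, zeros included.
counts = np.bincount(inp, minlength=n_enclosures)

positions = np.arange(n_enclosures)
splits = positions[counts > 1]   # more than one building
single = positions[counts == 1]  # exactly one building
empty = positions[counts == 0]   # no building at all

print(splits, single, empty)
```

This relies on `inp` holding integer positions rather than index labels, which is how the spatial index query reports its results.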

brendan-ward commented on September 26, 2024

For using a query_bulk operation against the same input GeoDataFrame as used to create the spatial index, is there also an issue with symmetric pairs? I.e., the code above filters out self-intersections (AA and BB), but seems like it would still produce both sides of a symmetric pair (AB, BA), right?
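To illustrate the symmetric-pair question on toy data: if a self-query returns both (A, B) and (B, A), keeping only pairs where the input position is smaller than the tree position drops self-matches and mirrored duplicates in one step. The arrays below are hand-written stand-ins for a query result, not output from a real dataset.

```python
import numpy as np

# Hypothetical self-query result for three geometries where 0 touches 1
# and 1 touches 2: self-matches and both directions of each pair are present.
inp = np.array([0, 0, 1, 1, 1, 2, 2])
res = np.array([0, 1, 0, 1, 2, 1, 2])

# Keep each unordered pair once; this also removes self-matches (inp == res).
keep = inp < res
pairs = np.column_stack([inp[keep], res[keep]])

print(pairs)
```

Whether this filter is wanted depends on the use case. For a per-geometry aggregate such as the mean deviation above, each geometry needs its full neighbour list, so both directions have to be kept or rebuilt from the unique pairs.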

For queries against the same input as spatial index, we may want to consider exposing that as a separate API specifically to handle some of those issues at a lower level (i.e., within a given partition) rather than aggregating up all results across partitions, or whatever the larger API is across partitions. But this depends on the degree to which we want to generalize this at a higher level.

This is somewhat related to the nearest neighbor join issue in pygeos, where the idea is a different lower-level API around spatial index operations that specifically handles excluding self-joins from the results.
