
Comments (7)

ljwolf commented on September 26, 2024

Hmm... not really sure what you mean. I've run exactly your code with df=listings in the data borrowing notebook and everything works fine. Are you using a different df?

Also, it's a bit odd to use the df.loc[:,columns] construct. I'd just use df[columns] unless you're actually slicing on rows and columns simultaneously.
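To illustrate the distinction, here's a minimal sketch with a made-up two-row frame (the column names mirror the data posted later in this thread):

```python
import pandas as pd

# hypothetical miniature frame, just to show the two selection styles
df = pd.DataFrame({'price': [3805, 3680],
                   'longitude': [24.953, 24.957],
                   'latitude': [60.189, 60.187]})

cols = ['longitude', 'latitude']

# plain column selection -- same result as df.loc[:, cols] when you keep all rows
all_rows = df[cols]

# .loc earns its keep when you slice rows and columns at the same time
cheap_rows = df.loc[df['price'] < 3700, cols]
```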

from scipy2018-geospatial-data.

jy03189211 commented on September 26, 2024

Yes, sorry I didn't mention that I am using my own dataset.
I eventually figured out that with a fixed bandwidth, all the weights are always calculated.
But if the bandwidth is adaptive or endogenous adaptive, there is no guarantee that all the weights get calculated, because of the The weights matrix is not fully connected error; as k increases, the number of uncalculated weights (nan) decreases. Is there a way to get a suitable k = n such that all the weights get calculated?


ljwolf commented on September 26, 2024

Thanks for the prompt response. Can you please post the data so I can try to replicate what you're seeing?

but if the bandwidth is adaptive or endogenous adaptive, there is no guarantee to get all the weights due to the weights matrix is not fully connected error.

That is a warning, and it shouldn't affect the ability of lag_spatial(W, X) to return valid results. The warning is telling you that the graph of observations is disconnected: it splits into several internally connected subgraphs. But within each of those subgraphs the spatial lag is well defined, so the warning shouldn't affect what you're seeing.
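For intuition, the spatial lag is just the row-wise weighted average of neighbours' values, i.e. a matrix-vector product; a toy sketch with a hand-written, already row-standardized weights matrix (this mimics what lag_spatial computes, it is not libpysal's internals):

```python
import numpy as np

# hand-made, row-standardized weights for 3 observations
W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([10.0, 20.0, 30.0])

# spatial lag: each entry is the weighted average of that observation's neighbours
lag = W @ x  # -> array([25., 10., 20.])
```

Note the lag exists for every row as long as that row has at least one nonzero weight, even if the overall graph is split into pieces.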

It's hard for me to explore your issue without data. In general, no, we don't offer a way to compute the minimum connected KNN graph, but we would welcome contributions that implement it. That said, a solution to that might not solve your problem.
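For reference, a brute-force version of that minimum connected KNN search is short to sketch with scipy (this is my own helper, not a libpysal function): increase k until the symmetrized KNN graph forms a single connected component.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def smallest_connected_k(coords, k_max=None):
    """Search upward for the smallest k whose symmetrized KNN graph
    forms a single connected component. Returns None if none is found."""
    n = len(coords)
    tree = cKDTree(coords)
    for k in range(1, (k_max or n - 1) + 1):
        # query k+1 neighbours because each point is its own nearest one
        _, idx = tree.query(coords, k=k + 1)
        rows = np.repeat(np.arange(n), k)
        cols = idx[:, 1:].ravel()
        adj = csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
        n_components, _ = connected_components(adj, directed=False)
        if n_components == 1:
            return k
    return None
```

This is O(n log n) per candidate k, so it's fine as a one-off diagnostic but not something you'd want inside a tight loop.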


jy03189211 commented on September 26, 2024

For example, when we choose k = 2, 3, 4, or 5, we get some nan values; with fixed=True, the nans are gone. But for a bigger dataset, a fixed bandwidth costs significantly more time to compute, so I would prefer an adaptive bandwidth.

import numpy as np
import libpysal as lp
import shapely.geometry as shp

df['geometry'] = df[['longitude','latitude']].apply(shp.Point, axis=1)
# pass the whole frame (not just the geometry column) to from_dataframe
kW = lp.weights.Kernel.from_dataframe(df, geom_col='geometry', fixed=False, function='gaussian', k=2)
kW = lp.weights.fill_diagonal(kW, 0)
kW.transform = 'r'
WX = lp.weights.lag_spatial(kW, df['price'])
print(np.isnan(WX).sum())

price,longitude,latitude
3805,24.95322990,60.18914032
3680,24.95712090,60.18714905
4475,24.96603012,60.18764114
3851,24.95573044,60.18740845
4943,24.95982933,60.18661880
3163,24.94256973,60.16741180
3361,24.96338081,60.18872833
4810,24.94164085,60.18502045
4248,24.95738029,60.18653107
3137,24.96570015,60.18777847
3512,24.95331955,60.18817902
4188,24.95322990,60.18914032
3685,24.96313095,60.18497849
3610,24.95982933,60.18661880
3835,24.95647049,60.18709946
3457,24.96338081,60.18872833
4180,24.94977951,60.19038010
4483,24.95911026,60.18727112
4034,24.95626068,60.18915939
4266,24.96352959,60.18471146
4341,24.95483971,60.18518066
3276,24.96705055,60.18852997
4065,24.95594025,60.18703842
3371,24.96491051,60.18891907
4205,24.95573044,60.18740845
4826,24.95594025,60.18703842
3620,24.96442032,60.18899155
4480,24.96491051,60.18891907
3233,24.96442032,60.18899155
3907,24.95899963,60.18843842
3680,24.95594025,60.18703842
3730,24.96147919,60.18846893
3367,24.96132088,60.18746185
3680,24.95804024,60.18869019
3480,24.96027946,60.18914032
3686,24.95322990,60.18869019
4123,24.95573044,60.18740845
4438,24.95965004,60.18706131
4187,24.95842934,60.18822098
4736,24.96231079,60.18661118
3599,24.95347977,60.18626022
4071,24.96237946,60.18714905
3257,24.96162033,60.18656158
3744,24.95879936,60.18883896
4430,24.95369911,60.18685150
3915,24.95929909,60.18495941
3595,24.96022034,60.18490982
3960,24.95747948,60.18753052
4003,24.95347977,60.18626022
4469,24.94977951,60.19038010
3172,24.96179962,60.18690109
3777,24.95579910,60.18788147
4158,24.95879936,60.18883896
4609,24.94256973,60.16741180
3474,24.96192932,60.18725967
4513,24.94256973,60.16741180
3856,24.95825005,60.18807983
3338,24.95984077,60.18838882
3614,24.95495987,60.18695831
3305,24.95746040,60.18917084
4260,24.96006012,60.18616104
3567,24.96538925,60.18529892
3389,24.96022034,60.18490982
4024,24.95577049,60.18820953
4195,24.95322990,60.18869019
4464,25.04997063,60.24179840
3115,24.95556068,60.18920135
4077,24.95879936,60.18883896
4278,24.95668030,60.18601990
3881,24.95879936,60.18883896
4257,24.96314049,60.18832016
4310,24.95899963,60.18843842
3207,24.96162033,60.18656158
3413,24.95726967,60.18712997
3580,24.95348930,60.18642044
4028,24.96369934,60.18556976
4200,24.95594025,60.18703842
4223,24.95577049,60.18820953
3904,24.96132088,60.18746185
3806,24.95395088,60.18920135
4004,24.95499039,60.18846893
3751,24.95816994,60.18918991
3800,24.95825005,60.18807983
3225,24.96338081,60.18872833
3659,24.95747948,60.18753052
4138,24.95612907,60.18902969
3806,24.96313095,60.18497849
3532,24.95879936,60.18883896
3930,24.95494080,60.18688965
3725,24.96603012,60.18867111
3959,24.95951080,60.18688965
4283,24.96022034,60.18490982
3225,24.96338081,60.18872833
3930,24.96192932,60.18725967
3427,24.95429039,60.18688965
4151,24.95495987,60.18695831
4095,24.96680069,60.18708038
4083,24.95347977,60.18626022
3930,24.96429062,60.18901062
3786,24.95331955,60.18817902


ljwolf commented on September 26, 2024

Wonderful, thank you for the data! This is excellent.

As I suspected, this is due to coincident points, not graph disconnection.

n_unique_points = len(set(map(tuple, data[['longitude', 'latitude']].values)))
assert n_unique_points == len(data) # fails!

What's happening in your case is that the adaptive bandwidth is zero for some site. It's zero because there are at least k observations at that site, so we don't have to leave the site to borrow any data. When the adaptive bandwidth is zero for a site, the standardization returns nan from a divide by zero, since bw[i] == 0. This is that first warning in your warning stack.
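A minimal illustration of the mechanism (just the arithmetic, not libpysal's actual internals):

```python
import numpy as np

# adaptive bandwidth at a site with k=2 coincident neighbours:
# both distances are zero, so the bandwidth (the max of them) is zero
dists = np.array([0.0, 0.0])
bw = dists.max()

with np.errstate(invalid='ignore'):
    z = dists / bw  # 0/0 -> nan, which then propagates into the weights
```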

The "k" you're looking for is not the one that connects the graph. Rather, it's the smallest k where the kth neighbor of every site is not co-located at that site.
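Under that reading, the smallest workable k is just the largest number of stacked observations at any one site; a hypothetical helper (smallest_offsite_k is my own name, not a libpysal function):

```python
import pandas as pd


def smallest_offsite_k(df, cols=('longitude', 'latitude')):
    """Smallest k for which the k-th nearest neighbour of every
    observation lies off-site: a site with m stacked observations has
    m - 1 coincident neighbours, so k must reach m to leave the site."""
    return int(df.groupby(list(cols)).size().max())
```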

I'm really sorry the weights construction stuff has this issue, as it's hit me and some colleagues in the past, too. We've simply not had the time to resolve it. As a stopgap, you can:

  1. Replace any nan in kW.weights with kW.sparse.max(),
  2. Use a bigger k so that you're always leaving the site to borrow data, or
  3. Add a very small bit of noise to the points' latitudes and longitudes.
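The jitter option can be done in a couple of lines; the noise scale here (1e-6 degrees, roughly 10 cm at this latitude) is a guess you'd want to sanity-check against your coordinate precision:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)


def jitter(df, cols=('longitude', 'latitude'), scale=1e-6):
    """Break exact coordinate ties by adding tiny Gaussian noise."""
    out = df.copy()
    for c in cols:
        out[c] = out[c] + rng.normal(0.0, scale, size=len(out))
    return out
```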

We'll try to fix this when we can, but upvote that issue and it'll be easier to get dedicated attention.


jy03189211 commented on September 26, 2024

replace any nan in kW.weights with kW.sparse.max()

This throws an error, because there are nans in the matrix. So I did something like np.nanmax(kW.sparse.toarray()) instead, but I'm not sure whether the sum of all the weights should be 1; if the substituted max value pushes a row sum above one, the final lag variable will come out too large.

What about giving coincident points an average weight of 1/k, so that the weights of the k neighbours sum to 1?


ljwolf commented on September 26, 2024

I'm not sure if the sum of all the weights should be 1

Do not do the row-standardizing transform until after you do the substitution. If you apply w.transform = 'r' afterwards, there will be no issue.

what about...

This is exactly what will happen with the advice I gave you, if you do the transformation after the replacement. Think of the row-standardization w.transform = 'r' as the final step in the workflow before computing the lag: anything you do to the weights should happen before you row-standardize.
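Sketching that order of operations on a plain dense array standing in for kW.sparse (the real object is a W, but the arithmetic is the same):

```python
import numpy as np

# toy kernel weights where two entries picked up nans from a zero bandwidth
W = np.array([[0.0, np.nan, 0.5],
              [np.nan, 0.0, 0.4],
              [0.5, 0.4, 0.0]])

# step 1: substitute the nans first (here with the largest finite weight)
W = np.where(np.isnan(W), np.nanmax(W), W)

# step 2: row-standardize last, so every row sums to 1 again
W = W / W.sum(axis=1, keepdims=True)
```

Doing the standardization last is what guarantees the row sums are exactly 1, which was the worry about the nanmax substitution above.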

