Comments (43)
Please reopen. It even brings additional errors now.
from geowombat.
@mmann1123 I didn't intentionally close this. Do we need to reopen?
from geowombat.
@willieseun can you paste a code snippet to show us how you created your predictions?
from geowombat.
from geowombat.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
from glob import glob
import rioxarray
import os
import re
le = LabelEncoder()
psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']
labels = gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
labels['Classvalue'] = le.fit(labels.Classname).transform(labels.Classname)
print(labels)
predictors = ['something.tif']
with gw.open(predictors, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
pl = Pipeline([ ('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100, max_features=9))])
fig, ax = plt.subplots(dpi=200,figsize=(5,5))
X, Xy, clf = fit(src, pl, labels, col="Classvalue")
print('Starting to predict')
y = predict(src, X, clf)
y.plot(robust=True, ax=ax)
y.sel(time="t1").gw.to_raster("wom_RF.tif")
plt.tight_layout(pad=1)
plt.show()
from geowombat.
I hope you get it. It is not rendering it well enough.
from geowombat.
Just to add, because I don't want to open another issue.
It seems that the gauss resampling method is not available as a parameter when resampling.
from geowombat.
@willieseun I think I am getting close to the issue. Before we make any changes, can you check if the following creates the output that you are hoping for.
Note that I used geowombat
data because I don't have access to the data you used.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
le = LabelEncoder()
# psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']
psearlst = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']
labels = gpd.read_file(l8_224078_20200518_polygons) #gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
# Added 1 here
labels['Classvalue'] = le.fit(labels.name).transform(labels.name) + 1
fig, ax = plt.subplots(dpi=200, figsize=(5,5))
# Resampling for faster processing/testing
with gw.config.update(ref_res=500):
with gw.open(l8_224078_20200518, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
pl = Pipeline(
[
('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100))
]
)
X, Xy, clf = fit(src, pl, labels, col="name")
y = predict(src, X, clf)
# Convert the numpy array to a DataArray and add the 'no data' value
y = ndarray_to_xarray(
src,
y.astype('uint8'),
band_names=['estimates'],
row_chunks=64,
col_chunks=64,
attrs={
'crs': src.crs,
'res': src.res,
'transform': src.transform,
'nodatavals': (0,)
}
)
print(y)
y.gw.imshow(robust=True, ax=ax)
y.gw.save("wom_RF.tif", overwrite=True)
plt.tight_layout(pad=1)
from geowombat.
Ok, Let me check...
from geowombat.
It is returning a type error traceback.
TypeError: not all arguments converted during string formatting
from this line
X, Xy, clf = fit(src, pl, labels, col="name")
Note: I updated to the latest release
from geowombat.
The save function seems to be working well though.
from geowombat.
Apologies, it should be col='Classvalue'
not 'name'.
from geowombat.
It is still not working properly.
from geowombat.
The resulting tif file is looking bad.
from geowombat.
I think you should try using it with classes more than 10. The shapefile I am using has 12 classes. Maybe that is why.
from geowombat.
@willieseun Can you elaborate on what you mean by bad? Do you mean that it is still not rendering the nodata value properly, that the classified values are not correct, or that the classification does not look accurate? On the latter, if you are using the test data then I would not expect it to look good because the data being used are just test data, which are meant to show the utility of the function but not to produce a good map.
The only things we are addressing with this open issue are 1) nodata rendering in the classification and 2) the classified values are correct. Can you confirm that either of these are still not as expected?
If you would like us to reproduce any errors with your data then you will need to post a link to your dataset.
from geowombat.
I have attached the predicted map
from geowombat.
Can you point out the issue? It looks like there are ~10 classes, so is this from your dataset? The pixels also look to be resampled, so did you keep the gw.config.update(ref_res=500)
?
from geowombat.
It does look like zeros are being displayed, so if you attempted to set them as your nodata
value then something is still not working. However, we will have some updates to how nodata
values are handled coming up in #204.
from geowombat.
No. I changed it to 50.
from geowombat.
Okay, if you've modified anything then we need to see your code snippet in order to reproduce the results that you shared.
from geowombat.
What does 500 mean for clarification, 500m?
from geowombat.
What does 500 mean for clarification, 500m?
ref_res=value <CRS units>
, so in this case 500 is in meters.
from geowombat.
Ok
from geowombat.
import matplotlib.pyplot as plt
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
import rioxarray
from pyspatialml import Raster
from pyspatialml.datasets import nc
from copy import deepcopy
le = LabelEncoder()
labels = gpd.read_file("trainingsamplesextentNEW.shp")
# Added 1 here
labels['Classvalues'] = le.fit(labels.Classname).transform(labels.Classname) + 1
predictors = ["MSSEP2020_RED.tif", "MSSEP2020_NIR.tif", "MSSEP2020_GREEN.tif", "MSSEP2020_REDEDGE.tif", "RGBSEP2020_BLUE.tif", "RGBSEP2020_GREEN.tif", "RGBSEP2020_GRAY.tif", "RGBSEP2020_RED.tif", "MSSEP2020_HEIGHT.tif", "RGBSEP2020_HEIGHT.tif",
"SEP_CTVI.tif", "SEP_GNDVI.tif", "SEP_KNDVI.tif", "SEP_MSAVI.tif", "SEP_MSAVI2.tif", "SEP_NDVI.tif", "SEP_NDWI.tif", "SEP_NRVI.tif", "SEP_SAVI.tif", "SEP_TTVI.tif"]
band_names = ["redms", "nirms", "greenms", "rededgems", "bluergb", "greenrgb", "grayrgb", "redrgb", "heightms", "heightrgb", "ctvi", "gndvi", "kndvi", "msavi", "msavi2", "ndvi", "ndwi", "nrvi", "savi", "ttvi"]
b_n = []
pred = ["MSSEP2020_RED.tif"]
stack1 = Raster(pred)
stack1.write('SEP_all.tif')
for x in range(1, len(predictors)+1):
b_n.append(x)
# Use a data pipeline
fig, ax = plt.subplots(dpi=200, figsize=(5,5))
# Resampling for faster processing/testing
with gw.config.update(ref_res=50):
with gw.open("SEP_all.tif", resampling='bilinear') as src:
pl = Pipeline(
[
('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100, max_features=3))
]
)
print('Starting to fit')
X, Xy, clf = fit(src, pl, labels, col="Classvalues")
print('Starting to predict')
y = predict(src, X, clf)
# Convert the numpy array to a DataArray and add the 'no data' value
y = ndarray_to_xarray(
src,
y.astype('uint8'),
band_names=['estimates'],
row_chunks=64,
col_chunks=64,
attrs={
'crs': src.crs,
'res': src.res,
'transform': src.transform,
'nodatavals': (0,)
}
)
print(y)
y.gw.imshow(robust=True, ax=ax)
y.gw.save("wom_RF1.tif", overwrite=True)
plt.tight_layout(pad=1)
plt.show()
from geowombat.
@willieseun below is a code snippet that masks the nodata
values in the predictions, plots the data and hides nodata
, and correctly saves the nodata
values to file. Note that I still cannot reproduce your specific errors because I don't have access to your data. If you install the latest version (geowombat==2.0.6
from #204) you should be able to reproduce the code 👇 . We still need another PR to make some changes in the ML module (@mmann1123) so that you don't have to modify the DataArray
like below. But at least this hopefully helps address your issue. Let us know if it doesn't.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict
import geopandas as gpd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
def main():
predictors = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']
labels = gpd.read_file(l8_224078_20200518_polygons)
fig, ax = plt.subplots(dpi=200, figsize=(5,5))
# Resampling for faster processing/testing
with gw.config.update(ref_res=500):
with gw.open(
l8_224078_20200518,
resampling='bilinear',
stack_dim='band',
band_names=predictors
) as src:
pl = Pipeline(
[
('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100))
]
)
X, Xy, clf = fit(src, pl, labels, col="name")
y = predict(src, X, clf)
y = (
y.astype('uint8')
# Coerce from numpy array to dask array (for gw.save())
# you could borrow chunks from src.gw.row_chunks, but the gw.save()
# method and rasterio require the blocks to be in intervals of 16
.chunk({'band': -1, 'y': 64, 'x': 64})
# Assign geo-attributes
.assign_attrs(**src.attrs)
# Set the 'no data' attribute
.gw.assign_nodata_attrs(0)
# Convert 'no data' values to nans
.gw.mask_nodata()
)
print(y)
y.gw.imshow(robust=True, ax=ax)
y.gw.save("wom_RF.tif", overwrite=True)
plt.tight_layout(pad=1)
plt.savefig('test.png')
if __name__ == '__main__':
main()
from geowombat.
Testing this with your data, this is my result.
from geowombat.
Yeah sorry, that's my fault. Something went sideways. I will need some patience.
from geowombat.
Alright. You have all the time.
from geowombat.
Ok this should be resolved soon.
from geowombat.
@willieseun This should be resolved and the new build pushed to conda-forge as well. Please note you need to specify your missing data value (if its not in the tif profile) by setting nodata
in gw.open
.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(dpi=200)
with gw.config.update(
ref_res=300,
):
with gw.open(l8_224078_20200518, nodata=0) as src:
y1 = fit_predict(src, pl_wo_feat, aoi_poly, col="lc")
y1.sel(band=["targ"]).gw.imshow(robust=True, ax=ax)
Closing this issue unless I hear otherwise.
from geowombat.
@willieseun Is this resolved?
from geowombat.
I am unable to update to the latest release currently.
from geowombat.
from geowombat.
No I use pip
from geowombat.
@willieseun I would recommend trying out miniconda and doing an install from conda-forge. Everything should be working now sorry that took a while to resolve but I am going to close this.
from geowombat.
Still not working
from geowombat.
Can you share some data and your script? I can't replicate on this side.
from geowombat.
@willieseun from the original post
I tried plotting the resulting .tif file in ArcMap and first, the predicted classes were not up to the number of classes in my training data. Second, I wasn't able to build unique values for the raster and the black bands around the raster are not disappearing when I set to display nodata value with no colour.
Issue #1: predicted classes do not match the input training classes
Issue #2: 'No data' values are not hidden in a GIS
Can you please describe what is not working? Is it still both of the issues that you raised? For predictions, unless we can replicate the issue with our test data, we need an example of your data if you are able to share somehow.
from geowombat.
Sorry for being so ambiguous. I meant I wasn't able to install with anaconda.
from geowombat.
I might be able to help interpret if you send the error from the install. In your terminal window try:
conda create geowom_env python=3.9
conda activate geowom_env
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install geowombat
from geowombat.
Thanks @mmann1123 and @willieseun, if this remains an issue, please open a new issue with your conda
install blocker.
from geowombat.
Thanks, I have been able to install the new version
from geowombat.
Related Issues (20)
- Ml no data HOT 2
- gw.open bug?? HOT 5
- Apply example
- Apply on bands error HOT 1
- Apply HOT 2
- Expand pinned python version HOT 1
- Feedstock Issues HOT 1
- Failed to Install Geowombat on pip and conda HOT 12
- Requirement differences between package & conda-forge HOT 6
- Coordinating efforts on potential new geospatial Xarray accessor HOT 2
- Regression capability in geowombat HOT 3
- ARM Installation Issues HOT 9
- Apply interpolation HOT 4
- Stac import HOT 2
- series.apply big tiff / bad block
- EVI equation is wrong HOT 2
- Support for pyproj 3.4.1? HOT 6
- Slow extract HOT 2
- mosaic bounds HOT 4
- fit_predict
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from geowombat.