Predicted .tif file assigns a class to zero,about jgrss/geowombat

Comments (43)

willieseun commented on May 24, 2024 1

Please reopen. It even brings additional errors now.

from geowombat.

jgrss commented on May 24, 2024

@mmann1123 I didn't intentionally close this. Do we need to reopen?

from geowombat.

jgrss commented on May 24, 2024

@willieseun can you paste a code snippet to show us how you created your predictions?

from geowombat.

mmann1123 commented on May 24, 2024

I think in the latest ml updates I resolved the issue of dropping one of the prediction classes. Missing values I'm not sure.

…

On Wed, Sep 14, 2022, 10:22 AM Jordan Graesser ***@***.***> wrote: @willieseun <https://github.com/willieseun> can you paste a code snippet to show us how you created your predictions? — Reply to this email directly, view it on GitHub <#193 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABHR6VAFIP4535TSVFL5KELV6HNR3ANCNFSM6AAAAAAQEZGFME> . You are receiving this because you were mentioned.Message ID: ***@***.***>

from geowombat.

willieseun commented on May 24, 2024

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
from glob import glob
import rioxarray
import os
import re
le = LabelEncoder()

psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']

labels = gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
labels['Classvalue'] = le.fit(labels.Classname).transform(labels.Classname)
print(labels)
predictors = ['something.tif']
with gw.open(predictors, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
pl = Pipeline([ ('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100, max_features=9))])
fig, ax = plt.subplots(dpi=200,figsize=(5,5))

        X, Xy, clf = fit(src, pl, labels, col="Classvalue")
	print('Starting to predict')
	y = predict(src, X, clf)
	y.plot(robust=True, ax=ax)
	y.sel(time="t1").gw.to_raster("wom_RF.tif")
	plt.tight_layout(pad=1)
	plt.show()

from geowombat.

willieseun commented on May 24, 2024

I hope you get it. It is not rendering it well enough.

from geowombat.

willieseun commented on May 24, 2024

Just to add, because I don't want to open another issue.
It seems that the gauss resampling method is not available as a parameter when resampling.

from geowombat.

jgrss commented on May 24, 2024

@willieseun I think I am getting close to the issue. Before we make any changes, can you check if the following creates the output that you are hoping for.

Note that I used geowombat data because I don't have access to the data you used.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray

import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt

le = LabelEncoder()

# psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']
psearlst = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']

labels = gpd.read_file(l8_224078_20200518_polygons) #gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
# Added 1 here
labels['Classvalue'] = le.fit(labels.name).transform(labels.name) + 1

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=500):
    with gw.open(l8_224078_20200518, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100))
            ]
        )

        X, Xy, clf = fit(src, pl, labels, col="name")
        y = predict(src, X, clf)
       # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF.tif", overwrite=True)
        plt.tight_layout(pad=1)

from geowombat.

willieseun commented on May 24, 2024

Ok, Let me check...

from geowombat.

willieseun commented on May 24, 2024

It is returning a type error traceback.
TypeError: not all arguments converted during string formatting
from this line
X, Xy, clf = fit(src, pl, labels, col="name")

Note: I updated to the latest release

from geowombat.

willieseun commented on May 24, 2024

The save function seems to be working well though.

from geowombat.

jgrss commented on May 24, 2024

Apologies, it should be col='Classvalue' not 'name'.

from geowombat.

willieseun commented on May 24, 2024

It is still not working properly.

from geowombat.

willieseun commented on May 24, 2024

The resulting tif file is looking bad.

from geowombat.

willieseun commented on May 24, 2024

I think you should try using it with classes more than 10. The shapefile I am using has 12 classes. Maybe that is why.

from geowombat.

jgrss commented on May 24, 2024

@willieseun Can you elaborate on what you mean by bad? Do you mean that it is still not rendering the nodata value properly, that the classified values are not correct, or that the classification does not look accurate? On the latter, if you are using the test data then I would not expect it to look good because the data being used are just test data, which are meant to show the utility of the function but not to produce a good map.

The only things we are addressing with this open issue are 1) nodata rendering in the classification and 2) the classified values are correct. Can you confirm that either of these are still not as expected?

If you would like us to reproduce any errors with your data then you will need to post a link to your dataset.

from geowombat.

willieseun commented on May 24, 2024

I have attached the predicted map

from geowombat.

jgrss commented on May 24, 2024

Can you point out the issue? It looks like there are ~10 classes, so is this from your dataset? The pixels also look to be resampled, so did you keep the gw.config.update(ref_res=500)?

from geowombat.

jgrss commented on May 24, 2024

It does look like zeros are being displayed, so if you attempted to set them as your nodata value then something is still not working. However, we will have some updates to how nodata values are handled coming up in #204.

from geowombat.

willieseun commented on May 24, 2024

No. I changed it to 50.

from geowombat.

jgrss commented on May 24, 2024

Okay, if you've modified anything then we need to see your code snippet in order to reproduce the results that you shared.

from geowombat.

willieseun commented on May 24, 2024

What does 500 mean for clarification, 500m?

from geowombat.

jgrss commented on May 24, 2024

What does 500 mean for clarification, 500m?

ref_res=value <CRS units>, so in this case 500 is in meters.

from geowombat.

willieseun commented on May 24, 2024

from geowombat.

willieseun commented on May 24, 2024

import matplotlib.pyplot as plt
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
import rioxarray
from pyspatialml import Raster
from pyspatialml.datasets import nc
from copy import deepcopy

le = LabelEncoder()
labels = gpd.read_file("trainingsamplesextentNEW.shp")
# Added 1 here
labels['Classvalues'] = le.fit(labels.Classname).transform(labels.Classname) + 1

predictors = ["MSSEP2020_RED.tif", "MSSEP2020_NIR.tif", "MSSEP2020_GREEN.tif", "MSSEP2020_REDEDGE.tif", "RGBSEP2020_BLUE.tif", "RGBSEP2020_GREEN.tif", "RGBSEP2020_GRAY.tif", "RGBSEP2020_RED.tif", "MSSEP2020_HEIGHT.tif", "RGBSEP2020_HEIGHT.tif",
"SEP_CTVI.tif", "SEP_GNDVI.tif", "SEP_KNDVI.tif", "SEP_MSAVI.tif", "SEP_MSAVI2.tif", "SEP_NDVI.tif", "SEP_NDWI.tif", "SEP_NRVI.tif", "SEP_SAVI.tif", "SEP_TTVI.tif"]
band_names = ["redms", "nirms", "greenms", "rededgems", "bluergb", "greenrgb", "grayrgb", "redrgb", "heightms", "heightrgb", "ctvi", "gndvi", "kndvi", "msavi", "msavi2", "ndvi", "ndwi", "nrvi", "savi", "ttvi"]
b_n = []
pred = ["MSSEP2020_RED.tif"]
stack1 = Raster(pred)
stack1.write('SEP_all.tif')
for x in range(1, len(predictors)+1):
    b_n.append(x)
# Use a data pipeline

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=50):
    with gw.open("SEP_all.tif", resampling='bilinear') as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100,  max_features=3))
            ]
        )
        print('Starting to fit')
        X, Xy, clf = fit(src, pl, labels, col="Classvalues")
        print('Starting to predict')
        y = predict(src, X, clf)
       # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF1.tif", overwrite=True)
        plt.tight_layout(pad=1)
plt.show()

from geowombat.

jgrss commented on May 24, 2024

@willieseun below is a code snippet that masks the nodata values in the predictions, plots the data and hides nodata, and correctly saves the nodata values to file. Note that I still cannot reproduce your specific errors because I don't have access to your data. If you install the latest version (geowombat==2.0.6 from #204) you should be able to reproduce the code 👇 . We still need another PR to make some changes in the ML module (@mmann1123) so that you don't have to modify the DataArray like below. But at least this hopefully helps address your issue. Let us know if it doesn't.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict

import geopandas as gpd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt


def main():
    predictors = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']
    labels = gpd.read_file(l8_224078_20200518_polygons)

    fig, ax = plt.subplots(dpi=200, figsize=(5,5))

    # Resampling for faster processing/testing
    with gw.config.update(ref_res=500):
        with gw.open(
            l8_224078_20200518,
            resampling='bilinear',
            stack_dim='band',
            band_names=predictors
        ) as src:
            pl = Pipeline(
                [
                    ('scaler', StandardScaler()),
                    ('pca', PCA()),
                    ('clf', RandomForestClassifier(n_estimators=100))
                ]
            )

            X, Xy, clf = fit(src, pl, labels, col="name")
            y = predict(src, X, clf)
            y = (
                y.astype('uint8')
                # Coerce from numpy array to dask array (for gw.save())
                # you could borrow chunks from src.gw.row_chunks, but the gw.save()
                # method and rasterio require the blocks to be in intervals of 16
                .chunk({'band': -1, 'y': 64, 'x': 64})
                # Assign geo-attributes
                .assign_attrs(**src.attrs)
                # Set the 'no data' attribute
                .gw.assign_nodata_attrs(0)
                # Convert 'no data' values to nans
                .gw.mask_nodata()
            )
            print(y)
            y.gw.imshow(robust=True, ax=ax)
            y.gw.save("wom_RF.tif", overwrite=True)
            plt.tight_layout(pad=1)
            plt.savefig('test.png')


if __name__ == '__main__':
    main()

from geowombat.

willieseun commented on May 24, 2024

Testing this with your data, this is my result.

from geowombat.

mmann1123 commented on May 24, 2024

Yeah sorry, that's my fault. Something went sideways. I will need some patience.

from geowombat.

willieseun commented on May 24, 2024

Alright. You have all the time.

from geowombat.

mmann1123 commented on May 24, 2024

Ok this should be resolved soon.

from geowombat.

mmann1123 commented on May 24, 2024

@willieseun This should be resolved and the new build pushed to conda-forge as well. Please note you need to specify your missing data value (if its not in the tif profile) by setting nodata in gw.open.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(dpi=200)

with gw.config.update(
    ref_res=300,
):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        y1 = fit_predict(src, pl_wo_feat, aoi_poly, col="lc")

        y1.sel(band=["targ"]).gw.imshow(robust=True, ax=ax)

Closing this issue unless I hear otherwise.

from geowombat.

mmann1123 commented on May 24, 2024

@willieseun Is this resolved?

from geowombat.

willieseun commented on May 24, 2024

I am unable to update to the latest release currently.

from geowombat.

mmann1123 commented on May 24, 2024

Do you have anaconda installed?

…

On Mon, Sep 26, 2022, 6:14 AM willieseun ***@***.***> wrote: I am unable to update to the latest release currently. — Reply to this email directly, view it on GitHub <#193 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABHR6VB36F4MQ5LHNGWWYKLWAFZQ3ANCNFSM6AAAAAAQEZGFME> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

from geowombat.

willieseun commented on May 24, 2024

No I use pip

from geowombat.

mmann1123 commented on May 24, 2024

@willieseun I would recommend trying out miniconda and doing an install from conda-forge. Everything should be working now sorry that took a while to resolve but I am going to close this.

from geowombat.

willieseun commented on May 24, 2024

Still not working

from geowombat.

mmann1123 commented on May 24, 2024

Can you share some data and your script? I can't replicate on this side.

from geowombat.

jgrss commented on May 24, 2024

@willieseun from the original post

I tried plotting the resulting .tif file in ArcMap and first, the predicted classes were not up to the number of classes in my training data. Second, I wasn't able to build unique values for the raster and the black bands around the raster are not disappearing when I set to display nodata value with no colour.

Issue #1: predicted classes do not match the input training classes
Issue #2: 'No data' values are not hidden in a GIS

Can you please describe what is not working? Is it still both of the issues that you raised? For predictions, unless we can replicate the issue with our test data, we need an example of your data if you are able to share somehow.

from geowombat.

willieseun commented on May 24, 2024

Sorry for being so ambiguous. I meant I wasn't able to install with anaconda.

from geowombat.

mmann1123 commented on May 24, 2024

I might be able to help interpret if you send the error from the install. In your terminal window try:

conda create geowom_env python=3.9
conda activate geowom_env
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install geowombat

from geowombat.

jgrss commented on May 24, 2024

Thanks @mmann1123 and @willieseun, if this remains an issue, please open a new issue with your conda install blocker.

from geowombat.

willieseun commented on May 24, 2024

Thanks, I have been able to install the new version

from geowombat.

Predicted .tif file assigns a class to zero about geowombat HOT 43 CLOSED

Comments (43)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent