Comments (26)
I was able to reproduce with fake data, so I should have what I need to diagnose.
import geowombat as gw
import dask.array as da
import numpy as np
import geopandas as gpd
import xarray as xr
from affine import Affine
from shapely.geometry import box
num_time = 12
num_bands = 5
num_rows = 16400
num_cols = 20596
res = (3.3217508595230054, 3.321750859522998)
left, bottom, right, top = (455303.8340072106, 5061403.560293189, 523715.29295908695, 5115876.952638507)
x = np.linspace(left, right, num_cols)
y = np.linspace(top, bottom, num_rows)
data = xr.DataArray(
da.random.random(
(num_time, num_bands, num_rows, num_cols), chunks=(-1, -1, 128, 128)
),
dims=('time', 'band', 'y', 'x'),
coords={
'time': range(1, num_time + 1),
'band': [f"band{b}" for b in range(1, num_bands + 1)],
'y': y,
'x': x,
},
attrs={
'transform': list(Affine(
res[0], 0.0, left, 0.0, -res[1], top
)),
'crs': 'EPSG:6339',
'res': res,
'nodatavals': (0.0, 0.0, 0.0, 0.0, 0.0),
'_FillValue': 0.0,
},
)
poly_left, poly_bottom, poly_right, poly_top = 472449.7271, 5063777.276, 510789.9898, 5106120.4848
poly1 = box(poly_left, poly_top - 1000, poly_left + 1000, poly_top)
poly2 = box(poly_right - 1000, poly_bottom, poly_right, poly_bottom + 1000)
training_polys = gpd.GeoDataFrame(
data=[0, 1],
columns=['class'],
geometry=[poly1, poly2],
crs='EPSG:6339',
)
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data,
bounds_by="union",
)
print(data)
xarray.DataArray 'random_sample-b383b64724d8ff4e461e9131e8739640'
time: 12 band: 5 y: 16400 x: 20596
print(poly_array)
xarray.DataArray 'array-c8fefabbbbd90f2375e1a36a7114dc8c'
band: 1 y: 16399 x: 20596
from geowombat.
I am still seeing 16399
after installing issue_310
branch
#pip install git+https://github.com/jgrss/geowombat@jgrss/issue_310
<xarray.DataArray 'array-74bf0873a2d08172017317f9392aa551' (band: 1, y: 16399,
x: 20596)>
dask.array<array, shape=(1, 16399, 20596), dtype=uint8, chunksize=(1, 1, 20596), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.3217508595230054, 0.0, 455302.1731317809, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.3217508595230054, 3.321750859522998)
is_tiled: 1
from geowombat.
I pulled down the /reorg
branch and it looks like that does correct the initial image read dimensions, now matching rasterio
/GDAL
/...
with gw.open(
filenames,
) as src:
print(src)
<xarray.DataArray (time: 12, band: 5, y: 16376, x: 16462)> Size: 32GB
dask.array<concatenate, shape=(12, 5, 16376, 16462), dtype=uint16, chunksize=(1, 5, 1, 16462), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) int64 40B 1 2 3 4 5
Attributes: (12/14)
transform: (3.321750859523004, 0.0, 468984.71685261483, 0.0, -3...
crs: 6339
res: (3.321750859523004, 3.3217508595230103)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['myfile1.tif', 'myfile2.tif...
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1
strangely, the resulting polygon_to_array
is still unaligned? I think that branch is probably still worthwhile merging at some point
labels = polygon_to_array( training_polys,'class', src)
print(labels)
<xarray.DataArray 'array-010aa44b3977375e822586d17ce8e9a8' (band: 1, y: 16375,
x: 16462)> Size: 270MB
dask.array<array, shape=(1, 16375, 16462), dtype=uint8, chunksize=(1, 1, 16462), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 8B 1
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.321750859523004, 0.0, 468984.71685261483, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.321750859523004, 3.3217508595230103)
is_tiled: 1
from geowombat.
Hi @WY-CGhilardi thanks for posting.
The default behavior of polygon_to_array
is to return the bounding box of the intersection between the array and polygon. But, we might not be passing all the needed variables through fit
(Cc @mmann1123).
To start, could you try the following with your data?
import geowombat as gw
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
poly_array = gw.polygon_to_array(training_polys, col="class", data=src, bounds_by="union")
print(poly_array)
Are you able to share your data?
from geowombat.
Here's another test to see if the polygons are properly converted using a config context.
import geowombat as gw
from geowombat.backends.rasterio_ import get_file_bounds
aoi_bounds = get_file_bounds(filenames, return_bounds=True, bounds_by='union')
with gw.config.update(ref_bounds=aoi_bounds):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
from geowombat.
Currently ml functions don't work on time series stacks, this would also require time series training data. Instead you need to set stack_dim = 'band'
. This will treat all bands over all time periods a columns of X.
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5'],
stack_dim='band'
) as src:
X, Xy, clf = fit(data = src, clf = pl, labels = training_polys, col='class')
from geowombat.
I cannot share the data but it sounds like this is a issue with how the series is stacked.
For completeness here are the two configs requested above.
The first returns the same results as my previous call to polygon_to_array
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
poly_array = gw.polygon_to_array(training_polys, col="class", data=src, bounds_by="union")
print(poly_array)
<xarray.DataArray 'array-74bf0873a2d08172017317f9392aa551' (band: 1, y: 16399,
x: 20596)>
dask.array<array, shape=(1, 16399, 20596), dtype=uint8, chunksize=(1, 1, 20596), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.3217508595230054, 0.0, 455302.1731317809, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.3217508595230054, 3.321750859522998)
is_tiled: 1
The second returns the same error message as before but now with one extra row on the y axis (16401
) rather than one less
from geowombat.backends.rasterio_ import get_file_bounds
aoi_bounds = get_file_bounds(filenames, return_bounds=True, bounds_by='union')
with gw.config.update(ref_bounds=aoi_bounds):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[11], line 9
4 with gw.config.update(ref_bounds=aoi_bounds):
5 with gw.open(
6 filenames,
7 band_names=['band1','band2','band3','band4','band5']
8 ) as src:
----> 9 X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:269, in Classifiers.fit(self, data, clf, labels, col, targ_name, targ_dim_name)
266 else:
267 data = self._add_time_dim(data)
--> 269 data, labels = self._prepare_labels(data, labels, col, targ_name)
270 X, Xna = self._prepare_predictors(data, targ_name)
271 clf = self._prepare_classifiers(clf)
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:77, in ClassifiersMixin._prepare_labels(self, data, labels, col, targ_name)
71 labels = polygon_to_array(labels, col=col, data=data)
73 labels = xr.concat([labels] * data.gw.ntime, dim="band").assign_coords(
74 coords={"band": data.time.values.tolist()}
75 )
---> 77 data.coords[targ_name] = (["time", "y", "x"], labels.values)
79 return data, labels
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:528, in Coordinates.__setitem__(self, key, value)
527 def __setitem__(self, key: Hashable, value: Any) -> None:
--> 528 self.update({key: value})
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:554, in Coordinates.update(self, other)
546 # Discard original indexed coordinates prior to merge allows to:
547 # - fail early if the new coordinates don't preserve the integrity of existing
548 # multi-coordinate indexes
549 # - drop & replace coordinates without alignment (note: we must keep indexed
550 # coordinates extracted from the DataArray objects passed as values to
551 # `other` - if any - as those are still used for aligning the old/new coordinates)
552 coords_to_align = drop_indexed_coords(set(other_coords) & set(other), self)
--> 554 coords, indexes = merge_coords(
555 [coords_to_align, other_coords],
556 priority_arg=1,
557 indexes=coords_to_align.xindexes,
558 )
560 # special case for PandasMultiIndex: updating only its dimension coordinate
561 # is still allowed but depreciated.
562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
564 self._drop_coords(self._names - coords_to_align._names)
File /env/lib/python3.10/site-packages/xarray/core/merge.py:556, in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
554 _assert_compat_valid(compat)
555 coerced = coerce_pandas_values(objects)
--> 556 aligned = deep_align(
557 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
558 )
559 collected = collect_variables_and_indexes(aligned, indexes=indexes)
560 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:952, in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
949 else:
950 out.append(variables)
--> 952 aligned = align(
953 *targets,
954 join=join,
955 copy=copy,
956 indexes=indexes,
957 exclude=exclude,
958 fill_value=fill_value,
959 )
961 for position, key, aligned_obj in zip(positions, keys, aligned):
962 if key is no_key:
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:888, in align(join, copy, indexes, exclude, fill_value, *objects)
692 """
693 Given any number of Dataset and/or DataArray objects, returns new
694 objects with aligned indexes and dimension sizes.
(...)
878
879 """
880 aligner = Aligner(
881 objects,
882 join=join,
(...)
886 fill_value=fill_value,
887 )
--> 888 aligner.align()
889 return aligner.results
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:575, in Aligner.align(self)
573 self.assert_no_index_conflict()
574 self.align_indexes()
--> 575 self.assert_unindexed_dim_sizes_equal()
577 if self.join == "override":
578 self.override_indexes()
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:476, in Aligner.assert_unindexed_dim_sizes_equal(self)
474 add_err_msg = ""
475 if len(sizes) > 1:
--> 476 raise ValueError(
477 f"cannot reindex or align along dimension {dim!r} "
478 f"because of conflicting dimension sizes: {sizes!r}" + add_err_msg
479 )
ValueError: cannot reindex or align along dimension 'y' because of conflicting dimension sizes: {16400, 16401} (note: an index is found along that dimension with size=16401)
from geowombat.
I did also try adding the stack_dim
parameter to the open
call but that threw a different error before it made it to the fit
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 1
----> 1 with gw.open(
2 filenames,
3 band_names=['band1','band2','band3','band4','band5'],
4 stack_dim='band'
5 ) as src:
6 src
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/core/api.py:531, in open.__init__(self, filename, band_names, time_names, stack_dim, bounds, bounds_by, resampling, persist_filenames, netcdf_vars, mosaic, overlap, nodata, scale_factor, offset, dtype, scale_data, num_workers, **kwargs)
518 self.data = gw_mosaic(
519 filename,
520 overlap=overlap,
(...)
526 **kwargs,
527 )
529 else:
530 # Stack images along the 'time' axis
--> 531 self.data = gw_concat(
532 filename,
533 stack_dim=stack_dim,
534 bounds_by=bounds_by,
535 resampling=resampling,
536 time_names=time_names,
537 band_names=band_names,
538 nodata=nodata,
539 overlap=overlap,
540 dtype=dtype,
541 netcdf_vars=netcdf_vars,
542 **kwargs,
543 )
544 self.__data_are_stacked = True
546 self.__data_are_separate = True
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/backends/xarray_.py:738, in concat(filenames, stack_dim, bounds_by, resampling, time_names, band_names, nodata, dtype, netcdf_vars, overlap, warp_mem_limit, num_threads, tap, **kwargs)
735 src.coords['time'] = parse_filename_dates(filenames)
737 if band_names:
--> 738 src.coords['band'] = band_names
739 else:
740 if src.gw.sensor:
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:528, in Coordinates.__setitem__(self, key, value)
527 def __setitem__(self, key: Hashable, value: Any) -> None:
--> 528 self.update({key: value})
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:566, in Coordinates.update(self, other)
560 # special case for PandasMultiIndex: updating only its dimension coordinate
561 # is still allowed but depreciated.
562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
564 self._drop_coords(self._names - coords_to_align._names)
--> 566 self._update_coords(coords, indexes)
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:834, in DataArrayCoordinates._update_coords(self, coords, indexes)
832 coords_plus_data = coords.copy()
833 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 834 dims = calculate_dimensions(coords_plus_data)
835 if not set(dims) <= set(self.dims):
836 raise ValueError(
837 "cannot add coordinates with new dimensions to a DataArray"
838 )
File /env/lib/python3.10/site-packages/xarray/core/variable.py:2997, in calculate_dimensions(variables)
2995 last_used[dim] = k
2996 elif dims[dim] != size:
-> 2997 raise ValueError(
2998 f"conflicting sizes for dimension {dim!r}: "
2999 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
3000 )
3001 return dims
ValueError: conflicting sizes for dimension 'band': length 60 on <this-array> and length 5 on {'x': 'x', 'y': 'y', 'band': 'band'}
from geowombat.
Comment out band_names, it should be the same length as the total number of bands in all images
from geowombat.
removing band_names
with stack_dim
set to bands
returns the original off by one I was running into above
with gw.open(
filenames,
stack_dim='band'
) as src:
X, Xy, clf = fit(data = src, clf = pl, labels = training_polys, col='class')
ValueError Traceback (most recent call last)
Cell In[7], line 6
1 with gw.open(
2 filenames,
3 stack_dim='band'
4 ) as src:
----> 6 X, Xy, clf = fit(data = src, clf = pl, labels = training_polys, col='class')
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:269, in Classifiers.fit(self, data, clf, labels, col, targ_name, targ_dim_name)
266 else:
267 data = self._add_time_dim(data)
--> 269 data, labels = self._prepare_labels(data, labels, col, targ_name)
270 X, Xna = self._prepare_predictors(data, targ_name)
271 clf = self._prepare_classifiers(clf)
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:77, in ClassifiersMixin._prepare_labels(self, data, labels, col, targ_name)
71 labels = polygon_to_array(labels, col=col, data=data)
73 labels = xr.concat([labels] * data.gw.ntime, dim="band").assign_coords(
74 coords={"band": data.time.values.tolist()}
75 )
---> 77 data.coords[targ_name] = (["time", "y", "x"], labels.values)
79 return data, labels
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:528, in Coordinates.__setitem__(self, key, value)
527 def __setitem__(self, key: Hashable, value: Any) -> None:
--> 528 self.update({key: value})
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:554, in Coordinates.update(self, other)
546 # Discard original indexed coordinates prior to merge allows to:
547 # - fail early if the new coordinates don't preserve the integrity of existing
548 # multi-coordinate indexes
549 # - drop & replace coordinates without alignment (note: we must keep indexed
550 # coordinates extracted from the DataArray objects passed as values to
551 # `other` - if any - as those are still used for aligning the old/new coordinates)
552 coords_to_align = drop_indexed_coords(set(other_coords) & set(other), self)
--> 554 coords, indexes = merge_coords(
555 [coords_to_align, other_coords],
556 priority_arg=1,
557 indexes=coords_to_align.xindexes,
558 )
560 # special case for PandasMultiIndex: updating only its dimension coordinate
561 # is still allowed but depreciated.
562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
564 self._drop_coords(self._names - coords_to_align._names)
File /env/lib/python3.10/site-packages/xarray/core/merge.py:556, in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
554 _assert_compat_valid(compat)
555 coerced = coerce_pandas_values(objects)
--> 556 aligned = deep_align(
557 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
558 )
559 collected = collect_variables_and_indexes(aligned, indexes=indexes)
560 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:952, in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
949 else:
950 out.append(variables)
--> 952 aligned = align(
953 *targets,
954 join=join,
955 copy=copy,
956 indexes=indexes,
957 exclude=exclude,
958 fill_value=fill_value,
959 )
961 for position, key, aligned_obj in zip(positions, keys, aligned):
962 if key is no_key:
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:888, in align(join, copy, indexes, exclude, fill_value, *objects)
692 """
693 Given any number of Dataset and/or DataArray objects, returns new
694 objects with aligned indexes and dimension sizes.
(...)
878
879 """
880 aligner = Aligner(
881 objects,
882 join=join,
(...)
886 fill_value=fill_value,
887 )
--> 888 aligner.align()
889 return aligner.results
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:575, in Aligner.align(self)
573 self.assert_no_index_conflict()
574 self.align_indexes()
--> 575 self.assert_unindexed_dim_sizes_equal()
577 if self.join == "override":
578 self.override_indexes()
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:476, in Aligner.assert_unindexed_dim_sizes_equal(self)
474 add_err_msg = ""
475 if len(sizes) > 1:
--> 476 raise ValueError(
477 f"cannot reindex or align along dimension {dim!r} "
478 f"because of conflicting dimension sizes: {sizes!r}" + add_err_msg
479 )
ValueError: cannot reindex or align along dimension 'y' because of conflicting dimension sizes: {16400, 16399} (note: an index is found along that dimension with size=16400)
from geowombat.
Do you have polygons that exceed the bounds of the images?
from geowombat.
Do you have polygons that exceed the bounds of the images? Or that contain missing data?
from geowombat.
Do you have polygons that exceed the bounds of the images?
No, the polygons were explicitly drawn within this AOI. Doubling checking the bounding boxes for both the geodataframe and xarray stack check out as well
# minx, miny, maxx, maxy
training_polys.total_bounds
array([ 472449.7271, 5063777.276 , 510789.9898, 5106120.4848])
## hacky but does the job
# minx, miny, maxx, maxy
(float(src.x.min().values), float(src.y.min().values), float(src.x.max().values), float(src.y.max().values))
(455303.8340072106, 5061403.560293189, 523715.29295908695, 5115876.952638507)
or contain missing data
No, the dataframe is fully populated and all have geometry
training_polys.isnull().values.any()
False
from geowombat.
This is a weird one. I've never run into it and can't reproduce without the data.
Can you try seeing 'ref_image' to the first image in your list?
https://geowombat.readthedocs.io/en/latest/tutorial-config.html#reference-settings-image
from geowombat.
@WY-CGhilardi could you post the full transform (print(src.gw.transform)
) of your image stack? I can see the left bound from your printout but not the top. And if you can also post the total bounds (print(df.total_bounds)
)) of your polygon geodataframe, we might be able to reproduce this issue with fake data.
from geowombat.
Apologies, I see that you already did what I requested. I'll look into this and see if I can reproduce.
from geowombat.
I think this might just be a precision issue. The adjusted height was 16399.9999987, which was converted to 16399 instead of 16400 with int()
. See if #312 works for you. I still need to add additional tests.
from geowombat.
I didn't see that geowombat exposed a bounds
function! I ran that and I got slightly different values from my "manual parsing" version. It sounds like that isn't maybe the full issue but I thought worth calling out.
src.gw.bounds
(455302.1731317809, 5061401.89941776, 523716.9538345167, 5115878.613513936)
##as compared with (455303.8340072106, 5061403.560293189, 523715.29295908695, 5115876.952638507) from above
I was not able to successfully get the reference image config working correctly. The files I am accessing are remote on a cloud provider so not sure if that is the issue or something else.
with gw.config.update(ref_image=filenames[0]):
with gw.open(filenames, ) as src:
print(src.gw.bounds)
23:21:49:WARNING:94:xarray_._check_config_globals: The reference image does not exist
23:21:49:WARNING:94:xarray_._check_config_globals: The reference image does not exist
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[11], line 2
1 with gw.config.update(ref_image=filenames[0]):
----> 2 with gw.open(filenames, ) as src:
3 print(src.gw.bounds)
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/core/api.py:531, in open.__init__(self, filename, band_names, time_names, stack_dim, bounds, bounds_by, resampling, persist_filenames, netcdf_vars, mosaic, overlap, nodata, scale_factor, offset, dtype, scale_data, num_workers, **kwargs)
518 self.data = gw_mosaic(
519 filename,
520 overlap=overlap,
(...)
526 **kwargs,
527 )
529 else:
530 # Stack images along the 'time' axis
--> 531 self.data = gw_concat(
532 filename,
533 stack_dim=stack_dim,
534 bounds_by=bounds_by,
535 resampling=resampling,
536 time_names=time_names,
537 band_names=band_names,
538 nodata=nodata,
539 overlap=overlap,
540 dtype=dtype,
541 netcdf_vars=netcdf_vars,
542 **kwargs,
543 )
544 self.__data_are_stacked = True
546 self.__data_are_separate = True
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/backends/xarray_.py:619, in concat(filenames, stack_dim, bounds_by, resampling, time_names, band_names, nodata, dtype, netcdf_vars, overlap, warp_mem_limit, num_threads, tap, **kwargs)
616 with rio_open(filenames[0]) as src_:
617 tags = src_.tags()
--> 619 src_ = warp_open(
620 f'{filenames[0]}:{netcdf_vars[0]}' if netcdf_vars else filenames[0],
621 resampling=resampling,
622 band_names=[netcdf_vars[0]] if netcdf_vars else band_names,
623 nodata=nodata,
624 warp_mem_limit=warp_mem_limit,
625 num_threads=num_threads,
626 **kwargs,
627 )
629 attrs = src_.attrs.copy()
630 src_.close()
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/backends/xarray_.py:314, in warp_open(filename, band_names, resampling, dtype, netcdf_vars, nodata, return_windows, warp_mem_limit, num_threads, tap, **kwargs)
297 warped_objects = warp_images(
298 filenames, resampling=resampling, **ref_kwargs_netcdf_stack
299 )
301 yield xr.concat(
302 (
303 open_rasterio(
(...)
310 dim='band',
311 )
313 with open_rasterio(
--> 314 warp(filename, resampling=resampling, **ref_kwargs),
315 nodata=ref_kwargs['nodata'],
316 **kwargs,
317 ) if not filenames else warp_netcdf_vars() as src:
318 if band_names:
319 if len(band_names) > src.gw.nbands:
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/backends/rasterio_.py:783, in warp(filename, resampling, bounds, crs, res, nodata, warp_mem_limit, num_threads, tap, tac)
779 dst_height, dst_width = get_dims_from_bounds(dst_bounds, dst_res)
781 # Do all the key metadata match the reference information?
782 if (
--> 783 (tuple(src_info.src_bounds) == tuple(bounds))
784 and (src_info.src_res == dst_res)
785 and (src_crs == dst_crs)
786 and (src_info.src_width == dst_width)
787 and (src_info.src_height == dst_height)
788 and ('.nc' not in filename.lower())
789 ):
790 vrt_options = {
791 'resampling': getattr(Resampling, resampling),
792 'src_crs': src_crs,
(...)
803 },
804 }
806 else:
TypeError: 'NoneType' object is not iterable
from geowombat.
I was not able to successfully get the reference image config working correctly. The files I am accessing are remote on a cloud provider so not sure if that is the issue or something else.
Hmm, geowombat open()
is basically just wrapped rasterio. Are you opening a virtual file following their protocol?
from geowombat.
Could you run your polygon conversion with the following and let me know what you get?
with gw.config.update(ref_res=3.2175):
and
with gw.config.update(ref_res=3.22):
from geowombat.
Let me start off with a big thank you to both of you for working through this with me, I really appreciate it
with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
src
labelss = polygon_to_array( training_polys, 'class', src)
labelss
<xarray.DataArray 'array-eb576767189c8f0f6be73fcd77ffb4ee' (band: 1, y: 16930,
x: 21263)>
dask.array<array, shape=(1, 16930, 21263), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 5115878.613513...
crs: EPSG:6339
res: (3.2175, 3.2175)
is_tiled: 1
with gw.config.update(ref_res=3.22):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
src
labelss = polygon_to_array( training_polys, 'class', src)
labelss
<xarray.DataArray 'array-e7a6b86caea789cacaa30e07821d1756' (band: 1, y: 16917,
x: 21247)>
dask.array<array, shape=(1, 16917, 21247), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.613513936)
crs: EPSG:6339
res: (3.22, 3.22)
is_tiled: 1
Hmm, geowombat open() is basically just wrapped rasterio. Are you opening a virtual file following their protocol?
Yes, the filenames
is functionally just ['s3://somebucket/somprefix/somefile1.tif', 's3://somebucket/somprefix/somefile2.tif']
Indexing seems to work fine as part of the normal open
call so I don't see why that wouldn't work as part of the ref_img
. I am not too concerned with this behavior now since we have narrowed it down to precision issue. 🙃
with gw.open(filenames[0]) as src:
print(src.gw.bounds)
(455302.1731317809, 5061401.89941776, 523716.9538345167, 5115878.613513936)
from geowombat.
Using the fake data generated above, the array size is
print(data.shape)
print(data.gw.bounds)
(12, 5, 16400, 20596)
(455300.5122563511, 5061400.23854233, 523718.61470994644, 5115880.274389366)
and the training polygon bounding box is
print(training_polys.total_bounds.tolist())
[472449.7271, 5063777.276, 510789.9898, 5106120.4848]
Those numbers should match what you have, as I simply copied what you posted.
Using the latest (geowombat=2.1.19
from main
) on Python 3.9.16, I get:
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data,
)
print(poly_array.shape)
(1, 16400, 20596)
from geowombat.
I'm not sure if it's a formatting/syntax error, but in your last post it didn't look like you wrapped the polygon conversion within the context. Are you running the following test?
with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src)
print(labels)
If that prints aligned shapes, can you try
with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
I don't have the same raster file you're using, but if I resample the array then the polygon array shape aligns.
data_resamp = data.gw.transform_crs(dst_res=(3.2175, 3.2175))
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data_resamp ,
)
print(data_resamp.shape)
(5, 16933, 21265)
print(poly_array.shape)
(1, 16933, 21265)
from geowombat.
That is just a formatting issue, I can confirm I still return mis-aligned arrays with the configs
Using 3.2175
with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src) #16931 x 21264
print(labels) #16930 x 21263
<xarray.DataArray (time: 12, band: 5, y: 16931, x: 21264)>
dask.array<concatenate, shape=(12, 5, 16931, 21264), dtype=uint16, chunksize=(1, 5, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) <U5 'band1' 'band2' 'band3' 'band4' 'band5'
Attributes: (12/14)
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 51158...
crs: 6339
res: (3.2175, 3.2175)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['1_2023_6339.tif', '2_2023_6339.tif', '3_2023_6339....
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1
<xarray.DataArray 'array-eb576767189c8f0f6be73fcd77ffb4ee' (band: 1, y: 16930,
x: 21263)>
dask.array<array, shape=(1, 16930, 21263), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 5115878.613513...
crs: EPSG:6339
res: (3.2175, 3.2175)
is_tiled: 1
Using 3.22
with gw.config.update(ref_res=3.22):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src) #16918 x 21247
print(labels) #16917 x 21247
<xarray.DataArray (time: 12, band: 5, y: 16918, x: 21247)>
dask.array<concatenate, shape=(12, 5, 16918, 21247), dtype=uint16, chunksize=(1, 5, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) <U5 'band1' 'band2' 'band3' 'band4' 'band5'
Attributes: (12/14)
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.6...
crs: 6339
res: (3.22, 3.22)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['1_2023_6339.tif', '2_2023_6339.tif', '3_2023_6339....
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1
<xarray.DataArray 'array-e7a6b86caea789cacaa30e07821d1756' (band: 1, y: 16917,
x: 21247)>
dask.array<array, shape=(1, 16917, 21247), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.613513936)
crs: EPSG:6339
res: (3.22, 3.22)
is_tiled: 1
It seems that resampling does work to align the array. Note I just did this on a single image here, I was running into OOM errors with the full stack.
with gw.open(filenames[0]) as src:
data_resamp = src.gw.transform_crs(dst_res=(3.2175, 3.2175))
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data_resamp ,
)
print(data_resamp.shape)
print(poly_array.shape)
(5, 16932, 21264)
(1, 16932, 21264)
from geowombat.
I have been poking around this some more locally and noticed something this morning. For a given image, the returned xarray
is a slightly different shape on read than what is reported from QGIS
/GDAL
even before anything involving the polygons.
NOTE this is a slightly clipped down version of the files discussed above for ease of working on laptop
gdalinfo myfile1_clipped.tif
Driver: GTiff/GeoTIFF
Size is 16462, 16376
..... truncated
Full gdalinfo output if helpful
```bash gdalinfo myfile1_clipped.tif Driver: GTiff/GeoTIFF Size is 16462, 16376 Coordinate System is: PROJCRS["NAD83(2011) / UTM zone 10N", BASEGEOGCRS["NAD83(2011)", DATUM["NAD83 (National Spatial Reference System 2011)", ELLIPSOID["GRS 1980",6378137,298.257222101, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",6318]], CONVERSION["UTM zone 10N", METHOD["Transverse Mercator", ID["EPSG",9807]], PARAMETER["Latitude of natural origin",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8801]], PARAMETER["Longitude of natural origin",-123, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8802]], PARAMETER["Scale factor at natural origin",0.9996, SCALEUNIT["unity",1], ID["EPSG",8805]], PARAMETER["False easting",500000, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["(E)",east, ORDER[1], LENGTHUNIT["metre",1]], AXIS["(N)",north, ORDER[2], LENGTHUNIT["metre",1]], USAGE[ SCOPE["Engineering survey, topographic mapping."], AREA["United States (USA) - between 126┬░W and 120┬░W onshore and offshore - California; Oregon; Washington."], BBOX[30.54,-126,49.09,-119.99]], ID["EPSG",6339]] Data axis to CRS axis mapping: 1,2 Origin = (468984.716852614830714,5115798.602993652224541) Pixel Size = (3.321750859523004,-3.321750859523010) Metadata: AREA_OR_POINT=Area COLORINTERP=Blue Image Structure Metadata: INTERLEAVE=PIXEL Corner Coordinates: Upper Left ( 468984.717, 5115798.603) (123d24' 7.06"W, 46d11'42.21"N) Lower Left ( 468984.717, 5061401.611) (123d23'54.38"W, 45d42'19.70"N) Upper Right ( 523667.380, 5115798.603) (122d41'35.76"W, 46d11'43.27"N) Lower Right ( 523667.380, 5061401.611) (122d41'45.44"W, 45d42'20.75"N) Center ( 496326.048, 5088600.107) (123d 2'50.66"W, 45d57' 3.46"N) Band 1 Block=16462x1 Type=UInt16, ColorInterp=Gray Min=1.000 Max=10000.000 Minimum=1.000, Maximum=10000.000, Mean=311.924, StdDev=500.680 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=10000 STATISTICS_MEAN=311.92363571779 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=500.68043708223 STATISTICS_VALID_PERCENT=80.34 Band 2 Block=16462x1 Type=UInt16, ColorInterp=Undefined Min=1.000 Max=10000.000 Minimum=1.000, Maximum=10000.000, Mean=373.399, StdDev=495.399 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=10000 STATISTICS_MEAN=373.39868300658 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=495.39942964762 STATISTICS_VALID_PERCENT=80.34 Band 3 Block=16462x1 Type=UInt16, ColorInterp=Undefined Min=1.000 Max=9961.000 Minimum=1.000, Maximum=9961.000, Mean=366.853, StdDev=518.729 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=9961 STATISTICS_MEAN=366.8531565228 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=518.72915463185 STATISTICS_VALID_PERCENT=80.34 Band 4 Block=16462x1 Type=UInt16, ColorInterp=Undefined NoData Value=0 Band 5 Block=16462x1 Type=UInt16, ColorInterp=Undefined NoData Value=0 ```with gw.open(
myfile1_clipped.tif
) as src:
print(src.shape)
(5, 16375, 16462)
EDIT: I tried a vanilla rasterio read and that agrees with the GDAL outputs
#check metadata
import rasterio
with rasterio.open(filenames[0]) as rast_src:
print(rast_src.meta)
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 0.0, 'width': 16462, 'height': 16376, 'count': 5, 'crs': CRS.from_epsg(6339), 'transform': Affine(3.321750859523004, 0.0, 468984.71685261483,
0.0, -3.3217508595230103, 5115798.602993652)}
#actually read a band and report array shape
with rasterio.open(filenames[0]) as rast_src:
arr = rast_src.read(1)
print(arr.shape)
(16376, 16462)
EDIT 2: rioxarray matches with raw rasterio
da = rioxarray.open_rasterio(filenames[0] )
da.shape
(5, 16376, 16462)
from geowombat.
Sorry for the delay @WY-CGhilardi.
I have a random data clone of your image, with gdalinfo test.tif
resulting in:
Driver: GTiff/GeoTIFF
Files: test.tif
Size is 16462, 16376
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 10N",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 10N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-123,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK)."],
BBOX[0,-126,84,-120]],
ID["EPSG",32610]]
Data axis to CRS axis mapping: 1,2
Origin = (468984.717000000004191,5115798.603000000119209)
Pixel Size = (3.321750859523004,-3.321750859523010)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 468984.717, 5115798.603) (123d24' 7.06"W, 46d11'42.21"N)
Lower Left ( 468984.717, 5061401.611) (123d23'54.38"W, 45d42'19.70"N)
Upper Right ( 523667.380, 5115798.603) (122d41'35.76"W, 46d11'43.27"N)
Lower Right ( 523667.380, 5061401.611) (122d41'45.44"W, 45d42'20.75"N)
Center ( 496326.048, 5088600.107) (123d 2'50.66"W, 45d57' 3.46"N)
Band 1 Block=256x256 Type=Byte, ColorInterp=Gray
NoData Value=0
Band 2 Block=256x256 Type=Byte, ColorInterp=Undefined
NoData Value=0
Band 3 Block=256x256 Type=Byte, ColorInterp=Undefined
NoData Value=0
Band 4 Block=256x256 Type=Byte, ColorInterp=Undefined
NoData Value=0
Band 5 Block=256x256 Type=Byte, ColorInterp=Undefined
NoData Value=0
With changes in #319, the opened array results in:
with gw.open("test.tif") as src:
print(src)
<xarray.DataArray (band: 5, y: 16376, x: 16462)> Size: 1GB
dask.array<open_rasterio-fa4e67f0ae8d81c28193cae47ef688b7<this-array>, shape=(5, 16376, 16462), dtype=uint8, chunksize=(5, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 40B 1 2 3 4 5
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
Attributes: (12/13)
transform: (3.321750859523004, 0.0, 468984.717, 0.0, -3.3217508...
crs: 32610
res: (3.321750859523004, 3.32175085952301)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
offsets: (0.0, 0.0, 0.0, 0.0, 0.0)
filename: test.tif
resampling: nearest
AREA_OR_POINT: Area
_data_are_separate: 0
_data_are_stacked: 0
from geowombat.
Related Issues (20)
- Failed to Install Geowombat on pip and conda HOT 12
- Requirement differences between package & conda-forge HOT 6
- Coordinating efforts on potential new geospatial Xarray accessor HOT 2
- Regression capability in geowombat HOT 3
- ARM Installation Issues HOT 9
- Apply interpolation HOT 4
- Stac import HOT 2
- series.apply big tiff / bad block
- EVI equation is wrong HOT 2
- Support for pyproj 3.4.1? HOT 6
- Slow extract HOT 2
- mosaic bounds HOT 4
- fit_predict
- Stack & Mosaic Advice? HOT 6
- writing with Bigtiff HOT 2
- [] in open trigger stacking
- band names HOT 1
- Unable to import geowombat HOT 1
- Mosaic Fails with nan nodata HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from geowombat.