
Comments (4)

lillythomas commented on May 28, 2024

Thanks for updating here @rbavery. To use that function, make sure to import re too 😉

from radiant-mlhub.

kbgg commented on May 28, 2024

Thanks @rbavery,

To follow up on the timeline: part of our dataset updates also requires changes to our pipeline, so we can't begin publishing the updated catalogs until those changes are made. We're also working on other projects at the moment, but we're aiming to have the updated catalogs published before the holidays at the end of this year.

We'll follow up in the call that we will schedule soon. Also, feel free to reach out for updates on our MLHub Slack!


rbavery commented on May 28, 2024

This is also a problem for the recently announced AgrifieldNet competition dataset:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [15], line 1
----> 1 stack = stackstac.stack(source_items, epsg = 4326)

File ~/Library/Application Support/hatch/env/virtual/ml-pipeline-sYYlrDaP/nb/lib/python3.10/site-packages/stackstac/stack.py:279, in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, errors_as_nodata, reader)
    272 if sortby_date is not False:
    273     plain_items = sorted(
    274         plain_items,
    275         key=lambda item: item["properties"].get("datetime", "") or "",
    276         reverse=sortby_date == "desc",
    277     )
--> 279 asset_table, spec, asset_ids, plain_items = prepare_items(
    280     plain_items,
    281     assets=assets,
    282     epsg=epsg,
    283     resolution=resolution,
    284     bounds=bounds,
    285     bounds_latlon=bounds_latlon,
    286     snap_bounds=snap_bounds,
    287 )
    288 arr = items_to_dask(
    289     asset_table,
    290     spec,
   (...)
    298     errors_as_nodata=errors_as_nodata,
    299 )
    301 return xr.DataArray(
    302     arr,
    303     *to_coords(
   (...)
    312     name="stackstac-" + dask.base.tokenize(arr)
    313 )

File ~/Library/Application Support/hatch/env/virtual/ml-pipeline-sYYlrDaP/nb/lib/python3.10/site-packages/stackstac/prepare.py:294, in prepare_items(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds)
    291 # If there's no geotrans, compute resolutions from `proj:shape`
    292 else:
    293     if asset_bbox_proj is None or asset_shape is None:
--> 294         raise ValueError(
    295             f"Cannot automatically compute the resolution, "
    296             f"since asset {id!r} on item {item_i} {item['id']!r} "
    297             f"doesn't provide enough metadata to determine its native resolution.\n"
    298             f"We'd need at least one of (in order of preference):\n"
    299             f"- The `proj:transform` and `proj:epsg` fields set on the asset, or on the item\n"
    300             f"- The `proj:shape` and one of `proj:bbox` or `bbox` fields set on the asset, "
    301             "or on the item\n\n"
    302             "Please specify the `resolution=` argument to set the output resolution manually. "
    303             f"(Remember that resolution must be in the units of your CRS (http://epsg.io/{out_epsg})"
    304             "---not necessarily meters."
    305         )
    307     # NOTE: this would be inaccurate if `proj:bbox` was provided,
    308     # but the geotrans was non-rectilinear
    309     # TODO check for that if there's a geotrans??
    310     res_y = (asset_bbox_proj[3] - asset_bbox_proj[1]) / asset_shape[0]

ValueError: Cannot automatically compute the resolution, since asset 'B01' on item 0 'ref_agrifieldnet_competition_v1_source_ffe8c' doesn't provide enough metadata to determine its native resolution.
We'd need at least one of (in order of preference):
- The `proj:transform` and `proj:epsg` fields set on the asset, or on the item
- The `proj:shape` and one of `proj:bbox` or `bbox` fields set on the asset, or on the item

Please specify the `resolution=` argument to set the output resolution manually. (Remember that resolution must be in the units of your CRS (http://epsg.io/4326)---not necessarily meters.
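
The two options listed in the error message can be checked before calling stackstac. The following is a minimal sketch, assuming each item is available as a plain STAC dictionary (for example from pystac's `Item.to_dict()`). The helper name `has_resolution_metadata` and the demo item are hypothetical, and the logic follows the wording of the message above rather than stackstac's internals.

```python
def has_resolution_metadata(item, asset_id):
    """Return True if the asset (or its item) carries one of the two
    metadata combinations named in the error message."""
    props = item.get("properties", {})
    asset = item.get("assets", {}).get(asset_id, {})

    def field(name):
        # An asset-level value wins; otherwise fall back to the item properties.
        return asset.get(name, props.get(name))

    # Option 1: `proj:transform` together with `proj:epsg`.
    has_transform = (
        field("proj:transform") is not None and field("proj:epsg") is not None
    )
    # Option 2: `proj:shape` together with `proj:bbox` or the item's `bbox`.
    has_bbox = field("proj:bbox") is not None or item.get("bbox") is not None
    has_shape = field("proj:shape") is not None and has_bbox
    return has_transform or has_shape


# Hypothetical item with a bbox but no projection fields.
demo_item = {
    "id": "demo",
    "bbox": [0, 0, 1, 1],
    "properties": {},
    "assets": {"B01": {"href": "b01.tif"}},
}
print(has_resolution_metadata(demo_item, "B01"))  # prints False
```

Running a check like this over every item would show which ones need patching before they can be stacked without `resolution=`.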

Here is a script to reproduce the error:

import os
from configparser import ConfigParser
from radiant_mlhub import Dataset, get_session
import requests
from pystac_client import Client
from urllib.parse import urljoin
import stackstac

# Read the MLHub API key from a local config file.
config = ConfigParser()
configFilePath = '../.mlhub_api_key.cfg'
with open(configFilePath) as f:
    config.read_file(f)
MLHUB_API_KEY = config.get('credentials', 'api_key')
os.environ['MLHUB_API_KEY'] = MLHUB_API_KEY

# Open the MLHub STAC API, passing the key as a query parameter.
MLHUB_ROOT_URL = "https://api.radiant.earth/mlhub/v1/"
client = Client.open(
    MLHUB_ROOT_URL, parameters={"key": MLHUB_API_KEY}, ignore_conformance=True
)


class MLHubSession(requests.Session):
    """Session that adds the API key and resolves URLs against the MLHub root."""

    def __init__(self, *args, api_key=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.params.update({"key": api_key})

    def request(self, method, url, *args, **kwargs):
        url_prefix = MLHUB_ROOT_URL.rstrip("/") + "/"
        url = urljoin(url_prefix, url)
        return super().request(method, url, *args, **kwargs)


session = MLHubSession(api_key=MLHUB_API_KEY)

# Fetch the source imagery items and the train/test label items.
search = client.search(collections=["ref_agrifieldnet_competition_v1_source"])
source_items = search.get_all_items()
search = client.search(collections=["ref_agrifieldnet_competition_v1_labels_train"])
train_label_items = search.get_all_items()
search = client.search(collections=["ref_agrifieldnet_competition_v1_labels_test"])
test_label_items = search.get_all_items()

# Raises the ValueError shown above: the source items lack projection metadata.
stack = stackstac.stack(source_items, epsg = 4326)
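
As the error message suggests, one workaround is to pass `resolution=` explicitly. With `epsg = 4326` the units are degrees, not meters. Below is a rough sketch of converting a pixel size in meters to degrees. The helper name, the 10 m pixel size, and the example latitude are assumptions for illustration, not values read from the collection.

```python
import math

# Approximate length of one degree of latitude in meters (spherical Earth assumption).
METERS_PER_DEGREE = 111_320.0


def meters_to_degrees(pixel_size_m, latitude_deg):
    """Return an approximate (x, y) pixel size in degrees for a size in meters."""
    y_deg = pixel_size_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    x_deg = pixel_size_m / (METERS_PER_DEGREE * math.cos(math.radians(latitude_deg)))
    return x_deg, y_deg


# Hypothetical values: a 10 m pixel at 25 degrees north.
res_x, res_y = meters_to_degrees(10, 25.0)
print(res_x, res_y)

# The result could then be tried as a single number in CRS units, e.g.:
# stack = stackstac.stack(source_items, epsg=4326, resolution=res_y)
```

This only sets the output grid. It does not add the missing projection fields to the items, so it is a stopgap rather than a fix for the catalog.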


rbavery commented on May 28, 2024

Here is a quick function that @lillythomas put together to update the STAC metadata for a given Item. CC @kbgg. We made this for the sen12floods collection, so it would need to be adapted depending on the assets in the STAC Item.

On my connection, this takes about 5 seconds per Item. We're curious to hear what the timeline is for updating the metadata in the collections, since 5 seconds per Item adds up when it's run over large Item collections and every time we use stackstac.

import logging
import re
import time

import rasterio

# Only show log messages at ERROR level or above (root logger).
log = logging.getLogger()
log.setLevel(logging.ERROR)

def set_transform_epsg(source_item, verbose=False):
    """
    This modifies the source item in place to update projection metadata.
    Assumes this metadata is missing or it will be overwritten.
    """
    start = time.time()
    # Reads the 'VV' asset, which is specific to sen12floods.
    with rasterio.open(source_item.assets['VV'].href) as src:
        x = src.profile
    source_item.properties['proj:transform'] = list(x['transform'])
    # The projection extension expects an integer code, e.g. 'EPSG:4326' -> 4326.
    source_item.properties['proj:epsg'] = int(re.findall(r'\d+', str(x['crs']))[0])
    if verbose:
        print(f"Time to update metadata for item {source_item.id}: ", time.time() - start)
    return source_item
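
Since the per-Item cost is probably dominated by the remote read, one way to shorten the wall-clock time for large collections might be to run the updates in a thread pool. This is only a sketch under that assumption: `update_items_concurrently` is a hypothetical helper, and it relies on each call opening its own rasterio dataset, as `set_transform_epsg` does.

```python
from concurrent.futures import ThreadPoolExecutor


def update_items_concurrently(items, update_fn, max_workers=8):
    """Apply update_fn to every item using a thread pool.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(update_fn, items))


# Possible usage with the function above (not run here, needs network access):
# updated_items = update_items_concurrently(source_items, set_transform_epsg)
```

The `max_workers` value is a guess. The right number depends on the connection and on how many concurrent requests the server tolerates.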

