stac-utils / pystac-client

Python client for searching STAC APIs

Home Page: https://pystac-client.readthedocs.io

License: Other

Python 99.17% Shell 0.83%

geospatial python stac

pystac-client's Issues

Ability to plug in alternate request functions

Similar to STAC_IO in PySTAC, users should be able to override the requests if they need to do authentication in some way.

Implement async requests

catalog.search method returns NotImplementedError

Using the python library after instantiating a client connection to a STAC server, calling the catalog.search method returns a NotImplementedError. Maybe this has to do with how the subclass inherits method definitions from the parent class?

Example method that returns NotImplementedError on line 3:

def my_read_method(stacserver_url):

  catalog = Client.open(stacserver_url)
  mysearch = catalog.search(bbox=[-72.5,40.5,-72,41], max_items=10)
  return f"{mysearch.matched()} items found"

Federated search

A big advantage of STAC is being able to use data from multiple sources.
It would be a nice feature to be able to search multiple STAC endpoints and combine the results into a single FeatureCollection
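
As a rough sketch of the combining step only: assume each endpoint has already been searched and returned a GeoJSON FeatureCollection as a plain dict. `merge_feature_collections` is a hypothetical helper name, not something pystac-client provides, and de-duplicating on (collection, id) is an assumption about what "the same item" means across sources.

```python
from typing import Any, Dict, Iterable, List


def merge_feature_collections(results: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine several FeatureCollection dicts into one (hypothetical helper)."""
    seen = set()
    features: List[Dict[str, Any]] = []
    for fc in results:
        for feature in fc.get("features", []):
            # Assumption: an item is identified by its collection and id.
            key = (feature.get("collection"), feature.get("id"))
            if key in seen:
                continue
            seen.add(key)
            features.append(feature)
    return {"type": "FeatureCollection", "features": features}
```

The order of the inputs decides which copy of a duplicated item is kept, so the preferred source should come first.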

validate_all throws KeyError when validating Microsoft Planetary Computer STAC API

The code below throws a KeyError instead of a STACValidationError. I think it needs to be more defensive, so that if the "id" field is missing, the KeyError is converted into a STACValidationError. It's also possible that it's trying to process an entity that's not a STAC object and has no "id" field.

import pystac_client

pystac_client.Client.open(
    'https://planetarycomputer.microsoft.com/api/stac/v1/').validate_all()

The exception:

Traceback (most recent call last):
  File "/Users/philvarner/code/stac-api-validation-suite/stac_api_validation_suite/validate_all_error.py", line 3, in <module>
    pystac_client.Client.open(
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 679, in validate_all
    child.validate_all()
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 680, in validate_all
    for item in self.get_items():
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/stac_object.py", line 343, in get_stac_objects
    link.resolve_stac_object(root=self.get_root())
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/link.py", line 146, in resolve_stac_object
    obj = STAC_IO.read_stac_object(target_href, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/stac_io.py", line 131, in read_stac_object
    return cls.stac_object_from_dict(d, href=uri, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/serialization/__init__.py", line 37, in stac_object_from_dict
    return Catalog.from_dict(d, href=href, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 790, in from_dict
    id = d.pop('id')
KeyError: 'id'
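
A minimal sketch of the more defensive behaviour suggested above. The exception class here is a local stand-in so the example runs on its own, and `read_required_id` is a hypothetical helper; where such a check would actually live in PySTAC is not something this issue settles.

```python
class STACValidationError(Exception):
    """Local stand-in for the library's validation error (assumption)."""


def read_required_id(d: dict, href: str = "<unknown>") -> str:
    """Return d['id'], converting a missing key into a validation error."""
    try:
        return d["id"]
    except KeyError as e:
        # Chain the original KeyError so the root cause stays visible.
        raise STACValidationError(
            f"Object at {href} has no 'id' field; it may not be a STAC object"
        ) from e
```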

API or Client

Since we're keeping inheritance (#19), the API class is interchangeable with a static catalog or an API. The main difference is really that this repo is almost exclusively used for read-only (transaction extension excepted).

Users can use API.open() on any valid endpoint, even a local file.

Should the API class be renamed to Client? STACClient? Something else?

Documentation link goes nowhere

Following the documentation link in the readme goes to https://pystac-client.readthedocs.io/en/latest/, which says "This page does not exist yet."

sort extension

Finish sort extension (see sort-extension branch)

limit of between 1 and 10000 should be enforced by client

Per the definition of limit in OGC API Features (https://docs.opengeospatial.org/is/17-069r3/17-069r3.html), only values between 1 and 10000 are allowed. These restrictions on this parameter should be enforced by the client.
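
A small sketch of what client-side enforcement could look like. The bounds come from the text above; `validate_limit` is a hypothetical helper name, and raising ValueError (rather than clamping) is a design assumption.

```python
MIN_LIMIT = 1
MAX_LIMIT = 10000


def validate_limit(limit):
    """Return limit unchanged if it is None or within [1, 10000]; raise otherwise."""
    if limit is None:
        return None  # let the server pick its default page size
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise TypeError(f"limit must be an int, got {type(limit).__name__}")
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between {MIN_LIMIT} and {MAX_LIMIT}, got {limit}")
    return limit
```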

Deploy to ReadTheDocs

The ReadTheDocs page still wants to build from the old repo (duckontheweb/pystac-api). I'll update that project so that it builds from this location instead.

Is pystac-client a replacement for sat-search

I have been using sat-search (https://github.com/sat-utils/sat-search) to query and download data. Is pystac-client a newer api that will replace sat-search?
Is sat-search going to be deprecated anytime soon?

Need to support GET requests for search

Check whether stac-api-spec requires POST for search. If so, the client can just use POST for search; otherwise GET will need to be supported as well.

Conformance with the USGS STAC API is failing

USGS STAC API: https://ibhoyw8md9.execute-api.us-west-2.amazonaws.com/prod

It seems that PySTAC Client is testing against an older version of the conformances than USGS has implemented!

Edit: this was with 0.1.1 of pystac-client

ConformanceError                          Traceback (most recent call last)
<ipython-input-2-667e3e4574de> in <module>
----> 1 client = pystac_client.Client.open("https://ibhoyw8md9.execute-api.us-west-2.amazonaws.com/prod")

/usr/local/lib/python3.9/site-packages/pystac_client/client.py in open(cls, url, headers)
     96         old_read_text_method = STAC_IO.read_text_method
     97         STAC_IO.read_text_method = read_text_method
---> 98         catalog = cls.from_file(url)
     99         STAC_IO.read_text_method = old_read_text_method
    100         catalog.headers = headers

/usr/local/lib/python3.9/site-packages/pystac/stac_object.py in from_file(cls, href)
    494             o = STAC_IO.stac_object_from_dict(d, href=href)
    495         else:
--> 496             o = cls.from_dict(d, href=href)
    497 
    498         # Set the self HREF, if it's not already set to something else.

/usr/local/lib/python3.9/site-packages/pystac_client/client.py in from_dict(cls, d, href, root)
    132         d.pop('stac_version')
    133 
--> 134         catalog = cls(
    135             id=id,
    136             description=description,

/usr/local/lib/python3.9/site-packages/pystac_client/client.py in __init__(self, id, description, title, stac_extensions, extra_fields, href, catalog_type, conformance, headers)
     66         if conformance is not None and not self.conforms_to(ConformanceClasses.STAC_API_CORE):
     67             allowed_uris = "\n\t".join(ConformanceClasses.STAC_API_CORE.all_uris)
---> 68             raise ConformanceError(
     69                 'API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following '
     70                 f'URIs to conform (preferably the first):\n\t{allowed_uris}.')

ConformanceError: API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following URIs to conform (preferably the first):
	https://api.stacspec.org/v1.0.0-beta.1/core
	http://stacspec.org/spec/api/1.0.0-beta.1/core.

Running stac-client with no arguments should give defined error and usage rather than exception

I ran stac-client with no arguments, and got the following:

$ stac-client
Traceback (most recent call last):
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/bin/stac-client", line 8, in <module>
    sys.exit(cli())
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac_client/cli.py", line 127, in cli
    loglevel = args.pop('logging')
KeyError: 'logging'
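
For illustration, a sketch of one way to get a usage message instead of a traceback. The parser layout below (a `--logging` option and a single `search` subcommand) is an assumption made for the example, not the real stac-client argument set.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="stac-client")
    parser.add_argument("--logging", default="INFO", help="log level")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("search", help="hypothetical subcommand for this sketch")
    return parser


def cli(argv=None):
    parser = build_parser()
    args = vars(parser.parse_args(argv))
    if args.get("command") is None:
        # No subcommand given: print usage and return a non-zero exit code.
        parser.print_usage(sys.stderr)
        return 2
    # pop with a default, so a missing key can never raise KeyError.
    loglevel = args.pop("logging", "INFO")
    return 0
```

Calling `cli([])` prints the usage line to stderr and returns 2; `cli(["search"])` returns 0.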

fields extension

Implement fields extension

error with max_items and limit set against earth-search

The following code throws the exception below with pystac_client==0.1.1

The issue seems to be related to limit or pagination. There are 880 results. When I change limit to 500, it runs successfully.

I see now that the server is returning a non-200 status code, but it's not clear if it's a problem with the server or the request, or how pystac-client is handling the error. @matthewhanson

from pystac_client import Client

catalog = Client.open("https://earth-search.aws.element84.com/v0")

mysearch = catalog.search(
    collections=['sentinel-s2-l2a-cogs'], 
    datetime = "2021-04-25T00:00:00Z/2021-04-25T02:00:00Z",
    max_items=1000,
    limit=1000
)
print(f"{mysearch.matched()} items found")

items = mysearch.items_as_collection()

Exception

---------------------------------------------------------------------------
APIError                                  Traceback (most recent call last)
<ipython-input-14-d789a1ae9ce9> in <module>
     13 print(f"{mysearch.matched()} items found")
     14 
---> 15 items = mysearch.items_as_collection()
     16 items.save('items.json')

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_search.py in items_as_collection(self)
    406         item_collection : ItemCollection
    407         """
--> 408         return ItemCollection(self.items())

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_collection.py in __init__(self, features)
     19 
     20         features = features or []
---> 21         self.features = [f.clone() for f in features]
     22         self.links = []
     23         for f in self.features:

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_collection.py in <listcomp>(.0)
     19 
     20         features = features or []
---> 21         self.features = [f.clone() for f in features]
     22         self.links = []
     23         for f in self.features:

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_search.py in items(self)
    391 
    392         try:
--> 393             yield from it.islice(_paginate(), self._max_items)
    394         except HTTPError as e:
    395             if e.code == 405:

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_search.py in _paginate()
    387         """
    388         def _paginate():
--> 389             for item_collection in self.item_collections():
    390                 yield from item_collection.features
    391 

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/item_search.py in item_collections(self)
    372         request = deepcopy(self.request)
    373 
--> 374         for page in get_pages(session=self.session,
    375                               request=request,
    376                               next_resolver=self._next_resolver):

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/stac_io.py in get_pages(session, request, next_resolver)
    168     while True:
    169         # Yield all items
--> 170         page = make_request(session, request)
    171         yield page
    172 

/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/pystac_client/stac_io.py in make_request(session, request, additional_parameters)
     46     resp = session.send(prepped)
     47     if resp.status_code != 200:
---> 48         raise APIError(resp.text)
     49     return resp.json()
     50 

APIError: {"message": "Internal server error"}

Support /collections endpoint

move to pystac

I'm wondering if it makes sense to move this to pystac. I think it makes sense because there aren't any additional dependencies besides pystac itself.

cc @lossyrob

Enhancement: Support searching static catalogs

Based on input from @matthewhanson and @TomAugspurger over in intake/intake-stac#95 (comment),

Currently pystac-client will raise an APIError if there is no rel=search link (but it is intended to be a client for both static catalogs and APIs).

It would indeed be nice if pystac-client had the ability to search static catalogs in addition to API endpoints. There are a couple of use-cases for this:

Filtering pystac-client search results further (i.e. a local results.json file with an ItemCollection) without having to issue new requests to an API endpoint.
Searching static STAC catalogs that do not have an API endpoint set up.

It would be great if the "local search" function used the same syntax as an API search. Perhaps adding an optional dependency on geopandas would make this easy to implement?
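
To make the idea concrete, here is a dependency-free sketch of a "local search" over an ItemCollection that has been loaded as a dict. `search_local` and `bbox_intersects` are hypothetical names; the keyword arguments only mimic the API search syntax, and the bbox test assumes 2D boxes that do not cross the antimeridian.

```python
def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (2D only)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_local(item_collection, bbox=None, collections=None, max_items=None):
    """Filter the features of an ItemCollection dict without any API request."""
    matches = []
    for feature in item_collection.get("features", []):
        if max_items is not None and len(matches) >= max_items:
            break
        if collections is not None and feature.get("collection") not in collections:
            continue
        if bbox is not None:
            feature_bbox = feature.get("bbox")
            if feature_bbox is None or not bbox_intersects(feature_bbox, bbox):
                continue
        matches.append(feature)
    return matches
```

A geopandas-based version would handle real geometries rather than bounding boxes, at the cost of the extra dependency mentioned above.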

Read collections from rel='data'

Any plan to read collections from link with rel='data' ? I mean the /collections endpoint

Use new ItemCollection STAC extension

Incorporate ItemCollection extension (doesn't yet exist).

Alternately, this might be better to include into PySTAC directly.

Add badges to README

Add badges in the README as in PySTAC

Client.get_collections_list throws type error

Code:

catalog: Client = Client.open('https://franklin.nasa-hsi.azavea.com/')
catalog.get_collections_list()

Error:

Traceback (most recent call last):
...
  File "/Users/philvarner/code/stac-api-validation-suite/stac_api_validation_suite/__init__.py", line 65, in validate_api
    for collection in catalog.get_collections_list():
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac_client/client.py", line 160, in get_collections_list
    return self.get_child_links()
TypeError: get_child_links() missing 1 required positional argument: 'self'

Relevant code:

in client.py:

    @classmethod
    def get_collections_list(self):
        """Gets list of available collections from this Catalog. Alias for get_child_links since children
            of an API are always and only ever collections
        """
        return self.get_child_links()

calling this in pystac/catalog.py:

    def get_child_links(self):
        """Return all child links of this catalog.

        Return:
            List[Link]: List of links of this catalog with ``rel == 'child'``
        """
        return self.get_links('child')
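
The failure can be reproduced without pystac at all. In the sketch below, `Catalog`, `BrokenClient` and `FixedClient` are toy classes invented for the example: under `@classmethod` the first argument is the class, so calling an instance method through it passes no instance, which is the TypeError in the traceback above.

```python
class Catalog:
    def __init__(self, links):
        self.links = links

    def get_child_links(self):
        return [link for link in self.links if link["rel"] == "child"]


class BrokenClient(Catalog):
    @classmethod
    def get_collections_list(cls):
        # cls is the class here, so this calls the plain function with no 'self'.
        return cls.get_child_links()


class FixedClient(Catalog):
    def get_collections_list(self):
        # A regular instance method: 'self' is bound as expected.
        return self.get_child_links()
```

Dropping the decorator is the whole fix in this toy version.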

Should API inherit from Catalog?

The API class currently inherits from pystac.Catalog. This was originally done because an API landing page has to be a valid STAC Catalog according to the spec, and because it made all of the typical methods of a pystac.Catalog immediately available, which seemed convenient.

However, pystac.Catalog has a bunch of methods used for writing that really don't apply to the API class. One option would be to override these so that they are either no-ops or raise a NotImplementedError. Another option would be to use composition rather than inheritance and have the landing page be an attribute within a completely separate API class.

Opening this issue to start a discussion on what the best approach would be for this.

CLI behavior

Looking for feedback on the behavior of the CLI.

Right now the default is to hit the API with limit=0 and print the total number matched. To actually fetch the items you specify either --stdout or --save. This allows piping of the output to something else such as jq or stacterm.

The requirement of --stdout to print out the results doesn't seem consistent with how most CLIs work.

Having the default be total matches only is nice to prevent people from making requests on a lot of items. But maybe it's better to have --matched be the keyword and prevent mistakes in another way (e.g., confirmation, default max_items).

Also --save is redundant since you can just pipe to a file. Is it worth keeping, or just update docs to show piping to an output file?

So the question is: do we fetch items by default (and have a --matched switch), or return only the match count by default (and have a --fetch switch)?

Support for STAC API 1.0.0-beta.2

https://tamn.snapplanet.io/ uses 1.0.0-beta.2 in its conformance classes, and fails with:

Error https://tamn.snapplanet.io/: <class 'pystac_client.exceptions.ConformanceError'> API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following URIs to conform (preferably the first):
        https://api.stacspec.org/v1.0.0-beta.1/core
        http://stacspec.org/spec/api/1.0.0-beta.1/core.

It would be nice to validate against either beta.1 or beta.2. I recognize this is complex to implement, but it would be nice to support.
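
One possible shape for a version-tolerant check, as a sketch only. The two beta.1 URIs are taken from the error message above; that a beta.2 API advertises the same URI with only the version segment changed is an assumption, and `conforms_to_core` is a hypothetical helper.

```python
import re

# Matches e.g. https://api.stacspec.org/v1.0.0-beta.1/core and, by assumption,
# the same URI shape for later beta/rc releases and the final 1.0.0.
CORE_URI = re.compile(r"https://api\.stacspec\.org/v1\.0\.0(?:-(?:beta|rc)\.\d+)?/core")
LEGACY_CORE_URIS = {"http://stacspec.org/spec/api/1.0.0-beta.1/core"}


def conforms_to_core(conforms_to):
    """True if any URI in the conformsTo list looks like a STAC API core class."""
    return any(
        uri in LEGACY_CORE_URIS or CORE_URI.fullmatch(uri) is not None
        for uri in conforms_to
    )
```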

Performance to construct a large ItemCollection

This issue documents some slowness on moderately large queries. In the snippet below we fetch 8,234 items. It takes about a minute to construct the results.

import rasterio.features
import pystac_client

area_of_interest = {
    "type": "Polygon",
        "coordinates": [
          [
            [
              -123.46435546875,
              46.4605655457854
            ],
            [
              -119.608154296875,
              46.4605655457854
            ],
            [
              -119.608154296875,
              48.26125565204099
            ],
            [
              -123.46435546875,
              48.26125565204099
            ],
            [
              -123.46435546875,
              46.4605655457854
            ]
          ]
        ]
}
bbox = rasterio.features.bounds(area_of_interest)
stac = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = stac.search(
    bbox=bbox,
    datetime="2016-01-01/2020-12-31",
    collections=["sentinel-2-l2a"],
    limit=2500,  # fetch items in batches of 2500
)

print(search.matched())  # 8234
items = list(search.items())

I ran list(search.items()) under snakeviz and came up with this result: https://gistcdn.rawgit.org/TomAugspurger/fb5b3bde8cee09d2d9aa2f7215edf2b2/94e4ec2ae97bec2169f9263e8f41183418e885d9/mosaic-static.html

A few notes:

We spend roughly 2/3 of our time in stac_io.get_pages, which includes IO and waiting for the endpoint (and maybe parsing the JSON into Python objects?)
We spend the other 1/3 of our time in item_collection.from_dict

Some ideas for optimization:

Most of the time in item_collection.from_dict is spent on a deepcopy in pystac.Item.from_dict. It might be safe to skip that copy (since these should be coming off the network with no other references) and provide a copy=False flag to pystac.Item.from_dict, to allow it to mutate the incoming dict.
Maybe pystac_client.Client or .search could provide a raw=True/False flag to allow skipping constructing pystac Items?
Maybe some kind of async magic would speed up the reads? Hard to say, since I don't know how much time is spent waiting for results vs. parsing JSON. I don't know if it's a good idea to parse JSON on the asyncio event loop.
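
To show what the copy=False idea trades away, here is a toy sketch. `item_from_dict` is a stand-in written for this example, not pystac.Item.from_dict; the point is only that skipping the deepcopy means the caller's dict gets mutated.

```python
from copy import deepcopy


def item_from_dict(d, copy=True):
    """Build a minimal item-like dict, optionally without copying the input."""
    if copy:
        d = deepcopy(d)  # protects the caller's dict, but is slow for large pages
    # pop() mutates d; with copy=False that is the caller's own object.
    return {"id": d.pop("id"), "properties": d.pop("properties", {})}
```

For dicts that were just parsed from a network response and have no other references, the mutation is harmless, which is the argument made in the first idea above.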

Error message should probably be an f-string

pystac-client/pystac_client/client.py

Line 70 in 2aa5ecf

 'API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following ' 

The error message looks like:

Error https://api.radiant.earth/mlhub/v1/: <class 'pystac_client.exceptions.ConformanceError'> API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following URIs to conform (preferably the first):
        https://api.stacspec.org/v1.0.0-beta.1/core
        http://stacspec.org/spec/api/1.0.0-beta.1/core.

I'm guessing this is supposed to be an f-string, but isn't.
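
A self-contained illustration of the guess above: when adjacent string literals are concatenated, only the pieces that carry the f prefix are interpolated. The `ConformanceClasses` stub and the URI list are placeholders for the example.

```python
class ConformanceClasses:
    STAC_API_CORE = "core"  # placeholder value for this sketch


allowed_uris = "\n\t".join(["https://api.stacspec.org/v1.0.0-beta.1/core"])

# First literal has no f prefix, so its braces are kept verbatim.
broken = (
    'API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following '
    f'URIs to conform (preferably the first):\n\t{allowed_uris}.'
)

# Both literals are f-strings, so both are interpolated.
fixed = (
    f'API does not conform to {ConformanceClasses.STAC_API_CORE}. Must contain one of the following '
    f'URIs to conform (preferably the first):\n\t{allowed_uris}.'
)
```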

Number of matched items with limit 0 invalid

Hi all,

according to the STAC API specification, the limit parameter needs to be between 1 and 10,000:
https://github.com/radiantearth/stac-api-spec/blob/dev/item-search/openapi.yaml#L200-L204

Within the matched function, the limit is set to 0 to just get the number of found features within a query request. This request results in an exception (e.g., when using the stac-fastapi (pgstac) API backend) because the API returns an HTTP status code not equal to 200, which is fine according to the STAC API specification.

pystac-client/pystac_client/item_search.py

Lines 373 to 375 in 8282ef3

    def matched(self) -> int:
        params = {**self._parameters, "limit": 0}
        resp = self._stac_io.read_json(self.url, method=self.method, parameters=params)

Maybe the limit parameter needs to be changed from 0 to 1?

Best
Jonas

Support /collections/<cid>/items endpoint

Deploy to PyPI

Set up GitHub Action to deploy to PyPI

Use STAC_URL environment variable in Client.open

Currently, only the command line can use the STAC_URL environment variable for automatically discovering the STAC endpoint. It'd be nice to use it with the Python Client class as well.

This should do the trick, however I don't have a good idea how to test it. Anyone have thoughts?

diff --git a/pystac_client/client.py b/pystac_client/client.py
index 0fc03c6..c8e684b 100644
--- a/pystac_client/client.py
+++ b/pystac_client/client.py
@@ -75,13 +75,13 @@ class Client(pystac.Catalog, STACAPIObjectMixin):
         return '<Catalog id={}>'.format(self.id)
 
     @classmethod
-    def open(cls, url, headers=None):
+    def open(cls, url=None, headers=None):
         """Alias for PySTAC's STAC Object `from_file` method
 
         Parameters
         ----------
-        url : str
-            The URL of a STAC Catalog
+        url : str, optional
+            The URL of a STAC Catalog. If not specified, this will use the `STAC_URL` environment variable.
 
         Returns
         -------
@@ -89,6 +89,12 @@ class Client(pystac.Catalog, STACAPIObjectMixin):
         """
         import pystac_client.stac_io
 
+        if url is None:
+            url = os.environ.get("STAC_URL")
+
+        if url is None:
+            raise TypeError("'url' must be specified or the 'STAC_URL' environment variable must be set.")
+
         def read_text_method(url):
             request = Request(url, headers=headers or {})
             return pystac_client.stac_io.read_text_method(request)
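
On the testing question, one option is to isolate the URL lookup and patch os.environ in the test. The sketch below assumes the lookup is factored into a small function (`resolve_url` is a hypothetical name) and uses unittest.mock from the standard library; with pytest, the monkeypatch fixture would do the same job.

```python
import os
from unittest import mock


def resolve_url(url=None):
    """Return the explicit url, else fall back to the STAC_URL environment variable."""
    if url is None:
        url = os.environ.get("STAC_URL")
    if url is None:
        raise TypeError("'url' must be specified or the 'STAC_URL' environment variable must be set.")
    return url


def test_uses_environment_variable():
    with mock.patch.dict(os.environ, {"STAC_URL": "https://example.com/stac"}):
        assert resolve_url() == "https://example.com/stac"


def test_explicit_url_wins():
    with mock.patch.dict(os.environ, {"STAC_URL": "https://example.com/stac"}):
        assert resolve_url("https://other.example/stac") == "https://other.example/stac"


def test_missing_everywhere_raises():
    # clear=True empties os.environ for the duration of the block.
    with mock.patch.dict(os.environ, {}, clear=True):
        try:
            resolve_url()
        except TypeError:
            return
        raise AssertionError("expected TypeError")
```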

Virtual environment + build tools

Using poetry is nice for a lot of reasons, but might end up being a barrier to some contributors if it's not part of their workflow already. Bringing the build and environment tools for this library in line with what PySTAC uses might encourage more contribution.

Update to use PySTAC 1.0.X

PySTAC 1.0.X has several large changes:

stac io is now a class
extensions have been refactored

pystac-client needs to use the new StacIO class.
pystac-client extensions, which were based on the old PySTAC extension mechanic, need to be reworked

Add automated test suite for testing API compliance

Update docs

Several of the docs are no longer applicable (e.g., extensions) and/or need to be updated to account for changes in the way conformance is handled, extensions, and naming of the Client class.

`Client.get_collections_list` shouldn't be a classmethod?

catalog = Client.open('https://earth-search.aws.element84.com/v0')
catalog.get_collections_list()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-ade3012ce385> in <module>
----> 1 catalog.get_collections_list()

~/Library/Caches/pypoetry/virtualenvs/stackstac-talk-YcC8wOkC-py3.9/lib/python3.9/site-packages/pystac_client/client.py in get_collections_list(self)
    158             of an API are always and only ever collections
    159         """
--> 160         return self.get_child_links()
    161 
    162     def search(self,

TypeError: get_child_links() missing 1 required positional argument: 'self'

get_collections_list is a classmethod, but it seems like it's not meant to be?

pystac-client/pystac_client/client.py

Lines 155 to 160 in 77d95a2

    @classmethod
    def get_collections_list(self):
        """Gets list of available collections from this Catalog. Alias for get_child_links since children
            of an API are always and only ever collections
        """
        return self.get_child_links()

Version: 0.1.1

Search datetime parameter allows invalid RFC3339 datetimes

The datetime regex used for the search parameter datetime has a few problems wrt RFC 3339 datetimes.

DATETIME_REGEX = re.compile(
    r"(?P<year>\d{4})(-(?P<month>\d{2})(-(?P<day>\d{2})(?P<remainder>T\d{2}:\d{2}:\d{2}\w*)?)?)?")

T or t can separate the date and time (this is very confusing in the spec, but only these two characters are allowed, see https://stackoverflow.com/questions/63783868/what-are-valid-date-time-separators-in-rfc3339-strings for a clear description)
fractional seconds are not matched (b/c \w doesn't match the .)
\w* is overly permissive to match optional fractional seconds and timezone (Z or (-|+)\d\d:\d\d)

Copying this from my personal notes on RFC 3339 datetimes:

RFC 3339 is a profile of ISO 8601, adding these constraints:

a complete representation of date and time (fractional seconds optional).
requires 4-digit years
only allows a period character to be used as the decimal point for fractional seconds
requires the zone offset to be Z or like +00:00, while ISO8601 allows like +0000

These are a few examples of what would be allowed for ISO8601 but not RFC 3339:

1985-04-12
1937-01-01T12:00:27.87+0100
37-01-01T12:00:27.87+0100

Below are all valid RFC 3339 datetimes. Note the fractional seconds, Z or z as a timezone, positive and negative arbitrary offset timezones, and T or t as the separator between date and time.

1985-04-12T23:20:50.52Z
1996-12-19T16:39:57-08:00
1990-12-31T23:59:60Z
1990-12-31T15:59:60-08:00
1937-01-01T12:00:27.87+01:00
1985-04-12T23:20:50.52Z
1937-01-01T12:00:27.8710+01:00
1937-01-01T12:00:27.8+01:00
1937-01-01T12:00:27.8Z
1985-04-12t23:20:50.5202020z
2020-07-23T00:00:00Z
2020-07-23T00:00:00.0Z
2020-07-23T00:00:00.01Z
2020-07-23T00:00:00.012Z
2020-07-23T00:00:00.0123Z
2020-07-23T00:00:00.01234Z
2020-07-23T00:00:00.012345Z
2020-07-23T00:00:00.000Z
2020-07-23T00:00:00.000+03:00
2020-07-23T00:00:00+03:00
2020-07-23T00:00:00.000+03:00
2020-07-23T00:00:00.000z

I think this is the correct regex for an RFC 3339 datetime:

r"^(\d\d\d\d)\-(\d\d)\-(\d\d)(T|t)(\d\d):(\d\d):(\d\d)(\.\d+)?(Z|z|([-+])(\d\d):(\d\d))$"

This is slightly different from the one in python-strict-rfc3339, as it allows T or t for the sep, and reverses +- and -+ so that the - doesn't need to be escaped with \-

Matching the datetime string to this will ensure it is a valid RFC 3339 (not just an ISO 8601 datetime), and then an ISO8601 parser can be used to parse it further if need be.
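
A quick way to check a pattern of this shape against the examples in this issue. The sketch uses character classes in place of the alternations and explicitly accepts a lowercase z zone designator, since the list of valid datetimes includes values ending in z. `is_rfc3339` is a hypothetical helper name, and the check is purely syntactic: it does not reject out-of-range values such as month 13.

```python
import re

RFC3339_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"   # full date with a 4-digit year
    r"[Tt]"                      # separator: T or t
    r"(\d{2}):(\d{2}):(\d{2})"   # hours, minutes, seconds
    r"(\.\d+)?"                  # optional fractional seconds, period only
    r"([Zz]|[-+]\d{2}:\d{2})"    # zone: Z, z, or an offset like +00:00
)


def is_rfc3339(value: str) -> bool:
    """True if value is syntactically an RFC 3339 datetime."""
    # fullmatch avoids the '$' quirk of also matching before a trailing newline.
    return RFC3339_DATETIME.fullmatch(value) is not None
```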

The built-in Python datetime library is not sufficient to parse all valid datetimes here -- notably, it doesn't parse Z as a timezone.

There are two options for this:

pyiso8601 - https://github.com/micktwomey/pyiso8601
dateutil - https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.isoparse

Additionally, hypothesis-jsonschema has support for generating dt's for testing: https://github.com/Zac-HD/hypothesis-jsonschema/blob/1c5f107230ccbd48c66d7c6693833745a598e294/src/hypothesis_jsonschema/_from_schema.py

Invalid Search.matched requests should raise an exception?

Currently, if I provide an invalid parameter to .search and call .matched(), a warning is printed.

>>> api = pystac_client.API.open("https://planetarycomputer-staging.microsoft.com/api/stac/v1/")

>>> search = api.search(
...     collections=["sentinel-2-l2a"],
...     bbox=[-93.112301, 29.649001, -92.075965, 30.719868],
...     datetime="2019-07-01/2020-06-30",
... )

>>> search.matched()
numberMatched or context.matched not in response  # this is the warning.

If I make that request explicitly, we'll see that the server returned a 422

import requests

payload = dict(
    collection=["sentinel-2-l2a"],
    bbox=[-93.112301, 29.649001, -92.075965, 30.719868],
    datetime="2019-07-01/2020-06-30",
)
>>> r = requests.post("https://planetarycomputer-staging.microsoft.com/api/stac/v1/search", json=payload)
>>> r.status_code
422

With this body

{'detail': [{'loc': ['body', 'datetime'],
   'msg': 'Invalid datetime, must match format (%Y-%m-%dT%H:%M:%SZ).',
   'type': 'value_error'}]}

That information should be surfaced to the user, ideally in an exception.
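
A sketch of what surfacing it could look like. The `APIError` class below is defined locally so the example runs standalone, and `parse_search_response` is a hypothetical helper that takes the status code and body text rather than a live response object.

```python
import json


class APIError(Exception):
    """Local stand-in: carries the HTTP status and the server's error detail."""

    def __init__(self, status_code, detail):
        super().__init__(f"Search request failed with status {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def parse_search_response(status_code, body_text):
    """Return the parsed JSON body on 200, otherwise raise APIError with the detail."""
    if status_code == 200:
        return json.loads(body_text)
    try:
        detail = json.loads(body_text)
    except ValueError:
        detail = body_text  # body was not JSON; keep the raw text
    raise APIError(status_code, detail)
```

With the 422 response shown above, the exception message would include the "Invalid datetime" detail instead of a generic warning.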

(as a tangent, I can't tell from the spec if 2019-07-01/2020-06-30 is a valid datetime value or if it has to be 2019-07-01T00:00:00Z/2020-06-30T00:00:00Z)

Add tutorials

Add jupyter notebook tutorials

Support context fragment in `ItemSearch.matched`

Currently, ItemSearch.matched requires the numberMatched field, but does not understand the STAC context fragment.

Can the client be updated to accept either field?
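
Accepting either field could be as small as the sketch below. `matched_count` is a hypothetical helper operating on the parsed response dict; returning None when neither field is present is an assumption about the desired behaviour.

```python
def matched_count(response: dict):
    """Return the match count from numberMatched or context.matched, else None."""
    if "numberMatched" in response:
        return response["numberMatched"]
    # 'context' may be missing or null in some responses.
    context = response.get("context") or {}
    return context.get("matched")
```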

Can not import 'API'

Hi,

Apologies if this is a dull question. But I can't find any solution or workaround elsewhere.

I was trying the example in the docs: https://pystac-client.readthedocs.io/en/latest/usage/stac_api.html

But encountered errors in the very beginning. Please see below:

from pystac_client.Client import API
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pystac_client.Client'

And:

from pystac_client import API
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'API' from 'pystac_client' (/Applications/anaconda3/envs/deep/lib/python3.8/site-packages/pystac_client/__init__.py)

I tried it on both the base and virtual envs but they have the same error message.

The pystac-client package was installed by the command in doc: pip install git+https://github.com/stac-utils/pystac-client.git#egg=pystac_client

My python version is 3.7

Can someone help me with this issue? Appreciate it.

Add query extension for searching properties

Update Conformance logic

Combine Classes STACAPIObjectMixin and Conformance

The STACAPIObjectMixin is a class that takes in a Conformance class to determine conformance. Combine the logic into a single Conformance class that can be added to other classes, currently:

Client: Used to determine basic required conformance initially
Search: Used to determine conformance of search endpoints/parameters

Conformance is not currently used, and the code for handling non-conformance needs to be added/updated.

I see this working as follows:

you can always open any STAC catalog-like object with Client
Most (all?) functions added in the Client class (which inherits from PySTAC Catalog) should require a conformance check. If non-conformant a ConformanceError is thrown with details.
If there is no conformsTo array, then assume this is a static catalog. Client will functionally work as a PySTAC Catalog, but most (all?) functions should raise a ConformanceError
There should be an override parameter, ignore_conformance, to the Client.open() function
Conformance is also used to determine if certain parameters to the ItemSearch class are allowed. Thus, the ItemSearch class should be passed a ConformanceClass that can be used to check if specific query parameters are allowed (e.g., if ItemSearch gets a field kwarg and the field conformance is not present, a ConformanceError should be thrown).

`simple_stac_resolver` does not handle requests properly

simple_stac_resolver does not handle the original_request: requests.Request argument correctly, accessing it like a dictionary with original_request.get() instead of using attribute access:

Both cause the function to fail (taken from the docstrings, tested on python 3.8):

import json
import requests
from pystac_api.stac_io import simple_stac_resolver

original_request = requests.Request(
    method='POST',
    url='https://stac-api/search',
    data=json.dumps({'collections': ['my-collection']}).encode('utf-8'),
    headers={'x-custom-header': 'hi-there'}
)

next_link = {
     'href': 'https://stac-api/search?next=sometoken',
     'rel': 'next'
}

next_request = simple_stac_resolver(next_link, original_request)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-3727c92cfc17> in <module>
     14      'rel': 'next'
     15 }
---> 16 next_request = simple_stac_resolver(next_link, original_request)

/anaconda/envs/geo/lib/python3.8/site-packages/pystac_api/stac_io.py in simple_stac_resolver(link, original_request)
    117     # If the link object includes a "headers" property, use that and respect the "merge" property.
    118     link_headers = link.get('headers')
--> 119     headers = original_request.get('headers', {})
    120     if link_headers is not None:
    121         headers = {**headers, **link_headers} if merge else link_headers

AttributeError: 'Request' object has no attribute 'get'
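
One possible fix, sketched under assumptions: read the headers through attribute access rather than `.get()`. The helper name `resolve_headers` is hypothetical, and the default of `False` for the link's `merge` property is an assumption, since the traceback does not show how `merge` is computed. A `SimpleNamespace` stands in for `requests.Request` (which exposes a `headers` attribute) so the sketch runs without the `requests` package installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def resolve_headers(link: Dict[str, Any], original_request: Any) -> Dict[str, str]:
    """Build the headers for the next request from a link and the original request."""
    # Attribute access: a Request object has a `headers` attribute, not `.get()`.
    headers = dict(getattr(original_request, "headers", None) or {})
    link_headers = link.get("headers")
    merge = bool(link.get("merge", False))  # assumed default when "merge" is absent
    if link_headers is not None:
        # Merge with the original headers, or replace them entirely
        headers = {**headers, **link_headers} if merge else dict(link_headers)
    return headers


# Stand-in object with the same `headers` attribute as requests.Request
original_request = SimpleNamespace(headers={"x-custom-header": "hi-there"})
next_link = {"href": "https://stac-api/search?next=sometoken", "rel": "next"}

print(resolve_headers(next_link, original_request))
```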

Documentation on how to use with NASA CMR STAC endpoint

I'm trying to migrate to pystac-client from https://github.com/sat-utils/sat-search. Specifically, I'd like to use NASA's CMR STAC endpoint, but I'm unable to get any searches to work using a bbox parameter, which is pretty critical! Is this a bug? Or am I missing something important with formatting these keyword arguments?

For example, the following search does work with sat-search:

from satsearch import Search
results = Search(url='https://cmr.earthdata.nasa.gov/stac/LPCLOUD',
                 collections=['HLSS30.v1.5'], 
                 #bbox = [-122.4,41.3,-122.1,41.5], #SatSearchError: "Request failed with status code 400"
                 bbox = '-122.4,41.3,-122.1,41.5', #works! 
                 datetime='2021-01-01/2021-02-01',  
                 #sortby='-properties.datetime' # results in 0 found but no error raised
                )
results.found()

But trying to replicate this does not work with pystac-client.

# Works if bbox commented out
catalog = Client.open("https://cmr.earthdata.nasa.gov/stac/LPCLOUD")
results = catalog.search(collections=['HLSS30.v1.5'],
                         #bbox = [-122.4,41.3,-122.1,41.5], 
                         bbox = '-122.4,41.3,-122.1,41.5',
                         datetime='2021-01-01/2021-02-01',
                         )
print(f"{results.matched()} items found")

---------------------------------------------------------------------------
APIError                                  Traceback (most recent call last)
<ipython-input-124-b65c76d02221> in <module>
      6                          datetime='2021-01-01/2021-02-01',
      7                          )
----> 8 print(f"{results.matched()} items found")

~/miniconda3/envs/hyp3/lib/python3.9/site-packages/pystac_client/item_search.py in matched(self)
    351 
    352     def matched(self) -> int:
--> 353         resp = make_request(self.session, self.request, {"limit": 0})
    354         found = None
    355         if 'context' in resp:

~/miniconda3/envs/hyp3/lib/python3.9/site-packages/pystac_client/stac_io.py in make_request(session, request, additional_parameters)
     46     resp = session.send(prepped)
     47     if resp.status_code != 200:
---> 48         raise APIError(resp.text)
     49     return resp.json()
     50 

APIError: "Request failed with status code 400"

I suspect this is a formatting problem, but the docs do state that bbox can be passed as a string:

pystac-client/pystac_client/item_search.py

Line 111 in 77d95a2

bbox: list or tuple or Iterator or str, optional
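
For illustration only, here is one way a client could normalize the accepted bbox inputs (string, list, tuple, or iterator) into a single form before building the request. The `normalize_bbox` name is hypothetical and this is not how pystac-client is known to handle it. Whether the CMR endpoint then accepts the result is a separate question this sketch does not answer.

```python
from typing import Iterable, Tuple, Union

BBoxLike = Union[str, Iterable[float]]


def normalize_bbox(bbox: BBoxLike) -> Tuple[float, ...]:
    """Convert a comma-separated string or any iterable into a tuple of floats."""
    if isinstance(bbox, str):
        parts = bbox.split(",")
    else:
        parts = list(bbox)
    values = tuple(float(p) for p in parts)
    # A bbox has 4 values (2D) or 6 values (3D)
    if len(values) not in (4, 6):
        raise ValueError(f"bbox must have 4 or 6 values, got {len(values)}")
    return values


# Both input styles from the report yield the same tuple
from_string = normalize_bbox("-122.4,41.3,-122.1,41.5")
from_list = normalize_bbox([-122.4, 41.3, -122.1, 41.5])
```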

The client can handle this logic and be explicit about the datetimes it requests.
Users should be able to specify:

A single year, which should translate to a datetime range encompassing the whole year
A range of years, which should translate to the beginning of year1 through the end of year2
A single month or a range of months (2020-01/2020-02), which should act as above
A single, fully specified datetime, which should be left unaltered
A single date, which should translate to a range encompassing the whole day
A date range, which should translate to the beginning of date1 through the end of date2
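
The rules above could be sketched roughly as follows. This is an assumption-laden illustration, not the client's implementation: `expand_datetime` and `_expand` are hypothetical names, all values are treated as UTC, full datetimes are accepted only in the `YYYY-MM-DDTHH:MM:SSZ` form, and the end of a period is taken to be its last whole second.

```python
import calendar
import re
from datetime import datetime
from typing import Tuple

FMT = "%Y-%m-%dT%H:%M:%SZ"


def _expand(component: str) -> Tuple[datetime, datetime]:
    """Return the first and last second covered by a year, month, date, or datetime."""
    if re.fullmatch(r"\d{4}", component):
        year = int(component)
        return datetime(year, 1, 1), datetime(year, 12, 31, 23, 59, 59)
    if re.fullmatch(r"\d{4}-\d{2}", component):
        year, month = map(int, component.split("-"))
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        return datetime(year, month, 1), datetime(year, month, last_day, 23, 59, 59)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", component):
        day = datetime.strptime(component, "%Y-%m-%d")
        return day, day.replace(hour=23, minute=59, second=59)
    # Fully specified datetime: start and end are the same instant
    instant = datetime.strptime(component, FMT)
    return instant, instant


def expand_datetime(value: str) -> str:
    """Expand a partial datetime or a 'start/end' range into an explicit string."""
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"expected at most one '/', got {value!r}")
    start = _expand(parts[0])[0]
    end = _expand(parts[-1])[1]
    if start == end:
        return start.strftime(FMT)  # a single complete datetime is left unaltered
    return f"{start.strftime(FMT)}/{end.strftime(FMT)}"


for example in ("2020", "2020-01/2020-02", "2020-06-15", "2020-06-15T12:30:00Z"):
    print(example, "->", expand_datetime(example))
```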

Update documentation with examples showing all the possible inputs

def matched(self) -> int:
    params = {**self._parameters, "limit": 0}
    resp = self._stac_io.read_json(self.url, method=self.method, parameters=params)

@classmethod
def get_collections_list(self):
    """Gets list of available collections from this Catalog. Alias for get_child_links since children
    of an API are always and only ever collections
    """
    return self.get_child_links()
