stac-utils / stac-server
A Node-based STAC API, AWS Serverless, OpenSearch
License: MIT License
The OpenAPI docs appear to be out of date relative to the latest stac-spec (0.9.0) and need to be updated.
openapi-merge looks like a good tool for assembling the spec from the documents in stac-api-spec:
https://www.npmjs.com/package/openapi-merge
https://github.com/robertmassaioli/openapi-merge
(from Matt H.)
Add an /api endpoint that returns the OpenAPI spec for this API.
We can create a new yaml file based on
https://github.com/radiantearth/stac-spec/blob/master/api-spec/STAC-extensions.yaml
Follow instructions here
https://github.com/radiantearth/stac-spec/blob/master/api-spec/README.md#openapi-definitions
to create an API doc with core + all extensions but without the transaction extension.
According to the AWS CloudFormation docs, AWS::SNS::TopicPolicy.Topics should be an array of strings. However, in the current serverless.yml, Topics is a single string.
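A sketch of the corrected shape, assuming a topic resource named ingestTopic (the resource name and policy statement here are illustrative, not the project's actual config):

```yaml
resources:
  Resources:
    IngestTopicPolicy:
      Type: AWS::SNS::TopicPolicy
      Properties:
        # Topics must be a list of topic ARNs, not a single string
        Topics:
          - Ref: ingestTopic
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Principal: "*"
              Action: sns:Subscribe
              Resource:
                Ref: ingestTopic
```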
Check and implement any missing functionality to be fully compliant with the OGC API - Features 1.0 specification.
The API currently runs Elasticsearch date queries only against properties.datetime. Extend this to also support properties.created and properties.updated.
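A minimal sketch of what the extension could look like, with the queried field passed in as a parameter (the function name is hypothetical, not from the codebase):

```javascript
// Build an Elasticsearch range query against any datetime-like property.
// Today only properties.datetime is queried; properties.created and
// properties.updated would be passed in the same way.
function dateRangeQuery(field, start, end) {
  return { range: { [field]: { gte: start, lte: end } } };
}
```

For example, dateRangeQuery('properties.created', '2020-01-01T00:00:00Z', '2020-12-31T23:59:59Z') produces the same query shape currently built for properties.datetime.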
Migrate CI to GitHub Actions instead of CircleCI.
The sort extension (implemented here: https://github.com/stac-utils/stac-api/blob/develop/libs/es.js#L357) now has two different forms: the normal POST JSON format, and a new single parameter, sortby, which can be used in GET or POST. stac-api should support both.
See additional details here:
https://github.com/radiantearth/stac-spec/tree/dev/api-spec/extensions/sort#get-or-post-form
Also note that the parameter is now sortby rather than sort.
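A sketch of a converter between the two forms, assuming the GET form is a comma-separated list with an optional +/- direction prefix as described in the linked spec:

```javascript
// Convert the GET `sortby` string form into the POST JSON form.
// GET:  sortby=-properties.datetime,+id
// POST: { "sortby": [{ "field": "properties.datetime", "direction": "desc" }, ...] }
function parseSortby(sortby) {
  return sortby.split(',').map((part) => {
    const direction = part.startsWith('-') ? 'desc' : 'asc';
    const field = part.replace(/^[+-]/, '');
    return { field, direction };
  });
}
```

Normalizing the GET form into the POST form early means the rest of the query-building code only has to handle one representation.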
Currently the primary key in the Elasticsearch backend is the Item ID.
However, if the server has an item with the same ID across multiple collections, this causes a problem.
For instance, a sentinel-s2-l2 collection and a sentinel-s2-l2-cogs collection that mirrors the first but with COGs will have the same Item IDs. This is not possible within the same stac-server.
Instead the primary key should be something like collectionId_itemId.
This has repercussions for the transaction extension (see #37): if either ID is edited during an update, a new item will need to be added and the old one removed.
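A sketch of the composite key (the separator is an assumption; any delimiter that cannot appear in a collection id would work):

```javascript
// Derive a composite Elasticsearch document id so the same Item id can
// exist in multiple collections without colliding.
function esDocId(collectionId, itemId) {
  return `${collectionId}_${itemId}`;
}
```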
The search link at the root endpoint must have a type of application/geo+json
When one or more Items fail ingestion into Elasticsearch during a bulk write, the returned error message is long and unhelpful: it is one big string containing info from every 'doc' in the batch. Here's a sample:
2019-01-22T20:51:11.320Z - error: _index=items, _type=doc, _id=LC80230012015141LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3705, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015125LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3690, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015109LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3789, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015093LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3706, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015077LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3691, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102016096LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3600, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102015253LGN02, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3601, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102014106LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3604, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102014234LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3602, _primary_term=1, status=201,
So each record starts with "_index" and ends with "status"; records that succeeded have "failed=0".
However, a failed record actually looks different:
_index=items, _type=doc, _id=LC80231212013343LGN00, status=400, type=mapper_parsing_exception, reason=failed to parse [geometry], type=parse_exception, reason=invalid number of points in LinearRing (found [1] - must be >= [4]),
It doesn't contain "successful", "failed", or the other fields; instead it has status=400 and a "reason", which is the actual error.
We don't want to log the entire string; instead we want to log each item in the batch that failed, if any.
Only the "_id" field and the error's "reason" field need to be logged.
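A sketch of that filtering, assuming access to the parsed bulk response rather than the flattened string (the standard Elasticsearch bulk response has the shape { items: [{ index: { _id, status, error } }] }):

```javascript
// Pull only the failed docs out of an Elasticsearch bulk response,
// keeping just the _id and the error reason for logging.
function failedDocs(bulkResponse) {
  return bulkResponse.items
    .map((entry) => entry.index || entry.create || entry.update || entry.delete)
    .filter((doc) => doc && doc.status >= 400)
    .map((doc) => ({ id: doc._id, reason: doc.error && doc.error.reason }));
}
```

Logging failedDocs(response) instead of the whole response string yields one short line per failed Item and nothing when the batch fully succeeds.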
(via Matt H.)
The /collections endpoint currently returns just an array of collections, when the actual response should be an object containing both a links array and a collections array. An example of the expected response is here: https://gist.github.com/kbgg/ffdaedb329b3d7fb294c242a301fe9a8
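A minimal sketch of the expected shape (the collection id and href values here are illustrative):

```json
{
  "collections": [
    { "id": "sentinel-s2-l2a", "links": [] }
  ],
  "links": [
    { "rel": "self", "href": "https://stac.example.com/collections" }
  ]
}
```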
Provide an ability to specify a list of keywords on an Item to use in item searching. This allows the item to remain generalized but include indexable/searchable information for specific use cases. For example, an item could be used to represent a parcel, county, contract, easement or land use agreement.
This is important for applications using STAC to index remote sensing data (e.g., imagery) across a number of vertical markets. I.e., using the same properties schema but allowing arbitrary searchable keywords.
Usage examples:
3111822110030 (to find an item listing this ID as a keyword)
HEN (to find keywords starting with "Hen")
Add a Lambda function that can be invoked on a single collection.
Use elasticsearch aggregations to automatically generate summaries and update the collection metadata.
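A sketch of what such an aggregation request could look like (the field names and summarized properties are illustrative, not the actual mapping):

```javascript
// Aggregation request to derive collection summaries: numeric stats for
// cloud cover plus the temporal extent of the collection's items.
// size: 0 skips returning hits since only the aggregations are needed.
const summaryAggs = {
  size: 0,
  query: { term: { collection: 'sentinel-s2-l2a' } },
  aggs: {
    cloud_cover: { stats: { field: 'properties.eo:cloud_cover' } },
    datetime_min: { min: { field: 'properties.datetime' } },
    datetime_max: { max: { field: 'properties.datetime' } }
  }
};
```

The Lambda would run this per collection and write the results into the collection's summaries field.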
stac-server currently converts POST requests to GET requests and uses that as the next link.
Instead it should follow the spec and return a next page token, along with the headers and body the client needs to resend to get the next page.
Can someone help me understand the AWS architecture that gets deployed by this repo? Thanks.
Related to #3.... like #22, the fields extension now has an alternate GET or POST format, as seen here: https://github.com/radiantearth/stac-spec/tree/dev/api-spec/extensions/fields
stac-api should support both.
When ingesting Collections or Catalogs use stac-node-validator to validate
Hi,
It would be really nice to have a package that exposes the library index (https://github.com/stac-utils/stac-server/blob/master/libs/index.js) so anyone can use the API implementation and deploy it as needed.
For example, we easily plugged the library into fastify (because we don't use the AWS cloud or have access to a FaaS), but currently we have to fork the repo in order to achieve this. Publishing the libs/index.js package would cleanly split the web server implementation/deployment from the main library, and maybe in the near future we could imagine plugin packages like stac-server-(express|fastify|hapi|serverless) that leverage the base library independently of the web server implementation used.
Anyway, thanks for this work! It gave us a great idea of what we can achieve with the STAC spec.
There should be an SNS topic deployed with stac-server.
And then there should be an option (controlled via an envvar) that will publish newly ingested items to the topic.
This allows users to subscribe and monitor new Items added to the server.
Furthermore, messages published to the topic should carry attributes that allow a subscriber to filter and receive only the messages meeting their criteria, e.g., "send me all new Items within this bounding box, for 2019, for collection sentinel-s2-l2a".
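For example, a subscriber interested only in one collection and one year could attach a filter policy like this (the attribute names are assumptions; SNS filter policies support exact-match and prefix matching on string attributes):

```json
{
  "collection": ["sentinel-s2-l2a"],
  "datetime": [{ "prefix": "2019-" }]
}
```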
The conformance classes Basic CQL2 and Basic Spatial Operators should be implemented. Basic CQL2 allows the logical operators (AND, OR, NOT), comparison operators (=, <>, <, <=, >, >=), and IS NULL against string, numeric, boolean, date, and datetime types. Basic Spatial Operators allows S_INTERSECTS (spatial intersects) on geometry fields.
Stretch: Advanced Comparison Operators defines the LIKE, BETWEEN, and IN operators, though BETWEEN and IN can be written less concisely using comparisons AND'ed or OR'ed together. LIKE cannot (practically) be worked around this way.
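Some example filters for these conformance classes (CQL2 text form; the property names are illustrative):

```text
-- Basic CQL2: comparisons, logical operators, IS NULL
properties.eo:cloud_cover <= 10 AND collection = 'sentinel-s2-l2a'
properties.view:sun_elevation IS NULL OR properties.gsd < 30

-- Basic Spatial Operators: S_INTERSECTS on a geometry field
S_INTERSECTS(geometry, POLYGON((0 0, 0 10, 10 10, 10 0, 0 0)))

-- Stretch (Advanced Comparison Operators)
id LIKE 'LC08%' AND properties.gsd BETWEEN 10 AND 30
```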
Other implementations:
Open Questions:
Fix the failing tests, and add new tests if needed
When the result for the collection/items endpoint is returned, it includes a link to fetch the next page of results which looks like this:
https://api.server.com/collections/some_collection/items?collections=some_collection&page=2&limit=10
When navigating to this link, this error is generated:
Unexpected token r in JSON at position 0
It looks like the collections parameter is expected to be a JSON array of strings rather than a single string, causing a JSON parsing error.
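A sketch of a tolerant parser that would accept either form (the function name is hypothetical):

```javascript
// Accept the `collections` query parameter as a JSON array, a bare
// comma-separated string, or an already-parsed array, avoiding the
// "Unexpected token ... in JSON" error on plain strings.
function parseCollections(value) {
  if (Array.isArray(value)) return value;
  try {
    const parsed = JSON.parse(value);
    return Array.isArray(parsed) ? parsed : [String(parsed)];
  } catch (err) {
    return value.split(',');
  }
}
```

With this, the generated next link could keep using the plain-string form without breaking the parser.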
(matthewhanson)
I had to disable the fields code, because the default behavior was that it wasn't returning the complete item, only a subset.
Targeting this for the 0.4.0 final; if it doesn't make it, we can keep the fields extension out and put it back in for 0.5.0.
The conformance link can be removed; instead, add a conformsTo field at the root catalog with these classes:
https://api.stacspec.org/v1.0.0-beta.1/core
https://api.stacspec.org/v1.0.0-beta.1/item-search
Pagination is now handled by providing a link in the returned ItemCollection with a rel type of next. This doesn't seem to be clearly explained in the spec, which only refers to the OpenAPI yaml file.
I've been using Earth Search for Sentinel-2 data (it's awesome!) and noticed some weird behavior around missing items. For example,
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items/foo
will return an HTTP 200 status code, but with a body that looks like:
{
"code": 404,
"message": "Item not found"
}
When searching the catalog, I need to (extracted from search function):
import requests
from urllib.error import HTTPError
S2_SEARCH_URL = 'https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items'
search_url = f'{S2_SEARCH_URL}/foo'
response = requests.get(search_url)
response.raise_for_status()
scene_metadata = response.json()
if scene_metadata.get('code') == 404:
    raise HTTPError(search_url, 404, scene_metadata['message'], response.headers, None)
which is a little cumbersome.
I'd expect this:
>>> import requests
>>> S2_SEARCH_URL = 'https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items'
>>> response = requests.get(f'{S2_SEARCH_URL}/foo')
>>> response.raise_for_status()
Traceback (most recent call last):
...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url:...
Which, for comparison, is how USGS's STAC catalog works:
>>> import requests
>>> LC2_SEARCH_URL = 'https://landsatlook.usgs.gov/sat-api/collections/landsat-c2l1/items'
>>> response = requests.get(f'{LC2_SEARCH_URL}/foo')
>>> response.raise_for_status()
Traceback (most recent call last):
...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://landsatlook.usgs.gov/sat-api/collections/landsat-c2l1/items/foo
Updating the backend to use separate elasticsearch indexes (#38) broke transactions (#37) when the collection is one of the fields to be edited.
If collection is one of the fields to be changed, then:
1 - the PATCH operation needs to retrieve the original Item and merge the updated fields with it
2 - add the new Item to the new index with the name of the collection the item uses
3 - delete the old Item from the old collection index
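The three steps above can be sketched as follows; store is a hypothetical abstraction over the per-collection Elasticsearch indexes, not an API from the codebase:

```javascript
// PATCH an Item when `collection` may be among the changed fields.
async function patchItem(store, collectionId, itemId, patch) {
  // 1. retrieve the original Item and merge the updated fields with it
  const original = await store.get(collectionId, itemId);
  const updated = { ...original, ...patch };
  // 2. add the merged Item to the index named after its (possibly new) collection
  await store.index(updated.collection, updated);
  // 3. delete the old Item from the old collection index
  if (updated.collection !== collectionId) {
    await store.remove(collectionId, itemId);
  }
  return updated;
}
```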
cc @seanmurph
Update deployment repo
https://github.com/sat-utils/sat-api-deployment
(via Matt Hanson)
The elasticsearch.js package is being deprecated and replaced with the new client:
https://www.npmjs.com/package/@elastic/elasticsearch
Migration guide here:
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/breaking-changes.html
Determine code coverage for tests
Enabling test coverage in AVA https://github.com/avajs/ava/blob/main/docs/recipes/code-coverage.md
Review mappings in fixtures directory vs latest stac-spec version
This codebase was forked from sat-api to stac-api, since it has become a server for STAC and is no longer restricted to satellite data. However, this is causing confusion, since the STAC API spec is also called stac-api. Instead this should be called stac-server. The repo has been renamed (stac-api will redirect here), but there are references in the code that should be updated to stac-server.
Whether in readme.md or in a separate file, a few things should be expanded or changed:
Subscribing to SNS Topics - the docs imply this can be done through the 'Create subscription' interface, but that doesn't seem to work. Instead, adding a trigger on the Lambda edit page is the simple way through the interface.
Create a docker-compose for elasticsearch.
Make sure local api can see the elasticsearch.
Not all fields need to be indexed. Instead the full document can be stored in elasticsearch but only certain fields are indexed.
Should use exclusion logic to avoid indexing new fields that may be added from extensions.
For example:
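One way to get this behavior (a sketch; the field choices are illustrative) is a mapping with "dynamic": false, under which unlisted fields, including new ones added by extensions, are still stored in _source but are not indexed:

```json
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "id": { "type": "keyword" },
      "collection": { "type": "keyword" },
      "geometry": { "type": "geo_shape" },
      "properties": {
        "properties": {
          "datetime": { "type": "date" }
        }
      }
    }
  }
}
```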
I am seeing a ReferenceError when results need to be paged. Maybe coming from here?
Line 327 in 008f49f
Should this be link (defined on line 317) rather than links?