stac-utils / stac-server
A Node-based STAC API, AWS Serverless, OpenSearch
License: MIT License
The OpenAPI docs appear to be out of date relative to the latest stac-spec (0.9.0) and need to be updated.
openapi-merge looks like a good tool for assembling the spec from the documents in stac-api-spec:
https://www.npmjs.com/package/openapi-merge
https://github.com/robertmassaioli/openapi-merge
(from Matt H.)
Add an /api endpoint that returns the OpenAPI spec for this API.
We can create a new yaml file based on
https://github.com/radiantearth/stac-spec/blob/master/api-spec/STAC-extensions.yaml
Follow instructions here
https://github.com/radiantearth/stac-spec/blob/master/api-spec/README.md#openapi-definitions
to create an API doc with core + all extensions but without the transaction extension.
According to the AWS CloudFormation docs, AWS::SNS::TopicPolicy.Topics should be an array of strings. However, in the current serverless.yml, Topics is a single string.
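A sketch of the corrected shape, assuming a topic resource named ingestTopic (the resource name and policy statement here are illustrative, not the project's actual config):

```yaml
resources:
  Resources:
    IngestTopicPolicy:
      Type: AWS::SNS::TopicPolicy
      Properties:
        # Topics must be a list of topic ARNs, not a single string
        Topics:
          - Ref: ingestTopic
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Principal: "*"
              Action: sns:Subscribe
              Resource:
                Ref: ingestTopic
```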
Check and implement any missing functionality to be fully compliant with the OGC API - Features 1.0 specification.
The API currently runs Elasticsearch date queries only against properties.datetime. Extend this to also support properties.created and properties.updated.
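A minimal sketch of what the extension could look like, with the queried field passed in as a parameter (the function name is hypothetical, not from the codebase):

```javascript
// Build an Elasticsearch range query against any datetime-like property.
// Today only properties.datetime is queried; properties.created and
// properties.updated would be passed in the same way.
function dateRangeQuery(field, start, end) {
  return { range: { [field]: { gte: start, lte: end } } };
}
```

For example, dateRangeQuery('properties.created', '2020-01-01T00:00:00Z', '2020-12-31T23:59:59Z') produces the same query shape currently built for properties.datetime.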
Migrate CI to GitHub Actions instead of CircleCI.
The sort extension (implemented here: https://github.com/stac-utils/stac-api/blob/develop/libs/es.js#L357) now has two different forms: the normal POST JSON format, and a new single parameter, sortby, which can be used in GET or POST. stac-api should support both.
See additional details here:
https://github.com/radiantearth/stac-spec/tree/dev/api-spec/extensions/sort#get-or-post-form
Also note that the parameter is now sortby rather than sort.
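A sketch of a converter between the two forms, assuming the GET form is a comma-separated list with an optional +/- direction prefix as described in the linked spec:

```javascript
// Convert the GET `sortby` string form into the POST JSON form.
// GET:  sortby=-properties.datetime,+id
// POST: { "sortby": [{ "field": "properties.datetime", "direction": "desc" }, ...] }
function parseSortby(sortby) {
  return sortby.split(',').map((part) => {
    const direction = part.startsWith('-') ? 'desc' : 'asc';
    const field = part.replace(/^[+-]/, '');
    return { field, direction };
  });
}
```

Normalizing the GET form into the POST form early means the rest of the query-building code only has to handle one representation.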
Currently the primary key in the Elasticsearch backend is the Item ID.
However, if the server has an item with the same ID across multiple collections, this causes a problem.
For instance, a sentinel-s2-l2 collection and a sentinel-s2-l2-cogs collection that mirrors the first but with COGs will have the same Item IDs. This is not possible within the same stac-server.
Instead the primary key should be something like collectionId_itemId.
This has repercussions for the transaction extension (see #37): if either ID is edited during an update, a new item will need to be added and the old one removed.
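A sketch of the composite key (the separator is an assumption; any delimiter that cannot appear in a collection id would work):

```javascript
// Derive a composite Elasticsearch document id so the same Item id can
// exist in multiple collections without colliding.
function esDocId(collectionId, itemId) {
  return `${collectionId}_${itemId}`;
}
```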
The search link at the root endpoint must have a type of application/geo+json
When one or more Items fail ingestion into Elasticsearch during a bulk write, the returned error message is long and unhelpful: it is one big string containing info from every 'doc' in the batch. Here's a sample:
2019-01-22T20:51:11.320Z - error: _index=items, _type=doc, _id=LC80230012015141LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3705, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015125LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3690, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015109LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3789, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015093LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3706, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230012015077LGN00, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3691, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102016096LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3600, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102015253LGN02, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3601, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102014106LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3604, _primary_term=1, status=201, _index=items, _type=doc, _id=LC80230102014234LGN01, _version=1, result=created, total=2, successful=2, failed=0, _seq_no=3602, _primary_term=1, status=201,
So each record starts with "_index" and ends with "status"; records that succeeded have "failed=0".
However, a failed record actually looks different:
_index=items, _type=doc, _id=LC80231212013343LGN00, status=400, type=mapper_parsing_exception, reason=failed to parse [geometry], type=parse_exception, reason=invalid number of points in LinearRing (found [1] - must be >= [4]),
It doesn't contain "successful", "failed", or the other fields; instead it has status=400 and a "reason", which is the actual error.
We don't want to log the entire string; instead we want to log each item in the batch that failed, if any.
Only the "_id" field and the error's "reason" field need to be logged.
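A sketch of that filtering, assuming access to the parsed bulk response rather than the flattened string (the standard Elasticsearch bulk response has the shape { items: [{ index: { _id, status, error } }] }):

```javascript
// Pull only the failed docs out of an Elasticsearch bulk response,
// keeping just the _id and the error reason for logging.
function failedDocs(bulkResponse) {
  return bulkResponse.items
    .map((entry) => entry.index || entry.create || entry.update || entry.delete)
    .filter((doc) => doc && doc.status >= 400)
    .map((doc) => ({ id: doc._id, reason: doc.error && doc.error.reason }));
}
```

Logging failedDocs(response) instead of the whole response string yields one short line per failed Item and nothing when the batch fully succeeds.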
(via Matt H.)
The /collections endpoint currently returns just an array of collections, when the actual response should be an object containing both a links array and a collections array. An example of the expected response is here: https://gist.github.com/kbgg/ffdaedb329b3d7fb294c242a301fe9a8
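A minimal sketch of the expected shape (the collection id and href values here are illustrative):

```json
{
  "collections": [
    { "id": "sentinel-s2-l2a", "links": [] }
  ],
  "links": [
    { "rel": "self", "href": "https://stac.example.com/collections" }
  ]
}
```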
Provide an ability to specify a list of keywords on an Item to use in item searching. This allows the item to remain generalized but include indexable/searchable information for specific use cases. For example, an item could be used to represent a parcel, county, contract, easement or land use agreement.
This is important for applications using STAC to index remote sensing data (e.g., imagery) across a number of vertical markets. I.e., using the same properties schema but allowing arbitrary searchable keywords.
Usage examples:
3111822110030 (to find an item listing this ID as a keyword)
HEN (to find keywords starting with "Hen")
Add a Lambda function that can be invoked on a single collection.
Use elasticsearch aggregations to automatically generate summaries and update the collection metadata.
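A sketch of what such an aggregation request could look like (the field names and summarized properties are illustrative, not the actual mapping):

```javascript
// Aggregation request to derive collection summaries: numeric stats for
// cloud cover plus the temporal extent of the collection's items.
// size: 0 skips returning hits since only the aggregations are needed.
const summaryAggs = {
  size: 0,
  query: { term: { collection: 'sentinel-s2-l2a' } },
  aggs: {
    cloud_cover: { stats: { field: 'properties.eo:cloud_cover' } },
    datetime_min: { min: { field: 'properties.datetime' } },
    datetime_max: { max: { field: 'properties.datetime' } }
  }
};
```

The Lambda would run this per collection and write the results into the collection's summaries field.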
stac-server currently converts POST requests to GET requests and uses that as the next link.
Instead it should follow the spec and return a next page token, along with the headers and body the client needs to resend to get the next page.
Can someone help me understand the AWS architecture that gets deployed by this repo? Thanks.
Related to #3.... like #22, the fields extension now has an alternate GET or POST format, as seen here: https://github.com/radiantearth/stac-spec/tree/dev/api-spec/extensions/fields
stac-api should support both.
When ingesting Collections or Catalogs use stac-node-validator to validate
Hi,
It would be really nice to have a package that exposes the library index (https://github.com/stac-utils/stac-server/blob/master/libs/index.js) so anyone can use the API implementation and deploy it as needed.
For example, we easily plugged the library into fastify (because we don't use the AWS cloud or have access to a FaaS), but currently we have to fork the repo in order to achieve this. Publishing the libs/index.js package would cleanly split the web server implementation/deployment from the main library, and maybe in the near future we could imagine plugin packages like stac-server-(express|fastify|hapi|serverless) that leverage the base library independently of the web server implementation used.
Anyway, thanks for this work! It gave us a great idea of what we can achieve with the STAC spec.
There should be an SNS topic deployed with stac-server.
And then there should be an option (controlled via an envvar) that will publish newly ingested items to the topic.
This allows users to subscribe and monitor new Items added to the server.
Furthermore, messages published to the topic should carry attributes that allow a subscriber to filter and receive only the messages meeting their criteria, e.g., "send me all new Items within this bounding box, for 2019, for collection sentinel-s2-l2a".
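For example, a subscriber interested only in one collection and one year could attach a filter policy like this (the attribute names are assumptions; SNS filter policies support exact-match and prefix matching on string attributes):

```json
{
  "collection": ["sentinel-s2-l2a"],
  "datetime": [{ "prefix": "2019-" }]
}
```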
The conformance classes Basic CQL2 and Basic Spatial Operators should be implemented. Basic CQL2 allows the logical operators (AND, OR, NOT), comparison operators (=, <>, <, <=, >, >=), and IS NULL against string, numeric, boolean, date, and datetime types. Basic Spatial Operators allows S_INTERSECTS (spatial intersects) on geometry fields.
Stretch: Advanced Comparison Operators defines the LIKE, BETWEEN, and IN operators, though BETWEEN and IN can be written less concisely using comparisons AND'ed or OR'ed together. LIKE cannot (practically) be worked around this way.
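Some example filters for these conformance classes (CQL2 text form; the property names are illustrative):

```text
-- Basic CQL2: comparisons, logical operators, IS NULL
properties.eo:cloud_cover <= 10 AND collection = 'sentinel-s2-l2a'
properties.view:sun_elevation IS NULL OR properties.gsd < 30

-- Basic Spatial Operators: S_INTERSECTS on a geometry field
S_INTERSECTS(geometry, POLYGON((0 0, 0 10, 10 10, 10 0, 0 0)))

-- Stretch (Advanced Comparison Operators)
id LIKE 'LC08%' AND properties.gsd BETWEEN 10 AND 30
```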
Other implementations:
Open Questions:
Fix the failing tests, and add new tests if needed
When the result for the collection/items endpoint is returned, it includes a link to fetch the next page of results which looks like this:
https://api.server.com/collections/some_collection/items?collections=some_collection&page=2&limit=10
When navigating to this link, this error is generated:
Unexpected token r in JSON at position 0
It looks like the collections parameter is expected to be a JSON array of strings rather than a single string, causing a JSON parsing error.
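A sketch of a tolerant parser that would accept either form (the function name is hypothetical):

```javascript
// Accept the `collections` query parameter as a JSON array, a bare
// comma-separated string, or an already-parsed array, avoiding the
// "Unexpected token ... in JSON" error on plain strings.
function parseCollections(value) {
  if (Array.isArray(value)) return value;
  try {
    const parsed = JSON.parse(value);
    return Array.isArray(parsed) ? parsed : [String(parsed)];
  } catch (err) {
    return value.split(',');
  }
}
```

With this, the generated next link could keep using the plain-string form without breaking the parser.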
(matthewhanson)
I had to disable the fields code, because the default behavior was that it wasn't returning the complete item, only a subset.
Targeting this for the 0.4.0 final; if it doesn't make it, we can keep the fields extension out and put it back in for 0.5.0.
The conformance link can be removed; instead, add a conformsTo field at the root catalog with these classes:
https://api.stacspec.org/v1.0.0-beta.1/core
https://api.stacspec.org/v1.0.0-beta.1/item-search
Pagination is now handled by providing a link in the returned ItemCollection with a rel type of next. This doesn't seem to be clearly explained in the spec, which only refers to the OpenAPI yaml file.
I've been using Earth Search for Sentinel-2 data (it's awesome!) and noticed some weird behavior around missing items. For example,
https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items/foo
will return an HTTP 200 status code, but with a body that looks like:
{
"code": 404,
"message": "Item not found"
}
When searching the catalog, I need to (extracted from search function):
import requests
from urllib.error import HTTPError
S2_SEARCH_URL = 'https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items'
search_url = f'{S2_SEARCH_URL}/foo'
response = requests.get(search_url)
response.raise_for_status()
scene_metadata = response.json()
if scene_metadata.get('code') == 404:
    raise HTTPError(search_url, 404, scene_metadata['message'], response.headers, None)
which is a little cumbersome.
I'd expect this:
>>> import requests
>>> S2_SEARCH_URL = 'https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l1c/items'
>>> response = requests.get(f'{S2_SEARCH_URL}/foo')
>>> response.raise_for_status()
Traceback (most recent call last):
...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url:...
Which, for comparison, is how USGS's STAC catalog works:
>>> import requests
>>> LC2_SEARCH_URL = 'https://landsatlook.usgs.gov/sat-api/collections/landsat-c2l1/items'
>>> response = requests.get(f'{LC2_SEARCH_URL}/foo')
>>> response.raise_for_status()
Traceback (most recent call last):
...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://landsatlook.usgs.gov/sat-api/collections/landsat-c2l1/items/foo
Updating the backend to use separate elasticsearch indexes (#38) broke transactions (#37) when the collection is one of the fields to be edited.
If collection is one of the fields to be changed, then:
1 - the PATCH operation needs to retrieve the original Item and merge the updated fields with it
2 - add the new Item to the new index with the name of the collection the item uses
3 - delete the old Item from the old collection index
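The three steps above can be sketched as follows; store is a hypothetical abstraction over the per-collection Elasticsearch indexes, not an API from the codebase:

```javascript
// PATCH an Item when `collection` may be among the changed fields.
async function patchItem(store, collectionId, itemId, patch) {
  // 1. retrieve the original Item and merge the updated fields with it
  const original = await store.get(collectionId, itemId);
  const updated = { ...original, ...patch };
  // 2. add the merged Item to the index named after its (possibly new) collection
  await store.index(updated.collection, updated);
  // 3. delete the old Item from the old collection index
  if (updated.collection !== collectionId) {
    await store.remove(collectionId, itemId);
  }
  return updated;
}
```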
cc @seanmurph
Update deployment repo
https://github.com/sat-utils/sat-api-deployment
(via Matt Hanson)
The elasticsearch.js package is being deprecated and replaced with the new client:
https://www.npmjs.com/package/@elastic/elasticsearch
Migration guide here:
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/breaking-changes.html
Determine code coverage for tests
Enabling test coverage in AVA https://github.com/avajs/ava/blob/main/docs/recipes/code-coverage.md
Review mappings in fixtures directory vs latest stac-spec version
This codebase was forked from sat-api to stac-api, since it has become a server for STAC and is no longer restricted to satellite data. However, this is causing confusion, since the STAC API spec is also called stac-api. Instead this should be called stac-server. The repo has been renamed (stac-api will redirect here), but there are references in the code that should be updated to stac-server.
Whether in readme.md or in a separate file, a few things should be expanded or changed:
Subscribing to SNS Topics - the docs imply this can be done through the 'Create subscription' interface, but that doesn't seem to work. Instead, adding a trigger on the Lambda edit page is the simple way through the interface.
Create a docker-compose for elasticsearch.
Make sure local api can see the elasticsearch.
Not all fields need to be indexed. Instead the full document can be stored in elasticsearch but only certain fields are indexed.
Should use exclusion logic to avoid indexing new fields that may be added from extensions.
For example:
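One way to get this behavior (a sketch; the field choices are illustrative) is a mapping with "dynamic": false, under which unlisted fields, including new ones added by extensions, are still stored in _source but are not indexed:

```json
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "id": { "type": "keyword" },
      "collection": { "type": "keyword" },
      "geometry": { "type": "geo_shape" },
      "properties": {
        "properties": {
          "datetime": { "type": "date" }
        }
      }
    }
  }
}
```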
I am seeing a ReferenceError when results need to be paged. Maybe coming from here?
Line 327 in 008f49f
Should this be link (defined on line 317) rather than links?