
nasa-pds / registry-api


Web API service for the PDS Registry, providing the implementation of the PDS Search API (https://github.com/nasa-pds/pds-api) for the PDS Registry.

Home Page: https://nasa-pds.github.io/pds-api

License: Other

Java 88.19% HCL 2.53% Python 5.45% ANTLR 0.19% JavaScript 1.87% HTML 0.61% Shell 0.06% Dockerfile 0.88% CSS 0.22%
data-search java nasa nasa-api pds pds4

registry-api's Introduction

Registry API

DOI 🤪 Unstable integration & delivery 😌 Stable integration & delivery

This repository implements the search API v1.0.0-SNAPSHOT for the PDS registry.

It is composed of the following subcomponents:

  • lexer: parses the API request queries (q parameter), based on an ANTLR4 grammar
  • model: library of end-point controller definitions and response objects generated from the OpenAPI specification (see https://github.com/NASA-PDS/pds-api/)
  • service: the API service, a Spring Boot application

Prerequisites

For the API to work, you also need Elasticsearch/OpenSearch with some test data loaded in it.

Using Docker, you can easily start all the prerequisites as configured in the registry repository. This repository is also used to run the integration tests:

git clone https://github.com/NASA-PDS/registry.git

Start the prerequisites by following the Quick Start Guide

Start the application from a released package

Get the latest stable release https://github.com/NASA-PDS/registry-api/releases

Download the zip or tar.gz 'registry-api-service-1.0.0-bin' file.

Follow the instructions in README.txt in the decompressed folder.

Developers

Running the API

Prerequisites

To build and run the application you need:

  • jdk 17
  • maven

Additionally, harvested data will only be picked up correctly by the API if all of the following are true:

There are two approaches to running a local development instance of the API:

[Option 1] Non-Containerized (useful for breakpoint debugging)

  1. Deploy an instance of the registry docker-compose

  2. Kill the existing API container

    docker kill docker-registry-api-1
    
  3. Temporarily disable certificate verification by making the following modification to application.properties:

    openSearch.sslCertificateCNVerification=false
    
  4. Build the application

    mvn clean install
    
  5. Start the application

    cd service
    mvn spring-boot:run
    

The API will now be accessible (by default) at http://localhost:8080

  1. Specific configuration profile: if you run the application in a specific environment, you can define a dedicated properties file, for example application-dev.properties, that does not need to be committed to git. Launch it as follows:

    mvn -Dspring-boot.run.profiles=dev spring-boot:run

[Option 2] Build a development docker image

Your local docker image will be used in the integration deployment described below.

mvn spring-boot:build-image

View Swagger UI

Go to http://localhost:8080

Integration deployment

You can deploy the registry-api together with all other components of the registry (harvest, opensearch, ...) and reference datasets.

Clone the registry repository, and launch the docker compose script as described in https://github.com/NASA-PDS/registry/tree/main/docker

For example, launch:

docker compose --profile int-registry-batch-loader up

The integration tests are run automatically. Check the results, and update or complete the tests as necessary.

Tests

Important note: as a developer, you are asked to complete the Postman test suite to cover the new feature you are developing. Submit the updates as a pull request to the registry project.

Integration tests are maintained in Postman.

Edit/Run of the integration tests in postman GUI

Install the Postman desktop app from https://www.postman.com/downloads/

Download and open the test suite found in https://github.com/NASA-PDS/registry/tree/main/docker/postman

Run the integration tests in command line

From the registry project directory, launch the tests from the command line:

npm install -g newman
newman run docker/postman/postman_collection.json --env-var baseUrl=http://localhost:8080

registry-api's People

Contributors

al-niessner, alexdunnjpl, c-suh, dependabot[bot], jimmie, jordanpadams, lylebarner, nutjob4life, pdsen-ci, tdddblog, testpersonal, tloubrieu-jpl


registry-api's Issues

As an API user, I want to be able to use the API for free text search

Motivation

...so that I can perform a keyword or Google-like search on the registry and get reasonable results back.

Additional Details

This work will later be developed into a full natural-language search feature. See NASA-PDS/pds-api#93

The premise of this task is to come up with the API design in order to enable keyword search. The actual result set will be refined per NASA-PDS/pds-api#93.

Acceptance Criteria

Given : some data products for a particular mission (e.g. insight) or targeting a particular planet (e.g. mars)
When I perform: an API search for something like keyword=insight or keyword=mars
Then I expect: it returns products that provide some fuzzy match for the keyword terms searched

Engineering Details

This ticket requires updating the web API specification to accept free-text search criteria, not only {pds4 field attribute}=value criteria.

For this ticket, at minimum, we want to allow free-text search using the default Elasticsearch weighting. We will then investigate which fields should be weighted more strongly to enable more robust search results. NASA-PDS/pds-api#49
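A minimal sketch of how a keyword parameter could be turned into an Elasticsearch/OpenSearch request, assuming the service forwards it as a multi_match query with fuzziness AUTO and the engine's default relevance scoring. The function name keyword_query and the searched fields are hypothetical; per-field boosts such as "title^2" are one way the stronger weighting could be expressed later.

```python
import json


def keyword_query(keyword, fields=("title", "description"), size=10):
    """Build a search body for a free-text keyword (field names are hypothetical)."""
    if not keyword or not keyword.strip():
        raise ValueError("keyword must be a non-empty string")
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword.strip(),
                "fields": list(fields),  # boosts like "title^2" could add weighting later
                "fuzziness": "AUTO",  # tolerate small typos ("fuzzy match")
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(keyword_query("insight"), indent=2))
```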

As an API user, I want to handle long-running queries that take >10 seconds.

Motivation

...so that I can ensure the user does not lose attention or think the software is broken.

Additional Details

Acceptance Criteria

Given a deployed API and registry with data ingested
When I perform a query against an endpoint where the response time is >10 seconds and the max_response_time flag is not specified
Then I expect an error response indicating that the query exceeded the default maximum response time of 10 seconds, and that the user should narrow the query or use the max_response_time flag to increase the limit.

Engineering Details

  • introduce new max_response_time flag to the API with default 10 (seconds)
  • implement error handling for max_response_time failure
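One way to enforce the proposed limit, sketched in Python under the assumption that the registry call can run in a worker thread. The names run_with_deadline and ResponseTimeExceeded are hypothetical; the service would translate the exception into the error response described above.

```python
import concurrent.futures
import time

DEFAULT_MAX_RESPONSE_TIME = 10.0  # seconds, the default proposed in the ticket

# Shared pool, so a timed-out query does not block the caller while it finishes.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class ResponseTimeExceeded(Exception):
    """Hypothetical error that the service would map to an HTTP error response."""


def run_with_deadline(query_fn, max_response_time=None):
    """Run query_fn, failing if it takes longer than max_response_time seconds."""
    limit = DEFAULT_MAX_RESPONSE_TIME if max_response_time is None else max_response_time
    future = _EXECUTOR.submit(query_fn)
    try:
        return future.result(timeout=limit)
    except concurrent.futures.TimeoutError:
        # cancel() only stops a task that has not started yet; a running query
        # keeps going in the background, so the backend may need its own timeout.
        future.cancel()
        raise ResponseTimeExceeded(
            f"Query exceeded max_response_time of {limit:g} seconds; "
            "narrow the query or increase max_response_time."
        ) from None


if __name__ == "__main__":
    print(run_with_deadline(lambda: "fast result"))
    try:
        run_with_deadline(lambda: time.sleep(0.3), max_response_time=0.05)
    except ResponseTimeExceeded as err:
        print(err)
```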

As a user, I want to receive a JSON response that contains the PDS4 label metadata in JSON format (application/vnd.nasa.pds.pds4+json)

💪 Motivation

...so that I can search for and parse the metadata for a particular product in a JSON-structured form. This preserves the nesting of the XML label, enabling robust client applications to use that metadata.

📖 Additional Details

⚖️ Acceptance Criteria

Given an ingested PDS4 product
When I perform an API query for that product using the endpoint /products/{lidvid}
Then I expect the response to look something like this:

  {
    "id": "urn:nasa:pds:insight_cameras::7.0",
    "meta": {
      "file_size": 34534,
      "checksum": "fhfghx1224f"
    },
    "additional_meta": {
      "opus:": { ... },
      "analyst_notebook" : { ... },
      "dataCite" : { ... }
    },
    "pds4": {
       "Product_Bundle": {
          "Identification_Area": {
             "logical_identifier": "urn:nasa:pds:insight_cameras",
             "version_id": "7.0",
             "title": "InSight Cameras Bundle",
             "information_model_version": "1.11.1.0",
             "product_class": "Product_Bundle",
             "Citation_Information": {
                "author_list": "R. Deen, H. Abarca, P. Zamani, J.Maki",
                "publication_year": "2019",
                "description": "InSight Cameras Experiment Data Record (EDR) and Reduced Data Record (RDR) Data Products"
             }
          },
          "Context_Area": {
             "comment": "Observational Intent",
             "Time_Coordinates": {
                "start_date_time": "2020-07-05T14:15:07.441Z",
                "stop_date_time": "2020-10-05T02:02:53.219Z"
             },
             "Primary_Result_Summary": {
                "purpose": "Science",
                "processing_level": "Raw",
                "Science_Facets": {
                   "wavelength_range": "Visible",
                   "domain": "Surface",
                   "discipline_name": "Imaging"
                }
             },
             "Investigation_Area": {
                "name": "Insight",
                "type": "Mission",
                "Internal_Reference": {
                   "lid_reference": "urn:nasa:pds:context:investigation:mission.insight",
                   "reference_type": "bundle_to_investigation"
                }
             },
             "Observing_System": {
                "Observing_System_Component": [
                   {
                      "name": "Insight Lander",
                      "type": "Spacecraft",
                      "Internal_Reference": {
                         "lid_reference": "urn:nasa:pds:context:instrument_host:spacecraft.insight",
                         "reference_type": "is_instrument_host",
                         "comment": "Reference to the Insight spacecraft."
                      }
                   },
                   {
                      "name": "Insight Context Camera",
                      "type": "Instrument",
                      "Internal_Reference": {
                         "lid_reference": "urn:nasa:pds:context:instrument:icc.insight",
                         "reference_type": "is_instrument",
                         "comment": "Reference to the InSight Context Camera instrument onboard the InSight spacecraft."
                      }
                   },
                   {
                      "name": "Insight Deployment Camera",
                      "type": "Instrument",
                      "Internal_Reference": {
                         "lid_reference": "urn:nasa:pds:context:instrument:idc.insight",
                         "reference_type": "is_instrument",
                         "comment": "Reference to the InSight Deployment Camera instrument onboard the InsSight spacecraft."
                      }
                   }
                ]
             },
             "Target_Identification": {
                "name": "Mars",
                "type": "Planet",
                "Internal_Reference": {
                   "lid_reference": "urn:nasa:pds:context:target:planet.mars",
                   "reference_type": "document_to_target",
                   "comment": "Reference to the Planet - Mars target"
                }
             }
          },
          "Bundle": {
             "bundle_type": "Archive",
             "description": "This Bundle contains InSight camera data."
          },
          "Bundle_Member_Entry": [
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:browse",
                "member_status": "Primary",
                "reference_type": "bundle_has_browse_collection"
             },
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:calibration",
                "member_status": "Primary",
                "reference_type": "bundle_has_calibration_collection"
             },
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:data",
                "member_status": "Primary",
                "reference_type": "bundle_has_data_collection"
             },
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:document",
                "member_status": "Primary",
                "reference_type": "bundle_has_document_collection"
             },
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:miscellaneous",
                "member_status": "Primary",
                "reference_type": "bundle_has_document_collection"
             },
             {
                "lid_reference": "urn:nasa:pds:insight_cameras:xml_schema",
                "member_status": "Primary",
                "reference_type": "bundle_has_schema_collection"
             }
          ],
          "_xmlns": "http://pds.nasa.gov/pds4/pds/v1",
          "_xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
          "_xsi:schemaLocation": "http://pds.nasa.gov/pds4/pds/v1 https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B10.xsd"
       }
    }
  }

⚙️ Engineering Details

  • update harvest to ingest JSON blob
  • add new response format to API spec
  • update API service to output JSON from blob
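A rough sketch of the XML-to-JSON step, using only Python's standard library. It follows the conventions visible in the example response (attributes prefixed with "_", repeated elements turned into arrays) but is not the harvest implementation: ElementTree drops namespace declarations and prefixes, so it emits "_schemaLocation" rather than "_xmlns" / "_xsi:schemaLocation", and an element that occurs only once stays an object even where the schema allows repeats. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def _local(name):
    """Drop the '{namespace}' prefix ElementTree adds to tags and attributes."""
    return name.rsplit("}", 1)[-1]


def element_to_json(elem):
    """Convert one XML element into nested dicts, lists and strings."""
    node = {f"_{_local(k)}": v for k, v in elem.attrib.items()}  # attributes get "_"
    for child in elem:
        key = _local(child.tag)
        value = element_to_json(child)
        if key not in node:
            node[key] = value
        elif isinstance(node[key], list):
            node[key].append(value)
        else:
            node[key] = [node[key], value]  # a repeated element becomes an array
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["_text"] = text
    return node


def label_to_pds4_json(xml_text):
    """Wrap the converted label under a "pds4" key, as in the example response."""
    root = ET.fromstring(xml_text)
    return {"pds4": {_local(root.tag): element_to_json(root)}}


if __name__ == "__main__":
    label = "<Bundle><bundle_type>Archive</bundle_type></Bundle>"
    print(json.dumps(label_to_pds4_json(label), indent=2))
```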

As an API user, I want to know the Bundle for a given Product.

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can <why do you want to do this?>

Additional Details

Notional API Design:

GET /products/{identifier}/member-of/member-of

See NASA-PDS/registry#109 and NASA-PDS/registry#108 for how the registry ingests primary and secondary products.

Note: This logic is a little funkier for secondary bundles:

  1. find all collections this product belongs to (as a primary or secondary product)
  2. find all bundles those collections belong to (as a primary or secondary collection)
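The two steps above can be sketched with an in-memory stand-in for the registry, where each parent maps to its "primary" and "secondary" member sets. The data layout and function names are hypothetical, and so is the rule that a bundle counts as primary only when both links on the path are primary.

```python
def parents_of(member, parents):
    """Yield (parent_id, status) for each parent listing member as primary/secondary."""
    for parent_id, members in parents.items():
        for status in ("primary", "secondary"):
            if member in members.get(status, ()):
                yield parent_id, status


def bundles_of_product(product, collections, bundles):
    """Step 1: the product's collections; step 2: the bundles of those collections."""
    result = {}
    for collection, c_status in parents_of(product, collections):
        for bundle, b_status in parents_of(collection, bundles):
            # Assumption: primary only if both links on the path are primary.
            status = "primary" if c_status == b_status == "primary" else "secondary"
            if result.get(bundle) != "primary":  # a primary path wins
                result[bundle] = status
    return result
```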

Acceptance Criteria

Given I have a product LID OR LIDVID
When I perform an API query by that LIDVID for its grandparent bundle(s)
Then I expect the API to return the primary bundle (there should be only 1) AND any secondary bundle(s) (could be many) the product belongs to

Sub-tasks

  • Update Swagger API
  • Implement in API Server

As an API user, I want to search by a temporal range as an ISO-8601 time interval.

Motivation

...so that I can find data within some notional time.

Additional Details

The API enables this out of the box. This issue is really just to document how to do it.

For example: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#temporal-range-searches

Should we enable "fuzzy" temporal ranges (e.g. datetime ge 2017)? or maybe that is a separate ticket and we wait to see if people are actually upset about requiring ISO 8601 time intervals.

Acceptance Criteria

Given a deployed API server
When I perform a request with time criteria such as the following:

curl --location --get 'http://localhost:8080/products' --data-urlencode 'q=((pds:Time_Coordinates.pds:start_date_time gt "2018-05-04T00:00:00Z") and (pds:Time_Coordinates.pds:start_date_time lt "2018-05-06T00:00:00Z"))'

Then I expect a result where the start_date_time property matches the criteria

Note that the Postman collection has been updated with this test (NASA-PDS/pds-api#72) in https://github.com/NASA-PDS/registry/blob/main/docker/postman/postman_collection.json

Engineering Details

Should require ISO-8601 time intervals.
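A small helper that builds the q expression from an ISO 8601 interval and URL-encodes it, reusing the field name and syntax from the curl example above. The function names are hypothetical; it only handles the start/end interval form, not start/duration.

```python
from datetime import datetime
from urllib.parse import urlencode

START_FIELD = "pds:Time_Coordinates.pds:start_date_time"


def _parse_iso(ts):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts[:-1] + "+00:00" if ts.endswith("Z") else ts)


def temporal_query(interval, field=START_FIELD):
    """Turn an ISO 8601 'start/end' interval into a q expression."""
    start, sep, end = interval.partition("/")
    if not sep or not start or not end:
        raise ValueError("expected an ISO 8601 interval of the form start/end")
    # Both ends should use the same style (both with or both without a time zone).
    if _parse_iso(start) >= _parse_iso(end):
        raise ValueError("interval start must be before its end")
    return f'(({field} gt "{start}") and ({field} lt "{end}"))'


def products_url(base_url, interval):
    """URL-encode the q expression so spaces and quotes survive the request."""
    query_string = urlencode({"q": temporal_query(interval)})
    return f"{base_url}/products?{query_string}"


if __name__ == "__main__":
    print(products_url("http://localhost:8080", "2018-05-04T00:00:00Z/2018-05-06T00:00:00Z"))
```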

Unsupported response formats (csv or kvp) should be removed from end-points

🐛 Describe the bug

In the Swagger web UI, the end-points returning a single product (or bundle or collection) do not support KVP-related formats.
These formats should be removed from the Swagger specification so that they are not offered to the user in the UI.

📜 To Reproduce

Steps to reproduce the behavior:

  1. Go to https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html#!/bundles/bundleByLidvid
  2. Add identifier 'urn:nasa:pds:orex.ovirs::10.0'
  3. Check that default selected format is 'application/csv'
  4. Click 'Try it out'
  5. See error status 500

🕵️ Expected behavior

application/csv, text/csv, or other KVP-related formats should not be offered for end-points returning a single entity.

📚 Version of Software Used

0.5.0-SNAPSHOT

🩺 Test Data / Additional context

🐞 Screenshots


🖥 System Info

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

🦄 Related requirements

⚙️ Engineering Details

As an API user, I want to know the Product(s) that belong to a given Bundle.

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can <why do you want to do this?>

Additional Details

Notional API Design:

GET /products/{bundle-identifier}/members/members

See NASA-PDS/registry#109 and NASA-PDS/registry#108 for how the registry ingests primary and secondary products.

Note: The logic here is a little funky:

  1. find all primary and secondary collections for the bundle
  2. find all primary and secondary products for those collections
    a. if product comes from secondary collection, ALL products from those collections should be considered secondary
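The resolution rule above, including step 2a, can be sketched with hypothetical in-memory maps from each bundle or collection to its "primary" and "secondary" members; in the service these lookups would be registry queries. The function name is hypothetical.

```python
def products_of_bundle(bundle, bundles, collections):
    """Resolve a bundle's products through its primary and secondary collections."""
    result = {}
    bundle_members = bundles.get(bundle, {})
    for c_status in ("primary", "secondary"):
        for collection in bundle_members.get(c_status, ()):
            collection_members = collections.get(collection, {})
            for p_status in ("primary", "secondary"):
                for product in collection_members.get(p_status, ()):
                    # Rule 2a: everything reached via a secondary collection is secondary.
                    status = "secondary" if c_status == "secondary" else p_status
                    if result.get(product) != "primary":  # a primary path wins
                        result[product] = status
    return result
```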

Acceptance Criteria

Given I have a bundle LID OR LIDVID
When I perform an API query by that LIDVID for its products
Then I expect the API to return the primary product(s) AND any secondary product(s) for that bundle

Sub-tasks

  • Update Swagger API
  • Implement in API Server

As a user, I want to get a key-value-pair JSON response

💪 Motivation

...so that I can query for a specific subset of fields from the result set.

📖 Additional Details

Tightly coupled in functionality with #445

⚖️ Acceptance Criteria

Given a registry + API installation with 1 or more data sets ingested
When I perform a query of the API with response format specified as application/kvp+json
Then I expect the records of the result set to be returned in valid JSON, where each document in the result set consists of "key": "value" pairs
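The ticket does not fix the exact shape of the response; a minimal sketch, assuming each record becomes one flat JSON object with dotted keys (the same style as ops:Label_File_Info.ops:file_ref) and that an optional list of fields selects the subset mentioned in the motivation. The function names are hypothetical.

```python
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level of dotted "key": value pairs."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def to_kvp_json(records, fields=None):
    """Render records as a JSON array of flat objects, optionally keeping only fields."""
    rows = []
    for record in records:
        flat = flatten(record)
        if fields:
            flat = {key: flat[key] for key in fields if key in flat}
        rows.append(flat)
    return json.dumps(rows)
```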

⚙️ Engineering Details

As a user, I want an end-point of each of the PDS4 IM classes of products

Motivation

...so that I can easily request only the subclass I am interested in (bundle, collection, observational_products, document, context).

This ticket is needed to implement the Search UI, see NASA-PDS/pds.nasa.gov-search-prototype#33

Additional Details

We will still have a generic end-point /products which returns any class of product.
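A sketch of how class-specific end-points could reuse the generic search by adding a product_class restriction to the user's q. The end-point names, the mapping, and the assumption that product_class is queryable with eq are all hypothetical; the values are the PDS4 product classes.

```python
# Hypothetical mapping from class-specific end-points to PDS4 product classes.
ENDPOINT_CLASSES = {
    "bundles": "Product_Bundle",
    "collections": "Product_Collection",
    "observationals": "Product_Observational",
    "documents": "Product_Document",
    "contexts": "Product_Context",
}


def class_query(endpoint, q=None):
    """Return the q expression for an end-point: the user's q plus a class filter."""
    if endpoint == "products":
        return q  # generic end-point: any class of product
    if endpoint not in ENDPOINT_CLASSES:
        raise ValueError(f"unknown end-point: {endpoint}")
    class_filter = f'product_class eq "{ENDPOINT_CLASSES[endpoint]}"'
    return f"({class_filter}) and ({q})" if q else class_filter
```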

Acceptance Criteria

Given
When I perform
Then I expect

Engineering Details

This ticket is linked to NASA-PDS/registry-api-service#34

As an API user, I want to specify whether I get the latest or all versions of a product

🌬 Motivation

PDS labels form references with collections and other labels in a bundle through two mechanisms: one is by specifying a full logical identifier + version identifier, or "lidvid", such as:

<Bundle_Member_Entry>
    <lidvid_reference>urn:nasa:pds:insight_documents:document_mission::2.0</lidvid_reference>
    <member_status>Primary</member_status>
</Bundle_Member_Entry>

Another is with a reference just to the logical identifier, or "lid"; for example:

<Bundle_Member_Entry>
    <lid_reference>urn:nasa:pds:ladee_mission:xml_schema_collection</lid_reference>
    <member_status>Primary</member_status>
</Bundle_Member_Entry>

In the former, a single referenced collection is indicated; in the latter, there's a choice: do we want all versions of the collection or just the latest? When searching for collections within a bundle, it would be great to be able to add a search parameter that gives that choice to the client.
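A client-side sketch of the choice between all versions and the latest one for a lid reference. The function names and the latest_only flag are hypothetical. Versions are compared as integer tuples because a plain string sort would rank "10.0" before "2.0".

```python
def parse_lidvid(lidvid):
    """Split 'urn:...::M.n' into (lid, (M, n)) so versions compare numerically."""
    lid, sep, vid = lidvid.partition("::")
    if not sep or not vid:
        raise ValueError(f"not a lidvid: {lidvid!r}")
    return lid, tuple(int(part) for part in vid.split("."))


def resolve_lid_reference(lid, registered_lidvids, latest_only=False):
    """Return all registered versions of lid (oldest first), or only the latest."""
    versions = []
    for lidvid in registered_lidvids:
        candidate_lid, version = parse_lidvid(lidvid)
        if candidate_lid == lid:
            versions.append((version, lidvid))
    versions.sort()
    matches = [lidvid for _, lidvid in versions]
    return matches[-1:] if latest_only else matches
```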

🕵️ Additional Details

The use case for this is the registry-based version of the PDS Deep Archive. When examining "lid-only" references, the PDS Deep Archive can generate two kinds of Archive Information Packages and Submission Information Packages:

  • One with all versions of the referenced collections
  • Or one with just the latest version

A command-line parameter, --include-latest-collection-only, turns on the second behavior.

Because the API service (and the Elasticsearch behind it) is in an ideal position to make this distinction, it should provide it as a feature. This would also reduce the data transferred over the wire from the API, which can be problematic for huge bundles (think insight_cameras).

Of course, if it already does support this, please close this issue and tell me how! 😊

Acceptance Criteria

See sub-tickets to this Epic for specific acceptance criteria addressing these two use cases

As an API user, I want to explicitly request the latest version of a product

💪 Motivation

...so that I can get the latest version of a product only, ignoring the superseded versions of the product.

📖 Additional Details

⚖️ Acceptance Criteria

Given a registry populated with multiple versions of a product
When I perform a query to the products/{lid}/latest endpoint by LID
Then I expect to have the latest version of the product returned, only

Given a registry populated with multiple versions of a collection
When I perform a query to the collections/{lid}/latest endpoint by LID
Then I expect to have the latest version of the collection returned, only

Given a registry populated with multiple versions of a bundle
When I perform a query to the bundles/{lid}/latest endpoint by LID
Then I expect to have the latest version of the bundle returned, only

⚙️ Engineering Details

API design discussed at 2021-07-28 API WG Meeting

As a user, I want to have singular urls when I should only expect a single element in the response

💪 Motivation

...so that the API's underlying data model is more readable and intuitive.

📖 Additional Details

The URLs:

  • /products/:lidvid/collections should be /products/:lidvid/collection
  • /products/:lidvid/bundles should be /products/:lidvid/bundle
  • /collections/:lidvid/bundles should be /collections/:lidvid/bundle

Whether the response should be an array or a single element is open for discussion.

If the response remains an array with a single element, we could keep the same plural URL end-points.

If a product or a collection can belong to multiple collections and bundles, we should keep the URLs as-is and cancel this ticket.

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

As a user, I want specific end points for products which are not collections or bundles

Motivation

Currently the end-point /products refers to any type of product, including collections and bundles.

So the end-points /products/{lidvid}/bundles and /products/{lidvid}/collections created for ticket #29 are confusing.

We need to find a better solution for that.

Additional Details

Acceptance Criteria

Given
When I perform
Then I expect

Engineering Details

As an operator, I want to have a wrapper script for starting up the API service

Motivation

...so that I can have a better user experience when trying to run the tool.

Additional Details

This helps the Registry API Service fall in line with all our other tools, which use a command-line script that finds the Java command and executes the software with java -jar. For example:

Acceptance Criteria

Given a pre-built registry-api-service application
When I perform an execution of a command-line script registry-api-service
Then I expect the registry-api-service JAR to be executed with the appropriate arguments given to the script via the command-line (as applicable)

Given a pre-built registry-api-service application
When I perform an execution of a Windows command-line batch file registry-api-service.bat
Then I expect the registry-api-service JAR to be executed with the appropriate arguments given to the script via the command-line (as applicable)

Engineering Details

  • develop Unix wrapper script
  • develop Windows batch wrapper script
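The lookup logic such a wrapper needs is sketched below in Python for illustration only; the ticket itself calls for a Unix shell script and a Windows batch file. Preferring $JAVA_HOME over PATH is an assumption borrowed from common Java launchers, and the function names are hypothetical.

```python
import os
import shutil


def find_java(env=None):
    """Return the java executable: $JAVA_HOME/bin/java if present, else java on PATH."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    found = shutil.which("java", path=env.get("PATH"))
    if found is None:
        raise FileNotFoundError("java not found: set JAVA_HOME or add java to PATH")
    return found


def build_command(jar_path, script_args, java="java"):
    """Assemble `java -jar <jar> <args...>`, forwarding the wrapper's own arguments."""
    return [java, "-jar", jar_path, *script_args]
```

A wrapper would then do the equivalent of subprocess.call(build_command(jar, sys.argv[1:], java=find_java())), with the JAR path taken from the installation layout.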

As an API user, I want to know the Collection(s) for a given Product.

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can know all collections a product is referenced by

Additional Details

Notional API Design:

GET /products/{lidvid}/collections

See NASA-PDS/registry#109 and NASA-PDS/registry#108 for how the registry ingests primary and secondary products.

Acceptance Criteria

Given I have a product LID OR LIDVID
When I perform an API query by that LIDVID for its collections
Then I expect the API to return the primary collection (there should be only 1) AND any secondary collections (could be many) the product belongs to

Sub-tasks

  • Update Swagger API
  • Implement in API Server

NOTE: This is now done by /products/{identifier}/member-of

As an API user, I want to know how long a request took to complete

💪 Motivation

...so that I can use this for metrics purposes and/or for a better understanding of how to improve my access to the API from a client.

📖 Additional Details

⚖️ Acceptance Criteria

Given a running API and registry with data ingested
When I perform a query to the API
Then I expect the response to include a took field with the time elapsed from when the query is pushed to the registry, through parsing by the API service, until right before the response is returned by the API service

⚙️ Engineering Details

Similar to ESDIS CMR response, something like:

{
  "hits" : 2,
  "took" : 11,
  "items" : [ {
...

Seems like this is tightly coupled with NASA-PDS/pds-api#68, so I figured I would add it to this sprint as well.
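A minimal sketch of populating took, assuming it is reported in milliseconds as in the CMR example. The function name is hypothetical; here hits is simply the number of returned items, whereas a paginated API would report the total match count from the registry.

```python
import time


def timed_search(search_fn, *args, **kwargs):
    """Run search_fn and wrap its items in a response carrying a 'took' field in ms."""
    start = time.perf_counter()  # query handed to the registry
    items = search_fn(*args, **kwargs)  # registry call plus parsing by the service
    took_ms = round((time.perf_counter() - start) * 1000)  # just before responding
    # hits is the number of returned items here; a paginated API would use the
    # registry's total match count instead.
    return {"hits": len(items), "took": took_ms, "items": items}
```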

As a user, I want to receive an XML response that contains the PDS4 label metadata in XML format (application/vnd.nasa.pds.pds4+xml)

💪 Motivation

...so that I can get the original label from the API

📖 Additional Details

We need to reimplement the 'application/pds4+xml' MVC implementation on top of the PDS4Product object (instead of on top of the default Product object, as before)

⚖️ Acceptance Criteria

Given a valid lidvid in the registry
When I perform curl -X GET http://{base_url}/products/{lidvid} --header 'Accept: application/pds+xml'
Then I expect to get the original PDS4 XML label for the lidvid

When I perform curl -X GET http://{base_url}/products --header 'Accept: application/pds+xml'
Then I expect to get the search result with original PDS4 XML labels

⚙️ Engineering Details

As an API user, I want to search using URL parameters

Motivation

...so that I can provide simpler search criteria, and do not need to know the complexity of the API query syntax

Additional Details

Acceptance Criteria

Given a deployed API with data ingested
When I perform a query against an API endpoint with <endpoint>?lidvid eq *my_lid*
Then I expect the API to query the Registry for a lidvid that matches *my_lid*.
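A sketch of translating plain URL parameters into the existing query syntax, so that ?lidvid=... becomes lidvid eq "...". The function names are hypothetical, and rejecting values that contain double quotes is a simplification rather than the grammar's real escaping rules.

```python
from urllib.parse import parse_qsl

RESERVED = {"q", "fields", "start", "limit"}  # parameters the API already defines


def params_to_q(params):
    """Turn simple URL parameters into q syntax: each pair becomes field eq "value"."""
    clauses = []
    for field, value in params.items():
        if field in RESERVED:
            continue
        if '"' in value:
            raise ValueError(f"double quotes are not supported in {field}")
        clauses.append(f'({field} eq "{value}")')
    return " and ".join(clauses)


def query_string_to_q(query_string):
    """Decode a raw query string such as 'lidvid=urn%3A...' and translate it."""
    return params_to_q(dict(parse_qsl(query_string)))
```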

Engineering Details

As an API user, I want a CSV response format option

💪 Motivation

...so that I can query the API and get something like a PDS3 index table, e.g. https://pds-imaging.jpl.nasa.gov/data/cassini/cassini_orbiter/coiss_2025/index/index.tab which has a PDS3 label https://pds-imaging.jpl.nasa.gov/data/cassini/cassini_orbiter/coiss_2025/index/index.lbl

📖 Additional Details

⚖️ Acceptance Criteria

Given a registry + API installation with 1 or more data sets ingested
When I perform a query of the API with response format specified as application/csv (or text/csv?)
Then I expect a valid CSV response to be returned
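A minimal sketch of a CSV rendering with Python's csv module, one column per requested field in the spirit of a PDS3 index table. The function name is hypothetical, and joining multi-valued properties with ";" is an assumption, not a decided convention.

```python
import csv
import io


def to_csv(records, fields):
    """Render records as CSV text with a header row and one column per field."""
    buffer = io.StringIO()
    # extrasaction="ignore" skips properties that were not requested;
    # missing ones are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Multi-valued properties are joined so each cell holds a single value.
        row = {k: ";".join(map(str, v)) if isinstance(v, list) else v
               for k, v in record.items()}
        writer.writerow(row)
    return buffer.getvalue()
```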

⚙️ Engineering Details

Multiple values for `ops:Label_File_Info.ops:file_ref` and `…:md5_checksum`

When executing the /bundles/{lidvid} API, the properties key in the JSON result contains two important values, ops:Label_File_Info.ops:file_ref and ops:Label_File_Info.ops:md5_checksum.

For example, for /bundles/urn:nasa:pds:insight_documents::2.0, once upon a time we got these properties:

  • ops:Label_File_Info.ops:file_ref = "https://pds-gamma.jpl.nasa.gov/data/pds4/test-data/registry/urn-nasa-pds-insight_documents/bundle_insight_documents.xml"
  • ops:Label_File_Info.ops:md5_checksum = "a366a14158f5a7f0dc7a1b4c06c003ae"

However, now on pds-gamma (as of 2021-06-29) these two keys have changed from strings to lists:

  • ops:Label_File_Info.ops:file_ref = ["https://pds-gamma.jpl.nasa.gov/data/pds4/test-data/registry/urn-nasa-pds-insight_documents/bundle_insight_documents.xml"]
  • ops:Label_File_Info.ops:md5_checksum = ["a366a14158f5a7f0dc7a1b4c06c003ae"]

Multiple values for the file_ref and md5_checksum don't make sense when describing a single label, but if this is the new expected correct behavior, please go ahead and triage-close this ticket immediately (i.e., wontfix resolution).

However, if this is a regression, please leave the ticket open for assignment, estimation, milestoning, etc.

To reproduce:

  1. Run curl --request GET --header 'Accept: application/json' 'https://pds-gamma.jpl.nasa.gov/api/bundles/urn%3Anasa%3Apds%3Ainsight_documents%3A%3A2.0' | json_pp
  2. Look for ops:Label_File_Info.ops:md5_checksum and ops:Label_File_Info.ops:file_ref in the output.
  3. Observe that ["…"] now appear where "…" used to be.

This behavior is passed through the PDS API Client and affects the way the PDS Deep Archive works.
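If the list form turns out to be the intended behavior, a client could normalize these properties itself. A minimal sketch; unwrap_singletons is a hypothetical helper, not part of the PDS API Client.

```python
def unwrap_singletons(properties, keys=None):
    """Return a copy where one-element lists become plain values.

    keys optionally limits which properties are touched; longer lists are kept.
    """
    result = dict(properties)  # do not mutate the caller's dict
    for key, value in properties.items():
        if keys is not None and key not in keys:
            continue
        if isinstance(value, list) and len(value) == 1:
            result[key] = value[0]
    return result
```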

As a developer, I never want the label blob to be returned

Motivation

...so that I reduce the amount of metadata/data returned by the API. The API should provide the label through content negotiation and a dedicated response format, not through the returned metadata.

Additional Details

Acceptance Criteria

Given a deployed API with label blobs ingested
When I perform a query for a data product
Then I expect the associated product metadata to be returned, but NOT the blob
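A sketch of keeping the blob out of responses, either by asking Elasticsearch/OpenSearch to exclude it through _source filtering or by dropping it from already-fetched documents. The blob field names below are assumptions, not confirmed registry field names.

```python
# Assumed names of the fields holding the label blob(s); not confirmed.
BLOB_FIELDS = ["ops:Label_File_Info.ops:blob", "ops:Label_File_Info.ops:json_blob"]


def search_body(query, size=10):
    """Search body asking the engine not to return the blob fields at all."""
    return {"size": size, "query": query, "_source": {"excludes": BLOB_FIELDS}}


def strip_blobs(doc):
    """Fallback for documents already fetched: drop blob fields before responding."""
    return {key: value for key, value in doc.items() if key not in BLOB_FIELDS}
```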

Engineering Details

parsing right hand side of operator does not behave as desired

🐛 Describe the bug

The search string a eq b does not parse while a eq "b" does. The expectation is that both parse to the same result.

📜 To Reproduce

See TestParsing.isParsable(), a unit test for this module.

🕵️ Expected behavior

The test should pass.
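The expected behavior can be illustrated outside ANTLR with a regular expression that accepts either a double-quoted string or a bare word on the right-hand side and normalizes both to the same value. This is a simplified, hypothetical grammar (the operator list in particular is an assumption), not the project's lexer.

```python
import re

# Simplified <field> <op> <value>; value is a double-quoted string or a bare word.
COMPARISON = re.compile(
    r'^\s*(?P<field>[\w:.]+)\s+(?P<op>eq|ne|gt|ge|lt|le)\s+'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"()]+))\s*$'
)


def parse_comparison(text):
    """Return (field, op, value), with quoted and bare values normalized alike."""
    match = COMPARISON.match(text)
    if match is None:
        raise ValueError(f"cannot parse: {text!r}")
    value = match.group("quoted")
    if value is None:
        value = match.group("bare")
    return match.group("field"), match.group("op"), value
```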


🦄 Applicable requirements

As an API user, I want to get only the fields I explicitly requested

Motivation

...so that I only get the requested properties and nothing else. If I am querying for something specific, then I know what I want and don't want extra default properties.

Additional Details

As a PDS-knowledgeable user, I want an extra API end-point which does not contain default attributes that will impact the performance of my request.
As a newbie user I still want to have a default representation of the PDS products in a simple intuitive format.

Acceptance Criteria

Given 2 PDS4 fields
When I perform an API request
Then I expect to get only these 2 fields
And I expect for each product returned, at least one of the 2 fields has a valid value

Engineering Details

This can be done by implementing an extra end-point

//properties
(where is products, collections or bundles)

where parameters:
q,fields,start,limit would apply.

This end-point would only return the content of the current properties object in the current json response from /

The end-point could also be created
///properties

it would only take 'fields' as a parameter.
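A sketch of what such a properties-only response could contain, reading the acceptance criteria as: return only the requested fields, and drop products where none of them has a value. The function name and the notion of an empty value (None, "" or []) are assumptions.

```python
def properties_only(products, fields):
    """Return only the requested properties, skipping products with no value for any."""
    result = []
    for product in products:
        props = product.get("properties", {})
        selected = {f: props[f] for f in fields if f in props}
        # Keep the product only if at least one requested field has a real value.
        if any(value not in (None, "", []) for value in selected.values()):
            result.append(selected)
    return result
```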

As a developer, I want to update in a single place the list of supported MIME types
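A minimal sketch of keeping the list in one place: a single table of MIME types from which both the advertised formats and content negotiation are derived. The table is assembled from the MIME types mentioned in these issues and is not necessarily the service's actual set; marking csv/kvp as unavailable on single-entity end-points follows the bug report about those formats. Accept-header q-values are ignored for brevity, and the function names are hypothetical.

```python
# Single source of truth: MIME type -> usable on end-points returning one entity?
SUPPORTED_FORMATS = {
    "application/json": True,
    "application/vnd.nasa.pds.pds4+json": True,
    "application/vnd.nasa.pds.pds4+xml": True,
    "application/kvp+json": False,
    "application/csv": False,
    "text/csv": False,
}
DEFAULT_FORMAT = "application/json"


def formats_for(single_entity):
    """MIME types to advertise (e.g. in the OpenAPI spec) for an end-point."""
    return [mime for mime, ok in SUPPORTED_FORMATS.items() if ok or not single_entity]


def negotiate(accept_header, single_entity=False):
    """Return the first supported type listed in Accept (q-values are ignored)."""
    allowed = formats_for(single_entity)
    for part in (accept_header or "*/*").split(","):
        mime = part.split(";", 1)[0].strip().lower()
        if mime == "*/*":
            return DEFAULT_FORMAT
        if mime in allowed:
            return mime
    raise ValueError(f"406 Not Acceptable: {accept_header}")
```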

As an API user, I want to know the unique values for a specific API field.

Motivation

...so that I can know what all possible values are to enhance my ability to search the archive for various collections / products.

This ticket is needed to implement the Search UI, see NASA-PDS/pds.nasa.gov-search-prototype#33

Additional Details

From OpenPlanetary Lunch Talk:

do you have anything similar to DISTINCT? I find that super useful, e.g. "tell me all the unique values of instrument type" or something like that

Acceptance Criteria

Given
When I perform
Then I expect

Engineering Details

That implementation should be part of a faceting feature, which will be useful for the search UI as well.
In this case, in addition to the unique values we will also get the number of objects having the given value.
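Distinct values with counts map naturally onto an Elasticsearch/OpenSearch terms aggregation. A minimal sketch; the function names and example field are hypothetical, the field must be indexed as a keyword (non-analyzed) field, and size caps how many distinct values are returned.

```python
def terms_facet_body(field, size=50, query=None):
    """Search body returning no hits, only the distinct values of field with counts."""
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": query or {"match_all": {}},
        "aggs": {"distinct_values": {"terms": {"field": field, "size": size}}},
    }


def facet_counts(response):
    """Turn the aggregation buckets into {value: number of matching products}."""
    buckets = response["aggregations"]["distinct_values"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```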

As a user, I want to have complete default fields (for now at least)

Motivation

...so that the fields parameter only impacts the properties content, and the response stays readable to the user (the mapping between PDS4 syntax and the attributes of the JSON API object is not explicit to the user).

This also aims at enabling the returned object to validate against the Swagger definition (checked by the Python API client).

Additional Details

Acceptance Criteria

Given
When I perform the request curl --location --request GET 'http://localhost:8080/products?limit=10&fields=title,ops:Label_File_Info.ops:md5_checksum'
Then I expect a complete result in the default fields, and a properties object containing only title and ops:Label_File_Info.ops:md5_checksum

Engineering Details

As an API user, I want an average query response time of 1 second for q=* queries

Motivation

...so that I can ensure usability of the API through rapid responses to queries

Additional Details

1 second is somewhat arbitrary but loosely taken from https://www.nngroup.com/articles/response-times-3-important-limits/
Other details for the requirement:

  • The registry should contain a minimum of 1 million products for sufficient testing
  • Time is measured from when the query is received by the API service

Acceptance Criteria

Given a deployed API and registry with 1 million+ products ingested
When I perform a request against any endpoint with a query of q=*
Then I expect an average response time of 1 second, regardless of the response type (e.g. pds4+json, json, etc.)

Note: per the performance note, this should be tested against all endpoints and all response formats.

Engineering Details

Once #13 is implemented, this may just be a simple regression test we add to the repo. Alternatively, we can talk to folks on the team to find out whether there are any known long-running queries that may push this limit. Right now, I can't think of any.
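A minimal sketch of such a regression check, assuming a local deployment at http://localhost:8080 (a placeholder). `ResponseTimeCheck` is a hypothetical name; it averages wall-clock time over several calls using java.net.http.HttpClient. Note that a client-side timer also includes network transfer, so it slightly overestimates the time measured from when the API service receives the query.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical regression check: average wall-clock time of repeated requests.
final class ResponseTimeCheck {

    private ResponseTimeCheck() {}

    /** Runs the request `iterations` times and returns the mean duration. */
    static Duration averageDuration(Runnable request, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            request.run();
            totalNanos += System.nanoTime() - start;
        }
        return Duration.ofNanos(totalNanos / iterations);
    }

    static boolean meetsTarget(Duration average, Duration target) {
        return average.compareTo(target) <= 0;
    }

    public static void main(String[] args) {
        // Placeholder base URL; pass the real deployment as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/products?q=*"))
                .header("Accept", "application/json")
                .GET()
                .build();
        Duration average = averageDuration(() -> {
            try {
                client.send(request, HttpResponse.BodyHandlers.discarding());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }, 20);
        System.out.println("average " + average.toMillis() + " ms, meets 1 s target: "
                + meetsTarget(average, Duration.ofSeconds(1)));
    }
}
```

To cover every response format, the same loop could be repeated with each supported Accept value.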

As a user, I want to see the version of the API specification in the URL of the service

💪 Motivation

...so that I can access various versions of the API if available.

📖 Additional Details

I should also be able to access the latest API version.

A home page should list all available versions, as well as all the submodules of the API specification (e.g. registry, doi).

⚖️ Acceptance Criteria

Given a maintained version X.Y.Z of the API specification
When I perform a request to URL http://server/api/registry/X.Y.Z/ (TO BE REVIEWED)
Then I expect to get the SwaggerHub UI for this version of the API

Given
When I perform a request to URL http://server/api/
Then I expect to get the list of available API modules and versions
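A minimal sketch of how a request path following the proposed (still to be reviewed) http://server/api/registry/X.Y.Z/ pattern could be split into module and version. The `latest` alias, the module-name pattern and the `VersionedPath` name are assumptions for illustration, not a settled design.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for paths shaped like /api/{module}/{X.Y.Z or latest}/...
final class VersionedPath {

    record ApiTarget(String module, String version, String rest) {}

    // Group 1: module, group 2: version X.Y.Z or "latest", group 3: optional remaining path.
    private static final Pattern PATH = Pattern.compile(
            "^/api/([a-z][a-z0-9-]*)/(\\d+\\.\\d+\\.\\d+|latest)(/.*)?$");

    private VersionedPath() {}

    static Optional<ApiTarget> parse(String path) {
        Matcher m = PATH.matcher(path);
        if (!m.matches()) {
            return Optional.empty();  // e.g. /api/ itself, which would list modules and versions
        }
        String rest = m.group(3) == null ? "/" : m.group(3);
        return Optional.of(new ApiTarget(m.group(1), m.group(2), rest));
    }

    public static void main(String[] args) {
        parse("/api/registry/1.0.0/products").ifPresent(t ->
                System.out.println(t.module() + " " + t.version() + " " + t.rest()));
    }
}
```

A path that does not match (such as /api/) would fall through to the home page listing all modules and versions.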

⚙️ Engineering Details

AWS cost analysis tag is not 'Alpha' but instead 'Alfa'

🐛 Describe the bug

The Terraform scripts use an AWS tag 'Alpha' as one of the cost tags; however, the correct spelling is 'Alfa'. This makes it difficult (but not impossible) to identify nodes in CUR queries, since the Alfa tag is used to store the node name.

📜 To Reproduce

  1. Check the Terraform scripts: note the references to the 'Alpha' tag, which is assigned the node name abbreviation.
  2. As the cost_analysis role, open the Athena query editor: https://console.aws.amazon.com/athena/home?region=us-east-1#/query-editor
  3. Examine the schema for the default_445837347542 table (which stores the CUR information) and note the column 'resource_tags_user_alfa' (which is assigned the value of the 'Alfa' tag).
  4. Optionally, query this column's values and note that they are empty.

🕵️ Expected behavior

The resource_tags_user_alfa column should contain the node name abbreviation.

📚 Version of Software Used

0.4.1 and 0.5.0

As an API user, I want to get an XML response

Motivation

...so that the API can support multiple response formats for the community.

Details

Note: The content should match the application/json structure.

This can be done by applying the default Spring Boot content negotiation instead of the custom content negotiation, which returns the original PDS4 label.

The PDS4 label response could be kept for the application/pds4+xml media type.

Acceptance criteria

Given any API request
When I perform it with the Accept header set to 'application/xml'
Then I expect to get the same attributes and the same structure as the same request run with the Accept header set to application/json
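A minimal sketch of the format-selection step, which also keeps the list of supported media types in a single place. `ResponseFormat` is a hypothetical enum, JSON as the default is an assumption, and q-values in the Accept header are ignored for simplicity. In the Spring Boot service this choice would normally be left to the framework's content negotiation; the sketch only shows the selection logic.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical single list of supported response media types, with a simple Accept-header lookup.
enum ResponseFormat {
    JSON("application/json"),
    XML("application/xml"),
    PDS4_JSON("application/pds4+json"),
    PDS4_XML("application/pds4+xml");

    private final String mediaType;

    ResponseFormat(String mediaType) {
        this.mediaType = mediaType;
    }

    String mediaType() {
        return mediaType;
    }

    /**
     * Returns the first supported type listed in the Accept header. Parameters such as
     * ";q=0.9" are stripped and NOT used for ranking (a simplification).
     */
    static Optional<ResponseFormat> fromAcceptHeader(String accept) {
        if (accept == null || accept.isBlank()) {
            return Optional.of(JSON);  // assumed default
        }
        for (String part : accept.split(",")) {
            String type = part.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (type.equals("*/*")) {
                return Optional.of(JSON);
            }
            for (ResponseFormat format : values()) {
                if (format.mediaType.equals(type)) {
                    return Optional.of(format);
                }
            }
        }
        return Optional.empty();  // caller could answer 406 Not Acceptable
    }
}
```

Adding a new format then means adding one enum constant rather than editing several controllers.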

As a user, I want to know why my query syntax is invalid

Motivation

...so that I can update my query (q param) to make it work

Additional Details

Acceptance Criteria

Given deployed API server
When I perform request q=ops:Data_File_Info.ops:file_size gte 138172
Then I expect an explicit error message like "Unknown operator gte", status 400

To be completed

Engineering Details

I believe this can be added by putting messages in the ParseCancellationException and propagating the exception all the way up through the API controllers. This actually requires a bit of research to understand how a Spring Boot controller method can return multiple types (products or error).
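One possible shape, sketched under assumptions: the check returns either a parsed clause or an error carrying the HTTP status and message, using a Java 17 sealed interface. `QueryCheck`, its operator set (eq, ne, gt, ge, lt, le, like) and the single-comparison grammar are hypothetical simplifications; the real grammar lives in the ANTLR-based lexer. In Spring, a common alternative is to let the exception propagate and map it to a 400 response in an @ExceptionHandler method.

```java
import java.util.Set;

// Hypothetical query check returning either a parsed clause or an explicit error.
final class QueryCheck {

    // Assumed operator set for illustration only; the real grammar is in the lexer module.
    private static final Set<String> OPERATORS = Set.of("eq", "ne", "gt", "ge", "lt", "le", "like");

    sealed interface Result permits Clause, QueryError {}

    record Clause(String field, String operator, String value) implements Result {}

    record QueryError(int status, String message) implements Result {}

    private QueryCheck() {}

    /** Checks a single "field operator value" comparison. */
    static Result check(String q) {
        String[] tokens = q.trim().split("\\s+", 3);
        if (tokens.length != 3) {
            return new QueryError(400, "Expected 'field operator value' but got: " + q);
        }
        if (!OPERATORS.contains(tokens[1])) {
            return new QueryError(400, "Unknown operator " + tokens[1]);
        }
        return new Clause(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        Result result = check("ops:Data_File_Info.ops:file_size gte 138172");
        if (result instanceof QueryError error) {
            System.out.println(error.status() + " " + error.message());  // prints 400 Unknown operator gte
        }
    }
}
```

Because the interface is sealed, a controller switching on the result can handle both cases explicitly instead of returning an untyped object.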

As an API user, I want to get the latest version of a product, by default

Motivation

...so that I can get the latest version of a product only, ignoring the superseded versions of the product.

Additional Details

Acceptance Criteria

Given a registry populated with multiple versions of a product
When I perform a query to the products/{lid} endpoint by LID
Then I expect to have the latest version of the product returned, only

Given a registry populated with multiple versions of a collection
When I perform a query to the collections/{lid} endpoint by LID or collections/ endpoint
Then I expect to have the latest version of the collection(s) returned, only

Given a registry populated with multiple versions of a bundle
When I perform a query to the bundles/{lid} endpoint by LID or bundles/ endpoint
Then I expect to have the latest version of the bundle(s) returned, only

Engineering Details

API design discussed at 2021-07-28 API WG Meeting
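A minimal sketch of the "latest version" rule, assuming LIDVIDs of the form lid::major.minor. Versions are compared numerically, so 1.10 is newer than 1.9 (a plain string comparison would get this wrong). `LatestVersion` is a hypothetical name; in the service this selection would more likely be expressed in the search query than done in memory.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: keeps the latest version of each product, assuming "lid::major.minor" LIDVIDs.
final class LatestVersion {

    private LatestVersion() {}

    private static int separator(String lidvid) {
        int i = lidvid.indexOf("::");
        if (i < 0) {
            throw new IllegalArgumentException("Not a LIDVID: " + lidvid);
        }
        return i;
    }

    static String lidOf(String lidvid) {
        return lidvid.substring(0, separator(lidvid));
    }

    /** Parses "major.minor" into two ints so versions compare numerically. */
    static int[] versionOf(String lidvid) {
        String[] parts = lidvid.substring(separator(lidvid) + 2).split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    static final Comparator<String> BY_VERSION =
            Comparator.<String>comparingInt(lv -> versionOf(lv)[0])
                    .thenComparingInt(lv -> versionOf(lv)[1]);

    /** Maps each LID to its highest-versioned LIDVID. */
    static Map<String, String> latestPerLid(Collection<String> lidvids) {
        return lidvids.stream().collect(Collectors.toMap(
                LatestVersion::lidOf,
                Function.identity(),
                (a, b) -> BY_VERSION.compare(a, b) >= 0 ? a : b,  // keep the newer of two versions
                TreeMap::new));
    }
}
```

For example, given urn:nasa:pds:a::1.0, urn:nasa:pds:a::1.10 and urn:nasa:pds:a::1.9, only urn:nasa:pds:a::1.10 is kept for urn:nasa:pds:a.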

implement the start/limit efficiently

All end-points need the pagination implementation, except the URN resolvers.

This needs to be implemented efficiently, so that we don't go through the same items again when a sequence of requests is:

collections?start=0&limit=10
collections?start=10&limit=10
...

One way of doing that is to use the Java Stream.skip() method.

Acceptance criteria:
For requests with numerous results, the processing time is the same for every page (i.e. it does not increase as the start number grows). This can be tested on the demo deployment https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html of the reference implementation (https://github.com/NASA-PDS/registry-api-service).

Request examples:
curl --location --request GET 'https://pds-gamma.jpl.nasa.gov/api/products?start=1&limit=500'

curl --location --request GET 'https://pds-gamma.jpl.nasa.gov/api/collections/urn:nasa:pds:orex.ovirs:data_calibrated::10.0/products?start=100&limit=100'

Replace start and limit with your values.
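A minimal sketch of the Stream.skip() approach suggested above. `Pager` is a hypothetical name. One caveat: skip() is lazy, but it still pulls and discards the first `start` elements, so on its own it does not make the cost of a page independent of start. To meet that criterion, the offset would likely have to be pushed down into the search request (e.g. from/size, or search_after for deep pages in Elasticsearch/OpenSearch).

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical pager: returns items [start, start + limit) of a lazily produced result stream.
final class Pager {

    private Pager() {}

    static <T> List<T> page(Stream<T> results, long start, long limit) {
        if (start < 0 || limit < 0) {
            throw new IllegalArgumentException("start and limit must be >= 0");
        }
        // skip() still consumes (and discards) the first `start` elements.
        return results.skip(start).limit(limit).toList();
    }

    public static void main(String[] args) {
        // With a zero-based start, start=10 and limit=10 return the second page of 10 items.
        System.out.println(page(IntStream.range(0, 30).boxed(), 10, 10));
        // prints [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
}
```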

parsing of string does not succeed

Describe the bug
Given the simple URL from pds-api 54, /products?q="lid eq *text*", and its expectation (acceptance criteria), the registry-api-service went haywire and blew up with an out-of-memory error. I tracked it down to this module. It turns out there is a simple infinite loop with ANTLR 4.9.2 and earlier. It is most likely a fault of the language or of how it is being parsed: the ANTLR FAQ suggests that not terminating the stream is the biggest problem, and the infinite loop is caused by never finding the end of the stream even though all data has been read.

To Reproduce
Run/Debug in Eclipse api.pds.nasa.gov.api_search_query_lexer.TestParsing.

Expected behavior
TestParsing should run to completion and successfully parse the lid eq *text* string without running out of memory.

Version of Software Used
ANTLR 4.3, 4.7, 4.9.

As a user, I want to query for all versions of a product

Motivation

...so that I can see the provenance history of a particular product

Additional Details

Acceptance Criteria

Given a registry populated with multiple versions of a product
When I perform a query to the products/{lid} endpoint by LID
Then I expect to have the latest version of the product returned, only

Given a registry populated with multiple versions of a collection
When I perform a query to the collections/{lid} endpoint by LID
Then I expect to have the latest version of the collection returned, only

Given a registry populated with multiple versions of a bundle
When I perform a query to the bundles/{lid} endpoint by LID
Then I expect to have the latest version of the bundle returned, only

Engineering Details

  • design the API (swagger) for that
  • develop the new feature

As an API manager, I want to restrict access to registered products that should not be publicly accessible

💪 Motivation

...so that I can ingest data that is not yet operational, and disable access to those products from the API

📖 Additional Details

A couple different use cases that may require separate user stories:

  1. Disable access to the "private" products' metadata (initial solution could be based upon something like archive_status to restrict what products are returned)
  2. Enable access to the "private" products' metadata for a subset of users (requires auth capability)
  3. Enable access to all "private" products' metadata, but disable download (probably something that should be handled server-side, separate from the API and also a separate requirement for eventual transport service)

⚖️ Acceptance Criteria

Given a registry populated with products with archive_status = staged and archive_status = archived (see the documentation at https://nasa-pds.github.io/registry)
When I perform a query of the products/ API endpoint
Then I expect only the products where archive_status == archived to be returned

⚙️ Engineering Details
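A minimal sketch of the filtering rule from the acceptance criteria above, with products simplified to flat string maps. `ArchiveStatusFilter` is a hypothetical name; in practice the same condition would more likely be added as a filter clause of the search query, so that non-archived products never leave the registry.

```java
import java.util.List;
import java.util.Map;

// Hypothetical filter: only products whose archive_status is "archived" are publicly returned.
final class ArchiveStatusFilter {

    static final String ARCHIVE_STATUS = "archive_status";
    static final String ARCHIVED = "archived";

    private ArchiveStatusFilter() {}

    static boolean isPublic(Map<String, String> product) {
        // Products with a missing archive_status are treated as not public (an assumption).
        return ARCHIVED.equals(product.get(ARCHIVE_STATUS));
    }

    static List<Map<String, String>> publicOnly(List<Map<String, String>> products) {
        return products.stream().filter(ArchiveStatusFilter::isPublic).toList();
    }
}
```

Treating a missing status as private is the conservative choice: a product is hidden unless it is explicitly archived.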

As an API user, I want to perform a search using wildcards

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can more easily search for a product without knowing the entire LID

Additional Details

As stated in the PDS API Spec:

The PDS Search API also supports wild cards ? and *. A search with no q parameter specified will default to q=* (search for all possible records).

The like operator must be used for wildcard searches.

See specification https://docs.google.com/document/d/16d0MVh48bFLvWsa5-B_Hy-cby1rGWdnNojWOJpUcOvA/edit#heading=h.dihrtzltiwag

Acceptance Criteria

Given there is a product with a LID like urn:nasa:pds:my_bundle:my_collection:my_product
When I perform a query against the products endpoint like GET /products?q=lid like "*my_bundle*"
Then I expect the API results to include, at minimum, the product metadata for urn:nasa:pds:my_bundle:my_collection:my_product (and any other products that contain my_bundle in their LID)

Given there is a product with a title like "this is my product foobar"
When I perform a query against the products endpoint like GET /products?q=title like "*foobar*"
Then I expect the API results to include, at minimum, the product metadata for the product with the title "this is my product foobar" (and any other products that contain foobar in their title)
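A minimal sketch of the matching semantics the criteria above expect: * matches any run of characters, ? matches exactly one character, and everything else is literal. `Wildcard` is a hypothetical name; Elasticsearch/OpenSearch also provide a native wildcard query with the same two metacharacters, which is what a search-backed implementation would more likely use.

```java
import java.util.regex.Pattern;

// Hypothetical translation of a "like" wildcard pattern into a Java regex.
final class Wildcard {

    private Wildcard() {}

    /** '*' becomes ".*", '?' becomes "."; literal runs are quoted so '.' or ':' are not regex syntax. */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True if the whole value matches the wildcard pattern. */
    static boolean like(String value, String wildcard) {
        return toPattern(wildcard).matcher(value).matches();
    }
}
```

For instance, like("urn:nasa:pds:my_bundle:my_collection:my_product", "*my_bundle*") is true, matching the first criterion.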

As a user, I want the /products end point to work for any class of products

Motivation

...so that I am not confused, since currently it always returns empty results:

  • for /products/{lidvid}/collections when lidvid is a bundle
  • or for /products/{lidvid}/bundles when lidvid is a collection

Additional Details

Acceptance Criteria

Given a bundle lidvid
When I perform {{baseUrl}}/products/{lidvid}/collections
Then I expect to get the collections of this bundle as /bundles/{lidvid}/collections does

Given a collection lidvid
When I perform {{baseUrl}}/products/{lidvid}/bundles
Then I expect to get the bundle of this collection as /collections/{lidvid}/bundles does

Engineering Details

This question is the reason why I initially created the ticket NASA-PDS/registry-api-service#32

As a client developer, I want to facet on 1 or more fields in the registry

💪 Motivation

...so that I can enable faceting for my client across one or more fields to help guide a user towards their expected search results.

📖 Additional Details

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Tightly coupled to #284

As a user, when I request specific fields I want to get records which have at least one of these fields

When the parameter fields=... is used, the API has to send back products which have at least one of the requested fields.

The criterion can be added to the Elasticsearch request.

Acceptance criteria:
If a request is made to the API with the fields= parameter, none of the results may contain only empty values for these fields (i.e. at least one of the fields has a valid value), to be validated on the reference implementation https://github.com/NASA-PDS/registry-api-service and the demo deployment https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html
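A minimal sketch of that criterion as an Elasticsearch/OpenSearch bool query: one exists clause per requested field under should, with minimum_should_match set to 1. `AnyFieldExistsQuery` is a hypothetical name, and field names are assumed to already be in the form stored in the index (the mapping from API field names is out of scope here).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical builder for a "has at least one of these fields" query clause.
final class AnyFieldExistsQuery {

    private AnyFieldExistsQuery() {}

    /** Minimal JSON string escaping for field names (backslashes and quotes only). */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"bool":{"should":[{"exists":...},...],"minimum_should_match":1}}. */
    static String build(List<String> fields) {
        String should = fields.stream()
                .map(f -> "{\"exists\":{\"field\":" + quote(f) + "}}")
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"should\":[" + should + "],\"minimum_should_match\":1}}";
    }
}
```

This clause would then be combined with the existing q filter, so that results satisfy both the query and the "at least one field" rule.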

As an API user, I want to know the Bundle for a given Collection.

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can know the bundle(s) this collection belongs to.

Additional Details

See NASA-PDS/registry#109 and NASA-PDS/registry#108 for how the registry ingests primary and secondary products.

Acceptance Criteria

Given I have a Collection LID OR LIDVID
When I perform an API query by that LIDVID for its parent bundle(s)
Then I expect the API to return the primary bundle (there should be only 1) AND any secondary bundle(s) (could be many) the Collection belongs to

/products/{collection-identifier}/member-of
