nasa-pds / registry-harvest-service

DEPRECATED. Server application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/NASA-PDS/registry). Different from the standalone Harvest Tool, this goes along with Crawler and Harvest Client to enable performant ingestion.

License: Other

Java 86.10% Shell 8.93% Batchfile 1.03% Dockerfile 3.94%
data-loader etl java nasa pds pds4

registry-harvest-service's Introduction

Harvest Web Service

Server application providing the functionality for capturing and indexing product metadata into the PDS Registry system. This application is different from the standalone Harvest Tool (see https://github.com/nasa-pds/harvest).

It must be used together with other components, such as the RabbitMQ message broker, Crawler Server, and Harvest Client, to enable performant ingestion of large data sets. A description of the full application is available at https://nasa-pds.github.io/registry-harvest-service/. Facilities to launch the full application (including this component) are provided in the registry repository (see https://github.com/NASA-PDS/registry/tree/main/docker).

📀 Installation

This is a Java application. You need a Java 11 JDK and Maven to build it. To create a binary distribution (ZIP and TGZ archives), run the following Maven command:

mvn package

Binary archives (such as "registry-harvest-service-1.0.0-SNAPSHOT-bin.zip") will be created in the "target" directory.

Prebuilt binaries are available at https://github.com/NASA-PDS/registry-harvest-service/releases

To install, extract a binary archive into a folder, such as "/opt/harvest".
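
As a minimal sketch of the install step, the shell function below extracts either archive type into a target folder. The function name `install_archive` is hypothetical and not part of the distribution, and the sketch assumes `unzip` and `tar` are available on the PATH.

```shell
# Hypothetical helper (not shipped with the project): extract a binary
# archive into an install directory, choosing the tool by file extension.
install_archive() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.zip)          unzip -q "$archive" -d "$dest" ;;
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *)              echo "unsupported archive: $archive" >&2; return 1 ;;
  esac
}

# Example (paths are illustrative):
#   install_archive target/registry-harvest-service-1.0.0-SNAPSHOT-bin.zip /opt/harvest
```

The function returns a non-zero status for unrecognized extensions, so it can be chained with `&&` in an install script.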

💁‍♀️ Usage

  • Go to <install_directory>/bin and run "harvest-server" without any parameters to see usage information.

  • This project includes a documentation web application (Maven site). The documentation provides a PDS Registry architecture overview along with installation and operation instructions.

To build and run the local documentation web application, execute the following Maven command:

mvn site:run

Then open http://localhost:8080 in your web browser.

👥 Contributing

Within the NASA Planetary Data System, we value the health of our community as much as the code. Towards that end, we ask that you read and practice what's described in these documents:

  • Our contributor's guide delineates the kinds of contributions we accept.
  • Our code of conduct outlines the standards of behavior we practice and expect of everyone who participates with our software.

🔢 Versioning

We use the SemVer philosophy for versioning this software. Or not! Update this as you see fit.

Manual Publication

NOTE: Requires using PDS Maven Parent POM to ensure release profile is set.

Update Version Numbers

Update pom.xml for the release version or use the Maven Versions Plugin, e.g.:

# Skip this step if this is a RELEASE CANDIDATE, we will deploy as SNAPSHOT version for testing
VERSION=1.15.0
mvn versions:set -DnewVersion=$VERSION
git add pom.xml
git add */pom.xml

Update Changelog

Update the changelog using GitHub Changelog Generator. Note: Make sure you set $CHANGELOG_GITHUB_TOKEN in your .bash_profile or use the --token flag.

# For RELEASE CANDIDATE, set VERSION to future release version.
GITHUB_ORG=NASA-PDS
GITHUB_REPO=registry-harvest-service
github_changelog_generator --future-release v$VERSION --user $GITHUB_ORG --project $GITHUB_REPO --configure-sections '{"improvements":{"prefix":"**Improvements:**","labels":["Epic"]},"defects":{"prefix":"**Defects:**","labels":["bug"]},"deprecations":{"prefix":"**Deprecations:**","labels":["deprecation"]}}' --no-pull-requests --token $GITHUB_TOKEN

git add CHANGELOG.md

Commit Changes

Commit changes using the following template commit message:

# For operational release
git commit -m "[RELEASE] registry-harvest-service v$VERSION"

# Push changes to main
git push -u origin main

Build and Deploy Software to Maven Central Repo

# For operational release
mvn clean site site:stage package deploy -P release

# For release candidate
mvn clean site site:stage package deploy

Push Tagged Release

# For Release Candidate, you may need to delete old SNAPSHOT tag
git push origin :v$VERSION

# Now tag and push
REPO=registry-harvest-service
git tag v${VERSION} -m "[RELEASE] $REPO v$VERSION" -m "See [CHANGELOG](https://github.com/NASA-PDS/$REPO/blob/main/CHANGELOG.md) for more details."
git push --tags

Deploy Site to Github Pages

From cloned repo:

git checkout gh-pages

# Copy the staged site over to the version-specific and default sites
rsync -av target/staging/ .

git add .

# For operational release
git commit -m "Deploy v$VERSION docs"

# For release candidate
git commit -m "Deploy v${VERSION}-rc${CANDIDATE_NUM} docs"

git push origin gh-pages

Update Versions For Development

Update pom.xml with the next SNAPSHOT version, either manually or using the Maven Versions Plugin.

For RELEASE CANDIDATE, ignore this step.

git checkout main

# For release candidates, skip to push changes to main
VERSION=1.16.0-SNAPSHOT
mvn versions:set -DnewVersion=$VERSION
git add pom.xml
git commit -m "Update version for $VERSION development"

# Push changes to main
git push -u origin main
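
The next development version can also be derived from the release version instead of typed by hand. This is a sketch only: the helper name `next_snapshot` is hypothetical, and it assumes a `MAJOR.MINOR.PATCH` version in which the minor number is bumped, matching the 1.15.0 to 1.16.0-SNAPSHOT example above.

```shell
# Hypothetical helper: compute the next SNAPSHOT version by bumping the
# minor component and resetting the patch component to 0.
next_snapshot() {
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}     # text between the first and second dots
  echo "${major}.$((minor + 1)).0-SNAPSHOT"
}

# Example:
#   VERSION=$(next_snapshot 1.15.0)
#   mvn versions:set -DnewVersion=$VERSION
```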

Complete Release in Github

Currently, creating more formal release notes and attaching assets is done manually through the GitHub UI; this should eventually be automated with a script.

NOTE: Be sure to add the tar.gz and zip from the target/ directory to the release assets, and use the CHANGELOG generated above to create the RELEASE NOTES.

CI/CD

The template repository comes with our two "standard" CI/CD workflows, stable-cicd and unstable-cicd. The unstable build runs on any push to main (+/- ignoring changes to specific files) and the stable build runs on push of a release branch of the form release/<release version>. Both of these make use of our GitHub actions build step, Roundup. The unstable-cicd will generate (and constantly update) a SNAPSHOT release. If you haven't done a formal software release you will end up with a v0.0.0-SNAPSHOT release (see NASA-PDS/roundup-action#56 for specifics). Additionally, tests are executed on any non-main branch push via branch-cicd.
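
The branch-to-workflow routing described above can be summarized with a small sketch. It is illustrative only: the function name is hypothetical, it ignores the file-path exclusions mentioned for the unstable build, and it assumes each push maps to exactly one workflow, which the actual workflow triggers may not guarantee.

```shell
# Hypothetical summary of which workflow a pushed branch would start,
# based only on the description in this section.
workflow_for_branch() {
  case "$1" in
    main)       echo "unstable-cicd" ;;
    release/?*) echo "stable-cicd" ;;
    *)          echo "branch-cicd" ;;
  esac
}

# Example:
#   workflow_for_branch release/1.15.0
```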

📃 License

The project is licensed under the Apache version 2 license. Or it isn't. Change this after consulting with your lawyers.

registry-harvest-service's People

Contributors

dependabot[bot], jordanpadams, nutjob4life, pdsen-ci, ramesh-maddegoda, tdddblog, testpersonal, tloubrieu-jpl

registry-harvest-service's Issues

Harvest service sometimes skips collection inventory files

๐Ÿ› Describe the bug

Inconsistently, the scalable harvest service (might be the harvest server or crawler) will see a directory containing a collection inventory and skip it, instead of ingesting it for the references.

On several occasions, the harvest service has been pointed to a directory containing a bundle, and ingested some of the collection inventory files but skipped others. Even further, I will delete the collection docs in the registry index and re-harvest, only to see it ingest a different subset of the collections it was pointed to.

So, frustratingly, I don't have perfect steps to reproduce. I do have some logs demonstrating the harvest service output in these scenarios though.

In the example below, the harvest job was pointed at this directory after the four collections had been deleted from the registry (there were no matching docs in registry or registry-refs). Note that it sees and harvests all four collection labels, but only picks up the collection inventories for the document and calibration collections. (The browse collection had already been ingested correctly.)

[INFO] Processing batch of 2 products: /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/calibration/collection_hyb2_tir_calibration.xml, ...
[INFO] Processing collection inventory file /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/calibration/collection_hyb2_tir_calibration.csv
[INFO] Processing /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/calibration/collection_hyb2_tir_calibration.xml
[INFO] Extract metadata with rule /dsk1/www/archive/pds4/
[INFO] Loading data.
[INFO] Loaded 1 products.
[INFO] Processing batch of 1 products: /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/data_btemp/collection_hyb2_tir_data_btemp.xml, ...
[INFO] Processing /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/data_btemp/collection_hyb2_tir_data_btemp.xml
[INFO] Extract metadata with rule /dsk1/www/archive/pds4/
[INFO] Loading data.
[INFO] Loaded 1 products.
[INFO] Processing batch of 1 products: /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/data_raw/collection_hyb2_tir_data_raw.xml, ...
[INFO] Processing /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/data_raw/collection_hyb2_tir_data_raw.xml
[INFO] Extract metadata with rule /dsk1/www/archive/pds4/
[INFO] Loading data.
[INFO] Wrote 23 collection inventory document(s)
[INFO] Loaded 1 products.
[INFO] Processing batch of 5 products: /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/document/TIR-Calibration_Okada_2020.xml, ...
[INFO] Processing /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/document/collection_hyb2_tir_document.xml
[INFO] Extract metadata with rule /dsk1/www/archive/pds4/
[INFO] Loading data.
[INFO] Processing collection inventory file /dsk1/www/archive/pds4/hayabusa2/hyb2_tir_v1.0/document/collection_hyb2_tir_document.csv
[INFO] Loaded 1 products.
[INFO] Wrote 2 collection inventory document(s)
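
To narrow down which collections were affected, one option is to scan the harvest log for collection labels that have no matching inventory line. The sketch below is a diagnostic built on assumptions, not part of the harvest service: the function name `missing_inventories` is hypothetical, and it assumes that labels are named `collection_*.xml`, that each inventory shares the label's path and base name with a `.csv` extension (as in the excerpt above), and that paths contain no spaces.

```shell
# Hypothetical diagnostic: print collection labels that were processed
# without a corresponding "Processing collection inventory file" line.
missing_inventories() {
  awk '
    # "[INFO] Processing collection inventory file <path>.csv"
    $2 == "Processing" && $3 == "collection" && $4 == "inventory" && $5 == "file" {
      path = $6
      sub(/\.csv$/, "", path)
      seen[path] = 1
      next
    }
    # "[INFO] Processing <path>.xml" (exactly three fields, so batch lines are skipped)
    $2 == "Processing" && NF == 3 && $3 ~ /\.xml$/ {
      n = split($3, parts, "/")
      if (parts[n] ~ /^collection_/) {
        path = $3
        sub(/\.xml$/, "", path)
        labels[path] = 1
      }
    }
    END {
      for (path in labels)
        if (!(path in seen)) print path ".xml"
    }
  ' "$1" | sort
}

# Example:
#   missing_inventories harvest.log
```

Run against the excerpt above, this would list the data_btemp and data_raw collection labels, which matches the collections reported as skipped.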

🕵️ Expected behavior

I expect the harvest service to detect all collection inventory files and harvest them into the registry-refs directory.

📚 Version of Software Used

registry-harvest-service-1.0.1-SNAPSHOT
registry-crawler-service-1.0.0

🩺 Test Data / Additional context

The bundle mentioned above can be downloaded here (24GB direct download)


🦄 Related requirements

⚙️ Engineering Details

Stable Roundup can no longer trigger Imaging workflow

๐Ÿ› Describe the bug

After pushing a tag like release/x.y.z, the Roundup Action performs a release and then deletes the release/x.y.z tag. However, the Stable workflow then triggers the Imaging workflow via repository dispatch. The Imaging workflow then checks out release/x.y.z, but it no longer exists.

📜 To Reproduce

  1. Push a tag like release/x.y.z
  2. 🍿

🕵️ Expected behavior

Docker images are constructed and pushed to the Hub.
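
A workflow that depends on the tag could guard against the failure described in this issue by checking that the tag still exists on the remote before checking it out. The following is a sketch of such a check, not the fix adopted by the project; the function name is hypothetical, while `git ls-remote --exit-code` is a standard Git option that returns a non-zero status when no matching ref is found.

```shell
# Hypothetical guard: succeed only if the given tag exists on the remote.
tag_exists_on_remote() {
  # $1 = remote name or URL, $2 = tag name (for example release/1.2.3)
  git ls-remote --exit-code --tags "$1" "refs/tags/$2" >/dev/null 2>&1
}

# Example:
#   tag_exists_on_remote origin "release/$VERSION" || echo "tag is gone" >&2
```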
