Reconsider harvest flow architecture about harvest HOT 5 OPEN

alexdunnjpl commented on August 18, 2024

Reconsider harvest flow architecture

from harvest.

Comments (5)

tloubrieu-jpl commented on August 18, 2024

@alexdunnjpl Group this thinking with the scalable harvest components.

alexdunnjpl commented on August 18, 2024

Given that harvest-service and (standalone) harvest are parallel development projects, some tentative thoughts:

Identify infra-agnostic stages of the process, such as:
1. Given a file-system root node (i.e. a path), enumerate all products (Bundles, Collections, and simple Products) under that node's subtree.
2. Given an enumeration of products, map them to registry JSON documents.
3. Given a batch of registry JSON documents, register them with an OpenSearch instance (maybe; is this infra-dependent?).

Extract each of these stages as a utility module (using standalone harvest for development, since that will be simplest).
Once these utility modules are written/extracted, replace the harvest-service implementations with calls to the modules as dependencies.

Once complete, the implementation code of harvest-service will just be the management/delegation code and some simple calls to glue it to the utility modules, and the implementation code of (standalone) harvest will just be a CLI wrapper around calls to the utility modules.

As a result, each (standalone/scalable) version of harvest will be doing exactly the same thing, and the utility libraries will be easily unit-testable.
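
To make the staged design above concrete, here is a rough Python sketch of the three stages as independent functions. All names (`enumerate_labels`, `label_to_document`, `register_batch`, `harvest`, `main`) and the document fields are hypothetical and not taken from harvest or harvest-service. It assumes labels are `*.xml` files, and uses a plain callable `sink` as a stand-in for whatever actually talks to OpenSearch.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def enumerate_labels(root: Path) -> Iterator[Path]:
    """Stage 1: yield every XML label file under root's subtree, sorted by path."""
    yield from sorted(p for p in root.rglob("*.xml") if p.is_file())


def label_to_document(label: Path) -> dict:
    """Stage 2: map one label to a registry-style document (hypothetical fields)."""
    root_tag = ET.parse(str(label)).getroot().tag
    # Strip any XML namespace, e.g. "{ns}Product_Bundle" -> "Product_Bundle".
    product_class = root_tag.rsplit("}", 1)[-1]
    return {"path": str(label), "product_class": product_class}


def register_batch(
    docs: Iterable[dict],
    sink: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Stage 3: pass documents to sink in batches; return how many were sent."""
    batch: List[dict] = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            sink(batch)
            sent += len(batch)
            batch = []  # start a fresh list so the sink may keep the old one
    if batch:
        sink(batch)
        sent += len(batch)
    return sent


def harvest(root: Path, sink: Callable[[List[dict]], None], batch_size: int = 100) -> int:
    """Glue code: chain the three stages lazily."""
    docs = map(label_to_document, enumerate_labels(root))
    return register_batch(docs, sink, batch_size)


def main(argv: List[str]) -> int:
    """Minimal CLI wrapper: print each batch as one JSON line instead of registering it."""
    sent = harvest(Path(argv[1]), lambda batch: print(json.dumps(batch)))
    print(f"{sent} documents processed")
    return 0
```

Under this split, a standalone CLI would only need something like `main`, while a service could call `harvest` (or the individual stages) with its own `sink`. Each stage can be unit-tested against a temporary directory without any OpenSearch instance.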

tloubrieu-jpl commented on August 18, 2024

@alexdunnjpl will organize a meeting to discuss that with @jordanpadams @viviant100 and @tloubrieu-jpl next week.

alexdunnjpl commented on August 18, 2024

Better question than "how should we support both?" is "why do we support both?".

Since targeting a bundle directory with a element will ingest all labels nested within, I don't see the benefit of keeping the option (which iterates over the bundle label, all first-descendant collection labels, and all <=20th-descendant product labels) as a separate thing.

@jordanpadams is it reasonable to argue for dropping support for the functionality entirely in favor of the approach?

alexdunnjpl commented on August 18, 2024

Per @jordanpadams:

This is true. <bundles> was kept for backwards compatibility support, but we changed the way we treated this when we decided to just use this part of the config to know where to look, but no longer decipher between bundles/collections/products. Just read everything below where this points and load the data as fast as you can.

Currently unclear whether removal of support is now acceptable - will be determined in tomorrow's meeting.
