
Open Data Mesh Platform

Open Data Mesh Platform is a platform that manages the full lifecycle of a data product from deployment to retirement. It uses the Data Product Descriptor Specification to create, deploy and operate data product containers in a mesh architecture.

Run it

Prerequisites

The project requires the following dependencies:

  • Java 11
  • Maven 3.8.6

Run locally

Clone repository

Clone the repository and move to the project root folder

git clone git@github.com:opendatamesh-initiative/odm-platform.git
cd odm-platform

Compile project

First, in order to correctly download external Maven dependencies from GitHub Packages, you need to configure the Maven settings.xml file with your GitHub credentials. The GITHUB TOKEN must have read:packages permissions.

<settings>
    <servers>
        <server>
            <id>github</id>
            <username>GITHUB USERNAME</username>
            <password>GITHUB TOKEN</password>
        </server>
    </servers>
</settings>

The settings.xml file is in the ~/.m2 directory.

For additional information, see "How to install an Apache Maven package from GitHub Packages".

Then run:

mvn clean install -DskipTests

Run application

Run the application:

java -jar registry-server/target/odm-platform-pp-registry-server-1.0.0.jar

*The version may be greater than 1.0.0; check the parent POM.

Stop application

To stop the application, type CTRL+C or just close the shell. To start it again, re-execute the following command:

java -jar registry-server/target/odm-platform-pp-registry-server-1.0.0.jar

Note: The application run in this way uses an in-memory instance of the H2 database, so the data is lost every time the application is terminated. On the next restart, the database is recreated from scratch.

Run with Docker

Clone repository

Clone the repository and move to the project root folder

git clone git@github.com:opendatamesh-initiative/odm-platform.git
cd odm-platform

Here you can find the Dockerfile, which creates an image containing the application by copying it directly from the local build (i.e. from the target folder).

Compile project

First, in order to correctly download external Maven dependencies from GitHub Packages, you need to configure the Maven settings.xml file with your GitHub credentials. The GITHUB TOKEN must have read:packages permissions.

<settings>
    <servers>
        <server>
            <id>github</id>
            <username>GITHUB USERNAME</username>
            <password>GITHUB TOKEN</password>
        </server>
    </servers>
</settings>

The settings.xml file is in the ~/.m2 directory.

For additional information, see "How to install an Apache Maven package from GitHub Packages".

Then execute the build locally by running the following command:

mvn clean install -DskipTests

Run database

The image generated from the Dockerfile contains only the application. It requires a database to run properly. The supported databases are MySQL and PostgreSQL. If you do not already have a database available, you can create one by running the following commands:

MySql

docker run --name odmp-mysql-db -d -p 3306:3306  \
   -e MYSQL_DATABASE=ODMREGISTRY \
   -e MYSQL_ROOT_PASSWORD=root \
   mysql:8

Postgres

docker run --name odmp-postgres-db -d -p 5432:5432  \
   -e POSTGRES_DB=odmpdb \
   -e POSTGRES_USER=postgres \
   -e POSTGRES_PASSWORD=postgres \
   postgres:11-alpine

Check that the database has started correctly:

MySql

docker logs odmp-mysql-db

Postgres

docker logs odmp-postgres-db

Build image

Build the Docker image of the application and run it.

*Before executing the following commands, set the values of the DATABASE_USERNAME, DATABASE_PASSWORD and DATABASE_URL arguments appropriately. The commands below already contain the right values if you created the database using the commands above.

MySql

docker build -t odmp-mysql-app . -f Dockerfile \
   --build-arg DATABASE_URL=jdbc:mysql://localhost:3306/ODMREGISTRY \
   --build-arg DATABASE_USERNAME=root \
   --build-arg DATABASE_PASSWORD=root \
   --build-arg FLYWAY_SCRIPTS_DIR=mysql

Postgres

docker build -t odmp-postgres-app . -f Dockerfile \
   --build-arg DATABASE_URL=jdbc:postgresql://localhost:5432/odmpdb \
   --build-arg DATABASE_USERNAME=postgres \
   --build-arg DATABASE_PASSWORD=postgres \
   --build-arg FLYWAY_SCRIPTS_DIR=postgresql

Run application

Run the Docker image.

Note: Before executing the following commands, remove the --net host argument if the database is not running on localhost.

MySql

docker run --name odmp-mysql-app -p 8001:8001 --net host odmp-mysql-app

Postgres

docker run --name odmp-postgres-app -p 8001:8001 --net host odmp-postgres-app

Stop application

*Before executing the following commands, if you are using Postgres instead of MySQL:

  • change the database name to odmp-postgres-db
  • change the container name to odmp-postgres-app
docker stop odmp-mysql-app
docker stop odmp-mysql-db

To restart a stopped application execute the following commands:

docker start odmp-mysql-db
docker start odmp-mysql-app

To remove a stopped application and rebuild it from scratch, execute the following commands:

docker rm odmp-mysql-app
docker rm odmp-mysql-db

Run with Docker Compose

Clone repository

Clone the repository and move to the project root folder

git clone git@github.com:opendatamesh-initiative/odm-platform.git
cd odm-platform

Build image

Build the docker-compose images of the application and a default PostgreSQL DB (v11.0).

Before building it, create a .env file in the root directory of the project similar to the following one:

DATABASE_NAME=odmpdb
DATABASE_PASSWORD=pwd
DATABASE_USERNAME=usr
DATABASE_PORT=5432
SPRING_PORT=8001

Then, build the docker-compose file:

docker-compose build

Run application

Run the docker-compose images.

docker-compose up

Stop application

Stop the docker-compose images:

docker-compose down

To restart a stopped application, execute the following command:

docker-compose up

To rebuild it from scratch, execute the following command:

docker-compose build --no-cache

Test it

REST services

You can invoke REST endpoints through OpenAPI UI available at the following url:

*For a static version of the API documentation, check the APIdoc.md file.

Database

If the application is running using an in-memory instance of the H2 database, you can check the database content through the H2 Web Console available at the following url:

In all cases, you can also use your favourite SQL client, providing the proper connection parameters.
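For example, assuming Spring Boot's common H2 defaults (an assumption; check the application's configuration files for the actual values), a SQL client or the H2 console would use parameters like:

```
JDBC URL:  jdbc:h2:mem:testdb
User:      sa
Password:  (empty)
```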

odm-platform's People

Contributors

aliprax, andrea-gioia, federicoinverniciquantyca, fedesala22, gabrielefantini, gabrielefantiniblindata, giandom, giuliabusnelli, mattia155, riccardoambrosini


odm-platform's Issues

DataProductDescriptorValidator improvements

  • allow configuring in property file the baseUrl to find the schema.json validation file for a JSON DPD
  • allow supporting a minVersion and a maxVersion of schema.json validation file for a JSON DPD through property files
  • add schema files also locally, so the application can fall back and take them from its own resources in a scenario where it's impossible to fetch the schema.json validation file from the combination of baseUrl and version

DevOps Module: Add body in the task callback API

Add the possibility of returning the actual task status and additional information in the callback call to ODM. Currently, it's only possible to STOP the task, without the ability to declare its status (it's assumed to have completed successfully).

Below is a proposed structure for the body to be added to the callback call:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Callback API",
  "type": "object",
  "properties": {
    "errors": {
      "type": ["string", "null"]
    },
    "results": {
      "type": ["object", "null"],
      "properties": {},
      "required": []
    },
    "status": {
      "type": "string",
      "enum": ["PROCESSED", "FAILED"]
    }
  },
  "required": [
    "status"
  ]
}

The "results" attribute can be used to return additional information generated by the task to ODM, such as pipeId, name, outputs generated by processes, and so on.
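For illustration, a callback body conforming to the proposed schema might look like the following (all field values are hypothetical):

```python
# Hypothetical callback body conforming to the proposed JSON schema above.
callback_body = {
    "status": "PROCESSED",   # required; one of PROCESSED, FAILED
    "errors": None,          # string or null
    "results": {             # free-form object with task outputs
        "pipeId": "run-42",  # illustrative output values
        "outputs": {"tableName": "sales_gold"},
    },
}

# Minimal checks mirroring the schema's constraints.
assert callback_body["status"] in ("PROCESSED", "FAILED")
assert callback_body["errors"] is None or isinstance(callback_body["errors"], str)
```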

Let me know what you think.

Thanks,
Giandomenico

CRUD for domains

Implementing the full CRUD stack for the domain entity type, including the client and the IT

CI/CD integration

Integrate a CI/CD process in the odm-platform module.

  • Trigger integration tests execution during build
  • Re-organize docker files in repo
  • Setup a baseline CI/CD process with GitHub Actions

Handle templates like standard definitions

Use StandardDefinitionDPDS to handle templates in the DTO, but then map them onto the dedicated domain entity TemplateDefinition. Do the same for APIs, mapping the DTO onto the dedicated domain entity ApiDefinition. Then rename the REST collection definitions to apis.

VARIABLES entity

REGISTRY

  • Add a VARIABLES entity (so, also a VARIABLES table in the ODMREGISTRY schema) which will contain variables defined through the descriptor and their value.
  • When a resource is read, the descriptor will be serialized with the values of the variables from the new table, and the variables section won't be returned.
  • When a resource is created, the descriptor will be scanned for variables with the syntax ${variableName}

DEVOPS

  • when a DevOps provider calls the stopTask endpoint returning the results of the task execution, check for variables in the TaskResultsResource and, if found, update the VARIABLES table
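The variable scan described above could be sketched as follows (the regex and descriptor content are illustrative, not the actual implementation):

```python
import re

# Sketch: scan a descriptor for variables written as ${variableName}.
VARIABLE_PATTERN = re.compile(r"\$\{(\w+)\}")

descriptor = '{"name": "trip-execution", "version": "${dpVersion}", "host": "${targetHost}"}'

def extract_variables(text: str) -> list[str]:
    """Return the variable names referenced in a descriptor."""
    return VARIABLE_PATTERN.findall(text)

print(extract_variables(descriptor))  # ['dpVersion', 'targetHost']
```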

ODM-CLI enhancement

  • Add a command to create a blueprint from file.
  • Reduce verbosity for the CLI

Extend `DPDSParser` to permit resource fetching also from private git repositories

If the descriptor files are stored in a Git repository, no matter whether private or public, the repository must be cloned on the host machine and the files then fetched locally. Authentication will be performed during the cloning operation using the Git protocol (https or ssh). Acting at the level of the Git protocol, and not at the API level, allows having just one implementation for all Git servers (e.g. GitHub, GitLab, Azure DevOps, etc.).
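As a sketch of this protocol-level approach (all names are hypothetical, not the actual DPDSParser code), the clone URL could be built per protocol like this:

```python
# Sketch: build a clone URL per Git protocol; the auth details are an
# assumption (token-based HTTPS for private repos, SSH key otherwise).
def clone_url(server: str, org: str, repo: str,
              protocol: str = "https", token: str = "") -> str:
    if protocol == "ssh":
        return f"git@{server}:{org}/{repo}.git"
    if token:  # token embedded in the URL for private HTTPS clones
        return f"https://{token}@{server}/{org}/{repo}.git"
    return f"https://{server}/{org}/{repo}.git"

print(clone_url("github.com", "opendatamesh-initiative", "odm-platform", "ssh"))
# git@github.com:opendatamesh-initiative/odm-platform.git
```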

Blueprint Enhancements

  • Try GitHubApp / OAuthApp instead of PAT for accessing the GitHub API
  • Allow different GitServices active at the same time and choose which to use based on the blueprint
  • Add an "instances" collection/entity to keep track of correctly created projects from blueprints
  • Mock Git providers in test for /instances endpoint and add blueprint initialization as test
  • minor fixes

API documentation: add redocly/cli integration

  • add, in the POM of each server module, the Maven plugin configuration to generate the static documentation through redocly/cli, plus a script for the redocly command
  • create a root-level script to generate the documentation for each server module
  • Fix APIdoc.md

OAuth centralization

  • Centralization of all OAuth features in a single module
  • Replace actual usages of OAuth with functions from the new module

Add Lifecycle entity to devops

  • Lifecycle entity + DDL + resource
  • Lifecycle Repository
  • Lifecycle Mapper
  • Lifecycle Service
  • Lifecycle Controller + CRUD (only GET endpoints)
  • Extend client
  • IT

Create a separate module for ODMClient

  • Create a separate module for ODMClient and for an Interface / Class for the Routes
  • Refactor RegistryClient & PolicyClient according to the previous task

Git centralization

  • Centralization of all Git features in a single module
  • Replace actual usages of Git with functions from the new module
  • Depends on #128

Add to the Registry Module API the `schemas` collection

Store in the database all schemas exposed by data service endpoints (i.e. endpoints of APIs associated with output ports and observability ports), then create CRUD services in the registry module to operate on this new collection.

Upload bug fix

The upload endpoint fails to resolve references with a depth greater than one level: it resolves first-level references, but fails to resolve second-level ones.

Example:
| basedir
|— dp1.0.0.json
|— subdir
|—— port1.json
|—— subcomponentport1.json

Where dp1.0.0.json contains:
"$ref": "subdir/port1.json"
That reference is correctly resolved, but port1.json in turn contains its own reference, "$ref": "subcomponentport1.json". This last reference CANNOT be resolved because the application considers basedir as the base URL even when resolving references inside subdir, so it looks for a file named "basedir/subcomponentport1.json" instead of "basedir/subdir/subcomponentport1.json". It works if port1.json uses "$ref": "subdir/subcomponentport1.json", but it's logically wrong to use that path inside the file.
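A sketch of the expected behaviour (illustrative, not the actual parser code): each $ref should be resolved against the directory of the file that contains it, not against the root base directory:

```python
import posixpath

# Sketch of the fix: resolve a $ref relative to the directory of the
# referencing file instead of the root base directory.
def resolve_ref(referencing_file: str, ref: str) -> str:
    base_dir = posixpath.dirname(referencing_file)
    return posixpath.normpath(posixpath.join(base_dir, ref))

# First-level ref, resolved from dp1.0.0.json at the base dir:
print(resolve_ref("basedir/dp1.0.0.json", "subdir/port1.json"))
# basedir/subdir/port1.json

# Second-level ref, resolved from port1.json inside subdir:
print(resolve_ref("basedir/subdir/port1.json", "subcomponentport1.json"))
# basedir/subdir/subcomponentport1.json
```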

Blueprint module enhancements

  • Refactoring
  • Minor fixes
  • Add a params.json default file in the blueprint remote repository, fetch it on creation, and store it in a "paramsDescription" column of the blueprint entity (it must be a file with a fixed JSON structure describing parameters and their default values)

CRUD for Owner entity

Develop:

  • the entire Owner CRUD stack (resource, repository, controller, mapper, ...)
  • the client
  • IT tests (happy and error path)

Fix Data Product Version Delete

When multiple versions of a Data Product are stored in the DB, deleting a single version leads to the deletion of all the stored versions.

Refactor odm-cli

Refactor odm-cli to add a level of subcommands when needed + minor fixes + style.

Example of refactor to add a level:

  • from: odm-cli registry listDP
  • to: odm-cli registry list dp

Add GET internal components endpoints

Add two endpoints to the DataProductVersion controller for the internal components:

  • GET application components
  • GET infrastructure components

Modify the client and the IT accordingly

Extend RegistryClient

  • Extend RegistryClient to cover missing endpoints
  • Extend basic ITs to cover missing basic scenario (CRUD)

Add `devops` module to ODM Platform's product plane

The DevOps module allows platform users to execute DevOps activities over a data product version, such as provision, build, and deployment. It decomposes the activity into tasks and calls an executor service exposed by the utility plane to execute them. The executor's API implementation works as an abstraction layer that decouples the DevOps service from the underlying tools used to provision infrastructure, build code, and deploy artifacts.
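The decomposition described above can be sketched as follows (all names are hypothetical, not the actual module's API):

```python
# Illustrative sketch: an activity decomposed into tasks, each dispatched
# to an executor service behind a common interface.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    executor: str  # logical name of the utility-plane executor service

@dataclass
class Activity:
    stage: str                      # e.g. "provision", "build", "deploy"
    tasks: list = field(default_factory=list)

def run(activity: Activity, executors: dict) -> list:
    """Dispatch each task to its executor; executors abstract the real tools."""
    return [executors[t.executor](t) for t in activity.tasks]

deploy = Activity("deploy", [Task("push-artifact", "azure-devops")])
results = run(deploy, {"azure-devops": lambda t: f"{t.name}: PROCESSED"})
print(results)  # ['push-artifact: PROCESSED']
```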
