DataHub Kubernetes Helm Charts

Introduction

This repo provides the Kubernetes Helm charts for deploying DataHub and its dependencies (Elasticsearch, optionally Neo4j, MySQL, and Kafka) on a Kubernetes cluster.

Setup

  1. Set up a Kubernetes cluster
  2. Install the following tools:
    • kubectl to manage Kubernetes resources
    • helm to deploy the resources based on Helm charts. Note that only Helm 3 is supported.
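
Before continuing, it can help to confirm that both tools are on the PATH and that the Helm client is a 3.x release. The following is a minimal sketch, not part of the charts: is_helm3 is a hypothetical helper name, and the check assumes that helm version --short prints a string beginning with v3 on Helm 3 clients.

```shell
#!/bin/sh
# Return success if the given version string looks like a Helm 3 release.
# (is_helm3 is a hypothetical helper name, not part of Helm.)
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Report any required tool that is missing from the PATH.
for tool in kubectl helm; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "missing: $tool"
  fi
done

# Only query Helm if it is installed.
if command -v helm >/dev/null 2>&1; then
  if is_helm3 "$(helm version --short)"; then
    echo "helm: version 3 detected"
  else
    echo "helm: unsupported version, Helm 3 is required"
  fi
fi
```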

Components

DataHub consists of 4 main components: GMS, MAE Consumer (optional), MCE Consumer (optional), and Frontend. The Kubernetes deployment for each component is defined as a subchart under the main DataHub Helm chart.

The main components are powered by 4 external dependencies:

  • Kafka
  • Local DB (MySQL, Postgres, MariaDB)
  • Search Index (Elasticsearch)
  • Graph Index (either Neo4j or Elasticsearch)

The dependencies must be deployed before deploying DataHub. A separate chart deploys the dependencies with an example configuration. They can also be deployed separately on-prem or consumed as managed services. To remove the dependency on Neo4j, set enabled to false for Neo4j in the datahub-kubernetes/prerequisites/values.yaml file. Then, override the graph_service_impl field in datahub-kubernetes/datahub/values.yaml so that its value is elasticsearch instead of neo4j.
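
As an alternative to editing the charts' values.yaml files directly, the two Neo4j-related settings can be kept in small override files. The sketch below only writes those files. The file names are hypothetical, and the key paths (neo4j-community.enabled and global.graph_service_impl) are assumptions about where these fields sit; confirm them against the values.yaml of the chart version you deploy.

```shell
#!/bin/sh
# Write two small override files instead of editing the charts' values.yaml.
# ASSUMPTION: the key paths below (neo4j-community.enabled and
# global.graph_service_impl) are guesses at where these fields live;
# check them against the values.yaml shipped with your chart version.
cat > prerequisites-overrides.yaml <<'EOF'
neo4j-community:
  enabled: false
EOF

cat > datahub-overrides.yaml <<'EOF'
global:
  graph_service_impl: elasticsearch
EOF

echo "wrote prerequisites-overrides.yaml and datahub-overrides.yaml"
```

Each file could then be passed to the matching helm install command with the --values flag.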

Quickstart

Assuming the kubectl context points to the correct Kubernetes cluster, first create Kubernetes secrets that contain the MySQL and Neo4j passwords.

kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=datahub
kubectl create secret generic neo4j-secrets --from-literal=neo4j-password=datahub

The commands above set the passwords to "datahub" as an example. Change them to any password of your choice.
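
Rather than a fixed example password, a random one can be generated when the secrets are created. This is a minimal sketch, assuming od and /dev/urandom are available on the machine running kubectl. gen_password is a hypothetical helper name; the secret names and keys are the ones used above.

```shell
#!/bin/sh
# Print a random 32-character hexadecimal string (16 random bytes).
# (gen_password is a hypothetical helper name.)
gen_password() {
  od -An -N16 -tx1 /dev/urandom | tr -dc '0-9a-f'
}

MYSQL_ROOT_PASSWORD="$(gen_password)"
NEO4J_PASSWORD="$(gen_password)"

# Create the same two secrets as above, with the generated values.
if command -v kubectl >/dev/null 2>&1; then
  kubectl create secret generic mysql-secrets \
    --from-literal=mysql-root-password="$MYSQL_ROOT_PASSWORD"
  kubectl create secret generic neo4j-secrets \
    --from-literal=neo4j-password="$NEO4J_PASSWORD"
fi
```

The generated values are not printed, so read them back from the secrets if you need them later.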

Add the DataHub Helm repo by running the following:

helm repo add datahub https://helm.datahubproject.io/

Then, deploy the dependencies by running the following:

helm install prerequisites datahub/datahub-prerequisites

Note that the command above uses the chart's default configuration. You can change any of the settings and deploy by running the following command.

helm install prerequisites datahub/datahub-prerequisites --values <path-to-values-file>

Run kubectl get pods to check whether all the pods for the dependencies are running. You should get a result similar to the one below.

NAME                                               READY   STATUS      RESTARTS   AGE
elasticsearch-master-0                             1/1     Running     0          62m
elasticsearch-master-1                             1/1     Running     0          62m
elasticsearch-master-2                             1/1     Running     0          62m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          63m
prerequisites-kafka-0                              1/1     Running     2          62m
prerequisites-mysql-0                              1/1     Running     1          62m
prerequisites-neo4j-community-0                    1/1     Running     0          52m
prerequisites-zookeeper-0                          1/1     Running     0          62m
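
Scanning the table by eye gets tedious on a busy cluster, so the check can be scripted. The sketch below is one possible approach, not something the charts provide: unhealthy_pods is a hypothetical helper that reads kubectl get pods output and prints the name of each pod that is neither Completed nor Running with all of its containers ready.

```shell
#!/bin/sh
# Print the name of every pod that is not healthy, reading the tabular
# output of `kubectl get pods` (including its header line) on stdin.
# A pod counts as healthy when its STATUS is Completed, or when it is
# Running with all containers ready (for example READY 2/2).
# (unhealthy_pods is a hypothetical helper name.)
unhealthy_pods() {
  awk 'NR > 1 && NF >= 3 {
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 == "Running" && ready[1] == ready[2]) next
    print $1
  }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods | unhealthy_pods
fi
```

Empty output means every listed pod passed the check.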

Then, deploy DataHub by running the following:

helm install datahub datahub/datahub

Values in values.yaml have been preset to point to the dependencies deployed using the prerequisites chart with release name "prerequisites". If you deployed the prerequisites chart under a different release name, update the quickstart-values.yaml file accordingly before installing.

Run kubectl get pods to check whether all the DataHub pods are running. You should get a result similar to the one below.

NAME                                               READY   STATUS      RESTARTS   AGE
datahub-datahub-frontend-84c58df9f7-5bgwx          1/1     Running     0          4m2s
datahub-datahub-gms-58b676f77c-c6pfx               1/1     Running     0          4m2s
datahub-datahub-mae-consumer-7b98bf65d-tjbwx       1/1     Running     0          4m3s
datahub-datahub-mce-consumer-8c57d8587-vjv9m       1/1     Running     0          4m2s
datahub-elasticsearch-setup-job-8dz6b              0/1     Completed   0          4m50s
datahub-kafka-setup-job-6blcj                      0/1     Completed   0          4m40s
datahub-mysql-setup-job-b57kc                      0/1     Completed   0          4m7s
elasticsearch-master-0                             1/1     Running     0          97m
elasticsearch-master-1                             1/1     Running     0          97m
elasticsearch-master-2                             1/1     Running     0          97m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          99m
prerequisites-kafka-0                              1/1     Running     2          97m
prerequisites-mysql-0                              1/1     Running     1          97m
prerequisites-neo4j-community-0                    1/1     Running     0          88m
prerequisites-zookeeper-0                          1/1     Running     0          97m

You can run the following command to expose the frontend locally. The pod name appears in the kubectl get pods output above; in this example, the datahub-frontend pod is datahub-datahub-frontend-84c58df9f7-5bgwx.

kubectl port-forward <datahub-frontend pod name> 9002:9002

You should be able to access the frontend via http://localhost:9002.
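
The pod name lookup can also be scripted so the name does not have to be copied by hand. This is a minimal sketch: frontend_pod is a hypothetical helper, and it assumes the frontend pod's name contains datahub-frontend, as in the listing above.

```shell
#!/bin/sh
# Print the name of the first pod whose name contains "datahub-frontend",
# reading `kubectl get pods` output on stdin.
# (frontend_pod is a hypothetical helper name.)
frontend_pod() {
  awk '$1 ~ /datahub-frontend/ && !seen { print $1; seen = 1 }'
}

if command -v kubectl >/dev/null 2>&1; then
  pod="$(kubectl get pods | frontend_pod)"
  if [ -n "$pod" ]; then
    # Forward local port 9002 to port 9002 on the frontend pod.
    kubectl port-forward "$pod" 9002:9002
  else
    echo "no datahub-frontend pod found" >&2
  fi
fi
```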

Once you confirm that the pods are running well, you can set up ingress for datahub-frontend to expose port 9002 publicly.

Contributing

We welcome contributions from the community. Please refer to our Contributing Guidelines for more details.

Community

Join our Slack workspace for discussions and important announcements.
