Dremio + Kubernetes Cluster Setup

Overview

This is a Helm chart to deploy a Dremio cluster in Kubernetes. It uses a persistent volume for the master node to store the metadata for the cluster. The default configuration uses the default persistent storage supported by the Kubernetes platform. For example:

Kubernetes platform    Persistent store
AWS EKS                EBS
Azure AKS              Azure disk (HDD)
Google GKE             Persistent Disk
Local K8S on Docker    Hostpath

If you want to use a different storage class available in your Kubernetes environment, set storageClass in values.yaml.

Use an appropriate distributed file store (S3, ADLS, HDFS, etc.) for paths.dist, because this deployment will lose locally persisted reflections and uploads. You can set this in config/dremio.conf. The Dremio documentation provides more information on this.

This assumes you already have a Kubernetes cluster set up, kubectl configured to talk to that cluster, and helm set up in the cluster. Review and update values.yaml to reflect values for your environment before installing the helm chart. This is especially important for the memory and cpu values: your Kubernetes cluster should have sufficient resources to provision the pods with those values. If your Kubernetes installation does not support serviceType LoadBalancer, comment out the serviceType value in values.yaml before deploying.

Installing the helm chart

Change to the charts directory and install the chart:

cd charts
helm install --wait dremio

If it takes longer than a couple of minutes to complete, check the status of the pods to see where they are waiting. If they are pending scheduling due to limited memory or cpu, either adjust the values in values.yaml and restart the process, or add more resources to your Kubernetes cluster.
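One way to check where the pods are waiting is to filter for pods in the Pending phase and then read the events for one of them. This is a minimal sketch: the function names are hypothetical and not part of the chart, while the kubectl subcommands and flags are standard.

```shell
# List pods that have not been scheduled or started yet.
# Function name is hypothetical; the field selector is a standard kubectl flag.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# Show details for one pod. Scheduling problems such as insufficient
# memory or cpu are reported in the Events section of the output.
# $1 = pod name
describe_pod() {
  kubectl describe pod "$1"
}

# Usage (requires access to the cluster):
#   pending_pods
#   describe_pod <pod-name>
```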

Connect to the Dremio UI

If your Kubernetes cluster supports serviceType LoadBalancer, you can get to the Dremio UI on the load balancer's external IP.

For example, if your service output is:

kubectl get services dremio-client
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                          AGE
dremio-client   LoadBalancer   10.99.227.180   35.226.31.211     31010:32260/TCP,9047:30620/TCP   2d

you can get to the Dremio UI using the value under column EXTERNAL-IP:

http://35.226.31.211:9047
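The same URL can be assembled without reading the table by hand. The sketch below assumes the service is named dremio-client, as in the output above, and that the platform reports an IP address; some platforms report a hostname instead, in which case the ip field is empty. The function name is hypothetical.

```shell
# Print the Dremio UI URL built from the load balancer's external IP.
# Function name is hypothetical; port 9047 is the UI port shown above.
dremio_ui_url() {
  ip=$(kubectl get service dremio-client \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    # No IP yet: the load balancer may still be provisioning,
    # or the platform exposes a hostname rather than an IP.
    echo "no external IP reported for dremio-client" >&2
    return 1
  fi
  echo "http://${ip}:9047"
}

# Usage (requires access to the cluster):
#   dremio_ui_url
```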

If your Kubernetes cluster does not support serviceType LoadBalancer, you can access the Dremio UI on the port exposed on the node. For example, if the service output is:

kubectl get services dremio-client
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                          AGE
dremio-client   NodePort       10.110.65.97    <none>            31010:32390/TCP,9047:30670/TCP   1h

where there is no external IP and the Dremio master is running on node "localhost", you can get to the Dremio UI using:

http://localhost:30670
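The node port mapped to 9047 can also be read directly from the service definition. This is a sketch that assumes the service is named dremio-client as above; the function names are hypothetical, and the node address is something you supply.

```shell
# Print the node port that maps to the UI port 9047.
# Function name is hypothetical; the jsonpath filter is standard kubectl syntax.
dremio_ui_node_port() {
  kubectl get service dremio-client \
    -o jsonpath='{.spec.ports[?(@.port==9047)].nodePort}'
}

# Print the UI URL for a given node address.
# $1 = node address ("localhost" in the example above)
dremio_ui_node_url() {
  echo "http://$1:$(dremio_ui_node_port)"
}

# Usage (requires access to the cluster):
#   dremio_ui_node_url localhost
```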

Dremio Client Port

Port 31010 is used for ODBC and JDBC connections. Look up the dremio-client service in Kubernetes to find the host to use for these connections. Depending on whether your Kubernetes cluster supports serviceType LoadBalancer, use either the load balancer's external IP or the node on which a coordinator is running.

kubectl get services dremio-client
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                          AGE
dremio-client   LoadBalancer   10.99.227.180   35.226.31.211     31010:32260/TCP,9047:30620/TCP   2d

For example, in the above output, the service is exposed on an external IP, so you can use 35.226.31.211:31010 in your ODBC or JDBC connections.
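As an illustration, a JDBC client would typically take that host and port in a direct-connection URL. The helper below is a sketch: its name is hypothetical, and the jdbc:dremio:direct= prefix is assumed from the Dremio JDBC driver's usual URL form, so check the Dremio documentation for your driver version.

```shell
# Build a direct-connection JDBC URL for a Dremio coordinator.
# Function name is hypothetical; the URL prefix is an assumption noted above.
# $1 = host, $2 = port (defaults to the client port 31010)
dremio_jdbc_url() {
  echo "jdbc:dremio:direct=$1:${2:-31010}"
}

# Usage:
#   dremio_jdbc_url 35.226.31.211
```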

Viewing logs

Logs are written to the container's console. All the logs - server.log, server.out, server.gc and access.log - are written into the console simultaneously. You can view the logs using kubectl.

kubectl logs <pod-name>

You can also tail the logs using the -f parameter.

kubectl logs -f <pod-name>

Scale by adding additional Coordinators or Executors (optional)

Get the name of the helm release. In the example below, the release name is plundering-alpaca.

helm list
NAME             	REVISION	UPDATED                 	STATUS  	CHART       	NAMESPACE
plundering-alpaca	1       	Wed Jul 18 09:36:14 2018	DEPLOYED	dremio-0.0.5	default

Add additional coordinators

helm upgrade <release name> dremio --set coordinator.count=3

Add additional executors

helm upgrade <release name> dremio --set executor.count=5

You can also scale down the same way.
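Both counts can be set in a single upgrade, which avoids depending on how helm carries values over between separate upgrades. The helper below is a sketch: its name is hypothetical, while the chart name dremio and the coordinator.count and executor.count keys are the ones used in the commands above.

```shell
# Set coordinator and executor counts in one helm upgrade.
# Function name is hypothetical. Run it from the charts directory.
# $1 = release name, $2 = coordinator count, $3 = executor count
scale_dremio() {
  helm upgrade "$1" dremio \
    --set coordinator.count="$2" \
    --set executor.count="$3"
}

# Usage (requires helm and access to the cluster):
#   scale_dremio <release name> 3 5
```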

Upgrading Dremio

Upgrade only when no queries are running on the cluster. Update the Dremio image tag in your values.yaml file. For example:

image: dremio/dremio-oss:3.0.0
...

Get the name of the helm release. In the example below, the release name is plundering-alpaca.

helm list
NAME             	REVISION	UPDATED                 	STATUS  	CHART       	NAMESPACE
plundering-alpaca	1       	Wed Jul 18 09:36:14 2018	DEPLOYED	dremio-0.0.5	default

Upgrade the deployment with the helm upgrade command:

helm upgrade <release name> .

Existing pods will be terminated and new pods will be created with the new image. You can monitor the status of the pods by running:

kubectl get pods

Once all the pods are restarted and running, your Dremio cluster is upgraded.

Customizing Dremio configuration

Dremio configuration files used by the deployment are in the config directory. These files are propagated to all the pods in the cluster. To apply a configuration change, update the files and then upgrade the helm release, just as you would for a Dremio upgrade; this refreshes all the pods with the new configuration. The Dremio documentation covers the configuration capabilities in Dremio.

If you need to add a core-site.xml, you can add the file to the config directory and it will be propagated to all the pods on install or upgrade of the deployment.
