vertica/kafka-scheduler-chart

This Helm chart deploys the vertica-kafka-scheduler in one of two modes:

initializer: Configuration mode. Starts a container that you can exec into to configure the scheduler.
launcher: Launch mode. Starts a container that automatically runs vkconfig launch to start the scheduler. Use this mode after you configure the scheduler in initializer mode.

Install the charts

Add the chart repository to Helm, then install the chart. The following helm install command uses the image.tag parameter to install version 24.1.0:

$ helm repo add vertica-charts https://vertica.github.io/charts
$ helm repo update
$ helm install vkscheduler vertica-charts/vertica-kafka-scheduler \
    --set "image.tag=24.1.0"
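
The Parameters section states that image.tag must match the version of the Vertica server that the scheduler connects to. The following is a minimal sketch of deriving the tag from the server's version string. The helper name tag_from_version is hypothetical, and the sketch assumes that the version string has the form "Vertica Analytic Database v24.1.0-0":

```shell
# Hypothetical helper: extract MAJOR.MINOR.PATCH from a Vertica version string.
# Assumes input such as "Vertica Analytic Database v24.1.0-0".
tag_from_version() {
  printf '%s\n' "$1" |
    sed -n 's/.*v\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Prints the tag that you would pass to --set "image.tag=...".
tag_from_version "Vertica Analytic Database v24.1.0-0"
```

The helper prints nothing when the string contains no recognizable version, so check for an empty result before you pass it to helm install.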

Sample manifests

The following sample manifests define a Kafka cluster, VerticaDB operator and custom resource (CR), and vkconfig scheduler. The Usage section applies these manifests to demonstrate a simple deployment:

kafka-cluster.yaml (with Strimzi operator)

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  namespace: kafka
  name: my-cluster
spec:
  kafka:
    version: 3.6.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.6"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

vdb-op-cr.yaml

apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  annotations:
    vertica.com/include-uid-in-path: "false"
    vertica.com/vcluster-ops: "false"
  name: vdb-1203
spec:
  communal:
    credentialSecret: ""
    endpoint: https://s3.amazonaws.com
    path: s3://<path>/<to>/<s3-bucket>
  image: vertica/vertica-k8s:12.0.3-0
  initPolicy: Create
  subclusters:
  - name: sc0
    size: 3
    type: primary

vertica-kafka-scheduler.yaml

 image:
   repository: opentext/kafka-scheduler
   pullPolicy: IfNotPresent
   tag: 12.0.3
 launcherEnabled: false
 replicaCount: 1
 initializerEnabled: true
 conf:
   generate: true
   content:
     config-schema: Scheduler
     username: dbadmin
     dbport: '5433'
     enable-ssl: 'false'
     dbhost: 10.20.30.40
 tls:
   enabled: false
 serviceAccount:
   create: true

Usage

The following sections deploy a Kafka cluster and a VerticaDB operator and CR on Kubernetes. Then, they show you how to configure Vertica to consume data from Kafka by setting up the necessary tables and configuring the scheduler. Finally, you launch the scheduler and send data on the command line to test the implementation.

Deploy the manifests

Apply manifests on Kubernetes to create a Kafka cluster, VerticaDB operator, and VerticaDB CR:

Create a namespace. The following command creates a namespace named kafka:
```
kubectl create namespace kafka
```
Create the Kafka custom resource. Apply the kafka-cluster.yaml manifest:
```
kubectl apply -f kafka-cluster.yaml
```
Deploy the VerticaDB operator and custom resource. The vdb-op-cr.yaml manifest deploys version 12.0.3. Before you apply the manifest, edit spec.communal.path to provide a path to an existing S3 bucket:
```
kubectl apply -f vdb-op-cr.yaml
```
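
Both resources take time to become available. The following is a sketch of waiting for them before you continue. The wait_for_stack name is hypothetical, and the condition names (Ready for the Strimzi Kafka resource, DBInitialized for the VerticaDB CR) are assumptions about what each operator publishes in its status:

```shell
# Sketch: block until the resources from the sample manifests report ready.
wait_for_stack() {
  # Assumed: Strimzi sets a Ready condition on the Kafka resource.
  kubectl wait kafka/my-cluster --namespace kafka \
    --for=condition=Ready --timeout=300s || return 1
  # Assumed: the VerticaDB operator sets DBInitialized after it creates the database.
  kubectl wait vdb/vdb-1203 \
    --for=condition=DBInitialized=True --timeout=10m
}
```

Call wait_for_stack after you apply the manifests. A non-zero exit status means that one of the resources did not reach the expected condition within the timeout.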

Set up Vertica

Create tables and resources so that Vertica can consume data from a Kafka topic:

Create a flex table to store the Kafka messages:
```
CREATE FLEX TABLE KafkaFlex();
```
Create the Kafka user:
```
CREATE USER KafkaUser;
```

Create a resource pool:

CREATE RESOURCE POOL scheduler_pool PLANNEDCONCURRENCY 1;

Create a Kafka topic

Start the Kafka service, and create a Kafka topic that the scheduler can consume data from:

Open a new shell and start a pod to run the Kafka producer:

kubectl --namespace kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0 --rm=true --restart=Never -- bash

Create the Kafka topic that the scheduler subscribes to by starting a console producer on it:

bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic KafkaTopic1
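
The console producer command depends on the broker creating KafkaTopic1 automatically, which is likely the case here because the sample Kafka manifest does not disable automatic topic creation. If you prefer to create the topic explicitly, the following sketch uses kafka-topics.sh from the same producer pod. The create_topic name and the KAFKA_BIN variable are hypothetical conveniences:

```shell
# Hypothetical helper, run inside the kafka-producer pod.
# KAFKA_BIN is an assumed override for the directory that holds the Kafka scripts.
create_topic() {
  "${KAFKA_BIN:-bin}/kafka-topics.sh" \
    --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 \
    --create --if-not-exists \
    --topic "$1" \
    --partitions 1 \
    --replication-factor 1
}
```

For example, create_topic KafkaTopic1 creates a single-partition topic, which matches the single broker in the sample cluster and the --partitions 1 setting that the scheduler source uses.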

Configure the scheduler

Deploy the scheduler container in initializer mode, and configure the scheduler to consume data from the Kafka topic:

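Install the vertica-kafka-scheduler Helm chart with the vertica-kafka-scheduler.yaml values file. This file sets initializerEnabled to true so you can configure the vkconfig container before you launch the scheduler. The release name vk1 and the namespace main match the commands that follow, so create the main namespace first if it does not exist:
```
helm install vk1 vertica-charts/vertica-kafka-scheduler \
  --namespace main \
  --values vertica-kafka-scheduler.yaml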
```

Use kubectl exec to get a shell in the initializer pod:

kubectl exec --namespace main -it vk1-vertica-kafka-scheduler-initializer -- bash

Set configuration options for the scheduler. For descriptions of each of the following options, see vkconfig script options:

# scheduler options 
vkconfig scheduler --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --frame-duration 00:00:10 \
 --create --operator KafkaUser \
 --eof-timeout-ms 2000 \
 --config-refresh 00:01:00 \
 --new-source-policy START \
 --resource-pool scheduler_pool

# target options 
vkconfig target --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --target-schema public \
 --target-table KafkaFlex

# load spec options 
vkconfig load-spec --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --load-spec KafkaSpec \
 --parser kafkajsonparser \
 --load-method DIRECT \
 --message-max-bytes 1000000

# cluster options 
vkconfig cluster --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --cluster KafkaCluster \
 --hosts my-cluster-kafka-bootstrap.kafka:9092

# source options 
vkconfig source --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --cluster KafkaCluster \
 --source KafkaTopic1 \
 --partitions 1

# microbatch options 
vkconfig microbatch --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
 --microbatch KafkaBatch1 \
 --add-source KafkaTopic1 \
 --add-source-cluster KafkaCluster \
 --target-schema public \
 --target-table KafkaFlex \
 --rejection-schema public \
 --rejection-table KafkaFlex_rej \
 --load-spec KafkaSpec
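
Before you leave the initializer pod, you might want to read back what was stored. The following is a sketch that assumes each listed vkconfig tool accepts the --read option to print its current settings. The show_scheduler_config name is hypothetical:

```shell
# Sketch: print the stored settings for each vkconfig tool, run inside the initializer pod.
show_scheduler_config() {
  conf=/opt/vertica/packages/kafka/config/vkconfig.conf
  for tool in cluster source target load-spec microbatch; do
    echo "== $tool =="
    vkconfig "$tool" --read --conf "$conf"
  done
}
```

Run show_scheduler_config and compare the output with the options that you set above, such as the KafkaTopic1 source and the KafkaBatch1 microbatch.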

Launch the scheduler

After you configure the scheduler options, you can deploy it in launcher mode:

helm upgrade --namespace main vk1 vertica-charts/vertica-kafka-scheduler \
  --set "launcherEnabled=true"
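
When helm upgrade receives only --set flags, Helm computes the release values from the chart defaults plus those flags, so values that were supplied when the release was installed might not carry over. The following sketch adds Helm's --reuse-values flag to keep them, then checks the result. The launch_scheduler name is hypothetical:

```shell
# Sketch: switch the vk1 release to launcher mode while keeping its earlier values.
launch_scheduler() {
  helm upgrade vk1 vertica-charts/vertica-kafka-scheduler \
    --namespace main \
    --reuse-values \
    --set "launcherEnabled=true" || return 1
  # Show the values that Helm now holds for the release, then list the pods.
  helm get values vk1 --namespace main
  kubectl get pods --namespace main
}
```

After launch_scheduler returns, the pod list should include a launcher pod in addition to any initializer pod that is still enabled.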

Testing the deployment

Now that you have a containerized Kafka cluster and VerticaDB CR running, you can test that the scheduler is automatically sending data from the Kafka producer to Vertica:

In the terminal that is running your Kafka producer, send sample JSON data:
```
>{"a": 1}
>{"a": 1000}
```

In a different terminal, open vsql and query the KafkaFlex table for the data:

=> SELECT compute_flextable_keys_and_build_view('KafkaFlex');
                 compute_flextable_keys_and_build_view
--------------------------------------------------------
 Please see public.KafkaFlex_keys for updated keys
 The view public.KafkaFlex_view is ready for querying
(1 row)

=> SELECT a from KafkaFlex_view;
 a
-----
 1
 1000
(2 rows)
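
If the rows do not appear, two places are worth checking: the rejection table named in the microbatch options, and the scheduler's microbatch history. The following is a sketch that assumes vsql can connect as dbadmin with your usual connection settings, that the rejection table has already been created by a load, and that the scheduler keeps a stream_microbatch_history table with these columns in its Scheduler configuration schema. The check_loads name is hypothetical:

```shell
# Sketch: look for rejected messages and recent microbatch activity.
check_loads() {
  # Rejection table configured with --rejection-schema and --rejection-table.
  vsql -U dbadmin -c "SELECT COUNT(*) FROM public.KafkaFlex_rej;"
  # Assumed table and column names in the scheduler's configuration schema.
  vsql -U dbadmin -c "SELECT source_name, end_offset, end_reason
                      FROM Scheduler.stream_microbatch_history
                      ORDER BY batch_start DESC LIMIT 5;"
}
```

A non-zero rejection count suggests that the messages reached Vertica but that the parser could not load them.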

Parameters

affinity: Applies affinity rules that constrain the scheduler to specific nodes.
conf.configMapName: Name of the ConfigMap to use and optionally generate. If omitted, the chart picks a suitable default.
conf.content: Set of key-value pairs in the generated ConfigMap. If conf.generate is false, this setting is ignored.
conf.generate: When set to true, the Helm chart controls the creation of the vkconfig.conf ConfigMap. Default: true
fullNameOverride: Gives the Helm chart full control over the name of the objects that get created. This takes precedence over nameOverride.
initializerEnabled: When set to true, the initializer pod is created. Use this pod to run any setup tasks. Default: true
image.pullPolicy: How often Kubernetes pulls the image for an object. For details, see Updating Images in the Kubernetes documentation. Default: IfNotPresent
image.repository: The image repository and name that contains the Vertica Kafka Scheduler. Default: opentext/kafka-scheduler
image.tag: Version of the Vertica Kafka Scheduler. This setting must match the version of the Vertica server that the scheduler connects to. Default: Helm chart's appVersion
imagePullSecrets: List of Secrets that contain the required credentials to pull the image.
launcherEnabled: When set to true, the Helm chart creates the launch deployment. Enable this setting after you configure the scheduler options in the container. Default: true
jvmOpts: Values to assign to the VKCONFIG_JVM_OPTS environment variable in the pods. NOTE: You can omit most truststore and keystore settings because they are set by tls.* parameters.
nameOverride: Controls the name of the objects that get created. This is combined with the Helm chart release to form the name.
nodeSelector: nodeSelector that controls where the pod is scheduled.
podAnnotations: Annotations that you want to attach to the pods.
podSecurityContext: Security context for the pods.
replicaCount: Number of launch pods that the chart deploys. Default: 1
resources: Host resources to use for the pod.
securityContext: Security context for the container in the pod.
serviceAccount.annotations: Annotations to attach to the ServiceAccount.
serviceAccount.create: When set to true, a ServiceAccount is created as part of the deployment. Default: true
serviceAccount.name: Name of the service account. If this parameter is not set and serviceAccount.create is set to true, a name is generated using the fullname template.
timezone: Time zone that the logger uses. Logging uses log4j, so specify a Java-compatible time zone ID. For the available IDs, see https://docs.oracle.com/middleware/1221/wcs/tag-ref/MISC/TimeZones.html. Default: UTC
tls.enabled: When set to true, the scheduler is set up for TLS authentication. Default: false
tls.keyStoreMountPath: Directory name where the keystore is mounted in the pod. This setting controls the name of the keystore within the pod. The full path to the keystore is constructed by combining this parameter and tls.keyStoreSecretKey.
tls.keyStorePassword: Password that protects the keystore. If this setting is omitted, then no password is used.
tls.keyStoreSecretKey: Key within tls.keyStoreSecretName that is used as the keystore file name. This setting and tls.keyStoreMountPath form the full path to the key in the pod.
tls.keyStoreSecretName: Name of an existing Secret that contains the keystore. If this setting is omitted, no keystore information is included.
tls.trustStoreMountPath: Directory name where the truststore is mounted in the pod. This setting controls the name of the truststore within the pod. The full path to the truststore is constructed by combining this parameter with tls.trustStoreSecretKey.
tls.trustStorePassword: Password that protects the truststore. If this setting is omitted, then no password is used.
tls.trustStoreSecretKey: Key within tls.trustStoreSecretName that is used as the truststore file name. This is used with tls.trustStoreMountPath to form the full path to the key in the pod.
tls.trustStoreSecretName: Name of an existing Secret that contains the truststore. If this setting is omitted, then no truststore information is included.
tolerations: Applies tolerations that control where the pod is scheduled.
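
The tls.*MountPath and tls.*SecretKey parameters are described as combining into the full path of each store in the pod. The following is a minimal sketch of that combination, assuming a plain directory and file name join. The store_path name, the mount directories, and the file names are hypothetical examples, not chart defaults:

```shell
# Hypothetical helper: join a mount directory and a Secret key into a file path,
# mirroring the description of tls.keyStoreMountPath and tls.keyStoreSecretKey.
store_path() {
  # Strip one trailing slash from the directory so the result has a single separator.
  printf '%s/%s\n' "${1%/}" "$2"
}

store_path /etc/keystore keystore.jks
store_path /etc/truststore/ truststore.jks
```

This is a quick way to work out which path to expect inside the pod when you verify that a keystore or truststore Secret is mounted.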
