A Kubernetes operator for Apache Flink, implemented in Java. See FLIP-212.
The operator is managed helm chart. To install run:
cd helm/flink-operator
helm install flink-operator .
In order to use the webhook for FlinkDeployment validation, you must install the cert-manager on the Kubernetes cluster:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.yaml
The webhook can be disabled during helm install by passing the --set webhook.create=false
parameter or editing the values.yaml
directly.
The operator supports watching a specific list of namespaces for FlinkDeployment resources. You can enable it by setting the --set watchNamespaces={flink-test}
parameter.
When this is enabled role-based access control is only created specifically for these namespaces for the operator and the jobmanagers, otherwise it defaults to cluster scope.
The flink-operator will watch the CRD resources and submit a new Flink deployment once the CR is applied.
kubectl create -f examples/basic.yaml
kubectl delete -f examples/basic.yaml
OR
kubectl delete flinkdep {dep_name}
Get all the Flink deployments running in the K8s cluster
kubectl get flinkdep
Describe a specific Flink deployment to show the status(including job status, savepoint, etc.)
kubectl describe flinkdep {dep_name}
docker build . -t <repo>/flink-java-operator:latest
docker push <repo>flink-java-operator:latest
helm install flink-operator . --set image.repository=<repo> --set image.tag=latest
You can run or debug the FlinkOperator
from your preferred IDE. The operator itself is accessing the deployed Flink clusters through the REST interface. When running locally the rest.port
and rest.address
Flink configuration parameters must be modified to a locally accessible value.
When using minikube tunnel
the rest service is exposed on localhost:8081
> minikube tunnel
> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
basic-session-example ClusterIP None <none> 6123/TCP,6124/TCP 14h
basic-session-example-rest LoadBalancer 10.96.36.250 127.0.0.1 8081:30572/TCP 14h
The operator picks up the default log and flink configurations from /opt/flink/conf
. You can put the rest configuration parameters here:
cat /opt/flink/conf/flink-conf.yaml
rest.port: 8081
rest.address: localhost
GitHub Actions help you automate your software development workflows in the same place you store code and collaborate on pull requests and issues. You can write individual tasks, called actions, and combine them to create a custom workflow. Workflows are custom automated processes that you can set up in your repository to build, test, package, release, or deploy any code project on GitHub.
Considering the cost of running the builds, the stability, and the maintainability, flink-kubernetes-operator chose GitHub Actions and build the whole CI/CD solution on it. All the unit tests, integration tests, and the end-to-end tests will be triggered for each PR.
Note: Please make sure the CI passed before merging.
The operator extends the Flink Metric System that allows gathering and exposing metrics to centralized monitoring solutions. The well known Metric Reporters are shipped in the operator image and are ready to use.
The default metrics reporter in the operator is Slf4j. It does not require any external monitoring systems, and it is enabled in the operator Helm chart by default, mainly for demonstrating purposes.
metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
metrics.reporter.slf4j.interval: 1 MINUTE
To use a more robust production grade monitoring solution the configuration needs to be changed.
The following example shows how to enable the Prometheus metric reporter:
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9999
Some metric reporters, including the Prometheus, needs a port to be exposed on the container. This can be achieved be defining a value for the otherwise empty metrics.port
variable.
Either in the values.yaml file:
metrics:
port: 9999
or using the option --set metrics.port=9999
in the command line.
The Prometheus Operator among other options provides an elegant, declarative way to specify how group of pods should be monitored using custom resources.
To install the Prometheus operator via Helm run:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
The Grafana dashboard can be accessed through port-forwarding:
kubectl port-forward deployment/prometheus-grafana 3000
To enable the operator metrics in Prometheus create a pod-monitor.yaml
file with the following content:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: flink-operator
labels:
release: prometheus
spec:
selector:
matchLabels:
app.kubernetes.io/name: flink-operator
podMetricsEndpoints:
- port: metrics
and apply it on your Kubernetes environment:
kubectl create -f pod-monitor.yaml
Once the custom resource is created in the Kubernetes environment the operator metrics are ready to explore http://localhost:3000/explore.
Savepoints can be triggered manually by defining a random (nonce) value to the variable savepointTriggerNonce
in the job specification:
job:
jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
parallelism: 2
upgradeMode: savepoint
state: running
savepointTriggerNonce: 123
The operator will trigger a savepoint every time the modified CR is applied and the nonce is different from the previous value.