

Quarkus Observability App

1. Introduction

This application showcases how to configure Logging, Metrics, and Tracing in a Quarkus application, and how to collect and manage them using the supported observability infrastructure of OpenShift.

1.1. Quarkus application

The application is built using Quarkus, a container-first framework for writing Java applications.

Table 1. Used Quarkus extensions

  Extension Name                   Purpose
  Micrometer Registry Prometheus   Expose metrics in Prometheus format
  Logging JSON                     Format logs as JSON
  OpenTelemetry                    Distributed tracing
  SmallRye Health                  Liveness and readiness endpoints

1.2. OpenShift Components

In order to collect the logs, metrics, and traces from our application, we are going to deploy and configure several OpenShift components.

Table 2. OpenShift Supported Components

  OpenShift Component            Purpose
  OCP Infra Monitoring           Collect container metrics (memory, CPU, networking, etc.) to display in Grafana and correlate with user-workload monitoring.
  OCP User-workload Monitoring   Collect metrics in OpenMetrics format from user workloads and present them in the built-in dashboards.
  OCP Alerting                   The Alertmanager service handles alerts received from Prometheus and sends them to external notification systems.
  OCP Distributed Tracing        Collect and display distributed traces. Based on the Grafana Tempo project and the OpenTelemetry standard.
  Cluster Logging Operator       Collect, store, and visualize application, infrastructure, and audit logs.

1.3. Community components

Apart from Red Hat supported components like the ones listed in the previous section, we are also going to use community projects. As of today, we only use the Grafana operator to deploy a Grafana instance.

Table 3. Community Components

  Component          Purpose
  Grafana Operator   A Kubernetes operator built to help you manage Grafana instances and their resources in and outside of Kubernetes.

2. The Quarkus Application

2.1. How to start?

Access the Code Quarkus site, which will help you generate the application quickstart with the required Quarkus extensions:

Figure 1. Quarkus Application Generator

Generate the application and download it as .zip.

2.2. How does it work?

The application is similar to the autogenerated version, but with the following customizations:

  • I’ve added a new endpoint to count something using the Swagger OpenApi library.

  • I’ve used the Micrometer metrics library to generate custom metrics that I expose on the Prometheus endpoint (see the sketch after this list). I’ve created three new metrics:

    • Gauges measure a value that can increase or decrease over time, like the speedometer of a car.

    • Counters measure values that only increase.

    • Distribution summaries record an observed value, which is aggregated with other recorded values and stored as a sum.
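
As an illustration, a JAX-RS resource using these three metric types could look like the following minimal sketch (class, endpoint, and metric names are illustrative and may not match the actual code in this repository):

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MeterRegistry;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import java.util.concurrent.atomic.AtomicInteger;

@Path("/hello")
public class GreetingResource {

    private final Counter helloCounter;           // counter: only ever increases
    private final DistributionSummary nameLength; // summary: aggregates observed values
    private final AtomicInteger activeRequests;   // backing value for the gauge

    public GreetingResource(MeterRegistry registry) {
        this.helloCounter = registry.counter("greetings_total");
        this.nameLength = registry.summary("greeting_name_length");
        // The gauge samples a value that can go up and down over time
        this.activeRequests = registry.gauge("greetings_active_requests", new AtomicInteger(0));
    }

    @GET
    public String hello() {
        activeRequests.incrementAndGet();
        try {
            helloCounter.increment();
            nameLength.record("world".length());
            return "Hello world";
        } finally {
            activeRequests.decrementAndGet();
        }
    }
}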

2.3. How to run it?

2.3.1. Option 1: Locally

You can run your application in dev mode that enables live coding using:

mvn compile quarkus:dev

NOTE: Quarkus now ships with a Dev UI, which is available in dev mode only at http://localhost:8080/q/dev/.

2.3.2. Option 2: Packaging and running the application

The application can be packaged using:

mvn package

It produces the quarkus-run.jar file in the target/quarkus-app/ directory. Be aware that it’s not an uber-jar as the dependencies are copied into the target/quarkus-app/lib/ directory.

The application is now runnable using java -jar target/quarkus-app/quarkus-run.jar.

If you want to build an uber-jar, execute the following command:

mvn package -Dquarkus.package.type=uber-jar

The application, packaged as an uber-jar, is now runnable using java -jar target/*-runner.jar.

2.3.3. Option 3: Shipping it into a Container

Manual steps to generate the container image locally:

# Generate the Native executable
mvn package -Pnative -Dquarkus.native.container-runtime=podman -Dquarkus.native.remote-container-build=true -Dquarkus.container-image.build=true

# Add the executable to a container image
podman build -f src/main/docker/Dockerfile.native -t quarkus/quarkus-observability-app .

# Launch the application
podman run -i --rm -p 8080:8080 quarkus/quarkus-observability-app

3. Full install on OpenShift

ℹ️
This repository has been fully migrated to the GitOps pattern. This means that it is strongly recommended to deploy ArgoCD in order to deploy these components in a standard way.

What do you need before installing the application?

  • This repo is tested on OpenShift version 4.16.10, but most of the configuration should work in previous versions. There have been changes to the code to adapt to the latest releases, so you can always check old commits for old configurations :)

  • Both Grafana Loki and Grafana Tempo rely on object storage, which is not available on OCP out of the box. As I don’t want to mix things by installing ODF (a super nice component), the auto-install.sh script will use your AWS credentials to create two AWS S3 buckets on Amazon.

  • This is the GitOps era, so you will need ArgoCD deployed on your cluster. I recommend using OpenShift GitOps, and for that I have a really cool repo. Have a look at it here.

As this is a public repo, it is not possible to upload all the credentials to the git repository. For that reason, there is a script that creates some prerequisites (mainly buckets and secrets) before creating the app-of-apps pattern. Please execute the following script:

./auto-install.sh

After that, you should see the following apps on ArgoCD:

Figure 2. App of Apps for Quarkus Observability

4. Deploy components individually

4.1. Quarkus App

Deploy the app in a new namespace using the following command:

oc apply -f apps/application-quarkus-observability.yaml

4.2. Red Hat build of OpenTelemetry

The Red Hat build of OpenTelemetry provides support for deploying and managing the OpenTelemetry Collector and simplifies workload instrumentation. It can receive, process, and forward telemetry data in multiple formats, making it the ideal component for telemetry processing and interoperability between telemetry systems.

OpenTelemetry is made of several components that interconnect to process metrics and traces. The following diagram from this blog will help you to understand the architecture:

Figure 3. Red Hat Build of OpenTelemetry - Architecture

For more context about OpenTelemetry, I strongly recommend reading the following blogs:

ℹ️
If you struggle with OTEL configuration, please check this redhat-rhosdt-samples repository.
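
For orientation, the Collector is configured through an OpenTelemetryCollector CR. A minimal sketch that receives OTLP traces and forwards them to Tempo could look like this (resource names, namespace, and the Tempo endpoint are illustrative, not necessarily what this repository deploys):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp:
        # Illustrative Tempo distributor endpoint; adjust to your TempoStack service
        endpoint: tempo-tempostack-distributor.observability.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]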

This component is currently used only as an aggregator of traces for Distributed Tracing, so it is deployed together with it. Please continue to the next section to see how.

4.3. OpenShift Distributed Tracing

Red Hat OpenShift Distributed Tracing lets you perform distributed tracing, which records the path of a request through various microservices that make up an application. Tempo is split into several components deployed as different microservices. The following diagram from this blog will help you to better understand the architecture:

Figure 4. Red Hat Distributed Tracing - Architecture

For more context about Distributed Tracing, I strongly recommend reading the following blogs:

For more information, check the official documentation.

You can deploy Grafana Tempo and OpenTelemetry using the following ArgoCD application:

oc apply -f apps/application-ocp-dist-tracing.yaml
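
That ArgoCD application creates, among other resources, a TempoStack CR backed by the S3 bucket created by auto-install.sh. A minimal sketch of such a CR (names, namespace, and sizes are illustrative):

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack
  namespace: observability
spec:
  storageSize: 10Gi
  storage:
    secret:
      name: tempo-s3-credentials   # Secret holding the S3 bucket details
      type: s3
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true              # exposes the Jaeger query UI for traces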

Once you have configured everything, you can access the Metrics tab and see statistics computed directly from the traces collected by the OpenTelemetry Collector. This is an example of the output:

Figure 5. Red Hat Distributed Tracing - Metrics tab

4.3.1. Dashboards

By default, the Grafana Tempo operator does not configure or provide any Grafana Dashboards for monitoring. Therefore, I have collected the ones provided upstream in this folder: https://github.com/grafana/tempo/tree/main/operations/tempo-mixin-compiled. They are deployed together with the Grafana deployment that you will see in a section below.

4.4. OpenShift Monitoring

In OpenShift Container Platform 4.16, you can enable monitoring for user-defined projects in addition to the default platform monitoring. You can monitor your own projects in OpenShift Container Platform without the need for an additional monitoring solution. In this section we only configure the components, but we don’t set up the monitoring of the application using a ServiceMonitor. This is done in the application section:

oc apply -f apps/application-ocp-monitoring.yaml

For more information, check the official documentation.

ℹ️
If you face issues creating and configuring the ServiceMonitor, you can use this Troubleshooting guide.
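
For reference, the ServiceMonitor created as part of the application manifests would look roughly like this sketch (names, labels, and port are illustrative; /q/metrics is the default Prometheus path of the Quarkus Micrometer extension):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: quarkus-observability-app
  namespace: quarkus-observability
spec:
  endpoints:
    - port: http          # must match a named port of the application Service
      path: /q/metrics    # default Micrometer/Prometheus endpoint in Quarkus
      scheme: http
  selector:
    matchLabels:
      app: quarkus-observability-app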

4.5. OpenShift Alerting

Using OpenShift Monitoring, it is really simple to add alerts based on those Prometheus metrics:

oc apply -f apps/application-ocp-alerting.yaml
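
Such alerts are defined with PrometheusRule resources. A minimal sketch (the metric name and threshold are illustrative, not the exact rule shipped in this repository):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: quarkus-observability-alerts
  namespace: quarkus-observability
spec:
  groups:
    - name: quarkus-observability-app
      rules:
        - alert: TooManyGreetings
          expr: sum(increase(greetings_total[5m])) > 100
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: The hello endpoint received more than 100 requests in the last 5 minutes.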

4.6. OpenShift Logging

The logging subsystem aggregates infrastructure and application logs from throughout your cluster and stores them in a default log store. The OpenShift Logging installation consists of installing the Cluster Logging Operator and the Loki Operator first, and then configuring them.

ℹ️
The OpenShift Logging team decided to move from EFK to Vector+Loki. The original OpenShift Logging stack consisted of three products: Elasticsearch (log store and search), Fluentd (collection and transportation), and Kibana (visualization). Now, there are only two: Vector (collection) and Loki (storage).
Installing Logging
oc apply -f apps/application-ocp-logging.yaml

4.6.1. External logging storage

By default, the logging subsystem sends container and infrastructure logs to the default internal log store based on Loki. Administrators can create ClusterLogForwarder resources that specify which logs are collected, how they are transformed, and where they are forwarded to.

ClusterLogForwarder resources can be used to forward container, infrastructure, and audit logs to specific endpoints within or outside of a cluster. Transport Layer Security (TLS) is supported so that log forwarders can be configured to send logs securely.

In the current implementation, the CLF only enables audit logs on the default Loki store. It is also possible to configure other destinations, such as the AWS CloudWatch service. If you want to do so, check the CLF definition gitops/ocp-logging/clusterlogforwarder-instance.yaml and uncomment the sections related to CloudWatch. You will need the infrastructureName, which can be retrieved using the following command, and add it to .spec.outputs.cloudwatch.groupPrefix:

oc get Infrastructure/cluster -o=jsonpath='{.status.infrastructureName}'
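
For orientation, the CloudWatch-related sections of that ClusterLogForwarder roughly follow this sketch (logging.openshift.io/v1 API; the secret name and region are illustrative, and the groupPrefix is the infrastructureName retrieved above):

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: cloudwatch
      type: cloudwatch
      cloudwatch:
        groupBy: logType
        groupPrefix: <infrastructureName>   # value returned by the command above
        region: us-east-1                   # illustrative region
      secret:
        name: cloudwatch-credentials        # illustrative Secret with the AWS keys
  pipelines:
    - name: to-cloudwatch
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - cloudwatch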

Now, you can check the logs in CloudWatch using the following command:

source aws-env-vars
aws --output json logs describe-log-groups --region=$AWS_DEFAULT_REGION

4.7. Grafana Operator

Installing Grafana
oc apply -f apps/application-grafana.yaml
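
That application deploys a Grafana instance through the community Grafana operator CRs. A minimal sketch of such an instance (grafana.integreatly.org/v1beta1 API; names and labels are illustrative, and the dashboards label is what GrafanaDashboard resources select on):

apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  namespace: grafana
  labels:
    dashboards: grafana   # GrafanaDashboard CRs select instances by this label
spec:
  config:
    log:
      mode: console
    auth:
      disable_login_form: "false"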

After installing, you can access the Grafana UI and see the following dashboard:

Figure 6. Grafana dashboard

Annex A: Network Policies with Observability

As you may already know, you can define network policies that restrict traffic to pods in your cluster. When the cluster is empty and your applications don’t rely on other OpenShift components, this is easy to configure. However, when you add the full observability stack plus extra common services, it can get tricky. That’s why I would like to summarize some of the common NetworkPolicies:

# Here you will deny all traffic except for Routes, Metrics, and webhook requests.
oc process -f openshift/ocp-network-policies/10-basic-network-policies.yaml | oc apply -f -
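
A deny-all-except setup like the one described in the comment typically combines a default deny with allow rules such as the following sketch (standard OpenShift namespace labels; the target namespace is illustrative):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-openshift-ingress
  namespace: quarkus-observability
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              policy-group.network.openshift.io/ingress: ""
  policyTypes:
    - Ingress
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-openshift-monitoring
  namespace: quarkus-observability
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: monitoring
  policyTypes:
    - Ingress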

For other NetworkPolicy configurations, check the official documentation.

Annex B: Tekton Pipelines as Code

Pipelines as Code allows you to define CI/CD in a file located in Git. This file is then used to automatically create a pipeline for a pull request or a push to a branch.

Step 1: Create a GH application

This step automates all the steps in this section of the documentation:

  • Create an application in GitHub with the configuration of the cluster.

  • Create a secret in Openshift with the configuration of the GH App pipelines-as-code-secret.

tkn pac bootstrap
# In the interactive menu, set the application name to "pipelines-as-code-app"

Step 2: Create a Repository CR

This step creates a Repository CR with the configuration of the GitHub application in the destination repository:

tkn pac create repository
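
The resulting Repository CR looks roughly like this sketch (name and namespace are illustrative):

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: quarkus-observability-app
  namespace: quarkus-observability-cicd
spec:
  url: https://github.com/alvarolop/quarkus-observability-app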

Annex C: New image with expiration in Quay

It is possible to use labels to set the automatic expiration of individual image tags in Quay. In order to test that, I added a new Dockerfile that takes an image as a build argument and labels it with a set expiration time.
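
Such a Dockerfile boils down to something like this sketch (the actual src/main/docker/Dockerfile.add-expiration may differ; quay.expires-after is the label Quay evaluates to expire a tag):

ARG IMAGE_NAME
ARG IMAGE_TAG
FROM ${IMAGE_NAME}:${IMAGE_TAG}

# Quay automatically expires the tag after this period (e.g. 2h, 1d, 2w)
ARG EXPIRATION_TIME
LABEL quay.expires-after=${EXPIRATION_TIME}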

podman build -f src/main/docker/Dockerfile.add-expiration \
    --build-arg IMAGE_NAME=quay.io/alopezme/quarkus-observability-app \
    --build-arg IMAGE_TAG=latest-micro \
    --build-arg EXPIRATION_TIME=2h \
    -t quay.io/alopezme/quarkus-observability-app:expiration-test .
Check the results
# Nothing related to expiration:
podman inspect image --format='{{json .Config.Labels}}'  quay.io/alopezme/quarkus-observability-app:latest-micro | jq

# Adds expiration label:
podman inspect image --format='{{json .Config.Labels}}'  quay.io/alopezme/quarkus-observability-app:expiration-test | jq


quarkus-observability-app's Issues

[Monitor] Add custom tags to app metrics

I want to have the mechanism to tag metrics with extra labels.

  • The tag that I want is domain: test.
  • I don't want any modification in the application code or application.yml.
  • I want to add an extra selector in the Grafana Dashboard to filter metrics that contain that label.

[Tracing] Jaeger in production mode

The current Distributed Tracing CR deploys a Jaeger instance with everything in memory in the same Pod. This is great for demos, but not for production.

We need to create a new file openshift/ocp-distributed-tracing/21-jaeger-production.yaml to explore the capabilities of strategy: production. In this file, I would like to have the same as in 20-jaeger.yaml, but with all the relevant production configuration variables (commented if not needed) as well as a default configuration.

More info: https://github.com/alvarolop/quarkus-observability-app/blob/main/openshift/ocp-distributed-tracing/20-jaeger.yaml#L20-L22

Documentation: https://docs.openshift.com/container-platform/4.12/distr_tracing/distr_tracing_install/distr-tracing-deploying-jaeger.html#distr-tracing-deploy-production_deploying-distr-tracing-platform

[Monitoring] Correct counter representation in Grafana

The Grafana dashboard represents the count of executions of the /hello endpoint of the API, but the plotted value is not correct (it tries to calculate the mean count per second over a minute, so 10 calls show up as less than 1.0 for that minute).

Dashboard screenshot: https://github.com/alvarolop/quarkus-observability-app/blob/main/docs/images/grafana-dashboard.png

This is the wrong query: https://github.com/alvarolop/quarkus-observability-app/blob/main/openshift/ocp-monitoring/grafana/dashboard.json#L246-L263

I would like to see all the executions in each minute (so, the current value minus the previous minute's value).

[Logging] Add custom tags to app logs

I want to have the mechanism to tag logs with extra labels.

  • The tag that I want is domain: test.
  • I don't want any modification in the application code or application.yml.
  • I want to be able to filter in Loki Dashboard all the logs from pods with that label.
