
component-openshift4-logging's Introduction

Commodore Component: OpenShift4 Logging

This is a Commodore Component for OpenShift4 Logging.

This repository is part of Project Syn. For documentation on Project Syn and this component, see syn.tools.

Documentation

The rendered documentation for this component is available on the Commodore Components Hub.

Documentation for this component is written using Asciidoc and Antora. It can be found in the docs folder. We use the Divio documentation structure to organize our documentation.

Run the make docs-serve command in the root of the project, and then browse to http://localhost:2020 to see a preview of the current state of the documentation.

After writing the documentation, please use the make docs-vale command and correct any warnings raised by the tool.

Contributing and license

This library is licensed under BSD-3-Clause. For information about how to contribute, see CONTRIBUTING.

component-openshift4-logging's People

Contributors

anothertobi, bastjan, ccremer, corvus-ch, debakelorakel, github-actions[bot], glrf, happytetrahedron, megian, mhutter, renovate[bot], simu


component-openshift4-logging's Issues

Add support for cluster-logging 5.6

Context

The cluster-logging stack version 5.6 is now released, cf. https://docs.openshift.com/container-platform/4.12/logging/cluster-logging-release-notes.html

We need to add support for the new logging stack version in the component. Things to consider are:

  • Upstream alerting configs which got moved
  • Upstream alerts which got changed/added/removed

Task deliverables

  • The component supports installing logging stack 5.6
  • New/changed/removed upstream alerts are reviewed and patched if necessary

Support fetching alert rules from Go constant

Context

We switched to fetching the logging collector alert rules via a lookup table in #72. However, future releases of the logging stack will no longer provide the collector alert rules as a YAML file, but instead as a Go constant, cf. openshift/cluster-logging-operator#1732.

We'll need to add support for handling this change to the component in some form.

Follow-up to #69.

Acceptance Criteria

  • The component can deploy collector alert rules for logging stack versions which ship the default alert rules as a Go constant
  • The solution supports selecting a logging stack version at catalog compilation time

Alternatives

  • Vendor the logging stack version 5.5 alert rules in the component, and manage them ourselves going forward. This risks that our alert rule configuration starts drifting from upstream (e.g. when upstream changes labels on the metrics used by the rules).

Proposed implementation

  • Have a small Go program which imports the cluster-logging package and writes out the alerts constant to a YAML file
  • The program is parametrized in some way to support different logging stack versions
  • The program can be run through a make target
  • There's a scheduled GitHub action which regularly runs the program and creates a PR when there are changes upstream

Component compilation is non-deterministic with the default parameters

The component fetches the upstream fluentd alerts from the master branch of https://github.com/openshift/cluster-logging-operator, cf. the dependency source:

source: https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/fluentd/fluentd_prometheus_alerts.yaml

By doing so, we can't generate reproducible golden test outputs, as the contents of the fluentd alerts on the master branch may change arbitrarily.

Steps to Reproduce the Problem

  1. generate golden tests output (make gen-golden)
  2. observe that golden tests may fail after some time, see e.g. #46

Actual Behavior

Golden tests start failing if the upstream alerts are modified on the master branch

Expected Behavior

We want deterministic compilation of components with default parameters. We should pick some stable version of the upstream repo to use in the component defaults.
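Pinning the alerts parameter to a release branch could look like this in the component defaults (a sketch; the exact parameter layout may differ, but the openshift4_logging.alerts parameter and the release-5.4 value appear elsewhere in this component):

```yaml
parameters:
  openshift4_logging:
    # Pin to a stable release branch instead of master so that golden
    # test outputs are reproducible.
    alerts: "release-5.4"
```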

After upgrading to OpenShift Logging 5.7.0, the Operator fails with CreateContainerConfigError

After upgrading to OpenShift Logging 5.7.0, the Operator is stuck on start with the error:

Error: container has runAsNonRoot and image will run as root (pod: "cluster-logging-operator-6b5d9c7495-8rhqh_openshift-logging(ea92698c-ce34-48b5-b458-47aba00c469d)", container: cluster-logging-operator)

On the openshift-logging namespace the pod security is set to privileged:

$ oc get ns openshift-logging -o yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    ...
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged

The OpenShift Logging Operator deployment enforces that the pod starts as non-root:

spec:
  template:
    spec:
      containers:
      - name: cluster-logging-operator
        ...
        securityContext:
          allowPrivilegeEscalation: false
      securityContext:
        runAsNonRoot: true

The pod is actually started with the SCC privileged-higher-prio:

$ oc get po cluster-logging-operator-674c877f5b-w25p9 -n openshift-logging -o yaml | grep "openshift.io/scc"
openshift.io/scc: privileged-higher-prio

Either the namespace configuration or the operator deployment applied on upgrades is wrong.

Steps to Reproduce the Problem

  1. Upgrade an existing OpenShift Logging from 5.6 to 5.7

Actual Behavior

The Operator does not start. After removing securityContext.allowPrivilegeEscalation and securityContext.runAsNonRoot, the operator does start without issues.

Expected Behavior

The upgrade runs without the operator getting stuck in root / non-root conflicts.

Add extracted alerts for OpenShift Logging 5.7

Context

The extracted alerts for version 5.7 are missing. Check whether they are still required and need to be updated, and add them specifically for this version.

Path: component/extracted_alerts/release-5.7/*

Alternatives

N/A

Add default network policy to allow communication between elasticsearch-operator and ns openshift-logging

Context

When installing the elasticsearch-operator in the official namespace openshift-operators-redhat, Elasticsearch is not accessible. There are default network policies allow-from-other-namespaces and allow-from-same-namespace in place which allow ingress traffic and traffic within a namespace, but communication between namespaces is not allowed. The elasticsearch-operator is installed in the namespace openshift-operators-redhat and needs access to Elasticsearch in the namespace openshift-logging.

A network policy should be implemented which allows this communication.
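A minimal sketch of such a policy, selecting the operator namespace via the standard kubernetes.io/metadata.name label (the policy name and the decision to allow all pods are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-operators-redhat  # illustrative name
  namespace: openshift-logging
spec:
  # Applies to all pods in openshift-logging; could be narrowed to the
  # Elasticsearch pods if desired.
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-operators-redhat
  policyTypes:
    - Ingress
```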

OpenShift 4.11 requires `clusterlogging.collection.type` to be set explicitly

Steps to Reproduce the Problem

  1. On OpenShift 4.11, upgrade hosted logging to ≥ 5.5.2

Actual Behavior

Collector pods keep being restarted.

{"_ts":"2022-10-18T12:18:16.867228372Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}

Expected Behavior

Collector pods keep running.

Suggested implementation

https://access.redhat.com/solutions/6976455 suggests explicitly setting:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
...
  name: instance
  namespace: openshift-logging
...
spec:
  collection:
    type: fluentd
    logs:
      type: fluentd

For spec.collection.logs.type this is already done. Adding spec.collection.type is straightforward.

The component should manage its own alert rules

Context

Currently we manage and deploy alert rules for the logging stack in component openshift4-monitoring, cf. https://hub.syn.tools/openshift4-monitoring/references/parameters.html#_upstreamrules_elasticsearchoperator. This doesn't make a lot of sense, since we can't reliably select the correct version of the alert rules for the logging stack in that component, as we don't exactly know which logging stack version is deployed.

In contrast, this component always knows which logging stack version is getting installed and can therefore easily select the matching set of alert rules.

Alternatives

Keep the alert rules management in component openshift4-monitoring

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses.

This repository currently has no open or pending branches.


Component fails to retrieve collector rules for logging >= release-5.5

When configuring the component to retrieve alert rules for release-5.5 or later, compiling the component fails.

This happens because the upstream rules were moved from https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/fluentd/fluentd_prometheus_alerts.yaml to https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/collector/fluentd_prometheus_alerts.yaml for release-5.5 and later (the file was moved to a different folder).

Steps to Reproduce the Problem

  1. Set openshift4_logging.alerts: "release-5.4"
  2. Compile -> works
  3. Set openshift4_logging.alerts: "release-5.5"
  4. Compile -> fails

Actual Behavior

Unknown (Non-Kapitan) Error occurred
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 159, in fetch_http_dependency
    content_type = fetch_http_source(source, cached_source_path, item_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 207, in fetch_http_source
    content, content_type = make_request(source)
                            ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/utils.py", line 478, in make_request
    r.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kapitan/targets.py", line 113, in compile_targets
    fetch_dependencies(
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 88, in fetch_dependencies
    [p.get() for p in pool.imap_unordered(http_worker, http_deps.items()) if p]
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 88, in <listcomp>
    [p.get() for p in pool.imap_unordered(http_worker, http_deps.items()) if p]
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml
404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml

Expected Behavior

Successful compile

Increase pv with commodore configuration

Context

To increase Elasticsearch storage, we currently need to:

  • increase the storage size in the Commodore config, and
  • increase the PVC to actually expand the storage via CSI

Alternatives

If possible, it would be nice if the Commodore component resized the PV as well.
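For reference, the second manual step amounts to raising spec.resources.requests.storage on each Elasticsearch PVC, after which the CSI driver expands the volume (provided the StorageClass allows expansion). PVC name and size below are examples:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-elasticsearch-cdm-example-1  # hypothetical PVC name
  namespace: openshift-logging
spec:
  resources:
    requests:
      storage: 200Gi  # new, larger size triggers CSI volume expansion
```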

Add clusterLogForwarding

Context

Enable clusterLogForwarding to forward logs to third-party systems, e.g. an external Splunk server.
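A sketch of what such a forwarder could look like for the Splunk case. The splunk output type exists in recent OpenShift Logging releases; the URL and secret name are placeholders:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: splunk-receiver
      type: splunk
      url: https://splunk.example.com:8088  # placeholder HEC endpoint
      secret:
        name: splunk-hec-token              # placeholder secret holding the HEC token
  pipelines:
    - name: forward-to-splunk
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - splunk-receiver
```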

Removing openshift-logging leaves some objects untouched

After removing the openshift-logging component, the following objects are left behind.

❯ oc get crd
clusterlogforwarders.logging.openshift.io                         2021-05-18T07:41:41Z
clusterloggings.logging.openshift.io                              2021-05-18T07:41:41Z
elasticsearches.logging.openshift.io                              2021-05-18T07:41:42Z
kibanas.logging.openshift.io                                      2021-05-18T07:41:42Z

❯ oc get operator
NAME                                       AGE
cluster-logging.openshift-logging          49m
elasticsearch-operator.openshift-logging   49m

Steps to Reproduce the Problem

  1. Remove openshift-logging from the applications list:
applications:
  - ~openshift-logging
  2. Push the changes to the cluster

Actual Behavior

The Operator and CRDs are orphaned.

Expected Behavior

All logging related objects are removed.
