
component-openshift4-logging's Introduction

Commodore Component: OpenShift4 Logging

This is a Commodore Component for OpenShift4 Logging.

This repository is part of Project Syn. For documentation on Project Syn and this component, see syn.tools.

Documentation

The rendered documentation for this component is available on the Commodore Components Hub.

Documentation for this component is written using Asciidoc and Antora. It can be found in the docs folder. We use the Divio documentation structure to organize our documentation.

Run the make docs-serve command in the root of the project, and then browse to http://localhost:2020 to see a preview of the current state of the documentation.

After writing the documentation, please use the make docs-vale command and correct any warnings raised by the tool.

Contributing and license

This library is licensed under BSD-3-Clause. For information about how to contribute, see CONTRIBUTING.

component-openshift4-logging's People

Contributors

anothertobi, bastjan, ccremer, corvus-ch, debakelorakel, github-actions[bot], glrf, happytetrahedron, megian, mhutter, renovate[bot], simu


component-openshift4-logging's Issues

Add support for cluster-logging 5.6

Context

The cluster-logging stack version 5.6 is now released, cf. https://docs.openshift.com/container-platform/4.12/logging/cluster-logging-release-notes.html

We need to add support for the new logging stack version in the component. Things to consider are:

  • Upstream alerting configs which got moved
  • Upstream alerts which got changed/added/removed

Task deliverables

  • The component supports installing logging stack 5.6
  • New/changed/removed upstream alerts are reviewed and patched if necessary

Support fetching alert rules from Go constant

Context

We switched to fetching the logging collector alert rules via a lookup table in #72. However, future releases of the logging stack will no longer provide the collector alert rules as a YAML file, but instead as a Go constant, cf. openshift/cluster-logging-operator#1732.

We'll need to add support for handling this change to the component in some form.

Follow-up to #69.

Acceptance Criteria

  • The component can deploy collector alert rules for logging stack versions which ship the default alert rules as a Go constant
  • The solution supports selecting a logging stack version at catalog compilation time

Alternatives

  • Vendor the logging stack version 5.5 alert rules in the component, and manage them ourselves going forward. This risks that our alert rule configuration starts drifting from upstream (e.g. when upstream changes labels on the metrics used by the rules).

Proposed implementation

  • Have a small Go program which imports the cluster-logging package and writes out the alerts constant to a YAML file
  • The program is parametrized in some way to support different logging stack versions
  • The program can be run through a make target
  • There's a scheduled GitHub action which regularly runs the program and creates a PR when there are changes upstream

Component compilation is non-deterministic with the default parameters

The component fetches the upstream fluentd alerts from the master branch of https://github.com/openshift/cluster-logging-operator, cf. the dependency source:

source: https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/fluentd/fluentd_prometheus_alerts.yaml

By doing so, we can't generate reproducible golden test outputs, as the contents of the fluentd alerts on the master branch may change arbitrarily.

Steps to Reproduce the Problem

  1. generate golden tests output (make gen-golden)
  2. observe that golden tests may fail after some time, see e.g. #46

Actual Behavior

Golden tests start failing if the upstream alerts are modified on the master branch

Expected Behavior

We want deterministic compilation of components with default parameters. We should pick some stable version of the upstream repo to use in the component defaults.
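Pinning the alerts parameter to a release branch could look like this in the component defaults (a sketch; the exact parameter layout may differ, but the openshift4_logging.alerts parameter and the release-5.4 value appear elsewhere in this component):

```yaml
parameters:
  openshift4_logging:
    # Pin to a stable release branch instead of master so that golden
    # test outputs are reproducible.
    alerts: "release-5.4"
```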

After upgrading to OpenShift Logging 5.7.0, the Operator fails with CreateContainerConfigError

After upgrading to OpenShift Logging 5.7.0, the Operator is stuck on start with the error:

Error: container has runAsNonRoot and image will run as root (pod: "cluster-logging-operator-6b5d9c7495-8rhqh_openshift-logging(ea92698c-ce34-48b5-b458-47aba00c469d)", container: cluster-logging-operator)

On the openshift-logging namespace the pod security is set to privileged:

$ oc get ns openshift-logging -o yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    ...
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged

The OpenShift Logging Operator deployment enforces that the pod starts as non-root:

spec:
  template:
    spec:
      containers:
      - name: cluster-logging-operator
        ...
        securityContext:
          allowPrivilegeEscalation: false
      securityContext:
        runAsNonRoot: true

The pod is actually started with the SCC privileged-higher-prio:

$ oc get po cluster-logging-operator-674c877f5b-w25p9 -n openshift-logging -o yaml | grep "openshift.io/scc"
openshift.io/scc: privileged-higher-prio

Either the namespace configuration or the operator deployment applied on upgrades is wrong.

Steps to Reproduce the Problem

  1. Upgrade an existing OpenShift Logging from 5.6 to 5.7

Actual Behavior

The Operator does not start. After removing securityContext.allowPrivilegeEscalation and securityContext.runAsNonRoot, the operator does start without issues.

Expected Behavior

The upgrade runs without the operator getting stuck in root / non-root conflicts.

Add extracted alerts for OpenShift Logging 5.7

Context

The extracted alerts for version 5.7 are missing. Check whether they are still required and need to be updated, and add them specifically for this version.

Path: component/extracted_alerts/release-5.7/*

Alternatives

N/A

Add default network policy to allow communication between elasticsearch-operator and ns openshift-logging

Context

When installing the elasticsearch-operator in the official namespace openshift-operators-redhat, Elasticsearch is not accessible. There are default network policies allow-from-other-namespaces and allow-from-same-namespace in place which allow ingress traffic and traffic within a namespace, but communication between namespaces is not allowed. The elasticsearch-operator is installed in the namespace openshift-operators-redhat and needs access to Elasticsearch in the namespace openshift-logging.

A network policy should be implemented which allows this communication.
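A minimal sketch of such a policy, selecting the operator namespace via the standard kubernetes.io/metadata.name label (the policy name and the decision to allow all pods are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-operators-redhat  # illustrative name
  namespace: openshift-logging
spec:
  # Applies to all pods in openshift-logging; could be narrowed to the
  # Elasticsearch pods if desired.
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-operators-redhat
  policyTypes:
    - Ingress
```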

OpenShift 4.11 requires `clusterlogging.collection.type` to be set explicitly

Steps to Reproduce the Problem

  1. On OpenShift 4.11, upgrade hosted logging to ≥ 5.5.2

Actual Behavior

Collector pods keep being restarted.

{"_ts":"2022-10-18T12:18:16.867228372Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterlogforwarders.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}

Expected Behavior

Collector pods keep running.

Suggested implementation

https://access.redhat.com/solutions/6976455 suggests explicitly setting:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
...
  name: instance
  namespace: openshift-logging
...
spec:
  collection:
    type: fluentd
    logs:
      type: fluentd

For spec.collection.logs.type this is already done. Adding spec.collection.type is straightforward.

The component should manage its own alert rules

Context

Currently we manage and deploy alert rules for the logging stack in component openshift4-monitoring, cf. https://hub.syn.tools/openshift4-monitoring/references/parameters.html#_upstreamrules_elasticsearchoperator. This doesn't make a lot of sense, since we can't reliably select the correct version of the alert rules for the logging stack in that component, as we don't exactly know which logging stack version is deployed.

In contrast, this component always knows which logging stack version is getting installed and can therefore easily select the matching set of alert rules.

Alternatives

Keep the alert rules management in component openshift4-monitoring

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses.

This repository currently has no open or pending branches.


Component fails to retrieve collector rules for logging >= release-5.5

When configuring the component to retrieve alert rules for release-5.5 or later, compiling the component fails.

This happens because the upstream rules were moved from https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/fluentd/fluentd_prometheus_alerts.yaml to https://raw.githubusercontent.com/openshift/cluster-logging-operator/${openshift4_logging:alerts}/files/collector/fluentd_prometheus_alerts.yaml for release-5.5 and later (the file was moved to a different folder).

Steps to Reproduce the Problem

  1. Set openshift4_logging.alerts: "release-5.4"
  2. Compile -> works
  3. Set openshift4_logging.alerts: "release-5.5"
  4. Compile -> fails

Actual Behavior

Unknown (Non-Kapitan) Error occurred
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 159, in fetch_http_dependency
    content_type = fetch_http_source(source, cached_source_path, item_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 207, in fetch_http_source
    content, content_type = make_request(source)
                            ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kapitan/utils.py", line 478, in make_request
    r.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kapitan/targets.py", line 113, in compile_targets
    fetch_dependencies(
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 88, in fetch_dependencies
    [p.get() for p in pool.imap_unordered(http_worker, http_deps.items()) if p]
  File "/usr/local/lib/python3.11/site-packages/kapitan/dependency_manager/base.py", line 88, in <listcomp>
    [p.get() for p in pool.imap_unordered(http_worker, http_deps.items()) if p]
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml
404 Client Error: Not Found for url: https://raw.githubusercontent.com/openshift/cluster-logging-operator/master/files/fluentd/fluentd_prometheus_alerts.yaml

Expected Behavior

Successful compile

Increase pv with commodore configuration

Context

To increase Elasticsearch storage, we currently need to:

  • increase the storage size in the Commodore config, and
  • increase the PVC to actually expand the storage via CSI

Alternatives

If possible, it would be nice if the Commodore component resized the PV as well.
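For reference, the second manual step amounts to raising spec.resources.requests.storage on each Elasticsearch PVC, after which the CSI driver expands the volume (provided the StorageClass allows expansion). PVC name and size below are examples:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-elasticsearch-cdm-example-1  # hypothetical PVC name
  namespace: openshift-logging
spec:
  resources:
    requests:
      storage: 200Gi  # new, larger size triggers CSI volume expansion
```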

Add clusterLogForwarding

Context

Enable clusterLogForwarding to forward logs to third-party systems, e.g. an external Splunk server.
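A sketch of what such a forwarder could look like for the Splunk case. The splunk output type exists in recent OpenShift Logging releases; the URL and secret name are placeholders:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: splunk-receiver
      type: splunk
      url: https://splunk.example.com:8088  # placeholder HEC endpoint
      secret:
        name: splunk-hec-token              # placeholder secret holding the HEC token
  pipelines:
    - name: forward-to-splunk
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - splunk-receiver
```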

Removing openshift-logging leaves some objects untouched

After removing the openshift-logging component, the following objects are left behind.

❯ oc get crd
clusterlogforwarders.logging.openshift.io                         2021-05-18T07:41:41Z
clusterloggings.logging.openshift.io                              2021-05-18T07:41:41Z
elasticsearches.logging.openshift.io                              2021-05-18T07:41:42Z
kibanas.logging.openshift.io                                      2021-05-18T07:41:42Z

❯ oc get operator
NAME                                       AGE
cluster-logging.openshift-logging          49m
elasticsearch-operator.openshift-logging   49m

Steps to Reproduce the Problem

  1. Remove openshift-logging from the applications list:
applications:
  - ~openshift-logging
  2. Push the changes to the cluster

Actual Behavior

The Operator and CRDs are orphaned.

Expected Behavior

All logging related objects are removed.
