
openshift4-docs's Introduction

APPUiO Managed OpenShift 4 Documentation

This repository contains all the content of the APPUiO Managed OpenShift 4 documentation hosted at https://kb.vshn.ch/oc4/.

The content is written in AsciiDoc and rendered using Antora.

Tip
  • Writing AsciiDoc is best done using Visual Studio Code with the AsciiDoc addon.

  • For a reference of what you can do with AsciiDoc, have a look at the AsciiDoc Writer’s Guide.

  • Antora is capable of doing many great things with documentation. See the Antora docs to gain insights into the tooling.

Documentation structure

The documentation structure is inspired by Divio’s documentation structure:

Tutorials (Learning-oriented)

A lesson which teaches you something. Location: docs/modules/ROOT/pages/tutorials.

How-to guides (Problem-oriented)

Step-by-step guides to achieve a goal. Location: docs/modules/ROOT/pages/how-tos.

Technical reference (Information-oriented)

A description of the inner workings. Location: docs/modules/ROOT/pages/references.

Explanation (Understanding-oriented)

Explains the background. Location: docs/modules/ROOT/pages/explanations.

Contributing

Create a new branch for your changes. Once you’re satisfied with them, open a Pull Request against the master branch.

Previewing Changes

To preview your changes locally, make sure you have Docker or Podman installed.

Just type make preview and open your browser at http://localhost:2020. Live reload is supported while working on the content. See this documentation for more information.

Adding a New Page

  1. Create a new AsciiDoc (.adoc) file in the best matching folder under docs/modules/ROOT/pages/, according to the structure described above.

  2. Add the file to the navigation under docs/modules/ROOT/partials/.

To remove a page, just do the opposite: delete the file and remove its entry from the navigation.

Deployment

This repository only holds the Antora content, not the plumbing and tooling (that is, the Antora playbook.yml and Dockerfile) needed to build and deploy it. Every push to the master branch triggers a GitHub action (.github/workflows/triggerci.yml) which in turn triggers the GitLab CI job that builds and deploys the content using Antora.

openshift4-docs's People

Contributors

54nd20, a-tell, anothertobi, arska, bastjan, bliemli, ccremer, clnrmn, corvus-ch, debakelorakel, glrf, haasad, happytetrahedron, madchr1st, megian, mhutter, nunojusto, rxbn, ryazenth, schemen, simu, srueg, thebiglee, tobru, zugao


Forkers

acteru nunojusto

openshift4-docs's Issues

Concept for monitoring OpenShift 4

OpenShift 4 includes cluster monitoring based on Prometheus. This ticket aims to answer the question: how do we make use of it?

Motivation

The documentation about Configuring the monitoring stack lists quite a lot of things that cannot be configured. This includes:

  • Adding additional ServiceMonitors
  • Creating unexpected ConfigMap objects or PrometheusRule objects
  • Directly editing the resources and custom resources of the monitoring stack
  • Using resources of the stack for your own purposes
  • Stopping the Cluster Monitoring Operator from reconciling the monitoring stack
  • Adding new and editing existing alert rules
  • Modifying Grafana

We know from experience with OpenShift 3.11 that some tweaking will be required at some point. This includes adding ServiceMonitors for things not (yet) covered by Cluster Monitoring, adding new rules to cover additional failure scenarios, and altering rules that are noisy and/or not actionable.

Goals

Enable us to:

  • monitor things not covered by Cluster Monitoring
  • tweak existing alert rules in case they do not provide any value to us and/or are noisy without being actionable

Non-Goals

Answer the question of where alerts are sent to and thus how they are acted upon.

Design Proposal

Based on all those restrictions, one could conclude to omit Cluster Monitoring and build a monitoring stack from scratch. This would give full control over everything. But Cluster Monitoring is a fundamental part of an OpenShift 4 setup and will always be present; it is required for certain things to work properly. Rebuilding it would be a huge waste of resources, both in terms of management/engineering effort and in terms of compute and storage.

For that reason we will make use of Cluster Monitoring as much as possible. We will operate a second pair of Prometheus instances in parallel to the Cluster Monitoring ones. That second pair only takes care of the things we cannot do with Cluster Monitoring.

Those additional Prometheus instances will get the needed metrics from Cluster Monitoring. Targets are only scraped directly when Cluster Monitoring is not already doing so. Alerts will be sent to the Alertmanager instances of Cluster Monitoring.
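
As a sketch of how such a second pair of instances could be wired up with the prometheus-operator, the following custom resource reads metrics from Cluster Monitoring and sends alerts to its Alertmanager. The namespace, the label selectors and the service URL are assumptions for illustration, not the final configuration; authentication and TLS settings are omitted.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: appuio                   # hypothetical name
  namespace: appuio-monitoring   # assumed namespace
spec:
  replicas: 2
  # Query metrics from Cluster Monitoring instead of re-scraping them.
  remoteRead:
    - url: https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/read   # auth/TLS omitted
      readRecent: true
  # Send alerts to the Cluster Monitoring Alertmanager.
  alerting:
    alertmanagers:
      - namespace: openshift-monitoring
        name: alertmanager-main
        port: web
        scheme: https
  # Pick up only our own ServiceMonitors and rules (assumed label).
  serviceMonitorSelector:
    matchLabels:
      monitoring.appuio.ch/enabled: "true"
  ruleSelector:
    matchLabels:
      monitoring.appuio.ch/enabled: "true"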

User Stories

Noisy and/or non-actionable alert rule

The Configuring the monitoring stack documentation explicitly prohibits changing the existing alert rules. From our experience with OpenShift 3.11, we had cases where we needed to do so, the reasons being a rule that just produced noise, was not actionable, and/or did not cover some edge cases.

The OpenShift 4 monitoring is based on kube-prometheus. We also have experience with this, as we are using it for non-OpenShift Kubernetes clusters, and we already had to tweak some of those rules. See CPUThrottlingHigh false positives for an example.

For those cases, we can make use of Alertmanager. With the routing configuration we route those troublesome alerts to the void. The second set of Prometheus instances will then evaluate a replacement alert rule.
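
A minimal sketch of such a routing configuration, assuming the noisy rule is the CPUThrottlingHigh alert mentioned below; the receiver name and the placement within the overall Alertmanager configuration are illustrative only.

route:
  routes:
    # Drop the noisy upstream alert; a replacement rule is evaluated
    # on our own Prometheus instances instead.
    - match:
        alertname: CPUThrottlingHigh
      receiver: "null"
receivers:
  # A receiver without notification configs discards matching alerts.
  - name: "null"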

Service not monitored

The Configuring the monitoring stack documentation explicitly prohibits the creation of additional ServiceMonitors within Cluster Monitoring. Instead, we will use our second set of Prometheus instances to scrape metrics from those services. Rules based on those metrics will also be evaluated there.
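
As an illustration, a ServiceMonitor picked up by the second set of Prometheus instances could look like the following; the service name, namespace and labels are hypothetical.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-service                     # hypothetical service
  namespace: appuio-monitoring              # assumed namespace
  labels:
    monitoring.appuio.ch/enabled: "true"    # matches the assumed selector of our Prometheus
spec:
  selector:
    matchLabels:
      app: example-service
  endpoints:
    - port: metrics
      interval: 30s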

Failure scenario not covered by existing alert rules

The Configuring the monitoring stack documentation explicitly prohibits the creation of additional alert rules.

Additional alert rules will be configured and evaluated on our second set of Prometheus instances. The metrics will come from Cluster Monitoring and/or from directly scraped targets.
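
A sketch of such an additional rule as a PrometheusRule object; the alert name, expression and labels are made up for illustration.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: appuio-monitoring              # assumed namespace
  labels:
    monitoring.appuio.ch/enabled: "true"    # matches the assumed selector of our Prometheus
spec:
  groups:
    - name: example.rules
      rules:
        - alert: ExampleServiceDown
          expr: up{job="example-service"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "example-service target is down"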

Custom Dashboards

The Configuring the monitoring stack documentation explicitly prohibits changing the Grafana instance. In order to have custom dashboards, we can operate our own Grafana instance which uses our second set of Prometheus instances as its data source.
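
A sketch of how such a Grafana instance could be pointed at our own Prometheus instances via a data source provisioning file; the data source name and the service URL are assumptions.

# Grafana data source provisioning file, e.g. mounted into the Grafana pod
apiVersion: 1
datasources:
  - name: APPUiO Prometheus                                     # assumed name
    type: prometheus
    access: proxy
    url: http://prometheus-appuio.appuio-monitoring.svc:9090    # assumed service
    isDefault: true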

Implementation Details/Notes/Constraints

Our own pair of Prometheus instances will use remote read to query metrics from Cluster Monitoring. This does not create additional replicas of the metrics; no additional storage is needed except for the additionally scraped targets. Remote read is also efficient in terms of memory usage (see Remote Read Meets Streaming).

Risks and Mitigations

This setup mitigates all the configuration restrictions of Cluster Monitoring, and it does so with no or only minimal resource overhead.

Remote read introduces a source of failure that is usually not present and has to be accounted for.

  • Cluster Monitoring operates two Prometheus instances configured equally
  • Thanos Querier load balances queries to those Prometheus instances
  • Two additional Prometheus instances are operated to scrape additional metrics and to evaluate rules
  • Alert rules must be engineered to detect when remote read has issues

See Remote Read Meets Streaming for an in-depth discussion of the subject.
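
One possible way to detect remote read issues, sketched here as an alert rule: watch for a metric that is only available via remote read from Cluster Monitoring and alert when it can no longer be queried. The chosen metric, duration and severity are assumptions.

groups:
  - name: remote-read.rules
    rules:
      - alert: ClusterMonitoringRemoteReadDown
        # up{job="apiserver"} is scraped by Cluster Monitoring only, so it should
        # only be visible to our instances through remote read (assumption).
        expr: absent(up{job="apiserver"})
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Metrics from Cluster Monitoring are not reachable via remote read"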

Drawbacks

This setup is specific to OpenShift 4. It cannot be applied to non-OpenShift 4 setups, or not without major changes.

Alternatives

Remote write

The OpenShift 4 documentation does not mention it, but in the source we see that remote write targets can be configured. Prometheus itself does not provide a receive endpoint; instead, Thanos Receiver could be used.

Thanos Receiver writes the received metrics in the same format as Prometheus does. It is possible to point a Prometheus instance to the same data directory and thus "import" the data into Prometheus. While this works technically, it is probably not safe for production. Instead, remote read or Thanos Ruler must be used.

So this is less an alternative than a complement to achieve long term storage.

Cluster Monitoring will be configured to write metrics into a Thanos Receiver. The receiver then stores those metrics in S3. With Thanos Querier, those metrics will then be made available to Prometheus using remote read, and also to Grafana.
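
A sketch of what such a remote write configuration could look like in the cluster-monitoring-config ConfigMap; whether this key is supported depends on the OpenShift version, and the receiver service name is an assumption.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        # Thanos Receive listens for remote write on /api/v1/receive by default.
        - url: http://thanos-receive.appuio-monitoring.svc:19291/api/v1/receive   # assumed service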

Federation

Federation allows a Prometheus server to scrape selected time series from another Prometheus server. The key word here is selected.

It is possible to use the federation endpoint to scrape all metrics. This has several downsides.

  • Both the federating and the federated instance need a substantial amount of memory. This is because all metrics need to be loaded into memory for marshalling and unmarshalling to and from the transport format. This can be mitigated by splitting the federation up into smaller chunks.
  • All scraped metrics need to be stored yet again, which probably results in additional costs for the required disk space.
  • Federation requires planning in advance. Metrics are not available if they have not been scraped.

Federation is meant to build aggregated views in a hierarchical architecture. It is not built to bring most, if not all, metrics from one Prometheus instance to another.
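
For illustration, a federation scrape job on our own Prometheus instances could look like the following; the match[] selectors and the target address are examples only, and authentication is omitted.

scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="apiserver"}'        # example selector
        - '{__name__=~"kube_.*"}'    # example selector
    scheme: https
    static_configs:
      - targets:
          - prometheus-k8s.openshift-monitoring.svc:9091   # assumed target, auth omitted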


Error compiling commodore - error fetching cert-manager chart

I'm getting this error when using the documented alias and cloudscale provider

commodore () {
    mkdir -p inventory/classes/global dependencies/lib compiled/ catalog/
    docker run \
    --interactive=true \
    --tty \
    --rm \
    --user="$(id -u):$(id -u)" \
    --volume "$HOME"/.ssh:/app/.ssh:ro \
    --volume "$PWD"/compiled/:/app/compiled/ \
    --volume "$PWD"/catalog/:/app/catalog \
    --volume "$PWD"/dependencies/:/app/dependencies/ \
    --volume "$PWD"/inventory/:/app/inventory/ \
    --volume ~/.gitconfig:/app/.gitconfig:ro \
    -e COMMODORE_API_URL=$COMMODORE_API_URL \
    -e COMMODORE_GLOBAL_GIT_BASE=$COMMODORE_GLOBAL_GIT_BASE \
    -e COMMODORE_API_TOKEN=$COMMODORE_API_TOKEN \
    projectsyn/commodore:latest \
    $*
}

Executing:
commodore catalog compile ${CLUSTER_ID} --push -i

Outputs:

...
Dependency helm chart cert-manager and version v0.15.1: looks like "https://charts.jetstack.io" is not a valid chart repository or cannot be reached: open /app/.cache/helm/repository/mYbJ6JPEcnKROC+FDBw2MDZnHzE=-index.yaml: no such file or directory

Compile error: cert-manager/helmcharts/cert-manager for target: cluster not found in search_paths: ['./', './dependencies/', PosixPath('/usr/local/lib/python3.8/site-packages/commodore'), '/tmp/tmpwup4bkyh.kapitan']

I can reach that URL from everywhere, so it's really strange.
I'm using the latest commodore image (I've also removed all commodore images, just to be sure, and pulled again),
and I'm running on Ubuntu Linux 20.04.

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:44 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:15 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Installation on cloudscale.ch: Lifetime of provided credentials is unclear

It's unclear to me whether those values will only be used to bootstrap the cluster, or whether they will be persisted on the cluster.

As a cluster operator,
in order to set an appropriate lifetime of access tokens generated,
I'd like to know how long they will be used.

Examples:

  • GitLab access tokens
  • cloudscale.ch API tokens
  • Terraform-related tokens
