
loki-k8s-operator's Introduction

Loki Charmed Operator for K8s


Description

Loki is an open-source fully-featured logging system. This Loki charmed operator handles installation, scaling, configuration, optimisation, networking, service mesh, observability, and Day 2 operations specific to Loki using Juju and the Charmed Operator Lifecycle Manager (OLM).

On the principle that an operator should "do one thing and do it well", this operator drives Loki application only. However, it can be composed with other operators to deliver a complex application or service. Because operators package expert knowledge in a reusable and shareable form, they hugely simplify software management and operations.

Getting started

Basic deployment

Create a Juju model for your operator, say "observability"

juju add-model observability

The Loki Charmed Operator may be deployed with a single Juju command:

juju deploy loki-k8s --channel=stable

Checking deployment status

Once the Charmed Operator is deployed, the status can be checked by running:

$ juju status --color --relations

Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  10:39:52-03:00

App       Version  Status   Scale  Charm     Channel  Rev  Address         Exposed  Message
loki-k8s  2.4.1    waiting      1  loki-k8s  stable    47  10.152.183.195  no       waiting for container

Unit         Workload  Agent  Address      Ports  Message
loki-k8s/0*  active    idle   10.1.36.115

Loki HTTP API

The Loki Charmed Operator exposes the Loki HTTP API on port 3100.

Example 1 - Get Loki version:

/loki/api/v1/status/buildinfo exposes the build information in a JSON object. The fields are version, revision, branch, buildDate, buildUser, and goVersion.

loki_ip=$(juju status loki-k8s/0 | grep "loki-k8s/0" | awk '{print $4}')

curl http://$loki_ip:3100/loki/api/v1/status/buildinfo
{"version":"2.4.1","revision":"f61a4d261","branch":"HEAD","buildUser":"root@39a6e600b2df","buildDate":"2021-11-08T13:09:51Z","goVersion":""}

Example 2 - Send log entries to Loki with curl:

/loki/api/v1/push is the endpoint used to send log entries to Loki. The default behavior is for the POST body to be a snappy-compressed protobuf message. Alternatively, if the Content-Type header is set to application/json, a JSON post body can be sent in the following format:

loki_ip=$(juju status loki-k8s/0 | grep "loki-k8s/0" | awk '{print $4}')

curl -v -H "Content-Type: application/json" -XPOST -s "http://$loki_ip:3100/loki/api/v1/push" --data-raw \
  '{"streams": [{ "stream": { "foo": "bar2" }, "values": [ [ "1570818238000000000", "fizzbuzz" ] ] }]}'

Example 3 - Send log entries to Loki with Promtail:

Promtail is an agent which ships the contents of local logs to Loki. It is usually deployed to every machine that runs applications which need to be monitored.

It primarily:

  • Discovers targets
  • Attaches labels to log streams
  • Pushes them to the Loki instance.

Currently, Promtail can tail logs from two sources: local log files and the systemd journal (on AMD64 machines only).

To set up a Promtail instance to work with the Loki Charmed Operator, please refer to the Configuring Promtail documentation. The most important part is the clients section of the Promtail configuration file, for instance:

clients:
  - url: http://<LOKI_ADDRESS>:3100/loki/api/v1/push

Relations

Overview

Relations provide a means to integrate applications and enable a simple communication channel. The Loki Charmed Operator supports the following relations:

Provides

Logging

  logging:
    interface: loki_push_api

The Loki Charmed Operator may receive logs from any charm that supports the loki_push_api relation interface.

Let's say that we have a Charmed Operator that implements the other side (requires) of the relation, for instance Zinc. After deploying this charm, we can relate Loki and Zinc through the loki_push_api relation interface:

juju relate zinc-k8s loki-k8s

And verify the relation between both charms is created:

$ juju status --relations

Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  10:57:01-03:00

App       Version  Status  Scale  Charm     Channel  Rev  Address         Exposed  Message
loki-k8s  2.4.1    active      1  loki-k8s  stable    47  10.152.183.168  no
zinc-k8s  0.3.5    active      1  zinc-k8s  stable    45  10.152.183.144  no

Unit         Workload  Agent      Address      Ports  Message
loki-k8s/0*  active    executing  10.1.36.79
zinc-k8s/0*  active    executing  10.1.36.123

Relation provider  Requirer          Interface      Type     Message
loki-k8s:logging   zinc-k8s:logging  loki_push_api  regular

Once the relation is established, the Zinc charm can start sending logs to the Loki charm.
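
On the requirer side, the integration is driven by the loki_push_api charm library. Below is a minimal sketch of what a workload charm such as Zinc might do, assuming it has fetched lib/charms/loki_k8s/v0/loki_push_api.py and declares a logging relation with the loki_push_api interface in its metadata; constructor arguments and event names may differ between library versions.

from charms.loki_k8s.v0.loki_push_api import LokiPushApiConsumer
from ops.charm import CharmBase
from ops.main import main


class MyWorkloadCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # Reads Loki's push endpoints from the "logging" relation data.
        self._loki_consumer = LokiPushApiConsumer(self, relation_name="logging")
        self.framework.observe(
            self._loki_consumer.on.loki_push_api_endpoint_joined,
            self._on_loki_push_api_endpoint_joined,
        )

    def _on_loki_push_api_endpoint_joined(self, event):
        # Reconfigure the workload to ship its logs to the advertised endpoint(s).
        pass


if __name__ == "__main__":
    main(MyWorkloadCharm)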

Grafana-source

  grafana-source:
    interface: grafana_datasource
    optional: true

The Grafana Charmed Operator provides versatile dashboards for viewing, in configurable ways, the logs aggregated by Loki. Loki relates to Grafana over the grafana_datasource interface.

For example, let's say that we have already deployed the Grafana Charmed Operator in our observability model. The way to relate Loki and Grafana is again very simple:

juju relate grafana-k8s:grafana-source loki-k8s:grafana-source

And verify the relation between both charms is created:

$ juju status --relations

Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  11:09:08-03:00

App          Version  Status  Scale  Charm        Channel  Rev  Address         Exposed  Message
grafana-k8s  9.2.1    active      1  grafana-k8s  stable    52  10.152.183.40   no
loki-k8s     2.4.1    active      1  loki-k8s     stable    47  10.152.183.168  no

Unit            Workload  Agent  Address     Ports  Message
grafana-k8s/0*  active    idle   10.1.36.93
loki-k8s/0*     active    idle   10.1.36.79

Relation provider        Requirer                    Interface           Type     Message
grafana-k8s:grafana      grafana-k8s:grafana         grafana_peers       peer
loki-k8s:grafana-source  grafana-k8s:grafana-source  grafana_datasource  regular

Metrics-endpoint

  metrics-endpoint:
    interface: prometheus_scrape

This Loki Charmed Operator provides a metrics endpoint, so a charm that implements the other side (requires) of the relation, for instance the Prometheus Charmed Operator, can scrape Loki metrics.

For instance, let's say that we have already deployed the Prometheus Charmed Operator in our observability model. The way to relate Loki and Prometheus is again very simple:

juju relate prometheus-k8s loki-k8s
$ juju status --relations

Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  11:25:19-03:00

App             Version  Status  Scale  Charm           Channel  Rev  Address         Exposed  Message
grafana-k8s     9.2.1    active      1  grafana-k8s     stable    52  10.152.183.40   no
loki-k8s        2.4.1    active      1  loki-k8s        stable    47  10.152.183.168  no
prometheus-k8s  2.33.5   active      1  prometheus-k8s  stable    79  10.152.183.144  no

Unit               Workload  Agent  Address     Ports  Message
grafana-k8s/0*     active    idle   10.1.36.93
loki-k8s/0*        active    idle   10.1.36.79
prometheus-k8s/0*  active    idle   10.1.36.84

Relation provider                Requirer                         Interface           Type     Message
grafana-k8s:grafana              grafana-k8s:grafana              grafana_peers       peer
loki-k8s:grafana-source          grafana-k8s:grafana-source       grafana_datasource  regular
loki-k8s:metrics-endpoint        prometheus-k8s:metrics-endpoint  prometheus_scrape   regular
prometheus-k8s:prometheus-peers  prometheus-k8s:prometheus-peers  prometheus_peers    peer

Grafana-dashboard

  grafana-dashboard:
    interface: grafana_dashboard

The Loki Charmed Operator may send its own dashboards to Grafana by using this relation.

After relating both charms this way:

juju relate grafana-k8s:grafana-dashboard loki-k8s:grafana-dashboard

You will be able to check that the relation is established, and see the Loki dashboard in the Grafana UI.

$ juju status --relations

Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  13:21:21-03:00

App             Version  Status  Scale  Charm           Channel  Rev  Address         Exposed  Message
grafana-k8s     9.2.1    active      1  grafana-k8s     stable    52  10.152.183.40   no
loki-k8s        2.4.1    active      1  loki-k8s        stable    47  10.152.183.168  no
prometheus-k8s  2.33.5   active      1  prometheus-k8s  stable    79  10.152.183.144  no

Unit               Workload  Agent  Address     Ports  Message
grafana-k8s/0*     active    idle   10.1.36.85
loki-k8s/0*        active    idle   10.1.36.97
prometheus-k8s/0*  active    idle   10.1.36.67

Relation provider                Requirer                         Interface           Type     Message
grafana-k8s:grafana              grafana-k8s:grafana              grafana_peers       peer
loki-k8s:grafana-dashboard       grafana-k8s:grafana-dashboard    grafana_dashboard   regular
loki-k8s:grafana-source          grafana-k8s:grafana-source       grafana_datasource  regular
loki-k8s:metrics-endpoint        prometheus-k8s:metrics-endpoint  prometheus_scrape   regular
prometheus-k8s:prometheus-peers  prometheus-k8s:prometheus-peers  prometheus_peers    peer

Requires

Alertmanager

Alertmanager receives alerts from Loki, aggregates and deduplicates them, then forwards them to specified targets. Loki Charmed Operator relates to Alertmanager over the alertmanager_dispatch interface.

Let's assume that we have already deployed the Alertmanager Charmed Operator in our observability model, and relate it with Loki:

juju relate alertmanager-k8s loki-k8s

We can check the relation is established:

$ juju status --relations
Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  13:27:18-03:00

App               Version  Status   Scale  Charm             Channel  Rev  Address         Exposed  Message
alertmanager-k8s  0.23.0   waiting      1  alertmanager-k8s  stable    36  10.152.183.56   no       waiting for container
grafana-k8s       9.2.1    active       1  grafana-k8s       stable    52  10.152.183.40   no
loki-k8s          2.4.1    active       1  loki-k8s          stable    47  10.152.183.168  no
prometheus-k8s    2.33.5   active       1  prometheus-k8s    stable    79  10.152.183.144  no

Unit                 Workload  Agent      Address      Ports  Message
alertmanager-k8s/0*  active    idle       10.1.36.113
grafana-k8s/0*       active    idle       10.1.36.85
loki-k8s/0*          active    executing  10.1.36.97
prometheus-k8s/0*    active    idle       10.1.36.67

Relation provider                Requirer                         Interface              Type     Message
alertmanager-k8s:alerting        loki-k8s:alertmanager            alertmanager_dispatch  regular
alertmanager-k8s:replicas        alertmanager-k8s:replicas        alertmanager_replica   peer
grafana-k8s:grafana              grafana-k8s:grafana              grafana_peers          peer
loki-k8s:grafana-dashboard       grafana-k8s:grafana-dashboard    grafana_dashboard      regular
loki-k8s:grafana-source          grafana-k8s:grafana-source       grafana_datasource     regular
loki-k8s:metrics-endpoint        prometheus-k8s:metrics-endpoint  prometheus_scrape      regular
prometheus-k8s:prometheus-peers  prometheus-k8s:prometheus-peers  prometheus_peers       peer

Ingress

  ingress:
    interface: ingress_per_unit
    limit: 1

Interactions with the Loki charm cannot be assumed to originate within the same Juju model, let alone the same Kubernetes cluster or even the same Juju cloud. Hence the Loki charm also supports an ingress relation. There are multiple use cases that require an ingress, in particular:

  • Querying the Loki HTTP API endpoint across network boundaries.
  • Self-monitoring of Loki across network boundaries, to ensure the robustness of self-monitoring.
  • Supporting the Loki push API.

Loki typically needs a "per unit" ingress. This is necessary since Loki exposes its push API endpoint on a per-unit basis. A per-unit ingress relation is available in the traefik-k8s charm, and this Loki charm supports that relation over the ingress_per_unit interface.

Let's assume that we have already deployed the Traefik Charmed Operator in our observability model, and relate it with Loki:

juju relate traefik-k8s loki-k8s

We can check the relation is established:

$ juju status --relations
Model          Controller           Cloud/Region        Version  SLA          Timestamp
observability  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  15:46:43-03:00

App               Version  Status  Scale  Charm             Channel  Rev  Address         Exposed  Message
alertmanager-k8s  0.23.0   active      1  alertmanager-k8s  stable    36  10.152.183.56   no
grafana-k8s       9.2.1    active      1  grafana-k8s       stable    52  10.152.183.40   no
loki-k8s          2.4.1    active      1  loki-k8s          stable    47  10.152.183.168  no
prometheus-k8s    2.33.5   active      1  prometheus-k8s    stable    79  10.152.183.144  no
traefik-k8s                active      1  traefik-k8s       stable    93  192.168.122.10  no

Unit                 Workload  Agent      Address      Ports  Message
alertmanager-k8s/0*  active    idle       10.1.36.95
grafana-k8s/0*       active    executing  10.1.36.121
loki-k8s/0*          active    idle       10.1.36.116
prometheus-k8s/0*    active    idle       10.1.36.80
traefik-k8s/0*       active    idle       10.1.36.122

Relation provider                Requirer                         Interface              Type     Message
alertmanager-k8s:alerting        loki-k8s:alertmanager            alertmanager_dispatch  regular
alertmanager-k8s:replicas        alertmanager-k8s:replicas        alertmanager_replica   peer
grafana-k8s:grafana              grafana-k8s:grafana              grafana_peers          peer
loki-k8s:grafana-dashboard       grafana-k8s:grafana-dashboard    grafana_dashboard      regular
loki-k8s:grafana-source          grafana-k8s:grafana-source       grafana_datasource     regular
loki-k8s:metrics-endpoint        prometheus-k8s:metrics-endpoint  prometheus_scrape      regular
prometheus-k8s:prometheus-peers  prometheus-k8s:prometheus-peers  prometheus_peers       peer
traefik-k8s:ingress-per-unit     loki-k8s:ingress                 ingress_per_unit       regular

OCI Images

Every release of the Loki Charmed Operator uses the latest stable version of grafana/loki at the time of release.

Official Documentation

For further details about Loki configuration and usage, please refer to the Grafana Loki documentation.

loki-k8s-operator's People

Contributors

abuelodelanada, balbirthomas, benhoyt, dstathis, ibraaoad, lucabello, michaeldmitry, mmkay, nrobinaubertin, observability-noctua-bot, pietropasotti, rbarry82, samirps, sed-i, simskij


loki-k8s-operator's Issues

`_endpoints` methods must not rely on `planned_units`

Bug Description

The following method:

    def _endpoints(self) -> List[dict]:
        """Return a list of Loki Push Api endpoints."""
        return [{"url": self._url(unit_number=i)} for i in range(self._charm.app.planned_units())]

in the LokiPushApiProvider class relies on the number of planned units to generate the Loki endpoints.

But for lifecycle reasons, at any given point unit numbers may not start with 0 and may have gaps; for instance, if unit 5 is faulty, I would remove it and add a new one, and I'd end up with units 4, 6, 7.
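
One possible direction, sketched here and not the library's current API, is to have each Loki unit advertise its own push URL in its unit databag, so requirers see exactly the units that exist rather than one URL per planned unit; the method and key names below are illustrative only:

    def _update_endpoint(self, relation, url: str) -> None:
        """Advertise this unit's own push endpoint in its unit databag.

        Sketch only: the "endpoint" key and this method name are assumptions,
        and `json` is assumed to be imported at module level.
        """
        relation.data[self._charm.unit]["endpoint"] = json.dumps({"url": url})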

To Reproduce

There is no need to have steps to reproduce this.

Environment

  • juju 2.9.25

Relevant log output

None

Additional context

No response

Avoid logging warnings when default alert rules folder does not exist

Bug Description

When a charm author includes the loki_push_api library, if they do not create the src/loki_alert_rules folder, the library spits out WARNING logs continuously. We should issue the WARNING logs if and only if the charm author has CUSTOMIZED the alert rules path AND the directory does not exist. By default, we should let charm authors skip setting alert rules without being spammed :-)

To Reproduce

  • Add the LokiPushApiConsumer to a charm, do not create the src/loki_alert_rules folder
  • charmcraft pack
  • juju deploy ...
  • juju debug-log
  • enjoy the spam

Environment

Wherever

Relevant log output

There is too much log output.

Additional context

No response

Integrate loki_push_api with Ingress

Bug Description

Loki charmed in standalone mode must implement an ingress relation using the ingress_per_unit relation interface and the https://github.com/canonical/traefik-k8s-operator/blob/main/lib/charms/traefik_k8s/v0/ingress_per_unit.py library. The URL received by Loki must be used to produce the URL of the Loki Push API endpoint advertised by the class LokiPushApiProvider(RelationManagerBase). No tight coupling between the ingress and the loki_push_api API is desired. LokiPushApiProvider must therefore be updated to enable the Loki charm to specify the host address and the path of the LokiPushApi endpoint.

Note: The path of the LokiPushApi endpoint will be affected by an ingress using a path routing rule, see e.g. https://github.com/canonical/traefik-k8s-operator#configurations

Note: The future charming of Loki in distributed mode will instead use the https://github.com/canonical/traefik-k8s-operator/blob/main/lib/charms/traefik_k8s/v0/ingress.py library and the respective ingress relation interface, as the Loki ingester components are designed to be load-balanced rather than exposed as separate endpoints.

To Reproduce

None

Environment

  • juju
  • microk8s

Relevant log output

None

Additional context

No response

The promtail binary used in the lib is dynamically linked

Bug Description

The promtail binary currently used in the lib is dynamically linked. This means it won't work on all Linux environments. We need to build our own version from upstream that is statically linked (i.e. with CGO disabled). The suggested course of action, for now, is to build it ourselves, place it in the Loki repo, and fetch it from there instead of directly from the upstream repo.

To Reproduce

  1. wget https://github.com/grafana/loki/releases/download/v2.4.1/promtail-linux-amd64.zip
  2. unzip promtail-linux-amd64.zip
  3. file promtail-linux-amd64

Environment

N/A

Relevant log output

❯ file promtail-linux-amd64

promtail-linux-amd64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, Go BuildID=ZZTDSIbZy_VfnPWWwZOz/C9YTBAHGKginqRccK1ae/XVCFXE7cTRaMg17D9-Xy/drsoguGJyTXhi9FV8Vbu, BuildID[sha1]=c4e8062fc97a9e5fd59355a162aba1bab0609cac, stripped

Additional context

No response

Loki config missing in a charm using LokiPushApiConsumer if VM running juju and microk8s is restarted.

Loki config missing in grafana-agent charm.

How can it be that, with the latest Grafana Agent edge, when I relate Grafana Agent with Loki, I get no Loki configuration generated?

michele@boombox:~$ juju status --relations
Model  Controller   Cloud/Region        Version  SLA          Timestamp
lma    development  microk8s/localhost  2.9.22   unsupported  18:56:09+01:00

App            Version  Status  Scale  Charm              Store     Channel  Rev  OS          Address         Message
alertmanager            active      1  alertmanager-k8s   charmhub  edge       7  kubernetes  10.152.183.187  
grafana                 active      1  grafana-k8s        charmhub  edge      14  kubernetes  10.152.183.218  
grafana-agent           active      2  grafana-agent-k8s  charmhub  edge       4  kubernetes  10.152.183.232  
loki                    active      1  loki-k8s           charmhub  edge      11  kubernetes  10.152.183.168  
prometheus              active      1  prometheus-k8s     charmhub  edge      13  kubernetes  10.152.183.139  

Unit              Workload  Agent  Address      Ports  Message
alertmanager/0*   active    idle   10.1.151.76         
grafana-agent/0*  active    idle   10.1.151.80         
grafana-agent/1   active    idle   10.1.151.81         
grafana/0*        active    idle   10.1.151.77         
loki/0*           active    idle   10.1.151.78         
prometheus/0*     active    idle   10.1.151.79         

Relation provider          Requirer                 Interface              Type     Message
alertmanager:alerting      prometheus:alertmanager  alertmanager_dispatch  regular  
alertmanager:replicas      alertmanager:replicas    alertmanager_replica   peer     
grafana:grafana-peers      grafana:grafana-peers    grafana_peers          peer     
loki:grafana-source        grafana:grafana-source   grafana_datasource     regular  
loki:logging               grafana-agent:logging    loki_push_api          regular  
prometheus:grafana-source  grafana:grafana-source   grafana_datasource     regular 
michele@boombox:~$ microk8s.kubectl exec -it grafana-agent-0 -n lma -c agent -- cat /etc/agent/agent.yaml
integrations:
  agent:
    enabled: true
    relabel_configs:
    - regex: (.*)
      replacement: lma_f149bca2-4d3e-4289-8b72-6b66b40d14dc_grafana-agent_grafana-agent/0
      target_label: instance
    - replacement: grafana-agent-k8s
      source_labels:
      - __address__
      target_label: juju_charm
    - replacement: lma
      source_labels:
      - __address__
      target_label: juju_model
    - replacement: f149bca2-4d3e-4289-8b72-6b66b40d14dc
      source_labels:
      - __address__
      target_label: juju_model_uuid
    - replacement: grafana-agent
      source_labels:
      - __address__
      target_label: juju_application
    - replacement: grafana-agent/0
      source_labels:
      - __address__
      target_label: juju_unit
  prometheus_remote_write: []
loki: {}
prometheus:
  configs:
  - name: agent_scraper
    remote_write: []
    scrape_configs: []
server:
  log_level: info

`_remove_alert_rules_files` method deletes all alert rules files when a charm departs the relation.

Issue

LokiPushApiProvider is deleting the entire alert rules directory every time a charm departs the relation. So if we have 2 charms related to Loki and one of them departs the relation, the _remove_alert_rules_files deletes everything.

Expected Behavior

We need to delete just the files of the charm that departs the relation.

Additional info, logs, etc:

    def _remove_alert_rules_files(self, container: Container) -> None:
        """Remove alert rules files from workload container.

        Args:
            container: Container which has alert rules files to be deleted
        """
        container.remove_path(self._rules_dir, recursive=True)
        logger.debug("Previous Alert rules files deleted")
        # Since container.remove_path deletes the directory itself with its files
        # we should create it again.
        os.makedirs(self._rules_dir, exist_ok=True)

Possible Fix

In _remove_alert_rules_files we should retrieve alert rules from the file system and delete the files of the charm that departs the relation.
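
A minimal sketch of that idea, assuming rule file names embed an identifier of the contributing charm and that a departing_app argument is threaded through from the relation event:

    def _remove_alert_rules_files(self, container: Container, departing_app: str) -> None:
        """Remove only the alert rules files contributed by the departing application.

        Sketch only: the `departing_app` argument and the "app identifier embedded
        in the file name" convention are assumptions.
        """
        for rule_file in container.list_files(self._rules_dir):
            if departing_app in rule_file.name:
                container.remove_path(rule_file.path)
                logger.debug("Deleted alert rules file %s", rule_file.path)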

Steps to Reproduce

Test to verify this works as expected:

  1. Deploy Spring Music and Grafana Agent in a separate model
  2. Deploy another Spring Music in the same model as Loki
  3. Relate the Spring Music instance in the same model as Loki directly to Loki
  4. Relate Spring Music to Grafana Agent
  5. Relate Grafana Agent to Loki over a CMR
  6. Remove the relation
  7. Make sure the correct files are deleted

Do the same with a direct relation between Spring Music and Loki. Experiment with different variations.

Context

Your Environment

  • Charm Version used: main branch
  • Environment name and version (e.g. Juju, Microk8s, etc):
    • Juju: 2.9.22
    • microk8s: 1.23/stable
  • Operating System and version (e.g Ubuntu 20.04): Ubuntu 20.04

Custom events names

loki_push_api_alert_rules_error is well scoped, and is an instance of the well-named LokiPushApiAlertRulesError.
I wonder if we should drop the prefix here

    alert_rules_error = EventSource(LokiPushApiAlertRulesError)

so the user could just

        self.framework.observe(
            self._loki_consumer.on.alert_rules_error,
            self._alert_rules_error
        )

Originally posted by @sed-i in #45 (comment)


If we remove the prefix here, we should also remove:

class LoggingEvents(ObjectEvents):
    """Event descriptor for events raised by `LokiPushApiProvider`."""

    loki_push_api_alert_rules_error = EventSource(LokiPushApiAlertRulesError)
    loki_push_api_endpoint_departed = EventSource(LokiPushApiEndpointDeparted)
    loki_push_api_endpoint_joined = EventSource(LokiPushApiEndpointJoined)

Will create an issue to address it later.

use a peer relation

Also, not sure what it means that no peer relation is defined in metadata.

It means nothing special. We could use a peer relation to observe whether the Loki charm is scaled to more than one unit, and put it in blocked state like we do with Prometheus + Ingress, though.

Originally posted by @mmanciop in #48 (comment)

LogProxyConsumer does not support dash-separated container names.

It is my understanding that the container_name in [container_name]-pebble-ready is a dash-to-underscore mapped version of the 'actual' container name.
LogProxyConsumer fails to implement this same logic and therefore gives:

  File "./src/charm.py", line 29, in __init__                                                                          
    self.log_proxy_consumer = LogProxyConsumer(                                                                        
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1624, in __init__ 
    getattr(self._charm.on, "{}_pebble_ready".format(self._container_name)),                                           
AttributeError: 'CharmEvents' object has no attribute 'loki-tester_pebble_ready'                                       

The error occurs when the container name is 'loki-tester'. Note that 'loki-tester' is not a valid Python identifier.

Solve by, in __init__:

        self.framework.observe(
            getattr(self._charm.on, "{}_pebble_ready".format(self._container_name.replace("-", "_"))),
            self._on_pebble_ready,
        )

loki_push_api LIBPATCH

Since charmcraft is very strict with the LIBPATCH correlation, downgrade this value to: 6

$ charmcraft publish-lib charms.loki_k8s.v0.loki_push_api
Library charms.loki_k8s.v0.loki_push_api has a wrong LIBPATCH number, it's too high, Charmhub highest version is 0.5.

Change property contract

This property now may return:

  • A list
  • A string
  • None

Maybe a better contract for consumes would be a property that always returns a list:

  • A list with many elements
  • A list with one element
  • An empty list

Originally posted by @Abuelodelanada in #48 (comment)

loki_push_api does not like loki_alert_rules being an empty file.

_is_official_alert_rule_format(rule_file) raises an uncaught TypeError: argument of type 'NoneType' is not iterable when loki_alert_rules is an empty file.

Is that a valid 'rules' file to begin with?
Either way, looks like we should return list() in _from_file even if the file is invalid/empty.
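
A possible guard, written here as a standalone helper; the function name is hypothetical, and in the library this logic would live in or near _from_file:

from typing import List

import yaml


def load_alert_groups(rule_file_contents: str) -> List[dict]:
    """Return the alert rule groups from a rules file, or [] if the file is empty or invalid."""
    try:
        parsed = yaml.safe_load(rule_file_contents)
    except yaml.YAMLError:
        return []
    # yaml.safe_load returns None for an empty file; guarding here avoids the
    # "argument of type 'NoneType' is not iterable" TypeError downstream.
    if not isinstance(parsed, dict) or "groups" not in parsed:
        return []
    return parsed["groups"]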

Rename Repo

Per the naming guidelines recently published, this repo should be renamed loki-k8s-operator. Will need a ticket into IS.

call to autostart raises ChangeError if service already started

I deployed loki from charmhub edge and after a while noticed the following recurring error in the log.
IIRC this was the reason I stopped using autostart in alertmanager.

unit-loki-0: 02:13:46 ERROR unit.loki/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 109, in <module>
    main(LokiOperatorCharm)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 409, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 278, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 722, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 769, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 79, in _on_loki_pebble_ready
    container.autostart()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 1111, in autostart
    self._pebble.autostart_services()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 822, in autostart_services
    return self._services_action('autostart', [], timeout, delay)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 862, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "loki" (service "loki" was previously started)

Promtail is currently hard-coded to amd64. Needs to support arm64 as well

See

PROMTAIL_BINARY_ZIP_URL = (
"https://github.com/grafana/loki/releases/download/v2.4.1/promtail-linux-amd64.zip"
)
# Paths in `charm` container
BINARY_DIR = "/tmp"
BINARY_ZIP_FILE_NAME = "promtail-linux-amd64.zip"
BINARY_ZIP_PATH = "{}/{}".format(BINARY_DIR, BINARY_ZIP_FILE_NAME)
BINARY_FILE_NAME = "promtail-linux-amd64"
BINARY_PATH = "{}/{}".format(BINARY_DIR, BINARY_FILE_NAME)
BINARY_ZIP_SHA256SUM = "978391a174e71cfef444ab9dc012f95d5d7eae0d682eaf1da2ea18f793452031"
BINARY_SHA256SUM = "00ed6a4b899698abc97d471c483a6a7e7c95e761714f872eb8d6ffd45f3d32e6"
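
A sketch of how these constants could be parameterized by architecture; the arm64 artifact name follows the upstream release naming, and the per-architecture checksums (omitted here) would still need to be filled in:

import platform

# Map the running machine architecture to the upstream artifact suffix.
ARCH = {"x86_64": "amd64", "aarch64": "arm64"}.get(platform.machine(), "amd64")

PROMTAIL_BINARY_ZIP_URL = (
    "https://github.com/grafana/loki/releases/download/v2.4.1/promtail-linux-{}.zip".format(ARCH)
)
BINARY_ZIP_FILE_NAME = "promtail-linux-{}.zip".format(ARCH)
BINARY_FILE_NAME = "promtail-linux-{}".format(ARCH)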

Use a peer relation bucket

That's kind of a can of worms.

The initial impetus behind that OF PR came about early in the charming campaign, when Balbir quickly realized that YAML/JSON dumping didn't work on Stored* objects, because they don't behave like the types they're named after (in a rather large variety of ways, but in that way particularly).

The PR was principally about hiding this from users by implementing equality methods with transparent conversions, and letting them be converted to JSON/YAML without the use of an explicit YAML/JSON formatter. My feeling then (and now) is that it doesn't feel Pythonic to have StoredDict.as_dict(), or to need a formatter for YAML/JSON, since a FooDict and FooList should just behave like the types they are claiming to be.

In the end, we had a discussion about whether StoredState (and the objects underneath it) are/were even appropriate places to keep data. You can't put the cat back in the bag, but certainly "we" at Canonical can write charms in a way which reflects best practice. For resiliency, as your recent comments about StoredState being harmful allude to, we may be better off creating peer relation buckets as the default and keeping data there, which is battle-tested, has a well-known API (anything that can be converted to/from a string), etc.

I'd suggest that, instead of going the route of explicit conversions (which will actually be _type_convert_stored anyway), a peer relation bucket is added and used here instead. The code block in the middle of this comment trivially demonstrates why. The existing magic conversions in OF already turn a list under a dict into a StoredList. Every node under the parent becomes a Stored* type, and they all need to be converted back recursively if it needs to work with json.dumps() or yaml.dump(), so an explicit conversion of StoredDict.to_dict() is necessarily more or less identical to _type_convert_stored, and the same goes for StoredList.to_list().

Let's add and use a peer relation bucket, which gives anyone reading this code when writing a new charm a sense of "best practice", without a list of footnotes illustrating the footguns of StoredState.

Originally posted by @rbarry82 in #48 (comment)

Warn the user about multiple standalone units.

If the juju admin scales up loki without proper HA support, we need to supply sufficient warnings. Log messages and possibly status messages should explain the issues with doing it this way and make it clear that it is not supported.

log_proxy_departed fails if the promtail config file does not exist

This is not very high priority since it should only happen when another bug causes the file to not exist. (At time of writing such a bug exists #60).

The bug here is that _build_config_file requires the config file to exist already. This seems like it could possibly lead to more serious issues down the line, and we should consider fixing it.

Instantiating the library with alert rule paths as relative paths will break it

Commenting here so it's close, since I missed the last review, but as happy as I am to see a nested function as _group_name, I suppose it should be moved out since we decided not to use this as a pattern.

Sorry in advance, since this is long for a part of the library which isn't even in this PR.

def _group_name(path) -> str: should be top-level; typing it as typing.Union[Path.PosixPath, Path.WindowsPath] and using path.relative_to wouldn't hurt.

Also:

return "{}_{}alerts".format(
    topology.identifier, "" if relpath == "." else relpath.replace(os.path.sep, "_") + "_"
)

This hurts to parse.

This is a nit, really, and I hate commenting on committed code, but it is a nested function, and as long as it's being moved, it could look nicer. os.path is ok, but the format is definitely a place where it's not intuitive when + "_" is actually applied, and re-reading it to grok it is slow. It looks like a smell, and the double ) from .format() and .replace() makes me worry that later patches could subtly break it.

Arguably, the current formatting is fragile: relative paths with "../" or absolute paths will lead to __, it would not work on any Windows charms as a library function in the proxy, etc. We may as well code defensively as long as I'm looking at it.

It would be very possible, right now, to get a string like the following on Windows (or just keep the .. part on Linux), and that may be an ugly bug.

{topology.identifier}_C:_foo_.._bar_baz_alerts

Pathlib ends up being somewhat more verbose, but much more flexible/resilient. Even if it isn't used, we should absolutely strip out ../ and ^([A-Z]+:)?/ with a regex and stick with the rest of os.path.

relpath = path.parent.relative_to(dir_path)

# We should account for both absolute paths and Windows paths. Convert it to a POSIX
# string, strip off any leading /, then join it

if relpath.samefile(Path(".")):
    path_str = ""
else:
    # Get rid of leaving / and optionally drive letters so they don't muck up the template later,
    # since Path.parts returns them. The 'if relpath.is_absolute ...' isn't even needed since re.sub
    # doesn't throw exceptions if it doesen't match, so it's optional, but it makes it clear what we're
    # doing.
    
    # Note that Path doesn't actually care whether the path is valid just to instantiate the object, so
    # we can happily strip that stuff out to make templating nicer
    relpath = Path(re.sub(r'^([A-Za-z]+:)?/', '', relpath.as_posix())) if relpath.is_absolute() else relpath
    
    # Get rid of relative path characters in the middle which both os.path and pathlib leave hanging
    # around. We could use path.resolve(), but that would lead to very long template strings when
    # rules come from pods and/or other deeply nested charm paths
    char_filter = lambda x: x not in ["..", "/"]
    path_str = "{}_".format("_".join(filter(char_filter, relpath.parts)))

return "{}_{}alerts.format(
    topology.identifier, path_str
)

Originally posted by @rbarry82 in #45 (comment)

Merge log_proxy and loki_push_api libraries

Now, both libraries use the same interface name, loki_push_api, but they do different things:

  • loki_push_api shares the Loki Push API endpoint over relation data
  • log_proxy shares the Loki Push API endpoint over relation data AND injects Promtail into the workload charm

Loki push_api lib is dropping a valid alert rules file

While debugging cos-config I found out that the following rule never reaches loki (doesn't end up in relation data).
This sample file is taken from the official docs:
https://grafana.com/docs/loki/latest/rules/#example

groups:
  - name: should_fire
    rules:
      - alert: HighPercentageError
        expr: |
          sum(rate({app="foo", env="production"} |= "error" [5m])) by (job)
            /
          sum(rate({app="foo", env="production"}[5m])) by (job)
            > 0.05
        for: 10m
        labels:
            severity: page
        annotations:
            summary: High request latency
  - name: credentials_leak
    rules: 
      - alert: http-credentials-leaked
        annotations: 
          message: "{{ $labels.job }} is leaking http basic auth credentials."
        expr: 'sum by (cluster, job, pod) (count_over_time({namespace="prod"} |~ "http(s?)://(\\w+):(\\w+)@" [5m]) > 0)'
        for: 10m
        labels: 
          severity: critical

logs and status can get filled up with `HTTPConnectionPool` error

Would be nice if unit status could be shorter, or not change so much that it fills the logs.

$ juju show-status-log loki/0
Time                   Type       Status       Message
09 Feb 2022 23:29:36Z  workload   waiting      installing agent
09 Feb 2022 23:29:36Z  juju-unit  allocating   
09 Feb 2022 23:30:29Z  juju-unit  allocating   Error: ImagePullBackOff
09 Feb 2022 23:30:29Z  workload   waiting      agent initializing
09 Feb 2022 23:30:41Z  workload   maintenance  installing charm software
09 Feb 2022 23:30:41Z  juju-unit  executing    running install hook
09 Feb 2022 23:30:57Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f02c46ca670>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:30:59Z  juju-unit  executing    running grafana-source-relation-created hook
09 Feb 2022 23:31:02Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9e45532940>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:03Z  juju-unit  executing    running leader-elected hook
09 Feb 2022 23:31:04Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6cfb3aa9d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:04Z  juju-unit  executing    running active-index-directory-storage-attached hook
09 Feb 2022 23:31:06Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf5d25fb50>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:07Z  juju-unit  executing    running loki-chunks-storage-attached hook
09 Feb 2022 23:31:10Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f928e5c23d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:10Z  juju-unit  executing    running config-changed hook
09 Feb 2022 23:31:12Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8467d4be0>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:13Z  juju-unit  executing    running start hook
09 Feb 2022 23:31:15Z  juju-unit  allocating   Pulling image "registry.jujucharms.com/charm/ayt4gaga2x20gye32qthgygevco13qgq6rfff/loki-image@sha256:b61a65dbf591ccebebf3874c8d5e37d4ba1a8d2017625da97a83a4ced2ff252e"
09 Feb 2022 23:31:15Z  workload   waiting      Waiting for Pebble ready
09 Feb 2022 23:31:17Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd9090bac10>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:18Z  juju-unit  executing    running grafana-source-relation-joined hook for grafana/0
09 Feb 2022 23:31:19Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa08aa89df0>: Failed to establish a new connection: [Errno 111] Connection refused'))
09 Feb 2022 23:31:20Z  juju-unit  executing    running grafana-source-relation-changed hook for grafana/0
09 Feb 2022 23:31:23Z  juju-unit  idle         
09 Feb 2022 23:36:22Z  juju-unit  allocating   Back-off pulling image "registry.jujucharms.com/charm/ayt4gaga2x20gye32qthgygevco13qgq6rfff/loki-image@sha256:b61a65dbf591ccebebf3874c8d5e37d4ba1a8d2017625da97a83a4ced2ff252e"
09 Feb 2022 23:36:22Z  workload   waiting      HTTPConnectionPool(host='localhost', port=3100): Max retries exceeded with url: /loki/api/v1/status/buildinfo (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb9a4849d30>: Failed to establish a new connection: [Errno 111] Connection refused'))

LokiPushApi events review

Shouldn’t LogProxyConsumer also observe relation_broken for safety? Also it needs relation_joined for new consumer units being spun up, e.g., after a pod churn, as those do not get the relation_created, and I am not sure relation_changed will fire.

LokiPushApiProvider: alert rules probably should be deleted on relation_broken: relation_departed is for single units going away, but alert rules are global to the consumer Juju app. Also, same comment about relation_joined applies.

Originally posted by @mmanciop in #63 (comment)

Loki push_api lib should validate alert rules have at least one equality matcher

While working on cos-config I found out that alert rules that I forward to Loki are rejected: Loki sees them but complains about syntax

> $ curl 10.152.183.14:3100/loki/api/v1/rules                                                                                                                                                              
failed to list rule group for user fake and namespace 0_cos-configuration-k8s_alert.rules: error parsing /loki/rules/fake/0_cos-configuration-k8s_alert.rules: /loki/rules/fake/0_cos-configuration-k8s_alert.rules: could not parse expression: parse error at line 1, col 1: syntax error: unexpected IDENTIFIER

but the only thing in line 1 is groups:.

Apparently,

you need at least one equality matcher
cortexproject/cortex#3875 (comment)

When I replaced the rule with the following, Loki was happy

groups:
 - name: example
   rules:
     - alert: HighThroughputLogStreams
       expr: absent_over_time({namespace="dev",job=~".*-logs"}[2m])
       for: 2m

Seems like the loki lib needs to check for this so the user could be prompted.
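
A sketch of the kind of check the lib could run before shipping rules; the regex is only an approximation of "the stream selector contains at least one equality matcher", and the helper name is hypothetical:

import re

# A LogQL stream selector needs at least one equality matcher, e.g. {app="foo"}.
EQUALITY_MATCHER = re.compile(r'\{[^}]*\w+\s*=\s*"[^"]+"')


def has_equality_matcher(expr: str) -> bool:
    """Return True if the alert expression contains at least one equality matcher."""
    return bool(EQUALITY_MATCHER.search(expr))


assert has_equality_matcher('absent_over_time({namespace="dev",job=~".*-logs"}[2m])')
assert not has_equality_matcher('sum(rate({job=~".*"}[5m]))')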

Loki push api: ModelError on relation-departed

This happens from time to time on juju remove-application loki-k8s; might be a timing issue, if the relation-departed is queued after the databag gets deleted?

in _on_relation_departed                                                                                  
    new_config = self._promtail_config                                                                    
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1890,
in _promtail_config                                                                                       
    config = {"clients": self._clients_list()}                                                            
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1904,
in _clients_list                                                                                          
    clients = clients + json.loads(relation.data[relation.app]["endpoints"])                              
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/venv/ops/model.py", line 430, in __getitem__        
    return self._data[key]                                                                                
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/venv/ops/model.py", line 414, in _data              
    data = self._lazy_data = self._load()                                                                 
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/venv/ops/model.py", line 779, in _load              
    return self._backend.relation_get(self.relation.id, self._entity.name, self._is_app)                  
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/venv/ops/model.py", line 1588, in relation_get      
    return self._run(*args, return_output=True, use_json=True)                                            
  File "/var/lib/juju/agents/unit-loki-tester-0/charm/venv/ops/model.py", line 1523, in _run              
    raise ModelError(e.stderr)                                                                            
ops.model.ModelError: b'ERROR permission denied\n'                                                        
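
A defensive sketch for _clients_list, assuming the relation name is held in self._relation_name, that json and ModelError (from ops.model) are imported at module level, and that skipping an unreadable databag during teardown is acceptable:

    def _clients_list(self) -> list:
        """Build the Promtail clients list, skipping relations whose remote
        application databag can no longer be read (e.g. while the application
        is being removed)."""
        clients = []
        for relation in self._charm.model.relations[self._relation_name]:
            try:
                endpoints = relation.data[relation.app].get("endpoints", "[]")
            except ModelError:
                # The databag may already be gone when relation-departed runs.
                continue
            clients.extend(json.loads(endpoints))
        return clients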
