canonical / cos-configuration-k8s-operator

This charmed operator for Kubernetes enables you to provide configurations to various components of the Canonical Observability Stack (COS) bundle.

Home Page: https://charmhub.io/cos-configuration-k8s

License: Apache License 2.0

Python 100.00%
alertmanager gitops kubernetes loki observability prometheus hacktoberfest juju juju-charm

cos-configuration-k8s-operator's Introduction

COS Configuration Repository Operator for Kubernetes


This charmed operator for Kubernetes enables you to provide configurations to various components of the Canonical Observability Stack (COS) bundle.

Supported configurations

The charm facilitates forwarding freestanding files from a git repository to the following operators:

  • prometheus-k8s (alert rules)
  • loki-k8s (alert rules)
  • grafana-k8s (dashboards)

Internally, the charm uses git-sync to sync a remote repo to a local copy. The repo syncs on update-status or when the user manually runs the sync-now action.

It's possible to sync a private repository by setting the git_ssh_key in the Juju configuration for the charm; please note that the key will be saved in the model, thus you should use a key with a very limited scope.
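As a rough illustration of how the configuration options map onto a one-shot git-sync invocation, the sketch below assembles the argument list in the same order as the git-sync "starting up" log lines quoted in the issue reports further down. The function name and its parameters are hypothetical and are not taken from the charm's source.

```python
from typing import List, Optional


def build_git_sync_args(
    repo: str,
    branch: str = "main",
    rev: str = "HEAD",
    depth: Optional[int] = 1,
    ssh_key_file: Optional[str] = None,
) -> List[str]:
    """Assemble a one-shot git-sync command line (hypothetical helper)."""
    args = ["/git-sync", "--repo", repo, "--branch", branch, "--rev", rev]
    if depth is not None:
        # Shallow clone: only fetch the most recent `depth` commits.
        args += ["--depth", str(depth)]
    # The checkout lands under /git, with a "repo" symlink to the current hash.
    args += ["--root", "/git", "--dest", "repo"]
    if ssh_key_file:
        # Private repositories: point git-sync at the key from git_ssh_key.
        args += ["--ssh", "--ssh-key-file", ssh_key_file]
    # Sync once and exit instead of polling forever.
    args.append("--one-time")
    return args


if __name__ == "__main__":
    print(" ".join(build_git_sync_args("https://path.to/repo")))
```

Keeping the argument assembly in one pure function makes it easy to unit-test the effect of each config option without running git-sync.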

Getting started

Deployment

juju deploy cos-configuration-k8s \
  --config git_repo=https://path.to/repo \
  --config git_branch=main \
  --config git_depth=1 \
  --config prometheus_alert_rules_path=rules/prod/prometheus/

# ... and additionally, for a private repo, append to the command above:
#   --config git_ssh_key=@path/to/ssh/private.key

juju relate cos-configuration-k8s prometheus-k8s

Paths to rules files etc. can also be set after deployment:

juju config cos-configuration-k8s loki_alert_rules_path=rules/prod/loki/
juju relate cos-configuration-k8s loki-k8s

juju config cos-configuration-k8s grafana_dashboards_path=dashboards/prod/grafana/
juju relate cos-configuration-k8s grafana-k8s

Verification

After setting the git_repo (and optionally git_branch), the contents should be present in the workload container,

$ juju ssh --container git-sync cos-configuration-k8s/0 ls -l /git
total 4
drwxr-xr-x 6 root root 4096 Oct 24 08:59 7f0b1eac9317850aee320b4f47a7f1527aaff625
lrwxrwxrwx 1 root root   40 Oct 24 08:59 repo -> 7f0b1eac9317850aee320b4f47a7f1527aaff625

and accessible from the charm container

$ juju ssh cos-configuration-k8s/0 ls -l /var/lib/juju/storage/content-from-git/0
total 4
drwxr-xr-x 6 root root 4096 Oct 24 08:59 7f0b1eac9317850aee320b4f47a7f1527aaff625
lrwxrwxrwx 1 root root   40 Oct 24 08:59 repo -> 7f0b1eac9317850aee320b4f47a7f1527aaff625
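The listings above show that the checkout lives in a directory named after the commit hash, with repo as a symlink pointing at it. A minimal sketch, assuming that layout, of reading the currently synced commit from the symlink (the function name is hypothetical):

```python
import os


def current_synced_commit(root: str, link_name: str = "repo") -> str:
    """Return the commit hash that the `repo` symlink currently points at.

    Assumes the layout shown in the listing: <root>/<hash>/ holds the
    checkout and <root>/repo is a symlink to <hash>.
    """
    link = os.path.join(root, link_name)
    # readlink returns the link target without following it any further.
    return os.path.basename(os.readlink(link))
```

For example, calling current_synced_commit("/git") inside the git-sync container would return the hash directory name from the listing.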

After relating to e.g. prometheus, rules from the synced repo should appear in app data,

juju show-unit prometheus-k8s/0 --format json | jq '."prometheus-k8s/0"."relation-info"'

as well as in prometheus itself

juju ssh prometheus-k8s/0 curl localhost:9090/api/v1/rules
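The JSON returned by the rules endpoint can be lengthy. The sketch below, which assumes the standard Prometheus response shape (status, then data.groups[].rules[]), reduces it to (group, rule) name pairs so that the forwarded rules are easy to spot. The function name and the sample payload are made up for illustration.

```python
import json
from typing import List, Tuple


def list_rules(payload: str) -> List[Tuple[str, str]]:
    """Extract (group name, rule name) pairs from an /api/v1/rules response."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(f"unexpected status: {doc.get('status')!r}")
    pairs = []
    for group in doc["data"]["groups"]:
        for rule in group.get("rules", []):
            pairs.append((group["name"], rule["name"]))
    return pairs


if __name__ == "__main__":
    # Hypothetical payload, trimmed to the fields the function reads.
    sample = json.dumps(
        {
            "status": "success",
            "data": {
                "groups": [
                    {"name": "example_group", "rules": [{"name": "ExampleAlert"}]}
                ]
            },
        }
    )
    print(list_rules(sample))
```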

Scale Out Usage

N/A

Relations

Currently, supported relations are:

  • prometheus-config, for interfacing with prometheus.
  • loki-config, for interfacing with loki.
  • grafana-dashboards, for interfacing with grafana.

About Juju Topology

This charm forwards alert rules, recording rules and dashboards but does not add its own metadata to the topology.

The Juju topology describes a node in the model, not the data flow. That's why this charm does not inject Juju topology.

While a cos-configuration charm provides alerting rules, recording rules, and dashboards for charms, and topology labels could be used to give a sense of origin (as in data flow), the cos-configuration deployment itself is neither enriched with nor aware of suitable values for metadata to identify workloads.

In addition, the ability of cos-configuration to provide rules and dashboards which are not intrinsically tied to topology metadata offers administrators the flexibility to use COS to monitor non-charmed applications, use rules or dashboards directly from other sources, implement aggregate dashboards or rules which may collate metrics from more than one application, and more.

Addition of Juju topology metadata to the data structures provided by cos-configuration would be semantically inconsistent with charms, where topology labels indicate a node (application or unit) in Juju, and cos-configuration itself would not be consistent with the design model of Juju topology if it were to suggest label selectors for applications whose status cannot be known by cos-configuration itself.

Finally, addition of Juju topology labels may unpredictably interfere with the group_by directive if an incorrect selector were injected.

On the other hand, the Juju administrator may add annotations (or labels) to alert rules, recording rules and dashboards using a different nomenclature that describes how they got into the model (like: origin, giturl, branch, synctime).
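One way an administrator might apply such origin labels before committing rule files to the repository is sketched below. It assumes rule files follow the usual groups/rules/labels layout once parsed into a dict. The helper name and the label values are hypothetical, and the charm itself does not do this.

```python
import copy
from typing import Dict


def add_origin_labels(rule_file: dict, origin: Dict[str, str]) -> dict:
    """Return a copy of a parsed rule file with origin labels on every rule.

    Labels already present on a rule are left untouched.
    """
    out = copy.deepcopy(rule_file)
    for group in out.get("groups", []):
        for rule in group.get("rules", []):
            labels = rule.setdefault("labels", {})
            for key, value in origin.items():
                labels.setdefault(key, value)
    return out


if __name__ == "__main__":
    rules = {"groups": [{"name": "g", "rules": [{"alert": "A", "expr": "up == 0"}]}]}
    print(add_origin_labels(rules, {"origin": "cos-config", "branch": "main"}))
```

Because these labels describe where a rule came from rather than a node in the model, they do not clash with the Juju topology semantics discussed above.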

OCI Images

This charm can be used with the following image:

  • k8s.gcr.io/git-sync/git-sync:v3.5.0

Resource revisions

Workload images are archived on charmhub by revision number.

Resource Revision Image
git-sync-image r1 k8s.gcr.io/git-sync/git-sync:v3.4.0
git-sync-image r2 k8s.gcr.io/git-sync/git-sync:v3.5.0

cos-configuration-k8s-operator's People

Contributors

abuelodelanada, dstathis, lucabello, mmkay, observability-noctua-bot, rbarry82, sed-i, simskij


cos-configuration-k8s-operator's Issues

chore: update libraries to new major versions

This issue was created automatically because a new major version was detected for a charm library.

You should update the following libraries:

  • update charms.loki_k8s.v0.loki_push_api to v1

Config change after deployment does not update Grafana dashboard

Bug Description

Changing the configuration (e.g. grafana_dashboards_path) does not update the Grafana dashboard. This seems to be because the dashboards are only updated on a leader-elected or upgrade-charm event:

self.framework.observe(self._charm.on.leader_elected, self._update_all_dashboards_from_dir)
self.framework.observe(self._charm.on.upgrade_charm, self._update_all_dashboards_from_dir)

As a user, I would expect the dashboard data to be updated when the configuration is changed.

To Reproduce

  1. juju deploy --trust grafana-k8s grafana
  2. juju deploy cos-configuration-k8s --config git_repo=https://github.com/canonical/github-runner-operator --config git_branch=main
  3. juju config cos-configuration-k8s grafana_dashboards_path=src/grafana_dashboard_metrics
  4. juju relate cos-configuration-k8s:grafana-dashboards grafana:grafana-dashboard
  5. juju show-unit grafana/0 --format json | jq '."grafana/0"."relation-info"'

The last command shows an empty dashboard in the integration data. It would not be empty if the configuration value grafana_dashboards_path=src/grafana_dashboard_metrics had been passed at deployment time.

Environment

Juju (tested with 3.1 and 2.9) within multipass using microk8s.

Relevant log output

unit-cos-configuration-k8s-0: 14:14:25 INFO unit.cos-configuration-k8s/0.juju-log git-sync: I1012 12:14:24.881140     135 main.go:473] "level"=0 "msg"="starting up" "pid"=135 "args"=["/git-sync","--repo","https://github.com/canonical/github-runner-operator","--branch","main","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--one-time"]
unit-cos-configuration-k8s-0: 14:14:25 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-cos-configuration-k8s-0: 14:14:40 INFO juju.worker.uniter.operation ran "grafana-dashboards-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:14:42 INFO unit.grafana/0.juju-log grafana-dashboard:39: Restarted grafana-k8s
unit-grafana-0: 14:14:43 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:14:43 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-joined" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:14:44 INFO juju.worker.uniter.operation ran "grafana-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/observability/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/observability/pods/grafana-0 "HTTP/1.1 200 OK"
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/observability/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/observability/pods/grafana-0 "HTTP/1.1 200 OK"
unit-grafana-0: 14:14:45 INFO unit.grafana/0.juju-log grafana-dashboard:39: Initializing dashboard provisioning path
unit-grafana-0: 14:14:46 INFO unit.grafana/0.juju-log grafana-dashboard:39: Restarted grafana-k8s
unit-grafana-0: 14:14:46 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:14:47 INFO juju.worker.uniter.operation ran "grafana-relation-changed" hook (via hook dispatching script: dispatch)

Additional context

No response

Juju topology labels shouldn't be added to loki rule files

Bug Description

The charm currently uses LokiPushApiConsumer to forward rule files to Loki, but that class automatically adds topology labels to the alerts.

self.loki_rules_provider = LokiPushApiConsumer(
    self,
    self.loki_relation_name,
    alert_rules_path=os.path.join(self._repo_path, self.config["loki_alert_rules_path"]),
    recursive=True,
)

To Reproduce

juju deploy --channel=edge cos-configuration-k8s cos-config --config git_repo=https://github.com/canonical/cos-configuration-k8s-operator.git --config git_branch=main --config loki_alert_rules_path=tests/samples/loki_alert_rules
juju deploy --channel=edge loki-k8s loki
juju relate loki cos-config

juju ssh --container loki loki/0 grep -r "juju_" /loki/rules/fake

Environment

Model    Controller           Cloud/Region        Version  SLA          Timestamp
welcome  charm-dev-batteries  microk8s/localhost  3.0.2    unsupported  12:55:10-05:00

App         Version  Status  Scale  Charm                  Channel  Rev  Address         Exposed  Message
cos-config  3.5.0    active      1  cos-configuration-k8s  edge      18  10.152.183.195  no       
loki        2.4.1    active      1  loki-k8s               edge      65  10.152.183.124  no       

Relevant log output

N/A

Additional context

No response

ops.model.ModelError: ERROR invalid value "content-from-git/1" for option -s: getting filesystem attachment info: filesystem attachment "1" on "unit cos-configuration/0" not provisioned

Bug Description

A race condition during deployment can put the charm into an error state.

unit-cos-configuration-ceph-0: 13:06:39 INFO juju.worker.uniter found queued "install" hook
unit-cos-configuration-ceph-0: 13:06:40 DEBUG unit.cos-configuration-ceph/0.juju-log ops 2.4.1 up and running.
unit-cos-configuration-ceph-0: 13:06:40 INFO unit.cos-configuration-ceph/0.juju-log Running legacy hooks/install.
unit-cos-configuration-ceph-0: 13:06:40 DEBUG unit.cos-configuration-ceph/0.juju-log ops 2.4.1 up and running.
unit-cos-configuration-ceph-0: 13:06:40 DEBUG unit.cos-configuration-ceph/0.juju-log Charm called itself via hooks/install.
unit-cos-configuration-ceph-0: 13:06:40 DEBUG unit.cos-configuration-ceph/0.juju-log Legacy hooks/install exited with status 0.
unit-cos-configuration-ceph-0: 13:06:41 ERROR unit.cos-configuration-ceph/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-cos-configuration-ceph-0/charm/venv/ops/model.py", line 2693, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-cos-configuration-ceph-0/storage-get', '-s', 'content-from-git/1', 'location', '--format=json')' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 473, in <module>
    main(COSConfigCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-cos-configuration-ceph-0/charm/venv/ops/main.py", line 429, in main
    charm = charm_class(framework)
  File "./src/charm.py", line 84, in __init__
    self._git_sync_mount_point = self.model.storages["content-from-git"][0].location
  File "/var/lib/juju/agents/unit-cos-configuration-ceph-0/charm/venv/ops/model.py", line 1792, in location
    raw = self._backend.storage_get(self.full_id, "location")
  File "/var/lib/juju/agents/unit-cos-configuration-ceph-0/charm/venv/ops/model.py", line 2919, in storage_get
    out = self._run('storage-get', '-s', storage_name_id, attribute,
  File "/var/lib/juju/agents/unit-cos-configuration-ceph-0/charm/venv/ops/model.py", line 2695, in _run
    raise ModelError(e.stderr)
ops.model.ModelError: ERROR invalid value "content-from-git/1" for option -s: getting filesystem attachment info: filesystem attachment "1" on "unit cos-configuration-ceph/0" not provisioned

unit-cos-configuration-ceph-0: 13:06:41 ERROR juju.worker.uniter.operation hook "install" (via hook dispatching script: dispatch) failed: exit status 1
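The traceback shows the storage location being read in the charm's __init__, before Juju has provisioned the attachment. One possible way to tolerate that window is to treat "not provisioned yet" as "no location yet" and let a later hook retry. The sketch below uses a local ModelError stand-in so that it runs without the ops library; in a charm the exception would be ops.model.ModelError. The helper name is hypothetical and this is not the project's actual fix.

```python
from typing import Optional


class ModelError(Exception):
    """Stand-in for ops.model.ModelError so this sketch is self-contained."""


def storage_location(storages, name: str) -> Optional[str]:
    """Return the mount point of the first attachment, or None if not ready.

    `storages` is expected to behave like `self.model.storages`: indexing by
    storage name yields a list of objects with a `location` attribute.
    """
    try:
        attachments = storages[name]
        if not attachments:
            return None
        return str(attachments[0].location)
    except ModelError:
        # Attachment exists in the model but is not provisioned yet.
        return None
```

A caller would then skip the sync (and set a waiting status) while the function returns None, rather than failing the install hook.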

To Reproduce

  1. juju deploy cos-lite --trust --channel latest/edge
  2. juju deploy cos-configuration-k8s cos-configuration-ceph --config ...

Environment

prometheus-scrape-config-k8s latest/edge 47

Relevant log output

$ juju show-status-log cos-configuration-ceph/0 --days 1
Time                   Type       Status       Message
01 Mar 2024 13:06:00Z  juju-unit  allocating   
01 Mar 2024 13:06:00Z  workload   waiting      installing agent
01 Mar 2024 13:06:34Z  workload   waiting      agent initialising
01 Mar 2024 13:06:39Z  juju-unit  executing    running install hook
01 Mar 2024 13:06:41Z  juju-unit  error        hook failed: "install"
01 Mar 2024 13:06:46Z  workload   maintenance  installing charm software
01 Mar 2024 13:06:46Z  juju-unit  executing    running install hook
01 Mar 2024 13:06:50Z  juju-unit  executing    running prometheus-config-relation-created hook
01 Mar 2024 13:06:51Z  juju-unit  executing    running replicas-relation-created hook
01 Mar 2024 13:06:51Z  juju-unit  executing    running grafana-dashboards-relation-created hook
01 Mar 2024 13:06:52Z  juju-unit  executing    running leader-elected hook
01 Mar 2024 13:07:27Z  juju-unit  executing    running git-sync-pebble-ready hook
01 Mar 2024 13:07:29Z  juju-unit  executing    running content-from-git-storage-attached hook
01 Mar 2024 13:07:30Z  juju-unit  executing    running config-changed hook
01 Mar 2024 13:07:32Z  juju-unit  executing    running start hook
01 Mar 2024 13:07:34Z  juju-unit  executing    running grafana-dashboards-relation-joined hook for grafana/0
01 Mar 2024 13:07:36Z  juju-unit  executing    running grafana-dashboards-relation-changed hook for grafana/0
01 Mar 2024 13:07:37Z  juju-unit  executing    running replicas-relation-changed hook
01 Mar 2024 13:07:38Z  juju-unit  idle         
01 Mar 2024 13:07:49Z  juju-unit  executing    running prometheus-config-relation-joined hook for prometheus/0
01 Mar 2024 13:07:51Z  juju-unit  executing    running prometheus-config-relation-changed hook for prometheus/0
01 Mar 2024 13:08:05Z  juju-unit  idle         
01 Mar 2024 14:11:38Z  workload   active

Additional context

No response

`context deadline exceeded` with a large git repository

Bug Description

When I tried to use https://github.com/ceph/ceph.git, the repository contents never became available in the filesystem.

Looking at the details, the sync ran into:

too many failures, aborting" "error"="Run(git submodule update --init --recursive --depth 1): context deadline exceeded:

Two suggestions here:

  1. support specifying git submodule behavior in the charm configuration to minimize the data to download
/git-sync --help
...
-submodules string
   git submodule behavior: one of 'recursive', 'shallow', or 'off' (default "recursive")
  2. support customizing the deadline
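The two suggestions could translate into extra git-sync arguments roughly as sketched below. The --submodules values come from the help output quoted above. The --timeout flag name (in seconds) is an assumption that should be checked against /git-sync --help for the image in use. The function and its parameter names are hypothetical, not existing charm config options.

```python
from typing import List, Optional

# Values accepted by git-sync for submodule handling, per its help output.
VALID_SUBMODULES = ("recursive", "shallow", "off")


def extra_git_sync_args(
    submodules: str = "recursive",
    timeout_seconds: Optional[int] = None,
) -> List[str]:
    """Build optional git-sync arguments for submodules and sync deadline."""
    if submodules not in VALID_SUBMODULES:
        raise ValueError(f"submodules must be one of {VALID_SUBMODULES}")
    args = ["--submodules", submodules]
    if timeout_seconds is not None:
        if timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        # ASSUMPTION: the flag is spelled --timeout in this git-sync version.
        args += ["--timeout", str(timeout_seconds)]
    return args


if __name__ == "__main__":
    # Skip submodules entirely and allow ten minutes for the sync.
    print(extra_git_sync_args(submodules="off", timeout_seconds=600))
```

For a repository like ceph, where only a dashboards directory is needed, turning submodules off would avoid most of the download that hit the deadline.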

To Reproduce

juju deploy -m cos cos-configuration-k8s cos-configuration \
    --config git_repo=https://github.com/ceph/ceph.git \
    --config git_branch=main \
    --config grafana_dashboards_path=monitoring/ceph-mixin/dashboards_out/

Environment

# /git-sync --version
v3.5.0

# git version
git version 2.30.2

cos-configuration 3.5.0 cos-configuration-k8s  latest/edge   39

Relevant log output

# /git-sync --repo https://github.com/ceph/ceph.git --branch main --rev HEAD --depth 1 --root /git --dest repo --one-time -v 1
I1030 10:11:21.707807    2104 main.go:473] "level"=0 "msg"="starting up" "pid"=2104 "args"=["/git-sync","--repo","https://github.com/ceph/ceph.git","--branch","main","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--one-time","-v","1"]
I1030 10:11:21.708051    2104 main.go:923] "level"=0 "msg"="cloning repo" "origin"="https://github.com/ceph/ceph.git" "path"="/git"
I1030 10:11:25.723881    2104 main.go:737] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="bf82a7bd34d6e65f817265743e0375d89ee403ea"
I1030 10:12:07.458563    2104 main.go:726] "level"=1 "msg"="removing worktree" "path"="/git/bf82a7bd34d6e65f817265743e0375d89ee403ea"
I1030 10:12:07.469542    2104 main.go:772] "level"=0 "msg"="adding worktree" "path"="/git/bf82a7bd34d6e65f817265743e0375d89ee403ea" "branch"="origin/main"
I1030 10:12:08.620354    2104 main.go:833] "level"=0 "msg"="reset worktree to hash" "path"="/git/bf82a7bd34d6e65f817265743e0375d89ee403ea" "hash"="bf82a7bd34d6e65f817265743e0375d89ee403ea"
I1030 10:12:08.620378    2104 main.go:838] "level"=0 "msg"="updating submodules"
E1030 10:14:47.073790    2104 main.go:525] "msg"="too many failures, aborting" "error"="Run(git submodule update --init --recursive --depth 1): context deadline exceeded: { stdout: "Submodule path 'ceph-erasure-code-corpus': checked out '2d7d78b9cc52e8a9529d8cc2d2954c7d375d5dd7'\nSubmodule path 'ceph-object-corpus': checked out '038c72b5acec667e1aca4c79a8cfcae705d766fe'\nSubmodule path 'src/arrow': checked out '347a88ff9d20e2a4061eec0b455b8ea1aa8335dc'\nSubmodule path 'src/arrow/cpp/submodules/parquet-testing': checked out '600d437de0e8b0e9927c87e76f844a1b385b02e8'\nSubmodule path 'src/arrow/testing': checked out 'a60b715263d9bbf7e744527fb0c084b693f58043'\nSubmodule path 'src/blkin': checked out 'f24ceec055ea236a093988237a9821d145f5f7c8'\nSubmodule path 'src/c-ares': checked out 'fd6124c74da0801f23f9d324559d8b66fb83f533'\nSubmodule path 'src/cpp_redis': checked out 'c659475ea43bc77850018aa1433d55cad902ea85'\nSubmodule path 'src/crypto/isa-l/isa-l_crypto': checked out 'a6dc869666fca3eef9a0305b290e4e0fc8bac645'\nSubmodule path 'src/dmclock': checked out 'e4ccdcfa828c84b8ea775a928118f2b8012d0f42'\nSubmodule path 'src/erasure-code/jerasure/gf-complete': checked out '7e61b44404f0ed410c83cfd3947a52e88ae044e1'\nSubmodule path 'src/erasure-code/jerasure/jerasure': checked out '96c76b89d661c163f65a014b8042c9354ccf7f31'\nSubmodule path 'src/fmt': checked out
...

Additional context

No response

Syncing from private git repositories is broken

Bug Description

I faced multiple issues trying to enable sync from a private repository.

  1. Specifying the private key during the charm deployment doesn't work, the private key file is not created. After resetting and setting again to the same value, the file is created.
  2. git-sync command fails because the known_hosts file path doesn't exist.

Now the following issues were encountered after fixing the above with juju ssh workarounds (all commands listed below).

  1. Remote server's SSH key is not auto-accepted/ignored by git-sync called by the charm and the sync action fails.
  2. Private SSH key has incorrect permissions 0644 instead of 0600 (or even more restrictive).

Additionally, there's no validation of the private key. Setting the option value using =$(cat id_ecdsa) results in a file without a newline at the end, whereas the =@id_ecdsa syntax works. Because of that, the sync action may also fail, reporting that the key file has an invalid format.
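Two of the problems above (the 0644 permissions and the missing trailing newline) can be avoided when the key file is written. The sketch below shows the idea against a local filesystem path. In the charm the key lives in the workload container, so the real write would go through whatever API the charm uses to push files, and the function name here is hypothetical.

```python
import os


def write_private_key(path: str, key: str) -> None:
    """Write an SSH private key with 0600 permissions and a trailing newline."""
    # OpenSSH rejects keys that do not end with a newline.
    if not key.endswith("\n"):
        key += "\n"
    # Create the file owner-read/write only from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as key_file:
        # Also tighten permissions if the file already existed as 0644.
        os.fchmod(key_file.fileno(), 0o600)
        key_file.write(key)
```

Setting the mode at creation time, rather than with a later chmod, avoids a window in which the key is readable by others.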

To Reproduce

  1. Deploy:
juju deploy cos-configuration-k8s --config git_repo=git+ssh://[email protected]/~redacted/+git/redacted --config git_branch=main --config git_depth=1 --config git_ssh_key="$(cat redacted.key)" cos-configuration
  1. Try to sync:
juju run-action cos-configuration/0 sync-now --wait
...
  log:
  - 2023-07-07 07:24:46 +0000 UTC Calling git-sync with --one-time...
  - '2023-07-07 07:24:46 +0000 UTC ERROR: can''t configure SSH: can''t access SSH
    key: stat /run/cos-config-ssh-key.priv: no such file or directory'
  message: 'Sync error: Exited with code 1.'
  1. Reset the config option and set it to the same value again.
  2. Attempt to sync again, the key is there now but the action fails again:
juju run-action cos-configuration/0 sync-now --wait
...
  log:
  - 2023-07-07 07:29:59 +0000 UTC Calling git-sync with --one-time...
  - '2023-07-07 07:29:59 +0000 UTC ERROR: can''t configure SSH: can''t access SSH
    known_hosts: stat /etc/git-secret/known_hosts: no such file or directory'
  message: 'Sync error: Exited with code 1.'
  1. Created empty known_hosts file manually in an attempt to work this around:
juju ssh --container git-sync cos-configuration/0 mkdir /etc/git-secret/
juju ssh --container git-sync cos-configuration/0 ls -l /etc/git-secret/known_hosts
  1. Synced again, this time it failed on Host key verification failed.
juju run-action cos-configuration/0 sync-now --wait
...
  log:
  - 2023-07-07 07:40:05 +0000 UTC Calling git-sync with --one-time...
  - 2023-07-07 07:40:05 +0000 UTC I0707 07:40:05.187060     146 main.go:473] "level"=0
    "msg"="starting up" "pid"=146 "args"=["/git-sync","--repo","git+ssh://[email protected]/~redacted/+git/redacted","--branch","main","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--ssh","--ssh-key-file","/run/cos-config-ssh-key.priv","--one-time"]
  - 2023-07-07 07:40:05 +0000 UTC I0707 07:40:05.187213     146 main.go:923] "level"=0
    "msg"="cloning repo" "origin"="git+ssh://[email protected]/~redacted/+git/redacted"
    "path"="/git"
  - '2023-07-07 07:40:05 +0000 UTC E0707 07:40:05.303971     146 main.go:525] "msg"="too
    many failures, aborting" "error"="Run(git clone -v --no-checkout -b main --depth
    1 git+ssh://[email protected]/~redacted/+git/redacted /git):
    exit status 128: { stdout: "", stderr: "Cloning into ''/git''...\nHost key verification
    failed.\r\nfatal: Could not read from remote repository.\n\nPlease make sure you
    have the correct access rights\nand the repository exists.\n" }" "failCount"=0'
  message: 'Sync error: Exited with code 1.'
  1. I assumed git-sync prompts to accept the remote key so I ran the same command via juju ssh:
juju ssh --container git-sync cos-configuration/0 "/git-sync --repo git+ssh://[email protected]/~redacted/+git/redacted --branch main --rev HEAD --depth 1 --root /git --dest repo --ssh --ssh-key-file /run/cos-config-ssh-key.priv --one-time"

It did and I typed yes:

The authenticity of host 'git.launchpad.net (185.125.188.44)' can't be established.
RSA key fingerprint is SHA256:UNOzlP66WpDuEo34Wgs8mewypV0UzqHLsIFoqwe8dYo.
Are you sure you want to continue connecting (yes/no)? yes

It failed again after that:

E0707 07:44:58.716917     181 main.go:525] "msg"="too many failures, aborting" "error"="Run(git clone -v --no-checkout -b main --depth 1 git+ssh://[email protected]/~redacted/+git/redacted /git): exit status 128: { stdout: "", stderr: "Cloning into '/git'...\nWarning: Permanently added 'git.launchpad.net,185.125.188.44' (RSA) to the list of known hosts.\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nPermissions 0644 for '/run/cos-config-ssh-key.priv' are too open.\r\nIt is required that your private key files are NOT accessible by others.\r\nThis private key will be ignored.\r\nLoad key \"/run/cos-config-ssh-key.priv\": bad permissions\r\[email protected]: Permission denied (publickey).\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n" }" "failCount"=0

The private key file created by the charm has incorrect permissions 0644 instead of 0600.

  1. I changed permissions manually using juju ssh to work this around:
juju ssh --container git-sync cos-configuration/0 chmod 0600 /run/cos-config-ssh-key.priv
  1. Tried the sync again and it finally worked:
juju run-action cos-configuration/0 sync-now --wait
unit-cos-configuration-0:
  UnitId: cos-configuration/0
  id: "46"
  log:
  - 2023-07-07 07:49:13 +0000 UTC Calling git-sync with --one-time...
  - '2023-07-07 07:49:15 +0000 UTC Warning: I0707 07:49:13.172126     270 main.go:473]
    "level"=0 "msg"="starting up" "pid"=270 "args"=["/git-sync","--repo","git+ssh://[email protected]/~redacted/+git/redacted","--branch","main","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--ssh","--ssh-key-file","/run/cos-config-ssh-key.priv","--one-time"]'
  - '2023-07-07 07:49:16 +0000 UTC Warning: I0707 07:49:13.172225     270 main.go:923]
    "level"=0 "msg"="cloning repo" "origin"="git+ssh://[email protected]/~redacted/+git/redacted"
    "path"="/git"'
  - '2023-07-07 07:49:16 +0000 UTC Warning: I0707 07:49:13.886433     270 main.go:737]
    "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="3fecae3005ecf7eb23fd7e748b9999eb6ead91cb"'
  - '2023-07-07 07:49:16 +0000 UTC Warning: I0707 07:49:14.839919     270 main.go:772]
    "level"=0 "msg"="adding worktree" "path"="/git/3fecae3005ecf7eb23fd7e748b9999eb6ead91cb"
    "branch"="origin/main"'
  - '2023-07-07 07:49:16 +0000 UTC Warning: I0707 07:49:14.845755     270 main.go:833]
    "level"=0 "msg"="reset worktree to hash" "path"="/git/3fecae3005ecf7eb23fd7e748b9999eb6ead91cb"
    "hash"="3fecae3005ecf7eb23fd7e748b9999eb6ead91cb"'
  - '2023-07-07 07:49:16 +0000 UTC Warning: I0707 07:49:14.845783     270 main.go:838]
    "level"=0 "msg"="updating submodules"'
  results:
    git-sync-stdout: ""
  status: completed
  timing:
    completed: 2023-07-07 07:49:19 +0000 UTC
    enqueued: 2023-07-07 07:49:12 +0000 UTC
    started: 2023-07-07 07:49:12 +0000 UTC

Environment

COS Lite on top of microk8s, charm from latest/edge.

Relevant log output

Included in steps to reproduce.

Additional context

No response

The `_common_exit_hook` challenge!

Challenge: get rid of the _common_exit_hook pattern, with minimal changes to tests.

Top score: for whoever introduces a repeatable pattern.
Negative score: for transforming into explosive spaghetti.

Please link your PR to this issue.

Optional constraints

Feel free to use all/some/none:

  • Only top-level event hooks allowed to change unit status
  • Do not use StoredState
  • Construct provider/consumer instances only if the respective relation is formed
  • Do not rely on update-status to finish the charm's startup sequence

Past attempts

Deferring decorators

    @defer_unless(network_ready, WaitingStatus("Waiting for ip address"))
    def _on_alertmanager_pebble_ready(self, event: ops.charm.PebbleReadyEvent):
        # ...

    @defer_unless(network_ready, WaitingStatus("Waiting for ip address"))
    @skip_unless(config_valid, BlockedStatus("PagerDuty service key missing"))
    def _on_config_changed(self, event: ops.charm.ConfigChangedEvent):
        # ...

The underlying assumptions were:

  1. Deferral/skip is decidable via a boolean callable (e.g. network_ready) at event entry
  2. The boolean callables are always of the form check(charm, event)
  3. The @defer_unless/@skip_unless decorator usage can be limited to event hooks only
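Under those three assumptions, the two decorators could look roughly like the sketch below. It is a reconstruction for illustration, not the code from the past attempt: it assumes the charm exposes self.unit.status, that events provide a defer() method, and that checks have the form check(charm, event).

```python
import functools
from typing import Any, Callable

Check = Callable[[Any, Any], bool]


def defer_unless(check: Check, status: Any):
    """Run the hook only if check(charm, event) holds; otherwise set the
    given status and defer the event so it is retried later."""

    def decorator(hook):
        @functools.wraps(hook)
        def wrapper(self, event):
            if not check(self, event):
                self.unit.status = status
                event.defer()
                return None
            return hook(self, event)

        return wrapper

    return decorator


def skip_unless(check: Check, status: Any):
    """Like defer_unless, but drop the event instead of deferring it."""

    def decorator(hook):
        @functools.wraps(hook)
        def wrapper(self, event):
            if not check(self, event):
                self.unit.status = status
                return None
            return hook(self, event)

        return wrapper

    return decorator
```

When stacked as in the example above, the outermost decorator is evaluated first, so the network check decides on deferral before the config check is even consulted.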

Rely on update-status

In all event hooks, use defer if a precondition is not met. This means that the charm may stall until the next update-status. If the user keeps the default update-status hook interval of 5 min, relying on update-status is not too bad, but if the user creates a controller or model with a default of, say, 60 min, the wait may be too long.

changing some charm configs does not push new alert rules into relation data with prometheus

Bug Description

  • After deploying cos-config and relating it to prometheus, I see new alert rules in prometheus.

  • After changing the following charm configs,

git_repo
git_branch
prometheus_alert_rules_path

the data bag gets emptied out

    "endpoint": "metrics-endpoint",
    "related-endpoint": "prometheus-config",
    "application-data": {
      "alert_rules": "{}"
    },

I checked the cos-configuration pod and I see it successfully pulled in the data from GitHub.

  • Workaround
juju remove-relation cos-configuration prometheus
# wait till hooks are done executing
juju integrate cos-configuration prometheus

To Reproduce

  1. juju deploy cos and cos configuration
  2. set cos config charm configs
  3. relate cos config and cos
  4. change charm configs for
git_repo
git_branch
prometheus_alert_rules_path

Environment

cos    manual-controller  microk8s-cloud/localhost  3.1.7    unsupported  16:05:57Z
...
cos-configuration  3.5.0    active      1  cos-configuration-k8s  latest/stable   42 ...
prometheus         2.47.2   active      1  prometheus-k8s         stable         159 ...
...

Relevant log output

N/A

Additional context

No response

New dropdowns do not work with a cos-config dashboard

Bug Description

Until recently, a dashboard coming via cos-config was working.

Now, with the new dropdown (or the new Grafana 9?), there is a new dropdown in which I must choose an application.

As a result, the dashboard no longer works.


To Reproduce

Deploy the load test and open the sre mock 2 panels - 6 lines and 6 log sources dashboard in grafana.

Environment

COS bundle from edge, on GCP.

Model               Controller  Cloud/Region        Version  SLA          Timestamp
cos-lite-load-test  uk8s        microk8s/localhost  2.9.35   unsupported  09:51:40Z

App            Version  Status  Scale  Charm                         Channel  Rev  Address         Exposed  Message
alertmanager   0.23.0   active      1  alertmanager-k8s              edge      36  10.152.183.254  no       
catalogue               active      1  catalogue-k8s                 edge       4  10.152.183.2    no       
cos-config     3.5.0    active      1  cos-configuration-k8s         edge      11  10.152.183.148  no       
grafana        9.2.1    active      1  grafana-k8s                   edge      52  10.152.183.252  no       
loki           2.4.1    active      1  loki-k8s                      edge      47  10.152.183.111  no       
prometheus     2.33.5   active      1  prometheus-k8s                edge      79  10.152.183.118  no       
scrape-config  n/a      active      1  prometheus-scrape-config-k8s  edge      38  10.152.183.104  no       
scrape-target  n/a      active      1  prometheus-scrape-target-k8s  edge      23  10.152.183.13   no       
traefik                 active      1  traefik-k8s                   edge      93  10.128.0.5      no       

Relevant log output

n/a

Additional context

No response

Add support for private repositories

Enhancement Proposal

When configuring new alert rules for a COS stack, I may need to include private details of the services being monitored, and I would like those rules to be stored in a private repository.

cos-configuration should support this by handling several types of authentication: SSH private key, custom CA, username/password, and access token.

Sync fails behind a proxy

Bug Description

Git sync fails behind a proxy regardless of the protocol used: both SSH and HTTP fail.

To Reproduce

juju run-action cos-config/0 sync-now --wait
unit-cos-config-0:
  UnitId: cos-config/0
  id: "6"
  log:
  - 2023-08-18 12:06:54 +0000 UTC Calling git-sync with --one-time...
  - 2023-08-18 12:10:29 +0000 UTC I0818 12:06:54.309038     306 main.go:473] "level"=0
    "msg"="starting up" "pid"=306 "args"=["/git-sync","--repo","https://<username>:<token>@git.launchpad.net/~<redacted>/+git/cos-config","--branch","<branch>","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--one-time"]
  - 2023-08-18 12:10:29 +0000 UTC I0818 12:06:54.309175     306 main.go:923] "level"=0
    "msg"="cloning repo" "origin"="https://<username>:<token>@git.launchpad.net/~<redacted>/+git/cos-config"
    "path"="/git"
  - 2023-08-18 12:10:29 +0000 UTC I0818 12:06:54.311270     306 main.go:929] "level"=0
    "msg"="git root exists and is not empty (previous crash?), cleaning up" "path"="/git"
  - '2023-08-18 12:10:30 +0000 UTC E0818 12:10:29.295155     306 main.go:525] "msg"="too
    many failures, aborting" "error"="Run(git clone -v --no-checkout -b <branch>
    --depth 1 https://<username>:<token>@git.launchpad.net/~<redacted>/+git/cos-config
    /git): context deadline exceeded: { stdout: "", stderr: "Cloning into ''/git''...\nfatal:
    unable to access ''https://git.launchpad.net/~<redacted>/+git/cos-config/'':
    Failed to connect to git.launchpad.net port 443: Connection timed out\n" }" "failCount"=0'
  message: 'Sync error: Exited with code 1.'
  results: {}
  status: failed
  timing:
    completed: 2023-08-18 12:10:30 +0000 UTC
    enqueued: 2023-08-18 12:03:02 +0000 UTC
    started: 2023-08-18 12:06:54 +0000 UTC

Repository is accessible through the proxy from the cos-config git-sync container:

$ juju ssh --container git-sync cos-config/0

# export https_proxy=http://<proxy_url>:8000
# git clone -v --no-checkout -b <branch> --depth 1 https://<username>:<token>@git.launchpad.net/~<redacted>/+git/cos-config
Cloning into 'cos-config'...
POST git-upload-pack (415 bytes)
POST git-upload-pack (157 bytes)
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), 16.93 KiB | 298.00 KiB/s, done.

Without the proxy env var it throws the same error as the sync-now action:

$ juju ssh --container git-sync cos-config/0

# unset https_proxy
# git clone -v --no-checkout -b <branch> --depth 1 https://<username>:<token>@git.launchpad.net/~<redacted>/+git/cos-config
Cloning into 'cos-config'...
fatal: unable to access 'https://git.launchpad.net/~<redacted>/+git/cos-config/': Failed to connect to git.launchpad.net port 443: Connection timed out
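
The manual test above suggests that git inside the git-sync container only needs the proxy variables in its environment. As a minimal sketch, and not the charm's actual fix, the helper below maps the proxy settings that Juju exposes to charm hooks onto the conventional variable names. It assumes the model-level proxy configuration is available as `JUJU_CHARM_HTTP_PROXY`, `JUJU_CHARM_HTTPS_PROXY` and `JUJU_CHARM_NO_PROXY`; the function name `proxy_env` is hypothetical.

```python
import os
from typing import Dict, Mapping

# Assumed mapping from Juju's hook environment to the conventional names.
_JUJU_TO_STANDARD = {
    "JUJU_CHARM_HTTP_PROXY": "http_proxy",
    "JUJU_CHARM_HTTPS_PROXY": "https_proxy",
    "JUJU_CHARM_NO_PROXY": "no_proxy",
}


def proxy_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return proxy variables derived from Juju's charm proxy settings.

    Hypothetical helper: unset or empty Juju variables are skipped.
    """
    env: Dict[str, str] = {}
    for juju_name, std_name in _JUJU_TO_STANDARD.items():
        value = environ.get(juju_name, "")
        if value:
            # Set both spellings, since tools differ in which one they read.
            env[std_name] = value
            env[std_name.upper()] = value
    return env
```

The resulting dictionary could then be handed to whatever launches git-sync, for example as the environment of the workload service. Note that this only addresses HTTP(S) remotes; it does nothing for SSH.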

Environment

cos-configuration-k8s-operator rev 38 running behind a proxy

Relevant log output

Same as the sync-now output shown under To Reproduce above.

Additional context

Fixing it for the SSH protocol might be tricky.

'[... state-set', '--file', '-']' returned non-zero exit status 1: ERROR max allowed value length (65536) exceeded

Bug Description

I'm not sure what's going on, but the charm raises an error after the following steps.

To Reproduce

  1. deploy
$ juju deploy cos-configuration-k8s cos-configuration
Deployed "cos-configuration" from charm-hub charm "cos-configuration-k8s", revision 45 in channel latest/stable on [email protected]/stable
  2. configure
$ juju config cos-configuration \
    git_repo=https://github.com/nobuto-m/ceph.git \
    git_branch=cos-configuration-testing-main \
    grafana_dashboards_path=monitoring/ceph-mixin/dashboards_out/

Environment

cos-configuration-k8s latest/stable 45

Relevant log output

unit-cos-configuration-0: 11:48:14 INFO unit.cos-configuration/0.juju-log git-sync: I0404 11:48:14.230593      86 main.go:473] "level"=0 "msg"="starting up" "pid"=86 "args"=["/git-sync","--repo","https://github.com/nobuto-m/ceph.git","--branch","cos-configuration-testing-main","--rev","HEAD","--depth","1","--root","/git","--dest","repo","--one-time"]
unit-cos-configuration-0: 11:48:14 INFO unit.cos-configuration/0.juju-log Updating stored hash: git-sync hash changed from None (<class 'NoneType'>) to 0bfc7851eb297277581ed0a64d5c7a9b4824cc5f (<class 'str'>)
unit-cos-configuration-0: 11:48:14 DEBUG unit.cos-configuration/0.juju-log Alert rules path does not exist: /var/lib/juju/storage/content-from-git/0/repo/prometheus_alert_rules
unit-cos-configuration-0: 11:48:14 INFO unit.cos-configuration/0.juju-log Updating relation data with rule files from disk
unit-cos-configuration-0: 11:48:14 DEBUG unit.cos-configuration/0.juju-log storing reinit_without_topology_dropdowns: changed from [None] to [Done]
unit-cos-configuration-0: 11:48:14 DEBUG unit.cos-configuration/0.juju-log storing hash: changed from [None] to [0bfc7851eb297277581ed0a64d5c7a9b4824cc5f]
unit-cos-configuration-0: 11:48:15 DEBUG unit.cos-configuration/0.juju-log Alert rules path does not exist: /var/lib/juju/storage/content-from-git/0/repo/prometheus_alert_rules
unit-cos-configuration-0: 11:48:15 INFO unit.cos-configuration/0.juju-log Updating relation data with rule files from disk
unit-cos-configuration-0: 11:48:15 WARNING unit.cos-configuration/0.leader-elected ERROR max allowed value length (65536) exceeded
unit-cos-configuration-0: 11:48:15 ERROR unit.cos-configuration/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 473, in <module>
    main(COSConfigCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/main.py", line 443, in main
    framework.commit()
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 674, in commit
    self.on.commit.emit()
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 833, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 922, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 1027, in on_commit
    self.framework.save_snapshot(self)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/framework.py", line 710, in save_snapshot
    self._storage.save_snapshot(value.handle.path, data)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/storage.py", line 226, in save_snapshot
    self._backend.set(handle_path, snapshot_data)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/storage.py", line 365, in set
    _run(["state-set", "--file", "-"], input=content, check=True)
  File "/var/lib/juju/agents/unit-cos-configuration-0/charm/venv/ops/storage.py", line 48, in _run
    return subprocess.run([cmd, *args[1:]], encoding='utf-8', **kw)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/var/lib/juju/tools/unit-cos-configuration-0/state-set', '--file', '-']' returned non-zero exit status 1.
unit-cos-configuration-0: 11:48:15 ERROR juju.worker.uniter.operation hook "leader-elected" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

cos-configuration-0.log
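
The traceback shows `state-set` rejecting a value longer than 65536 when the framework saves its stored state. As a rough illustration of the constraint, and not a description of how the charm handles it, the sketch below checks a serialised value against that limit and shows one possible mitigation: compressing the payload before storing it. The helper names are hypothetical, and it is an assumption that the limit applies to the encoded value being stored.

```python
import base64
import json
import lzma

# Limit taken from the error message in the log above.
MAX_STATE_VALUE_LEN = 65536


def fits_in_state(value: str, limit: int = MAX_STATE_VALUE_LEN) -> bool:
    """Return True if the UTF-8 encoded value is within the limit."""
    return len(value.encode("utf-8")) <= limit


def compress_for_state(obj) -> str:
    """Serialise obj to JSON, compress it, and return ASCII-safe text."""
    raw = json.dumps(obj, sort_keys=True).encode("utf-8")
    return base64.b64encode(lzma.compress(raw)).decode("ascii")


def decompress_from_state(blob: str):
    """Reverse compress_for_state."""
    raw = lzma.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))
```

Compression only postpones the problem for large repositories; keeping bulky file contents out of stored state altogether (for instance, storing just a hash) avoids the limit entirely.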

Add support for custom Prometheus scrape jobs

Enhancement Proposal

It would be great if a user could manually add any scrape_jobs configuration to COS Prometheus using the cos-configuration charm.

Historically, we relied on this config option https://charmhub.io/prometheus/configure#scrape-jobs in the LMA Prometheus charm to achieve that.

Use cases:
