Comments (6)
Thanks!
For bonus points we could detect if there would be changes to the config file and only call the reload if needed.
I believe this is a must-have, as long as an Alertmanager reload/restart can be triggered as part of the update-status hook.
If the reload API call works correctly, it should be a no-op when there is no config change.
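One way to get the "only reload if needed" behaviour is to compare the freshly rendered config with what is already on disk, and write (and then reload) only when they differ. The sketch below is a minimal, framework-free illustration; write_if_changed is a hypothetical helper, and it uses the local filesystem. In the charm itself, the config lives in the workload container, so the read and write would presumably go through the container API (for example ops' Container.pull / Container.push) instead.

```python
from pathlib import Path


def write_if_changed(path: Path, new_content: str) -> bool:
    """Write new_content to path only if it differs from the current file.

    Returns True when the file was (re)written, i.e. when a reload is needed.
    Hypothetical helper, not part of the charm.
    """
    try:
        current = path.read_text()
    except FileNotFoundError:
        current = None  # no config yet, so this counts as a change
    if current == new_content:
        return False  # identical config: skip the write and the reload
    path.write_text(new_content)
    return True
```

A caller would then do something like "if write_if_changed(config_path, rendered): reload()", so an update-status hook with an unchanged config never touches the service.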
from alertmanager-k8s-operator.
It seems to be restarting on the update-status hook. 10.152.183.1 is the Kubernetes API service in this environment.
$ kubectl logs -n cos alertmanager-0
...
2023-12-13T10:41:36.684Z [container-agent] 2023-12-13 10:41:36 ERROR juju-log Failed to obtain status: Bad response
2023-12-13T10:41:36.770Z [container-agent] 2023-12-13 10:41:36 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos/statefulsets/alertmanager "HTTP/1.1 200 OK"
2023-12-13T10:41:36.848Z [container-agent] 2023-12-13 10:41:36 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos/pods/alertmanager-0 "HTTP/1.1 200 OK"
2023-12-13T10:41:36.880Z [container-agent] 2023-12-13 10:41:36 INFO juju-log reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
2023-12-13T10:41:36.925Z [container-agent] 2023-12-13 10:41:36 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos/statefulsets/alertmanager "HTTP/1.1 200 OK"
2023-12-13T10:41:36.998Z [container-agent] 2023-12-13 10:41:36 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos/pods/alertmanager-0 "HTTP/1.1 200 OK"
2023-12-13T10:41:38.515Z [container-agent] 2023-12-13 10:41:38 WARNING juju-log config reload via HTTP POST failed: Bad response
2023-12-13T10:41:38.522Z [container-agent] 2023-12-13 10:41:38 INFO juju-log Restarting service alertmanager
2023-12-13T10:41:40.139Z [container-agent] 2023-12-13 10:41:40 WARNING juju-log cannot determine if reload succeeded
2023-12-13T10:41:40.518Z [container-agent] 2023-12-13 10:41:40 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
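The log shows the reload-then-restart fallback: the HTTP POST reload fails with "Bad response", so the service is restarted. Alertmanager exposes a POST /-/reload management endpoint for hot reloads. The sketch below shows that pattern under stated assumptions. The base URL (scheme, host, port 9093, no route prefix) is an assumption, and reload_config / reload_or_restart are hypothetical names, not the charm's actual API. The point is that a restart should only happen when the reload genuinely fails.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable


def reload_config(base_url: str, timeout: float = 5.0) -> bool:
    """POST to Alertmanager's /-/reload endpoint; return True on a 2xx reply."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/-/reload", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # urlopen raises HTTPError for non-2xx codes, so reaching here means success.
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and socket timeouts are OSError
        # subclasses; malformed responses raise HTTPException.
        return False


def reload_or_restart(reload: Callable[[], bool], restart: Callable[[], None]) -> str:
    """Try a hot reload first; fall back to a restart only if it fails."""
    if reload():
        return "reloaded"
    restart()
    return "restarted"
```

Usage might look like reload_or_restart(lambda: reload_config("http://localhost:9093"), restart_service), where restart_service is whatever restarts the workload. If TLS is enabled, the scheme would need to be https.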
It seems that each update-status hook calls _common_exit_hook, which includes this logic:
# Reload or restart the service
try:
    self.alertmanager_workload.reload()
except ConfigUpdateFailure as e:
    self.unit.status = BlockedStatus(str(e))
    return
As a result, it triggers a reload (or a restart, when the reload fails) every 5 minutes by default.
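A framework-free sketch of how the exit hook could gate the reload on an actual config change is shown below. ExitHookSketch is hypothetical and ConfigUpdateFailure is only a stand-in for the charm's exception. Because each Juju hook runs in a fresh process, the "last applied" hash would have to be persisted across hooks in a real charm (for example with ops' StoredState, or by comparing against the file in the container). Here, it lives on the instance for simplicity.

```python
import hashlib
from typing import Callable, Optional


class ConfigUpdateFailure(Exception):
    """Stand-in for the charm's exception of the same name."""


class ExitHookSketch:
    """Models a _common_exit_hook that only reloads when the config changed."""

    def __init__(self, reload: Callable[[], None]) -> None:
        self._reload = reload
        # Hash of the last successfully applied config (must persist in a real charm).
        self.applied_hash: Optional[str] = None
        self.status = "active"

    def common_exit_hook(self, rendered_config: str) -> None:
        new_hash = hashlib.sha256(rendered_config.encode()).hexdigest()
        if new_hash == self.applied_hash:
            return  # unchanged config (e.g. on update-status): no reload, no restart
        try:
            self._reload()
        except ConfigUpdateFailure as e:
            self.status = f"blocked: {e}"
            return  # leave applied_hash untouched so the next hook retries
        self.applied_hash = new_hash
        self.status = "active"
```

For scale: with the default 5-minute update-status interval there are 24 * 60 / 5 = 288 update-status hooks per day, so the ungated logic can reload (or restart) up to 288 times a day with no config change at all.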
As an additional note, the BlockedStatus set in this code should be an error status, since there is no user action that resolves it.
To fix this, we should fix the reload API call. For bonus points, we could detect whether the config file would change and only call the reload if needed.