cloudflare / unsee
Alert dashboard for Prometheus Alertmanager
License: Apache License 2.0
I'm using unsee against AM 0.15.1, and a few things look weird.
Can I verify myself if the new alertmanager is the culprit, or assist in debugging it?
The exact problem I'm experiencing: the alertname doesn't show in the header of an "incident block", it's promoted (or demoted) to the listing inside the incident block.
We have 3 Alertmanagers running in Prod/Preprod/QA and would like to give each one a specific name so we can tell them apart. Please let us know where to set Alertmanager names. Right now when we use @alertmanager, we see the name as "default".
The reason we use Unsee is to see all alerts from the 3 Alertmanagers, but we are unable to tell them apart because everything is displayed as default. Please advise. Thanks
When looking at a silenced alert it would be nice to be able to get a link to the silence in alertmanager.
If nothing else it would be acceptable to be able to get the silence id to search in amtool
Hi,
Sometimes I get this error in the unsee UI
Get http://prometheus-alertmanager-xxxxx/api/v1/silences: dial tcp: lookup prometheus-alertmanager-xxxxx on [::1]:53: read udp [::1]:55668->[::1]:53: read: connection refused
I have to delete the UNSEE pods to make it disappear.
Is there a specific flag I'm supposed to use to get it working with nginx?
I'm trying to set up a container with alertmanager and unsee.
When I hit /unsee or /unsee I get a 404.
I run unsee with the following ...
/usr/bin/unsee --listen.port 9095 --alertmanager.uri http://127.0.0.1:9094 --listen.prefix /unsee
This is my nginx file ...
server {
    listen 9093;
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_pass http://127.0.0.1:9094;
        proxy_redirect off;
        # Socket.IO Support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
server {
    listen 9093;
    location /unsee {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_pass http://127.0.0.1:9095;
        proxy_redirect off;
        # Socket.IO Support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Debian sometimes likes to append git commit IDs to package versions, and this results in a panic when Unsee tries to parse the version:
INFO[0000] GET http://fra-alertmgr:9093/api/v1/status timeout=40s
INFO[0000] [default] Remote Alertmanager version: 0.15.0~rc.1~git20180507.28967e3+ds
panic: semver: Parse(0.15.0~rc.1~git20180507.28967e3+ds): Invalid character(s) found in patch number "0~rc.1~git20180507.28967e3"
goroutine 11 [running]:
github.com/cloudflare/unsee/vendor/github.com/blang/semver.MustParse(0xc420025440, 0x22, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/github.com/cloudflare/unsee/vendor/github.com/blang/semver/semver.go:319 +0x20c
github.com/cloudflare/unsee/internal/mapper/v04.SilenceMapper.IsSupported(0x0, 0x0, 0xc420025440, 0x22, 0xc4201066a0)
/home/travis/gopath/src/github.com/cloudflare/unsee/internal/mapper/v04/silences.go:67 +0x70
github.com/cloudflare/unsee/internal/mapper.GetSilenceMapper(0xc420025440, 0x22, 0xc42002e500, 0xc420065418, 0xc420065420, 0xc4200907d0)
/home/travis/gopath/src/github.com/cloudflare/unsee/internal/mapper/mapper.go:59 +0xa1
github.com/cloudflare/unsee/internal/alertmanager.(*Alertmanager).pullSilences(0xc420161110, 0xc420025440, 0x22, 0x0, 0x0)
/home/travis/gopath/src/github.com/cloudflare/unsee/internal/alertmanager/models.go:107 +0x79
github.com/cloudflare/unsee/internal/alertmanager.(*Alertmanager).Pull(0xc420161110, 0x23, 0xc420040f90)
/home/travis/gopath/src/github.com/cloudflare/unsee/internal/alertmanager/models.go:300 +0x6d
main.pullFromAlertmanager.func1(0xc420026f20, 0xc420161110)
/home/travis/gopath/src/github.com/cloudflare/unsee/timer.go:25 +0xab
created by main.pullFromAlertmanager
/home/travis/gopath/src/github.com/cloudflare/unsee/timer.go:23 +0x10e
I realize that this is technically not a bug in Unsee, rather in blang/semver, but perhaps Unsee could offer a workaround?
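One possible workaround, as a minimal sketch: strip everything after the leading MAJOR.MINOR.PATCH triple before the string reaches the semver parser. The helper name coreVersion is hypothetical and not part of Unsee; it uses only the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// coreVersionRe matches a leading MAJOR.MINOR.PATCH triple, with an optional "v".
var coreVersionRe = regexp.MustCompile(`^v?(\d+\.\d+\.\d+)`)

// coreVersion is a hypothetical helper: it returns the MAJOR.MINOR.PATCH
// prefix of a version string, or false if the string has no such prefix.
func coreVersion(raw string) (string, bool) {
	m := coreVersionRe.FindStringSubmatch(raw)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	v, ok := coreVersion("0.15.0~rc.1~git20180507.28967e3+ds")
	fmt.Println(v, ok) // prints "0.15.0 true"
}
```

The trade-off is that pre-release information is discarded, so a release candidate is treated like the final release of the same version.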
We should document prometheus/alertmanager#609.
Hey there - thanks for unsee, just started using it and it's great.
One question: is it possible to remove either alert description or summary from the dashboard? It would clear up a good amount of dashboard space. For our purposes we don't need both.
Thanks!
Because "," is used to separate filters, it's impossible to use it inside a filter value.
Would be great to enable taking the username from a HTTP header.
Dear all,
I have many Alertmanagers, some of them with TLS auth.
Is it possible to use a client cert in Unsee to scrape alerts?
Regards
The 'generator URL' which links to a graph in Prometheus UI is currently labelled according to the Alertmanager instance.
This is misleading, since the Prometheus instance that triggered the alert does not necessarily map to a given Alertmanager.
Moreover, it makes it hard to find the link. Anecdotally, most people using Unsee don't seem to be aware that you can click on the name of the Alertmanager instance and see the data that triggered an alert.
When trying to create a silence in Unsee, the POST fails because it has a double-slash in the URL, causing Alertmanager to respond with a 307 Temporary Redirect.
For example, the initial POST goes to http://ewr-alertmgr:9093//api/v1/silences and fails with the 307. If I edit and resend the POST using the Firefox developer console, with the extra slash removed, it succeeds.
Tested with Unsee 0.9.2 and Alertmanager 0.15.0-rc1.
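A double slash like this usually appears when the configured URI ends with a slash and the API path starts with one. That cause is an assumption; the report does not confirm it. A minimal sketch of a join that tolerates both forms, using a hypothetical helper joinURI:

```go
package main

import (
	"fmt"
	"strings"
)

// joinURI is a hypothetical helper that joins a base URI and an API path
// with exactly one slash between them.
func joinURI(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

func main() {
	// Trailing slash on the base and leading slash on the path.
	fmt.Println(joinURI("http://ewr-alertmgr:9093/", "/api/v1/silences"))
	// prints "http://ewr-alertmgr:9093/api/v1/silences"
}
```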
Hi,
When an Alertmanager is not available for some reason, the unsee UI starts showing a warning on the screen. The warning contains the full URL, including the unmasked credentials (username/password) used for Basic Auth.
It would be nice to have the credentials part masked so they don't leak that easily (e.g. the unsee UI is shown on TVs in our offices). The same should apply to any other output (logs etc.).
-- Sven
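A minimal sketch of such masking with Go's standard net/url package. The helper name maskCredentials and the "xxx" placeholder are assumptions for illustration, not Unsee's actual behaviour.

```go
package main

import (
	"fmt"
	"net/url"
)

// maskCredentials is a hypothetical helper: it replaces any userinfo in a
// URI with placeholders so the result can be shown in the UI or in logs.
func maskCredentials(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		// Do not echo an unparsable URI, it may still contain secrets.
		return "<invalid URI>"
	}
	if u.User != nil {
		u.User = url.UserPassword("xxx", "xxx")
	}
	return u.String()
}

func main() {
	fmt.Println(maskCredentials("https://username:secret@alertmanager.example.com/api/v1/status"))
	// prints "https://xxx:xxx@alertmanager.example.com/api/v1/status"
}
```

URIs without credentials pass through unchanged, so the helper can be applied to every URI before it is printed.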
Hi,
I configured requests from browsers to our Alertmanagers to be proxied via unsee. But when looking at the developer console while trying to create a new silence in the UI, I see that the POST request to the proxy endpoint returns a 404. We are using unsee 0.9.1.
request:
POST https://unsee.mydomain.com/proxy/alertmanager/test/api/v1/silences 404
snippet from the config:
...
  servers:
    - name: test
      uri: https://username:[email protected]
      timeout: 30s
      proxy: true
...
listen:
  address: "0.0.0.0"
  port: 8080 # port in Docker container (we are running unsee in k8s)
  prefix: /
...
When triggering the silencing requests, I can see the following debug log:
time="2018-04-12T15:50:02Z" level=debug msg="[test] Proxy request for /api/v1/silences"
Any idea what causes the issue or how i can debug this?
Thx a lot,
Sven
We have an Alertmanager instance inside each of our environments. It would be great if unsee could get data from all of them and display them together.
This is a fairly minor request, but when manually editing the query field- say to add alertname=My_Stupid_Alert, due to the JS manipulations that occur undo/redo logic isn't possible. Specifically since the text is converted to a different object, the browser cannot 'undo' that action.
When one screws up and removes long parts of the query, this is a bit annoying.
I only have a surface understanding of the DOM/events involved in this, but http://mattjmattj.github.io/simple-undo/ is an example that can likely be examined for which hooks we'd need to file. From there something like https://github.com/ArthurClemens/Javascript-Undo-Manager/blob/master/lib/undomanager.js gives an object approach for managing the queue of changes, and providing a 'history' that the undo/redo can act upon.
This line...
unsee/assets/static/silence.js
Line 95 in 4c306c0
d.push("curl " + alertmanagerSilencesAPIUrl(uri));
Produces a curl command like this, which is super handy!
curl https://alertmanager1.internal.net:9093//api/v1/silences
-X POST --data {
"matchers": [
{
"name": "alertname",
"value": "SshProbeFailing",
"isRegex": false
}
],
"startsAt": "2018-08-23T21:34:50.845Z",
"endsAt": "2018-08-23T22:34:50.989Z",
"createdBy": "[email protected]",
"comment": "I'm fixing this..."
}
But, it doesn't include the TLS settings (e.g. from the config file or commandline args). If TLS options are included, it should look more like this...
curl https://alertmanager1.internal.net:9093//api/v1/silences --cert /var/certs/cert.pem --key /var/certs/pkey.pem
-X POST --data {
"matchers": [
{
"name": "alertname",
"value": "SshProbeFailing",
"isRegex": false
}
],
"startsAt": "2018-08-23T21:34:50.845Z",
"endsAt": "2018-08-23T22:34:50.989Z",
"createdBy": "[email protected]",
"comment": "I'm fixing this..."
}
Some other monitoring systems have a concept of severity level for alerts, e.g. a high_memory_usage alert is much less important than a service_is_down alert. A way to differentiate alerts by severity level in unsee (color, shape, size, etc.) would be very helpful when you have a lot of alert noise and want to be sure you won't miss an important alert somewhere. Ordering based on severity could also help if you want to prioritize your work on eliminating them.
A simple implementation would be to allow the user to configure one label to be used as the severity level and to add UI support for ordering based on that label (or even better, ordering based on any label).
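The ordering part of this request could look roughly like the sketch below. The Alert type, the severityRank table and its values (critical, warning, info) are all hypothetical and only illustrate sorting by one configured label.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a simplified, hypothetical stand-in for an alert: just labels.
type Alert struct {
	Labels map[string]string
}

// severityRank maps assumed severity values to a sort order; lower sorts first.
var severityRank = map[string]int{"critical": 0, "warning": 1, "info": 2}

// sortBySeverity orders alerts by the value of the configured label.
// Alerts with a missing or unknown value sort last; ties keep their order.
func sortBySeverity(alerts []Alert, label string) {
	rank := func(a Alert) int {
		if r, ok := severityRank[a.Labels[label]]; ok {
			return r
		}
		return len(severityRank)
	}
	sort.SliceStable(alerts, func(i, j int) bool {
		return rank(alerts[i]) < rank(alerts[j])
	})
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "high_memory_usage", "severity": "warning"}},
		{Labels: map[string]string{"alertname": "service_is_down", "severity": "critical"}},
	}
	sortBySeverity(alerts, "severity")
	for _, a := range alerts {
		fmt.Println(a.Labels["alertname"]) // service_is_down first
	}
}
```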
It would be nice to be able to serve unsee from a given web prefix - so that it's easy peasy to put behind a proxy-pass'ing web server.
Docker Images from 0.7.1 to 0.8.0
I get:
prometh_unsee.1.pow67kacmekn@dallpdsm93050u | time="2018-01-22T23:03:34Z" level=error msg="[http] //alertmanager:9093/api/v1/status request failed: Unsupported URI scheme '' in '//alertmanager:9093/api/v1/status'"
In my compose file I've tried:
environment:
ALERTMANAGER_URIS: "http://alertmanager:9093"
environment:
ALERTMANAGER_URIS: 'http://alertmanager:9093'
environment:
ALERTMANAGER_URIS: alertmanager:9093"
After silencing an alert via the unsee dashboard I don't see the comment immediately. As a matter of fact I only see the comment after silence expiration. What am I missing? I'm using v0.9-14.
Using docker-compose with
image: cloudflare/unsee:latest
environment:
  WEB_PREFIX: /prefix/
didn't work.
When looking at the logs with
docker logs unsee
I see that the prefix is
msg=" prefix: /"
We should determine and document how to configure Unsee when running Alertmanager in a highly-available (HA) setup.
Version 0.5.x of Alertmanager introduces HA capability, which roughly works as follows:
That model works well for sending notifications (since you should receive a notification if it reached at least one of the peers), but less well for API queries since each Alertmanager instance may have a differing view of what alerts are currently firing.
Seems to me that the options are:
1. Rely on one instance and accept that some alerts may be missing - if they're severe enough, we should be paged for them anyway.
2. Try to poll all Alertmanager instances and merge the results.
My intuition says that option 1 is by far the most preferable, for simplicity's sake. If we agree, then we should document that approach.
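For comparison, the "poll all instances and merge" option could be sketched as below. It assumes an alert is identified purely by its label set; the Alert type and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Alert is a simplified, hypothetical stand-in: an alert identified by its labels.
type Alert struct {
	Labels map[string]string
}

// labelKey builds a deterministic key from the label set.
func labelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%q,", name, labels[name])
	}
	return b.String()
}

// mergeAlerts returns the union of alerts from all instances, keeping the
// first copy of each distinct label set.
func mergeAlerts(perInstance ...[]Alert) []Alert {
	seen := map[string]bool{}
	var merged []Alert
	for _, alerts := range perInstance {
		for _, a := range alerts {
			k := labelKey(a.Labels)
			if !seen[k] {
				seen[k] = true
				merged = append(merged, a)
			}
		}
	}
	return merged
}

func main() {
	am1 := []Alert{{Labels: map[string]string{"alertname": "DiskFull", "instance": "a"}}}
	am2 := []Alert{
		{Labels: map[string]string{"alertname": "DiskFull", "instance": "a"}},
		{Labels: map[string]string{"alertname": "DiskFull", "instance": "b"}},
	}
	fmt.Println(len(mergeAlerts(am1, am2))) // prints 2
}
```

Even in this reduced form the merge needs a notion of alert identity and a rule for conflicting copies, which illustrates why relying on a single instance is the simpler choice.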
Would be great to mark some annotations HTML safe.
Cloned master. Trying "make run". Got:
go-bindata-assetfs -prefix assets -nometadata assets/templates/... assets/static/...
make: go-bindata-assetfs: Command not found
Makefile:41: recipe for target 'bindata_assetfs.go' failed
make: *** [bindata_assetfs.go] Error 127
INFO[0360] GET http://monitoring-1:9093/api/v1/silences?limit=4294967295
ERRO[0360] json: cannot unmarshal array into Go struct field SilenceAPIResponse.data of type alertmanager.silencesData
I often see people taking screenshots of alerts in Unsee and adding them to JIRA tickets to show what alert fired when they were investigating an issue. It's difficult to copy information from a screenshot, so it seems it would be useful to have a way to copy the alert details as text.
One solution is to send alerts directly to JIRA (we do that already), but for alerts not sent to JIRA, I think this would be useful. The same information can be copied from amtool but I think people are unlikely to jump from their browser to their shell to do that.
Maybe we should just provide a link to the alert page in Alertmanager where the information can be copy-pasted.
It's officially 2018, time for cleaner UI code.
It should also use SSE or websockets instead of AJAX polling, which will require backend code changes.
As of #206, the silence ID is shown in the UI when a silence is added.
Suggest that the silence ID links to the Alertmanager silence page, e.g.:
https://alertmanager.example.com/#/silences/1234-1234-1234-1234
It's time to release unsee 0.5, but there is one thing that might need changing first. The @status filter was added to the master branch to support the new status key from Alertmanager >=0.6.1.
status ended up being nested in Alertmanager (it was added to solve AM issue 609 and that was a long PR with lots of changes), so the current unsee implementation ended up slightly off from how Alertmanager names this: it should actually be @state rather than @status.
"status": {
    "inhibitedBy": [],
    "silencedBy": [],
    "state": "active"
}
We should rename it before releasing 0.5, any objections @jamesog / @mattbostock / @Tenzer ?
Alertmanager API isn't stable yet, which means that the 0.4 branch API is incompatible with 0.5.
Let's cut a 0.1 unsee branch that targets Alertmanager 0.4 and a 0.2 branch that targets the 0.5 API.
With that we can release v0.1.0 of the 0.1 branch and start work on v0.2.0.
After that let's update the README and document which unsee one needs for a given Alertmanager version.
From #128#issuecomment-313331723
Would it be an idea to have the unsee_collected_alerts split into which receiver they are for? As it is right now it's just a total number of all alerts received for the system, but adding the receiver as a dimension would allow different teams using the same Alertmanager instance to dig into how many alerts they currently have going off.
Seems like a good idea, but current code doesn't make it easy to add.
We can make unsee support multiple versions by first checking the remote Alertmanager version via GET /api/v1/status, reading the version and using the correct model for response handling.
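A rough sketch of that flow, working on an already fetched response body. The JSON shape (data.versionInfo.version) is an assumption about the status payload, and the model names returned by mapperFor are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statusResponse models only the field needed here. The JSON shape is an
// assumption about the /api/v1/status payload.
type statusResponse struct {
	Data struct {
		VersionInfo struct {
			Version string `json:"version"`
		} `json:"versionInfo"`
	} `json:"data"`
}

// parseVersion extracts the version string from a status payload.
func parseVersion(body []byte) (string, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return "", err
	}
	if s.Data.VersionInfo.Version == "" {
		return "", fmt.Errorf("no version in status response")
	}
	return s.Data.VersionInfo.Version, nil
}

// mapperFor picks a hypothetical response model name by version prefix.
func mapperFor(version string) string {
	switch {
	case strings.HasPrefix(version, "0.4."):
		return "v04"
	case strings.HasPrefix(version, "0.5."):
		return "v05"
	default:
		return "unsupported"
	}
}

func main() {
	body := []byte(`{"status":"success","data":{"versionInfo":{"version":"0.5.1"}}}`)
	v, err := parseVersion(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v, mapperFor(v)) // prints "0.5.1 v05"
}
```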
Unsee does not work on our dashboard TVs (running PlayIPP), with the error message "Internal error TypeError Alerts is undefined". Any idea what would cause this? Just poor javascript support in the box?
We've protected the alertmanager and unsee instances behind a proxy server.
But submitting silences does not work, because the alertmanager is queried directly from the client side.
Is it possible to proxy those requests through the unsee instance to alertmanager?
Hi,
Here is my setup
It works well, but the silence feature unfortunately tries to use the private address of the Alertmanager declared in the conf.
It would be nice if we could define, for each Alertmanager, a public IP that the silence feature would use.
Cheers.
I would like to see the ability to create silences without an alert being fired.
When I try to create a silence, I get a 404 from unsee.
It is not even making API calls to alertmanager.
We're trying to clear up / simplify our unsee dashboard by using STRIP_LABELS - which is great. However many exporters add additional labels and adding them all to a list can be tedious.
Could an INCLUDE_LABELS option be added, where we specifically say: we want to see labels a, b, and c, but nothing else?
Thank ya.
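The requested allow-list behaviour could be sketched as follows. INCLUDE_LABELS is only the proposal from this issue, and the helper keepLabels is hypothetical.

```go
package main

import "fmt"

// keepLabels returns a copy of labels containing only the names in keep.
// It sketches the requested allow-list behaviour; INCLUDE_LABELS itself is
// a proposal, not an existing option.
func keepLabels(labels map[string]string, keep []string) map[string]string {
	out := make(map[string]string, len(keep))
	for _, name := range keep {
		if v, ok := labels[name]; ok {
			out[name] = v
		}
	}
	return out
}

func main() {
	labels := map[string]string{"a": "1", "b": "2", "job": "node", "instance": "host:9100"}
	// "c" is requested but absent, so it is simply skipped.
	fmt.Println(keepLabels(labels, []string{"a", "b", "c"})) // prints map[a:1 b:2]
}
```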
There are a few open PRs, can we open this repo once they are merged?
Are there any other outstanding issues preventing us from making it public?
Small tasks to be done once it's open:
Hi. Thanks for the nice tool!
We (Upwork) have just one major problem with Unsee.
In our AlertManager setup we are grouping the same alerts by different criteria in order to provide different email notifications. Specifically, we have a lot of microservices maintained by several teams. Every team maintains more than one service. Our alerts have both "service" and "team" labels. And AlertManager groups the alerts by service for daily email notification and groups the same alerts by team for weekly digest. So we have 2 nodes in our routing tree.
But Unsee groups all alerts by team only! It seems to be the first group found in the AlertManager response, and Unsee just uses it. This is not convenient for us. We'd prefer to have the alerts grouped by service in Unsee, but unfortunately this is not configurable at all.
Have you ever considered storing all alerts received from AlertManager as a plain list without any grouping, and then allowing the user to define any "groupBy" condition they want? Ideally directly in the UI, but a config file is OK too. That would be really cool, at least for us.
Thanks in advance for any answer!
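To illustrate the idea, a minimal sketch of grouping a flat alert list by an arbitrary label. The Alert type and the groupBy helper are hypothetical and do not reflect how Unsee stores alerts.

```go
package main

import "fmt"

// Alert is a simplified, hypothetical stand-in for an alert: just labels.
type Alert struct {
	Labels map[string]string
}

// groupBy buckets a flat list of alerts by the value of one label.
// Alerts without that label end up under the empty string key.
func groupBy(alerts []Alert, label string) map[string][]Alert {
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		key := a.Labels[label]
		groups[key] = append(groups[key], a)
	}
	return groups
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"service": "billing", "team": "payments"}},
		{Labels: map[string]string{"service": "invoices", "team": "payments"}},
	}
	// One group per team, two groups per service.
	fmt.Println(len(groupBy(alerts, "team")), len(groupBy(alerts, "service"))) // prints "1 2"
}
```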
Hi!
thanks for unsee! I'd like to include clickable links in annotation text (e.g. link to relevant logs, dashboards, etc) is there a way to do that? AFAICS the annotation text now isn't styled in any way.
thanks!
Trying to build with Go 1.9 and I am getting this error. Tried 0.8 and master.
[11] ./unsee.js 12.7 kB {0} [built] [prefetched]
[15] ./templates.js 2.84 kB {0} [built]
[25] ./alerts.js 5.69 kB {0} [built]
[27] ./autocomplete.js 2.04 kB {0} [built]
[28] ./colors.js 1.68 kB {0} [built]
[29] ./config.js 3.45 kB {0} [built]
[37] ./filters.js 7.63 kB {0} [built]
[50] ./counter.js 1.81 kB {0} [built]
[51] ./grid.js 1.43 kB {0} [built]
[53] ./summary.js 1.82 kB {0} [built]
[266] ./progress.js 1.03 kB {0} [built]
[270] ./silence.js 13.5 kB {0} [built]
[271] ./unsilence.js 3.07 kB {0} [built]
[272] ./watchdog.js 1.71 kB {0} [built]
[273] ./help.js 126 bytes {1} [built] [prefetched]
+ 273 hidden modules
go-bindata-assetfs -prefix assets -nometadata assets/templates/... assets/static/dist/...
go build -ldflags "-X main.version="
alerts.go:6:2: use of internal package not allowed
main.go:10:2: use of internal package not allowed
alerts.go:7:2: use of internal package not allowed
alerts.go:8:2: use of internal package not allowed
views.go:13:2: use of internal package not allowed
main.go:11:2: use of internal package not allowed
make: *** [unsee] Error 1
Please help me. Thanks
@prymitive any thoughts
It would be good to be able to specify labels that are not part of a currently firing alert so that we can reduce the number of silences that are needed.
For example I would like to do the following:
{ "node": "foo", "instance": "foo" }
on an alert where only instance is firing
We need to run more strict tests than just the ones using mock files. We need to add support for integration tests that will spawn an instance of Alertmanager, generate some alerts and check if we can read those correctly.
By popular request from @terinjokes - people want binaries, so let's have it.
The silences format unsee is expecting seems different from what is being sent by alertmanager 0.5.1.
I see the following error.
ERRO[0000] json: cannot unmarshal array into Go struct field SilenceAPIResponse.data of type alertmanager.silencesData
The changes in #23 resolved this error. It may be that you are running against a newer version of AM.
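An error like this typically means the "data" field arrived as an array while the client expected an object. A defensive decoder can accept both, as in the sketch below. Both payload shapes and the Silence type are assumptions made for illustration, not the confirmed Alertmanager formats.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Silence keeps only an ID for brevity.
type Silence struct {
	ID string `json:"id"`
}

// decodeSilences accepts "data" either as a plain array of silences or as an
// object wrapping them under "silences". Both shapes are assumptions made
// for this sketch.
func decodeSilences(body []byte) ([]Silence, error) {
	var envelope struct {
		Data json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal(body, &envelope); err != nil {
		return nil, err
	}
	// Try the array shape first.
	var list []Silence
	if err := json.Unmarshal(envelope.Data, &list); err == nil {
		return list, nil
	}
	// Fall back to the object shape.
	var wrapped struct {
		Silences []Silence `json:"silences"`
	}
	if err := json.Unmarshal(envelope.Data, &wrapped); err != nil {
		return nil, err
	}
	return wrapped.Silences, nil
}

func main() {
	asArray := []byte(`{"status":"success","data":[{"id":"abc"}]}`)
	asObject := []byte(`{"status":"success","data":{"silences":[{"id":"abc"}]}}`)
	a, _ := decodeSilences(asArray)
	b, _ := decodeSilences(asObject)
	fmt.Println(len(a), len(b)) // prints "1 1"
}
```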
The gin_requests_total metrics output by Unsee contain the request URI, which can contain arbitrary values. This produces very high cardinality in Unsee's metrics.
The handler label should be sufficient; access logs can be used for more detailed analysis.