
prometheus / alertmanager


Prometheus Alertmanager

Home Page: https://prometheus.io

License: Apache License 2.0

Languages: Go 81.87%, Makefile 0.42%, HTML 0.84%, Shell 0.19%, Elm 15.19%, Dockerfile 0.08%, CSS 0.14%, JavaScript 0.20%, Procfile 0.04%, TypeScript 1.03%
Topics: monitoring, alertmanager, pagerduty, email, notifications, opsgenie, slack, deduplication, hacktoberfest

alertmanager's Introduction

Prometheus

Visit prometheus.io for the full documentation, examples and guides.


Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • PromQL, a powerful and flexible query language to leverage this dimensionality (see the example query after this list)
  • No dependency on distributed storage; single server nodes are autonomous
  • An HTTP pull model for time series collection
  • Pushing time series is supported via an intermediary gateway for batch jobs
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support
  • Support for hierarchical and horizontal federation
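
To illustrate the multi-dimensional data model and PromQL, here is a sketch with a hypothetical metric name and labels. A time series such as

http_requests_total{job="api-server", instance="10.0.0.1:9090", status="500"}

can be filtered and aggregated along any of its dimensions, for example the per-second rate of server errors over the last five minutes, summed per job:

sum by (job) (rate(http_requests_total{status="500"}[5m]))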

Architecture overview

[Architecture overview diagram]

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container for trying it out with

docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.
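
If you want to try it with your own configuration, you can bind-mount a config file into the container. A minimal sketch, assuming the stock image reads its configuration from /etc/prometheus/prometheus.yml and a prometheus.yml exists in the current directory:

docker run --name prometheus -d -p 127.0.0.1:9090:9090 \
    -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus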

Building from source

To build Prometheus from source code, you need a working Go toolchain; building the React UI and web assets additionally requires Node.js and npm.

Start by cloning the repository:

git clone https://github.com/prometheus/prometheus.git
cd prometheus

You can use the go tool to build and install the prometheus and promtool binaries into your GOPATH:

GO111MODULE=on go install github.com/prometheus/prometheus/cmd/...
prometheus --config.file=your_config.yml

However, when using go install to build Prometheus, Prometheus will expect to be able to read its web assets from local filesystem directories under web/ui/static and web/ui/templates. In order for these assets to be found, you will have to run Prometheus from the root of the cloned repository. Note also that these directories do not include the React UI unless it has been built explicitly using make assets or make build.

An example of the above configuration file can be found here.
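
For orientation, a minimal your_config.yml could look roughly like the following sketch, which simply scrapes Prometheus itself (the interval and target are illustrative):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']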

You can also build using make build, which will compile in the web assets so that Prometheus can be run from anywhere:

make build
./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries (includes building and compiling in web assets)
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: build the React UI

Service discovery plugins

Prometheus is bundled with many service discovery plugins. When building Prometheus from source, you can edit the plugins.yml file to disable some service discoveries. The file is a YAML-formatted list of Go import paths that will be built into the Prometheus binary.
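
As a sketch of the expected format (the exact entries shipped with your checkout may differ), plugins.yml looks roughly like:

- github.com/prometheus/prometheus/discovery/consul
- github.com/prometheus/prometheus/discovery/file
- github.com/prometheus/prometheus/discovery/kubernetes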

After you have changed the file, you need to run make build again.

If you are using another method to compile Prometheus, make plugins will generate the plugins file accordingly.

If you add out-of-tree plugins, which we do not endorse at the moment, additional steps might be needed to adjust the go.mod and go.sum files. As always, be extra careful when loading third party code.

Building the Docker image

The make docker target is designed for use in our CI system. You can build a docker image locally with the following commands:

make promu
promu crossbuild -p linux/amd64
make npm_licenses
make common-docker-amd64

Using Prometheus as a Go Library

Remote Write

We are publishing our Remote Write protobuf independently at buf.build.

You can use that as a library:

go get buf.build/gen/go/prometheus/prometheus/protocolbuffers/go@latest

This is experimental.

Prometheus code base

In order to comply with go mod rules, Prometheus release numbers do not exactly match Go module releases. For the Prometheus v2.y.z releases, we are publishing equivalent v0.y.z tags.

Therefore, a user who wants to use Prometheus v2.35.0 as a library could do:

go get github.com/prometheus/prometheus@v0.35.0

This solution makes it clear that we might break our internal Go APIs between minor user-facing releases, as breaking changes are allowed in major version zero.
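
As a small example of consuming the module this way, the following sketch parses a PromQL expression; the promql/parser package used here is one of the module's importable packages, and the query string is illustrative:

package main

import (
    "fmt"

    "github.com/prometheus/prometheus/promql/parser"
)

func main() {
    // Parse a PromQL expression using Prometheus as a library
    // (module fetched e.g. via: go get github.com/prometheus/prometheus@v0.35.0).
    expr, err := parser.ParseExpr(`rate(http_requests_total{job="api"}[5m])`)
    if err != nil {
        panic(err)
    }
    fmt.Println(expr.Type())   // value type of the expression, e.g. vector
    fmt.Println(expr.String()) // the expression, pretty-printed
}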

React UI Development

For more information on building, running, and developing on the React-based UI, see the React app's README.md.

More information

  • Godoc documentation is available via pkg.go.dev. Due to peculiarities of Go Modules, v2.x.y will be displayed as v0.x.y.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.

alertmanager's People

Contributors

alexweav, asquare14, benridley, beorn7, brancz, brian-brazil, dependabot[bot], diogonicoleti, discordianfish, fabxc, gotjosh, grobie, grobinson-grafana, johncming, josedonizetti, juliusv, metalmatze, mxinden, nexucis, pgier, pracucci, prombot, pstibrany, roidelapluie, sdurrheimer, simonpasquier, stuartnelson3, superq, vladimiroff, w0rm



alertmanager's Issues

web/static/js/alerts.js should use -web.path-prefix

Currently the API is assumed to be located at /api/, which is most likely not the case if -web.path-prefix is set.

Looking around how the URLs are constructed, this might be a case of just changing to 'api/..' instead of '/api/..'. I'd be happy to do a pull request either way if I get some guidance on how you want to solve this.

GeneratorURL generates wrong link

I set up notifications via Slack and everything is OK. However, the link I receive is always broken. Here is an example:
bls:9090/graph#%5B%7B%22expr%22%3A%22up%20%3D%3D%200%22%2C%22tab%22%3A0%7D%5D

POST to /api/alerts causes invalid memory address or nil pointer dereference

The current prometheus is crashing alertmanager when sending alerts:

2014/04/03 11:42:04 /go/src/code.google.com/p/gorest/gorest.go:194 (0x572cf3)
  google.com/p/gorest.func.001: log.Printf("%s", debug.Stack())
/usr/local/go/src/pkg/runtime/panic.c:248 (0x415766)
  panic: runtime·newstackcall(d->fn, (byte*)d->args, d->siz);
/usr/local/go/src/pkg/runtime/panic.c:482 (0x41600d)
  panicstring: runtime·panic(err);
/usr/local/go/src/pkg/runtime/os_linux.c:234 (0x414eaa)
  sigpanic: runtime·panicstring("invalid memory address or nil pointer dereference");
/go/src/github.com/prometheus/alertmanager/web/api/alert.go:40 (0x46d70c)
  com/prometheus/alertmanager/web/api.AlertManagerService.AddAlerts: s.Manager.Receive(as)
/go/src/github.com/prometheus/alertmanager/web/api/alert.go:1 (0x46e6d4)
  com/prometheus/alertmanager/web/api.(*AlertManagerService).AddAlerts: // Copyright 2013 Prometheus Team
/usr/local/go/src/pkg/runtime/asm_amd64.s:339 (0x426b32)
  call32: CALLFN(call32, 32)
/usr/local/go/src/pkg/reflect/value.go:474 (0x490a7b)
  Value.call: call(fn, ptr, uint32(size))
/usr/local/go/src/pkg/reflect/value.go:345 (0x48fb6d)
  Value.Call: return v.call("Call", in)
/go/src/code.google.com/p/gorest/reflect.go:391 (0x5708c0)
  google.com/p/gorest.prepareServe: ret = servVal.Method(ep.methodNumberInParent).Call(arrArgs)
/go/src/code.google.com/p/gorest/gorest.go:215 (0x56771b)
  google.com/p/gorest.manager.ServeHTTP: data, state := prepareServe(ctx, ep)
/go/src/code.google.com/p/gorest/api.go:1 (0x5739ae)
  google.com/p/gorest.(*manager).ServeHTTP: //Copyright 2011 Siyabonga Dlamini ([email protected]). All rights reserved.
/go/src/github.com/prometheus/alertmanager/web/compression.go:90 (0x46b2b2)
  com/prometheus/alertmanager/web.compressionHandler.ServeHTTP: c.handler.ServeHTTP(compWriter, req)
/go/src/github.com/prometheus/alertmanager/web/alerts.go:1 (0x46d3ee)
  com/prometheus/alertmanager/web.(*compressionHandler).ServeHTTP: // Copyright 2013 Prometheus Team
/go/src/github.com/prometheus/client_golang/prometheus/exp/coarsemux.go:65 (0x573e0a)
  com/prometheus/client_golang/prometheus/exp.handlerDelegator.ServeHTTP: h.delegate.ServeHTTP(rwd, r)
/go/src/github.com/prometheus/client_golang/prometheus/exp/coarsemux.go:1 (0x5755b4)
  com/prometheus/client_golang/prometheus/exp.(*handlerDelegator).ServeHTTP: // Copyright (c) 2013, Prometheus Team
/usr/local/go/src/pkg/net/http/server.go:1496 (0x537f03)
  (*ServeMux).ServeHTTP: h.ServeHTTP(w, r)
/usr/local/go/src/pkg/net/http/server.go:1597 (0x53870e)
  serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/pkg/net/http/server.go:1167 (0x5365a7)
  (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/pkg/runtime/proc.c:1394 (0x419a60)
  goexit: runtime·goexit(void)

I did a tcpdump and reproduced what Prometheus posts:

 curl -L -d '[{"Description":"http://192.168.100.66:9080/metrics has been down for more than 5 minutes.","Labels":{"alertname":"NodeDown","instance":"http://192.168.100.66:9080/metrics","job":"node_exporter","severity":"page"},"Payload":{"ActiveSince":"2014-04-03T11:20:49Z","AlertingRule":"ALERT NodeDown IF (up == 0) FOR 2m WITH {severity=\"page\"}","GeneratorUrl":"http://19275cb8ad55:9090/graph#%5B%7B%22expr%22%3A%22%28up%20%3D%3D%200%29%22%2C%22tab%22%3A1%7D%5D","Value":"0.000000"},"Summary":"Node http://192.168.100.66:9080/metrics down"}]' -X POST localhost:9090/api/alerts

It seems that in web/api/alert.go:40 s.Manager is nil.

Makefile doesn't work with make -j

I have MAKEFLAGS=-j4 in my environment, and with this the makefile seems to try to access .deps/go before it has been extracted from the tarball:

# git clone https://github.com/prometheus/alertmanager.git
# cd alertmanager
# MAKEFLAGS=-j4 make
make: Entering directory `/home/alex/alertmanager'
mkdir -p .deps
mkdir -p /home/alex/alertmanager/.deps/gopath/src/github.com/prometheus
make -C web
make -C config
curl -o .deps/go1.4.1.linux-amd64.tar.gz -L https://golang.org/dl/go1.4.1.linux-amd64.tar.gz
ln -s /home/alex/alertmanager /home/alex/alertmanager/.deps/gopath/src/github.com/prometheus/alertmanager
make[1]: Entering directory `/home/alex/alertmanager/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/alex/alertmanager/config'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0make[1]: Entering directory `/home/alex/alertmanager/web'
./blob/embed-static.sh static templates | /home/alex/alertmanager/.deps/go/bin/gofmt > blob/files.go
/bin/sh: 1: /home/alex/alertmanager/.deps/go/bin/gofmt: not found
cat: write error: Broken pipe
make[1]: *** [blob/files.go] Error 127
make: *** [web] Error 2
make: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/home/alex/alertmanager/web'
100    87  100    87    0     0    942      0 --:--:-- --:--:-- --:--:--   935
100 59.5M  100 59.5M    0     0  66.4M      0 --:--:-- --:--:-- --:--:-- 66.4M
make: Leaving directory `/home/alex/alertmanager'

I don't know GNU Make well enough to know if parallelism requires that you take special care with your rules. But apparently if you're not compatible with parallelism you can disable it altogether with the .NOTPARALLEL pseudo-target, explained here.
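
For reference, the opt-out mentioned above is a single special target; a minimal sketch:

# Disable parallel execution for every target in this Makefile.
.NOTPARALLEL: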

While we're on the subject of makefiles, it's probably a good idea to run curl as curl -q to keep a local ~/.curlrc from disrupting your script.

Use ISO 8601 date format everywhere.

While dates are in general displayed as ISO 8601, the date-and-time fields in the form to create or edit a silence have some other format (depending on locale settings of the browser?). ISO 8601 format should be used consistently.

Slack Integration

Has anyone had issues with the Slack Integration?

My current configuration is as follows (webhook url obfuscated) -

notification_config {
  name: "slack"
  slack_config {
      webhook_url: "https://hooks.slack.com/services/T03Rsdfklsjdflksjdfkljskldfklsjdf8zo"
      send_resolved: true
    }
}

aggregation_rule {
  notification_config_name: "slack"
}

I've tested that the webhook url works using the following curl command -

curl -X POST --data-urlencode 'payload={"text": "This is posted to <#general> and comes from *monkey-bot*.", "channel": "#general", "username": "monkey-bot", "icon_emoji": ":monkey_face:"}' https://hooks.slack.com/services/T03Rsdfklsjdflksjdfkljskldfklsjdf8zo
ok%

However I get the following error in AlertManager when a notification is triggered -

==> /var/log/alert_manager.log <==
time="2015-08-13T15:23:50Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398 
time="2015-08-13T15:23:50Z" level=info msg="Sent 1 notifications" file=manager.go line=353 
time="2015-08-13T15:23:55Z" level=error msg="Error sending Slack notification: Post https://hooks.slack.com/services/T03Rsdfklsjdflksjdfkljskldfklsjdf8zo: net/http: request canceled while waiting for connection" file=notifier.go line=617 

I can see that a connection is being made but as it's HTTPS I can't see the details -

15:23:55.344214 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [S], seq 2276716265, win 29200, options [mss 1460,sackOK,TS val 316406 ecr 0,nop,wscale 6], length 0
15:23:55.383832 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [S.], seq 208576001, ack 2276716266, win 65535, options [mss 1460], length 0
15:23:55.383944 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 1, win 29200, length 0
15:23:55.385009 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [P.], seq 1:147, ack 1, win 29200, length 146
15:23:55.385207 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 147, win 65535, length 0
15:23:55.426374 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [P.], seq 1:1229, ack 147, win 65535, length 1228
15:23:55.426406 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 1229, win 31928, length 0
15:23:55.426442 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [P.], seq 1229:2457, ack 147, win 65535, length 1228
15:23:55.426450 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2457, win 34384, length 0
15:23:55.426469 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [P.], seq 2457:2833, ack 147, win 65535, length 376
15:23:55.426472 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2833, win 36840, length 0
15:23:55.446903 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [P.], seq 147:222, ack 2833, win 36840, length 75
15:23:55.447114 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 222, win 65535, length 0
15:23:55.447266 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [P.], seq 222:228, ack 2833, win 36840, length 6
15:23:55.447355 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [P.], seq 228:273, ack 2833, win 36840, length 45
15:23:55.447420 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 228, win 65535, length 0
15:23:55.447472 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 273, win 65535, length 0
15:23:55.567803 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [P.], seq 2833:2884, ack 273, win 65535, length 51
15:23:55.606955 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2884, win 36840, length 0
15:24:25.567987 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2884, win 36840, length 0
15:24:25.568260 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 273, win 65535, length 0
15:24:55.670983 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2884, win 36840, length 0
15:24:55.671307 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 273, win 65535, length 0
15:24:55.754572 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [P.], seq 2884:2915, ack 273, win 65535, length 31
15:24:55.754614 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2915, win 36840, length 0
15:24:55.754840 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [P.], seq 273:304, ack 2915, win 36840, length 31
15:24:55.754916 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [F.], seq 304, ack 2915, win 36840, length 0
15:24:55.755011 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 304, win 65535, length 0
15:24:55.755017 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [.], ack 305, win 65535, length 0
15:24:55.794813 IP ec2-52-20-82-67.compute-1.amazonaws.com.https > 10.0.2.15.53373: Flags [F.], seq 2915, ack 305, win 65535, length 0
15:24:55.794863 IP 10.0.2.15.53373 > ec2-52-20-82-67.compute-1.amazonaws.com.https: Flags [.], ack 2916, win 36840, length 0

I've tried setting 'channel' in slack_config to both my '@mynickname' and '#mychannel' but this didn't seem to fix the issue.

I've looked around and I can't see many examples of the Slack integration which makes me think that it might actually be broken.

Any help is much appreciated!

First thoughts on using AlertManager

  • The documentation suggests using config/fixtures/sample.conf.input as a starting point for your own config. This file applies a filter to all alerts, meaning your alerts will most likely end up being ignored. I suggest removing the filter from this file (or commenting it out, or creating a better starter config)
  • The documentation suggests reading config/config.proto to learn more about the configuration, but unless you know how to interpret golang variables and structs it doesn't actually help you with the layout of your configuration file. I'd rather see an actual configuration file with everything in it.
  • Explain how you can enable/disable certain notification types
  • Give Prometheus, PromDash and AlertManager each a distinct favicon (e.g. different-color versions of the same favicon) so I can see which browser tab goes where.
  • Adding a column "next notification moment" in the web-ui alert list would be nice
  • -notification.smtp.smarthost requires you to specify the port number. It would be user-friendly if AlertManager assumed 25 when no port is specified

alertmanager.url can not be an alias setup in /etc/hosts

Here is part of my docker-compose.yml

promalertmgr:
  build: promalertmgr/
  container_name: promalertmgr
  ports:
    - "9093:9093"
  expose:
    - "9093"
  command: "-config.file=/alertmanager.conf"
  env_file:
    - './.env'

prometheus:
  build: prometheus/
  container_name: prometheus
  links:
    - cadvisor
    - promalertmgr
    - streamapp
    - dashboardapp
    - analyzeapp
    - mongo
  ports:
    - "9090:9090"
  command: "-config.file=/etc/prometheus/prometheus.yml -alertmanager.url=http://192.168.99.100:9093"
  env_file:
    - './.env'

If I change this:

-alertmanager.url=http://192.168.99.100:9093

into this:

-alertmanager.url=promalertmgr:9093

It does not work anymore.

Load templates from file

The current email config is in notifier.go as a blob of text.

This can pretty obviously be loaded up from a file specified as a configuration option, since email is one of those things you'd want to pretty heavily customize.

I'm just dropping this here so I can reference it later when I get a chance to have a poke at it.

Pushover notifications are broken

Enabling Pushover results in the following error:

time="2015-07-22T17:10:40Z" level=error msg="Error sending Pushover notification: pushover: request 531f66f26c76006b84864f02a434b858: retry is invalid" file=notifier.go line=640

Tokens were verified and are correct.

Allow aggregations based on alert names

Something I'm setting up right now - I have an alert where I just want to grab a specific alert name for a temporary scheme (in this case DiskSpaceAlert) rather than monitoring a swathe of systems.

This is a pretty common use case when you have system ownership, or (as in my current case) are running a long running job and want special case alerts to go to just 1 or several people.

JSON api for unsilenced alerts

With the /api/alerts interface, the next immediate problem is having a quick way to get unsilenced alerts (or silenced alerts) without having to make a lot of calls to the silence API and without needing to externally duplicate the alertmanager silencing logic.

I'm presently thinking /api/alerts/silenced and /api/alerts/unsilenced but that feels slightly unclean.

Emails are not encoded properly

If the alert summary and/or description contains non-ASCII symbols, it breaks emails.

  • There are no content-type headers in the email and most mail clients don't assume UTF-8 by default, so the message body is broken in most mail clients.
  • If non-ASCII symbols end up in the email subject, it's even worse. All headers (including the subject) need to be encoded properly; if they are not, various sanity checks in the mail transport (including virus and spam scanners) can throw these mails away (see the example headers after this list). For example, Amavis complains here:

BAD HEADER Non-encoded 8-bit data (char C3 hex) in message header 'Subject'
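
For illustration (a sketch, not output from Alertmanager), properly declared and encoded headers would look something like:

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: =?UTF-8?Q?St=C3=B6rung_auf_Knoten_d=C3=A4mon-01?=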

Null pointer dereference when reloading config via HUP

Just a contribution of an interesting crash case, noted here for my own and others' reference.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x48d82d]
goroutine 14 [running]:
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).removeExpiredAggregates(0xc20803d720)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:282 +0x23d
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).runIteration(0xc20803d720)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:395 +0x33
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).Run(0xc20803d720)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:408 +0x9a
created by main.main
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:82 +0xb42
goroutine 1 [chan receive, 4 minutes]:
github.com/prometheus/alertmanager/manager.(*notifier).Dispatch(0xc2080fc8a0)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:639 +0x54
main.main()
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:128 +0x1380
goroutine 39 [IO wait, 1 minutes]:
net.(*pollDesc).Wait(0xc2080a3100, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a3100, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080a30a0, 0xc208215000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a2008)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a598, 0xc208215000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc20803c048, 0xc208215000, 0x1000, 0x1000, 0xc00000000000000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc2080fd620, 0xc208215000, 0x1000, 0x1000, 0xc2082a2040, 0x0, 0x0)
alertmanager.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Unit alertmanager.service entered failed state.
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805aa80)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805aa80, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805aa80, 0x0, 0x0, 0x0, 0xc207fd5f00, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc2082a0060, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc2082a0060, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805aa80, 0xc20837c000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc20803c000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc20803c000)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 13 [chan receive]:
main.func·001()
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:60 +0x84
created by main.main
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:65 +0x659
goroutine 15 [IO wait]:
net.(*pollDesc).Wait(0xc2080a2fb0, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a2fb0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc2080a2f50, 0x0, 0x7f7f0ed64d30, 0xc208304a10)
/home/wrouesnel/opt/go/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc20802a558, 0x61a3fe, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/tcpsock_posix.go:234 +0x4e
net/http.tcpKeepAliveListener.Accept(0xc20802a558, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:1976 +0x4c
net/http.(*Server).Serve(0xc20805a900, 0x7f7f0ed67238, 0xc20802a558, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:1728 +0x92
net/http.(*Server).ListenAndServe(0xc20805a900, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:1718 +0x154
net/http.ListenAndServe(0x87c260, 0x5, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:1808 +0xba
github.com/prometheus/alertmanager/web.WebService.ServeForever(0xc208104990, 0xc2081049c0, 0xc2080fcd60, 0xc20802d810, 0x87b860, 0x1, 0x0, 0x0)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/web/web.go:80 +0x8d0
created by main.main
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:116 +0x10b1
goroutine 16 [select, 6 minutes]:
github.com/prometheus/alertmanager/config.(*fileWatcher).Watch(0xc208106930, 0xc2081049f0)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/config/watcher.go:49 +0x8fb
created by main.main
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/main.go:125 +0x1270
goroutine 18 [syscall, 6 minutes]:
syscall.Syscall(0x0, 0x3, 0xc20816dee0, 0x10000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/syscall/asm_linux_amd64.s:21 +0x5
syscall.read(0x3, 0xc20816dee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/syscall/zsyscall_linux_amd64.go:867 +0x6e
syscall.Read(0x3, 0xc20816dee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/syscall/syscall_unix.go:136 +0x58
gopkg.in/fsnotify%2ev0.(*Watcher).readEvents(0xc20805a3c0)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:219 +0x12c
created by gopkg.in/fsnotify%2ev0.NewWatcher
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:126 +0x420
goroutine 19 [chan receive, 6 minutes]:
gopkg.in/fsnotify%2ev0.(*Watcher).purgeEvents(0xc20805a3c0)
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify.go:21 +0x55
created by gopkg.in/fsnotify%2ev0.NewWatcher
/home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:127 +0x43a
goroutine 43 [IO wait, 3 minutes]:
net.(*pollDesc).Wait(0xc2080a2220, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a2220, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080a21c0, 0xc208233000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a37e8)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a2b8, 0xc208233000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc20803c5e8, 0xc208233000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc20826c040, 0xc208233000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805a060)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805a060, 0x7e0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805a060, 0x0, 0x0, 0x0, 0xc208109000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc20826e0f0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc20826e0f0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805a060, 0xc20837cdd0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc20803c5a0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc20803c5a0)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 44 [IO wait, 3 minutes]:
net.(*pollDesc).Wait(0xc2080a2ae0, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a2ae0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080a2a80, 0xc20826a000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a3730)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a2c0, 0xc20826a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc20803c908, 0xc20826a000, 0x1000, 0x1000, 0xc208254600, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc20826c100, 0xc20826a000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805a420)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805a420, 0x1e0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805a420, 0x0, 0x0, 0x0, 0xc208109000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc20826e2d0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc20826e2d0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805a420, 0xc20837cd00, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc20803c8c0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc20803c8c0)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 45 [IO wait, 3 minutes]:
net.(*pollDesc).Wait(0xc2080a2e60, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a2e60, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080a2e00, 0xc208257000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a3660)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a2c8, 0xc208257000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc20803cae8, 0xc208257000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc20826c340, 0xc208257000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805a4e0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805a4e0, 0x7e0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805a4e0, 0x0, 0x0, 0x0, 0xc208109000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc20826e4b0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc20826e4b0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805a4e0, 0xc20837cc30, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc20803caa0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc20803caa0)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 46 [IO wait, 3 minutes]:
net.(*pollDesc).Wait(0xc2080a3170, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a3170, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080a3110, 0xc208286000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a35a0)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a2d0, 0xc208286000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc20803d1c8, 0xc208286000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc20826c540, 0xc208286000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805a540)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805a540, 0x1fe0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805a540, 0x0, 0x0, 0x0, 0xc208109000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc20826e690, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc20826e690, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805a540, 0xc20837cb60, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc20803d180, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc20803d180)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 82 [IO wait]:
net.(*pollDesc).Wait(0xc20833b100, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20833b100, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc20833b0a0, 0xc20825f000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc208305018)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a5c8, 0xc20825f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc208318688, 0xc20825f000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc208300500, 0xc20825f000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805b0e0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805b0e0, 0xc2082c9b0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805b0e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc20831b080, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc20831b080, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805b0e0, 0xc208032b60, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc208318640, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc208318640)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e
goroutine 49 [syscall, 4 minutes, locked to thread]:
runtime.goexit()
/home/wrouesnel/opt/go/src/runtime/asm_amd64.s:2232 +0x1
goroutine 69 [IO wait, 1 minutes]:
net.(*pollDesc).Wait(0xc2082a61b0, 0x72, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2082a61b0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2082a6150, 0xc208265000, 0x1000, 0x1000, 0x0, 0x7f7f0ed64d30, 0xc2082a2f48)
/home/wrouesnel/opt/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20802a370, 0xc208265000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/net.go:121 +0xdc
net/http.(*liveSwitchReader).Read(0xc2083180e8, 0xc208265000, 0x1000, 0x1000, 0xc20805a480, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:214 +0xab
io.(*LimitedReader).Read(0xc2082a42a0, 0xc208265000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
/home/wrouesnel/opt/go/src/io/io.go:408 +0xce
bufio.(*Reader).fill(0xc20805a600)
/home/wrouesnel/opt/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc20805a600, 0x1e0a, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadLine(0xc20805a600, 0x0, 0x0, 0x0, 0xc208109000, 0x0, 0x0)
/home/wrouesnel/opt/go/src/bufio/bufio.go:324 +0x62
net/textproto.(*Reader).readLineSlice(0xc2082a01e0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:55 +0x9e
net/textproto.(*Reader).ReadLine(0xc2082a01e0, 0x0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/textproto/reader.go:36 +0x4f
net/http.ReadRequest(0xc20805a600, 0xc20837c340, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/request.go:598 +0xcb
net/http.(*conn).readRequest(0xc2083180a0, 0x0, 0x0, 0x0)
/home/wrouesnel/opt/go/src/net/http/server.go:586 +0x26f
net/http.(*conn).serve(0xc2083180a0)
/home/wrouesnel/opt/go/src/net/http/server.go:1162 +0x69e
created by net/http.(*Server).Serve
/home/wrouesnel/opt/go/src/net/http/server.go:1751 +0x35e

Clicking 'silence alert' buttons on Alerts page does nothing

It looks like the edit silence form isn't present on that page, and therefore can't be reset.

Console stack trace:

Uncaught TypeError: Cannot read property 'reset' of undefined
alerts.js:118 initNewSilence
alerts.js:156 (anonymous function)
jquery.min.js:5 x.event.dispatch
jquery.min.js:5 y.handle

Unable to Build Latest

Funny problem: this morning I was able to build alertmanager fine, but now I'm encountering the following issue when building -

root@dev-ubuntu-1404:/tmp/alertmanager# make
GOROOT=/tmp/alertmanager/.build/go1.4.2 GOPATH=/tmp/alertmanager/.build/gopath /tmp/alertmanager/.build/go1.4.2/bin/go build  -o alertmanager
# github.com/prometheus/alertmanager/manager
.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:478: unknown sns.PublishInput field 'TopicARN' in struct literal
make: *** [alertmanager] Error 2
root@dev-ubuntu-1404:/tmp/alertmanager# 

root@dev-ubuntu-1404:/tmp/alertmanager# git rev-parse HEAD
33d6097ce4f0a4b4f91476c18199f9a955ab86b1

Could this be something to do with the aws-sdk-go v0.9.0rc1 release today? Amazon said that there is a breaking change - https://github.com/aws/aws-sdk-go.

Incidentally simply commenting out line 478 allows me to build fine.

Edit: It looks like all that is needed is to rename TopicARN to TopicArn on line 478; after doing this all tests passed -

root@dev-ubuntu-1404:/tmp/alertmanager# cat manager/notifier.go | grep TopicAr
                TopicArn:         aws.String(config.GetTopicArn()),
root@dev-ubuntu-1404:/tmp/alertmanager# make test
GOROOT=/tmp/alertmanager/.build/go1.4.2 GOPATH=/tmp/alertmanager/.build/gopath /tmp/alertmanager/.build/go1.4.2/bin/go test ./...
?       _/tmp/alertmanager      [no test files]
ok      _/tmp/alertmanager/config       0.009s
?       _/tmp/alertmanager/config/generated     [no test files]
ok      _/tmp/alertmanager/manager      0.009s
?       _/tmp/alertmanager/web  [no test files]
?       _/tmp/alertmanager/web/api      [no test files]
?       _/tmp/alertmanager/web/blob     [no test files]
root@dev-ubuntu-1404:/tmp/alertmanager# 

I created a pull request with the change here.

Slack notification not sent - {{$labels.name}} returns <no value>

Hello,

I run alertmanager 0.0.4 in a Docker (version 1.8.1) container. I tried to set up Slack notifications, but although alerts appear in the UI, nothing is sent.

I ran /bin/sh in the image and installed curl to test whether I could send a notification; it worked, so there is no network problem.

Second problem: I have the following alert:

ALERT MysqlDown
IF my_condition
FOR 5s
  WITH {
    severity="page"
  }
  SUMMARY "{{$labels.name}} at {{$labels.instance}} is down"
  DESCRIPTION "{{$labels.name}} at {{$labels.instance}} has been down for more than 5 seconds."

but on the alertmanager I get:

<no value> at <no value> is down

Let me know if I can provide more info.

OpsGenie Integration uses wrong 'close' endpoint

When an alert is opened or closed, the OpsGenie notifier uses the same endpoint.

Set here: https://github.com/prometheus/alertmanager/blob/master/manager/notifier.go#L89
Used here: https://github.com/prometheus/alertmanager/blob/master/manager/notifier.go#L671

However, the docs say that for opening and closing one should use different endpoints.

Open: https://api.opsgenie.com/v1/json/alert/
Close: https://api.opsgenie.com/v1/json/alert/close

This is causing errors when attempting to close, and means OpsGenie notifications must be manually closed.

level=info msg="Sent OpsGenie notification: 3df913c5cc5fbd74: HTTP 400: {"code":3,"error":
"DiscardableValidationException[Can not validate alert. Reason: Property [message] cannot be empty.]; nested: FieldValidationException[Property [message] cannot be empty.]; "}"
 file=notifier.go line=685 

Have way to do reporting on alertmanager information

Currently there's no way to report from prometheus on the silenced state of alerts. I propose exposing alerts via the /metrics endpoint as alertmanager_alerts or similar. At a minimum, the series should include the labels as passed from prometheus, as well as an additional bool label for silenced, and the current value.
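
A sketch of what such a series could look like in the /metrics exposition format (metric and label names as proposed above, values illustrative):

# HELP alertmanager_alerts Alerts known to this Alertmanager, with their silenced state.
# TYPE alertmanager_alerts gauge
alertmanager_alerts{alertname="NodeDown",instance="10.0.0.1:9100",job="node_exporter",silenced="true"} 1
alertmanager_alerts{alertname="DiskFull",instance="10.0.0.2:9100",job="node_exporter",silenced="false"} 1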

The use-case is to allow consoles displaying alerts to take into consideration the silenced state. I could probably implement this, but I'm concerned that it might be wasted effort with a rewrite in the wings.

Refs prometheus/prometheus#966

Allow specifying silence duration, not just end time

If I go to /silences and add a silence by entering "45h" in the "Ends At" textbox the textbox updates with a time - however when I hit "Create Silence" it silently fails to create the silence. Using the date picker works.

Fails to build with Go 1.5.1 and can't download 1.4.2

Hi everyone. I'm trying to build the project on my local environment, but it's failing because I don't have Go 1.4.2 installed, and it's also failing to download it because the download URL is returning 404:

inkel@miralejos ~/dev/prometheus/alertmanager (master=)
$ make test
Go version 1.4.2 required but not found in PATH.
About to download and install go1.4.2 to /Users/inkel/dev/prometheus/alertmanager/.build/go1.4.2
Abort now if you want to manually install it system-wide instead.

mkdir -p /Users/inkel/dev/prometheus/alertmanager/.build/go1.4.2
curl -L https://golang.org/dl/go1.4.2.darwin-amd64-osx10.9.5.tar.gz | tar -C /Users/inkel/dev/prometheus/alertmanager/.build/go1.4.2 --strip 1 -xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    98  100    98    0     0     88      0  0:00:01  0:00:01 --:--:--    89
100   127  100   127    0     0     57      0  0:00:02  0:00:02 --:--:--     0
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
make: *** [/Users/inkel/dev/prometheus/alertmanager/.build/go1.4.2/bin/go] Error 1

inkel@miralejos ~/dev/prometheus/alertmanager (master=)
$ wget https://golang.org/dl/go1.4.2.darwin-amd64-osx10.9.5.tar.gz
--2015-10-06 20:15:14--  https://golang.org/dl/go1.4.2.darwin-amd64-osx10.9.5.tar.gz
Resolving golang.org... 64.233.186.141, 2800:3f0:4003:c00::8d
Connecting to golang.org|64.233.186.141|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://storage.googleapis.com/golang/go1.4.2.darwin-amd64-osx10.9.5.tar.gz [following]
--2015-10-06 20:15:14--  https://storage.googleapis.com/golang/go1.4.2.darwin-amd64-osx10.9.5.tar.gz
Resolving storage.googleapis.com... 64.233.186.128, 2800:3f0:4003:c00::80
Connecting to storage.googleapis.com|64.233.186.128|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2015-10-06 20:15:15 ERROR 404: Not Found.

This is the version I'm running locally:

inkel@miralejos ~/dev/prometheus/alertmanager (master=)
$ go version
go version go1.5.1 darwin/amd64

Note that setting the GO_VERSION environment variable does the trick:

$ GO_VERSION=1.5.1 make test
mkdir -p /Users/inkel/dev/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/
ln -s /Users/inkel/dev/prometheus/alertmanager /Users/inkel/dev/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager
GOPATH=/Users/inkel/dev/prometheus/alertmanager/.build/gopath /usr/local/bin/go get -d
touch dependencies-stamp
GOPATH=/Users/inkel/dev/prometheus/alertmanager/.build/gopath /usr/local/bin/go test ./...
?       _/Users/inkel/dev/prometheus/alertmanager   [no test files]
ok      _/Users/inkel/dev/prometheus/alertmanager/config    0.019s
?       _/Users/inkel/dev/prometheus/alertmanager/config/generated  [no test files]
ok      _/Users/inkel/dev/prometheus/alertmanager/manager   0.023s
?       _/Users/inkel/dev/prometheus/alertmanager/web   [no test files]
?       _/Users/inkel/dev/prometheus/alertmanager/web/api   [no test files]
?       _/Users/inkel/dev/prometheus/alertmanager/web/blob  [no test files]

Alert notification templates

All fields that we send out should be configurable via Go templates.

The tricky bit is how we handle numeric fields.
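
As a rough illustration of the idea (the Alert struct and its fields below are hypothetical, not Alertmanager's actual data model), numeric fields can be formatted inside the template itself, e.g. with the built-in printf function:

package main

import (
    "os"
    "text/template"
)

// Alert stands in for whatever data the notifier would expose to templates.
type Alert struct {
    Name  string
    Value float64
}

func main() {
    tmpl := template.Must(template.New("summary").Parse(
        "{{ .Name }}: current value {{ printf \"%.3f\" .Value }}\n"))
    // Prints: NodeDown: current value 0.000
    tmpl.Execute(os.Stdout, Alert{Name: "NodeDown", Value: 0})
}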

Store record of sent alerts between sessions

Currently the alert manager automatically resends all alerts when the session is restarted. Sent alerts should be persisted to disk so that the re-alert timeouts can be preserved between daemon-restarts.

In the case of alerts such as emails (which may go to shared queues) this is especially important. I'm thinking the simplest answer at the moment would be to adopt the JSON file approach that silences use, until some clustering approach is determined.

Null pointer dereference

I received the following panic while not doing anything in particular; there were two active alerts at the time, one silenced. Let me know if there's any further information I can provide.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4e532d]

goroutine 15 [running]:
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).removeExpiredAggregates(0xc208054280)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:282 +0x23d
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).runIteration(0xc208054280)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:395 +0x33
github.com/prometheus/alertmanager/manager.(*memoryAlertManager).Run(0xc208054280)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:408 +0x9a
created by main.main
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:111 +0xd50

goroutine 1 [chan receive, 15 minutes]:
github.com/prometheus/alertmanager/manager.(*notifier).Dispatch(0xc2080fc420)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:813 +0x54
main.main()
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:157 +0x15c0

goroutine 14 [chan receive]:
main.func·001()
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:85 +0x84
created by main.main
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:90 +0x659

goroutine 16 [IO wait, 3 minutes]:
net.(*pollDesc).Wait(0xc208108fb0, 0x72, 0x0, 0x0)
        /usr/lib/golang/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208108fb0, 0x0, 0x0)
        /usr/lib/golang/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc208108f50, 0x0, 0x7fb150e92e40, 0xc208276768)
        /usr/lib/golang/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc20802e5b0, 0x63ab5e, 0x0, 0x0)
        /usr/lib/golang/src/net/tcpsock_posix.go:234 +0x4e
net/http.tcpKeepAliveListener.Accept(0xc20802e5b0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/server.go:1976 +0x4c
net/http.(*Server).Serve(0xc208032a20, 0x7fb150e95528, 0xc20802e5b0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/server.go:1728 +0x92
net/http.(*Server).ListenAndServe(0xc208032a20, 0x0, 0x0)
        /usr/lib/golang/src/net/http/server.go:1718 +0x154
net/http.ListenAndServe(0x904bd0, 0x5, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/server.go:1808 +0xba
github.com/prometheus/alertmanager/web.WebService.ServeForever(0xc2080fd050, 0xc2080fd080, 0xc20808c640, 0xc208030c30, 0x904bd0, 0x5, 0x9041d0, 0x1, 0x0, 0x0)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/web/web.go:78 +0x8e6
created by main.main
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:145 +0x12f1

goroutine 17 [select, 15 minutes]:
github.com/prometheus/alertmanager/config.(*fileWatcher).Watch(0xc2080aa430, 0xc2080fd0b0)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/config/watcher.go:49 +0x8fb
created by main.main
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/main.go:154 +0x14b0

goroutine 19 [syscall, 15 minutes]:
syscall.Syscall(0x0, 0x3, 0xc20816fee0, 0x10000, 0x0, 0x850f70, 0x7fb150e95560)
        /usr/lib/golang/src/syscall/asm_linux_amd64.s:21 +0x5
syscall.read(0x3, 0xc20816fee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:867 +0x6e
syscall.Read(0x3, 0xc20816fee0, 0x10000, 0x10000, 0xffffffffffffffff, 0x0, 0x0)
        /usr/lib/golang/src/syscall/syscall_unix.go:136 +0x58
gopkg.in/fsnotify%2ev0.(*Watcher).readEvents(0xc2080325a0)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:219 +0x12c
created by gopkg.in/fsnotify%2ev0.NewWatcher
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:126 +0x420

goroutine 20 [chan receive, 15 minutes]:
gopkg.in/fsnotify%2ev0.(*Watcher).purgeEvents(0xc2080325a0)
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify.go:21 +0x55
created by gopkg.in/fsnotify%2ev0.NewWatcher
        /home/mavdev/dave/prometheus/alertmanager/alertmanager/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:127 +0x43a

Trigger / Resolve alarms in OpsGenie

User Story: As a system administrator using OpsGenie for alerting and Prometheus for metrics and alerts based on metrics, I don't want to have to use two different alerting systems.

In other words: OpsGenie support would be great but currently I don't have enough spare time to implement it myself. If someone has enough spare time and maybe would benefit themselves I'd be really happy to have that integration.

wait for refresh before sending alert after repeat_rate_seconds

Currently if I have my min refresh period set to 5 minutes, and I set my repeat_rate_seconds to 30 minutes I can end up getting 2 alerts for something that takes 25 minutes to resolve.

Once the time since the original alert crosses repeat_rate_seconds, it sends an alert no matter what. This may have been a design choice, but I feel it would be better to also wait for a refresh to occur. In my example, if something checks that it is healthy every 30 seconds and it goes unhealthy for 26 minutes, then it will wait until 31 minutes after it originally went unhealthy before it gets cleared. This will cause 2 alerts to be sent out: 1 at the beginning and 1 at the 30-minute mark, even though it was resolved and fixed at the 26-minute mark. This does not seem intuitive to me.

This is especially evident if you set the repeat_rate_seconds to be less than the "alerts.min-refresh-period", since there will be no way to keep from getting multiple alerts.

Pushover: Resolved alerts are sent as emergency alerts

This is a follow-up from #90:

Well, now that you made me look at it again, I wonder whether "resolved" notifications should not be emergencies :) (@juliusv)

Confirmed, this totally should be the case! Shocked me a little bit to receive three new emergency notifications after the first bunch… xD

(screenshot attached in the original issue, timestamped 2015-08-01 22:45:39)

Redirect /${pathPrefix} to /${pathPrefix}/

Since you require that pathPrefix have a / both before and after it, a request to /${pathPrefix} (without the trailing slash) does not match. I find this somewhat annoying because I sometimes forget to type the trailing /. I think it would be nice to redirect users to the URL with the trailing /.

In Prometheus, everything that would otherwise be a 404 is redirected. I would not advocate doing that here, because it is somewhat strange behavior to not have a 404 page, but doing it just for /${pathPrefix} to /${pathPrefix}/ seems relatively reasonable and is likely what the user meant to type.
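
As a rough illustration (not the actual Alertmanager routing code), a trailing-slash redirect with net/http could look like the sketch below; the /alertmanager prefix and the handler are placeholders:

// Sketch: redirect a bare /${pathPrefix} request to /${pathPrefix}/.
package main

import (
	"log"
	"net/http"
)

func redirectToSlash(prefix string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A request for the bare prefix (e.g. "/alertmanager") is bounced
		// to the canonical "/alertmanager/" that the mux actually serves.
		if r.URL.Path == prefix {
			http.Redirect(w, r, prefix+"/", http.StatusMovedPermanently)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/alertmanager/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("alertmanager UI\n"))
	})
	log.Fatal(http.ListenAndServe(":9093", redirectToSlash("/alertmanager", mux)))
}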

When one of my docker container fail I get the alert as much time as my number of containers

here is my prometheus.yml

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  external_labels:
    monitor: 'twitter-apps-monitor'

rule_files:
  - '/etc/prometheus/alert.rules'

scrape_configs:
    - job_name: 'streamapp'
      scrape_interval: 5s
      scrape_timeout: 10s
      target_groups:
        - targets: ['streamapp-container:3002']
          labels:
            job: 'streamapp'
    - job_name: 'dashboardapp'
      scrape_interval: 5s
      scrape_timeout: 10s
      target_groups:
        - targets: ['dashboardapp-container:3000']
          labels:
            job: 'dashboardapp'
    - job_name: 'analyzeapp'
      scrape_interval: 5s
      scrape_timeout: 10s
      target_groups:
        - targets: ['analyzeapp-container:3001']
          labels:
            job: 'analyzeapp'

All the containers are Sinatra apps. I have set up a '/metrics' route on each service to make alerting behave correctly.
If I do not provide a '/metrics' route on a Sinatra app container, the container is detected as dead and the alert is sent immediately.
So once the route is set, everything looks good, but when I stop one container (say dashboardapp), all 3 alerts turn red and I receive 3 messages via Slack, all saying that dashboardapp is down.

Am I doing things right?
It seems strange that I have to provide a '/metrics' route, and it is also strange that I receive the same message 3 times.
I am using cAdvisor for the Prometheus container metrics.

Support SNMP trap notifications

I am logging this in the hopes of getting to it soon. I am currently setting up Prometheus at DST Systems for my group, but an SNMP trap is the only external notification channel we have when we aren't at work. We are trying to get something like PagerDuty, but if we don't, I'll need to write something that submits an SNMP trap to an agent.

I may start with something really rough using os/exec and the snmptrap command. I haven't found any good Go SNMP libraries with trap support, but if anyone knows of one, please point me to it.
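
For what it's worth, a very rough sketch of that os/exec approach is below. The community string, target host, OIDs and message are all placeholders, and the flags assume a net-snmp style snmptrap command:

// Rough sketch: shell out to the net-snmp snmptrap command to send a trap.
// All values below (community, host, OIDs, message) are placeholders.
package main

import (
	"fmt"
	"os/exec"
)

func sendTrap(host, community, message string) error {
	cmd := exec.Command("snmptrap",
		"-v", "2c",      // SNMP version 2c
		"-c", community, // community string
		host,            // trap receiver
		"",              // uptime: empty lets snmptrap fill it in
		"1.3.6.1.4.1.8072.2.3.0.1",               // placeholder trap OID
		"1.3.6.1.4.1.8072.2.3.2.1", "s", message, // placeholder varbind
	)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("snmptrap failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	if err := sendTrap("mgmt-host:162", "public", "alert fired"); err != nil {
		fmt.Println(err)
	}
}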

tag release version in docker repo

I think it would be better to be able to specify the exact version of Alertmanager I want to run rather than just using docker run prom/alertmanager, for example docker run prom/alertmanager:0.0.4.

It would also be nice to have MAJOR and MINOR tags, so that I can pin to something like 0.0.X or 0.X.X.

Pushover uses normal prio instead of "emergency"

As an ops person getting alerts from the Alertmanager, I use Pushover to get notified. In the current code the notification is sent with the default (normal) priority, which does not trigger a notification on the alarm channel, so I'm not woken up.

The priority should either be set to pushover.Emergency or be made configurable.
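
For reference, a minimal sketch of sending an emergency-priority message via the Pushover REST API (priority 2, which also requires retry and expire parameters). The token and user key are placeholders, and this is not the notifier code itself:

// Sketch: send a Pushover message with emergency priority (2).
// Emergency messages must include retry (seconds between re-alerts)
// and expire (how long to keep retrying before giving up).
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func sendEmergency(token, user, message string) error {
	resp, err := http.PostForm("https://api.pushover.net/1/messages.json", url.Values{
		"token":    {token},
		"user":     {user},
		"message":  {message},
		"priority": {"2"},    // emergency: repeats until acknowledged
		"retry":    {"60"},   // re-alert every 60 seconds
		"expire":   {"3600"}, // give up after an hour
	})
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("pushover returned %s", resp.Status)
	}
	return nil
}

func main() {
	if err := sendEmergency("APP-TOKEN", "USER-KEY", "InstanceDown firing"); err != nil {
		fmt.Println(err)
	}
}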

use go generate and not xxd

I don't have the xxd package on my distro, and as far as I can see the build requires Go 1.5, so maybe we should use go generate to embed the data instead?
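
To illustrate the proposal (this is not how the repository currently embeds assets): a go:generate directive could invoke a small pure-Go generator that writes the file contents into a Go source file, with no xxd dependency. Everything below, including paths and names, is hypothetical:

// Hypothetical generator, invoked via a directive such as
//
//	//go:generate go run gen_assets.go
//
// in the package that needs the embedded data. It writes web/blob/files.go
// containing the asset bytes, with no dependency on xxd.
package main

import (
	"fmt"
	"os"
)

func main() {
	data, err := os.ReadFile("web/templates/alerts.html") // hypothetical asset path
	if err != nil {
		panic(err)
	}
	out, err := os.Create("web/blob/files.go") // hypothetical output file
	if err != nil {
		panic(err)
	}
	defer out.Close()
	// %q emits the bytes as a quoted Go string literal.
	fmt.Fprintf(out, "package blob\n\nvar alertsHTML = %q\n", data)
}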

Unable to make the project

I am unable to make the project; I get the following error:

GOROOT=/usr/local/Cellar/go/1.4.2/libexec GOPATH=/Users/dummyuser/workspace/alertmanager/.build/gopath /usr/local/Cellar/go/1.4.2/libexec/bin/go get -d
go: missing Mercurial command. See http://golang.org/s/gogetcmd
package bitbucket.org/ww/goautoneg: exec: "hg": executable file not found in $PATH
make: *** [dependencies-stamp] Error 1

Nil pointer dereference

I have had an alert going off for a while. It is just a warning, and it is flapping back and forth. I think this is possibly what causes the nil pointer. I am not sure what is needed to make this issue clearer, so tell me if I can help.

I notice there are a lot of log lines about 'Recomputing notifications'.
If I run increase(prometheus_notifications_latency_milliseconds_count[10m]) I get 40, which implies that Prometheus is sending a notification roughly every 15 seconds (40 over 10 minutes). There are 3 alerts going off, so it does not really make sense to me that I would see the 'Recomputing' message every second.

I am running version 216d9dd:0.0.4 of prom/alertmanager from the Docker registry, and version 64349aa:0.15.1 of Prometheus.

Oct 07 15:58:51 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:51Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:52 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:52Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:53 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:53Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:54 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:54Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:55 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:55Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:56 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:56Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:57 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:57Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:58 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:58Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:58:59 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:58:59Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:00 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:00Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:01 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:01Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:02 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:02Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:03 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:03Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:04 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:04Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:05 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:05Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:06 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:06Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:07 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:07Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:08 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:08Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:09 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:09Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:10 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: time="2015-10-07T15:59:10Z" level=info msg="Recomputing notification outputs (active alerts have changed)" file=manager.go line=398
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: panic: runtime error: invalid memory address or nil pointer dereference
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: [signal 0xb code=0x1 addr=0x0 pc=0x4e532d]
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 14 [running]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/manager.(*memoryAlertManager).removeExpiredAggregates(0xc208125860)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:282 +0x23d
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/manager.(*memoryAlertManager).runIteration(0xc208125860)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:395 +0x33
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/manager.(*memoryAlertManager).Run(0xc208125860)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:408 +0x9a
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by main.main
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:111 +0xd50
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 1 [chan receive, 24 minutes]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/manager.(*notifier).Dispatch(0xc2080ae6c0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:816 +0x54
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: main.main()
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:157 +0x15c0
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 19 [chan receive, 22 minutes]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: gopkg.in/fsnotify%2ev0.(*Watcher).purgeEvents(0xc208066240)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify.go:21 +0x55
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by gopkg.in/fsnotify%2ev0.NewWatcher
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:127 +0x43a
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 18 [syscall, 24 minutes]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: syscall.Syscall(0x0, 0x3, 0xc208189ee0, 0x10000, 0x0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/syscall/asm_linux_amd64.s:21 +0x5
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: syscall.read(0x3, 0xc208189ee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/syscall/zsyscall_linux_amd64.go:867 +0x6e
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: syscall.Read(0x3, 0xc208189ee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/syscall/syscall_unix.go:136 +0x58
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: gopkg.in/fsnotify%2ev0.(*Watcher).readEvents(0xc208066240)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:219 +0x12c
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by gopkg.in/fsnotify%2ev0.NewWatcher
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/gopkg.in/fsnotify.v0/fsnotify_linux.go:126 +0x420
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 13 [chan receive]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: main.func·001()
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:85 +0x84
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by main.main
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:90 +0x659
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 15 [IO wait]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net.(*pollDesc).Wait(0xc20809c840, 0x72, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net.(*pollDesc).WaitRead(0xc20809c840, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net.(*netFD).accept(0xc20809c7e0, 0x0, 0x7f6e0747ed70, 0xc2082c2b30)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/fd_unix.go:419 +0x40b
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net.(*TCPListener).AcceptTCP(0xc20804e438, 0x63a5ae, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/tcpsock_posix.go:234 +0x4e
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net/http.tcpKeepAliveListener.Accept(0xc20804e438, 0x0, 0x0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/http/server.go:1976 +0x4c
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net/http.(*Server).Serve(0xc2080665a0, 0x7f6e07482658, 0xc20804e438, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/http/server.go:1728 +0x92
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net/http.(*Server).ListenAndServe(0xc2080665a0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/http/server.go:1718 +0x154
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: net/http.ListenAndServe(0x90ecf0, 0x5, 0x0, 0x0, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /usr/local/go/src/net/http/server.go:1808 +0xba
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/web.WebService.ServeForever(0xc2080aef90, 0xc2080aefc0, 0xc20802fc20, 0xc208099360, 0x90ecf0, 0x5, 0x7ffff82c9e5a, 0x8, 0x0, 0x0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/web/web.go:78 +0x8e6
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by main.main
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:145 +0x12f1
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: goroutine 16 [select, 24 minutes]:
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: github.com/prometheus/alertmanager/config.(*fileWatcher).Watch(0xc208033b40, 0xc2080aeff0)
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/.build/gopath/src/github.com/prometheus/alertmanager/config/watcher.go:49 +0x8fb
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: created by main.main
Oct 07 15:59:11 ip-172-31-23-215.us-west-2.compute.internal docker[18598]: /app/main.go:154 +0x14b0

Every time the alertmanager panics, it restarts. It would be nice to know what is going on.

Email client should attempt to use STARTTLS whenever it's offered

Forgive my ignorance if this is untrue, since I'm pretty unfamiliar with Go; my understanding from reading through manager/notifier.go is that AUTH is sent to the server shortly after the initial connection, and that STARTTLS follows after AUTH. (Host/port information is extracted in the getSMTPAuth function to set up the TLS config.)

Both for security reasons (especially with PLAIN authentication) and because some providers mandate it, a STARTTLS command should be issued prior to AUTH. This would give the password a reasonable level of protection in transit and also fix issues with providers such as Gmail/Google that require STARTTLS before AUTH.

Currently using AlertManager with Gmail results in the following error:
530 5.7.0 Must issue a STARTTLS command first. gs7sm7571158pbc.6 - gsmtp
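
For reference, a minimal net/smtp sketch of the requested order, with STARTTLS negotiated before AUTH. The host, credentials and addresses are placeholders, and this is not the notifier.go code:

// Sketch: connect, upgrade to TLS via STARTTLS if the server offers it,
// and only then authenticate, so credentials never cross the wire in plain text.
package main

import (
	"crypto/tls"
	"log"
	"net/smtp"
)

func main() {
	const host = "smtp.gmail.com" // placeholder SMTP host
	c, err := smtp.Dial(host + ":587")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Quit()

	if ok, _ := c.Extension("STARTTLS"); ok {
		if err := c.StartTLS(&tls.Config{ServerName: host}); err != nil {
			log.Fatal(err)
		}
	}
	// AUTH is only issued after the connection has been upgraded.
	auth := smtp.PlainAuth("", "user@example.com", "app-password", host)
	if err := c.Auth(auth); err != nil {
		log.Fatal(err)
	}

	if err := c.Mail("user@example.com"); err != nil {
		log.Fatal(err)
	}
	if err := c.Rcpt("oncall@example.com"); err != nil {
		log.Fatal(err)
	}
	w, err := c.Data()
	if err != nil {
		log.Fatal(err)
	}
	w.Write([]byte("Subject: test alert\r\n\r\nhello"))
	w.Close()
}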

Add sns notification support back

Hi all,
I'm trying to send notifications to an SNS topic from Alertmanager to trigger autoscaling of ECS.
Slack notifications work fine,
but when I try to enable SNS notifications, the alertmanager panics with this error:
time="2015-11-18T15:20:11Z" level=info msg="Sent 1 notifications" file=manager.go line=353
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x6b51d2]

goroutine 1 [running]:
github.com/aws/aws-sdk-go/service/sns.New(0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/app/.build/gopath/src/github.com/aws/aws-sdk-go/service/sns/service.go:53 +0x72
github.com/prometheus/alertmanager/manager.(*notifier).sendAmazonSnsNotification(0xc820194ae0, 0x1, 0xc820195560, 0xc82017c580, 0x0, 0x0)
/app/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:464 +0x55
github.com/prometheus/alertmanager/manager.(*notifier).handleNotification(0xc820194ae0, 0xc82017c580, 0x1, 0xc8200d9500)
/app/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:809 +0x853
github.com/prometheus/alertmanager/manager.(*notifier).Dispatch(0xc820194ae0)
/app/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:817 +0x86
main.main()
/app/main.go:159 +0x1280

goroutine 17 [syscall, 1 minutes, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

i"m using Docker images with this release:

branch master
date 20151109-20:48:36
go_version 1.5.1
revision 86f5f80
user @afbbf75ae75d
version 0.0.4

and my alertmanager.conf is this:

notification_config {
  name: "alertmanager_test"
  amazon_sns_config {
    topic_arn: "arn:aws:sns:us-east-1:1234567890X:MyTopicName"
    send_resolved: true
  }
}

aggregation_rule {
  repeat_rate_seconds: 300
  notification_config_name: "alertmanager_test"
}

My goal is to authenticate to AWS using an instance role, so I created one that permits actions on AWS SNS (this is supported by the AWS SDK), but I also tried using credentials and environment variables and hit the same issue; a standalone SDK sketch follows below for comparison.

Best,
Enrico Casti
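
The nil pointer in sns.New suggests the client is being constructed without a config or session. For comparison, a standalone publish using the aws-sdk-go v1 API (relying on the default credential chain, which includes instance roles) looks roughly like the sketch below; the region and topic ARN are placeholders:

// Sketch: publish to an SNS topic with aws-sdk-go v1, relying on the default
// credential chain (environment, shared credentials, or EC2 instance role).
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sns"
)

func main() {
	sess, err := session.NewSession(&aws.Config{Region: aws.String("us-east-1")})
	if err != nil {
		log.Fatal(err)
	}
	svc := sns.New(sess) // never pass a nil config/session here

	out, err := svc.Publish(&sns.PublishInput{
		TopicArn: aws.String("arn:aws:sns:us-east-1:123456789012:MyTopicName"), // placeholder ARN
		Message:  aws.String("test notification from a standalone sketch"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("published message id:", aws.StringValue(out.MessageId))
}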

status page has absolute links instead of relative paths

I noticed that the /status page in AlertManager has broken CSS. It turned out to be because the CSS links are incorrect; a small illustration follows the curl examples below.
We proxy to /alertmanager with nginx, but I confirmed it also misbehaves talking directly to AlertManager. This happens on at least 0.0.3 and 0.0.4. I don't see anything in the source that would cause /status to be treated differently.

We are running alertmanager -config.file=some_config.conf, not bothering to set any other command-line parameters.

  • Relative hrefs for /, /alerts and /silences. CSS loads just fine.
curl -s localhost:9093/ |grep default.css
    <link href="static/css/default.css" media="all" rel="stylesheet" type="text/css" />

curl -s localhost:9093/silences |grep default.css
    <link href="static/css/default.css" media="all" rel="stylesheet" type="text/css" />
  • Absolute href for /status, which is not correct: all links are broken and no CSS styles are loaded.
curl -s localhost:9093/status |grep default.css
    <link href="/static/css/default.css" media="all" rel="stylesheet" type="text/css" />
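
To illustrate why the absolute href breaks behind a proxy, here is a toy html/template example (not the actual Alertmanager templates): when the page is served under a prefix such as /alertmanager/, a prefix-aware or relative href stays inside that prefix, while a hard-coded leading slash escapes it.

// Toy example: render the stylesheet link with a configurable prefix.
// With pathPrefix "/alertmanager/", the href resolves to
// /alertmanager/static/css/default.css; a hard-coded "/static/..." would not.
package main

import (
	"html/template"
	"os"
)

func main() {
	t := template.Must(template.New("head").Parse(
		`<link href="{{.PathPrefix}}static/css/default.css" rel="stylesheet" type="text/css" />` + "\n"))
	for _, prefix := range []string{"", "/alertmanager/"} {
		t.Execute(os.Stdout, struct{ PathPrefix string }{PathPrefix: prefix})
	}
}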
