Comments (10)
This has been on the backlog for a while... waiting for someone with a real use case to request it. Suggestions on how it might work in practice are welcome.
I'm looking at using a plugin to suppress the alert if it meets certain criteria -- but what criteria and how it is determined (and updated) is open to debate.
from alerta.
I was thinking about this. The first thing that came to mind is a list of regular expressions. If an alert matches, note that it's in maintenance mode instead of doing the normal behavior.
alerta-api gets CRUD for maintenance mode expressions. The Python client and web UI get enhancements to deal with them.
from alerta.
I'm reluctant to support a list of rules based on regular expressions because it then becomes a case of iterating through the list of rules, running each regex and seeing if any match. This could potentially be very bad for performance. Alternatively, you could use an event stream processor, but that is on another level of complexity.
The simple alternative, as far as I can tell, is to use equality match rules that work as a simple query. Alerta defines many different alert attributes that can be used to group alerts, and it is these attributes that I have used to define rules, e.g. `environment`, `service`, `group`. However, `resource` and `event` attributes are still supported for situations that require that level of granularity.
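As a rough illustration of the equality-match idea, here is a minimal sketch. The function name, the rule shape (a plain dict) and the handling of list-valued attributes are assumptions made for this example, not the code in the actual change.

```python
# Hypothetical sketch of equality-match blackout rules.
# RULE_ATTRIBUTES and rule_matches are illustrative names, not Alerta's API.
RULE_ATTRIBUTES = ("environment", "service", "group", "resource", "event")


def rule_matches(rule, alert):
    """Return True if every attribute set in the rule equals the alert's value."""
    for attr in RULE_ATTRIBUTES:
        if attr not in rule:
            continue  # attribute not constrained by this rule
        wanted = rule[attr]
        actual = alert.get(attr)
        # Assumption: an attribute may hold a list of values on the alert,
        # in which case the rule value only needs to be one of them.
        if isinstance(actual, (list, tuple, set)):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


alert = {
    "environment": "Production",
    "service": ["Web"],
    "group": "Network",
    "resource": "host1",
    "event": "NodeDown",
}

print(rule_matches({"environment": "Production", "service": "Web"}, alert))
print(rule_matches({"resource": "host2"}, alert))
```

Each check is a plain comparison or membership test, so no regular expression has to be compiled or run per rule.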
I have also supported the use of `tags` to define a blackout rule which should allow a lot of flexibility -- one or more tags can be required to match an alert for the suppression to apply. And tags can be added at source, using the `alerta` CLI, or using a plug-in.
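The tag requirement could be sketched as a subset test: the rule applies only when every tag it requires is present on the alert. The function name below is hypothetical and this is only one plausible reading of "one or more tags can be required".

```python
# Hypothetical sketch: a blackout rule requiring one or more tags.
def tags_match(required_tags, alert_tags):
    """Return True if the rule requires at least one tag and the alert has them all."""
    required = set(required_tags)
    # An empty requirement matches nothing, so a blank rule cannot mute everything.
    return bool(required) and required <= set(alert_tags)


print(tags_match(["frontend"], ["frontend", "dc1"]))
print(tags_match(["frontend", "maintenance"], ["frontend"]))
```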
I have only done the work on the API at present. Feedback on this would be welcome before I start work on adding support to the web UI and CLI tool.
from alerta.
That makes sense.
I think if my users could say something like, "mute all alerts where resource=hostname", that would work. Or "service=some-service".
from alerta.
The reason I initially thought of regexes is that users might not be certain what the 'resource' is going to look like, i.e. did they correctly use the FQDN or the short name? Regexes would also make it easier to do something like mute hostname[1-30].company.com.
from alerta.
I completely understand the requirement for something like `hostname[1-30].company.com`, however this is very difficult to do efficiently. I would suggest that to achieve this you would tag those 30 hosts with whatever makes them common (e.g. `frontend`) and use that tag to add/remove them from maintenance.
Using tags is much more flexible as well -- what if you add `hostname31.company.com`? Then you'd need to update your regex to `hostname[1-31].company.com`, but if you used tags it would match without any extra work.
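To make the comparison concrete, here is a small sketch. The `hostname[1-30]` notation in this thread is shorthand for a numeric range; read literally as a regular expression, `[1-30]` is a character class that matches a single character from 0 to 3, so a real pattern for hosts 1 to 30 needs an explicit alternation. The host data and function names are invented for the example.

```python
import re

# A real regex for hostname1..hostname30 (the thread's [1-30] is shorthand).
HOST_RANGE = re.compile(r"hostname([1-9]|[12][0-9]|30)\.company\.com")

# Hypothetical inventory: every frontend host carries the same tag.
hosts = {
    "hostname30.company.com": ["frontend"],
    "hostname31.company.com": ["frontend"],  # newly added host
}


def muted_by_regex(resource):
    """Mute only if the whole resource name falls inside the hard-coded range."""
    return HOST_RANGE.fullmatch(resource) is not None


def muted_by_tag(tags, blackout_tag="frontend"):
    """Mute any alert carrying the blackout tag, whatever the host is called."""
    return blackout_tag in tags


for name, tags in hosts.items():
    print(name, muted_by_regex(name), muted_by_tag(tags))
```

The new host falls outside the regex until the pattern is edited, while the tag rule covers it as soon as the host is tagged.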
from alerta.
Right now, I'm roughly achieving this by manually editing the pagerduty.py script which sends pages out. If I know something is going into maintenance and I don't want it to page out, I just pop a line in the script to catch whatever it is, a host, a service, an environment, etc, and then not continue. Obviously this is a horrible way of accomplishing it, and it also requires a restart of the alerta server, but it works until we get something real into Alerta.
I think tags are a good way of looking at this for putting certain services into maintenance. But what about a specific host that contains multiple services when those services are also part of a cluster with other nodes? I don't want to put the cluster as a whole into maintenance, just the service with that name on a specific node, or possibly anything coming from that specific node.
from alerta.
Actually, it looks like your PR #112 pretty much knocks out everything I'd personally need. 👍
from alerta.
@bcwilsondotcom the only combination you mention not currently supported by #112 would be putting into maintenance only certain services from a specific node. Something like this combination could be added if there was no other way to match this category of alerts with the currently proposed rules.
from alerta.
This is now available for use as version 4.5.0 (both server and client versions). The web UI has also been updated -- the "blackouts" page is a menu option under "Configuration".
If you have any problems with it or would like changes/enhancements please raise a new issue.
from alerta.