Comments (10)

satterly avatar satterly commented on May 11, 2024

This has been on the backlog for a while... waiting for someone with a real use case to request it. Suggestions on how it might work in practice are welcome.

I'm looking at using a plugin to suppress the alert if it meets certain criteria -- but what criteria and how it is determined (and updated) is open to debate.

from alerta.

blysik avatar blysik commented on May 11, 2024

I was thinking about this. The first thing that came to mind is a list of regular expressions. If an alert matches, note that it's in maintenance mode instead of doing the normal behavior.

alerta-api gets CRUD for maintenance mode expressions. The Python client and web UI get enhancements to deal with them.

satterly avatar satterly commented on May 11, 2024

I'm reluctant to support a list of rules based on regular expressions because it then becomes a case of iterating through the list of rules, running the regex and seeing if any match. This could potentially be very bad for performance. Alternatively, you use an event stream processor but that is on another level of complexity.

The simple alternative, as far as I can tell, is to use equality match rules that work as a simple query. Alerta defines many different alert attributes that can be used to group alerts, and it is these attributes that I have used to define rules, e.g. environment, service, group. However, the resource and event attributes are still supported for situations that require that level of granularity.

I have also supported the use of tags to define a blackout rule which should allow a lot of flexibility -- one or more tags can be required to match an alert for the suppression to apply. And tags can be added at source, using the alerta CLI, or using a plug-in.
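
To make the equality-and-tags idea concrete, here is a minimal sketch of how such a rule could be checked against an alert. It is not the code from the PR: the function name `rule_matches` and the dict layout of rules and alerts are assumptions made for illustration only.

```python
# Hypothetical sketch: rules and alerts are plain dicts, not Alerta's real models.
SCALAR_ATTRS = ("environment", "group", "resource", "event")


def rule_matches(rule: dict, alert: dict) -> bool:
    """Return True if every attribute set on the rule equals the alert's value."""
    # Plain equality checks, no regex evaluation per rule.
    for attr in SCALAR_ATTRS:
        if attr in rule and alert.get(attr) != rule[attr]:
            return False
    # An alert can list several services; the rule's service must be one of them.
    if "service" in rule and rule["service"] not in alert.get("service", []):
        return False
    # Every tag required by the rule must be present on the alert.
    return set(rule.get("tags", [])) <= set(alert.get("tags", []))


alert = {
    "environment": "Production",
    "service": ["Web"],
    "resource": "web01",
    "event": "HttpError",
    "tags": ["frontend", "london"],
}
rule = {"environment": "Production", "tags": ["frontend"]}
suppressed = rule_matches(rule, alert)
```

A rule that sets no attributes matches every alert in this sketch, so a real implementation would probably want to require at least one attribute.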

I have only done the work on the API at present. Feedback on this would be welcome before I start work on adding support to the web UI and CLI tool.

blysik avatar blysik commented on May 11, 2024

That makes sense.

I think if my users could say something like, "mute all alerts where resource=hostname", that would work. Or "service=some-service".

blysik avatar blysik commented on May 11, 2024

The reason I initially thought of regexes is that users might not be certain what the 'resource' was going to look like, i.e., did they correctly use the FQDN or the short name? Regexes would also make it easier to do something like mute hostname[1-30].company.com.

satterly avatar satterly commented on May 11, 2024

I completely understand the requirement for something like hostname[1-30].company.com, but this is very difficult to do efficiently. To achieve this, I would suggest tagging those 30 hosts with whatever they have in common (e.g. frontend) and using that tag to add them to or remove them from maintenance.

Using tags is also much more flexible. If you add hostname31.company.com, you'd need to update your regex to hostname[1-31].company.com, but if you used tags it would match without any extra work.
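
As a rough illustration of that difference, the sketch below mutes hosts once by pattern and once by tag. Everything in it is hypothetical (the helper names, the `frontend` tag, the alert dict). Note also that in real regex syntax `[1-30]` is a character class rather than a numeric range, so the pattern for hosts 1 to 30 has to be spelled out.

```python
import re

# Hosts 1..30 written as a real regex; "[1-30]" would only match one character.
HOST_1_TO_30 = re.compile(r"hostname([1-9]|[12][0-9]|30)\.company\.com")


def muted_by_regex(resource: str) -> bool:
    """Pattern-based mute: the pattern must be edited whenever the range grows."""
    return HOST_1_TO_30.fullmatch(resource) is not None


def muted_by_tag(alert_tags: list, maintenance_tag: str = "frontend") -> bool:
    """Tag-based mute: any alert carrying the tag is covered."""
    return maintenance_tag in alert_tags


# A newly added host that was tagged at source.
new_host_alert = {"resource": "hostname31.company.com", "tags": ["frontend"]}

regex_result = muted_by_regex(new_host_alert["resource"])
tag_result = muted_by_tag(new_host_alert["tags"])
```

Here the new host falls outside the pattern until someone updates it, while the tag check covers it as soon as the host is tagged.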

bcwilsondotcom avatar bcwilsondotcom commented on May 11, 2024

Right now, I'm roughly achieving this by manually editing the pagerduty.py script which sends pages out. If I know something is going into maintenance and I don't want it to page out, I just pop a line in the script to catch whatever it is, a host, a service, an environment, etc, and then not continue. Obviously this is a horrible way of accomplishing it, and it also requires a restart of the alerta server, but it works until we get something real into Alerta.

I think tags are a good way of looking at this for putting certain services into maintenance. But what about a specific host that runs multiple services when those services are also part of a cluster with other nodes? I don't want to put the whole cluster into maintenance, just the service with that name on a specific node, or possibly anything coming from that specific node.

bcwilsondotcom avatar bcwilsondotcom commented on May 11, 2024

Actually, it looks like your PR #112 pretty much knocks out everything I'd personally need. 👍

satterly avatar satterly commented on May 11, 2024

@bcwilsondotcom the only combination you mention that #112 does not currently support is putting only certain services on a specific node into maintenance. A combination like this could be added if there is no other way to match this category of alerts with the currently proposed rules.
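
For discussion, one possible shape for such a combined rule is sketched below. This is not part of #112; the class name `NodeServiceRule` and its fields are made up for illustration, and an empty service set is assumed to mean "anything coming from that node".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeServiceRule:
    """Hypothetical rule: suppress selected services on one specific node."""
    resource: str
    services: frozenset = field(default_factory=frozenset)

    def matches(self, alert: dict) -> bool:
        # The alert must come from the node named in the rule.
        if alert.get("resource") != self.resource:
            return False
        # No services listed: suppress everything from this node.
        if not self.services:
            return True
        # Otherwise at least one of the alert's services must be listed.
        return bool(self.services & set(alert.get("service", [])))


rule = NodeServiceRule(resource="node7", services=frozenset({"search"}))
on_node = {"resource": "node7", "service": ["search"]}
other_node = {"resource": "node8", "service": ["search"]}
results = (rule.matches(on_node), rule.matches(other_node))
```

The rest of the cluster stays unaffected because the resource check fails for every other node.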

satterly avatar satterly commented on May 11, 2024

This is now available for use as version 4.5.0 (both server and client versions). The web UI has also been updated -- the "blackouts" page is a menu option under "Configuration".

If you have any problems with it or would like changes/enhancements please raise a new issue.