Comments (11)

darkcrux avatar darkcrux commented on August 19, 2024

sorry @lyrixx, I think v0.3 might be buggy. Not sure if this is already resolved. Finally got time to try and fix these issues. Been away for a while. :(

from consul-alerts.

bcwilsondotcom avatar bcwilsondotcom commented on August 19, 2024

Running into this as well. I'll get alerts when a service is healthy, but sometimes I don't get an alert when the service has failed. I'd call this a critical issue as it breaks its primary function.

darkcrux avatar darkcrux commented on August 19, 2024

I think I fudged up the code in previous releases, where I switched the Critical and Success variables. It should be resolved in the later releases. Will release a new build soon.

akmalabbasov avatar akmalabbasov commented on August 19, 2024

@darkcrux, I'm running 0.3.3, and I've noticed the same behaviour. I'm getting a notification that the service is healthy, but not when it has failed.
Here you can see that the hbase_status check was reported three times in a row with a healthy status:

[consul-notifier] 2016/02/01 14:53:14 Node=test-hbase-m2, Service=hbase, Check=hbase_status, Status=passing
[consul-notifier] 2016/02/01 14:53:14 Node=test-hbase-m2, Service=hdfs, Check=hdfs_status, Status=passing
[consul-notifier] 2016/02/01 15:07:43 Node=test-hbase-m2, Service=hbase, Check=hbase_status, Status=passing
[consul-notifier] 2016/02/01 15:21:46 Node=test-hbase-m2, Service=hbase, Check=hbase_status, Status=passing

I've checked the logs of the other consul-alerts daemons, and can confirm that there was no change of leadership.

juicedM3 avatar juicedM3 commented on August 19, 2024

We're seeing the same issue as @akmalabbasov and we're running 0.3.3. We receive more Status=passing messages than critical messages. For example, we had a service go down on 8/9 @ 4am. We never received an email. The event did show in Consul's log (simple TCP check). The issue was corrected on 8/10 @ 11pm and we got the passing email.

imrangit avatar imrangit commented on August 19, 2024

This is still an issue with the latest release. We are running consul 0.6.4. We do get a bunch of HEALTHY alerts, but sometimes it fails to generate CRITICAL alerts.

fusiondog avatar fusiondog commented on August 19, 2024

The v0.3.3 release is well behind the current master. I have set a new release that is current with master. Please try that or the latest from the repository master branch and see if you still see the inconsistencies.

juicedM3 avatar juicedM3 commented on August 19, 2024

Sorry, I was using 0.3.3 as a reference point, but we were actually pulling from master. The last time we built our image was on 6/27. Looking through the commit logs, it doesn't seem like there have been many updates wrt notifications since then. We'll build a new image based on the latest from the repository.

juicedM3 avatar juicedM3 commented on August 19, 2024

We rebuilt our image with the latest and greatest and did some more digging. What we think is going on might not be related to the original poster's issue and may or may not spawn some enhancement.

So we run mesos-consul and consul-alerts. mesos-consul registers a health check for each agent and gives it a service and check ID with the Mesos Agent ID as part of the key. consul-alerts sees this and does everything as expected. The kv pair consul-alerts generates can be seen under /consul-alerts/checks/NODE/ServiceID/CheckID.

Now the agent dies and restarts. It happens within the 60 second period in which consul-alerts doesn't send out any alerts. However, mesos-consul has since generated a new service and check ID associated with that node. consul-alerts says, "Oh hey! There's a new health check registered with Consul, let's register it!". Now both the old and the new kv pairs exist under /consul-alerts/checks/NODE, and we sometimes get what appears to be a random email saying everything is healthy, even though there was no preceding unhealthy email.

Picking one of our nodes, I can see that under consul-alerts/checks there are 6 KV pairs for the same check but with different service and check IDs. 5 of them are outdated and only one of them is valid. Not sure if this would eventually cause any performance issues, but if it's consistent across our entire cluster, that's over 1,000 kv pairs that are outdated.

So it would be nice if consul-alerts could clean up after itself a little. For each KV pair it has registered, check to make sure they are still valid. I've been looking at the code for mesos-consul and you can't tell it not to use the Agent's ID. But I'm going to open a ticket with them to see if there's a solution so that mesos-consul & consul-alerts can be on the same page. The use of Agent ID throws a monkey wrench into the relationship.
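To make the cleanup idea concrete, here is a rough sketch in Go of just the comparison step. It is not how consul-alerts works today. It assumes the stored keys look like consul-alerts/checks/NODE/ServiceID/CheckID (no leading slash) and that the set of currently registered checks has already been fetched from Consul somehow. The names `staleKeys`, `checksPrefix` and the sample IDs are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checksPrefix is the KV prefix described above; the exact layout is an assumption.
const checksPrefix = "consul-alerts/checks/"

// staleKeys returns the stored keys that no longer match a live check.
// live holds "NODE/ServiceID/CheckID" strings for checks currently
// registered in Consul (hypothetical input, gathered elsewhere).
func staleKeys(stored []string, live map[string]bool) []string {
	var stale []string
	for _, key := range stored {
		id := strings.TrimPrefix(key, checksPrefix)
		if id == key {
			continue // not under the checks prefix, so leave it alone
		}
		if !live[id] {
			stale = append(stale, key)
		}
	}
	sort.Strings(stale) // deterministic order for logging and review
	return stale
}

func main() {
	// Hypothetical data: one node with an old and a new mesos-consul registration.
	stored := []string{
		"consul-alerts/checks/node1/mesos-agent-OLD/check-OLD",
		"consul-alerts/checks/node1/mesos-agent-NEW/check-NEW",
	}
	live := map[string]bool{"node1/mesos-agent-NEW/check-NEW": true}

	for _, key := range staleKeys(stored, live) {
		fmt.Println("would delete:", key)
	}
}
```

The sketch only reports candidates and deletes nothing, since a check that is briefly missing from Consul (for example during an agent restart) probably shouldn't be purged right away.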

fusiondog avatar fusiondog commented on August 19, 2024

Thanks for the follow-up and the details. When I get a chance I'll see what it would take to do the cleanup mentioned.

lyrixx avatar lyrixx commented on August 19, 2024

I don't use this tool anymore. I'm closing this issue since it's not relevant to me anymore.
Thanks for your work.
