
Comments (8)

darkcrux commented on August 19, 2024

I haven't tried consul-alerts at this scale before. How many instances of consul-alerts are running? I think the slowdown might be caused by the sheer volume of checks being processed. There are no metrics at the moment, but I'm keen on finding out how to scale it to such a size.

from consul-alerts.

macb commented on August 19, 2024

After messing around with consul-alerts, it seems like https://github.com/AcalephStorage/consul-alerts/blob/master/consul/client.go#L223-L257 severely limits performance. In a smaller datacenter with a low four-digit number of checks, the loop appears to take about a minute. This causes https://github.com/AcalephStorage/consul-alerts/blob/master/check-handler.go#L57-L61 to take longer than expected and generally slows everything down.


akmalabbasov commented on August 19, 2024

@macb, I have a similar situation. I'm running consul-alerts on ~70 servers, with ~5 checks on each. https://github.com/AcalephStorage/consul-alerts/blob/master/consul/client.go#L223-L257 is taking ~1 minute, so it takes ~7 minutes for https://github.com/AcalephStorage/consul-alerts/blob/master/check-handler.go#L57-L61 to run. Any ideas on how to improve this? Thanks.


mar-io commented on August 19, 2024

I am using this for about 100 servers in AWS and have definitely noticed some inefficiencies and high CPU demands. I am being pretty ambitious and have servers with about 20 checks running. I have to use compute-optimized c4 instances to be able to run at this scale, and I assume that the strain is going to increase. However, I think this is a great project, and I will try to dig into the code and help where I can.


darkcrux commented on August 19, 2024

There are a few factors affecting performance.

  1. consul-alerts depends on the "watcher" feature of Consul, which watches the health checks for changes. Any status change triggers Consul to send the entire list of health checks from all nodes, and this is what consul-alerts processes (e.g. 70 servers * 5 checks = 350 checks to process every time a change is detected).
  • still thinking of a way to just get the changed health check instead of all of them
  2. The code processes the checks and sends notifications sequentially.
  • goroutines might speed things up
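
The second point above could be sketched with a small worker pool. This is a minimal sketch, not the consul-alerts code: the `Check` type, the `processAll` function, and the worker count of 8 are assumptions made for illustration, and the `handle` callback stands in for whatever per-check work (persisting state, sending notifications) the real loop does.

```go
package main

import (
	"fmt"
	"sync"
)

// Check is a hypothetical, simplified health check record.
type Check struct {
	Node   string
	ID     string
	Status string
}

// processAll runs handle on every check using a fixed number of worker
// goroutines and returns how many checks were handled.
func processAll(checks []Check, workers int, handle func(Check)) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan Check)
	var wg sync.WaitGroup
	var mu sync.Mutex
	handled := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				handle(c)
				mu.Lock()
				handled++
				mu.Unlock()
			}
		}()
	}
	for _, c := range checks {
		jobs <- c
	}
	close(jobs) // lets the workers exit once the queue drains
	wg.Wait()
	return handled
}

func main() {
	// 70 servers * 5 checks, as in the example above.
	checks := make([]Check, 0, 350)
	for n := 0; n < 70; n++ {
		for k := 0; k < 5; k++ {
			checks = append(checks, Check{
				Node:   fmt.Sprintf("node%d", n),
				ID:     fmt.Sprintf("check%d", k),
				Status: "passing",
			})
		}
	}
	n := processAll(checks, 8, func(c Check) {
		// placeholder for per-check work such as storing state or notifying
	})
	fmt.Println("handled", n)
}
```

A bounded pool rather than one goroutine per check keeps the number of concurrent requests against Consul predictable. The callback must be safe to call from several goroutines at once.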


ariscn commented on August 19, 2024

Any updates on this? I have a large deployment that's becoming Consul aware, and I'd love to use consul-alerts for notifications. Expected stats: ~200 servers, ~700 services, ~3000 total health checks.


fusiondog commented on August 19, 2024

I feel like the real fix for this needs to come from upstream in Consul. There is a ticket, which I can't find right now, asking Consul to return only the changed entries in watches. That would be the ideal fix, with maybe a full comparison run occasionally as a sanity check.
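
Until something like that exists upstream, a client-side approximation is to keep the previous payload and act only on entries whose status differs. The following is a rough sketch under that assumption. `Check`, `checkKey`, and `diffChecks` are hypothetical names, not part of consul-alerts or the Consul API.

```go
package main

import "fmt"

// Check is a hypothetical, simplified health check record.
type Check struct {
	Node, ID, Status string
}

// checkKey identifies a check within a snapshot.
type checkKey struct {
	Node, ID string
}

// diffChecks returns the checks in curr that are new or whose status differs
// from prev, together with the snapshot to keep for the next comparison.
func diffChecks(prev map[checkKey]string, curr []Check) ([]Check, map[checkKey]string) {
	next := make(map[checkKey]string, len(curr))
	var changed []Check
	for _, c := range curr {
		k := checkKey{c.Node, c.ID}
		next[k] = c.Status
		if old, ok := prev[k]; !ok || old != c.Status {
			changed = append(changed, c)
		}
	}
	return changed, next
}

func main() {
	first := []Check{{"node1", "disk", "passing"}, {"node1", "mem", "passing"}, {"node2", "disk", "passing"}}
	second := []Check{{"node1", "disk", "passing"}, {"node1", "mem", "critical"}, {"node2", "disk", "passing"}}

	// First payload: nothing to compare against, so every check counts as new.
	changed, snap := diffChecks(nil, first)
	fmt.Println("first payload, changed:", len(changed))

	// Second payload: only node1/mem has a different status.
	changed, _ = diffChecks(snap, second)
	fmt.Println("second payload, changed:", changed)
}
```

This still receives the full list from the watch, so it saves processing rather than network traffic. It also does not report checks that disappear between payloads. An occasional full pass, as suggested above, would cover that case.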


rhuddleston commented on August 19, 2024

I noticed that consul-alerts takes considerable resources on our server, and Consul itself gets very busy with writes when we have consul-alerts running, so I traced it for a minute in our test environment and made these observations:

The main issue seems to be all the writes it produces.
It seems to loop over every check and rewrite the content each time, even though the content likely didn't change:

count URL prefix
680 PUT /v1/kv/consul-alerts/checks

count URL prefix
136 PUT /v1/kv/consul-alerts/checks/ecs-1269316829
120 PUT /v1/kv/consul-alerts/checks/ecs-205916921
104 PUT /v1/kv/consul-alerts/checks/ecs-2743417484
104 PUT /v1/kv/consul-alerts/checks/ecs-3237410996
72 PUT /v1/kv/consul-alerts/checks/node1
72 PUT /v1/kv/consul-alerts/checks/node2
72 PUT /v1/kv/consul-alerts/checks/node3

At a minimum, it's already reading these keys on every loop, so it should know whether the content has changed. Doing these writes each time seems to put the largest overhead on Consul.
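
A minimal sketch of that idea, comparing the stored value before issuing a write, might look like the following. The `KV` interface, the in-memory `memKV` stand-in, `putIfChanged`, and the key layout under `consul-alerts/checks/` are assumptions for illustration. The real code would talk to Consul's KV store instead.

```go
package main

import (
	"bytes"
	"fmt"
)

// KV is a hypothetical minimal key/value interface standing in for the
// Consul KV store.
type KV interface {
	Get(key string) ([]byte, bool)
	Put(key string, value []byte)
}

// memKV is an in-memory KV that counts writes, used only for this sketch.
type memKV struct {
	data map[string][]byte
	puts int
}

func (m *memKV) Get(key string) ([]byte, bool) {
	v, ok := m.data[key]
	return v, ok
}

func (m *memKV) Put(key string, value []byte) {
	m.data[key] = append([]byte(nil), value...)
	m.puts++
}

// putIfChanged writes value only when it differs from what is already
// stored, and reports whether a write was issued.
func putIfChanged(kv KV, key string, value []byte) bool {
	if old, ok := kv.Get(key); ok && bytes.Equal(old, value) {
		return false
	}
	kv.Put(key, value)
	return true
}

func main() {
	kv := &memKV{data: map[string][]byte{}}
	key := "consul-alerts/checks/node1/disk" // illustrative key layout

	// Three passes with identical content, then one real change.
	for pass := 0; pass < 3; pass++ {
		putIfChanged(kv, key, []byte(`{"status":"passing"}`))
	}
	putIfChanged(kv, key, []byte(`{"status":"critical"}`))

	fmt.Println("writes issued:", kv.puts)
}
```

The read-then-write here is not atomic. If several consul-alerts instances can update the same key, Consul's check-and-set option on KV writes (the `cas` parameter) would be the safer way to guard the update.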

Some other observations in this capture:

688 calls to the reminders prefix:
GET /v1/kv/consul-alerts/reminders

Given that we have nothing under that prefix, we could just do /v1/kv/consul-alerts/reminders?recurse in one shot.

consul-alerts is doing 1375 calls to checks:
GET /v1/kv/consul-alerts/checks

We seem to do this in more than one place, and it could make sense to do it in one shot.

We are doing 2674 calls into config:
GET /v1/kv/consul-alerts/config/checks/blacklist

Again, one shot would be better, as we have nothing in the blacklist.

A couple of these changes could greatly reduce the overhead of running consul-alerts.
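
The "one shot" reads could be sketched as a single recursive GET per prefix, with the result reused for the rest of the pass. This is only a sketch: `listPrefix`, `kvEntry`, and `fakeAgentClient` are hypothetical names, the base URL is a placeholder, and the fake client answers in-process so the example runs without a Consul agent. It assumes the KV endpoint returns a JSON array of entries with base64-encoded values, and a 404 when nothing exists under the prefix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// kvEntry holds the fields of a KV entry used here. Values are base64 in
// the JSON payload; encoding/json converts them to and from []byte.
type kvEntry struct {
	Key   string
	Value []byte
}

// listPrefix fetches every key under prefix with one recursive GET.
// A 404 is treated as "nothing under this prefix".
func listPrefix(client *http.Client, baseURL, prefix string) (map[string][]byte, error) {
	resp, err := client.Get(baseURL + "/v1/kv/" + prefix + "?recurse")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	out := map[string][]byte{}
	if resp.StatusCode == http.StatusNotFound {
		return out, nil
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var entries []kvEntry
	if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
		return nil, err
	}
	for _, e := range entries {
		out[e.Key] = e.Value
	}
	return out, nil
}

type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeAgentClient returns a client that answers KV requests in-process from
// data. It exists only so the sketch can run without a real agent.
func fakeAgentClient(data map[string]string) *http.Client {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		prefix := strings.TrimPrefix(r.URL.Path, "/v1/kv/")
		var entries []kvEntry
		for k, v := range data {
			if strings.HasPrefix(k, prefix) {
				entries = append(entries, kvEntry{Key: k, Value: []byte(v)})
			}
		}
		if len(entries) == 0 {
			http.NotFound(w, r)
			return
		}
		json.NewEncoder(w).Encode(entries)
	})
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, r)
		return rec.Result(), nil
	})}
}

func main() {
	client := fakeAgentClient(map[string]string{
		"consul-alerts/checks/node1/disk": `{"status":"passing"}`,
		"consul-alerts/checks/node2/disk": `{"status":"critical"}`,
	})
	base := "http://consul.invalid:8500" // placeholder address

	checks, err := listPrefix(client, base, "consul-alerts/checks")
	fmt.Println("checks fetched in one request:", len(checks), err)

	reminders, err := listPrefix(client, base, "consul-alerts/reminders")
	fmt.Println("reminders fetched in one request:", len(reminders), err)
}
```

With the results held in a map for the duration of a pass, the per-check lookups for reminders and the blacklist become local map reads instead of separate HTTP calls.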
