
Comments (4)

keitwb commented on August 13, 2024

Sure, having metrics about the agent and its monitors themselves is a good idea; I've been thinking about doing it for a while. Events would provide an exact error message, but metrics are a lot more scalable in our backend, and events aren't really intended for potentially high-frequency error logs in our current product offering anyway.

Therefore, having metrics for monitors that track failures to connect to or read from a monitored service is the best thing to do. Which monitors/collectd plugins specifically are you dealing with the most? I can try adding some metrics for a few of them and see how it works for you.

I think maintaining the current logging output, though, is pretty essential for basic debugging of the agent and for getting more context about errors directly from the agent. Not sure how much control you have over log retention, but I can't imagine there is much value in keeping agent logs for more than a few days anyway, unless you are trying to do some kind of analytics on them. I know you were talking about filtering in #719, and that is probably your best bet for this kind of thing. You can filter out the most common stuff (e.g. a connection refused/timeout error) but still let through rarer error conditions (e.g. a 500 response from a diagnostic endpoint on a service) so that they show up in logs.

from signalfx-agent.

MovieStoreGuy commented on August 13, 2024

Yeah, I want to use this to reduce the amount of stuff ending up in our logging system.
I was having a discussion elsewhere about having detectors for events, but I gathered that is still some time away.

So the monitors I have mainly used that have been creating a lot of noise are:

  • expvar
  • collectd/genericjmx

We also get a decent amount of "EOF" or "timed out awaiting headers" errors. We don't really care about those per se, and we definitely don't want service owners to freak out that there is an issue when it is just a NOOP.

I was thinking about the dimensions that would be useful and the ones I have are:

  • monitor
  • monitor_version
  • agent_version

That way we can see whether bumping the agent version increases or decreases issues as we run the smart agent across all of our instances.
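As a rough illustration of how a failure metric with those three dimensions might be tracked, here is a small Go sketch. The `FailureDims` and `FailureCounter` types are hypothetical and not part of the agent, and the version strings in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// FailureDims is a hypothetical dimension set built from the list above.
type FailureDims struct {
	Monitor        string
	MonitorVersion string
	AgentVersion   string
}

// FailureCounter keeps one cumulative failure count per dimension set.
type FailureCounter struct {
	mu     sync.Mutex
	counts map[FailureDims]int64
}

// NewFailureCounter returns an empty counter that is safe for concurrent use.
func NewFailureCounter() *FailureCounter {
	return &FailureCounter{counts: make(map[FailureDims]int64)}
}

// Inc records one failed collection attempt for the given dimensions.
func (c *FailureCounter) Inc(d FailureDims) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[d]++
}

// Count returns the number of failures recorded for the given dimensions.
func (c *FailureCounter) Count(d FailureDims) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[d]
}

func main() {
	c := NewFailureCounter()
	// Placeholder versions, only to show two agent versions side by side.
	before := FailureDims{Monitor: "expvar", MonitorVersion: "1", AgentVersion: "A"}
	after := FailureDims{Monitor: "expvar", MonitorVersion: "1", AgentVersion: "B"}

	c.Inc(before)
	c.Inc(before)
	c.Inc(after)

	fmt.Println("failures on agent A:", c.Count(before))
	fmt.Println("failures on agent B:", c.Count(after))
}
```

Because `agent_version` is part of the key, failures from different agent versions land in separate series and can be compared directly after an upgrade.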

We log an insane amount per day, so anything we can do to reduce that is always appreciated by those who maintain our logging system. Also, while we are on the topic of logging: how hard a change would it be to make the logging format a config option?


keitwb commented on August 13, 2024

Ok, I'll look at expvar since it would be the easiest to generate metrics for starting out.

Would it make any difference if I changed connection errors to monitored services so that they are logged at WARN (or even INFO) level instead of ERROR level? I don't think it is feasible to totally stop logging those kinds of things, even with metrics, because of how convenient it is to be able to spot misconfiguration and/or see the exact details of the error (e.g. connection timeout, response read timeout, connection refused) by looking at the log output.

> Also, while we are on the topic of logging. How hard of a change would it be to switch logging format as a config option?

It is pretty easy since we use logrus for logging. We already use its structured logging mechanism quite a bit, so you would get some amount of metadata out of the box (e.g. monitorType is a common field we use). Would you want JSON output?


MovieStoreGuy commented on August 13, 2024

> Ok, I'll look at expvar since it would be the easiest to generate metrics for starting out.

Sweet, keep me in the loop on this because I would love to know.

Yeah, it would be sick if you could drop the log level for failures to collect metrics. Currently we have the level set to error.

Is there a standard you guys follow for which things are logged at which level?
Or for how many characters should be in a log message?
I did have a look at how to configure the logger to emit logs as JSON rather than text, but I got distracted doing other things.

