Comments (4)
Sure, having metrics about the agent and its monitors themselves is a good idea; I've been thinking about doing it for a while. Events would provide an exact error message, but metrics are a lot more scalable in our backend, and events aren't really intended to be used for potentially high-frequency error logs anyway in our current product offering.
Therefore, having metrics for monitors that track failures to connect to or read from a monitored service is the best thing to do. Which monitors/collectd plugins specifically are you dealing with the most? I can try adding some metrics for a few of them and see how it works for you.
I think maintaining the current logging output though is pretty essential for basic debugging of the agent and to know more context about the errors directly from the agent. Not sure how much control you have over log retention but I can't imagine there is much value in keeping agent logs for more than a few days anyway unless you are trying to do some kind of analytics on them. I know you were talking about filtering in #719 and that is probably your best bet for this kind of thing. You can filter out the most common stuff (e.g. a connection refused/timeout error) but still let through rarer error conditions (e.g. a 500 response from a diagnostic endpoint on a service) so that they show up in logs.
from signalfx-agent.
Yeah, I want to use this to reduce the amount of data ending up in our logging system.
I was having a discussion elsewhere about having detectors for events, but I gathered that is still some time away.
So the monitors I have mainly used that have been creating a lot of noise are:
- expvar
- collectd/genericjmx
We also get a decent number of EOF or "timed out awaiting headers" errors. We don't really care about those per se, and we definitely don't want service owners to freak out that there is an issue when it is just a no-op.
I was thinking about the dimensions that would be useful and the ones I have are:
- monitor
- monitor_version
- agent_version
That way we can see whether bumping the agent version increases or decreases issues as we run the smart agent across all of our instances.
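As a rough illustration of how those three dimensions could key a failure count, here is a small sketch. The `dims` and `failureCounter` types and the version strings are hypothetical; nothing here is the agent's actual metric API.

```go
package main

import (
	"fmt"
	"sync"
)

// dims holds the dimensions proposed above. The struct is comparable,
// so it can be used directly as a map key.
type dims struct {
	Monitor        string
	MonitorVersion string
	AgentVersion   string
}

// failureCounter is a hypothetical in-memory counter of collection
// failures, one count per distinct dimension set.
type failureCounter struct {
	mu     sync.Mutex
	counts map[dims]int64
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[dims]int64)}
}

// Inc records one failure for the given dimension set.
func (c *failureCounter) Inc(d dims) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[d]++
}

// Get returns the failures recorded so far for the given dimension set.
func (c *failureCounter) Get(d dims) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[d]
}

func main() {
	c := newFailureCounter()
	// Version values below are placeholders, not real releases.
	d := dims{Monitor: "expvar", MonitorVersion: "1", AgentVersion: "x.y.z"}
	c.Inc(d)
	c.Inc(d)
	fmt.Println(c.Get(d))
}
```

Grouping such a counter by `agent_version` is what would make a before/after comparison across an agent upgrade possible.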
We log an insane amount each day, so anything we can do to reduce it is always appreciated by those who maintain our logging system. Also, while we are on the topic of logging. How hard of a change would it be to switch logging format as a config option?
from signalfx-agent.
Ok, I'll look at expvar since it would be the easiest to generate metrics for starting out.
Would it make any difference if I changed connection errors for monitored services to be logged at WARN (or even INFO) level instead of ERROR level? I don't think it is feasible to totally stop logging those types of things, even with metrics, because of how convenient it is to be able to see misconfiguration and/or the exact details of the error (i.e. connection timeout, response read timeout, connection refused, etc.) by looking at the log output.
> Also, while we are on the topic of logging. How hard of a change would it be to switch logging format as a config option?
It is pretty easy since we use logrus for logging. We already use their structured log mechanism quite a bit so you would get some amount of metadata out of the box (e.g. monitorType is a common field we use). Would you want JSON output?
from signalfx-agent.
> Ok, I'll look at expvar since it would be the easiest to generate metrics for starting out.
Sweet, keep me in the loop regarding this because I would love to know.
Yeah, it would be sick if you could drop the log level for failures to collect metrics. Currently we have the level set to error.
Is there a standard you follow for which things are logged at which level?
Or for how many characters a log message should contain?
I did have a look at how to configure the logger to emit logs as JSON rather than text, but I got distracted doing other things.
from signalfx-agent.