datadog / integrations-core

Core integrations of the Datadog Agent

License: BSD 3-Clause "New" or "Revised" License

integrations-core's Introduction

Datadog Integrations - Core

This repository contains open source integrations that Datadog officially develops and supports. To add a new integration, please see the Integrations Extras repository and the accompanying documentation.

The Datadog Agent packages ship with all the integrations from this repository, so to get started using them, simply install the Agent for your operating system. The AGENT_CHANGELOG file shows which integrations have been updated in each Agent version.

Contributing

Working with integrations is easy: the main page of the development docs contains all the info you need to get your dev environment up and running in minutes so you can run, test, and build a Check. More advanced documentation can be found here.

Reporting Issues

For more information on integrations, please reference our documentation and knowledge base. You can also visit our help page to connect with us.

integrations-core's People

Contributors

alexandreyang, alopezz, christinetchen, coignetp, cswatt, djova, fanny-jiang, florentclarret, florianveaux, florimondmanca, gmmeyer, hithwen, ian28223, iliakur, l0k0ms, lu-zhengda, masci, mgarabed, nbparis, nmuesch, ofek, ruthnaebeck, sarah-witt, steveny91, swang392, therve, truthbk, xvello, yzhan289, zippolyte

integrations-core's Issues

Why were these JMX Beans removed from the Kafka broker check?

I am updating our Datadog conf files. When I diff our old file (which was copy/pasted from the old default in the dd-agent repo) with the current default, I see two bean names were removed without a replacement added:

#
# Aggregate cluster stats
#
-    - include:
-        domain: 'kafka.server'
-        bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=.*'
-        attribute:
-          Count:
-            metric_type: rate
-            alias: kafka.messages_in.topic.rate
- include:
    domain: 'kafka.server'
    bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'
@@ -414,11 +407,3 @@ init_config:
      Count:
        metric_type: rate
        alias: kafka.log.flush_rate.rate
-
-    - include:
-        domain: 'kafka.network'
-        bean: 'kafka.network:type=RequestChannel,name=RequestQueueSize'
-        attribute:
-          Value:
-            metric_type: gauge
-            alias: kafka.request.queue.size

Why was this done? I don't see a commit removing these; it appears to have happened during the migration of the file to the integrations-core repo. Compare:

https://github.com/DataDog/dd-agent/commits/master/conf.d/kafka.yaml.example
with
https://github.com/DataDog/integrations-core/blob/master/kafka/conf.yaml.example

@truthbk it looks like you did the port, do you remember why you removed these bean names?

Go Expvar check shouldn't require users to configure URI ending in `/debug/vars`

It's a minor nit, but there's no need to make users configure the full expvar path in expvar_url, since /debug/vars is hardcoded in the expvar package. The hdfs integrations don't make users append /jmx to their URLs, so users shouldn't be required to do so here, either.

If we make this change, it can be kept backwards compatible: leave /debug/vars in place when it's already present, and append it only when it's missing.
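
A sketch of that normalization (the helper name is hypothetical; only the /debug/vars constant comes from the expvar package):

EXPVAR_PATH = '/debug/vars'

def normalize_expvar_url(expvar_url):
    # Backwards compatible: leave the path alone if the user already
    # included it, append it otherwise.
    url = expvar_url.rstrip('/')
    if url.endswith(EXPVAR_PATH):
        return url
    return url + EXPVAR_PATH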

[couch] error "local variable 'db_stats' referenced before assignment"

I just started using datadog and have an issue getting the couch integration to run (on MacOS Sierra).

/usr/local/bin/datadog-agent info reports this:

 Checks
  ======

    ntp
    ---
      - Collected 0 metrics, 0 events & 1 service check

    disk
    ----
      - instance #0 [OK]
      - Collected 44 metrics, 0 events & 1 service check

    network
    -------
      - instance #0 [OK]
      - Collected 27 metrics, 0 events & 1 service check

    couch
    -----
      - instance #0 [ERROR]: "local variable 'db_stats' referenced before assignment"
      - Collected 0 metrics, 0 events & 2 service checks


  Emitters
  ========

    - http_emitter [OK]

===================
Dogstatsd (v 5.8.0)
===================

  Status date: 2017-02-22 17:11:34 (8s ago)
  Pid: 85989
  Platform: Darwin-16.4.0-x86_64-i386-64bit
  Python Version: 2.7.11, 64bit

To me, "local variable 'db_stats' referenced before assignment" looks like a bug in the couch integration itself.
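
For what it's worth, errors of this shape usually come from a variable that is only bound on the success path; a hypothetical illustration of the bug class (not the actual couch code):

def get_db_stats(fetch, url):
    try:
        db_stats = fetch(url)  # if fetch() raises, db_stats is never bound
    except ValueError:
        pass                   # the error is swallowed here...
    return db_stats            # ...so this raises "referenced before assignment"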

Missing docker.disk metrics for docker_daemon integration

The docker_daemon integration page lists a series of metrics under the docker.disk.* namespace that don't appear to be reported anywhere in the docker_daemon integration code. These are the specific metrics in question, and I assume the metadata.csv file drives the content on the published integration page.

If the intention is for the metrics to actually exist, consider this a bug report to eventually get them added to the integration code. If they have been retired and there's no intention of reporting them, consider this a bug report for your docs page. Either way, it would be good to hear what the intention is. For now, we'll just write a quick custom check that execs df in our containers, since they're all Linux.

[kafka_consumer] The check should emit a datadog event when consumer lag is negative

The kafka_consumer.py should emit a Datadog event whenever lag for a consumer group is negative.

Rationale:
When kafka consumer lag is negative, it's a REALLY bad thing because it means the consumer group will miss messages.

In https://github.com/DataDog/integrations-core/pull/271/files#diff-8c6dbb3cccbcb816d57a048d94c9e3b5R184 I logged this as a log.error; however, when discussing it with @truthbk, he suggested it be emitted as a Datadog event so that users can monitor/alert on it.
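
A minimal sketch of what the check could do, using the Agent's event API (the helper and variable names here are illustrative, not the actual kafka_consumer.py code):

import time

def report_negative_lag(check, group, topic, partition, lag):
    # Emit a Datadog event so users can monitor/alert on negative lag.
    if lag >= 0:
        return
    check.event({
        'timestamp': int(time.time()),
        'event_type': 'kafka_consumer_lag',
        'msg_title': 'Negative consumer lag for group %s' % group,
        'msg_text': ('Consumer group %s has negative lag on %s-%s '
                     'and may miss messages.' % (group, topic, partition)),
        'alert_type': 'error',
        'tags': ['consumer_group:%s' % group, 'topic:%s' % topic],
    })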

Better docs for custom metrics

Configuring custom metrics for the postgres check is quite complex; users often get confused by the %s placeholder in the query field and by how the descriptors and metrics elements are used to compose the final query.

The example in the conf.yaml.example file could be improved by adding details on how the check composes the query and what the actual SQL produced looks like.
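
For reference, a simplified sketch of the composition (the real check's behavior is more involved, but this is the gist):

def compose_query(query, metrics):
    # The %s placeholder is replaced by the comma-joined metric columns.
    # Descriptor columns must be selected explicitly in the query itself.
    return query % (', '.join(metrics))

# compose_query('SELECT relname, %s FROM pg_stat_user_tables',
#               {'n_live_tup': ['postgresql.live_rows', 'GAUGE']})
# -> 'SELECT relname, n_live_tup FROM pg_stat_user_tables'
# Each result row then yields one tag per descriptor (from its leading
# columns) and one value per configured metric.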

[rabbitmq] support for custom tags

It looks like the rabbitmq integration does not currently support custom tags the way many other integrations do. It would be great if this could be added.

Add ability to filter docker events

We're using Docker Data Center. This is very chatty with respect to its containers (a lot of EXEC events). It would be useful to be able to filter these out.

Since the docker events command (https://docs.docker.com/engine/reference/commandline/events/#extended-description) offers a filter option, it would make sense for DD to support as much of this as possible (image, event type, etc.).

[riak] improve tests

Currently we have disabled the test coverage because a few of the metrics were unavailable. We will need to enable a search index and touch up the tests a little. It's not the end of the world, as we were at 96% coverage, but we do enforce 100% coverage on TravisCI, so that was a problem.

process monitor throws warnings on postgresql: Warning: Process 123 disappeared while scanning

process
-------
  - instance #0 [OK]
  - instance #1 [WARNING]
      Warning: Process 16039 disappeared while scanning
      Warning: Process 16177 disappeared while scanning
      Warning: Process 16178 disappeared while scanning
      Warning: Process 16193 disappeared while scanning
      Warning: Process 16194 disappeared while scanning
      Warning: Process 16198 disappeared while scanning
      Warning: Process 15830 disappeared while scanning
      Warning: Process 15844 disappeared while scanning
  - instance #2 [OK]
  - instance #3 [OK]
  - instance #4 [OK]
  - instance #5 [OK]
  - instance #6 [OK]
  - instance #7 [OK]
  - instance #8 [OK]
  - instance #9 [OK]
  - Collected 43 metrics, 0 events & 11 service checks

It is perfectly normal for postgresql to start and stop processes, and I see no reason why datadog should complain about that.

docker_daemon AttributeError


Output of the info page

====================
Collector (v 5.15.0)
====================

  Status date: 2017-07-19 20:51:12 (5s ago)
  Pid: 23107
  Platform: Linux-4.4.41-35.53.amzn1.x86_64-x86_64-with-glibc2.3
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log

  Clocks
  ======

    NTP offset: -0.0012 s
    System UTC time: 2017-07-19 20:51:17.833630

  Paths
  =====

    conf.d: /etc/dd-agent/conf.d
    checks.d: /opt/datadog-agent/agent/checks.d

  Hostnames
  =========

    ec2-hostname: ip-10-1-100-97.us-west-2.compute.internal
    socket-hostname: ip-10-1-100-97
    local-hostname: ip-10-1-100-97.us-west-2.compute.internal
    local-ipv4: 10.1.100.97
    hostname: i-0f2e790c6c6b524c0
    socket-fqdn: ip-10-1-100-97.us-west-2.compute.internal
    instance-id: i-0f2e790c6c6b524c0

  Checks
  ======

    docker_daemon (5.15.0)
    ----------------------
      - instance #0 [OK]
      - Collected 249 metrics, 0 events & 1 service check

    process (5.15.0)
    ----------------
      - instance #0 [OK]
      - instance #1 [OK]
      - instance #2 [OK]
      - Collected 48 metrics, 0 events & 3 service checks

    ntp (5.15.0)
    ------------
      - Collected 0 metrics, 0 events & 0 service checks

    disk (5.15.0)
    -------------
      - instance #0 [OK]
      - Collected 24 metrics, 0 events & 0 service checks

    dms_check (5.15.0)
    ------------------
      - instance #0 [OK]
      - Collected 6 metrics, 0 events & 0 service checks

    ecs_deployment_check (5.15.0)
    -----------------------------
      - instance #0 [OK]
      - Collected 99 metrics, 0 events & 0 service checks

    network (5.15.0)
    ----------------
      - instance #0 [OK]
      - Collected 91 metrics, 0 events & 0 service checks


  Emitters
  ========

    - http_emitter [OK]

====================
Dogstatsd (v 5.15.0)
====================

  Status date: 2017-07-19 20:51:10 (7s ago)
  Pid: 23104
  Platform: Linux-4.4.41-35.53.amzn1.x86_64-x86_64-with-glibc2.3
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log, syslog:/dev/log

  Flush count: 68
  Packet Count: 1156
  Packets per second: 1.7
  Metric count: 9
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.15.0)
====================

  Status date: 2017-07-19 20:51:15 (3s ago)
  Pid: 23103
  Platform: Linux-4.4.41-35.53.amzn1.x86_64-x86_64-with-glibc2.3
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log, syslog:/dev/log

  Queue Size: 6699 bytes
  Queue Length: 1
  Flush Count: 217
  Transactions received: 138
  Transactions flushed: 137
  Transactions rejected: 0
  API Key Status: API Key is valid


======================
Trace Agent (v 5.15.0)
======================

  Pid: 23102
  Uptime: 691 seconds
  Mem alloc: 2736480 bytes

  Hostname: i-0f2e790c6c6b524c0
  Receiver: localhost:8126
  API Endpoint: https://trace.agent.datadoghq.com

  Bytes received (1 min): 0
  Traces received (1 min): 0
  Spans received (1 min): 0

  Bytes sent (1 min): 0
  Traces sent (1 min): 0
  Stats sent (1 min): 0

Additional environment details (Operating System, Cloud provider, etc):
Amazon Linux, AWS, ECS

Steps to reproduce the issue:

  1. Configure docker_daemon on AWS instance with ECS cluster configured
  2. Run or stop container on that instance
  3. Error will appear in logs.

Describe the results you received:
2017-07-19 20:45:14 UTC | ERROR | dd.collector | checks.docker_daemon(docker_daemon.py:314) | Docker_daemon check failed
Traceback (most recent call last):
File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 293, in check
self._process_events(containers_by_id)
File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 733, in _process_events
api_events = self._get_events()
File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 768, in _get_events
self.ecsutil.invalidate_cache(events)
AttributeError: 'DockerDaemon' object has no attribute 'ecsutil'

Also, no events are being sent.

Describe the results you expected:
No errors in the log, and events should be sent.

Additional information you deem important (e.g. issue happens only occasionally):

First item in `descriptors` has no use

In documenting custom metrics for the Postgres check (DataDog/documentation#1409), we're finding it awkward to explain that the first item in a descriptors pair, i.e. relname in:

descriptors:
  - [relname, schema]

has no use. I suspect its original purpose was to match schema to the column it's meant to tag?

Since the check only uses descriptors[x][1] now (https://github.com/DataDog/integrations-core/blob/master/postgres/check.py#L600), can we stop requiring each descriptor to be a pair?
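
For illustration, this is essentially all the check does with a descriptor today (a simplified sketch):

descriptors = [('relname', 'schema')]
row = ('public', 42)  # leading descriptor column(s), then metric values
tags = ['%s:%s' % (desc[1], value)
        for desc, value in zip(descriptors, row)]
# tags == ['schema:public']; desc[0] ('relname') is never read.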

Host tags are not being applied to HTTP checks

Output of the info page

  Pid: 1878
  Platform: Linux-4.4.0-1-amd64-x86_64-with-debian-8.1
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log

Describe the results you received:
Our http check only had our check_tag, and no host_tag.

Describe the results you expected:
We would've expected host_tag to appear on our http check.

These are the http check config and the host config we used to test the issue.

http_check.yaml:

init_config:

instances:
  - name: health check
    url: http://localhost/health

    tags:
      - check_tag:check_tag

datadog.conf:

[Main]

# The host of the Datadog intake server to send Agent data to
dd_url: https://app.datadoghq.com

api_key: myapikey
tags: host_tag:host

I noticed this with @mnussbaum.

[kubernetes_state] Default value for kube_state_url, service discovery

Hey there,

I am not a fan of having to configure kube_state_url. Are you open to a PR that defaults to kubernetes-state-metrics if it is not set?

To make things more complicated: I'm running a version of Kubernetes that does not yet support having hostNetwork set to true while still being able to resolve cluster resources. See kubernetes/kubernetes#35761 for related issues. AFAICS this is fixed by kubernetes/kubernetes#29378, which should make it into the 1.6 release.
My approach would be to discover the service myself, which is not that hard, but I might be missing the point here. I see there are references to service discovery, but I do not see them described in the READMEs. If this is the case, do you have some references to doco?
I see auto_conf.yaml files floating around; are they populated by service discovery, and if so, how do I use and troubleshoot it?

Cheers,

O

[http_check] disable_ssl_validation: true stopped working after 5.14.0

Log excerpt

Jun 14 08:43:15 app dd.collector[988]: WARNING (__init__.py:700): Skipping SSL certificate validation for https://veryimportantserver:8443/login based on configuration
Jun 14 08:43:15 app dd.collector[988]: ERROR (network_checks.py:161): Failed to process instance ''.
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/network_checks.py", line 147, in _process
    statuses = self._check(instance)
  File "/opt/datadog-agent/agent/checks.d/http_check.py", line 323, in _check
    status, days_left, msg = self.check_cert_expiration(instance, timeout, instance_ca_certs)
ValueError: need more than 2 values to unpack

Config snippet

  - name: Server Login page
    url: https://veryimportantserver:8443/login
    timeout: 10
    content_match: 'Login Page'
    collect_response_time: true
    allow_redirects: true
    skip_event: true
    disable_ssl_validation: true

Seems to be caused by #249.
Issue appeared after upgrading datadog-agent to 5.14.0.

Please fix, this is breaking our monitoring.

[appveyor][windows] fix winfixme checks

For some reason, mocking modules loaded in the check breaks on Windows. Not sure what Python does differently there, but it's definitely Windows-specific. We'll have to get to the bottom of it and address it.

[postgres] Improve config reading errors

I had this postgres.yaml:

init_config:

instances:
  - host: pepepe
    ...
    custom_metrics:
    - query: SELECT %s FROM pg_locks WHERE granted = false;
      metrics:
        count(distinct pid): [postgresql.connections_locked]
      descriptors: []
      relation: false

with a few other hosts and custom metrics. When deploying this I got the following error:

2017-02-13 15:33:14 UTC | ERROR | dd.collector | checks.postgres(__init__.py:762) | Check 'postgres' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 745, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/postgres.py", line 606, in check
    custom_metrics = self._get_custom_metrics(instance.get('custom_metrics', []), key)
  File "/opt/datadog-agent/agent/checks.d/postgres.py", line 576, in _get_custom_metrics
    for ref, (_, mtype) in m['metrics'].iteritems():
ValueError: need more than 1 value to unpack

This was caused by a missing metric type in the YAML above, i.e. it should have been [postgresql.connections_locked, GAUGE].
Because the error message is unclear and doesn't point to the offending metric (remember, I have other hosts and custom metrics), it took me a couple of hours to figure out the cause of this error.
Please consider improving the error messages around config reading.
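
For example, validating custom_metrics up front and naming the offending entry would have surfaced this immediately (a sketch, not the check's actual structure):

def validate_custom_metrics(custom_metrics):
    for m in custom_metrics:
        for ref, spec in m.get('metrics', {}).items():
            if not isinstance(spec, (list, tuple)) or len(spec) != 2:
                raise ValueError(
                    "custom metric %r: %r must be a [name, type] pair, "
                    "e.g. [postgresql.connections_locked, GAUGE]"
                    % (ref, spec))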

[redisdb] Redis Sentinel support

Currently it's possible to specify a sentinel as one of the instances in config, but it does not completely work because of this error:

Datadog's redis integration is reporting:
Instance #0[ERROR]:"unknown command 'SLOWLOG'"

Support for sentinels would probably mean that there should not be attempts to issue the SLOWLOG and probably it would be helpful to gather some sentinel-specific stats.

I'm willing to provide a PR for that; please let me know if that would be helpful.

go_expvar ssl_verify not working correctly

go_expvar (5.12.3)
------------------
- instance #0 [ERROR]: '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)'
- instance #1 [ERROR]: '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)'

Steps to reproduce the issue:

  1. Try using the Datadog agent's go_expvar check on any SSL-protected service

Describe the results you received:
Error pasted above.

Describe the results you expected:
Successful communication with SSL-protected services, as the elastic and mongo checks manage

Additional information you deem important (e.g. issue happens only occasionally):
As I stated above, this support seems to be missing from the expvar check.
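
Assuming the check fetches the endpoint with requests, honoring such an option would be a small change; a sketch (the ssl_verify option name mirrors other checks and is illustrative here):

import requests

def fetch_expvar(url, ssl_verify=True):
    # verify=False skips certificate validation, matching how other
    # checks expose disable_ssl_validation / ssl_verify options.
    resp = requests.get(url, verify=ssl_verify, timeout=10)
    resp.raise_for_status()
    return resp.json()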

gearmand check leaks connections

Symptoms

Recently we have been having trouble with gearmand running into max-connections issues. Since the ulimit was set to a very low default, we simply raised its ceiling to 32768 without much investigation.

Since making that change, we've seen the datadog agent cease reporting from hosts running gearman. The /var/log/datadog/collector.log tends to be full of

OSError: [Errno 24] Too many open files

inspecting the process with lsof indicates that nearly all of the open files are sockets connected to gearmand:

# lsof -p $DD_AGENT_PID | grep localhost:4730 | wc -l
  1015

Potential causes

Examination of the gearmand integration suggests that while it does open many connections (code), they are never explicitly closed. Due to the implementation, abandoning the client doesn't cause the connections to be collected.

I believe that this connection leaking behavior could be avoided by either:

  • memoizing the connection to a gearmand process on the Check object
  • explicitly calling close (defined here) when we're done using a connection.

I'm about to submit a PR implementing the first option, but if folks think opening a new connection each time is important to the flow, we could go with the second.
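
A sketch of the memoization option (assuming the python-gearman admin client the check already uses; the helper name is illustrative):

import gearman

def _get_client(self, host, port):
    # Reuse one admin client (and its socket) per gearmand endpoint
    # instead of opening a new connection on every check run.
    if not hasattr(self, '_clients'):
        self._clients = {}
    key = (host, port)
    if key not in self._clients:
        self._clients[key] = gearman.GearmanAdminClient(
            ['%s:%s' % (host, port)])
    return self._clients[key]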

Extra detail about how this was detected:

The number of open sockets to gearmand (1015, from above) is very close to the limit set for the dd-agent process:

# cat /proc/$DD_AGENT_PID/limits | grep 'open files'
Max open files            1024                 4096                 files

My suspicion is that raising the gearmand file limit from 1024 made it possible for dd-agent to exhaust its own file descriptors connecting to it. Previously, gearmand would have had n connections used by normal clients and workers, leaving 1024 - n available for dd-agent to try to use; I believe n was large enough that dd-agent couldn't exhaust its own 1024 file handles purely on these connections plus other essential handles. The graphs we have of dd-agent's file descriptor usage correlate strongly with the deployment of the fd limit change to gearmand:

[screenshot: graph of dd-agent file descriptor usage, 2017-05-20]

Should a test flavor assume that the flavor will be self contained to one folder?

Each flavor currently runs every test file, one at a time, to see if it has the tags nose is looking for.

I want to open a discussion on this. I know @hush-hush thought we should make flavors more self-contained, if only because the tests wouldn't take as long to run.

However, as a matter of practice, I'm not sure we can assume this. At the very least for the default flavor, it is not true. Perhaps we can change how the flavor variable is parsed, or introduce a secondary variable.

This is already leading to issues, as we can see in #98, so I think it's something we have to address and figure out how to resolve.

MySQL Replication Lag doesn't tag by channel name

MySQL 5.7 introduces "replication channels", which means a slave can replicate from multiple masters ("sources") at any given time.

To measure replication lag across n channels, it'd be great if DataDog could tag the mysql.replication.seconds_behind_master metric with channel:<CHANNEL_NAME>. That way users could get an average across the channels, or have monitors alert when one of the channels falls behind.

Currently, the check fetches only the first replication channel (https://github.com/DataDog/integrations-core/blob/master/mysql/check.py#L855)
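
A sketch of per-channel reporting, assuming the rows of SHOW SLAVE STATUS are available as dicts (MySQL 5.7 returns one row per channel, with a Channel_Name column):

def report_replication_lag(check, slave_status_rows):
    for row in slave_status_rows:  # one row per channel on 5.7+
        lag = row.get('Seconds_Behind_Master')
        channel = row.get('Channel_Name') or 'default'
        if lag is not None:
            check.gauge('mysql.replication.seconds_behind_master',
                        lag, tags=['channel:%s' % channel])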

[requirements] ensure there are no version mismatches with `dd-agent`

pip packages should be consistent with what is shipped with the dd-agent as well as other integrations in the repo. We must also make sure these all raise a CI error if there's a mismatch. We may well already be doing this with the rake requirements task, but please double-check.

postfix integration should not require sudo to root

Reading the source code to integrations-core/postfix/check.py I note that it does a sudo to root to run the find command.

This is noted in the docs / comments :

WARNING: the user that dd-agent runs as must have sudo access for the 'find' command
sudo access is not required when running dd-agent as root (not recommended)

example /etc/sudoers entry:
dd-agent ALL=(ALL) NOPASSWD:/usr/bin/find /var/spool/postfix* -type f

root should not be required here; the postfix user should be sufficient. Combined with a '-u postfix' on line 64's sudo command, that would allow this to work.

This is a concern because find has an -exec parameter and the command list contains a wildcard; this could be used to run arbitrary commands as root if the dd-agent user is compromised.
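
Concretely, the check could run find as the postfix user rather than root; a sketch (helper name and queue path handling are illustrative):

import subprocess

def count_queue_files(queue_path):
    # Requires a sudoers entry such as:
    #   dd-agent ALL=(postfix) NOPASSWD:/usr/bin/find /var/spool/postfix* -type f
    output = subprocess.check_output(
        ['sudo', '-u', 'postfix', 'find', queue_path, '-type', 'f'])
    return len(output.splitlines())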

manifest.json

There is a description, an overview, and a config in the manifest. Since those are going to be repeated in the readme, putting them here seems duplicative (wait, is that a word?). The short description would probably be OK.

What are the sizes of the images?

@irabinovitch

postgres - max_connections by instance not host for Docker

We are running several PostgreSQL docker containers on each server. Most metrics are reported per database instance, but max_connections is reported per host. Each instance can have its own max_connections, as ours do.

Please alter max_connections to be reported per instance.

[mysql] "mysql.replication.slave_running" metric is incorrect on MySQL 5.5.54

Steps to reproduce the issue:

  1. Run MySQL 5.5.54.
  2. Set up replication
  3. Look at mysql.replication.slave_running metric.

Describe the results you received:
The metric reports "0", that is, that replication is not running.

Describe the results you expected:
It should report "1", that is, that replication is running.

Additional information you deem important (e.g. issue happens only occasionally):

This metric broke for us after installing the 5.13.0 Agent Update on 4/24/17.

Looking at the code for the check, there were recent changes around the calculation of the mysql.replication.slave_running metric. There's a special case for MySQL 5.7 where it ANDs the values of Slave_IO_Running and Slave_SQL_Running, because Slave_Running might not be present. It appears that on MySQL 5.5.54 (and maybe other versions as well), Slave_Running is not in the output of the SHOW SLAVE STATUS query. I think that 5.7 special case might need to be expanded to additional versions of MySQL.
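
Generalizing the special case could look like this sketch (column names from SHOW SLAVE STATUS; the fallback applies whenever Slave_Running is absent, not only on 5.7):

def slave_is_running(results):
    slave_running = results.get('Slave_Running')
    if slave_running is not None:
        return slave_running.lower() == 'on'
    # Fall back to the per-thread flags when Slave_Running is missing.
    return (results.get('Slave_IO_Running') == 'Yes'
            and results.get('Slave_SQL_Running') == 'Yes')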

`kubernetes.cpu.usage.total` is reported in nanocores

The Kubernetes integration (https://github.com/DataDog/integrations-core/tree/master/kubernetes) provides CPU stats in two different units:

  • cpus (for kubernetes.cpu.limits and kubernetes.cpu.requests)
  • percent_nano (for kubernetes.cpu.usage.total)

These metrics are quite useful to put on the same graph in order to tune the requests/limits settings for Kubernetes containers. To do that, you have to scale kubernetes.cpu.usage.total by 1000000000 (one billion), because the value reported by the integration is in units of nano CPUs (i.e., percent_nano, though there doesn't appear to be any percent math applied). Somewhat annoyingly, you have to specify 1000000000 as (1000000 * 1000), because the web UI will otherwise convert it to 1e9 in the JSON version of a graph, and 1e9 isn't actually a valid value to scale a metric by.

The values are scraped from cAdvisor, which defines them here: https://github.com/google/cadvisor/blob/e14ee9be3506d260847d263e26a3e9e27f83ad96/info/v1/container.go#L267-L283; they are pulled from the Docker daemon's statistics (https://github.com/moby/moby/blob/8b1adf55c2af329a4334f21d9444d6a169000c81/daemon/stats/collector_unix.go#L27-L71). It's essentially a per-second rate of nanoseconds of CPU time used (thus, nanocpu).

I would like to have the values of kubernetes.cpu.usage.total appear as just "cpus" for use in DataDog. Right off the bat, two solutions jump out at me:

  • scale the kubernetes.cpu.usage.total metric by a billion at the agent level, and change its unit from percent_nano to cpu - I've made a PR for this (link coming shortly)
  • make a new unit, nanocores, that will accept the current values being sent, but display them as cores (similar to how it seems that byte-based metrics work)
  • maybe other folks have other ideas?
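
For reference, the conversion itself (a sketch of the arithmetic):

NANOCORES_PER_CORE = 10 ** 9

def nanocores_to_cores(value):
    return value / float(NANOCORES_PER_CORE)

# 2,500,000,000 nanocores is 2.5 cores:
assert nanocores_to_cores(2500000000) == 2.5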

[docker] Check 'docker_daemon' instance #0 failed

Output of the info page

====================
Collector (v 5.11.2)
====================

  Status date: 2017-03-21 13:45:44 (18s ago)
  Pid: 21732
  Platform: Linux-4.4.0-66-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.12, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log

  Clocks
  ======

    NTP offset: 0.0177 s
    System UTC time: 2017-03-21 13:46:03.049974

  Paths
  =====

    conf.d: /etc/dd-agent/conf.d
    checks.d: /opt/datadog-agent/agent/checks.d

  Hostnames
  =========

    socket-hostname: tpahaz
    hostname: tpahaz
    socket-fqdn: tpahaz

  Checks
  ======

    process
    -------
      - instance #0 [OK]
      - instance #1 [OK]
      - instance #2 [OK]
      - instance #3 [OK]
      - instance #4 [OK]
      - instance #5 [OK]
      - instance #6 [OK]
      - instance #7 [OK]
      - instance #8 [OK]
      - instance #9 [OK]
      - instance #10 [OK]
      - instance #11 [OK]
      - instance #12 [OK]
      - instance #13 [OK]
      - instance #14 [OK]
      - instance #15 [OK]
      - instance #16 [OK]
      - Collected 137 metrics, 0 events & 17 service checks

    network
    -------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 0 service checks

    nginx
    -----
      - instance #0 [OK]
      - Collected 7 metrics, 0 events & 1 service check

    ntp
    ---
      - Collected 0 metrics, 0 events & 0 service checks

    consul
    ------
      - instance #0 [OK]
      - Collected 21 metrics, 0 events & 2 service checks

    disk
    ----
      - instance #0 [OK]
      - Collected 24 metrics, 0 events & 0 service checks

    docker_daemon
    -------------
      - instance #0 [OK]
      - Collected 184 metrics, 0 events & 1 service check


  Emitters
  ========

    - http_emitter [OK]

====================
Dogstatsd (v 5.11.2)
====================

  Status date: 2017-03-21 13:45:58 (4s ago)
  Pid: 21726
  Platform: Linux-4.4.0-66-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.12, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log, syslog:/dev/log

  Flush count: 30
  Packet Count: 0
  Packets per second: 0.0
  Metric count: 1
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.11.2)
====================

  Status date: 2017-03-21 13:46:03 (0s ago)
  Pid: 21725
  Platform: Linux-4.4.0-66-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.12, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log, syslog:/dev/log

  Queue Size: 12472 bytes
  Queue Length: 1
  Flush Count: 101
  Transactions received: 45
  Transactions flushed: 44
  Transactions rejected: 0
  API Key Status: API Key is valid


======================
Trace Agent (v 5.11.2)
======================

  Not running (port 8126)

Additional environment details (Operating System, Cloud provider, etc):

  • ubuntu server 16.04.2 LTS

Steps to reproduce the issue:

  1. restart some container

Sometimes it produces this trace:

ERROR (__init__.py:784): Check 'docker_daemon' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 767, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 230, in check
    self._report_performance_metrics(containers_by_id)
  File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 489, in _report_performance_metrics
    self._report_cgroup_metrics(container, tags)
  File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 507, in _report_cgroup_metrics
    stat_file = self._get_cgroup_from_proc(cgroup["cgroup"], container['_pid'], cgroup['file'])
  File "/opt/datadog-agent/agent/checks.d/docker_daemon.py", line 770, in _get_cgroup_from_proc
    return DockerUtil.find_cgroup_from_proc(self._mountpoints, pid, cgroup, self.docker_util._docker_root) % (params)
  File "/opt/datadog-agent/agent/utils/dockerutil.py", line 311, in find_cgroup_from_proc
    with open(proc_path, 'r') as fp:
IOError: [Errno 2] No such file or directory: '/proc/21292/cgroup'

Additional information you deem important (e.g. issue happens only occasionally):
I think it may be related to container state.

[kafka] improve testing

Two big issues with the current kafka tests (#79):

  • We should be getting more beans from the broker (possibly from the clients as well); several of them can be seen via jconsole. Unsure if the problem is related to the metric type ($Meter vs $Gauge, etc.). Tests are passing because they've been disabled for now; we should get back to it.
  • Add more flavors, at least 0.9.1, even if we have to backport fixes and roll our own container. We had to ignore older Kafka versions for now because the https://github.com/wurstmeister/kafka-docker container has some issues (JMX_PORT exposition in the older versions, and some unknown problem bringing the cluster up in 0.8.2).

[core] PGBouncer stats for version latest fails due to extra columns

Output of the info page

    pgbouncer (5.14.1)
    ------------------
      - instance #0 [ERROR]: ''
        Traceback (most recent call last):
          File "/opt/datadog-agent/agent/checks/__init__.py", line 795, in run
            self.check(copy.deepcopy(instance))
          File "/opt/datadog-agent/agent/checks.d/pgbouncer.py", line 221, in check
            self._collect_stats(db, tags)
          File "/opt/datadog-agent/agent/checks.d/pgbouncer.py", line 115, in _collect_stats
            assert len(row) == len(cols) + len(desc)
        AssertionError
        
      - Collected 8 metrics, 0 events & 0 service checks

Additional environment details (Operating System, Cloud provider, etc):
PGBouncer latest (Will try to use version 1.7.2 to check this as well)

Steps to reproduce the issue:

  1. Install the latest version of PGBouncer
  2. Configure to use the check

Describe the results you received:
Failed to receive stats for PGBouncer

Describe the results you expected:
Receive stats for PGBouncer

Additional information you deem important (e.g. issue happens only occasionally):
Results of SHOW STATS and SHOW POOLS for the latest version of PGBouncer:

pgbouncer=# SHOW POOLS;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 mydb      | user      |         1 |          0 |         1 |       0 |       4 |         0 |        0 |       0 |          0 | session
 pgbouncer | pgbouncer |         2 |          0 |         0 |       0 |       0 |         0 |        0 |       0 |          0 | statement
(2 rows)

pgbouncer=# SHOW STATS;
 database  | total_requests | total_received | total_sent | total_query_time | avg_req | avg_recv | avg_sent | avg_query 
-----------+----------------+----------------+------------+------------------+---------+----------+----------+-----------
 mydb      |          76375 |       12603991 |  697447723 |        312246284 |       0 |        0 |        0 |         0
 pgbouncer |              4 |              0 |          0 |                0 |       0 |        0 |        0 |         0
(2 rows)

Checking pgbouncer/check.py, it only checks columns up to maxwait and accepts one extra column, but PGBouncer's SHOW POOLS now also returns maxwait_us and pool_mode.
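
Mapping values by column name instead of asserting on row length would tolerate new columns such as maxwait_us and pool_mode; a sketch (the metric mapping shown is illustrative):

def rows_to_metrics(col_names, rows, wanted):
    # wanted maps a pgbouncer column to a Datadog metric name,
    # e.g. {'cl_active': 'pgbouncer.pools.cl_active'}
    for row in rows:
        record = dict(zip(col_names, row))
        for column, metric in wanted.items():
            if column in record:  # extra/unknown columns are ignored
                yield metric, record[column]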

Kafka's RequestHandlerAvgIdlePercent MBean should be reported as a gauge, not a rate

At the moment, the example kafka integration configuration reports RequestHandlerAvgIdlePercent as follows:

    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
        attribute:
          Count:
            metric_type: rate
            alias: kafka.request.handler.avg.idle.pct.rate

I believe this is incorrect for two reasons:

  • this metric is already a rate, so having Datadog transform it into a rate again is unnecessary
  • this metric indicates a percentage; its Count is not really interesting, whereas its MeanRate makes more sense.

As configured above, the metric displays as follows in a dashboard:

[dashboard graph of kafka.request.handler.avg.idle.pct.rate as configured above]

This does not really represent the Request Handler Average Idle Percent.

If we change the config as follows:

    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
        attribute:
          MeanRate:
            metric_type: gauge
            alias: kafka.request.handler.avg.idle.pct.rate

We then obtain the desired percentage value:

[dashboard graph of the MeanRate-based gauge, showing the expected percentage]

For more background, here is the description of that MBean according to "Kafka: The Definitive Guide", chapter 10 (which can be obtained for free on Confluent's web site, so I hope they do not mind the copy/paste):

The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. Experience tells us that idle ratios lower than 20% indicate a potential problem, and lower than 10% is usually an active performance problem. Besides the cluster being undersized, there are two reasons you will see for high thread utilization in this pool. The first is that there are not enough threads in the pool. In general, you should set the number of request handler threads equal to the number of processors in the system (including hyperthreaded processors).

I'll submit a minor PR shortly.

mongo check failing at parsing mongo URL

Using mongodb://mongo-service:27017 as the URL, I get this error:

Trace from datadog/collector.log

2017-05-23 23:31:29 UTC | ERROR | dd.collector | checks.mongo(__init__.py:805) | Check 'mongo' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 788, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/mongo.py", line 644, in check
    username_uri = u"{}@".format(urllib.quote(username))
  File "/opt/datadog-agent/embedded/lib/python2.7/urllib.py", line 1286, in quote
    raise TypeError('None object cannot be quoted')
TypeError: None object cannot be quoted

mongo (5.12.3)
--------------
- instance #0 [ERROR]: 'None object cannot be quoted'

mongo.yaml (** added for privacy)

init_config:
instances:
  # Specify the MongoDB URI, with database to use for reporting (defaults to "admin")
  # E.g. mongodb://datadog:LnCbkX4uhpuLHSUrcayEoAZA@localhost:27016/my-db
  - server: mongodb://mongo-service:27017
    # Controls connectTimeoutMS, serverSelectionTimeoutMS and socketTimeoutMS (see http://api.mongodb.com/python/3.4.0/api/pymongo/mongo_client.html)
    # Defaults to 30 seconds
    # timeout: 30

    # tags:
    #   - optional_tag1
    #   - optional_tag2

    ssl: True # Optional (default to False)
    ssl_keyfile: **/**/service.key 
    ssl_certfile: **/**/service.crt 
    ssl_ca_certs: **/**/ca.crt 
    ssl_cert_reqs: 0 

Steps to reproduce the issue:

  1. Use SSL flags and no username:password, then run dd-agent
  2. The error is [ERROR]: 'None object cannot be quoted'

Describe the results you received:
error is [ERROR]: 'None object cannot be quoted'

Describe the results you expected:
No error and successful connection to mongo and URL parsed properly

Additional information you deem important (e.g. issue happens only occasionally):
Without the SSL context, this URL works: mongodb://mongo-service:27017
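
The fix is likely a guard around the credentials fragment; a sketch against the line quoted in the traceback (mongo.py line 644), where username is whatever the check parsed from the server URI:

import urllib

username = None  # as parsed from a URI without credentials

username_uri = u""
if username:  # skip the fragment entirely when no username is present
    username_uri = u"{}@".format(urllib.quote(username))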

[supervisord] fix flaky test

Supervisord seems to be flaking out in some runs. Let's try to get down to the root cause and get that fixed.

[http_check] Hardcoding check_hostname = True in check_cert_expiration() breaks checks for URLs targeting 127.0.0.1

I have deployed the http_check on haproxy server farms to monitor expiring certificate events. Here is an example http_check.yaml configuration:

init_config:

instances:
  - name: www.myawesomedomain.com
    url: https://127.0.0.1/haproxy_monitor
    timeout: 1
    collect_response_time: true
    skip_event: true
    check_certificate_expiration: true
    days_warning: 90
    days_critical: 7
    headers:
      Host: www.myawesomedomain.com

Notice that the actual URL does not contain the domain name; rather, it is directed at localhost. Since multiple haproxy servers all have the same certificate installed on them, there is no single host that resolves to www.myawesomedomain.com -- instead, that name resolves to an IP address of a Google Cloud Platform Global Load Balancer or Network Load Balancer (which load balances to the haproxy servers in question). Using www.myawesomedomain.com in the url field of the yaml would monitor something entirely different: the certificate installed on the cloud platform load balancer service.

This worked great for a while but broke fantastically when we upgraded the Datadog agent recently. Specifically, this change introduced the problem:

DataDog/dd-agent@3eebcda

Specifically, context.check_hostname = True now breaks when the http_check monitor is configured as in my example above. In my case, I don't need this set to True to make SNI-enabled checks work, as I'm not leveraging SNI.

Additional environment details (Operating System, Cloud provider, etc):

Linux Debian jessie
Google Cloud Platform

Steps to reproduce the issue:

  1. Locally monitor an HTTPS endpoint using something similar to the http_check.yaml I have included in this issue.
  2. The check will fail, complaining that the hostname '127.0.0.1' doesn't match either of '*.myawesomedomain.com', 'myawesomedomain.com'

Describe the results you received:

The check will fail even if the certificate is not expiring. In our case, we see a triggered alarm like the following:

[Triggered on {host:redactedl,instance:redacted,url:https://127.0.0.1/haproxy_monitor}] Certificate expired or expiring soon
Certificate expiring soon, please investigate.

@slack-datadog

hostname '127.0.0.1' doesn't match either of '*.myawesomedomain.com', 'myawesomedomain.com'

Describe the results you expected:

The previous default behavior, prior to DataDog/dd-agent@3eebcda, was not to check that the CN and SAN host names match the URL hostname. I expect that behavior to be preserved and/or the capability to not enforce it to be made available.

This may or may not be considered a bug; some may feel it is a feature request. In either case, the behavior described in this issue changed between versions, so I consider it a bug. It will likely require a new feature to address: specifically, a configuration value that allows check_hostname to be set to True or False for certificate expiration checks. Another alternative is to detect whether the URL contains an IP address and/or localhost and automatically set check_hostname = False in those cases.
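
The configuration-value approach could be as small as this sketch (check_hostname as a hypothetical instance option, defaulting to today's behavior):

import ssl

def make_ssl_context(instance):
    context = ssl.create_default_context(cafile=instance.get('ca_certs'))
    # Default True preserves the current behavior; False restores the
    # pre-3eebcda behavior for URLs targeting 127.0.0.1/localhost.
    context.check_hostname = instance.get('check_hostname', True)
    return context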

Network tests do not work on MacOS

They assume they are running on Linux. We should either skip these tests on macOS or do some better mocking.

test_check_psutil (test_network.TestCheckNetwork) ... ok
test_cx_counters_psutil (test_network.TestCheckNetwork) ... ok
test_cx_state_linux_netstat (test_network.TestCheckNetwork) ... ERROR
test_cx_state_linux_ss (test_network.TestCheckNetwork) ... ERROR
test_cx_state_psutil (test_network.TestCheckNetwork) ... ok
test_parse_protocol_psutil (test_network.TestCheckNetwork) ... ok
test_win_uses_psutil (test_network.TestCheckNetwork) ... ok

======================================================================
ERROR: test_cx_state_linux_netstat (test_network.TestCheckNetwork)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/venv/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/network/test_network.py", line 75, in test_cx_state_linux_netstat
    self.run_check({})
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/dd-agent/tests/checks/common.py", line 270, in run_check
    raise error  # pylint: disable=E0702
IOError: [Errno 2] No such file or directory: '/proc/net/dev'
-------------------- >> begin captured logging << --------------------
utils.subprocess_output: DEBUG: Popen(['/bin/hostname', '-f'], stderr = <tempfile.SpooledTemporaryFile instance at 0x10f601368>, stdout = <tempfile.SpooledTemporaryFile instance at 0x10f6013b0>) called
checks.network: DEBUG: Using `ss` to collect connection state
checks.network: INFO: `ss` not found: using `netstat` as a fallback
aggregator: DEBUG: received 0 payloads since last flush
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_cx_state_linux_ss (test_network.TestCheckNetwork)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/venv/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/network/test_network.py", line 65, in test_cx_state_linux_ss
    self.run_check({})
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/dd-agent/tests/checks/common.py", line 270, in run_check
    raise error  # pylint: disable=E0702
IOError: [Errno 2] No such file or directory: '/proc/net/dev'
-------------------- >> begin captured logging << --------------------
utils.subprocess_output: DEBUG: Popen(['/bin/hostname', '-f'], stderr = <tempfile.SpooledTemporaryFile instance at 0x10f66c440>, stdout = <tempfile.SpooledTemporaryFile instance at 0x10f601488>) called
checks.network: DEBUG: Using `ss` to collect connection state
aggregator: DEBUG: received 0 payloads since last flush
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 7 tests in 0.140s

FAILED (errors=2)
Exception [Errno 2] No such file or directory: '/proc/net/dev' during check
Traceback (most recent call last):
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/dd-agent/tests/checks/common.py", line 250, in run_check
    self.check.check(copy.deepcopy(instance))
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/../network/check.py", line 71, in check
    self._check_linux(instance)
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/../network/check.py", line 302, in _check_linux
    proc = open(proc_dev_path, 'r')
IOError: [Errno 2] No such file or directory: '/proc/net/dev'

Exception [Errno 2] No such file or directory: '/proc/net/dev' during check
Traceback (most recent call last):
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/dd-agent/tests/checks/common.py", line 250, in run_check
    self.check.check(copy.deepcopy(instance))
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/../network/check.py", line 71, in check
    self._check_linux(instance)
  File "/Users/GregoryMeyer/dd-workspace/integrations-core/embedded/../network/check.py", line 302, in _check_linux
    proc = open(proc_dev_path, 'r')
IOError: [Errno 2] No such file or directory: '/proc/net/dev'
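
Skipping on non-Linux platforms is a one-decorator change per test; a sketch against the existing test harness (run_check comes from the shared test base class):

import sys
import unittest

linux_only = unittest.skipUnless(sys.platform.startswith('linux'),
                                 'requires /proc/net/dev')

class TestCheckNetwork(unittest.TestCase):
    @linux_only
    def test_cx_state_linux_netstat(self):
        self.run_check({})  # only runs where /proc is available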

ImportError: No module named redis

Hi folks,

we were trying Datadog on our servers. The setup was just fine.

After that we integrated haproxy (successfully), and then we tried to integrate redis, which doesn't work.

We are on a FreeBSD server (10.3).

As soon as we set up the yaml file, we got this error:

2017-04-28 15:26:01,850 | ERROR | dd.collector | collector(agent.py:604) | Uncaught error running the Agent
Traceback (most recent call last):
  File "agent/agent.py", line 600, in <module>
    sys.exit(main())
  File "agent/agent.py", line 526, in main
    return Agent.info(verbose=options.verbose)
  File "agent/agent.py", line 218, in info
    return CollectorStatus.print_latest_status(verbose=verbose)
  File "/root/.datadog-agent/agent/checks/check_status.py", line 281, in print_latest_status
    message = module_status.render()
  File "/root/.datadog-agent/agent/checks/check_status.py", line 179, in render
    ] + ["", ""]
  File "/root/.datadog-agent/agent/checks/check_status.py", line 545, in body_lines
    ' ' + '-' * (len(cs.name) + 3 + len(cs.check_version))
TypeError: object of type 'NoneType' has no len()

Investigating a bit more, we found that this command:

./bin/agent check redis

returns ImportError: No module named redis

so we installed the redis client through pip:

pip install redis

but nothing changed.

Are we missing something?

[mongo] Make call to database_names() optional

For the mongo agent, it does the following:
dbnames = cli.database_names()

This results in a call to listDatabases, causing an almost complete global lock of the database server; it gets more severe with the number of databases/collections you have, as it basically freezes each database with a lock while it calculates the sizes that listDatabases returns by default. It does this roughly every 15-20 seconds from what I can tell. On my servers, depending on load, this can result in an almost indefinite load spike, and normal MongoDB activity slows to a crawl while every connection waits on locks to clear.

For a smaller MongoDB server / replset, it might not even be noticeable, but if you watch it on a server with hundreds of databases, some large, and many collections, you should be able to reproduce what I'm seeing.

Newer MongoDB versions (3.2.13 / 3.4.3) support passing { nameOnly : true } to the listDatabases call, but pyMongo would also have to be updated to account for this.

The short-term solution I'd like to ask for is to make the logic in the Datadog mongo monitor that counts databases and gathers their basic statistics optional. For backwards compatibility, it would default to collecting that information, but there could be an option in mongo.yaml to skip the metrics that depend on this call.

For now, I've had to uninstall the mongo monitor due to the additional load it placed on our MongoDB servers. I'd love to try it again once this is resolved.
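
Gating the call behind a new option would keep the default behavior while letting heavy deployments opt out; a sketch (collect_database_stats is a hypothetical option name):

def maybe_collect_db_stats(check, cli, instance, tags):
    if not instance.get('collect_database_stats', True):
        return  # skip the listDatabases-backed metrics entirely
    dbnames = cli.database_names()  # this is the expensive call
    check.gauge('mongodb.dbs', len(dbnames), tags=tags)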
