

antoinepouille commented on July 30, 2024


Thanks for reaching out about this issue! It seems we are already investigating this in one of our support tickets, so we will continue the investigation there and get back to you through that ticket.


from docker-dd-agent.

poswald commented on July 30, 2024

Based on your documentation I'm trying to configure the certificates.

This error message is quite misleading because this seems to be the real error:

2018-02-20 03:02:16 UTC | ERROR | dd.collector | collector( | Failed to initialize kubelet connection. Will retry 0 time(s). Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)

But that's not the one you end up showing to the user.

Doing something like this from the agent's pod gives a warning about self-signed certificates...

root@dd-agent-5rp7j:/# curl -v  --cacert /host/etc/kubernetes/cert/kubelet.pem

I think that's basically the same as the error message being reported by the Python requests library.

I have no idea what pem file I could pass to it to make it happy.

antoinepouille commented on July 30, 2024

This may be caused by the way Bluemix signs certificates: they may not be signed in a way that allows the agent's pod to access the kubelet.
We will have to reproduce the setup and check the validity of the certificates there. Please bear with us while we do so. If we can confirm this, we will then contact them.

We will continue to update you about this in the support ticket we have been using.

irabinovitch commented on July 30, 2024

We've shared this issue with the IBM Cloud team. Hopefully they'll chime in soon as well.

xvello commented on July 30, 2024

If running inside a pod, the kubelet's certificate is validated against the cluster root CA, mounted in /var/run/secrets/ Could you try to validate this with curl?

irabinovitch commented on July 30, 2024

@poswald Did @xvello 's suggestion work?

JulienBalestra commented on July 30, 2024

Thanks for your patience while we look into this!

The problem can come from different sources:

  • the kubelet configuration
  • the Python requests library currently bundled with the agent.

The CA under the pod's /run/secrets/ is provided by the --root-ca-file flag of the hyperkube controller-manager.

It's not mandatory for the kubelet to use certificates from the same PKI as the controller-manager.

The kubelet can be started with different configurations:

1. TLS:
Set the kubelet's --tls-cert-file and --tls-private-key-file flags to certificates issued by any PKI.
If that PKI is the same one as the --root-ca-file, everything is fine.
Otherwise, you must provide an additional certificate authority to establish an HTTPS connection with the kubelet.

2. Self-signed:
If neither --tls-cert-file nor --tls-private-key-file is set, the kubelet generates self-signed certificates in the --cert-dir (/var/lib/kubelet/pki by default).
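The two cases above can be summarised as a small decision function. This is only a sketch of the reasoning in this comment, not code from the kubelet or the agent: `KubeletFlags` and `serving_ca_to_trust` are hypothetical names, and the `kubelet.crt` file name assumes the kubelet's default naming for its self-signed certificate.

```python
from dataclasses import dataclass
from typing import Optional

# Default value of the kubelet --cert-dir flag, as stated above.
DEFAULT_CERT_DIR = "/var/lib/kubelet/pki"


@dataclass
class KubeletFlags:
    """Hypothetical holder for the kubelet flags discussed above."""
    tls_cert_file: Optional[str] = None
    tls_private_key_file: Optional[str] = None
    cert_dir: str = DEFAULT_CERT_DIR


def serving_ca_to_trust(flags: KubeletFlags, same_pki_as_root_ca: bool) -> str:
    """Return which CA a pod client needs in order to verify the kubelet over HTTPS."""
    has_cert = flags.tls_cert_file is not None
    has_key = flags.tls_private_key_file is not None
    if has_cert and has_key:
        # Case 1 (TLS): the serving certificate comes from whatever PKI issued it.
        return "root-ca-file" if same_pki_as_root_ca else "additional-ca"
    if not has_cert and not has_key:
        # Case 2 (self-signed): the kubelet generates its own certificate in --cert-dir,
        # so the client must trust that certificate itself.
        return f"self-signed:{flags.cert_dir}/kubelet.crt"
    # Only one of the two flags set: treated here as a misconfiguration.
    raise ValueError("--tls-cert-file and --tls-private-key-file must be set together")
```

For example, `serving_ca_to_trust(KubeletFlags(), same_pki_as_root_ca=True)` points at the self-signed certificate, because the root CA is irrelevant when the kubelet signs its own serving certificate.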

A note on Kubernetes CSRs:
Setting the kubelet --bootstrap-kubeconfig flag makes it request a client certificate from the API server and store it in the --cert-dir.
The kubelet submits a CSR and, once it is approved, the certificate is issued by the controller-manager. The controller-manager signs it with the certificate and key given by --cluster-signing-cert-file and --cluster-signing-key-file, which can differ from the --root-ca-file.

This is only a side note, as it covers only the communication between the kubelet client and the API server, not between the kubelet server API (e.g. /pods) and the pod clients.

To check whether your configuration correctly matches one of those two cases, you can run:

curl -v --cacert /run/secrets/ \
  https://${status.nodeIP}:10250 \
  -H "Authorization: Bearer $(< /run/secrets/"

If this doesn't work, the kubelet configuration most likely does not allow access from the agent pod.
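For reference, the same check can be expressed with Python's standard library. This is a sketch, not part of the agent: `build_kubelet_request` and `open_with_ca` are hypothetical helpers, `/pods` is assumed as the default path because it is the endpoint in the agent's errors, and the token and CA file paths are left to the caller since they are truncated in this thread.

```python
import ssl
import urllib.request


def build_kubelet_request(node_ip: str, token: str, port: int = 10250,
                          path: str = "/pods") -> urllib.request.Request:
    """Build the request the curl check sends: kubelet HTTPS API plus a bearer token."""
    url = f"https://{node_ip}:{port}{path}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def open_with_ca(req: urllib.request.Request, cafile: str, timeout: float = 5.0):
    """Send the request, trusting only the CA in cafile (like curl --cacert)."""
    ctx = ssl.create_default_context(cafile=cafile)
    return urllib.request.urlopen(req, context=ctx, timeout=timeout)
```

Usage would look like `open_with_ca(build_kubelet_request(node_ip, token), ca_path)`, where the node IP comes from the downward API and the token and CA come from the pod's service account mount.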

Aside from these configuration matters, during our investigation we stumbled upon an issue with Python requests version 2.11.1:

import requests

verify = "/path/to/issuing_ca"  # CA bundle used to verify the kubelet certificate
r = requests.get("", verify=verify, cert=None)

will produce the following stack trace:

requests.exceptions.SSLError: HTTPSConnectionPool(host='', port=10250): Max retries exceeded with url: /pods (Caused by SSLError(CertificateError("hostname '' doesn't match either of 'e2e', '', '', ''",),))

(The CA has CN=e2e and X509v3 Subject Alternative Name: DNS:e2e.)

Upgrading the library solves this issue, so we will at least work on providing that fix. This could actually be part of the issue you are encountering.
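To see why the hostname check matters here, below is a simplified sketch of how a client matches the host it connected to against a certificate's Subject Alternative Names. It is an illustration only, not the code of requests or urllib3; `san_matches` is a hypothetical name, and it covers just exact DNS names, single-label wildcards and IP SANs. When the kubelet is reached by IP address, only the certificate's IP SANs can satisfy a correct check; Go's `x509: ... doesn't contain any IP SANs` error applies the same rule.

```python
import ipaddress
from typing import Iterable


def _is_ip(host: str) -> bool:
    """True if host is an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def san_matches(host: str, dns_sans: Iterable[str], ip_sans: Iterable[str]) -> bool:
    """Simplified check of a host name or IP against a certificate's SAN entries."""
    if _is_ip(host):
        # IP literals are compared only against IP SANs, never against DNS SANs.
        target = ipaddress.ip_address(host)
        return any(ipaddress.ip_address(ip) == target for ip in ip_sans)
    host = host.lower().rstrip(".")
    for pattern in dns_sans:
        pattern = pattern.lower().rstrip(".")
        if pattern.startswith("*."):
            # A wildcard stands for exactly one leftmost label.
            label, _, rest = host.partition(".")
            if label and rest and rest == pattern[2:]:
                return True
        elif pattern == host:
            return True
    return False
```

With a certificate like the one above (DNS:e2e plus some IP SANs), connecting by IP succeeds only if that exact address is listed among the IP SANs.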

We will keep looking into this and keep you updated.

antoinepouille commented on July 30, 2024

After investigating the different certificates provided by Bluemix clusters, it seems the certificate you need to access the kubelet is located on the node at /var/lib/kubelet/pki/kubelet.crt. It is a self-signed certificate (case 2 of the previous message), so you will have to mount it into the agent pod so the agent can verify the kubelet.
You can check that it's the correct one by running on the host curl --cacert /var/lib/kubelet/pki/kubelet.crt: you should get an "unauthorized" response, which shows the certificate is accepted.
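The reasoning behind that check (an "unauthorized" reply proves the TLS handshake succeeded) can be scripted. A minimal sketch with Python's standard library, assuming the caller supplies the kubelet URL (it is truncated in this thread); `probe_kubelet` and `interpret_kubelet_probe` are hypothetical helpers, and treating 403 like 401 is an assumption.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def interpret_kubelet_probe(status: Optional[int], tls_error: bool) -> str:
    """Map the outcome of an HTTPS probe of the kubelet to a diagnosis."""
    if tls_error:
        return "rejected: certificate not trusted or hostname mismatch"
    if status in (401, 403):
        return "accepted: certificate OK, request not authorized"
    if status is not None and 200 <= status < 300:
        return "accepted: certificate OK, request authorized"
    return "inconclusive"


def probe_kubelet(url: str, cafile: str, timeout: float = 5.0) -> str:
    """Probe the kubelet like `curl --cacert cafile URL` and diagnose the result."""
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return interpret_kubelet_probe(resp.status, tls_error=False)
    except urllib.error.HTTPError as err:
        # An HTTP error status means the TLS handshake already succeeded.
        return interpret_kubelet_probe(err.code, tls_error=False)
    except urllib.error.URLError as err:
        # Certificate failures arrive wrapped in URLError with an ssl.SSLError reason.
        return interpret_kubelet_probe(None, tls_error=isinstance(err.reason, ssl.SSLError))
```

On the host, `probe_kubelet(kubelet_url, "/var/lib/kubelet/pki/kubelet.crt")` would be expected to report "accepted: certificate OK, request not authorized" when the certificate is the right one.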

However, testing this with the agent fails because of the already mentioned issue with the requests library bundled with the agent: it mishandles the Subject Alternative Name that has to be used here. We are currently working on updating this library in the agent to fix this issue.
Hopefully, this will finally allow this setup to work. Don't hesitate to ask if you have further questions regarding this matter!

poswald commented on July 30, 2024

To be honest, I haven't tried it: I had basically given up, as I just didn't have enough visibility into how the IBM servers were set up to figure it out. Thank you for following up on this.

I'll give this another shot.

I noticed there is a Helm chart, so perhaps I'll give that a go as well, although I suspect I'll have to use my own hand-written YAML file to get the host mounts. You might want to make sure the chart knows how to mount the certs into the agent pod as well.

I'll keep an eye out for a release that closes this issue.

antoinepouille commented on July 30, 2024

Indeed, even if you move to the Helm chart, you would need to specify the mount for the kubelet certificate manually. This should be fairly standard configuration, so please ask if you need help with that.
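As an illustration of that manual mount, here is a sketch that builds the two pod-spec fragments involved: a `hostPath` volume for the node's /var/lib/kubelet/pki directory and a matching read-only `volumeMount`. The volume name `kubelet-pki` and the container path `/etc/kubelet-pki` are hypothetical choices, and pointing the agent at the certificate under that path is a separate step this sketch does not cover.

```python
import json


def kubelet_cert_mount(host_dir: str = "/var/lib/kubelet/pki",
                       mount_path: str = "/etc/kubelet-pki",  # hypothetical container path
                       name: str = "kubelet-pki") -> dict:     # hypothetical volume name
    """Return matching volumes/volumeMounts entries for the kubelet certificate directory."""
    volume = {"name": name, "hostPath": {"path": host_dir}}
    # Read-only: the agent only needs to read the certificate.
    mount = {"name": name, "mountPath": mount_path, "readOnly": True}
    return {"volumes": [volume], "volumeMounts": [mount]}


if __name__ == "__main__":
    # JSON output is also valid YAML, so it can be pasted into a manifest or values file.
    print(json.dumps(kubelet_cert_mount(), indent=2))
```

The shared `name` is what ties the mount to the volume; both entries must use the same value.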

Actually, given that requests issue, I would advise you to wait for the next agent release, which will update the library, before trying again, as it is bound to fail until then.

irabinovitch commented on July 30, 2024

@antoinepouille Were you referring to 6.1?

JulienBalestra commented on July 30, 2024

@irabinovitch The datadog/agent:6.1.0 release solved the SSL issue with the upgrade of python requests (2.18.4).

antoinepouille commented on July 30, 2024

@poswald Agent 6.1.0 is now out. Could you test out the kubelet check with the new details we provided you? Please tell us if you still encounter issues then.

mhulscher commented on July 30, 2024

I have the same issue. I just deployed agent 6.1.0 but the check breaks on not being able to verify the secure port's CA cert. This makes sense because the kubelet's server certificate is self-signed. Is there an option (envvar) I can set to ignore CA verification?

[ AGENT ] 2018-03-27 12:34:15 UTC | ERROR | (autoconfig.go:446 in collect) | Unable to collect configurations from provider Kubernetes pod annotation: temporary failure in kubeutil, will retry later: cannot connect: https: "Get x509: cannot validate certificate for because it doesn't contain any IP SANs", http: "Get dial tcp getsockopt: connection refused"

antoinepouille commented on July 30, 2024

@mhulscher Are you using IBM Cloud? The certificate setup differs depending on your Kubernetes distribution. You should be able to disable it using this option, but we are working on a bug related to that option, so it may not work until the next version of the agent.
If you still have issues with that, please open up a case with our support where we will be able to help you, using your logs and config.


antoinepouille commented on July 30, 2024

@poswald Closing the issue since it should now be all set. Don't hesitate to reach back if you need more help!

msvechla commented on July 30, 2024

fyi: This is still an issue, however we are in contact with IBM Cloud Container specialists to resolve this. A fix will be deployed this week to enable webhook authentication to kubelet.

JulienBalestra commented on July 30, 2024

@msvechla thanks for the update, let us know how it goes 👍

irabinovitch commented on July 30, 2024

@msvechla Do you have a ticket # or other details we could follow up with IBM on?

kerberos5 commented on July 30, 2024

I had the same problem and resolved it.
I solved the certificate issue by mounting the certificates into the agent pod, with this in values.yaml:
agents:
  # agents.volumes -- Specify additional volumes to mount in the dd-agent container
  volumes:
    - hostPath:
        path: <HOST_PATH>
      name: <VOLUME_NAME>
    - hostPath:
        path: /home/docker/.minikube
      name: cert

  # agents.volumeMounts -- Specify additional volumes to mount in the dd-agent container
  volumeMounts:
    - name: <VOLUME_NAME>
      mountPath: <CONTAINER_PATH>
      readOnly: true
    - name: cert
      mountPath: /opt/datadog-agent/cert
      readOnly: true

I added these environment variables:

# agents.containers.agent.env -- Additional environment variables for the agent container
value: "false"
value: "/opt/datadog-agent/cert/ca.crt"
value: "/opt/datadog-agent/cert/client.crt"
value: "/opt/datadog-agent/cert/client.key"
fieldPath: status.hostIP
fieldPath: spec.nodeName

Finally, I recreated the minikube cluster listening on port 6443.

Now I receive these errors in the agent log:

kube_controller_manager (1.7.0)
  Instance ID: kube_controller_manager:1476f03dc31e9882 [ERROR]
  Configuration Source: file:/etc/datadog-agent/conf.d/kube_controller_manager.d/auto_conf.yaml
  Total Runs: 378
  Metric Samples: Last Run: 0, Total: 0
  Events: Last Run: 0, Total: 0
  Service Checks: Last Run: 1, Total: 378
  Average Execution Time : 8ms
  Last Execution Date : 2020-10-27 13:34:23.000000 UTC
  Last Successful Execution Date : Never
  Error: HTTPConnectionPool(host='', port=10252): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacf220>: Failed to establish a new connection: [Errno 111] Connection refused'))
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 159, in _new_conn
      conn = connection.create_connection(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 84, in create_connection
      raise err
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 74, in create_connection
  ConnectionRefusedError: [Errno 111] Connection refused
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 670, in urlopen
      httplib_response = self._make_request(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 392, in _make_request
      conn.request(method, url, **httplib_request_kw)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1255, in request
      self._send_request(method, url, body, headers, encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1301, in _send_request
      self.endheaders(body, encode_chunked=encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1250, in endheaders
      self._send_output(message_body, encode_chunked=encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1010, in _send_output
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 950, in send
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 187, in connect
      conn = self._new_conn()
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 171, in _new_conn
      raise NewConnectionError(
  urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f902bacf220>: Failed to establish a new connection: [Errno 111] Connection refused
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 439, in send
      resp = conn.urlopen(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 726, in urlopen
      retries = retries.increment(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 439, in increment
      raise MaxRetryError(_pool, url, error or ResponseError(cause))
  urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='', port=10252): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacf220>: Failed to establish a new connection: [Errno 111] Connection refused'))
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/", line 828, in run
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kube_controller_manager/", line 148, in check
      self.process(scraper_config, metric_transformers=transformers)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 507, in process
      for metric in self.scrape_metrics(scraper_config):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 447, in scrape_metrics
      response = self.poll(scraper_config)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 713, in poll
      response = self.send_request(endpoint, scraper_config, headers)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 739, in send_request
      return http_handler.get(endpoint, stream=True, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/", line 283, in get
      return self._request('get', url, options)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/", line 332, in _request
      return getattr(requests, method)(url, **new_options)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 75, in get
      return request('get', url, params=params, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 60, in request
      return session.request(method=method, url=url, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 533, in request
      resp = self.send(prep, **send_kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 646, in send
      r = adapter.send(request, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 516, in send
      raise ConnectionError(e, request=request)
  requests.exceptions.ConnectionError: HTTPConnectionPool(host='', port=10252): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacf220>: Failed to establish a new connection: [Errno 111] Connection refused'))

kube_scheduler (1.5.0)
  Instance ID: kube_scheduler:f948e5430c3c100b [ERROR]
  Configuration Source: file:/etc/datadog-agent/conf.d/kube_scheduler.d/auto_conf.yaml
  Total Runs: 378
  Metric Samples: Last Run: 0, Total: 0
  Events: Last Run: 0, Total: 0
  Service Checks: Last Run: 1, Total: 378
  Average Execution Time : 7ms
  Last Execution Date : 2020-10-27 13:34:30.000000 UTC
  Last Successful Execution Date : Never
  Error: HTTPConnectionPool(host='', port=10251): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacfe80>: Failed to establish a new connection: [Errno 111] Connection refused'))
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 159, in _new_conn
      conn = connection.create_connection(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 84, in create_connection
      raise err
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 74, in create_connection
  ConnectionRefusedError: [Errno 111] Connection refused
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 670, in urlopen
      httplib_response = self._make_request(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 392, in _make_request
      conn.request(method, url, **httplib_request_kw)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1255, in request
      self._send_request(method, url, body, headers, encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1301, in _send_request
      self.endheaders(body, encode_chunked=encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1250, in endheaders
      self._send_output(message_body, encode_chunked=encode_chunked)
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 1010, in _send_output
    File "/opt/datadog-agent/embedded/lib/python3.8/http/", line 950, in send
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 187, in connect
      conn = self._new_conn()
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 171, in _new_conn
      raise NewConnectionError(
  urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f902bacfe80>: Failed to establish a new connection: [Errno 111] Connection refused
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 439, in send
      resp = conn.urlopen(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/", line 726, in urlopen
      retries = retries.increment(
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/", line 439, in increment
      raise MaxRetryError(_pool, url, error or ResponseError(cause))
  urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='', port=10251): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacfe80>: Failed to establish a new connection: [Errno 111] Connection refused'))
  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/", line 828, in run
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kube_scheduler/", line 139, in check
      self.process(scraper_config, metric_transformers=transformers)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 507, in process
      for metric in self.scrape_metrics(scraper_config):
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 447, in scrape_metrics
      response = self.poll(scraper_config)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 713, in poll
      response = self.send_request(endpoint, scraper_config, headers)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/", line 739, in send_request
      return http_handler.get(endpoint, stream=True, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/", line 283, in get
      return self._request('get', url, options)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/", line 332, in _request
      return getattr(requests, method)(url, **new_options)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 75, in get
      return request('get', url, params=params, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 60, in request
      return session.request(method=method, url=url, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 533, in request
      resp = self.send(prep, **send_kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 646, in send
      r = adapter.send(request, **kwargs)
    File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/", line 516, in send
      raise ConnectionError(e, request=request)
  requests.exceptions.ConnectionError: HTTPConnectionPool(host='', port=10251): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f902bacfe80>: Failed to establish a new connection: [Errno 111] Connection refused'))

Is it normal that it tries to connect to ports other than 10250?
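The checks in the log above target port 10252 (kube_controller_manager) and port 10251 (kube_scheduler), and `[Errno 111] Connection refused` means nothing accepted a TCP connection there. A minimal sketch for confirming that from inside the agent pod, assuming you pass in the host the checks use (it is blanked out in the log); `port_is_listening` is a hypothetical helper.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (Errno 111), timeouts and unreachable hosts all land here.
        return False
```

For example, looping over `(10250, 10251, 10252)` with the node's address shows which of the three endpoints actually accept connections.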

omrishilton commented on July 30, 2024

@kerberos5 I am also receiving the exact same errors, did you end up solving this issue?
