Comments (13)

da991319 commented on May 26, 2024

Hi @JoshVanL,

I hope you are doing well.
It looks like your suggestion fixed my issue; everything is holding up so far.
Thank you again for your time.

Kind regards,
Regis

da991319 commented on May 26, 2024

I am experiencing the same behavior with Dapr 1.12.4.
The sidecar randomly crash-loops because of a sentry error.

Logs from my application sidecar:
time="2024-01-26T09:06:30.396537009Z" level=fatal msg="Fatal error from runtime: failed to retrieve the initial identity certificate: error from sentry SignCertificate: rpc error: code = PermissionDenied desc = failed to get configuration" app_id=core-stations-actor instance=core-stations-actor-7845dd8dc7-l2ch5 scope=dapr.runtime type=log ver=1.12.4

Logs from the sentry:
time="2024-01-26T09:07:31.054534097Z" level=error msg="Failed to get configuration \"\": Configuration.dapr.io \"\" not found" instance=dapr-sentry-5f54cfcbcc-qmtjz scope=dapr.sentry.identity.kubernetes type=log ver=1.12.4

Other applications with a Dapr sidecar are starting correctly.
Restarting or redeploying the app doesn't help.

Thank you for any feedback on how to resolve this issue.

steve-heslouin commented on May 26, 2024

Updating the title, as it's indeed also on 1.12.4 ;)

yaron2 commented on May 26, 2024

cc @JoshVanL

JoshVanL commented on May 26, 2024

Hi @steve-heslouin, would you be able to share the sentry logs from around this time? Do they include any more information?

@da991319 Do you have dapr.io/config: "" set as an annotation on your application? If no Configuration is needed, then this annotation should be omitted.
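
For illustration, here is a minimal sketch of removing that annotation from a Deployment's pod template with kubectl. The Deployment name core-stations-actor is only an assumption taken from the app_id in the sidecar log above; adjust the name and namespace to your app.

# Hypothetical example: removes the dapr.io/config annotation from the pod template
# (~1 is the JSON Pointer escape for the "/" in dapr.io/config).
kubectl patch deployment core-stations-actor --type=json \
  -p='[{"op": "remove", "path": "/spec/template/metadata/annotations/dapr.io~1config"}]'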

steve-heslouin commented on May 26, 2024

@JoshVanL

thanks for checking out this issue ;)

For the logs, sure, I should have them; I can try to get some logs back to you on Monday ;)

da991319 commented on May 26, 2024

Hi @JoshVanL,
Thanks for checking this out.
There is indeed this annotation, but it is not needed. I will try omitting it.
Thank you for the suggestion.

steve-heslouin commented on May 26, 2024

@JoshVanL

Here are the sentry logs for that day. I noticed that there are usually only a few logs from this pod each day, but on this day in particular there were many more than usual, and I noticed a few errors:

2024-01-22 02:32:17.001	time="2024-01-22T01:32:17.000533965Z" level=info msg="Renewing workload cert; current cert expires on: 2024-01-22 08:48:47 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 02:32:17.001	time="2024-01-22T01:32:17.001731734Z" level=info msg="Successfully renewed workload cert; new cert expires on: 2024-01-23 01:32:17 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.366	time="2024-01-22T02:26:44.365946279Z" level=info msg="Starting Dapr Sentry certificate authority -- version 1.12.4 -- commit e51e5cd3b1af27aef81daaf42530c232bf2b1594" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.366	time="2024-01-22T02:26:44.366023805Z" level=info msg="Log level set to: info" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385	time="2024-01-22T02:26:44.384735799Z" level=info msg="metrics server started on :9090/" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385	time="2024-01-22T02:26:44.385028197Z" level=info msg="Starting watch on filesystem directory: /var/run/secrets/dapr.io/credentials" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385	time="2024-01-22T02:26:44.385078373Z" level=info msg="Healthz server is listening on :8080" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385	time="2024-01-22T02:26:44.385128248Z" level=info msg="Using kubernetes secret store for trust bundle storage" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.ca type=log ver=1.12.4
2024-01-22 03:26:44.401	time="2024-01-22T02:26:44.401155771Z" level=info msg="Root and issuer certs found: using existing certs" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.ca type=log ver=1.12.4
2024-01-22 03:26:44.401	time="2024-01-22T02:26:44.401181851Z" level=info msg="CA certificate key pair ready" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439	time="2024-01-22T02:26:44.438527459Z" level=info msg="Adding validator 'kubernetes' with Sentry ID: spiffe://cluster.local/ns/dapr-system/dapr-sentry" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439	time="2024-01-22T02:26:44.438616282Z" level=info msg="Using validator 'kubernetes'" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439	time="2024-01-22T02:26:44.439292689Z" level=info msg="Security is initialized successfully" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.439	time="2024-01-22T02:26:44.43933457Z" level=info msg="Starting workload cert expiry watcher; current cert expires on: 2024-01-23 02:26:44 +0000 UTC, renewing at 2024-01-22 19:10:14 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.439	time="2024-01-22T02:26:44.439451983Z" level=info msg="Running gRPC server on port 50001" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.server type=log ver=1.12.4
2024-01-22 04:13:43.367	W0122 03:13:43.352751       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.405	W0122 03:13:43.373613       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1alpha1.Configuration ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.490	W0122 03:13:43.490470       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:44.789	W0122 03:13:44.787931       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.789	E0122 03:13:44.787976       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.797	W0122 03:13:44.790812       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:44.797	E0122 03:13:44.790839       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.Configuration: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:45.072	W0122 03:13:45.066822       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:45.072	E0122 03:13:45.066854       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:46.111	time="2024-01-22T03:13:46.048705984Z" level=info msg="Received signal 'terminated'; beginning shutdown" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.signals type=log ver=1.12.4
2024-01-22 04:13:46.111	time="2024-01-22T03:13:46.105623728Z" level=info msg="Healthz server is shutting down" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry type=log ver=1.12.4
2024-01-22 04:13:46.111	time="2024-01-22T03:13:46.107148833Z" level=info msg="Shutting down gRPC server" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry.server type=log ver=1.12.4
2024-01-22 04:13:46.131	time="2024-01-22T03:13:46.130488773Z" level=info msg="Sentry shut down gracefully" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry type=log ver=1.12.4

thanks for your help on this :)

JoshVanL commented on May 26, 2024

Thanks @steve-heslouin, it looks like sentry lost its connection to the kube API server for some reason. It needs this connection to perform authorization on incoming certificate requests from Pods, and to look up the Configuration for those Pods to determine the trust domain that identity is in.

This could be an intermittent networking issue between sentry and the API server; however, it is perhaps more likely that Sentry is being throttled/rate-limited by the API server and not handling this in the best way. I'm assuming Sentry is restarted here and eventually becomes fully operational?

Was there much Dapr Pod churn in the cluster when this occurred? Do you have the operator watchdog controller running in the cluster to help "kick" those Pods which are not being injected? Would you also be able to run Sentry in debug log mode (--set dapr_sentry.logLevel=debug), as this might reveal a bit more info for us?
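
For reference, a hedged sketch of turning on sentry debug logging via Helm, assuming Dapr was installed from the dapr/dapr chart as a release named dapr in the dapr-system namespace (adjust to your install):

# Assumed release name, chart, and namespace.
helm upgrade dapr dapr/dapr --namespace dapr-system \
  --reuse-values \
  --set dapr_sentry.logLevel=debug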

steve-heslouin commented on May 26, 2024

@JoshVanL
If I take a look at the time the sidecar failed to be attached, I see 2024-01-22T02:26:31:

time="2024-01-22T02:26:31.660675775Z" level=error msg="Sidecar injector failed to inject

while those errors on sentry happened around 4:13 AM, so they are probably not related, for this issue at least. WDYT?

2024-01-22 04:13:43.367	W0122 03:13:43.352751       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.405	W0122 03:13:43.373613       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1alpha1.Configuration ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.490	W0122 03:13:43.490470       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:44.789	W0122 03:13:44.787931       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.789	E0122 03:13:44.787976       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.797	W0122 03:13:44.790812       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:44.797	E0122 03:13:44.790839       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.Configuration: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:45.072	W0122 03:13:45.066822       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:45.072	E0122 03:13:45.066854       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized

yaron2 commented on May 26, 2024

@JoshVanL If I take a look at the time the sidecar failed to be attached, I see 2024-01-22T02:26:31:

time="2024-01-22T02:26:31.660675775Z" level=error msg="Sidecar injector failed to inject

while those errors on sentry happened around 4:13 AM, so they are probably not related, for this issue at least. WDYT?

2024-01-22 04:13:43.367	W0122 03:13:43.352751       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.405	W0122 03:13:43.373613       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1alpha1.Configuration ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.490	W0122 03:13:43.490470       1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:44.789	W0122 03:13:44.787931       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.789	E0122 03:13:44.787976       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.797	W0122 03:13:44.790812       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:44.797	E0122 03:13:44.790839       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.Configuration: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:45.072	W0122 03:13:45.066822       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:45.072	E0122 03:13:45.066854       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized

This does look like the k8s API server is closing a connection and perhaps timing out. Can you check the API server logs from EKS for the time that correlates with these errors?
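
As a hedged example of pulling the kube-apiserver logs from EKS control-plane logging in CloudWatch (this assumes control-plane logging is enabled and the cluster is named my-cluster; the time window is taken from the errors above):

# Assumed cluster name and window; --start-time/--end-time are milliseconds since epoch (GNU date).
aws logs filter-log-events \
  --log-group-name /aws/eks/my-cluster/cluster \
  --log-stream-name-prefix kube-apiserver \
  --start-time $(date -d '2024-01-22T03:10:00Z' +%s000) \
  --end-time $(date -d '2024-01-22T03:20:00Z' +%s000)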

avo-sepp commented on May 26, 2024

This issue, or one similar to it, brought down an entire cluster after a Kubernetes upgrade today. On AKS, after updating from 1.26.x to 1.27.7, ALL replica sets (and deployments) were failing with an error similar to:

Error creating: admission webhook "sidecar-injector.dapr.io" denied the request: error from sentry SignCertificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.212.137:443: connect: connection refused"

To get deployments to come online I had to remove the sidecar-injector.dapr.io mutating webhook from the cluster entirely, then let all deployments come back online before re-adding the webhook. Yaron, I'm happy to provide additional logs or troubleshooting if it helps you triage this issue.
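
As a rough sketch of that workaround (the webhook configuration name can vary between Dapr versions, so list it first rather than assuming it is dapr-sidecar-injector):

# List the mutating webhook configurations and note Dapr's entry.
kubectl get mutatingwebhookconfigurations | grep -i dapr
# Delete it (name assumed here), wait for the Deployments to recover,
# then re-add it, e.g. by re-running the Dapr Helm install/upgrade for your release.
kubectl delete mutatingwebhookconfiguration dapr-sidecar-injector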

yaron2 commented on May 26, 2024

This issue, or one similar to it, brought down an entire cluster after a Kubernetes upgrade today. On AKS, after updating from 1.26.x to 1.27.7, ALL replica sets (and deployments) were failing with an error similar to:

Error creating: admission webhook "sidecar-injector.dapr.io" denied the request: error from sentry SignCertificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.212.137:443: connect: connection refused"

To get deployments to come online I had to remove the sidecar-injector.dapr.io mutating webhook from the cluster entirely, then let all deployments come back online before re-adding the webhook. Yaron, I'm happy to provide additional logs or troubleshooting if it helps you triage this issue.

Thanks Avo, this looks to be a different issue that we're now tracking here: #7479.
