Comments (13)
Hi @JoshVanL,
I hope you are doing well.
It looks like your suggestion fixed my issue. Everything is holding up so far.
Thank you again for your time.
Kind regards,
Regis
from dapr.
I am experiencing the same behavior with dapr 1.12.4.
The sidecar randomly crash-loops because of a Sentry error.
Logs from my application sidecar:
time="2024-01-26T09:06:30.396537009Z" level=fatal msg="Fatal error from runtime: failed to retrieve the initial identity certificate: error from sentry SignCertificate: rpc error: code = PermissionDenied desc = failed to get configuration" app_id=core-stations-actor instance=core-stations-actor-7845dd8dc7-l2ch5 scope=dapr.runtime type=log ver=1.12.4
Logs from Sentry:
time="2024-01-26T09:07:31.054534097Z" level=error msg="Failed to get configuration \"\": Configuration.dapr.io \"\" not found" instance=dapr-sentry-5f54cfcbcc-qmtjz scope=dapr.sentry.identity.kubernetes type=log ver=1.12.4
Other applications with a Dapr sidecar start correctly.
Restarting or redeploying the app doesn't help.
Thank you for any feedback on how to resolve this issue.
Updating the title, as it's indeed also present on 1.12.4 ;)
cc @JoshVanL
Hi @steve-heslouin, would you be able to share logs from Sentry around this time? Do they include any more information?
@da991319 Do you have the annotation dapr.io/config: "" set on your application? If no Configuration is needed, this annotation should be omitted.
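For reference, this is roughly what the annotation in question looks like on a Deployment's pod template. A minimal sketch, assuming a hypothetical app called `myapp`; the empty `dapr.io/config` value is what would make Sentry try to look up a Configuration named `""`:

```yaml
# Sketch of a Deployment pod template with Dapr annotations.
# The app name "myapp" is hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "myapp"
        # Remove the following line entirely if no Configuration is needed;
        # an empty value asks Sentry to resolve a Configuration named "":
        # dapr.io/config: ""
```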
Thanks for checking out this issue ;)
For the logs, sure, I should have them; I can try to dig them up on Monday ;)
Hi @JoshVanL,
Thanks for checking this out.
There is indeed this annotation, but it is not needed. I will try omitting it.
Thank you for the suggestion.
Here are the Sentry logs for that day. I noticed that there are usually only a few log lines from this pod each day, but on this day in particular there were many more than usual, and I spotted a few errors:
2024-01-22 02:32:17.001 time="2024-01-22T01:32:17.000533965Z" level=info msg="Renewing workload cert; current cert expires on: 2024-01-22 08:48:47 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 02:32:17.001 time="2024-01-22T01:32:17.001731734Z" level=info msg="Successfully renewed workload cert; new cert expires on: 2024-01-23 01:32:17 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.366 time="2024-01-22T02:26:44.365946279Z" level=info msg="Starting Dapr Sentry certificate authority -- version 1.12.4 -- commit e51e5cd3b1af27aef81daaf42530c232bf2b1594" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.366 time="2024-01-22T02:26:44.366023805Z" level=info msg="Log level set to: info" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385 time="2024-01-22T02:26:44.384735799Z" level=info msg="metrics server started on :9090/" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385 time="2024-01-22T02:26:44.385028197Z" level=info msg="Starting watch on filesystem directory: /var/run/secrets/dapr.io/credentials" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385 time="2024-01-22T02:26:44.385078373Z" level=info msg="Healthz server is listening on :8080" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.385 time="2024-01-22T02:26:44.385128248Z" level=info msg="Using kubernetes secret store for trust bundle storage" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.ca type=log ver=1.12.4
2024-01-22 03:26:44.401 time="2024-01-22T02:26:44.401155771Z" level=info msg="Root and issuer certs found: using existing certs" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.ca type=log ver=1.12.4
2024-01-22 03:26:44.401 time="2024-01-22T02:26:44.401181851Z" level=info msg="CA certificate key pair ready" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439 time="2024-01-22T02:26:44.438527459Z" level=info msg="Adding validator 'kubernetes' with Sentry ID: spiffe://cluster.local/ns/dapr-system/dapr-sentry" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439 time="2024-01-22T02:26:44.438616282Z" level=info msg="Using validator 'kubernetes'" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry type=log ver=1.12.4
2024-01-22 03:26:44.439 time="2024-01-22T02:26:44.439292689Z" level=info msg="Security is initialized successfully" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.439 time="2024-01-22T02:26:44.43933457Z" level=info msg="Starting workload cert expiry watcher; current cert expires on: 2024-01-23 02:26:44 +0000 UTC, renewing at 2024-01-22 19:10:14 +0000 UTC" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.runtime.security type=log ver=1.12.4
2024-01-22 03:26:44.439 time="2024-01-22T02:26:44.439451983Z" level=info msg="Running gRPC server on port 50001" instance=dapr-sentry-5b66cbdcd5-6479t scope=dapr.sentry.server type=log ver=1.12.4
2024-01-22 04:13:43.367 W0122 03:13:43.352751 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.405 W0122 03:13:43.373613 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1alpha1.Configuration ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.490 W0122 03:13:43.490470 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:44.789 W0122 03:13:44.787931 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.789 E0122 03:13:44.787976 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.797 W0122 03:13:44.790812 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:44.797 E0122 03:13:44.790839 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.Configuration: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:45.072 W0122 03:13:45.066822 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:45.072 E0122 03:13:45.066854 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:46.111 time="2024-01-22T03:13:46.048705984Z" level=info msg="Received signal 'terminated'; beginning shutdown" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.signals type=log ver=1.12.4
2024-01-22 04:13:46.111 time="2024-01-22T03:13:46.105623728Z" level=info msg="Healthz server is shutting down" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry type=log ver=1.12.4
2024-01-22 04:13:46.111 time="2024-01-22T03:13:46.107148833Z" level=info msg="Shutting down gRPC server" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry.server type=log ver=1.12.4
2024-01-22 04:13:46.131 time="2024-01-22T03:13:46.130488773Z" level=info msg="Sentry shut down gracefully" instance=dapr-sentry-5b66cbdcd5-lhtts scope=dapr.sentry type=log ver=1.12.4
Thanks for your help on this :)
Thanks @steve-heslouin, it looks like Sentry lost its connection to the kube API server for some reason. It needs this connection to perform authorization on incoming certificate requests from Pods, and to look up the Configuration for those Pods to determine the trust domain that identity is in.
This could be an intermittent networking issue between Sentry and the API server; however, it is perhaps more likely that Sentry is being throttled/rate limited by the API server and is not handling this well. I'm assuming Sentry restarted here and eventually became fully operational?
Was there much Dapr Pod churn in the cluster when this occurred? Do you have the operator watchdog controller running in the cluster to help "kick" those Pods which are not being injected? Would you also be able to run Sentry in debug log mode, as this might reveal a bit more info for us: --set dapr_sentry.logLevel=debug?
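If it helps, here is a sketch of how that flag could be applied via Helm, assuming Dapr was installed with the official `dapr/dapr` chart as a release named `dapr` in the `dapr-system` namespace (all of those names are assumptions about your setup):

```shell
# Enable debug logging for Sentry; release name and namespace are assumptions.
helm upgrade dapr dapr/dapr \
  --namespace dapr-system \
  --reuse-values \
  --set dapr_sentry.logLevel=debug

# Then tail the Sentry logs (the label selector is an assumption; adjust to
# whatever labels your Sentry pods actually carry):
kubectl logs -n dapr-system -l app=dapr-sentry -f
```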
@JoshVanL
If I look at the time the sidecar failed to be attached, I see 2024-01-22T02:26:31:
time="2024-01-22T02:26:31.660675775Z" level=error msg="Sidecar injector failed to inject
while those errors on Sentry happened around 4:13 AM, so they are probably not related to this issue, at least. Wdyt?
2024-01-22 04:13:43.367 W0122 03:13:43.352751 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.405 W0122 03:13:43.373613 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1alpha1.Configuration ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:43.490 W0122 03:13:43.490470 1 reflector.go:347] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2024-01-22 04:13:44.789 W0122 03:13:44.787931 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.789 E0122 03:13:44.787976 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
2024-01-22 04:13:44.797 W0122 03:13:44.790812 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:44.797 E0122 03:13:44.790839 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1alpha1.Configuration: failed to list *v1alpha1.Configuration: Unauthorized
2024-01-22 04:13:45.072 W0122 03:13:45.066822 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: Unauthorized
2024-01-22 04:13:45.072 E0122 03:13:45.066854 1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
This does look like the k8s API server is closing a connection and perhaps timing out. Can you check the API server logs from EKS for the time that correlates with these errors?
This issue, or one similar to it, brought down an entire cluster after a Kubernetes upgrade today. On AKS, after updating from 1.26.x to 1.27.7, ALL ReplicaSets (and Deployments) were failing with an error similar to:
Error creating: admission webhook "sidecar-injector.dapr.io" denied the request: error from sentry SignCertificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.212.137:443: connect: connection refused"
To get deployments to come online I had to remove the sidecar-injector.dapr.io mutating webhook from the cluster entirely, allow all deployments to come back online, and then re-add the webhook. Yaron, I'm happy to provide additional logs or troubleshooting if it helps you triage this issue.
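The recovery steps above could be sketched roughly as follows. The resource name `dapr-sidecar-injector` is an assumption about how the webhook configuration is named in your cluster, so list first to confirm:

```shell
# Confirm the actual MutatingWebhookConfiguration name first:
kubectl get mutatingwebhookconfigurations

# Back up the webhook configuration before deleting it
# ("dapr-sidecar-injector" is an assumed name):
kubectl get mutatingwebhookconfiguration dapr-sidecar-injector -o yaml > webhook-backup.yaml
kubectl delete mutatingwebhookconfiguration dapr-sidecar-injector

# ...wait for deployments to come back online, then restore it:
kubectl apply -f webhook-backup.yaml
```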
Thanks Avo, this looks to be a different issue that we're now tracking here: #7479.