petabridge / akkadotnet-healthcheck
Healthchecks for Akka.NET Applications :hospital:
License: Apache License 2.0
Need to rename all projects and the NuGet packages accordingly.
Need to validate the following via unit tests:
AkkaPersistenceLivenessProbeProvider should load from HOCON configuration when starting up an ActorSystem with it configured.
AkkaPersistenceLivenessProbe should be able to correctly handle subscriptions in any state.
AkkaPersistenceLivenessProbe should correctly report that Akka.Persistence is available when it is.
AkkaPersistenceLivenessProbe should correctly report that Akka.Persistence is NOT available when it isn't able to load at startup.
AkkaPersistenceLivenessProbe should correctly report that Akka.Persistence has become unavailable AFTER initially being available (simulate a later loss of connectivity).
We will likely need to create some custom Akka.Persistence journal and SnapshotStore implementations in order to succeed in testing these - please take a look at some of the tests we have in the
and
Need to implement Recover for LivenessStatus, with a test to make sure it is able to correctly signal when the cluster is up as well as when it is no longer reachable.
These HCs are poisonous and can prevent cluster formation. Need to be relaxed.
AkkaClusterLivenessProbe - Liveness probe for clustering.
Reports healthy when:
The ActorSystem joined a cluster.
The ActorSystem is connected to a cluster
Reports unhealthy when:
The ActorSystem just started and has not joined a cluster.
The ActorSystem left the cluster.
Rewrite to:
ClusterLivenessProbeProvider - Liveness probe for clustering.
Reports healthy when:
Reports unhealthy when:
The ActorSystem leaving the cluster.
AkkaClusterReadinessProbe - Readiness probe for clustering.
Reports healthy when:
The ActorSystem joined a cluster.
The ActorSystem is connected to a cluster
Reports unhealthy when:
The ActorSystem just started and has not joined a cluster.
All other nodes in the cluster are unreachable.
Rewrite to:
ClusterReadinessProbeProvider - Readiness probe for clustering.
Reports healthy when:
Reports unhealthy when:
All other nodes in the cluster are unreachable.
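The healthy/unhealthy rules listed above for the original AkkaClusterLivenessProbe and AkkaClusterReadinessProbe can be written down as two small predicates. This is a language-neutral sketch in Python, not the Akka.HealthCheck implementation: `MemberState`, `ClusterView`, `is_live` and `is_ready` are hypothetical names, and the three-state lifecycle is a deliberate simplification of what a real cluster reports.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MemberState(Enum):
    # Hypothetical, simplified lifecycle; not a type from Akka.Cluster.
    STARTED = auto()  # just started, has not joined a cluster
    JOINED = auto()   # joined and connected to a cluster
    LEFT = auto()     # left the cluster


@dataclass(frozen=True)
class ClusterView:
    state: MemberState
    other_nodes: int        # number of other known nodes in the cluster
    unreachable_nodes: int  # how many of those are currently unreachable


def is_live(view: ClusterView) -> bool:
    """Liveness: healthy only while the node is a member of a cluster."""
    return view.state is MemberState.JOINED


def is_ready(view: ClusterView) -> bool:
    """Readiness: live, and not cut off from every other node."""
    if not is_live(view):
        return False
    all_unreachable = (
        view.other_nodes > 0 and view.unreachable_nodes >= view.other_nodes
    )
    return not all_unreachable
```

Under these assumptions a freshly started node fails both checks until it joins, which is the behaviour the issue describes as able to prevent cluster formation when the orchestrator restarts unhealthy nodes.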
Should log this automatically without any configuration settings - just to let the end-user know in the startup logs that the system is running with one or both of these tools enabled.
I would like to use Akka.NET clustering together with health checks. Do I need Akka.HealthCheck.Cluster, or is Akka.HealthCheck enough?
Do you have an example of how to check whether the cluster is up and running and use this for monitoring?
Thank you
WebApiTemplate.App.csproj: [NU1100] Unable to resolve 'Akka.HealthCheck.Hosting.Web (>= 1.0.0)' for 'net7.0'. PackageSourceMapping is enabled, the following source(s) were not considered: nuget.****
Is the issue here how we're targeting ASP.NET?
LGTM - need to validate the nuget publication locally (or check what the build server produced.) Don't want to publish any sample projects and need to make sure that the correct `README.md` files are included.
Originally posted by @Aaronontheweb in #148 (review)
Add the ability to turn on debug logging for all built-in transports for both liveness and readiness probes.
The logs should also make it clear whether or not it's the liveness OR readiness probe writing to the transport.
It seems you can only configure a single probe for the readiness (liveness) check. Isn't it possible to run multiple readiness (liveness) checks like 'cluster readiness', 'persistence journal check', 'my custom check1', ...?
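One way to picture what is being asked for is a composite check that runs several named checks and is healthy only if all of them are. The sketch below is a generic Python illustration, not an Akka.HealthCheck feature: `run_checks` and the check names are hypothetical, and treating an exception as a failed check is an assumption.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def run_checks(checks: Dict[str, Check]) -> Tuple[bool, Dict[str, bool]]:
    """Run every named check; overall health requires all of them to pass."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # Assumption: a check that throws counts as unhealthy.
            results[name] = False
    return all(results.values()), results
```

For example, `run_checks({"cluster readiness": lambda: True, "persistence journal check": lambda: False})` reports an unhealthy overall result while still showing which individual check failed.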
The SuicideProbe class checks the persistence journal state.
It recovers the last event and snapshot from the journal.
Afterwards it writes a new event and snapshot to the journal
and deletes all old events and snapshots.
The issue is that it already sends a RecoveryStatus back on successful recovery,
without checking whether the new event or snapshot was persisted successfully.
So when the journal succeeds only at recovery (read mode) and not at persisting (write mode),
the RecoveryStatus will still always be successful.
The bottom line is that the write and delete of the new "hit" messages are
not used by the healthcheck itself and only hit one sector of the SSD every 10 seconds.
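The fix being argued for is that the reported status should depend on the write path as well as the read path. A minimal Python sketch of that idea follows; the class is hypothetical, and its four flags are modelled on the JournalRecovered, SnapshotRecovered, JournalPersisted and SnapshotSaved fields that appear in the probe's log output. Whether a missing snapshot on recovery should count as unhealthy is a policy choice this sketch does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistenceStatus:
    # Hypothetical status record; field names mirror the flags seen in the logs.
    journal_recovered: bool
    snapshot_recovered: bool
    journal_persisted: bool
    snapshot_saved: bool

    @property
    def readable(self) -> bool:
        """Read path: recovery of events and snapshots succeeded."""
        return self.journal_recovered and self.snapshot_recovered

    @property
    def writable(self) -> bool:
        """Write path: the new event and snapshot were actually stored."""
        return self.journal_persisted and self.snapshot_saved

    @property
    def healthy(self) -> bool:
        # Reporting only `readable` reproduces the bug described above.
        return self.readable and self.writable
```

With this shape, a journal that can be read but not written, `PersistenceStatus(True, True, False, False)`, is reported as unhealthy instead of passing on recovery alone.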
While analysing the blob snapshot folder in our storage account, I saw that I had a lot of Akka.HealthCheck snapshots. In yesterday's standup it was mentioned that the SuicideProbe should clean up the journal/snapshot store after its tests. This is definitely not the case:
At this moment this cluster only has 3 nodes, so old recycled nodes (pods) still have snapshots/journal records lingering around.
It would be nice to remove the snapshot/journal records used for the health probe after the probe is finished. Of course, a pod could crash during the health probe, in which case you would still get undeleted journal/snapshot records, but the chance of that happening is really low and I can live with that.
Given how this is implemented:
We never actually handle the incoming socket requests, thus the liveness probe will eventually fail given enough time. Need to actually handle the socket request and verify it by sending back some trivial piece of data.
Can you make this fix and verify it via a TCP integration test @izavala ?
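The requested behaviour, accepting each incoming connection and answering with a trivial payload, together with the kind of check a TCP integration test could make, can be sketched with plain sockets. This is a Python illustration of the idea only, not the Akka.HealthCheck transport: `serve_liveness`, `probe` and the `b"ok\n"` payload are assumptions made for the example.

```python
import socket
import threading


def serve_liveness(
    server: socket.socket, stop: threading.Event, payload: bytes = b"ok\n"
) -> None:
    """Accept probe connections and answer each one with a small payload."""
    server.settimeout(0.2)  # wake up periodically to check the stop flag
    while not stop.is_set():
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            continue
        with conn:
            # Actually handle the request so the caller sees a completed exchange.
            conn.sendall(payload)


def probe(host: str, port: int, timeout: float = 2.0) -> bytes:
    """What an integration test could do: connect and read the reply."""
    with socket.create_connection((host, port), timeout=timeout) as client:
        return client.recv(64)
```

A test would bind a listening socket to an ephemeral port, run `serve_liveness` on a background thread, and assert that `probe` returns the expected payload on repeated calls.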
Should do a dump of all of the built-in HealthCheck settings at launch, so users can troubleshoot when first configuring the probes and transports.
[INFO][2/7/2023 10:35:01 PM][Thread 0055][akka.tcp://AkkaWebApi@localhost:8081/system/healthcheck-live-persistence] Persistence probe terminated. Recreating...
[INFO][2/7/2023 10:35:11 PM][Thread 0052][akka.tcp://AkkaWebApi@localhost:8081/system/healthcheck-live-persistence] Received recovery status PersistenceLivenessStatus(JournalRecovered=True, SnapshotRecovered=True, JournalPersisted=True, SnapshotSaved=True, Failures=null) from probe.
[INFO][2/7/2023 10:35:21 PM][Thread 0054][akka.tcp://AkkaWebApi@localhost:8081/system/healthcheck-live-persistence] Received recovery status PersistenceLivenessStatus(JournalRecovered=True, SnapshotRecovered=False, JournalPersisted=True, SnapshotSaved=True, Failures=null) from probe.
[INFO][2/7/2023 10:35:31 PM][Thread 0055][akka.tcp://AkkaWebApi@localhost:8081/system/healthcheck-live-persistence] Received recovery status PersistenceLivenessStatus(JournalRecovered=True, SnapshotRecovered=True, JournalPersisted=True, SnapshotSaved=True, Failures=null) from probe.
[INFO][2/7/2023 10:35:41 PM][Thread 0010][akka.tcp://AkkaWebApi@localhost:8081/system/healthcheck-live-persistence] Received recovery status PersistenceLivenessStatus(JournalRecovered=True, SnapshotRecovered=True, JournalPersisted=True, SnapshotSaved=True, Failures=null) from probe.
This needs to go into debug logging or not get logged at all unless there's a problem.
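The policy asked for here amounts to choosing the log level from the status: routine successes go to debug, and only problems are logged at a level that shows up by default. A small Python sketch of that rule, using the standard `logging` module, is below; `log_recovery_status` is a hypothetical helper and the choice of WARNING for failures is an assumption.

```python
import logging


def log_recovery_status(logger: logging.Logger, healthy: bool, detail: str) -> int:
    """Log healthy statuses at DEBUG and unhealthy ones at WARNING."""
    level = logging.DEBUG if healthy else logging.WARNING
    logger.log(level, "Received recovery status %s from probe.", detail)
    return level  # returned so callers and tests can see which level was used
```

With a logger configured at INFO, the repeated ten-second success messages would no longer appear, while a failed status would still be visible.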