Provide Health Checks for external Systems #2687
Comments
I think while the current probes which simply check whether It would be great if this is done. Can I help?
We feel this will be resolved by #2815; additional testing will determine if that PR covers everything we need. @c16a, thank you so much for the offer to help - much appreciated! I think we have this covered. There are plenty of issues open for contributors; don't hesitate to reach out if you find one that interests you.
Do we have a health check endpoint that reports the health of the whole cluster? I currently have a NATS cluster with n servers in it. My understanding is that /healthz reports the health of an individual NATS server, not of the entire cluster.
I would suggest using the NATS CLI. You need system account access.
Feature Request
A number of external systems could utilize introspection into the readiness and liveness of the NATS server, such as K8s and others (see #1903). This will provide much better UX for K8s users and reduce errors on startup, resource issues, and loss of quorum.
Suggestions for Discussion
/healthz?current-cluster-size=N
/healthz?current-cluster-size=N&quorum=true
/healthz
/healthz?isCandidate=false
/healthz
/healthz?js-enabled=true
*Not sure if readiness failures would prevent cluster traffic (TBD).
Liveness (JetStream) would fail if the JetStream subsystem has been shut down due to a lack of resources, an unavailable PVC, etc.
The endpoints would return 200 if successful.
Startup, Liveness, and Readiness probes would significantly help during startup and could reduce time to problem resolution (especially the Liveness probe when resources are constrained in K8s).
This may not be correct but I hope to spur discussion and am looking for community feedback in this area.
CC @nats-io/core @wallyqs @ripienaar