GRPC health check? #522

Closed
nigelellis opened this issue Jan 28, 2022 · 17 comments

Comments

@nigelellis

I'm setting up the cache in EKS (AWS) with an ALB and need to define a gRPC health check. Similar to the HTTP endpoint /status, is there a gRPC path and success code I can use to check overall gRPC endpoint health?

I've tried the following, but the health checks are still failing:

      "alb.ingress.kubernetes.io/healthcheck-path"         = "/grpc.health.v1.Health/Check",
      "alb.ingress.kubernetes.io/healthcheck-port"         = "9092",
      "alb.ingress.kubernetes.io/success-codes"            = "0",

I'm guessing the path is incorrect but don't know what the path and success code would be for the Bazel cache proto.

Thanks.

@nigelellis
Author

Update -- I tried /build.bazel.remote.execution.v2.Capabilities/GetCapabilities but it's still not working.

@mostynb
Collaborator

mostynb commented Jan 28, 2022

That's what I was going to suggest. Does bazel-remote log anything when the health check fires? If you see a GETCAPABILITIES log line, then the request was made successfully. Maybe Kubernetes requires this specific health check service?

@mostynb
Collaborator

mostynb commented Jan 28, 2022

I wonder if something like this works? #523

I'm not sure what service string should be specified in the SetServingStatus call, and this probably needs a tweak to work with the authentication interceptor. Would you be able to test this?

@nigelellis
Author

nigelellis commented Jan 28, 2022

@mostynb I can definitely test and experiment with different configs. We run other gRPC services (mostly Node), which all expose a default health check on /grpc.health.v1.Health/Check:

alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/backend-protocol-version: GRPC
alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
alb.ingress.kubernetes.io/healthcheck-port: 9092
alb.ingress.kubernetes.io/listen-ports: [{"HTTPS": 9092}]
alb.ingress.kubernetes.io/success-codes: 0
alb.ingress.kubernetes.io/target-type: ip

To your earlier question, I don't see any log entries when the ALB checks /GetCapabilities. I just see a silent error in the AWS console.
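
For context on the `success-codes: 0` annotation above: with `backend-protocol-version: GRPC`, the ALB compares the gRPC status code of the health check reply (not an HTTP status) against that value. Here is a minimal Go sketch of that comparison, assuming the matcher accepts single codes, comma-separated lists, and ranges such as `0-5`. `matchesSuccessCodes` is a hypothetical helper for illustration, not part of the AWS controller or bazel-remote.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matchesSuccessCodes reports whether a gRPC status code is accepted by a
// matcher string such as "0", "0,12", or "0-5" (assumed syntax).
// It returns an error if it reaches a part that is not a number or range.
func matchesSuccessCodes(spec string, code int) (bool, error) {
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		bounds := strings.SplitN(part, "-", 2)

		lo, err := strconv.Atoi(strings.TrimSpace(bounds[0]))
		if err != nil {
			return false, fmt.Errorf("bad success code %q: %w", part, err)
		}
		hi := lo // a single code is treated as a one-element range
		if len(bounds) == 2 {
			hi, err = strconv.Atoi(strings.TrimSpace(bounds[1]))
			if err != nil {
				return false, fmt.Errorf("bad success code range %q: %w", part, err)
			}
		}
		if lo <= code && code <= hi {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// 0 = OK, 12 = UNIMPLEMENTED, 16 = UNAUTHENTICATED (standard gRPC codes).
	for _, code := range []int{0, 12, 16} {
		ok, err := matchesSuccessCodes("0", code)
		fmt.Println(code, ok, err)
	}
}
```

Code 0 is OK, 12 is UNIMPLEMENTED (what a server typically returns for a service it does not register), and 16 is UNAUTHENTICATED, so a matcher of `0` rejects both of those failure cases.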

@mostynb
Collaborator

mostynb commented Jan 28, 2022

Should I use /grpc.health.v1.Health/Check instead of bazel-remote in the SetServingStatus call?
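
On the service string question: in the gRPC health checking protocol, the string passed to SetServingStatus is the value of the `service` field in the health check request, which is separate from the request path (the path is fixed by health.proto). By convention, an empty service name stands for the server as a whole, and a name that was never registered yields a NOT_FOUND status. The Go sketch below is a simplified, stdlib-only model of that lookup, not grpc-go's implementation; `healthRegistry` and its methods are hypothetical names.

```go
package main

import "fmt"

// servingStatus mirrors the first three values of the health.proto enum.
type servingStatus int

const (
	unknown servingStatus = iota
	serving
	notServing
)

// gRPC status codes used in this sketch.
const (
	codeOK       = 0
	codeNotFound = 5
)

// healthRegistry maps a service name to its serving status.
type healthRegistry struct {
	statuses map[string]servingStatus
}

// newHealthRegistry registers the empty name (the whole server) as serving.
// This default is an assumption of the sketch.
func newHealthRegistry() *healthRegistry {
	return &healthRegistry{statuses: map[string]servingStatus{"": serving}}
}

func (r *healthRegistry) setServingStatus(service string, s servingStatus) {
	r.statuses[service] = s
}

// check returns the serving status and the gRPC status code of the reply.
// A registered service replies with code OK even when it is notServing;
// only an unregistered name produces NOT_FOUND.
func (r *healthRegistry) check(service string) (servingStatus, int) {
	s, ok := r.statuses[service]
	if !ok {
		return unknown, codeNotFound
	}
	return s, codeOK
}

func main() {
	r := newHealthRegistry()
	r.setServingStatus("bazel-remote", serving)
	for _, name := range []string{"", "bazel-remote", "no-such-service"} {
		s, code := r.check(name)
		fmt.Printf("service=%q status=%d code=%d\n", name, s, code)
	}
}
```

Under this model, a checker that sends an empty service name gets a reply regardless of which named services were registered, so the choice of service string matters mainly for clients that ask for a specific name.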

@nigelellis
Author

nigelellis commented Jan 28, 2022

@mostynb the path doesn't matter as long as we know what it is, but /grpc.health.v1.Health/Check appears to be used as a pseudo-standard. It corresponds to the service and method defined in https://github.com/grpc/grpc/blob/master/src/proto/grpc/health/v1/health.proto.

When setting up the ALB health check, we specify the path to use and configure it via a Kubernetes Ingress rule annotation. See https://kubernetes-sigs.github.io/aws-load-balancer-controller/v1.1/guide/ingress/annotation/#healthcheck-path

Here's an example from one of our Node gRPC services. Our services expose https://github.com/grpc/grpc/blob/master/src/proto/grpc/health/v1/health.proto on /grpc.health.v1.Health/Check.
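
To make the path mapping concrete: gRPC sends each call as an HTTP/2 request whose path is `/<package>.<Service>/<Method>`, which is why the health proto ends up at /grpc.health.v1.Health/Check. A small Go sketch of building and splitting such paths follows; `fullMethodPath` and `splitMethodPath` are hypothetical helper names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fullMethodPath builds the request path for a gRPC method from the proto
// package, service, and method names.
func fullMethodPath(pkg, service, method string) string {
	if pkg == "" {
		return "/" + service + "/" + method
	}
	return "/" + pkg + "." + service + "/" + method
}

// splitMethodPath is the inverse: it splits a path into the fully qualified
// service name and the method name. ok is false if the path does not have
// the "/service/method" shape.
func splitMethodPath(path string) (service, method string, ok bool) {
	if !strings.HasPrefix(path, "/") {
		return "", "", false
	}
	rest := path[1:]
	i := strings.LastIndex(rest, "/")
	if i <= 0 || i == len(rest)-1 {
		return "", "", false
	}
	return rest[:i], rest[i+1:], true
}

func main() {
	fmt.Println(fullMethodPath("grpc.health.v1", "Health", "Check"))
	fmt.Println(fullMethodPath("build.bazel.remote.execution.v2", "Capabilities", "GetCapabilities"))

	// A bare method name is not a valid gRPC path.
	_, _, ok := splitMethodPath("/GetCapabilities")
	fmt.Println(ok)
}
```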

Thanks.

[Screenshot: Screen Shot 2022-01-28 at 1 47 58 PM]

@mostynb
Collaborator

mostynb commented Jan 28, 2022

Updated the PR (not tested locally yet).

@nigelellis
Author

Hi @mostynb, anything I can do to help test?

@mostynb
Collaborator

mostynb commented Feb 4, 2022

@nigelellis
Author

Thanks, I'll make time to check it out this week.

@nigelellis
Author

Hi @mostynb, I tested your Docker image today and got things working with authentication disabled. We use password authentication (--htpasswd_file=...). From my testing, it appears the gRPC health service requires authentication (presumably also when using CA auth). With auth disabled, the endpoint came up and worked fine.

For it to work with the ALB checks it needs to bypass auth. This would follow the same path that the /status HTTP endpoint does -- no auth. Assuming you can make that change, we should be good to roll! Thanks.
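
The bypass described above can be pictured as an allowlist consulted before the credential check. The Go sketch below is only an illustration of that idea under my own assumptions, not bazel-remote's actual interceptor; `unauthenticatedMethods` and `allowCall` are hypothetical names. In grpc-go, a server interceptor sees the method in this same `/service/method` form.

```go
package main

import "fmt"

// unauthenticatedMethods is a hypothetical allowlist of full gRPC method
// names that may be called without credentials.
var unauthenticatedMethods = map[string]bool{
	"/grpc.health.v1.Health/Check": true,
}

// allowCall decides whether a call may proceed. Allowlisted methods always
// pass; everything else requires valid credentials.
func allowCall(fullMethod string, hasValidCredentials bool) bool {
	if unauthenticatedMethods[fullMethod] {
		return true
	}
	return hasValidCredentials
}

func main() {
	// The load balancer probes without credentials.
	fmt.Println(allowCall("/grpc.health.v1.Health/Check", false))
	// Other methods are still protected.
	fmt.Println(allowCall("/build.bazel.remote.execution.v2.Capabilities/GetCapabilities", false))
}
```

Keeping the allowlist to the single Check method means the rest of the cache API stays behind authentication, in the same spirit as the unauthenticated /status HTTP endpoint.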

FYI -- I also found this tool useful for local testing: https://github.com/grpc-ecosystem/grpc-health-probe

@mostynb
Collaborator

mostynb commented Feb 8, 2022

Thanks for the update. This might already work with --allow_unauthenticated_reads, but I guess it should always be accessible. I'll try to get that working.

@mostynb
Collaborator

mostynb commented Feb 20, 2022

I landed #523 and pushed a new "latest" image to dockerhub (quay.io is read-only for maintenance work right now, I'll update that later). The health check service should not require authentication now. Please let me know if there are any problems.

mostynb closed this as completed Feb 20, 2022

@nigelellis
Author

@mostynb -- I verified the image buchgr/bazel-remote-cache:latest is working with gRPC. The ALB was able to probe the gRPC health endpoints without auth, making the ALB very happy :)

Would you mind pushing the image with a semver tag so I can pin the version? Very much appreciate your help in getting this support in.

@mostynb
Collaborator

mostynb commented Feb 22, 2022

I plan to make a new bazel-remote release once #527 and bazelbuild/remote-apis#213 land (hopefully soon).

@mostynb
Collaborator

mostynb commented Feb 23, 2022

Released v2.3.4.

@nigelellis
Author

Awesome, thank you @mostynb
