Support multi containers health checks #2054

Closed
jie-bao opened this issue Apr 9, 2021 · 12 comments
Labels
kind/feature New features for Agones stale Pending closure unless there is a strong objection. wontfix Sorry, but we're not going to do that.

Comments

@jie-bao
Contributor

jie-bao commented Apr 9, 2021

Is your feature request related to a problem? Please describe.
I have a GameServer that contains two Unreal servers (two containers) in the same pod. Each of them is necessary for running the game, and neither can know the other's health status. With the current Agones design, a call to the Client SDK health API from either container marks the GameServer healthy, regardless of whether the other one is healthy.

Describe the solution you'd like
Ideally, we want Agones to support health checks for multiple containers.

Describe alternatives you've considered
Currently, I use a sidecar that collects the health status of both containers and then reports it to Agones. I did the same thing for the Ready API.
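
For readers who want a concrete picture of the sidecar approach described above, here is a minimal sketch of the aggregation logic. It is not the author's implementation: `HealthAggregator`, `Forward`, the `at` helper, and the container names `main` and `ai` are all hypothetical, and the `report` callback only stands in for whatever call the sidecar makes to the Agones SDK.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// HealthAggregator is a hypothetical sidecar component. It remembers the
// most recent health ping from each expected container and treats the
// group as healthy only when every container has pinged recently.
type HealthAggregator struct {
	mu       sync.Mutex
	timeout  time.Duration
	lastPing map[string]time.Time
}

// NewHealthAggregator registers the containers that must all stay healthy.
func NewHealthAggregator(timeoutSeconds int, containers ...string) *HealthAggregator {
	last := make(map[string]time.Time, len(containers))
	for _, c := range containers {
		last[c] = time.Time{} // zero value: no ping received yet
	}
	return &HealthAggregator{
		timeout:  time.Duration(timeoutSeconds) * time.Second,
		lastPing: last,
	}
}

// Ping records a health ping from one container. Unknown names are rejected.
func (a *HealthAggregator) Ping(container string, now time.Time) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, ok := a.lastPing[container]; !ok {
		return fmt.Errorf("unknown container %q", container)
	}
	a.lastPing[container] = now
	return nil
}

// AllHealthy reports whether every container has pinged within the timeout.
func (a *HealthAggregator) AllHealthy(now time.Time) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	for _, t := range a.lastPing {
		if t.IsZero() || now.Sub(t) > a.timeout {
			return false
		}
	}
	return true
}

// Forward calls report (a stand-in for the health call to Agones) only
// when all containers are healthy. It returns whether report was called.
func (a *HealthAggregator) Forward(now time.Time, report func() error) (bool, error) {
	if !a.AllHealthy(now) {
		return false, nil
	}
	return true, report()
}

// at builds a fixed timestamp so the example is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	agg := NewHealthAggregator(5, "main", "ai")
	_ = agg.Ping("main", at(100))
	fmt.Println(agg.AllHealthy(at(101))) // false: "ai" has never pinged
	_ = agg.Ping("ai", at(101))
	fmt.Println(agg.AllHealthy(at(102))) // true: both pinged within 5s
	fmt.Println(agg.AllHealthy(at(106))) // false: "main" last pinged 6s ago
}
```

In a real sidecar, `Forward` would presumably run on a timer shorter than the GameServer's health period, so that a single silent container is enough to stop the pings that reach Agones.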

@jie-bao jie-bao added the kind/feature New features for Agones label Apr 9, 2021
@roberthbailey
Member

Can you say a bit more about how the two containers are related? I'm curious how it is that they must live in the same pod (be colocated) but that they don't have a strong enough dependency for either one to know if the other is unhealthy.

@markmandel
Member

Ooh! Interesting! I'm always intrigued to hear about how people are deconstructing the dedicated games server monolith!

I am wondering one thing though: Given the SDK is accessible on localhost, is there a reason you couldn't have both containers send health pings, with appropriate timing such that the GameServer moves to unhealthy if one fails?

@roberthbailey
Member

That might work for health checks but if Ready shouldn't be triggered until both are ready, then we can't do that today. And my brain goes straight into how we could design this, but I'll hold off on inserting those comments until it's appropriate. :)

@jie-bao
Contributor Author

jie-bao commented Apr 12, 2021

> Can you say a bit more about how the two containers are related? I'm curious how it is that they must live in the same pod (be colocated) but that they don't have a strong enough dependency for either one to know if the other is unhealthy.

We have two UE servers handling different workloads: one dedicated to AI and one for the rest of the main logic. They serve clients through something like a proxy, and neither knows the other's status. In addition, the "proxy" can't tell whether a server is alive, because it can't distinguish a normal disconnection from one caused by a failure.

@jie-bao
Contributor Author

jie-bao commented Apr 12, 2021

> That might work for health checks but if Ready shouldn't be triggered until both are ready, then we can't do that today. And my brain goes straight into how we could design this, but I'll hold off on inserting those comments until it's appropriate. :)

I think Ready is easy. We know that two containers need to be ready, so we can just wait for two Ready calls from different containers.
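
The "wait for two Ready calls" idea can be sketched as a small gate that fires once every expected container has reported in. This is only an illustration under assumptions: `ReadyGate`, `MarkReady`, and the container names are hypothetical, and `onReady` stands in for the real Ready call to Agones.

```go
package main

import (
	"fmt"
	"sync"
)

// ReadyGate is a hypothetical helper that waits for a Ready call from
// every expected container before forwarding a single Ready upstream.
type ReadyGate struct {
	mu      sync.Mutex
	ready   map[string]bool // container name -> has reported ready
	fired   bool            // true once onReady has succeeded
	onReady func() error    // stand-in for the Ready call to Agones
}

// NewReadyGate registers the containers that must all report ready.
func NewReadyGate(onReady func() error, containers ...string) *ReadyGate {
	ready := make(map[string]bool, len(containers))
	for _, c := range containers {
		ready[c] = false
	}
	return &ReadyGate{ready: ready, onReady: onReady}
}

// MarkReady records that a container is ready. It returns true only for
// the call that completes the set and successfully triggers onReady.
// Repeated calls from the same container are counted once.
func (g *ReadyGate) MarkReady(container string) (bool, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, known := g.ready[container]; !known {
		return false, fmt.Errorf("unknown container %q", container)
	}
	g.ready[container] = true
	if g.fired {
		return false, nil
	}
	for _, r := range g.ready {
		if !r {
			return false, nil // still waiting for another container
		}
	}
	// onReady runs while the lock is held, which keeps the sketch simple.
	// If it fails, fired stays false so a later call can retry.
	if err := g.onReady(); err != nil {
		return false, err
	}
	g.fired = true
	return true, nil
}

func main() {
	gate := NewReadyGate(func() error {
		fmt.Println("forwarding Ready to Agones")
		return nil
	}, "main", "ai")

	fired, _ := gate.MarkReady("main")
	fmt.Println(fired) // false: still waiting for "ai"
	fired, _ = gate.MarkReady("ai")
	fmt.Println(fired) // true: both containers reported ready
}
```

Keying the gate on container names rather than counting calls means that a container calling Ready twice does not satisfy the gate on its own.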

@markmandel
Member

> That might work for health checks but if Ready shouldn't be triggered until both are ready, then we can't do that today. And my brain goes straight into how we could design this, but I'll hold off on inserting those comments until it's appropriate. :)

Yeah, I wasn't worrying about Ready, since that's not a ticketed issue at this stage 🙂

This is one of those situations where I know we could work out a fix, but I'm also wondering if we should ??? 🤔

Or at least - should we in core Agones? Having some open source sidecars to solve these kinds of generic problems might be a more flexible option, at least until we see more dominant patterns of multi-game-server usage.

@jie-bao
Contributor Author

jie-bao commented Apr 14, 2021

I'm fine with the sidecar solution. But I wonder whether Agones has something like a "plugin" model that can integrate custom functions more seamlessly, so that game developers don't need to call different APIs for related functions.

In my situation, the UE server calls my sidecar's Ready/Health API, while it still calls the watchGameServer API through the Agones Client SDK. That means it needs to know two different service endpoints, which makes the game code a bit more complicated.

@markmandel
Member

> But I wonder whether Agones has something like a "plugin" model that can integrate the customized functions more seamlessly, so that game developers don't need to call different APIs for relative functions.

Can you expand further on what you mean by this -- what would this look like in your mind?

@jie-bao
Contributor Author

jie-bao commented Apr 15, 2021

Ideally, game developers shouldn't need to know too much about infrastructure. Using only the Agones SDK is fine, but using the Agones SDK together with a custom sidecar is a bit fussy, especially since we need to explain why they should call the Ready/Health API provided by the sidecar and must not call the Ready/Health API in the Agones SDK.

So I think it would help if Agones provided a way to redirect or proxy some API calls to another container, or allowed custom logic to be injected into those APIs. But I don't have a clear idea of how to implement it.

@markmandel
Member

Sounds like a better solution may be rolling your own SDK for your specific platform, one that quite potentially wraps around the Agones SDK and overrides only the pieces you need.

Then you can redirect what you need where you need it to go?

I also can't think of a good plugin mechanism for the SDK 🤔 Although you still have full access to the Agones/Kubernetes API - which may also be an interesting way to go.
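
As a rough illustration of the wrapping idea, the sketch below embeds an upstream SDK and overrides only Ready and Health so they go to a sidecar instead. Everything here is an assumption for illustration: `GameSDK` is a reduced, hypothetical interface and not the real Agones SDK type, and `recorder` is a fake backend used only to show where each call lands.

```go
package main

import "fmt"

// GameSDK is a hypothetical, reduced view of the calls a game server
// makes. It is not the real Agones SDK interface.
type GameSDK interface {
	Ready() error
	Health() error
	Shutdown() error
}

// Aggregator is the hypothetical sidecar API that should receive the
// Ready and Health calls instead of Agones.
type Aggregator interface {
	Ready() error
	Health() error
}

// recorder is a fake backend that remembers which calls it received.
type recorder struct {
	calls []string
}

func (r *recorder) Ready() error    { r.calls = append(r.calls, "Ready"); return nil }
func (r *recorder) Health() error   { r.calls = append(r.calls, "Health"); return nil }
func (r *recorder) Shutdown() error { r.calls = append(r.calls, "Shutdown"); return nil }

// RoutedSDK embeds the upstream SDK, so every call passes through
// unchanged unless it is overridden below.
type RoutedSDK struct {
	GameSDK            // upstream SDK: default target for all calls
	Sidecar Aggregator // receives the overridden calls
}

// Ready overrides the embedded method and goes to the sidecar.
func (s *RoutedSDK) Ready() error { return s.Sidecar.Ready() }

// Health overrides the embedded method and goes to the sidecar.
func (s *RoutedSDK) Health() error { return s.Sidecar.Health() }

func main() {
	agones := &recorder{}
	sidecar := &recorder{}

	// Game code sees a single SDK value and a single set of methods.
	var sdk GameSDK = &RoutedSDK{GameSDK: agones, Sidecar: sidecar}
	_ = sdk.Ready()
	_ = sdk.Health()
	_ = sdk.Shutdown()

	fmt.Println(sidecar.calls) // [Ready Health]
	fmt.Println(agones.calls)  // [Shutdown]
}
```

With this shape, game code depends on one interface and never needs to know that two endpoints exist behind it.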

@markmandel
Member

Marking as stale, since it sounds like this was solved by a custom code + Agones SDK solution? Please let us know if you have objections to closing in the next few weeks.

@markmandel markmandel added the stale Pending closure unless there is a strong objection. label Jul 28, 2022
@markmandel markmandel added the wontfix Sorry, but we're not going to do that. label Sep 21, 2022
@markmandel
Member

Closing, since no objections.


3 participants