1st Party / Distributed Matchmaker support #660

Closed
markmandel opened this issue Mar 17, 2019 · 21 comments

@markmandel
Collaborator

markmandel commented Mar 17, 2019

Objective

To be able to support matchmaking systems that require a gameserver to register itself with the matchmaker -- such as first party matchmakers.

Background

Many matchmakers that console makers and other companies provide as hosted services have a workflow in which:

  1. A gameserver will register itself with the matchmaker when ready (usually for a specific period of time), with its IP and port(s) for gameplay
  2. The matchmaker will choose a gameserver out of the pool of those that have registered themselves
  3. The matchmaker will communicate to the game server that the matchmaker has chosen it for gameplay
  4. Players will then play on the game server.

This would be an alternative flow to FleetAllocation or GameServerAllocation, which will remain the preferred method of GameServer allocation, so that Agones can retain fine control of scheduling within the cluster, but since this is quite a prevalent workflow, Agones should also support it, with appropriate documentation on the tradeoffs.

Requirements and Scale

The design and implementation must ideally have no potential race conditions, and actively prevent the user from incurring race conditions in their usage as well.

Design Ideas

To support this within Agones, we will need to add three enhancements:

  1. A new Reserved GameServer state
  2. A new SDK function Reserve(seconds)
  3. A new SDK function Allocate()

Reserved GameServer State

Reserved state is to signify that the GameServer cannot be deleted, as it may move to allocated in a given time frame. Therefore:

  1. When scaling down a Fleet, Reserved instances will not be deleted
  2. When autoscaling a fleet, they will be counted towards the current buffer, and therefore a change of state to Reserved will not incur an increase in GameServers in the Fleet.

This will mean that if a GameServer is not demarcated for a game session by the matchmaker, it can move back to Ready in a timely manner, and is able to be scaled down as needed.
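
As a minimal sketch of the two rules above: the types and function names below are hypothetical stand-ins for illustration, not Agones' controller code. It assumes "counted towards the current buffer" means Reserved instances are counted alongside Ready ones.

```go
package main

import "fmt"

// State is a simplified, hypothetical stand-in for a GameServer's state.
type State string

const (
	Ready     State = "Ready"
	Reserved  State = "Reserved"
	Allocated State = "Allocated"
)

// GameServer is a hypothetical, minimal record used only for this sketch.
type GameServer struct {
	Name  string
	State State
}

// scaleDownCandidates returns up to n GameServers that may be deleted.
// Only Ready instances are eligible; Reserved and Allocated are skipped.
func scaleDownCandidates(fleet []GameServer, n int) []GameServer {
	var out []GameServer
	for _, gs := range fleet {
		if len(out) == n {
			break
		}
		if gs.State == Ready {
			out = append(out, gs)
		}
	}
	return out
}

// bufferCount counts the GameServers that make up the autoscaler buffer.
// Assumption: Reserved counts alongside Ready, so a move from Ready to
// Reserved does not cause the Fleet to grow.
func bufferCount(fleet []GameServer) int {
	count := 0
	for _, gs := range fleet {
		if gs.State == Ready || gs.State == Reserved {
			count++
		}
	}
	return count
}

func main() {
	fleet := []GameServer{
		{"gs-1", Ready}, {"gs-2", Reserved}, {"gs-3", Allocated}, {"gs-4", Ready},
	}
	// Asking for 3 deletions yields only the 2 Ready instances; the buffer is 3.
	fmt.Println(len(scaleDownCandidates(fleet, 3)), bufferCount(fleet))
}
```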

SDK Function: Reserve(seconds)

This new SDK function, Reserve, will set the GameServer record to the Reserved state for the given number of seconds (0 indicating forever). When that time period has ended, the GameServer shall revert back to Ready.

It is assumed that when working with a matchmaker, the developer will mark the GameServer as Reserved for slightly longer than it is registered with the matchmaker, so as to avoid scale down race conditions.

Technical Details

Rather than implementing this with a queue, this should be a synchronous call to the Kubernetes API, with in-built retry and a timeout (30s) on failure. Otherwise there is potential for race conditions between calling the SDK function, and the GameServer being moved to Reserved state.

SDK Function: Allocate()

This new SDK function allows a game server to mark itself as Allocated when called.

Technical Details

Rather than implementing this with a queue, this should be a synchronous call to the Kubernetes API, with in-built retry and a timeout (30s) on failure. Otherwise there is potential for race conditions between calling the SDK function, and the GameServer being moved to Allocated state.

SDK Function: Ready()

As currently exists, Ready() should return the GameServer to a ready state, but also remove any timeout that may be in place from a Reserve(n).

Proposed Matchmaker Workflow

The following would be the workflow for a game server process as it is integrated with Agones and a 1st party matchmaker.

sequence diagram

In this workflow, there is no requirement for a GameServer to mark themselves as Ready - they can Reserve(n) themselves as soon as they are about to register themselves with the matchmaker.

@startuml
GameServer -> "Agones SDK": Reserve(n)
note right: n is longer than the known\nReservation time with the\nmatchmaker
GameServer -> "MatchMaker SDK": Register()
GameServer <- "MatchMaker SDK": GameStarting()
GameServer -> "Agones SDK": Allocate()
GameServer -> "MatchMaker SDK": Confirm()
@enduml
@markmandel markmandel added kind/feature New features for Agones kind/design Proposal discussing new features / fixes and how they should be implemented area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc labels Mar 17, 2019
@markmandel
Collaborator Author

/cc @victor-prodan pretty sure this is what you wanted a long time ago? (e.g. #297)

@markmandel
Collaborator Author

Staring at this, I realised that game servers could move straight to Reserve(n) rather than Ready, as there is no real reason to go to Ready first. Updating design.

/cc @joeholley does the above gel with what we had originally discussed?

@ilkercelikyilmaz
Contributor

This new SDK function, Reserve, will set the GameServer record to the Reserved state for the given number of seconds (0 indicating forever). When that time period has ended, the GameServer shall revert back to Ready.

What happens after the GS returns to Ready state (after timeout)? Should it try to set itself to Reserve state and then Register with MM?

@victor-prodan
Contributor

victor-prodan commented Mar 18, 2019

/cc @victor-prodan pretty sure this is what you wanted a long time ago? (e.g. #297)

Yep, sure this 👍
I don't understand the purpose of reserving for a limited amount of time though... :-\

@markmandel
Collaborator Author

What happens after the GS returns to Ready state (after timeout)? Should it try to set itself to Reserve state and then Register with MM?

That's up to the game server. But yes, I expect it will re-register, possibly after a short period, to give an opportunity to scale down if there aren't any players currently playing.

I don't understand the purpose of reserving for a limited amount of time though... :-\

It's my understanding that, for many matchmakers, a gameserver will register itself with the matchmaker for a time period - i.e. "I'm available to play a game on, for the next 5 minutes". Once that time has passed, the matchmaker no longer has it available as an option, unless it re-registers. This happens so that a fleet can scale down as needed if fewer people than anticipated are playing a game.

@cyriltovena
Collaborator

It's my understanding that, for many matchmakers, a gameserver will register itself with the matchmaker for a time period - i.e. "I'm available to play a game on, for the next 5 minutes". Once that time has passed, the matchmaker no longer has it available as an option, unless it re-registers. This happens so that a fleet can scale down as needed if fewer people than anticipated are playing a game.

Matchmaker could also delete those unused gameservers via the k8s API.

@victor-prodan
Contributor

Matchmaker could also delete those unused gameservers via the k8s API.

Even simpler - just as it sends GameStarting event, it could send a 'Shutdown' event as well.

@markmandel
Collaborator Author

Yeah - it is dependent on the matchmaker. Some people have control over their matchmaker, and some do not. But it sounds like the above design would work for all these different requirements?

@victor-prodan
Contributor

I would change Reserve(n) into Reserve() and Unreserve() (or other equivalent names), because it's more flexible and allows both the type of implementation when we know the reservation time in advance and also the kind in which the matchmaker tells a server to shut down.

@theminecoder

I actually think having both might be worthwhile. Reserve(n) would auto timeout while Reserve() would not. Unreserve() would force it into its unreserved state, removing the timeout if it exists.

@markmandel
Collaborator Author

markmandel commented Mar 25, 2019

I would change Reserve(n) into Reserve() and Unreserve() (or other equivalent names), because it's more flexible and allows both the type of implementation when we know the reservation time in advance and also the kind in which the matchmaker tells a server to shut down.

Not all languages have function overloading, but those that do can do Reserve() and Reserve(n), those that don't could do Reserve(n) and Reserve(0) (where 0 is forever).

Unreserve we already have 😄 it's called Ready() - just need to make some tests to make sure this logic path works, and make any adjustments as need be (like removing the timeout if it exists).

So I think then that the above design should work out well. I added a section on Ready() to indicate that it should remove the timeout, and make explicit its requirements.

@markmandel
Collaborator Author

Starting work with a PR to implement SDK.Allocate()

WIP: https://github.com/markmandel/agones/tree/feature/sdk-allocate

@markmandel markmandel added this to the 0.10.0 milestone Apr 3, 2019
@markmandel markmandel self-assigned this Apr 14, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 14, 2019
Now GameServers can self Allocate!

This is just the implementation of the GO SDK at this stage (although the gRPC
libraries have been regenerated). Other languages can come in later PRs.

This is the first part of 1st Party MatchMaking support (googleforgames#660)
@markmandel
Collaborator Author

I just completed SDK.Allocate(), but it looks like when rebasing against master, gRPC regeneration is kinda broken 😢 - a combo of the C++ changes and the new transition to Go modules.

@Kuqd looks like your work in #630 fixes this -- can you confirm? If so, I'll wait on that to go through (looks like it's close, just need to resolve some conflicts mostly).

/cc @dsazonoff for visibility as well.

@cyriltovena
Collaborator

yes it does !

markmandel added a commit to markmandel/agones that referenced this issue Apr 15, 2019
Now GameServers can self Allocate!

This is just the implementation of the GO SDK at this stage (although
the gRPC libraries have been regenerated). Other languages can come in
later PRs.

This is the first part of 1st Party MatchMaking support (googleforgames#660)
markmandel added a commit to markmandel/agones that referenced this issue Apr 17, 2019
Now GameServers can self Allocate!

This is just the implementation of the GO SDK at this stage (although
the gRPC libraries have been regenerated). Other languages can come in
later PRs.

This is the first part of 1st Party MatchMaking support (googleforgames#660)
markmandel added a commit to markmandel/agones that referenced this issue Apr 26, 2019
Now GameServers can self Allocate!

This is just the implementation of the GO SDK at this stage (although
the gRPC libraries have been regenerated). Other languages can come in
later PRs.

This is the first part of 1st Party MatchMaking support (googleforgames#660)
markmandel added a commit that referenced this issue Apr 26, 2019
Now GameServers can self Allocate!

This is just the implementation of the GO SDK at this stage (although
the gRPC libraries have been regenerated). Other languages can come in
later PRs.

This is the first part of 1st Party MatchMaking support (#660)
@markmandel markmandel removed this from the 0.10.0 milestone May 7, 2019
@markmandel markmandel added this to the 0.12.0 milestone Jun 18, 2019
markmandel added a commit to markmandel/agones that referenced this issue Jun 21, 2019
The proto file definition for the reserve status in googleforgames#660,
and the generated clients from there.
@markmandel
Collaborator Author

I was digging into this more. The design and the original version of Allocate (and eventually Reserve) made it a synchronous operation with a 30 second timeout - the idea being to stop race conditions.

Looking at the code, I don't think this is a good idea. They should be async like Ready and Shutdown. Having some SDK functions that change status.state values be sync and some be async (a) gives us an inconsistent interface for the SDK and (b) will actually cause the race conditions that I was previously trying to remove.

I'll shift the current implementation of Allocate over to using the queue, like the other implementations, targeting the next release. The API surface will stay the same though.

This also means it doesn't matter what state change you are trying to implement - you can use the watch command to see when the final change occurs - so you only need one logical path to implement.
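
A rough sketch of that shape, with a queue feeding a single worker and a watch stream reporting each applied change. Everything here (`sidecar`, `Request`, `waitFor`, the channel sizes) is hypothetical and only models the idea; it is not how the Agones SDK server is written.

```go
package main

import "fmt"

type State string

// sidecar is a hypothetical model: requests are queued, one worker applies
// them in order, and every applied change is published on the watch stream.
type sidecar struct {
	queue chan State
	watch chan State
}

func newSidecar() *sidecar {
	s := &sidecar{queue: make(chan State, 16), watch: make(chan State, 16)}
	go func() {
		for st := range s.queue {
			// A real system would update the GameServer record here.
			s.watch <- st
		}
		close(s.watch)
	}()
	return s
}

// Request enqueues a state change and returns without waiting for it to be
// applied (as long as the queue has room).
func (s *sidecar) Request(st State) { s.queue <- st }

// Close stops accepting requests; the worker drains the queue and then
// closes the watch stream.
func (s *sidecar) Close() { close(s.queue) }

// waitFor is the single logical path for any state change: read the watch
// stream until the wanted state shows up or the stream ends.
func waitFor(watch <-chan State, want State) bool {
	for st := range watch {
		if st == want {
			return true
		}
	}
	return false
}

func main() {
	s := newSidecar()
	s.Request("Reserved")
	s.Request("Allocated")
	s.Close()
	fmt.Println(waitFor(s.watch, "Allocated"))
}
```

Because every request goes through the same ordered queue, the caller does not need a separate code path per SDK function; it only watches for the state it expects.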

Please let me know if anyone has objections / that doesn't make sense.

markmandel added a commit that referenced this issue Jul 2, 2019
The proto file definition for the reserve status in #660,
and the generated clients from there.
@markmandel
Collaborator Author

Remaining work (I have done it, but am slowly feeding it through the PR queue):

  • E2E tests for Reserve
  • Reserve GameServer lifecycle diagram
  • Update GameServer state diagram to include Reserve

markmandel added a commit to markmandel/agones that referenced this issue Jul 18, 2019
Shows how the workflow can exist for 1st party/distributed matchmakers.

Work on googleforgames#660
markmandel added a commit to markmandel/agones that referenced this issue Jul 18, 2019
Pushes a new simple-udp version (0.13)

Work on googleforgames#660
roberthbailey pushed a commit that referenced this issue Jul 19, 2019
Pushes a new simple-udp version (0.13)

Work on #660
markmandel added a commit to markmandel/agones that referenced this issue Jul 20, 2019
Shows how the workflow can exist for 1st party/distributed matchmakers.

Work on googleforgames#660
roberthbailey pushed a commit that referenced this issue Jul 20, 2019
Shows how the workflow can exist for 1st party/distributed matchmakers.

Work on #660
markmandel added a commit to markmandel/agones that referenced this issue Jul 22, 2019
Update dot and generated PNG with the Reserved state, and resultant
flow from there.

Tried to keep it as simple as possible, while still representing
potential state changes.

Should be final work on googleforgames#660 except for missing SDK functions.
markmandel added a commit to markmandel/agones that referenced this issue Jul 23, 2019
Update dot and generated PNG with the Reserved state, and resultant
flow from there.

Tried to keep it as simple as possible, while still representing
potential state changes.

Should be final work on googleforgames#660 except for missing SDK functions.
roberthbailey pushed a commit that referenced this issue Jul 23, 2019
Update dot and generated PNG with the Reserved state, and resultant
flow from there.

Tried to keep it as simple as possible, while still representing
potential state changes.

Should be final work on #660 except for missing SDK functions.
@markmandel
Collaborator Author

So the base functionality of this is now complete.

The remaining item is support for all the languages/engines. How do we feel about closing this ticket? Should we wait for the SDK functionality to be finished, or track the missing pieces in a separate ticket (which we already have in #927)?

Thoughts?

@roberthbailey
Member

I'm ok tracking the remaining work in #927 (and it's broader than this issue).

@markmandel
Collaborator Author

I'm ok tracking the remaining work in #927 (and it's broader than this issue).

Agreed. I will close this issue, and we can track the rest on #927 👍
