Arbitrary Counts and Lists for GameServers, SDKs and Allocation #2716
Calling on people I think might find this interesting, since this is a big idea 😄: @tenevdev, @highlyunavailable, @neuecc, @castaneai, @sisso, @issotina, @foxydevloper
This is on the agenda for the community meeting tomorrow, so if you have opinions or want to discuss with real time feedback, we would love to see you there.
I just realised I didn't add a section on Fleet Autoscaling! I'll amend that shortly.
We are looking to implement Room based High Density Game Servers.
About Counters
If it attempts to decrement the counter below 0, we would like it to filter out GameServers.
We would like to filter GameServers that have a counter with the provided key.
About Lists
We would like it to filter out any GameServers that don’t have room.
Same as counters, we want it to filter by the GameServer that has the key.
About SDK
I didn’t understand this question, could you please elaborate?
About Critical User Journeys
In the Room based High Density Game Servers example, the StateAllocationFilter is not used.
Just dropped several edits to remove some questions based on the above and internal feedback. PTAL. Summary:
To see a diff, use the edit history button.
@katsew to respond to your questions directly:
As per above, you would need to explicitly tell the allocation with
I'm thinking that there might be an implicit filtering there (i.e. if you attempt to allocate on counter "foo" and it doesn't exist on the GameServer, the system would attempt to increment, fail, and then move on to another
You can choose if you want to use a list with a capacity, or just have a counter that lists how many rooms are left. It's up to you.
In that case, a list of room id tokens with a capacity seems like the appropriate choice for your use case, since you can filter on a room id within allocated game servers with this functionality.
Ah - this is an interesting point if you aren't aware of how the internals of K8s work. Essentially everything in K8s is eventually consistent, and therefore so is Agones. This allows the entire system to be self healing, even if the control plane goes down for a time. SDK commands are async (they go into a queue once the SDK command has been fired), and with this functionality it's also entirely possible for an Allocation or a K8s API command to change a list or counter value at the same time. So if people are doing both, it's entirely likely that the count/list values in the SDK will be out of sync with what's in the CRD, and vice versa, because the system is eventually consistent. For Player Tracking, we told people "pick one path, so you don't have this issue". Here we are giving people lots of different options, and we'll need to be very explicit about what each of the tradeoffs is so that unexpected issues don't arise for end users. Did that make a certain amount of sense?
OMG. I totally missed that I didn't add those. That functionality definitely would still work; I just wasn't thinking and forgot to add it 🤦🏻 Thanks for the excellent catch! Please let me know if any of that didn't make any sense.
Yes, totally made sense, thank you.
Was chatting with @roberthbailey, and he raised an interesting point. In Player Tracking, Lists were essentially treated as Sets (i.e. every value was unique in the List). If you add a Do we do that here as well with Lists (maybe rename them to Sets?), or, since we're aiming for a more generic implementation, do we allow duplicate values in a List? 🤔 Or do we need to add a setting to a List, something like What do people think?
I hadn't thought about the fact that we treated lists of players as having to be unique, but I guess we didn't expect to have the same player join a game session twice. If we are making the lists more generic, we should think about whether there are scenarios where having duplicate values makes sense. Also, what do duplicate values mean in terms of allocation requests? Presumably you would always end up checking for at least one occurrence of the string in the list.
@markmandel Hi, thanks for the mentions. This is an interesting proposal!
I am very happy to see Agones add more features to address even more use cases. Thank you!
I'm not sure if it should be in scope for this proposal, but it would be really interesting to see what that looks like - which parts are owned by Agones and which parts are split out. It might help us design a better solution that more seamlessly integrates with a solution that leverages an external RDB.
I think we should go with lists (allow duplicate values) to cover more scenarios than with sets. Related to this, I'm wondering what is the expected behavior of
Oooh, that's a good question also. I think for lists, it would have to be a single value. We could implement a
In our case, we don't want duplicate values in the list, so it would be helpful for us to have documentation on how to deduplicate values in the list at the initial release.
Yeah, that 100% makes sense - we need to make sure there is a migration path. I'm leaning towards:
lists: # list of lists.
  players: # key for this list (players)
    capacity: 100 # set capacity
    unique: true # this makes it work like a set
So have the
How does that sound?
The I have several thoughts:
Thanks for all the great discussion! Sorry I dropped off for a bit; I've been focusing on another open source project. In the community meeting, we discussed only allowing unique items in the List (so basically an ordered set) if we couldn't come up with a use case for having multiple of the same item in a list, to avoid implementing features that we didn't need. Can anyone come up with a use case? If not, maybe we just drop the ability to store duplicate values in a list (should we rename it to a
If we do go this route, totally agreed (see comment above).
We can definitely do this. I had left them off since there was lots of eventual-consistency management, but we can definitely do it with whatever the SDK knows at that point in time, from itself and/or its current information on the CRD. I can't see any huge downside to adding this, so I'll put it on my list of things to add back in.
I don't have a use case for lists, so dropping the ability to store duplicate values sounds good to me. :)
I have a question about data manipulation on allocation.
I plan to use multiple Allocator Services in a single k8s cluster for redundancy reasons.
K8s resource modifications are generationally locked - so unless the local system has the latest generation of a resource, any update is rejected, which avoids "last-update-wins" race conditions. You could, in theory, select the same GameServer in succession if, after an Allocation, it still matches the search criteria for Allocated GameServers - but that's an exercise for the developer to find the appropriate level of locking for their game.
Sorry for the long delay - I was focusing on working towards a Quilkin release, and the addition of capacity to counts was tricky. But we have updates! Would love your feedback!
Summary of updates to the design above:
Working through the changes for autoscaling led to a few changes to the design:
Summary of changes:
Questions:
Thanks for updates!
Using "capacity" across lists and counts sounds good to me :)
I'm not sure, but should we support multiple counts and lists for a GameServer? 🤔
That's an interesting question. My thought was that while it may not be used for autoscaling, it may be used for allocation filtering. It's a bit clunky, but say you want to track Rooms and players per room - you might have something like:
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654
  template:
    spec:
      containers:
      - name: simple-game-server
        image: gcr.io/agones-images/simple-game-server:0.13
  counters:
    rooms:
      default: 0
      capacity: 4
  lists:
    players_1:
    players_2:
    players_3:
    players_4:
That probably doesn't scale if you have 1000 rooms (at which point, go use a DB), but this works in a pinch. So you would autoscale on
WDYT?
Ah, that's true.
Thanks for the feedback and working through it with me!
This issue is still being worked on. |
* Counts and Lists: Improvements to SDK docs
Much like the player tracking SDK documentation, I wanted to be explicit in the SDK documentation for Counts and Lists about where data was being stored and where default values were coming from, with links back to the generated API reference, so I made some improvements to help facilitate that. This does still need a Counters and Lists landing page to reference (this is my next task), but I wanted to get this done while I was thinking about it. Work on #2716
* Few improvements over original PR.
* Review updates.
More description on fleetautoscaler.md
Updated fleetautoscaler.md with more general descriptions of each type of autoscaling strategy. Since we have 4 now, it seemed like it would be useful to provide some use cases around each type of autoscaling and why you would choose one over another. Work on #2716
Just as an audit trail - ticking boxes on items that have split out issues that will be tracked separately, so we can close out this issue as the bulk of the work is complete.
@igooch WDYT, shall we close this ticket?
* Counters & Lists landing page and doc improvements
The primary detail of this PR is to implement a Guide > Counters and Lists documentation landing page to give end users documentation on how to use all the various touch points of Counters and Lists. This does sprawl out a little, as part of this process also touched on:
* Links and warnings from other pages that should link to this landing page.
* Found a bunch of minor fixes that needed doing, with both documentation and example bugs and updates just for consistency.
* Implemented some small changes in terminology (primarily total capacity -> available capacity), which aligns the implementations and the documentation.
* Fixes and updates to CRD and Go data structure documentation that goes along with the above.
* Found some example content that was missing.
Work on #2716
* Add in Fleet prioritisation section.
* Review updates.
* Add warning for Fleet priorities, until next release.
Objective
With the recent work on Player Tracking, as well as its crossover into High Density Game Server support / re-allocation of Allocated GameServers, it seems that being able to provide arbitrary count values and/or lists of values that are tied to GameServers, much like Player Tracking values are right now, would be very useful for a wide variety of use cases.
This feature design’s contention is to replace Player Tracking with a generic way to track general counts as well as lists against a GameServer by a user provided key, with integrated allocation, Fleet scheduling and SDK support. That way it can support the use case of player tracking as it currently stands, but also use cases like multi-tenant room server counting, or any other game specific value that could be utilised for a custom integration.
An added benefit would be that simple gauge data would be exposed as metrics as well, although we may not want to advocate this as a blessed path for only exporting metrics without taking advantage of the other functionality.
This feature would be built behind the GameServerCountsAndLists feature gate and should be on par with PlayerTracking before the PlayerTracking functionality is removed.
Requirements
GameServer a set of attached lists and/or counters attached to an arbitrary, user supplied key
GameServer keys for counters and lists at runtime. Keys should be explicitly predefined with the GameServer definition to put some limits on what can be stored against etcd and ideally avoid overloading the Kubernetes API control plane (although we will need strong documentation about this, as this will definitely put extra load on the control plane).
Counters
GameServer CRD status.
Lists
GameServer CRD status.
Allocation filtering and sorting
Fleets scheduling
Metrics
Background
There have been a lot of discussions and issues about weighted allocation, being able to store “session room” counts to be used on allocation, sorting on Fleet scale down, and more (more on Slack as well).
We’ve also always had a desire to be able to set some level of metrics through Agones from a GameServer.
Design ideas
Configuration
GameServers
Being able to set arbitrary counts and lists on a GameServer instance.
GameServer Status
This is where the current count and list values and capacities are stored against the CRD. The values in the spec do not change once they have been initially declared.
Fleets
Status
FleetAutoscaling
Count based autoscaling
List based autoscaling
Allocations
SDK
The SDK will batch operations every 1 second for performance reasons, but changes made through the SDK will be atomically accurate through the SDK. Changes made through Allocation or the Kubernetes API will be eventually consistent when coming back to the SDK.
Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.
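A minimal sketch of this batching behaviour. It assumes the SDK keeps a local copy of each count, applies changes to that copy immediately, and accumulates a per-key delta that a background loop flushes once per second. The `batcher` type and the delta-map signature of its `flush` callback are hypothetical, and this is not the Agones SDK server implementation. It shows why reads through the SDK reflect SDK writes straight away, while changes from Allocation or the Kubernetes API only appear once the local copy is refreshed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher accumulates counter deltas locally and periodically flushes them.
type batcher struct {
	mu     sync.Mutex
	local  map[string]int64 // what the SDK reports back to the game server
	deltas map[string]int64 // pending changes not yet written to the CRD
	flush  func(deltas map[string]int64)
}

func newBatcher(initial map[string]int64, flush func(map[string]int64)) *batcher {
	local := make(map[string]int64, len(initial))
	for k, v := range initial {
		local[k] = v
	}
	return &batcher{local: local, deltas: map[string]int64{}, flush: flush}
}

// Increment updates the local view immediately and records the pending delta.
func (b *batcher) Increment(key string, amount int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.local[key] += amount
	b.deltas[key] += amount
}

// Count reads the local view, so SDK-made changes are visible straight away.
func (b *batcher) Count(key string) int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.local[key]
}

// Flush hands the pending deltas to the flush callback and resets them.
// Sending deltas (not absolute values) lets the write apply to the current CRD value.
func (b *batcher) Flush() {
	b.mu.Lock()
	pending := b.deltas
	b.deltas = map[string]int64{}
	b.mu.Unlock()
	if len(pending) > 0 {
		b.flush(pending)
	}
}

// Run flushes on every tick until stop is closed, then flushes once more.
func (b *batcher) Run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.Flush()
		case <-stop:
			b.Flush()
			return
		}
	}
}

func main() {
	b := newBatcher(map[string]int64{"rooms": 0}, func(d map[string]int64) {
		// A real implementation would apply these deltas to the GameServer CRD.
		fmt.Println("flush:", d)
	})
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() { b.Run(time.Second, stop); close(done) }()

	b.Increment("rooms", 1)
	b.Increment("rooms", 2)
	fmt.Println("local count:", b.Count("rooms")) // 3, visible before any flush
	close(stop)
	<-done
}
```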
Counter
All functions will error if the key was not predefined in the GameServer resource on creation.
Returns the current count under the provided key.
Increment a counter by a given amount. Will max at max(int64).
Will execute the increment operation against the current CRD value.
Returns false if the count is at the current capacity (to the latest knowledge of the SDK), and no increment will occur.
Note: There is a potential race condition here. If count values are set from both the SDK and the K8s API (Allocation or otherwise), then, since the SDK increment operation back to the CRD value is batched and asynchronous, any value incremented past the capacity will get silently truncated.
Decrements the current count by the provided amount. Will not go below 0.
Will execute the decrement operation against the current CRD value.
Returns false if the count is at 0 (to the latest knowledge of the SDK), and no decrement will occur.
Sets a count at a given value. Use with care, as this will overwrite any previous invocations’ value.
Update the capacity for a given count. A capacity of 0 is no capacity.
Get the current capacity for this specific count.
Lists
All functions will error if the key was not predefined in the GameServer resource on creation.
Appends the provided value to the list. If the list is already at capacity, it will return an error.
Will retrieve the current CRD value before executing the append operation.
Returns false if the value already exists in the list, or if the list is already at capacity (to the latest knowledge of the SDK).
Note: There is a potential race condition here. If list values are set from both the SDK and the K8s API (Allocation or otherwise), then, since the SDK append operation back to the CRD value is batched and asynchronous, any value appended past the capacity will get silently truncated.
Delete the specified value from the list.
Returns false if the value is not found in the list (to the latest knowledge of the SDK).
Update the capacity for a given list. Capacity must be between 1 and 1000.
Get the current capacity for this specific list.
Returns true if the given list contains a provided value.
Returns the current length of the given list.
Returns the contents of the given list.
Metrics
Metrics should be exported, using the key that the metric is stored under as a label on the metrics, in aggregate across all GameServers, giving us the ability to export basic numeric values as gauge metrics.
The Fleet name as a label attached to each metric.
Counters
Total of all counters on all GameServers, by key
Total count capacity of all GameServers, by key
Lists
Total number of items in each list, by key of all GameServers
Total list capacity of all GameServers, by key
Dashboards
Since we are using labels, we can create some generic dashboards with dropdowns for each fleet, and names for counts and lists.
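The counter side of this per-fleet, per-key aggregation can be sketched with only the Go standard library. It does not use a real metrics client. The snapshot types and the `counter_total` / `counter_capacity_total` metric names are hypothetical, and list gauges would follow the same pattern using the length and capacity of each list.

```go
package main

import (
	"fmt"
	"sort"
)

// counterValue is a hypothetical count/capacity pair for one counter.
type counterValue struct{ Count, Capacity int64 }

// gameServerSnapshot is a hypothetical view of one GameServer's counters.
type gameServerSnapshot struct {
	Fleet    string
	Counters map[string]counterValue
}

// gaugeKey is the label set for one gauge series: fleet name and counter key.
type gaugeKey struct{ Fleet, Key string }

// aggregateCounters sums counts and capacities across all GameServers,
// grouped by (fleet, key), ready to be exported as gauges.
func aggregateCounters(gss []gameServerSnapshot) (counts, capacities map[gaugeKey]int64) {
	counts = map[gaugeKey]int64{}
	capacities = map[gaugeKey]int64{}
	for _, gs := range gss {
		for key, c := range gs.Counters {
			k := gaugeKey{Fleet: gs.Fleet, Key: key}
			counts[k] += c.Count
			capacities[k] += c.Capacity
		}
	}
	return counts, capacities
}

func main() {
	gss := []gameServerSnapshot{
		{Fleet: "simple", Counters: map[string]counterValue{"rooms": {Count: 1, Capacity: 4}}},
		{Fleet: "simple", Counters: map[string]counterValue{"rooms": {Count: 3, Capacity: 4}}},
		{Fleet: "other", Counters: map[string]counterValue{"rooms": {Count: 2, Capacity: 8}}},
	}
	counts, capacities := aggregateCounters(gss)

	// Sort the series so the output is deterministic.
	keys := make([]gaugeKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Fleet != keys[j].Fleet {
			return keys[i].Fleet < keys[j].Fleet
		}
		return keys[i].Key < keys[j].Key
	})
	for _, k := range keys {
		// The metric names here are illustrative, not Agones' exported names.
		fmt.Printf("counter_total{fleet=%q,key=%q} %d\n", k.Fleet, k.Key, counts[k])
		fmt.Printf("counter_capacity_total{fleet=%q,key=%q} %d\n", k.Fleet, k.Key, capacities[k])
	}
}
```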
Critical User Journeys
Some high level summaries for some user journeys that could be utilised with this new functionality.
Player Tracking
Player tracking could be implemented in essentially the same way that is possible now, but we could also take an approach that could reserve player connections at allocation time.
An end user could now add a player at allocation time to the GameServer, blocking that space for the player. A gameserver binary could watch for that addition, then wait a determined amount of time before removing it from a “players” list if that player has not yet connected.
For example:
Room based High Density Game Servers
This could now be handled as an integer value as a count, or as a list with individual room ids.
A count based Allocation could look something like:
This would prioritise allocation to servers that have more rooms currently running, and increment the value of the room count at allocation time, which could be picked up by SDK.WatchGameServer().
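The count-based selection can be sketched in Go as follows. GameServers without the key, or without enough available capacity, are filtered out, as requested earlier in the thread. The fullest remaining GameServer is preferred, and its counter is incremented. The types and `allocateByCounter` are hypothetical. The sketch treats Capacity literally as the maximum count and ignores allocation state filtering.

```go
package main

import (
	"fmt"
	"sort"
)

// counterValue is a hypothetical count/capacity pair for one counter.
type counterValue struct{ Count, Capacity int64 }

// gameServer is a hypothetical GameServer with named counters.
type gameServer struct {
	Name     string
	Counters map[string]counterValue
}

// allocateByCounter picks the GameServer with the highest count for key that
// still has room for amount more, then increments that counter.
// GameServers without the key, or without enough capacity, are filtered out.
func allocateByCounter(gss []gameServer, key string, amount int64) (string, bool) {
	var candidates []int
	for i, gs := range gss {
		c, ok := gs.Counters[key]
		if ok && c.Capacity-c.Count >= amount {
			candidates = append(candidates, i)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	// Prefer the fullest GameServer (descending count) to pack rooms densely.
	sort.SliceStable(candidates, func(a, b int) bool {
		return gss[candidates[a]].Counters[key].Count > gss[candidates[b]].Counters[key].Count
	})
	i := candidates[0]
	c := gss[i].Counters[key]
	c.Count += amount
	gss[i].Counters[key] = c // map values are not addressable, so write the struct back
	return gss[i].Name, true
}

func main() {
	gss := []gameServer{
		{Name: "gs-a", Counters: map[string]counterValue{"rooms": {Count: 1, Capacity: 4}}},
		{Name: "gs-b", Counters: map[string]counterValue{"rooms": {Count: 3, Capacity: 4}}},
		{Name: "gs-c", Counters: map[string]counterValue{}}, // no "rooms" key: filtered out
	}
	for i := 0; i < 3; i++ {
		name, ok := allocateByCounter(gss, "rooms", 1)
		fmt.Println(name, ok) // gs-b true, then gs-a true, then gs-a true
	}
}
```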
A list based Allocation could look something like:
If you then wanted to allocate to the GameServer with the specific Room session, you could do the following:
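Both list-based steps can be sketched in Go. The first step opens a new room by appending its id to a GameServer whose list still has space. The second finds the GameServer that already holds a given room id. The types, `addRoom` and `findRoom` are hypothetical, and filtering by GameServer state (e.g. only Allocated) is left out for brevity.

```go
package main

import "fmt"

// listValue is a hypothetical list with a capacity, keyed on a GameServer.
type listValue struct {
	Capacity int
	Values   []string
}

// gameServer is a hypothetical GameServer with named lists.
type gameServer struct {
	Name  string
	Lists map[string]*listValue
}

// contains reports whether v is present in values.
func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// addRoom appends roomID to the first GameServer whose list under key has space.
func addRoom(gss []gameServer, key, roomID string) (string, bool) {
	for _, gs := range gss {
		l, ok := gs.Lists[key]
		if ok && len(l.Values) < l.Capacity && !contains(l.Values, roomID) {
			l.Values = append(l.Values, roomID) // l is a pointer, so this updates the GameServer
			return gs.Name, true
		}
	}
	return "", false
}

// findRoom returns the GameServer whose list under key already holds roomID.
func findRoom(gss []gameServer, key, roomID string) (string, bool) {
	for _, gs := range gss {
		if l, ok := gs.Lists[key]; ok && contains(l.Values, roomID) {
			return gs.Name, true
		}
	}
	return "", false
}

func main() {
	gss := []gameServer{
		{Name: "gs-a", Lists: map[string]*listValue{"rooms": {Capacity: 1}}},
		{Name: "gs-b", Lists: map[string]*listValue{"rooms": {Capacity: 1}}},
	}
	for _, id := range []string{"room-1", "room-2", "room-3"} {
		name, ok := addRoom(gss, "rooms", id)
		fmt.Println(id, "->", name, ok)
	}
	name, ok := findRoom(gss, "rooms", "room-2")
	fmt.Println("room-2 lives on", name, ok)
}
```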
Note: An end user could still use the “label locking” method for high density game servers as well. This just provides another way to solve the same problem that may be more applicable for some use cases.
Game Specific Weight allocation
With this new functionality, if you wanted to prioritise Allocation based on how many blueberries were available in your game server (or any arbitrary thing), you could now do this as well. I’ve had conversations with people on how to preferentially “Allocate to the most interesting GameServer” - this would allow you to do exactly that, through arbitrary counter tracking at the GameServer level.
For example:
The blueberries key would then be incremented and decremented with Alpha().CountIncrement(key, amount) and Alpha().CountDecrement(key, amount) from within the game server binary as needed.
Alternatives considered
We could continue having specific integrations for each specific use case -- much like we did for player tracking. Personally, this is what often dissuaded me from adding more specific solutions to specific problems in many of the tickets above -- their specificity. i.e. “This solution works for this specific problem”. I personally prefer more generic solutions that can power a wide multitude of solutions. I genuinely believe that Agones’ power comes from its configurability and flexibility. That tradeoff does come with a higher cost for integration and greater overall complexity of the stack, but I don’t think the project would be as successful as it is without that flexibility.
I think the difference in player tracking was that it felt generic “enough” across use cases that it made sense. But I think this new approach is even more generic, and allows for a much wider set of use cases (probably ones we haven’t thought of yet), without the need to build out yet another CRD and SDK implementation, and without sacrificing capability (in fact I think it adds capability). Which is also why I’m quite excited about it.
Work Items
List of individual work items on this design, so it doesn't seem so overwhelming 😃
API Surfaces
This is not implementation; this is creating placeholders for data, CRD structures, proto API definitions, and stubs for SDK methods.
Implementation
Building functionality on top of the API surfaces that have been built out above.