Refactor ReadyGameServerCache to AllocationCache #2148

markmandel · 2021-06-21T23:52:15Z

What type of PR is this?

/kind feature

What this PR does / Why we need it:

A bit of a chunky PR, but it does several things that were tied together.

1. Since the cache will need to track both Ready and Allocated GameServers, renamed everything to be "Allocation" related, rather than just being about Ready GameServers.
2. Implemented functionality to cache either Ready GameServers (for the stable installation), or Allocated + Ready GameServers when the Feature Flag of FeatureStateAllocationFilter is enabled.
3. There was some functionality and some tests in strange places, so I took the opportunity to clean that up, since I had to refactor a bunch of stuff anyway.
4. Made sure tests would continue to pass in this interim state.
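
The caching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `GameServerState`, the state constants, and `shouldCache` are hypothetical names, and the `allocationFilterEnabled` parameter stands in for whatever check the real code makes against the FeatureStateAllocationFilter feature flag.

```go
package main

// GameServerState is a hypothetical stand-in for the Agones GameServer state type.
type GameServerState string

const (
	StateReady     GameServerState = "Ready"
	StateAllocated GameServerState = "Allocated"
	StateShutdown  GameServerState = "Shutdown"
)

// shouldCache reports whether a GameServer in the given state belongs in the
// allocation cache. allocationFilterEnabled is an assumed parameter that
// represents the FeatureStateAllocationFilter feature flag being switched on.
func shouldCache(state GameServerState, allocationFilterEnabled bool) bool {
	switch state {
	case StateReady:
		// Ready GameServers are always cached, with or without the flag.
		return true
	case StateAllocated:
		// Allocated GameServers are only cached when the flag is enabled.
		return allocationFilterEnabled
	default:
		return false
	}
}
```

With the flag off, only Ready GameServers pass the check, which matches the stable behaviour; with the flag on, Allocated GameServers pass as well.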

Which issue(s) this PR fixes:

Work on #1239

Special notes for your reviewer:

The design on #1239 has it adding an annotation to the GameServer when allocating. I didn't put this behind the feature flag, because I didn't think it changed anything or was a risk. But please let me know if you think it should be behind the feature flag.

agones-bot · 2021-06-22T00:18:04Z

Build Succeeded 👏

Build Id: ac12682f-3d98-44d7-8828-88aeeacec829

The following development artifacts have been built, and will exist for the next 30 days:

image: gcr.io/agones-images/agones-controller:1.16.0-87c4a04
image: gcr.io/agones-images/agones-ping:1.16.0-87c4a04
Linux C++ SDK (build): agonessdk-1.16.0-87c4a04-linux-arch_64.tar.gz
SDK Server: agonessdk-server-1.16.0-87c4a04.zip

A preview of the website (the last 30 builds are retained):

https://87c4a04-dot-preview-dot-agones-images.appspot.com/

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-87c4a04

roberthbailey · 2021-06-22T16:43:36Z

pkg/gameserverallocations/allocation_cache.go

 }

-// syncReadyGSServerCache syncs the gameserver cache and updates the local cache for any changes.
-func (c *ReadyGameServerCache) syncReadyGSServerCache() error {
+// syncCache syncs the gameserver cache and updates the local cache for any changes.


Does this function need any locks to prevent concurrent calls from conflicting?

We don't, for two reasons:

1. The underlying gameServerCacheEntry has appropriate locking in place for individual records.
2. The workerqueue only runs a single goroutine, so this method is never called concurrently.

(Also, all this code has been running in production for ages with no reported issues.)
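
To illustrate the second reason, here is a minimal sketch of the single-worker pattern using a plain channel in place of the real workerqueue. `runSingleWorker` and its parameters are hypothetical names for this example only; the point is that state touched by exactly one goroutine does not need its own mutex.

```go
package main

import "sync"

// runSingleWorker drains keys on one goroutine and counts how often each key
// is synced. The counts map needs no mutex: only the worker goroutine touches
// it until wg.Wait returns, after which the caller is free to read it.
func runSingleWorker(keys []string) map[string]int {
	queue := make(chan string)
	counts := make(map[string]int)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for key := range queue {
			// Stands in for a sync function updating the local cache.
			counts[key]++
		}
	}()

	for _, key := range keys {
		queue <- key
	}
	close(queue)
	wg.Wait()
	return counts
}
```

If a second worker goroutine were added to this sketch, the unsynchronised map writes would become a data race, which is the situation the original question was probing for.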

roberthbailey · 2021-06-22T16:44:49Z

pkg/gameserverallocations/allocator.go

@@ -57,9 +58,9 @@ import (
 )

 var (
-	// ErrNoGameServerReady is returned when there are no Ready GameServers
+	// ErrNoGameServer is returned when there are no Ready GameServers


"when there are no Allocatable GameServers"

Thank you! 👍🏻

roberthbailey · 2021-06-22T18:42:46Z

pkg/gameserverallocations/allocator.go

+	}
+
+	// add last allocated, so it always gets updated, even if it is already Allocated
+	gs.ObjectMeta.Annotations["agones.dev/last-allocated"] = time.Now().String()


agones.dev/last-allocated should be a constant instead of a magic string.

This is a bit nitpicky, but do we care about timezones at all for this timestamp? This will apply the time in the current timezone of the controller process, which could have some exciting behavior during daylight saving adjustments (where time goes backwards).

Format of string is: "2006-01-02 15:04:05.999999999 -0700 MST", which includes the timezone, so I think we should be good there for readability.

For the purposes of allocation, we actually only care that the value changes, so that the object gets updated within K8s. Without a field that has changed, if we update the GameServer with no differences from what is stored in etcd, it becomes a no-op, which means it can't be watched for by the SDK or other services.

roberthbailey · 2021-06-22T18:55:17Z

pkg/gameserverallocations/allocation_cache_test.go

+		k := k
+		v := v
+		t.Run(k, func(t *testing.T) {
+			// deliberately not resetting the Feature state, to catch any possible unknown regressions with the


Is the side effect of this to run (or worse, sometimes run) the old tests with the features enabled? I think if we want to ensure the old behavior works with the features enabled, it's simpler to just add a new test case exactly like the old one but with the feature turned on. Then we are deliberately testing the code path with and without the feature enabled.

Yes (admittedly this is a movement of an old test, as is much of this file). Since we don't know what features will be enabled/disabled across the whole set of features, we rely on Go's random ordering of tests to find issues with combinations of feature flags over time.

Testing for every combination of feature flags everywhere for past and future flags is a combinatorial explosion, and probably not worth the effort (unless you have a clever way of doing this that I can't think of - in which case I'm all ears!)

I don't think we need to test every combination everywhere, but in places where we have two explicit code paths and it's easy to add coverage (by adding a new table driven test entry), it seems like not too much work to ensure that all test runs do both old and new.
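
As a rough sketch of that suggestion, a table can carry the feature flag state as an explicit field so that every run covers both code paths. Everything here is hypothetical and simplified: `cachedStates` stands in for the code under test, `fixture` and `fixtures` for the test table, and `failingCases` for the loop that a real test would write with `t.Run` and assertions.

```go
package main

// cachedStates is a hypothetical function under test: it lists the
// GameServer states kept in the cache for a given feature flag setting.
func cachedStates(filterEnabled bool) []string {
	if filterEnabled {
		return []string{"Ready", "Allocated"}
	}
	return []string{"Ready"}
}

// fixture is one table entry. The flag is an explicit input, so no entry
// depends on state left behind by another test.
type fixture struct {
	filterEnabled bool
	want          []string
}

// fixtures covers the old path (flag off) and the new path (flag on).
func fixtures() map[string]fixture {
	return map[string]fixture{
		"feature disabled caches Ready only":           {filterEnabled: false, want: []string{"Ready"}},
		"feature enabled caches Ready and Allocated":   {filterEnabled: true, want: []string{"Ready", "Allocated"}},
	}
}

// failingCases runs every entry and returns the names of those that fail.
func failingCases(table map[string]fixture) []string {
	var failed []string
	for name, f := range table {
		if !equalStrings(cachedStates(f.filterEnabled), f.want) {
			failed = append(failed, name)
		}
	}
	return failed
}

// equalStrings reports whether two string slices hold the same elements in order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```

Adding coverage for a new code path then means adding one more entry to the table rather than relying on test ordering to exercise it.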

roberthbailey · 2021-06-22T19:05:23Z

pkg/gameserverallocations/find_test.go

 		},
 	}

 	for k, v := range fixtures {
 		t.Run(k, func(t *testing.T) {
+			runtime.FeatureTestMutex.Lock()
+			defer runtime.FeatureTestMutex.Unlock()
+			// deliberately not resetting the Feature state, to catch any possible unknown regressions with the


This doesn't make sense to me, since the tests are setting the feature flags to false, which should be exercising the old behavior.

Ah yep, I see how this would look weird - apologies.

I have more tests coming for this test as this feature continues to grow, which use different feature flags, so I just copied it across. Happy to scrap setting these feature flags to false individually, and then adjust it as needed down the line.

That being said, reviewing all the tests I have, and plan to have, for this: I always set a feature flag, so I'll make it required for this unit test. Let me know what you think.

roberthbailey · 2021-06-22T19:08:11Z

pkg/gameserverallocations/allocator_test.go

+	assert.Equal(t, "bar", gs.ObjectMeta.Labels["foo"])
+	assert.NotNil(t, gs.ObjectMeta.Annotations["agones.dev/last-allocated"])
+
+	gs, err = allocator.applyAllocationToGameServer(ctx, allocationv1.MetaPatch{Annotations: map[string]string{"foo": "bar"}}, &agonesv1.GameServer{})


What does it test to run this call again with the same parameters?

You know, I have no idea 😄 removed!

roberthbailey · 2021-06-22T19:09:10Z

pkg/gameserverallocations/controller_test.go

@@ -60,6 +60,11 @@ const (
 func TestControllerAllocator(t *testing.T) {
 	t.Parallel()

+	// TODO:(markmandel) remove once GameServerSelector is in place, since right now enabling the feature flag causes flaky tests


This comment is a bit scary. Does it make the tests flaky because they sometimes run with the feature flag on and sometimes with it off?

That is understandably scary on review 😃 - let me adjust it so it is less scary.

TL;DR: once all the code is in place the test will pass with any feature flag enabled/disabled - but not yet.

markmandel · 2021-06-24T00:18:52Z

Review updates in place, PTAL!

agones-bot · 2021-06-24T00:43:06Z

Build Succeeded 👏

Build Id: 69c8e0d6-6085-46eb-894f-3acb4ee82eb8

The following development artifacts have been built, and will exist for the next 30 days:

image: gcr.io/agones-images/agones-controller:1.16.0-103ada5
image: gcr.io/agones-images/agones-ping:1.16.0-103ada5
Linux C++ SDK (build): agonessdk-1.16.0-103ada5-linux-arch_64.tar.gz
SDK Server: agonessdk-server-1.16.0-103ada5.zip

A preview of the website (the last 30 builds are retained):

https://103ada5-dot-preview-dot-agones-images.appspot.com/

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-103ada5

google-oss-robot · 2021-06-26T16:21:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [markmandel,roberthbailey]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-robot · 2021-06-26T16:21:34Z

New changes are detected. LGTM label has been removed.

agones-bot · 2021-06-26T16:46:05Z

Build Succeeded 👏

Build Id: 9d2debcd-5039-4916-8d03-cae48aac3b8b

The following development artifacts have been built, and will exist for the next 30 days:

image: gcr.io/agones-images/agones-controller:1.16.0-086190e
image: gcr.io/agones-images/agones-ping:1.16.0-086190e
Linux C++ SDK (build): agonessdk-1.16.0-086190e-linux-arch_64.tar.gz
SDK Server: agonessdk-server-1.16.0-086190e.zip

A preview of the website (the last 30 builds are retained):

https://086190e-dot-preview-dot-agones-images.appspot.com/

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-086190e

markmandel added the kind/feature New features for Agones label Jun 21, 2021

markmandel requested review from roberthbailey and pooneh-m June 21, 2021 23:52

google-cla bot added the cla: yes label Jun 21, 2021

google-oss-robot requested review from aLekSer and cyriltovena June 21, 2021 23:52

google-oss-robot added approved size/XL labels Jun 21, 2021

roberthbailey reviewed Jun 22, 2021

markmandel added 2 commits June 23, 2021 17:13

Review updates.

103ada5

markmandel force-pushed the refactor/allocation-cache branch from 87c4a04 to 103ada5 Compare June 24, 2021 00:18

roberthbailey approved these changes Jun 26, 2021

google-oss-robot assigned roberthbailey Jun 26, 2021

google-oss-robot added the lgtm label Jun 26, 2021

Merge branch 'main' into refactor/allocation-cache

086190e

google-oss-robot removed the lgtm label Jun 26, 2021

roberthbailey merged commit 0624928 into googleforgames:main Jun 26, 2021

markmandel deleted the refactor/allocation-cache branch June 28, 2021 16:12

roberthbailey added this to the 1.16.0 milestone Jul 13, 2021

markmandel added a commit to markmandel/agones that referenced this pull request Jul 27, 2021

Rename metapatch var in applyAllocationToGameServer

2864e19

Minor fix from a PR comment that I had in the backlog but hadn't submitted yet. Fix for googleforgames#2148 (comment)

markmandel mentioned this pull request Jul 27, 2021

Rename metapatch var in applyAllocationToGameServer #2198

Merged

markmandel mentioned this pull request Jul 27, 2021

Update proto and allocator for advanced allocation #2199

Merged

roberthbailey pushed a commit that referenced this pull request Jul 28, 2021

Rename metapatch var in applyAllocationToGameServer (#2198)

7db59fc

Minor fix from a PR comment that I had in the backlog but hadn't submitted yet. Fix for #2148 (comment)
