
Refactor ReadyGameServerCache to AllocationCache #2148

Merged

Conversation

markmandel
Member

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug
/kind cleanup
/kind documentation
/kind feature
/kind hotfix
What this PR does / Why we need it:

A bit of a chunky PR, but it does several things that were tied together.

  1. Since the cache will need to track both Ready and Allocated GameServers, renamed everything to be "Allocation" related, rather than just being about Ready GameServers.
  2. Implemented the functionality to cache either Ready GameServers (for the stable installation), or Allocated + Ready GameServers when the FeatureStateAllocationFilter feature flag is enabled (see the sketch after this list).
  3. There was some functionality and tests in some strange places, so took the opportunity to clean that up, since I had to refactor a bunch of stuff anyway.
  4. Made sure tests would continue to pass in this interim state.
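For illustration, a minimal sketch of the state filtering described in item 2, assuming the runtime.FeatureEnabled helper and the agonesv1 state constants used elsewhere in Agones; this is illustrative rather than the PR's exact code:

```go
// Sketch only: assumes runtime.FeatureEnabled and the agonesv1 GameServer state
// constants; not necessarily the exact shape of the code in this PR.
package gameserverallocations

import (
	agonesv1 "agones.dev/agones/pkg/apis/agones/v1"
	"agones.dev/agones/pkg/util/runtime"
)

// isCacheable reports whether a GameServer belongs in the allocation cache.
func isCacheable(gs *agonesv1.GameServer) bool {
	if runtime.FeatureEnabled(runtime.FeatureStateAllocationFilter) {
		// Feature gate on: cache both Ready and Allocated GameServers.
		return gs.Status.State == agonesv1.GameServerStateReady ||
			gs.Status.State == agonesv1.GameServerStateAllocated
	}
	// Stable behaviour: cache only Ready GameServers.
	return gs.Status.State == agonesv1.GameServerStateReady
}
```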

Which issue(s) this PR fixes:

Work on #1239

Special notes for your reviewer:

The design in #1239 has it adding an annotation to the GameServer when allocating. I didn't put this behind the feature flag, because I didn't think it changed anything or was a risk. But please let me know if you think it should be behind the feature flag.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: ac12682f-3d98-44d7-8828-88aeeacec829

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-87c4a04

}

// syncReadyGSServerCache syncs the gameserver cache and updates the local cache for any changes.
func (c *ReadyGameServerCache) syncReadyGSServerCache() error {
// syncCache syncs the gameserver cache and updates the local cache for any changes.
Member

Does this function need any locks to prevent concurrent calls from conflicting?

Member Author

We don't for two reasons:

  1. The underlying gameServerCacheEntry has appropriate locking in place for individual records.
  2. The workerqueue only runs a single routine, so this method is never called concurrently.

(Also, all this code has been running in production for ages with no reported issues)
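For context, a minimal sketch of what per-record locking of that kind can look like; the actual gameServerCacheEntry in Agones may be shaped differently:

```go
// Illustrative sketch only; not the real Agones type.
package gameserverallocations

import (
	"sync"

	agonesv1 "agones.dev/agones/pkg/apis/agones/v1"
)

// gameServerCacheEntry guards its map with its own mutex, so individual
// Store/Load calls are safe without the caller holding any additional lock.
type gameServerCacheEntry struct {
	mu    sync.RWMutex
	cache map[string]*agonesv1.GameServer
}

func (e *gameServerCacheEntry) Store(key string, gs *agonesv1.GameServer) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.cache == nil {
		e.cache = map[string]*agonesv1.GameServer{}
	}
	e.cache[key] = gs
}

func (e *gameServerCacheEntry) Load(key string) (*agonesv1.GameServer, bool) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	gs, ok := e.cache[key]
	return gs, ok
}
```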

@@ -57,9 +58,9 @@ import (
)

var (
// ErrNoGameServerReady is returned when there are no Ready GameServers
// ErrNoGameServer is returned when there are no Ready GameServers
Member

"when there are no Allocatable GameServers"

Member Author

Thank you! 👍🏻

pkg/gameserverallocations/allocator.go
}

// add last allocated, so it always gets updated, even if it is already Allocated
gs.ObjectMeta.Annotations["agones.dev/last-allocated"] = time.Now().String()
Member

agones.dev/last-allocated should be a constant instead of a magic string.

Member

This is a bit nitpicky, but do we care about timezones at all for this timestamp? This will apply the time in the current timezone of the controller process, which could have some exciting behavior during daylight saving time adjustments (where time goes backwards).

Member Author

Format of string is: "2006-01-02 15:04:05.999999999 -0700 MST", which includes the timezone, so I think we should be good there for readability.

For the purposes of allocation, we actually only care that the value changes, so that the object gets updated within K8s. If we update the GameServer with no differences from what is stored in etcd, the update becomes a no-op, which means it can't be watched for by the SDK or other services.
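Putting both comments together, a hedged sketch of the annotation update with the key pulled into a constant; the constant and function names here are made up for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// lastAllocatedAnnotation is a hypothetical constant name for the key discussed above.
const lastAllocatedAnnotation = "agones.dev/last-allocated"

// touchLastAllocated always changes the annotation value, so an Update against the
// API server never becomes a no-op, and watchers (e.g. the SDK) see the change.
// time.Time.String() uses the layout "2006-01-02 15:04:05.999999999 -0700 MST",
// which records the offset, addressing the timezone question above.
func touchLastAllocated(annotations map[string]string) {
	annotations[lastAllocatedAnnotation] = time.Now().String()
}

func main() {
	ann := map[string]string{}
	touchLastAllocated(ann)
	fmt.Println(ann[lastAllocatedAnnotation])
}
```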

pkg/gameserverallocations/allocator.go
	k := k
	v := v
	t.Run(k, func(t *testing.T) {
		// deliberately not resetting the Feature state, to catch any possible unknown regressions with the
Member

Is the side effect of this to run (or worse, sometimes run) the old tests with the features enabled? I think if we want to ensure the old behavior works with the features enabled, it's simpler to just add a new test case exactly like the old one but with the feature turned on. Then we are deliberately testing the code path with and without the feature enabled.

Member Author

Yes (admittedly this is a movement of an old test, as is much of this file). Since we don't know what features will be enabled/disabled across the whole set of feature flags, we rely on Go's random ordering of tests to find issues with combinations of feature flags over time.

Testing for every combination of feature flags everywhere for past and future flags is a combinatorial explosion, and probably not worth the effort (unless you have a clever way of doing this that I can't think of - in which case I'm all ears!)

Member

I don't think we need to test every combination everywhere, but in places where we have two explicit code paths and it's easy to add coverage (by adding a new table-driven test entry), it seems like not too much work to ensure that all test runs exercise both old and new.
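As a rough illustration of that suggestion, a table-driven entry per flag state could look something like this, assuming the runtime.ParseFeatures and FeatureTestMutex helpers already used in these tests:

```go
// Assumed imports: "testing", "github.com/stretchr/testify/require",
// "agones.dev/agones/pkg/util/runtime".
func TestAllocatorWithAndWithoutStateFilter(t *testing.T) {
	fixtures := map[string]struct {
		features string
	}{
		"stable":            {features: string(runtime.FeatureStateAllocationFilter) + "=false"},
		"allocation filter": {features: string(runtime.FeatureStateAllocationFilter) + "=true"},
	}

	for k, v := range fixtures {
		k, v := k, v
		t.Run(k, func(t *testing.T) {
			runtime.FeatureTestMutex.Lock()
			defer runtime.FeatureTestMutex.Unlock()
			require.NoError(t, runtime.ParseFeatures(v.features))

			// ...run the same allocator assertions here, so both code paths are covered...
		})
	}
}
```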

	},
}

for k, v := range fixtures {
	t.Run(k, func(t *testing.T) {
		runtime.FeatureTestMutex.Lock()
		defer runtime.FeatureTestMutex.Unlock()
		// deliberately not resetting the Feature state, to catch any possible unknown regressions with the
Member

This doesn't make sense to me, since the tests are setting the feature flags to false, which should be exercising the old behavior.

Member Author

Ah yep, I see how this would look weird - apologies.

I have more tests coming for this as the feature continues to grow, which use different feature flags, so I just copied this across. Happy to scrap the individual setting of these feature flags to false overall, and then adjust as needed down the line.

That being said, reviewing all the tests I have, and plan to have for this, I always set a feature flag, so I'll make it required for this unit test. Let me know what you think.

assert.Equal(t, "bar", gs.ObjectMeta.Labels["foo"])
assert.NotNil(t, gs.ObjectMeta.Annotations["agones.dev/last-allocated"])

gs, err = allocator.applyAllocationToGameServer(ctx, allocationv1.MetaPatch{Annotations: map[string]string{"foo": "bar"}}, &agonesv1.GameServer{})
Member

What does it test to run this call again with the same parameters?

Member Author

You know, I have no idea 😄 removed!

@@ -60,6 +60,11 @@ const (
func TestControllerAllocator(t *testing.T) {
	t.Parallel()

	// TODO:(markmandel) remove once GameServerSelector is in place, since right now enabling the feature flag causes flaky tests
Member

This comment is a bit scary. Does it make the tests flaky because they sometime run with the feature flag on and sometimes with it off?

Member Author

That is understandably scary on review 😃 - let me adjust it so it is less scary.

TL;DR: once all the code is in place the test will pass with any feature flag enabled/disabled - but not yet.
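As a sketch of that interim state, the test could explicitly pin the flag off until the selector work lands (assumed helper names from the runtime package):

```go
// Pin the flag off until GameServerSelector lands, so this test is deterministic
// regardless of the feature state left behind by other tests (assumed helper names).
require.NoError(t, runtime.ParseFeatures(string(runtime.FeatureStateAllocationFilter)+"=false"))
```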

@markmandel
Member Author

Review updates in place, PTAL!

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 69c8e0d6-6085-46eb-894f-3acb4ee82eb8

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-103ada5

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [markmandel,roberthbailey]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot

New changes are detected. LGTM label has been removed.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 9d2debcd-5039-4916-8d03-cae48aac3b8b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2148/head:pr_2148 && git checkout pr_2148
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.16.0-086190e

@roberthbailey roberthbailey merged commit 0624928 into googleforgames:main Jun 26, 2021
@markmandel markmandel deleted the refactor/allocation-cache branch June 28, 2021 16:12
@roberthbailey roberthbailey added this to the 1.16.0 milestone Jul 13, 2021
markmandel added a commit to markmandel/agones that referenced this pull request Jul 27, 2021
Minor fix from a PR comment that I had in the backlog but hadn't
submitted yet.

Fix for googleforgames#2148 (comment)
markmandel added a commit to markmandel/agones that referenced this pull request Jul 27, 2021
Update the proto for allocation and related converters to allow advanced
allocation to also occur through the Allocation endpoint, and not just
the `GameServerAllocation` CRD.

This has been tested for backward compatibility with the previous proto
version (ran allocation e2e from `main` branch against an install of
this PR).

The only outstanding item for googleforgames#2148 is updates to the documentation!

Work on googleforgames#2148
roberthbailey pushed a commit that referenced this pull request Jul 28, 2021
Minor fix from a PR comment that I had in the backlog but hadn't
submitted yet.

Fix for #2148 (comment)
markmandel added a commit to markmandel/agones that referenced this pull request Jul 28, 2021
Update the proto for allocation and related converters to allow advanced
allocation to also occur through the Allocation endpoint, and not just
the `GameServerAllocation` CRD.

This has been tested for backward compatibility with the previous proto
version (ran allocation e2e from `main` branch against an install of
this PR).

The only outstanding item for googleforgames#2148 is updates to the documentation!

Work on googleforgames#2148
roberthbailey added a commit that referenced this pull request Jul 28, 2021
* Update proto and allocator for advanced allocation

Update the proto for allocation and related converters to allow advanced
allocation to also occur through the Allocation endpoint, and not just
the `GameServerAllocation` CRD.

This has been tested for backward compatibility with the previous proto
version (ran allocation e2e from `main` branch against an install of
this PR).

The only outstanding item for #2148 is updates to the documentation!

Work on #2148

* Review updates.

Co-authored-by: Robert Bailey <robertbailey@google.com>