Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reserve() SDK implementation #891

Merged
merged 2 commits into from
Jul 16, 2019

Conversation

markmandel
Copy link
Member

This implements Reserve, with an implementation with the Go SDK.

It was also convenience to switch Allocate back to the standard async
implementation here, so that all the SDK commands are consistent in
their concurrency patterns.

Still to come:

  • e2e tests
  • sdk conformance test
  • updates to gameserver lifecycle documentation

@markmandel markmandel added kind/feature New features for Agones area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/breaking Breaking change labels Jul 10, 2019
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: f7dee006-7775-4d57-8335-04a7139c3966

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/891/head:pr_891 && git checkout pr_891
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.12.0-2543584

@roberthbailey
Copy link
Member

Looks like this is part of #660.

@markmandel
Copy link
Member Author

Looks like this is part of #660.

Yes! Thanks, forgot to put that on here.

Copy link
Member

@roberthbailey roberthbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first long-ish review of Go code in this repository, so feel free to take my comments with a grain of salt. I'm also happy to be told where I'm barking up the wrong tree.

sdks/go/sdk.go Outdated
@@ -79,6 +79,13 @@ func (s *SDK) Shutdown() error {
return errors.Wrapf(err, "could not send Shutdown message")
}

// Reserve marks the Game Server as Reserved, for a given duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems incomplete, and is missing a matching parentheses.

### Reserve(seconds)

With some matchmaking scenarios and systems it is important to be able to ensure that a `GameServer` is unable to be deleted,
but doesn't trigger a FleetAutoscaler scale up. This is where `Reserver()` is useful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/Reserver/Reserve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, elsewhere the function is written with the seconds argument, but here it is written as if it has no arguments. Please make them consistent.

but doesn't trigger a FleetAutoscaler scale up. This is where `Reserver()` is useful.

`Reserve(seconds)` will move the `GameServer` into the Reserved state for the specified number of seconds (0 is forever), and then it will be
moved back to `Ready` state. While in `Reserved` state, the `GameServer` cannot be deleted on scale down or `Fleet` update,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/cannot/will not

moved back to `Ready` state. While in `Reserved` state, the `GameServer` cannot be deleted on scale down or `Fleet` update,
and also cannot be Allocated.

This is often used wherein a game server process must self register itself with an external system, such as a matchmaker,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/wherein/when


This is often used wherein a game server process must self register itself with an external system, such as a matchmaker,
that requires it to designate itself as available for a game session for a certain period. Once a game session has started,
it can call `SDK.Allocate` to designate that players are currently active on it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say must instead of can?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it to should - just because you don't have to, but you should. 👍

// test ready
_, err = sc.Reserve(context.Background(), &sdk.Duration{Seconds: 3})
assert.NoError(t, err)
assertStateChange(v1alpha1.GameServerStateReserved, 2*time.Second, noAdditional)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above the check after calling reserve had an assert, why not do one here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this one? can you be specific about what is missing? I'm not seeing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you pass in the noAdditional function as the third argument. Above you did

assertStateChange(v1alpha1.GameServerStateReserved, 2*time.Second, func(status v1alpha1.GameServerStatus) {
		assert.Equal(t, time.Now().Add(3*time.Second).Round(time.Second), status.ReservedUntil.Time.Round(time.Second))
})

So the first time you verify that the state changed via the callback but in subsequent times you don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I see it now. I figured it might be redundant to test it again, but I see your point. I'll make the change!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And done!

m, sc, stop := setup(t, defaultGs.DeepCopy())
defer close(stop)
done := make(chan struct{}, 10)
select {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this waiting 4 seconds to verify that the timer has been disabled? Can you just check that directly instead of looking for a side effect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, the timer should be stopped already 👍 . Updated! PTAL!

defer func() {
done <- struct{}{}
}()
// test allocated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, make this a full sentence.

case <-time.After(4 * time.Second):
}

// test shutdown
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

})

select {
case current := <-state:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 3 test cases all waiting 4+ seconds, this change adds a considerable amount of time to unit tests (which should always be blazingly fast). It would be nice to see if there is a way to do this that completes nearly instantly instead (e.g. my comment about clock manipulation above).

Copy link
Collaborator

@cyriltovena cyriltovena Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, they are in parallel though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now removed, as I'm testing the state of the timer directly - so that removes this slowdown at least.

@cyriltovena
Copy link
Collaborator

cyriltovena commented Jul 16, 2019

Looks very good. Ping me back when you're done.

@markmandel
Copy link
Member Author

Thanks for all the comments - will review tomorrow - got lots of time for coding tomorrow 👍

This implements Reserve, with an implementation with the Go SDK.

It was also convenience to switch Allocate back to the standard async
implementation here, so that all the SDK commands are consistent in
their concurrency patterns.

Still to come:
- e2e tests
- sdk conformance test
- updates to gameserver lifecycle documentation
Copy link
Member Author

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed up updates based on the review, and some more comments on others 👍

message += fmt.Sprintf(", for %v", s.gsReserveDuration.String())
}
s.gsUpdateMutex.RUnlock()
// unfortunate complexity due to mutex locking, resetReserveAfter has it's own mutex
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the SDK.Allocate() or reserve, etc could change the underlying values while the we are processing.

}

duration := time.Duration(d.Seconds) * time.Second
s.gsUpdateMutex.Lock()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Good catch 👍

})
}

// stopReserveTimer stops the reserve timer, if it is not nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked to: // stopReserveTimer stops the reserve timer. This is a no-op and safe to call if the timer is nil

_, err := sc.Allocate(context.Background(), &sdk.Empty{})
assert.EqualError(t, err, "cannot Allocate a Shutdown GameServer")
// wait for the game server to go back to being Ready
assertStateChange(v1alpha1.GameServerStateRequestReady, 4*time.Second, func(status v1alpha1.GameServerStatus) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So usually we do this - but I couldn't find a good way to mock out time.AfterFunc() in this scenario - which was so very useful, as it has both timed functions, and reset capabilities., so this was the best thing I could come up with.

Interestingly, https://godoc.org/github.com/facebookgo/clock#Mock.AfterFunc, and a couple of other clock mocks has an AfterFunc mock, but the one we use (from apimachinery, which is handy because we are on k8s), does not.

Maybe we should move to another clock interface?

t.Run("contention on first build", func(t *testing.T) {
m, sc, stop := setup(t, defaultGs.DeepCopy())
defer close(stop)
sc, err := NewSDKServer("test", "default", m.KubeClient, m.AgonesClient)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

// test ready
_, err = sc.Reserve(context.Background(), &sdk.Duration{Seconds: 3})
assert.NoError(t, err)
assertStateChange(v1alpha1.GameServerStateReserved, 2*time.Second, noAdditional)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this one? can you be specific about what is missing? I'm not seeing it.

})

select {
case current := <-state:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now removed, as I'm testing the state of the timer directly - so that removes this slowdown at least.


This is often used wherein a game server process must self register itself with an external system, such as a matchmaker,
that requires it to designate itself as available for a game session for a certain period. Once a game session has started,
it can call `SDK.Allocate` to designate that players are currently active on it.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it to should - just because you don't have to, but you should. 👍

message += fmt.Sprintf(", for %v", s.gsReserveDuration.String())
}
s.gsUpdateMutex.RUnlock()
// unfortunate complexity due to mutex locking, resetReserveAfter has it's own mutex
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cleaned this up, much simpler locking now. PTAL.

s.gsUpdateMutex.RLock()
hasDuration := s.gsReserveDuration != nil
if hasDuration {
message += fmt.Sprintf(", for %v", s.gsReserveDuration.String())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you can drop the .String(), %v use the Stringer interface if implemented which is your case.

Done!

The final message will be SDK state change, for 10m. I also think it lacks of information even though the reason is Reserved. I would prefer SDK state change to Reserved for ... or even better SDK state change to Reserved until utc date since when looking at the event you might not know if it is gone.

To give an example of the full event output, that might help with context on this:

Events:
  Type    Reason          Age               From                   Message
  ----    ------          ----              ----                   -------
  Normal  PortAllocation  53s               gameserver-controller  Port allocated
  Normal  Creating        53s               gameserver-controller  Pod simple-udp-567rr created
  Normal  Scheduled       53s               gameserver-controller  Address and port populated
  Normal  RequestReady    47s               gameserver-sidecar     SDK state change
  Normal  Reserved        10s               gameserver-sidecar     SDK state change, for 10s
  Normal  Ready           0s (x2 over 47s)  gameserver-controller  SDK.Ready() complete

So you can see the State in the Reason column, so I figure it's redundant to include it again in the Message.

I would also leave the time period as for 10s - reason being, from the Age column, you can actually see how long its been since the Reserved state was set, and therefore know what the difference is going to be, and can calculate the time left.

How do you both feel about this now?

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: b0fbce8e-fed5-4c79-9135-084b8edd7b91

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/891/head:pr_891 && git checkout pr_891
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.12.0-7d4f05f

// 0 is forever.
if d.Seconds == 0 {
return e, nil
if d.Seconds > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check actually means that 0 or a negative value means forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was deliberate. I wanted to handle the error condition in a sane way.

Do you feel I should change the docs?

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 042262d8-afd6-47ee-af34-cad3f010d07a

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/891/head:pr_891 && git checkout pr_891
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.12.0-c278ea1

@markmandel
Copy link
Member Author

Going to push this through, as I know you are waiting on it for some other PRs. @roberthbailey if you think I should change the docs, I'll do so in a separate PR.

@markmandel markmandel merged commit 3430d34 into googleforgames:master Jul 16, 2019
@markmandel markmandel deleted the feature/reserve-sdk-impl branch July 16, 2019 20:31
@markmandel markmandel added this to the 0.12.0 milestone Jul 16, 2019
pjayara-g added a commit to pjayara-g/agones that referenced this pull request Jul 16, 2019
@markmandel markmandel removed the kind/breaking Breaking change label Aug 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/feature New features for Agones
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants