
Register liveness check in gameservers.Controller #160

Merged
merged 1 commit into googleforgames:master on Apr 5, 2018
Conversation

@enocom (Contributor) commented on Apr 5, 2018

The liveness check is based on the worker queue having all its worker
goroutines running. If one of those goroutines exits, the liveness check
reports an unhealthy status.

Fixes #116
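
For background on the mechanism, the sketch below shows how a liveness check of this kind can be registered and served over HTTP, assuming the heptiolabs/healthcheck-style API that the diff's healthcheck.Check and AddLivenessCheck calls suggest. The standalone main, the port, and the healthy flag are illustrative assumptions, not the controller's actual startup code:

    package main

    import (
        "errors"
        "log"
        "net/http"

        "github.com/heptiolabs/healthcheck"
    )

    func main() {
        // NewHandler returns an http.Handler that exposes /live and /ready.
        health := healthcheck.NewHandler()

        // A liveness check is any func() error; returning a non-nil error makes
        // /live respond with HTTP 503, which lets Kubernetes restart the pod.
        healthy := true // stand-in for "all worker goroutines are running"
        health.AddLivenessCheck("game-server-worker-queue", func() error {
            if !healthy {
                return errors.New("worker queue is unhealthy")
            }
            return nil
        })

        log.Fatal(http.ListenAndServe(":8080", health))
    }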

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: 52e05777-3c73-48ae-868e-24efd6c4c5bc

The following development artifacts have been built, and will exist for the next 30 days:

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: 59b401cd-ffa4-4646-8143-8898bbd33d17

The following development artifacts have been built, and will exist for the next 30 days:

@markmandel (Member) left a comment

Nice! I had some comments, mainly around syntactic locking patterns within functions (I prefer to use defer). Interested to hear your thoughts.

@@ -82,7 +82,8 @@ func NewController(
    kubeInformerFactory informers.SharedInformerFactory,
    extClient extclientset.Interface,
    agonesClient versioned.Interface,
-   agonesInformerFactory externalversions.SharedInformerFactory) *Controller {
+   agonesInformerFactory externalversions.SharedInformerFactory,
+) *Controller {
@markmandel (Member):

For some reason, I find this being on the next line kinda weird. I'm assuming you did this on purpose 😉

@enocom (Contributor, Author):

Haha. I can put it back. It's not related to this change; it just helps me parse the args faster.

@@ -109,6 +110,12 @@ func NewController(
    c.recorder = eventBroadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "gameserver-controller"})

    c.workerqueue = workerqueue.NewWorkerQueue(c.syncGameServer, c.logger, stable.GroupName+".GameServerController")
+   health.AddLivenessCheck("game-server-worker-queue", healthcheck.Check(func() error {
+       if !c.workerqueue.Healthy() {
@markmandel (Member):

Thought: would it make sense to have c.workerqueue.Healthy() return an error when it's not healthy, rather than a bool?

Then we could just do: health.AddLivenessCheck("game-server-worker-queue", c.workerqueue.Healthy)

Not wedded to it, but thought it might be worth discussing.
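
To make the two shapes concrete, here is roughly how the registration differs. The closure body in the first form is a reconstruction for illustration (the diff above only shows its opening lines, and the error message is assumed); the second form assumes Healthy is changed to have the signature func() error:

    // Current shape: wrap the bool-returning Healthy in a healthcheck.Check closure.
    health.AddLivenessCheck("game-server-worker-queue", healthcheck.Check(func() error {
        if !c.workerqueue.Healthy() {
            return errors.New("worker queue is unhealthy") // assumed message
        }
        return nil
    }))

    // Suggested shape: an error-returning Healthy already satisfies healthcheck.Check,
    // so it can be registered directly with no wrapper.
    health.AddLivenessCheck("game-server-worker-queue", c.workerqueue.Healthy)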

@enocom (Contributor, Author):

That's a better idea. I'll make the change.

@@ -40,6 +45,10 @@ type WorkerQueue struct {
    queue workqueue.RateLimitingInterface
    // SyncHandler is exported to make testing easier (hack)
    SyncHandler Handler
+
+   mu sync.Mutex
@markmandel (Member):

Should this be a RWLock? I suppose there will only be a single thread hitting the Healthy() function, so there isn't much benefit in allowing multiple concurrent Healthy() requests. WDYT?

@enocom (Contributor, Author):

I'm tempted to start with a plain mutex and leave open the possibility of improving the performance of the WorkerQueue once we understand the bottlenecks.

@markmandel (Member):

SGTM!
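
For reference, this is roughly what the sync.RWMutex variant discussed above would look like. It is only a sketch of the alternative that was not adopted; the type and field names are illustrative:

    package workerqueue

    import "sync"

    // rwWorkerQueue sketches the RWMutex alternative: health probes take the
    // shared read lock, while the counters take the exclusive write lock.
    type rwWorkerQueue struct {
        mu      sync.RWMutex
        workers int
        running int
    }

    func (wq *rwWorkerQueue) Healthy() bool {
        wq.mu.RLock()
        defer wq.mu.RUnlock()
        return wq.workers == wq.running
    }

    func (wq *rwWorkerQueue) inc() {
        wq.mu.Lock()
        defer wq.mu.Unlock()
        wq.running++
    }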

}

func (wq *WorkerQueue) run(stop <-chan struct{}) {
    wq.inc()
@markmandel (Member):

    wq.inc()
    defer wq.dec()
    wait.Until(wq.runWorker, workFx, stop)

?

}

// Healthy reports whether all the worker goroutines are running.
func (wq *WorkerQueue) Healthy() bool {
@markmandel (Member):

    wq.mu.Lock()
    defer wq.mu.Unlock()
    return wq.workers == wq.running

?
In this case, this shows why I prefer a lock/defer unlock pattern: once you get past it, it shows much more clearly exactly what the logic of the function is. Otherwise you have to mentally unwind what want and got actually mean. In this case it's pretty easy, but it would be good to set a standard. WDYT?

}

func (wq *WorkerQueue) setWorkerCount(n int) {
    wq.mu.Lock()
@markmandel (Member):

    wq.mu.Lock()
    defer wq.mu.Unlock()
    wq.workers = n

?
Really just being consistent in the lock/unlock strategy throughout the code.

    wq.mu.Unlock()
}

func (wq *WorkerQueue) inc() {
@markmandel (Member):

    wq.mu.Lock()
    defer wq.mu.Unlock()
    wq.running++

?

    wq.mu.Unlock()
}

func (wq *WorkerQueue) dec() {
@markmandel (Member):

    wq.mu.Lock()
    defer wq.mu.Unlock()
    wq.running--

?

@enocom (Contributor, Author):

I'm happy to use the defer pattern.
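
Taken together, the defer-based locking and the error-returning Healthy() from the earlier thread would give the WorkerQueue roughly the following shape. This is only a sketch of where the review seems to be heading, not the merged code; the workFx value, the error message, and the runWorker stub are assumptions added to keep the example self-contained:

    package workerqueue

    import (
        "errors"
        "sync"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    const workFx = time.Second // assumed retry period for the worker loop

    // WorkerQueue (abridged): only the fields relevant to health tracking are shown.
    type WorkerQueue struct {
        mu      sync.Mutex
        workers int // number of worker goroutines requested
        running int // number of worker goroutines currently running
    }

    func (wq *WorkerQueue) runWorker() { /* stub: process queue items */ }

    // run tracks the lifetime of a single worker goroutine.
    func (wq *WorkerQueue) run(stop <-chan struct{}) {
        wq.inc()
        defer wq.dec()
        wait.Until(wq.runWorker, workFx, stop)
    }

    // Healthy returns an error when any worker goroutine has exited, so it can
    // be passed directly to health.AddLivenessCheck.
    func (wq *WorkerQueue) Healthy() error {
        wq.mu.Lock()
        defer wq.mu.Unlock()
        if wq.workers != wq.running {
            return errors.New("worker goroutine(s) are not running")
        }
        return nil
    }

    func (wq *WorkerQueue) setWorkerCount(n int) {
        wq.mu.Lock()
        defer wq.mu.Unlock()
        wq.workers = n
    }

    func (wq *WorkerQueue) inc() {
        wq.mu.Lock()
        defer wq.mu.Unlock()
        wq.running++
    }

    func (wq *WorkerQueue) dec() {
        wq.mu.Lock()
        defer wq.mu.Unlock()
        wq.running--
    }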

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: 9e961156-17ee-42ca-bbc2-dda161ba8007

The following development artifacts have been built, and will exist for the next 30 days:

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: d2d16947-94cf-49cc-8a13-949c6ee2ad9c

The following development artifacts have been built, and will exist for the next 30 days:

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: 47746dc9-6df4-4c24-b80d-11153abc1cd8

The following development artifacts have been built, and will exist for the next 30 days:

@agones-bot (Collaborator)

Build Succeeded 👏

Build Id: a1f9ebfd-df17-495d-87da-87f484a7c0ed

The following development artifacts have been built, and will exist for the next 30 days:

@markmandel (Member) left a comment

LGTM!

@markmandel merged commit cbd5c86 into googleforgames:master on Apr 5, 2018
@markmandel added this to the 0.2 milestone on Apr 5, 2018
@enocom deleted the liveness branch on April 5, 2018 at 23:06
@markmandel added the kind/feature (New features for Agones) label on May 30, 2018