GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

Closed · highlyunavailable opened this issue Feb 27, 2021 · 2 comments · Fixed by #2012
Labels: area/user-experience, kind/bug
Milestone: 1.13.0

@highlyunavailable (Contributor) commented Feb 27, 2021

What happened:

Multi-cluster allocation does not work with a local-cluster policy defined.

Calling the allocator service with a GameServerAllocationPolicy that has an empty (nil, i.e. not present in the YAML) set of allocationEndpoints causes the allocator service to fail with {"error":"Marshal called with nil","code":2,"message":"Marshal called with nil"}.

I also cannot set AllocationEndpoints to an empty array, as the CRD validation requires at least one item, and even if I edit the CRD to allow an empty array, the same error occurs.

Additionally, the documentation for the REST example is incorrect: it says to use '{"namespace":"'${NAMESPACE}'", "multi_cluster_settings":{"enabled":"true"}}' as the data, but the field is actually named multiClusterSetting, i.e. '{"namespace":"'${NAMESPACE}'", "multiClusterSetting":{"enabled":"true"}}'. The corrected payload produces the error above, as does the Protobuf client in examples/allocator-client.
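
For reference, a corrected curl invocation looks roughly like this (the endpoint path, cert file names, and ${EXTERNAL_IP} follow the standard allocator-service setup and are placeholders here; only the --data payload is the point):

# Sketch only: the cert/key/CA paths and EXTERNAL_IP are placeholders.
curl --key client.key --cert client.crt --cacert ca.crt \
     -H "Content-Type: application/json" \
     --data '{"namespace":"'${NAMESPACE}'", "multiClusterSetting":{"enabled":"true"}}' \
     https://${EXTERNAL_IP}/gameserverallocation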

What you expected to happen:

An empty (empty array) or missing (nil) allocationEndpoints should allocate to the local cluster.

How to reproduce it (as minimally and precisely as possible):

Follow the documentation for multi-cluster allocation and make sure the allocationEndpoints value is missing from the GameServerAllocationPolicy. Then, when calling the endpoint, use either the Protobuf client or the corrected multiClusterSetting payload above; both reproduce the error.

Anything else we need to know?:

I've taken a crack at fixing this, but I can't figure out where it's attempting to marshal the nil value (I'm not sure how to debug the path through the code; can I just go run this somehow?), so I haven't been able to. Happy to fix this if someone points me in the right direction.

Logs from the allocator service:

agones-allocator-557d479c94-5jh27 agones-allocator {"message":"allocation request received.","request":{"namespace":"default","multiClusterSetting":{"enabled":true}},"severity":"info","source":"main","time":"2021-02-27T01:14:54.688724Z"}
agones-allocator-557d479c94-5jh27 agones-allocator {"gsa":{"metadata":{"namespace":"default","creationTimestamp":null},"spec":{"multiClusterSetting":{"enabled":true,"policySelector":{}},"required":{},"scheduling":"Packed","metadata":{}},"status":
agones-allocator-557d479c94-5jh27 agones-allocator {"error":"Could not find a Ready GameServer","gsa":{"metadata":{"namespace":"agones-system","creationTimestamp":null},"spec":{"multiClusterSetting":{"enabled":true,"policySelector":{}},"required":{},"scheduling":"Packed","metadata":{}},"status":{"state":"","gameServerName":""}},"gsaKey":"agones-system/","message":"failed to allocate. Retrying... ","severity":"warning","source":"*gameserverallocations.Allocator","time":"2021-02-27T01:14:55.1724364Z"}
agones-allocator-557d479c94-5jh27 agones-allocator {"error":null,"message":"allocation response is being sent","response":null,"severity":"info","source":"main","time":"2021-02-27T01:14:55.1725248Z"}
agones-allocator-557d479c94-5jh27 agones-allocator ERROR: 2021/02/27 01:14:55 grpc: server failed to encode response:  rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil

The "Could not find a Ready GameServer" error is spurious:

NAME                        STATE   ADDRESS        PORT   NODE             AGE
sd-gameserver-4h755-p9z2z   Ready   192.168.65.3   7134   docker-desktop   14m

Environment:

  • Agones version: 1.12.0
  • Kubernetes version (use kubectl version): v1.19.3
  • Cloud provider or hardware configuration: Docker desktop
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s):
  • Others:
highlyunavailable added the kind/bug label Feb 27, 2021
@highlyunavailable (Contributor, Author) commented Feb 27, 2021

Okay, this appears to be PEBKAC on my part, caused by somewhat unclear documentation around this feature plus a missing error check in the allocator.

Here was the broken GSAP:

apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: docker-desktop
  namespace: default
spec:
  connectionInfo:
    clusterName: docker-desktop
    namespace: agones-system
    secretName: local-allocator
    serverCa: [redacted]
  priority: 1
  weight: 100

Fixed by changing the connectionInfo namespace to "default". I figured it out when I noticed the code did this:

if gsa.Namespace != connectionInfo.Namespace {
	gsaCopy = gsa.DeepCopy()
	gsaCopy.Namespace = connectionInfo.Namespace
}

The connectionInfo.Namespace there must be a namespace you have configured to accept game server pods.
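
For anyone else who hits this, here is the same policy with only the connectionInfo namespace corrected (serverCa still redacted, everything else as in the broken example above):

apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: docker-desktop
  namespace: default
spec:
  connectionInfo:
    clusterName: docker-desktop
    namespace: default   # was agones-system; must be a namespace that hosts game servers
    secretName: local-allocator
    serverCa: [redacted]
  priority: 1
  weight: 100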

That said, I still get the panic rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil if there are no game servers ready (all are allocated), so there's an underlying bug here.

@highlyunavailable (Contributor, Author) commented:

The clue is actually right here:

agones-allocator-6d9b48f48b-cp6r6 agones-allocator {"error":null,"message":"allocation response is being sent","response":null,"severity":"info","source":"main","time":"2021-02-27T04:12:46.4578428Z"}

It's most likely trying to marshal the response, which is nil because the allocation actually failed with an error, and marshaling a nil response obviously isn't going to fly.
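
To make the failure mode concrete, here is a minimal, self-contained Go sketch of the fan-out-and-return pattern; the function and type names are hypothetical and not the actual Agones allocator code:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the allocation request/response types.
type allocationRequest struct{ namespace string }
type allocationResponse struct{ gameServerName string }

// allocateFromEndpoint is a hypothetical remote call; here it always fails,
// simulating every cluster being fully allocated.
func allocateFromEndpoint(endpoint string, req *allocationRequest) (*allocationResponse, error) {
	return nil, fmt.Errorf("%s: could not find a Ready GameServer", endpoint)
}

// allocateFromClusters fans out over the policy endpoints. The bug class hit in
// this issue: if every attempt fails and the loop simply falls through, the
// caller ends up returning (nil, nil), and gRPC then fails to proto-marshal the
// nil response. Returning the last result and error instead gives gRPC
// something to encode.
func allocateFromClusters(endpoints []string, req *allocationRequest) (*allocationResponse, error) {
	var lastResp *allocationResponse
	lastErr := errors.New("no allocation endpoints configured")
	for _, endpoint := range endpoints {
		resp, err := allocateFromEndpoint(endpoint, req)
		if err == nil {
			return resp, nil // first successful allocation wins
		}
		lastResp, lastErr = resp, err
	}
	return lastResp, lastErr
}

func main() {
	resp, err := allocateFromClusters([]string{"cluster-a", "cluster-b"}, &allocationRequest{namespace: "default"})
	fmt.Println(resp, err)
}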

highlyunavailable added a commit to highlyunavailable/agones that referenced this issue Feb 27, 2021
Fixes googleforgames#2011

This resolves an edge case where, if all connections in a multi-cluster
allocation failed, the allocation would return (nil, nil) and cause
protobuf to fail to serialize the response. Instead, the last result
(even if it was unsuccessful!) is returned. An example of this is when
all clusters are fully allocated.
pooneh-m pushed a commit that referenced this issue Mar 3, 2021
Fixes #2011

This resolves an edge case where, if all connections in a multi-cluster
allocation failed, the allocation would return (nil, nil) and cause
protobuf to fail to serialize the response. Instead, the last result
(even if it was unsuccessful!) is returned. An example of this is when
all clusters are fully allocated.
markmandel added this to the 1.13.0 milestone Mar 10, 2021
markmandel added the area/user-experience label Mar 10, 2021