Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

comerford · 2020-08-20T12:25:45Z

What happened:
Changed the agones_version to "1.8.0", ran a terraform plan (OK), terraform apply (failed) with the following messaging:

Error: failed to replace object: Service "agones-ping-http-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-controller-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-allocator" is invalid: spec.clusterIP: Invalid value: "": field is immutable

  on .terraform/modules/helm_agones/install/terraform/modules/helm3/helm.tf line 29, in resource "helm_release" "agones":
  29: resource "helm_release" "agones" {

To workaround this (after digging into similar issues reported with helm), I changed the force_update setting to false (to do this I had to fork the module and use my own version, because the setting is not configurable in terraform at present. This allowed me to successfully upgrade to 1.8.0:

helm list -n agones-system
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
agones  agones-system   8               2020-08-20 13:17:59.956502806 +0100 IST deployed        agones-1.8.0    1.8.0

What you expected to happen:

Ideally, this upgrade would "just work". I assume the force_update setting is required for other reasons and setting it to false generally is not an option?

Changing the force_update flag to be configurable is one workaround that would be much less clunky than having to fork and use your own version of the module.

Based on what I have read elsewhere, the issue is that the setting of these immutable fields is blank and is generated when run first time, but then can't be blank for a second/subsequent run. Is another option to populate the settings perhaps?

How to reproduce it (as minimally and precisely as possible):

I am not 100% sure if this would occur with a config that has always used helm3, this was originally a helm2 config and I migrated to helm3, so this may be a hangover from that process.

I checked, and if I revert to using the official module from Agones, I once again get the error, so I can repro locally.

Anything else we need to know?:

Environment:

Agones version: 1.8.0
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: AWS - EKS
Install method (yaml/helm): helm/terraform
Troubleshooting guide log(s):
Others:

The text was updated successfully, but these errors were encountered:

comerford · 2020-08-20T12:27:47Z

for reference, here is a (lengthy) related bug from the helm project:

helm/helm#6378 (comment)

markmandel · 2020-08-24T18:01:09Z

That's so weird that force_update causes this to fail - from the docs, it "Force resource update through delete/recreate if needed." - so it should only fire if needed, which doesn't seem to be the case! So is this a bug on Helm's side or Terraform, or something else?

(each e2e test we run replaces the previous helm installation with an upgrade, so it's relatively well trodden path)

I'm also confused as to why it's failing on a clusterIP , I'm not seeing that in the allocation service yaml. So trying to work out where that is deriving from.

That all being said, having a PR that makes force_update configurable seems totally fine, and a good addition 👍 but I am wondering at the root cause of this issue.

comerford · 2020-08-25T20:57:48Z

I similarly grep'ed and searched through the charts and can't see a reference to clusterIP anywhere. I've also checked through all 8 versions of the manifest that I have on my cluster, and no sign of clusterIP anywhere either. We are still pre-production, so I am tempted to just burn it down and replace it from scratch, but not knowing the cause here is frustrating for sure.

comerford · 2020-08-25T21:25:16Z

I began to suspect that the root of this might be the serviceType: Loadbalancer config, so I did some googling for this type of error related to that instead, and I came to:

helm/helm#7956

TL;DR - this does look like a helm issue, and it relates to a change in behaviour between helm 2 (remove and recreate) and helm 3 (throw error and leave alone) in how they approach these assigned values in undefined fields.

And an associated pull request to add an option to have the helm2 style of dealing with it and effectively accepting the risks:

helm/helm#7431

There is a lot of debate in the PR about the appropriateness of the solution, whether it is a good idea at all etc.

Hence, not sure this is going to be accepted and released anytime soon and personally I may just take the easy way out for now and delete and re-install to get back onto a "standard" release

aLekSer · 2020-09-16T22:03:20Z

@comerford Thanks for giving more details to this issue root cause.
It would be interesting to see Upgrades from 1.8.0-rc into 1.8.0 and do the same for 1.8.0 to 1.9.0-rc.
By the way you are using Kubernetes version which is 2 versions above recommended (which is 1.15 as of now):

Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

comerford · 2020-10-13T17:03:41Z

For the record, I hit this again, this time no upgrade involved, I only altered the feature gates this time, the diff on the config looks like this:

 # module.helm_agones.helm_release.agones will be updated in-place
  ~ resource "helm_release" "agones" {

// SNIP FOR BREVITY

      - set {
          - name  = "agones.controller.logLevel" -> null
          - value = "info" -> null
        }
      + set {
          + name  = "agones.controller.logLevel"
          + value = "info"
        }
      - set {
          - name = "agones.featureGates" -> null
        }
      + set {
          + name  = "agones.featureGates"
          + value = "PlayerTracking=true&ContainerPortAllocation=true"
        }
      - set {
          - name  = "agones.image.controller.pullPolicy" -> null
          - value = "IfNotPresent" -> null
        }
      + set {
          + name  = "agones.image.controller.pullPolicy"
          + value = "IfNotPresent"
        }
      - set {
          - name = "agones.image.controller.pullSecret" -> null
        }
      + set {
          + name = "agones.image.controller.pullSecret"
        }
      - set {
          - name  = "agones.image.registry" -> null
          - value = "gcr.io/agones-images" -> null
        }
      + set {
          + name  = "agones.image.registry"
          + value = "gcr.io/agones-images"
        }
      - set {
          - name  = "agones.image.sdk.alwaysPull" -> null
          - value = "false" -> null
        }
      + set {
          + name  = "agones.image.sdk.alwaysPull"
          + value = "false"
        }
      - set {
          - name  = "agones.ping.http.serviceType" -> null
          - value = "LoadBalancer" -> null
        }
      + set {
          + name  = "agones.ping.http.serviceType"
          + value = "LoadBalancer"
        }
      - set {
          - name  = "agones.ping.udp.expose" -> null
          - value = "false" -> null
        }
      + set {
          + name  = "agones.ping.udp.expose"
          + value = "false"
        }
      - set {
          - name  = "agones.ping.udp.serviceType" -> null
          - value = "LoadBalancer" -> null
        }
      + set {
          + name  = "agones.ping.udp.serviceType"
          + value = "LoadBalancer"
        }
      - set {
          - name  = "crds.CleanupOnDelete" -> null
          - value = "true" -> null
        }
      + set {
          + name  = "crds.CleanupOnDelete"
          + value = "true"
        }
      - set {
          - name  = "gameservers.maxPort" -> null
          - value = "8000" -> null
        }
      + set {
          + name  = "gameservers.maxPort"
          + value = "8000"
        }
      - set {
          - name  = "gameservers.minPort" -> null
          - value = "7000" -> null
        }
      + set {
          + name  = "gameservers.minPort"
          + value = "7000"
        }
      - set {
          - name  = "gameservers.namespaces" -> null
          - value = "{default,gameservers}" -> null
        }
      + set {
          + name  = "gameservers.namespaces"
          + value = "{default,gameservers}"
        }
    }

agones.featureGates was the only thing I was actually changing, but everything got shifted, hence the change. Once again, I had to delete the release and re-install to resolve

For reference, my versions:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

If anyone thinks that 1.17 is genuinely the issue here I might be able to delete and reinstall the entire thing to get onto 1.16 (can't downgrade gracefully).

comerford · 2020-10-13T17:14:24Z

As a test I tried a single change on another cluster, the only change was the agones_version from 1.8.0 to 1.9.0

Here's the output of the terraform plan:

  # module.helm_agones.helm_release.agones will be updated in-place
  ~ resource "helm_release" "agones" {
        atomic                     = false
        chart                      = "agones"
        cleanup_on_fail            = false
        create_namespace           = true
        dependency_update          = false
        disable_crd_hooks          = false
        disable_openapi_validation = false
        disable_webhooks           = false
        force_update               = true
        id                         = "agones"
        lint                       = false
        max_history                = 0
        metadata                   = [
            {
                chart     = "agones"
                name      = "agones"
                namespace = "agones-system"
                revision  = 2
                values    = jsonencode(
                    {
                        agones      = {
                            controller   = {
                                logLevel = "info"
                            }
                            featureGates = "PlayerTracking=true&ContainerPortAllocation=true"
                            image        = {
                                controller = {
                                    pullPolicy = "IfNotPresent"
                                    pullSecret = ""
                                }
                                registry   = "gcr.io/agones-images"
                                sdk        = {
                                    alwaysPull = false
                                }
                            }
                            ping         = {
                                http = {
                                    serviceType = "LoadBalancer"
                                }
                                udp  = {
                                    expose      = false
                                    serviceType = "LoadBalancer"
                                }
                            }
                        }
                        crds        = {
                            CleanupOnDelete = true
                        }
                        gameservers = {
                            maxPort    = 8000
                            minPort    = 7000
                            namespaces = [
                                "default",
                                "gameservers",
                            ]
                        }
                    }
                )
                version   = "1.8.0"
            },
        ]
        name                       = "agones"
        namespace                  = "agones-system"
        recreate_pods              = false
        render_subchart_notes      = true
        replace                    = false
        repository                 = "https://agones.dev/chart/stable"
        reset_values               = false
        reuse_values               = false
        skip_crds                  = false
        status                     = "deployed"
        timeout                    = 420
        values                     = [
            "",
        ]
        verify                     = false
      ~ version                    = "1.8.0" -> "1.9.0"
        wait                       = true

        set {
            name  = "agones.controller.logLevel"
            value = "info"
        }
        set {
            name  = "agones.featureGates"
            value = "PlayerTracking=true&ContainerPortAllocation=true"
        }
        set {
            name  = "agones.image.controller.pullPolicy"
            value = "IfNotPresent"
        }
        set {
            name = "agones.image.controller.pullSecret"
        }
        set {
            name  = "agones.image.registry"
            value = "gcr.io/agones-images"
        }
        set {
            name  = "agones.image.sdk.alwaysPull"
            value = "false"
        }
        set {
            name  = "agones.ping.http.serviceType"
            value = "LoadBalancer"
        }
        set {
            name  = "agones.ping.udp.expose"
            value = "false"
        }
        set {
            name  = "agones.ping.udp.serviceType"
            value = "LoadBalancer"
        }
        set {
            name  = "crds.CleanupOnDelete"
            value = "true"
        }
        set {
            name  = "gameservers.maxPort"
            value = "8000"
        }
        set {
            name  = "gameservers.minPort"
            value = "7000"
        }
        set {
            name  = "gameservers.namespaces"
            value = "{default,gameservers}"
        }
    }

And the full error:

module.helm_agones.helm_release.agones: Modifying... [id=agones]

Error: failed to replace object: Service "agones-ping-http-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-controller-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-allocator" is invalid: spec.clusterIP: Invalid value: "": field is immutable

  on .terraform/modules/helm_agones/install/terraform/modules/helm3/helm.tf line 29, in resource "helm_release" "agones":
  29: resource "helm_release" "agones" {

comerford · 2020-10-13T18:39:18Z

For the record, I have tested a 1.8.0 --> 1.9.0 upgrade with force_update = false on two clusters successfully using my fork (that is now a PR):

https://github.com/comerford/agones/blob/master/install/terraform/modules/helm3/helm.tf#L32

comerford added the kind/bug These are bugs. label Aug 20, 2020

markmandel added area/operations Installation, updating, metrics etc good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! labels Aug 24, 2020

comerford mentioned this issue Oct 13, 2020

Make the force_update option configurable in Helm/Terraform #1844

Merged

markmandel closed this as completed in #1844 Oct 14, 2020

markmandel added this to the 1.10.0 milestone Nov 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

comerford commented Aug 20, 2020

comerford commented Aug 20, 2020

markmandel commented Aug 24, 2020

comerford commented Aug 25, 2020

comerford commented Aug 25, 2020 •

edited

Loading

aLekSer commented Sep 16, 2020 •

edited

Loading

comerford commented Oct 13, 2020 •

edited

Loading

comerford commented Oct 13, 2020

comerford commented Oct 13, 2020

Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

Comments

comerford commented Aug 20, 2020

comerford commented Aug 20, 2020

markmandel commented Aug 24, 2020

comerford commented Aug 25, 2020

comerford commented Aug 25, 2020 • edited Loading

aLekSer commented Sep 16, 2020 • edited Loading

comerford commented Oct 13, 2020 • edited Loading

comerford commented Oct 13, 2020

comerford commented Oct 13, 2020

comerford commented Aug 25, 2020 •

edited

Loading

aLekSer commented Sep 16, 2020 •

edited

Loading

comerford commented Oct 13, 2020 •

edited

Loading