Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767

Closed
comerford opened this issue Aug 20, 2020 · 8 comments · Fixed by #1844
Labels
area/operations Installation, updating, metrics etc good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.
Milestone

Comments

@comerford
Copy link
Contributor

What happened:
Changed the agones_version to "1.8.0", ran a terraform plan (OK), terraform apply (failed) with the following messaging:

Error: failed to replace object: Service "agones-ping-http-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-controller-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-allocator" is invalid: spec.clusterIP: Invalid value: "": field is immutable

  on .terraform/modules/helm_agones/install/terraform/modules/helm3/helm.tf line 29, in resource "helm_release" "agones":
  29: resource "helm_release" "agones" {

To workaround this (after digging into similar issues reported with helm), I changed the force_update setting to false (to do this I had to fork the module and use my own version, because the setting is not configurable in terraform at present. This allowed me to successfully upgrade to 1.8.0:

helm list -n agones-system
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
agones  agones-system   8               2020-08-20 13:17:59.956502806 +0100 IST deployed        agones-1.8.0    1.8.0

What you expected to happen:

Ideally, this upgrade would "just work". I assume the force_update setting is required for other reasons and setting it to false generally is not an option?

Changing the force_update flag to be configurable is one workaround that would be much less clunky than having to fork and use your own version of the module.

Based on what I have read elsewhere, the issue is that the setting of these immutable fields is blank and is generated when run first time, but then can't be blank for a second/subsequent run. Is another option to populate the settings perhaps?

How to reproduce it (as minimally and precisely as possible):

I am not 100% sure if this would occur with a config that has always used helm3, this was originally a helm2 config and I migrated to helm3, so this may be a hangover from that process.

I checked, and if I revert to using the official module from Agones, I once again get the error, so I can repro locally.

Anything else we need to know?:

Environment:

  • Agones version: 1.8.0

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: AWS - EKS

  • Install method (yaml/helm): helm/terraform

  • Troubleshooting guide log(s):

  • Others:

@comerford comerford added the kind/bug These are bugs. label Aug 20, 2020
@comerford
Copy link
Contributor Author

for reference, here is a (lengthy) related bug from the helm project:

helm/helm#6378 (comment)

@markmandel
Copy link
Member

That's so weird that force_update causes this to fail - from the docs, it "Force resource update through delete/recreate if needed." - so it should only fire if needed, which doesn't seem to be the case! So is this a bug on Helm's side or Terraform, or something else?

(each e2e test we run replaces the previous helm installation with an upgrade, so it's relatively well trodden path)

I'm also confused as to why it's failing on a clusterIP , I'm not seeing that in the allocation service yaml. So trying to work out where that is deriving from.

That all being said, having a PR that makes force_update configurable seems totally fine, and a good addition 👍 but I am wondering at the root cause of this issue.

@markmandel markmandel added area/operations Installation, updating, metrics etc good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! labels Aug 24, 2020
@comerford
Copy link
Contributor Author

I similarly grep'ed and searched through the charts and can't see a reference to clusterIP anywhere. I've also checked through all 8 versions of the manifest that I have on my cluster, and no sign of clusterIP anywhere either. We are still pre-production, so I am tempted to just burn it down and replace it from scratch, but not knowing the cause here is frustrating for sure.

@comerford
Copy link
Contributor Author

comerford commented Aug 25, 2020

I began to suspect that the root of this might be the serviceType: Loadbalancer config, so I did some googling for this type of error related to that instead, and I came to:

helm/helm#7956

TL;DR - this does look like a helm issue, and it relates to a change in behaviour between helm 2 (remove and recreate) and helm 3 (throw error and leave alone) in how they approach these assigned values in undefined fields.

And an associated pull request to add an option to have the helm2 style of dealing with it and effectively accepting the risks:

helm/helm#7431

There is a lot of debate in the PR about the appropriateness of the solution, whether it is a good idea at all etc.

Hence, not sure this is going to be accepted and released anytime soon and personally I may just take the easy way out for now and delete and re-install to get back onto a "standard" release

@aLekSer
Copy link
Collaborator

aLekSer commented Sep 16, 2020

@comerford Thanks for giving more details to this issue root cause.
It would be interesting to see Upgrades from 1.8.0-rc into 1.8.0 and do the same for 1.8.0 to 1.9.0-rc.
By the way you are using Kubernetes version which is 2 versions above recommended (which is 1.15 as of now):

Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

@comerford
Copy link
Contributor Author

comerford commented Oct 13, 2020

For the record, I hit this again, this time no upgrade involved, I only altered the feature gates this time, the diff on the config looks like this:

 # module.helm_agones.helm_release.agones will be updated in-place
  ~ resource "helm_release" "agones" {

// SNIP FOR BREVITY

      - set {
          - name  = "agones.controller.logLevel" -> null
          - value = "info" -> null
        }
      + set {
          + name  = "agones.controller.logLevel"
          + value = "info"
        }
      - set {
          - name = "agones.featureGates" -> null
        }
      + set {
          + name  = "agones.featureGates"
          + value = "PlayerTracking=true&ContainerPortAllocation=true"
        }
      - set {
          - name  = "agones.image.controller.pullPolicy" -> null
          - value = "IfNotPresent" -> null
        }
      + set {
          + name  = "agones.image.controller.pullPolicy"
          + value = "IfNotPresent"
        }
      - set {
          - name = "agones.image.controller.pullSecret" -> null
        }
      + set {
          + name = "agones.image.controller.pullSecret"
        }
      - set {
          - name  = "agones.image.registry" -> null
          - value = "gcr.io/agones-images" -> null
        }
      + set {
          + name  = "agones.image.registry"
          + value = "gcr.io/agones-images"
        }
      - set {
          - name  = "agones.image.sdk.alwaysPull" -> null
          - value = "false" -> null
        }
      + set {
          + name  = "agones.image.sdk.alwaysPull"
          + value = "false"
        }
      - set {
          - name  = "agones.ping.http.serviceType" -> null
          - value = "LoadBalancer" -> null
        }
      + set {
          + name  = "agones.ping.http.serviceType"
          + value = "LoadBalancer"
        }
      - set {
          - name  = "agones.ping.udp.expose" -> null
          - value = "false" -> null
        }
      + set {
          + name  = "agones.ping.udp.expose"
          + value = "false"
        }
      - set {
          - name  = "agones.ping.udp.serviceType" -> null
          - value = "LoadBalancer" -> null
        }
      + set {
          + name  = "agones.ping.udp.serviceType"
          + value = "LoadBalancer"
        }
      - set {
          - name  = "crds.CleanupOnDelete" -> null
          - value = "true" -> null
        }
      + set {
          + name  = "crds.CleanupOnDelete"
          + value = "true"
        }
      - set {
          - name  = "gameservers.maxPort" -> null
          - value = "8000" -> null
        }
      + set {
          + name  = "gameservers.maxPort"
          + value = "8000"
        }
      - set {
          - name  = "gameservers.minPort" -> null
          - value = "7000" -> null
        }
      + set {
          + name  = "gameservers.minPort"
          + value = "7000"
        }
      - set {
          - name  = "gameservers.namespaces" -> null
          - value = "{default,gameservers}" -> null
        }
      + set {
          + name  = "gameservers.namespaces"
          + value = "{default,gameservers}"
        }
    }

agones.featureGates was the only thing I was actually changing, but everything got shifted, hence the change. Once again, I had to delete the release and re-install to resolve

For reference, my versions:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

If anyone thinks that 1.17 is genuinely the issue here I might be able to delete and reinstall the entire thing to get onto 1.16 (can't downgrade gracefully).

@comerford
Copy link
Contributor Author

As a test I tried a single change on another cluster, the only change was the agones_version from 1.8.0 to 1.9.0

Here's the output of the terraform plan:

  # module.helm_agones.helm_release.agones will be updated in-place
  ~ resource "helm_release" "agones" {
        atomic                     = false
        chart                      = "agones"
        cleanup_on_fail            = false
        create_namespace           = true
        dependency_update          = false
        disable_crd_hooks          = false
        disable_openapi_validation = false
        disable_webhooks           = false
        force_update               = true
        id                         = "agones"
        lint                       = false
        max_history                = 0
        metadata                   = [
            {
                chart     = "agones"
                name      = "agones"
                namespace = "agones-system"
                revision  = 2
                values    = jsonencode(
                    {
                        agones      = {
                            controller   = {
                                logLevel = "info"
                            }
                            featureGates = "PlayerTracking=true&ContainerPortAllocation=true"
                            image        = {
                                controller = {
                                    pullPolicy = "IfNotPresent"
                                    pullSecret = ""
                                }
                                registry   = "gcr.io/agones-images"
                                sdk        = {
                                    alwaysPull = false
                                }
                            }
                            ping         = {
                                http = {
                                    serviceType = "LoadBalancer"
                                }
                                udp  = {
                                    expose      = false
                                    serviceType = "LoadBalancer"
                                }
                            }
                        }
                        crds        = {
                            CleanupOnDelete = true
                        }
                        gameservers = {
                            maxPort    = 8000
                            minPort    = 7000
                            namespaces = [
                                "default",
                                "gameservers",
                            ]
                        }
                    }
                )
                version   = "1.8.0"
            },
        ]
        name                       = "agones"
        namespace                  = "agones-system"
        recreate_pods              = false
        render_subchart_notes      = true
        replace                    = false
        repository                 = "https://agones.dev/chart/stable"
        reset_values               = false
        reuse_values               = false
        skip_crds                  = false
        status                     = "deployed"
        timeout                    = 420
        values                     = [
            "",
        ]
        verify                     = false
      ~ version                    = "1.8.0" -> "1.9.0"
        wait                       = true

        set {
            name  = "agones.controller.logLevel"
            value = "info"
        }
        set {
            name  = "agones.featureGates"
            value = "PlayerTracking=true&ContainerPortAllocation=true"
        }
        set {
            name  = "agones.image.controller.pullPolicy"
            value = "IfNotPresent"
        }
        set {
            name = "agones.image.controller.pullSecret"
        }
        set {
            name  = "agones.image.registry"
            value = "gcr.io/agones-images"
        }
        set {
            name  = "agones.image.sdk.alwaysPull"
            value = "false"
        }
        set {
            name  = "agones.ping.http.serviceType"
            value = "LoadBalancer"
        }
        set {
            name  = "agones.ping.udp.expose"
            value = "false"
        }
        set {
            name  = "agones.ping.udp.serviceType"
            value = "LoadBalancer"
        }
        set {
            name  = "crds.CleanupOnDelete"
            value = "true"
        }
        set {
            name  = "gameservers.maxPort"
            value = "8000"
        }
        set {
            name  = "gameservers.minPort"
            value = "7000"
        }
        set {
            name  = "gameservers.namespaces"
            value = "{default,gameservers}"
        }
    }

And the full error:

module.helm_agones.helm_release.agones: Modifying... [id=agones]

Error: failed to replace object: Service "agones-ping-http-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-controller-service" is invalid: spec.clusterIP: Invalid value: "": field is immutable && failed to replace object: Service "agones-allocator" is invalid: spec.clusterIP: Invalid value: "": field is immutable

  on .terraform/modules/helm_agones/install/terraform/modules/helm3/helm.tf line 29, in resource "helm_release" "agones":
  29: resource "helm_release" "agones" {

@comerford
Copy link
Contributor Author

For the record, I have tested a 1.8.0 --> 1.9.0 upgrade with force_update = false on two clusters successfully using my fork (that is now a PR):

https://github.com/comerford/agones/blob/master/install/terraform/modules/helm3/helm.tf#L32

@markmandel markmandel added this to the 1.10.0 milestone Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operations Installation, updating, metrics etc good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants