Upgrading from 1.7.0 to 1.8.0 using the helm module for terraform fails with force_update=true #1767
For reference, here is a (lengthy) related bug from the helm project:
That's so weird, given that each e2e test we run replaces the previous helm installation with an upgrade, so it's a relatively well-trodden path. I'm also confused as to why it's failing on a clusterIP; I'm not seeing that in the allocation service yaml, so I'm trying to work out where that is deriving from. All that being said, having a PR that makes force_update configurable seems totally fine, and a good addition 👍 but I am wondering about the root cause of this issue.
I similarly grep'ed and searched through the charts and can't see a reference to clusterIP anywhere. I've also checked through all 8 versions of the manifest that I have on my cluster, and there's no sign of clusterIP anywhere either. We are still pre-production, so I am tempted to just burn it down and replace it from scratch, but not knowing the cause here is frustrating for sure.
I began to suspect that the root of this might be the serviceType: LoadBalancer config, so I did some googling for this type of error related to that instead, and I came to:

TL;DR: this does look like a helm issue, and it relates to a change in behaviour between helm 2 (remove and recreate) and helm 3 (throw an error and leave alone) in how they approach these cluster-assigned values in undefined fields. There is also an associated pull request to add an option for the helm 2 style of dealing with it, effectively accepting the risks:

There is a lot of debate in the PR about the appropriateness of the solution, whether it is a good idea at all, etc. Hence, I'm not sure this is going to be accepted and released anytime soon; personally, I may just take the easy way out for now and delete and re-install to get back onto a "standard" release.
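To make the helm 2 vs. helm 3 behaviour concrete, here is a sketch of the failure mode with a LoadBalancer Service (this is illustrative only, not the actual Agones chart output; the Service name is hypothetical):

```yaml
# Sketch: a Service templated by a chart with no clusterIP set.
apiVersion: v1
kind: Service
metadata:
  name: example-allocator   # hypothetical name, not from the Agones chart
spec:
  type: LoadBalancer
  ports:
    - port: 443
# On creation, the API server assigns an immutable value, e.g.:
#   spec.clusterIP: 10.100.23.45
# A forced upgrade (helm upgrade --force, i.e. force_update = true) replaces
# the whole object with clusterIP blank again. helm 2 would delete and
# recreate the Service; helm 3 instead surfaces the API server's rejection,
# along the lines of:
#   Service "example-allocator" is invalid: spec.clusterIP: Invalid value: "": field is immutable
```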
@comerford Thanks for giving more details on the root cause of this issue.
For the record, I hit this again, this time no upgrade involved, I only altered the feature gates this time, the diff on the config looks like this:
agones.featureGates was the only thing I was actually changing, but everything got shifted, hence the large diff. Once again, I had to delete the release and re-install to resolve it. For reference, my versions:
If anyone thinks that 1.17 is genuinely the issue here, I might be able to delete and reinstall the entire thing to get onto 1.16 (I can't downgrade gracefully).
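For context, the feature-gate change described above is a single helm value, which in the terraform module would be set roughly like this (a sketch only; the gate name and value are examples, not the actual diff from the report):

```hcl
# Hypothetical sketch of the intended one-value change: an extra "set" block
# on the helm_release resource. The gate shown is an example, not the real diff.
resource "helm_release" "agones" {
  # ... chart, repository, version, etc. unchanged ...

  set {
    name  = "agones.featureGates"
    value = "PlayerTracking=true"
  }
}
```

Even a one-value change like this triggers a full upgrade of the release, which is why the forced-replace behaviour still bites here despite no version change.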
As a test, I tried a single change on another cluster; the only change was the agones_version from 1.8.0 to 1.9.0. Here's the output of the terraform plan:
And the full error:
For the record, I have tested a 1.8.0 --> 1.9.0 upgrade with https://github.com/comerford/agones/blob/master/install/terraform/modules/helm3/helm.tf#L32
What happened:
Changed the agones_version to "1.8.0", ran terraform plan (OK), then terraform apply, which failed with the following message:
To work around this (after digging into similar issues reported with helm), I changed the force_update setting to false; to do this I had to fork the module and use my own version, because the setting is not configurable in terraform at present. This allowed me to successfully upgrade to 1.8.0:
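The forked-module change itself amounts to flipping one argument on the helm_release resource. Roughly (a sketch with chart details abbreviated; the repository URL is the standard Agones chart repo, assumed here to match the module):

```hcl
# Sketch of the workaround in the forked module: disable force_update so
# helm 3 performs a normal three-way-merge upgrade instead of a replace.
resource "helm_release" "agones" {
  name         = "agones"
  repository   = "https://agones.dev/chart/stable"  # assumed repo URL
  chart        = "agones"
  version      = var.agones_version
  force_update = false  # was hard-coded to true in the upstream module
}
```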
What you expected to happen:
Ideally, this upgrade would "just work". I assume the force_update setting is required for other reasons, and that setting it to false generally is not an option?
Changing the force_update flag to be configurable is one workaround that would be much less clunky than having to fork and use your own version of the module.
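Making the flag configurable could be as simple as threading a module input through to the resource, e.g. (a sketch; the variable name and default are my own suggestion, not the module's actual interface):

```hcl
# Hypothetical sketch: expose force_update as a module input with the
# current behaviour as the default, so existing users are unaffected.
variable "force_update" {
  description = "Pass-through to helm_release.force_update"
  type        = bool
  default     = true  # keep the module's current behaviour unless overridden
}

resource "helm_release" "agones" {
  # ... other arguments unchanged ...
  force_update = var.force_update
}
```

A caller hitting this bug could then set `force_update = false` in their module block without maintaining a fork.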
Based on what I have read elsewhere, the issue is that these immutable fields are blank in the chart, get generated on the first run, but then can't be blank on a second/subsequent run. Is another option to populate the settings explicitly, perhaps?
How to reproduce it (as minimally and precisely as possible):
I am not 100% sure whether this would occur with a config that has always used helm 3; this was originally a helm 2 config and I migrated to helm 3, so this may be a hangover from that process.
I checked, and if I revert to using the official module from Agones, I once again get the error, so I can repro locally.
Anything else we need to know?:
Environment:
Agones version: 1.8.0
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: AWS - EKS
Install method (yaml/helm): helm/terraform
Troubleshooting guide log(s):
Others: