Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bitnami/postgresql] No warning about missing password value during upgrade when chart is a subchart #6121

Closed
hWorblehat opened this issue Apr 15, 2021 · 18 comments
Labels
on-hold Issues or Pull Requests with this label will never be considered stale

Comments

@hWorblehat
Copy link

Which chart:
Tested with postgresql 8.10.14, but I believe it's also present in more recent versions.

Describe the bug
When this chart is used as a subchart that doesn't explicitly have the postgresPassword value set, there is no error report on upgrade. As a result, access to the data on the persistent volume is lost the next time the pod is rescheduled.

To Reproduce
Steps to reproduce the behavior:

  1. Use this chart as a subchart of another that doesn't explicitly set the postgresPassword value (e.g. Requarks/wiki).
  2. Install the chart without specifying postgresPassword:
    $ helm upgrade --install postgres-test <chart>
    
  3. Make some change and update the release:
    $ helm upgrade --install postgres-test <chart> --set foo=bar
    
  4. Everything might still be working fine at this point, if the upgrade didn't cause any pods to be resceduled, but notice that the value of the postgres-password secret differs from the environment variable in the postgres pod:
    $ kubectl get secret  postgres-test-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode
    $ kubectl exec postgres-test-postgresql-0 -- bash -c 'echo $POSTGRES_PASSWORD'
    
  5. As soon as a pod gets rescheduled, and picks up the new password value from the secret, it can no longer access the data:
    $ kubectl delete pod postgres-test-postgresql-0
    $ kubectl get pod postgres-test-postgresql-0 -w # Wait for pod to be ready
    $ kubectl exec postgres-test-postgresql-0 -- bash -c 'PGPASSWORD="$POSTGRES_PASSWORD" pg_dump -U postgres'
    

Expected behavior
In step 3 above, the upgrade should fail with an error requesting the password be explicitly specified.

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.5.3", GitCommit:"041ce5a2c17a58be0fcd5f5e16fb3e7e95fea622", GitTreeState:"dirty", GoVersion:"go1.16.2"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"archive", BuildDate:"2021-03-25T19:02:48Z", GoVersion:"go1.16.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"54684493f8139456e5d2f963b23cb5003c4d8055", GitTreeState:"clean", BuildDate:"2021-03-22T23:02:59Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

Additional context
This error doesn't occur when using the chart directly, as there's a macro in NOTES.txt that checks for the password. However, by default Helm doesn't render subchart notes so this macro never gets invoked. A simple fix would be to move the invocation of this macro into another template file to ensure it is always invoked.

@hWorblehat
Copy link
Author

#4364 Seems somewhat related but doesn't look like it's describing the exact same issue.

@rafariossaa
Copy link
Contributor

Hi,
If you are using this as a subchart you would need to add in your notes the checking.
For example, in our keycloack, postgresql is used as a subchart and some checking is added in the keycloack NOTES.txt.
Could you take a look to these lines?

@hWorblehat
Copy link
Author

I appreciate that this is a possible workaround, but it hardly seems ideal. I can't see any documentation in the chart's README indicating that this is a requirement.

Moreover, I expect that most people consuming a chart would expect it to behave the same when used as a subchart as when used directly. Is there a particular reason why the validation can't be moved into a template file that renders in all scenarios, and must stay in NOTES.txt?

@hWorblehat
Copy link
Author

To make the problem explicit: at present, anybody who uses this as a subchart and doesn't realise there is a requirement to put specific macros in their NOTES.txt will likely loose their data at some unknown point in the future.

What's worse, the requirement effectively propagates to all consuming charts - if there are nested subcharts, all charts that have postgresql somewhere in their subchart hierarchy must modify their NOTES.txt. Furthermore, any consuming chart that might itself be used as a subchart (which is essentially all charts) must document this requirement themself, and hope that consumers notice the documentation in order to avoid losing data.

Unless I'm missing something, I'd consider this a fairly unequivocal bug, given the risk of data loss.

@juan131
Copy link
Contributor

juan131 commented Apr 19, 2021

Hi @hWorblehat

Thanks for your feedback.

As @rafariossaa mentioned, these validations are addressed in the main chart (as we did in Keycloak) since the NOTES.txt of the subcharts are ignored.

Is there a particular reason why the validation can't be moved into a template file that renders in all scenarios, and must stay in NOTES.txt?

To be honest, we did not explore that possibility but I think it does make sense. We simply add them into the NOTES.txt for historical reasons.

@javsalgar @carrodher what do you think about this?

To make the problem explicit: at present, anybody who uses this as a subchart and doesn't realise there is a requirement to put specific macros in their NOTES.txt will likely loose their data at some unknown point in the future.

That's not correct. The data in PostgreSQL remains unmodified regardless you indicate the credentials during the chart upgrade or not. The actual problem is that the application is unable to authenticate with the PostgreSQL server.

If this happens, users can do a "helm rollback" to the previous release revision and everything should continue working as before.

and hope that consumers notice the documentation in order to avoid losing data.

Please note that this is a mechanism we introduced to avoid users to misuse the chart, but users still have the responsibility to carefully read the documentation and use charts accordingly.

We also have Troubleshooting guides that address this specific topic:

As an alternative, we also support managing these credentials externally to the chart making use of the existingSecret parameter. Using this approach, there's no chance for users to make inappropriate use of the charts.

@hWorblehat
Copy link
Author

Hi @juan131,

Thanks for responding and clarifying a few things.

That's not correct. The data in PostgreSQL remains unmodified...

Yep, my mistake here. I'm not that hot on how PostgreSQL works, and I'd thought that it was possibly encrypting its data at rest using the password. I've done some reading and it looks like I was clearly wrong here. However, if the password is lost (excepting your rollback solution, discussed below), it's a certain amount of manual faff to recover access, which looks like it would require manually editing some of the data on the PV (e.g. here).

If this happens, users can do a "helm rollback" to the previous release revision and everything should continue working as before.

True, I'd forgotten you can do this. However there's still a problem: the issue only manifests itself when the pod trying to access the data is rescheduled. Performing a Helm upgrade doesn't guarantee that will happen. So, it may not be clear that the Helm upgrade was what caused the issue at first (this is what happened to me - pod was rescheduled months later when upgrading the cluster). Also, if the the Helm release is updated more times than Helm's max history (10 by default) before the pod is rescheduled, then the password really will be lost. While this may be unlikely, it's not impossible.

Consider this scenario: you upgrade your Helm release twice. The first time doesn't cause pods to be rescheduled, but the second does. Here, it would look like the second upgrade had broken things, when in fact it was the first. I can see an administrator spending a really long time trying to get to the bottom of that...

users still have the responsibility to carefully read the documentation and use charts accordingly

Agreed, RTFM, and thanks for pointing me at the documentation. However, I came across this issue not as a user of this chart, but as a user of another chart that used postgresql as a subchart. Arguably, this is the fault of the subchart developer, who maybe didn't RTFM, but:

  1. The documentation is hardly prominent. Given how long the README for this chart already is, I'd expect something like this (or a prominent link to it) to be included there.

  2. Since this is an area where the chart behaves differently as a subchart than as a main chart, I'd expect that to be called out. It's not in the documentation you linked, so a developer reading that might experiment with postgresql on its own, then add it as a subchart to something else, naively expecting it to behave the same. Yes, maybe they should have tested it better, but...

  3. I'm all for documenting things, and expecting people to read documentation, but if there's an easy way to make things more robust and protect against when people don't read the documentation (which, let's face it, happens much more often than we'd like), then why not take it? We have such a situation here: if the macro to check for unspecified passwords on upgrades was called anywhere other than NOTES.txt, it would be invoked for consuming charts. As a consumer of a consuming chart, I'd be forced to confront the problem the first time I try to upgrade, before I've lost access to my data. If the developer of the chart I'm using, was lazy, and didn't read your documentation, or didn't properly write their own, it doesn't matter (as much).

    That seems like a win-win to me.

As an alternative, we also support...

Agreed, there are lots of ways to use this chart "correctly", both directly and as a subchart. I'm now employing one of them in my deployment (:+1: ). But, there's one way, which happens to be the "easiest" option, that is quite dangerous. On the surface of things to me, it looks like that danger could quite easily be mitigated by moving a small amount of code between template files, so I raised this bug as I thought that might be a good idea to explore.

So, on the basis of all of the above, I'm still of the opinion that either:

  • I'm missing something, and there's a good reason why the password-checking macro shouldn't be moved out of NOTES.txt.

    -or-

  • This is a bug: the intended behaviour is for the chart to error on upgrades if the password is auto-generated, but it doesn't when used as a subchart

@javsalgar
Copy link
Contributor

To be honest, we did not explore that possibility but I think it does make sense. We simply add them into the NOTES.txt for historical reasons.

@javsalgar @carrodher what do you think about this?

I agree it is for historical reasons. I think it makes sense to explore that possibility and figure out the best option to do it. It make take some time from us because we would need to adapt it to all of our charts.

@github-actions
Copy link

github-actions bot commented May 5, 2021

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label May 5, 2021
@rafariossaa
Copy link
Contributor

Hi,
I am creating a internal task to evaluate this and take actions if needed.
This is added to our backlog, I can not ensure if this will be implemented nor I can provide an ETA.

@github-actions github-actions bot removed the stale 15 days without activity label May 6, 2021
@max3903
Copy link

max3903 commented May 6, 2021

Hello,

I had the same problem on my helm chart and was able to fix it with something like this:

  {{ if .Release.IsInstall }}
  password: {{ randAlphaNum 16 | b64enc | quote }}
  {{ else }}
  password: {{ index (lookup "v1" "Secret" .Release.Namespace (include "app.fullname" . )).data "password" }}
  {{ end }}

Can we avoid regenerating secrets when the PostgreSQL chart is upgraded?

Thanks.

@carrodher
Copy link
Member

Hi, thanks for letting us know. We are investigating different alternatives to improve the current approach to handle random passwords and upgrades. One of the concerns of using lookup is that it's a Helm specific function and we are trying to make our charts compatible with "bare k8s", I mean users should be able to do helm template and then deploy the generated manifests directly in k8s without Helm, so this approach using lookup is not going to work in this scenario. Mainly this is the reason why we are still investigating different alternatives to solve this issue.

We will update this thread with more information once decided a proper solution.

@github-actions
Copy link

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label May 23, 2021
@rafariossaa rafariossaa added on-hold Issues or Pull Requests with this label will never be considered stale and removed stale 15 days without activity labels May 25, 2021
@dr4Ke
Copy link
Contributor

dr4Ke commented Nov 16, 2021

As an alternative, we also support managing these credentials externally to the chart making use of the existingSecret parameter. Using this approach, there's no chance for users to make inappropriate use of the charts.

Exactly! Thanks.

@mickours
Copy link

I've just been bitten by this. Any solution on the pipe?

@juan131
Copy link
Contributor

juan131 commented Mar 10, 2022

Recent versions of the PostgreSQL chart use Helm's "lookup" functionality to auto-detect existing secret keys. What version are you using @mickours ? In any case, the alternatives that you have to solve the issue if you faced it are documented in the Troubleshooting guide below:

@mickours
Copy link

Thanks, this is coming from the requarks/wiki chart which use an outdated version of this chart:
https://github.com/Requarks/wiki/blob/main/dev/helm/Chart.lock

@juan131
Copy link
Contributor

juan131 commented Mar 11, 2022

Hi @mickours

The ideal thing in this case would be to update the chart deps on that chart, but that's out of our scope. I hope that the alternatives documented in the Troubleshooting guide will be useful to workaround the issue.

@carrodher
Copy link
Member

Unfortunately, this issue was created a long time ago and although there is an internal task to fix it, it was not prioritized as something to address in the short/mid term. It's not a technical reason but something related to the capacity since we're a small team.

Being said that, contributions via PRs are more than welcome in both repositories (containers and charts). Just in case you would like to contribute.

During this time, there are several releases of this asset and it's possible the issue has gone as part of other changes. If that's not the case and you are still experiencing this issue, please feel free to reopen it and we will re-evaluate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on-hold Issues or Pull Requests with this label will never be considered stale
Projects
None yet
Development

No branches or pull requests

8 participants