
BUG: data too long issue rendering #923

Open
Tracked by #950
DrummyFloyd opened this issue Jun 12, 2024 · 7 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@DrummyFloyd

### User story

Issue created due to a mention in Slack.

As a recurrent tester on this project (because I like it), I test the operators I want on my stack.

At the moment I have an issue with the following manifests on OLM 0.10:

```yaml
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: Catalog
metadata:
  name: operatorhubio
spec:
  source:
    type: image
    image:
      ref: quay.io/operatorhubio/catalog:latest
      pollInterval: 24h
---
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  name: op-mariadb
spec:
  installNamespace: operators
  packageName: mariadb-operator
  version: 0.29.0
```

### List of issues

Error message rendered:

```
InstallationFailed:create: failed to create: Secret "sh.helm.release.v1.op-mariadb.v1" is invalid: data: Too long: must have at most 1048576 bytes
```

### New content or update?

Net-new content

### Types of documentation

_No response_

### References

_No response_
@DrummyFloyd DrummyFloyd added the kind/documentation Categorizes issue or PR as related to documentation. label Jun 12, 2024
@grokspawn grokspawn added kind/bug Categorizes issue or PR as related to a bug. and removed kind/documentation Categorizes issue or PR as related to documentation. labels Jun 12, 2024
@joelanford
Member

Thanks for submitting this issue!

There are two things that cause this bug:

  1. mariadb-operator has some very large CRDs
  2. Helm tries to store the entire release inside of a single secret, and etcd has a size limit for what it can store (a rough sketch of that storage path is below).
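
To make that concrete, the sketch below approximates the shape of Helm's built-in secrets storage; it is not Helm's actual code, and `encodeReleaseApprox` and the toy release are illustrative. The whole release, including every rendered manifest, is serialized to JSON, gzipped, base64-encoded, and stored under a single key in one Secret, and the API server rejects any Secret whose data exceeds 1048576 bytes, the number in the error above.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// maxSecretSize mirrors the 1048576-byte limit from the error message above.
const maxSecretSize = 1048576

// encodeReleaseApprox approximates the shape of Helm's built-in secrets
// driver: serialize the release to JSON, gzip it, base64-encode it, and the
// result goes into a single key of a single Secret. Illustrative only; the
// real code lives in Helm's pkg/storage/driver.
func encodeReleaseApprox(release any) (string, error) {
	raw, err := json.Marshal(release)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	// Toy release; a real one carries the chart, the values, and every
	// rendered manifest, including the large mariadb-operator CRDs.
	release := map[string]any{
		"name":     "op-mariadb",
		"manifest": "apiVersion: apiextensions.k8s.io/v1\nkind: CustomResourceDefinition\n# ...",
	}
	encoded, err := encodeReleaseApprox(release)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded release: %d bytes (Secret size limit: %d)\n", len(encoded), maxSecretSize)
}
```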

@tmshort
Contributor

tmshort commented Jun 17, 2024

We used tar + gz to create some of these resources, and that might be necessary here... unless even that is too big.

@itroyano
Contributor

We have a couple of options to choose from (thanks @varshaprasad96 and @komish!):

  1. Split the implementation into two charts, crds and main, with a dependency between them and owner refs determining the correct deletion sequence (until we implement a Finalizer that takes care of it).
    Pros: a best practice recommended by the Helm team.
    Cons: possibly more edge cases?

  2. Compress the manifests, as @tmshort suggested.
    Pros: should take care of this issue and similar ones.
    Cons: there could be cases where the data is too big even after compression, and it would require implementing a custom secret encode/decode mechanism instead of using Helm's default one.

  3. Use a different storage backend, e.g. SQL: https://helm.sh/docs/topics/advanced/#storage-backends
    Pros: size is not an issue.
    Cons: requires spinning up and maintaining a PostgreSQL instance.

@tmshort
Contributor

tmshort commented Jun 18, 2024

For (2) we do that for the test registry:

```sh
tgz="${bundle_dir}/manifests.tgz"
tar czf "${tgz}" -C "${bundle_dir}/" manifests metadata
kubectl create configmap -n "${namespace}" --from-file="${tgz}" operator-controller-${bundle_name}.manifests
rm "${tgz}"
```

@acornett21
Contributor

This seems really strange to me: we are offloading CRD creation/management to Helm when we know that Helm itself can't manage the install/update of CRDs. I was under the impression (though it appears I was wrong) that OLM v1 was going to manage the creation of CRDs outside the context (knowledge) of Helm. It seems that this was only in the context of namespace-scoped installs, but even so it seems illogical that we'd have two different control paths/flows within OLM v1.

Should we possibly look into a 4th option of managing the CRDs ourselves, or is that out of scope (effort/timeline)?

@varshaprasad96
Member

varshaprasad96 commented Jun 18, 2024

Adding some thoughts based on initial findings and discussion with @itroyano:

Option 3:
The possible supported Helm storage backends are Secrets, ConfigMaps, and SQL.
The first two don't solve the problem; they have similar size limits. The last one is in beta, and maintaining a separate component is unnecessary added pain.

Option 2:
Compression is a good idea, but Helm by default already does an encoding that is hardcoded in its implementation, and there is a small caveat here: I am not sure that compressing the data a second time would still let Helm read the secret data. The implementation is here, and compressing it ourselves on top of that would cause issues when Helm tries to decode and read it (I haven't tried it, but it looks that way), e.g.: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go#L96.
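
To make the caveat concrete, the sketch below approximates the shape of the decode path linked above; it is not Helm's actual code, and `decodeReleaseApprox` is an illustrative name. It base64-decodes, strips at most one gzip layer when the magic bytes are present, then JSON-unmarshals. If we compressed the payload a second time before storing it, the final unmarshal would be handed gzip bytes instead of JSON and fail.

```go
package helmstorage

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"io"
)

// decodeReleaseApprox mirrors the shape of Helm's decodeRelease in
// pkg/storage/driver/util.go: base64 decode, decompress at most one gzip
// layer, then JSON-unmarshal. Illustrative names, not Helm's actual code.
func decodeReleaseApprox(data string) (map[string]any, error) {
	raw, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return nil, err
	}

	// Helm checks for the gzip magic header and decompresses exactly once.
	if len(raw) > 3 && raw[0] == 0x1f && raw[1] == 0x8b && raw[2] == 0x08 {
		zr, err := gzip.NewReader(bytes.NewReader(raw))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		if raw, err = io.ReadAll(zr); err != nil {
			return nil, err
		}
	}

	// If the payload had been compressed twice, raw would still be gzip
	// bytes at this point and the unmarshal below would fail -- that is the
	// caveat with layering our own compression on top of Helm's.
	var release map[string]any
	if err := json.Unmarshal(raw, &release); err != nil {
		return nil, err
	}
	return release, nil
}
```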

The reason it works in the e2e, I think, is because Kaniko uncompresses the tarball from its context. Based on a quick glance, when we provide a tarball as the context for Kaniko, it extracts the contents of the tarball and then proceeds to build the Docker image using the extracted files. This means the manifests that are passed in for the chart's creation are still uncompressed.

The other option here, as alluded to by @joelanford, was to reimplement the whole secret driver; the code is available here: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go. Re-implementing an additional compression layer, or even sharding the data, would probably take more effort on our part, and maintaining it could be an additional problem.

Option 1:

Helm by default does not manage the lifecycle of CRDs. If the CRDs are stored in a separate crds/ folder, they are applied before the chart is installed and are hence not part of the release (ref: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/action/install.go#L254-L263). A major benefit of OLM v1 is that it solves exactly this problem: the intentional reason, afaik, for including CRDs as part of the release is so that we can implement our own CRDUpgradeSafety checks and otherwise let Helm handle CRDs as it would any other manifest. If we need to separate CRDs from the rest of the manifests, there are two things we could do:

a. Handle CRDs on our own, with the equivalent of a kubectl apply.
b. Create a Helm chart that contains only the CRDs, with the main chart containing the rest of the manifests marked as a dependent of the CRD chart. (This is one of the popular solutions: cloudnative-pg/charts#280 (comment))

Implementing (a) and (b) is roughly equivalent imo (a minimal sketch of (a) follows below). The only caveat is that there shouldn't be an edge case where the CRDs themselves exceed the size Helm can store. I'm not sure such huge CRDs are even a best practice, given the other practical concerns they raise around performance, caching, and (probably) etcd limits.
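
For what it's worth, here is a minimal sketch of what (a) could look like, assuming a client-go version recent enough to expose server-side apply on the dynamic client; this is not project code, and `applyCRDs` plus the field manager name are illustrative.

```go
package crdapply

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// crdGVR is the resource for CustomResourceDefinitions (cluster-scoped).
var crdGVR = schema.GroupVersionResource{
	Group:    "apiextensions.k8s.io",
	Version:  "v1",
	Resource: "customresourcedefinitions",
}

// applyCRDs server-side-applies the bundle's CRDs before the rest of the
// manifests are handed to Helm, roughly what a "kubectl apply" would do.
func applyCRDs(ctx context.Context, client dynamic.Interface, crds []*unstructured.Unstructured) error {
	for _, crd := range crds {
		// Server-side apply creates the CRD if it is missing and patches it
		// if it already exists; any CRDUpgradeSafety checks would run before
		// this point.
		if _, err := client.Resource(crdGVR).Apply(ctx, crd.GetName(), crd, metav1.ApplyOptions{
			FieldManager: "olmv1-crd-manager",
			Force:        true,
		}); err != nil {
			return err
		}
	}
	return nil
}
```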

Both of these methods, which make us manage CRDs separately from the rest of the manifests, bring in two concerns:

  • Upgrades! We need to dig into how complicated the upgrade process with a dependent chart is. Logically, the dependency chart should get upgraded first, but this may also have an impact on existing CRs from the main chart while the CRDs are being upgraded.
  • Deletion. There is a workaround for this: either implement the deletion logic ourselves, or make the CRD chart the owner of the main chart, so that cascading deletion ensures the CRs are deleted before the CRDs.

Both of these approaches (reimplementing the secret driver, or separating the CRDs out into another chart) come with their own maintenance challenges. The decision is to choose the one that is easier for us to implement and manage.

@joelanford
Member

I tinkered on this today and came up with a custom driver in helm-operator-plugins that:

  1. doesn't do an extra (unnecessary) layer of base64 encoding, which makes it possible to fit more release data in a single secret, and
  2. chunks the gzipped data from a release (based on a ChunkSize provided to the driver) and manages an ordered set of secrets for a particular key.

In theory, it could handle up to MaxUint64 chunks, but we would probably want to cap it (maybe at 10?) to have some control over a supportable upper bound.
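
For reference, the chunking idea looks roughly like the sketch below. This is a hedged approximation, not the actual helm-operator-plugins driver; `chunkRelease`, the "chunk" data key, and the "<key>-N" naming scheme are illustrative. The serialized release is gzipped, the compressed bytes are sliced into pieces of at most ChunkSize bytes, and each piece is written to its own Secret with an index suffix so the ordered set can be read back and reassembled.

```go
package chunkeddriver

import (
	"bytes"
	"compress/gzip"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// chunkRelease gzips an already-serialized release and splits the compressed
// bytes into Secrets of at most chunkSize bytes each, named "<key>-0",
// "<key>-1", ... so they can be listed, ordered, and reassembled on read.
// A real driver would also cap the number of chunks (e.g. at 10, as
// suggested above) and label the Secrets so they can be queried as a set.
func chunkRelease(key string, serialized []byte, chunkSize int) ([]corev1.Secret, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(serialized); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	compressed := buf.Bytes()
	var secrets []corev1.Secret
	for i := 0; len(compressed) > 0; i++ {
		n := chunkSize
		if n > len(compressed) {
			n = len(compressed)
		}
		secrets = append(secrets, corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("%s-%d", key, i)},
			// Secret.Data holds raw bytes, so no extra base64 layer is
			// needed beyond what the API machinery already applies.
			Data: map[string][]byte{"chunk": compressed[:n]},
		})
		compressed = compressed[n:]
	}
	return secrets, nil
}
```

A matching read path would list the Secrets for the key, sort them by the index suffix, concatenate the chunks, and gunzip before handing the JSON back to Helm.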
