
BUG: data too long issue rendering #923

Open
Tracked by #950
DrummyFloyd opened this issue Jun 12, 2024 · 7 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@DrummyFloyd

### User story

Issue created due to a mention in Slack.

As a recurrent tester on this project (because I like it), I test the operators I want on my stack.

At the moment I have an issue with the following manifests on OLM 0.10:

```yaml
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: Catalog
metadata:
  name: operatorhubio
spec:
  source:
    type: image
    image:
      ref: quay.io/operatorhubio/catalog:latest
      pollInterval: 24h
---
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  name: op-mariadb
spec:
  installNamespace: operators
  packageName: mariadb-operator
  version: 0.29.0
```

### List of issues

Error message rendered:

```
InstallationFailed:create: failed to create: Secret "sh.helm.release.v1.op-mariadb.v1" is invalid: data: Too long: must have at most 1048576 bytes
```

### New content or update?

Net-new content

### Types of documentation

_No response_

### References

_No response_
@DrummyFloyd DrummyFloyd added the kind/documentation Categorizes issue or PR as related to documentation. label Jun 12, 2024
@grokspawn grokspawn added kind/bug Categorizes issue or PR as related to a bug. and removed kind/documentation Categorizes issue or PR as related to documentation. labels Jun 12, 2024
@joelanford
Member

Thanks for submitting this issue!

There are two things that cause this bug:

  1. mariadb-operator has some very large CRDs
  2. Helm tries to store the entire release inside of a single secret, and etcd has a size limit for what it can store (a rough sketch of that storage path is below).
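
To make that concrete, the sketch below approximates the shape of Helm's built-in secrets storage; it is not Helm's actual code, and `encodeReleaseApprox` and the toy release are illustrative. The whole release, including every rendered manifest, is serialized to JSON, gzipped, base64-encoded, and stored under a single key in one Secret, and the API server rejects any Secret whose data exceeds 1048576 bytes, the number in the error above.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// maxSecretSize mirrors the 1048576-byte limit from the error message above.
const maxSecretSize = 1048576

// encodeReleaseApprox approximates the shape of Helm's built-in secrets
// driver: serialize the release to JSON, gzip it, base64-encode it, and the
// result goes into a single key of a single Secret. Illustrative only; the
// real code lives in Helm's pkg/storage/driver.
func encodeReleaseApprox(release any) (string, error) {
	raw, err := json.Marshal(release)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	// Toy release; a real one carries the chart, the values, and every
	// rendered manifest, including the large mariadb-operator CRDs.
	release := map[string]any{
		"name":     "op-mariadb",
		"manifest": "apiVersion: apiextensions.k8s.io/v1\nkind: CustomResourceDefinition\n# ...",
	}
	encoded, err := encodeReleaseApprox(release)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded release: %d bytes (Secret size limit: %d)\n", len(encoded), maxSecretSize)
}
```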

@tmshort
Contributor

tmshort commented Jun 17, 2024

We used tar + gz to create some of these resources, and that might be necessary here... unless even that is too big.

@itroyano
Contributor

We have a couple of options to choose from (thanks @varshaprasad96 and @komish!):

  1. Split the implementation into two charts, crds and main, with a dependency between them and owner refs determining the correct deletion sequence (until we implement a Finalizer that takes care of it).
    Pros: a best practice recommended by the Helm team.
    Cons: possibly more edge cases?

  2. Compress the manifests, as @tmshort suggested.
    Pros: should take care of this issue and similar ones.
    Cons: there could be cases where the data is too big even after compression, and it would require implementing a custom secret encode/decode mechanism instead of using Helm's default one.

  3. Use a different storage backend, e.g. SQL: https://helm.sh/docs/topics/advanced/#storage-backends
    Pros: size is not an issue.
    Cons: requires spinning up and maintaining a PostgreSQL instance.

@tmshort
Contributor

tmshort commented Jun 18, 2024

For (2) we do that for the test registry:

```sh
tgz="${bundle_dir}/manifests.tgz"
tar czf "${tgz}" -C "${bundle_dir}/" manifests metadata
kubectl create configmap -n "${namespace}" --from-file="${tgz}" operator-controller-${bundle_name}.manifests
rm "${tgz}"
```

@acornett21
Contributor

This seems really strange to me: we are offloading CRD creation/management to Helm when we know that Helm itself can't manage the install/update of CRDs. I was under the impression (though it appears I was wrong) that OLM v1 was going to manage the creation of CRDs outside the context (knowledge) of Helm. It seems that this was only in the context of namespace-scoped installs, but even so it seems illogical that we'd have two different control paths/flows within OLM v1.

Should we possibly look into a 4th option of managing the CRDs ourselves, or is that out of scope (effort/timeline)?

@varshaprasad96
Member

varshaprasad96 commented Jun 18, 2024

Adding some thoughts based on initial findings and discussion with @itroyano:

Option 3:
The possible supported Helm storage backends are Secrets, ConfigMaps, and SQL.
The first two don't solve the problem; they have similar size limits. The last one is in beta, and maintaining a separate component is unnecessary added pain.

Option 2:
Compression is a good idea, but Helm by default already does an encoding that is hardcoded in its implementation, and there is a small caveat here: I am not sure that compressing the data a second time would still let Helm read the secret data. The implementation is here, and compressing it ourselves on top of that would cause issues when Helm tries to decode and read it (I haven't tried it, but it looks that way), e.g.: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go#L96.
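
To make the caveat concrete, the sketch below approximates the shape of the decode path linked above; it is not Helm's actual code, and `decodeReleaseApprox` is an illustrative name. It base64-decodes, strips at most one gzip layer when the magic bytes are present, then JSON-unmarshals. If we compressed the payload a second time before storing it, the final unmarshal would be handed gzip bytes instead of JSON and fail.

```go
package helmstorage

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"io"
)

// decodeReleaseApprox mirrors the shape of Helm's decodeRelease in
// pkg/storage/driver/util.go: base64 decode, decompress at most one gzip
// layer, then JSON-unmarshal. Illustrative names, not Helm's actual code.
func decodeReleaseApprox(data string) (map[string]any, error) {
	raw, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return nil, err
	}

	// Helm checks for the gzip magic header and decompresses exactly once.
	if len(raw) > 3 && raw[0] == 0x1f && raw[1] == 0x8b && raw[2] == 0x08 {
		zr, err := gzip.NewReader(bytes.NewReader(raw))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		if raw, err = io.ReadAll(zr); err != nil {
			return nil, err
		}
	}

	// If the payload had been compressed twice, raw would still be gzip
	// bytes at this point and the unmarshal below would fail -- that is the
	// caveat with layering our own compression on top of Helm's.
	var release map[string]any
	if err := json.Unmarshal(raw, &release); err != nil {
		return nil, err
	}
	return release, nil
}
```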

The reason it works in the e2e, I think, is because Kaniko uncompresses the tarball from its context. Based on a quick glance, when we provide a tarball as the context for Kaniko, it extracts the contents of the tarball and then proceeds to build the Docker image using the extracted files. This means the manifests that are passed in for the chart's creation are still uncompressed.

The other option here, as alluded to by @joelanford, was to reimplement the whole secret driver; the code is available here: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go. Re-implementing an additional compression layer, or even sharding the data, would probably take more effort on our part, and maintaining it could be an additional problem.

Option 1:

Helm by default does not manage the lifecycle of CRDs. If the CRDs are stored in a separate crds/ folder, they are applied before the chart is installed and are hence not part of the release (ref: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/action/install.go#L254-L263). A major benefit of OLM v1 is that it solves exactly this problem: the intentional reason, afaik, for including CRDs as part of the release is so that we can implement our own CRDUpgradeSafety checks and otherwise let Helm handle CRDs as it would any other manifest. If we need to separate CRDs from the rest of the manifests, there are two things we could do:

a. Handle CRDs on our own, with the equivalent of a kubectl apply.
b. Create a Helm chart that contains only the CRDs, with the main chart containing the rest of the manifests marked as a dependent of the CRD chart. (This is one of the popular solutions: cloudnative-pg/charts#280 (comment))

Implementing (a) and (b) is roughly equivalent imo (a minimal sketch of (a) follows below). The only caveat is that there shouldn't be an edge case where the CRDs themselves exceed the size Helm can store. I'm not sure such huge CRDs are even a best practice, given the other practical concerns they raise around performance, caching, and (probably) etcd limits.
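
For what it's worth, here is a minimal sketch of what (a) could look like, assuming a client-go version recent enough to expose server-side apply on the dynamic client; this is not project code, and `applyCRDs` plus the field manager name are illustrative.

```go
package crdapply

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// crdGVR is the resource for CustomResourceDefinitions (cluster-scoped).
var crdGVR = schema.GroupVersionResource{
	Group:    "apiextensions.k8s.io",
	Version:  "v1",
	Resource: "customresourcedefinitions",
}

// applyCRDs server-side-applies the bundle's CRDs before the rest of the
// manifests are handed to Helm, roughly what a "kubectl apply" would do.
func applyCRDs(ctx context.Context, client dynamic.Interface, crds []*unstructured.Unstructured) error {
	for _, crd := range crds {
		// Server-side apply creates the CRD if it is missing and patches it
		// if it already exists; any CRDUpgradeSafety checks would run before
		// this point.
		if _, err := client.Resource(crdGVR).Apply(ctx, crd.GetName(), crd, metav1.ApplyOptions{
			FieldManager: "olmv1-crd-manager",
			Force:        true,
		}); err != nil {
			return err
		}
	}
	return nil
}
```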

Both of these methods, which make us manage CRDs separately from the rest of the manifests, bring in two concerns:

  • Upgrades! We need to dig into how complicated the upgrade process with a dependent chart is. Logically, the dependency chart should get upgraded first, but this may also have an impact on existing CRs from the main chart while the CRDs are being upgraded.
  • Deletion. There is a workaround for this: either implement the deletion logic ourselves, or make the CRD chart the owner of the main chart, so that cascading deletion ensures the CRs are deleted before the CRDs.

Both of these approaches (reimplementing the secret driver, or separating the CRDs out into another chart) come with their own maintenance challenges. The decision is to choose the one that is easier for us to implement and manage.

@joelanford
Member

I tinkered on this today and came up with a custom driver in helm-operator-plugins that:

  1. doesn't do an extra (unnecessary) layer of base64 encoding, which makes it possible to fit more release data in a single secret, and
  2. chunks the gzipped data from a release (based on a ChunkSize provided to the driver) and manages an ordered set of secrets for a particular key.

In theory, it could handle up to MaxUint64 chunks, but we would probably want to cap it (maybe at 10?) to have some control over a supportable upper bound.
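
For reference, the chunking idea looks roughly like the sketch below. This is a hedged approximation, not the actual helm-operator-plugins driver; `chunkRelease`, the "chunk" data key, and the "<key>-N" naming scheme are illustrative. The serialized release is gzipped, the compressed bytes are sliced into pieces of at most ChunkSize bytes, and each piece is written to its own Secret with an index suffix so the ordered set can be read back and reassembled.

```go
package chunkeddriver

import (
	"bytes"
	"compress/gzip"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// chunkRelease gzips an already-serialized release and splits the compressed
// bytes into Secrets of at most chunkSize bytes each, named "<key>-0",
// "<key>-1", ... so they can be listed, ordered, and reassembled on read.
// A real driver would also cap the number of chunks (e.g. at 10, as
// suggested above) and label the Secrets so they can be queried as a set.
func chunkRelease(key string, serialized []byte, chunkSize int) ([]corev1.Secret, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(serialized); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	compressed := buf.Bytes()
	var secrets []corev1.Secret
	for i := 0; len(compressed) > 0; i++ {
		n := chunkSize
		if n > len(compressed) {
			n = len(compressed)
		}
		secrets = append(secrets, corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("%s-%d", key, i)},
			// Secret.Data holds raw bytes, so no extra base64 layer is
			// needed beyond what the API machinery already applies.
			Data: map[string][]byte{"chunk": compressed[:n]},
		})
		compressed = compressed[n:]
	}
	return secrets, nil
}
```

A matching read path would list the Secrets for the key, sort them by the index suffix, concatenate the chunks, and gunzip before handing the JSON back to Helm.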
