
[kube-prometheus-stack] Long term fix for kube-prometheus-stack release size issues #3548

Closed · jkroepke opened this issue Jul 3, 2023 · 18 comments · Fixed by #3547
Labels: enhancement (New feature or request), lifecycle/frozen

@jkroepke (Member) commented Jul 3, 2023

Hi @prometheus-community/helm-charts-maintainers,

I would like to invite some of you to collect ideas on how to solve the issues around the release size of kube-prometheus-stack.

#3083 introduced the issue again, which could be resolved once more by minifying the dashboards.

However, if Prometheus Operator increases the sizes of the CRDs again, we may hit this issue soon.

I did some analysis of the current release data, based on #3546. Here are the steps I took to get reproducible results:

helm upgrade -i kube-prometheus-stack . --set windowsMonitoring.enabled=true
kubectl get secrets sh.helm.release.v1.kube-prometheus-stack.v2 -o jsonpath='{.data.release}' | base64 -d > release
ls -l release                                                                                                      
-rw-r--r--  1 jkr  staff  1031276 Jul  4 00:03 release

By enabling the Windows monitoring feature from #3083, the release size is 1031276 bytes (release name: kube-prometheus-stack / namespace: default). Without Windows monitoring, it is 1025044 bytes.

Both values are close to the hard limit of 1048576 bytes.
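
For reference, the same check can be run against every stored release revision, since Helm v3 keeps each revision in its own Secret (a small sketch, assuming the default Secret storage backend; Helm labels its release Secrets with owner=helm):

# print each Helm release Secret together with the size of its base64-encoded release payload
kubectl get secrets -l owner=helm -o json \
  | jq -r '.items[] | .metadata.name + ": " + (.data.release | length | tostring)'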

To get the full raw data of the Helm release, run:

kubectl get secrets sh.helm.release.v1.kube-prometheus-stack.v2 -o jsonpath='{.data.release}' | base64 -d | base64 -d | zcat - > release.json

This gives us the raw JSON data of the Helm release, which I then uploaded to a JSON size analyzer.


Most of the release size comes from the files section:

% jq -r '.chart.files[] | (.data | length | tostring) + ": " + .name' release.json | sort -n
532: .helmignore
876: CONTRIBUTING.md
8556: crds/crd-prometheusrules.yaml
24124: crds/crd-scrapeconfigs.yaml
47600: crds/crd-podmonitors.yaml
49656: crds/crd-probes.yaml
50024: crds/crd-servicemonitors.yaml
80636: README.md
360328: crds/crd-alertmanagerconfigs.yaml
565424: crds/crd-thanosrulers.yaml
600740: crds/crd-alertmanagers.yaml
674428: crds/crd-prometheusagents.yaml
773992: crds/crd-prometheuses.yaml
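
For a quick total of the files section, the same release.json can be summed up (a one-liner sketch):

jq '[.chart.files[].data | length] | add' release.json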

Add Markdown files to .helmignore.

As we can see, the README.md is included, which is not necessary. I did a test excluding the two README files (adding them to the .helmignore file): 1031276 -> 1010524 bytes.

~2% savings.
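
A minimal sketch of the corresponding .helmignore additions (the exact patterns are an assumption, not the final change; .helmignore uses gitignore-like patterns):

# exclude documentation files from the packaged chart
README.md
CONTRIBUTING.md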


Minify CRD files.

The CRDs are pretty huge. In theory, a lot of data could be saved by storing them as minified JSON, too.

Minify all YAML files

for i in *.yaml; do
  # convert to minified (zero-indent) JSON, then drop the original YAML file
  yq -o=json -I=0 "$i" > "${i/yaml/json}"
  rm "$i"
done

This results in a much lower release size: 1031276 -> 851536 bytes.

jq -r '.chart.files[] | (.data | length | tostring) + ": " + .name' release.json | sort -n
568: .helmignore
5656: crds/crd-prometheusrules.json
15260: crds/crd-scrapeconfigs.json
26808: crds/crd-podmonitors.json
28424: crds/crd-servicemonitors.json
29220: crds/crd-probes.json
159628: crds/crd-alertmanagerconfigs.json
321280: crds/crd-thanosrulers.json
337412: crds/crd-alertmanagers.json
385084: crds/crd-prometheusagents.json
442848: crds/crd-prometheuses.json

17.4% savings.
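
Before deleting the original YAML files, one could sanity-check that the minified JSON is semantically identical to the source (a sketch assuming mikefarah's yq v4; the file name is only an example):

# re-serialize both files as JSON and compare the result
diff <(yq -o=json '.' crd-probes.yaml) <(yq -o=json '.' crd-probes.json) && echo "identical"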

Risks: Unknown behavior in ArgoCD/FluxCD.


Move CRD files into a sub-chart.

After reading helm/helm#11493, it seems that Helm behaves differently for sub-charts. We all love Helm.

I tried this out by moving the CRDs to an unmanaged sub-chart, named kube-prometheus-stack-crds:

See #3547 for a POC.
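
For illustration, the resulting chart layout could look roughly like this (a sketch; the actual names and structure are defined in #3547):

kube-prometheus-stack/
  Chart.yaml
  templates/
  charts/
    crds/                  # "unmanaged" sub-chart carrying only the CRDs
      Chart.yaml
      crds/
        crd-prometheuses.yaml
        ...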

Results:

Running helm install still installs all CRDs on a fresh installation. However, they are no longer included in the Helm release. This is fine, since Helm does not manage CRDs anyway.

The CRD files are still included in the chart package (helm package), which is not the case when using the .helmignore approach.


As a result, the total release size after moving the CRDs is reduced: 1031276 -> 392092 bytes.

61% savings.


My proposal would be to continue with the POC from #3547. I would like to hear thoughts from some maintainers. I learned that FluxCD has special handling for CRDs, and it would be interesting to know whether this approach is still compatible with FluxCD.

This would finally resolve the "Too long: must have at most 1048576 bytes" issue for good, and we would have tons of room for new manifests. If kube-prometheus-stack hits the release limit again at some point in the future, we could think about doing the same for the dashboard manifests. But for now, moving the CRDs into a sub-chart looks like the easiest option.

@GMartinez-Sisti (Member) commented

Hi @jkroepke, thank you for this great explanation and discovery process.

I'm in favour of moving the CRDs to a sub-chart; it's easier to manage than having to update them manually as we do now. Regarding FluxCD, unfortunately I'm not a user, so I don't have any experience with it.

@SuperQ (Contributor) commented Jul 4, 2023

I did some digging around; it doesn't seem like Helm has caught up with the Server-Side Apply features in Kubernetes.

@jkroepke (Member, Author) commented Jul 4, 2023

@GMartinez-Sisti do you mean a dedicated Helm chart, like https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds, or an "unmanaged" sub-chart like #3547?

Note that we cannot use https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds in this context, since the CRDs must still be in the crds directory here.

@GMartinez-Sisti (Member) commented

> @GMartinez-Sisti do you mean a dedicated Helm chart, like https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds, or an "unmanaged" sub-chart like #3547?
>
> Note that we cannot use https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds in this context, since the CRDs must still be in the crds directory here.

I think having both a chart for CRDs and an unmanaged chart for CRDs can be kind of confusing, but I get the limitations we have. Looks like an unmanaged chart is currently our best option.

@rouke-broersma (Contributor) commented

We need the managed chart for applying the CRDs using ArgoCD, since we need to add annotations to tell it to use server-side apply, and we cannot currently turn on server-side apply globally due to some missing features.

@jkroepke (Member, Author) commented Jul 5, 2023

> We need the managed chart for applying the CRDs using ArgoCD, since we need to add annotations to tell it to use server-side apply

But we could also add the annotations to kube-prometheus-stack as well?

metadata:
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true
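
For context, applied to one of the CRDs shipped in the crds/ directory this would end up looking roughly like the following (hypothetical excerpt; the CRD name is only an example):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: prometheuses.monitoring.coreos.com
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true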

@rouke-broersma (Contributor) commented Jul 5, 2023

> We need the managed chart for applying the CRDs using ArgoCD, since we need to add annotations to tell it to use server-side apply
>
> But we could also add the annotations to kube-prometheus-stack as well?
>
> metadata:
>   annotations:
>     argocd.argoproj.io/sync-options: ServerSideApply=true

This was discussed back when the large descriptions were added and the preferred install method was switched to server-side apply; IIRC the community decision at the time was that product-specific annotations would not be added by default. The new standalone chart with the templated CRDs was then created so we can add our own annotations as needed.

We are fine with deprecating the prometheus-operator-crds chart if the ArgoCD annotation is added to the CRDs from this new unmanaged chart, should the community opinion have changed.

@jkroepke (Member, Author) commented Jul 5, 2023

We could also leave everything as it is and "just" copy the CRDs from the crds/ directory to charts/prometheus-operator-crds/crds/, like in #3547, and we are done.

The "unmanaged" sub-chart prometheus-operator-crds would not be exposed anywhere and would only be part of kube-prometheus-stack.
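
A rough sketch of that copy step, with the paths taken from the comment above (the real sync would live in the chart's maintenance scripts):

# run inside charts/kube-prometheus-stack/ (assumed working directory)
mkdir -p charts/prometheus-operator-crds/crds
cp crds/*.yaml charts/prometheus-operator-crds/crds/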

@rouke-broersma (Contributor) commented

So prometheus-operator-crds would continue to exist as it is for users who need to template the CRDs, and it would be used 'unmanaged' by kube-prometheus-stack to place the CRDs in the crds folder. Is that what you mean?

@jkroepke (Member, Author) commented Jul 5, 2023

Correct, yes.

@jkroepke (Member, Author) commented Jul 8, 2023

#3547 is open for review. More reviewers would help ensure that the change itself does not break anything.

@rgarcia6520 commented

I could be misunderstanding, but it seems this is a breaking change. Checking out the latest version of the chart with crds.enabled=true (not modifying anything) and running helm template, no CRD templates are present.

Are we expected to also install or track the CRDs separately as another chart?

@rouke-broersma (Contributor) commented

> I could be misunderstanding, but it seems this is a breaking change. Checking out the latest version of the chart with crds.enabled=true (not modifying anything) and running helm template, no CRD templates are present.
>
> Are we expected to also install or track the CRDs separately as another chart?

This was a breaking change; that's why the major version number was increased. The upgrade docs specify what you need to do for this upgrade.

@rgarcia6520 commented

Thanks. Re-reading, I had only seen the "We do not expect any breaking changes in this version." note under this specific version, hence the confusion.

@jkroepke (Member, Author) commented

> running helm template, no CRD templates are present.

On my machine, it works fine:

% helm template oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack --include-crds | grep CustomResourceDefinition
Pulled: ghcr.io/prometheus-community/charts/kube-prometheus-stack:48.3.1
Digest: sha256:c7f48ccf4070e3b26e2e489af773397be2bd045f3405679b6be601b3b0fdd8c2
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition
kind: CustomResourceDefinition

@rgarcia6520 commented

We tried updating while deploying with FluxCD v2, and it complains that there are no matches for any of the CRDs in the latest chart.

Cloning it down and running helm template, we see the same thing. I was finally able to get it to list them by making a few changes to the dependencies: section of the chart and ensuring the CRDs were in crds/templates in the sub-chart.

❯ cd charts/kube-prometheus-stack
❯ ls
CONTRIBUTING.md  Chart.lock  Chart.yaml  README.md  charts  ci  hack  install  templates  unittests  values.yaml
❯ helm template monitoring ./ > install --debug
install.go:194: [debug] Original chart version: ""
install.go:211: [debug] CHART PATH: /home/ryan/kps-helm-charts/charts/kube-prometheus-stack

❯ helm template monitoring ./ | grep CustomResourceDefinition

@jkroepke (Member, Author) commented Aug 15, 2023

What about

helm template monitoring ./ --include-crds

Maybe the FluxCD integration is not compatible with this approach. It feels like a FluxCD issue if CRDs from sub-charts are not supported, but I'm not sure.

But from the Helm and Helm chart point of view, there isn't a breaking change.

@rgarcia6520 commented

FluxCD is compatible with that approach. It seems our issue was that the chart's crds folder was not fully removed on a branch, and the error was misleading, as it was about duplicates but was outputting "no match or kind".

iamleot added a commit to iamleot/rpi-flux that referenced this issue Dec 29, 2023
Update to latest kube-prometheus-stack chart version.

Changes:
55.x
This version upgrades Prometheus-Operator to v0.70.0.

54.x
Grafana Helm Chart has bumped to version 7.

53.x
This version upgrades Prometheus-Operator to v0.69.1, Prometheus to 2.47.2.

52.x
This includes the ability to select between using existing secrets or
create new secret objects for various thanos config. The defaults have
not changed but if you were setting:

- `thanosRuler.thanosRulerSpec.alertmanagersConfig` or
- `thanosRuler.thanosRulerSpec.objectStorageConfig` or
- `thanosRuler.thanosRulerSpec.queryConfig` or
- `prometheus.prometheusSpec.thanos.objectStorageConfig`

you will need to set `existingSecret` or `secret` based on your
requirement.

51.x
This version upgrades Prometheus-Operator to v0.68.0, Prometheus to 2.47.0 and
Thanos to v0.32.2.

50.x
This version requires Kubernetes 1.19+.

We do not expect any breaking changes in this version.

49.x
This version upgrades Prometheus-Operator to v0.67.1, Alertmanager
to v0.26.0, Prometheus to 2.46.0 and Thanos to v0.32.0.

48.x
This version moved all CRDs into a dedicated sub-chart. No new CRDs are
introduced in this version. See
<prometheus-community/helm-charts#3548> for
more context.

We do not expect any breaking changes in this version.

47.x
This version upgrades Prometheus-Operator to v0.66.0 with new CRDs
(PrometheusAgent and ScrapeConfig).

46.x
This version upgrades Prometheus-Operator to v0.65.1 with new CRDs
(PrometheusAgent and ScrapeConfig), Prometheus to v2.44.0 and Thanos
to v0.31.0.

45.x
This version upgrades Prometheus-Operator to v0.63.0, Prometheus to
v2.42.0 and Thanos to v0.30.2.