
Cardinality or memory limit for prometheus exporters #33540

Open
LeoQuote opened this issue Jun 13, 2024 · 6 comments

Comments

@LeoQuote

Component(s)

exporter/prometheus, exporter/prometheusremotewrite

Is your feature request related to a problem? Please describe.

When the collector receives metrics, it occupies a portion of the memory, and when the workload stops sending metrics, this part of the memory is not released.

Memory growth may lead to memory limits being exceeded or excessively frequent garbage collection (GC), resulting in efficiency issues. Additionally, an excess of useless metrics can also cause storage and memory pressure on Prometheus.

Describe the solution you'd like

If the collector provided a way to automatically expire metrics that are no longer being reported, it would relieve the pressure on both the collector and Prometheus at the same time.

Describe alternatives you've considered

Setting a cardinality limit could be another approach. If the limit is exceeded, the process should either exit or clean up the metrics. Developers can then monitor process restarts to detect potential issues in real time.
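
A purely hypothetical sketch of what such a limit might look like in the prometheus exporter configuration; neither `cardinality_limit` nor `on_limit_exceeded` exists in the collector today, they are only meant to illustrate the request:

```yaml
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # Hypothetical settings -- not implemented in the collector today.
    # Cap the number of distinct series the exporter keeps in memory.
    cardinality_limit: 100000
    # What to do when the cap is hit: e.g. evict the least recently
    # updated series, or exit so operators notice the restart.
    on_limit_exceeded: drop_oldest
```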

Additional context

Could be related to: #32511, #33324


Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@wildum

wildum commented Jul 4, 2024

Hi, isn't the memory released via the metric_expiration parameter in the prometheusexporter?
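For reference, a minimal prometheusexporter config sketch with that setting; the endpoint and expiration value are chosen only for illustration:

```yaml
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # Series that receive no new data points within this window are
    # removed from the exporter's in-memory store (default: 5m).
    metric_expiration: 10m
```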

@LeoQuote

LeoQuote commented Jul 9, 2024

Yes, it is released, but the number of metrics can spike to a very high level for a short time, even before any of them expire.


github-actions bot commented Sep 9, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Sep 9, 2024
dashpole removed the Stale and needs triage labels Sep 9, 2024
@dashpole

dashpole commented Sep 9, 2024

Hey @LeoQuote, sorry this didn't get looked at.

this part of the memory is not released.

How do you know?

resulting in efficiency issues

Can you share more details? Are you looking at profiles? Go runtime metrics?

Can you share more about your setup? Are you using the prometheus receiver? Or the OTLP receiver? Or something else?

dashpole self-assigned this Sep 9, 2024
@LeoQuote

LeoQuote commented Sep 9, 2024

Thanks for your reply. I've switched to limiting the overall number of series and restarting the collector every day to reset its memory consumption.

I’ll try to recreate the original issue and provide related info.
