Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in compute_ctl metrics collection #9831

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

tristan957
Copy link
Member

Calling unwrap on the encoder is a little overzealous. One of the errors that can be returned by the encode function in particular is the non-existence of metrics for a metric family, so we should prematurely filter instances like that out. I believe that the cause of this panic was caused by a race condition between the prometheus collector and the compute collecting the installed extensions metric for the first time. The HTTP server is spawned on a separate thread before we even start bringing up Postgres.

Copy link

github-actions bot commented Nov 21, 2024

5561 tests run: 5335 passed, 0 failed, 226 skipped (full report)


Flaky tests (3)

Postgres 17

Postgres 15

Code coverage* (full report)

  • functions: 31.4% (7948 of 25322 functions)
  • lines: 49.3% (63083 of 127916 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
8878bcc at 2024-11-21T19:25:04.050Z :recycle:

@tristan957 tristan957 force-pushed the tristan957/installed_extensions branch from fc02412 to 554c36d Compare November 21, 2024 05:04
Copy link
Contributor

@knizhnik knizhnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not only eliminating panic, but also filtering empty metrics.
Is it actually required? If so it is better to explain why.

@tristan957 tristan957 force-pushed the tristan957/installed_extensions branch from 554c36d to f748f18 Compare November 21, 2024 16:40
Calling unwrap on the encoder is a little overzealous. One of the errors
that can be returned by the encode function in particular is the
non-existence of metrics for a metric family, so we should prematurely
filter instances like that out. I believe that the cause of this panic
was caused by a race condition between the prometheus collector and the
compute collecting the installed extensions metric for the first time.
The HTTP server is spawned on a separate thread before we even start
bringing up Postgres.

Signed-off-by: Tristan Partin <tristan@neon.tech>
@tristan957 tristan957 force-pushed the tristan957/installed_extensions branch from f748f18 to 8878bcc Compare November 21, 2024 16:41
@tristan957 tristan957 added this pull request to the merge queue Nov 21, 2024
Merged via the queue into main with commit 37962e7 Nov 21, 2024
79 checks passed
@tristan957 tristan957 deleted the tristan957/installed_extensions branch November 21, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants