Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: avoid logging the "ERROR" part of DbErrors that are successes #4902

Merged
merged 3 commits into from
Aug 8, 2023

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Aug 4, 2023

Problem

The pageserver<->safekeeper protocol uses error messages to indicate end of stream. pageserver already logs these at INFO level, but the inner error message includes the word "ERROR", which interferes with log searching.

Example:

  walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8

The inner DbError has a severity of ERROR so DbError's Display
implementation includes that ERROR, even though we are actually
logging the error at INFO level.

Summary of changes

Introduce an explicit WalReceiverError type, and in its From<>
for postgres errors, apply the logic from ExpectedError, for
expected errors, and a new condition for successes.

The new output looks like:

    walreceiver connection handling ended: Successful completion: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Aug 4, 2023
@jcsp jcsp force-pushed the jcsp/spurious-db-error branch from 302d38b to be225bf Compare August 4, 2023 13:38
@github-actions
Copy link

github-actions bot commented Aug 4, 2023

1264 tests run: 1212 passed, 0 failed, 52 skipped (full report)


@jcsp jcsp marked this pull request as ready for review August 4, 2023 14:48
@jcsp jcsp requested review from a team as code owners August 4, 2023 14:48
@jcsp jcsp requested review from fprasx and koivunej and removed request for a team and fprasx August 4, 2023 14:48
Example:
```
  walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8
```

The inner DbError has a severity of ERROR so DbError's Display
implementation includes that ERROR, even though we are actually
logging the error at INFO level.

Introduce an explicit WalReceiverError type, and in its From<>
for postgres errors, apply the logic from ExpectedError, for
expected errors, and a new condition for successes.

The new output looks like:
```
    walreceiver connection handling ended with success: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes
```
@jcsp jcsp force-pushed the jcsp/spurious-db-error branch from be225bf to 645442c Compare August 7, 2023 18:40
@jcsp
Copy link
Collaborator Author

jcsp commented Aug 7, 2023

(force push because it's entirely different approach)

The more I picked at trying to make a more explicit check for the condition instead of playing with messages, the more it seemed to make sense to use some structured error enums here instead, and do traditional error matching instead of the is_expected route.

This is a more invasive change, but the overall LOC change is only about +10 compared with the old way, and it makes us ready to use the SuccessfulCompletion variant for logic as/when we need to.

@jcsp jcsp requested a review from a team as a code owner August 8, 2023 08:08
@jcsp jcsp requested review from petuhovskiy and removed request for a team August 8, 2023 08:08
safekeeper/src/send_wal.rs Outdated Show resolved Hide resolved
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking great, we are on the path to recognize the "ending streaming" succesful completions and can propagate that upwards once all or some subset of safekeepers have closed wal streams for this reason. Unsure though if we know how many safekeepers there should be; probably only probabilisticly.

@jcsp jcsp enabled auto-merge (squash) August 8, 2023 08:58
@jcsp jcsp merged commit 4892a5c into main Aug 8, 2023
@jcsp jcsp deleted the jcsp/spurious-db-error branch August 8, 2023 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants