Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert protocol version upgrade #7727

Merged
merged 1 commit into from
May 13, 2024
Merged

Revert protocol version upgrade #7727

merged 1 commit into from
May 13, 2024

Conversation

VladLazar
Copy link
Contributor

Problem

"John pointed out that the switch to protocol version 2 made test_gc_aggressive test flaky: #7692.
I tracked it down, and that is indeed an issue. Conditions for hitting the issue:
The problem occurs in the primary
GC horizon is set to a very low value, e.g. 0.
If the primary is actively writing WAL, and GC runs in the pageserver at the same time that the primary sends a GetPage request, it's possible that the GC advances the GC horizon past the GetPage request's LSN. I'm working on a fix here: #7708." - Heikki

Summary of changes

Use protocol version 1 as default.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@VladLazar VladLazar force-pushed the vlad/revert-0115fe6c branch from 5d98441 to 93bff7e Compare May 13, 2024 10:18
@VladLazar VladLazar requested review from hlinnaka and arssher May 13, 2024 10:19
Copy link
Contributor

@hlinnaka hlinnaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include the explanation of why we're reverting in the commit message too. (If you use github's squash & merge, it will be copied from the PR description by default I think)

@VladLazar
Copy link
Contributor Author

If you use github's squash & merge, it will be copied from the PR description by default I think

Yep. That's what I usually do. Thanks for the heads-up.

@VladLazar VladLazar marked this pull request as ready for review May 13, 2024 10:59
@VladLazar VladLazar requested review from a team as code owners May 13, 2024 10:59
@VladLazar VladLazar requested a review from tristan957 May 13, 2024 10:59
Copy link

3060 tests run: 2927 passed, 0 failed, 133 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 31.4% (6330 of 20161 functions)
  • lines: 47.3% (47753 of 100970 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
93bff7e at 2024-05-13T11:04:43.457Z :recycle:

@jcsp jcsp merged commit bbe730d into main May 13, 2024
55 checks passed
@jcsp jcsp deleted the vlad/revert-0115fe6c branch May 13, 2024 12:41
VladLazar added a commit that referenced this pull request May 13, 2024
## Problem

"John pointed out that the switch to protocol version 2 made
test_gc_aggressive test flaky:
#7692.
I tracked it down, and that is indeed an issue. Conditions for hitting
the issue:
The problem occurs in the primary
GC horizon is set to a very low value, e.g. 0.
If the primary is actively writing WAL, and GC runs in the pageserver at
the same time that the primary sends a GetPage request, it's possible
that the GC advances the GC horizon past the GetPage request's LSN. I'm
working on a fix here: #7708."
- Heikki

## Summary of changes
Use protocol version 1 as default.
VladLazar added a commit that referenced this pull request May 13, 2024
## Problem

"John pointed out that the switch to protocol version 2 made
test_gc_aggressive test flaky:
#7692.
I tracked it down, and that is indeed an issue. Conditions for hitting
the issue:
The problem occurs in the primary
GC horizon is set to a very low value, e.g. 0.
If the primary is actively writing WAL, and GC runs in the pageserver at
the same time that the primary sends a GetPage request, it's possible
that the GC advances the GC horizon past the GetPage request's LSN. I'm
working on a fix here: #7708."
- Heikki

## Summary of changes
Use protocol version 1 as default.
a-masterov pushed a commit that referenced this pull request May 20, 2024
## Problem

"John pointed out that the switch to protocol version 2 made
test_gc_aggressive test flaky:
#7692.
I tracked it down, and that is indeed an issue. Conditions for hitting
the issue:
The problem occurs in the primary
GC horizon is set to a very low value, e.g. 0.
If the primary is actively writing WAL, and GC runs in the pageserver at
the same time that the primary sends a GetPage request, it's possible
that the GC advances the GC horizon past the GetPage request's LSN. I'm
working on a fix here: #7708."
- Heikki

## Summary of changes
Use protocol version 1 as default.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants