Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s/disk_log_impl: don't prefix-truncate empty segments #19790

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Jun 11, 2024

Consider the following sequence of events:

  1. there is a single segment in the log, with offsets [0-9]
  2. we call prefix_truncate(10)
  3. concurrently, another batch of 5 messages is being appended.
  4. empty segment with base_offset=10, dirty_offset=9 is created
  5. the appended batch is placed at offsets 10-14

Previously, the empty segment would have passed the dirty_offset check and (after waiting for the append to finish) would get deleted (including the data at offsets 10-14). Check also the segment base_offset to prevent that.

Fixes #19632

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

Consider the following sequence of events:
1. there is a single segment in the log, with offsets [0-9]
2. we call prefix_truncate(10)
3. concurrently, another batch of 5 messages is being appended.
4. empty segment with base_offset=10, dirty_offset=9 is created
5. the appended batch is placed at offsets 10-14

Previously, the empty segment would have passed the dirty_offset check
and (after waiting for the append to finish) would get deleted
(including the data at offsets 10-14). Check also the segment base_offset
to prevent that.

Fixes redpanda-data#19632
Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

Should this be backported? It seems a scary

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jun 12, 2024

@mmaslankaprv mmaslankaprv merged commit 229c8dd into redpanda-data:dev Jun 12, 2024
19 checks passed
@ztlpn
Copy link
Contributor Author

ztlpn commented Jun 12, 2024

Should this be backported?

Good point, usually I'm reluctant to backport similar fixes without reported bugs, but this seems low risk.

@ztlpn
Copy link
Contributor Author

ztlpn commented Jun 12, 2024

/backport v24.1.x

@ztlpn ztlpn deleted the fix-empty-seg-prefix-truncate branch June 12, 2024 09:58
@ztlpn
Copy link
Contributor Author

ztlpn commented Jun 12, 2024

/backport v24.1.x

@ztlpn
Copy link
Contributor Author

ztlpn commented Jun 12, 2024

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-19790-v23.3.x-690 remotes/upstream/v23.3.x
git cherry-pick -x 963ff237c08eef00a09b49cd4af1d9ee15ef20c1

Workflow run logs.

Comment on lines +2514 to +2516
// base_offset check is for the case of an empty segment
// (where dirty = base - 1). We don't want to remove it because
// batches may be concurrently appended to it and we should keep them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yikes. we probably should have had a lot more concurrency limitations in disk_log_impl. another thing that @andrwng's work on local storage v2 would address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI Failure (key symptom) in RandomNodeOperationsTest.test_node_operations
5 participants