Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-6807] kafka: change offset_out_of_range condition in replicated_partition::prefix_truncate() #22905

Merged
merged 3 commits into from
Aug 19, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Aug 16, 2024

Previously, we would return an error_code::offset_out_of_range for a request where kafka_truncation_offset <= start_offset.

This is a divergence from the current Kafka behaviour, and can lead to errors with certain Kafka clients/consumers.

Remove the condition that throws an offset_out_of_range error if kafka_truncation_offset <= start_offset(), instead returning error_code::none.

More context and details in the JIRA link below.

JIRA Link: https://redpandadata.atlassian.net/browse/CORE-6807

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Allows DeleteRecords requests from Kafka clients or rpk topic trim-prefix to be called with truncation_offset <= start_offset without returning an error. The request is instead treated as a no-op.

@WillemKauf WillemKauf force-pushed the kafka_delete_records_fix branch from dc8b34a to a07b6c2 Compare August 16, 2024 16:03
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Add early exit on kafka_truncation_offset == start_offset() in replicated_partition::prefix_truncate()
  • Add KafkaCliTools.delete_records()
  • Add DeleteRecordsKafkaTest to make use of KafkaCliTools.delete_records(), asserting we achieve parity with Kafka.

@WillemKauf WillemKauf requested a review from nvartolomei August 16, 2024 16:06
tests/rptest/tests/delete_records_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/delete_records_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/delete_records_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/delete_records_test.py Outdated Show resolved Hide resolved
@WillemKauf WillemKauf force-pushed the kafka_delete_records_fix branch from a07b6c2 to 568e20f Compare August 16, 2024 18:56
@WillemKauf WillemKauf requested a review from nvartolomei August 16, 2024 18:58
@WillemKauf
Copy link
Contributor Author

Force push to address Nicolae's comments:

  • Renamed DeleteRecordsKafkaTest -> DeleteRecordsKafkaCliToolsTest
  • Moved parsing of DeleteRecords result into kafka_cli_tools.delete_records()
  • Refactored and cleaned up test_delete_records_non_empty_topic() further.

src/v/kafka/server/replicated_partition.cc Outdated Show resolved Hide resolved
tests/rptest/tests/delete_records_test.py Outdated Show resolved Hide resolved
Previously, we would return an `error_code::offset_out_of_range` for
a request where `kafka_truncation_offset <= start_offset`.

This is different behavior from what most Kafka clients are expecting
when issuing `DeleteRecords` requests. Change the condition to return
early with a `error_code::none` result.

Also, correct conditions for tests that would previously expect a truncate
call with `offset <= start_offset` to fail with the new, expected behavior
(which is success).
Adds a wrapper function around `kafka-delete-records.sh` which can
be used to issue `DeleteRecords` requests.
Uses `kafka-delete-records.sh` and `rpk trim-prefix` to assert that
Redpanda and Kafka achieve parity for certain edge cases involving
`DeleteRecords` requests.

Uses both `redpanda` and `KafkaService` brokers as mixins.
@WillemKauf WillemKauf force-pushed the kafka_delete_records_fix branch from 568e20f to a3f7130 Compare August 16, 2024 21:38
@WillemKauf
Copy link
Contributor Author

Force push to address Andrew's comments:

  • Added a mixin test BaseDeleteRecordsTest which utilizes both a redpanda cluster as well as a KafkaService to assert parity beyond just the CLIs.
  • This test actually revealed that Kafka does not return an offset_out_of_range error for any DeleteRecords request with truncation_offset <= start_offset- amended the redpanda code to no-op for this entire range of values.

@vbotbuildovich
Copy link
Collaborator

@WillemKauf WillemKauf requested a review from andrwng August 19, 2024 13:32
@WillemKauf WillemKauf merged commit 65c461a into redpanda-data:dev Aug 19, 2024
23 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just guessing that we should have gotten a replication and maybe enterprise reviewer on this pr.

@WillemKauf
Copy link
Contributor Author

Just guessing that we should have gotten a replication and maybe enterprise reviewer on this pr.

Ack, @bharathv, @oleiman, would you like to weigh in on this change?

@WillemKauf WillemKauf requested a review from oleiman August 20, 2024 20:25
Copy link
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think any divergence from kafka at the API layer is considered a bug from the enterprise team's perspective, so the spirit of the change seems legit. nice tests!

if (
kafka_truncation_offset <= start_offset()
|| kafka_truncation_offset > high_watermark()) {
if (kafka_truncation_offset <= start_offset()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/cluster/Partition.scala#L1688

suggests that if truncation_offset < 0 it should throw a OffsetOutOfRangeException(), its an edge case so probably not a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, there is even more nuance here than that: -1 indicates a complete wipe up to high_watermark, while other values < 0 are an offset_out_of_range error. Will have this covered in a follow up 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants