[CORE-2841] Transforms: Start consuming from an arbitrary offset (numeric from start/end or timestamp) #19975
Force-pushed b177085 to 78e09c0.
Force-pushed 78e09c0 to 612cd16.
/ci-repeat 1
Force-pushed 612cd16 to 2d34443.
force push contents:
Force-pushed f743216 to f6f1e38.
force push make
One small edge case to address; otherwise LGTM.
},
[this, &latest](model::transform_from_end off) {
    vlog(_logger.debug, "starting at offset: {}", off);
    auto actual_offset = latest - off.delta;
underflow
sigh. good catch. thought i made a note somewhere to address that 😕
},
[this](model::transform_from_start off) {
    vlog(_logger.debug, "starting at offset: {}", off);
    auto actual_offset = _source->start_offset() + off.delta;
overflow
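The two review comments above flag underflow on `latest - off.delta` and overflow on `start_offset() + off.delta`. A minimal sketch of resolving the start position with clamping, using hypothetical `from_start`/`from_end` structs over a raw `std::int64_t` and a hypothetical `resolve_start` function (not Redpanda's actual types or API). It assumes a non-empty partition with `0 <= start <= latest`, and it omits the timestamp alternative, which needs a lookup against the log. Clamping is one possible policy; rejecting the request would be another.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for transform_from_start / transform_from_end.
// Per the commit message, the enclosed delta should be >= 0.
struct from_start { std::int64_t delta; };
struct from_end { std::int64_t delta; };
using start_position = std::variant<from_start, from_end>;

// Resolve a position against offsets spanning [start, latest]
// (assumes 0 <= start <= latest). The delta is compared against the span
// first, so start + d and latest - d are only computed when they stay in range.
constexpr std::int64_t resolve_start(
  start_position pos, std::int64_t start, std::int64_t latest) {
    return std::visit(
      [start, latest](auto p) -> std::int64_t {
          const std::int64_t span = latest - start;  // >= 0, cannot overflow
          const std::int64_t d = std::max<std::int64_t>(p.delta, 0);  // negative treated as 0
          if constexpr (std::is_same_v<decltype(p), from_start>) {
              return d > span ? latest : start + d;  // ran off the end: clamp to latest
          } else {
              return d > span ? start : latest - d;  // ran past the start: clamp to start
          }
      },
      pos);
}
```

With `start = 10` and `latest = 100`, `from_start{1000}` resolves to 100 and `from_end{1000}` resolves to 10 instead of wrapping or going negative.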
model::offset_delta has arithmetic operator overloads that automatically convert between kafka and model offsets. This commit introduces a kafka::offset_delta that is used specifically for applying a numeric delta to an existing kafka::offset without changing its type. Useful for transform start offset calculations. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Thin wrappers around a kafka::offset_delta, meant for use in a sum type to indicate how the delta should be applied, i.e., added to the start offset of a topic or subtracted from the latest offset of a topic. The enclosed delta should always be greater than or equal to zero. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
As a consequence of this, to maintain compatibility during partial upgrades, introduce a legacy_transform_offset_options. This gives us the ability to feature gate use of the new offset alternatives and avoid writing non-forwards-compatible transform metadata in case a partial upgrade needs to be rolled back. This commit also adds transform_metadata::serde_write, which conditionally writes the legacy version of the offset options struct iff the position variant holds one of the legacy alternatives. This code can be removed after v24.3 ships. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
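The conditional write described in this commit reduces to a predicate on the variant's active alternative. A minimal sketch, assuming (hypothetically) that `latest_offset` and `timestamp` are the pre-existing alternatives and `from_start`/`from_end` are the new ones; the real code goes through serde and `legacy_transform_offset_options`, neither of which is modeled here, and `choose_encoding`/`can_use` are illustrative names only. The same predicate can back the feature gate: new alternatives are refused until the upgrade is finalized.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical alternatives: two assumed to predate the change, two new.
struct latest_offset {};
struct timestamp { std::int64_t ms; };
struct from_start { std::int64_t delta; };
struct from_end { std::int64_t delta; };

using position = std::variant<latest_offset, timestamp, from_start, from_end>;

// True iff a node that only knows the legacy layout could decode this position.
constexpr bool is_legacy(const position& p) {
    return std::holds_alternative<latest_offset>(p)
           || std::holds_alternative<timestamp>(p);
}

enum class encoding { legacy, current };

// Write the legacy struct whenever possible so a rolled-back node can read it.
constexpr encoding choose_encoding(const position& p) {
    return is_legacy(p) ? encoding::legacy : encoding::current;
}

// Feature gate: new alternatives are only allowed once the upgrade is final.
constexpr bool can_use(const position& p, bool upgrade_finalized) {
    return is_legacy(p) || upgrade_finalized;
}
```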
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
{ "format": enum[timestamp, from_start, from_end], "value": int64 } Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Specifically, we want to prevent the use of the new offset options alternatives before an upgrade has been finalized. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Force-pushed f6f1e38 to e1c3d29.
Force-pushed e1c3d29 to 497bef4.
force push a dt test case
LGTM, just one minor comment on the flag documentation and one question on the default value for the from-offset format.
format := ""
switch pfx := formatted_offset[0:1]; pfx {
case "@":
	format = "timestamp"
case "+":
	format = "from_start"
case "-":
	format = "from_end"
default:
	return nil, fmt.Errorf("Bad prefix: expected one of ['@','+','-'], got: '%s'", pfx)
}
[Question] I do not have a strong opinion on this, but can we default the format to from_start?
The flag reads:
Process an input topic partition from this offset
If I use the flag as:
rpk transform deploy --from-offset 30
It reads ok (I will deploy from offset 30), but it fails with: Bad prefix: expected one of ['@','+','-'], got: '3'.
We could check whether the prefix is + or a digit.
I'd caution against this because we really don't offer the ability to deploy from a particular offset, only incidentally if the start offset of every partition is 0.
I'm sort of okay with the UX speedbump if it forces a user back to the help text so they know exactly what to expect. I'm not exactly the ideal source of UX opinions, but I think this feature will be used rarely enough to benefit from high specificity.
If somebody does have a strong opinion about this though, I'm happy to make the change.
--from-offset to start from this offset
* @t: start from UNIX timestamp (ms from epoch)
* +oo: start offset + oo
* -oo: latest offset - oo
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
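The prefix rules in this commit map onto a small parser. A C++ sketch mirroring the rpk prefix switch quoted earlier, with a hypothetical `parse_from_offset` function and `offset_format` enum; the digit validation and int64 overflow check are assumptions, since the excerpt does not show how rpk parses the value. It also checks for an empty string before reading the prefix: the Go excerpt's `formatted_offset[0:1]` would panic on an empty string unless the caller rejects it first, which the excerpt does not show.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

enum class offset_format { timestamp, from_start, from_end };

struct parsed_offset {
    offset_format format;
    std::int64_t value;
};

// Parse "@t", "+oo" or "-oo". Returns std::nullopt for an empty string, an
// unknown prefix, or a value that is not a non-negative decimal int64.
constexpr std::optional<parsed_offset> parse_from_offset(std::string_view s) {
    if (s.empty()) {
        return std::nullopt;  // guard before reading s[0]
    }
    offset_format fmt{};
    switch (s[0]) {
    case '@': fmt = offset_format::timestamp; break;
    case '+': fmt = offset_format::from_start; break;
    case '-': fmt = offset_format::from_end; break;
    default: return std::nullopt;  // a bare "30" is rejected, not read as "+30"
    }
    const std::string_view digits = s.substr(1);
    if (digits.empty()) {
        return std::nullopt;
    }
    std::int64_t v = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') {
            return std::nullopt;
        }
        const int d = c - '0';
        if (v > (INT64_MAX - d) / 10) {
            return std::nullopt;  // v * 10 + d would overflow int64
        }
        v = v * 10 + d;
    }
    return parsed_offset{fmt, v};
}
```

`parse_from_offset("30")` returns `std::nullopt`, matching the behavior discussed in the review: a bare number is rejected rather than treated as `+30`.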
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- Consume records that were produced before the deploy
- Specify offsets that run off the end of the input topic
- Ill-formed offsets
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Force-pushed 497bef4 to 5d03998.
force push to address some CR comments on the rpk side
CI Failure is #20574 (known, unrelated)
inline constexpr offset operator+(offset o, offset_delta d) {
    if (o >= offset{0}) {
        if (d() <= offset::max() - o) {
            return offset{o() + d()};
        } else {
            return offset::max();
        }
    } else {
        if (d() >= offset::min() - o) {
            return offset{o() + d()};
        } else {
            return offset::min();
        }
    }
}

inline constexpr offset operator-(offset o, offset_delta d) { return o + (-d); }
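The saturating `operator+` above can be exercised on its own by swapping the named types for a raw `std::int64_t` (the `sat_add`/`sat_sub` names are hypothetical). One detail worth noting: `operator-` is written as `o + (-d)`, and if `offset_delta`'s unary minus simply negates the underlying `int64_t` (an assumption; its definition is not shown), negating the minimum value overflows. The sketch handles that case explicitly; since the PR only applies non-negative deltas, it may never arise in practice.

```cpp
#include <cstdint>
#include <limits>

constexpr std::int64_t max_off = std::numeric_limits<std::int64_t>::max();
constexpr std::int64_t min_off = std::numeric_limits<std::int64_t>::min();

// Same branch structure as operator+ above: saturate instead of overflowing.
// Neither max_off - o (o >= 0) nor min_off - o (o < 0) can overflow.
constexpr std::int64_t sat_add(std::int64_t o, std::int64_t d) {
    if (o >= 0) {
        return d <= max_off - o ? o + d : max_off;
    }
    return d >= min_off - o ? o + d : min_off;
}

// Subtraction via negation is undefined for d == min_off, so special-case it:
// o - min_off == o + 2^63, which exceeds max_off for every o >= 0.
constexpr std::int64_t sat_sub(std::int64_t o, std::int64_t d) {
    if (d == min_off) {
        return o >= 0 ? max_off : o + max_off + 1;
    }
    return sat_add(o, -d);
}
```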
@andijcr given the recent issue with named_type strictness (with the PR you had to address it), are these overloads scary?
I'm not familiar...do you have a pointer?
#15170 sorry i missed the tag.
i need to have a look, i think we already have some overloads for these operators and types...
Ah, got it thanks. That looks like a good change 👍
For context on this diff: kafka::offset_delta is new. the idea is to distinguish it from model::offset_delta, where the operator overloads do conversion back and forth between kafka::offset and model::offset.
as for the overload resolution itself, I think we're fine because the desired one is more specific? like this:
https://godbolt.org/z/EfG7Yx5vc
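The godbolt example is not reproduced here, but the rule the comment relies on can be checked in isolation: when a function template and a non-template function are equally good matches, overload resolution prefers the non-template. A sketch with hypothetical `demo` types and a catch-all template `operator+` standing in for whatever generic overload might compete with the new `kafka::offset` + `kafka::offset_delta` one (an assumption; the actual competing overloads are not shown in this thread).

```cpp
#include <cstdint>

namespace demo {
struct offset { std::int64_t v; };
struct model_offset_delta { std::int64_t v; };
struct kafka_offset_delta { std::int64_t v; };

enum class which { generic, specific };

// Hypothetical catch-all overload, e.g. one that converts between domains.
template<typename O, typename D>
constexpr which operator+(O, D) { return which::generic; }

// Exact non-template overload: chosen over the template on a tie.
constexpr which operator+(offset, kafka_offset_delta) { return which::specific; }
} // namespace demo
```

If the competing overload were instead a non-template that needs a conversion, the exact match would still win, but through a different rule (the better conversion sequence).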
struct legacy_transform_offset_options
  : serde::envelope<
      transform_offset_options,
the first envelope template parameter looks wrong (CRTP)
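The CRTP point in miniature: a CRTP base learns the derived type from its template argument, so naming `transform_offset_options` there makes the legacy struct's base describe the wrong type. A sketch with a hypothetical one-parameter `envelope` and a hypothetical `legacy_bad` struct (the real `serde::envelope` also takes version parameters, omitted here).

```cpp
#include <type_traits>

// Hypothetical miniature CRTP base: it learns the derived type from its
// template argument and downcasts to it, e.g. to reach fields to serialize.
template<typename Derived>
struct envelope {
    using self = Derived;
    constexpr const Derived& derived() const {
        return static_cast<const Derived&>(*this);
    }
};

struct transform_offset_options : envelope<transform_offset_options> {
    int position = 0;
};

// The pattern flagged in review: the base is told it is a different type.
struct legacy_bad : envelope<transform_offset_options> {
    int position = 1;
};

// Fixed: the derived type names itself.
struct legacy_transform_offset_options
  : envelope<legacy_transform_offset_options> {
    int position = 1;
};
```

Calling `derived()` on a `legacy_bad` object would downcast to a type the object is not, which is undefined behavior; that is why the mismatch matters even though it compiles.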
Good catch, thank you 🙏
#20828
This PR wires up the ability to configure the start offset of a transform at deploy time. This can be either a UNIX timestamp (ms since epoch) or an offset delta (+oo from the start offset or -oo from the end).
Includes rpk experience.
Backports Required
Release Notes
Improvements