[CORE-2841] Transforms: Start consuming from an arbitrary offset (numeric from start/end or timestamp) #19975

Merged

@oleiman (Member) commented Jun 24, 2024

This PR wires up the ability to configure the start offset of a transform at deploy time. This can be either a unix timestamp (ms since epoch) or an offset delta (+oo from the start offset or -oo from the end).
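The timestamp form counts milliseconds, not seconds, since the Unix epoch. Here is a minimal sketch of producing such a value in C++ with `std::chrono`. The helper `to_epoch_ms` is hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: milliseconds since the Unix epoch for a given
// system_clock time point (system_clock measures Unix time as of C++20).
inline int64_t to_epoch_ms(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    const auto ms = duration_cast<milliseconds>(tp.time_since_epoch()).count();
    // Sketch assumption: only post-1970 instants are meaningful start points.
    assert(ms >= 0);
    return static_cast<int64_t>(ms);
}
```

For example, 2024-07-01T00:00:00Z is 1719792000 seconds after the epoch, so the corresponding value is 1719792000000.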

Includes rpk experience.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Adds the ability to start transform processing from an arbitrary offset on the input topic.

@oleiman self-assigned this Jun 24, 2024
oleiman

This comment was marked as resolved.

@oleiman changed the title from "Transforms: Start consuming from an arbitrary offset (numeric from start/end or timestamp)" to "[CORE-2841] Transforms: Start consuming from an arbitrary offset (numeric from start/end or timestamp)" Jun 24, 2024
@oleiman force-pushed the xform/core-2841/specify-start-offset branch 5 times, most recently from b177085 to 78e09c0 June 25, 2024 17:26
@oleiman marked this pull request as ready for review June 25, 2024 17:28
@oleiman requested review from a team and michael-redpanda and removed request for a team June 25, 2024 17:29
@oleiman force-pushed the xform/core-2841/specify-start-offset branch from 78e09c0 to 612cd16 June 25, 2024 18:01
@oleiman (Member Author) commented Jun 27, 2024

/ci-repeat 1

@oleiman force-pushed the xform/core-2841/specify-start-offset branch from 612cd16 to 2d34443 June 27, 2024 19:13
@oleiman requested review from rockwood-openai and pgellert and removed request for rockwood-openai June 27, 2024 19:15
@oleiman requested a review from rockwotj July 2, 2024 02:12
@oleiman (Member Author) commented Jul 2, 2024

force push contents:

  • serde shenanigans
  • adjust feature gate to guard new offset alternatives only

@oleiman force-pushed the xform/core-2841/specify-start-offset branch from f743216 to f6f1e38 July 2, 2024 02:27
@oleiman (Member Author) commented Jul 2, 2024

Force push: make the legacy_transform_offset_options struct write-only.

@rockwotj previously approved these changes Jul 2, 2024

@rockwotj (Contributor) left a comment

One small edge case to address; otherwise LGTM.

src/v/model/transform.cc (resolved)
},
[this, &latest](model::transform_from_end off) {
vlog(_logger.debug, "starting at offset: {}", off);
auto actual_offset = latest - off.delta;
Contributor

underflow

Member Author

sigh. goood catch. thought i made a note somewhere to address that 😕

},
[this](model::transform_from_start off) {
vlog(_logger.debug, "starting at offset: {}", off);
auto actual_offset = _source->start_offset() + off.delta;
Contributor

overflow
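Both inline comments point at the same hazard in the `std::visit` branches above: applying a delta to a bound with plain signed arithmetic can leave the partition's range or overflow `int64_t`. Here is a minimal sketch of a guarded resolution. It uses hypothetical `from_start`/`from_end` stand-ins rather than the real `model::` types, and it assumes that out-of-range deltas are clamped to `[start, latest]`. The PR's actual handling of such deltas may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for model::transform_from_start / transform_from_end.
struct from_start { int64_t delta; };  // apply as: start offset + delta
struct from_end { int64_t delta; };    // apply as: latest offset - delta
using start_position = std::variant<from_start, from_end>;

// Helper for visiting a variant with a set of lambdas.
template<class... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template<class... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Resolve a position against a partition's [start, latest] range.
// Sketch assumption: out-of-range deltas are clamped to the nearest bound.
inline int64_t resolve(const start_position& pos, int64_t start, int64_t latest) {
    assert(0 <= start && start <= latest);
    const int64_t span = latest - start;  // cannot overflow: 0 <= start <= latest
    return std::visit(
      overloaded{
        [&](from_start p) -> int64_t {
            assert(p.delta >= 0);
            // Compare before adding, so start + delta never overflows.
            return p.delta > span ? latest : start + p.delta;
        },
        [&](from_end p) -> int64_t {
            assert(p.delta >= 0);
            // Compare before subtracting, so the result never drops below start.
            return p.delta > span ? start : latest - p.delta;
        },
      },
      pos);
}
```

Comparing the delta against `latest - start` before doing any arithmetic keeps every intermediate value in range.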

oleiman added 6 commits July 1, 2024 22:36
Arithmetic with model::offset_delta resolves to operator overloads that perform
automatic conversions between kafka and model offsets.

This commit introduces an offset_delta that is used specifically for
applying a numeric delta to an existing kafka::offset without adjusting
its type.

Useful for transform start offset calculations.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
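The distinction this commit draws can be sketched with two minimal strong types. The delta combines only with its own offset type, so the result keeps that type and no cross-domain overload can be selected. Everything under `kafka_sketch`/`model_sketch` is hypothetical; the real types live in Redpanda's `kafka` and `model` namespaces and carry more operators.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

namespace kafka_sketch {
struct offset { int64_t value; };
struct offset_delta { int64_t value; };

// offset +/- offset_delta stays a kafka offset; no conversion to another
// offset domain is involved. Overflow handling is omitted for brevity.
constexpr offset operator+(offset o, offset_delta d) { return offset{o.value + d.value}; }
constexpr offset operator-(offset o, offset_delta d) { return offset{o.value - d.value}; }
} // namespace kafka_sketch

namespace model_sketch {
struct offset { int64_t value; };  // a different offset domain
} // namespace model_sketch

// Detects whether `A + B` is a valid expression.
template<class A, class B, class = void>
struct can_add : std::false_type {};
template<class A, class B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
  : std::true_type {};

// The result type is preserved...
static_assert(std::is_same_v<
              decltype(kafka_sketch::offset{} + kafka_sketch::offset_delta{}),
              kafka_sketch::offset>);
// ...and the kafka delta cannot be applied to a model offset at all.
static_assert(!can_add<model_sketch::offset, kafka_sketch::offset_delta>::value);
```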
Thin wrappers around a kafka::offset_delta meant for use in a sum
type to indicate how the delta should be applied - i.e. added to
the start offset of a topic or subtracted from the latest offset
of a topic.

Enclosed delta should always be greater than or equal to zero.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
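Here is a sketch of the wrapper idea. It uses hypothetical names that mirror, but are not, the PR's `transform_from_start`/`transform_from_end`. Because the direction is encoded in the type, the delta itself can be required to be non-negative and checked once, when the wrapper is constructed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>

// Hypothetical stand-in for kafka::offset_delta.
struct offset_delta { int64_t value; };

struct transform_from_start { offset_delta delta; };  // added to the start offset
struct transform_from_end { offset_delta delta; };    // subtracted from the latest offset

using transform_position = std::variant<transform_from_start, transform_from_end>;

// Enforce "delta >= 0" at construction; the sign never encodes direction.
template<class Wrapper>
std::optional<Wrapper> make_position(int64_t delta) {
    if (delta < 0) {
        return std::nullopt;
    }
    return Wrapper{offset_delta{delta}};
}
```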
As a consequence of this, to maintain compatibility during partial
upgrades, introduce a legacy_transform_offset_options. This gives us
the ability to feature gate use of the new offset alternatives and avoid
writing non-forwards-compatible transform metadata in case a partial
upgrade needs to be rolled back.

This commit also adds transform_metadata::serde_write, which conditionally
writes the legacy version of the offset options struct iff the position
variant holds one of the legacy alternatives. This code can be removed after
v24.3 ships.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
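The compatibility scheme described here can be sketched as two small decisions: which wire layout to emit, and whether a deploy is allowed before the feature is active. This sketch assumes hypothetical alternatives, with `latest_offset` standing in for whatever alternatives predate this PR. It writes only a layout tag and the variant index, not Redpanda's serde format.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical alternatives; which ones count as "legacy" is an assumption.
struct latest_offset {};              // stands in for pre-existing alternatives
struct from_start { int64_t delta; }; // new alternative
struct from_end { int64_t delta; };   // new alternative
using position = std::variant<latest_offset, from_start, from_end>;

inline bool is_legacy(const position& p) {
    return std::holds_alternative<latest_offset>(p);
}

// Feature gate: new alternatives are accepted only once the cluster-wide
// feature is active, i.e. after the upgrade can no longer be rolled back.
inline bool may_deploy(const position& p, bool feature_active) {
    return feature_active || is_legacy(p);
}

// Emit the layout older nodes understand (tag 0) whenever the value fits it,
// and the new layout (tag 1) only for new alternatives. Payload bytes are
// omitted; only the layout tag and the variant index are written.
inline void write_offset_options(std::vector<uint8_t>& out, const position& p) {
    out.push_back(static_cast<uint8_t>(is_legacy(p) ? 0 : 1));
    out.push_back(static_cast<uint8_t>(p.index()));
}
```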
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
{
  "format": enum[timestamp, from_start, from_end],
  "value": int64
}

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
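Here is a hand-rolled sketch of producing that JSON shape. It assumes the enum values serialize as the literal strings listed. The names `offset_format`, `offset_spec`, `format_name` and `to_json` are hypothetical, and a real service would use its JSON library rather than string concatenation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

enum class offset_format { timestamp, from_start, from_end };

constexpr std::string_view format_name(offset_format f) {
    switch (f) {
    case offset_format::timestamp: return "timestamp";
    case offset_format::from_start: return "from_start";
    case offset_format::from_end: return "from_end";
    }
    return "unknown";
}

struct offset_spec {
    offset_format format;
    int64_t value;
};

// Builds {"format":"<name>","value":<int64>} without any JSON library.
inline std::string to_json(const offset_spec& s) {
    std::string out = R"({"format":")";
    out += format_name(s.format);
    out += R"(","value":)";
    out += std::to_string(s.value);
    out += "}";
    return out;
}
```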
Specifically, we want to prevent the use of the new offset options
alternatives before an upgrade has been finalized.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman force-pushed the xform/core-2841/specify-start-offset branch from f6f1e38 to e1c3d29 July 2, 2024 15:14
@pgellert previously approved these changes Jul 2, 2024
@oleiman force-pushed the xform/core-2841/specify-start-offset branch from e1c3d29 to 497bef4 July 2, 2024 17:32
@oleiman (Member Author) commented Jul 2, 2024

Force push: add a dt test case.

@r-vasquez (Contributor) left a comment

LGTM, just one minor comment on the flag documentation and one question on the default value for the from-offset format.

src/go/rpk/pkg/cli/transform/deploy.go (outdated, resolved)
Comment on lines 341 to 351
format := ""
switch pfx := formatted_offset[0:1]; pfx {
case "@":
format = "timestamp"
case "+":
format = "from_start"
case "-":
format = "from_end"
default:
return nil, fmt.Errorf("Bad prefix: expected one of ['@','+','-'], got: '%s'", pfx)
}
Contributor

[Question] I do not have a strong opinion on this, but can we default the format to from_start?

The flag reads:

Process an input topic partition from this offset

If I use the flag as:

rpk transform deploy --from-offset 30

It reads ok (I will deploy from offset 30) but it will fail with: Bad prefix: expected one of ['@','+','-'], got: '3'.

We could check whether the prefix is + or a digit.

Member Author

I'd caution against this because we don't really offer the ability to deploy from a particular offset; that only happens incidentally, when the start offset of every partition is 0.

I'm sort of okay with the UX speedbump if it forces a user back to the help text so they know exactly what to expect. I'm not exactly the ideal source of UX opinions, but I think this feature will be used rarely enough to benefit from high specificity.

If somebody does have a strong opinion about this though, I'm happy to make the change.
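The thread turns on how the flag value is classified by its first character. Here is a minimal C++ sketch of the same rules as the Go excerpt and the flag help text (`@t`, `+oo`, `-oo`). The names `parse_from_offset` and `parsed_offset` are hypothetical. This is not rpk's implementation, and, in line with the author's position, it rejects a bare number such as `30`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical result type; "format" uses the same strings as the rpk excerpt.
struct parsed_offset {
    std::string format;  // "timestamp", "from_start" or "from_end"
    int64_t value;       // always >= 0
};

// Classify by the first character, then require a non-negative integer body.
inline std::optional<parsed_offset> parse_from_offset(std::string_view s) {
    if (s.empty()) {
        return std::nullopt;  // reject before looking at s.front()
    }
    std::string format;
    switch (s.front()) {
    case '@': format = "timestamp"; break;
    case '+': format = "from_start"; break;
    case '-': format = "from_end"; break;
    default: return std::nullopt;  // e.g. a bare "30" has no prefix
    }
    const std::string_view body = s.substr(1);
    int64_t value = 0;
    const char* end = body.data() + body.size();
    const auto [ptr, ec] = std::from_chars(body.data(), end, value);
    // from_chars accepts a leading '-', so also reject negative results.
    if (body.empty() || ec != std::errc{} || ptr != end || value < 0) {
        return std::nullopt;
    }
    return parsed_offset{format, value};
}
```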

src/go/rpk/pkg/cli/transform/deploy.go (outdated, resolved)
oleiman added 3 commits July 2, 2024 13:49
--from-offset to start from this offset
  * @t: start from UNIX timestamp (ms from epoch)
  * +oo: start offset + oo
  * -oo: latest offset - oo

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- Consume records that were produced before the deploy
- Specify offsets that run off the end of the input topic
- Ill-formed offsets

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman force-pushed the xform/core-2841/specify-start-offset branch from 497bef4 to 5d03998 July 2, 2024 21:04
@oleiman (Member Author) commented Jul 2, 2024

force push to address some CR comments on the rpk side

@oleiman requested review from rockwotj and r-vasquez July 2, 2024 21:12
@oleiman (Member Author) commented Jul 3, 2024

CI Failure is #20574 (known, unrelated)

@aanthony-rp merged commit 64090fc into redpanda-data:dev Jul 3, 2024
23 of 26 checks passed
Comment on lines +57 to +73
inline constexpr offset operator+(offset o, offset_delta d) {
if (o >= offset{0}) {
if (d() <= offset::max() - o) {
return offset{o() + d()};
} else {
return offset::max();
}
} else {
if (d() >= offset::min() - o) {
return offset{o() + d()};
} else {
return offset::min();
}
}
}

inline constexpr offset operator-(offset o, offset_delta d) { return o + (-d); }
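There is one edge case in this excerpt, assuming unary minus on `offset_delta` negates the underlying `int64_t`. Because `operator-` forwards to `o + (-d)`, negating the minimum `int64_t` would overflow before the saturating check runs. The commit messages say the enclosed deltas are non-negative, so this may be unreachable in practice. Still, a subtraction that checks bounds directly avoids the negation. Here is a minimal sketch on plain `int64_t`; the helpers `sat_add`/`sat_sub` are hypothetical, not Redpanda code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using i64 = int64_t;
constexpr i64 kMax = std::numeric_limits<i64>::max();
constexpr i64 kMin = std::numeric_limits<i64>::min();

// Saturating o + d: clamp instead of overflowing in either direction.
constexpr i64 sat_add(i64 o, i64 d) {
    if (d > 0 && o > kMax - d) return kMax;  // o + d would exceed max
    if (d < 0 && o < kMin - d) return kMin;  // o + d would go below min
    return o + d;
}

// Saturating o - d without computing -d, so d == kMin is handled too.
constexpr i64 sat_sub(i64 o, i64 d) {
    if (d < 0 && o > kMax + d) return kMax;  // o - d would exceed max
    if (d > 0 && o < kMin + d) return kMin;  // o - d would go below min
    return o - d;
}
```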
Member

@andijcr given the recent issue with named_type strictness (with the PR you had to address it), are these overloads scary?

Member Author

I'm not familiar...do you have a pointer?

Contributor

#15170 sorry i missed the tag.
i need to have a look, i think we already have some overloads for these operators and types...

Member Author

Ah, got it thanks. That looks like a good change 👍

For context on this diff: kafka::offset_delta is new. The idea is to distinguish it from model::offset_delta, whose operator overloads convert back and forth between kafka::offset and model::offset.

as for the overload resolution itself, I think we're fine because the desired one is more specific? like this:
https://godbolt.org/z/EfG7Yx5vc
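The linked snippet isn't reproduced here, so this is a generic illustration of the ranking the author relies on. When one overload matches the argument types exactly and another is viable only through an implicit user-defined conversion, the exact match wins. The types and the `pick` function are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types, not Redpanda's definitions.
struct kafka_offset { int64_t v; };
struct kafka_delta { int64_t v; };
struct model_delta {
    int64_t v;
    // Implicit converting constructor: makes the first overload below viable
    // for a kafka_delta argument, but only via a user-defined conversion.
    model_delta(kafka_delta d) : v(d.v) {}
};

// Viable for kafka_delta only through the conversion above.
inline int pick(kafka_offset, model_delta) { return 1; }
// Exact match for kafka_delta: ranked better, so it is chosen.
inline int pick(kafka_offset, kafka_delta) { return 2; }
```

The same ranking applies to operator overloads such as `operator+`.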

Comment on lines +119 to +121
struct legacy_transform_offset_options
: serde::envelope<
transform_offset_options,
Member

the first envelope template parameter looks wrong (CRTP)

Member Author

Good catch, thank you 🙏
#20828
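With CRTP, a base template typically treats its first argument as the class that derives from it, so naming a different class there is a latent bug. Here is a minimal sketch with a hypothetical `envelope`. It is not `serde::envelope`, and the real template's remaining parameters are omitted. The sketch also turns the mistake into a compile error by letting only `Derived` construct the base.

```cpp
#include <cassert>
#include <string_view>

template<class Derived>
class envelope {
public:
    // Reaches the concrete type through Derived; only valid if *this really
    // is a Derived.
    std::string_view type_name() const {
        return static_cast<const Derived&>(*this).name();
    }

private:
    // Only Derived may construct this base, so deriving from
    // envelope<SomeOtherType> by mistake fails once the class is constructed.
    envelope() = default;
    friend Derived;
};

struct transform_offset_options : envelope<transform_offset_options> {
    std::string_view name() const { return "transform_offset_options"; }
};

// Correct: the first template argument names the deriving class itself.
struct legacy_transform_offset_options
  : envelope<legacy_transform_offset_options> {
    std::string_view name() const { return "legacy_transform_offset_options"; }
};
```

With the reviewed first argument (`transform_offset_options`), the declaration alone would still compile, but constructing the struct would not, because it is not a friend of `envelope<transform_offset_options>`.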


9 participants