[CORE-2736] Adds output batch compression for Data Transforms #18514

Merged

@oleiman oleiman commented May 15, 2024

This PR adds the ability to configure output batch compression for WASM transforms, either at deploy time or via a metadata patch request. It includes rpk support for deploy-time configuration; the metadata patch path is available only through direct admin API invocation.

TODO:

  • rpk bits and tests for deploying w/ compression turned on

Closes CORE-2736

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add output batch compression for Data Transforms (configurable per deployed transform)

@oleiman oleiman self-assigned this May 15, 2024
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from 4adf296 to 58df36c Compare May 16, 2024 05:45

oleiman commented May 16, 2024

/ci-repeat 1


oleiman commented May 16, 2024

/ci-repeat 1
release
skip-units
skip-redpanda-build


vbotbuildovich commented May 16, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2fc-4bd5-b204-a1774afd79bf:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_handling_invalid_ids"

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2ff-46d2-8840-d9e063a709e2:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_no_head_of_line_blocking.different_clusters=False.different_connections=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/50134#019009db-8de9-47d5-a29d-ba3e1ac43ce8:

"rptest.tests.topic_creation_test.TopicRecreateTest.test_topic_recreation_while_producing.workload=ACKS_1.cleanup_policy=compact"
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from d9f6691 to f13207f Compare May 17, 2024 18:31
@oleiman oleiman marked this pull request as ready for review May 17, 2024 18:32
@oleiman oleiman requested review from a team and michael-redpanda and removed request for a team May 17, 2024 22:55
src/go/rpk/pkg/cli/transform/deploy.go (resolved)
src/go/rpk/pkg/cli/transform/list_test.go (outdated, resolved)
src/go/rpk/pkg/cli/transform/meta.go (outdated, resolved)
Comment on lines 102 to 103
at a cost. So, while it does occur asynchronously with respect to transform
execution, compression may introduce latency on the output topic.

Suggested change
- at a cost. So, while it does occur asynchronously with respect to transform
- execution, compression may introduce latency on the output topic.
+ at a cost. So, while it occurs asynchronously with respect to transform
+ execution, compression may introduce latency on the output topic.

@oleiman oleiman marked this pull request as draft May 31, 2024 19:56
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch from f13207f to 1e48f71 Compare May 31, 2024 20:07

oleiman commented May 31, 2024

force push to sync w/ dev after a couple weeks of inactivity


oleiman commented Jun 18, 2024

/ci-repeat 1

Comment on lines 186 to 187
.match("producer", compression::producer);
} catch (std::runtime_error& e) {

Member

maybe could avoid the try/catch with

string_switch<std::optional>(c)
...
.default_match(std::nullopt);

if (!c.has_value()) {
i.setstate(...)
}

Member Author

fair point. force of habit from regular switch, but static analysis is no help here obviously.
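
For illustration, the shape of that suggestion can be sketched with the standard library alone. This is a hedged sketch, not the code from the PR: `parse_compression` is a hypothetical helper, a plain if-chain stands in for `string_switch`, and every enumerator other than `none` and `producer` is an assumption.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for the compression enum. Only `none` and
// `producer` appear in this conversation; the rest are assumed.
enum class compression { none, gzip, snappy, lz4, zstd, producer };

// Returns std::nullopt for an unrecognized name instead of throwing,
// which plays the role of default_match(std::nullopt) in the suggestion.
inline std::optional<compression> parse_compression(std::string_view s) {
    if (s == "none") return compression::none;
    if (s == "gzip") return compression::gzip;
    if (s == "snappy") return compression::snappy;
    if (s == "lz4") return compression::lz4;
    if (s == "zstd") return compression::zstd;
    if (s == "producer") return compression::producer;
    return std::nullopt; // caller decides how to report the failure
}
```

With a lookup of this shape the caller checks `has_value()` and can set the stream's fail bit on a miss, so no try/catch is needed around the match.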

@@ -170,6 +173,10 @@ struct transform_metadata_patch {
    std::optional<absl::flat_hash_map<ss::sstring, ss::sstring>> env;
    // Desired paused state for the transform
    std::optional<is_transform_paused> paused;
    // Desired compression mode for the transform
    std::optional<compression> compression_mode;

Member

what's the difference between compression_mode:none and nullopt here?

Member Author

in this context, whether or not to update the setting on persisted metadata. comments on these fields could be better.
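
A minimal sketch of that distinction, assuming patch fields are applied one by one: an unset optional leaves the persisted value alone, while an explicit `none` overwrites it. `metadata_sketch`, `patch_sketch`, and `apply_patch` are hypothetical names for illustration and do not reflect the actual Redpanda types; the `gzip` enumerator is likewise assumed.

```cpp
#include <optional>

// Hypothetical stand-in for the compression enum (gzip is assumed).
enum class compression { none, gzip, producer };

// Simplified stand-in for the persisted transform metadata.
struct metadata_sketch {
    bool paused = false;
    compression compression_mode = compression::none;
};

// Simplified stand-in for the patch: each field is optional, and an
// unset field means "do not touch the persisted value".
struct patch_sketch {
    std::optional<bool> paused;
    std::optional<compression> compression_mode;

    // True when the patch would change nothing.
    bool empty() const {
        return !paused.has_value() && !compression_mode.has_value();
    }
};

// Applies only the fields that are present in the patch.
inline metadata_sketch apply_patch(metadata_sketch meta, const patch_sketch& patch) {
    if (patch.paused) meta.paused = *patch.paused;
    if (patch.compression_mode) meta.compression_mode = *patch.compression_mode;
    return meta;
}
```

Under these assumptions, a patch with `compression_mode` unset keeps whatever mode was stored, and a patch carrying `compression::none` turns compression off.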

@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch from d6b3bb0 to 4c2460b Compare June 24, 2024 15:42

oleiman commented Jun 24, 2024

force push contents:

  • marginally improve comments on transform_metadata_patch
  • improve string_switch handling for compression mode


oleiman commented Jun 24, 2024

CI Failure is some pandatriage issue

@@ -233,12 +240,16 @@ func mergeProjectConfigs(lhs project.Config, rhs project.Config) (out project.Co
	if len(rhs.OutputTopics) > 0 {
		out.OutputTopics = rhs.OutputTopics
	}
	if rhs.Compression != "" {

Contributor

This is why the flag should default to empty string 😄

Contributor

Yeah, that's what I was harping on up here 🤷

I have to push again to fix a conflict, so I think I'll change it back. But I (clearly) don't have a strong opinion about it.

Member Author

Yeah I was agreeing with you 😄

oleiman added 12 commits June 25, 2024 08:59
boost::lexical_cast<compression>("bogus") should raise bad_lexical_cast.

Previous behavior: runtime_error("Fell off the end of a string-switch")

Failure to match the source string to one of the compression lexical cases
should set the fail bit on the istream instead of allowing the runtime_error
to escape.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Also adds bool transform_metadata_patch::empty()

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Respecting transform_metadata::compression_mode

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- tranform_from_json
- deploy

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>

dt/rpk
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
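
The first commit message above describes setting the fail bit on the istream so that a cast helper reports the failure itself. Below is a rough sketch of that mechanism using only the standard library. `cast_from_string` is a hypothetical, simplified stand-in for a lexical-cast style helper (it throws `std::invalid_argument` and does not check that the whole input was consumed), and the enumerators other than `none` and `producer` are assumed.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the compression enum (gzip is assumed).
enum class compression { none, gzip, producer };

// Extraction operator: on an unknown name, mark the stream as failed
// rather than letting an exception escape.
inline std::istream& operator>>(std::istream& is, compression& out) {
    std::string token;
    if (!(is >> token)) {
        return is; // nothing to read; the stream is already in a failed state
    }
    if (token == "none") {
        out = compression::none;
    } else if (token == "gzip") {
        out = compression::gzip;
    } else if (token == "producer") {
        out = compression::producer;
    } else {
        is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Simplified cast helper: converts via operator>> and turns a failed
// stream into a single, well-defined exception type.
template <typename T>
T cast_from_string(const std::string& s) {
    std::istringstream is(s);
    T value{};
    if (!(is >> value)) {
        throw std::invalid_argument("bad cast: " + s);
    }
    return value;
}
```

Because the operator only sets the fail bit, the helper is the one place that decides which exception the caller sees.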
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch from 4c2460b to 04cd18e Compare June 25, 2024 17:22

oleiman commented Jun 25, 2024

force push rebase dev to fix merge conflict


oleiman commented Jun 27, 2024

@michael-redpanda michael-redpanda merged commit 577a280 into redpanda-data:dev Jun 27, 2024
23 of 26 checks passed