Introduce transform::manager #13270

rockwotj · 2023-09-05T18:45:30Z

Transform manager is the component for ensuring that the correct transform::processor is running for a given core.

It is a component that receives notifications from the rest of the system and processes those notifications on a single fiber queue.

The queue makes handling shutdown a little simpler, as we can move most of the logic for handling shutdown correctly into another class.
Additionally, using a single fiber means we don't need to worry about concurrent modifications of data structures or duplicate notifications
being scheduled.

We also template the manager on the clock type. A similar pattern is used in other places: it lets us use the seastar manual
clock for fully deterministic testing and the lowres clock in production.
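
The clock templating described above can be illustrated with a minimal standalone sketch. Everything in it is hypothetical and uses only the C++17 standard library: `manual_clock` is a hand-rolled stand-in for the Seastar manual clock, `restart_backoff` is an invented component (not the manager from this PR), and `std::chrono::steady_clock` stands in for the lowres clock.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical manual clock: time only moves when advance() is called.
// This is a stand-in for the idea, not Seastar's manual clock.
struct manual_clock {
    using duration = std::chrono::milliseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<manual_clock, duration>;
    static constexpr bool is_steady = true;

    static time_point now() { return time_point(_now); }
    static void advance(duration d) { _now += d; }

private:
    static inline duration _now{0};
};

// Hypothetical component templated on the clock type.
template<typename Clock>
class restart_backoff {
public:
    explicit restart_backoff(typename Clock::duration delay)
      : _delay(delay)
      , _next_attempt(Clock::now()) {}

    // Record a failure: the next attempt is allowed only after the delay.
    void mark_failed() { _next_attempt = Clock::now() + _delay; }

    // True once the backoff period has elapsed.
    bool ready() const { return Clock::now() >= _next_attempt; }

private:
    typename Clock::duration _delay;
    typename Clock::time_point _next_attempt;
};

// Production code would instantiate the template with a real clock.
using production_backoff = restart_backoff<std::chrono::steady_clock>;

// Deterministic test: no sleeping, time is advanced by hand.
inline bool backoff_is_deterministic() {
    using namespace std::chrono_literals;
    restart_backoff<manual_clock> b(100ms);
    b.mark_failed();
    bool blocked = !b.ready();
    manual_clock::advance(99ms);
    bool still_blocked = !b.ready();
    manual_clock::advance(1ms);
    return blocked && still_blocked && b.ready();
}
```

The test never sleeps, so its outcome depends only on the calls to `advance()`, which is the property the manual clock is used for.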

Backports Required

Release Notes

none

rockwotj · 2023-09-06T01:23:10Z

CI Failures: #12659, #13181

dotnwat

looks great. several questions but i don't think there are any blockers.

dotnwat · 2023-09-12T18:39:42Z

src/v/ssx/work_queue.cc

+void work_queue::submit(ss::noncopyable_function<ss::future<>()> fn) {
+    if (_as.abort_requested()) {
+        return;


i wonder if the signature here should be

ss::noncopyable_function<ss::future<>(seastar::abort_source*)> fn

I thought about this - right now there isn't a use case because starting a wasm engine isn't interruptible at the moment and that's the main source of latency I'd assume. Do you think it's worth future proofing?

By future proofing I mean adding it now

Do you think it's worth future proofing?

nah
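
For readers following the thread, here is a rough sketch of what the suggested signature would enable. It is a synchronous, standard-library-only model: `abort_source` is a hypothetical stand-in reduced to a flag (not `seastar::abort_source`), and `toy_queue` runs tasks immediately rather than on a fiber. The point is only that a task receiving the abort source can stop between steps once shutdown is requested.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an abort source, reduced to a flag.
struct abort_source {
    void request_abort() { _requested = true; }
    bool abort_requested() const { return _requested; }

private:
    bool _requested = false;
};

// Task type that receives the queue's abort source.
using abortable_task = std::function<void(abort_source*)>;

// Hypothetical queue: runs the task immediately to keep the sketch small.
class toy_queue {
public:
    // Returns false if the task was dropped because abort was requested,
    // mirroring the early return in the diff above.
    bool submit(abortable_task fn) {
        if (_as.abort_requested()) {
            return false;
        }
        fn(&_as);
        return true;
    }

    void shutdown() { _as.request_abort(); }

private:
    abort_source _as;
};

// A five-step task that is interrupted when shutdown arrives at step 2.
inline int demo_interrupted_steps() {
    toy_queue q;
    int completed = 0;
    q.submit([&](abort_source* as) {
        for (int step = 0; step < 5; ++step) {
            if (step == 2) {
                q.shutdown(); // simulate shutdown arriving mid-task
            }
            if (as->abort_requested()) {
                return; // cooperative cancellation point
            }
            ++completed;
        }
    });
    return completed;
}
```

As the reply notes, this only pays off when the submitted work has points at which it can actually be interrupted.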

dotnwat · 2023-09-12T19:12:39Z

src/v/transform/transform_processor.cc

@@ -121,4 +121,6 @@ ss::future<> processor::do_run_transform_loop() {

 model::transform_id processor::id() const { return _id; }
 const model::ntp& processor::ntp() const { return _ntp; }
+const model::transform_metadata& processor::meta() const { return _meta; }
+bool processor::is_running() const { return !_task.available(); }


i'll be interested to see later in the pr how is_running is used, but it feels like an odd property to expose. for example, if it's false then the only move is to inspect if there was an exception, but that isn't exposed. it could be restarted, but that could be hidden behind an automatic restart policy. also, it's true before start() and after stop() (ie contains ss::now()).

I think you saw this :)

I'm happy for alternatives, but I'd like to colocate the processor with the backoff state and this feels like the easiest thing? I guess an optional (or nullptr) is another option too...

it makes sense to have this or something like it if the restart policy is extracted out of the processor itself. it just wasn't clear what was coming up in a commit-by-commit review, so it's a bit more of a stream of consciousness.

stream of consciousness

love stream of consciousness reviews!

dotnwat · 2023-09-12T21:45:39Z

src/v/transform/tests/transform_manager_test.cc

+        // enqueued to ensure that task execution is deterministic. Because in
+        // debug mode seastar randomizes task order, so there is no way to wait
+        // for those tasks to be executed outside of draining the seastar queue.
+        ss::set_idle_cpu_handler([this](ss::work_waiting_on_reactor) {


I'm curious if you saw this technique in Redpanda source tree. I ask because we had to solve a similar problem at one point in the past and I don't recall that we had support from Seastar for that like you've managed to get with this code.

I did not. I studied the reactor source in seastar a lot until I stumbled upon this. AFAIK this would be the first usage of ss::set_idle_cpu_handler in Redpanda.
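
To make the idea in this thread concrete, here is a toy model of the technique. It is not Seastar and does not use `ss::set_idle_cpu_handler`: `toy_reactor` and its methods are hypothetical, written against the standard library only. It shows why an idle callback is a reliable "all work is done" signal even when task order is shuffled.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Toy single-threaded reactor that runs tasks in randomized order.
class toy_reactor {
public:
    using task = std::function<void()>;

    explicit toy_reactor(unsigned seed)
      : _rng(seed) {}

    void schedule(task t) { _pending.push_back(std::move(t)); }

    // The handler is invoked once the reactor has no pending tasks left.
    void set_idle_handler(std::function<void()> h) { _on_idle = std::move(h); }

    // Shuffle and run each batch; repeat while tasks keep scheduling more.
    void run_until_idle() {
        while (!_pending.empty()) {
            std::vector<task> batch;
            batch.swap(_pending);
            std::shuffle(batch.begin(), batch.end(), _rng);
            for (auto& t : batch) {
                t();
            }
        }
        if (_on_idle) {
            _on_idle();
        }
    }

private:
    std::mt19937 _rng;
    std::vector<task> _pending;
    std::function<void()> _on_idle;
};

// Three tasks each schedule a follow-up; the idle handler observes all six.
inline bool idle_handler_sees_all_work(unsigned seed) {
    toy_reactor r(seed);
    int done = 0;
    int seen_at_idle = -1;
    r.set_idle_handler([&] { seen_at_idle = done; });
    for (int i = 0; i < 3; ++i) {
        r.schedule([&] {
            ++done;
            r.schedule([&] { ++done; }); // follow-up work
        });
    }
    r.run_until_idle();
    return seen_at_idle == 6;
}
```

Whatever the seed, the handler fires only after the queue is empty, so a test can assert on state at that point without depending on task order.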

rockwotj · 2023-09-13T01:27:09Z

Force push rebase with dev

This is a small utility that manages running tasks sequentially on a single fiber. This is useful for simplifying control loops by executing everything on a single fiber without the need for locking, etc.

As a potential future extension: for long-running async work, this queue could manage spinning off background fibers to handle work, then re-enqueue the result of that work. The advantage of having the work_queue track that is that the bookkeeping is consolidated in a single place and we can cleanly handle shutdown.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
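
The behaviour described in this commit message can be sketched roughly as follows. This is a synchronous, standard-library-only model, not the `ssx::work_queue` from the PR: `toy_work_queue`, `drain()` and `shutdown()` are hypothetical names, and tasks are plain callables rather than functions returning futures.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy sequential work queue: one task at a time, in submission order.
class toy_work_queue {
public:
    using task = std::function<void()>;

    // Enqueue a task; silently dropped once shutdown has been requested.
    void submit(task fn) {
        if (_shutdown) {
            return;
        }
        _tasks.push_back(std::move(fn));
    }

    // Run queued tasks in FIFO order, including tasks that running tasks
    // submit along the way.
    void drain() {
        while (!_shutdown && !_tasks.empty()) {
            task fn = std::move(_tasks.front());
            _tasks.pop_front();
            fn();
        }
    }

    // Stop accepting work and discard anything still queued.
    void shutdown() {
        _shutdown = true;
        _tasks.clear();
    }

private:
    std::deque<task> _tasks;
    bool _shutdown = false;
};

// Records the order in which tasks actually run.
inline std::vector<std::string> demo_order() {
    toy_work_queue q;
    std::vector<std::string> log;
    q.submit([&] {
        log.push_back("start");
        q.submit([&] { log.push_back("follow-up"); });
    });
    q.submit([&] { log.push_back("stop"); });
    q.drain();
    q.shutdown();
    q.submit([&] { log.push_back("late"); }); // dropped after shutdown
    q.drain();
    return log;
}
```

Because only one task runs at a time, the tasks can touch shared state such as `log` without locking, and shutdown handling lives in one place.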