
Default split strategy for GreedyScheduler #124

Closed
fredrikekre opened this issue Sep 26, 2024 · 9 comments

@fredrikekre
Contributor

The default split strategy is :scatter for GreedyScheduler (and :batch for the others). I didn't see this being discussed in #77 -- what was the rationale?

In my case I essentially want to use the GreedyScheduler as an "ntasks-limited" version of DynamicScheduler, but I was wondering why they have different defaults for this.
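
For concreteness, roughly what I mean (just a sketch; `f` and `xs` are placeholders, keyword names as in the docs):

```julia
using OhMyThreads: tforeach, DynamicScheduler, GreedyScheduler

f(x) = sum(abs2, x)                 # placeholder workload
xs = [rand(100) for _ in 1:10_000]  # placeholder data

# DynamicScheduler: one task per chunk, default split = :batch.
tforeach(f, xs; scheduler = DynamicScheduler())

# GreedyScheduler: at most `ntasks` tasks pulling work from a shared channel,
# default split = :scatter and chunking = false.
tforeach(f, xs; scheduler = GreedyScheduler(ntasks = 4))
```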

@carstenbauer
Member

The default split strategy is :scatter for GreedyScheduler (and :batch for the others). I didn't see this being discussed in #77 -- what was the rationale?

Good question. I can't recall it. @MasonProtter any idea why we have this different default?

(In any case, speaking about different defaults, it's worth noting that the GreedyScheduler has chunking=false.)

In my case I essentially want to use the GreedyScheduler as an "ntasks-limited" version of DynamicScheduler, but I was wondering why they have different defaults for this.

Maybe I misunderstand but doesn't minchunksize, once we have it (#114), give you what you want?

@fredrikekre
Contributor Author

Maybe I misunderstand but doesn't minchunksize, once we have it (#114), give you what you want?

No, I don't think so. I am after the decoupling of nchunks and ntasks that the GreedyScheduler offers and I don't see how minchunksize would help with that in the DynamicScheduler.

@MasonProtter
Member

Good question. I can't recall it. @MasonProtter any idea why we have this different default?

Honestly, I don't remember either. Maybe the motivation was that if you're using the GreedyScheduler, then that's because each iteration is likely taking an unpredictable amount of time, so a round robin approach might help. But that's just a guess; I have no idea.

@carstenbauer
Member

carstenbauer commented Sep 27, 2024

Maybe the motivation was that if you're using the GreedyScheduler, then that's because each iteration is likely taking an unpredictable amount of time, so a round robin approach might help

That was it. We argued that the GreedyScheduler is mostly useful for non-uniform workload - because otherwise the other schedulers have less overhead - and for non-uniform workload the :roundrobin / :scatter split strategy is typically advantageous.

This line of reasoning still seems sound to me, but I'm generally open to changing the default. It might be insightful to test the argument with benchmarks, especially in light of #121, which makes the GreedyScheduler more efficient.

No, I don't think so. I am after the decoupling of nchunks and ntasks that the GreedyScheduler offers and I don't see how minchunksize would help with that in the DynamicScheduler.

I see. Out of curiosity, why are you after the decoupling of nchunks and ntasks? Do you have a very non-uniform (or perhaps even unknown/dynamic) workload?

@fredrikekre
Contributor Author

Out of curiosity, why are you after the decoupling of nchunks and ntasks? Do you have a very non-uniform (or perhaps even unknown/dynamic) workload?

Yes, potentially. The case I am considering is assembly of finite element matrices. In many cases, depending on the problem you are solving, all elements are identical, and then of course just splitting into nthreads chunks and letting each thread process a single chunk is more or less optimal (Ferrite-FEM/Ferrite.jl#1068 (comment), and with thread pinning (thanks for that package!): Ferrite-FEM/Ferrite.jl#1068 (comment)). However, if you simulate plastic deformation the workload will become extremely uneven: the elements that enter the plastic region will take significantly longer to compute.

The reason I want to limit the total number of tasks is that you typically need task-local scratch data which, again depending on the problem, can be somewhat large and/or expensive to set up.
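
Roughly the pattern I have in mind (a sketch only; `Scratch`, `setup_scratch`, `assemble_element!`, `K`, `f`, `elements`, and `problem` are placeholders for the Ferrite-specific pieces):

```julia
using OhMyThreads: tforeach, GreedyScheduler, TaskLocalValue

# Each task sets up its scratch data (cell cache, local matrix/vector, ...) once.
scratch = TaskLocalValue{Scratch}(() -> setup_scratch(problem))

# Limit the number of tasks (and hence scratch allocations), while the greedy
# scheduler hands out elements dynamically so that expensive (plastic) elements
# don't stall a statically assigned chunk.
tforeach(eachindex(elements); scheduler = GreedyScheduler(ntasks = Threads.nthreads())) do i
    assemble_element!(K, f, elements[i], scratch[])  # scratch[] is per-task
end
```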

and for non-uniform workload the :roundrobin / :scatter split strategy is typically advantageous.

I guess this depends on whether you expect the workload to be sorted or not. If it is truly random, then the strategy shouldn't matter, I think?

@carstenbauer
Member

carstenbauer commented Sep 27, 2024

The reason I want to limit the total number of tasks is that you typically need task-local scratch data which, again depending on the problem, can be somewhat large and/or expensive to set up.

The task-local scratch data setup I get (in fact, we mention it in the docs). What I'm curious about is why you need decoupled chunking. By assigning N chunks to T tasks you effectively bundle ("chunk") again: One bundle being the chunks that each task processes. Couldn't you just as well have simply used one chunk per task (N == T) (which corresponds to the "bundle") instead?

In the context of non-uniform workload, the decoupling, as implemented by the greedy scheduler, makes sense to me, because what will eventually constitute a "bundle" is dynamically decided. In some loose sense, the greedy scheduler is a dynamic variant of :roundrobin. And I guess what I'm asking myself is: when is this dynamic greedy approach better than the regular dynamic and static schedulers with split=:roundrobin?
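
In scheduler terms, the two variants I'm comparing would be something like this (a sketch; `T` and the factor 10 are arbitrary, keyword names as in the docs):

```julia
using OhMyThreads: DynamicScheduler, GreedyScheduler

T = Threads.nthreads()

# One task per chunk; each chunk is an interleaved (round-robin) slice.
dyn = DynamicScheduler(; nchunks = T, split = :roundrobin)

# N chunks but only T tasks; chunks are assigned to tasks dynamically
# via a shared channel.
greedy = GreedyScheduler(; ntasks = T, chunking = true, nchunks = 10 * T)
```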

and for non-uniform workload the :roundrobin / :scatter split strategy is typically advantageous.

I guess this depends on whether you expect the workload to be sorted or not.

The workload doesn't have to be sorted I think. With :roundrobin you (kind of) randomize across the workload. Even for non-sorted (but non-uniform) workload distributions this should (conceptually) be beneficial. For example, consider a bimodal workload distribution (which isn't sorted). With :consecutive, the tasks that get chunks from the middle of the distribution (low workload) will eventually have to wait for the other tasks to finish. With :roundrobin, each chunk will get high-workload and low-workload bits. (Not saying that it will be perfectly balanced, of course.)
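
Schematically, for 12 items split into 3 chunks (indices only; not the exact ChunkSplitters output, just to illustrate the two strategies):

```julia
# :batch / :consecutive: contiguous blocks; one chunk may end up with only the
# cheap middle of a bimodal workload.
consecutive = [1:4, 5:8, 9:12]

# :scatter / :roundrobin: interleaved with stride 3; every chunk samples the
# whole range, mixing cheap and expensive items.
roundrobin = [1:3:10, 2:3:11, 3:3:12]
```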

If it is truly random then the strategy shouldn't matter I think?

I agree.

@fredrikekre
Contributor Author

The task-local scratch data setup I get (in fact, we mention it in the docs). What I'm curious about is why you need decoupled chunking. By assigning N chunks to T tasks you effectively bundle ("chunk") again: One bundle being the chunks that each task processes. Couldn't you just as well have simply used one chunk per task (N == T) (which corresponds to the "bundle") instead?

Maybe I am misunderstanding you, but this wouldn't result in the workstealing behavior of the greedy scheduler, right?

@fredrikekre
Contributor Author

Anyway, this can probably be closed and certainly shouldn't hold up the new release. I was just curious.

@carstenbauer
Member

Maybe I am misunderstanding you, but this wouldn't result in the workstealing behavior of the greedy scheduler, right?

It wouldn't. It just isn't obvious to me in what cases the greedy scheduler performs better than :roundrobin.

Anyway, this can probably be closed and certainly shouldn't hold up the new release. I was just curious.

Yep, I think the current default split=:roundrobin makes sense for the greedy scheduler. Unless someone complains about it and, ideally, can provide some benchmarks, I don't see a reason to change the default.
