chore: schedule chains #3819

Merged: 1 commit merged into main on Oct 11, 2024
Conversation

@romange (Collaborator) commented Sep 28, 2024

Use an intrusive queue that allows batching of scheduling calls instead of handling each call separately. This optimization improves latency and throughput by 3-5%.

@romange romange requested a review from dranikpg September 29, 2024 09:38
dranikpg previously approved these changes Sep 29, 2024

@dranikpg (Contributor) left a comment:

Questionable 🤔🤔🤔 I don't understand the technical motivation behind it. We have the same amount of inter-thread coordination (if not more); it's just an mpsc queue embedded in an mpsc queue.

We should ask why it is costly to use the shard set queue. Dynamic dispatch? Heavy functor objects? Object access times? What if we use variant<ScheduleContext, Functor> in the shard set?
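One way to read the `variant<ScheduleContext, Functor>` suggestion is the following hypothetical sketch (made-up names, not the PR's code): structured items carry plain data and skip type erasure entirely, while the functor alternative keeps the generic fallback.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the transaction scheduling context.
struct ScheduleContext {
  int txid;
};

// One shard-queue item: either a structured schedule request or a generic
// functor, as suggested above. Names here are illustrative only.
using ShardTask = std::variant<ScheduleContext, std::function<void()>>;

// Dispatch returns the txid for the structured path, -1 for the functor path.
int Dispatch(const ShardTask& task) {
  return std::visit(
      [](const auto& t) -> int {
        using T = std::decay_t<decltype(t)>;
        if constexpr (std::is_same_v<T, ScheduleContext>) {
          return t.txid;  // structured fast path: no type erasure involved
        } else {
          t();  // generic functor path: type-erased indirect call
          return -1;
        }
      },
      task);
}
```

The structured alternative avoids the heap allocation and indirect call that a `std::function`-like functor can incur, which is exactly the cost question raised above.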

Comment on lines 106 to 104
struct ScheduleQ {
base::MPSCIntrusiveQueue<ScheduleContext> queue;

static constexpr size_t kSz = sizeof(queue);

char pad1[64];
atomic_bool armed{false};
char pad2[60];
};
Contributor:

🤓

alignas(64) MPSC queue;
alignas(64) atomic_bool armed;

Contributor:

but instead of 64 there is a better option: std::hardware_destructive_interference_size

Collaborator Author:

yes, but this is just an awful name that translates to 64 anyway

Contributor:

in the future it can be 128, so the awful name is better than a hard-coded const

Collaborator Author:

you are right, my answer was emotional 😺

Comment on lines 520 to 522

static void ScheduleBatchInShard();
Contributor:

Please comment on what it does and what the motivation is; we'll be puzzled in a few months

src/server/transaction.cc (resolved)
}

// of shard_num arity.
ScheduleQ* schedule_queues = nullptr;
Contributor:

unique_ptr

Collaborator Author:

does not provide any value, since we must delete manually anyway.

romange commented Sep 29, 2024

  1. Batching of code - it allows locality. I have not yet measured metrics, but the loop may process 10 items or more in a batch.
  2. More importantly - it will allow us to try to prefetch key memory in Dashtable when processing operations. Before this change it was not possible to prefetch memory for multiple ops, because all of them were executed individually.

You are asking a good question - why use an intrusive queue and not a regular queue? I do not know which is more efficient, but using a variant would require factoring FiberQueue out of helio, so it's more work.

romange commented Sep 29, 2024

FiberQueue uses mpmc_bounded_queue, which notifies a consumer. mpsc is fully lock-free - it does not allow blocking.
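For reference, the push/arm/drain interplay described in this thread might look roughly like the following single-threaded sketch. A `std::deque` stands in for `base::MPSCIntrusiveQueue`, and all names are hypothetical rather than the PR's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <vector>

// Minimal sketch of the "arm once, drain in batches" pattern.
struct ScheduleQ {
  std::deque<int> queue;          // pending ScheduleContext ids (stand-in)
  std::atomic_bool armed{false};  // set by the first producer after idle
};

// Producer side: enqueue, and request a shard wake-up only on the
// false -> true transition of `armed`.
bool Push(ScheduleQ& q, int txid) {
  q.queue.push_back(txid);
  bool expected = false;
  return q.armed.compare_exchange_strong(expected, true);
}

// Consumer side (a ScheduleBatchInShard analogue): disarm, then drain
// everything queued so far in one batch, enabling locality.
std::vector<int> DrainBatch(ScheduleQ& q) {
  q.armed.store(false);
  std::vector<int> batch;
  while (!q.queue.empty()) {
    batch.push_back(q.queue.front());
    q.queue.pop_front();
  }
  return batch;
}
```

The point of the compare-exchange on `armed` is that only the first producer after an idle period pays for waking the shard; later pushes just enqueue, and the consumer then processes the whole backlog in one batch.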

@dranikpg (Contributor):

> Batching of code - allows locality. I have not yet measured metrics but the loop may process 10 items and more in a batch

What you call batching and locality are purely code-driven concepts. Technically nothing has changed - you executed the task 10 times sequentially before, just calling a functor each time instead of calling the function itself directly. Instead of checking the shard queue atomics you check your new queue atomics, that's it.

pop one job on shard set {
  for i = 1, 10 {
     Run()
  }
}

and

for i = 1, 10 {
  pop job from shard set {
    Run()
  }
}

> More importantly - it will allow us to try and prefetch keys memory in Dashtable when processing operations. So before that it was not possible to prefetch memory for multiple ops because all of them were executed individually.

So we can plan it first at a higher level. Let's revise what we use the shard set queues for. Structured information is always better than generalized (functors), and our execution flow is always better off structured than hidden. So I assume that to parallelize memory access one day you intend to:

vector<ScheduleCntx> batch = next_ten_readonly();
vector<vector<it>> keys;
for (cntx: batch):
  keys += fetch_iterators()
for (cntx: batch)
  run(cntx, keys[i])
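A concrete rendering of the prefetch-then-run plan sketched above, using the GCC/Clang `__builtin_prefetch` builtin. This is a hypothetical illustration; none of these names come from the PR or from Dashtable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a hash-table entry a transaction would touch.
struct Entry {
  long value;
};

// Two-phase batch execution: first issue prefetches for every key the batch
// will touch, then execute; by then the cache lines are hopefully resident.
long RunBatch(const std::vector<Entry*>& batch) {
  for (const Entry* e : batch) {
    __builtin_prefetch(e, /*rw=*/0, /*locality=*/3);  // read, keep in cache
  }
  long sum = 0;
  for (const Entry* e : batch) {
    sum += e->value;  // the actual per-op work goes here
  }
  return sum;
}
```

The win over per-op execution is that the memory latency of the N lookups overlaps instead of being paid serially, which is exactly what individually executed ops cannot do.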


romange commented Sep 29, 2024

Yes, this is the plan. In any case, I am not in a hurry to submit this PR; it can stay as food for thought.

dranikpg previously approved these changes Sep 30, 2024

dranikpg commented Oct 1, 2024

> In any case I am not in hurry to submit this Pr

Let's merge; it's just that you reveal your plans step by step, whereas we're curious about the final plan.

Use an intrusive queue that allows batching of scheduling calls instead of handling each call separately.
This optimization improves latency and throughput by 3-5%.
In addition, we expose batching statistics in the info transaction block.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

romange commented Oct 11, 2024

hardware_destructive_interference_size does not exist in the gcc versions we use, so I cannot use it. I added a named const for that. PTAL

@BorysTheDev (Contributor):

> hardware_destructive_interference_size does not exist in the gcc versions we use so I can not use it. I added a named const for that. PTAL

It's really strange, because it's C++17.

@kostasrim (Contributor):

> hardware_destructive_interference_size does not exist in the gcc versions we use so I can not use it. I added a named const for that. PTAL

> It's really strange because it's c++17

Support for C++17 varies from compiler version to version - the same holds for all C++ releases. For example, you got modules in C++20: are they properly supported in recent versions of gcc and clang? I don't think so. Basically, every new release just adds support for the missing features, so even if you compile something with the C++17 flag, it doesn't necessarily mean the feature exists.
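Because of this, availability of the constant is signalled by its own library feature-test macro rather than by the `-std=c++17` flag alone. A hedged sketch of the fallback pattern (the constant name `kCacheLineSize` and the `Padded` struct are illustrative, not the PR's code):

```cpp
#include <cassert>
#include <cstddef>
#include <new>  // defines hardware_destructive_interference_size if supported

// Older libstdc++ versions compile with -std=c++17 yet do not ship the
// constant, so probe the feature-test macro and fall back to a named const.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLineSize =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLineSize = 64;  // common x86-64 cache line
#endif

// Example use: keep hot atomics on their own cache line to avoid false
// sharing, as discussed for ScheduleQ above.
struct alignas(kCacheLineSize) Padded {
  int value;
};
```

This keeps the "named const" approach from the PR while picking up the standard value automatically on toolchains that provide it.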

@dranikpg (Contributor) left a comment:

> hardware_destructive_interference_size does not exist in the gcc versions we use so I can not use it. I added a named const for that. PTAL

We use absl::hardware_something in a few places 🙂


Let's merge and see what we make out of it later

@romange romange merged commit 5d2c308 into main Oct 11, 2024
12 checks passed
@romange romange deleted the ScheduleChains branch October 11, 2024 19:41
4 participants