chore: schedule chains #3819
Conversation
Questionable 🤔🤔🤔 I don't understand the technical motivation behind it. We have the same amount of inter-thread coordination (if not more); it's just an MPSC queue embedded in an MPSC queue.
We should ask why it is costly to use the shard set queue. Dynamic dispatch? Heavy functor objects? Object access times? What if we use variant<ScheduleContext, Functor> in the shard set?
src/server/transaction.cc
Outdated
```cpp
struct ScheduleQ {
  base::MPSCIntrusiveQueue<ScheduleContext> queue;

  static constexpr size_t kSz = sizeof(queue);

  char pad1[64];
  atomic_bool armed{false};
  char pad2[60];
};
```
🤓

```cpp
alignas(64) MPSC queue;
alignas(64) atomic_bool armed;
```
but instead of 64 there is a better option: std::hardware_destructive_interference_size
yes, but this is just an awful name that translates to 64 anyway
in the future it can be 128, so the awful name is better than a hard-coded constant
you are right, my answer was emotional 😺
src/server/transaction.h
Outdated
```cpp
static void ScheduleBatchInShard();
```
Please comment on what it does and what the motivation is; we'll be puzzled in a few months
```cpp
}

// of shard_num arity.
ScheduleQ* schedule_queues = nullptr;
```
unique_ptr
does not provide any value since we must delete manually anyway.
You are asking a good question - why use an intrusive queue and not a regular queue. I do not know which is more efficient, but using a variant would require factoring out FiberQueue from helio, so it's more work.
FiberQueue uses
What you call batching and locality are purely code-driven concepts. Technically nothing has changed - you executed the task 10 times sequentially before as well, just each time calling a functor instead of the function itself directly. Instead of checking the shard queue atomics you check your new queue atomics, that's it.

```
pop one job on shard set {
  for i = 1, 10 {
    Run()
  }
}
```

versus

```
for i = 1, 10 {
  pop job from shard set {
    Run()
  }
}
```
So we can plan it first on a higher level. Let's revise what we use the shard set queues for. Structured information is always better than generalized (functors), and our execution flow is always better off structured than hidden. So I assume that to parallelize memory access one day you intend to:

```
vector<ScheduleCntx> batch = next_ten_readonly();
vector<vector<it>> keys;
for (cntx : batch)
  keys += fetch_iterators()
for (cntx : batch)
  run(cntx, keys[i])
```
Yes, this is the plan. In any case I am in no hurry to submit this PR; it can stay as food for thought
Let's merge; it's just that you reveal your plans step by step whereas we're curious about the final plan
Use an intrusive queue that allows batching of scheduling calls instead of handling each call separately. This optimization improves latency and throughput by 3-5%. In addition, we expose batching statistics in the info transaction block. Signed-off-by: Roman Gershman <roman@dragonflydb.io>
It's really strange because it's C++17
Support for C++17 changes from compiler version to version, and the same holds for all C++ releases. For example, you got modules in C++20 - are they properly supported in recent versions of gcc and clang? I don't think so. Basically, every new compiler release just adds support for the missing features, so even if you compile something with the C++17 flag, it doesn't necessarily mean the feature exists.
hardware_destructive_interference_size does not exist in the gcc versions we use, so I cannot use it. I added a named const for that. PTAL
We use absl::hardware_something in a few places 🙂
Let's merge and see what we make out of it later