[GIT PULL] liburing/io_uring_for_each_cqe: Pull load acquire out of for loop #1249
Conversation
The io_uring_for_each_cqe macro is used to efficiently iterate over all currently available CQEs. It allows amortizing the cost of the store_release on the head across all CQEs processed via this macro. This change also amortizes the load acquire on the tail by pulling the load out of the for loop.

Signed-off-by: Constantin Pestka <constantin.pestka@c-pestka.de>
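To make the amortization concrete, here is a usage sketch (not part of the patch): io_uring_for_each_cqe() and io_uring_cq_advance() are existing liburing calls, while handle_cqe() is a hypothetical application callback.

#include <liburing.h>

void handle_cqe(struct io_uring_cqe *cqe);	/* hypothetical application handler */

static void drain_cq(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	unsigned head;
	unsigned seen = 0;

	/* Iterate every CQE that is currently available. */
	io_uring_for_each_cqe(ring, head, cqe) {
		handle_cqe(cqe);
		seen++;
	}
	/* A single store_release on the CQ head covers the whole batch. */
	io_uring_cq_advance(ring, seen);
}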
I think this is fine to do, it'll iterate anything that was already available by the time the loop starts.
 	for (head = *(ring)->cq.khead; \
-	(cqe = (head != io_uring_smp_load_acquire((ring)->cq.ktail) ? \
+	(cqe = (head != __liburing_internal_cached_tail ? \
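For context, a rough reconstruction of what the proposed macro could have looked like, assuming the cached tail is emitted as a declaration ahead of the for loop; the CQE array indexing is simplified, and this is a sketch based on the visible diff, not the actual patch.

#define io_uring_for_each_cqe(ring, head, cqe)					\
	unsigned __liburing_internal_cached_tail =				\
		io_uring_smp_load_acquire((ring)->cq.ktail);			\
	for (head = *(ring)->cq.khead;						\
	     /* compare against the cached tail instead of re-loading it */	\
	     (cqe = (head != __liburing_internal_cached_tail ?			\
			&(ring)->cq.cqes[head & (ring)->cq.ring_mask] : NULL)); \
	     head++)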
if (false)
	io_uring_for_each_cqe() { ... }

Try it, the for_each will still get run. It's changing behaviour in a nasty way, we can't have this version.
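A standalone illustration of the hazard, using a hypothetical macro and variable names rather than liburing code: when a macro expands to a setup statement followed by a for loop, an unbraced if guards only the setup statement, and the loop itself runs unconditionally.

#include <stdio.h>

static int cached_limit = 2;	/* pretend an earlier iteration left this set */

/* Hypothetical macro with the problematic shape: a statement before the for. */
#define broken_for_each(i, limit)			\
	cached_limit = (limit);				\
	for ((i) = 0; (i) < cached_limit; (i)++)

int main(void)
{
	int i;

	if (0)				/* guards only "cached_limit = 5;" */
		broken_for_each(i, 5) {
			/* The for loop escapes the if and runs with the stale
			 * cached_limit, printing two lines. */
			printf("iteration %d\n", i);
		}
	return 0;
}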
Oh. Yeah, that is nasty. Having two initializations in the for loop instead does not work here either -,-
Ah indeed, thanks Pavel. I'll revert it.
This reverts commit cb02d22. This wasn't fully baked, see: #1249 (comment) for details. Signed-off-by: Jens Axboe <axboe@kernel.dk>
One load would be ideal; I don't think there's any issue with only loading it upfront rather than repeatedly in the loop. There was never any guarantee on how CQEs are found, it only really guarantees that CQEs that were available at the time of the call will be iterated. E.g. the app does a wait_nr(N) and then iterates. And depending on ring setup mode, there's no way that events will show up DURING the loop, unless an old school setup is used (e.g. signal notifications for task_work).
Agree, that should be fine. It depends on timings anyway, and it might happen that we get a new batch only after finishing the "for each".
Idea 1:

for (int cached_tail = ({ head = ring.head; load_acquire(ring.tail); }); ...)

Do we require compilers to support statement expressions in liburing?

Idea 2:

struct iter {
	int head, tail;
};

for (struct iter it = { ..., ... }; ...)

In which case you don't even need to pass head, so it might even be a separate helper or just a dummy parameter.
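A minimal sketch of Idea 1, assuming GNU statement expressions and the internal field names visible in the diff above; the name io_uring_for_each_cqe_cached and the cqes indexing are illustrative, not the actual liburing macro.

#define io_uring_for_each_cqe_cached(ring, head, cqe)				\
	for (unsigned __liburing_cached_tail = ({				\
			/* seed the caller's head, then yield the tail */	\
			(head) = *(ring)->cq.khead;				\
			io_uring_smp_load_acquire((ring)->cq.ktail);		\
	     });								\
	     (cqe = ((head) != __liburing_cached_tail ?				\
			&(ring)->cq.cqes[(head) & (ring)->cq.ring_mask] :	\
			NULL));							\
	     (head)++)

Because everything lives inside the for statement, an unbraced "if (cond) io_uring_for_each_cqe_cached(...) { ... }" behaves as expected.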
I like idea 1, as it doesn't require a new helper for this. In terms of compiler support, that kind of construct has been used in the kernel for 20+ years, so I don't think we have to worry about that.
Not particularly a problem, you can always gracefully ignore an argument.

That's if we follow kernel requirements; we're not doing MSVC, but I was still thinking about cases where the user might want some niche compiler. Maybe some heterogeneous computing like CUDA, or, back when BPF was in discussion, I was thinking about making the header shareable with a libbpf program (not sure if it supports that).
Sure, but then folks would need to opt in. I hear your point on more esoteric compilers. How about this, and please stay close to the eye wash station:

which should result in identical code to just not passing in a head iterator to begin with.
I don't understand, the user should never use the value of head.

#define for_each(ring, head, cqe) for_each_no_head(ring, cqe)
Or we can go with whichever option and see if anyone complains. Though it reminds me that someone once tried to compile it as strict C.
Any would do IMHO. nit: I don't think you need to assign
Just trying to avoid needing to do another helper... We have to assign
I thought it was a bit odd that the macro repeatedly does the atomic load. While the atomic store following the loop can already be amortized by the caller, I don't really see why one wouldn't want to do the same for the load, as is already done in io_uring_peek_batch_cqe(). Maybe I'm missing something.
This pulls the load out of the loop into a local. Not sure if this is acceptable. I gave it a long ugly name to avoid name collisions, but still. Maybe making a new macro would be better.
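For comparison, a sketch of the io_uring_peek_batch_cqe() approach mentioned above, which performs the tail load once per call; handle_cqe() is again a hypothetical application handler.

#include <liburing.h>

void handle_cqe(struct io_uring_cqe *cqe);	/* hypothetical application handler */

static void drain_batch(struct io_uring *ring)
{
	struct io_uring_cqe *cqes[64];
	unsigned n, i;

	/* Fills the array with up to 64 currently available CQEs. */
	n = io_uring_peek_batch_cqe(ring, cqes, 64);
	for (i = 0; i < n; i++)
		handle_cqe(cqes[i]);
	/* One head advance (store_release) for the whole batch. */
	io_uring_cq_advance(ring, n);
}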
git request-pull output:
Pull Request Guidelines
To make it easy to filter pull request email notifications, use
[GIT PULL] as a prefix in your PR title.

Commit message format rules:

Add a Signed-off-by tag with your real name and email. For example:

The description should be word-wrapped at 72 chars. Some things should
not be word-wrapped. They may be some kind of quoted text - long
compiler error messages, oops reports, Link, etc. (things that have a
certain specific format).
Note that all of this goes in the commit message, not in the pull
request text. The pull request text should introduce what this pull
request does, and each commit message should explain the rationale for
why that particular change was made. The git tree is canonical source
of truth, not github.
Each patch should do one thing, and one thing only. If you find yourself
writing an explanation for why a patch is fixing multiple issues, that's
a good indication that the change should be split into separate patches.
If the commit is a fix for an issue, add a Fixes tag with the issue URL.
Don't use GitHub anonymous email like this as the commit author:
Use a real email address!
Commit message example:
By submitting this pull request, I acknowledge that: