Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stronger memory load acquire when reading the wakeup flag #219

Closed
jorangreef opened this issue Oct 2, 2020 · 10 comments
Closed

stronger memory load acquire when reading the wakeup flag #219

jorangreef opened this issue Oct 2, 2020 · 10 comments

Comments

@jorangreef
Copy link
Contributor

@axboe, The man page for io_uring_setup.2 recently added the rationale for using memory load acquire semantics when checking sq_ring->flags:

/*
 * Ensure that the wakeup flag is read after the tail pointer
 * has been written. It's important to use memory load acquire
 * semantics for the flags read, as otherwise the application
 * and the kernel might not agree on the consistency of the
 * wakeup flag.
 */
unsigned flags = atomic_load_relaxed(sq_ring->flags);

However, looking closer, just after the comment, the actual semantics being used in the example above are atomic_load_relaxed instead of atomic_load_acquire?

The same might also be the case in queue.c for the implementation of sq_ring_needs_enter(), which uses IO_URING_READ_ONCE(*ring->sq.kflags).

To avoid I/O starvation, maybe a stronger barrier such as acquire is needed here instead of relaxed?

@jorangreef
Copy link
Contributor Author

If you agree, may I submit a PR to fix the man page and queue.c?

@romange
Copy link
Contributor

romange commented Oct 2, 2020

I think the comment talks about something else. The barrier happens at __io_uring_flush_sq : io_uring_smp_store_release(sq->ktail, ktail);

@jorangreef
Copy link
Contributor Author

@jorangreef
Copy link
Contributor Author

jorangreef commented Oct 2, 2020

@romange See the commit in #194 where the rationale for the need to guard flags when checking for wakeup is discussed.

@romange
Copy link
Contributor

romange commented Oct 3, 2020

@jorangreef Not trying to argue or anything - I am really not in expert in memory ordering.. Also I do not know io_uring code so I could be easily wrong. I think there is a bigger issue here and it's not related to barrier...

  1. atomic variables are always synchronized, even with relaxed semantics. Barriers are needed to formally define how data is synchronized before and after they are read or write.

2.What confused me here is that this code:

 if (load(*kflags, relaxed) == IORING_SQ_NEED_WAKEUP) 
  do_system_call(IORING_ENTER_SQ_WAKEUP);

should not require any ordering since it does not depend on any data besides kflags. But I also think you are correct and there is a bug here - and some notifications could be missed.

  1. I assume that in the kernel side (and really I do not know since I could not find a clear code snippet that proves my point):
    there is a SQ thread that does this:
if (too_much_pointless_spinning) {
 /* () */
*kflags |= IORING_SQ_NEED_WAKEUP; 
 mem_barrier_release();
 suspend_myself()
} 

which signals user thread to waken him. However, this could happen:
a. Kernel thread would preempt before changing *kflags but after deciding it needs to go to sleep (at /* ()*/ line)
b. A user thread reads *kflags before kernel thread mutated the flags.
c. A kernel thread goes to sleep and a user thread does not pass IORING_SQ_NEED_WAKEUP.

I am assuming that the kernel can not guarantee atomicity (i.e. transactional semantics) of multiple statements with respect to userland threads but maybe I am wrong. But If I am not wrong then the problem with the interface and not with memory barriers.

There is a wonderful lock-free algorithm that is called event-count that can solve this issue but it requires more that just a bit in an integer.

@romange
Copy link
Contributor

romange commented Oct 3, 2020

Specifically, the question is whether a userland thread can run when ctx->completion_lock is taken.

https://github.com/torvalds/linux/blob/d3d45f8220d60a0b2aaaacf8fb2be4e6ffd9008e/fs/io_uring.c#L6549

if it can - we have a bug that requires a different algorithm than just atomic.set/get

@axboe
Copy link
Owner

axboe commented Oct 3, 2020

The wakeup flag is ordered with the SQ ring kernel tail read, so as far as I can tell the logic should be sound. The kernel checks if it should go to sleep after setting IORING_SQ_NEED_WAKEUP and doesn't if the SQ ring has entries. The app/liburing side flushes entries to the SQ ring, then checks flags for needing to enter.

@romange
Copy link
Contributor

romange commented Oct 3, 2020

You are right, I missed the piece where kernel checks the tail. So the ordering is around the tail which synchronizes both threads.
In that case relaxed load should suffice for kflags.

@axboe
Copy link
Owner

axboe commented Oct 3, 2020

Thanks for double checking.

@axboe axboe closed this as completed Oct 3, 2020
@jorangreef
Copy link
Contributor Author

jorangreef commented Oct 4, 2020

This has been an interesting thought exercise.

Is this comment in the man page still correct then?

/*
 * Ensure that the wakeup flag is read after the tail pointer
 * has been written. It's important to use MEMORY LOAD ACQUIRE
 * semantics for the flags read, as otherwise the application
 * and the kernel might not agree on the consistency of the
 * wakeup flag.
 */

I think this should then be changed to MEMORY LOAD RELAXED?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants