tcmur: improve batch kernel wake up notifications #392
Conversation
This is a compromise between the current implementation and the solution proposed in #359. It serializes the wake ups to the kernel but attempts to batch if multiple completions arrive while waking the kernel up. In my development environment, it improves 4K small IO from ~5K IOPS to ~8.5K IOPS.
Force-pushed from 3e54340 to 46141e3
Adding @lxbsz who just pinged me on this. You lucked out! The ceph team did it for gluster again :)
{
	struct tcmu_track_aio *aio_track = &rdev->track_queue;

	pthread_cleanup_push(_cleanup_mutex_lock, (void *)&aio_track->track_lock);
I don't think cleanup is necessary here. It's only necessary when there is a cancellation point in the critical section. My reading of POSIX (and the Linux man page) doesn't include assert() as a cancellation point, but I could be wrong... It may produce output, which could block (and hence should count as a cancellation point), but if it does, it will exit the application anyway, so it's probably moot.
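For illustration only (not from the patch), here is a minimal sketch of when the cleanup handler actually matters: it is needed when the critical section can reach a cancellation point such as pthread_cond_wait(), whereas assert() is not one. The helper body and the surrounding function are assumptions.

	#include <assert.h>
	#include <pthread.h>

	/* Assumed helper: unlock the mutex if the thread is cancelled
	 * while it holds the lock. */
	static void _cleanup_mutex_lock(void *arg)
	{
		pthread_mutex_unlock(arg);
	}

	/* Hypothetical critical section: the cleanup handler is needed
	 * because pthread_cond_wait() is a cancellation point; assert()
	 * is not. */
	static void wait_for_work(pthread_mutex_t *lock, pthread_cond_t *cond,
				  int *ready)
	{
		pthread_mutex_lock(lock);
		pthread_cleanup_push(_cleanup_mutex_lock, lock);

		while (!*ready)
			pthread_cond_wait(cond, lock);	/* cancellation point */

		assert(*ready == 1);			/* not a cancellation point */

		pthread_cleanup_pop(1);			/* run handler: unlocks */
	}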
Yeah, it's probably not needed since the daemon will crash if we hit the assert.
I think we are sometimes overly careful. We can do a clean up patch later.
In my test environment this patch increases performance about 3x and decreases latency about 3x. FIO pattern: --bs=4k --iodepth=32 --numjobs=4 --direct=1 --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio (--rw=randwrite, --rw=randread). tcmu-this-patch -> randread iops 19.1k, lat (usec): min=906, max=37268, avg=6657.18, stdev=1972.39
@mikechristie Yeah, this is what we really need, and thanks very much for @dillaman's great work. I will test this later.
Hi, all. I tested this patch with one size=1T LUN; 8K randwrite IOPS increased about 2x (randwrite: ~121 IOPS to ~243 IOPS). I will create more LUNs to stress the whole cluster and report back with the results.
Hi, I tested the performance with the patch and compared RBD 4k rand_write, RBD 4k rand_read, RBD 64k write, and RBD 64k read.
I have tested this based on the gluster-block product. With this patch, the IOPS is about 13% improved and the BW is about 10~20% improved.
@serjponomarev @Github641234230 I found your IOPS/BW are very high. What's the LUN size in your test? What is your cluster HW configuration? Since RBD is thin-provisioned, did you dd the LUN first before the test to make sure the space is really allocated?
@zhuozh LUN size = 100G. Yes, the space is really allocated for the RBD.
rbd info
fio 2.99
[global]
[randwrite]
@zhuozh in my tests: |
		tcmulib_processing_complete(dev);
		track_aio_wakeup_finish(rdev, &wake_up);
	}
}
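For context, here is a rough reconstruction of the completion path that the hunk above sits in, pieced together from the diff and the discussion below; the private-data accessor and the exact signatures are assumptions, not the patch itself.

	/* Rough reconstruction for context only; names and signatures
	 * are assumptions based on this thread, not the exact patch. */
	static void aio_command_finish(struct tcmu_device *dev,
				       struct tcmulib_cmd *cmd, int rc)
	{
		struct tcmur_device *rdev = tcmu_get_daemon_dev_private(dev);
		int wake_up;

		/* Order as originally posted: wake-up accounting first,
		 * ring update second -- the ordering questioned in the
		 * review comment that follows. */
		track_aio_request_finish(rdev, &wake_up);
		tcmur_command_complete(dev, cmd, rc);

		while (wake_up) {
			tcmulib_processing_complete(dev);
			track_aio_wakeup_finish(rdev, &wake_up);
		}
	}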
I think we could hit a race:

1. pending_wakeups = 0. thread1 calls track_aio_request_finish, pending_wakeups is incremented to 1, and wake_up=1 is returned. It does its call to tcmur_command_complete and then falls into the while loop and does tcmulib_processing_complete.
2. thread2 then calls track_aio_request_finish and increments pending_wakeups=2. wake_up=0 is returned, so it is relying on thread1 to call tcmulib_processing_complete.
3. thread1 then calls track_aio_wakeup_finish. It sees aio_track->pending_wakeups > 1, sets pending_wakeups=1, and returns wake_up=1.
4. thread1 is super fast, so we loop again and make the tcmulib_processing_complete call and the track_aio_wakeup_finish call, which should have completed the processing for thread2's cmd. pending_wakeups=1, so it is now set to 0 and wake_up=0 is returned. thread1 is now done. It returns.
5. thread2 finally calls tcmur_command_complete (or it could have been in the middle of calling it but not yet done updating the ring buffer). wake_up was 0 for it, so it does not go into the loop above and tcmulib_processing_complete is never called for it.

If the race analysis is correct, I think we just need to do something like the attached, where we hold the device completion lock while updating the aio track calls as well as doing the tcmulib_command_complete. aio_command_finish is just a little funky in that it straddles both the aio code and the tcmur cmd handler code, so my change here is a little gross.
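To make the steps above easier to follow, here is a hedged sketch of how the two tracking helpers appear to behave based on this thread; the bodies are reconstructed, not quoted from the patch, and the cleanup handlers shown in the earlier hunk are omitted for brevity.

	/* Sketch only: field and function names follow the diff, but the
	 * bodies are inferred from the discussion. */
	static void track_aio_request_finish(struct tcmur_device *rdev, int *wake_up)
	{
		struct tcmu_track_aio *aio_track = &rdev->track_queue;

		pthread_mutex_lock(&aio_track->track_lock);
		/* Only the completion that takes pending_wakeups from 0 to 1
		 * fires a wake up; later completions piggy-back on it. */
		*wake_up = (++aio_track->pending_wakeups == 1);
		pthread_mutex_unlock(&aio_track->track_lock);
	}

	static void track_aio_wakeup_finish(struct tcmur_device *rdev, int *wake_up)
	{
		struct tcmu_track_aio *aio_track = &rdev->track_queue;

		pthread_mutex_lock(&aio_track->track_lock);
		if (aio_track->pending_wakeups > 1) {
			/* More completions arrived while the kernel was being
			 * woken up: batch them into one more wake up. */
			aio_track->pending_wakeups = 1;
			*wake_up = 1;
		} else {
			aio_track->pending_wakeups = 0;
			*wake_up = 0;
		}
		pthread_mutex_unlock(&aio_track->track_lock);
	}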
Couldn't the calls to tcmur_command_complete and track_aio_request_finish be swapped?
> Couldn't the calls to tcmur_command_complete and track_aio_request_finish be swapped?

Lol, yeah that is much better! My eyes were probably going cross-eyed :)
OK -- swapped the two function calls.
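A short sketch of the reordered tail after the swap (same assumed names as in the reconstruction above): completing the command in the ring before doing the wake-up accounting means a concurrent waker can no longer flush the ring ahead of this command's completion.

	/* Sketch of the agreed ordering: ring update first, then wake-up
	 * accounting (assumed names, not the literal patch). */
	tcmur_command_complete(dev, cmd, rc);
	track_aio_request_finish(rdev, &wake_up);

	while (wake_up) {
		tcmulib_processing_complete(dev);
		track_aio_wakeup_finish(rdev, &wake_up);
	}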
Only fire a single wake up concurrently. If additional AIO completions arrive while the kernel is being woken up, the wake up will fire again. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Force-pushed from 46141e3 to 4c873a2
Thanks.