Utilise effect handlers #51
Thanks a lot for this, see comments inline.
The commit d023dc2 depends on ocaml-multicore/ocaml-multicore#693. With this feature, this program, in which an exception is thrown from await, prints a backtrace that includes both the awaiting and the awaited task.
|
There were a couple of valid concerns raised by @ctk21 about the current implementation:
|
I assume that once a future version of OCaml gets an effect type-system we'd be able to determine at compile time when a handler is missing. Meanwhile not sure whether there is a type-level solution that is not too invasive (don't want to force ending up with a monad or other solution that requires plumbing a parameter through everything). Perhaps a compromise would be to include a lightweight |
type _ eff += Wait : 'a promise * task_chan -> unit eff

let get_pool_data p =
  match Atomic.get p with
I think pool teardown could have several semantics (for example, a function that just waits for, or drains, the active tasks without causing exceptions to be raised might be useful). Once you reach teardown, though, it could go through the queue of all active tasks and "cancel" them by setting their result to a cancelled exception. Obviously it can't cancel any that are actively running, since the await might have some critical "finally" statements that clean up resources. If some kind of cancellation is implemented, it might allow removing this Atomic.get.
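The cancel-on-teardown idea in this comment could be sketched roughly as follows. This is a single-threaded illustration using only the standard library, not domainslib's implementation: the `Cancelled` exception, the `task` record, and the `async`, `teardown` and `result` helpers are all hypothetical names, and a real pool would need a thread-safe queue.

```ocaml
(* Hypothetical exception stored in promises that were cancelled at teardown. *)
exception Cancelled

(* A promise stays [None] until a worker, or teardown, fills it exactly once. *)
type 'a promise = ('a, exn) result option Atomic.t

(* A queued task hides its result type behind a run/cancel pair. *)
type task = { run : unit -> unit; cancel : unit -> unit }

(* First writer wins: returns [false] if the promise was already filled. *)
let fill p r = Atomic.compare_and_set p None (Some r)

(* Queue a computation and return the promise for its result. *)
let async (pool : task Queue.t) (f : unit -> 'a) : 'a promise =
  let p = Atomic.make None in
  let run () = ignore (fill p (try Ok (f ()) with e -> Error e)) in
  let cancel () = ignore (fill p (Error Cancelled)) in
  Queue.push { run; cancel } pool;
  p

(* Teardown: tasks that have not started yet get [Error Cancelled]. *)
let teardown (pool : task Queue.t) =
  Queue.iter (fun t -> t.cancel ()) pool;
  Queue.clear pool

(* Read a filled promise, re-raising a stored exception. *)
let result p =
  match Atomic.get p with
  | Some (Ok v) -> v
  | Some (Error e) -> raise e
  | None -> failwith "still pending"

let () =
  let pool = Queue.create () in
  let a = async pool (fun () -> 1) in
  let b = async pool (fun () -> 2) in
  (Queue.pop pool).run ();           (* [a] runs before teardown *)
  teardown pool;                     (* [b] never started, so it is cancelled *)
  assert (result a = 1);
  assert (match result b with _ -> false | exception Cancelled -> true)
```

With this shape, a waiter on a cancelled task sees an ordinary exception from the promise, so it would not need to inspect the pool's state itself; whether that is enough to drop the `Atomic.get` in the PR depends on details not shown here.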
Agreed. It would be useful to convert this into an issue that we tackle in a later PR.
I've tested this on my original program that failed previously, and it works now. (It is an implementation of
Interestingly the OCaml implementation spends a lot more time in userspace and a lot less time in kernelspace (and less time overall). With cold caches the benefits of parallelism become more obvious:
The speed of Lwt and Luv is rather disappointing though. With warm caches, and without even printing the filenames but just counting them, I get something that uses 185% CPU but barely achieves a speedup over the sequential find (this holds even if I implement Lwt.Unix.openat):
(I don't have a working Luv version at hand though). Once I've cleaned up my code I'll post it online for comparison; I was trying to keep the multicore and Lwt/Luv versions quite similar in implementation. It looks like having the ability to execute OCaml code on the same thread as soon as we get a result from the syscall makes a huge difference (otherwise a lot of time is spent sending jobs between threads and doing syscalls to put threads to sleep and wake them again).
@ctk21 wanted the results of the On
with GC stats:
On this branch:
with GC stats:
The effects version has lower latency than |
Is a valid way of thinking about this question that if a program needs "task-local" state, then:
I guess that supporting pinning is somewhat complex, and that it adds scheduling constraints that can reduce performance. On the other hand, if task-local state is needed, would you expect that it might be the most efficient implementation strategy? |
@edwintorok wrote:
Does your reimplementation of find use the |
On 1 November 2021 15:56:56 GMT, Anil Madhavapeddy ***@***.***> wrote:
***@***.*** wrote:
> It looks like having the ability to execute OCaml code as soon as we get a result from the syscall on the same thread makes a huge difference (otherwise a lot of time is spent sending jobs between threads and doing syscalls to put threads to sleep and wake them again).
Does your reimplementation of find use the `eio` library? That will use io_uring on Linux and should have very little syscall activity indeed. I'd be interested in seeing that _vs_ a direct syscall implementation.
Not yet, but it looks like `io_uring` has evolved and can now do `openat2` and `statx` calls too. It would be interesting to patch `eio` to support those and then try an eio/domainslib hybrid that uses domainslib tasks only for readdir (which has no `io_uring` equivalent yet).
resolved the conflicts against trunk and deprecation warnings |
Note that ocaml-uring and eio already support openat2: https://github.com/ocaml-multicore/ocaml-uring/blob/main/lib/uring/uring.mli#L106 Patches adding |
I've taken a closer look at the code and it looks reasonable to me.
My open-ended comments from earlier, as reported by KC, still stand, but I suspect we might be better served by merging and trying it out.
Purely administrative, but when we tag and release the next version of domainslib, we should bump the major version given the nature of the breaking changes.
Agreed, we should bump the major version owing to the significance of this change. You're right, Sandmark |
The domainslib API docs are already updated. Which docs are you referring to @Sudha247? |
Ah, I meant https://github.com/ocaml-multicore/parallel-programming-in-multicore-ocaml :) Also, after testing this PR with |
Thanks. Merging now. |
This PR uses effect handlers to create tasks. The use of effect handlers will fix the issue described in ocaml-multicore/ocaml-multicore#670 (comment). The new test `test_deadlock.ml` precisely captures the scenario explained in the comment. This test program (after removal of `T.run`) will deadlock on `domainslib.0.3.1` but runs to completion on this branch.

Unfortunately, this introduces a breaking change; the computations need to be enclosed in a `run` function due to the use of effect handlers. I am also yet to do performance benchmarking of this change.

This PR also makes domainslib work only with `4.12+domains` (and OCaml 5.00); the stdlib `EffectHandlers` module is not made available on `4.12.0+domains+effects`. In order to address this, the PR ocaml-multicore/ocaml-multicore#689 adds the `EffectHandlers` module to `4.12.0+domains+effects`.
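To illustrate why an effect-based `await` needs an enclosing `run`, here is a minimal single-domain sketch. It is not domainslib's code: it assumes the OCaml 5 stdlib `Effect` module (rather than the `EffectHandlers` module named above), and the `Async` and `Await` effects, the `promise` type, and this `run` are hypothetical names chosen for illustration. A task that awaits an unresolved promise is suspended as a continuation instead of blocking, and the scheduler keeps running other tasks.

```ocaml
(* Requires OCaml 5.x: uses the stdlib Effect module. *)
open Effect
open Effect.Deep

(* A promise is pending (with thunks to wake on resolution) or resolved. *)
type 'a state =
  | Pending of (unit -> unit) list
  | Done of 'a

type 'a promise = 'a state ref

(* Hypothetical effects, named for illustration only. *)
type _ Effect.t +=
  | Async : (unit -> 'a) -> 'a promise Effect.t
  | Await : 'a promise -> 'a Effect.t

let async f = perform (Async f)
let await p = perform (Await p)

(* Run [main] under a handler that schedules tasks on one run queue. *)
let run (main : unit -> 'a) : 'a =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let resolve p v =
    match !p with
    | Pending waiters ->
      p := Done v;
      List.iter (fun w -> Queue.push w ready) waiters
    | Done _ -> failwith "promise resolved twice"
  in
  let rec spawn : type a. a promise -> (unit -> a) -> unit = fun p f ->
    match_with f ()
      { retc = (fun v -> resolve p v);
        exnc = raise;  (* sketch only: a task's exception aborts the scheduler *)
        effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Async g ->
            Some (fun (k : (b, unit) continuation) ->
              let q = ref (Pending []) in
              Queue.push (fun () -> spawn q g) ready;
              continue k q)
          | Await q ->
            Some (fun (k : (b, unit) continuation) ->
              match !q with
              | Done v -> continue k v
              | Pending ws ->
                (* Suspend: resume this task once [q] is resolved. *)
                let wake () =
                  match !q with
                  | Done v -> continue k v
                  | Pending _ -> assert false
                in
                q := Pending (wake :: ws))
          | _ -> None) }
  in
  let result = ref (Pending []) in
  spawn result main;
  (* Drain the run queue until no task is runnable. *)
  while not (Queue.is_empty ready) do
    (Queue.pop ready) ()
  done;
  match !result with
  | Done v -> v
  | Pending _ -> failwith "main task never completed"

let () =
  let total = run (fun () ->
    let a = async (fun () -> 20) in
    let b = async (fun () -> await a + 1) in
    await a + await b + 1) in
  assert (total = 42);
  (* Outside [run] no handler is installed, so the effect is unhandled. *)
  match await (ref (Done 0)) with
  | _ -> assert false
  | exception Effect.Unhandled _ -> ()
```

The last lines show the breaking change in miniature: calling `await` outside `run` fails at run time with `Effect.Unhandled`, because nothing on the stack handles the effect.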