RFC/WIP : Extensible remote synchronization objects #5817
The last update makes RemoteRef a parametric type that now carries information about the type of object it refers to, i.e. RemoteRef{T} where
Using the latest update:
A |
As a simple example of using tspaces, DB queries could be executed out-of-process (since the typical implementation ends up blocking the main Julia thread) with something like:
As a test, I tried out 10,000 concurrent tasks waiting on 8 external workers to finish an equal number of tasks; code: https://gist.github.com/amitmurthy/9089918. The scheduler didn't miss a beat - sailed through!
Folks, would like some feedback on this functionality. cc: @JeffBezanson
I have some rather general reservations about timeout/expiration-based interfaces. They strike me as never what you really want. Let me explain. In this case, why not simply remember how old each entry is and eject the oldest entries to make room for newer ones only when you need to? Is there any advantage to ejecting entries after a fixed amount of time?
Likewise, when doing network programming with timers, rather than setting somewhat arbitrary timeouts at each level, what you really want is the ability to interrupt tasks from the outside. Consider an I/O task with six steps that you want to ensure the completion of in under a minute. If you put a timeout of 10 seconds on each step, that certainly ensures that the whole thing completes in under a minute. But what if one of the steps takes 11 seconds and the other steps finish in under a second? The total time would be no more than 16 seconds, but it will still time out. To allow every possible way of finishing the task in under a minute, you have to give the first step a timeout of 60 seconds, then give the next step a timeout of whatever remains of the total 60 seconds, and so on. This is doable, but it's a huge pain: the programmer spends a huge amount of their code dealing with timeouts that just aren't the most important thing.
It would be much nicer to have a supervising task that can simply wait 60 seconds while the I/O task does its thing, completely ignorant of timeouts. If the I/O task isn't done after 60 seconds, the supervising task just kills it and deals with the fact that it took too long. This is a vastly better API: the main I/O code includes no mention of any timeouts. Moreover, separating the timeout logic from the main I/O code means that you can add the timeout handling easily after the fact, instead of having to go through the main I/O logic and retrofit it with all this tedious timeout logic.
To me, timeouts and expirations vs. interruptible tasks is very much like C-style error handling vs. exceptions, except with possibly an even bigger usability difference. I know this expiration business is not quite I/O timeout programming, so this rant is slightly off-topic, but this has been weighing on my mind for quite a while, so I thought I'd express it.
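To make the six-step arithmetic above concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Julia and not part of this PR). The function names `passes_per_step` and `passes_shared_budget` are hypothetical, and the step durations are the ones from the example, with 1 second standing in for "under a second".

```python
def passes_per_step(durations, step_timeout):
    """True if every step individually finishes within step_timeout seconds."""
    return all(d <= step_timeout for d in durations)


def passes_shared_budget(durations, total_budget):
    """True if the steps fit when each one gets whatever remains of one budget."""
    remaining = total_budget
    for d in durations:
        if d > remaining:
            return False  # this step would overrun what is left of the budget
        remaining -= d
    return True


# Six steps: one takes 11 s, the other five take 1 s each (16 s in total).
durations = [11.0, 1.0, 1.0, 1.0, 1.0, 1.0]

per_step = passes_per_step(durations, step_timeout=10.0)      # 11 s > 10 s
shared = passes_shared_budget(durations, total_budget=60.0)   # 16 s <= 60 s
```

With these numbers the per-step scheme rejects a run that finishes well inside the minute, while the shared budget accepts it.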
I very much agree with @StefanKarpinski here. And we already have the ability to do what he describes; a supervisor task can sleep for 60 seconds, and if the target task is still waiting it can be woken with an exception.
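The supervisor pattern described here can be sketched as follows. This is an illustration in Python's `asyncio`, not Julia code and not this PR's API: `io_task` and `supervise` are hypothetical names, and `Task.cancel()` stands in for waking the target task with an exception. The time limits are shortened so the example runs quickly.

```python
import asyncio


async def io_task(delay):
    # The worker contains no timeout logic at all; it just does its work.
    await asyncio.sleep(delay)
    return "done"


async def supervise(coro, limit):
    """Run coro as a task and interrupt it from outside after limit seconds."""
    task = asyncio.create_task(coro)
    done, _pending = await asyncio.wait({task}, timeout=limit)
    if task in done:
        return task.result()
    # Too slow: raise CancelledError inside the worker at its current await.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "timed out"


fast = asyncio.run(supervise(io_task(0.01), limit=1.0))
slow = asyncio.run(supervise(io_task(5.0), limit=0.05))
```

The timeout handling lives entirely in `supervise`, so it can be added to or removed from a call site without touching `io_task`.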
Two things come to mind when building a job distribution engine using a
Since none of the
Not really. Consider a web front-end in Julia that spawns off a task for each request (each task then spawns its dependencies). All we do is set a 60-second timeout at each step, including the outer one. The user at the browser gets a nice error message within 60 seconds, while no sub-task waits forever on the server; each cleans itself up within 60 seconds.
This would be a great thing to do, but does not preclude supporting timeouts in
This is an implementation detail. For a web front-end we could literally have hundreds of concurrent tasks. We don't want an API where the programmer first has to register with a supervisor, giving the required timeout value, instead of specifying it in the wait call itself. You would want a single supervisor task supervising all such concurrent tasks in the system, so sleep 60 won't do; the supervisor task will have to check every second and wake up the timed-out tasks. This is an implementation detail - single supervisor vs timers, but the
Would also like your thoughts on the general idea of this PR, i.e., user-definable synchronization types, as has been done to construct
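The "single supervisor vs timers" trade-off above can be made concrete with a small sketch. It is written in Python purely for illustration and does not describe how Julia's scheduler or this PR works. `DeadlineSupervisor` is a hypothetical class, and times are plain numbers passed in by the caller rather than read from a clock. It shows one possible design in which a single supervisor keeps all deadlines in a min-heap, so it knows the earliest time at which it needs to wake up.

```python
import heapq


class DeadlineSupervisor:
    """Tracks many waiting tasks and reports those whose deadline has passed."""

    def __init__(self):
        self._heap = []       # entries are (deadline, task_id), earliest first
        self._active = set()  # task ids that are still waiting

    def register(self, task_id, now, timeout):
        heapq.heappush(self._heap, (now + timeout, task_id))
        self._active.add(task_id)

    def complete(self, task_id):
        # The task finished on its own; its heap entry is discarded lazily.
        self._active.discard(task_id)

    def next_wakeup(self):
        """Earliest deadline among still-waiting tasks, or None if there is none."""
        while self._heap and self._heap[0][1] not in self._active:
            heapq.heappop(self._heap)
        return self._heap[0][0] if self._heap else None

    def expire(self, now):
        """Remove and return the ids of waiting tasks whose deadline is <= now."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _deadline, task_id = heapq.heappop(self._heap)
            if task_id in self._active:
                self._active.discard(task_id)
                expired.append(task_id)
        return expired


sup = DeadlineSupervisor()
sup.register("a", now=0, timeout=60)
sup.register("b", now=10, timeout=60)
sup.register("c", now=20, timeout=5)
sup.complete("c")                 # "c" finished before its deadline

first_wakeup = sup.next_wakeup()  # deadline of "a"
expired = sup.expire(now=65)      # only "a" has run out of time
later_wakeup = sup.next_wakeup()  # deadline of "b"
```

Whether the caller passes the timeout to a wait call or registers it explicitly is a separate API question; the sketch only covers the bookkeeping behind it.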
Closing, as the same functionality has been implemented in the package https://github.com/amitmurthy/MUtils.jl
This PR replaces #5791 and makes it possible to extend the types of process synchronization facilities that can be provided.
Consequently, channels(), tspaces() and kvspaces() as implemented previously in #5791 are now available as an external package - https://github.com/amitmurthy/SyncObjects.jl .
The user needs to provide a concrete implementation of AbstractRemoteSyncObj with so, an object of type SyncObjData, and the following callbacks: cantake, canput, fetch, put, take and query.
For example, the type RemoteChannel, used to provide channel functionality, is defined in https://github.com/amitmurthy/SyncObjects.jl .
The actual remote object is created using a new method syncobj_create(pid, T::Type, args...) where T is the type of synchronization object being created.
As previously stated, this is an alternative approach to having native Channels support in Julia (as in #5757).
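To give a feel for the callback-based design, here is a rough sketch of a bounded channel exposing the six callback names listed above. It is written in Python for illustration only and is not the RemoteChannel definition from SyncObjects.jl. The class name `ChannelSketch`, the `capacity` parameter, and the meaning given to each callback (in particular `query` returning the number of queued items) are assumptions, not taken from the PR.

```python
from collections import deque


class ChannelSketch:
    """Hypothetical bounded FIFO channel built from the six callbacks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def canput(self):
        # Assumed meaning: a put may proceed only while there is free space.
        return len(self.items) < self.capacity

    def cantake(self):
        # Assumed meaning: a take or fetch may proceed only if an item is queued.
        return len(self.items) > 0

    def put(self, value):
        if not self.canput():
            raise RuntimeError("channel is full")
        self.items.append(value)

    def fetch(self):
        # Assumed meaning: read the oldest item without removing it.
        if not self.cantake():
            raise RuntimeError("channel is empty")
        return self.items[0]

    def take(self):
        # Remove and return the oldest item.
        if not self.cantake():
            raise RuntimeError("channel is empty")
        return self.items.popleft()

    def query(self):
        # Assumed meaning: report state, here the number of queued items.
        return len(self.items)


ch = ChannelSketch(capacity=2)
ch.put("a")
ch.put("b")
full = not ch.canput()
head = ch.fetch()
first = ch.take()
remaining = ch.query()
```

In a real synchronization framework the caller would presumably block until `canput` or `cantake` holds rather than raise; the sketch raises only to stay self-contained.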