Vine/WQ: Serial Task Latency #3432
Comments
There are two potential culprits here:
@colinthomas-z80 has cleaned up a number of items to improve performance in the TaskVine scheduler natively. (Colin, please post a brief summary here of the performance that you are able to get so far.)
Next step: I suspect that the Parsl-TaskVine executor can run into a degenerate case where it waits one second for a TV task to complete, then goes back to checking for newly submitted Parsl tasks. This could result in awful latency for serial task submission. Note the
Regarding the size of the ready task queue, execution time now increases linearly with task batch size. The figure shows the performance after each stage of the development process up to the current one. Now that this is stable, I will work up a profile of serial task execution.
@colinthomas-z80 I know that you did some work on this problem, please summarize and link here. Is there more to be done, or is it adequately fixed?
Regarding the serial task latency in Parsl, we found that the Parsl executor loop kept waiting for additional task retrievals after retrieving a single task. In the serialized scenario there are no more tasks to wait on, so the latency was a function of our wait timeout. I removed this additional waiting by checking the task queue size after each retrieval in the Parsl WQ executor. The PR is still open at Parsl/parsl#2984. This improved the serial throughput from 1 task per second to about 6 tasks per second.
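To illustrate the behavior described above, here is a minimal sketch using a toy stand-in for the manager. `FakeManager`, `collect_old`, `collect_new`, and `run_serial` are hypothetical names made up for this example, not the actual Parsl or Work Queue code, and the waiting is only simulated by adding the timeout to a counter.

```python
from collections import deque


class FakeManager:
    """Toy stand-in for a task manager (hypothetical, for illustration only)."""

    def __init__(self):
        self.done = deque()   # tasks that have finished and can be retrieved
        self.waited = 0.0     # simulated seconds spent blocked in wait()

    def submit(self, name):
        # Assumption: the task completes instantly after submission.
        self.done.append(name)

    def empty(self):
        return not self.done

    def wait(self, timeout):
        if self.done:
            return self.done.popleft()
        # Nothing to retrieve: a real manager would block for `timeout` here.
        self.waited += timeout
        return None


def collect_old(manager, timeout=1.0):
    """Keep calling wait() until it times out, even when nothing is left."""
    results = []
    task = manager.wait(timeout)
    while task is not None:
        results.append(task)
        task = manager.wait(timeout)  # pays the full timeout after the last task
    return results


def collect_new(manager, timeout=1.0):
    """Check whether anything is left before each further wait()."""
    results = []
    while not manager.empty():
        task = manager.wait(timeout)
        if task is None:
            break
        results.append(task)
    return results


def run_serial(collect, n):
    """Submit n tasks one at a time; return simulated seconds spent waiting."""
    manager = FakeManager()
    for i in range(n):
        manager.submit(f"task-{i}")
        collect(manager)
    return manager.waited
```

In this toy model, the old loop pays one full timeout per serially submitted task, while the loop with the emptiness check pays none. The real executor has other per-task costs, so the numbers here are not comparable to the measurements quoted above.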
Ok, is the same problem present in the TaskVine executor, or no?
Yes, the problem is present. I'm querying Ben to see if he is content with the fix, and if so I will update TaskVine as well.
Colin's PR 2984 looks good and I'm merging it now - a 7m test suite run turned into a 10s test suite run!
Nice!
I added this comment to the PR after I merged it, after thinking about it some more: basically I think @colinthomas-z80's PR trades off latency for 100% use of a single core...
@benclifford I will see about putting a wait on the submit queue if we have no work to do.
I opened #3018, which should wait on the input queue for a task to appear when there is nothing to be done. CPU usage is hard to quantify, but it seems to average lower in the system monitor tool.
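As a rough sketch of the idea of waiting on the input queue instead of spinning, and not the actual contents of #3018: the function `next_submission` and its parameters `pending`, `outstanding`, and `idle_timeout` are hypothetical names chosen for this example. It assumes submissions arrive on a standard `queue.Queue`.

```python
import queue


def next_submission(pending, outstanding, idle_timeout=1.0):
    """Return the next submitted task, or None if none is available.

    pending      -- queue.Queue of newly submitted tasks (assumed)
    outstanding  -- number of tasks currently running in the manager (assumed)
    idle_timeout -- how long to block when there is nothing else to do
    """
    try:
        if outstanding == 0:
            # Nothing is running, so there are no results to poll for.
            # Block on the input queue rather than busy-looping.
            return pending.get(timeout=idle_timeout)
        # Work is in flight: do not block here, so the caller can go
        # back to retrieving completed tasks promptly.
        return pending.get_nowait()
    except queue.Empty:
        return None
```

With this shape, an idle loop sleeps inside `get()` until a task is submitted or the timeout expires, while a busy loop still returns immediately and keeps retrieval latency low.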
@colinthomas-z80 please check if the Vine executor has the same problem and can be addressed by the same solution in Parsl.
3038 is open at Parsl, updating TaskVine with the loop changes.
Merged
Performance has been dramatically improved.
@benclifford reports: