Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vine: Function Calls Need More Consistent Method of Returning Results #3828

Open
btovar opened this issue May 9, 2024 · 4 comments
Open
Labels
bug For modifications that fix a flaw in the code. TaskVine

Comments

@btovar
Copy link
Member

btovar commented May 9, 2024

The dask executor uses a combination of temporary outputs and regular outputs to compute the graph. Function calls using buffers assume that the results are used immediately when the task returns, but in the dask executor they may be used once the task has gone out of scope and garbage collected.

In #3824 I change function calls to not use buffers, as that should work for all cases. However, it may hurt a bit in performance.
@tphung3 @BarrySlyDelgado @colinthomas-z80

@btovar btovar added bug For modifications that fix a flaw in the code. TaskVine labels May 9, 2024
@dthain
Copy link
Member

dthain commented May 17, 2024

@btovar could you update this with the current state of things? I believe you added some transformation re buffers and disk for function calls

@btovar
Copy link
Member Author

btovar commented May 20, 2024

Currently "task-mode" and "function-call-mode" do the same, which is to write inputs and outputs to files on disk. This means that FunctionCall does not use buffers anymore. This is the safe thing to do but may lose some of the performance of using buffers.

@dthain
Copy link
Member

dthain commented Jun 12, 2024

It seems to me that we have two different modes of operating that need to be supported:
1 - A standalone function call that will return a Python result to be used in-memory by the manager program. The current mode of operation seems to support this well.
2 - A function call that operates as part of a DAG (as in Dask). In this mode, the input and output file objects should be declared externally to the task and attached with add_input and add_output and then pruned and undeclared when no longer needed.

And I will throw out the following general constraint:

  • Whoever declares a file is responsible for undeclaring it. If the DAG processor declared the file, it should undeclare it. If the Task declared some file for internal use, then the Task should be responsible for undeclaring it.

Perhaps we need to reorganize the class hierarchy to better reflect that 1 is implemented as an extension around 2.

@dthain
Copy link
Member

dthain commented Jun 12, 2024

Pulling over brief comment from Ben in #3800:

  • The function calls return a buffer which is freed when the task goes out of scope. Either we need a way to specify their actual output file, or set lazy_transfers=True when using function calls.

@dthain dthain changed the title vine: reconsider interaction of buffer and temp files as outputs for function calls Vine: Function Calls Need Consistent Method of Returning Results Jun 12, 2024
@dthain dthain changed the title Vine: Function Calls Need Consistent Method of Returning Results Vine: Function Calls Need More Consistent Method of Returning Results Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For modifications that fix a flaw in the code. TaskVine
Projects
Development

No branches or pull requests

2 participants