Refactor tasks: GPU/Dask/Ray/Server support #557
This failure is so odd: sometimes we get 6 columns, sometimes 7.
This makes it easier to send tasks to a server, or to use Dask/Ray for distributed computing. It will also make it easier to add other backends, such as GPU-based execution.
Failure was due to a cached file in ~/.vaex/data/
Ahh.. so the Windows CI has started using numpy 1.18.1, and this issue is still there.. :( Linux also has numpy 1.18.1, but there it passes..
This PR completely refactors how tasks work. A task (e.g. an aggregation) is now defined as a description that can be cheaply serialized and executed in several ways. This PR focuses mostly on server-side and CPU execution, but local experimental branches have shown this to work well with Dask or Ray for distributed computing, although that requires #548 for efficient serialization of data/dataframes, and some refactoring of groupby/join/take etc. It also opens the door to other ways of executing tasks, e.g. on the GPU, for example using the libraries powering cuDF.
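To illustrate the idea of a task that is only a description, here is a minimal sketch in plain Python. It is not the code in this PR and does not use vaex's API: `AggregationTask`, `run_local` and `run_threaded` are hypothetical names, and the "dataframe" is just a list of dict chunks. The point is that the task carries no data, so it serializes to a few bytes, and the same description can be handed to different executors.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AggregationTask:
    """Hypothetical task description: holds names only, never data."""
    op: str       # "sum" or "count"
    column: str   # name of the column to aggregate

    def encode(self) -> str:
        # Cheap to serialize because there are no arrays inside.
        return json.dumps(asdict(self))

    @staticmethod
    def decode(payload: str) -> "AggregationTask":
        return AggregationTask(**json.loads(payload))


# Per-chunk (map) step and cross-chunk (reduce) step for each operation.
_MAP = {"sum": sum, "count": len}
_REDUCE = {"sum": sum, "count": sum}


def run_local(task, chunks):
    """Execute the task sequentially, one chunk at a time."""
    partials = [_MAP[task.op](chunk[task.column]) for chunk in chunks]
    return _REDUCE[task.op](partials)


def run_threaded(task, chunks, workers=2):
    """Execute the same task description on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: _MAP[task.op](c[task.column]), chunks))
    return _REDUCE[task.op](partials)


# Toy data split into two chunks (an assumption for this sketch).
chunks = [{"x": [1, 2, 3]}, {"x": [4, 5]}]

task = AggregationTask(op="sum", column="x")
wire = task.encode()                      # what would be sent to a server/worker
restored = AggregationTask.decode(wire)   # what the receiving side rebuilds

local_result = run_local(restored, chunks)
threaded_result = run_threaded(restored, chunks)
```

A Dask, Ray or GPU executor would, under the same assumptions, be another `run_*` function that accepts the decoded task; the task definition itself would not need to change.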