refactor(dask): port the dask backend to the new execution model #8005
Just a few comments. Overall I'd be very happy to get this in and then iterate. This is much cleaner than the current dask backend, thanks for all the work here!
I do think many of the places where we explicitly call `.compute()` while handling an operation are places where we should drop support for that operation until we can express it more efficiently. For many of them, dask itself can do the work efficiently, so the limitation is in our executor design rather than in dask. For now, I'd rather raise an `UnsupportedOperationError` than ship something that is slow because of what ibis does rather than because of the backend itself.
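The following is a minimal sketch of the policy suggested above. It assumes a dispatch-based executor in which any operation without an efficient, partition-wise rule raises an error instead of collecting the data. `UnsupportedOperationError` is defined locally here as a stand-in for the ibis exception. The `Op`, `Sum` and `Median` classes and the list-of-rows "partitions" are hypothetical placeholders, not ibis or dask APIs.

```python
from functools import singledispatch


class UnsupportedOperationError(NotImplementedError):
    """Local stand-in for the error the backend would raise."""


# Hypothetical operation nodes; the real executor works on ibis expression nodes.
class Op:
    pass


class Sum(Op):
    def __init__(self, column):
        self.column = column


class Median(Op):
    def __init__(self, column):
        self.column = column


@singledispatch
def execute(op, partitions):
    # Default rule: refuse instead of silently materializing the whole dataset.
    raise UnsupportedOperationError(f"{type(op).__name__} is not supported yet")


@execute.register
def _(op: Sum, partitions):
    # A sum decomposes per partition, so nothing is collected eagerly.
    return sum(sum(row[op.column] for row in part) for part in partitions)
```

With this design, unsupported operations fail loudly and early. Support can be added one rule at a time once an efficient formulation exists.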
def agg(df):
    # if df is a dask dataframe then we collect it to a pandas dataframe
    # because the user-defined function expects a pandas dataframe
I agree that computing here like this would be unexpected and not something we should do.
IIUC, these are the legacy UDFs. IMO we should remove them as part of this release. They were only supported by the pandas and dask backends. The rewrite changed a lot of the internals, and the only major user of them that we know of made heavy use of those internals. So I suspect that no user of the legacy UDFs would be able to upgrade to this release anyway. IMO we should drop them and keep only the new UDFs.
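The following stdlib-only sketch illustrates why collecting inside `agg` is costly. It contrasts materializing every partition before calling a user function with applying the function partition by partition. Partitions are modeled as plain lists of row dicts, and both helper names are hypothetical. dask's `DataFrame.map_partitions` follows the second pattern with real pandas partitions.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Partition = List[Row]


def collect_then_apply(
    partitions: Iterable[Partition], udf: Callable[[Partition], Partition]
) -> Partition:
    # What the legacy UDF path does conceptually: materialize every partition
    # into one in-memory table before calling the user function.
    everything = [row for part in partitions for row in part]
    return udf(everything)


def apply_per_partition(
    partitions: Iterable[Partition], udf: Callable[[Partition], Partition]
) -> Iterator[Partition]:
    # Lazy alternative: call the function on each partition as it is produced,
    # never holding the full dataset in memory at once.
    for part in partitions:
        yield udf(part)
```

The per-partition form gives the same result only for element-wise functions. An aggregation UDF like `agg(df)` expects the whole table, so it cannot be split without knowing how to combine partial results. That is one reason it is hard to support efficiently.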
You can xfail
On one hand I agree, simply because that would be the ideal approach. On the other hand, these operations are at least usable, even if they are not as performant as they should be. Also, some of the window functionality used to be supported, so raising would cause a regression. A third option is to emit a warning so that users are aware of the performance problems, and then gradually improve those operations (if there is interest).
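The following is a minimal sketch of the "warn and fall back" option, assuming the fallback simply gathers all partitions locally before running the operation. `PerformanceWarning` is defined locally here; pandas ships a similarly named `pandas.errors.PerformanceWarning`, but this sketch does not use it. `run_with_local_fallback` and the list-based partitions are hypothetical.

```python
import warnings


class PerformanceWarning(UserWarning):
    """Hypothetical warning category for operations run via a local fallback."""


def run_with_local_fallback(op_name, partitions, local_impl):
    # Tell the user that the whole input is about to be pulled into local memory.
    warnings.warn(
        f"{op_name} is not implemented with dask constructs; falling back to "
        "local execution, which loads all partitions into memory",
        PerformanceWarning,
        stacklevel=2,
    )
    # Materialize all partitions into one local list, then run the operation on it.
    rows = [row for part in partitions for row in part]
    return local_impl(rows)
```

Users who do not want the message can silence it with `warnings.filterwarnings("ignore", category=PerformanceWarning)`. Users who want strictness can turn it into an error with `"error"` instead.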
I'd argue that the previous dask backend was both buggy and slow enough that probably no one is using it (we certainly haven't heard from any users successfully using it). A regression in functionality with an improvement in correctness of dask code (including not calling
Reimplementation of the dask backend on top of the new pandas executor. I had to adjust the pandas backend to make it extensible, and as a result the new dask implementation turned out pretty tidy.
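The following is a rough sketch of what "extending the pandas executor" can look like. A subclass overrides only the rules it can implement natively on partitions and reuses the inherited rules for everything else. Both classes, the `visit_*` naming and `native_ops` are hypothetical illustrations, not the actual ibis executor classes.

```python
class PandasExecutor:
    """Hypothetical base executor: dispatches each op name to a visit_* method."""

    def execute(self, op, data):
        handler = getattr(self, f"visit_{op}", None)
        if handler is None:
            raise NotImplementedError(f"no rule for {op!r}")
        return handler(data)

    # Rules written against a single in-memory list (the "pandas" side).
    def visit_sum(self, rows):
        return sum(rows)

    def visit_count(self, rows):
        return len(rows)


class DaskExecutor(PandasExecutor):
    """Hypothetical subclass whose input is a list of partitions."""

    # Operations this subclass implements natively on partitions.
    native_ops = {"sum"}

    def visit_sum(self, partitions):
        # Reduce each partition with the inherited rule, then combine the partials.
        return sum(PandasExecutor.visit_sum(self, part) for part in partitions)

    def execute(self, op, partitions):
        if op in self.native_ops:
            return super().execute(op, partitions)
        # Everything else reuses the inherited rule on locally collected rows.
        rows = [row for part in partitions for row in part]
        return super().execute(op, rows)
```

Because the subclass only has to override what differs, most of the code stays in the shared base. This is what keeps such a backend small.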
A couple of features are not implemented with proper dask constructs; instead, they fall back to local execution with pandas. The most notable of these are the window functions. The previous dask implementation supported only a couple of window cases, but with this approach we at least have full coverage.
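A cumulative window over ordered data crosses partition boundaries, which is what makes window functions harder to express partition by partition. The stdlib-only sketch below uses plain integer lists as hypothetical partitions. It contrasts a local fallback (concatenate, then compute) with a partition-aware version that passes a running total from one partition to the next.

```python
from itertools import accumulate
from typing import List


def cumsum_local_fallback(partitions: List[List[int]]) -> List[int]:
    # Fallback: gather every partition into one local list, then run the window.
    rows = [x for part in partitions for x in part]
    return list(accumulate(rows))


def cumsum_partitioned(partitions: List[List[int]]) -> List[List[int]]:
    # Partition-aware version: each partition only needs the running total of the
    # partitions before it, so partitions are never concatenated.
    out, carry = [], 0
    for part in partitions:
        # `initial` prepends the carry; drop it so each output matches its input.
        sums = list(accumulate(part, initial=carry))[1:]
        out.append(sums)
        carry = sums[-1] if sums else carry
    return out
```

A scalar carry is enough for a running sum. Other frames, such as rolling windows with a lookback, ranking, or partitioned windows, need more state from neighboring partitions. This helps explain why a general local fallback gives full coverage more easily than per-case dask implementations.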
Thanks to the new pandas base, we also have wider feature coverage; see the removed xfails in the test suite.