Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Enable Actor Pool UDFs by default #3488

Merged
merged 6 commits into from
Dec 5, 2024
Merged

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Dec 4, 2024

Todo:

  • fix tests
  • add docs (future PR)
  • add threaded concurrency (future PR)

@kevinzwang kevinzwang requested review from jaychia and removed request for jaychia December 4, 2024 09:55
@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 4, 2024
Copy link

codspeed-hq bot commented Dec 4, 2024

CodSpeed Performance Report

Merging #3488 will degrade performances by 26.92%

Comparing kevin/no-more-stateful-udf (5367a7e) with main (8de0101)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main kevin/no-more-stateful-udf Change
test_iter_rows_first_row[100 Small Files] 202.1 ms 276.5 ms -26.92%

Copy link

github-actions bot commented Dec 4, 2024

Copy link

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 85.06329% with 59 lines in your changes missing coverage. Please review.

Project coverage is 77.38%. Comparing base (6d30e30) to head (5367a7e).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
daft/execution/actor_pool_udf.py 0.00% 32 Missing ⚠️
src/daft-dsl/src/functions/python/mod.rs 86.76% 9 Missing ⚠️
daft/udf.py 93.54% 4 Missing ⚠️
daft/runners/pyrunner.py 57.14% 3 Missing ⚠️
...rc/daft-logical-plan/src/ops/actor_pool_project.rs 60.00% 2 Missing ⚠️
daft/expressions/expressions.py 87.50% 1 Missing ⚠️
daft/runners/ray_runner.py 88.88% 1 Missing ⚠️
src/common/daft-config/src/python.rs 0.00% 1 Missing ⚠️
src/daft-dsl/src/functions/python/udf.rs 91.66% 1 Missing ⚠️
src/daft-dsl/src/resolve_expr/mod.rs 50.00% 1 Missing ⚠️
... and 4 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3488      +/-   ##
==========================================
+ Coverage   77.00%   77.38%   +0.38%     
==========================================
  Files         696      699       +3     
  Lines       86039    84942    -1097     
==========================================
- Hits        66256    65736     -520     
+ Misses      19783    19206     -577     
Files with missing lines Coverage Δ
daft/execution/execution_step.py 89.47% <100.00%> (ø)
daft/execution/physical_plan.py 94.01% <100.00%> (ø)
src/common/daft-config/src/lib.rs 84.12% <100.00%> (+2.43%) ⬆️
src/daft-dsl/src/expr/mod.rs 73.77% <100.00%> (+0.51%) ⬆️
src/daft-dsl/src/functions/mod.rs 69.66% <100.00%> (ø)
src/daft-dsl/src/lib.rs 100.00% <100.00%> (ø)
src/daft-dsl/src/pyobj_serde.rs 85.71% <ø> (+4.76%) ⬆️
src/daft-dsl/src/python.rs 95.87% <100.00%> (+3.45%) ⬆️
src/daft-logical-plan/src/builder.rs 91.56% <100.00%> (-0.10%) ⬇️
src/daft-logical-plan/src/ops/project.rs 63.22% <100.00%> (ø)
... and 16 more

... and 29 files with indirect coverage changes

@kevinzwang kevinzwang marked this pull request as ready for review December 4, 2024 11:11
@kevinzwang kevinzwang requested a review from jaychia December 4, 2024 11:11
Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me with some overall comments:

  1. I think we should take a stab at wrapping user-provided functions in a no-op class (perhaps as a follow-on
  2. Let's check the user experience around error handling/reporting of initializations

expression: PyExpr,
) -> dict[str, tuple[PartialStatefulUDF, InitArgsType]]: ...
def bind_stateful_udfs(expression: PyExpr, initialized_funcs: dict[str, Callable]) -> PyExpr: ...
def initialize_udfs(expression: PyExpr) -> PyExpr: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check what initialization errors look like using this instead of the bind_stateful_udfs approach?

One thing I like about extract_partial_stateful_udf_py + bind_stateful_udfs is that the initialization happens in Python-land, so we get nice backtraces and stuff if initialization code (which is in "user-space") errors out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the errors still look pretty reasonable. Here's an example from one of our tests:
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For actor pool UDFs it's not as nice but that has always been the case. The native executor actually does a decent job but since things are in a separate process it's just not possible to do it as well


Terminates once it receives None.
"""
initialized_projection = ExpressionsProjection([e._initialize_udfs() for e in uninitialized_projection])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah curious what the error handling looks like here (e.g. bad init_args, runtime errors in __init__)

daft/runners/ray_runner.py Show resolved Hide resolved
daft/udf.py Show resolved Hide resolved
daft/udf.py Show resolved Hide resolved
src/daft-dsl/src/functions/python/udf.rs Show resolved Hide resolved
tests/expressions/test_udf.py Outdated Show resolved Hide resolved
tests/expressions/test_udf.py Outdated Show resolved Hide resolved
tests/expressions/test_udf.py Outdated Show resolved Hide resolved
tests/expressions/test_udf.py Outdated Show resolved Hide resolved
@kevinzwang kevinzwang enabled auto-merge (squash) December 5, 2024 01:26
@kevinzwang kevinzwang merged commit 56f5089 into main Dec 5, 2024
42 of 43 checks passed
@kevinzwang kevinzwang deleted the kevin/no-more-stateful-udf branch December 5, 2024 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants