Feature/instance udfs #890

timsaucer · 2024-10-02T13:46:10Z

Which issue does this PR close?

Closes #822

Rationale for this change

This change adds an argument to udaf and udwf that is forwarded to __init__ of the class that implements the aggregation or partition evaluator. Users can then pass constructor arguments, so a single class that requires parameters can be reused with different values.

The other option is to always pass these parameters as lit() values, which is not as performant.
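As a rough illustration of the idea in plain Python, with no DataFusion dependency: `BiasedSum` and `make_udaf_with_args` below are hypothetical stand-ins, not the library's API. The sketch only shows how keeping a class together with its constructor arguments lets one parameterized class serve several configurations.

```python
from typing import Any, Callable, List, Sequence


class BiasedSum:
    """Hypothetical accumulator whose constructor requires a parameter."""

    def __init__(self, bias: float) -> None:
        self._sum = bias

    def update(self, values: Sequence[float]) -> None:
        self._sum += sum(values)

    def evaluate(self) -> float:
        return self._sum


def make_udaf_with_args(accum_cls: type, arguments: List[Any]) -> Callable[[], Any]:
    """Hypothetical stand-in: keep the class and its constructor arguments,
    and build a fresh instance every time one is needed."""

    def build() -> Any:
        return accum_cls(*arguments)

    return build


# The same class is reused with two different constructor arguments.
build_sum_10 = make_udaf_with_args(BiasedSum, [10.0])
build_sum_0 = make_udaf_with_args(BiasedSum, [0.0])

acc = build_sum_10()
acc.update([1.0, 2.0, 3.0])
result_10 = acc.evaluate()

acc = build_sum_0()
acc.update([1.0, 2.0, 3.0])
result_0 = acc.evaluate()
```

Each call to the builder returns a new object, so state from one aggregation does not leak into the next.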

What changes are included in this PR?

Updates the udaf call to add in the provided arguments when initializing the classes. Also adds unit tests.

Are there any user-facing changes?

There is a non-breaking change to the udaf function that allows an optional set of arguments to be passed in. Users who instead call AggregateUDF() directly will need to add an empty list as the extra argument.

There is a breaking change to udwf but since it is not in any released version yet, it shouldn't count as a user-facing change in my opinion.

Michael-J-Ward · 2024-10-03T17:13:30Z

Instead of saving the __init__ arguments and calling them ourselves, wouldn't it be more flexible and less work for datafusion-python to allow the user to provide a factory function?

Modulo runtime assertions, this works on main currently:

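import pyarrow as pa
from datafusion import Accumulator, column, udaf

# Summarize and the df fixture are assumed to come from the surrounding test module.
def test_udaf_aggregate_with_arguments(df):
    bias = 10.0

    # Zero-argument factory: the closure captures bias for the constructor.
    def factory() -> Accumulator:
        return Summarize(bias)

    summarize = udaf(
        factory,
        pa.float64(),
        pa.float64(),
        [pa.float64()],
        volatility="immutable",
        # arguments=[bias],
    )

    df1 = df.aggregate([], [summarize(column("a"))])

    # execute and collect the first (and only) batch
    result = df1.collect()[0]

    assert result.column(0) == pa.array([bias + 1.0 + 2.0 + 3.0])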
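
The factory idea can also be sketched without DataFusion. In the snippet below, `Summarize` is a simplified stand-in for the test's accumulator and `run_aggregate` is a hypothetical driver, so none of this is the library's actual code path. It only shows that a closure or `functools.partial` both give a zero-argument factory, and that each call starts from clean state.

```python
from functools import partial
from typing import Callable, Sequence


class Summarize:
    """Simplified stand-in for the test's accumulator; it only mimics
    the bias behavior and does not implement the full interface."""

    def __init__(self, initial_value: float = 0.0) -> None:
        self._sum = initial_value

    def update(self, values: Sequence[float]) -> None:
        self._sum += sum(values)

    def evaluate(self) -> float:
        return self._sum


def run_aggregate(factory: Callable[[], Summarize], values: Sequence[float]) -> float:
    """Hypothetical driver: call the factory for a fresh accumulator,
    feed it the values, and return the result."""
    acc = factory()
    acc.update(values)
    return acc.evaluate()


bias = 10.0


def closure_factory() -> Summarize:
    # The closure captures bias, so the caller needs no arguments.
    return Summarize(bias)


# functools.partial binds the constructor argument in the same way.
partial_factory = partial(Summarize, bias)

first = run_aggregate(closure_factory, [1.0, 2.0, 3.0])
second = run_aggregate(closure_factory, [1.0, 2.0, 3.0])  # reuse: fresh state again
third = run_aggregate(partial_factory, [1.0, 2.0, 3.0])
```

With a factory, the caller decides how the instance is built, so the library does not need to store or forward constructor arguments itself.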

timsaucer · 2024-10-04T11:35:30Z

This is a very good idea. I'll change to your approach.

Michael-J-Ward

Looks great!

Michael-J-Ward · 2024-10-04T14:52:47Z

Ah, shoot. Sorry @timsaucer, I created a conflict when I merged the release-testing PR. This needs a rebase.

timsaucer · 2024-10-04T15:00:35Z

No worries at all. I'll wait for #892 CI to finish, merge that, then rebase this.

timsaucer self-assigned this Oct 2, 2024

Michael-J-Ward approved these changes Oct 4, 2024

timsaucer added 8 commits October 4, 2024 11:31

Add option for passing in constructor arguments to the udaf

c119b5d

Fix small warnings in pylance

14fc166

Improve type hinting for udaf and fix one pylance warning

e3b4acc

Set up UDWF to take arguments as constructor just like UDAF to ensure we get a clean state when functions are reused

ca1352e

Improve handling of udf when user provides a class instead of bare function

15bc0ad

Add unit tests for UDF showing callable class

a989d27

Add license text

b7468dc

Switching to use factory methods for udaf and udwf

250baea

timsaucer force-pushed the feature/instance-udfs branch from 9430084 to 250baea Compare October 4, 2024 15:31

Move new tests to the new testing directory

cb7dc7c

timsaucer merged commit 1fd3762 into apache:main Oct 4, 2024
15 checks passed

timsaucer deleted the feature/instance-udfs branch October 4, 2024 16:26
