feat(pyspark): add support for pyarrow and python UDFs #9753
Conversation
`export PYSPARK_PYTHON=$(which python)` 🤔 interesting... We don't run the full backend test suite through nix (only the backends that can be run locally, with the exception of PySpark). You might have figured out the way we can now do that. I'll give it a try 🥳
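For context, the environment tweak quoted above might look like this in a shell hook. The driver-side variable is an assumption drawn from Spark's documented configuration, not from this thread:

```shell
# Point Spark workers at the active interpreter (the command quoted in the comment above).
export PYSPARK_PYTHON=$(which python)
# Optionally pin the driver too; PYSPARK_DRIVER_PYTHON is a standard Spark env var
# (an addition here, not mentioned in the thread).
export PYSPARK_DRIVER_PYTHON=$(which python)
```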
It looks like you might need to xfail the UDF tests for the earlier version of PySpark that we test against (3.3.3).
Sweet. I'll put up a separate PR to add that setting to our …
Thanks for the spot @cpcloud - I'd mistakenly assumed that it was implemented in a previous version; the pyspark docs weren't exactly clear on that. I've pushed some changes that should check that the pyarrow UDFs fail for pyspark < 3.5.
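A minimal sketch of that version gate. The helper name and the version parsing are assumptions for illustration, not code from the PR:

```python
# Hypothetical version gate for pyarrow UDF tests; the helper name and the
# parsing of the version string are assumptions, not code from the PR.
def supports_arrow_udfs(pyspark_version: str) -> bool:
    """Return True when the installed PySpark can run pyarrow UDFs (>= 3.5)."""
    major, minor = (int(part) for part in pyspark_version.split(".")[:2])
    return (major, minor) >= (3, 5)

# In a test suite this could drive an xfail marker, e.g.:
#   pytest.mark.xfail(not supports_arrow_udfs(pyspark.__version__),
#                     reason="pyarrow UDFs require PySpark >= 3.5")
```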
Looks good! I will commit my suggested changes and then push up any fixes needed!
Description of changes
This PR modifies the pyspark backend so that it can support UDFs implemented in pure Python or pyarrow, by making use of the `pyspark.sql.udf` wrapper. To successfully run the unit tests for a pyarrow UDF, I needed to ensure that the Spark worker executed the correct Python interpreter by setting `PYSPARK_PYTHON=$(which python)` within my nix shell. This might require some further configuration of the nix environment in order for the CI tests to pass.
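To illustrate the distinction the description draws, here is a stdlib-only sketch of the two UDF execution styles: a pure-Python UDF is applied one element at a time, while an arrow-style UDF receives a whole batch per call (PySpark's real implementation passes `pyarrow` Arrays, not lists; the helper names here are hypothetical):

```python
from typing import Callable, Iterable, List

def run_python_udf(func: Callable, column: Iterable) -> List:
    # Pure-Python scalar UDF: one call per element.
    return [func(x) for x in column]

def run_arrow_udf(func: Callable, column: List) -> List:
    # Arrow-style UDF: one call per batch; the function must be vectorized.
    return func(column)

# Element-wise Python UDF and its batch-wise counterpart.
add_one = lambda x: x + 1
add_one_batch = lambda batch: [x + 1 for x in batch]
```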
Issues closed