[SPARK-6553] [pyspark] Support functools.partial as UDF #5206
Can one of the admins verify this patch?
@@ -123,7 +123,7 @@ def _create_judf(self):
         pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
         ssql_ctx = sc._jvm.SQLContext(sc._jsc.sc())
         jdt = ssql_ctx.parseDataType(self.returnType.json())
-        judf = sc._jvm.UserDefinedPythonFunction(f.__name__, bytearray(pickled_command), env,
+        judf = sc._jvm.UserDefinedPythonFunction(f.__repr__(), bytearray(pickled_command), env,
Could we try to get the name from it? Having a friendly name will help debugging.
name = f.__name__ if hasattr(f, "__name__") else f.__class__.__name__
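A minimal sketch of the suggested fallback, runnable without Spark. The helper name `udf_display_name` and the sample callables (`add`, `PlusFour`, `add_four`) are hypothetical and exist only for illustration. It shows why `f.__name__` alone fails for a `functools.partial` and what the fallback yields instead:

```python
import functools


def udf_display_name(f):
    """Hypothetical helper: pick a readable name for any callable."""
    # Plain functions carry __name__; partials and callable instances do not,
    # so fall back to the name of their class.
    return f.__name__ if hasattr(f, "__name__") else f.__class__.__name__


def add(x, y):
    return x + y


class PlusFour:
    """A callable object with no __name__ attribute on its instances."""

    def __call__(self, x):
        return x + 4


add_four = functools.partial(add, 4)

print(udf_display_name(add))         # add
print(udf_display_name(add_four))    # partial
print(udf_display_name(PlusFour()))  # PlusFour
```

Compared with `f.__repr__()`, the fallback gives a short, stable label rather than a string that embeds a memory address, which is what makes it friendlier for debugging.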
Jenkins, OK to test
Hi, I like
as that should even work for old-style callables.
Test build #628 has started for PR 5206 at commit
@ksonj LGTM, waiting for jenkins. ping @JoshRosen
Test build #628 has finished for PR 5206 at commit
LGTM. I suppose that we could add a regression test for this, which might be as simple as calling spark/python/pyspark/sql/tests.py Line 117 in 887e1b7
Also, I noticed that this was opened against
I've added two tests for UDFs with partial functions and callable objects. Thanks for the hint, I'll open future PRs against
Jenkins, retest this please.
Test build #29416 has started for PR 5206 at commit
Test build #29416 has finished for PR 5206 at commit
Test FAILed.
Doh,
Fixed that. @JoshRosen
Jenkins, retest this please.
Test build #29561 has started for PR 5206 at commit
Test build #29561 has finished for PR 5206 at commit
Test PASSed.
LGTM, so I'm going to merge this into
Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used.

Author: ksonj <kson@siberie.de>

Closes #5206 from ksonj/partials and squashes the following commits:

ea66f3d [ksonj] Inserted blank lines for PEP8 compliance
d81b02b [ksonj] added tests for udf with partial function and callable object
2c76100 [ksonj] Makes UDFs work with all types of callables
b814a12 [ksonj] support functools.partial as udf