[SPARK-6553] [pyspark] Support functools.partial as UDF #5206

ksonj · 2015-03-26T09:27:25Z

Use f.__repr__() instead of f.__name__ when instantiating UserDefinedFunctions, so functools.partials may be used.
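
For context, a minimal sketch of the failure this change targets, written for present-day Python 3 rather than taken from the patch (the function `add` is made up for illustration). A `functools.partial` object has no `__name__` attribute, so reading `f.__name__` raises `AttributeError`, whereas `repr()` is available on any object.

```python
import functools


def add(x, y):
    """Hypothetical two-argument function, used only for illustration."""
    return x + y


add_one = functools.partial(add, 1)

# Plain functions expose __name__ ...
plain_name = add.__name__

# ... but partial objects do not, so the attribute lookup raises AttributeError.
try:
    add_one.__name__
    partial_has_name = True
except AttributeError:
    partial_has_name = False

# repr() always works, though the result is verbose and embeds a memory address.
partial_repr = repr(add_one)
```

The verbose `repr()` output is presumably why a friendlier name is asked for later in the review.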

AmplabJenkins · 2015-03-26T09:32:11Z

Can one of the admins verify this patch?

yishenggudou · 2015-03-26T11:21:51Z

_

davies · 2015-03-26T17:38:13Z

python/pyspark/sql/functions.py

@@ -123,7 +123,7 @@ def _create_judf(self):
        pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
        ssql_ctx = sc._jvm.SQLContext(sc._jsc.sc())
        jdt = ssql_ctx.parseDataType(self.returnType.json())
-        judf = sc._jvm.UserDefinedPythonFunction(f.__name__, bytearray(pickled_command), env,
+        judf = sc._jvm.UserDefinedPythonFunction(f.__repr__(), bytearray(pickled_command), env,


Could we try to get the name from it? Having a friendly name will help debugging.

name = f.__name__ if hasattr(f, "__name__") else f.__class__.__name__

davies · 2015-03-26T17:38:40Z

Jenkins, OK to test

ksonj · 2015-03-27T07:12:18Z

Hi, I like

name = f.__name__ if hasattr(f, "__name__") else f.__class__.__name__

as that should work even for old-style callables.
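
A small sketch of how that fallback expression behaves for different kinds of callables, assuming Python 3 (where old-style classes no longer exist). The helper `resolve_name` and the example callables are hypothetical names introduced here, not part of the patch.

```python
import functools


def resolve_name(f):
    """Hypothetical helper wrapping the expression suggested in review."""
    return f.__name__ if hasattr(f, "__name__") else f.__class__.__name__


def double(x):
    return x * 2


class PlusFour:
    """Callable object; its instances have no __name__ of their own."""

    def __call__(self, col):
        return col + 4


names = [
    resolve_name(double),                     # plain function: its own name
    resolve_name(lambda x: x),                # lambdas are named "<lambda>"
    resolve_name(functools.partial(double)),  # falls back to the class name
    resolve_name(PlusFour()),                 # falls back to the class name
]
```

For a partial, the fallback yields the generic class name `partial` rather than the name of the wrapped function, which is less specific but still readable.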

SparkQA · 2015-03-27T16:53:40Z

Test build #628 has started for PR 5206 at commit b814a12.

This patch merges cleanly.

davies · 2015-03-27T16:53:53Z

@ksonj LGTM, waiting for jenkins.

ping @JoshRosen

SparkQA · 2015-03-27T18:42:08Z

Test build #628 has finished for PR 5206 at commit b814a12.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2015-03-27T19:14:46Z

LGTM. I suppose that we could add a regression test for this, which might be as simple as calling udf() on a partial function. This could go into pyspark/sql/tests.py in the SQLTests class. I'd add a new method named test_udf_with_partial_function next to the existing test_udf (spark/python/pyspark/sql/tests.py, line 117 in 887e1b7: def test_udf(self):).

Also, I noticed that this was opened against branch-1.3. That's fine for now, but in general we should open these sorts of pull requests against master, since this patch should be applied for both 1.4.0 and 1.3.1.

ksonj · 2015-03-30T08:20:18Z

I've added two tests for UDFs with partial functions and callable objects. Thanks for the hint, I'll open future PRs against master then.

JoshRosen · 2015-03-30T17:38:35Z

Jenkins, retest this please.

SparkQA · 2015-03-30T17:43:20Z

Test build #29416 has started for PR 5206 at commit d81b02b.

This patch merges cleanly.

SparkQA · 2015-03-30T17:45:10Z

Test build #29416 has finished for PR 5206 at commit d81b02b.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-30T17:45:11Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29416/
Test FAILed.

JoshRosen · 2015-03-30T17:46:27Z

Doh, pep8 is complaining about blank lines:

=========================================================================
Running Python style checks
=========================================================================
PEP 8 checks failed.
./python/pyspark/sql/tests.py:123:9: E301 expected 1 blank line, found 0
./python/pyspark/sql/tests.py:136:9: E301 expected 1 blank line, found 0
[error] Got a return code of 1 on line 134 of the run-tests script.
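
For reference, E301 is reported by the pep8 checker when a definition nested inside a class or function directly follows another statement with no blank line in between. The test code itself is not shown in this thread, so the following is only a sketch of the compliant layout, with a made-up class.

```python
class PlusFour:
    """Hypothetical callable used to show the spacing E301 expects."""

    offset = 4
    # The blank line below separates the attribute from the method;
    # without it the checker reports "E301 expected 1 blank line, found 0".

    def __call__(self, col):
        return col + self.offset


result = PlusFour()(1)
```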

ksonj · 2015-03-31T07:00:10Z

Fixed that. @JoshRosen

JoshRosen · 2015-04-01T20:35:17Z

Jenkins, retest this please.

SparkQA · 2015-04-01T20:38:27Z

Test build #29561 has started for PR 5206 at commit ea66f3d.

This patch merges cleanly.

SparkQA · 2015-04-01T22:30:48Z

Test build #29561 has finished for PR 5206 at commit ea66f3d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-04-01T22:30:52Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29561/
Test PASSed.

JoshRosen · 2015-04-02T00:23:38Z

LGTM, so I'm going to merge this into master (1.4.0) and branch-1.3 (1.3.1). Thanks!

Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used.

Author: ksonj <kson@siberie.de>

Closes #5206 from ksonj/partials and squashes the following commits:

ea66f3d [ksonj] Inserted blank lines for PEP8 compliance
d81b02b [ksonj] added tests for udf with partial function and callable object
2c76100 [ksonj] Makes UDFs work with all types of callables
b814a12 [ksonj] support functools.partial as udf


support functools.partial as udf

b814a12

davies reviewed Mar 26, 2015

Makes UDFs work with all types of callables

2c76100

added tests for udf with partial function and callable object

d81b02b

Inserted blank lines for PEP8 compliance

ea66f3d

asfgit closed this in 757b2e9 Apr 2, 2015
