Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6553] [pyspark] Support functools.partial as UDF #5206

Closed
wants to merge 4 commits into from

Conversation

ksonj
Copy link
Contributor

@ksonj ksonj commented Mar 26, 2015

Use f.__repr__() instead of f.__name__ when instantiating UserDefinedFunctions, so functools.partials may be used.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yishenggudou
Copy link

_

@@ -123,7 +123,7 @@ def _create_judf(self):
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
ssql_ctx = sc._jvm.SQLContext(sc._jsc.sc())
jdt = ssql_ctx.parseDataType(self.returnType.json())
judf = sc._jvm.UserDefinedPythonFunction(f.__name__, bytearray(pickled_command), env,
judf = sc._jvm.UserDefinedPythonFunction(f.__repr__(), bytearray(pickled_command), env,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try to get the name from it? Having a friendly name will help debugging.

name = f.__name__ if hasattr(f, “__name__") else f.__class__.__name__

@davies
Copy link
Contributor

davies commented Mar 26, 2015

Jenkins, OK to test

@ksonj
Copy link
Contributor Author

ksonj commented Mar 27, 2015

Hi, I like

name = f.__name__ if hasattr(f, “__name__") else f.__class__.__name__

as that should even work for oldstyle callables.

@SparkQA
Copy link

SparkQA commented Mar 27, 2015

Test build #628 has started for PR 5206 at commit b814a12.

  • This patch merges cleanly.

@davies
Copy link
Contributor

davies commented Mar 27, 2015

@ksonj LGTM, waiting for jenkins.

ping @JoshRosen

@SparkQA
Copy link

SparkQA commented Mar 27, 2015

Test build #628 has finished for PR 5206 at commit b814a12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

LGTM. I suppose that we could add a regression test for this, which might be as simple as calling udf() on a partial function. This could go into pyspark/sql/tests.py in the SQLTests class (I'd add a new method named test_udf_with_partial_function; here's a link to the relevant part of the test suite:

def test_udf(self):
).

Also, I noticed that this was opened against branch-1.3. That's fine for now, but in general we should open these sorts of pull requests against master, since this patch should be applied for both 1.4.0 and 1.3.1.

@ksonj
Copy link
Contributor Author

ksonj commented Mar 30, 2015

I've added two tests for UDFs with partial functions and callable objects. Thanks for the hint, I'll open future PRs against master then.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29416 has started for PR 5206 at commit d81b02b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29416 has finished for PR 5206 at commit d81b02b.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29416/
Test FAILed.

@JoshRosen
Copy link
Contributor

Doh, pep8 is complaining about blank lines:

=========================================================================
Running Python style checks
=========================================================================
PEP 8 checks failed.
./python/pyspark/sql/tests.py:123:9: E301 expected 1 blank line, found 0
./python/pyspark/sql/tests.py:136:9: E301 expected 1 blank line, found 0
[error] Got a return code of 1 on line 134 of the run-tests script.

@ksonj
Copy link
Contributor Author

ksonj commented Mar 31, 2015

Fixed that. @JoshRosen

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29561 has started for PR 5206 at commit ea66f3d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29561 has finished for PR 5206 at commit ea66f3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29561/
Test PASSed.

@JoshRosen
Copy link
Contributor

LGTM, so I'm going to merge this into master (1.4.0) and branch-1.3 (1.3.1). Thanks!

asfgit pushed a commit that referenced this pull request Apr 2, 2015
Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used.

Author: ksonj <kson@siberie.de>

Closes #5206 from ksonj/partials and squashes the following commits:

ea66f3d [ksonj] Inserted blank lines for PEP8 compliance
d81b02b [ksonj] added tests for udf with partial function and callable object
2c76100 [ksonj] Makes UDFs work with all types of callables
b814a12 [ksonj] support functools.partial as udf
@asfgit asfgit closed this in 757b2e9 Apr 2, 2015
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
PRs Merged
1. [Internal] Add AppleAwsClientFactory for Mascot (apache#577)
2. Hive: Log new metadata location in commit (apache#4681)
3. change timeout to 120 for now (apache#661)
4. Internal: Add hive_catalog parameter to SparkCatalog (apache#670)
5. Internal: Pull catalog setting to CachedClientPool (apache#673)
6. Core: Defer reading Avro metadata until ManifestFile is read (apache#5206)
7. API: Fix ID assignment in schema merging (apache#5395)
8. AWS: S3OutputStream - failure to close should persist on subsequent close calls (apache#5311)
9. API: Allow schema updates to find fields with case-insensitivity (apache#5440)
10. Spark 3.3: Spark mergeSchema to respect Spark Case Sensitivity Configuration (apache#5441)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants