-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42191][SQL] Support udf 'luhn_check' #39747
Conversation
Hi @dtenedor, @srielau, @HyukjinKwon @cloud-fan , @gengliangwang |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Outdated
Show resolved
Hide resolved
The general algorithm and test coverage look correct |
80e6dbc
to
bdba53c
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
bdba53c
to
57c9c74
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Outdated
Show resolved
Hide resolved
906622a
to
ccf8243
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
ff77d09
to
23095da
Compare
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Show resolved
Hide resolved
I'm not super against this, but this seems too niche to put into the Spark API |
Hi @srowen I get where you're coming from. For background, we have a Jira [1] to add a suite of data masking functions into Spark. This is one of a family of such proposed functions. The idea is to support ETL with redaction of original datasets to make it safer for sharing. |
Right, I don't think most of those belong in the Spark API. This is what UDF and utility libs are for |
@srowen yeah, we should apply discretion for those functions on that list. There's something to be said for having a built-in library of functions that are useful enough for general-purpose consumption, but not everything belongs in there. Maybe for this particular case we can let it through as it is in fact present in Postgres and Trino (as linked in the PR description) but we could do a review of the others and make go/no-go decisions on each one together. WDYT? |
23095da
to
ca8f807
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please address comments and add requested tests from @cloud-fan before merging ✅
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
Outdated
Show resolved
Hide resolved
ca8f807
to
207f391
Compare
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
3df9507
to
899555d
Compare
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
e0c0d6c
to
b7af875
Compare
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
b7af875
to
6044c00
Compare
thanks, merging to master! |
What changes were proposed in this pull request?
This PR adds a built-in function to check if a given number string is a valid Luhn number. It shall return true if the number string is a valid Luhn number, and false otherwise.
Why are the changes needed?
This checksum function is widely applied to credit card numbers and government identification numbers to distinguish valid numbers from mistyped, incorrect numbers
Ref : Trino
Postgresql
Does this PR introduce any user-facing change?
Yes, new udf
luhn_check
How was this patch tested?
Added test cases