[SPARK-42191][SQL] Support udf 'luhn_check' #39747

Closed
wants to merge 1 commit into from

vinodkc
Contributor

@vinodkc vinodkc commented Jan 25, 2023

What changes were proposed in this pull request?

This PR adds a built-in function that checks whether a given string of digits is a valid Luhn number. It returns true if the string passes the Luhn checksum, and false otherwise.
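
For readers unfamiliar with the checksum, here is a minimal Python sketch of the Luhn check. It is illustrative only and is not the code in this PR: the function name simply mirrors the proposed SQL function, and treating empty or non-digit input as invalid is an assumption rather than documented behavior.

```python
def luhn_check(number: str) -> bool:
    """Return True if `number` is a non-empty ASCII digit string that passes the Luhn checksum.

    Illustrative sketch only; rejecting empty and non-digit input is an assumption.
    """
    if not number or not all("0" <= ch <= "9" for ch in number):
        return False

    total = 0
    # Walk the digits from right to left; the rightmost digit is the check digit.
    for i, ch in enumerate(reversed(number)):
        digit = ord(ch) - ord("0")
        if i % 2 == 1:
            # Double every second digit; subtracting 9 equals summing the two digits.
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit

    # The string is valid when the weighted digit sum is a multiple of 10.
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_check("79927398713"))  # True
    print(luhn_check("79927398714"))  # False
```

A single mistyped digit changes the sum modulo 10, which is why the check catches the most common typing errors.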

Why are the changes needed?

This checksum function is widely applied to credit card numbers and government identification numbers to distinguish valid numbers from mistyped or otherwise incorrect ones.
Ref: Trino, PostgreSQL

Does this PR introduce any user-facing change?

Yes. It adds a new UDF, luhn_check.

How was this patch tested?

Added test cases

@github-actions github-actions bot added the SQL label Jan 25, 2023
@vinodkc vinodkc changed the title from "[SPARK-40686][SQL] Support udf 'luhn_check'" to "[SPARK-42191][SQL] Support udf 'luhn_check'" Jan 25, 2023
@vinodkc
Contributor Author

vinodkc commented Jan 25, 2023

Hi @dtenedor, @srielau, @HyukjinKwon, @cloud-fan, @gengliangwang,
Could you please review this PR?

@dtenedor
Contributor

The general algorithm and test coverage look correct

@vinodkc vinodkc force-pushed the br_udf_luhn_check branch 2 times, most recently from 80e6dbc to bdba53c Compare January 26, 2023 02:02
@vinodkc vinodkc force-pushed the br_udf_luhn_check branch 3 times, most recently from 906622a to ccf8243 Compare January 27, 2023 22:47
@vinodkc vinodkc force-pushed the br_udf_luhn_check branch 4 times, most recently from ff77d09 to 23095da Compare January 30, 2023 03:13
@srowen
Member

srowen commented Jan 30, 2023

I'm not super against this, but this seems too niche to put into the Spark API

@dtenedor
Contributor

Hi @srowen I get where you're coming from. For background, we have a Jira [1] to add a suite of data masking functions into Spark. This is one of a family of such proposed functions. The idea is to support ETL with redaction of original datasets to make it safer for sharing.

[1] https://issues.apache.org/jira/browse/SPARK-40686

@srowen
Member

srowen commented Jan 30, 2023

Right, I don't think most of those belong in the Spark API. This is what UDF and utility libs are for

@dtenedor
Contributor

@srowen yeah, we should apply discretion for those functions on that list. There's something to be said for having a built-in library of functions that are useful enough for general-purpose consumption, but not everything belongs in there.

Maybe for this particular case we can let it through, as it is in fact present in Postgres and Trino (as linked in the PR description), but we could do a review of the others and make go/no-go decisions on each one together. WDYT?

Contributor

@dtenedor dtenedor left a comment

LGTM, please address comments and add requested tests from @cloud-fan before merging ✅

@vinodkc vinodkc force-pushed the br_udf_luhn_check branch 2 times, most recently from 3df9507 to 899555d Compare January 31, 2023 04:41
@vinodkc vinodkc force-pushed the br_udf_luhn_check branch 2 times, most recently from e0c0d6c to b7af875 Compare January 31, 2023 05:22
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b0ac061 Feb 1, 2023