Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48584][SQL] Perf improvement for unescapePathName #46938

Closed
wants to merge 3 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jun 11, 2024

What changes were proposed in this pull request?

This PR improves perf for unescapePathName with algorithms briefly described as:

  • If a path contains no '%' or contains '%' at position > path.length-2, we return the original identity instead of creating a new StringBuilder to append char by char
  • Otherwise, we loop with 2 indices, plaintextStartIdx which starts from 0 and then points to the next char after resolving %xx, and plaintextEndIdx which points to the next '%'. plaintextStartIdx moves to plaintextEndIdx + 3 if %xx is valid, or moves to plaintextEndIdx + 1 if %xx is invalid.
  • Instead of using Integer.parseInt with error capture, we identify the high and low characters manually.

Why are the changes needed?

performance improvement for hotspots

Does this PR introduce any user-facing change?

no

How was this patch tested?

  • new tests in ExternalCatalogUtilsSuite
  • Benchmark results (9-11x faster)

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Jun 11, 2024
@yaooqinn yaooqinn changed the title [SPARK-48584][SQL]Perf improvement for unescapePathName [SPARK-48584][SQL] Perf improvement for unescapePathName Jun 11, 2024
@yaooqinn
Copy link
Member Author

@beliefer
Copy link
Contributor

Could you provide some micro benchmark?

@yaooqinn
Copy link
Member Author

Could you provide some micro benchmark?

What do you mean by some micro benchmark? Are sql/catalyst/benchmarks/EscapePathBenchmark-results.txt and sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt not sufficient ?

@beliefer
Copy link
Contributor

Oh, I see.

@LuciferYang
Copy link
Contributor

It seems that there is a typo in the pr description:

Instead of using Integer.parseInt with error capture, we identify the high and low chararaters manually.

chararaters -> characters

@yaooqinn
Copy link
Member Author

Thank you @LuciferYang

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM
Thanks @yaooqinn

}
var plaintextEndIdx = path.indexOf('%')
val length = path.length
if (plaintextEndIdx == -1 || plaintextEndIdx + 2 > path.length) {
Copy link
Contributor

@beliefer beliefer Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if (plaintextEndIdx == -1 || plaintextEndIdx + 2 > path.length || path.lastIndexOf('%') == 0) ?

Copy link
Member Author

@yaooqinn yaooqinn Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| path.lastIndexOf('%')

I'm sorry, I'm having a little trouble understanding your suggestion. Can you please clarify?

@yaooqinn yaooqinn closed this in da81d8e Jun 12, 2024
@yaooqinn yaooqinn deleted the SPARK-48584 branch June 12, 2024 08:40
@yaooqinn
Copy link
Member Author

Thank you @LuciferYang, I added a nit in the last commit and passed the test locally, so I didn't wait for the CI.

Merged to master

beliefer added a commit that referenced this pull request Jun 13, 2024
### What changes were proposed in this pull request?
This PR follows up #46938 and improve the `unescapePathName`.

### Why are the changes needed?
Improve the `unescapePathName` by cut off slow path.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
GA.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #46957 from beliefer/SPARK-48584_followup.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: beliefer <beliefer@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants