Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid O(N) string allocations in escapeContent #18

Closed
InPermutation opened this issue Oct 4, 2023 · 4 comments
Closed

Avoid O(N) string allocations in escapeContent #18

InPermutation opened this issue Oct 4, 2023 · 4 comments

Comments

@InPermutation
Copy link
Contributor

Like in #16, escapeContent is rebuilding the entire input string character-by-character, and can be greatly optimized by using a single .replace(/RegExp/g).

@nbouvrette
Copy link
Contributor

@InPermutation I had some time this morning, and I think my fix in version 3.2.25 should improve the performance of escapeContent - let me know if you think we can improve it further.

@InPermutation
Copy link
Contributor Author

Looks like it will be much faster to me, though I haven't --perf'd it.

  1. You could conditionally remove |[^\u0020-\u007E] from the RegExp when !escapeUnicode, which would avoid matching on those characters when you're going to just return them as-is.
  2. I think there's a bug in case ' ' - it looks like it will escape all spaces if the first character is a space. You can take a second parameter on your replacer function named position and keep the original escapeSpace || position === 0 ? '\\ ' : ' ' logic. Or, maybe just have a second .replace(/^ /, "\\ ") outside the main .replace(/[\s!#:=\\] …, and then you can conditionally remove space from the RegExp when !escapeSpace, similar to point 1.

@nbouvrette nbouvrette reopened this Oct 5, 2023
@nbouvrette
Copy link
Contributor

Ok cool I re-opened the issue and will look into these!

@nbouvrette
Copy link
Contributor

@InPermutation please take a look at version 3.3.0 which should provide an even more optimized solution. I also added a follow up fix in 3.3.1.

The main challenge with .properties files is that there isn't any official specification. Regarding how spaces are handled, based on the Java implementation, I've observed:

  1. Spaces in all keys must be escaped to be preserved.
  2. Only the first leading space in values must be escaped to be preserved.

You were right about the bug; it did escape all leading spaces in values. Fortunately, it didn't necessarily cause a problem because they would just be converted back to normal spaces. With the latest changes, it should now escape only the first leading space in values.

The regular expression should also be optimized based on the escapeUnicode value.

I'm fairly confident about closing this issue, but please let me know if you spot anything I might have missed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants