remove invisible Unicode special characters from search keywords #16483
Description
The change made in 9a859d9 (for #16430) is a bit too aggressive and breaks search for languages written in multibyte characters (e.g. Japanese). Search keywords for those languages are not stored in the database as expected (e.g. setting the entry title to 山田 results in an empty keyword being stored).
Instead of removing all Unicode special chars, I used the same approach we’re using for sanitising filenames, which removes only the invisible Unicode chars (like soft hyphen, no-break space, zero-width space, etc).
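For illustration only, here is a minimal Python sketch of the "remove only invisible characters" idea. It is not the code from this PR: the function name `strip_invisible` and the exact character list are assumptions, and the set used by the filename sanitiser may differ.

```python
import re

# Hypothetical list of invisible characters, chosen for illustration:
# soft hyphen, zero-width space, zero-width non-joiner, zero-width joiner,
# word joiner, and zero-width no-break space (BOM).
_INVISIBLE_CHARS = re.compile("[\u00ad\u200b\u200c\u200d\u2060\ufeff]")


def strip_invisible(text: str) -> str:
    """Remove invisible characters while leaving all visible text intact."""
    return _INVISIBLE_CHARS.sub("", text)


# Multibyte text such as Japanese passes through unchanged.
print(strip_invisible("山田"))
# Invisible characters embedded in a keyword are dropped.
print(strip_invisible("key\u00adword\u200b"))
```

Because the pattern lists specific code points rather than matching everything outside a narrow "safe" range, visible multibyte characters are never touched.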
Related issues
#16430
#16457