Encountering issues with Chinese URL encoding on some websites #398

Open
Keyezi opened this issue Dec 27, 2024 · 0 comments

Keyezi commented Dec 27, 2024

The following was written with the help of machine translation. If anything is unclear or incorrect, please let me know and I will add more detail. Thank you.

When I search for '宠物' (pets) on this website with cleanURLs enabled, the keyword in the URL becomes garbled, so the search no longer returns the right results.

https://s.1688.com/selloffer/offer_search.htm?keywords=%B3%E8%CE%EF
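The keyword in this URL is percent-encoded from GBK bytes rather than UTF-8 bytes. A minimal sketch using only Python's standard `urllib.parse` shows the difference; the variable names are illustrative and nothing here is taken from the extension's code:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

keyword = "宠物"

# Percent-encoding operates on bytes, so the output depends on the charset.
gbk_encoded = quote(keyword, encoding="gbk")     # form used in the URL above
utf8_encoded = quote(keyword, encoding="utf-8")  # form a UTF-8 site would use

print(gbk_encoded)
print(utf8_encoded)

# The raw bytes carried by the URL are not valid UTF-8 ...
raw = unquote_to_bytes("%B3%E8%CE%EF")
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# ... but they decode cleanly when treated as GBK.
decoded = unquote("%B3%E8%CE%EF", encoding="gbk")
print(is_valid_utf8, decoded)
```

Any tool that decodes this query string as UTF-8 will therefore either fail or substitute replacement characters.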

This is the result that a normal search should return:
image

With cleanURLs enabled, the keyword no longer seems to be percent-encoded correctly, and the result shows a classic Chinese mojibake pattern. See the article quoted below for details.
image

------------------------------------------------------------

The 锟斤拷 mojibake problem

https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%96%87%E4%BA%82%E7%A2%BC#%E9%94%9F%E6%96%A4%E6%8B%B7%E4%B9%B1%E7%A0%81%E9%97%AE%E9%A2%98

When text is converted between Unicode and a Simplified Chinese encoding (such as GB 2312, GBK, GB 18030, or CP936), byte sequences that cannot be mapped to a Unicode character are recorded as the replacement character U+FFFD, which is EF BF BD in UTF-8. When several EF BF BD sequences appear in a row and are then interpreted as a Simplified Chinese encoding, they are decoded as repeated "锟斤拷".
In GBK, the three characters are encoded as 锟 (0xEFBF), 斤 (0xBDEF), and 拷 (0xBFBD).
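The mechanism described above can be reproduced with the keyword from this issue. This is a sketch of one plausible failure path (decode as UTF-8 with replacement, then re-encode); it is an assumption about what happens, not a confirmed trace of the extension's behaviour:

```python
from urllib.parse import quote, unquote

gbk_query = "%B3%E8%CE%EF"  # "宠物" percent-encoded from GBK bytes

# Step 1: unquote() defaults to encoding="utf-8", errors="replace",
# so every undecodable byte becomes U+FFFD.
lossy = unquote(gbk_query)

# Step 2: re-encoding the damaged string writes EF BF BD for each U+FFFD.
reencoded = quote(lossy)

# Step 3: a GBK server reads those UTF-8 bytes two at a time.
mojibake = lossy.encode("utf-8").decode("gbk")

print(reencoded)
print(mojibake)
```

After step 1 the original bytes are gone, so no later step can recover the keyword.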
------------------------------------------------------------

URL Encode

https://en.wikipedia.org/wiki/Percent-encoding
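One way a URL cleaner could avoid this is to remove unwanted parameters without ever decoding the values it keeps. The sketch below is hypothetical: `strip_params` and the `spm=example` parameter are made up for illustration and do not come from the extension or from the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_params(url, blocked):
    """Drop blocked query parameters; keep the remaining pairs byte-for-byte."""
    parts = urlsplit(url)
    kept = [
        pair for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] not in blocked
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))

# Hypothetical input: the URL from this issue plus an invented extra parameter.
url = ("https://s.1688.com/selloffer/offer_search.htm"
       "?keywords=%B3%E8%CE%EF&spm=example")

cleaned = strip_params(url, {"spm"})
print(cleaned)

# For contrast: a decode/re-encode round trip assumes UTF-8 and loses the keyword.
round_tripped = urlencode(parse_qsl(urlsplit(url).query))
print(round_tripped)
```

The design choice is to treat the query string as opaque text: only parameter names are inspected, so the charset of the values never matters.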
