UTF-16 Support #914
Comments
... And here's one in little endian with a byte order mark (Specifically,
If we support UTF-16, someone will want it to support UTF-16LE, then Windows-8859, and before long someone's asking for KOI8-R. Now we need some function, Now say you get a match. Better print out the line that matches, right? But it would be more accurate to print out the UTF-16 bytes, but that will probably break on your terminal (which almost universally expects UTF-8). And, say your run
well, I would say, a reasonable feature
tl;dr - Encodings other than UTF-8 are fundamentally anti-unix. Just save your files as UTF-8.
I'm not sure that the POSIX terminal encoding being UTF-8 discounts the existence of use cases for UTF-16 files, though they're definitely rare. I mean, sure, supporting another Unicode encoding is a slippery slope to a Navajo translation layer, but UTF-16 files are the preferred format for non-English localization files, which happen to be part of our codebase. Not all use cases for grep-like tools are on small git repositories you have full control over :(
It would be nice, at least, if encountering a BOM emitted a warning that the file was skipped instead of silently searching it as UTF-8. When I originally made this bug it was because I spent some time confused as to why I couldn't locate a string that I knew was there, before I worked out that it was in a localization file and ag would silently ignore it.
I do agree that it is a pretty low priority request.
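To make the warn-on-BOM suggestion concrete, here is a rough sketch in Python (ag itself is written in C, so this is only an illustration, not a patch). The names `detect_bom` and `skip_warning` and the warning text are made up for this example. It only looks at the first few bytes of a file, so it would not catch UTF-16 files saved without a BOM.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def skip_warning(path: str, head: bytes):
    """Return a warning string if the file would be skipped, else None.

    Hypothetical policy: anything with a non-UTF-8 BOM is skipped loudly
    instead of being searched as if it were UTF-8.
    """
    enc = detect_bom(head)
    if enc is not None and enc != "utf-8":
        return f"warning: skipping {path}: {enc} byte order mark found"
    return None
```

Something like `skip_warning(path, first_four_bytes)` could be called once per file before searching, which keeps the cost to a handful of byte comparisons.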
I don't think this is the right way to do that. The performance hit would be huge.
Encoding a file as UTF-16, either little or big endian, with or without a BOM, results in ag treating it as binary.
I've uploaded a test utf16.txt file here:
https://nemu.pointysoftware.net/sink/utf16.txt
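For anyone wondering why this happens: a likely cause is that ASCII-range text encoded as UTF-16 has a zero byte in every code unit, and many grep-like tools treat any NUL byte as a sign of binary data. The sketch below assumes that heuristic; `looks_binary` is a hypothetical stand-in, not ag's actual check.

```python
def looks_binary(buf: bytes) -> bool:
    # Assumed heuristic for illustration: any NUL byte means "binary".
    return b"\x00" in buf


text = "hello"
utf8 = text.encode("utf-8")         # one byte per ASCII character
utf16le = text.encode("utf-16-le")  # low byte first, then a zero byte
utf16be = text.encode("utf-16-be")  # zero byte first, then the low byte

for name, data in [("utf-8", utf8), ("utf-16-le", utf16le), ("utf-16-be", utf16be)]:
    print(name, data, "binary" if looks_binary(data) else "text")

# Decoding first makes the content searchable as ordinary text again.
found = "ell" in utf16le.decode("utf-16-le")
```

Under that assumption the BOM makes no difference: both byte orders trip the check with or without it, which matches what the report describes.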