-
Notifications
You must be signed in to change notification settings - Fork 12.9k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Rollup merge of #95967 - CAD97:from-utf16, r=dtolnay
Add explicit-endian String::from_utf16 variants This adds the following APIs under `feature(str_from_utf16_endian)`: ```rust impl String { pub fn from_utf16le(v: &[u8]) -> Result<String, FromUtf16Error>; pub fn from_utf16le_lossy(v: &[u8]) -> String; pub fn from_utf16be(v: &[u8]) -> Result<String, FromUtf16Error>; pub fn from_utf16be_lossy(v: &[u8]) -> String; } ``` These are versions of `String::from_utf16` that explicitly take [UTF-16LE and UTF-16BE](https://unicode.org/faq/utf_bom.html#gen7). Notably, we can do better than just the obvious `decode_utf16(v.array_chunks::<2>().copied().map(u16::from_le_bytes)).collect()` in that: - We handle the case where the byte slice is not an even number of bytes, and - In the case that the UTF-16 is native endian and the slice is aligned, we can forward to `String::from_utf16`. If the Unicode Consortium actively defines how to handle character replacement when decoding a UTF-16 bytestream with a trailing odd byte, I was unable to find reference. However, the behavior implemented here is fairly self-evidently correct: replace the single errant byte with the replacement character.
- Loading branch information
Showing
1 changed file
with
150 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters