Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MAX_UTF{8, 16}_LEN constants #98198

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions library/alloc/src/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@
#![stable(feature = "rust1", since = "1.0.0")]

#[cfg(not(no_global_oom_handling))]
use core::char::{decode_utf16, REPLACEMENT_CHARACTER};
use core::char::{decode_utf16, MAX_UTF8_LEN, REPLACEMENT_CHARACTER};
use core::fmt;
use core::hash;
#[cfg(not(no_global_oom_handling))]
Expand Down Expand Up @@ -1221,7 +1221,7 @@ impl String {
pub fn push(&mut self, ch: char) {
match ch.len_utf8() {
1 => self.vec.push(ch as u8),
_ => self.vec.extend_from_slice(ch.encode_utf8(&mut [0; 4]).as_bytes()),
_ => self.vec.extend_from_slice(ch.encode_utf8(&mut [0; MAX_UTF8_LEN]).as_bytes()),
}
}

Expand Down Expand Up @@ -1528,7 +1528,7 @@ impl String {
#[stable(feature = "rust1", since = "1.0.0")]
pub fn insert(&mut self, idx: usize, ch: char) {
assert!(self.is_char_boundary(idx));
let mut bits = [0; 4];
let mut bits = [0; MAX_UTF8_LEN];
let bits = ch.encode_utf8(&mut bits).as_bytes();

unsafe {
Expand Down Expand Up @@ -2497,7 +2497,7 @@ impl<T: fmt::Display + ?Sized> ToString for T {
impl ToString for char {
#[inline]
fn to_string(&self) -> String {
String::from(self.encode_utf8(&mut [0; 4]))
String::from(self.encode_utf8(&mut [0; MAX_UTF8_LEN]))
}
}

Expand Down
4 changes: 2 additions & 2 deletions library/alloc/tests/str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1217,7 +1217,7 @@ fn test_to_uppercase_rev_iterator() {
#[test]
#[cfg_attr(miri, ignore)] // Miri is too slow
fn test_chars_decoding() {
let mut bytes = [0; 4];
let mut bytes = [0; std::char::MAX_UTF8_LEN];
for c in (0..0x110000).filter_map(std::char::from_u32) {
let s = c.encode_utf8(&mut bytes);
if Some(c) != s.chars().next() {
Expand All @@ -1229,7 +1229,7 @@ fn test_chars_decoding() {
#[test]
#[cfg_attr(miri, ignore)] // Miri is too slow
fn test_chars_rev_decoding() {
let mut bytes = [0; 4];
let mut bytes = [0; std::char::MAX_UTF8_LEN];
for c in (0..0x110000).filter_map(std::char::from_u32) {
let s = c.encode_utf8(&mut bytes);
if Some(c) != s.chars().rev().next() {
Expand Down
10 changes: 10 additions & 0 deletions library/core/src/char/methods.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,16 @@ impl char {
#[stable(feature = "assoc_char_consts", since = "1.52.0")]
pub const UNICODE_VERSION: (u8, u8, u8) = crate::unicode::UNICODE_VERSION;

/// The maximum number of bytes required to [encode](char::encode_utf8) a
/// `char` in UTF-8.
#[stable(feature = "max_len", since = "1.63.0")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should go in unstable with a tracking issue, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could but i thought since these were constants that replaced literal values we could directly stabilise them. I'm fine with adding unstable feature if that's the way to go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's basically three possible ways forward:

  1. This is an internal code readability change, so the consts are pub(crate) and it goes in with a T-libs review.
  2. This is a new API going in unstable, so it can go in with just a review on a "seems reasonable" basis.
  3. This goes in insta-stable, at which point is needs a full T-libs-api FCP.

Basically we only do 3 if forced (because some things can't go in unstable). So please make it unstable with a tracking issue, and then it can just be approved. (Consider also doing some commit squashing.)

pub const MAX_UTF8_LEN: usize = 4;

/// The maximum number of 16-bit code units required to
/// [encode](char::encode_utf16) a `char` in UTF-16.
#[stable(feature = "max_len", since = "1.63.0")]
pub const MAX_UTF16_LEN: usize = 2;

/// Creates an iterator over the UTF-16 encoded code points in `iter`,
/// returning unpaired surrogates as `Err`s.
///
Expand Down
11 changes: 11 additions & 0 deletions library/core/src/char/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,17 @@ pub const REPLACEMENT_CHARACTER: char = char::REPLACEMENT_CHARACTER;
#[stable(feature = "unicode_version", since = "1.45.0")]
pub const UNICODE_VERSION: (u8, u8, u8) = char::UNICODE_VERSION;

/// The maximum number of bytes required to [encode](char::encode_utf8) a
/// `char` in UTF-8. Use [`char::MAX_UTF8_LEN`] instead.
#[stable(feature = "max_len", since = "1.63.0")]
pub const MAX_UTF8_LEN: usize = char::MAX_UTF8_LEN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pondering: are the module-level constants still desirable, now that we can have type-associated ones? It's unclear to me how to apply precedent from #68490 here -- the char module certainly can't go away. But if the doc comment on this constant says not to use it, why add it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeh i didn't think of type-associated ones, will change it to that


/// The maximum number of 16-bit code units required to
/// [encode](char::encode_utf16) a `char` in UTF-16. Use [`char::MAX_UTF16_LEN`]
/// instead.
#[stable(feature = "max_len", since = "1.63.0")]
pub const MAX_UTF16_LEN: usize = char::MAX_UTF16_LEN;

/// Creates an iterator over the UTF-16 encoded code points in `iter`, returning
/// unpaired surrogates as `Err`s. Use [`char::decode_utf16`] instead.
#[stable(feature = "decode_utf16", since = "1.9.0")]
Expand Down
6 changes: 3 additions & 3 deletions library/core/src/fmt/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
#![stable(feature = "rust1", since = "1.0.0")]

use crate::cell::{Cell, Ref, RefCell, RefMut, SyncUnsafeCell, UnsafeCell};
use crate::char::EscapeDebugExtArgs;
use crate::char::{EscapeDebugExtArgs, MAX_UTF8_LEN};
use crate::marker::PhantomData;
use crate::mem;
use crate::num::fmt as numfmt;
Expand Down Expand Up @@ -161,7 +161,7 @@ pub trait Write {
/// ```
#[stable(feature = "fmt_write_char", since = "1.1.0")]
fn write_char(&mut self, c: char) -> Result {
self.write_str(c.encode_utf8(&mut [0; 4]))
self.write_str(c.encode_utf8(&mut [0; MAX_UTF8_LEN]))
}

/// Glue for usage of the [`write!`] macro with implementors of this trait.
Expand Down Expand Up @@ -2225,7 +2225,7 @@ impl Display for char {
if f.width.is_none() && f.precision.is_none() {
f.write_char(*self)
} else {
f.pad(self.encode_utf8(&mut [0; 4]))
f.pad(self.encode_utf8(&mut [0; MAX_UTF8_LEN]))
}
}
}
Expand Down
2 changes: 1 addition & 1 deletion library/core/src/str/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1410,7 +1410,7 @@ impl<'a> Iterator for EncodeUtf16<'a> {
return Some(tmp);
}

let mut buf = [0; 2];
let mut buf = [0; char::MAX_UTF16_LEN];
self.chars.next().map(|ch| {
let n = ch.encode_utf16(&mut buf).len();
if n == 2 {
Expand Down
3 changes: 2 additions & 1 deletion library/core/src/str/pattern.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@
issue = "27721"
)]

use crate::char::MAX_UTF8_LEN;
use crate::cmp;
use crate::fmt;
use crate::slice::memchr;
Expand Down Expand Up @@ -540,7 +541,7 @@ impl<'a> Pattern<'a> for char {

#[inline]
fn into_searcher(self, haystack: &'a str) -> Self::Searcher {
let mut utf8_encoded = [0; 4];
let mut utf8_encoded = [0; MAX_UTF8_LEN];
let utf8_size = self.encode_utf8(&mut utf8_encoded).len();
CharSearcher {
haystack,
Expand Down
4 changes: 2 additions & 2 deletions library/core/tests/char.rs
Original file line number Diff line number Diff line change
Expand Up @@ -258,7 +258,7 @@ fn test_escape_unicode() {
#[test]
fn test_encode_utf8() {
fn check(input: char, expect: &[u8]) {
let mut buf = [0; 4];
let mut buf = [0; char::MAX_UTF8_LEN];
let ptr = buf.as_ptr();
let s = input.encode_utf8(&mut buf);
assert_eq!(s.as_ptr() as usize, ptr as usize);
Expand All @@ -275,7 +275,7 @@ fn test_encode_utf8() {
#[test]
fn test_encode_utf16() {
fn check(input: char, expect: &[u16]) {
let mut buf = [0; 2];
let mut buf = [0; char::MAX_UTF16_LEN];
let ptr = buf.as_mut_ptr();
let b = input.encode_utf16(&mut buf);
assert_eq!(b.as_mut_ptr() as usize, ptr as usize);
Expand Down
2 changes: 1 addition & 1 deletion library/std/src/fs/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ fn file_test_io_non_positional_read() {
#[test]
fn file_test_io_seek_and_tell_smoke_test() {
let message = "ten-four";
let mut read_mem = [0; 4];
let mut read_mem = [0; char::MAX_UTF8_LEN];
let set_cursor = 4 as u64;
let tell_pos_pre_read;
let tell_pos_post_read;
Expand Down
6 changes: 3 additions & 3 deletions library/std/src/sys/windows/stdio.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#![unstable(issue = "none", feature = "windows_stdio")]

use crate::char::decode_utf16;
use crate::char::{decode_utf16, MAX_UTF8_LEN};
use crate::cmp;
use crate::io;
use crate::os::windows::io::{FromRawHandle, IntoRawHandle};
Expand All @@ -27,7 +27,7 @@ pub struct Stderr {
}

struct IncompleteUtf8 {
bytes: [u8; 4],
bytes: [u8; MAX_UTF8_LEN],
len: u8,
}

Expand Down Expand Up @@ -377,7 +377,7 @@ fn utf16_to_utf8(utf16: &[u16], utf8: &mut [u8]) -> io::Result<usize> {

impl IncompleteUtf8 {
pub const fn new() -> IncompleteUtf8 {
IncompleteUtf8 { bytes: [0; 4], len: 0 }
IncompleteUtf8 { bytes: [0; MAX_UTF8_LEN], len: 0 }
}
}

Expand Down
4 changes: 2 additions & 2 deletions library/std/src/sys_common/wtf8.rs
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@ impl Wtf8Buf {
/// Copied from String::push
/// This does **not** include the WTF-8 concatenation check.
fn push_code_point_unchecked(&mut self, code_point: CodePoint) {
let mut bytes = [0; 4];
let mut bytes = [0; char::MAX_UTF8_LEN];
let bytes = char::encode_utf8_raw(code_point.value, &mut bytes);
self.bytes.extend_from_slice(bytes)
}
Expand Down Expand Up @@ -878,7 +878,7 @@ impl<'a> Iterator for EncodeWide<'a> {
return Some(tmp);
}

let mut buf = [0; 2];
let mut buf = [0; char::MAX_UTF16_LEN];
self.code_points.next().map(|code_point| {
let n = char::encode_utf16_raw(code_point.value, &mut buf).len();
if n == 2 {
Expand Down