Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Respect formatting flags for str-like OsStr #43830

Merged
merged 1 commit into from
Aug 23, 2017

Conversation

alexcrichton
Copy link
Member

Historically many Display and Debug implementations for OsStr-like
abstractions have gone through String::from_utf8_lossy, but this was updated
in #42613 to use an internal Utf8Lossy abstraction instead. This had the
unfortunate side effect of causing a regression (#43765) in code which relied on
these fmt trait implementations respecting the various formatting flags
specified.

This commit opportunistically adds back interpretation of formatting trait flags
in the "common case" where where OsStr-like "thing" is all valid utf-8 and can
delegate to the formatting implementation for str. This doesn't entirely solve
the regression as non-utf8 paths will format differently than they did before
still (in that they will not respect formatting flags), but this should solve
the regression for all "real world" use cases of paths and such. The door's also
still open for handling these flags in the future!

Closes #43765

@alexcrichton alexcrichton added beta-nominated Nominated for backporting to the compiler in the beta channel. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 12, 2017
Copy link
Contributor

@vitiral vitiral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a good stopgap to me. I might want to take a shot at fixing this more thouroughly in the future.

@@ -153,7 +153,21 @@ impl<'a> Iterator for Utf8LossyChunksIter<'a> {

impl fmt::Display for Utf8Lossy {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// If we're the empty string then our iterato won't actually yield
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterato -> iterator


#[test]
fn display_format_flags() {
assert_eq!(format!("a{:#<5}b", Path::new("").display()), "a#####b");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding these tests!

@vitiral
Copy link
Contributor

vitiral commented Aug 13, 2017

one of the issues with formatting non-utf8 strings (in the future) is what do we even do? Invalid characters are... invalid, so how many "spaces" do they take? (I assume zero)

@ollie27
Copy link
Member

ollie27 commented Aug 13, 2017

one of the issues with formatting non-utf8 strings (in the future) is what do we even do? Invalid characters are... invalid, so how many "spaces" do they take? (I assume zero)

You'd just need to match the behavior of:

rust/src/libcore/fmt/mod.rs

Lines 1111 to 1149 in adbce60

pub fn pad(&mut self, s: &str) -> Result {
// Make sure there's a fast path up front
if self.width.is_none() && self.precision.is_none() {
return self.buf.write_str(s);
}
// The `precision` field can be interpreted as a `max-width` for the
// string being formatted.
let s = if let Some(max) = self.precision {
// If our string is longer that the precision, then we must have
// truncation. However other flags like `fill`, `width` and `align`
// must act as always.
if let Some((i, _)) = s.char_indices().skip(max).next() {
&s[..i]
} else {
&s
}
} else {
&s
};
// The `width` field is more of a `min-width` parameter at this point.
match self.width {
// If we're under the maximum length, and there's no minimum length
// requirements, then we can just emit the string
None => self.buf.write_str(s),
// If we're under the maximum width, check if we're over the minimum
// width, if so it's as easy as just emitting the string.
Some(width) if s.chars().count() >= width => {
self.buf.write_str(s)
}
// If we're under both the maximum and the minimum width, then fill
// up the minimum width with the specified string + some alignment.
Some(width) => {
let align = rt::v1::Alignment::Left;
self.with_padding(width - s.chars().count(), align, |me| {
me.buf.write_str(s)
})
}
}
}
when given a string from to_string_lossy. So each replacement character just counts as 1.

Copy link
Member

@ollie27 ollie27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will also need to be applied to Wtf8 for it to work on Windows:

impl fmt::Display for Wtf8 {
fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
let wtf8_bytes = &self.bytes;
let mut pos = 0;
loop {
match self.next_surrogate(pos) {
Some((surrogate_pos, _)) => {
formatter.write_str(unsafe {
str::from_utf8_unchecked(&wtf8_bytes[pos .. surrogate_pos])
})?;
formatter.write_str(UTF8_REPLACEMENT_CHARACTER)?;
pos = surrogate_pos + 3;
},
None => {
formatter.write_str(unsafe {
str::from_utf8_unchecked(&wtf8_bytes[pos..])
})?;
return Ok(());
}
}
}
}
}

Historically many `Display` and `Debug` implementations for `OsStr`-like
abstractions have gone through `String::from_utf8_lossy`, but this was updated
in rust-lang#42613 to use an internal `Utf8Lossy` abstraction instead. This had the
unfortunate side effect of causing a regression (rust-lang#43765) in code which relied on
these `fmt` trait implementations respecting the various formatting flags
specified.

This commit opportunistically adds back interpretation of formatting trait flags
in the "common case" where where `OsStr`-like "thing" is all valid utf-8 and can
delegate to the formatting implementation for `str`. This doesn't entirely solve
the regression as non-utf8 paths will format differently than they did before
still (in that they will not respect formatting flags), but this should solve
the regression for all "real world" use cases of paths and such. The door's also
still open for handling these flags in the future!

Closes rust-lang#43765
@alexcrichton
Copy link
Member Author

Fixed typo and updated wtf-8 as well

@carols10cents
Copy link
Member

r? @nikomatsakis

@carols10cents carols10cents added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 14, 2017
@carols10cents
Copy link
Member

actually oops this is libs sooo

r? @aturon

@rust-highfive rust-highfive assigned aturon and unassigned nikomatsakis Aug 14, 2017
@alexcrichton
Copy link
Member Author

@bors: p=1

(beta backport)

@alexcrichton alexcrichton added the beta-accepted Accepted for backporting to the compiler in the beta channel. label Aug 22, 2017
@alexcrichton
Copy link
Member Author

Discussed and decided for backport today

@alexcrichton
Copy link
Member Author

We discussed as well that we should have a tracking bug for support all formatting flags here, so I've opened #44048

@aturon
Copy link
Member

aturon commented Aug 22, 2017

@bors: r+

@bors
Copy link
Contributor

bors commented Aug 22, 2017

📌 Commit 742ca0c has been approved by aturon

@bors
Copy link
Contributor

bors commented Aug 23, 2017

⌛ Testing commit 742ca0c with merge 036d625939a7cf53e708ec02aa847d2a62c235d1...

@bors
Copy link
Contributor

bors commented Aug 23, 2017

💔 Test failed - status-appveyor

@alexcrichton
Copy link
Member Author

@bors
Copy link
Contributor

bors commented Aug 23, 2017

⌛ Testing commit 742ca0c with merge 528307a...

bors added a commit that referenced this pull request Aug 23, 2017
std: Respect formatting flags for str-like OsStr

Historically many `Display` and `Debug` implementations for `OsStr`-like
abstractions have gone through `String::from_utf8_lossy`, but this was updated
in #42613 to use an internal `Utf8Lossy` abstraction instead. This had the
unfortunate side effect of causing a regression (#43765) in code which relied on
these `fmt` trait implementations respecting the various formatting flags
specified.

This commit opportunistically adds back interpretation of formatting trait flags
in the "common case" where where `OsStr`-like "thing" is all valid utf-8 and can
delegate to the formatting implementation for `str`. This doesn't entirely solve
the regression as non-utf8 paths will format differently than they did before
still (in that they will not respect formatting flags), but this should solve
the regression for all "real world" use cases of paths and such. The door's also
still open for handling these flags in the future!

Closes #43765
@bors
Copy link
Contributor

bors commented Aug 23, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: aturon
Pushing 528307a to master...

@bors bors merged commit 742ca0c into rust-lang:master Aug 23, 2017
bors added a commit that referenced this pull request Aug 23, 2017
[beta] Final round of beta backports

Includes:

* [x] #43723
* [x] #43830
* [x] #43844
* [x] #44013
* [x] #44049
* [x] #43948
@alexcrichton alexcrichton removed the beta-nominated Nominated for backporting to the compiler in the beta channel. label Aug 23, 2017
@alexcrichton alexcrichton deleted the path-display-regression branch September 1, 2017 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beta-accepted Accepted for backporting to the compiler in the beta channel. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants