Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better diagnostics when concatenating strings #39018

Closed
Manishearth opened this issue Jan 12, 2017 · 42 comments
Closed

Better diagnostics when concatenating strings #39018

Manishearth opened this issue Jan 12, 2017 · 42 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.

Comments

@Manishearth
Copy link
Member

A lot of people come from other languages and expect "a" + "b" to work, or x + y where x and y are both Strings.

We should be suggesting "a".to_string() + "b" in the first place, x + &y in the second, and in both cases also recommend using format!() instead.

Context: https://www.reddit.com/r/rust/comments/5nl3fk/rust_severely_disappoints_me/dccdi17/

@Manishearth Manishearth added the A-diagnostics Area: Messages for errors, warnings, and lints label Jan 12, 2017
@mgattozzi
Copy link
Contributor

I'd like to take a stab at this if I could get some guidance as to where things in the compiler need to be changed and how.

@Manishearth Manishearth added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Jan 12, 2017
@Manishearth
Copy link
Member Author

Sure! What you need to do is find the place in the code which emits the error on "a" + "b" (hint: error code E0369)

Somewhere there you'll want to add an if checking that the operation is add, and then check that the LHS and RHS are both &str (we'll deal with the String + String/&str + String ones later). In that code, the Ty values (lhs_ty and rhs_ty_var) hold these types. They contain TypeVariants, and &str is basically TyRef(TyStr).

Then, span_note! about it. Once you have this going I can help convert the note into a suggestion.

@mgattozzi
Copy link
Contributor

I'll be home soon so I'll get working on it then

@Manishearth
Copy link
Member Author

Cool. You can ask me (or others) for help in #rust-internals if you need it.

@mgattozzi
Copy link
Contributor

I found where after a bit of prodding so I think I have this part but if I run into problems I'll definitely jump on #rust-internals

@andersk
Copy link
Contributor

andersk commented Jan 13, 2017

For the String + StringString + &String part, that suggestion actually applies to anything that wants &str and is incorrectly passed String instead of &String, right? I don’t think concatenation is special.

It could even apply to anything that wants &T and is incorrectly passed U instead of &U where U: Deref<T>, though maybe that’s trickier.

@Manishearth
Copy link
Member Author

Yeah. It's just that the concatenation stuff is handled separately.

I was planning on investigating this (or help someone investigate this) after this bug is fixed, by refactoring the functionality introduced here into a separate function that gets used in multiple error cases.

@mgattozzi
Copy link
Contributor

@Manishearth here's the WIP that I have now: https://github.com/mgattozzi/rust/commit/007d45926f920b51cfab9924f0ee6461f527a2aa

Here's the output though:

michael@eva-02 ~/Code/rustc (git)-[better-string-message] % ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc string.rs
_
error[E0369]: binary operation `+` cannot be applied to type `&str`
 --> string.rs:4:20
  |
4 |     println!("{}", s1.as_str() + s2.as_str());
  |                    ^^^^^^^^^^^
  |
note: You can't use `+` between a &str and &str
 --> string.rs:4:20
  |
4 |     println!("{}", s1.as_str() + s2.as_str());
  |                    ^^^^^^^^^^^

error: aborting due to previous error

It seems the rhs_ty_var doesn't have a type that can be matched against. Is there some way to check for it here?

@Manishearth
Copy link
Member Author

Manishearth commented Jan 14, 2017

Hmm, _ is a TyInfer. Does it work with string literals?

You also might want to restructure the code so that the "an implementation of {} might be missing for {}" doesn't need to be repeated. It's fine for now if that note is unconditionally emitted.

@mgattozzi
Copy link
Contributor

Let me try it out with string literals to see if I get something different.

@mgattozzi
Copy link
Contributor

No it doesn't work. It looks like unless we identify what TyInfer is we won't be able to check the type. This is probably due to the note in the source code here: https://github.com/mgattozzi/rust/blob/007d45926f920b51cfab9924f0ee6461f527a2aa/src/librustc_typeck/check/op.rs#L178-L183

@Manishearth
Copy link
Member Author

ugh. You could check the AST if it's a literal but that's not so great.

Would there be a way of extracting this info? @eddyb ?

@eddyb
Copy link
Member

eddyb commented Jan 15, 2017

Checking for a literal is okay-ish. You might be able to type-check the RHS before emitting the error, giving you the missing information. cc @nikomatsakis

@Manishearth
Copy link
Member Author

Yeah, type checking the RHS is what I want to do, but I don't know how 😄

@eddyb
Copy link
Member

eddyb commented Jan 15, 2017

It's this call. If I were you I would keep an Option<DiagnosticWhatever> around and add the note & emit after that RHS type-checking line.
You can also duplicate that line (there's a simpler form without the coercion) and return early or something.
Oh and the call returns the actual type of the expression.

@mgattozzi
Copy link
Contributor

@eddyb thanks for the tip. I'm going to work out a rough (but somewhat naive) implementation that works and then hopefully get that refined from there with some guidance.

@mgattozzi
Copy link
Contributor

@Manishearth My most recent commit got it working now. While it's a bit naive (we could return early) it's doing what's needed for now. I think we can move from here into the next few parts of the issue.

@Manishearth
Copy link
Member Author

Okay, now try to suggest using to_string() there. Check out how span_suggestion is used elsewhere.

Make the error message mention explicitly that the first operand must be a String for concatenation.

@mgattozzi
Copy link
Contributor

mgattozzi commented Jan 16, 2017

I've got it outputting a message but I'd like it to be the string literal with to_string() attached at the end. I'm unsure how to extract it from lhs_expr but my guess is it's a Symbol embedded within an Expr. Right now the message output with the expr input as a debug print is:

help: Your first argument should be a String, not &str. Try using `to_string()`
|            println!("{}", expr(34: "Hello").to_string() + " World");

How would I get it so that only the "Hello" part prints? Also having it underlined or with more spaces on the line for readability would be nice as well, but I think that's secondary compared to getting this part done right.

@Manishearth
Copy link
Member Author

Don't extract the string literal at all. There might not be one.

Just use span_to_snippet on the expr, and compose that with a .to_string() and the rest of the expression.

span_suggestion is used as err.span_suggestion(span_to_replace, "suggestion text", "replacement text"). So you use the span of the entire addition expr (expr.span), and construct the rest using span_to_snippet on the operand expressions.

@mgattozzi
Copy link
Contributor

Ahhh okay I see what you're saying after digging through some other code for examples. When I get this part working I'll let you know.

@mgattozzi
Copy link
Contributor

@Manishearth This gets it working to output the proper help message and works regardless of how many uses of + are used: https://github.com/mgattozzi/rust/commit/03ec8f1de28e9b9c3e1e3e8ab5d6e55c8e42de66

Not sure what to put for the error message when it's destructured in the match statement there. I think next would also be matching on things like String + String and &String + &str or &String + String right as well as some cleanup of formatting here?

@Manishearth
Copy link
Member Author

Yeah, probably. Refactoring this into a method might help. It's also okay to land with just this one suggestion with a compile-fail test and fix the rest in a separate PR. Regardless, you probably should clean it up a bit and make a WIP PR so that it's easier to leave inline comments.

Not sure what to put for the error message when it's destructured in the match statement there

I don't know what you mean here?

@mgattozzi
Copy link
Contributor

Okay sounds good. If you look here it's just a blank string. I'm not sure what to place here in the case where span_to_snippet would fail.

@Manishearth
Copy link
Member Author

Oh. Use a <expression> or don't emit the suggestion at all.

@mgattozzi
Copy link
Contributor

Ahhhh okay got it. I've split it all out into a separate method that just checks for the use of &str. I'm giving it another compile to be sure. When it comes to the test cases where should I add them and is there a good example similar to this somewhere?

@Manishearth
Copy link
Member Author

src/tests/compile-fail. Look at other tests in that folder, basically you can annotate errors and suggestions and stuff with // ~^. You can run an individual test with ./mach test src/tests/compile-fail -- --test-args name-of-test-file, I think.

@mgattozzi
Copy link
Contributor

Alright I'll play around with it!

@mgattozzi
Copy link
Contributor

@Manishearth I've cleaned it up and put in a test case. The PR is here #39116

@nikomatsakis
Copy link
Contributor

Nice work @mgattozzi!

I have to ask though -- maybe we should just make adding strings work like people expect? I think the "fear of allocations" is the only reason not to? If so, it seems a bit overblown to me personally. I feel like I often get annoyed at places that we've made it less convenient to concatenate and manipulate strings because of the cost of allocation -- but usually the code where I'm manipulating strings is not in my hot path, or even if it is I often find I need the allocations anyway, I just have to express them differently. That said, the only major example of this I can come up with something around the way that join() works and I can't remember the details. =)

@Manishearth
Copy link
Member Author

I am .... conflicted about that. I would not oppose it however.

Diagnostics is the path of least resistance here; not contingent on an rfc (I think this would be) and not controversial, so I went that path 😄 . Changing this with an impl works too.

Similarly, #39029 can be replaced by a language change, I just am not as convinced if I want that language change to want to make an rfc for it.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jan 17, 2017

@nikomatsakis

I have to ask though -- maybe we should just make adding strings work like people expect? I think the "fear of allocations" is the only reason not to?

I thought coherence was the main problem, not the fear? (You can't impl Add<str> for str in places where String is available.)
Having at least String + String and &str + String would certainly be nice.

@mgattozzi
Copy link
Contributor

Thank you @nikomatsakis!

Personally I like that Rust is explicit. Sure it's inconvenient to have to add a to_string() or a & or as_str() to get the desired behavior, but it makes one think about whether they really need to allocate memory or not. This isn't necessarily desirable for newer users, but that's why I think this issue, providing diagnostics and helpful error messages, is a better route long term then something like impl Add.

@nikomatsakis
Copy link
Contributor

@petrochenkov

I thought coherence was the main problem, not the fear? (You can't impl Add for str in places where String is available.)

Ah, this may be true. I feel like the inability to extend coherence to cover a set of crates is an annoyance I would like to overcome. :(


@mgattozzi

Personally I like that Rust is explicit.

Certainly there is something of a trade-off involved.

@Manishearth
Copy link
Member Author

Ah, this may be true. I feel like the inability to extend coherence to cover a set of crates is an annoyance I would like to overcome. :(

I've often wanted the ability to forward-declare an impl, saying something like

#[forward_declare(StringAddition)]
impl Add<str> for str {
    type Target = _;
    fn add(&self, other: &str) -> Self::Target;
}

This generates an instance of a new kind of item with item name StringAddition. It can be used for inherent impls as well. We could just have a different syntax for the item, after all all items have a syntax itemtype /* randomstuff */ {} but I'm not sure if this can be done without breaking the grammar.

and later,

#[fills_declaration(core::StringAddition)]
impl Add<str> for str {
    type Target = String;
    fn add(&self, other: &str) -> String { /* body */}
}

The system works the same way lang items do; these forward-declared impls are left dangling and can't be used until someone defines them, and you can only have one definition active (much like how redefining lang items doesn't work). However, from the point of view of coherence and conflict resolution, these impls do exist even if not defined (I'm not sure if any situation arises where this matters outside of #[fundamental] types).

Thus, the impl exists in two places at once. From the POV of coherence, it exists in libcore, but from the POV of codegen and type resolution, it exists in libstd.

The main purpose is to handle cases like these where you have multiple separate crates that are nevertheless maintained as a single unit. Coherence exists so that I, a Servo developer, am not forced to deal with error messages that must be fixed in, say, hyper (e.g. when I include some other crate and now we have conflicting impls that can't be fixed on the servo-side). But in this case the developers of libstd and libcore are the same and it's fine to punch through coherence for this. Similarly, Servo itself is split into crates for maintainability and compile times, but being able to punch through coherence in the context of servo is pretty useful. Being able to say "I know that this can break if upstream-crate-X changes things, but I am upstream-crate-X and want to do it anyway" is useful. We've had to merge crates before because coherence was making things problematic.

@eddyb
Copy link
Member

eddyb commented Jan 17, 2017

    type Target = _;

You don't have to do that, just don't mention it, and might as well do the same with the method.
This could be done by reusing partial impls, I think, as a coherence "beachhead" marker.

@Manishearth
Copy link
Member Author

You don't have to do that, just don't mention it

You will have to do something for inherent impls though. I guess you can just define the item name and leave it at that. fn foo; or something.

@eddyb
Copy link
Member

eddyb commented Jan 17, 2017

Well... inherent impls don't do anything other than wrap the items inside, so each one would have to be annotated, and it'd be a bit of a mess.

@rocallahan
Copy link

FWIW I think suggesting to_owned() instead of to_string() is more clear. "Hello".to_string() reads like a no-op.

@scottmcm
Copy link
Member

It feels to me like String + String is reasonable to allow, since they're consumed and people are already likely used to adding a & if they want to re-use one later. It would even let the implementation re-use whichever one has a more appropriate buffer, without forcing the user to pick &str + String or String + &str a priori.

That does mean that people might end up doing "a".to_owned() + "b".to_owned() unnecessarily, as naiively seems reasonable for symmetry. That's visible cost, though, so doesn't feel too worrisome. That said, a general "you could have just used the Deref target, no ToOwner needed" lint could be interesting to help undo such accidental pessimization.

@Manishearth
Copy link
Member Author

pre-rfc'd the forward declaration thing in https://internals.rust-lang.org/t/pre-rfc-forward-impls/4628

bors added a commit that referenced this issue Feb 2, 2017
Add clearer error message using `&str + &str`

This is the first part of #39018. One of the common things for new users
coming from more dynamic languages like JavaScript, Python or Ruby is to
use `+` to concatenate strings. However, this doesn't work that way in
Rust unless the first type is a `String`. This commit adds a check for
this use case and outputs a new error as well as a suggestion to guide
the user towards the desired behavior. It also adds a new test case to
test the output of the error.
@Manishearth
Copy link
Member Author

Fixed in #39116

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
Projects
None yet
Development

No branches or pull requests

9 participants