-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow autoderef and autoref in operators #2147
Conversation
cc @cramertj |
So, as the drawbacks section says:
Passing something by value when it was expected to be passed by reference can lead to severe performance issues. Passing something by reference when it was expected to be passed by value can also lead to severe performance issues. And both can make the behavior of code less transparent to the person reading it. (Please note: I acknowledge that having to add In an effort to mitigate this: are there ways we could provide appropriate lints for potentially problematic cases? This RFC addresses the technical issues that made the previous RFC technically problematic to implement. I don't feel like it addresses the remaining issues for whether we should do this, and what protections we should provide to attempt to mitigate potential issues this will cause. |
@joshtriplett Can you give a concrete example? The RFC provides a concrete example of code that is made significantly less comprehensible by not having this feature, what is an example of something that would be made less comprehensible? As far as I can see, this would only really trigger if you only have by-ref impls. Its true, you won't know if an operation using a type you don't know is by ref or by value, but why is that so critical to know that its not enough to look up the impl? |
I'd be a little more concerned about |
If we allow them, they will only be used in pathologically unclear code. If we don't allow them, no code that is not pathologically unclear would miss them. Well, except for the "DSL maker" school that tries to overload syntax in every way possible. I'm not sure what's our best option there. |
text/0000-eye-of-sauron.md
Outdated
|
||
Lexicographic ordering feels like the right way to pick the adjustment list, but it might have unexpected edge cases which might justify a more complicated ordering. | ||
|
||
### Extended arithmetic fiixup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/fiixup/fixup/
@arielb1 I guess someone could make I'm finding the RFC a little unclear about this point. When you say:
Does that just mean the bindings involved need to be |
Not sure whether that's an argument for or against
It means that all the overloaded derefs/indexes in the middle need to implement |
I'll try to clarify things tomorrow. |
Your knowledge of this stuff is impressive! The way this compares to #2111 should be clarified. Although the RFCs share similar motivations they only overlap in obscure cases and solve orthogonal issues. What do you think of the alternative of waiting for intersection impls so that impl<T: Borrow<U>, U> Borrow<U> for &T Then people would implement their operators like: impl<T: Borrow<i32>, U: Borrow<i32>> Add<T> for U This would solve many common cases. I wonder if it would also work for indexing. |
Just fyi there are some analysis and benchmarks of using tl;dr It might work, but it requires considerable care, which makes writing math libraries harder. Also if autojazz works reasonably then it might leave more room for optimization. |
text/0000-eye-of-sauron.md
Outdated
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Like methods, operators and indexing support automatic referencing and dereferencing. When you use an operator, the compiler will automatically add `&`, `&mut` and `*` operators to match the signature of the operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to include &mut
here? The alternatives section mentions "Mutable autorefs ... is a bad idea".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over from an old version of the RFC.
|
||
In this case, these are merely annoying papercuts. In some other cases, they can be a much worse problem. | ||
|
||
One of thim is Isis Lovecruft's "Eye of Sauron" case, which is a problem when using non-`Copy` bignums: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/thim/them/
?
BTW I think linking to https://twitter.com/isislovecruft/status/839333846415998976 may provide some context what is meant by Isis Lovecruft's "Eye of Sauron" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to nit pick, there are no bignums in curve25519, only 256bit numbers, which do not count as bignums because they do not use Box<>
. Also, those numbers could easily be Copy
so if they are not it's presumably Isis and Henry found that 32 bytes is already big enough for Copy
to cause a performance penalty
text/0000-eye-of-sauron.md
Outdated
|
||
Both the LHS and the RHS (if binary) of the operator are first type-checked with no expected type. | ||
|
||
This differs from rustc 1.20, in which an expected type was sometimes propagated from the LHS into the RHS, potentially triggering coercions within the RHS. I should probably come up with an example in which this matters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one's probably a crazy example, but the RHS is definitely dependent on the LHS and the case is rather horrible:
let byte = 0u8;
let word = 1u16;
let x = word + &byte.into();
println!("{}", x);
This needs the &
for $reasons
Found in rust-lang/rust-clippy#2042
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one's probably a crazy example, but the RHS is definitely dependent on the LHS and the case is rather degenerate:
That's a trait-system issue. Operator overloading does succeed in this case. Without the autoref, you get these trait bounds:
u16: Add<$0>
u8: Into<$0>
Because the compiler only considers each trait bound alone, it can't see that the only solution is $0 = u16
.
If you add a reference, instead yoou get
u16: Add<&'a $0>
u8: Into<$0>
And on the first bound, only the impl Add<&u16> for u16
impl matches, which allows a solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also won't coerce: &1 + &mut 1
. As soon as you have two different implementations for the RHS it stops working. That coercion only fires in such rare situations that I wonder if it's existence is by design or an accident of implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case it's orthogonal to this RFC.
This won't solve the "Eye of Sauron" case with strings etc., because you need an autoref for that. |
@arielb1 I don't follow, we have |
@leodasvacas But without autoref, operators would still take their operands by value, and |
Also, with this RFC, |
@arielb1 I get it now, thanks! For types that are not |
text/0000-eye-of-sauron.md
Outdated
``` | ||
"Deref"* "Autoref(Immutable)" "ConvertArrayToSlice"? | ||
``` | ||
|
||
Adjustment lists for all other operands (including the RHS of indexing operations) are selected from these matching the following regular expression | ||
Adjustment lists for all other operands (including the RHS of indexing operator, as well as all operands of all other operators) are selected from these matching the following regular expression | ||
``` | ||
"Deref"* ( "Autoref(Immutable)" "ConvertArrayToSlice"? )? | ||
``` | ||
|
||
The adjustment lists selected are the lexicographically first pair of adjustment lists `(lhs_adjust, rhs_adjust)` (or with an unary op, just the `lhs_adjust`) such that | ||
A1. Both adjustment lists match the relevant regular expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw this list does not render well in markdown. Perhaps add some bullets?
I'm basically in favor of this. It seems to be essentially the 'obvious extension' of method-call dispatch to operator overloading. But I want to raise a question about possible programming patterns just to make the argments in favor more specific. When I work with
So now if you want to "escape" a local variable or parameter (e.g., to store in a field), you do Do you think that this pattern would apply to bignums? Where are the painpoints when you do this? Presumably one would implement In any case, as a historical note, the (Actually, before my time, @joshtriplett I am definitely wary of adding too much autoref sugar in general and finding that there are performance pathologies created, or just introducing plain confusion for users (as an example, when I teach ownership transfer, I always get a question about why That said, I do think that the damage here is somewhat mitigated by the fact that one must provide the impls to be used in the first place -- if you think a type is too expensive to pass by reference, or should only be passed by reference -- that is within your power to enforce (using newtypes, if necessary), no? |
@joshtriplett It'd be super helpful to get two or three concrete examples (i.e. code snippets) where this arises and you feel it's important to be alerted. |
The previous naming was ugly.
I'd imagine that If otoh you're working with non-crypto bignums, then your type might looks roughly like
so you'd want to pass by value. If you are working with crpytographic bignums for RSA, then you'll want roughly either a As an aside, |
Now with method dispatch documented. Turns out to be ugly and short. |
This happens to me also. But you haven't really explained why you want to do this. With this RFC, like the match RFC, the end goal is really to get to a place where you just don't have to go through this edit-compile-debug cycle over errors that are not semantically significant to your program. Its especially worth keeping in mind, with this RFC, that we're just applying a subset of the coercions we apply to method receivers to operators. That is, today you can write: let foo: i32 = 100;
foo.is_positive() We don't make you write: let foo: i32 = 100;
(&foo).is_positive() Even though In fact, the vec.iter().filter(|x| x.is_positive())
vec.iter().filter(|x| x > 0) But the second will not compile, and instead you'll have to do something like: vec.iter().filter(|x| **x > 0) This seems quite silly to me. |
@withoutboats The difference is "not semantically significant" in the same sense that the difference between I managed to find the blog post I was looking for: https://medium.com/@robertgrosse/how-copying-an-int-made-my-code-11-times-faster-f76c66312e0f This is the kind of problem that I think autoderef and autoref will make far more prevalent. |
@rfcbot fcp merge (We'll keep talking through this issue of course @joshtriplett, but I wanted to prep this with a disposition befoer the lang team meeting tomorrow.) |
Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
(emphasis added) Both of the bolded portions of your comment seem exaggerated to me. Let me try to organize several interwoven lines of argument for why I think this will not have a significant negative performance impact. The micro-optimization opportunities are infrequentFirst, let's define the scenario concretely. You have code like this: vec.iter().filter(|x| x > 0) In addition:
In other words, there's two very narrowing parameters here: you could perform the micro-optimization to remove the indirection, and it actually impacts performance to do it. I'm more likely to, by necessity or laziness, just write this: vec.iter().filter(|&&x| x > 0) In other words, the code the compiler would have inserted for me anyway. If you knew you should perform this optimization, you're likely to do it regardlessThe scenario you're describing is that because you received an error message, you realize that you can perform a micro-optimization. But this seems a little farfetched to me: most importantly, how do you know that its a micro-optimization, instead of a micro-pessimization? If you already know that removing the indirection will be faster, why did you introduce it in the first place? I see two scenarios in which you might know about the performance impact:
You might, today, remove indirection when you get errors like this, and that's fine. But are you really doing that for good reason, or based on a hunch that it will probably be faster? It just seems bad to me to introduce a re-edit cycle in order to get you to think about the performance of an indirection. A re-edit cycle has a huge cost, and if the performance impact of something like this really matters, you'll know some other way. The compiler can optimize many cases alreadyI believe that in a lot of these cases the difference between the possibilities (especially when you mention something like removing a reference through a match) is going to be a wash for the compiler. I believe these dereferences are largely reorderable, removable, and optimizable, and LLVM will only get better at doing this for you over time. The scenarios impacted by this RFC are not extremely commonLastly, as I discussed previously, we already do this kind of autoref and autoderef. We currently do it for method receivers, this only extends that behavior to most operators. This means the only time you get these kind of "prompting errors" today that you wouldn't after this RFC is when you use a binary operator on the indirected value. If instead you called methods on it, you get no error, and if you used it in other ways that give you an error today, that won't change as a result of this RFC. So talk about "far more prevalent" seems like a large exaggeration. Now, to take a contrary approach and try to lay out reasons why I think this is not only not bad but very good. :-) We have carefully benchmarked code in the wild that will benefit from this RFCToday, curve25519-dalek has concrete examples of nigh-unreadable code because they have found that passing their FieldNumber type by value has a serious deleterious performance impact. We have a clear example of production code that's trying to be competitive with handwritten assembly - the absolute high end for Rust performance - and which is hurt by our current semantics. If they weren't competing with handwritten assembly, the authors of curve25519-dalek might've been satisfied with the performance of by-value operations on their field numbers & not taken on the high syntactic burden of performing operations by reference. After this RFC, people writing high performance math code will be more likely to actually benchmark the difference between by-ref and by-value code than they are today (they'll just have to change the impls to get the different codegen, rather than throwing syntactic salt into every use site). As a result, this RFC could actually improve the performance in crates where it really counts. Users who don't need to care don't get botheredOn the other end of the spectrum, as I've demonstrated there are many cases where this "prompting error" is not helpful:
So, today, we have an error being introduced that users can usually fix only one way, and even if they could fix it the other way, it won't matter to them - either because the performance difference is not significant, or because performance is not really critical in their application (especially if they're still learning Rust, and systems programming through Rust). This error is just annoying to advanced users (which is bad), but to new users it can really be the end of their Rust experience. A lot of the "borrowing frustration" especially new users run against is not necessarily complex lifetime situations, but just this kind of "type tetris" juggling. Mitigating it can be a high impact way of easing the onboarding experience for people coming to Rust, which is probably our biggest user complaint. |
I really think this can help, but I also have a concern that's been growing on me recently, which is that it seems the various RFCs (like Match Ergonomics) are adding different ways of doing coercion in different places, and I wonder if these amount to local optimizations that make it hard for users to maintain a global understanding of what coercion is generally available. My recent example is this: It looks like this RFC will make "operators" more like the I find the function argument case particularly important because as it is, I find myself to be a bit wary of creating abstractions, because too often it means I lose some coercion that means I have to add hacks that make my code harder to read. |
Yes, I'd assume the current RFCs will incur a much longer path to stabilization, due to there being so many of them. Ideally, any doing funny business with |
(Signed off for @cramertj, who is on vacation) |
As per the commentary here and in the related coercions RFC, the lang team has decided to close this RFC in favor of experimentation. That is, while we think this design is very plausible, there are a number of related changes being considered, and we'd like to land these all behind feature gates and gain experience with them, before coming back after the impl period with fresh RFCs. Thanks @arielb1 for writing this so well and so quickly; I suspect much of the text will wind up in the final RFC. |
Rendered