Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 2169: Euclidean Modulo #49048

Closed
1 of 2 tasks
alexcrichton opened this issue Mar 15, 2018 · 39 comments · Fixed by #61884
Closed
1 of 2 tasks

Tracking issue for RFC 2169: Euclidean Modulo #49048

alexcrichton opened this issue Mar 15, 2018 · 39 comments · Fixed by #61884
Labels
B-RFC-implemented Blocker: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

alexcrichton commented Mar 15, 2018

Tracking issue for rust-lang/rfcs#2169, euclidean modulo methods on numbers

Steps:

@alexcrichton alexcrichton added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 15, 2018
@varkor
Copy link
Member

varkor commented Mar 16, 2018

@fanzier had expressed interest in implementing this RFC as a good starter issue (most of the details are included in the RFC), so I'd be happy to mentor this issue if they want to tackle it!

@KodrAus
Copy link
Contributor

KodrAus commented Apr 13, 2018

A question came up in the initial implementation PR about shifting the euclidean modulo methods for f32/f64 into libcore rather than libstd.

bors pushed a commit that referenced this issue Apr 13, 2018
bors added a commit that referenced this issue Apr 13, 2018
Implement RFC #2169 (Euclidean modulo).

Tracking issue: #49048
@sdroege
Copy link
Contributor

sdroege commented Apr 18, 2018

euc as the function name seems a bit short. I didn't see that one before elsewhere, is it used in any other language? A longer, more descriptive name would seem useful.

@scottmcm
Copy link
Member

@sdroege See conversations in the RFC, such as rust-lang/rfcs#2169 (comment)

@lambda-fairy
Copy link
Contributor

@huonw made this argument for a longer name, which I find convincing:

Using just euc as a suffix seems a bit like Rust's abbreviation gone too far. For a common word like Vec, it makes sense in a huffman-encoding sense (length should be inversely proportional to frequency) and "vector" is common enough that it's relatively self-explanatory. But, performing Euclidean division is much less common, and "Euclidean" isn't that common as a word to begin with. At least for mod, it could be mod_positive, but something like div_euclidean/mod_euclidean does pair nicer.

@varkor
Copy link
Member

varkor commented Apr 18, 2018

@sdroege, @lfairy: I think the plan was to take another look at the method names during the FCP for these methods (which won't be for a while at least) — because there was some disagreement about what to prioritise for the naming here. With some actual usage statistics, it might be easier to form arguments one way or the other.

@xen0n
Copy link
Contributor

xen0n commented Apr 26, 2018

Just my 2c: the suffix euc is woefully confusing for Asian users where "EUC" usually refers instead to a family of legacy charsets like EUC-JP or EUC-CN. Considering the expected frequency of usage a longer name might be more appropriate -- we'll have to see.

@pietroalbini
Copy link
Member

There is an issue about the future compatibility lint of this: #50232

@varkor
Copy link
Member

varkor commented Apr 26, 2018

@pietroalbini: I think that was an issue about the lint in general, rather than specifically about mod_euc; I've made a comment on the issue to clarify. Thanks!

@Boscop
Copy link

Boscop commented Apr 30, 2018

I really like the short name (I've been using this function a lot in my code, also had named it mod_euc). Even knowing that "EUC" also refers to charsets, in this case it's disambiguated by the mod_ part of the function name :)
And anyone who stumbles across this function in 3rdparty code will see that it is called with numbers as args, so they can easily infer that it is "probably some kind of modulo" even if they don't know about euclidean modulo, but then they will look it up and find out!
So IMO there's not really a high chance for confusion with the "EUC" charsets..

@clarfonthey
Copy link
Contributor

I didn't follow the original RFC, but I think that the methods should be div_euc and rem_euc for consistency with the Rem trait.

@clarfonthey
Copy link
Contributor

Another thing to note: while it has been mentioned before that having a div_rem method wouldn't make too much sense considering how LLVM automatically optimises nearby x / y and x % y calls to one operation, but I think it would be worthwhile to have a div_rem_euc method which is more efficient.

Although I don't know how far down the rabbit hole we'd want to go with that, considering how adding every combination of checked, wraping, and overflowing for div_rem_euc and div_rem would be around 8 new methods.

@demurgos
Copy link

demurgos commented May 4, 2018

Regarding terminology, I believe that mod (for modulo) was specifically chosen to refer the mathematical modulo and avoid confusion with the rem (remainder) operation available currently. The choice to avoid rem_euc was deliberate.

@clarfonthey
Copy link
Contributor

@demurgos Haskell uses the distinction quot/rem for the Euclidean versions and div/mod for the non-euclidean versions. I personally don't like this distinction and find it hard to remember, which is why I accept Rust's decision to simply append _euc to indicate Euclidean versions.

But if we're going to change rem to mod and not div to quot, that seems too odd to me. And besides, the ship has sailed and Rust is stuck with rem instead of mod. So, I'd say that rem_euc is a better name.

@clarfonthey
Copy link
Contributor

clarfonthey commented May 4, 2018

Also, I messed around with the code and div_rem_euc would be:

let (q, r) = (self / rhs, self % rhs);
if r < 0 {
    if rhs > 0 {
        (q - 1, r - rhs)
    } else {
        (q + 1, r + rhs)
    }
} else {
    (q, r)
}

versus the existing:

let q = self / rhs;
let q = if self % rhs < 0 {
    if rhs > 0 { q - 1 } else { q + 1 }
} else {
    q
};
let r = self % rhs;
let r = if r < 0 {
    if rhs < 0 {
        r - rhs
    } else {
        r + rhs
    }
} else {
    r
};
(q, r)

And I haven't run any benchmarks, but it seems pretty clear that even if LLVM can combine the existing division and remainder into one operation, it probably won't be smart enough to combine the branches. But, if we can write the latter in a way that optimises to the former, I think it'd be reasonable to say that a divrem method isn't necessary.

@tspiteri
Copy link
Contributor

tspiteri commented May 4, 2018

I don't think it should be hard for LLVM to find the common subexpressions. In fact, I get the same assembly for div_mod_euc and separate_div_mod_euc below; the only difference I found was in the labels.

pub fn div_mod_euc(lhs: i32, rhs: i32) -> (i32, i32) {
    let (q, r) = (lhs / rhs, lhs % rhs);
    if r < 0 {
        if rhs > 0 {
            (q - 1, r - rhs)
        } else {
            (q + 1, r + rhs)
        }
    } else {
        (q, r)
    }
}
pub fn separate_div_mod_euc(lhs: i32, rhs: i32) -> (i32, i32) {
    (div_euc(lhs, rhs), mod_euc(lhs, rhs))
}
pub fn div_euc(lhs: i32, rhs: i32) -> i32 {
    let (q, r) = (lhs / rhs, lhs % rhs);
    if r < 0 {
        if rhs > 0 {
            q - 1
        } else {
            q + 1
        }
    } else {
        q
    }
}
pub fn mod_euc(lhs: i32, rhs: i32) -> i32 {
    let r = lhs % rhs;
    if r < 0 {
        if rhs > 0 {
            r - rhs
        } else {
            r + rhs
        }
    } else {
        r
    }
}

@varkor
Copy link
Member

varkor commented May 5, 2018

Haskell uses the distinction quot/rem for the Euclidean versions and div/mod for the non-euclidean versions. I personally don't like this distinction and find it hard to remember

Even though it can be annoying to have to remember the distinction, it's more helpful to follow convention and at least be unambiguous than mix terminology confusingly. Rust originally used Quot and Rem too, but Quot was later renamed to Div because it was more familiar. Unfortunately, this leaves us in an awkward state where one of the operators is named inconsistently. It's too late to change that now, though, so I think the best we can do is try to be correct going forward — hopefully the suffix makes it clear something different is going on.

Another thing to note: while it has been mentioned before that having a div_rem method wouldn't make too much sense considering how LLVM automatically optimises nearby x / y and x % y calls to one operation, but I think it would be worthwhile to have a div_rem_euc method which is more efficient.

I was under the impression LLVM doesn't optimise this as much as it could, but I haven't checked in a while, so that may be outdated. I think it could be worth having a dedicated method for this, and it's worth proposing (it wasn't part of the original RFC just to avoid too much bloat).

Although I don't know how far down the rabbit hole we'd want to go with that, considering how adding every combination of checked, wraping, and overflowing for div_rem_euc and div_rem would be around 8 new methods.

Yeah, this method propagation is unfortunate, and it's unclear how to completely avoid it. Maybe just by adding the simplest methods first and then adding the others if/when people request them?

@XAMPPRocky XAMPPRocky added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jun 26, 2018
@anirudhb
Copy link
Contributor

I would like to work on stabilizing this. One question: would it be stable since 1.27.1 or the next version?

@pietroalbini
Copy link
Member

You should use the current nightly version when stabilizing (at the moment 1.29.0).

@anirudhb
Copy link
Contributor

OK, thanks! I'll open a PR soon.

@anirudhb
Copy link
Contributor

anirudhb commented Jul 20, 2018

I've closed my PR because there is still discussion about naming the functions.

@varkor varkor removed the B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. label Jul 29, 2018
@daboross
Copy link
Contributor

daboross commented Nov 9, 2018

@clarcharr do you think you could elaborate on your concerns in #52547 (comment)?

It seems to me that the consensus in the general programming community is for -5 remainder x being negative and -5 modulo y being positive. As other's have said in this thread, the distinction was purposeful.


There's was already quite a bit of discussion in rust-lang/rfcs#2169 that came up with the current names, mod_euc and div_euc. If the arguments there don't provide enough rationale for the current names, we could discuss more here. But if they're convincing enough to justify the current names, might I move to stabilize this again?

@strega-nil
Copy link
Contributor

strega-nil commented Nov 9, 2018

@demurgos This is not a "mod operation", mathematically, this is a remainder operation. "mod operation" doesn't make sense - modular arithmetic is about equivalences, not operators.

a ~ b mod m means that ∃k ∈ ℤ s.t. k (a - b) = m. There is no mathematical x mod m operation. There's a reason that the division algorithm calls it r - ∀a, b ∈ ℤ, ∃! q, r ∈ ℤ s.t. (0 ≤ r < b ∨ b < r ≤ 0) ∧ aq + r = b. It's the remainder.

Notably, also - mod has no qualms about how negative b is, and it's utterly fine to say, for example, that 128 ~ -128 (mod 256), which is how 2s complement arithmetic works.

@strega-nil
Copy link
Contributor

strega-nil commented Nov 9, 2018

Also, the names are too short - div_euclidean and rem_euclidean are much better names.

@varkor
Copy link
Member

varkor commented Nov 9, 2018

This is not a "mod operation", mathematically, this is a remainder operation. "mod operation" doesn't make sense - modular arithmetic is about equivalences, not operators.

There is such a thing as a mod(ulo) operation. It's analogous to modular arithmetic, but it's a distinct concept.

Also, the names are too short

This is exactly the point under discussion. Personally, I find the full _euclidean suffix to be too verbose, considering it's the most common choice for negative operands in many contexts (the RFC thread contains many examples).

@strega-nil
Copy link
Contributor

strega-nil commented Nov 9, 2018

@varkor note the first paragraph - "In computing, the modulo operation". Note also the ambiguity around sign - calling it "modulo" doesn't make it positive. Remainder is what the rest of the ecosystem calls it, we should continue calling it remainder.

@daboross Afaict, only languages like Haskell make this distinction - C still calls it the modulo operator, but a % b is between 0 and b, not between 0 and |b|.

@varkor
Copy link
Member

varkor commented Nov 9, 2018

I'm not sure why the mathematics versus programming distinction should be made, but even then, the definition for programming languages seems like the most appropriate anyway.

As you say, the precise definition is language-specific anyway, but:

Remainder is what the rest of the ecosystem calls it, we should continue calling it remainder.

Rust chose the name "remainder" specifically because it produced negative results for negative dividends. This is either described in the commit history or the mailing list, but I don't have time to dig it up right now. (If I recall correctly, it used to be called "modulo" until it was changed for this reason.)

@strega-nil
Copy link
Contributor

strega-nil commented Nov 9, 2018

@varkor that doesn't change the fact that even in CS, "mod" does not imply positive. It's only used that way, afaict, in Haskell - other languages use mod to just mean remainder.

You'll note, for example, that C++'s std::fmod does something completely different - fmod(-8, 3) = -2, having the same sign as the dividend, not the divisor, and not positive.

@clarfonthey
Copy link
Contributor

I personally think that _euclid is a good suffix; it's not as verbose as _euclidean while still feeling natural and unambiguous. Additionally, even as a mathematically inclined person, I don't immediately read "euc" as "Euclidean," whereas I feel that even non-mathematically-inclined people would read "euclid" as "Euclidean."

As far as the distinction between "modulo" and "remainder," I feel like the most important distinction is that Rust currently uses "rem" to refer to the remainder of truncated division, rather than "mod." Because div_euc does not use any weird terminology compared to the regular div, it feels inconsistent to use mod instead of rem for mod_euc.

So to clarify, I think we should name the methods to div_euclid and rem_euclid.

That way, we have truncated division + remainder and Euclidean divison + remainder, rather than truncated division + remainder and Euclidean division + modulo.

@clarfonthey
Copy link
Contributor

Side note-- I really think that Rust should have somewhere in the documentation for Div that division is truncated, rather than floored. I don't know all of the places where this would be relevant, but I feel like having that officially clarified is important, considering how C used to have floored division and then changed.

@strega-nil
Copy link
Contributor

Can I fix the names of these functions? Will that be accepted?

Centril added a commit to Centril/rust that referenced this issue Dec 22, 2018
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor
kennytm added a commit to kennytm/rust that referenced this issue Dec 22, 2018
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor
@aaronfranke
Copy link

What about a modulus operator, perhaps %%, such that:

5 %% 3 returns 2
-5 %% 3 returns 1
5 %% -3 returns -1
-5 %% -3 returns -2

The rule is that the output wraps around the second argument on a range [0, b). If the first argument is already on said range it will be output with no change, such that -2 %% -3 returns -2.

@starblue
Copy link

starblue commented Feb 3, 2019

I just noticed that the naming discussion is not closed (my build failed due to the renaming), so here are my 2 cents:
I would much prefer the suffix _floor over _euclid, i.e. to name them div_floor and rem_floor.

  • _floor expresses precisely how the division behaves, and we might add other useful variants like _ceil (round up) and _round (round to nearest integer) that follow the same scheme later.
  • _euclid can mean many things. There is a concept of Euclidean ring, which only requires the existence of a division with remainder so that the remainder is smaller than the divisor in a suitable ordering. In such a ring the Euclidean algorithm works, hence the name. All three variants I mentioned would be Euclidean in this sense, and there are many more examples if you go beyond integers.

I agree with the choice of rem over mod, because that is consistent with std::ops::Rem.

@varkor
Copy link
Member

varkor commented Feb 3, 2019

I just noticed that the naming discussion is not closed

I think as of #56936, consensus was reached.

I would much prefer the suffix _floor over _euclid

As mentioned in the RFC, flooring division/modulo is different from Euclidean division/modulo (as described in this paper), which is why the name was chosen.

I'd rather avoid any more bikeshedding and put the functions forward for stabilisation.

@starblue
Copy link

starblue commented Feb 4, 2019

Ah, I see, I missed that it behaves differently for negative divisors.

urbanautomaton added a commit to urbanautomaton/raygrass that referenced this issue Apr 22, 2019
This required a fair bit of faffing, and I'm not completely sure I know
what I'm really doing with the boxing. However! My first polymorphism
and traits. I should probably read the book properly, but who's got time
for that.

Some fun also since rust modulo arithmetic seems to be non-intuitive,
and the euclidean versions aren't stable yet.

https://github.com/rust-lang/rfcs/blob/master/text/2169-euclidean-modulo.md
rust-lang/rust#49048
@crlf0710
Copy link
Member

I think this is a good candidate for stabilization? Could any one propose a fcp for it? @SimonSapin @alexcrichton

@est31
Copy link
Member

est31 commented Jun 16, 2019

@crlf0710 you can just file a PR for stabilization, should speed up the process.

Centril added a commit to Centril/rust that referenced this issue Jul 25, 2019
…ntril

Stablize Euclidean Modulo (feature euclidean_division)

Closes rust-lang#49048
@cmcqueen
Copy link

Is there a trait for mod_euclid(), for use with generics constraints?

@lambda-fairy
Copy link
Contributor

I don't think so. The general policy for std is to do the minimum needed for operator overloading, and leave anything else to external crates (like num).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-RFC-implemented Blocker: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.