
RFC: Generic integers #2581

Closed
wants to merge 3 commits into from

Conversation

clarfonthey
Contributor

🖼️ Rendered

📝 Summary

Adds the builtin types uint<N> and int<N>, allowing integers with an arbitrary size in bits. For now, restricts N ≤ 128.

💖 Thanks

To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.

@clarfonthey changed the title from "Generic integers RFC" to "RFC: Generic integers" on Oct 28, 2018
@mark-i-m
Member

I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is too much stuff built into the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently.

Just a half-baked idea: We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)). All of the other integer types are defined as follows:

#[repr(packed)]
struct int<const Width: usize> {
  bits: [Bit; Width],
}

#[repr(packed)]
struct uint<const Width: usize> {
  bits: [Bit; Width],
}

with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance.

@Ekleog

Ekleog commented Oct 28, 2018 via email

@Centril
Contributor

Centril commented Oct 28, 2018

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

Well, you can represent a Fin : Nat -> Type type constructor in dependent typing pretty easily, but those are quite inefficient...


## Primitive behaviour

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
Contributor

Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.

Contributor Author

I agree with you but I also wonder if this is the language most people would prefer to use. For example, would you consider Vec<T> to also be a family of types, or just a generic type?

Contributor

Well, the type constructor is really Vec and not Vec<T>; but "two new built-in generic integer types" is I think clear enough.

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
is a power of two and no greater than 128. `usize` and `isize` remain separate
types due to coherence issues, and `bool` remains separate from `uint<1>` as it
Contributor

Elaborate on what those coherence issues are for unfamiliar readers?


Traits can be implemented separately & differently for u32, u64, usize, etc. so unifying usize with uN for appropriate N would cause overlapping impls.
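This overlap can be sketched in today's Rust (trait name hypothetical): `usize` and `u64` are currently distinct types, so both impls below are legal; if `usize` were unified with `uint<64>` on a 64-bit target, they would become conflicting impls (error E0119).

```rust
// A trait implemented separately and differently for u64 and usize.
trait Describe {
    fn describe(&self) -> &'static str;
}

impl Describe for u64 {
    fn describe(&self) -> &'static str { "u64" }
}

// Legal today only because usize is a distinct type from u64. If usize
// were an alias for uint<64> (= u64) on a 64-bit target, this impl
// would overlap with the one above.
impl Describe for usize {
    fn describe(&self) -> &'static str { "usize" }
}

fn main() {
    assert_eq!(1u64.describe(), "u64");
    assert_eq!(1usize.describe(), "usize");
}
```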

Contributor Author

I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.

For example, this means that `uint<48>` will take up 8 bytes and have an
alignment of 8, even though it only has 6 bytes of data.
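The size rule stated here can be sketched as a small helper (the function is illustrative, not part of the RFC): the storage width of `uint<N>` is `N` rounded up to the next power of two, with a minimum of one byte.

```rust
// Hypothetical helper mirroring the proposed layout rule:
// uint<N> occupies next_power_of_two(N) bits, at least 8.
fn storage_bytes(bits: u32) -> u32 {
    bits.next_power_of_two().max(8) / 8
}

fn main() {
    assert_eq!(storage_bytes(48), 8); // uint<48> stored like u64
    assert_eq!(storage_bytes(12), 2); // uint<12> stored like u16
    assert_eq!(storage_bytes(1), 1);  // uint<1> stored in one byte
}
```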

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
Contributor

store/stores, pick one :)

Contributor Author

Store was a typo. :p

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

For the moment, a monomorphisation error will occur if `N > 128`, to minimise
Contributor

This is my biggest concern; I don't think monomorphization errors of this magnitude belong in the language, and I would like to see this changed before stabilization.

Contributor Author

Would you be willing to elaborate more on this? Why are these errors a problem?

Contributor

The nature of Rust's bounded polymorphism is that type checking should be modular so that you can check a generic function separately from the instantiation and then the instantiation of any parameters satisfying bounds should not result in errors.

This makes for more local error messages (you won't have post monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of N > 128, it is quite likely.

The general principle is that you declare up front the requirements (with bounds, etc.) to use / call an object and then you don't impose new and hidden requirements for certain values.

If you want to impose N <= 128, then that should be explicitly required in the "signatures", e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...). Otherwise, it should work for all N: usize evaluable at compile time.

Member

e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...)

Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like

trait LessThan128 {}
struct IsLessThan128<const N: usize>;

impl LessThan128 for IsLessThan128<0> {}
// ... one impl for each value up to ...
impl LessThan128 for IsLessThan128<127> {}

struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }

but 🤢

Contributor

@Nemo157 not possible with RFC 2000 but might be with future extensions.


@rodrimati1992 rodrimati1992 Nov 1, 2018


This is a different formulation which might work (if you can use a const expression in associated types):

struct Bool<const B:bool>;

trait LessThan128<const N: usize> {
    type IsIt;
}

impl<const N:usize> LessThan128<N> for () {
    type IsIt=Bool<{N<128}>;
}

struct uint<const N: usize> 
where (): LessThan128<N,IsIt=Bool<true>> 
{ .. }

Edit:

If you want to include an error message in the type error, you can use this to print an error message in the type itself.


struct Str<const S: &'static str>;

struct Bool<const B: bool>;

struct Usize<const N: usize>;


trait Assert<const COND: bool, Msg> {}

impl<const COND: bool, Msg> Assert<COND, Msg> for ()
where
    (): AssertHelper<COND, Msg, Output = Bool<true>>,
{}


trait AssertHelper<const COND: bool, Msg> {
    type Output;
}

impl<Msg> AssertHelper<true, Msg> for () {
    type Output = Bool<true>;
}

impl<Msg> AssertHelper<false, Msg> for () {
    type Output = Msg;
}


trait AssertLessThan128<const N: usize> {}

impl<const N: usize, const IS_IT: bool> AssertLessThan128<N> for ()
where
    ():
        LessThan<N, 128, IsIt = Bool<IS_IT>> +
        Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128, the passed size is:">,
            Usize<N>,
        )>,
{}


trait LessThan<const L: usize, const R: usize> {
    type IsIt;
}

impl<const L: usize, const R: usize> LessThan<L, R> for () {
    type IsIt = Bool<{L < R}>;
}

struct uint<const N: usize>
where (): AssertLessThan128<N>
{ .. }

Contributor

AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/

Contributor

@gnzlbg

fn poly<T>() {
    let _n: [T; 10000000000000];
}

fn monomorphization_error() {
    poly::<u8>(); // OK!
    poly::<String>(); // BOOM!
}

fn main() {
    monomorphization_error();
}

## Standard library

Existing implementations for integer types should be annotated with
`default impl` as necessary, and most operations should defer to the
Contributor

I'm not sure why default impl would be used here... elaborate?

Contributor Author

Essentially, your default impls are your base cases and the other cases will recursively rely upon them. For example, <uint<48>>::count_zeros would ultimately expand to u64::count_zeros minus 16. I'll try to elaborate more in the RFC text itself later.

Member

You mention that it should be default fn in the text but don't put it in the count_zeros example; including it in the example + showing one of the specialized implementations for a power of two could clarify this somewhat.
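The deferral being discussed can be sketched with today's fixed-width types standing in for `uint<48>` (the helper name is hypothetical): since the value is held zero-extended in a `u64`, the base case's answer only needs the 16 padding bits subtracted.

```rust
// A u48 value stored zero-extended in a u64: its count_zeros is the
// u64 count minus the 16 high bits that are always zero.
fn u48_count_zeros(x: u64) -> u32 {
    debug_assert!(x < (1u64 << 48));
    x.count_zeros() - (64 - 48)
}

fn main() {
    assert_eq!(u48_count_zeros(0), 48);
    assert_eq!(u48_count_zeros((1 << 48) - 1), 0);
}
```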


Once const generics and specialisation are implemented and stable, almost all of
this could be offered as a crate which offers `uint<N>` and `int<N>` types. I
won't elaborate much on this because I feel that there are many other
Contributor

You should :) Do we know that it is actually implementable as a library?

Contributor Author

I will try to get to this this weekend.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Oct 28, 2018
@Ekleog

Ekleog commented Oct 28, 2018 via email

from `uint<N>` to `int<M>` or `uint<M + 1>`, where `M >= N`.

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.


So, this treats 1 and -1 as true depending on the signedness? I rather like the route of identifying true with -1 instead of 1, but it's not the route Rust has chosen so it might be a bit controversial. From another angle, while it's consistent with true being represented as a single 1 bit, it's also inconsistent with the fact that bool as iN (for currently existing iN) turns true into 1.

Is there motivation for providing these cases instead of just making people write x != 0, other than "we can"?

Contributor Author

I completely forgot about sign extension when doing this, and now that you mention it, it makes sense that if this were offered, then only uint<1> should cast to bool. Unless anyone has any objections, when I next get around to revising the text I'll remove both casts to bool.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
`uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of
this is that `i1` represents zero or *negative* one, even though LLVM and other
places use `i1` to refer to `u1`. This case is left as-is because generic code


Nit: integer types in LLVM don't have inherent signedness, they are bags of bits that are interpreted as signed or unsigned by individual operations, and i1 true is treated as -1 by signed operations (e.g., icmp slt i1 true, i1 false is true -- slt being signed less-than).

Contributor Author

I didn't actually know this-- I'll be sure to update the text to be accurate there.

Because sign extension will always be applied, it's safe for the compiler to
internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all
computations. As a concrete example, this means that adding two `uint<48>`
values will work exactly like adding two `u64` values, generating exactly the


This needs more thorough discussion. To take this example, an u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as a plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.
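The re-normalization described here can be sketched with a `u64` carrying a hypothetical `u48` (the mask and helpers are illustrative, not the RFC's codegen): after a wrapping add the high 16 bits may be dirty, so either the arithmetic or the comparison has to mask.

```rust
const U48_MASK: u64 = (1 << 48) - 1;

// Mask after arithmetic so the high 16 bits stay zero.
fn u48_wrapping_add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b) & U48_MASK
}

// If inputs were guaranteed normalized, the masks here would be
// redundant; one of the two normalizations has to happen somewhere.
fn u48_lt(a: u64, b: u64) -> bool {
    (a & U48_MASK) < (b & U48_MASK)
}

fn main() {
    assert_eq!(u48_wrapping_add(U48_MASK, 1), 0); // wraps at 2^48
    assert!(u48_lt(1, 2));
}
```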

Contributor Author

@clarfonthey clarfonthey Oct 29, 2018

I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.

This is certainly a case for adding add_unchecked and co. as suggested by… some other RFC issue I don't have the time to look up right now.

Either way, I'll definitely take some time this weekend to revise this section.

Primitive operations on `int<N>` and `uint<N>` should work exactly like they do
on the existing `iN` and `uN` types: overflows should panic when debug assertions are
enabled, but ignored when they are not. In general, `uint<N>` will be
zero-extended to the next power of two, and `int<N>` will be sign-extended to
@hanna-kruppe hanna-kruppe Oct 28, 2018

Is this intended to be a user-facing guarantee? e.g. given x: &uint<48> are the following two guaranteed to give the same result:

  • unsafe { *(x as *const uint<48> as *const u64) }
  • *x as u64

Contributor Author

I would not guarantee this, considering how x as *const uN as *const uM only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.

I'll try and remember to clarify this in the text.


Would you guarantee it on little-endian systems?

Contributor Author

I'm adding that as an unresolved question.

@Mark-Simulacrum
Member

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

@jswrenn
Member

jswrenn commented Oct 29, 2018

We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)).

@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC:

#[repr(bitfields)]
struct MipsInstruction {
    opcode: u6,
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: u6,
}

How often does it really make sense to add or subtract from an opcode? By representing it as an unsigned six-bit integer, we signal that arithmetic on an opcode is a well-defined operation.

With @mark-i-m's suggestion, we can have a more well-typed version of this struct:

#[repr(packed)]
struct MipsInstruction {
    opcode: [Bit; 6],
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: [Bit; 6],
}

It never makes sense to represent opcode or function as numbers, so we don't. Conversely, shift is very clearly numeric.

(Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether r{s,t,d} ought to be [Bit; 5] or u5.)


As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section.

@hanna-kruppe
Copy link

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes. While one could instead provide a dedicated Bitvector<N> type, that would have a lot of overlap with uint<N> (which has many applications not covered by Bitvector<N>). I also don't really see the typing benefit @jswrenn suggests: if you care about that, you'd want to use more fine-grained newtypes to distinguish (in this MIPS instruction encoding example) the opcode from the function field or the shift amount from the register numbers. Once you have these newtypes, uint<5> vs Bitvector<5> becomes largely irrelevant as it's an implementation detail. (In fact, one could implement Bitvector<N> as a newtype around uint<N>.)
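The newtype point can be illustrated in today's Rust (all type names hypothetical, with `u8` standing in for the proposed `uint<6>`/`uint<5>`): the wrapper, not integer-vs-bits, is what rules out nonsensical arithmetic.

```rust
// Newtypes carry the domain meaning; the storage type is a detail.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Opcode(u8); // would wrap uint<6> under the RFC

#[derive(Clone, Copy, PartialEq, Debug)]
struct Shift(u8);  // would wrap uint<5> under the RFC

impl Shift {
    // Arithmetic is exposed only where it makes sense; it wraps
    // within the 5-bit range.
    fn wrapping_add(self, n: u8) -> Shift {
        Shift(self.0.wrapping_add(n) & 0x1f)
    }
}

fn main() {
    assert_eq!(Shift(31).wrapping_add(1), Shift(0));
    // Opcode deliberately exposes no arithmetic at all.
    let _op = Opcode(0b10_0011);
}
```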

@clarfonthey
Contributor Author

Finally getting around to a few comments:

Is user code allowed to rely on this memory layout, or is it not? Intuitively it'd be better if the layout was not actually defined, to allow for further optimizations, but the current text makes me believe it is defined. The one case where layout would be mandatory would be for a not-yet-written #[repr(bitpacked)] RFC, in my opinion.

I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a #[repr(C)] struct or in an array; would [uint<48>; 2] take up six bytes or eight? Establishing this is crucial for stabilisation imho.

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

I'll take a look later and add these to the prior art section.

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless uN were just an alias for uint<N>, which seems unnecessary to me.

@clarfonthey
Contributor Author

In terms of offering Bit instead of uint-- I'll definitely add this to the alternatives section. Essentially, offering some kind of bits<N> type would be similar to uint<N>, although completely orthogonal to uint<N> and presumably only allowing bit operations, not arithmetic. I still believe that uint<N> is better overall, but thoroughness is important.

@mark-i-m
Member

Regarding Bit, I had intended it as a way to not add uint<N> and int<N> as language features. Rather, they could be implemented in a crate as wrappers around [Bit; N]. My motivation is just that adding uint<N> seems like a lot of magic, and I would like to reduce magic.

@clarfonthey
Contributor Author

While Bit by itself is less magic than bits<N>, there's a lot of magic for making arrays compact for just one particular type. How does Bit apply in most type contexts? Is (Bit, Bit) compact? Etc.

@mark-i-m
Member

My thinking was that Bit had a size of one bit and an alignment of 1 byte. So you need to use repr(packed) to get rid of padding, as with other types.

However, now that you mention it, IIUC, the size and alignment of types are tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but I suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work.

Also one other minor thing that I didn't see mentioned in the RFC: does size_of::<uint<N>>() just round up to the nearest byte? or the nearest byte when rounded up to a power of two?

@mark-i-m
Member

@rkruppe Sorry, I just saw your comment

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes.

I was curious why. Does this break some other stability guarantee we have?

@clarfonthey
Contributor Author

@mark-i-m the size of uint<N> is the same as the next power-of-two size. So, uint<48> has the same size as u64.

@mark-i-m
Member

Hmm... so using repr(packed) actually changes the size of the type?

@clarfonthey
Contributor Author

@mark-i-m In this case, no. repr(packed) allows alignment to break, but in this case, uint<48> would have an alignment and size of 8. In this case, we'd need a different form of repr(packed) which allows both size and alignment to break, shoving things down to individual bits. That's mostly what repr(bitfields) is in the RFC; originally, I recommended bitpacked as a name but I changed it and I don't remember why.

@hanna-kruppe

@mark-i-m

I was curious why. Does this break some other stability guarantee we have?

Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that's what makes the Index/IndexMut impls and iter()/iter_mut() tick, for starters. Even though for all currently existing types that would remain so if arrays became bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.
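The addressability guarantee shows up in a tiny example: indexing and iteration hand out real `&u8` references, each with its own address, which individually packed bits could not have.

```rust
fn main() {
    let xs = [1u8, 2, 3, 4];
    // Index yields &u8: every element is separately addressable,
    // which a bit-packed [Bit; N] could not provide.
    let p: &u8 = &xs[0];
    let q: &u8 = &xs[1];
    assert!(!std::ptr::eq(p, q)); // distinct addresses
    // iter() likewise yields references to each element.
    assert_eq!(xs.iter().copied().sum::<u8>(), 10);
}
```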

@jswrenn
Member

jswrenn commented Oct 30, 2018

Even though for all currently existing types that would remain so if arrays became bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

Couldn't we signal that individual Bits are unaddressable with an Addressable auto trait that isn't implemented for Bit? E.g.:

auto trait Addressable {}

// Individual bits are unaddressable.
impl !Addressable for Bit {}

@clarfonthey
Contributor Author

@jswrenn There was a big discussion of doing this for DynSized with extern types, and the verdict was not to. I feel like an Addressable bound would have similar problems.

@hanna-kruppe

@jswrenn Auto traits are not assumed to be implemented by default (e.g. fn foo<T>() does not imply T: Send). So beyond just an auto trait, you'd need a new opt-out default bound like Sized, and as @clarcharr mentioned those have been rejected for other purposes in the past.

# Summary
[summary]: #summary

Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an
Contributor

Perhaps the types should be called UInt<N> and Int<N> since that is more conventional these days; however, all the primitive types are lower cased so perhaps not... I'm torn.

Member

The weird thing is that they are generic primitives, which I've never seen in a language before...

Perhaps we should do something suitably weird and have new notation for the type?


There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.

to be solved during the development of this feature, rather than in this RFC.
However, here are just a few:

* Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be
Contributor

An alternative to this would be to simply have u<7> and i<42> which would be almost as short...
Perhaps that's too short to be understandable? Chances are tho that given the fundamental nature of the types that people would remember it...

Member

Actually, I thought about this too. uint and int seem inconsistent with the other integer types somehow, so maybe u and i are the right choice?

Member

Also, either way, would we need to make a breaking change to make these identifiers reserved?

Contributor

I don't think we need to make breaking changes; you can always shadow the type with something else afaik.

Contributor Author

I'm adding this to the alternatives section but arguing against it because i is such a common variable name.

Member

@scottmcm scottmcm Nov 9, 2018

Well, it's a different namespace so there isn't a conflict. The following "works":

struct i<i> { i: i };
let i: i<i32> = i { i: 4 };

(Said without actually taking a position on whether I think i would be a good name for the type constructor in question.)

signed simply depends on whether its lower bound is negative.

The primary reason for leaving this out is… well, it's a lot harder to
implement, and could be added in the future as an extension. Longer-term, we
Contributor

Perhaps... tho you should elaborate on the implementation difficulty as you see it.

...but the ranges feel also much more useful generally for code that isn't interested in space optimizations and such things but rather wants to enforce domain logic in a strict and more type-safe fashion. For example, you might want to represent a card in a deck as type Rank = uint<1..=13>;. Then you know by construction that once you have your the_rank: Rank, it is correct and you won't have to recheck things. Of course, the other, more elaborate type-safe way is to use an enum, but it might also be less convenient to set up than a simple range.

I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range based types, e.g. type uint<const N: usize> = urange<{0..=pow(2, N)}>;.

Contributor Author

You mean, urange<{0..=pow(2, N) - 1}>. :p

But actually, you're right. I should seriously clarify that and write it down in the alternatives.
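The `Rank` example from this exchange can be approximated in today's Rust with a runtime-checked newtype (all names hypothetical); a range-based `urange` type would enforce the same bound statically.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rank(u8); // stand-in for the proposed urange<{1..=13}>

impl Rank {
    // Runtime check where a range type would verify the bound
    // at compile time.
    fn new(n: u8) -> Option<Rank> {
        (1..=13).contains(&n).then(|| Rank(n))
    }
}

fn main() {
    assert!(Rank::new(13).is_some());
    assert!(Rank::new(14).is_none());
    assert!(Rank::new(0).is_none());
}
```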

```rust
impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let M = N.next_power_of_two();
```
Member

I would assume this has to be const M = N.next_power_of_two(); since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC2000 though.

It seems that this could be written

impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
        zeros + (N.next_power_of_two() - N)
    }
}

but I'm not certain if the const expression there would be accepted by the current const generics implementation.

Contributor Author

You're right and I replaced let M with const M: usize for now. I think that's valid.

`(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit
size, allowing for a future `repr(bitpacked)` extension which packs all values
in a struct or enum variant into the smallest number of bytes possible, given
their bit sizes. Doing so would prevent referencing the fields of the struct,
Contributor

Since you are currently limiting the size to 128 bits (but maybe more in the future) wouldn't it be possible to define a reference to a bitpacked generic integer as a (ref, (u16, u16)) where the second pair is a (start, length) pair within?
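The `(ref, (start, length))` idea can be sketched as a fat bit-reference (all names hypothetical); loading assembles the referenced bit range from the underlying bytes.

```rust
// A "fat reference" to a bit range inside some backing storage.
struct BitRef<'a> {
    base: &'a [u8],
    start: u16, // bit offset from the start of `base`
    len: u16,   // width in bits (assumed <= 64 here)
}

// Gather the referenced bits, least significant first.
fn load(r: &BitRef) -> u64 {
    let mut v = 0u64;
    for i in 0..r.len {
        let bit = (r.start + i) as usize;
        let b = (r.base[bit / 8] >> (bit % 8)) & 1;
        v |= (b as u64) << i;
    }
    v
}

fn main() {
    let bytes = [0b1111_0000u8];
    let r = BitRef { base: &bytes, start: 4, len: 4 };
    assert_eq!(load(&r), 0b1111);
}
```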

Contributor Author
@clarfonthey Nov 8, 2018

Would you mind clarifying here? Not quite sure what you mean.

@programmerjake
Member

> @programmerjake You might want to check out https://github.com/jhpratt/deranged. It's similar to this but provides for range bound integers with common trait impls.

Neat! I'd still have to use my own implementation for rust-hdl since I need support for >128-bit integers, so I have it based on BigInt.

@jhpratt
Member

jhpratt commented Jul 14, 2021

Honestly not a bad idea to add support for that behind a flag ¯\_(ツ)_/¯

@programmerjake
Member

C is getting generic integers:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf
clang patch where I found my info: https://reviews.llvm.org/rG6bb042e70024

@clarfonthey
Contributor Author

clarfonthey commented Jul 27, 2021

I might try and update this RFC with some prior art for zig, C, and LLVM at some point, maybe this weekend. Not sure if it'd be worth resubmitting though, since I have no idea where this'd be in terms of priorities.

We could probably get away with a very minimal MVP without any additions to the current const generics on stable, although it wouldn't add all the stuff we want. In particular, I guess that the biggest concern is the ability to implement From<uint<N>> for uint<M> where N < M, since we have no idea how those kinds of where clauses will ultimately be implemented. An MVP would not allow such impls, only the enumerated ones we currently have implemented (e.g. u16 → u32).

And then there's still the N ≤ 128 bound that seemed very contested in this RFC. Would we want our MVP to allow arbitrary-length integers? Seems like it'd require a lot more work.

@buzmeg
Copy link

buzmeg commented Nov 8, 2021

This RFC was postponed on const generics (although I don't see that as blocking this as I prefer the u48 form over uint<48>). Now that a minimal level of that has hit stable, how does this get reopened?

Prior art: I believe that Ada (general purpose language) and VHDL and SystemC (hardware description languages) also allow arbitrary integer sizes.

@clarfonthey
Contributor Author

The main reason for offering a generic form uint<N> is explicitly for generics -- if you're forced to use macros to implement for multiple sizes, then it leaves out a large amount of use cases that allow simplifying integer trait implementations.

That said, there's nothing stopping us from offering uN as an alias for uint<N> if there's desire for it, and there's also nothing stopping people from type-aliasing uint<N> for specific N if we don't.

At this point, I would really love to restart this RFC with the things we've learned since, but I simply have too much going on with work + other stuff at the moment. As I said, if anyone else wanted to take over the role of doing that, I'd be happy to help however I can.

@buzmeg
Copy link

buzmeg commented Nov 11, 2021

Is there somewhere that the "things learned" are documented?

In addition, I don't like coupling "Generic Integers" to "Bitfields" at all. "Bitfields" have a lot of edge cases, while a Rust struct that is "packed" is still useful even if you don't necessarily know exactly how it is packed (i.e. RGBA 10/10/10/2 can be packed into 4 bytes even if you don't know the exact order, while "unpacked" it would cost you at least 7 bytes: almost double the size, with corresponding problems with cache lines).

While I do want bitfields, I suspect that having generic integers in the language would help people implementing bitfields to explore the space more effectively before converging.

I guess one obvious question would be "What does uint<N> do to the language grammar?"

@workingjubilee
Member

I agree that generic integer lengths should not be seen "as-if bitfields", as the rules for C bitfields are subtle and highly implementation dependent. However, Rust users will likely use them as an implementation convenience for Rust types that work "like a bitfield" if introduced, so we should consider their semantics in that regard.

Of course, people already use u8, u16, etc. for that same reason, so that's nothing new or shocking.

@clarfonthey
Copy link
Contributor Author

I mean, there's precedent from Zig for using generic integer types for bitfields, as their packed struct would be equivalent to the #[repr(bitpacked)] I suggested. And we could potentially make it so that #[repr(bitpacked)] allows reordering fields just like #[repr(packed)] does, and then you have to do #[repr(C, bitpacked)] in order to ensure order.

@workingjubilee
Copy link
Member

workingjubilee commented Nov 29, 2021

Sure.

As far as the length-oriented where bound:
It should probably exist in the form of a trait or const fn bound. This would allow changing it after the fact without breaking compatibility. core::simd implements a similar thing as a somewhat ugly hack with a trait implementation on a struct, one we have at least some intention to move to a const fn before we stabilize anything. I mean, we could simply break compilation during monomorphization, but that seems rude.

And I actually think it should be confined to a bound equivalent to uint<N> where N <= 64 at first, because of issues like these:

However, eventually that should rise at least to 128 for the obvious reasons. It would also be nice to be able to go up to 256. This would help simplify implementations of "mask vectors" for AVX512 (currently only requires 16 and 64, but...) and eventually SVE2 (can go up to 256) and RISCV-V, etc. It might even simplify working with oddly-shaped data types like Intel's 80-bit floats.

@clarfonthey
Copy link
Contributor Author

I think that the long-term goal should be to make N virtually unbounded; I say virtually because obviously you're limited by memory and codegen size, and at some point you really should just be using bigints. But yeah, if someone wants to make a 4096-bit integer for RSA keys or something like that, I say they should be allowed. The codegen might kinda suck for integers that big, but I wouldn't say we should stop them.

@programmerjake
Copy link
Member

C23 and C++ are getting generic integer types _BitInt(N) where N is the number of bits:
https://reviews.llvm.org/rG6c75ab5f66b4

@clarfonthey
Copy link
Contributor Author

LLVM getting proper support should make supporting this a lot easier.

@programmerjake
Copy link
Member

IIRC LLVM already has proper support, at least for and, or, xor, shifts, comparison, add, sub, and multiply. Division needs runtime library helpers (I don't remember if the existing 128-bit helpers alone are sufficient).

@programmerjake
Copy link
Member

LLVM RFC is up for adding library functions (div/rem) for >128-bit integers:
https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329?u=programmerjake
maybe that would be enough to allow @rust-lang/project-portable-simd to use generic integers for bitmasks once rustc gains support.

@buzmeg
Copy link

buzmeg commented Feb 20, 2022

I would like to note that C23 is standardizing "N2709 - Adding a Fundamental Type for N-bit Integers"

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

At this point, Rust will actually be lagging C without support for generic integers.

@workingjubilee
Member

workingjubilee commented Feb 20, 2022

I think it definitely weighs in strongly on the utility of the type, but we also actually trail behind C and C++ in many other things as well. Adding one more won't hurt.

One thing that came up in idle conversation when discussing the extension to the C Standard with a friend is that in Rust we don't necessarily have a good story for what operations with these types look like... C has the advantage (yes, it is an advantage here!) of accepting certain kinds of numeric promotion, which means that having what are effectively 4096 different integer types is not a problem because they interoperate with each other for simple operations. Are we sure we want that to be the same for Rust?

However, that's actually not something that requires deciding with the implementation, it's just an unresolved question that came up.

@clarfonthey
Contributor Author

As evidenced by the fact that this was postponed instead of closed, there is a pretty strong desire to add these to Rust at some point; it just wasn't on the road map at the time this RFC was written, since const generics weren't even close to being done yet, and there's still the unanswered question of numerically constraining where clauses with const generics (i.e., how do we want to implement From<u<N>> for u<M> where N < M).

If people have answers to these questions and the time to write an RFC, I would say go for it. But otherwise, there's not the biggest benefit to posting here since this is still a closed RFC.

I wonder, maybe it would be worth opening up a discussion somewhere in the const generics WG?

@hecatia-elegua
Copy link

After trying to work with several bitfield crates, I stumbled upon this one: https://github.com/danlehmann/bitfield
I then tried to make it more ergonomic, which went ok for a while until it led me to trying:

#[bitfield(u4)]
struct TestChild {
    field1: u2,
    field2: u2,
}
#[bitfield(u8)]
struct TestParent {
    field1: TestChild,
    field2: u4,
}

Of course, this does not work, as proc macros can't "access" types, so TestParent can't access TestChild and therefore doesn't know how big it is and therefore can't generate any getters/setters. I'm not sure if any crate can (or should) breach that barrier?

My current idea is to add a const field SIZE to every bitfield and then generate offsets at runtime. Another idea is to give up on ideals and add an attribute to every struct field specifying its size.
There are a ton more things to think about and similar issues when implementing bitenums (and bitflags).

To me it feels like we're reimplementing stuff the compiler does, only that the compiler likes to work with bytes and we reimplement things for working with bits, e.g. offset_of and size_of in the case above.

What would a first helpful step be? Maybe allowing the compiler to recognize bit-sized types?

@workingjubilee
Copy link
Member

Adding the notion of a "bit-sized" type (as opposed to a bit type that happens to occupy an entire u8) is much more troublesome than you might think, because one thing that Rust code can usually count on is that for each T in (T, T), the types can have pointers taken to them and interacted with separately. If you want to bitpack four u2 into one u8, that goes out the window.

@programmerjake
Copy link
Member

> Adding the notion of a "bit-sized" type (as opposed to a bit type that happens to occupy an entire u8) is much more troublesome than you might think, because one thing that Rust code can usually count on is that for each T in (T, T), the types can have pointers taken to them and interacted with separately. If you want to bitpack four u2 into one u8, that goes out the window.

imho bit-sized types would always be padded out to an integer number of bytes, except for inside bit-packed structs/enums, which are like repr(packed) in that you can't make references to their fields, you can only read or set them.

@nyabinary

Has enough time passed to revisit this yet? :P

Labels
- A-const-generics: Const generics related proposals.
- A-primitive: Primitive types related proposals & ideas.
- finished-final-comment-period: The final comment period is finished for this RFC.
- postponed: RFCs that have been postponed and may be revisited at a later time.
- T-lang: Relevant to the language team, which will review and decide on the RFC.
- T-libs-api: Relevant to the library API team, which will review and decide on the RFC.