
Add i128 and u128 types #521

Closed
mahkoh opened this issue Dec 14, 2014 · 41 comments

@mahkoh (Contributor) commented Dec 14, 2014:

typedef unsigned long u64;
typedef __uint128_t u128;

u64 f(u64 x, u64 y) {
    u128 mul = (u128)x * (u128)y; /* full 128-bit product */
    mul += 1UL << 63;             /* round to nearest */
    return mul >> 64;             /* high 64 bits */
}

This produces efficient code that cannot be written in Rust without inline assembly.
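
For comparison, here is a minimal sketch of what the same function could look like in Rust if a u128 type existed (hypothetical at the time of writing):

// The rounded high half of a 64x64 -> 128-bit multiply, assuming u128.
fn f(x: u64, y: u64) -> u64 {
    let mul = (x as u128) * (y as u128) + (1u128 << 63);
    (mul >> 64) as u64
}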

@sinistersnare:

We had quadruple-precision floating-point types at one point (@thestinger added them), but they were later removed from the language, citing lack of support across compilers. Is there good compiler support for 128-bit integers?

@thestinger:

citing lack of support across compilers

That was misinformation. It worked well and provided a useful feature. If there was a valid reason to remove it, it wasn't stated in any notes that were made public.

Is there good compiler support for 128-bit integers?

LLVM has fully working support for 128-bit integers and quadruple-precision floating point. GCC and Clang both expose a 128-bit integer type. It's no harder to implement 128-bit integers in terms of 64-bit ones than it is to implement 64-bit integers in terms of 32-bit ones. Rust already exposes numeric types with efficient software implementations: i64 and u64 on architectures without 64-bit integer registers.
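
To illustrate that point, a minimal sketch (not the compiler's actual lowering) of 128-bit addition emulated with two 64-bit halves:

#[derive(Clone, Copy)]
struct U128 { lo: u64, hi: u64 }

// Add two emulated 128-bit values: add the low halves, then propagate
// the carry into the high halves.
fn add(a: U128, b: U128) -> U128 {
    let (lo, carry) = a.lo.overflowing_add(b.lo);
    let hi = a.hi.wrapping_add(b.hi).wrapping_add(carry as u64);
    U128 { lo, hi }
}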

@sinistersnare:

I am not saying whether it was true or not; I am relaying what was said when they were removed. Maybe a full RFC for implementing 128-bit numbers is a good idea at this point.

@HalosGhost:

👍 for {i,u,f}128 being added (back) to the lang.

@aldanor commented Feb 6, 2015:

It would be great to have 128-bit integer types -- for one, this would make bitflags more useful.

@pnkfelix (Member):

An aside: rust-lang/rust#24612 (the new float-to-decimal conversion code) has a custom bignum module, and it has a note that its generic (uN, u2N)-based bignum code would easily gain N=64 support if we had u128 support.
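
For context, a minimal sketch of the (uN, u2N) pattern such bignum code relies on, shown here with N = 32 because that only needs u64; with u128 available, the same code would work with N = 64:

// Multiply-accumulate one bignum digit using the double-width type:
// returns the new digit and the carry into the next position.
fn mul_add_digit(digit: u32, factor: u32, carry: u32) -> (u32, u32) {
    let t = digit as u64 * factor as u64 + carry as u64;
    (t as u32, (t >> 32) as u32)
}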

@Diggsey (Contributor) commented Apr 20, 2015:

+1. The Duration type could be simplified with this: instead of using a u64 for seconds with a separate u32 for nanoseconds, it could just be a single u128/i128 count of nanoseconds.
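
A purely hypothetical sketch of that representation (not the standard library's actual Duration), assuming an i128 type:

// Duration as one signed nanosecond count.
struct Duration { nanos: i128 }

impl Duration {
    fn new(secs: i64, extra_nanos: u32) -> Duration {
        Duration { nanos: secs as i128 * 1_000_000_000 + extra_nanos as i128 }
    }
    fn as_secs(&self) -> i64 {
        (self.nanos / 1_000_000_000) as i64
    }
}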

@nagisa (Member) commented Apr 29, 2015:

I really want this.

We had quadruple-precision floating-point types at one point (@thestinger added them), but they were later removed from the language, citing lack of support across compilers.

The system libm must support a variety of non-trivial mathematical operations (sine, cosine, etc.) on f128 on all platforms we support in order for the type to be useful. (EDIT: unless f128 is emulated as double-double, which is not IEEE-conformant.)

This does not apply to i/u128: as long as LLVM and Rust support them, nothing else needs to. This also means that i/u128 are considerably easier to add than f128.

@HalosGhost:

I've actually come to the opinion that since llvm supports arbitrarily-sized fixed-width integers (e.g., i7 for a 7-bit integer), it'd be great if Rust actually just exposed this directly. I doubt it'll ever happen, but I'd love to see it.

@rrichardson:

That would actually provide a fairly easy and efficient way of parsing and writing bit-packed binary structures (like TCP packets): you could write a packed struct full of your oddly sized integers and transmute a buffer pointer to your struct pointer.

@HalosGhost:

@rrichardson, it's a perfect way of writing tightly-packed structures, yes. It also allows the programmer to more finely tune the acceptable range of a particular data point if they so wish. For example, if you want to be able to store exactly one hexdigit, an i4 is what you want. This allows you to, for example, implement bigints pretty easily by using an array (or vector since llvm has both) of i4s.

I've been hoping for this feature, but again, I seriously doubt I'll ever see this in Rust.

@Diggsey (Contributor) commented Apr 30, 2015:

@rrichardson Currently packed structs will only pack to the granularity of 1 byte AFAICT, so adding non-multiple-of-eight sized integers wouldn't help much without more extensive changes.

@HalosGhost:

@Diggsey, just because they only pack to the nearest octet doesn't mean that arbitrarily-sized fixed-width integers wouldn't make things simpler. Think of them as a better-supported version of C's bitfields.

@Diggsey (Contributor) commented May 1, 2015:

@HalosGhost Packing determines how fields are laid out relative to each other. Even when you enable packing, fields are still aligned to 1 byte, which makes arbitrarily sized integers useless for this purpose.

C++ bitfields give you bit-level packing because the compiler effectively merges adjacent bitfields into a single field as a pre-processing step, before the struct layout code sees them. This is also why bitfields are so poorly supported: only the struct layout step is standardised on each platform, and that step can't deal with sub-byte alignments.

It's not impossible to get bit-level packing, but it requires going quite a bit further than just adding the types to the language: you have to either do what C++ does and have a strategy for merging sub-byte fields into whole-byte fields, or extend the struct layout code to support sub-byte fields directly, making sure that it never affects the layout of structs without those fields.

You'd also have to decide what to do for #[repr(C)] types: since C doesn't define layout rules in this situation, do you just disallow sub-byte fields? And what about newtypes? Newtypes are currently just single-element tuples, so newtyping an i4 would change the size and alignment of the type to be byte-aligned.

@HalosGhost:

I must be misunderstanding you.

struct ex {
    uint8_t a: 4;
    uint8_t b: 4;
};

If you were to take sizeof(struct ex); in C you would discover that it reports 1.

My point is that it'd be handier (and cleaner imho) to be able to just say:

struct ex { // forgive me if this is the wrong syntax
    a: u4,
    b: u4,
}

@Diggsey (Contributor) commented May 1, 2015:

In your example, 'a' and 'b' are merged into a single uint8_t field as the first step, and then that field is sent to the layout code, so of course you get a 1-byte struct. Rust doesn't have that merge step at the moment, and the merge step is part of the reason why bitfields are non-portable and poorly supported: it's entirely implementation-defined.
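
For comparison, a minimal sketch of doing that merge step by hand in today's Rust, packing the two 4-bit fields from the C example above into one byte:

// Two 4-bit fields stored manually in a single u8.
#[derive(Clone, Copy)]
struct Ex(u8);

impl Ex {
    fn new(a: u8, b: u8) -> Ex {
        Ex((a & 0x0f) | ((b & 0x0f) << 4))
    }
    fn a(self) -> u8 { self.0 & 0x0f }
    fn b(self) -> u8 { self.0 >> 4 }
}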

@HalosGhost:

I suppose. Sounds like that should be addressed as well. Even without bit-level packing in structs, the additional boundaries that arbitrary-width integers offer are enough for me to want the feature.

@rrichardson:

Or maybe the answer is not arbitrary-width ints but bit-level pattern matching and destructuring. Erlang might be a good model for this.

@erickt commented May 1, 2015:

@rrichardson: that could be done with a macro or syntax extension. I've been wanting that for a while now :)
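
As a rough illustration, a sketch of the kind of code such a macro or syntax extension might expand to today, pulling bit-level fields out of a word with shifts and masks (the field layout here is made up for the example):

// Destructure a 32-bit word into a 4-bit, a 4-bit and a 16-bit field.
fn unpack(word: u32) -> (u8, u8, u16) {
    let hi_nibble = (word >> 28) as u8;           // bits 31..28
    let next_nibble = ((word >> 24) & 0xf) as u8; // bits 27..24
    let low_half = (word & 0xffff) as u16;        // bits 15..0
    (hi_nibble, next_nibble, low_half)
}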

@darakian commented Jun 9, 2015:

Add +1 request for u128 ints :)

@stusmall:

Another +1 for {u,i}128. I could really use it for an implementation of murmur3. Is there a downside to adding it?

@isislovecruft:

+1

This would often be helpful for implementing various cryptographic algorithms without needing to either jump through hoops to stay within the bounds of u64/i64 or switch to std::num::BigInt. FWIW, @kennytm implemented this as a module (https://github.com/kennytm/extprim), but for various reasons it doesn't work with any version of rustc (at least that I tried).

@mahkoh (Contributor, Author) commented Feb 8, 2016:

This cannot be implemented efficiently without compiler support for various reasons. It is clear that certain people have no interest in this feature and that this issue will be ignored by them unless someone turns it into an RFC.

@cesarb (Contributor) commented Feb 15, 2016:

Does LLVM have native support for 128-bit integers even on 32-bit architectures like x86 or ARM?

@HalosGhost:

LLVM exposes arbitrarily-sized fixed-width integers.

@mahkoh (Contributor, Author) commented Feb 15, 2016:

@cesarb That's irrelevant since it can always be emulated before the code is translated to LLVM IR.

@Jaak commented Feb 15, 2016:

@HalosGhost I have found that this claim is only true for machine-friendly types. In our experience anything larger than 128 bits on x86_64 is extremely poorly supported. Additionally, when dealing with types smaller than 128 bits one is only safe to use (again, on x86_64) 8-, 16-, 32-, 64- and 128-bit integers. Other bit-widths can work, but one has to be careful not to form vectors out of them.

We have also had issues with storing non-standard bit-width integers in memory, but haven't tried to pinpoint the problem exactly. This seems to be a very poorly tested area of LLVM. Tread carefully.

@mjbshaw (Contributor) commented May 25, 2016:

Does LLVM have native support for 128-bit integers even on 32-bit architectures like x86 or ARM?

Nope. __uint128_t is not defined when compiling for armv7, which I recently learned when compiling code for the iPhone 4S (which is still supported by Apple).

@ticki (Contributor) commented May 25, 2016:

@mjbshaw A fallback is possible (like rem for u64 on targets without native support).

@strega-nil:

@mjbshaw LLVM supports it, clang does not.

@gnzlbg (Contributor) commented Jun 30, 2016:

Would it be possible to implement u128/i128 but allow them to be used only behind a target_feature?

I need u128 to easily implement support for the mulx BMI2 instruction on architectures that support it (clang does it this way as well), but wouldn't mind falling back to something slower (e.g. a user-defined type) on architectures that do not support it.

EDIT: It would actually be great if the u128/i128 types were provided by the implementation, with an emulation provided on architectures that do not support them natively.
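
A minimal sketch of what is meant here, assuming a u128 type: the widening 64x64 multiply that a mulx-style intrinsic exposes, returned as low and high halves.

// Widening multiply: (low, high) halves of the 128-bit product.
fn mulx_u64(a: u64, b: u64) -> (u64, u64) {
    let r = (a as u128) * (b as u128);
    (r as u64, (r >> 64) as u64)
}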

@briansmith commented Jun 30, 2016:

I need u128 to easily implement support for the mulx BMI2 instruction on architectures that support it (clang does it this way as well), but wouldn't mind falling back to something slower (e.g. a user-defined type) on architectures that do not support it.

i128/u128 is not needed and IMO not very useful for that kind of thing. I much prefer the intrinsics that Microsoft made: https://msdn.microsoft.com/en-us/library/3dayytw9(v=vs.100).aspx, which operate exclusively on 64-bit types.

@gnzlbg (Contributor) commented Jul 1, 2016:

@briansmith the API only uses 64-bit types, but the implementation (which is not shown there) probably doesn't.

EDIT: the way the 128-bit version is typically implemented is:

// 64-bit version
unsigned __int64 _umul128(unsigned __int64 Multiplier, unsigned __int64 Multiplicand, unsigned __int64 *HighProduct) {
  unsigned __int128 __Result = (unsigned __int128) Multiplier * Multiplicand;
  *HighProduct = (unsigned __int64) (__Result >> 64);
  return (unsigned __int64) __Result;
}
// 32-bit version
unsigned __int32 _umul64(unsigned __int32 Multiplier, unsigned __int32 Multiplicand, unsigned __int32 *HighProduct) {
  unsigned __int64 __Result = (unsigned __int64) Multiplier * Multiplicand;
  *HighProduct = (unsigned __int32) (__Result >> 32);
  return (unsigned __int32) __Result;
}

On platforms without 128-bit integers you can do something like this by hand, but backends do not recognize it due to its complexity, and won't generate the 128-bit (4 instructions) or MULX (1 instruction) versions on platforms where they are available.

@retep998 (Member) commented Jul 1, 2016:

@gnzlbg It's an intrinsic, so it isn't implemented as a function like that. The compiler treats intrinsics specially and compiles them down to a certain set of instructions. The MUL instruction on x86 multiplies RAX by some other 64-bit value and returns the 128-bit result split across RDX:RAX. So it really is returning the result as two separate i64 values; it's not implemented in terms of i128. Same with DIV: it takes the dividend in RDX:RAX plus a 64-bit divisor, and returns the quotient in RAX and the remainder in RDX. At no point is it working with i128 directly; it is simply doing 64-bit math with a second register holding the high part.

I really wish Rust exposed these intrinsics somehow; they're incredibly useful. Instead, all we have are intrinsics for multiply-with-overflow, which are far less useful since you don't get the high part, just a bool indicating whether you needed one.
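
For scale, a minimal sketch (not an existing intrinsic) of what one has to write in plain Rust today, without u128 or a widening-multiply intrinsic, to recover that high half via 32-bit halves:

// High 64 bits of the 128-bit product, computed from 32-bit halves.
fn mul_high_u64(a: u64, b: u64) -> u64 {
    let (a_lo, a_hi) = (a & 0xffff_ffff, a >> 32);
    let (b_lo, b_hi) = (b & 0xffff_ffff, b >> 32);
    let lo_lo = a_lo * b_lo;
    let mid1 = a_hi * b_lo + (lo_lo >> 32);
    let mid2 = a_lo * b_hi + (mid1 & 0xffff_ffff);
    a_hi * b_hi + (mid1 >> 32) + (mid2 >> 32)
}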

@gnzlbg (Contributor) commented Jul 1, 2016:

@retep998 I failed to mention that I am implementing the intrinsic, in Rust. LLVM doesn't offer it as an intrinsic because it can recognize the pattern instead.

@gnzlbg (Contributor) commented Jul 1, 2016:

For example, this is what GCC generates for both versions on a platform that does support 128-bit integers (but not MULX). For the non-128-bit version:

/* Computes the low 64 bits of the full 128-bit product from 32-bit halves. */
void mult64to128(uint64_t op1, uint64_t op2, uint64_t *lo)
{
    uint64_t u1 = (op1 & 0xffffffff);
    uint64_t v1 = (op2 & 0xffffffff);
    uint64_t t = (u1 * v1);
    uint64_t w3 = (t & 0xffffffff);
    uint64_t k = (t >> 32);

    op1 >>= 32;
    t = (op1 * v1) + k;
    k = (t & 0xffffffff);

    op2 >>= 32;
    t = (u1 * op2) + k;

    *lo = (t << 32) + w3;
}

it generates:

mult64to128(unsigned long, unsigned long, unsigned long*):
        movl    %edi, %eax
        movl    %esi, %r8d
        shrq    $32, %rdi
        movq    %rax, %rcx
        shrq    $32, %rsi
        imulq   %r8, %rcx
        imulq   %r8, %rdi
        imulq   %rax, %rsi
        movq    %rcx, %r9
        movl    %ecx, %ecx
        shrq    $32, %r9
        addl    %r9d, %edi
        leaq    (%rsi,%rdi), %rax
        salq    $32, %rax
        addq    %rcx, %rax
        movq    %rax, (%rdx)
        ret

while for the 128-bit version

uint64_t umulx(uint64_t x, uint64_t y, uint64_t* p) {
  unsigned __int128 r = (unsigned __int128)x * y;
  *p = (uint64_t)(r >> 64);
  return (uint64_t) r;
}

it generates

umulx(unsigned long, unsigned long, unsigned long*):
        movq    %rdi, %rax
        movq    %rdx, %rcx
        mulq    %rsi
        movq    %rdx, (%rcx)
        ret

On platforms with MULX it does use it, and generates:

umulx(unsigned long, unsigned long, unsigned long*):
        movq    %rdx, %rcx
        movq    %rdi, %rdx
        mulx    %rsi, %rax, %rdx
        movq    %rdx, (%rcx)
        ret

for the 128-bit version, and still "crap" for the version that does not use 128-bit integers.

EDIT: clang with the LLVM backend generates the same code for the 128-bit version, and slightly different code for the 64-bit version.

@retep998 (Member) commented Jul 1, 2016:

mult64to128 is horrifying and I seriously hope nobody uses that. umulx shows that GCC is capable of optimizing that pattern down to a simple 64-bit multiply. However, that doesn't change the fact that we could expose such an intrinsic to do umulx without needing 128-bit integers. Every x86_64 processor supports doing that sort of multiplication, so all it would take is a small amount of effort from someone to add a Rust intrinsic for that.

Although... is there an LLVM intrinsic for umulx? MSVC definitely has _umul128 as an intrinsic, so if we look at the Clang/C2 intrinsics for _umul128... oh, it is an inline function that uses 128-bit integers. Does LLVM not have an intrinsic for this? o_0

static __inline__ unsigned __int64 __DEFAULT_FN_ATTRS
_umul128(unsigned __int64 _Multiplier, unsigned __int64 _Multiplicand,
         unsigned __int64 *_HighProduct) {
  unsigned __int128 _FullProduct =
      (unsigned __int128)_Multiplier * (unsigned __int128)_Multiplicand;
  *_HighProduct = _FullProduct >> 64;
  return _FullProduct;
}

@gnzlbg (Contributor) commented Jul 1, 2016:

TL;DR: to be able to use mulx and mulq, LLVM expects to see code that uses 128-bit integers (there is no llvm.bmi2.mulx intrinsic in LLVM). Even if we expose a umulx function in the standard library, that function is going to have to be written somewhere. If Rust had 128-bit integers, it could be written in plain Rust (using target_feature, switching between 128-bit and 64-bit depending on the architecture) and LLVM would do the right thing. Without 128-bit integers, we would still need to generate 128-bit LLVM IR from within rustc itself on some architectures for LLVM to recognize it.

@retep998

Although... is there an LLVM intrinsic for umulx?

No, there is no need for it since LLVM recognizes the pattern automatically.

It might also be worth mentioning that LLVM intrinsics actually get removed over time as the optimizer learns to recognize new patterns. A pretty good example is the TBM instruction set: it used to have one intrinsic per instruction (llvm.tbm.___), but nowadays the operations are just written in plain C. You write the algorithms normally in your language and LLVM optimizes them down to single instructions when available. Obviously some algorithms are harder to recognize, and some intrinsics remain because nobody has put in the work yet, but the objective is that you write normal code, generate normal IR, and LLVM does the rest.

And LLVM is not unique in this. GCC (as shown above) does it exactly the same way. I wouldn't be surprised if the MSVC intrinsic were written in plain C++ as well.

The main difference between the 128-bit version and the 64-bit version is that the 64-bit version is very complex. There are probably multiple ways to write it and achieve the same thing, but writing the code that "canonicalizes" it into something that the LLVM/GCC/... optimizers can recognize is non-trivial. Doing so for the 128-bit implementation is much easier.

@nagisa (Member) commented Jul 1, 2016:

LLVM’s “intrinsic” (note the quotes) for widening multiply is

%x = zext i64 %a to i128
%y = zext i64 %b to i128
%r = mul nuw i128 %x, %y
ret i128 %r

and ends up becoming a

movq    %rsi, %rax
mulq    %rdi
retq

It is somewhat important that LLVM can easily prove both arguments have been zero-extended in order to generate such code, because otherwise it may very quickly become a

imulq   %rdx, %rsi
movq    %rdx, %rax
mulq    %rdi
addq    %rsi, %rdx
imulq   %rdi, %rcx
addq    %rcx, %rdx
retq

instead, which you, obviously, do not want.

Implementing a Rust intrinsic which emits exactly the first sequence of LLVM instructions guarantees the emission of the three-instruction case, whereas using arbitrary 128-bit integers may make LLVM fall back to the other case even in obvious situations for no apparent reason.

@gnzlbg (Contributor) commented Jul 1, 2016:

@nagisa Thanks! I will give that approach a try; it should work on all architectures. The only "non-issue" is that we cannot offer widening multiply as an intrinsic unless we actually have 128-bit integers, but that is irrelevant for mulx_u64 (I'll just write an intrinsic for that instead).

whereas using arbitrary 128-bit integers may make LLVM fall back to the other case even in obvious situations for no apparent reason

That's a big "may", though: it would be a regression, and clang relies on this behavior. By the same reasoning we should be generating assembly directly, since even if we generate the correct IR the backend could still "fall back to the other case for no apparent reason". So, while you are technically correct (which is the best kind of correct), I don't think we should give this argument that much weight.

@nikomatsakis (Contributor):

Closing; an RFC for this feature has now been accepted: rust-lang/rust#35118
