RFC: int/uint portability to 16-bit CPUs #161

1fish2 · 2014-07-12T05:00:37Z

Both Issue #14758 and Issue #9940 call for RFCs.

This RFC summarizes those discussions, explains the core issue of
code portability to 16-bit CPUs (also of 64-bit code to 32-bit CPUs),
explains what's meant by "default" integer types, makes 2 specific
proposals, and proposes coding style for integer sizing.


emberian · 2014-07-12T05:14:27Z

active/0000-intindex.md

+
+# Background
+
+Rust defines types `int` and `uint` as integers that are wide enough to hold a


Is this even true on 16-bit devices, or do modern ones still use a segmentation system? Are there any relevant 16-bit chips anymore?

XMEGA are 8/16-bit?

Some Atmel AVR controllers http://en.wikipedia.org/wiki/Atmel_AVR and some PIC controllers http://en.wikipedia.org/wiki/PIC_microcontroller have 16-bit address spaces. These tend to have Harvard architectures, that is, separate instruction and data memory/addresses.

And the MSP430...

errordeveloper · 2014-07-12T07:15:12Z

Sounds reasonable.

errordeveloper · 2014-07-12T07:17:03Z

active/0000-intindex.md

+
+# Drawbacks
+
+  - Renaming `int`/`uint` requires figuring out which of the current uses to replace with `index`/`uindex` vs. `i32`/`u32`/`BigInt`.


And some people will just end up redefining int and uint to be 32-bit in their projects...

errordeveloper · 2014-07-12T07:18:04Z

Overall, it's quite a reasonable thing to do, considering Rust's goals.

Although maybe the motivation and title could be generalised a bit more...

errordeveloper · 2014-07-12T07:23:15Z

active/0000-intindex.md

+
+# Motivation
+
+So Rust libraries won't have new overflow bugs when run on embedded devices with


I'd just replace the entire paragraph with: "Avoid bugs where the programmer presumed a default integer size for indexing arrays and elsewhere."

You can expand a little to say that this mostly concerns non-32-bit targets: 8-bit and 16-bit MCUs and, to some extent, 64-bit CPUs too.

errordeveloper · 2014-07-12T07:32:01Z

Am I correct in understanding that this is to keep the integral type used for array indexing default to "native", i.e. fastest, integer? (On the AVR, int32_t is actually slow; not that I care too much about the AVR, just saying.)

By the way, what about suffixes? Would this imply dropping of i and u suffixes?

huonw · 2014-07-12T07:36:26Z

Am I correct in understanding that this is to keep the integral type used for array indexing default to "native", i.e. fastest, integer?

The integral type used for indexing is the smallest one that covers the address space. "fastest"/"native" is irrelevant.
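The definition above can be checked directly. What follows is a minimal sketch in present-day Rust, where the types discussed in this thread are spelled `isize`/`usize` (an assumption relative to this discussion, which predates the rename); the function name is hypothetical.

```rust
use std::mem::size_of;

/// Returns true when the signed and unsigned index types are exactly
/// as wide as a thin data pointer on the current target.
/// (Hypothetical helper, for illustration only.)
fn index_types_match_pointer_width() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
        && size_of::<isize>() == size_of::<*const u8>()
}
```

Nothing in the check refers to ALU width or speed, which is the point being made: the index type follows the address space, not the "native" register size.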

errordeveloper · 2014-07-12T08:01:38Z

@huonw thanks for the better formulation, perhaps that's the way the RFC/docs should state it.

1fish2 · 2014-07-12T08:08:16Z

Agreed. I'll rephrase that.

1fish2 · 2014-07-12T08:15:56Z

Yes, the motivation and title can be generalized. I was trying to start
with a concrete reason rather than a return of the intptr/uintptr
discussion.

dobkeratops · 2014-07-12T10:34:30Z

as well as embedded, some machines have had coprocessors with smaller address spaces.. not so common now, but who knows what the future will bring

My suggestions would have been ...

[1] Officially define int/uint as max( pointer-size, alu-size, 32bits) .. as most people expect: 32/64
This will prevent unexpected bugs when you move from desktop to embedded, and is friendly to most.

[2] Then add other types which are more specific..
perhaps word=max(pointer-size,alu-size) .. might be 16/32/64
maybe even another for min(pointer-size,alu-size,32bits) .. might be 16/32

These are complementary to the specific types (i32 etc.); code might cluster data dynamically to suit its platform.
The embedded aware programmer switches from int->word as an optimisation.

[3] Vec could be defined more flexibly as Vec<T,Index=uint>. In the vast majority of my cases I'd be using Vec<T,i32>, even on a 64-bit machine; 32-bit indices are sufficient on machines with up to ~16 GB of RAM (especially when the majority of RAM is full of graphical assets).
Similarly embedded people could reuse Vec<T,word>..

           68000      x86   x86-64      x32
int        32         32    64          32
word       16         32    64          64
??         16         32    32          32

Seems like the C name 'long' being distinct from int is actually useful. Maybe even swapping int out as suggested by the OP would be good, but adding another complementary type would be less disruptive, I think.
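The Vec<T,Index=uint> idea in suggestion [3] could be sketched roughly as below. This is not an existing standard-library API: `IndexedVec` and its methods are hypothetical names, and the sketch only parameterizes the index type at the API boundary while still storing a plain `Vec` internally.

```rust
use std::convert::TryInto;
use std::marker::PhantomData;

/// A vector whose index type `I` is chosen by the caller.
/// (Hypothetical wrapper, for illustration only.)
struct IndexedVec<T, I> {
    items: Vec<T>,
    _index: PhantomData<I>,
}

impl<T, I: TryInto<usize>> IndexedVec<T, I> {
    fn new() -> Self {
        IndexedVec { items: Vec::new(), _index: PhantomData }
    }

    fn push(&mut self, value: T) {
        self.items.push(value);
    }

    /// Returns `None` when the index is negative, does not fit in the
    /// platform's index type, or is out of bounds.
    fn get(&self, index: I) -> Option<&T> {
        let i: usize = index.try_into().ok()?;
        self.items.get(i)
    }
}
```

With this shape, `IndexedVec<T, i32>` and `IndexedVec<T, u16>` both compile on any target, and an index that cannot be represented on the current platform yields `None` instead of wrapping.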


glaebhoerl · 2014-07-13T09:29:17Z

active/0000-intindex.md

+
+> In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this. ...
+>
+> Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.


This suggestion makes a lot of sense in a context where overflow/underflow silently wraps around. However, if something like RFC PR #146 were to be implemented, then it would once again make sense to use types which more accurately express the range of legal values (i.e., which are self-documenting), because compiler-added checks can be enabled to catch errors where the value would go out of range. Accurate types with compiler-added assertions beats inaccurate types with programmer-added assertions.

@glaebhoerl So would you recommend we wait for PR #146 to be accepted or rejected before evaluating this RFC further?

Nah. This was just an ancillary remark on an ancillary part of the proposal. The main part of the proposal (which is about changes to the language to better accommodate [portability to] 16-bit architectures) is unaffected.

(And anyway, the suggestion makes sense in the context of the current language, and the style guide could just be updated again if the language changes.)

Aha! Nice insight, @glaebhoerl.

I'll make the style guide recommendation conditional on overflow-checking.

Q. Does/will overflow checking happen during conversion between integer types?

A. It doesn't currently, but in the context of #146, if #[overflow_checks(on)], I think it should.

Rationale: As far as I can tell, `as` is meant to preserve meaning rather than representation, e.g. `5000i32 as f32` is equivalent to `5000f32` and not to `transmute::<i32, f32>(5000i32)`. Therefore, if attempting to transport the meaning of the original value to the target type causes it to overflow, that should be caught.

Yes. Otherwise computing a value in one integer type then converting to another would accidentally bypass the overflow checks.
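To make the conversion question concrete, here is a small sketch contrasting an unchecked `as` cast with a checked conversion. It uses `TryFrom` from the present-day standard library as a stand-in; it is not the `#[overflow_checks(on)]` mechanism discussed for #146, and the function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Unchecked narrowing: `as` silently keeps only the low 8 bits.
fn narrow_unchecked(x: i32) -> u8 {
    x as u8
}

/// Checked narrowing: reports failure instead of changing the value.
fn narrow_checked(x: i32) -> Option<u8> {
    u8::try_from(x).ok()
}

/// Integer-to-float `as` preserves the numeric meaning, not the bit pattern.
fn int_to_float(x: i32) -> f32 {
    x as f32
}
```

For example, `narrow_unchecked(300)` yields 44 (300 modulo 256), while `narrow_checked(300)` yields `None`, which is the kind of error a checked conversion would surface.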

errordeveloper · 2014-07-13T09:55:27Z

Also, another point this RFC should consider is what a typical for i in range(..) construct would look like... Well, considering int as well as uint would be scrapped, would an integer literal with suffix i or u then mean index or uindex, or not?


Thiez · 2014-07-14T11:48:49Z

@errordeveloper I doubt that would be a problem because in most cases one would iterate over an indexable collection directly rather than indexing (and paying for bounds checking). Not that I support this RFC...

Ericson2314 · 2014-07-15T16:13:32Z

There should be some integer type that corresponds to pointer size. That is why I like intptr/uintptr much more than just adding an arbitrary 32-bits minimum.

There could be some fancy macro that you give constraints (fastest / smallest, max abs val, signed/unsigned, etc) and it spits out a type or aborts compilation. This seems more versatile and less namespace-cluttering than C99's solution.

BTW, last I checked, Rust let you transmute int/uint to whatever fixed-size integer type fits the current build target. This should arguably be disallowed.

I would love some infrastructure everybody could share to do continuous integration with different int sizes. This probably necessitates virtualizing different CPU architectures (because int--ptr transmutations), but it would be cool if it didn't.

I initially didn't think compiler-added overflow checks were too important. But if that is what it takes to make people use unsigned integers for natural numbers, I am all for it.

huonw · 2014-07-15T21:42:57Z

BTW, last I checked, Rust let you transmute int/uint to whatever fixed-size integer type fits the current build target. This should arguably be disallowed.

Trying to protect against everything that can change per platform/configuration is impossible. e.g.

#[cfg(windows)]
struct Foo { x: u8 }
#[cfg(not(windows))]
struct Foo { x: u16 }

transmute::<Foo, u8>(...)

Ericson2314 · 2014-07-15T22:03:33Z

Impossible? I think not.

I'd like to somehow match on a list of archs one attempts to support, lest one forget a case, rather than just config-chaining and hoping for the best. This shouldn't be too hard.

More radically, for the purposes of type checking it would be nice to take an intersection type or something analogous, e.g.:

// can't be transmuted / unique size,
// implements all traits that both u8 and u16 do.
type Magic = u8 ∩ u16

struct Foo { x: Magic }

This is kind of a "mangling of phases", and a rather big step from the way things work currently. The alternative is just to brute-force the various configuration options as part of compilation, or to cross-compile and virtualize as I said before.

alexchandel · 2014-07-28T12:10:26Z

Given the purpose of the int and uint types, to be large enough to hold any memory address on the target machine, the intptr/uintptr names seem appropriate. Paralleling the C standard is another advantage, since they would be familiar to a significant target community. Imposing an arbitrary 32 bit minimum would be inconsistent with the purpose of the type, and would waste memory on targets like 8-bit and 16-bit PICs.

l0kod · 2014-08-15T16:34:21Z

In rust-lang/rust#9940, @thestinger said:

I just don't think this issue has any real benefit beyond painting the bikeshed a bit more to my liking. It's not worth the backwards compatibility break at this point.

I think renaming int and uint is worth the backwards compatibility break because it will be the (only) occasion to check that existing code uses int correctly before those uses create bugs…

lilyball · 2014-08-15T20:15:45Z

int and uint have two purposes:

1. Be sized properly to cover the address range for the architecture.
2. Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

Claiming that purpose 1 is the only purpose for these types is wrong, and yet that's the motivation for renaming to intptr / uintptr.

The only real issue with int/uint right now is that on 16-bit machines, these types are probably 16-bit types, and that's probably unexpectedly small. There is a lesser issue with code that is written on a 64-bit machine overflowing on a 32-bit machine, but anyone who's using values larger than 32 bits should recognize that this is so and deal with it appropriately.

Importantly, renaming these to intptr/uintptr is not going to solve the issue of code going from a 64-bit to a 32-bit machine. Even if we encourage the use of e.g. i32 as the de-facto type, people will still be using intptr/uintptr a lot, because that's the type used for indexing and sizing of containers. If anything, encouraging the use of i32 (or i64) is only going to make things worse as people sprinkle as i32 and as uintptr all over their code whenever they work with containers. In this scenario, someone writing code on a 32-bit machine might be fine, but on a 64-bit machine they may end up with intptr/uintptr values that are >32 bits, and the as i32 overflows. Or perhaps the code is run on a 16-bit machine and the as uintptr now overflows.

Basically, renaming these types does not really do anything at all for overflow, it just encourages people to add more unchecked integral casts to their code.

Because of this, the only approach I can support is keeping int/uint but defining them as being at least 32 bits. Using 32-bit integers on a 16-bit machine doesn't seem like it should be particularly problematic; if int becomes intptr people are just going to end up using 32-bit or 64-bit integers instead anyway. Alternatively, we could just ignore the issue and assume anyone using 16-bit machines is basically writing custom code anyway (or using libraries that explicitly support 16-bit machines).

l0kod · 2014-08-15T22:17:54Z

Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

For a statically typed language, int and uint are kind of weird because of their dynamic/unknown/architecture-related size. This peculiarity should be highlighted.

For this reason, I don't think it's a good idea to promote int or uint as a de-facto integer type. Should a programmer have to think carefully about choosing every type except an integer type?
#115 and rust-lang/rust#6023 do not go in that direction:

You should only use a pointer-size integer if it's actually what you need. You can't use a fixed-size integer without thinking about the bounds, so a pointer-size integer is a bad fallback.

Obviously, the architecture-related integer is needed for memory-related access (i.e. indexing and sizing of containers). Is there a good reason for hiding the initial goal and bug-prone (e.g. cast) property of a type?

Basically, renaming these types does not really do anything at all for overflow, it just encourages people to add more unchecked integral casts to their code.

That's a possibility, but if they are aware of the architecture-related property, they have more reason to make the right choice: to choose the right type everywhere.
The names index or intptr are more meaningful than plain int, which in many languages does not denote any memory-related notion.

If that makes sense, the "at least 32-bit" exception is not needed. Moreover, it would introduce another weird rule to this already weird type.

huonw · 2014-08-15T23:10:41Z

Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

This isn't really the case; it's just that using any other type is annoying and historically unfavoured (since we had default-to-int functionality previously, and uint and int get the nice short u/i suffixes). Defaulting to the int/uint type also has downsides with respect to memory use: most numbers are small, so the 32 extra bits of int vs. i32 (on 64-bit platforms) are a complete waste.
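The memory-use point can be illustrated with a back-of-the-envelope sketch. It is written in present-day Rust, where the pointer-sized signed type is spelled `isize` (an assumption relative to this thread); the helper names are hypothetical.

```rust
use std::mem::size_of;

/// Bytes needed to store `n` small counters as fixed-width `i32`.
fn bytes_as_i32(n: usize) -> usize {
    n * size_of::<i32>()
}

/// Bytes needed to store the same `n` counters as the pointer-sized type.
fn bytes_as_isize(n: usize) -> usize {
    n * size_of::<isize>()
}
```

On a 64-bit target, a thousand counters take 4000 bytes as `i32` and 8000 bytes as `isize`; on a 32-bit target the two are the same size.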

1fish2 · 2014-08-15T23:19:49Z

the architecture-related integer is needed for memory-related access (i.e. indexing and sizing of containers).

Alternatively, declare each array's index type rather than using an architecture-dependent type that spans the address space.

1fish2 · 2014-11-07T05:21:59Z

Good plan. Would you like me to withdraw this PR and submit a new PR to rename int/uint and select i32 as the fallback type?

And to be sure I have it precisely right, "fallback" means both the type inference default for integer literals and the recommended programmers' go-to type?

thestinger · 2014-11-07T05:24:29Z

@1fish2: Yeah, I think a new RFC with that scope would have a high chance of success.

And to be sure I have it precisely right, "fallback" means both the type inference default for integer literals and the recommended programmers' go-to type?

Yeah, the type inference default (which was accepted again with https://github.com/rust-lang/rfcs/blob/master/text/0212-restore-int-fallback.md) which is essentially the type that the language is recommending as a good default.
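For readers unfamiliar with the term, the "type inference default" can be observed with a short sketch. It reflects present-day Rust, where the fallback for an unconstrained integer literal is `i32` (at the time of this thread that was still a proposal); the function name is hypothetical.

```rust
use std::mem::size_of_val;

/// Size in bytes of an integer literal that has no suffix and no other
/// constraint on its type, so the inference fallback decides the type.
fn unconstrained_literal_size() -> usize {
    let x = 5; // nothing pins the type of `x`, so the fallback applies
    size_of_val(&x)
}
```

Because the fallback is a fixed-width type, the result does not change between 16-bit, 32-bit, and 64-bit targets.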

l0kod · 2014-11-07T07:38:51Z

For bikeshed discussion about new int/uint names see http://discuss.rust-lang.org/t/if-int-has-the-wrong-size/454
Seems like the isize/usize is favored.

thestinger · 2014-11-07T07:41:41Z

Calling them isize / usize wouldn't really be correct. The maximum object size (including for arrays) needs to be capped 1 bit lower than the pointer size. The types are defined as having the same number of bits as pointers, not as a way of measuring sizes.

Thiez · 2014-11-07T09:03:28Z

Perhaps a bit offtopic, but suppose we decide to stop using int and uint for anything unrelated to indexing and pointers. What exactly is the advantage of having int at all? Why not have only uint? If int is dropped then so is the silly requirement that the maximum object size needs to be capped.

netvl · 2014-11-07T09:11:02Z

@Thiez, isn't it there to represent a difference between pointers? You can't have it without a sign.

Thiez · 2014-11-07T10:15:57Z

Sure you can. Suppose we have a machine with 256 bytes of memory, so size_of::<uint>() == size_of::<int>() == 1. We have two pointers, represented as uints: p = 100u; q = 200. What is the difference between p and q? let (diffpq, diffqp) = (q - p, p - q); Then, by virtue of unsigned integer wraparound, we have p + diffpq == q and q + diffqp == p. If for whatever reason we wish to know if p < q, we should just use that test, rather than checking if diffpq > 0.
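The wraparound arithmetic in this example can be written out as a small sketch, using `u8` to stand in for the 256-byte machine's uint and explicit wrapping operations (present-day Rust; the function name is hypothetical).

```rust
/// Returns (q - p, p - q) using unsigned wraparound, as in the
/// 256-byte-machine example.
fn wrapping_diffs(p: u8, q: u8) -> (u8, u8) {
    (q.wrapping_sub(p), p.wrapping_sub(q))
}
```

With p = 100 and q = 200 this gives (100, 156), and adding each difference back with `wrapping_add` recovers the other pointer: 100 + 100 = 200, and 200 + 156 wraps to 100.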

tbu- · 2014-11-07T10:20:34Z

I'm currently working on a draft on changing the default fallback type to i32 (Link to the branch).

@Thiez This is indeed offtopic, I don't think it's helping the RFC.

1fish2 · 2014-11-07T10:22:04Z

OK. I'll do that in a couple days and let you review it before sending the PR.
Do you want to jointly author it?

errordeveloper · 2014-11-07T13:17:24Z

Have we thought of just adding a lint warning when the type in question is
used for anything other than indexing?

tbu- · 2014-11-07T13:18:54Z

@errordeveloper If it's used for indexing it's already automatically inferred to be a uint. Also, you can't index an array using int.
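The inference behaviour described here can be seen in a short sketch. It is written in present-day Rust, where uint is spelled `usize` (an assumption relative to this thread); the function name is hypothetical.

```rust
use std::mem::size_of_val;

/// Indexes a slice with an unsuffixed literal and reports the size of the
/// type that inference chose for the index variable.
fn inferred_index_size(values: &[u32]) -> (u32, usize) {
    let i = 1; // no suffix: its use as a slice index pins this to `usize`
    (values[i], size_of_val(&i))
}
```

The second component equals the size of `usize` on the current target, so an index literal never picks up the general integer fallback type.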

Ericson2314 · 2014-11-07T17:49:12Z

@thestinger if indexing is done with uint, is there any problem with 32-bit processes on 64-bit machines? I do agree we should call them something along the lines of uptr, iptr, and make i32 the default (cause nobody will "program in the small" on 16-bit machine).

thestinger · 2014-11-07T20:02:17Z

@Ericson2314: There's no problem in terms of uint with 32-bit processes on 64-bit machines. It can still address every byte in the address space. The maximum positive int value needs to be an upper bound on object and dynamic array size in order to keep offset well-defined.

thestinger · 2014-11-07T20:05:36Z

@Thiez: Pointer arithmetic is inherently signed because it can go in both directions, not unsigned. It is not well-defined to overflow normal (fast) pointer arithmetic.

Ericson2314 · 2014-11-07T21:54:01Z

@thestinger so negative ptr offsets are an essential thing to support?

thestinger · 2014-11-07T22:08:25Z

@Ericson2314: Yes, being able to calculate pointer differences and do negative offsets is an essential feature. Ensuring correctness requires limiting the maximum object size to int::MAX. LLVM also only offers fast pointer arithmetic via signed offsets, so it's important in the short term from a performance point of view.
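A minimal sketch of why pointer differences are signed, using present-day Rust, where int is spelled `isize` (an assumption relative to this thread). The helper names are hypothetical; `offset_from` is the standard raw-pointer method that returns a signed element count.

```rust
/// Signed distance, in elements, from index `from` to index `to` of `data`.
fn signed_distance(data: &[u32], from: usize, to: usize) -> isize {
    assert!(from <= data.len() && to <= data.len());
    let base = data.as_ptr();
    // SAFETY: both pointers stay within, or one past the end of, the same
    // slice, which the assertion above guarantees.
    unsafe { base.add(to).offset_from(base.add(from)) }
}

/// Upper bound on object size implied by signed offsets: the largest
/// positive value of the signed pointer-sized type.
fn max_object_size() -> usize {
    isize::MAX as usize
}
```

Going from index 3 back to index 1 gives -2, which an unsigned result could not express without wraparound, and `max_object_size()` is half of the unsigned range, matching the "1 bit lower" cap mentioned earlier in the thread.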

Ericson2314 · 2014-11-12T01:08:48Z

@thestinger OK, I'm sold. Especially given the performance aspect.

1fish2 · 2014-11-13T05:22:11Z

The new, simpler draft RFC to replace the present one is at 0000-int-name.md.

Comments?

thestinger · 2014-11-13T05:29:22Z

@1fish2: It looks great to me.

l0kod · 2014-11-13T08:19:00Z

@1fish2: great!

I would also add the argument that the renaming process would be a good, and probably the only, time to spot future bugs before they appear.

There is also the question of the integer suffixes i and u. They must change (or disappear) along with int/uint; otherwise the bugs related to these names will surely remain in existing code.

A good example of using uindex and not index should help too: object indexing and object length (not sure anymore)?

1fish2 · 2014-11-13T08:34:41Z

Excellent points, Mickaël.

I just sent the PR. Do you want to add these points there? We'll continue the discussion there.

1fish2 · 2014-11-13T08:46:45Z

I propose to withdraw this RFC in favor of the single-purpose RFC: Renaming int/uint (PR #464).

errordeveloper · 2014-11-13T09:02:36Z

On 13 November 2014 08:46, Jerry Morrison notifications@github.com wrote:

I propose to withdraw this RFC in favor of the single-purpose RFC:
Renaming int/uint (PR #464).

Makes sense.

emberian · 2014-11-13T10:56:13Z

@1fish2 you have the power to close it :)

emberian reviewed Jul 12, 2014

errordeveloper reviewed Jul 12, 2014

incorporate feedback

fcaa58d

* Crisper/broader motivation. * "The smallest integers that span the address space" is clearer than "pointer-sized integers". * More concise. * More "not in scope" items.

glaebhoerl reviewed Jul 13, 2014

1fish2 mentioned this pull request Jul 13, 2014

RFC: Scoped attributes for checked arithmetic #146

Closed

style guidelines w/ or w/o overflow checking

8df4ea3

* Recommended unsigned or signed integer types for numbers that should not be negative -- depending on whether Rust provides integer overflow checking. * Crisper integer style guideline section.

l0kod mentioned this pull request Aug 15, 2014

RFC: rename int and uint to intptr/uintptr rust-lang/rust#9940

Closed

1fish2 mentioned this pull request Nov 13, 2014

RFC: Renaming int/uint #464

Closed

Propose to withdraw this RFC in favor of PR #464

561fa48

brson assigned nrc and unassigned brson Nov 13, 2014

1fish2 closed this Nov 13, 2014

thestinger mentioned this pull request Dec 28, 2014

RFC: Rename int/uint to something better #544

Merged

