RFC: int/uint portability to 16-bit CPUs #161
Both Issue #14758 and Issue #9940 call for RFCs. This RFC summarizes those discussions, explains the core issue of code portability to 16-bit CPUs (also of 64-bit code to 32-bit CPUs), explains what's meant by "default" integer types, makes 2 specific proposals, and proposes coding style for integer sizing.
# Background
Rust defines types `int` and `uint` as integers that are wide enough to hold a …
Is this even true on 16-bit devices, or do modern ones still use a segmentation system? Are there any relevant 16-bit chips anymore?
XMEGA are 8/16-bit?
Some Atmel AVR controllers http://en.wikipedia.org/wiki/Atmel_AVR and some PIC controllers http://en.wikipedia.org/wiki/PIC_microcontroller have 16-bit address spaces. These tend to have Harvard architectures, that is, separate instruction and data memory/addresses.
And the MSP430...
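As a rough illustration in today's Rust, where the pointer-sized types are spelled `isize`/`usize` rather than `int`/`uint`, code can ask the compiler which address-space width it is being built for. `target_pointer_width` is the built-in cfg key (it is `"16"` on targets such as MSP430 and AVR); the function name `target_pointer_bits` is hypothetical, and this is only a sketch.

```rust
/// Pointer width of the compilation target in bits, read from the
/// built-in `target_pointer_width` cfg.
fn target_pointer_bits() -> u32 {
    if cfg!(target_pointer_width = "16") {
        16
    } else if cfg!(target_pointer_width = "32") {
        32
    } else if cfg!(target_pointer_width = "64") {
        64
    } else {
        // Any other width would need its own branch; fall back to usize.
        usize::BITS
    }
}

fn main() {
    let bits = target_pointer_bits();
    // The pointer-sized unsigned type is exactly as wide as a pointer.
    assert_eq!(bits, usize::BITS);
    assert_eq!(std::mem::size_of::<usize>(), std::mem::size_of::<*const u8>());
    println!("pointers are {} bits; usize::MAX = {}", bits, usize::MAX);
}
```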
Sounds reasonable.
# Drawbacks
- Renaming `int`/`uint` requires figuring out which of the current uses to replace with `index`/`uindex` vs. `i32`/`u32`/`BigInt`.
And some people will just end up redefining `int` and `uint` to be 32-bit in their projects...
Overall, it's quite a reasonable thing to do, considering Rust's goals. Although maybe the motivation and title could be generalised a bit more...
# Motivation
So Rust libraries won't have new overflow bugs when run on embedded devices with …
I'd just replace the entire paragraph with: "Avoid bugs where the programmer presumed a default integer size for indexing of arrays and elsewhere."
You could expand a little to say that this mostly concerns non-32-bit targets, i.e. 8-bit and 16-bit MCUs and, to some extent, 64-bit CPUs too.
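To make the bug class concrete: an arithmetic result that fits comfortably in 32 bits can overflow on a 16-bit target. The sketch below uses `u16` to stand in for a 16-bit target's pointer-sized integer (an assumption so it runs on any host); the helper names are hypothetical.

```rust
/// Adds two lengths in a 16-bit type; `checked_add` reports overflow
/// (`None`) instead of silently wrapping.
fn add_lengths_u16(a: u16, b: u16) -> Option<u16> {
    a.checked_add(b)
}

/// The same computation in a 32-bit type.
fn add_lengths_u32(a: u32, b: u32) -> Option<u32> {
    a.checked_add(b)
}

fn main() {
    // 40_000 + 40_000 = 80_000 fits in 32 bits but not in 16 bits (max 65_535).
    assert_eq!(add_lengths_u32(40_000, 40_000), Some(80_000));
    assert_eq!(add_lengths_u16(40_000, 40_000), None);
    // Plain wrapping arithmetic would instead yield 80_000 - 65_536 = 14_464.
    assert_eq!(40_000u16.wrapping_add(40_000), 14_464);
}
```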
Am I correct in understanding that this is to keep the integral type used for array indexing defaulting to the "native", i.e. fastest, integer? (On the AVR the …)

By the way, what about suffixes? Would this imply dropping of …
The integral type used for indexing is the smallest one that covers the address space. "Fastest"/"native" is irrelevant.
@huonw thanks for the better formulation, perhaps that's the way the RFC/docs should state it.
Agreed. I'll rephrase that.
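In current Rust, slices are indexed only by `usize` (the modern name for the pointer-sized unsigned type), so an index held in a fixed-width type must be converted explicitly. A minimal sketch; `get_at` is a hypothetical helper, and the `u32` index is assumed to come from something like a file format.

```rust
use std::convert::TryFrom;

/// Looks up `v[i]` where the index arrives as a fixed-width `u32`.
/// `usize::try_from` fails (instead of truncating) on targets where
/// `usize` is narrower than 32 bits; `get` avoids a bounds-check panic.
fn get_at(v: &[i32], i: u32) -> Option<i32> {
    let idx = usize::try_from(i).ok()?;
    v.get(idx).copied()
}

fn main() {
    let v = vec![10, 20, 30];
    assert_eq!(get_at(&v, 1), Some(20));
    assert_eq!(get_at(&v, 7), None); // out of bounds, no panic
}
```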
Yes, the motivation and title can be generalized. I was trying to start …
As well as embedded devices, some machines have had coprocessors with smaller address spaces. Not so common now, but who knows what the future will bring. My suggestions would have been:
[1] Officially define …
[2] Then add other types which are more specific … These are complementary to the specific types (`i32` etc.); code might cluster data dynamically to suit its platform.
[3] `Vec` could be defined in a more versatile way, as …
It seems like the C name `long` being distinct from `int` is actually useful. Maybe even swapping `int` out as suggested by the OP would be good, but adding another complementary type would be less disruptive, I think.
* Crisper/broader motivation.
* "The smallest integers that span the address space" is clearer than "pointer-sized integers".
* More concise.
* More "not in scope" items.
> In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this. ...
>
> Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.
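The bug the quoted guideline has in mind is C's silent unsigned wraparound. The sketch below reproduces that behaviour in Rust with `wrapping_sub` and contrasts it with the guideline's suggestion (a signed type plus an assertion); both helper names are hypothetical.

```rust
/// Unsigned "last index": `n - 1` wraps around when `n == 0`,
/// as unsigned arithmetic does in C (`wrapping_sub` mimics that).
fn last_index_wrapping(n: u32) -> u32 {
    n.wrapping_sub(1)
}

/// The style-guide alternative: a signed type plus an explicit assertion.
fn last_index_signed(n: i32) -> i32 {
    assert!(n >= 0, "length must be non-negative");
    n - 1 // -1 for an empty sequence, which callers can test for
}

fn main() {
    // An empty sequence yields a huge "index" instead of an obvious error.
    assert_eq!(last_index_wrapping(0), u32::MAX);
    assert_eq!(last_index_signed(0), -1);
}
```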
This suggestion makes a lot of sense in a context where overflow/underflow silently wraps around. However, if something like RFC PR #146 were to be implemented, then it would once again make sense to use types which more accurately express the range of legal values (i.e., which are self-documenting), because compiler-added checks can be enabled to catch errors where the value would go out of range. Accurate types with compiler-added assertions beats inaccurate types with programmer-added assertions.
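Rust's explicit `checked_*` methods give a rough, opt-in approximation of the compiler-added checks described here (they are not RFC PR #146's mechanism): the unsigned type keeps its self-documenting range, and the out-of-range case is reported rather than wrapping. A sketch with a hypothetical helper name:

```rust
/// "Never negative" is expressed by the type; the underflow case is
/// surfaced as `None` instead of wrapping to a huge value.
fn last_index_checked(n: u32) -> Option<u32> {
    n.checked_sub(1)
}

fn main() {
    assert_eq!(last_index_checked(3), Some(2));
    assert_eq!(last_index_checked(0), None); // underflow detected
}
```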
@glaebhoerl So would you recommend we wait for PR #146 to be accepted or rejected before evaluating this RFC further?
Nah. This was just an ancillary remark on an ancillary part of the proposal. The main part of the proposal (which is about changes to the language to better accommodate [portability to] 16-bit architectures) is unaffected.
(And anyway, the suggestion makes sense in the context of the current language, and the style guide could just be updated again if the language changes.)
Aha! Nice insight, @glaebhoerl.
I'll make the style guide recommendation conditional on overflow-checking.
Q. Does/will overflow checking happen during conversion between integer types?
A. It doesn't currently, but in the context of #146, if `#[overflow_checks(on)]`, I think it should.
Rationale: As far as I can tell, `as` is meant to preserve meaning rather than representation, e.g. `5000i32 as f32` is equivalent to `5000f32` and not to `transmute::<i32, f32>(5000i32)`. Therefore if attempting to transport the meaning of the original value to the target type causes it to overflow, it should be caught.
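The distinction drawn here can be checked directly. `as` carries the numeric value across, while the safe `f32::from_bits` reinterprets the raw bits, which is what the `transmute` call would do. The last helper shows the gap under discussion: between integer types, `as` truncates silently when the value doesn't fit. Helper names are hypothetical.

```rust
/// `as` between numeric types converts the value.
fn value_cast(x: i32) -> f32 {
    x as f32
}

/// Reinterprets the raw bits, as `transmute::<i32, f32>` would;
/// `f32::from_bits` is the safe standard-library equivalent.
fn bit_cast(x: i32) -> f32 {
    f32::from_bits(x as u32)
}

/// Between integer types, `as` keeps only the low bits when the value
/// does not fit in the target type.
fn truncating_cast(x: i32) -> u8 {
    x as u8
}

fn main() {
    assert_eq!(value_cast(5000), 5000.0);
    // 5000 as raw bits is a tiny subnormal float, nowhere near 5000.0.
    let bits = bit_cast(5000);
    assert!(bits > 0.0 && bits < f32::MIN_POSITIVE);
    // 300 does not fit in u8: 300 - 256 = 44.
    assert_eq!(truncating_cast(300), 44);
}
```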
Yes. Otherwise computing a value in one integer type then converting to another would accidentally bypass the overflow checks.
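A sketch of that loophole in today's Rust: the sum is computed without overflow in `u32`, then narrowed. `as` keeps only the low bits, whereas `u16::try_from` (the standard library's checked conversion) fails. Helper names are hypothetical.

```rust
use std::convert::TryFrom;

/// Narrowing with `as`: keeps only the low 16 bits, no error.
fn narrow_unchecked(wide: u32) -> u16 {
    wide as u16
}

/// Narrowing with `try_from`: reports a value that does not fit.
fn narrow_checked(wide: u32) -> Option<u16> {
    u16::try_from(wide).ok()
}

fn main() {
    let wide: u32 = 40_000 + 40_000; // no overflow in u32
    assert_eq!(narrow_unchecked(wide), 14_464); // 80_000 mod 65_536
    assert_eq!(narrow_checked(wide), None);
    assert_eq!(narrow_checked(1_000), Some(1_000));
}
```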
Also, another point this RFC should consider is how a typical …
* Recommended unsigned or signed integer types for numbers that should not be negative, depending on whether Rust provides integer overflow checking.
* Crisper integer style guideline section.
@errordeveloper I doubt that would be a problem because in most cases one would iterate over an indexable collection directly rather than indexing (and paying for bounds checking). Not that I support this RFC...
There should be some integer type that corresponds to pointer size. That is why I like …

There could be some fancy macro that you give constraints (fastest / smallest, max abs val, signed/unsigned, etc.) and it spits out a type or aborts compilation. This seems more versatile and less namespace-cluttering than C99's solution. BTW, last I checked Rust let you transmute …

I would love some infrastructure everybody could share to do continuous integration with different int sizes. This probably necessitates virtualizing different CPU architectures (because of int/ptr transmutations), but it would be cool if it didn't. I initially didn't think compiler-added overflow checks were too important. But if that is what it takes to make people use unsigned integers for natural numbers, I am all for it.
Trying to protect against everything that can change per platform/configuration is impossible, e.g.:

    #[cfg(windows)]
    struct Foo { x: u8 }
    #[cfg(not(windows))]
    struct Foo { x: u16 }
    transmute::<Foo, u8>(...)
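The point can be made concrete with `std::mem::size_of`: the same source gives `Foo` a different size per platform, and `transmute` checks sizes at compile time, so a `transmute::<Foo, u8>` like the one above compiles only where `Foo` is one byte. `foo_size` is a hypothetical helper for this sketch.

```rust
use std::mem::size_of;

// The field width depends on the target OS.
#[cfg(windows)]
struct Foo { x: u8 }
#[cfg(not(windows))]
struct Foo { x: u16 }

fn foo_size() -> usize {
    size_of::<Foo>()
}

fn main() {
    let f = Foo { x: 1 };
    // Read the field so it is not reported as unused.
    println!("x = {}, size_of::<Foo>() = {}", f.x, foo_size());
    // One byte on Windows, two bytes elsewhere.
    assert_eq!(foo_size(), if cfg!(windows) { 1 } else { 2 });
}
```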
Impossible, I think not. I'd like to somehow match on a list of archs one attempts to support, lest one forget a case, rather than just config-chaining and hoping for the best. This shouldn't be too hard. More radically, for the purposes of type checking it would be nice to take an intersection type or something analogous, e.g.:

    // can't be transmuted / unique size,
    // implements all traits that both u8 and u16 do.
    type Magic = u8 ∩ u16
    struct Foo { x: Magic }

This is kind of a "mangling of phases", and a rather big step from the way things work currently. The alternative is just to brute-force the various configuration options as part of compilation, or just cross-compile and virtualize as I said before.
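Something close to the first idea (an explicit list of supported architectures, so no case is forgotten) is possible with plain `cfg` attributes and the built-in `compile_error!` macro; the intersection type has no Rust equivalent and is not covered. A minimal sketch; the alias `Word` and the helper `word_bits` are hypothetical.

```rust
// Refuse to build on any pointer width that has not been considered,
// instead of silently falling through a chain of #[cfg]s.
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64"
)))]
compile_error!("unsupported target_pointer_width: add a case for it");

// One definition per supported width; exactly one is compiled.
#[cfg(target_pointer_width = "16")]
type Word = u16;
#[cfg(target_pointer_width = "32")]
type Word = u32;
#[cfg(target_pointer_width = "64")]
type Word = u64;

fn word_bits() -> u32 {
    Word::BITS
}

fn main() {
    assert_eq!(word_bits(), usize::BITS);
}
```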
Given the purpose of the …
In rust-lang/rust#9940, @thestinger said:
I think renaming …
Claiming that purpose 1 is the only purpose for these types is wrong, and yet that's the motivation for renaming to …

The only real issue with …

Importantly, renaming these to …

Basically, renaming these types does not really do anything at all for overflow; it just encourages people to add more unchecked integral casts to their code. Because of this, the only approach I can support is keeping …
For a statically typed language, the …

For this reason, I don't think it's a good idea to promote the …
Obviously, the architecture-related integer is needed for memory-related access (i.e. indexing and sizing of containers). Is there a good reason for hiding the initial goal and bug-prone (e.g. cast) property of a type?
That's a possibility, but if they are aware of the architecture-related property, they have more reasons to make the right choice: to choose the right type everywhere. If that makes sense, the "at least 32-bit" exception is not needed. Moreover, it would introduce another weird rule to this already weird type.
This isn't really the case; it's just that using any other types is annoying and historically unfavoured (since we had default-to-…
Alternatively, declare each array's index type rather than using an architecture-dependent type that spans the address space.
Good plan. Would you like me to withdraw this PR and submit a new PR to rename …

And to be sure I have it precisely right, "fallback" means both the type inference default for integer literals and programmers' recommended go-to type?
@1fish2: Yeah, I think a new RFC with that scope would have a high chance of success.
Yeah, the type inference default (which was accepted again with https://github.com/rust-lang/rfcs/blob/master/text/0212-restore-int-fallback.md), which is essentially the type that the language is recommending as a good default.
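The fallback is observable in code: an integer literal with nothing constraining its type takes the default. In Rust as eventually released (after this discussion), that default is `i32`, not the pointer-sized type. A sketch; `fallback_size` is a hypothetical helper.

```rust
use std::mem::size_of_val;

/// Size in bytes of an integer literal whose type is left to inference.
fn fallback_size() -> usize {
    let x = 1; // nothing constrains `x`, so the fallback type applies
    size_of_val(&x)
}

fn main() {
    assert_eq!(fallback_size(), 4); // i32
    // A suffix (or annotation) overrides the fallback.
    let y = 1u16;
    assert_eq!(size_of_val(&y), 2);
}
```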
For bikeshed discussion about new …
Calling them …
Perhaps a bit off-topic, but suppose we decide to stop using …
@Thiez, isn't it there to represent a difference between pointers? You can't have it without a sign.
Sure you can. Suppose we have a machine with 256 bytes of memory, so …
I'm currently working on a draft on changing the default fallback type to …

@Thiez This is indeed off-topic; I don't think it's helping the RFC.
OK. I'll do that in a couple days and let you review it before sending the PR.
Have we thought of just adding a lint warning when the type in question is …
@errordeveloper If it's used for indexing it's already automatically inferred to be a …
@thestinger if indexing is done with uint, is there any problem with 32-bit processes on 64-bit machines? I do agree we should call them something along the lines of …
@Ericson2314: There's no problem in terms of …
@Thiez: Pointer arithmetic is inherently signed because it can go in both directions, not unsigned. It is not well-defined to overflow normal (fast) pointer arithmetic.
@thestinger so negative ptr offsets are an essential thing to support?
@Ericson2314: Yes, being able to calculate pointer differences and do negative offsets is an essential feature. Ensuring correctness requires limiting the maximum object size to …
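In current Rust this shows up in the raw-pointer API: `pointer::offset_from` returns the signed `isize`, because the second pointer may precede the first. A minimal sketch with a hypothetical `distance` helper:

```rust
/// Signed distance, in elements, from position `from` to position `to`
/// within the same slice (one-past-the-end positions are allowed).
fn distance(v: &[u32], from: usize, to: usize) -> isize {
    assert!(from <= v.len() && to <= v.len());
    let base = v.as_ptr();
    // Safety: both offsets are within (or one past the end of) the same
    // slice, so both pointers belong to the same allocation, and their
    // distance is a whole number of `u32` elements.
    unsafe {
        let a = base.add(from);
        let b = base.add(to);
        b.offset_from(a)
    }
}

fn main() {
    let v = [0u32; 8];
    assert_eq!(distance(&v, 2, 5), 3);
    assert_eq!(distance(&v, 5, 2), -3); // backwards: needs a sign
}
```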
@thestinger OK, I'm sold, especially given the performance aspect.
The new, simpler draft RFC to replace the present one is at 0000-int-name.md. Comments?
@1fish2: It looks great to me.
@1fish2: great! I would also add the argument that the renaming process would be a good, and probably the only, time to spot future bugs before they appear. There is also the question about integer suffixes …

A good example of using …
Excellent points, Mickaël. I just sent the PR. Do you want to add these points there? We'll continue the discussion there.
I propose to withdraw this RFC in favor of the single-purpose RFC: Renaming int/uint (PR #464).
On 13 November 2014 08:46, Jerry Morrison notifications@github.com wrote:
Makes sense.
@1fish2 you have the power to close it :)