Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify UB around immutability & mutation #1385

Merged
merged 3 commits into from
Jul 27, 2023

Conversation

ivanbakel
Copy link
Contributor

I personally found this description of UB confusing, since the use of "reached" suggests that UB only happens for read bytes, and the definition of immutability is not given, allowing for multiple interpretations: does the "data" have to be immutable from the first read? From the creation of the reference? Between reads from the immutable accessor, but not otherwise? etc.

This clarifies the actual UB conditions, based on this Zulip interaction and this reference discussion in two ways:

  • The definition of "data" is clarified to be stated in terms of bytes, in a way that should avoid ambiguity about which bytes are considered. Based on the GH issue, this clarification should also allow for use of a *mut pointer through a shared reference, which is not in itself UB. Based on the Zulip issue, the definition includes padding bytes, which may be surprising.
  • The definition of immutability & mutation for a set of bytes is clarified to mean forbidding all non-0-byte writes.

I also tried to avoid use of the word "reachable", since I thought it be implying a transitive property about immutability for references which (to my understanding) doesn't actually apply. This may now fail to cover mutable-references-behind-shared-references in some way, but I'll leave it to review to decide that.

@ehuss
Copy link
Contributor

ehuss commented Jul 21, 2023

cc @RalfJung Do you have a chance to review this?

@RalfJung
Copy link
Member

RalfJung commented Jul 21, 2023

Hm... not sure. This includes things which I would say are commentary, and already implied by the previous sentence which is the actual spec (such as the remark about padding, or the last sentence about writes not modifying the byte's contents). We should at least clearly separate comments from spec -- we don't want to leave the impression that the sentence "Writes which do not modify the byte contents are still mutations" changes the definition of "mutation".

Also, "bytes of a value pointed to by a shared ref" is IMO even less clear than the previous formulation. What about just saying "bytes pointed to by a shared ref"? Then IMO the sentence about padding becomes unnecessary. If anything we might want to define "pointed to" -- which we actually kind of do define later

The span of bytes it points to is determined by the pointer value and the size of the pointee type (using size_of_val).


Based on the Zulip issue, the definition includes padding bytes, which may be surprising.

When talking about bytes, there isn't even such a thing as "padding bytes". They only exist when talking about values and how those relate to their byte representation.

By rephrasing the entire definition in terms of bytes, I don't think we still need the padding comment -- it is redundant.

@RalfJung
Copy link
Member

RalfJung commented Jul 21, 2023

This may now fail to cover mutable-references-behind-shared-references in some way, but I'll leave it to review to decide that.

Yeah that's a good question. Maybe add something like "all bytes pointed to by a shared reference, including transitively through other references (shared and mutable) and Boxes, including those stored in fields of compound types"?

This isn't even the bullet point about the aliasing model, but I think the old "reached" was intended to capture this transitivity. (The term goes back all the way to the initial commit, so it's hard to tell.)

@ivanbakel
Copy link
Contributor Author

This includes things which I would say are commentary, and already implied by the previous sentence which is the actual spec (such as the remark about padding, or the last sentence about writes not modifying the byte's contents). We should at least clearly separate comments from spec -- we don't want to leave the impression that the sentence "Writes which do not modify the byte contents are still mutations" changes the definition of "mutation".

I originally had these as notes, but I was worried about over-bulking the section. I can move the unchanged-value addendum back to a note, but I'm certain it's surprising enough to include explicitly. The padding note can be removed when moving to the byte-based definition; but it's worth dropping any mention of "data" then, since I think it's confusing (if bytes are what really matter.)

Yeah that's a good question. Maybe add something like "all bytes pointed to by a shared reference, including transitively through other references (shared and mutable) and Boxes, including those stored in fields of compound types"?

I will add that.

ivanbakel pushed a commit to ivanbakel/reference that referenced this pull request Jul 24, 2023
This is part of the feedback on rust-lang#1385.

Ralf made the point that the immutability definition could be restated
solely in terms of bytes, which has the added benefit of no longer
requiring the note on padding (since it's a natural consequence of the
byte version.)

The new wording for shared references also clarifies the case of mutable
references behind shared ones, and reintroduces some of the transitivity
property that I removed in my previous commit. The wording is separate
from that for immutable bindings, since those don't have transitive
immutability.

This also bumps the definition of bytes pointed to by references and
pointers into its own subsection, so that it can be linked to by the UB
definition, to avoid duplication.

Co-authored-by: Ralf Jung <post@ralfj.de>
ivanbakel and others added 2 commits July 24, 2023 14:57
I personally found this description of UB confusing, since the use of
"reached" suggests that UB only happens for read bytes, and the
definition of immutability is not given, allowing for multiple
interpretations: does the "data" have to be immutable from the first
read? From the creation of the reference? Between reads from the
immutable accessor, but not otherwise? etc.

This clarifies the actual UB conditions, based on this Zulip
interaction:
https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/What.20exactly.20are.20.22immutable.22.20and.20.22reached.22.20in.20shared.20ref.20UB.3F
and this reference discussion:
rust-lang#1227
in two ways:
  * The definition of "data" is clarified to be stated in terms of
    bytes, in a way that should avoid ambiguity about which bytes are
    considered. Based on the GH issue, this clarification should also
    allow for use of a `*mut` pointer through a shared reference, which
    is not in itself UB. Based on the Zulip issue, the definition
    includes padding bytes, which may be surprising.
  * The definition of immutability & mutation for a set of bytes is
    clarified to mean forbidding *all* non-0-byte writes.
This is part of the feedback on rust-lang#1385.

Ralf made the point that the immutability definition could be restated
solely in terms of bytes, which has the added benefit of no longer
requiring the note on padding (since it's a natural consequence of the
byte version.)

The new wording for shared references also clarifies the case of mutable
references behind shared ones, and reintroduces some of the transitivity
property that I removed in my previous commit. The wording is separate
from that for immutable bindings, since those don't have transitive
immutability.

This also bumps the definition of bytes pointed to by references and
pointers into its own subsection, so that it can be linked to by the UB
definition, to avoid duplication.

Co-authored-by: Ralf Jung <post@ralfj.de>
@ivanbakel ivanbakel force-pushed the immutable-data-UB-clarification branch from 8c18a31 to 70886e3 Compare July 24, 2023 12:57
src/behavior-considered-undefined.md Outdated Show resolved Hide resolved
src/behavior-considered-undefined.md Outdated Show resolved Hide resolved
These changes should preserve the meaning of the contents.

Co-authored-by: Ralf Jung <post@ralfj.de>
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ivanbakel and @RalfJung!

@ehuss ehuss added this pull request to the merge queue Jul 27, 2023
Merged via the queue into rust-lang:master with commit a64394d Jul 27, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 1, 2023
Update books

## rust-lang/reference

7 commits in 1ea0178266b3f3f613b0fabdaf16a83961c99cdb..9cd5c5a6ccbd4c07c65ab5c69a53286280308c95
2023-07-29 22:29:51 UTC to 2023-07-16 20:12:46 UTC

- Fix merge queue building twice. (rust-lang/reference#1383)
- Clarify UB around immutability & mutation (rust-lang/reference#1385)
- mention the extra const UB (rust-lang/reference#1273)
- Operator expressions: make the note about division by zero clearer. (rust-lang/reference#1384)
- Make unsafe keyword docs less confusing (rust-lang/reference#1379)
- Say that division by zero for primitive types panics (rust-lang/reference#1382)
- Add CI trigger for merge queues. (rust-lang/reference#1381)

## rust-lang/rust-by-example

3 commits in 8a87926a985ce32ca1fad1be4008ee161a0b91eb..07e0df2f006e59d171c6bf3cafa9d61dbeb520d8
2023-07-24 11:37:55 UTC to 2023-07-24 11:35:36 UTC

- Added attribute unused_labels - fixed warning. (rust-lang/rust-by-example#1729)
- more explanation about panic (rust-lang/rust-by-example#1728)
- chore: add the portuguese version of this project to `readme.md` (rust-lang/rust-by-example#1727)

## rust-lang/rustc-dev-guide

26 commits in b5a12d95e32ae53791cc6ab44417774667ed2ac6..24eebb6df96d037aad285e4f7793bd0393a66878
2023-07-30 11:23:23 UTC to 2023-07-11 06:02:34 UTC

- fix(name-resolution): remove unnecessary closing paranthesis (rust-lang/rustc-dev-guide#1760)
- fix(macro-expansion.md): fix the article `an` to `a` (rust-lang/rustc-dev-guide#1759)
- fix(serialization.md): fix the name of a derive macro (rust-lang/rustc-dev-guide#1756)
- fix(serialization.md): add a necessary plural suffix (rust-lang/rustc-dev-guide#1757)
- fix(salsa.md): add punctuation to prevent confusion (rust-lang/rustc-dev-guide#1754)
- fix(salsa.md): remove duplicate "To Be" verb (rust-lang/rustc-dev-guide#1755)
- feat(fuzzing.md): make `halfempty` word a link (rust-lang/rustc-dev-guide#1750)
- fix(about.md): use `a` instead of `an` (rust-lang/rustc-dev-guide#1751)
- refactor(git.md): make git-scm links clickable (rust-lang/rustc-dev-guide#1747)
- fix(walkthrough.md) add a comma operator to eliminate ambiguity (rust-lang/rustc-dev-guide#1749)
- fix(git.md): remove a confusing end of sentence character (rust-lang/rustc-dev-guide#1748)
- refactor(profiling/with_perf): remove a wrong to be verb (rust-lang/rustc-dev-guide#1746)
- refactor(tests/headers): remove duplicate list item (rust-lang/rustc-dev-guide#1745)
- refactor(test/headers.md): make the meaning more obvious (rust-lang/rustc-dev-guide#1744)
- refactor(tests/ui): remove unnecessary duplicate word (rust-lang/rustc-dev-guide#1743)
- refactor(compiletest): remove unnecessary duplicate word (rust-lang/rustc-dev-guide#1742)
- generic_arguments.md: substs -> GenericArgs (rust-lang/rustc-dev-guide#1741)
- fix(suggested): remove an unnecessary and confusing statement (rust-lang/rustc-dev-guide#1739)
- fix(how-to-build-and-run): fix a typo ("fromer" -> "former") (rust-lang/rustc-dev-guide#1736)
- fix(how-to-build-and-run): remove a wrong paragraph (rust-lang/rustc-dev-guide#1735)
- coverage code has moved (rust-lang/rustc-dev-guide#1728)
- linked issue is closed (rust-lang/rustc-dev-guide#1729)
- remove duplicate reference in about-this-guide.md (rust-lang/rustc-dev-guide#1734)
- Explain more in depth what early and late bound generic parameters are (rust-lang/rustc-dev-guide#1732)
- add section for normalization with the new solver (rust-lang/rustc-dev-guide#1731)
- Improve cleanup-crew.md with an example post (rust-lang/rustc-dev-guide#1730)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 14, 2023
Update books

## rust-lang/book

3 commits in 668c64760b5c7ea654facb4ba5fe9faddfda27cc..72187f5cd0beaaa9c6f584156bcd88f921871e83
2023-08-04 14:42:07 UTC to 2023-08-03 13:36:44 UTC

- redirects: change link for the `#![no_std]` tutorial (rust-lang/book#3705)
- [chpt10.2] - Small wording changes (rust-lang/book#3724)
- Improve sentence (rust-lang/book#3725)

## rust-embedded/book

1 commits in 1e5556dd1b864109985d5871616ae6b9164bcead..99ad2847b865e96d8ae7b333d3ee96963557e621
2023-08-11 06:31:04 UTC to 2023-08-11 06:31:04 UTC

- Fix a small typo in qemu.md (rust-embedded/book#359)

## rust-lang/nomicon

1 commits in 302b995bcb24b70fd883980fd174738c3a10b705..388750b081c0893c275044d37203f97709e058ba
2023-08-10 21:15:21 UTC to 2023-08-10 21:15:21 UTC

- Document thiscall abi (rust-lang/nomicon#311)

## rust-lang/reference

10 commits in 1ea0178266b3f3f613b0fabdaf16a83961c99cdb..d43038932adeb16ada80e206d4c073d851298101
2023-08-12 19:07:28 UTC to 2023-07-16 20:12:46 UTC

- Document thiscall abi (rust-lang/reference#1092)
- add section about implied bounds (rust-lang/reference#1261)
- Clearly specify the `instruction_set` effects (rust-lang/reference#1307)
- Fix merge queue building twice. (rust-lang/reference#1383)
- Clarify UB around immutability & mutation (rust-lang/reference#1385)
- mention the extra const UB (rust-lang/reference#1273)
- Operator expressions: make the note about division by zero clearer. (rust-lang/reference#1384)
- Make unsafe keyword docs less confusing (rust-lang/reference#1379)
- Say that division by zero for primitive types panics (rust-lang/reference#1382)
- Add CI trigger for merge queues. (rust-lang/reference#1381)

## rust-lang/rust-by-example

3 commits in 8a87926a985ce32ca1fad1be4008ee161a0b91eb..07e0df2f006e59d171c6bf3cafa9d61dbeb520d8
2023-07-24 11:37:55 UTC to 2023-07-24 11:35:36 UTC

- Added attribute unused_labels - fixed warning. (rust-lang/rust-by-example#1729)
- more explanation about panic (rust-lang/rust-by-example#1728)
- chore: add the portuguese version of this project to `readme.md` (rust-lang/rust-by-example#1727)

## rust-lang/rustc-dev-guide

31 commits in b5a12d95e32ae53791cc6ab44417774667ed2ac6..b123ab4754127d822ffb38349ce0fbf561f1b2fd
2023-08-14 08:34:59 UTC to 2023-07-11 06:02:34 UTC

- fix: stabilize debugger_visualizer (rust-lang/rustc-dev-guide#1766)
- feat(part-5-intro): make "Part 5" obvious (rust-lang/rustc-dev-guide#1753)
- Update from `#[warn_]` to `#[warning]` diagnostic attributes (rust-lang/rustc-dev-guide#1765)
- Add RPITIT documentation (rust-lang/rustc-dev-guide#1764)
- fix(visitor.md): fix a type name in a code sample (rust-lang/rustc-dev-guide#1762)
- fix(name-resolution): remove unnecessary closing paranthesis (rust-lang/rustc-dev-guide#1760)
- fix(macro-expansion.md): fix the article `an` to `a` (rust-lang/rustc-dev-guide#1759)
- fix(serialization.md): fix the name of a derive macro (rust-lang/rustc-dev-guide#1756)
- fix(serialization.md): add a necessary plural suffix (rust-lang/rustc-dev-guide#1757)
- fix(salsa.md): add punctuation to prevent confusion (rust-lang/rustc-dev-guide#1754)
- fix(salsa.md): remove duplicate "To Be" verb (rust-lang/rustc-dev-guide#1755)
- feat(fuzzing.md): make `halfempty` word a link (rust-lang/rustc-dev-guide#1750)
- fix(about.md): use `a` instead of `an` (rust-lang/rustc-dev-guide#1751)
- refactor(git.md): make git-scm links clickable (rust-lang/rustc-dev-guide#1747)
- fix(walkthrough.md) add a comma operator to eliminate ambiguity (rust-lang/rustc-dev-guide#1749)
- fix(git.md): remove a confusing end of sentence character (rust-lang/rustc-dev-guide#1748)
- refactor(profiling/with_perf): remove a wrong to be verb (rust-lang/rustc-dev-guide#1746)
- refactor(tests/headers): remove duplicate list item (rust-lang/rustc-dev-guide#1745)
- refactor(test/headers.md): make the meaning more obvious (rust-lang/rustc-dev-guide#1744)
- refactor(tests/ui): remove unnecessary duplicate word (rust-lang/rustc-dev-guide#1743)
- refactor(compiletest): remove unnecessary duplicate word (rust-lang/rustc-dev-guide#1742)
- generic_arguments.md: substs -> GenericArgs (rust-lang/rustc-dev-guide#1741)
- fix(suggested): remove an unnecessary and confusing statement (rust-lang/rustc-dev-guide#1739)
- fix(how-to-build-and-run): fix a typo ("fromer" -> "former") (rust-lang/rustc-dev-guide#1736)
- fix(how-to-build-and-run): remove a wrong paragraph (rust-lang/rustc-dev-guide#1735)
- coverage code has moved (rust-lang/rustc-dev-guide#1728)
- linked issue is closed (rust-lang/rustc-dev-guide#1729)
- remove duplicate reference in about-this-guide.md (rust-lang/rustc-dev-guide#1734)
- Explain more in depth what early and late bound generic parameters are (rust-lang/rustc-dev-guide#1732)
- add section for normalization with the new solver (rust-lang/rustc-dev-guide#1731)
- Improve cleanup-crew.md with an example post (rust-lang/rustc-dev-guide#1730)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants