Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safety comments #1651

Merged
merged 6 commits into from
Oct 29, 2024
Merged

Add safety comments #1651

merged 6 commits into from
Oct 29, 2024

Conversation

Manishearth
Copy link
Contributor

I recently performed safety review on 0.19. Besides finding the already-fixed #1491, I also found that these cases of unsafety were a bit hard to disentangle, and documented them further.

@sftse
Copy link
Contributor

sftse commented Oct 22, 2024

Does the referenced UCG discussion 412 apply to this case? Exercising the offending code under miri with and without the flags mentioned in the issue does not make a difference.

MIRIFLAGS='-Zmiri-env-set=TOKENIZERS_PARALLELISM=false -Zmiri-recursive-validation' cargo +nightly miri test bpe::trainer::tests::test_train

@Manishearth
Copy link
Contributor Author

@sftse IIRC miri doesn't currently do anything special around Vecs in this way, so it's not useful to rely on it for whether the rules apply. I did mention "this and related issues" because I know this type of thing has been the subject of discussion. For now I'm comfortable giving a sense of "this is potentially but not probably problematic"

@sftse
Copy link
Contributor

sftse commented Oct 23, 2024

[...] does not make a difference.

This may have been poor wording on my part, and may have communicated that miri thinks the code is fine. What I meant is under both the liberal and more restrictive validity rules for references, miri does not think the code is fine. If this was not a misunderstanding, this reply would be very puzzling to me and would hope you can expand on that.

@sftse
Copy link
Contributor

sftse commented Oct 23, 2024

It's worth pointing out that this appears to not even be something solely detectable by miri. Prior to commit 4322056 rustc would refuse to compile this crate, as mentioned in #1485. That commit, although not introducing any relevant changes to the unsafe block, managed to successfully subvert rustc's detection of this UB pattern.

@Manishearth
Copy link
Contributor Author

Oh! The & to &mut transmute is the problem, I see. We can use &raw or addr_of there instead. The recursive validity issue made me not notice the other issue there.

Given that I might just write it with manual unsafe (which would not be open to recusrive validity issues or transmute issues) then.

@Manishearth
Copy link
Contributor Author

I attempted a fix.

@sftse
Copy link
Contributor

sftse commented Oct 23, 2024

Thanks!

Before seeing your PR I had made #1664, trying to hit the same angle but a bit more verbose, I should have mentioned that. #1651 and #1664 have a bit of overlap, the maintainers may want to deduplicate how they see fit. No longer.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot both! 🤗 quite honored to see core rust people here 👀
LGTM!

unsafe {
assert!(i < words_len);
// This is words[i], but avoids needing to go through &T (which triggers UB)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FMI, why is &T UB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's UB to mutate data that currently has an active &T. This persists through raw pointers in some ways (and not in others).

@ArthurZucker
Copy link
Collaborator

cargo fmt should be enough for the CI!

@ArthurZucker
Copy link
Collaborator

Error:    --> src/models/bpe/trainer.rs:535:17
    |
535 |             let ref pos: HashSet<usize> = top.pos;
    |             ----^^^^^^^--------------------------- help: try: `let pos: &HashSet<usize> = &top.pos;`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#toplevel_ref_arg
    = note: `-D clippy::toplevel-ref-arg` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::toplevel_ref_arg)]`

seems to cause an issue with compilation, not sure if it's actionable on my side

@Manishearth
Copy link
Contributor Author

Fixed

@ArthurZucker
Copy link
Collaborator

Thanks!

@ArthurZucker ArthurZucker merged commit 5512a42 into huggingface:main Oct 29, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unsound use of unsafe in src/utils/parallelism.rs
4 participants