Imperative Traverse #1614
cd3d1e7
to
fcf4f1f
Compare
Small update: This causes undefined behaviour error on miri when running with stacked borrows, but when run with |
It is always UB to alias mutable references, as I suspected (see the first answer here for example: https://stackoverflow.com/questions/54633474/is-aliasing-of-mutable-references-correct-in-unsafe-code). If you want to alias stuff, you have to use raw pointers, and thus probably put the whole code inside an unsafe block.
There is only ever one mutable reference to a point in memory at a time. The values stored in the stack are raw pointers. At the beginning of the loop, a single pointer is turned into a mutable reference. |
Or do you mean that it's illegal to have a mutable reference and a pointer to the same memory live at the same time? |
I would say so, intuitively. The compiler assumes that mutable references can't be aliased, and can do optimizations based on this assumption. That being said, it's hard to say in practice, and I reckon it's not a very well documented part of the Rust language, unfortunately... In general, it's a very dangerous bet to make. UB is no joke to debug if it kicks in. What are the possible alternatives, even if it's not as nice or performant? Using |
I think we discussed a version of this already, but one possible way to avoid the unsafe would be to push continuations (represented as |
Wait no that is definitely safe. You can do it completely without unsafe code
Having a pointer around is never UB. Only accessing it can be. As an aside, someone found a small change to my code that makes it pass I understand the hesitation though. Having learned a bit more about pointer aliasing rules through this experience, I find them arcane and ill-defined. It's not totally clear to me how anyone decides that even the standard library structures are safe.
|
Ah, you make good points, my bad. I haven't dived that deep into the unsafe rules, but following some links brought me to the LLVM noalias rule, which is indeed weaker than what the answer I originally linked to claims (and that answer isn't clear about aliasing a mutable ref with a pointer either). Anyway, I think this is still a good reminder that the safety rules of Rust aren't, even as of today, very clearly defined 😅 The thing is, unless we expect this function to be a bottleneck, I feel like it's not really worth using unsafe here. I like the zipper idea as well, but I fear it is going to be very annoying to define the whole zipper type for @jneem's idea sounds like the most direct and low-overhead safe way to implement this, speaking as an external observer. The unsafe approach might still be interesting if we ever encounter performance issues in this part of the code: do you think it would be simple to explain briefly in a comment, making it easy to reconstruct for anyone in the future? |
On Wednesday @yannham and I tried implementing the version of this that keeps continuations on a stack. We kept running into problems that required increasingly complex machinery to get around. So we've decided to go ahead with the version using I updated the TODO list on the first comment, and started work on the first item, then forgot to post this comment 😅 |
fcf4f1f
to
7e88d14
Compare
I've managed to extract out the common logic of
|
@Radvendii this CI is failing because of dangling references inside comments. |
Hi. I'm just a lurker Nixer&Rustacean, but since I noticed that the discussion here is about I don't really have experience writing compilers and PLs, but I do have experience with DoD, especially in Rust, and I'm a big proponent. It helps a lot when dealing with non-trivial recursive and graph-like data structures in the absence of a GC. I also would like to share a video about a programming language that did employ DoD: https://vimeo.com/handmadecities/practical-data-oriented-design#t=1796s allegedly with great results. Anyway, maybe I'm totally off here, please excuse me if so. I hope that video is worth your time even if that's the case. |
@yannham yeah, sorry. I haven't had time to fix this up since joining the new client. I'm hoping next week will be calmer. @dpc Yeah, I've drunk the Zig & DoD Kool-Aid too. I think that would take a lot of thought and quite a big redesign. I still suspect it's worthwhile, but I don't have any experience myself so I didn't want to sink the necessary time into it without knowing where I was going. Right now I'm busy with other things, but if I end up with more time for Nickel stuff I'd love to chat with you about what this would look like. |
Hi @dpc, I haven't heard of DoD before (at least as a concept or a paradigm, as opposed to the examples given in the Wikipedia page). Indeed the issue at hand here is pretty generic and not really PL-specific: we have a tree-like data structure (an AST), and we want to map a function over each node (in fact, over each subtree, to be more precise). We had a pretty pedestrian recursive implementation, easy to follow and to implement, and probably the most efficient; however, as with any recursive implementation, at some point you hit a stack overflow. The point of this PR is just to make the implementation imperative instead of recursive, so that the extra space is allocated on the heap and we can handle big terms. Of course there are ways to implement this without unsafe code, but this has to be weighed against several factors, such as speed, size of the diff, added complexity, etc. It's not so easy to implement that in safe Rust, especially for the bottom-up transformation, because we need to store the "continuation": first stack the children, but then remember that we need to reconstruct a node XXX (say, However, with mutation, we can sidestep the issue: mutate the children nodes in place, and then apply the transformation to the parent, which led @Radvendii to use raw pointers (because it doesn't fit Rust's borrowing semantics). I'm curious, would you have any suggestion for this particular use-case? Thanks for the pointers anyway! |
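For reference, the top-down half of the problem described above can be written in safe Rust with a heap-allocated stack. The following is only a minimal sketch: `Node` and `traverse_top_down` are hypothetical names for illustration and are not Nickel's actual `RichTerm` or `Traverse` code. It does not cover the bottom-up case, which is the one that needs a "continuation".

```rust
/// Hypothetical stand-in for an AST node (not Nickel's `RichTerm`).
#[derive(Debug)]
struct Node {
    value: i32,
    children: Vec<Node>,
}

/// Top-down, in-place traversal. The pending work lives in a `Vec` on the
/// heap rather than on the call stack, so deep trees do not overflow it.
fn traverse_top_down(root: &mut Node, f: &mut impl FnMut(&mut Node)) {
    let mut stack: Vec<&mut Node> = vec![root];
    while let Some(node) = stack.pop() {
        // Transform the parent first; this reborrow ends right after the call.
        f(&mut *node);
        // Then hand the children over to the stack. `node` is not used
        // afterwards, so the borrow checker accepts the long-lived borrows.
        stack.extend(node.children.iter_mut());
    }
}
```

Because children are pushed left to right and popped from the end, siblings are visited right to left, i.e. still depth-first but with the children in reverse order.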
Most definitely. Might be a good opportunity to point it out as food for thought but it's not very immediately helpful. :D
I'd be happy to chat.
Yeah, so in essence with DoD the references between "things" (nodes?) become numerical IDs of some kinds instead of language-level references, and "things" live in vectors, maps, slotmaps, etc. instead of being scattered around the heap. Mapping over an AST becomes completely painless, as one holds some In other words - the datastructures become more like database representation, than a typical graph/tree of objects.
In gamedev industry developers use DoD all the time with things like ECS to achieve extremely good performance (due to machine-friendliness of this approach), and great composability, but in Rust people are often recommended to give it a try to avoid mutable aliasing problems exactly like here.
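To make the idea of IDs instead of references concrete, here is a minimal sketch of an arena-style AST. Everything in it (`NodeId`, `Expr`, `Arena`, `alloc`, `eval`) is a hypothetical illustration, not Nickel's representation. It assumes nodes are only ever created through `alloc`, so a child always has a smaller index than its parent.

```rust
/// Index into the arena; plays the role of a reference between nodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

/// Toy expression language (hypothetical, for illustration only).
#[derive(Debug)]
enum Expr {
    Num(i64),
    Neg(NodeId),
    Add(NodeId, NodeId),
}

/// All nodes live contiguously in one vector.
#[derive(Debug, Default)]
struct Arena {
    nodes: Vec<Expr>,
}

impl Arena {
    /// Store a node and return its ID. Children must be allocated first,
    /// so every child index is smaller than its parent's index.
    fn alloc(&mut self, expr: Expr) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(expr);
        id
    }

    /// Bottom-up evaluation with a single forward pass: when a node is
    /// reached, the values of its children have already been computed.
    fn eval(&self, root: NodeId) -> i64 {
        let mut values: Vec<i64> = Vec::with_capacity(self.nodes.len());
        for expr in &self.nodes {
            let value = match expr {
                Expr::Num(n) => *n,
                Expr::Neg(a) => -values[a.0],
                Expr::Add(a, b) => values[a.0] + values[b.0],
            };
            values.push(value);
        }
        values[root.0]
    }
}
```

With this layout a bottom-up pass needs neither recursion nor an explicit stack, and no two references to the same node ever coexist. The cost is that the allocation-order invariant has to be maintained by whoever builds the tree.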
I looked at it now, and I can't tell if the approach is technically sound. I would cautiously side with "no", but you might as well ask a defensively programming 8-ball. Other than jumping straight to the DoD approach, I would think about avoiding references altogether and just "decomposing" each node in a tree. I.e. instead of:
And this way you are able to do everything by value. Internally the implementation of this trait can just I might be missing something, as I haven't had to prepare for any job interviews in a while, which is the only time when I do graph (and even just tree) algorithms, but assembling the children back in DFS should be doable by a |
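One possible reading of the by-value decomposition described above, as a sketch only: each node is split into its own data plus its children, and a "rebuild" task on an explicit stack remembers how many finished children to reassemble. `Tree`, `Task` and `map_bottom_up` are hypothetical names, and the fixed `Vec<Tree>` children layout is an assumption that real AST nodes do not satisfy uniformly.

```rust
#[derive(Debug, PartialEq)]
struct Tree {
    value: i32,
    children: Vec<Tree>,
}

/// Work items kept on the heap instead of the call stack.
enum Task {
    /// Decompose this subtree.
    Visit(Tree),
    /// Reassemble a node from its last `arity` finished children.
    Rebuild { value: i32, arity: usize },
}

/// Bottom-up map by value: `f` sees each node after its children were mapped.
fn map_bottom_up(root: Tree, f: &mut impl FnMut(Tree) -> Tree) -> Tree {
    let mut tasks = vec![Task::Visit(root)];
    let mut done: Vec<Tree> = Vec::new();
    while let Some(task) = tasks.pop() {
        match task {
            Task::Visit(Tree { value, children }) => {
                // The rebuild task is pushed first, so it runs after
                // all of the children have been processed.
                tasks.push(Task::Rebuild { value, arity: children.len() });
                // Reverse so that children are processed left to right.
                for child in children.into_iter().rev() {
                    tasks.push(Task::Visit(child));
                }
            }
            Task::Rebuild { value, arity } => {
                // The finished children are the last `arity` entries.
                let children = done.split_off(done.len() - arity);
                done.push(f(Tree { value, children }));
            }
        }
    }
    done.pop().expect("the root is always rebuilt last")
}
```

The `Rebuild` variant is the "continuation" mentioned earlier in the thread. Here it is trivial because every node has the same shape; with many node variants, each would need its own way to be taken apart and put back together.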
@dpc Thanks for your help on this!
There's theoretical reason to believe it should work, and it passes
Interesting. This feels similar to the zipper approach. And like the zipper approach, it's easy to implement it when the tree has children stored in a |
Reminded me about this thread https://www.reddit.com/r/rust/comments/185icjh/in_search_of_the_perfect_fold/ |
I started the work to make `Traverse` imperative. I wanted to see if people thought this was a good direction to go in before continuing the work.
Any `XXX` comments in the code are things that should be resolved before merging.
TODO:
- `on_children_mut(self, f: impl Fn(&mut T))` function for `Traverse<T>` trait, that applies a function to all of the first descendants of a node of type T. It won't recurse to the bottom of the tree. This allows us to push all of those children onto the stack.
- `Vec<dyn Traverse<T>>`, and `on_children_mut()` could just call `f` on the direct descendants.
- `Traverse<RichTerm>` implementations
- `Traverse<Type>` implementations though.
- `traverse` and `traverse_ref`
- `notes/` directory explaining the other options we considered, and pros/cons, in case we want to come back to this
- `decurse` crate
- `Term`
- `Term` whose first layer is an `Rc<RefCell<T>>`
The main semantic change of this PR is that we traverse trees in a slightly different order. Still depth-first, but we traverse the children backwards from how we did before.
The other semantic change, which I think might have just been a bug before, is that the `traverse` implementation for `Type` was not recursing into `TypeF::Flat`