Imperative Traverse #1614

Radvendii · 2023-09-19T20:33:38Z

I started the work to make Traverse imperative. I wanted to see if people thought this was a good direction to go in before continuing the work.

Any XXX comments in the code are things that should be resolved before merging

TODO:

implement a on_children_mut(self, f: impl Fn(&mut T)) function for Traverse<T> trait, that applies a function to all of the first descendants of a node of type T. It won't recurse to the bottom of the tree. This allows us to push all of those children onto the stack.
- Alternatively the stack could be Vec<dyn Traverse<T>>, and on_children_mut() could just call f on the direct descendants.
convert the rest of the Traverse<RichTerm> implementations
unclear whether we want to convert the Traverse<Type> implementations though.
see if i can unify most of the implementation of traverse and traverse_ref
make a document in the notes/ directory explaining the other options we considered, and pros/cons, in case we want to come back to this
- keeping a continuation stack
- the decurse crate
- a zipper version of Term
- a version of Term whose first layer is an Rc<RefCell<T>>

The only semantic changes of this PR are that we traverse trees in a slightly different order. Still depth-first, but we traverse the children backwards from how we did before.

The other semantic change, which I think might have just been a bug before, was that the traverse implementation for Type was not recursing into TypeF::Flat

Radvendii · 2023-10-05T10:34:28Z

Small update: This causes undefined behaviour error on miri when running with stacked borrows, but when run with -Zmiri-tree-borrows, it... gets much further. Miri is so slow that I wasn't actually able to run all the tests, but it passes at least the first few, and given how quickly the UB error happened with stacked borrows, I'm guessing it would have triggered if it were going to.

yannham

This is always UB to alias mutable references, as I have suspected (see the first answer here for example: https://stackoverflow.com/questions/54633474/is-aliasing-of-mutable-references-correct-in-unsafe-code). If you want to alias stuff, you to have to use raw pointers, and thus probably put the whole code inside an unsafe block.

Radvendii · 2023-10-05T12:29:37Z

There is only ever one mutable reference to a point in memory at a time

The values stored in the stack are raw pointers. At the beginning of the loop, a single pointer is turned into a mutable reference.

Radvendii · 2023-10-05T12:33:55Z

Or do you mean that it's illegal to have a mutable reference and a pointer to the same memory live at the same time?

yannham · 2023-10-05T14:38:07Z

Or do you mean that it's illegal to have a mutable reference and a pointer to the same memory live at the same time?

I would say so, intuitively. The compiler assumes that mutable references can't be aliased, and can do optimizations based on this assumption. That being said, it's hard to say in practice, and I reckon it's not a very well documented part of the Rust language, unfortunately...

In general, it's a very dangerous bet to make. UB is no joke to debug if it kicks in. What are the possible alternatives, even if it's not as nice or performant? Using UnsafeCell (I believe those can alias?)? Just encoding the rest of the work to do on the stack, a kind of zipper of the Term type?

jneem · 2023-10-06T03:34:01Z

I think we discussed a version of this already, but one possible way to avoid the unsafe would be to push continuations (represented as Box<dyn FnOnce(..)>) on the stack. There are probably multiple ways to do it, but for example I think a relatively small modification to your current approach would replace the parent pointer by having the child push a Box<dyn FnOnce(&mut RichTerm)> that would later get applied to its parent.

Radvendii · 2023-10-09T12:57:58Z

Wait no that is definitely safe. You can do it completely without unsafe code

let foo: &mut T = ...;
let bar = foo as *mut;

having a pointer around is never UB. Only accessing them can be.

As an aside, someone found a small change to my code that makes it pass miri even with the standard stacked borrows.

I understand the hesitation though, Having learned a bit more about point aliasing rules through this experience, they're arcane and ill-defined. It's not totally clear to me how anyone decides that even the standard library structures are safe.

UnsafeCell I don't think provides any safety guarantees that *mut doesn't. It just takes ownership of the object, which is not what we want.
We could encode the rest of the work to do on the stack, as @jneem explained.
I like the idea of a zipper. We could walk down it and then walk back up, applying f as we go up (for BottomUp at least). I dismissed this earlier because I couldn't figure out a way to nicely / easily make a zipper for a complicated enum like Term. But also I'm not familiar with how zipper types are usually constructed. The only examples I could find were for simple children: Vec<Node> trees.

yannham · 2023-10-09T16:16:35Z

Ah, you make good points, my bad. I haven't dived that deep into unsafe rules, but following through some links brought me to the LLVM noalias rule, which is indeed less strong that the answer which I linked to originally (and where the alias a mutable ref with a pointer isn't clear either).

Anyway, I think this is still a good reminder that the safety rules of Rust aren't, even as of today, very clearly defined 😅 the thing is, unless we expect this function to be a bottleneck, I feel like it's not really worth using unsafe here.

I like the zipper idea as well, but I fear this is going to be very annoying to define the whole zipper type for Term, don't you think? Ideally we would have macros to do that, but I don't think it exists and it's applicable anyway given the current form of Term.

@jneem's idea sounds like the most direct and low-overhead safe way to implementation as an external observer. The unsafe approach might still be interesting if we ever encounter performance issues on this part of the code: do you think it would be simple to explain shortly in a comment, making it easy to reconstruct for anyone in the future?

Radvendii · 2023-10-13T12:55:33Z

On Wednesday @yannham and I tried implementing the version of this that keeps continuations on a stack. We kept running into problems that required increasingly complex machinery to get around. So we've decided to go ahead with the version using unsafe, for now. And to document alternatives so we can come back to it.

I updated the TODO list on the first comment, and started work on the first item, then forgot to post this comment 😅

Radvendii · 2023-10-17T23:19:35Z

I've managed to extract out the common logic of Traverse which leaves only Traverse1 to implement per-type. This has a few advantages

only have to worry about the logic of "how do i get the child nodes", which is what's unique to each type anyways
only have to implement once per type (as opposed to per type and per type that you're looking for impl Traverse1 for T rather than impl Traverse<T1> for T2)

This leaves Traverse with no methods that actually need to be implemented. Maybe they should just be functions that take in a Traverse1? This would mean we don't need empty Traverse impls around, but it also means we can't use .traverse() syntax. We can just impl<A, B: Traverse1> Traverse<A> for B or something like that.

yannham · 2023-10-25T20:28:25Z

@Radvendii this CI is failing because of dangling references inside comments.

dpc · 2023-10-25T21:39:14Z

Hi. I'm just a lurker Nixer&Rustacean, but since I noticed that the discussion here is about &mut aliasing, unsafe and UB, I got a bit surprised that Nickel would have to deal with such things. I briefly look at the code, and it seems to me that if you're hitting these kinds of problems it might be worth thinking about DoD.

I don't really have experience writting compilers and PLs, but I do have experience with DoD, especially in Rust, and I'm a big proponent. It does help a lot when dealing with non-trivial recursive and graph-like data-structures, in the absence of a GC.

I also would like to share a video about a programming language that did employ DoD: https://vimeo.com/handmadecities/practical-data-oriented-design#t=1796s allegedly with great results.

Anyway, maybe I'm totally off here, please excuse if so. I hope that video is worth your time even if that's the case.

Radvendii · 2023-10-25T22:38:23Z

@yannham yeah, sorry. I haven't had time to fix this up since joining the new client. I'm hoping next week will be calmer.

@dpc Yeah, I've drunk the zig & DoD coolaid too. I think that would take a lot of thought and quite a big redesign. I still suspect it's worth while, but I don't have any experience myself so I didn't want to sink the necessary time into it without knowing where I was going. Right now I'm busy with other things, but if I end up with more time for Nickel stuff I'd love to chat with you about what this would look like.

yannham · 2023-10-25T22:51:52Z

Hi @dpc, I haven't heard of DoD before (at least as a concept or a paradigm, as opposed the examples given in the wikipedia page). Indeed the issue at hand here is pretty generic and not really PL-specific: we have a tree-like data-structure (an AST), and we want to map over a function on each node (in fact, on each subtree, to be more precise).

We had a pretty pedestrian recursive implementation, easy to follow and to implement, and probably the most efficient - however, as with any recursive implementation, at some point you hit a stack overflow. The point of this PR is just to make the implementation imperative instead of recursive, so that the extra space is allocated on the heap and we can handle big terms. Of course there are ways to implement this without unsafe code, but this is to be weighted with several factors, such as speed, size of the diff, added complexity, etc...

It's not so easy to implement that in safe Rust, especially for the bottom-up transformation, because we need to store the "continuation": first stack the children, but then remember than we need to reconstruct a node XXX (say, Let(id, exp1, exp2)), which is cumbersome. The functional way would be to define a zipper - but defining a zipper for the whole AST by hand is going to be a lot of boilerplate. One way or another we need to encode the continuation, implicitly encoded for us in the recursive version by the OS call stack.

However, with mutation, we can sidestep the issue: mutate the children node in place, and then apply the transformation to the parent, which leads @Radvendii to using raw ptr (because it doesn't fit Rust's borrowing semantics). I'm curious, would you have any suggestion for this particular use-case? Thanks for the pointers anyway!

dpc · 2023-10-26T04:08:31Z

@Radvendii

I think that would take a lot of thought and quite a big redesign.

Most definitely. Might be a good opportunity to point it out as food for thought but it's not very immediately helpful. :D

but if I end up with more time for Nickel stuff I'd love to chat with you about what this would look like.

I'd be happy to chat.

@yannham

we have a tree-like data-structure (an AST), and we want to map over a function on each node (in fact, on each subtree, to be more precise).

Yeah, so in essence with DoD the references between "things" (nodes?) become numerical IDs of some kinds instead of language-level references, and "things" live in vectors, maps, slotmaps, etc. instead of being scattered around the heap. Mapping over an AST becomes completely painless, as one holds some &mut Ast where struct Ast { fields: SlotMap<Field>, ...} and just pick and modify the right field by id and everything can be done by id.

In other words - the datastructures become more like database representation, than a typical graph/tree of objects.

(because it doesn't fit Rust's borrowing semantics).

In gamedev industry developers use DoD all the time with things like ECS to achieve extremely good performance (due to machine-friendliness of this approach), and great composability, but in Rust people are often recommended to give it a try to avoid mutable aliasing problems exactly like here.

I'm curious, would you have any suggestion for this particular use-case

I looked at it now, and I can't tell if approach is technically sound. I would cautiously side with "no", but you might as well ask a defensively programming 8-ball.

Other than jumping straight to DoD approach, I would think about avoiding references altogether and just "decompose" each node in a tree.

I.e. instead of:

    fn get_children_mut(&mut self) -> Vec<&mut dyn GetChildren>;
    fn get_children_ref(&self) -> Vec<&dyn GetChildren>;

fn decompose(self) -> (Self, Vec<Self>);
fn assemble(self, children: Vec<Self>) -> Self;

And this way able to do everything by value. Internally the implementation of this trait can just mem::swap(&mut self.children, vec![]) which should be cheap, as vec![] makes no allocation.

I might be missing something, as I didn't have to prepare for any job interviews in a while, which is the only time when I do graph (and even just tree) algorithms, but assembling the children back in DFS should be doable by a already_processed: Vec<Vec<Node>> where first index means "depth level", and then the inner Vec collects processed elements at that level. When processing element at level, you assemble it with any already_processed[level + 1] elements. By re-using capacity of the inner Vec the performance might not be bad.

Radvendii · 2023-10-26T14:20:15Z

@dpc Thanks for your help on this!

I looked at it now, and I can't tell if approach is technically sound. I would cautiously side with "no", but you might as well ask a defensively programming 8-ball.

There's theoretical reason to believe it should work, and it passes miri. I'm very new to unsafe rust and aliasing problems, so I don't know how safe that makes things really.

I would think about avoiding references altogether and just "decompose" each node in a tree.

Interesting. This feels similar to the zipper approach. And like the zipper approach, it's easy to implement it when the tree has children stored in a Vec. It's harder when the tree is an enum with a bunch of variants. Your approach has the benefit of not requiring a separate zipper enum, but it's still not as simple as mem::swap since the children are not stored in a Vec to begin with.

dpc · 2023-11-29T19:19:29Z

Reminded me about this thread https://www.reddit.com/r/rust/comments/185icjh/in_search_of_the_perfect_fold/

Radvendii force-pushed the imperative-traverse branch 2 times, most recently from cd3d1e7 to fcf4f1f Compare September 19, 2023 20:41

github-actions bot temporarily deployed to pull request September 19, 2023 20:45 Inactive

Radvendii marked this pull request as draft September 19, 2023 21:24

yannham requested changes Oct 5, 2023

View reviewed changes

Radvendii added 11 commits October 16, 2023 15:57

make traverse for RichTerms imperative

4a65cc8

improvements, comments, etc

681d90a

same treatment for traverse_ref

80edb2a

rearrange

a97e89d

todos

b788eb5

add miri (and nightly) devShell

521cebb

make work for stacked borrows

a3ebbcd

restructure into Traverse1

4f9f0b7

bring traverse_1 in line with traverse_ref_1

67f21e3

order of usages shouldn't matter

9ed4f0e

comment

7e88d14

Radvendii force-pushed the imperative-traverse branch from fcf4f1f to 7e88d14 Compare October 17, 2023 23:16

github-actions bot temporarily deployed to pull request October 17, 2023 23:21 Inactive

use as_any crate to get rid of AsAny instances

079422f

github-actions bot temporarily deployed to pull request October 17, 2023 23:37 Inactive

unify Traverse implementations

08e46d1

Traverse1 -> GetChildren, with cleaner interface

142f646

github-actions bot temporarily deployed to pull request October 19, 2023 15:17 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Imperative Traverse #1614

Imperative Traverse #1614

Radvendii commented Sep 19, 2023 •

edited

Loading

Radvendii commented Oct 5, 2023

yannham left a comment

Radvendii commented Oct 5, 2023

Radvendii commented Oct 5, 2023

yannham commented Oct 5, 2023 •

edited

Loading

jneem commented Oct 6, 2023

Radvendii commented Oct 9, 2023

yannham commented Oct 9, 2023 •

edited

Loading

Radvendii commented Oct 13, 2023

Radvendii commented Oct 17, 2023 •

edited

Loading

yannham commented Oct 25, 2023

dpc commented Oct 25, 2023 •

edited

Loading

Radvendii commented Oct 25, 2023

yannham commented Oct 25, 2023

dpc commented Oct 26, 2023 •

edited

Loading

Radvendii commented Oct 26, 2023

dpc commented Nov 29, 2023

Imperative Traverse #1614

Are you sure you want to change the base?

Imperative Traverse #1614

Conversation

Radvendii commented Sep 19, 2023 • edited Loading

Radvendii commented Oct 5, 2023

yannham left a comment

Choose a reason for hiding this comment

Radvendii commented Oct 5, 2023

Radvendii commented Oct 5, 2023

yannham commented Oct 5, 2023 • edited Loading

jneem commented Oct 6, 2023

Radvendii commented Oct 9, 2023

yannham commented Oct 9, 2023 • edited Loading

Radvendii commented Oct 13, 2023

Radvendii commented Oct 17, 2023 • edited Loading

yannham commented Oct 25, 2023

dpc commented Oct 25, 2023 • edited Loading

Radvendii commented Oct 25, 2023

yannham commented Oct 25, 2023

dpc commented Oct 26, 2023 • edited Loading

Radvendii commented Oct 26, 2023

dpc commented Nov 29, 2023

Radvendii commented Sep 19, 2023 •

edited

Loading

yannham commented Oct 5, 2023 •

edited

Loading

yannham commented Oct 9, 2023 •

edited

Loading

Radvendii commented Oct 17, 2023 •

edited

Loading

dpc commented Oct 25, 2023 •

edited

Loading

dpc commented Oct 26, 2023 •

edited

Loading