Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-Gc the AST #7929

Closed
huonw opened this issue Jul 20, 2013 · 12 comments · Fixed by #13316
Closed

De-Gc the AST #7929

huonw opened this issue Jul 20, 2013 · 12 comments · Fixed by #13316
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. metabug Issues about issues themselves ("bugs about bugs")

Comments

@huonw
Copy link
Member

huonw commented Jul 20, 2013

It would be really nice to have an AST that was a tree with ~. This would have countless advantages, like the ast-fold could consume the AST and only allocate if it is actually creating a new node, and we'd generally not have the problems associated with Gc (the GC might be able to land!).

However, there are a few areas that are very tricky; mainly the AST map, which maps node_id to actual AST nodes of various types, the only safe way to do this with an ~ tree is with & references... which freezes the AST for the remainder of the program, so no transformations like constant evaluation, and no consuming of the AST by trans.

@Aatch
Copy link
Contributor

Aatch commented Jul 20, 2013

There are several side-issues with making the AST map work "properly". For one, since it contains & it would have lifetime, which means that every piece of code that uses it needs lifetimes and that causes more lifetime requirements and so on and so forth.

Much of the current code isn't designed to deal with these kinds of structures, happily copying around @-pointers from the map and using @fn closures that can't have borrowed pointers in them. Furthermore, much of the code only needs minor use of the map, but the rest of the code pays for it's design decisions.

While unpalatable, a map based on unsafe pointers may help us, at least as a bridging solution until the rest of the code is brought up to a place where a safe-code map can be used instead.

My idea is to integrate the map into a wrapper for the AST, to limit the potential for, uh, bad things to happen. A simple struct like this:

struct ASTData {
    crate: ast::crate,
    priv map: ast_map::Map
}

With methods like this:

fn find<'a>(&'a self, node: node_id) -> Option<&'a node>;

The only place (as far as I am aware) the AST undergoes structural changes is during folding. Thus, folding can maintain the AST map. There is very little code that absolutely requires @, instead it is used as the "easy" option.

The current AST map is HashMap<node_id, node>, where node is an enum variant for the various indexed nodes. The variants mostly use @, but they could use * instead. ASTData has map as a private field to ensure no direct access, instead most access should be done via a safe API that returns a region pointer that causes ASTData to be frozen for it's length. An unsafe API can also be written to deal with cases where such freezing is unwanted, but the memory usage from copying is unacceptable.

The design above should be fairly safe as the pointers held in the map are "internal" pointers to the map's owner, meaning it's unlikely end up with dangling pointers.

@nikomatsakis
Copy link
Contributor

I actually don't know that ~ trees are the way to model the AST. I was thinking that using an arena might be the right approach (it's certainly the traditional approach). To make that work, though, we need to finish up the work on lifetime syntax.

@Aatch
Copy link
Contributor

Aatch commented Jul 20, 2013

@nikomatsakis as long as it can be semantically owned (so folding is memory efficient, it can be sent across task boundaries, etc) then I don't really care how it's implemented.

Finishing the lifetime syntax would be very nice too 😄

@dobkeratops
Copy link

Probably not enough code that uses it for it to be an issue yet ... but would you forsee a need for seperate compiler & tools AST structure/interfaces - similar to how clang provides a stable C interface and a more fully featured , but more rapidly changing internal representation.

r.e. tools, are there any ideas on how incremental build might go - with C++ although the header situation is pretty dire, you can at least translate individual sources - wheras at present you must compile an entire crate. Perhaps a tools oriented AST could be more suited to incremental builds? (for ide support).

(perhaps there are other issues more suitable for this question)

@emberian
Copy link
Member

emberian commented Nov 8, 2013

@nikomatsakis given that said work seems to be being finished, is it time to reapproach this? Probably not a 1.0 priority, as it certainly has no bearing on the language itself, but it seems it will be necessary to get rustc memory usage reasonable.

@nikomatsakis
Copy link
Contributor

Well, PR #10153 has not yet landed (still under review) but yes it might be reasonable to experiment once it does.

@nikomatsakis
Copy link
Contributor

On Fri, Nov 08, 2013 at 12:18:45AM -0800, Corey Richardson wrote:

it seems it will be necessary to get rustc memory usage reasonable.

...though I'm not sure I agree with this particular statement. It
would reduce the usage in box headers, but then improving the GC
would do so as well.

Nonetheless, I don't think the AST is correctly structured today, just
from first principles. I can imagine two alternative designs, both of
which will save memory and make it obvious how to trash the AST in a
deterministic fashion etc:

  1. Store all AST nodes in arrays, divided by type. Something like:

    struct Ast {
        items: ~[Item],
        exprs: ~[Expr],
        ...
    }
    
    ...
    
    enum Expr {
         ...
         ExprBinop(Binop, ExprId, ExprId),
         ...
    }
    
    struct ItemId { index: u32 }
    struct ExprId { index: u32 }
    

    Note that @ast::Expr pointers are replaced with ExprId, etc.
    This requires some refactoring to Segregate node ids into distinct
    categories. In some cases we pun within a single table, mixing node
    ids for items and exprs. We'll have to stop doing that.

    For those code paths that take a single node id, we could create
    an enum like:

    enum NodeId {
         ItemId(ItemId),
         ExprId(ExprId),
         ...
    }
    

    which would allow us to keep a single code path.

    Pro: Maximally memory layout; pointers can be 32 bit.
    Pro: Subsumes the current AST Map, which can just go away
    entirely, saving even more memory.
    Pro: Trivial to trash the AST Node after folding.
    Pro: We can use TLS to thread the AST array around with minimal
    effort; though we will have to be careful in trans
    when growing the array.
    Pro: Refactoring away from NodeId into different categories
    will improve readability and correctness, imo.
    Con: Annoying to match nested nodes.
    Con: The potential for "dangling pointers" in the form of outdated
    indices exists when folding.

  2. Use an arena to allocate AST nodes. Make ASTNodes have the lifetime
    of this arena. Replace any @ast::Node with Node<'ast>, defined
    as follows:

    type Node<'ast> = &'a Node<'ast>;

    Pro: Relatively efficient memory layout.
    Pro: Easy to trash, no need for dangling pointers.
    Pro: Correctness of AST fold becomes type checked, because it
    will have to have two lifetime parameters, 'in and 'out,
    and it will be producing Node<'out>.
    Con: Must thread lifetimes around, lots of manual editing.
    Con: We still need the AST Map. That makes me very sad.

Looking forward, I would (maybe) like to eventually convert from one
monolithic AST into an AST per item. The same would hold for all the
side tables in trans. The idea is to enable us to parallelize by
shipping items to different tasks. Either (1) or (2) could be adapted
to this approach, though using an array is somewhat easier but less
safe. Easier because indices can be sent between tasks. Less safe
because the type of an ExprId would not identify the item it belongs
to, and so we could confusingly use an ExprId from one fn in the
context of another. To convert approach (2) we'd have to use something
like the Message abstraction I outlined on the mailing list.

That said, what I'd REALLY like to do is to desugar from the AST into
a CFG after type checking, and do most of our processing on the CFG.
But all of this discussion is still relevant. As a babystep in that
direction I opened #6298 and I even have a lot of work on that lying
around on a branch, waiting for me to finish up with #2202 before it
can be merged. Got to get on that!

@pnkfelix
Copy link
Member

cc me

@eddyb
Copy link
Member

eddyb commented Dec 4, 2013

I would like to replace all uses of @T/P<T> in Visitor with &T.
The only Visitor implementation I know of that needs the original pointer is in ast_map, and I have a few ideas on how to deal with it:

  • provide a helper P::from_ref(&T) -> P<T> function that returns the original @T/P<T> pointer from a reference; this will work in most cases, but it's not safe if the provided &T wasn't borrowed from a @T/P<T>.
  • use a Folder instead of a Visitor; if all the methods are implemented to return the original @T/P<T>, no new allocations should be made behind the scenes, but it could be slower (experimentation required)
  • write a specialized version of the current Visitor, which works with @T/P<T> and is used only by ast_map; apart from the code duplication, it's the cleanest solution

@nikomatsakis
Copy link
Contributor

@eddyb what is the motivation to make this change? The fact that some users want to take ownership of what they are passed seems like a good reason for keeping it the way it is.

@eddyb
Copy link
Member

eddyb commented Dec 4, 2013

@nikomatsakis it's an extra cost (ref-counting right now) for all visitors and changing @T with anything else means touching every Visitor implementation.

@huonw huonw changed the title De-@ the AST De-Gc the AST Jun 28, 2014
@huonw
Copy link
Member Author

huonw commented Jun 28, 2014

(The AST now has Gc instead of @, but the semantics haven't changed.)

bors added a commit that referenced this issue Sep 14, 2014
Replaces Gc<T> in the AST with a custom owned smart pointer, P<T>. Fixes #7929.

## Benefits
* **Identity** (affinity?): sharing AST nodes is bad for the various analysis passes (e.g. one could bypass borrowck with a shared `ExprAddrOf` node taking a mutable borrow), the only reason we haven't hit any serious issues with it is because of inefficient folding passes which will always deduplicate any such shared nodes. Even if we were to switch to an arena, this would still hold, i.e. we wouldn't just use `&'a T` in the AST, but rather an wrapper (`P<'a, T>`?).

* **Immutability**: `P<T>` disallows mutating its inner `T` (unless that contains an `Unsafe` interior, which won't happen in the AST), unlike `~T`.

* **Efficiency**: folding can reuse allocation space for `P<T>` and `Vec<T>`, the latter even when the input and output types differ (as it would be the case with arenas or an AST with type parameters to toggle macro support). Also, various algorithms have been changed from copying `Gc<T>` to using `&T` and iterators.

* **Maintainability**: there is another reason I didn't just replace `Gc<T>` with `~T`: `P<T>` provides a fixed interface (`Deref`, `and_then` and `map`) which can remain fully functional even if the implementation changes (using a special thread-local heap, for example). Moreover, switching to, e.g. `P<'a, T>` (for a contextual arena) is easy and mostly automated.
flip1995 pushed a commit to flip1995/rust that referenced this issue Nov 4, 2021
Rustup

r? `@ghost`

changelog: none
notriddle pushed a commit to notriddle/rust that referenced this issue Nov 10, 2022
fix: make custom expr prefix completions to understand refs

Possible fix of rust-lang#7929

While reviewing the postfix completion code I saw that while calling `add_custom_postfix_completions` we were doing it under the part where reference was not taken into consideration, but as we are only adding postfix completions with `Expr` scope ( [source](https://github.com/rust-lang/rust-analyzer/blob/ba28e19b7838e3ad4223ae82d074dc3950ef1548/crates/ide-completion/src/completions/postfix.rs#L272) )

I shifted the `add_custom_postfix_completions` call to part where references are considered

I am not sure if this is the correct fix or I am understanding the problem exactly but this small move seemed to have fixed the issue :)
arcnmx pushed a commit to arcnmx/rust that referenced this issue Dec 17, 2022
…hievink

fix: breaking snippets on typed incomplete suggestions

Possible fix for rust-lang#7929

Fix the case where if a user types `&&42.o`, snippet completion was still applying &&Ok(42). Note this was fixed previously on `&&42.` but this still remained a problem for this case

Previous relevant PR: rust-lang#13517

### Points to help in review:

- The main problem why everything broke on adding an extra `o` was, earlier `dot_receiver` was `42.` which was a `LITERAL` but now `42.o` becomes a `FIELD_EXPR`

- Till now `include_references` was just checking for parent of `LITERAL` and if it was a `REF_EXPR`, but now we consider `FIELD_EXPR` and traverse all of them, finally to reach `REF_EXPR`. If `REF_EXPR` is not found we  just return the original `initial_element`

- We are constructing a new node during `include_references` because if we rely on `dot_receiver` solely we would get `&&42.o` to be replaced with, but we want `&&42` to be replaced with

### Output Video:

https://user-images.githubusercontent.com/49019259/205420166-efbdef78-5b3a-4aef-ab4b-d892dac056a0.mov

Hope everything I wrote makes sense 😅

Also interestingly previous PR's number was `13517` and this PR's number is `13715`, nicee
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. metabug Issues about issues themselves ("bugs about bugs")
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants