Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the translation item collector #30900

Merged
merged 4 commits into from
Jan 29, 2016

Conversation

michaelwoerister
Copy link
Member

The purpose of the translation item collector is to find all monomorphic instances of functions, methods and statics that need to be translated into LLVM IR in order to compile the current crate.

So far these instances have been discovered lazily during the trans path. For incremental compilation we want to know the set of these instances in advance, and that is what the trans::collect module provides.
In the future, incremental and regular translation will be driven by the collector implemented here.

r? @nikomatsakis
cc @rust-lang/compiler

Translation Item Collection

This module is responsible for discovering all items that will contribute to
to code generation of the crate. The important part here is that it not only
needs to find syntax-level items (functions, structs, etc) but also all
their monomorphized instantiations. Every non-generic, non-const function
maps to one LLVM artifact. Every generic function can produce
from zero to N artifacts, depending on the sets of type arguments it
is instantiated with.
This also applies to generic items from other crates: A generic definition
in crate X might produce monomorphizations that are compiled into crate Y.
We also have to collect these here.

The following kinds of "translation items" are handled here:

  • Functions
  • Methods
  • Closures
  • Statics
  • Drop glue

The following things also result in LLVM artifacts, but are not collected
here, since we instantiate them locally on demand when needed in a given
codegen unit:

  • Constants
  • Vtables
  • Object Shims

General Algorithm

Let's define some terms first:

  • A "translation item" is something that results in a function or global in
    the LLVM IR of a codegen unit. Translation items do not stand on their
    own, they can reference other translation items. For example, if function
    foo() calls function bar() then the translation item for foo()
    references the translation item for function bar(). In general, the
    definition for translation item A referencing a translation item B is that
    the LLVM artifact produced for A references the LLVM artifact produced
    for B.
  • Translation items and the references between them for a directed graph,
    where the translation items are the nodes and references form the edges.
    Let's call this graph the "translation item graph".
  • The translation item graph for a program contains all translation items
    that are needed in order to produce the complete LLVM IR of the program.

The purpose of the algorithm implemented in this module is to build the
translation item graph for the current crate. It runs in two phases:

  1. Discover the roots of the graph by traversing the HIR of the crate.
  2. Starting from the roots, find neighboring nodes by inspecting the MIR
    representation of the item corresponding to a given node, until no more
    new nodes are found.

Discovering roots

The roots of the translation item graph correspond to the non-generic
syntactic items in the source code. We find them by walking the HIR of the
crate, and whenever we hit upon a function, method, or static item, we
create a translation item consisting of the items DefId and, since we only
consider non-generic items, an empty type-substitution set.

Finding neighbor nodes

Given a translation item node, we can discover neighbors by inspecting its
MIR. We walk the MIR and any time we hit upon something that signifies a
reference to another translation item, we have found a neighbor. Since the
translation item we are currently at is always monomorphic, we also know the
concrete type arguments of its neighbors, and so all neighbors again will be
monomorphic. The specific forms a reference to a neighboring node can take
in MIR are quite diverse. Here is an overview:

Calling Functions/Methods

The most obvious form of one translation item referencing another is a
function or method call (represented by a CALL terminator in MIR). But
calls are not the only thing that might introduce a reference between two
function translation items, and as we will see below, they are just a
specialized of the form described next, and consequently will don't get any
special treatment in the algorithm.

Taking a reference to a function or method

A function does not need to actually be called in order to be a neighbor of
another function. It suffices to just take a reference in order to introduce
an edge. Consider the following example:

fn print_val<T: Display>(x: T) {
    println!("{}", x);
}

fn call_fn(f: &Fn(i32), x: i32) {
    f(x);
}

fn main() {
    let print_i32 = print_val::<i32>;
    call_fn(&print_i32, 0);
}

The MIR of none of these functions will contain an explicit call to
print_val::<i32>. Nonetheless, in order to translate this program, we need
an instance of this function. Thus, whenever we encounter a function or
method in operand position, we treat it as a neighbor of the current
translation item. Calls are just a special case of that.

Closures

In a way, closures are a simple case. Since every closure object needs to be
constructed somewhere, we can reliably discover them by observing
RValue::Aggregate expressions with AggregateKind::Closure. This is also
true for closures inlined from other crates.

Drop glue

Drop glue translation items are introduced by MIR drop-statements. The
generated translation item will again have drop-glue item neighbors if the
type to be dropped contains nested values that also need to be dropped. It
might also have a function item neighbor for the explicit Drop::drop
implementation of its type.

Unsizing Casts

A subtle way of introducing neighbor edges is by casting to a trait object.
Since the resulting fat-pointer contains a reference to a vtable, we need to
instantiate all object-save methods of the trait, as we need to store
pointers to these functions even if they never get called anywhere. This can
be seen as a special case of taking a function reference.

Boxes

Since Box expression have special compiler support, no explicit calls to
exchange_malloc() and exchange_free() may show up in MIR, even if the
compiler will generate them. We have to observe Rvalue::Box expressions
and Box-typed drop-statements for that purpose.

Interaction with Cross-Crate Inlining

The binary of a crate will not only contain machine code for the items
defined in the source code of that crate. It will also contain monomorphic
instantiations of any extern generic functions and of functions marked with

[inline].

The collection algorithm handles this more or less transparently. When
constructing a neighbor node for an item, the algorithm will always call
inline::get_local_instance() before proceeding. If no local instance can
be acquired (e.g. for a function that is just linked to) no node is created;
which is exactly what we want, since no machine code should be generated in
the current crate for such an item. On the other hand, if we can
successfully inline the function, we subsequently can just treat it like a
local item, walking it's MIR et cetera.

Eager and Lazy Collection Mode

Translation item collection can be performed in one of two modes:

  • Lazy mode means that items will only be instantiated when actually
    referenced. The goal is to produce the least amount of machine code
    possible.
  • Eager mode is meant to be used in conjunction with incremental compilation
    where a stable set of translation items is more important than a minimal
    one. Thus, eager mode will instantiate drop-glue for every drop-able type
    in the crate, even of no drop call for that type exists (yet). It will
    also instantiate default implementations of trait methods, something that
    otherwise is only done on demand.

Open Issues

Some things are not yet fully implemented in the current version of this
module.

Initializers of Constants and Statics

Since no MIR is constructed yet for initializer expressions of constants and
statics we cannot inspect these properly.

Const Fns

Ideally, no translation item should be generated for const fns unless there
is a call to them that cannot be evaluated at compile time. At the moment
this is not implemented however: a translation item will be produced
regardless of whether it is actually needed or not.

Review on Reviewable

@nikomatsakis
Copy link
Contributor

/me excited, will look as soon as I get a chance.

@Aatch
Copy link
Contributor

Aatch commented Jan 15, 2016

Well based on the description (I haven't looked at the PR itself yet) it looks good. Based on the commit message, I assume that the results aren't actually being used in any way just yet?

@michaelwoerister
Copy link
Member Author

@Aatch No, the results are not used yet except for the specialized auto-tests.

@michaelwoerister
Copy link
Member Author

The ICE while compiling librustc is due to my code not being able to handle the unsizing of a Rc<>. Looking into right now.

@nikomatsakis
Copy link
Contributor

I'm not yet finished reviewing, but here is where I've been collecting comments:

https://gist.github.com/nikomatsakis/a31427e59e420a51391c

@eddyb
Copy link
Member

eddyb commented Jan 19, 2016

The logic for handling unsizing coercions in trans is pretty hairy.
And that's not everything: unsized_info gets called and ends up using struct_lockstep_tails to extract the actual source/target for unsizing.

@michaelwoerister
Copy link
Member Author

@eddyb Hm, I was thinking of implementing something that relies on the CustomCoerceUnsized to drill down into structs until it find something that's handled by a builtin coercion (i.e. &Trait, *Trait, or Box<Trait>). More or less the same approach as the code you linked to. struct_lockstep_tails is an excellent hint, by the way. Thanks!

@michaelwoerister michaelwoerister force-pushed the trans_item_collect branch 2 times, most recently from 553f88e to d5c9577 Compare January 19, 2016 17:19
@nikomatsakis
Copy link
Contributor

@michaelwoerister updated the gist with more comments. still reading through collect.rs, but looking pretty good. My biggest concern is lack of DRY, wondering if you have any thoughts on how to "de-dup" some of this logic.

@michaelwoerister
Copy link
Member Author

My biggest concern is lack of DRY, wondering if you have any thoughts on how to "de-dup" some of this logic.

For the do_static_trait_method_dispatch() I thought that it would be nice to just cache the results in a side table belonging to the translation item. Something like:

enum TransItem<'tcx> {
    Fn {
        node_id: NodeId,
        substs: &'tcx Substs<'tcx>,
        original_definition: DefId,
        method_dispatch_cache: FnvHashMap<CallKey, (DefId, &'tcx Substs<'tcx>)>
    },
    ...
}

This table (and any other per-translation-item information cached during collection) could then be made available in the fcx and deallocated after the function in question has been translated.
The problem here is that trans-collection already works with MIR while regular translation still works with HIR: The CallKey value above would be something like mir::repr::BasicBlock. So, with a side table like this, I would expect the duplication to go away, but we can only do that when trans is based MIR.

find_vtable_types_for_unsizing() I just wrote yesterday to fix the unsizing bug and I haven't really looked into how to factor this better.

In both instances though I also think that it would be highly desirable to have this logic just in one place.

@nikomatsakis
Copy link
Contributor

@michaelwoerister

The problem here is that trans-collection already works with MIR while regular translation still works with HIR: The CallKey value above would be something like mir::repr::BasicBlock. So, with a side table like this, I would expect the duplication to go away, but we can only do that when trans is based MIR.

Yes, I was anticipating that part of this would be "after we move to trans-based MIR" -- though we've got a lot of work in that direction, so we might be able to prototype this (but not in this PR).

@nikomatsakis
Copy link
Contributor

OK, so, r=me, modulo the nits in this gist (I added a few more). I think this is a good step, but we definitely need to do more work to cleanup. For the record:

  1. I think we should move away from using node-ids and move towards using def-ids everywhere, including inlined items. This will easier once MIR trans is in place.
  2. I think we should try to reduce the duplication. The strategy we discussed above makes sense, or maybe we can refactor helpers (sometimes it's better to redo work than to cache it, hard to say in advance).
  3. I'd like to see some perf numbers, given that this code is always running.

@michaelwoerister
Copy link
Member Author

Great. (There's still some failing tests related to the monomorphization recursion limit, which also need fixing)

@arielb1
Copy link
Contributor

arielb1 commented Jan 20, 2016

// If we have a closure or a function pointer, we will also encounter
// the concrete closure/function somewhere else (during closure or fn
// pointer construction). That's where we track those things.

If we have a use of <Debug as Debug>::fmt, that should ensure the wrapper function is emitted - similarly for <fn() as Fn()>::call (the latter causes an ICE at this moment because we forgot to implement it and I'm waiting on MIR to fix it, but that should not be the case).

In any case, the code in that function and the place it is was copied from have grown terribly ugly from refactor damage and would be much prettier if cleaned up.

@arielb1
Copy link
Contributor

arielb1 commented Jan 20, 2016

Why is find_vtable_types_for_unsizing calling represent_type? def_id_a.struct_variant().fields[i] works just as well for getting the i-th field.

@michaelwoerister
Copy link
Member Author

@arielb1 Thanks for the comments!

If we have a use of <Debug as Debug>::fmt, that should ensure the wrapper function is emitted - similarly for <fn() as Fn()>::call (the latter causes an ICE at this moment because we forgot to implement it and I'm waiting on MIR to fix it, but that should not be the case).

If you are talking about object shims here, the plan is to emit them lazily, per codegen unit.

In any case, the code in that function and the place it is was copied from have grown terribly ugly from refactor damage and would be much prettier if cleaned up.

Can you elaborate on that? What functions do you mean exactly and does it pertain to this PR?

Why is find_vtable_types_for_unsizing calling represent_type? def_id_a.struct_variant().fields[i] works just as well for getting the i-th field.

You're right, that's more concise. Will change it.

@arielb1
Copy link
Contributor

arielb1 commented Jan 21, 2016

If you are talking about object shims here, the plan is to emit them lazily, per codegen unit.

I was talking about object shims, but these may need to be monomorphized too - you can have things like <Foo<Ty> as Foo<Ty>>::meth.

Can you elaborate on that? What functions do you mean exactly and does it pertain to this PR?

trans_static_method_callee is ugly for no good reason, and that code seems to be multiplying like rabbits. I am writing a PR that will de-uglify it.

@michaelwoerister michaelwoerister force-pushed the trans_item_collect branch 3 times, most recently from 0a9dacc to f6a5caa Compare January 21, 2016 15:23
@michaelwoerister
Copy link
Member Author

OK, I think I've taken care of all of @nikomatsakis' comments except for using DefId instead of NodeId for function and closure translation items. I want to do that too, since it should allow us to postpone inlining to actual translation.

Also, I'll probably wait for @arielb1's cleanup PR to land before proceeding.

@bors
Copy link
Contributor

bors commented Jan 22, 2016

☔ The latest upstream changes (presumably #31087) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister michaelwoerister force-pushed the trans_item_collect branch 3 times, most recently from 838f3cd to f1dd819 Compare January 25, 2016 17:30
@michaelwoerister
Copy link
Member Author

Some performance numbers (in seconds):

translation item collection total compilation time percentage of total time
libstd 0.334 30.682 1.1
libcore 0.162 37.618 0.43
libsyntax 14.104 214.130 6.6
librustc 6.663 168.191 3.9

The time collection takes for libsyntax and librustc seems a bit excessive. I want to look into that a bit and see if some easy optimizations make a difference there (e.g. currently MIR for external items is deserialized every time the collector encounters a new instantiation of that item...)

@nikomatsakis
Copy link
Contributor

Is this for the defid-based version? Not that I would expect much difference in performance, I was just curious whether that was ready for review.

@michaelwoerister
Copy link
Member Author

Yes, that was for the DefId based version. I did two optimizations in the meantime: caching external MIR and trying harder to avoid redundant work for drop glue translation items and the results look a lot better:

translation item collection total compilation time
libstd 0.259s 28.939s
libcore 0.157s 39.570s
libsyntax 1.567s 214.660s
librustc 4.529s 298.632s

That version should be up soon.

The purpose of the translation item collector is to find all monomorphic instances of functions, methods and statics that need to be translated into LLVM IR in order to compile the current crate.
So far these instances have been discovered lazily during the trans path. For incremental compilation we want to know the set of these instances in advance, and that is what the trans::collect module provides.
In the future, incremental and regular translation will be driven by the collector implemented here.
@michaelwoerister
Copy link
Member Author

OK, should be ready for review.

@eddyb
Copy link
Member

eddyb commented Jan 26, 2016

Minor nit: the email used in your commits is not recognized by GitHub. It would be nice to either amend them, or better, add that email to your GitHub account (if valid; should make all old commits recognized as well).

@michaelwoerister
Copy link
Member Author

@eddyb Thanks for making me aware of that. Should be fixed now.

@nikomatsakis
Copy link
Contributor

@michaelwoerister

And I expect the mir cache here to move into the ccx in the medium term (this needs a bit more than initially expected), so this parameter would go away.

Yes, ok.

@nikomatsakis
Copy link
Contributor

@michaelwoerister so I'm ready to r+, one question: do you think it's worth doing a crater run here?

@nikomatsakis
Copy link
Contributor

(I kicked one off.)

@michaelwoerister
Copy link
Member Author

(I kicked one off.)

Sounds good. The results only show us if there is a crash somewhere but that's also good to know.

@nikomatsakis
Copy link
Contributor

Hmm, my crater run failed somehow. I'll see if I can figure out why.

@nikomatsakis
Copy link
Contributor

But I'm willing to land anyhow. It's a while till the next release :)

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Jan 28, 2016

📌 Commit 4d074b8 has been approved by nikomatsakis

@michaelwoerister
Copy link
Member Author

Are the results of the crater run publicly visible somewhere?

@bors
Copy link
Contributor

bors commented Jan 29, 2016

⌛ Testing commit 4d074b8 with merge 53c2933...

bors added a commit that referenced this pull request Jan 29, 2016
…sakis

The purpose of the translation item collector is to find all monomorphic instances of functions, methods and statics that need to be translated into LLVM IR in order to compile the current crate.

So far these instances have been discovered lazily during the trans path. For incremental compilation we want to know the set of these instances in advance, and that is what the trans::collect module provides.
In the future, incremental and regular translation will be driven by the collector implemented here.

r? @nikomatsakis
cc @rust-lang/compiler

Translation Item Collection
===========================

This module is responsible for discovering all items that will contribute to
to code generation of the crate. The important part here is that it not only
needs to find syntax-level items (functions, structs, etc) but also all
their monomorphized instantiations. Every non-generic, non-const function
maps to one LLVM artifact. Every generic function can produce
from zero to N artifacts, depending on the sets of type arguments it
is instantiated with.
This also applies to generic items from other crates: A generic definition
in crate X might produce monomorphizations that are compiled into crate Y.
We also have to collect these here.

The following kinds of "translation items" are handled here:

 - Functions
 - Methods
 - Closures
 - Statics
 - Drop glue

The following things also result in LLVM artifacts, but are not collected
here, since we instantiate them locally on demand when needed in a given
codegen unit:

 - Constants
 - Vtables
 - Object Shims

General Algorithm
-----------------
Let's define some terms first:

 - A "translation item" is something that results in a function or global in
   the LLVM IR of a codegen unit. Translation items do not stand on their
   own, they can reference other translation items. For example, if function
   `foo()` calls function `bar()` then the translation item for `foo()`
   references the translation item for function `bar()`. In general, the
   definition for translation item A referencing a translation item B is that
   the LLVM artifact produced for A references the LLVM artifact produced
   for B.

 - Translation items and the references between them for a directed graph,
   where the translation items are the nodes and references form the edges.
   Let's call this graph the "translation item graph".

 - The translation item graph for a program contains all translation items
   that are needed in order to produce the complete LLVM IR of the program.

The purpose of the algorithm implemented in this module is to build the
translation item graph for the current crate. It runs in two phases:

 1. Discover the roots of the graph by traversing the HIR of the crate.
 2. Starting from the roots, find neighboring nodes by inspecting the MIR
    representation of the item corresponding to a given node, until no more
    new nodes are found.

The roots of the translation item graph correspond to the non-generic
syntactic items in the source code. We find them by walking the HIR of the
crate, and whenever we hit upon a function, method, or static item, we
create a translation item consisting of the items DefId and, since we only
consider non-generic items, an empty type-substitution set.

Given a translation item node, we can discover neighbors by inspecting its
MIR. We walk the MIR and any time we hit upon something that signifies a
reference to another translation item, we have found a neighbor. Since the
translation item we are currently at is always monomorphic, we also know the
concrete type arguments of its neighbors, and so all neighbors again will be
monomorphic. The specific forms a reference to a neighboring node can take
in MIR are quite diverse. Here is an overview:

The most obvious form of one translation item referencing another is a
function or method call (represented by a CALL terminator in MIR). But
calls are not the only thing that might introduce a reference between two
function translation items, and as we will see below, they are just a
specialized of the form described next, and consequently will don't get any
special treatment in the algorithm.

A function does not need to actually be called in order to be a neighbor of
another function. It suffices to just take a reference in order to introduce
an edge. Consider the following example:

```rust
fn print_val<T: Display>(x: T) {
    println!("{}", x);
}

fn call_fn(f: &Fn(i32), x: i32) {
    f(x);
}

fn main() {
    let print_i32 = print_val::<i32>;
    call_fn(&print_i32, 0);
}
```
The MIR of none of these functions will contain an explicit call to
`print_val::<i32>`. Nonetheless, in order to translate this program, we need
an instance of this function. Thus, whenever we encounter a function or
method in operand position, we treat it as a neighbor of the current
translation item. Calls are just a special case of that.

In a way, closures are a simple case. Since every closure object needs to be
constructed somewhere, we can reliably discover them by observing
`RValue::Aggregate` expressions with `AggregateKind::Closure`. This is also
true for closures inlined from other crates.

Drop glue translation items are introduced by MIR drop-statements. The
generated translation item will again have drop-glue item neighbors if the
type to be dropped contains nested values that also need to be dropped. It
might also have a function item neighbor for the explicit `Drop::drop`
implementation of its type.

A subtle way of introducing neighbor edges is by casting to a trait object.
Since the resulting fat-pointer contains a reference to a vtable, we need to
instantiate all object-save methods of the trait, as we need to store
pointers to these functions even if they never get called anywhere. This can
be seen as a special case of taking a function reference.

Since `Box` expression have special compiler support, no explicit calls to
`exchange_malloc()` and `exchange_free()` may show up in MIR, even if the
compiler will generate them. We have to observe `Rvalue::Box` expressions
and Box-typed drop-statements for that purpose.

Interaction with Cross-Crate Inlining
-------------------------------------
The binary of a crate will not only contain machine code for the items
defined in the source code of that crate. It will also contain monomorphic
instantiations of any extern generic functions and of functions marked with
The collection algorithm handles this more or less transparently. When
constructing a neighbor node for an item, the algorithm will always call
`inline::get_local_instance()` before proceeding. If no local instance can
be acquired (e.g. for a function that is just linked to) no node is created;
which is exactly what we want, since no machine code should be generated in
the current crate for such an item. On the other hand, if we can
successfully inline the function, we subsequently can just treat it like a
local item, walking it's MIR et cetera.

Eager and Lazy Collection Mode
------------------------------
Translation item collection can be performed in one of two modes:

 - Lazy mode means that items will only be instantiated when actually
   referenced. The goal is to produce the least amount of machine code
   possible.

 - Eager mode is meant to be used in conjunction with incremental compilation
   where a stable set of translation items is more important than a minimal
   one. Thus, eager mode will instantiate drop-glue for every drop-able type
   in the crate, even of no drop call for that type exists (yet). It will
   also instantiate default implementations of trait methods, something that
   otherwise is only done on demand.

Open Issues
-----------
Some things are not yet fully implemented in the current version of this
module.

Since no MIR is constructed yet for initializer expressions of constants and
statics we cannot inspect these properly.

Ideally, no translation item should be generated for const fns unless there
is a call to them that cannot be evaluated at compile time. At the moment
this is not implemented however: a translation item will be produced
regardless of whether it is actually needed or not.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/rust-lang/rust/30900)
<!-- Reviewable:end -->
@bors bors merged commit 4d074b8 into rust-lang:master Jan 29, 2016
@bors bors mentioned this pull request Jan 29, 2016
@michaelwoerister
Copy link
Member Author

🎉

@nikomatsakis
Copy link
Contributor

@nikomatsakis
Copy link
Contributor

it reports one regression, but it looks like a false positive.

@nikomatsakis
Copy link
Contributor

(To be clear, that's the report from a fresh, apparently successful run.)

@michaelwoerister
Copy link
Member Author

it reports one regression, but it looks like a false positive.

Yes, looks like the download timed out.
Good to know that all these other crates seem to build fine!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants