Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

construct MIR for all crates up to and including rustdoc #27893

Merged
merged 12 commits into from
Sep 6, 2015

Conversation

nikomatsakis
Copy link
Contributor

This PR contains a new crate, rustc_mir, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it.

NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :) Fixed. Everything still seems to work.

However, the MIR construction code (librustc_mir/build) is intentionally quite distinct from the code which munges the compiler's data structures (librustc_mir/tcx). The interface between the two is the HIR trait (librustc_mir/hir). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait will be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

  • repr.rs defines the MIR itself; each MIR instance is parameterized by some HIR H
  • build/ is the MIR construction code, parameterized by a particular HIR
  • hir/ is the definition of the HIR interface
  • tcx/ is the impl of the HIR interface for the tcx
  • dump.rs is the minimal compiler pass that invokes the HIR

One open question:

  • In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?

@nikomatsakis
Copy link
Contributor Author

cc @rust-lang/compiler

@rust-highfive
Copy link
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -479,6 +494,16 @@ pub trait Labeller<'a,N,E> {
}
}

/// Escape tags in such a way that it is suitable for inclusion in a
/// Graphviz HTML label.
pub fn escape_html(s: &str) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... this isn't a great escape function, but I guess it's not really untrusted input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, yes. :)

@arielb1
Copy link
Contributor

arielb1 commented Aug 18, 2015

@nikomatsakis

The HIR in this patch is fairly distant from @nrc's HIR: his HIR is very close to the (expanded) syntax, and intended to separate type-checking from the details of expansion/resolution, while the HIR in this patch is more like an elaborated, desugared version of Rust.

Vec { fields: Vec<ExprRef<H>> },
Tuple { fields: Vec<ExprRef<H>> },
Adt { adt_def: H::AdtDef,
variant_index: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you store the variant itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielb1

why don't you store the variant itself?

Because it was inconvenient to get from the variant to the containing enum and recompute its index, I think. Maybe I should just add those two bits of information to the VariantDef though.

@nikomatsakis
Copy link
Contributor Author

@arielb1

The HIR in this patch is fairly distant from @nrc's HIR: his HIR is very close to the (expanded) syntax, and intended to separate type-checking from the details of expansion/resolution, while the HIR in this patch is more like an elaborated, desugared version of Rust.

Yes, sure. It may be that we evolve @nrc's HIR downwards (I hope we do), but in any case I agree they are not identical, though in my head they play pretty much the same role: a relatively high-level, AST-oriented view on Rust. Anyway I didn't see a suggestion for a better name in your comment. :P

@glaebhoerl
Copy link
Contributor

MHIR or HMIR? :p

EDIT: Perhaps AIR, for Abstract Intermediate Representation. Or HAIR.

@Diggsey
Copy link
Contributor

Diggsey commented Aug 19, 2015

EIR: Elevated Intermediate Representation...

@nrc
Copy link
Member

nrc commented Aug 19, 2015

@glaebhoerl Her Majesty's Intermediate Representation? /British

pub fn escape_html(s: &str) -> String {
s
.replace("\"", "&quot;")
.replace("&", "&amp;")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always replace & first, to avoid &amp;quot;.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ms2ger

Always replace & first, to avoid &quot;.

heh, yes.

@nikomatsakis
Copy link
Contributor Author

@glaebhoerl hmm, I do like "HAIR" :)

}

/// Compile `expr`, yielding a compile-time constant. Assumes that
/// `expr` is a valid compile-time constant!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the idea that after check_const runs, everything marked as const will get turned into a const-rvalue?

I'd have thought the const-evaluator would be like an llvm-optimization-pass, simply running on the MIR without much more information (global consts + extern consts + const-fn are needed of course).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oli-obk I started writing a response and found I had more to say than seemed appropriate for a GitHub comment. See this thread: https://internals.rust-lang.org/t/how-to-handle-constants-in-the-compiler/2543

@nikomatsakis
Copy link
Contributor Author

r? @pnkfelix

after discussing with @huonw, decided to change official reviewer to pnkfelix, but I figure everyone will kibbitz anyhow.

@rust-highfive rust-highfive assigned pnkfelix and unassigned huonw Aug 21, 2015
Use(Lvalue<H>),

// [x; 32]
Repeat(Lvalue<H>, Lvalue<H>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the second value should be a Constant<H>?

@nikomatsakis
Copy link
Contributor Author

@arielb1

Also, no, it's not only primitive operators, but all operators except for explicit lvalue projections (i.e. addr-of, deref, field access, match, the LHS of array indexing) have their arguments copied to temporaries (up to optimizations) rather than accessed directly.

Hmm, yes, I see, I see. Of course I copied this approach from how trans does things, but I agree it's not right. I'm modifying the MIR construction now, but it seems to push me more in favor of introducing an Operand category, so that "by-value" Rvalue things take Operand rather than Lvalue. This has a few advantages: we can inline constants more easily (since Operand can include constants) and also we force ourselves to through as_operand (or something similar) where appropriate, which will ensure we don't accidentally wind up referencing some data in place that was (logically) moved. And then, as you say, we can later strip out intermediates that are not needed to achieve better re-use. I'll do some experiments and see how this works out.

@nikomatsakis
Copy link
Contributor Author

OK, I just did a major rebase that folded in all of the WIP comments. In particular I introduced a new concept into the MIR, Operand, that corresponds (at construction time) to a value. It can be either an lvalue (as before) or a constant. This means that when you have an expression like x + y that references lvalues, you'll have temporaries introduced:

tmp0 = x
tmp1 = y
result = tmp0 + tmp1

It also means we do NOT introduce temporaries for constants in expressions like x + 1, those can now be directly folded in.

It seems like these properties are worthy of unit testing, so I may turn some attention to how we can build up a unit test framework. That said, introducing the notion of Operand makes it clear in the type system where something is being used "by value" versus "by reference" (what we have sometimes called an "lvalue context"). This means that the construction code is, at least, known to invoke as_operand for each "by value" use, so so long as we can convince ourselves that it introduces a temporary for lvalues, we're all set.

Also, it should be no problem to purge unnecessary temporaries later, once we have ascertained for sure that there is no mutation to the source of the temporary. As @arielb1 said, I'd rather do that in a separate, dedicated pass (this can also cover the optimizations around pattern matching we now do in trans, I imagine.)

This may be worth updating the RFC for as well, since it marks the first departure, but I'll hold off on that. (And the RFC was clearly intended to evolve anyhow.)

@bors
Copy link
Contributor

bors commented Aug 25, 2015

☔ The latest upstream changes (presumably #27943) made this pull request unmergeable. Please resolve the merge conflicts.


#[derive(Clone, PartialEq)]
pub enum Operand<H:HIR> {
Lvalue(Lvalue<H>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost want to name this variant Move or Consume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb

I'd almost want to name this variant Move or Consume.

Seems good.

@arielb1
Copy link
Contributor

arielb1 commented Aug 25, 2015

@nikomatsakis

this can also cover the optimizations around pattern matching we now do in trans, I imagine.

Of course this won't do the decision-tree optimizations, but the temporary-elimination optimizations we do in trans are basically the reverse of RVO.

We translate code like

    match foo {
        Some(foo_inner) => return Some(f(foo_inner));
        None => return None
    }

into MIR (sugared-assmebly-format, rather than graph-format, temporaries are %name, ordinary variables var)

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
    foo_inner = foo.inner ; this isn't even a temporary - borrowck ensures no trouble
.some_body:
    %f_arg : T = foo_inner
    %f_ret : U = call f, %f_arg, onerr <snip> 
    %ret_val_2 : Option<U> = Some(%f_ret)
    *%ret = %ret_val_2
    jmp exit
.none:
.none_body:
    %ret_val : Option<U> = None
    *%ret = %ret_val
exit:
    ret

Standard RVO would replaces temporaries that are written somewhere with the place they are written to:

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
    foo_inner = foo.inner
.some_body:
    %f_arg : Option<T> = foo_inner
    ; %ret_val_2.as<Some>.0 = %f_ret
    ; (*%ret) = %ret_val_2
    (*%ret).as<Some>.0 = call f, %f_arg, onerr <snip> 
    *%ret = Some((*%ret).as<Some>.0) ; lowering ignores the self-assignment
    jmp exit
.none:
.none_body:
    *%ret = None
exit:
    ret

Match optimization substitutes temporaries with their values

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
.some_body:
    ; foo_inner = foo.inner - not really a temporary, but good enough
    ; %f_arg = foo_inner
    (*%ret).as<Some>.0 = call f, foo.inner, onerr <snip> 
    *%ret = Some((*%ret).as<Some>.0) ; lowering ignores the self-assignment
    jmp exit
.none:
.none_body:
    *%ret = None
exit:
    ret

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2015

Maybe DVIR (desugared value-level IR) for the "HIR"?

@nikomatsakis
Copy link
Contributor Author

@arielb1

Of course this won't do the decision-tree optimizations, but the temporary-elimination optimizations...

Yes, this is what I meant.

@nikomatsakis
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Sep 6, 2015

⌛ Trying commit 2f00086 with merge ea3892f...

bors added a commit that referenced this pull request Sep 6, 2015
This PR contains a new crate, `rustc_mir`, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it. 

~~NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :)~~ Fixed. Everything still seems to work.

However, the MIR construction code (`librustc_mir/build`) is intentionally quite distinct from the code which munges the compiler's data structures (`librustc_mir/tcx`). The interface between the two is the `HIR` trait (`librustc_mir/hir`). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait *will* be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

- `repr.rs` defines the MIR itself; each MIR instance is parameterized by some HIR `H`
- `build/` is the MIR construction code, parameterized by a particular HIR
- `hir/` is the definition of the HIR interface
- `tcx/` is the impl of the HIR interface for the tcx
- `dump.rs` is the minimal compiler pass that invokes the HIR

One open question:

- In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?
@nikomatsakis
Copy link
Contributor Author

@bors r=nrc

(@nrc gave thumbs up over IRC)

@bors
Copy link
Contributor

bors commented Sep 6, 2015

📌 Commit 2f00086 has been approved by nrc

@nikomatsakis
Copy link
Contributor Author

@bors r=nrc

@bors
Copy link
Contributor

bors commented Sep 6, 2015

📌 Commit c8a6618 has been approved by nrc

@bors
Copy link
Contributor

bors commented Sep 6, 2015

⌛ Testing commit c8a6618 with merge 3dd1a48...

bors added a commit that referenced this pull request Sep 6, 2015
This PR contains a new crate, `rustc_mir`, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it. 

~~NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :)~~ Fixed. Everything still seems to work.

However, the MIR construction code (`librustc_mir/build`) is intentionally quite distinct from the code which munges the compiler's data structures (`librustc_mir/tcx`). The interface between the two is the `HIR` trait (`librustc_mir/hir`). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait *will* be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

- `repr.rs` defines the MIR itself; each MIR instance is parameterized by some HIR `H`
- `build/` is the MIR construction code, parameterized by a particular HIR
- `hir/` is the definition of the HIR interface
- `tcx/` is the impl of the HIR interface for the tcx
- `dump.rs` is the minimal compiler pass that invokes the HIR

One open question:

- In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?
@bors bors merged commit c8a6618 into rust-lang:master Sep 6, 2015
@nikomatsakis nikomatsakis deleted the mir branch March 30, 2016 16:16
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 14, 2023
Remove unnecessary FIXME

Found this while browsing rustc, I traced it back to rust-lang#27893 when MIR first introduced, some time passed since then and I think this FIXME is no longer necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.