
Compute query::Providers almost entirely at compile-time. #74283

Closed · eddyb wants to merge 1 commit from the const-fn-provide branch

Conversation

eddyb (Member) commented Jul 13, 2020

Background

Query "providers" are the functions which actually implement the computation of a query, when the result wasn't already cached.

To allow queries to be implemented "downstream" from the definition of the query system, and even from crates that use the query, their providers are kept as fn pointers in a struct Providers, which is filled in at runtime:

pub struct Providers {
    pub generics_of: fn(TyCtxt<'_>, DefId) -> ty::Generics,
    pub type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
    // ... all other queries ...
}

To make filling the Providers struct more ergonomic, and to hide implementation details, fn provide functions are used; each one mutates one or more of the fn pointer fields in order to "provide" query implementations:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        // These refer to `fn generics_of` and `fn type_of` in the same module, which can remain private.
        generics_of,
        type_of,
        // ... other queries implemented in the same module ...
        ..*providers
    };
}

rustc_interface ties all of this together by calling each crate's top-level fn provide function to build a complete Providers, which is then stored immutably within the query system (which itself ends up inside, or referenced by, TyCtxt).
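
As a rough illustration, the combining step looks something like this (a sketch, not the literal rustc_interface code; the crate names are just examples of crates with a top-level fn provide):

fn default_providers() -> Providers {
    // Start from the all-defaults struct (Providers implements Default at this point).
    let mut providers = Providers::default();
    // Each crate overwrites the fn pointers for the queries it implements.
    rustc_typeck::provide(&mut providers);
    rustc_metadata::provide(&mut providers);
    // ... every other crate with a top-level `fn provide` ...
    providers
}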

Motivation

rustc_interface's API allows custom drivers to override query providers, to customize the behavior of certain queries.
E.g.: @jyn514 used this in #73566 to prevent rustdoc from doing too much work (and producing unwanted errors).

However, if executing the original provider is desired (also called the "default" provider, not in the sense of Providers implementing Default, but rather in the sense of being the result of rustc_interface combining all the fn provide functions), there is no nice way to do so.

The main approach today is to use a lazy_static! / thread_local! to cache one or more providers: run one or more fn provide functions, then extract some of the fn pointer fields, or even keep the entire resulting Providers.
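
For instance, a custom driver wrapping type_of might do something along these lines today (a sketch; it assumes the default type_of provider is installed by rustc_typeck::provide, and the DEFAULT_TYPE_OF name is made up for the example):

use lazy_static::lazy_static;

lazy_static! {
    // Run the relevant `fn provide` once and keep only the field we need.
    static ref DEFAULT_TYPE_OF: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx> = {
        let mut providers = Providers::default();
        rustc_typeck::provide(&mut providers);
        providers.type_of
    };
}

fn overriding_type_of<'tcx>(tcx: TyCtxt<'tcx>, def_id: DefId) -> Ty<'tcx> {
    // ... custom behavior goes here ...
    // Fall back to the cached default provider.
    (*DEFAULT_TYPE_OF)(tcx, def_id)
}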

Sometimes the actual provider function is publicly exposed, but ideally it should remain hidden, both to avoid accidental direct calls (which would bypass the query cache) and to avoid dead code lingering even when it's no longer used through the query system.

It's also possible that there's some startup cost to constructing the Providers at runtime, but likely not enough to matter.

This PR

All fn provide are now const fn, and they're run at compile-time to produce const DEFAULT_QUERY_PROVIDERS.
(The one exception is the codegen backend, since it can vary at runtime, but e.g. rustc_codegen_llvm installs a grand total of 2 providers, so it likely does not matter much.)
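
Concretely, the shape is roughly the following (a simplified sketch, not the literal diff; the detour through a separate const fn matches the FIXME and the E0764 errors quoted later in the thread, and the crate names are illustrative):

const fn default_query_providers() -> Providers {
    // Mutable references are not allowed directly in a `const` initializer yet,
    // only in `const fn` (under feature(const_mut_refs)), hence this helper.
    let mut providers = Providers::EMPTY;
    rustc_typeck::provide(&mut providers);
    rustc_metadata::provide(&mut providers);
    // ... every other crate's now-`const fn provide` ...
    providers
}

pub const DEFAULT_QUERY_PROVIDERS: Providers = default_query_providers();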

The effect of this is that the following snippet optimizes to returning a constant fn pointer:

pub fn default_type_of() -> for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx> {
    rustc_interface::DEFAULT_QUERY_PROVIDERS.type_of
}

Probably the best way to grab one of the default providers, however, is this:

const DEFAULT_TYPE_OF: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx> =
    rustc_interface::DEFAULT_QUERY_PROVIDERS.type_of;

As DEFAULT_TYPE_OF is computed entirely at compile-time, any use of it will directly refer to the (private) function which implements the type_of query, even with no LLVM optimizations involved.
(And you can directly use it to call that function, e.g. DEFAULT_TYPE_OF(tcx, def_id))

r? @nikomatsakis cc @nnethercote
(also cc @rust-lang/wg-const-eval because of the feature(const_mut_refs) dogfooding involved)

rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Jul 13, 2020
eddyb (Member, Author) commented Jul 13, 2020

@bors rollup=never (in case this breaks some tool or fulldeps test, and for perf)

eddyb (Member, Author) commented Jul 13, 2020

@bors try @rust-timer queue

rust-timer (Collaborator) commented:

Awaiting bors try build completion

bors (Contributor) commented Jul 13, 2020

⌛ Trying commit 77e4bef6bc791bc166021050f8e5d5b618645ea1 with merge 46100ef145a1aca4b68beed35e7314f9a5507928...

bors (Contributor) commented Jul 13, 2020

💔 Test failed - checks-actions

bors added the S-waiting-on-author label (Status: This is awaiting some action, such as code changes or more information, from the author) and removed the S-waiting-on-review label on Jul 13, 2020
eddyb (Member, Author) commented Jul 13, 2020

Oh I was worried about this, but didn't realize it changed in this release cycle:

error[E0764]: mutable references are not allowed in constants
   --> src/librustc_interface/passes.rs:724:25
    |
724 |     let mut providers = &mut Providers::EMPTY;
    |                         ^^^^^^^^^^^^^^^^^^^^^ `&mut` is only allowed in `const fn`

error[E0764]: mutable references are not allowed in constants
   --> src/librustc_interface/passes.rs:749:21
    |
749 |     let providers = &mut DEFAULT_QUERY_PROVIDERS;
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `&mut` is only allowed in `const fn`

eddyb force-pushed the const-fn-provide branch from 77e4bef to f0141eb on July 13, 2020 02:04
eddyb (Member, Author) commented Jul 13, 2020

@bors try @rust-timer queue

rust-timer (Collaborator) commented:

Awaiting bors try build completion

bors (Contributor) commented Jul 13, 2020

⌛ Trying commit f0141eb with merge 8e025f12c56194c5c38b7e718c3f1498028d7d94...

bors (Contributor) commented Jul 13, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 8e025f12c56194c5c38b7e718c3f1498028d7d94

rust-timer (Collaborator) commented:

Queued 8e025f12c56194c5c38b7e718c3f1498028d7d94 with parent 9d09331, future comparison URL.

RalfJung (Member) commented Jul 13, 2020 on this hunk of the diff:

rustc_codegen_ssa::provide(providers);
}
pub const DEFAULT_QUERY_PROVIDERS: Providers = {
// FIXME(eddyb) the `const fn` shouldn't be needed, but mutable references

This reads as if it should be a fn instead, but I think what you actually mean is that it should be directly inlined in DEFAULT_QUERY_PROVIDERS instead?

eddyb (Member, Author) replied:

Indeed, maybe I should say "separate function" or something like that.

rust-timer (Collaborator) commented:

Finished benchmarking try commit (8e025f12c56194c5c38b7e718c3f1498028d7d94): comparison url.

bjorn3 (Member) commented Jul 13, 2020

This is a slight (up to 0.9%) slowdown especially for check benchmarks.

eddyb (Member, Author) commented Jul 13, 2020

Okay, that's weird; I was expecting a speed-up, since all it needs to do now is memcpy the Providers onto the stack.
Surely there were more copies in the old configuration? I guess I could try to perf diff the compilation of an empty #![no_std] library crate, before and after this PR.

A comment by @lcnr here has been minimized.

eddyb (Member, Author) commented Jul 13, 2020

@lcnr compare with the old code, which also kept Providers on the stack.


One thing I wasn't aware of was that LLVM can actually optimize the *providers = Providers { foo, ..*providers }; to a single field write.
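
In source terms, the claim is that a whole-struct overwrite like the first line below gets reduced to the equivalent of the single field store on the last line (using generics_of from the Background section as the example field):

// Whole-struct functional-update write, as written in `fn provide`...
*providers = Providers { generics_of, ..*providers };
// ...is optimized down to the equivalent of a single field assignment:
providers.generics_of = generics_of;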

So the difference is that before, every field was written twice (once by Providers::default() and once by the relevant fn provide), potentially also copying the whole Providers (keep in mind e.g. Providers::default() couldn't be inlined into rustc_interface, but maybe we do great codegen for that now?), whereas now there's just one copy from .rodata.


Nevermind, I think I found the problem, it's something I was worried might happen but didn't think it could.
Instantiating DEFAULT_QUERY_PROVIDERS results in 14493 instances of define (pre-optimizations).

So a const Providers results in massive amounts of codegen, in the wrong crate (rustc_interface instead of the original).
Presumably most of the functions involved remain in the original crate, so the copies in rustc_interface are less optimized?

EDIT: most of the providers appear to be declare'd not define'd so I'm guessing maybe only the ones using closures get instantiated into rustc_interface? I guess I could check by replacing the Providers::EMPTY functions with closures.

EDIT2: confirmed, it's closures that are causing the excessive cross-crate codegen. I'll look for a potential way to solve this.
(Also, just checked and 56 queries appear to be provided with closures, so it's a bit of a problem)

eddyb (Member, Author) commented Jul 13, 2020

It looks like closures are considered generic functions, because substs.non_erasable_generics() is never empty for them, thanks to the "generic parameters" they keep around (to track certain properties, including e.g. captures' types, during type inference).

(-Z share-generics might do something, but I haven't done a full build with it on to see what happens)

I can try to fix the closure monomorphization behavior, although we might want to land that separately. There's clearly some risk of it limiting some optimizations, although not for all the cases in which a closure is found within a generic function; the only other case would have to be #[inline] functions with closures inside, which we could maybe account for?

eddyb (Member, Author) commented Jul 14, 2020

I've played around with changing the closure codegen behavior (to fit this situation better), and I don't think it's worth it.

What's happening is arguably not a bug but a feature; we just lose a bit of performance due to the closure-provided queries having duplicate codegen in rustc_interface, and whatever that implies for optimizations.

If we want to rely on CTFE, I believe there's only one way: introduce a provide! macro that keeps the ergonomics of using closures for queries (or improves on them, really, since *providers = Providers { ..., ..*providers }; is not the nicest thing), but avoids closures entirely. Instead it would rely on free functions, grabbing the query input/output types from ty::query::queries using the query name, which is something rustc_metadata already employs successfully.

We could even keep the same `name: |tcx, input| { ... }` syntax as today, so the main difference would be in indentation.
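
For illustration only, such a macro could have roughly this shape (query_key! / query_value! are hypothetical placeholders standing in for however the real macro would recover each query's key and value types from ty::query::queries; this is not existing rustc code):

macro_rules! provide {
    ($providers:expr, { $($name:ident: |$tcx:ident, $key:ident| $body:expr),* $(,)? }) => {
        $({
            // The closure-looking syntax expands to a plain `fn`, so no closure
            // types (and no cross-crate closure codegen) are involved.
            fn $name<'tcx>(
                $tcx: TyCtxt<'tcx>,
                $key: query_key!($name, 'tcx),
            ) -> query_value!($name, 'tcx) {
                $body
            }
            $providers.$name = $name;
        })*
    };
}

// Usage would keep the closure-like syntax:
provide!(providers, {
    type_of: |tcx, def_id| { /* same code that lives in the closure today */ todo!() },
});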

I'd appreciate feedback on that, but otherwise it'd probably be fine to just have a lazy_static! in rustc_interface for the default providers, so that custom drivers don't need to keep their own. That should have no impact on performance either way.


While messing with closure codegen, I've come across a few things that I'll try to split into several PRs (aren't rabbit holes fun?).

oli-obk (Contributor) commented Jul 14, 2020

have a lazy_static! in rustc_interface for the default providers, so that custom drivers don't need to keep their own.

I think I'd prefer that. None of the other approaches seem strictly better than the status quo (modulo getting the current PR to work fast, but that rabbit hole has been explored). Without a clear improvement over the status quo, I personally prefer a simpler solution over a complex one, even if the complex one doesn't require lazy statics.

eddyb (Member, Author) commented Jul 14, 2020

@oli-obk In case it wasn't clear: the macro idea should make the current PR work, by removing all closures (i.e. the macro can turn closure syntax into regular functions under the hood, which would not trigger the problematic behavior this PR stumbled onto).

oli-obk (Contributor) commented Jul 14, 2020

oh... hmm... ok that works, just leave a FIXME for removing the macro once the closure problem is resolved (do we have an issue for that?)

eddyb (Member, Author) commented Jul 14, 2020

The macro would be slightly nicer than today (and more declarative, so it would work with approaches other than "struct of fn pointers"), so I don't know why we would remove it, to be honest.

And the closure thing we'll likely not fix, as it's arguably largely not a bug: it's free devirtualization, it just happens to slightly regress performance in this very specific case.

oli-obk (Contributor) commented Jul 14, 2020

I was under the impression that this may be duplicating closure bodies all over the place, causing them to get compiled twice when we should just be referring to the already compiled function from the other crate.

eddyb (Member, Author) commented Jul 14, 2020

@oli-obk Yes, that's typically a good idea, because we want to inline closures. And it barely hurts rustc; presumably they just get less optimized in rustc_interface than they would in their original crate. AFAIK the only official way to keep closures or generic functions from being codegen'd downstream is -Z share-generics, and I don't know what the future of that is.

jyn514 (Member) commented Jul 15, 2020

have a lazy_static! in rustc_interface for the default providers, so that custom drivers don't need to keep their own.

I implemented this in #74347.

Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 15, 2020
…u, r=eddyb

Initialize default providers only once

This avoids copying a new `Providers` struct for each downstream crate
that wants to use it.

Follow-up to rust-lang#74283 without the perf hit.

r? @eddyb
nikomatsakis (Contributor) commented:

Question: if instead of fn we used Box<dyn Fn()>, what would happen? I think that for the default providers, there would be no actual allocation, and the same for zero-sized closures with no upvars. But when "overriding" a provider, people could then allocate a Box that stores the old value. This would permit more of a "layering" approach, though I'm not sure if anyone needs that.

Anyway, I don't know that I prefer this approach to what is being proposed here; I was just curious whether it would have obvious performance implications compared to fn().

nikomatsakis (Contributor) commented:

(It would be ergonomically annoying to write foo: Box::new(foo), for sure.)

nikomatsakis (Contributor) commented:

Also, fwiw, I am not a fan of having providers provided via closures. For one thing, I always find it hard to locate their definitions that way, and it feels like it clumps many providers into one function. I would prefer that people use a free function with the same name as the query, so that it is easy to find.

bjorn3 (Member) commented Jul 17, 2020

Question: if instead of fn we used Box<dyn Fn()>, what would happen? I think that for the default providers, there would be no actual allocation, and the same for zero-sized closures with no upvars. But when "overriding" a provider, people could then allocate a Box that stores the old value. This would permit more of a "layering" approach, though I'm not sure if anyone needs that.

That would double the size of Providers, which is already very big. It would likely cause a slowdown.
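
(For reference, a quick self-contained illustration of why the size doubles: a fn pointer is a single word, while Box<dyn Fn> is a fat pointer carrying both a data pointer and a vtable pointer.)

use std::mem::size_of;

fn main() {
    // A plain fn pointer is one machine word...
    assert_eq!(size_of::<fn(u32) -> u32>(), size_of::<usize>());
    // ...while a boxed trait object is a data pointer plus a vtable pointer.
    assert_eq!(size_of::<Box<dyn Fn(u32) -> u32>>(), 2 * size_of::<usize>());
}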

eddyb (Member, Author) commented Jul 19, 2020

@nikomatsakis It would be interesting if we had a quick way of measuring the performance impact of that (I guess we could do the grunt work of writing Box::new(...) around each call site - or maybe the box operator works?).

I'm not as worried about anything happening before TyCtxt is constructed, even if it can affect startup costs.
But I expect vtable calls to be more expensive (and would be pleasantly surprised to learn that I am wrong).

Also, we kind of want to keep providers as pure as possible, but using Fn() wouldn't necessarily change much.

nikomatsakis (Contributor) commented:

@eddyb

But I expect vtable calls to be more expensive

i.e., you expect the extra load involved in dispatch through a vtable to be very expensive?

Muirrum added and then removed the S-waiting-on-author label on Aug 6, 2020
eddyb (Member, Author) commented Aug 16, 2020

i.e., you expect the extra load involved in dispatch through a vtable to be very expensive?

Maybe, but maybe I'm just assuming. It's definitely at least as expensive, but it could be marginal, depending on workload.

I remember writing a longer reply but I think I forgot to send it and it got lost ages ago.


I forget what I wanted to do with this PR. I think I'll just close it but not delete the branch, so anyone can refer to it later.
