-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor query arena allocation out from query caches #107833
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced by the direction of this last commit.
My proposal would be to:
- migrate all queries to arena-allocate manually and drop
arena_cache
for them, - remove all the arena support.
@@ -67,6 +71,24 @@ use std::sync::Arc; | |||
pub(crate) use rustc_query_system::query::QueryJobId; | |||
use rustc_query_system::query::*; | |||
|
|||
pub struct QuerySystem<'tcx> { | |||
pub local_providers: Box<Providers>, | |||
pub extern_providers: Box<ExternProviders>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the providers moved here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was convenient to access them using a TyCtxt
somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do want to try moving the rest of the rustc_query_impl::Queries
state to rustc_middle
though, as that slightly reduces the size of the query hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I didn't see where you use them. Could you point it?
This data structure has been put into rustc_query_impl
in an effort to shrink rustc_middle
and its compile time. I'd rather see stuff moved to rustc_query_impl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude this discussion, could you keep the current setup of fields in GlobalCtxt
? To be re-discussed in another PR that needs those fields moved.
I kind of view this as an incremental step towards removing |
☔ The latest upstream changes (presumably #107643) made this pull request unmergeable. Please resolve the merge conflicts. |
@@ -795,6 +793,8 @@ pub fn create_global_ctxt<'tcx>( | |||
untracked, | |||
dep_graph, | |||
queries.on_disk_cache.as_ref().map(OnDiskCache::as_dyn), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: is there a reason we don't make this a method on QueryEngine
? seems weird to pass in a reference to the query engine and to a field of it.
local_providers, | ||
extern_providers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these could be exposed from the QueryEngine
, or if that is a performance issue, we could make the current QueryEngine
uses be a struct with a trailing dyn QueryEngine
field to have quick access to the fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(job, result) | ||
// Mark as complete before we remove the job from the active state | ||
// so no other thread can re-execute this query. | ||
cache.complete(key.clone(), result, dep_node_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes behaviour for the parallel compiler. IIUC, this is a bugfix, so we should probably land this separately.
value: query_provided::$name<'tcx>, | ||
) -> query_values::$name<'tcx> { | ||
query_if_arena!([$($modifiers)*] | ||
(&*_tcx.query_system.arenas.$name.alloc(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the global tcx.arena
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to use the dropless arena for types without destructors at least, which improves performance. We have to add all the query types back to tcx.arena
before fully using that.
#[allow(nonstandard_style, unused_lifetimes, unused_parens)] | ||
pub mod query_storage { | ||
#[allow(nonstandard_style, unused_lifetimes)] | ||
pub mod query_provided { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you doc-comment on the module that this type is?
)* | ||
} | ||
#[allow(nonstandard_style, unused_lifetimes)] | ||
pub mod query_provided_to_value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you doc-comment on the module that this does?
|
||
$( | ||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline always?
@@ -67,6 +71,24 @@ use std::sync::Arc; | |||
pub(crate) use rustc_query_system::query::QueryJobId; | |||
use rustc_query_system::query::*; | |||
|
|||
pub struct QuerySystem<'tcx> { | |||
pub local_providers: Box<Providers>, | |||
pub extern_providers: Box<ExternProviders>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude this discussion, could you keep the current setup of fields in GlobalCtxt
? To be re-discussed in another PR that needs those fields moved.
qcx, | ||
dep_node | ||
); | ||
value.map(|value| query_provided_to_value::$name(qcx.tcx, value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need a call to query_provided_to_value
? Could we work with only the query_value
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arena_cache
decodes <V as Deref>::Target
so we need to put it on an arena. We can't decode plain V
anymore since #73851.
65f0ce0
to
2d24908
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some clones (even if usually no ops/copies) can be avoided now
2d24908
to
afa9de7
Compare
I think I've address all comments now. The use of the
The incremental case still regresses a bit:
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 257a5a4d57637401596ae8d047b0f576a0dda672 with merge 86f67440d16aa2f389da87483866d18ca64a9c3e... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (86f67440d16aa2f389da87483866d18ca64a9c3e): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
@bors r+ |
📌 Commit 257a5a4d57637401596ae8d047b0f576a0dda672 has been approved by It is now in the queue for this repository. |
⌛ Testing commit 257a5a4d57637401596ae8d047b0f576a0dda672 with merge 76cde7c771243a8bd7e125bc4486fb484327f7a7... |
💔 Test failed - checks-actions |
☔ The latest upstream changes (presumably #108116) made this pull request unmergeable. Please resolve the merge conflicts. |
257a5a4
to
caf29b2
Compare
@bors r+ |
@@ -271,6 +267,7 @@ where | |||
QueryResult::Poisoned => panic!(), | |||
} | |||
}; | |||
cache.complete(key, result, dep_node_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this required any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the extra block that I removed? Not sure when it became redundant.
☀️ Test successful - checks-actions |
Finished benchmarking commit (947b696): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Upstream PRs that require local changes: - Switch to EarlyBinder for type_of query rust-lang/rust#107753 - Factor query arena allocation out from query caches rust-lang/rust#107833 Co-authored-by: Qinheping Hu <qinhh@amazon.com>
This moves the logic for arena allocation out from the query caches into conditional code in the query system. The specialized arena caches are removed. A new
QuerySystem
type is added inrustc_middle
which contains the arenas, providers and query caches.Performance seems to be slightly regressed:
Incremental performance is a bit worse:
It does seem like LLVM optimizers struggle a bit with the current state of the query system.
Based on top of #107782 and #107802.
r? @cjgillot