[X86] Reduce znver3/4 LoopMicroOpBufferSize to practical loop unrolling values #91340

Merged on May 16, 2024 (1 commit)
11 changes: 4 additions & 7 deletions llvm/lib/Target/X86/X86ScheduleZnver3.td
@@ -33,13 +33,10 @@ def Znver3Model : SchedMachineModel {
   // The op cache is organized as an associative cache with 64 sets and 8 ways.
   // At each set-way intersection is an entry containing up to 8 macro ops.
   // The maximum capacity of the op cache is 4K ops.
-  // Agner, 22.5 µop cache
-  // The size of the µop cache is big enough for holding most critical loops.
-  // FIXME: PR50584: MachineScheduler/PostRAScheduler have quadradic complexity,
-  //        with large values here the compilation of certain loops
-  //        ends up taking way too long.
-  // let LoopMicroOpBufferSize = 4096;
-  let LoopMicroOpBufferSize = 512;
+  // Assuming a maximum dispatch of 8 ops/cy and a mispredict cost of 12cy from
+  // the op-cache, we limit the loop buffer to 8*12 = 96 to avoid loop unrolling
+  // leading to excessive filling of the op-cache from frontend.
+  let LoopMicroOpBufferSize = 96;
   // AMD SOG 19h, 2.6.2 L1 Data Cache
   // The L1 data cache has a 4- or 5- cycle integer load-to-use latency.
   // AMD SOG 19h, 2.12 L1 Data Cache
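
The new znver3 comment derives its value as dispatch width times mispredict cost. A minimal sketch of that arithmetic follows, assuming only the figures quoted in the comment; `loopBufferBudget` is a hypothetical helper written for illustration and is not part of LLVM.

```cpp
// Illustrative only: loopBufferBudget is a hypothetical helper, not an LLVM API.
// It models the budget as the number of ops the frontend could dispatch
// during one mispredict recovery from the op cache.
constexpr unsigned loopBufferBudget(unsigned DispatchOpsPerCycle,
                                    unsigned MispredictCostCycles) {
  return DispatchOpsPerCycle * MispredictCostCycles;
}

// znver3 figures quoted in the patch comment: 8 ops/cy, 12 cycles.
static_assert(loopBufferBudget(8, 12) == 96,
              "matches LoopMicroOpBufferSize for znver3 in this patch");
```

The same helper with 9 ops/cy gives the znver4 value used in the second file of this change.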
16 changes: 5 additions & 11 deletions llvm/lib/Target/X86/X86ScheduleZnver4.td
@@ -28,17 +28,11 @@ def Znver4Model : SchedMachineModel {
   // AMD SOG 19h, 2.9.1 Op Cache
   // The op cache is organized as an associative cache with 64 sets and 8 ways.
   // At each set-way intersection is an entry containing up to 8 macro ops.
-  // The maximum capacity of the op cache is 4K ops.
-  // Agner, 22.5 µop cache
-  // The size of the µop cache is big enough for holding most critical loops.
-  // FIXME: PR50584: MachineScheduler/PostRAScheduler have quadradic complexity,
-  //        with large values here the compilation of certain loops
-  //        ends up taking way too long.
-  // Ideally for znver4, we should have 6.75K. However we don't add that
-  // considerting the impact compile time and prefer using default values
-  // instead.
-  // Retaining minimal value to influence unrolling as we did for znver3.
-  let LoopMicroOpBufferSize = 512;
+  // The maximum capacity of the op cache is 6.75K ops.
+  // Assuming a maximum dispatch of 9 ops/cy and a mispredict cost of 12cy from
+  // the op-cache, we limit the loop buffer to 9*12 = 108 to avoid loop
+  // unrolling leading to excessive filling of the op-cache from frontend.
+  let LoopMicroOpBufferSize = 108;
   // AMD SOG 19h, 2.6.2 L1 Data Cache
   // The L1 data cache has a 4- or 5- cycle integer load-to-use latency.
   // AMD SOG 19h, 2.12 L1 Data Cache
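
To see why the smaller value matters for unrolling, the sketch below treats `LoopMicroOpBufferSize` as a plain size budget for the unrolled loop body. This is a deliberately simplified stand-in, not LLVM's actual unrolling heuristic; `maxUnrollFactor` and the 20-op loop body are assumptions made up for illustration.

```cpp
// Illustrative only: maxUnrollFactor is a hypothetical helper, not an LLVM API.
// It assumes the unrolled body must fit within BudgetOps, and that a loop is
// always kept at least once (factor 1) even if a single body exceeds the budget.
constexpr unsigned maxUnrollFactor(unsigned BudgetOps, unsigned BodyOps) {
  return BodyOps == 0 ? 1
                      : (BudgetOps / BodyOps < 1 ? 1 : BudgetOps / BodyOps);
}

// Hypothetical 20-op loop body under the old and new budgets from this patch.
static_assert(maxUnrollFactor(512, 20) == 25, "old value for znver3/znver4");
static_assert(maxUnrollFactor(108, 20) == 5, "new znver4 value");
static_assert(maxUnrollFactor(96, 20) == 4, "new znver3 value");
```

Under this simplified model, the old budget of 512 would allow a far larger unrolled body than the new 96 and 108 budgets, which is the op-cache filling the new comments aim to avoid.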