Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AMD PMC/U support #222

Merged
merged 5 commits into from
Aug 9, 2024
Merged

Add AMD PMC/U support #222

merged 5 commits into from
Aug 9, 2024

Conversation

lancelui-amzn
Copy link
Contributor

@lancelui-amzn lancelui-amzn commented Aug 6, 2024

Description of changes:

  • Added AMD PMU event numbers and support in perf_stat

Testing:

Ran benchmarks on AMD milan and genoa family instances and compared output to results generated by perfrunbook and Intel/graviton based instances.
Example of ipc comparison between perfrunbook and aperf:

Screenshot 2024-08-05 at 11 15 21 PM Screenshot 2024-08-05 at 11 34 01 PM

Note: We observed some incorrect data presumably from event multiplexing (up to 13 groups of 14 events). Values may be different than true counts due to this scaling, but trends remain accurate.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@lancelui-amzn lancelui-amzn requested a review from a team as a code owner August 6, 2024 04:17
/* Get Model specific events */
platform_specific_counter = match cpu_info.model_name.as_str() {
"AMD EPYC 9R14" => GENOA_CTRS.to_vec(),
"AMD EPYC 7R13 Processor" => MILAN_CTRS.to_vec(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some processors may have different model name. For example, ""AMD EPYC 7R13 48-Core Processor" on c6a metal instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that because you're planning on making changes here to do, for example, a substring or prefix match? Or are you saying it because you want to know whether it matters if metal works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to note it here. Have a prefix match for the first 13 characters and will push changes soon.

@@ -0,0 +1,24 @@
use crate::data::perf_stat::{NamedCtr, NamedTypeCtr, PerfType};

static STALL_BACKEND_PKC: NamedTypeCtr = NamedTypeCtr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't PKC, this is a single event, it's just backend stalls.

name: "stall_backend_pkc",
nrs: vec![STALL_BACKEND_PKC],
drs: vec![CYCLES],
scale: 167 //~= 1000/6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is really weird, why this value instead of just 1000? The ones below for milan are just 1000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I think its from each cycle being able to dispatch 6 instructions? This scale was from our documentation and AMD documentation.

@@ -0,0 +1,35 @@
use crate::data::perf_stat::{NamedCtr, NamedTypeCtr, PerfType};

static STALL_BACKEND_PKC1: NamedTypeCtr = NamedTypeCtr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not PKC

name: "Backend-Stalls-1",
config: 0xf7ae,
};
static STALL_BACKEND_PKC2: NamedTypeCtr = NamedTypeCtr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not PKC

};
static STALL_BACKEND_PKC2: NamedTypeCtr = NamedTypeCtr {
perf_type: PerfType::RAW,
name: "Backend-Stalls-2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any better specifiers for these besides just -1 and -2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, each one counts several backend stall types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not documented what exactly each one counts, perhaps the better thing to do is add them together. That assumes there's not overlap in what each one counts.

Copy link
Contributor Author

@lancelui-amzn lancelui-amzn Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with Geoff, the sum of the two is the total backend stalls. Pushed changes

};
static L1_DATA: NamedTypeCtr = NamedTypeCtr {
perf_type: PerfType::RAW,
name: "L1-Data",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an L1 data access, or a L1 data miss?

In general a lot of these counters have names that are not specific enough. I haven't looked at our existing definitions, hopefully they aren't all like this.

/* Get Model specific events */
platform_specific_counter = match cpu_info.model_name.as_str() {
"AMD EPYC 9R14" => GENOA_CTRS.to_vec(),
"AMD EPYC 7R13 Processor" => MILAN_CTRS.to_vec(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that because you're planning on making changes here to do, for example, a substring or prefix match? Or are you saying it because you want to know whether it matters if metal works?

Comment on lines 98 to 102
NamedCtr {
name: "l3-mpki",
nrs: vec![L3],
nrs: vec![L1_ANY_FILLS_DRAM],
drs: vec![INSTRUCTIONS],
scale: 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name says L3, nrs says L1. Are you sure this is right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counter is approximating L3 misses. "This is sorta l3 mpki, but ellides Prefetch misses from L2" - from our docs.

@@ -19,17 +19,11 @@ static CYCLES: NamedTypeCtr = NamedTypeCtr {
lazy_static! {
pub static ref MILAN_CTRS: Vec<NamedCtr<'static>> = [
NamedCtr {
name: "stall_backend_pkc1",
nrs: vec![STALL_BACKEND_1],
name: "stall_backend",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it still have the pkc in its name? Do whatever we do for other processors on this metric.

But having a metric without showing what its units/scale are seems incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right.

Copy link
Contributor

@wash-amzn wash-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cupcakes had nothing to do with this approval ...

@wash-amzn wash-amzn merged commit 57242bb into aws:main Aug 9, 2024
6 checks passed
@lancelui-amzn lancelui-amzn deleted the amd-pmu branch August 9, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants