Add auxiliary event automatically, if needed by the hardware.
jmuehlig committed Dec 8, 2024
1 parent 73f7efa commit e4759d8
Showing 7 changed files with 117 additions and 22 deletions.
34 changes: 31 additions & 3 deletions docs/sampling.md
@@ -592,9 +592,31 @@ Additionally, memory sampling typically requires a [precision](#precision) setti
#### Before Sapphire Rapids
In our experience, Intel's Cascade Lake architecture (and earlier architectures) reports latency and data source only for memory loads, not for stores; this changes starting with Sapphire Rapids.

#### Sapphire Rapids
To use weight-sampling on Intel's Sapphire Rapids architecture, the perf subsystem needs an auxiliary counter to be added to the group, before the first "real" counter is added (see [this commit](https://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.com/)).
*perf-cpp* will define this counter, you only need to add it accordingly.
You can add load and store events like this:

```cpp
sampler.trigger("mem-loads", perf::Precision::MustHaveZeroSkid); /// Only load events
```
or
```cpp
sampler.trigger("mem-stores", perf::Precision::MustHaveZeroSkid); /// Only store events
```
or
```cpp
/// Load and store events
sampler.trigger({
std::vector<perf::Sampler::Trigger>{{"mem-loads", perf::Precision::MustHaveZeroSkid}},
std::vector<perf::Sampler::Trigger>{{"mem-stores", perf::Precision::MustHaveZeroSkid}}
});
```

#### Sapphire Rapids and Beyond
To use memory latency sampling on Intel's Sapphire Rapids architecture, the perf subsystem **needs an auxiliary counter** in the group, added before the first "real" counter (see [this commit](https://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.com/)).

*perf-cpp* defines this counter and, from version `0.10.0`, **adds it as a trigger automatically** when it detects that the hardware needs it.
In this case, you can specify triggers in the same way as before *Sapphire Rapids*.

If the detection fails although the system needs the auxiliary counter, you can add it yourself:

```cpp
sampler.trigger({
```
@@ -608,6 +630,12 @@ sampler.trigger({
&rarr; [See code example](../examples/multi_event_sampling.cpp)
To find out whether the auxiliary counter is needed, check whether the following file exists on the system:
```
/sys/bus/event_source/devices/cpu/events/mem-loads-aux
```
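
The file check described above can be sketched in a few lines of standalone C++17. This is a minimal sketch, not the *perf-cpp* implementation: the function name `aux_counter_file_exists` is hypothetical, and only the sysfs path is taken from the documentation.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

/// Hypothetical helper (not part of perf-cpp): returns true if the given event file exists.
/// The default path is the sysfs entry named in the documentation.
bool
aux_counter_file_exists(
  const std::filesystem::path& path = "/sys/bus/event_source/devices/cpu/events/mem-loads-aux")
{
  assert(!path.empty());

  /// Use the non-throwing overload; any file system error is treated as "not found".
  auto error = std::error_code{};
  return std::filesystem::exists(path, error);
}
```

Calling `aux_counter_file_exists()` without an argument checks the default path; passing another path is only meant to make the helper testable on machines without the event.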
### AMD (Instruction Based Sampling)
AMD uses Instruction Based Sampling to tag instructions randomly for sampling and collect various information for each sample ([see the programmer reference](https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf)).
In contrast to Intel's mechanism, IBS cannot tag specific load and store instructions (nor apply a filter on the latency).
10 changes: 6 additions & 4 deletions include/perfcpp/counter.h
@@ -38,6 +38,11 @@ class CounterConfig
[[nodiscard]] std::optional<std::uint8_t> precise_ip() const noexcept { return _precise_ip; }
[[nodiscard]] std::optional<PeriodOrFrequency> period_or_frequency() const noexcept { return _period_or_frequency; }

[[nodiscard]] bool operator==(const CounterConfig& other) const noexcept
{
return _type == other._type && _event_id == other._event_id;
}

private:
std::uint32_t _type;
std::uint64_t _event_id;
@@ -197,10 +202,7 @@ class Counter
std::optional<pid_t> process_id = std::nullopt,
std::optional<std::int32_t> cpu_id = std::nullopt) const;

[[nodiscard]] bool operator==(const CounterConfig config) const noexcept
{
return _config.type() == config.type() && _config.event_id() == config.event_id();
}
[[nodiscard]] bool operator==(const CounterConfig& config) const noexcept { return _config == config; }

private:
/// The config of an event.
11 changes: 11 additions & 0 deletions include/perfcpp/exception.h
@@ -138,6 +138,17 @@ class CannotFindEventError final : public std::runtime_error
~CannotFindEventError() override = default;
};

class CannotChangeTriggerWhenSamplerOpenedError final : public std::runtime_error
{
public:
CannotChangeTriggerWhenSamplerOpenedError()
: std::runtime_error(
"The Sampler was already opened. Cannot modify triggers after opening. Please create a new Sampler.")
{
}
~CannotChangeTriggerWhenSamplerOpenedError() override = default;
};
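
The intended use of this exception, as far as the diff shows, is to reject changes to the triggers once the sampler is open, while repeated calls before opening replace the earlier triggers. The following is a minimal standalone sketch of that guard pattern. `SamplerSketch` and its members are hypothetical stand-ins, not the *perf-cpp* classes, and a plain `std::runtime_error` takes the place of `CannotChangeTriggerWhenSamplerOpenedError`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

/// Hypothetical stand-in for a sampler; only models the trigger list and the opened flag.
class SamplerSketch
{
public:
  /// Marks the sampler as opened; triggers are frozen from here on.
  void open() noexcept { _is_opened = true; }

  /// Replaces all triggers set so far; throws if the sampler was already opened.
  SamplerSketch& trigger(std::vector<std::string> names)
  {
    if (_is_opened) {
      throw std::runtime_error("The Sampler was already opened. Cannot modify triggers after opening.");
    }

    _triggers = std::move(names); /// Earlier triggers are discarded, not appended to.
    return *this;
  }

  [[nodiscard]] const std::vector<std::string>& triggers() const noexcept { return _triggers; }

private:
  bool _is_opened{ false };
  std::vector<std::string> _triggers;
};
```

Throwing instead of silently ignoring the call makes the misuse visible at the call site; the existing triggers stay untouched because the check runs before any modification.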

class MetricNotSupportedAsSamplingTriggerError final : public std::runtime_error
{
public:
12 changes: 1 addition & 11 deletions include/perfcpp/hardware_info.h
@@ -28,17 +28,7 @@ class HardwareInfo
/**
* @return True, if the underlying Intel processor requires an aux counter for memory sampling.
*/
[[nodiscard]] static bool is_intel_aux_counter_required() noexcept
{
#if (defined(__GNUC__) && __GNUC__ > 10) || (defined(__clang__) && __clang_major__ > 11)
/// "sapphirerapids" and "alderlake" is only supported since clang-12 and gcc-11
if (is_intel()) {
return static_cast<bool>(__builtin_cpu_is("sapphirerapids")) || static_cast<bool>(__builtin_cpu_is("alderlake"));
}
#endif

return false;
}
[[nodiscard]] static bool is_intel_aux_counter_required();

/**
* @return The id of Intel's PEBS "mem-loads-aux" event.
8 changes: 8 additions & 0 deletions include/perfcpp/sampler.h
@@ -479,6 +479,7 @@ class Sampler

[[nodiscard]] Group& group() noexcept { return _group; }
[[nodiscard]] const Group& group() const noexcept { return _group; }
[[nodiscard]] RequestedEventSet& requested_events() noexcept { return _requested_events; }
[[nodiscard]] const RequestedEventSet& requested_events() const noexcept { return _requested_events; }

private:
@@ -616,6 +617,13 @@ class Sampler
const std::vector<std::tuple<std::string_view, std::optional<Precision>, std::optional<PeriodOrFrequency>>>&
triggers) const;

/**
* Adds an auxiliary counter as the first counter, if the first counter is a mem-loads counter and the underlying
* hardware needs it.
* @param trigger List of triggers where the auxiliary event should be added.
*/
void add_auxiliary_counter_if_needed(std::vector<Trigger>& trigger) const;

/**
* Reads the sample_id struct from the data located at sample_ptr into the provided sample.
*
13 changes: 12 additions & 1 deletion src/hardware_info.cpp
@@ -1,13 +1,24 @@
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <perfcpp/hardware_info.h>
#include <regex>
#include <sstream>

bool
perf::HardwareInfo::is_intel_aux_counter_required()
{
if (HardwareInfo::is_intel()) {
return std::filesystem::exists(std::filesystem::path("/sys/bus/event_source/devices/cpu/events/mem-loads-aux"));
}

return false;
}

std::optional<std::uint64_t>
perf::HardwareInfo::intel_pebs_mem_loads_aux_event_id()
{
if (HardwareInfo::is_intel()) {
if (HardwareInfo::is_intel_aux_counter_required()) {
return HardwareInfo::parse_event_umask_from_file("/sys/bus/event_source/devices/cpu/events/mem-loads-aux");
}

51 changes: 48 additions & 3 deletions src/sampler.cpp
@@ -1,5 +1,6 @@
#include <algorithm>
#include <perfcpp/exception.h>
#include <perfcpp/hardware_info.h>
#include <perfcpp/sampler.h>
#include <stdexcept>
#include <utility>
@@ -27,8 +28,23 @@ perf::Sampler::trigger(std::vector<std::vector<std::string>>&& list_of_trigger_n
perf::Sampler&
perf::Sampler::trigger(std::vector<std::vector<Trigger>>&& triggers)
{
this->_triggers.reserve(triggers.size());
/// Refuse to modify triggers once the sampler has been opened.
if (this->_is_opened) {
throw CannotChangeTriggerWhenSamplerOpenedError{};
}

/// Remove all triggers that were added so far.
this->_triggers.clear();

if (triggers.empty()) {
return *this;
}

/// Add an auxiliary event if needed (memory loads on some specific Intel architectures like Sapphire Rapids).
this->add_auxiliary_counter_if_needed(triggers.front());

/// Process all requested triggers.
this->_triggers.reserve(triggers.size());
for (auto& trigger_group : triggers) {
auto trigger_group_references =
std::vector<std::tuple<std::string_view, std::optional<Precision>, std::optional<PeriodOrFrequency>>>{};
@@ -244,6 +260,35 @@ perf::Sampler::transform_trigger_to_sample_counter(
return SampleCounter{ std::move(group) };
}

void
perf::Sampler::add_auxiliary_counter_if_needed(std::vector<Trigger>& trigger) const
{
/// Test if the auxiliary event is necessary at all.
if (HardwareInfo::is_intel_aux_counter_required()) {
const auto& first_trigger = trigger.front();

/// Test if the first trigger is a memory load event.
const auto mem_loads = this->_counter_definitions.counter(std::string_view{ "mem-loads" });
const auto first_trigger_counter = this->_counter_definitions.counter(first_trigger.name());
if (mem_loads.has_value() && first_trigger_counter.has_value() &&
std::get<1>(mem_loads.value()) == std::get<1>(first_trigger_counter.value())) {
/// Test if the auxiliary counter is available.
if (const auto auxiliary_counter = this->_counter_definitions.counter(std::string_view{ "mem-loads-aux" });
auxiliary_counter.has_value()) {
/// Configure the trigger: Try to inject the configuration from the mem-loads counter; fall back to global
/// config if not provided.
auto auxiliary_name = std::string{ std::get<0>(auxiliary_counter.value()) };
const auto auxiliary_precision = first_trigger.precision().value_or(this->_config.precise_ip());
const auto auxiliary_period_or_frequency =
first_trigger.period_or_frequency().value_or(this->_config.period_for_frequency());

trigger.insert(trigger.begin(),
Trigger{ std::move(auxiliary_name), auxiliary_precision, auxiliary_period_or_frequency });
}
}
}
}
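
The core of the function above is a "prepend if the group starts with a memory-load event" step. The following standalone sketch isolates that step under simplifying assumptions: `TriggerSketch` and `add_auxiliary_if_needed` are hypothetical names, events are compared by name instead of by counter configuration, the hardware check is passed in as a flag, and precision and period are not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

/// Hypothetical stand-in for a trigger; only the event name is modelled.
struct TriggerSketch
{
  std::string name;
};

/// Prepends "mem-loads-aux" if the hardware requires it and the group starts with "mem-loads".
/// Returns true if the auxiliary trigger was inserted.
bool
add_auxiliary_if_needed(std::vector<TriggerSketch>& group, const bool is_aux_required)
{
  /// Nothing to do if the hardware does not need the event, the group is empty,
  /// or the leading trigger is not a memory-load event.
  if (!is_aux_required || group.empty() || group.front().name != "mem-loads") {
    return false;
  }

  /// The auxiliary event has to come before the first "real" counter of the group.
  group.insert(group.begin(), TriggerSketch{ "mem-loads-aux" });
  return true;
}
```

Unlike the function in the diff, the sketch checks for an empty group before reading the first element; whether the real code needs such a check depends on how callers build their trigger groups.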

std::vector<perf::Sample>
perf::Sampler::result(const bool sort_by_time) const
{
@@ -538,8 +583,8 @@ perf::Sampler::read_hardware_events(UserLevelBufferEntry& entry, const SampleCou
}

/// Build a result containing metrics and hardware events requested by the user.
return sample_counter.requested_events().result(this->_counter_definitions,
CounterResult{ std::move(hardware_counter_results) }, 1ULL);
return sample_counter.requested_events().result(
this->_counter_definitions, CounterResult{ std::move(hardware_counter_results) }, 1ULL);
}

std::optional<std::vector<std::uintptr_t>>