Add metrics to track local and global memory arbitrations separately (facebookincubator#9224)

Summary:
We trigger memory arbitration for two different reasons: (1) a query exceeds its own memory limit;
(2) the memory arbitrator doesn't have enough free capacity to satisfy a query's memory arbitration
request. The latter indicates that we have over-provisioned the worker memory, or that the queries
all happen to run at their peak at the same time. We don't expect memory arbitration to handle
sustained high memory usage: it should help handle transient peaks in memory usage, otherwise the
performance of the whole worker will be severely degraded. Case (1) can run in parallel and
shouldn't affect the other running queries or block their memory arbitration if the system has free
capacity. We might consider a follow-up optimization for case (1). For now, this PR adds metrics to
monitor the two arbitration events separately.
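
The two cases above can be sketched as a small classifier. This is only an illustration of the distinction the new metrics draw, not how `SharedArbitrator` is implemented: `ArbitrationCounters`, `GrowRequest`, `ArbitrationKind` and `classify` are hypothetical names, and the byte-based checks are simplifying assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters standing in for the two new metrics; the real code
// reports through RECORD_METRIC_VALUE rather than a struct like this.
struct ArbitrationCounters {
  uint64_t localCount{0};
  uint64_t globalCount{0};
};

// Hypothetical, simplified description of a request to grow a query's memory.
struct GrowRequest {
  uint64_t queryUsedBytes;   // capacity the requesting query already holds
  uint64_t queryLimitBytes;  // per-query capacity limit
  uint64_t requestBytes;     // additional capacity being requested
};

enum class ArbitrationKind { kNone, kLocal, kGlobal };

// Decides which kind of arbitration, if any, a request would need and bumps
// the matching counter. In this sketch case (1) is checked first: a request
// that exceeds the query's own limit is counted as local no matter how much
// free capacity the arbitrator has.
ArbitrationKind classify(
    const GrowRequest& request,
    uint64_t arbitratorFreeBytes,
    ArbitrationCounters& counters) {
  // Assumed precondition: a query never holds more than its limit.
  assert(request.queryUsedBytes <= request.queryLimitBytes);
  const uint64_t headroom = request.queryLimitBytes - request.queryUsedBytes;
  if (request.requestBytes > headroom) {
    ++counters.localCount; // case (1): the query exceeds its own limit
    return ArbitrationKind::kLocal;
  }
  if (request.requestBytes > arbitratorFreeBytes) {
    ++counters.globalCount; // case (2): the arbitrator is out of free capacity
    return ArbitrationKind::kGlobal;
  }
  return ArbitrationKind::kNone;
}
```

Keeping one counter per case is what lets an operator tell a single oversized query apart from a worker that is short on memory overall.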

Pull Request resolved: facebookincubator#9224

Reviewed By: bikramSingh91, oerling

Differential Revision: D55261366

Pulled By: xiaoxmeng

fbshipit-source-id: 3258b6cef04c7afde4cce0c0d5cdaa19bbc919e8
xiaoxmeng authored and facebook-github-bot committed Mar 23, 2024
1 parent dcc3c88 commit 458339f
Showing 4 changed files with 37 additions and 0 deletions.
17 changes: 17 additions & 0 deletions velox/common/base/Counters.cpp
@@ -86,6 +86,23 @@ void registerVeloxMetrics() {
DEFINE_METRIC(
kMetricArbitratorRequestsCount, facebook::velox::StatType::COUNT);

// The number of arbitrations that reclaim used memory from the query which
// initiated the memory arbitration request. This ensures the request won't
// exceed the query's memory capacity limit.
DEFINE_METRIC(
kMetricArbitratorLocalArbitrationCount, facebook::velox::StatType::COUNT);

// The number of arbitrations that ensure the total allocated query capacity
// won't exceed the arbitrator capacity limit. Such an arbitration may or may
// not reclaim memory from the query which initiated the memory arbitration
// request. It indicates that the Velox runtime doesn't have enough memory to
// run all the queries at their peak memory usage, so spilling has to be
// triggered to let them run to completion.
DEFINE_METRIC(
kMetricArbitratorGlobalArbitrationCount,
facebook::velox::StatType::COUNT);

// The number of times a query level memory pool is aborted as a result of a
// memory arbitration process. The aborted memory pool will eventually result
// in cancelling the original query.
6 changes: 6 additions & 0 deletions velox/common/base/Counters.h
@@ -64,6 +64,12 @@ constexpr folly::StringPiece kMetricMemoryPoolReservationLeakBytes{
constexpr folly::StringPiece kMetricArbitratorRequestsCount{
"velox.arbitrator_requests_count"};

constexpr folly::StringPiece kMetricArbitratorLocalArbitrationCount{
"velox.arbitrator_local_arbitration_count"};

constexpr folly::StringPiece kMetricArbitratorGlobalArbitrationCount{
"velox.arbitrator_global_arbitration_count"};

constexpr folly::StringPiece kMetricArbitratorAbortedCount{
"velox.arbitrator_aborted_count"};

2 changes: 2 additions & 0 deletions velox/common/memory/SharedArbitrator.cpp
@@ -432,6 +432,7 @@ bool SharedArbitrator::arbitrateMemory(
}

VELOX_CHECK_LT(freedBytes, growTarget);
RECORD_METRIC_VALUE(kMetricArbitratorGlobalArbitrationCount);
freedBytes += reclaimUsedMemoryFromCandidatesBySpill(
requestor, candidates, growTarget - freedBytes);
if (requestor->aborted()) {
@@ -547,6 +548,7 @@ uint64_t SharedArbitrator::reclaim(
try {
freedBytes = pool->shrink(targetBytes);
if (freedBytes < targetBytes) {
RECORD_METRIC_VALUE(kMetricArbitratorLocalArbitrationCount);
pool->reclaim(
targetBytes - freedBytes, memoryReclaimWaitMs_, reclaimerStats);
}
12 changes: 12 additions & 0 deletions velox/docs/monitoring/metrics.rst
@@ -117,6 +117,18 @@ Memory Management
- Count
- The number of times a memory arbitration request was initiated by a
memory pool attempting to grow its capacity.
* - arbitrator_local_arbitration_count
- Count
- The number of arbitrations that reclaim used memory from the query which initiated
the memory arbitration request. This ensures the request won't exceed the query's
memory capacity limit.
* - arbitrator_global_arbitration_count
- Count
- The number of arbitrations that ensure the total allocated query capacity won't exceed
the arbitrator capacity limit. Such an arbitration may or may not reclaim memory from the
query which initiated the memory arbitration request. It indicates that the Velox runtime
doesn't have enough memory to run all the queries at their peak memory usage, so spilling
has to be triggered to let them run to completion.
* - arbitrator_aborted_count
- Count
- The number of times a query level memory pool is aborted as a result of
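
One way a monitoring job might use these counters is to compare two scrapes and check how often requests ended in global arbitration. The sketch below is an assumption about how a consumer could use them, not part of this change: `ArbitratorSnapshot`, `globalArbitrationRatio`, `underMemoryPressure` and the threshold are hypothetical, and the counters are assumed to be monotonically increasing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of exported counter values. The fields mirror
// velox.arbitrator_requests_count, velox.arbitrator_local_arbitration_count
// and velox.arbitrator_global_arbitration_count.
struct ArbitratorSnapshot {
  uint64_t requestsCount;
  uint64_t localArbitrationCount;
  uint64_t globalArbitrationCount;
};

// Returns global arbitrations per arbitration request over the interval
// between two snapshots, or 0.0 if no request was made in that interval.
double globalArbitrationRatio(
    const ArbitratorSnapshot& prev,
    const ArbitratorSnapshot& curr) {
  // Assumed: counters only ever increase between snapshots.
  assert(curr.requestsCount >= prev.requestsCount);
  assert(curr.globalArbitrationCount >= prev.globalArbitrationCount);
  const uint64_t requests = curr.requestsCount - prev.requestsCount;
  const uint64_t globals =
      curr.globalArbitrationCount - prev.globalArbitrationCount;
  if (requests == 0) {
    return 0.0;
  }
  return static_cast<double>(globals) / static_cast<double>(requests);
}

// Flags an interval whose ratio is above a caller-chosen threshold. The
// threshold is an operational choice, not a value defined by Velox.
bool underMemoryPressure(
    const ArbitratorSnapshot& prev,
    const ArbitratorSnapshot& curr,
    double threshold) {
  return globalArbitrationRatio(prev, curr) > threshold;
}
```

A ratio that stays high across many intervals would suggest the worker cannot fit the queries' peak usage, whereas a rising local count alone points at individual queries hitting their own limits.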
