Skip to content

Commit

Permalink
[data] Don't reset iteration counter stats (#48618)
Browse files Browse the repository at this point in the history
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

We currently report `iter_total_blocked_seconds` and `iter_user_seconds`
as **Gauge** while we tracking them as counters, i.e.:
- For each iteration, we had a timer that sums locally for each
iteration into an aggregated value (which is the sum of total blocked
seconds)
- When the iteration ends or the iterator GCed, the gauge metric value
is currently set to 0.
- This creates confusion for users as a counter value (total time
blocked on a dataset) should not be going back to 0, generating charts
like below.

---------

Signed-off-by: rickyx <rickyx@anyscale.com>
  • Loading branch information
rickyyx authored Nov 14, 2024
1 parent 2e953d3 commit a1d4cb1
Showing 1 changed file with 5 additions and 14 deletions.
19 changes: 5 additions & 14 deletions python/ray/data/_internal/stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -405,12 +405,6 @@ def clear_execution_metrics(self, dataset_tag: str, operator_tags: List[str]):
for prom_metric in self.execution_metrics_misc.values():
prom_metric.set(0, tags)

def clear_iteration_metrics(self, dataset_tag: str):
tags = self._create_tags(dataset_tag)
self.iter_total_blocked_s.set(0, tags)
self.iter_user_s.set(0, tags)
self.iter_initialize_s.set(0, tags)

def register_dataset(self, job_id: str, dataset_tag: str, operator_tags: List[str]):
self.datasets[dataset_tag] = {
"job_id": job_id,
Expand Down Expand Up @@ -620,18 +614,15 @@ def update_iteration_metrics(self, stats: "DatasetStats", dataset_tag: str):
self._start_thread_if_not_running()

def clear_iteration_metrics(self, dataset_tag: str):
# Delete the last iteration stats so that update thread will have
# a chance to terminate.
# Note we don't reset the actual metric values through the StatsActor
# since the value is essentially a counter value. See
# https://github.com/ray-project/ray/pull/48618 for more context.
with self._stats_lock:
if dataset_tag in self._last_iteration_stats:
del self._last_iteration_stats[dataset_tag]

try:
self._stats_actor(
create_if_not_exists=False
).clear_iteration_metrics.remote(dataset_tag)
except Exception:
# Cluster may be shut down.
pass

# Other methods

def register_dataset_to_stats_actor(self, dataset_tag, operator_tags):
Expand Down

0 comments on commit a1d4cb1

Please sign in to comment.