
[data] Don't reset iteration counter stats #48618

Merged (4 commits) on Nov 14, 2024

Changes from 1 commit
5 changes: 3 additions & 2 deletions python/ray/data/_internal/stats.py
```diff
@@ -407,8 +407,9 @@ def clear_execution_metrics(self, dataset_tag: str, operator_tags: List[str]):

     def clear_iteration_metrics(self, dataset_tag: str):
         tags = self._create_tags(dataset_tag)
-        self.iter_total_blocked_s.set(0, tags)
-        self.iter_user_s.set(0, tags)
+        # NOTE(rickyx): We should not be clearing the iter_total_blocked_s and
+        # iter_user_s metrics because they are technically counters we tracked, and
+        # should not be reset by each iteration.
```
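The note above hinges on these metrics being cumulative: a per-iteration `set(0, ...)` throws away everything accrued so far. A minimal sketch of the problem, using a hypothetical tagged `Gauge` stand-in rather than Ray's actual metric classes:

```python
class Gauge:
    """Hypothetical stand-in for a tagged gauge metric (not Ray's API)."""

    def __init__(self):
        self._values = {}

    def set(self, value, tags):
        self._values[tuple(sorted(tags.items()))] = value

    def get(self, tags):
        return self._values.get(tuple(sorted(tags.items())), 0.0)


iter_user_s = Gauge()
tags = {"dataset": "ds_1"}

# Two passes over the dataset, each accruing 5 seconds of user time.
for epoch in range(2):
    iter_user_s.set(iter_user_s.get(tags) + 5.0, tags)
    # Old behavior: clear_iteration_metrics() reset the value here,
    # discarding the cumulative total:
    # iter_user_s.set(0, tags)

print(iter_user_s.get(tags))  # 10.0 without the reset; it would be 0.0 with it
```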
Member
@rickyyx could you help me understand why this value keeps going up across iterations? Is it because we reset the value here, but not in some other place?

Contributor

During my investigation of #44635, where the "Rows Outputted" value is shown as zero, I found that disabling clear_execution_metrics() here is the fix.

I was thinking it would make sense to completely remove the clear_execution_metrics() and clear_iteration_metrics() calls currently made after Dataset execution/iteration completes -- @rickyyx do you agree? For context, I believe the reason we had these in the first place was to do a hacky "reset" of metrics, to prevent values from persisting at their last value. But I think this is no longer the behavior we want, since we now show rates on the Grafana dashboard by default -- so we can simply remove the metric resets.
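For context on the Grafana point: when the dashboard renders a rate over successive scrapes of a cumulative metric, the curve naturally falls to zero once the dataset stops producing, so no explicit reset is needed to avoid a stale-looking panel. A rough illustration with hypothetical sample data (not the actual dashboard query):

```python
# Successive scrapes of a cumulative metric: (timestamp_s, value).
samples = [(0, 0.0), (30, 4.0), (60, 9.0), (90, 9.0)]

# Per-interval rate, as a Prometheus/Grafana-style rate panel would show it.
rates = [
    (v1 - v0) / (t1 - t0)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:])
]
print(rates)  # last interval is 0.0: the rate view goes quiet on its own
```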

Contributor

+1 to just removing it

Why also not converting these to counters as well?

Contributor Author (@rickyyx), Nov 12, 2024

Sure - I think removing it makes sense.

> Why also not converting these to counters as well?

Yeah, I think that was an alternative, but if I remember correctly, we are less inclined to change the actual timeseries definitions, since there might be customers depending on them (what's the policy here for backward compatibility on metrics?).

I am open to just changing the metric type to counters too.
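For background on the counter suggestion: a gauge's set() overwrites the stored value (which is what makes the accidental reset possible), while a true counter only accumulates increments. A hedged sketch with hypothetical stand-ins, not Ray's ray.util.metrics API:

```python
class Counter:
    """Hypothetical monotonic counter: only inc(), no way to reset."""

    def __init__(self):
        self._total = 0.0

    def inc(self, amount):
        if amount < 0:
            raise ValueError("counters are monotonic")
        self._total += amount


class Gauge:
    """Hypothetical gauge: set() overwrites, so set(0) can wipe history."""

    def __init__(self):
        self._value = 0.0

    def set(self, value):
        self._value = value


blocked_counter = Counter()
blocked_gauge = Gauge()

for seconds in (2.0, 3.0):        # two iterations, blocked 2s then 3s
    blocked_counter.inc(seconds)  # accumulates to 5.0
    blocked_gauge.set(seconds)    # only the last sample (3.0) survives

print(blocked_counter._total)  # 5.0
print(blocked_gauge._value)    # 3.0
```

Changing an exported metric's type does alter the timeseries shape, which is the backward-compatibility concern raised above.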

Contributor Author

> could you help me understand why this value keeps going up across iterations

I think it's because we reuse the stats field. Let me see if that could be fixed.

```diff
         self.iter_initialize_s.set(0, tags)

     def register_dataset(self, job_id: str, dataset_tag: str, operator_tags: List[str]):
```