
[FEA] Add companion metrics for all nsTiming metrics to measure time elapsed excluding semaphore wait #11330

Closed
binmahone opened this issue Aug 15, 2024 · 0 comments · Fixed by #11331
Assignees
Labels
feature request New feature or request

Comments

@binmahone
Collaborator

binmahone commented Aug 15, 2024

Sometimes a huge number is observed in metrics, like:

image

This can mislead performance tuners, because such a metric may be huge only because the task spent a long time waiting for the GPU semaphore. So for each timing metric, such as "concat time" or "op time", we need to distinguish the value with semaphore wait time (w) from the value without it (w/o):

If w is huge and w/o is minor, we should focus on issues like GPU contention.
If both w and w/o are huge, we can be more confident that it is a kernel/algorithm issue within a single task.
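
To make the w vs. w/o distinction concrete, here is a minimal Python sketch of a timing metric paired with a companion that subtracts semaphore wait. It is only an illustration of the idea, not the plugin's implementation: `TaskClock`, `NsTimingPair`, `wait_share`, and `FakeClock` are all hypothetical names, and the fake clock exists only to make the numbers deterministic.

```python
import time
from contextlib import contextmanager


class TaskClock:
    """Hypothetical per-task tracker of cumulative semaphore wait time."""

    def __init__(self, now_ns=time.monotonic_ns):
        self._now_ns = now_ns        # callable returning a monotonic time in ns
        self.semaphore_wait_ns = 0   # total ns this task has spent waiting

    def now(self):
        return self._now_ns()

    @contextmanager
    def semaphore_wait(self):
        # Wrap the (simulated) blocking acquire and record how long it took.
        start = self._now_ns()
        try:
            yield
        finally:
            self.semaphore_wait_ns += self._now_ns() - start


class NsTimingPair:
    """A timing metric (w) plus its companion excluding semaphore wait (w/o)."""

    def __init__(self, name):
        self.name = name
        self.total_ns = 0           # w: elapsed time including semaphore wait
        self.excl_sem_wait_ns = 0   # w/o: elapsed time minus semaphore wait

    @contextmanager
    def time(self, clock):
        start = clock.now()
        wait_before = clock.semaphore_wait_ns
        try:
            yield
        finally:
            elapsed = clock.now() - start
            # Only the wait accumulated inside this timed region is subtracted.
            waited = clock.semaphore_wait_ns - wait_before
            self.total_ns += elapsed
            self.excl_sem_wait_ns += elapsed - waited


def wait_share(metric):
    """Fraction of the metric's total time that was semaphore wait."""
    if metric.total_ns == 0:
        return 0.0
    return (metric.total_ns - metric.excl_sem_wait_ns) / metric.total_ns


class FakeClock:
    """Deterministic time source used only for this demonstration."""

    def __init__(self):
        self.t = 0

    def __call__(self):
        return self.t

    def advance(self, ns):
        self.t += ns


fake = FakeClock()
clock = TaskClock(now_ns=fake)
op_time = NsTimingPair("op time")

with op_time.time(clock):
    with clock.semaphore_wait():
        fake.advance(900)   # task blocked waiting for the GPU semaphore
    fake.advance(100)       # actual work once the semaphore is held

print(op_time.total_ns, op_time.excl_sem_wait_ns)
```

In this run w is 1000 ns and w/o is 100 ns, so most of the "op time" is semaphore wait, which points toward GPU contention rather than a slow kernel.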

@binmahone added the "feature request" and "? - Needs Triage" labels Aug 15, 2024
@binmahone self-assigned this Aug 15, 2024
@mattahrens removed the "? - Needs Triage" label Aug 20, 2024