-
-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subscriber: record a histogram of wake-to-poll times #50
Comments
Hi @seanmonstar, I've been reading up on #47 to see how to do this. Do you mind if I take a stab at this issue? |
It's possible that this data is also collected by tokio's runtime metrics feature (tokio-rs/tokio#4073), which I would like to start integrating into the console. It might be worth looking at the upstream API to see if that's the case. Otherwise, we can definitely record this with the data we're already aggregating. |
Thank you @hawkw for the note. I read the RFC and it doesn't look like there's something we can use for this issue specifically. It looks like that work is more concerned with collecting perf-counter type metrics than aggregating latency. |
Each task displays the sum of the time it has been idle and busy, as well as the total. The idle time includes the time between when a task is woken, and when the runtime actually polls that task. There are cases where a task may be scheduled for a long time after being woken, before it is polled. Especially if many tasks are woken at the same time and don't yield back to the runtime quickly. To add visibility to this, the total time that a task is scheduled before being polled has been added. Additionally, a new task state `Scheduled` has been added. This is displayed in both the tasks table and in the task detail view. In the `console-api`, the total `scheduled_time` for the task has been added to the `TaskStats` message in `tasks.proto`. This is the first of two parts. In the second part (#409), a histogram for scheduled time will be added, the equivalent of the poll time histogram which is already available on the task detail screen. To show a pathological case which may lead to needing to see the scheduled time, a new example has been added to the `console-subscriber` ## PR Notes This PR does something adjacent to what is described in #50, but not quite. The unicode character used for a `SCHED` task is ⏫. The second PR (#409) will record a scheduled time histogram the same as it recorded for poll times. These two changes should go in together (and certainly shouldn't be released separately). However, this PR is already quite big, so they'll be separated out. The idea is that this PR isn't merged until the next one has also been reviewed and approved. It would be good to get some feedback at this stage though. The task list view with the new column for `Sched` time: <img width="1032" alt="a tasks table view for the long-scheduled example" src="https://user-images.githubusercontent.com/89589/232456977-2921f884-4673-420f-ba4f-3646627d44db.png"> The `Task` block in the task detail view showing the new `Scheduled` time entry. <img width="510" alt="The task block on the task detail view for the rx task in the long-scheduled example" src="https://user-images.githubusercontent.com/89589/232457332-e455e086-9468-42c9-8fda-7965d8d1e6f8.png">
Each task displays the sum of the time it has been idle and busy, as well as the total. The idle time includes the time between when a task is woken, and when the runtime actually polls that task. There are cases where a task may be scheduled for a long time after being woken, before it is polled. Especially if many tasks are woken at the same time and don't yield back to the runtime quickly. To add visibility to this, the total time that a task is scheduled before being polled has been added. Additionally, a new task state `Scheduled` has been added. This is displayed in both the tasks table and in the task detail view. In the `console-api`, the total `scheduled_time` for the task has been added to the `TaskStats` message in `tasks.proto`. This is the first of two parts. In the second part (#409), a histogram for scheduled time will be added, the equivalent of the poll time histogram which is already available on the task detail screen. To show a pathological case which may lead to needing to see the scheduled time, a new example has been added to the `console-subscriber` ## PR Notes This PR does something adjacent to what is described in #50, but not quite. The unicode character used for a `SCHED` task is ⏫. The second PR (#409) will record a scheduled time histogram the same as it recorded for poll times. These two changes should go in together (and certainly shouldn't be released separately). However, this PR is already quite big, so they'll be separated out. The idea is that this PR isn't merged until the next one has also been reviewed and approved. It would be good to get some feedback at this stage though. The task list view with the new column for `Sched` time: <img width="1032" alt="a tasks table view for the long-scheduled example" src="https://user-images.githubusercontent.com/89589/232456977-2921f884-4673-420f-ba4f-3646627d44db.png"> The `Task` block in the task detail view showing the new `Scheduled` time entry. <img width="510" alt="The task block on the task detail view for the rx task in the long-scheduled example" src="https://user-images.githubusercontent.com/89589/232457332-e455e086-9468-42c9-8fda-7965d8d1e6f8.png">
This task is done now per task. As we don't currently have any "as a whole" or "per runtime" histograms, I think that we can consider it complete after #406 was merged. Any histograms per runtime would be added once runtimes themselves have been instrumented in Tokio. |
A way for someone to better understand how the runtime itself is behaving is to record a history of how long it takes a task to be polled after it was been woken. We could either record this information per task, or on a whole. I think as a whole is probably sufficient, and would mean less data needs to be sent per task (see #47 (comment)). But if there's a very helpful use case for knowing this data per task, we could do that.
The text was updated successfully, but these errors were encountered: