add notice of excessive scheduling time to time macro etc. #55103
Open
IanButterworth wants to merge 6 commits into JuliaLang:master from IanButterworth:ib/time_schedule_efficiency
+100 −36
Conversation
IanButterworth force-pushed the ib/time_schedule_efficiency branch from 7fe2b02 to 13c2d17 (July 11, 2024 13:35)
IanButterworth force-pushed the ib/time_schedule_efficiency branch 3 times, most recently from 2f2121f to d371291 (October 3, 2024 02:46)
IanButterworth force-pushed the ib/time_schedule_efficiency branch 3 times, most recently from 320a2f8 to 86a6aa1 (October 6, 2024 02:32)
Updated the examples in the OP. With the new more precise timing a complex …
IanButterworth force-pushed the ib/time_schedule_efficiency branch 3 times, most recently from a2e76bf to 46a7504 (October 7, 2024 13:15)
It'd be good to get this in so it can be tried on wild code, to see whether it reports meaningfully and to hone the threshold if necessary. But it'd be good to get a review of the timing measurement logic from someone familiar with the scheduler first.
IanButterworth force-pushed the ib/time_schedule_efficiency branch from 46a7504 to f7d3e8b (October 14, 2024 17:42)
IanButterworth force-pushed the ib/time_schedule_efficiency branch from 8ea0531 to 81522a8 (October 15, 2024 01:04)
kpamnany added a commit that referenced this pull request on Dec 6, 2024 (merged):
Close #47351 (builds on top of #48416)

Adds two per-task metrics:
- running time = amount of time the task was actually running (according to our scheduler). Note: currently inclusive of GC time, but it would be good to be able to separate that out (in a future PR)
- wall time = amount of time between the scheduler becoming aware of this task and the task entering a terminal state (i.e. done or failed).

We record running time in `wait()`, where the scheduler stops running the task, as well as in `yield(t)`, `yieldto(t)` and `throwto(t)`, which bypass the scheduler. Other places where a task stops running (for `Channel`, `ReentrantLock`, `Event`, `Timer` and `Semaphore`) are all implemented in terms of `wait(Condition)`, which in turn calls `wait()`. `LibuvStream` similarly calls `wait()`. This should capture everything (albeit slightly over-counting task CPU time by including any enqueuing work done before we hit `wait()`).

The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now I've just put them directly in `jl_task_t`. They are all atomic, except the `metrics_enabled` flag itself (which we now have to check on task start/switch/done even if metrics are not enabled), which is set on task construction and marked `const` on the Julia side.

In future PRs we could add more per-task metrics, e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially the number of yields. Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by `@time` (e.g. writing this same timing data to a ScopedValue like in #55103, but only for tasks lexically scoped to inside the `@time` block).

Timings are off by default but can be turned on globally by starting Julia with `--task-metrics=yes` or calling `Base.Experimental.task_metrics(true)`. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings via `Base.Experimental.task_metrics` does not affect existing `Task`s, only new `Task`s.

The other new APIs are `Base.Experimental.task_running_time_ns(::Task)` and `Base.Experimental.task_wall_time_ns(::Task)` for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). All of these are in `Base.Experimental` to give us room to change the APIs as we add more metrics in future PRs (without worrying about release timelines).

cc @NHDaly @kpamnany @d-netto

---------

Co-authored-by: Pete Vilter <pete.vilter@gmail.com>
Co-authored-by: K Pamnany <kpamnany@users.noreply.github.com>
Co-authored-by: Nathan Daly <nathan.daly@relational.ai>
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
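As a hedged usage sketch (not part of this PR or the commit above), the per-task metrics API described in that commit message might be exercised roughly like this; the workload inside the task is arbitrary, and the return values are assumed to be nanosecond counts as described above:

```julia
# Hedged sketch: exercising the per-task metrics API described above.
# Assumes metrics are enabled before the task is created (either via
# `--task-metrics=yes` or the call below).
Base.Experimental.task_metrics(true)   # enable metrics for newly created tasks

t = Threads.@spawn begin
    sleep(0.5)          # time spent waiting should not count as running time
    sum(rand(10^7))     # compute time does count as running time
end
wait(t)

# Nanosecond counts for the finished task (per the commit message, these are
# safe to call on any task).
@show Base.Experimental.task_running_time_ns(t)
@show Base.Experimental.task_wall_time_ns(t)
```

Here the wall time would be expected to exceed the running time by roughly the duration of the `sleep`, since the task is not running while it waits.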
kpamnany added a commit to RelationalAI/julia that referenced this pull request on Dec 7, 2024 (the same per-task metrics commit as above).
kpamnany added a commit to RelationalAI/julia that referenced this pull request on Dec 9, 2024 (the per-task metrics commit, JuliaLang#56320, plus review comments addressed).
Replaces #55094

Balancing the number of tasks against task duration is a key optimization concern, so this aims to surface a warning, when that balance is notably off, in the most common place a user looks when trying to understand why their function is taking so long: `@time`.

If more than 10% of wall time is spent (on average across all worker threads) specifically on scheduling tasks, then `@time` prints a notice.

A real-world example that doesn't show the schedule warning.
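As a purely illustrative sketch (not the PR's actual implementation, and not the example referenced above), the kind of threshold check described in the description could look roughly like the following; `maybe_warn_scheduling`, `sched_time_ns`, `wall_time_ns`, and `nworker_threads` are all hypothetical names:

```julia
# Illustrative only: warn when the average per-worker-thread scheduling time
# exceeds 10% of wall time. All names here are hypothetical, not the PR's.
function maybe_warn_scheduling(sched_time_ns, wall_time_ns, nworker_threads)
    avg_sched_ns = sched_time_ns / nworker_threads   # average per worker thread
    if avg_sched_ns > 0.10 * wall_time_ns            # more than 10% of wall time
        pct = round(100 * avg_sched_ns / wall_time_ns; digits=1)
        println("~$(pct)% of wall time was spent scheduling tasks; ",
                "consider using fewer, longer-running tasks.")
    end
end

# Example: 3 worker threads spent a combined 0.9 s scheduling during a 2 s run,
# i.e. 0.3 s on average, which is 15% of wall time, so a notice is printed.
maybe_warn_scheduling(0.9e9, 2.0e9, 3)
```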