Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add tracing for runner #3113

Merged
merged 6 commits into from
Nov 8, 2024
Merged

[FEAT] Add tracing for runner #3113

merged 6 commits into from
Nov 8, 2024

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Oct 24, 2024

Adds tracing for the RayRunner. This is turned off by default, and can be deactivated via DAFT_ENABLE_RAY_TRACING=1.

Visualize this with https://ui.perfetto.dev/

  1. The RayRunner dispatch loop view gives us a good idea of what that crazy while loop is doing, as well as overall Ray metrics such as number of tasks in flight, number of cores available etc
  2. The Ray Task Execution view gives us a view of the Ray Tasks that were scheduled, when they were received as completed and also their ResourceRequests.
image
  1. I also added views for each node and their worker processes. These views get updated on the end of every "wave" by polling a metrics actor
  2. A "Stages" view allows us to point to the tasks that were launched by each stage, into (3)
image

@github-actions github-actions bot added the enhancement New feature or request label Oct 24, 2024
Copy link

codspeed-hq bot commented Oct 24, 2024

CodSpeed Performance Report

Merging #3113 will degrade performances by 53.29%

Comparing jay/scheduler-tracing (9db3da2) with main (f566125)

Summary

⚡ 1 improvements
❌ 2 regressions
✅ 14 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main jay/scheduler-tracing Change
test_count[1 Small File] 3.2 ms 3.6 ms -12.55%
test_iter_rows_first_row[100 Small Files] 358.8 ms 224.4 ms +59.91%
test_show[100 Small Files] 24 ms 51.4 ms -53.29%

Copy link

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 67.94258% with 134 lines in your changes missing coverage. Please review.

Project coverage is 77.79%. Comparing base (0d669ca) to head (9db3da2).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
daft/runners/ray_tracing.py 63.17% 88 Missing ⚠️
daft/runners/ray_metrics.py 55.29% 38 Missing ⚠️
daft/runners/ray_runner.py 95.00% 4 Missing ⚠️
src/common/daft-config/src/lib.rs 57.14% 3 Missing ⚠️
src/common/daft-config/src/python.rs 85.71% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
- Coverage   79.13%   77.79%   -1.34%     
==========================================
  Files         637      645       +8     
  Lines       77982    79915    +1933     
==========================================
+ Hits        61712    62171     +459     
- Misses      16270    17744    +1474     
Files with missing lines Coverage Δ
src/common/daft-config/src/python.rs 78.19% <85.71%> (+0.29%) ⬆️
src/common/daft-config/src/lib.rs 79.36% <57.14%> (-8.14%) ⬇️
daft/runners/ray_runner.py 81.62% <95.00%> (+0.67%) ⬆️
daft/runners/ray_metrics.py 55.29% <55.29%> (ø)
daft/runners/ray_tracing.py 63.17% <63.17%> (ø)

... and 190 files with indirect coverage changes

@jaychia jaychia force-pushed the jay/scheduler-tracing branch 2 times, most recently from a07a517 to 0af74f3 Compare October 29, 2024 02:12
@jaychia jaychia requested a review from samster25 October 29, 2024 18:48
@jaychia jaychia force-pushed the jay/scheduler-tracing branch 2 times, most recently from b384352 to 0cbc552 Compare November 1, 2024 23:08
@jaychia jaychia changed the base branch from main to jay/rayrunner-refactor November 1, 2024 23:08
@jaychia jaychia force-pushed the jay/scheduler-tracing branch from 0cbc552 to 69c2c64 Compare November 1, 2024 23:10
Base automatically changed from jay/rayrunner-refactor to main November 5, 2024 22:39
@jaychia jaychia force-pushed the jay/scheduler-tracing branch from eb733e8 to 64af511 Compare November 6, 2024 19:18
daft/runners/ray_metrics.py Outdated Show resolved Hide resolved
daft/runners/ray_metrics.py Outdated Show resolved Hide resolved
daft/runners/ray_metrics.py Outdated Show resolved Hide resolved
daft/runners/ray_metrics.py Outdated Show resolved Hide resolved
daft/runners/ray_runner.py Outdated Show resolved Hide resolved
daft/runners/ray_tracing.py Outdated Show resolved Hide resolved
daft/runners/ray_tracing.py Outdated Show resolved Hide resolved
daft/runners/ray_tracing.py Outdated Show resolved Hide resolved
@jaychia jaychia requested a review from samster25 November 8, 2024 02:01
daft/runners/ray_metrics.py Show resolved Hide resolved
daft/runners/ray_runner.py Show resolved Hide resolved
daft/runners/ray_runner.py Show resolved Hide resolved
daft/runners/ray_tracing.py Outdated Show resolved Hide resolved
daft/runners/ray_tracing.py Show resolved Hide resolved
daft/runners/ray_tracing.py Outdated Show resolved Hide resolved
@jaychia jaychia enabled auto-merge (squash) November 8, 2024 20:17
@jaychia jaychia merged commit 54e58df into main Nov 8, 2024
43 of 44 checks passed
@jaychia jaychia deleted the jay/scheduler-tracing branch November 8, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants