After chatting with @quasiben, it seems that, in addition to #1226, it would be useful to have a (preferably machine-parseable) way to track cuDF spilling statistics beyond the dashboard page. In our initial conversations, one potential implementation was an option/argument to the worker/cluster APIs that enables logging of cuDF spilling during and/or after computation:
```
$ CUDF_SPILL=on CUDF_SPILL_STATS=1 dask cuda worker --cudf-spill-logging tcp://10.33.227.163:8786
2023-09-29 07:36:11,333 - distributed.nanny - INFO - Start Nanny at: 'tcp://10.33.227.163:38751'
...
2023-09-29 07:40:14,483 - distributed.worker - INFO - Worker tcp://10.33.227.163:45905 spilled 24 bytes from GPU in 0.01s
2023-09-29 07:40:14,483 - distributed.worker - INFO - Worker tcp://10.33.227.163:45905 unspilled 24 bytes to GPU in 0.01s
...
2023-09-29 07:36:14,483 - distributed.worker - INFO - -------------------------------------------------
2023-09-29 07:36:14,483 - distributed.worker - INFO - Worker: tcp://10.33.227.163:45905
2023-09-29 07:36:14,483 - distributed.worker - INFO - Bytes spilled: 24
2023-09-29 07:36:14,483 - distributed.worker - INFO - Time spent spilling: 0.02s
2023-09-29 07:36:14,483 - distributed.worker - INFO - -------------------------------------------------
2023-09-29 07:40:38,126 - distributed.nanny - INFO - Worker process 3868728 was killed by signal 9
```
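The log lines above are a proposed format, not existing output, so this is only a sketch of why a fixed, regular format makes the logs machine-parseable. Assuming the per-event lines look exactly like the mock-up, a small regex is enough to turn them into records. The names `SpillEvent`, `parse_spill_event`, and `total_bytes` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Matches per-event lines from the *proposed* format, e.g.
#   "... - Worker tcp://10.33.227.163:45905 spilled 24 bytes from GPU in 0.01s"
# The summary line "Worker: tcp://..." does not match (colon after "Worker").
_EVENT_RE = re.compile(
    r"Worker (?P<worker>\S+) (?P<action>spilled|unspilled) "
    r"(?P<nbytes>\d+) bytes (?:from|to) GPU in (?P<seconds>[\d.]+)s"
)


class SpillEvent(NamedTuple):
    worker: str
    action: str  # "spilled" or "unspilled"
    nbytes: int
    seconds: float


def parse_spill_event(line: str) -> Optional[SpillEvent]:
    """Return a SpillEvent for a matching log line, else None."""
    m = _EVENT_RE.search(line)
    if m is None:
        return None
    return SpillEvent(m["worker"], m["action"], int(m["nbytes"]), float(m["seconds"]))


def total_bytes(lines: Iterable[str], action: str = "spilled") -> int:
    """Sum bytes over all parsed events with the given action."""
    events = (parse_spill_event(line) for line in lines)
    return sum(e.nbytes for e in events if e is not None and e.action == action)
```

A structured alternative (for example one JSON object per event) would avoid regexes entirely; the point here is only that each event carries worker, direction, bytes, and duration in a fixed position.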
I imagine this could look like a worker plugin that polls the cuDF spilling statistics periodically (is there a way we could "subscribe" a worker to cuDF spilling events?) and again when the worker closes, but I'm interested in whether there's a better approach we could take here.
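A minimal sketch of the polling idea, assuming a snapshot source that returns cumulative totals keyed by `(src, dst)` with `(bytes, seconds)` values. That shape is loosely modelled on cuDF's spill statistics; check the actual accessor and structure against your cuDF version. The class only mirrors the `setup`/`teardown` hooks of `distributed`'s `WorkerPlugin` and does not import Dask; `SpillStatsPoller` and `get_totals` are hypothetical names.

```python
import logging
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot shape: (src, dst) -> (cumulative_bytes, cumulative_seconds).
Totals = Dict[Tuple[str, str], Tuple[int, float]]

logger = logging.getLogger("spill_stats_sketch")


class SpillStatsPoller:
    """Poll-and-summarize sketch shaped like a worker plugin.

    A real version would subclass distributed.diagnostics.plugin.WorkerPlugin
    and call ``poll`` from a periodic callback; ``get_totals`` is injected so
    the sketch runs without Dask or cuDF.
    """

    def __init__(self, get_totals: Callable[[], Totals]):
        self.get_totals = get_totals
        self.last: Totals = {}
        self.address = "<unknown>"

    def setup(self, worker) -> None:
        # Record the worker address and take a baseline snapshot.
        self.address = getattr(worker, "address", self.address)
        self.last = dict(self.get_totals())

    def poll(self) -> List[Tuple[str, str, int, float]]:
        """Log and return (src, dst, delta_bytes, delta_seconds) since the last poll."""
        current = dict(self.get_totals())
        deltas = []
        for (src, dst), (nbytes, seconds) in current.items():
            prev_bytes, prev_seconds = self.last.get((src, dst), (0, 0.0))
            if nbytes > prev_bytes:
                d_bytes, d_secs = nbytes - prev_bytes, seconds - prev_seconds
                deltas.append((src, dst, d_bytes, d_secs))
                logger.info("Worker %s moved %d bytes %s->%s in %.2fs",
                            self.address, d_bytes, src, dst, d_secs)
        self.last = current
        return deltas

    def teardown(self, worker) -> Dict[str, int]:
        """Flush a final poll, then log and return cumulative bytes per direction."""
        self.poll()
        summary = {f"{s}->{d}": nbytes for (s, d), (nbytes, _) in self.last.items()}
        logger.info("Worker %s spill summary: %s", self.address, summary)
        return summary
```

Diffing cumulative snapshots avoids hooking individual spill events, which sidesteps the "subscribe" question at the cost of coarser time resolution (events between two polls are merged into one delta).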
I don't think we have a "proper" way of doing something like this; the closest is probably the LoggerBuffer interface, which could be plugged into a PeriodicCallback as suggested in #442 (comment). Other than that, I don't think there are any pre-baked solutions for this, but I agree this could be a useful feature.
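The periodic-callback pattern can be sketched without Dask. Inside a worker, `tornado.ioloop.PeriodicCallback` would drive the sampling; this version uses `asyncio` so it runs standalone. `BoundedStatsBuffer` and `sample_periodically` are hypothetical names, and the buffer is only a stand-in for a log buffer, not the actual `LoggerBuffer` interface.

```python
import asyncio
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class BoundedStatsBuffer:
    """Hypothetical log-buffer stand-in: keeps only the newest ``maxlen`` samples."""

    def __init__(self, maxlen: int = 100):
        self.samples: Deque[Tuple[float, object]] = deque(maxlen=maxlen)

    def record(self, value: object) -> None:
        # Store a wall-clock timestamp alongside each sample.
        self.samples.append((time.time(), value))

    def latest(self, n: int = 1) -> List[Tuple[float, object]]:
        return list(self.samples)[-n:]


async def sample_periodically(
    read_stats: Callable[[], object],
    buffer: BoundedStatsBuffer,
    interval: float,
    iterations: Optional[int] = None,
) -> None:
    """Call ``read_stats`` every ``interval`` seconds and store each result.

    Plays the role a PeriodicCallback would play inside a worker;
    ``iterations=None`` keeps sampling until the task is cancelled.
    """
    count = 0
    while iterations is None or count < iterations:
        buffer.record(read_stats())
        count += 1
        await asyncio.sleep(interval)
```

Bounding the buffer keeps memory constant on long-running workers; anything that needs the full history would read `latest()` periodically and forward it elsewhere.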