Utility functions to debug cluster_dump #5873
In general, a function to see where scheduler state doesn't match worker state (tasks have differing states, workers have a different status, etc.) could be a helpful starting point. Basically, an overall "diff" between the scheduler's view of the world and the actual world as recorded by workers. But particularly for tasks, since that's usually what you're interested in.
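For illustration, a rough sketch of what juxtaposing the two views could look like, assuming the dump is a plain dict where `dump["scheduler"]["tasks"]` and `dump["workers"][addr]["tasks"]` map task keys to dicts carrying a "state" entry (the layout and field names are assumptions):

```python
def task_state_overview(dump: dict) -> dict:
    """Collect, per task key, the scheduler's state next to each worker's state.

    Scheduler and worker state names differ (e.g. "processing" vs "executing"),
    so this only juxtaposes them; deciding what counts as a real mismatch is
    left to the reader.
    """
    overview = {}
    for key, ts in dump["scheduler"]["tasks"].items():
        overview[key] = {"scheduler": ts.get("state"), "workers": {}}
    for addr, worker in dump["workers"].items():
        for key, wts in worker.get("tasks", {}).items():
            entry = overview.setdefault(key, {"scheduler": None, "workers": {}})
            entry["workers"][addr] = wts.get("state")
    return overview
```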
Maybe. There is always a diff. States don't map exactly 1:1 and there is a delay. I'm concerned that writing this "diff" is not worth the cost, tbh.
True. A full diff probably isn't necessary. I'm thinking more about task states specifically. Really, I'm looking for tools to give you a starting point in a large dump and highlight potential problems, so you have an idea where to look with more targeted tools.
Every dump I investigate starts with
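Something along the lines of the following rough sketch, assuming a top-level "scheduler" section in the dump whose "tasks" mapping carries per-task dicts with "state" and "processing_on" entries (these names are assumptions, not the original snippet):

```python
def processing_tasks_by_worker(dump: dict) -> dict:
    """Sketch: list tasks the scheduler thinks are processing, grouped by
    the worker they are assigned to."""
    by_worker: dict = {}
    for key, ts in dump["scheduler"]["tasks"].items():
        if ts.get("state") == "processing":
            by_worker.setdefault(ts.get("processing_on"), []).append(key)
    return by_worker
```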
I think this pattern can be sufficiently covered by the function I proposed above. From here on out, it depends on what I see:

A) Is the worker indeed trying to execute it? If so, why isn't it (e.g. a missing dependency)?

The most relevant attributes I typically look at are
In the past, every specific script I wrote to debug this became stale after I fixed a particular issue. Therefore, I kept only the most obvious, elementary methods in my scripts, like the ones mentioned above. In my experience, anything more advanced requires way too much knowledge about the internals and is highly dependent on what you're looking for. An example of why I think the "diff" is too hard and maybe not even helpful:
Of course, this set is much more restricted given that we're only looking at the one worker the task is supposed to be processing on. However, this is a very specific query, and we'll need much more flexibility.
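As a rough illustration of that kind of narrow, one-off query, here is a sketch that pulls one task's scheduler-side record together with the view of the worker it is supposedly processing on (field names such as "processing_on" and the dump layout are assumptions):

```python
def task_views(dump: dict, key: str) -> dict:
    """Sketch: return the scheduler's record of one task alongside the record
    held by the worker it is assigned to, so the two can be compared by eye."""
    sts = dump["scheduler"]["tasks"].get(key)
    addr = (sts or {}).get("processing_on")
    worker_tasks = dump.get("workers", {}).get(addr, {}).get("tasks", {})
    return {"scheduler": sts, "worker": addr, "on_worker": worker_tasks.get(key)}
```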
I'm currently taking a look at where the above functions should be implemented. I don't think they should be implemented on the
In #5863 we're about to introduce a
This issue describes utility functions for debugging cluster state stored in a disk artifact. If the purpose is generalised debugging, it occurred to me that it would be useful to encapsulate the state in a class and offer the functionality afforded by these functions as class methods. For example:

```python
class DumpInspector:
    def __init__(self, url_or_state: str | dict, context="scheduler"):
        if isinstance(url_or_state, str):
            self.dump = load_cluster_dump(url_or_state)
        elif isinstance(url_or_state, dict):
            self.dump = url_or_state
        self.context = context

    def tasks_in_state(self, state=""):
        if state:
            return {k: v for k, v in self.dump[self.context].items() if v["state"] == state}
        return self.dump[self.context]
```

Then the following is possible:

```python
inspect = DumpInspector("dump.msgpack.gz")
released = inspect.tasks_in_state("released")
memory = inspect.tasks_in_state("memory")
```
Yes, this is about debugging only. I don't even expect this to be used by a wide range of users; it's probably mostly for developers.
What would be the benefits of doing this? I think a lot of this simply comes down to API design and taste. I probably would've started with a few functions and mostly builtins, e.g.

```python
def load_cluster_dump(url_or_path: str) -> dict:
    ...

def tasks_in_state(dump: dict, state: Collection[str]) -> Collection[dict]:
    ...
```

In terms of usage, this would boil down to mostly the same as for the inspector:

```python
state = load_cluster_dump("path/to/my/dump")
tasks_in_state(state, ["processing", "memory"])
```

compared to

```python
inspector = DumpInspector("path/to/my/dump")
inspector.tasks_in_state(["processing", "memory"])
```

Re context: if you choose to go for an inspector class, it should not be bound to a worker/scheduler context. We typically want to investigate both, just with different calls, e.g.

```python
tasks_in_state_on_scheduler("processing")
tasks_in_state_on_workers(["executing", "ready"], workers=["addrA", "addrB"])
```
One can simply import the class, which provides access to all the associated methods, rather than importing all the functions one would use to inspect the state. I think this might save some typing.
I don't have a strong opinion here.
I'm interested in defining the above in more detail. One way of defining this in terms of the cluster state might be the following pseudo-code:

```python
scheduler_workers = state["scheduler"]["workers"]  # Workers known to the scheduler?
workers = state["workers"]  # Actual workers

missing = set()
for sw in scheduler_workers.keys():
    if sw not in workers or not workers[sw]["log"]:
        missing.add(sw)
```

However, it's not immediately clear to me if
More broadly speaking, in this function one could also look into other state like
`Client.dump_cluster_state` offers a way to dump the entire cluster state of all workers and the scheduler. For active clusters this typically includes hundreds of thousands of lines of state (assuming it is printed human-readable). To work with this state dump artifact, one typically needs to write custom scripts, grep the logs, etc. Many of these operations can be standardized, and we should keep a few functions around to help us analyze this.
- `get_tasks_in_state(state: str, worker: bool = False)`: return all TaskState objects that are currently in the given state on the scheduler (or on the workers if `worker` is `True`)
- `story`: get a global story for a key or stimulus ID, similar to `Client.story` (see "Support collecting cluster-wide story for a key or stimulus ID" #5872)
- `missing_workers`: get names and addresses of all workers that are known to the scheduler but that we are missing logs for

The above functions should accept the cluster dump artifact in both yaml and msgpack format.
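A minimal sketch of such a loader, assuming the artifact is either gzip-compressed msgpack or plain YAML and that the file extension indicates which (both details are assumptions to check against the actual artifact):

```python
import gzip

import msgpack
import yaml


def load_cluster_dump(path: str) -> dict:
    """Sketch: load a cluster dump artifact written to disk."""
    if path.endswith(".msgpack.gz"):
        with gzip.open(path, "rb") as f:
            return msgpack.unpack(f)
    elif path.endswith((".yaml", ".yml")):
        with open(path) as f:
            return yaml.safe_load(f)
    raise ValueError(f"Unrecognised cluster dump format: {path!r}")
```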