Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore using JSON for LB stats files for a more consistent, readable format #1469

Closed
lifflander opened this issue Jun 9, 2021 · 17 comments · Fixed by #1475
Closed

Explore using JSON for LB stats files for a more consistent, readable format #1469

lifflander opened this issue Jun 9, 2021 · 17 comments · Fixed by #1475
Assignees

Comments

@lifflander
Copy link
Collaborator

What Needs to be Done?

Choose a new format for the LB stats file.

@lifflander
Copy link
Collaborator Author

Proposal for format:

{
    "phases": [
        {
            "phase": [
                {
                    "id": 1,
                    "tasks": [
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 1993438,
                            "time": 5.34
                        },
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 283883,
                            "time": 3.4,
                            "suphases": [
                                {
                                    "id": 1,
                                    "time": 1.4
                                },
                                {
                                    "id": 2,
                                    "time": 2.0
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

@nlslatt @PhilMiller @JacobDomagala @jstrzebonski

@jstrzebonski
Copy link
Contributor

If there won't be anything except phases, then it could be a simple array, which still is valid JSON.

[
        {
            "phase": [
                {
                    "id": 1,
                    "tasks": [
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 1993438,
                            "time": 5.34
                        },
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 283883,
                            "time": 3.4,
                            "suphases": [
                                {
                                    "id": 1,
                                    "time": 1.4
                                },
                                {
                                    "id": 2,
                                    "time": 2.0
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

@jstrzebonski
Copy link
Contributor

One question, why is phase an array?

@lifflander
Copy link
Collaborator Author

One question, why is phase an array?

Because there will be multiple phases. I just have phase 1 in this example, but there will be many in real data.

@lifflander
Copy link
Collaborator Author

Oh, I see what you mean. I misunderstood. I think you are right on this.

@jstrzebonski
Copy link
Contributor

OK, so than that would be sufficient I guess:

[
        {
            "phase": {
                    "id": 1,
                    "tasks": [
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 1993438,
                            "time": 5.34
                        },
                        {
                            "resource": "cpu",
                            "node": 10,
                            "object": 283883,
                            "time": 3.4,
                            "suphases": [
                                {
                                    "id": 1,
                                    "time": 1.4
                                },
                                {
                                    "id": 2,
                                    "time": 2.0
                                }
                            ]
                        }
                    ]
                }
        },
...
    ]

@jstrzebonski
Copy link
Contributor

jstrzebonski commented Jun 9, 2021

On second thought, maybe like this:

{
  "phases": [
    {
      "id": 1,
      "tasks": [
        {
          "resource": "cpu",
          "node": 10,
          "object": 1993438,
          "time": 5.34
        },
        {
          "resource": "cpu",
          "node": 10,
          "object": 283883,
          "time": 3.4,
          "subphases": [
            {
              "id": 1,
              "time": 1.4
            },
            {
              "id": 2,
              "time": 2.0
            }
          ]
        }
      ]
    },
    {
      "id": 2,
      "tasks": [
        {
          "resource": "cpu",
          "node": 10,
          "object": 1993438,
          "time": 5.34
        },
        {
          "resource": "cpu",
          "node": 10,
          "object": 283883,
          "time": 3.4,
          "subphases": [
            {
              "id": 1,
              "time": 1.4
            },
            {
              "id": 2,
              "time": 2.0
            }
          ]
        }
      ]
    }
  ]
}

that way phases, tasks and subphases are all simply arrays of objects, which, I think, we expect.

@nlslatt
Copy link
Collaborator

nlslatt commented Jun 10, 2021

Are we including node just for completeness? Or are we no longer going to have one file per rank?

@PhilMiller
Copy link
Member

If we do include node, then depending on the reset of the structure, the files could in theory be simply concatenated or the parsed results merged (i.e. set/map union)

@lifflander
Copy link
Collaborator Author

lifflander commented Jun 10, 2021

If we do include node, then depending on the reset of the structure, the files could in theory be simply concatenated or the parsed results merged (i.e. set/map union)

That's exactly why I included it. So we could combine the files and it would be correct. I'm still intending to have one file per rank because that will be most efficient to output I think. But this would allow us to easily combine them and know which rank they came from without relying on the filename to know.

@lifflander
Copy link
Collaborator Author

So the optional meta-data file would look like this:

{
  "subphases": [
    {
      "id": 1,
      "name": "mySubphaseName1"
    },
    {
      "id": 2,
      "name": "mySubphaseName2"
    },
  ]
}

@lifflander
Copy link
Collaborator Author

So the issue with the specification is that if we incrementally output, we can't create an array for phases unless we read it in and then write it out again (AFAIK). Basically, the code I wrote ends up with something like this for an incremental builder:

{
    "phases": [
        null,
        null,
        null,
        null,
        null,
        null,
        null,
        {
            "tasks": [
                {
                    "time": 1.6927719116210938e-05,
                    "subphases": [
                        {
                            "time": 1.6927719116210938e-05,
                            "id": 0
                        }
                    ],
                    "resource": "cpu",
                    "object": 107374182400,
                    "node": 0
                },
                {
                    "time": 0.012003183364868164,
                    "subphases": [
                        {
                            "time": 0.012003183364868164,
                            "id": 0
                        }
                    ],
                    "resource": "cpu",
                    "object": 25769803776,
                    "node": 0
                },
                {
                    "time": 0.011924028396606445,
                    "subphases": [
                        {
                            "time": 0.011924028396606445,
                            "id": 0
                        }
                    ],
                    "resource": "cpu",
                    "object": 0,
                    "node": 0
                }
            ],
            "id": 7
        }
    ]
}

But the code to output this is easy to write:

  using json = nlohmann::json;

  json j;
  j["phases"] = {};
  j["phases"][phase]["id"] = phase;

  std::size_t i = 0;
  for (auto&& elm : node_data_.at(phase)) {
    ElementIDStruct id = elm.first;
    TimeType time = elm.second;
    j["phases"][phase]["tasks"][i]["resource"] = "cpu";
    j["phases"][phase]["tasks"][i]["node"] = theContext()->getNode();
    j["phases"][phase]["tasks"][i]["object"] = id.id;
    j["phases"][phase]["tasks"][i]["time"] = time;

    auto const& subphase_times = node_subphase_data_.at(phase)[id];
    std::size_t const subphases = subphase_times.size();
    if (subphases != 0) {
      for (std::size_t s = 0; s < subphases; s++) {
        j["phases"][phase]["tasks"][i]["subphases"][s]["id"] = s;
        j["phases"][phase]["tasks"][i]["subphases"][s]["time"] = subphase_times[s];
      }
    }
    i++;
  }

  fmt::print("j={}\n", to_string(j));

@lifflander
Copy link
Collaborator Author

So I think we need to not output an array for phases, instead just a grouping on the name.

@lifflander
Copy link
Collaborator Author

Actually, now I read more, I think to do this "correctly" we have to read all the json in, add what we want, and then write it out again. So we will need some work-around for the incremental output.

@nlslatt
Copy link
Collaborator

nlslatt commented Jun 11, 2021

Given that limitation, is JSON really the right format to use?

lifflander added a commit that referenced this issue Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants