Check if we can use states and stateTotals in the GScan instead of jobs #401

Closed
kinow opened this issue Feb 11, 2020 · 17 comments · Fixed by #617

@kinow
Member

kinow commented Feb 11, 2020

Describe exactly what you would like to see in an upcoming release

ATM, the GScan component uses workflow.jobs in the GraphQL query. During the Cylc Meetup 2020, it was pointed out that we could instead use workflow.states and workflow.stateTotals.

It appears to be missing the task name, so we may need to review how Cylc 7 was doing it.

Additional context

Pull requests welcome!

@kinow
Member Author

kinow commented Mar 10, 2020

Cylc 7

The task states are retrieved in the GScan application every second: the GScan run function (a while True:...; sleep(1) loop) populates a dict, self.suite_info_map, that contains a key tasks-by-states. This dict is populated with the result of an HTTP GET to /id/identify (subject to a permission/auth check).

image

The HTTP server simply returns the existing values, which are created when the Scheduler calls the StateSummaryMgr's update function.

image
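
As a rough illustration of the polling pattern described above (not the actual Cylc 7 code), the loop could look something like the sketch below. `poll_once`, `run`, `fetch_identify` and the shape of the returned data are assumptions made for this example; only `suite_info_map`, the `tasks-by-states` key and the one-second sleep come from the description.

```python
import time
from typing import Callable, Dict, List, Optional


def poll_once(
    suite_info_map: Dict[str, dict],
    suites: List[str],
    fetch_identify: Callable[[str], dict],
) -> None:
    """Refresh suite_info_map with one round of identify results.

    fetch_identify is a hypothetical stand-in for the HTTP GET to
    /id/identify; it is assumed to return a dict that includes a
    "tasks-by-states" entry.
    """
    for suite in suites:
        suite_info_map[suite] = fetch_identify(suite)


def run(
    suite_info_map: Dict[str, dict],
    suites: List[str],
    fetch_identify: Callable[[str], dict],
    interval: float = 1.0,
    max_iterations: Optional[int] = None,
) -> None:
    """Poll forever (or max_iterations times), sleeping between rounds."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        poll_once(suite_info_map, suites, fetch_identify)
        iterations += 1
        time.sleep(interval)
```

The point to note is that the client asks for a full summary on every tick, whether or not anything changed, which is what a GraphQL subscription on states and stateTotals would avoid.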

@hjoliver
Member

It appears to be missing the task name,

Do you mean we're missing tasks_by_state on master? I think that was just for passing the most recent (6 max?) tasks of each state to gscan, for use in hover-over popups... And it was pre-formatted strings, with "And N more" in the final slot...
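
A minimal sketch of the kind of pre-formatted list described here, assuming a limit of 6 slots with the last slot replaced by an "And N more" summary. The function name, the `name.cycle` identifiers and the exact wording are illustrative and not taken from the Cylc 7 source.

```python
def summarise_recent_tasks(task_ids, max_slots=6):
    """Return at most max_slots strings for a hover-over popup.

    task_ids is assumed to be ordered most recent first. If there are
    more tasks than slots, the final slot reports how many were left out.
    """
    if len(task_ids) <= max_slots:
        return list(task_ids)
    shown = list(task_ids[: max_slots - 1])
    hidden = len(task_ids) - len(shown)
    return shown + [f"And {hidden} more"]
```

With eight failed tasks and six slots, this would show the five most recent identifiers followed by a single summary line for the remaining three.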

@hjoliver
Member

So we will need to add that (or something similar) in to the new code.

@kinow
Member Author

kinow commented Mar 10, 2020

I was just looking at how it was done in Cylc 7. Next up is to understand how it's implemented in cylc monitor, to see if the Cylc UI can use something similar, or if we are really missing something in the new code.

@kinow
Member Author

kinow commented Mar 10, 2020

Cylc 8 (cylc monitor)

From what I understood, Cylc 8's cylc monitor calls the client operation get_suite_state_summary, which returns JSON containing an array. It uses the first two items in the array to fetch the global info and the state totals. However, it doesn't have the tooltip that GScan has in Cylc 7 or Cylc 8, so it doesn't need to group the entries per state, and the state totals key already holds the information it needs in digested form.

image

@dwsutherland
Member

State totals is in there:
uis_added_updated

However, I'll need to add a totals-by-cycle at some stage (info is available, but I haven't schema'd it)

@kinow
Member Author

kinow commented Mar 10, 2020

Thanks David! Tomorrow I was going to look at our schema and ask you whether we were going to have the same fields, or whether the cylc client get_suite_state_summary command would later be converted to GraphQL so that perhaps we could call it.

But it looks like we will get the new fields in the schema. Updating GScan (vue) should be really straightforward after that. Thanks!

@dwsutherland
Member

get_suite_state_summary will be retired, and an equivalent/sufficient graphql endpoint call will be used.

Just working on the grouped deltas (as shown above), and then will continue converting the CLI.

@kinow
Member Author

kinow commented Sep 22, 2020

I'm using the stateTotals in #337 #499 (ops)

@hjoliver
Member

I think you mean #499? - so that can close this issue?

@kinow
Member Author

kinow commented Sep 23, 2020

I think you mean #499? - so that can close this issue?

Oops, wrong issue linked. Thanks.

Not sure if this can be closed. I think there are still the workflow summaries, where I'm iterating the list of task proxies. We had a discussion some days ago about adding some of that info to GraphQL queries. Not sure if that should go under states or stateTotals though (if not, I think we can then close this one).

@hjoliver
Member

Ah, right. Yeah we don't want gscan to be requesting all the task proxies.

@hjoliver
Member

hjoliver commented Dec 3, 2020

State totals tool-tips "most recent tasks by state" blocked by: cylc/cylc-flow#3976

@dwsutherland
Member

dwsutherland commented Jan 20, 2021

Are we sure we can't just use something like this:

subscription {
  deltas (workflows: ["sutherlander|baz"], stripNull: true) {
    updated {
      workflow (stripNull: false, deltaStore: false) {
        stateTotals
        submitted: taskProxies (
          states: ["submitted"],
          deltaStore: false,
          stripNull: true,
          sort: {keys: ["name"], reverse: false}
        ) {
          name
          cyclePoint
        }
        running: taskProxies (
          states: ["running"],
          deltaStore: false,
          stripNull: true,
          sort: {keys: ["name"], reverse: false}
        ) {
          name
          cyclePoint
        }
        failed: taskProxies (
          states: ["failed"],
          deltaStore: false,
          stripNull: true,
          sort: {keys: ["name"], reverse: false}
        ) {
          name
          cyclePoint
        }
      }
    }
  }
}

recent-tasks-gscan

? And then add a limit argument to return only 5?

Or do we just want to create a static list of name.cycle for every state (recalculated on state change)?

@hjoliver
Member

hjoliver commented Jan 27, 2021

? And then add a limit argument to return only 5?

Will that give us the most recent tasks, for each state? E.g. we want the most recent 5 task failures, not just 5 failed tasks ordered alphanumerically by name or whatever.

@dwsutherland
Member

? And then add a limit argument to return only 5?

Will that give us the most recent tasks, for each state? E.g. we want the most recent 5 task failures, not just 5 failed tasks ordered alphanumerically by name or whatever.

You can sort by cycle-point/submit and filter by state, then apply the limit last (which might be inefficient, as limiting first would mean the resolvers don't have to return everything).

@dwsutherland
Member

dwsutherland commented Feb 4, 2021

We may need an "update_time" field added to all nodes (so we can sort by that). The beauty of deltas is that it's OK for there to be heaps of fields (as not all are sent on each update). However, this one would be sent with any update (it doesn't have to be "any", I suppose).

The alternate approach might put less strain on the back end: just create a static dump of tasks (in string form) for each state when needed on all state changes. However, this would be a constant cost for the workflows (whether someone is watching or not).

In either case the backend would need to:

  • Filter by state, sort by last_updated, and then limit to the first 5 or N results..

But it doesn't seem that scalable if you have thousands of tasks to filter and sort through on every state update...
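
To make the cost concrete, here is a minimal sketch of the filter, sort and limit step. The `TaskProxy` class and its `update_time` field are hypothetical stand-ins (the field does not exist in the schema yet), and `most_recent_by_state` is a name invented for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class TaskProxy:
    """Hypothetical, simplified task proxy node."""
    name: str
    cycle_point: str
    state: str
    update_time: float  # assumed field, not in the current schema


def most_recent_by_state(task_proxies, state, limit=5):
    """Filter by state, then keep the `limit` most recently updated."""
    matching = (tp for tp in task_proxies if tp.state == state)
    # nlargest avoids a full sort, but every task proxy is still visited.
    return heapq.nlargest(limit, matching, key=lambda tp: tp.update_time)
```

Even with `heapq.nlargest` in place of a full sort, each call still scans every task proxy, so the work grows with the size of the workflow and is repeated on each state update.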

Perhaps we can do it smarter on the backend... i.e.

  • Each change in state adds it to the front of a state list/queue/dict, removes it from the old (if any)
  • A queue/data-structure whose size is restricted to say 20
  • The query can then pick the first 5/N from this.

This would be scalable.
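
The bounded structure outlined above could be sketched as follows. The class and method names are hypothetical, the size of 20 and the limit of 5 are the figures suggested in the bullets, and task identifiers are assumed to be `name.cycle` strings.

```python
from collections import OrderedDict, defaultdict
from itertools import islice


class RecentTasksByState:
    """Keep only the most recently changed tasks for each state."""

    def __init__(self, max_size=20):
        self.max_size = max_size
        # state -> ordered task ids, oldest first, newest last
        self._by_state = defaultdict(OrderedDict)
        # task id -> state it was last recorded in
        self._current = {}

    def record(self, task_id, new_state):
        """Move a task to the front of its new state's list."""
        old_state = self._current.get(task_id)
        if old_state is not None:
            # The task may already have been evicted, hence the default.
            self._by_state[old_state].pop(task_id, None)
        recent = self._by_state[new_state]
        recent[task_id] = None
        recent.move_to_end(task_id)
        if len(recent) > self.max_size:
            recent.popitem(last=False)  # evict the oldest entry
        self._current[task_id] = new_state

    def most_recent(self, state, limit=5):
        """Return up to `limit` task ids, newest first."""
        recent = self._by_state.get(state, ())
        return list(islice(reversed(recent), limit))
```

Each state change costs a constant amount of work regardless of how many tasks the workflow has, and a query only reads the first few entries of a short list.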

This was referenced Mar 16, 2021
@kinow kinow self-assigned this Mar 18, 2021