Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event driven data-store deltas #3938

Merged
merged 5 commits into from
Jan 19, 2021

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Nov 12, 2020

These changes close #3746, closes #3921, closes #4030
Partially satisfies cylc/cylc-ui#530

At present the data-store deltas are only partially event driven, including:

  • Node creation (task/familiy proxies and edges).
  • Job creation and deltas.

A few points on current state and intentions:

  • Task proxy deltas are not created on event.
  • The job pool (job_pool.py) is separate from the data-store manager because it is event driven, but after converting the data-store to an event driven approach, this is no longer necessary.

Some updates and these deltas should still be batched for application and publishing (for now) to ensure relational consistency, summary updates are less frequent, and UI/UIS is not overwhelmed. i.e. if you have a family of 100 tasks, and each individual state change constitutes a recalculation of the family state and a publishing of these, then you could end up with hundreds if not thousands of group-state/summary and state deltas to calculate and publish in a few seconds (but batched this would only come to a handful if that).

To do:

  • Separate the batched task fields delta into separate event driven deltas:
    • state
    • held
    • outputs (separate to capture individual output changes)
    • prerequisites (separate to capture individual output changes)
    • Clock trigger, external trigger, xtrigger changes (separate for each? just a dump for all in one proxy field ATM)
    • Job addition updates
  • Combine/merge the event-driven job pool (job_pool.py) into the data-store manager.

Most the new delta event triggers will be from within the task_pool.py and task_events_mgr.py

Possible future work:

  • Publish the deltas individually
  • Create a data-store at the UIS off these published events, which will decouple the n-windows at both ends.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation update required.
  • No dependency changes.

@dwsutherland dwsutherland added efficiency For notable efficiency improvements sod-follow-up labels Nov 12, 2020
@dwsutherland dwsutherland added this to the cylc-8.0a3 milestone Nov 12, 2020
@dwsutherland dwsutherland self-assigned this Nov 12, 2020
@dwsutherland dwsutherland force-pushed the event-driven-deltas branch 4 times, most recently from eb372b6 to 254193b Compare December 18, 2020 02:31
@dwsutherland dwsutherland marked this pull request as ready for review December 18, 2020 02:31
@dwsutherland
Copy link
Member Author

dwsutherland commented Dec 18, 2020

Ok, ready for review..

However, there are a couple things that I might change:

One
The delta calls have been spread through most the event/task/job managers.
I think I want to change this to pass the data_store_mgr into the TaskProxy and objects created therein (i.e. state, outputs, prerequisites)..
Reasons to do this:

  • We may not be catching all the delta events (i.e. I may have missed places where the outputs are reset)
  • The way it is now makes maintenance/changes harder.. If other locations for state/output/other changes are added, data-store delta calls might be missed...

Reason(s) not to:

  • We may be moving away from using a pool of TaskProxys in the future anyway...
  • ...others?

Thoughts?

Two
Some fields of the workflow element should be delta driven:
cylc/cylc-ui#543

@hjoliver
Copy link
Member

hjoliver commented Dec 21, 2020

I've been playing with this branch using cylc tui - it's great to see the ui responding properly now (particularly to succeeded events) 🎉

However, I'm seeing this error from the UI Server with Cylc UI:

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'PbTaskProxy.outputs' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' t
ype if you intend to send raw bytes. 
2020-12-21 14:48:53,393 tornado.application ERROR    Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncI
OMainLoop object at 0x7f22128d2a58>>, <Task finished coro=<WorkflowsManager.update() done, defined at /home/oliverh/cylc/cylc-uiserver/cylc/uiserver/workflows_mgr.py:227> e
xception=DecodeError('Error parsing message')>)
Traceback (most recent call last):
  File "/home/oliverh/cylc/cylc-uiserver/venv/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/home/oliverh/cylc/cylc-uiserver/venv/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/home/oliverh/cylc/cylc-uiserver/cylc/uiserver/workflows_mgr.py", line 255, in update
    await self._connect(wid, flow)
  File "/home/oliverh/cylc/cylc-uiserver/cylc/uiserver/workflows_mgr.py", line 202, in _connect
    flow
  File "/home/oliverh/cylc/cylc-uiserver/cylc/uiserver/data_store_mgr.py", line 134, in sync_workflow
    await self.entire_workflow_update(ids=[w_id])
  File "/home/oliverh/cylc/cylc-uiserver/cylc/uiserver/data_store_mgr.py", line 320, in entire_workflow_update
    pb_data.ParseFromString(result)
google.protobuf.message.DecodeError: Error parsing message
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'PbTaskProxy.outputs' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' t
ype if you intend to send raw bytes. 

@hjoliver
Copy link
Member

Seems to be a genuine error. It still happens after recreating my environments. It occurs after starting the first flow on the back end.

@dwsutherland
Copy link
Member Author

dwsutherland commented Dec 21, 2020

I'll have a look. But, on the surface, it looks like the UI Server hasn't picked up the protobuf generated module changes.. But you mentioned you recreated your virtual envs.. So, not sure yet

@hjoliver
Copy link
Member

hjoliver commented Dec 21, 2020

I did pip install . from cylc-flow with your branch checked out, into cylc-uiserver.

@hjoliver
Copy link
Member

Also checked the correct data_messages_pb2.py is installed in the cylc-uiserver venv.

@dwsutherland
Copy link
Member Author

Working for me:
image

However, separate topic, did the "job runner" cylc-ui end go in?
image

@hjoliver
Copy link
Member

hjoliver commented Dec 21, 2020

However, separate topic, did the "job runner" cylc-ui end go in?

No but Bruno approved it already (on holiday!) and it's a simple change, so I'll merge it now. (That's a PR from me, today, BTW).

@hjoliver
Copy link
Member

(cylc-ui change merged)

@hjoliver
Copy link
Member

Update on this problem for others: google.protobuf.message.DecodeError: Error parsing message:

double check that you've done pip install . from this branch of cylc-flow into cylc-uiserver.

@hjoliver
Copy link
Member

(branch conflicted now)

@hjoliver
Copy link
Member

The delta calls have been spread through most the event/task/job managers. ... Thoughts?

My feeling is leave it as-is for now. This works, and we could consider moving them into the task proxy object later. However, it may not be worth bothering because we want to move to a proper event-loop driven system before too long: objects raise events (with associated data) and a central event loop processes all events, dispatching them to systems (like the datastore) that have registered interest in them. (Something like that).

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍, no problems found, so approving this before I bail out for the xmas break. Thanks @dwsutherland 💐

@dwsutherland dwsutherland force-pushed the event-driven-deltas branch 4 times, most recently from aebe2e7 to 0f5d7ab Compare December 24, 2020 02:18
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, runs fine, happy to have Tui showing succeeded tasks again 👏.

cylc/flow/data_store_mgr.py Show resolved Hide resolved
cylc/flow/data_store_mgr.py Show resolved Hide resolved
tp_id, PbTaskProxy(id=tp_id))
tp_delta.stamp = f'{tp_id}@{update_time}'
ext_trigger = tp_delta.external_triggers[trig]
ext_trigger.message = message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with outputs the message field is updated unnecessarily.

Copy link
Member Author

@dwsutherland dwsutherland Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's a bit different, as I don't set the message in on task/node creation.. Reason being; I think it's user defined on the fly (via command).

cylc/flow/scheduler.py Show resolved Hide resolved
Comment on lines -815 to +818
itask.state.reset(TASK_STATUS_WAITING)
if itask.state.reset(TASK_STATUS_WAITING):
self.data_store_mgr.delta_task_state(itask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of these delta_task_state so a high risk of missing one, would it make sense to put the delta_task_state bit inside TaskProxy.reset to ensure consistency?

Copy link
Member Author

@dwsutherland dwsutherland Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrestled with this too..

On the one hand it would be simpler to pass the data_store_mgr object into ever TaskProxy, and then the proxies' Outputs, TaskState, and then to Prerequisites object and so on... You can see it being passed around a lot, like public bicycle but worse..

But on the other hand, the data_store_mgr doesn't need to know about all the task proxies (i.e. runahead and other not ready), and/or all their events...

At least this way we can be intentional about it... but I'm open..

Copy link
Member

@oliver-sanders oliver-sanders Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, might open a follow-on issue as this isn't entirely a data-store issue. Recently hit a bug where the DB wasn't being updated for a particular state change because the full procedure wasn't being followed, so I'm a bit worried about potential bugs now we have the task proxy, data store, outputs and data base to keep in sync.

(Long-term I would really like task events to be pushed to a centralised "event queue" with multiple consumers able to subscribe to the event stream.)

Copy link
Member

@oliver-sanders oliver-sanders Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I guess we could write an callback method in the task_pool and pass that to each task_proxy to avoid having to pass a handful of managers down to every object?

TaskPool already has access to suite_db_mgr, task_events_mgr & data_store_mgr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped - #4037

cylc/flow/scripts/show.py Outdated Show resolved Hide resolved
cylc/flow/scripts/show.py Outdated Show resolved Hide resolved
cylc/flow/scripts/show.py Outdated Show resolved Hide resolved
cylc/flow/scripts/show.py Outdated Show resolved Hide resolved
cylc/flow/scripts/show.py Outdated Show resolved Hide resolved
@oliver-sanders
Copy link
Member

Prerequisites only seem to be listed for active tasks, presumably a SoD consequence, will need to find a way to generate the output for N>0 tasks as at the moment cylc show incorrectly shows a waiting task as having no prerequisites, should bump to another issue for another day.

@oliver-sanders
Copy link
Member

oliver-sanders commented Jan 6, 2021

Does this close #3921?

The information was already there, but in a different form (updated on event) .. So let's say yes.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UI working fine with the changes, great work!

@dwsutherland, I noticed while playing with the new code in GraphiQL that I was getting empty deltas, and empty task proxies too. Should them have been stripped/removed from the response instead? Easy to filter in the UI, but looks unnecessary to send {}. And you can see that the number of {}'s corresponds to the number of task proxies (that's five, the same bar/foo task proxies it's always displaying)

image

No other issues found 👍 I haven't tested it with cylc/cylc-ui#530, as it's addressing that issue partially. @oliver-sanders did a great review. I had a cursory look at the code, and found no other issues +1 pending only @dwsutherland to address @oliver-sanders ' review.

@hjoliver
Copy link
Member

Prerequisites only seem to be listed for active tasks, presumably a SoD consequence, will need to find a way to generate the output for N>0 tasks as at the moment cylc show incorrectly shows a waiting task as having no prerequisites, should bump to another issue for another day.

Correct, although I'd point out that the old system only had prerequisites for waiting tasks that happened to have corresponding task proxies spawned ahead already.

@dwsutherland dwsutherland force-pushed the event-driven-deltas branch 2 times, most recently from 1c8a612 to 19cee8f Compare January 15, 2021 04:09
@oliver-sanders
Copy link
Member

Chucked the prerequisites thing into another issue - #4036

@oliver-sanders oliver-sanders merged commit af770d4 into cylc:master Jan 19, 2021
@kinow kinow mentioned this pull request Feb 15, 2021
12 tasks
@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

showing prerequites of waiting tasks Add xtriggers to GraphQL schema Event-driven datastore update
4 participants