This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Serialize Agent tasks #755

Merged
merged 4 commits into from
Aug 20, 2014

Conversation

@bcwaldon (Contributor, Author) commented Aug 7, 2014:

@jonboulle What do you think of this approach? There may still be some rough edges here.

return nil, errors.New("task already in flight")
}

if t.Job == nil {
Contributor:

this needs to be on line 60

Contributor Author:

Yep, moved things around last-minute.

Contributor Author:

Will add more testing of this.

@jonboulle (Contributor):
Seems OK. At first the lack of error handling around a lot of the edges scared me, but then on reading the existing code I realised this stuff isn't handled anyway, so...

@jonboulle (Contributor):
I mean, to be clear, the model is "operations can fail, failures are unhandled, the next reconciliation will clean up"

@bcwaldon (Contributor, Author) commented Aug 7, 2014:

You are correct, the repetitive reconciliation will pound units into submission over time, so we don't need to handle task failures as intelligently right now.
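The model described here, where failures are left unhandled and simply retried on the next reconciliation pass, could be sketched roughly as follows. All names are illustrative, not fleet's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// reconcile compares desired state against current state and attempts to
// converge each unit. A failed transition is not handled specially: the
// next reconciliation pass will simply try again.
func reconcile(desired, current map[string]string) {
	for name, want := range desired {
		if current[name] != want {
			fmt.Printf("converging %s -> %s\n", name, want)
			current[name] = want
		}
	}
}

func main() {
	desired := map[string]string{"foo.service": "launched"}
	current := map[string]string{"foo.service": "loaded"}

	// In fleet this runs periodically (roughly every 10s per the
	// discussion); a short interval is used here for demonstration.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 2; i++ {
		<-ticker.C
		reconcile(desired, current)
	}
	fmt.Println(current["foo.service"])
}
```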

@bcwaldon (Contributor, Author) commented Aug 7, 2014:

@jonboulle hit everything

a.um.Stop(jobName)

a.uGen.Unsubscribe(jobName)
a.registry.RemoveUnitState(jobName)
Contributor:

Actually, this can be removed entirely now, right? UnitStateGenerator should clean it up

Contributor Author:

Ooooh, yes it can.

@jonboulle (Contributor):
Looks good.

@bcwaldon (Contributor, Author) commented Aug 8, 2014:

@jonboulle I'd like to consider one addition to this. In master today, a LoadJob and a StartJob will be executed back-to-back. With the task manager, only one task can be in flight, so the StartJob will be rejected. We have to wait for a subsequent reconciliation to get that StartJob to run, which essentially introduces a 10s lag in starting jobs. What if tasks passed to taskManager.Do were not simply dropped while another task was in flight, but were queued? The queue would be of size 1, and any call to Do would replace the currently-queued task. This would give us the ability to have a LoadJob in flight, with the StartJob on deck, ready to go whenever the LoadJob finishes (successfully). Eh?

@jonboulle (Contributor):
The queue would be of size 1, and any call to Do would replace the currently-queued task.

I'm not sure this is nuanced enough - we can't silently drop tasks. What if we have, say, LoadJob, StartJob, UnloadJob, LoadJob, StartJob in quick succession, with the original LoadJob being in flight for an extended period?

@bcwaldon (Contributor, Author) commented Aug 8, 2014:

@jonboulle In your example, the StartJob would end up queued at the end. The intermediate tasks weren't going to be fulfilled anyway, given the nature of the reconciler.

It is clear that this is an optimization that could hurt us, especially if we don't take into account the hash of the referenced unit (it could change over time). How do you feel about the proposal if we don't replace the queued task? Still a queue of size 1 to speed up the fleetctl start foo.service use-case, but we don't try to be any more helpful for now.

@jonboulle (Contributor):
Yes, I think that's a reasonable compromise for now.
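The compromise agreed on above — one task in flight, at most one more on deck, and a queued task never replaced — could be sketched roughly like this. The types and method names are illustrative, not fleet's actual taskManager API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type task struct{ name string }

// taskManager allows one task in flight and at most one queued behind it.
// A queued task is never replaced; further calls to Do are rejected.
type taskManager struct {
	mu       sync.Mutex
	inFlight *task
	queued   *task
}

func (tm *taskManager) Do(t *task) error {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	switch {
	case tm.inFlight == nil:
		tm.inFlight = t // run immediately
		return nil
	case tm.queued == nil:
		tm.queued = t // on deck, runs when the in-flight task finishes
		return nil
	default:
		return errors.New("task already in flight")
	}
}

// finish marks the in-flight task done and promotes the queued task.
func (tm *taskManager) finish() {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	tm.inFlight, tm.queued = tm.queued, nil
}

func main() {
	tm := &taskManager{}
	fmt.Println(tm.Do(&task{"LoadJob"}))   // accepted, in flight
	fmt.Println(tm.Do(&task{"StartJob"}))  // accepted, queued
	fmt.Println(tm.Do(&task{"UnloadJob"})) // rejected
	tm.finish()
	fmt.Println(tm.inFlight.name)
}
```

This keeps the fleetctl start foo.service fast path (LoadJob in flight, StartJob on deck) without trying to be any smarter about intermediate tasks.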

@bcwaldon (Contributor, Author):
rebased on master and squashed the squishables

}

go func() {
for res := range reschan {
Contributor:

What's the motivation for even using reschan? Seems like it'd be simpler to just log errors in the taskmanager?

Contributor Author:

I want to log the result in the context of the AgentReconciler. Maybe it's silly, but I don't want the task manager logging anything.
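A minimal sketch of this pattern — the task manager reporting results over a channel so that the caller (the AgentReconciler here) controls the logging — might look like the following. The type and function names are hypothetical, not fleet's actual code:

```go
package main

import "fmt"

// taskResult carries the outcome of a task back to the caller. The task
// manager itself does no logging; it only reports results over reschan.
type taskResult struct {
	desc string
	err  error
}

// runTasks executes tasks asynchronously, sending each result on the
// returned channel and closing it when all tasks are done.
func runTasks(descs []string) <-chan taskResult {
	reschan := make(chan taskResult)
	go func() {
		defer close(reschan)
		for _, d := range descs {
			// Task execution elided; report the outcome either way.
			reschan <- taskResult{desc: d, err: nil}
		}
	}()
	return reschan
}

func main() {
	reschan := runTasks([]string{"load foo.service", "start foo.service"})
	// The reconciler, not the task manager, decides how results are logged.
	for res := range reschan {
		if res.err != nil {
			fmt.Printf("task %q failed: %v\n", res.desc, res.err)
		} else {
			fmt.Printf("task %q completed\n", res.desc)
		}
	}
}
```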

@bcwaldon bcwaldon force-pushed the serialize-tasks branch 3 times, most recently from 2caf619 to 28b9dbc Compare August 20, 2014 01:21
@jonboulle (Contributor):
justgoforit

bcwaldon added a commit that referenced this pull request Aug 20, 2014
@bcwaldon bcwaldon merged commit 61e9334 into coreos:master Aug 20, 2014
@bcwaldon bcwaldon deleted the serialize-tasks branch August 20, 2014 01:48