-
Notifications
You must be signed in to change notification settings - Fork 302
Conversation
@jonboulle What do you think of this approach? There may still be some rough edges here. |
return nil, errors.New("task already in flight") | ||
} | ||
|
||
if t.Job == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be on line 60
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, moved things around last-minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add more testing of this.
Seems OK. At first the lack of error handling around a lot of the edges scared me, but then on reading the existing code I realised stuff isn't handled anyway, soo |
I mean, to be clear, the model is "operations can fail, failures are unhandled, the next reconciliation will clean up" |
You are correct, the repetitive reconciliation will pound units into submission over time, so we don't need to handle task failures as intelligently right now. |
@jonboulle hit everything |
a.um.Stop(jobName) | ||
|
||
a.uGen.Unsubscribe(jobName) | ||
a.registry.RemoveUnitState(jobName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this can be removed entirely now, right? UnitStateGenerator
should clean it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooh, yes it can.
Looks good. |
@jonboulle I'd like to consider one addition to this. In master today, a LoadJob and StartJob will be executed back-to-back. With the task manager, only one task can be in-flight, so the StartJob will be rejected. We have to wait for a subsequent reconciliation to get that StartJob to run, which is essentially introducing a 10s lag in starting jobs. What if we tasks that were passed to taskManager.Do were not simply dropped while another task were in flight, but were queued. The queue would be of size 1, and any call to Do would replace the currently-queued task. This would give us the ability to have a LoadJob in-flight, with the StartJob on deck ready to go whenever the LoadJob finishes (successfully). Eh? |
I'm not sure this is nuanced enough - we can't silently drop tasks. What if we have, say, |
@jonboulle In your example, the startJob would be queued at the end. The intermediate tasks weren't going to be fulfilled anyways given the nature of the reconciler. It is clear that this is an optimization that could hurt us, especially if we don't take into account the hash of the referenced unit (it could change over time). How do you feel about the proposal if we don't replace the queued task? Still a queue size of 1 to speed up the |
Yes, I think that's a reasonable compromise for now. |
rebased on master and squashed the squishables |
} | ||
|
||
go func() { | ||
for res := range reschan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for even using reschan
? Seems like it'd be simpler to just log errors in the taskmanager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to log the result in the context of the AgentReconciler. Maybe it's silly, but I don't want the task manager logging anything.
2caf619
to
28b9dbc
Compare
justgoforit |
28b9dbc
to
28a6a2e
Compare
28a6a2e
to
ee666f1
Compare
#646