Add Job awaiter #633
Conversation
Force-pushed from 07aed3d to 59f6ea7
Force-pushed from 59f6ea7 to 1c223e7
Looking forward to this. It should really simplify our code base. Any idea when this is targeted to be completed?
I'd guess that it will be in a dev build within the next week.
Awesome, so glad to hear that. This is a HUGE feature for us, and for pretty much anyone using Pulumi to deploy applications that require things like DB migrations.
Force-pushed from 1c223e7 to 04ccc3a
Here's how things look at this point:
Failed Job: (screenshot)
Successful Job: (screenshot)
Force-pushed from 377ce21 to 9759d2f
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
Really digging the test scenarios here 🙂. I know we have similar, extended coverage with fixtures for the other resources, but I'm not sure to what extent. How is the parity in this space across all resources?
This is similar to the way we're testing Pod, and is the general direction I'm taking the test coverage. Existing tests for other resources are basically black-box tests, and generally make it harder to reason about correctness.
There's room for another layer of tests to make sure the awaiter channels/timeouts are wired up properly, but I think that's a lot less critical and error-prone than the state-checking logic I'm testing here.
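For readers following along, here is a minimal sketch of the kind of table-driven test over the state-checking logic being described. The jobDone helper, its signature, and the package name are invented for illustration only; the PR's actual awaiter operates on unstructured objects and reports richer status.

```go
package await

import (
	"testing"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// jobDone is a hypothetical stand-in for the state-checking logic under test:
// it reports whether the Job reached a terminal condition, and whether that
// terminal condition was success.
func jobDone(status batchv1.JobStatus) (succeeded, finished bool) {
	for _, c := range status.Conditions {
		if c.Status != corev1.ConditionTrue {
			continue
		}
		switch c.Type {
		case batchv1.JobComplete:
			return true, true
		case batchv1.JobFailed:
			return false, true
		}
	}
	return false, false
}

func TestJobDone(t *testing.T) {
	tests := []struct {
		name          string
		status        batchv1.JobStatus
		wantSucceeded bool
		wantFinished  bool
	}{
		{
			name:   "no conditions yet means we keep waiting",
			status: batchv1.JobStatus{Active: 1},
		},
		{
			name: "Complete=True means the Job succeeded",
			status: batchv1.JobStatus{Conditions: []batchv1.JobCondition{
				{Type: batchv1.JobComplete, Status: corev1.ConditionTrue},
			}},
			wantSucceeded: true,
			wantFinished:  true,
		},
		{
			name: "Failed=True means the Job is done but unsuccessful",
			status: batchv1.JobStatus{Conditions: []batchv1.JobCondition{
				{Type: batchv1.JobFailed, Status: corev1.ConditionTrue},
			}},
			wantFinished: true,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			succeeded, finished := jobDone(tt.status)
			if succeeded != tt.wantSucceeded || finished != tt.wantFinished {
				t.Errorf("jobDone() = (%v, %v), want (%v, %v)",
					succeeded, finished, tt.wantSucceeded, tt.wantFinished)
			}
		})
	}
}
```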
Overall LGTM. A couple of logical changes would be nice to see to reduce complexity.
Force-pushed from 9759d2f to 191b54b
LGTM
cc @hausdorff - PTAL and review.
Force-pushed from 191b54b to de9b5fd
Adding await semantics to Job simply enriches the existing functionality, so I think this is the right move.
But, before we click the merge button, I just want to make sure we're all on the same page about the (very weird) implications of using Job in Pulumi. All of these were true before, but we never had this conversation before, so let me state some things I believe to be true about the Job, and if any of them are wrong, please correct me:
- If you run pulumi up with a Job, it will stick around until you delete it. So subsequent runs of pulumi up will not cause the job to re-run.
- Users should be very cautious of including Job in Pulumi programs! Unlike other resource types, Job is intended to run once (e.g., for a DB schema migration), so when and how it runs really matters. Once you add a Job to your Pulumi project, ordering suddenly matters a lot, so if you run a fresh pulumi up and your Job does not run exactly when it is supposed to, it could fail the whole deployment.
- We make no attempt to be smart about automated cleanup from the TTL controller. So, if a user sets .spec.ttlSecondsAfterFinished and the Job gets cleaned up, another run of pulumi up after the TTL will re-deploy the Job.
Like I said, I think all of this is fine, especially since we support it all already, but I just want us to go in with eyes open.
// A Job is a construct that allows users to run a workload as a Pod that terminates with a
// success or failure.
//
// A Job is considered "ready" if the following conditions are true:
We say that these are the conditions required to determine a job is "ready", but it sounds below like we're describing jobs that have completed?
Yes, I meant "ready" in the sense that we're done waiting on the resource. For Job, that would mean it is complete.
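As a rough illustration of what "done waiting" means here, the sketch below waits on a watch stream until the Job reports a terminal condition or a timeout fires. The awaitJob name, channel plumbing, and error messages are invented for illustration and do not reflect the PR's actual wiring.

```go
package await

import (
	"errors"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// awaitJob (hypothetical) consumes watch events for a single Job until it
// reports a terminal condition, or gives up after the timeout.
func awaitJob(events <-chan watch.Event, timeout time.Duration) error {
	deadline := time.After(timeout)
	for {
		select {
		case e, ok := <-events:
			if !ok {
				return errors.New("watch channel closed before the Job finished")
			}
			job, ok := e.Object.(*batchv1.Job)
			if !ok {
				continue // ignore events that are not Job objects
			}
			for _, c := range job.Status.Conditions {
				if c.Status != corev1.ConditionTrue {
					continue
				}
				switch c.Type {
				case batchv1.JobComplete:
					return nil // "ready": the Job ran to completion
				case batchv1.JobFailed:
					return errors.New("the Job finished unsuccessfully")
				}
			}
		case <-deadline:
			return errors.New("timed out waiting for the Job to complete")
		}
	}
}
```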
Yes - that is expected and desired - unless you do something to force it to replace.
I think this behaves exactly as you want for scenarios where it is useful - as long as you can force it to replace when the thing that should trigger it to re-run changes.
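To make the force-a-replace point concrete, here is a sketch of a user program written against the current Pulumi Go SDK (which postdates this PR); the image name and versioning scheme are invented for illustration. Bumping version changes both the resource name and the Job spec, so the next pulumi up replaces the Job and the migration runs again, rather than silently doing nothing.

```go
package main

import (
	batchv1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/batch/v1"
	corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Hypothetical convention: bake the migration version into the
		// resource name and image tag so that bumping it forces a
		// replacement, and therefore a fresh run of the Job.
		version := "v3"
		_, err := batchv1.NewJob(ctx, "db-migrate-"+version, &batchv1.JobArgs{
			Spec: &batchv1.JobSpecArgs{
				BackoffLimit: pulumi.Int(1),
				Template: &corev1.PodTemplateSpecArgs{
					Spec: &corev1.PodSpecArgs{
						RestartPolicy: pulumi.String("Never"),
						Containers: corev1.ContainerArray{
							corev1.ContainerArgs{
								Name:  pulumi.String("migrate"),
								Image: pulumi.String("example.com/app-migrations:" + version),
							},
						},
					},
				},
			},
		})
		return err
	})
}
```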
This is simple enough that we can probably merge now. I'm fully confident we'll find more bugs, but I think we're well within our risk tolerance here.
I left a couple of comments. The biggest thing missing is that error reporting is not great in interactive mode. We can follow up on that, though.
Fixes #449.
TODO:
Here's how things look at this point:
Failed Job: (screenshot)
Successful Job: (screenshot)