Improve kube deploy process. #13397
Conversation
```yaml
            secretKeyRef:
              name: airbyte-secrets
              key: DATABASE_USER
  ttlSecondsAfterFinished: 5
```
As mentioned in the PR description, this is necessary to avoid an error about the airbyte-bootloader job being immutable. This isn't an ideal solution, because it means the bootloader pod is deleted after it completes, making its logs inaccessible through kube.

This was the only solution I could come up with that still allowed us to use `kubectl apply` without issue, though, so it may be a worthwhile tradeoff.

Definitely open to feedback here if there are any other options I haven't considered.
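For reference, a minimal sketch of the relevant part of the bootloader Job manifest (abbreviated; the container name and image fields are illustrative, not copied from the actual manifest):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: airbyte-bootloader
spec:
  # Delete the Job (and its pod) shortly after it finishes, so that a later
  # `kubectl apply` does not fail on the Job's immutable fields.
  # Tradeoff: the bootloader's logs are no longer retrievable via kubectl.
  ttlSecondsAfterFinished: 5
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: airbyte-bootloader-container  # illustrative
          image: airbyte/bootloader           # illustrative
```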
can you add a comment explaining why we do this, please?
Sure thing, done!
nice!
```diff
-apiVersion: v1
-kind: Pod
+apiVersion: batch/v1
+kind: Job
```
Kube jobs only have best-effort parallelism guarantees, which is why I don't really like using them for crucial workflows. Took a look and confirmed this is probably the best way of doing this with Kustomize. Can we add a comment here that we generally want to use `Pod` (our Helm charts use `Pod`) for the best exactly-once execution guarantees, and cannot do so because Kustomize does not support `generateName`? Want to prevent confusion in the future.

If Kustomize did support `generateName`, we should be able to instruct users to run `kubectl create` on initial create and `kubectl replace` on subsequent runs.
This happens relatively infrequently so risk is low. In the long term, I think we'll consolidate the Kube deploys into Helm so I think this is fine for now.
I appreciate the thoroughness and the detailed PR description. One note to explain why we are using a job here. Otherwise looks good!
* master: (142 commits)
  - Highlight removed and added streams in Connection form (airbytehq#13392)
  - 🐛 Source Amplitude: Fixed JSON Validator `date-time` validation (airbytehq#13373)
  - 🐛 Source Mixpanel: publish v0.1.17 (airbytehq#13450)
  - Fixed reverted PR: Fix cancel button when it doesn't provide feedback to the user + UX improvements (airbytehq#13388)
  - 🎉 Source Freshdesk: Added new streams (airbytehq#13332)
  - Prepare YamlSeedConfigPersistence for dependency injection (airbytehq#13384)
  - helm chart: Support nodeSelector, tolerations and affinity on the booloader pod (airbytehq#11467)
  - airbyte-api: add jackson model annotations to remove null values from responses (airbytehq#13370)
  - Change stage to `beta` (airbytehq#13422)
  - 🐛 Source Google Sheets: Retry on server errors (airbytehq#13446)
  - Improve kube deploy process. (airbytehq#13397)
  - Helm chart dependencies fix (airbytehq#13432)
  - 🐛 Source HubSpot: Transform `contact_lists` data to comply with schema (airbytehq#13218)
  - airbytehq#11758: Source Google Ads to GA (airbytehq#13441)
  - Add more pr actions to tag pull requests (airbytehq#13437)
  - Source Google Ads: drop schema field that filters out the data from stream (airbytehq#13423)
  - Updates error view with new design (airbytehq#13197)
  - Source MSSQL: correct enum Standard method (airbytehq#13419)
  - Update postgres doc about cdc publication (airbytehq#13433)
  - run source acceptance tests against image built from branch (airbytehq#13401)
  - ...
What
Resolves #13144
As the issue linked above describes, a user who performs a rolling update of a kube deployment of Airbyte may run into multiple issues: it may throw an error saying that the bootloader pod cannot be edited, and if a new db pod starts up it could permanently break the underlying db. Even if users follow the upgrade instructions in our docs, i.e. upgrade in a non-rolling fashion, they could still hit both issues depending on how quickly they execute the commands.

This PR attempts to fix both issues.
How
I tried a few strategies to fix the bootloader problem:

1. Using `generateName` instead of `name` so that the bootloader pod would always have a unique name. This didn't work because `kubectl apply` cannot be used on a kube resource that does not have a `name` field, and `kubectl apply` is what our docs currently instruct users to use and may be important in the future for rolling deploys.
2. Using a `Deployment` instead of a `Pod`. This was bad because it caused the bootloader to be run repeatedly; Deployment is not the right resource type to use for a one-time process.
3. Using a `Job` instead of a `Pod`. This still had the issue where running `kubectl apply` with some changes to env variables would throw this error:
4. Adding `ttlSecondsAfterFinished` to the bootloader job, so that the bootloader pod would be automatically deleted after it completes. This fixed the above issue and allowed me to use `kubectl apply` to freely switch between `stable` and `dev` (as long as I waited for the bootloader pod to be automatically deleted).

For the db pod issue, adding the `Recreate` strategy to the db deployment manifest seems to have fully fixed the issue, as it causes kube to first terminate the existing db pod before spinning up a new one.

Recommended reading order
any
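The db fix described in the How section amounts to a one-line change on the db Deployment; a sketch (metadata name is illustrative, selector/template omitted for brevity):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-db  # illustrative
spec:
  strategy:
    # Recreate: terminate the old db pod before starting the new one, so two
    # db pods never run against the same data volume at the same time.
    type: Recreate
  # selector/template omitted for brevity
```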