
Add support for PodDisruptionBudget #308

Merged: 2 commits into 3scale:master from INTLY-4681-add-pdb on Jan 29, 2020

Conversation

@wei-lee (Contributor) commented Jan 22, 2020

WHAT

This PR adds support for PodDisruptionBudget to the 3scale operator. See JIRA.

HOW

A new feature flag is introduced to the APIManager CRD to allow users to enable this feature. An example can be found here.
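For illustration only, here is a minimal sketch of how such a flag might look in the APIManager Go API types. The field and type names below are assumptions for illustration; the PR defines the actual ones.

// Illustrative sketch only; the PR defines the actual field and type names.
type APIManagerSpec struct {
	// ... existing APIManager fields ...

	// PodDisruptionBudget, when Enabled is true, makes the operator create
	// a PodDisruptionBudget for each scalable 3scale component.
	PodDisruptionBudget *PodDisruptionBudgetSpec `json:"podDisruptionBudget,omitempty"`
}

type PodDisruptionBudgetSpec struct {
	Enabled bool `json:"enabled,omitempty"`
}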

When enabled, the following PDBs are created:

  • apicast-production
  • apicast-staging
  • backend-cron
  • backend-listener
  • backend-worker
  • system-app
  • system-sidekiq
  • zync
  • zync-que

All PDBs have the MaxUnavailable field set to 1. This ensures that when a node is being drained, only 1 pod per component is allowed to be evicted at a time. We do not update/delete these PDBs even if only 1 instance of a component is deployed, as that does not violate the PDB rules. All the components can still be scaled up and down as normal: scaling counts against the PDB rules, but PDBs do not stop replica controllers from scaling pods down.
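For reference, here is a minimal sketch in Go of how one of these PDBs could be built with the policy/v1beta1 API that was current at the time. The helper name and the deploymentConfig selector label are assumptions for illustration, not necessarily the PR's exact code.

import (
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newComponentPDB is an illustrative helper: it builds a PodDisruptionBudget
// that selects a component's pods and allows at most one of them to be
// unavailable due to voluntary disruptions such as a node drain.
func newComponentPDB(component, namespace string) *policyv1beta1.PodDisruptionBudget {
	maxUnavailable := intstr.FromInt(1)
	return &policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      component, // e.g. "apicast-production"
			Namespace: namespace,
		},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			// Assumption: the 3scale pods carry a "deploymentConfig" label
			// whose value is the component name.
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"deploymentConfig": component},
			},
			MaxUnavailable: &maxUnavailable,
		},
	}
}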

Verification

To verify:

  1. Follow the instructions here to build the operator and deploy it to an OpenShift 4 cluster. You can use this image quay.io/wei_lee/3scale-operator:test.
  2. Create a new APIManager CR using this example.
  3. Wait for the deployments of all the components to start.
  4. List all the PDBs in the namespace; you should see 9 of them. You can also check their contents to make sure they have the right spec (a programmatic version of this check is sketched after this list).
  5. Try scaling down some of the components; you should still be able to do so.
  6. Delete the APIManager CR that was created and make sure all the PDBs are deleted as well.
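For step 4, oc get pdb in the namespace is enough; alternatively, here is a minimal, self-contained sketch of the same check done programmatically. It assumes a pre-0.18 client-go (matching the operator-sdk 0.8.0 toolchain, where List takes no context argument), and the namespace name "3scale" is a placeholder for the one used in step 2.

package main

import (
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes KUBECONFIG points at the cluster running the APIManager.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pdbs, err := client.PolicyV1beta1().PodDisruptionBudgets("3scale").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d PDBs (expected 9)\n", len(pdbs.Items))
	for _, pdb := range pdbs.Items {
		if pdb.Spec.MaxUnavailable != nil {
			fmt.Printf("  %s: maxUnavailable=%s\n", pdb.Name, pdb.Spec.MaxUnavailable.String())
		}
	}
}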

@wei-lee requested a review from eguzki on January 22, 2020 10:53
@wei-lee changed the title from "Intly 4681 add pdb" to "Add support for PodDisruptionBudget" on Jan 22, 2020
@wei-lee force-pushed the INTLY-4681-add-pdb branch from 056e6a9 to d2f46b6 on January 27, 2020 09:51
@miguelsorianod (Contributor) commented:

Hi @wei-lee,

Thank you for the contribution.

I've been trying the PodDisruptionBudget, and the behaviour as I understand it is the following:

  • As you commented, when the feature is activated this PR ensures that, when draining a node, at most one pod at a time will be evicted from the node for each component. It is important to note that if the node to drain has only one pod for a component, the PodDisruptionBudget still complies with the rules, as MaxUnavailable is set to 1. This means that this functionality does not prevent a node from being left without a pod of a component; what it does is make the drain move all of the node's pods to another node one component pod at a time. That's the intent of this PR, right? Not preventing a node from being left without a pod of a component.
  • As you also commented, the PodDisruptionBudget stats are taken into account when scaling a DeploymentConfig/Deployment up/down or performing some kind of rolling update (for example, when you change the PodTemplate spec and this triggers a new deployment), but they are not enforced in that case, so we should be able to make changes to the Deployment and scale up/down without problems even if the PodDisruptionBudget stats indicate that it shouldn't be possible to evict.

As I understand it, apart from enabling the PodDisruptionBudget field, to really use this functionality the number of replicas of the different components has to be configured to more than one. Otherwise downtime is not prevented.

We are currently reviewing the code and we'll get back to you with feedback about it.

Thank you.

@wei-lee (Contributor, Author) commented Jan 27, 2020

@miguelsorianod

I've been trying the PodDisruptionBudget, and the behaviour as I understand it is the following:

  • As you commented, when the feature is activated this PR ensures that, when draining a node, at most one pod at a time will be evicted from the node for each component. It is important to note that if the node to drain has only one pod for a component, the PodDisruptionBudget still complies with the rules, as MaxUnavailable is set to 1. This means that this functionality does not prevent a node from being left without a pod of a component; what it does is make the drain move all of the node's pods to another node one component pod at a time. That's the intent of this PR, right? Not preventing a node from being left without a pod of a component.

Yes, that's right.

  • As you also commented, the PodDisruptionBudget stats are taken into account when scaling a DeploymentConfig/Deployment up/down or performing some kind of rolling update (for example, when you change the PodTemplate spec and this triggers a new deployment), but they are not enforced in that case, so we should be able to make changes to the Deployment and scale up/down without problems even if the PodDisruptionBudget stats indicate that it shouldn't be possible to evict.

That's right too.

As I understand it, apart from enabling the PodDisruptionBudget field, to really use this functionality the number of replicas of the different components has to be configured to more than one. Otherwise downtime is not prevented.

Yes, right again.


Thank you for reviewing!

@eguzki (Member) commented Jan 27, 2020

Good Job 🏅

The PR introduces lots of changes. Reviewing will continue.

@eguzki (Member) commented Jan 27, 2020

Feel free to ask anything about the code structure. Happy to answer.

@eguzki (Member) commented Jan 28, 2020

No PDB has been created for system-sphinx deploymentconfig. Any reason not to do it or just missed it?

@wei-lee (Contributor, Author) commented Jan 28, 2020

No PDB has been created for system-sphinx deploymentconfig. Any reason not to do it or just missed it?

@eguzki By the look of this code, it seems system-sphinx is not scalable. That's why I didn't create a PDB for it, as it wouldn't make any difference. Should it be scalable too?

@eguzki (Member) commented Jan 28, 2020

@eguzki By the look of this code, it seems system-sphinx is not scalable. That's why I didn't create a PDB for it, as it wouldn't make any difference. Should it be scalable too?

@wei-lee Fair enough. To be honest we do not even know if it is scalable. My bet is that it is actually scalable. Furthermore, the operator creates a service for it.

For this PR, we are not creating a PDB for system-sphinx, but we have created an action point to ask the system team about it and, in case it is scalable, include a PDB for it in some other PR.

@eguzki (Member) commented Jan 28, 2020

Looking so good, congrats @wei-lee 🏅

@wei-lee (Contributor, Author) commented Jan 28, 2020

@miguelsorianod @eguzki Thanks for the review, I really appreciate it. I think I have addressed all your comments now; do you mind taking another look and letting me know if there is anything else you want changed?

Thanks a lot!

@wei-lee (Contributor, Author) commented Jan 28, 2020

Forgot to mention that I have also deployed the 3scale operator to an OpenShift cluster and it works fine.

@miguelsorianod (Contributor) commented:

Functionality-wise it looks good to me 👍. I've also deployed it in my local cluster, tried enabling and disabling PodDisruptionBudget, and it works correctly too 👍

@miguelsorianod (Contributor) left a review comment:

I think this can almost be merged. The only things pending are the go.sum changes and the generated openapi changes that we left you feedback about.

Also, before merging the PR, can you clean up the commit history?
We usually have one or more commits for the code functionality (separated into "logical" commits) and a separate commit with the generated templates, which we usually name "generator: update templates".

Thank you.

@wei-lee force-pushed the INTLY-4681-add-pdb branch 2 times, most recently from 4e56164 to c7824fc on January 28, 2020 17:14
@wei-lee (Contributor, Author) commented Jan 28, 2020

@miguelsorianod I have now reverted changes to the go.sum file and cleaned up the commit history.

I also regenerated the files using version 0.8.0 of the operator-sdk, by running operator-sdk generate k8s and operator-sdk generate openapi. However, the generated files are still different from master and the crd tests are failing. Do you have any idea where I went wrong? Am I using the wrong version of the operator-sdk?

@miguelsorianod (Contributor) commented:

Hi @wei-lee,

The changes in the pkg/apis/apps/v1alpha1/zz_generated.deepcopy.go and pkg/apis/apps/v1alpha1/zz_generated.openapi.go files are expected and correct now: they are due to the addition of the PodDisruptionBudget changes, so it's ok in that regard 👍.

About the deploy/crds/capabilities* files, please exclude those changes from the commits. Those changes happen because some fields in the CRDs are not generated by the operator-sdk's openapi generator tool, but we wanted to include them to offer security at deploy time.
The same happens with some of the fields in 'deploy/crds/apps_v1alpha1_apimanager_crd.yaml'. From there we only want the changes related to the PodDisruptionBudget addition; please do not add the changes that remove some parts of the configurationSecretRef and awsSecretCredentials fields. We know this is unfortunate, but the openapi tool does not seem to include those fields and we do not know how to make it include them (at least in the version used in operator-sdk 0.8.0).

@wei-lee (Contributor, Author) commented Jan 28, 2020

@miguelsorianod understood. Thanks for the detailed explanation. I will make the changes you suggested.

@wei-lee force-pushed the INTLY-4681-add-pdb branch 3 times, most recently from 9ce30d3 to e8a4557 on January 28, 2020 18:56
@wei-lee (Contributor, Author) commented Jan 28, 2020

@miguelsorianod ok, all issues are addressed now. Please let me know if there is anything else you need.

Thanks for the help! 👍

@miguelsorianod (Contributor) commented:

Hi @wei-lee,

The final state of the changes looks good now 👍

Regarding the commit history, I think the commits do not correspond with the changes. In the update-templates commit I do not see the amp-ha template changes, which should be the only file in that commit.
I also see, included in the commits, some changes related to go.sum and some of the changes to the generated deepcopy files that we talked about; those shouldn't be added in any commit. Can you fix that? Thank you.

@wei-lee (Contributor, Author) commented Jan 29, 2020

@miguelsorianod is it ok if I just merge the 2 commits into 1? The commits are already squashed, so it will be tricky to reorganise them again.

@eguzki (Member) commented Jan 29, 2020

@wei-lee this is one of my favorite git tricks.

Merge both commits into one, and then split it as you wish.

Split commit in two

https://emmanuelbernard.com/blog/2014/04/14/split-a-commit-in-two-with-git/

git rebase -i <oldsha1>~1
# mark the expected commit as `edit` (replace pick in front of the line), save and close
git reset HEAD^          # undo the commit, keeping its changes unstaged
git add ...              # stage the files that belong to the first commit
git commit -m "First part"
git add ...              # stage the remaining files
git commit -m "Second part"
git rebase --continue    # finish the rebase

@wei-lee (Contributor, Author) commented Jan 29, 2020

@eguzki thanks for the trick. That's really useful. Will try that now.

@wei-lee force-pushed the INTLY-4681-add-pdb branch from e8a4557 to 1e10de5 on January 29, 2020 11:53
@codeclimate bot commented Jan 29, 2020

Code Climate has analyzed commit 1e10de5 and detected 46 issues on this pull request.

Here's the issue category breakdown:

  • Complexity: 5
  • Duplication: 17
  • Style: 24

View more on Code Climate.

@wei-lee (Contributor, Author) commented Jan 29, 2020

@miguelsorianod thanks again @eguzki for the trick, it was easier than I thought. It should all be ok now.

@miguelsorianod (Contributor) left a review comment:

It looks good to me now, and in my view it can be merged. Good job 👍

Thank you very much, we appreciate your contribution and work on this @wei-lee.

@miguelsorianod merged commit 0e2f40d into 3scale:master Jan 29, 2020