OPNET-133: Support remote worker #858

Merged
merged 1 commit into from
Dec 19, 2022

@tsorya tsorya commented Nov 15, 2022

Adds a node affinity rule to the router deployment that prevents the
router from running on a remote worker node (a node with a custom role label).

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 15, 2022

openshift-ci bot commented Nov 15, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@Miciah Miciah left a comment

You'll also need to update hashableDeployment and deploymentConfigChanged. Please also add a unit test in TestDeploymentConfigChanged.

Isn't having remote worker nodes already a supported feature? Is the "node-role.kubernetes.io/remote-worker" label new? This change implies that running router pods on nodes with that label is strictly prohibited, but if it was possible to do it before, then it would be a regression to prohibit it now.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2022
@tsorya tsorya changed the title WIP: Don't run router on remote worker OPNET-133: Support remote worker Nov 30, 2022
@tsorya tsorya marked this pull request as ready for review November 30, 2022 15:08
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 30, 2022

tsorya commented Nov 30, 2022

It was not possible previously, as we don't support remote workers at all for now.

@tsorya tsorya force-pushed the igal/remote-worker branch 2 times, most recently from e614a77 to e1a1e31 Compare December 1, 2022 14:02

@Miciah Miciah left a comment

I have some minor suggestions or questions around comments. Looks fine overall.

pkg/operator/controller/ingress/deployment.go (outdated)
@@ -592,6 +592,21 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
nodeSelector["node-role.kubernetes.io/master"] = ""
default:
nodeSelector["node-role.kubernetes.io/worker"] = ""
// Disabling running on remote workers

Suggested change
- // Disabling running on remote workers
+ // Disabling running on remote workers.

Is there any intention to allow scheduling routers on remote workers in some future release?

@@ -40,6 +40,8 @@ const (
// DefaultCanaryNamespace is the default namespace for
// the ingress canary check resources.
DefaultCanaryNamespace = "openshift-ingress-canary"

RemoteWorkerLabel = "node.openshift.io/remote-worker"

Would you mind adding godoc for this definition?

Defining the label here is fine as a temporary measure, but will it eventually be defined in openshift/api or the like?

Contributor Author

I hope so. For now we just want to start using it, and there is no real need to add it to openshift-api right away.
But yes, once usage starts to increase, we will move it to openshift-api.
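As an illustration of the godoc request in this thread, the constant could be documented roughly as below. The comment wording is a guess based on the PR description, not the text that was merged, and `requirement` and `remoteWorkerExclusion` are hypothetical helpers standing in for the Kubernetes node selector types.

```go
package main

import "fmt"

// RemoteWorkerLabel is the label key that marks a node as a remote worker.
// Router pods are kept off nodes that carry this label.
// (Comment wording is illustrative, not the merged godoc.)
const RemoteWorkerLabel = "node.openshift.io/remote-worker"

// requirement is a simplified, hypothetical stand-in for the Kubernetes
// core/v1 NodeSelectorRequirement type.
type requirement struct {
	Key      string
	Operator string
	Values   []string
}

// remoteWorkerExclusion returns the match expression seen in the deployment:
// the remote-worker label must not be present with an empty value.
func remoteWorkerExclusion() requirement {
	return requirement{
		Key:      RemoteWorkerLabel,
		Operator: "NotIn",
		Values:   []string{""},
	}
}

func main() {
	r := remoteWorkerExclusion()
	fmt.Printf("%s %s %q\n", r.Key, r.Operator, r.Values)
}
```

Keeping the key in one named constant means a later move to a shared API package would only need to change the definition site.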

Adding node affinity rule to router deployment that will not allow
router to run on remote worker node (node with custom role label)

Miciah commented Dec 7, 2022

Thanks!
/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2022

openshift-ci bot commented Dec 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2022

tsorya commented Dec 7, 2022

/retest


tsorya commented Dec 8, 2022

/retest-required


tsorya commented Dec 8, 2022

/retest-required


tsorya commented Dec 8, 2022

/retest-required


Miciah commented Dec 8, 2022

e2e-aws-operator failed on deprovisioning, which is a known problem with CI on AWS. Because the job failed due to a known issue that is unrelated to this PR and because the e2e-azure-operator and e2e-gcp-operator jobs succeeded, I'm overriding the e2e-aws-operator job.
/override ci/prow/e2e-aws-operator


openshift-ci bot commented Dec 8, 2022

@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


openshift-ci bot commented Dec 8, 2022

@tsorya: all tests passed!


@lihongan

/label qe-approved
Followed pre-merge testing and tested with cluster-bot (launch openshift/cluster-ingress-operator#858); it passed. The new nodeAffinity is added to the router-default deployment:

$ oc -n openshift-ingress get deployment/router-default -oyaml
---
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.openshift.io/remote-worker
                operator: NotIn
                values:
                - ""

After adding the label node.openshift.io/remote-worker= to one of the workers and scaling ingresscontroller/default to 3, one of the router pods cannot be scheduled to that node:

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
router-default-574bd8cd79-6vzfm   0/1     Pending   0          4m40s   <none>        <none>                                      <none>           <none>
router-default-574bd8cd79-mgd9b   1/1     Running   0          49m     10.131.0.6    ip-10-0-133-12.us-east-2.compute.internal   <none>           <none>
router-default-574bd8cd79-xv992   1/1     Running   0          6m42s   10.129.2.17   ip-10-0-128-53.us-east-2.compute.internal   <none>           <none>
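
The Pending pod above follows from how the `NotIn` operator is evaluated against node labels. A minimal sketch of that rule, written from the documented Kubernetes semantics rather than taken from the scheduler's code (`matchesNotIn` is a hypothetical helper name):

```go
package main

import "fmt"

// matchesNotIn applies the NotIn rule to a node's labels: the node is
// eligible when the key is absent, or when its value is not in values.
func matchesNotIn(nodeLabels map[string]string, key string, values []string) bool {
	v, ok := nodeLabels[key]
	if !ok {
		// Nodes without the label always satisfy NotIn.
		return true
	}
	for _, excluded := range values {
		if v == excluded {
			return false
		}
	}
	return true
}

func main() {
	const key = "node.openshift.io/remote-worker"
	excluded := []string{""}

	regular := map[string]string{"node-role.kubernetes.io/worker": ""}
	remote := map[string]string{"node-role.kubernetes.io/worker": "", key: ""}

	fmt.Println(matchesNotIn(regular, key, excluded)) // true: label absent
	fmt.Println(matchesNotIn(remote, key, excluded))  // false: label present with empty value
}
```

Under this rule, a node carrying the label with a non-empty value would still be eligible, since only the empty string is listed; the thread does not discuss that case.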

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Dec 14, 2022
@danmacpherson

/label docs-approved

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Dec 15, 2022
@CFields651

/label px-approved

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Dec 19, 2022
@openshift-merge-robot openshift-merge-robot merged commit 560b2d3 into openshift:master Dec 19, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR

6 participants