Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add affinity customization to catalog source grpc pods #279

Merged

Conversation

Jamstah
Copy link
Contributor

@Jamstah Jamstah commented Mar 5, 2023

As we move closer to multi-architecture clusters being a thing, we'd like a way to mark our catalog source pods for the architectures they have been built on, which tends to be the architectures the operators support. This isn't possible with node selectors because they can only handle one label value, so making this PR to add affinity.

We could just add the node affinity section, but it made sense to me to do it at the same level as the other pod configuration options.

Once merged, this would need to be followed by the change in OLM to use the new field. Code for that is here: operator-framework/operator-lifecycle-manager@master...Jamstah:operator-lifecycle-manager:catalog-source-affinity

Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR makes sense based on what you described in the commit. However, I assume this is the first PR to simply update the catalogsource API and then another PR will be opened on OLM side to actually add the Affinity to the catalog pods?

@Jamstah
Copy link
Contributor Author

Jamstah commented Mar 6, 2023

The PR makes sense based on what you described in the commit. However, I assume this is the first PR to simply update the catalogsource API and then another PR will be opened on OLM side to actually add the Affinity to the catalog pods?

Yes, see here for the tested code: operator-framework/operator-lifecycle-manager@master...Jamstah:operator-lifecycle-manager:catalog-source-affinity

Didn't open as a pr because it has modified vendor code, when this one merges i can open that cleanly.

@Jamstah
Copy link
Contributor Author

Jamstah commented Mar 13, 2023

@dinhxuanvu I'd like to make sure this one gets into 4.14 because I believe that's where multi-arch clusters will come in. Is there something that needs doing before test/merge?

@dinhxuanvu
Copy link
Member

@awgreene PTAL

Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
@Jamstah
Copy link
Contributor Author

Jamstah commented Mar 14, 2023

I'm not sure how my make generate modified that comment, but rerunning make generate put it back and now make verify is clean, so tests should pass.

@Jamstah
Copy link
Contributor Author

Jamstah commented Apr 4, 2023

@awgreene @dinhxuanvu what are the next steps here?

@dinhxuanvu
Copy link
Member

@Jamstah The OLM team needs to review and merge this PR.
@awgreene PTAL

@jaypoulz
Copy link
Contributor

I spoke with the OLM folks today about this PR. The team would like to minimize the number of changes being made to this API. While we're thankful for this contribution, we'd like to encourage you to explore what's already possible between nodeSelector and taints/tolerances. While the solution might not be as elegant, you should already have sufficient control over the scheduling of the catalog pod.

CC @joelanford @awgreene @kevinrizza for further comment.

@Jamstah
Copy link
Contributor Author

Jamstah commented Apr 12, 2023

With what is available it is not possible to take a catalog that works for exactly two architectures and only allow it to run on those two architectures without changing the labelling of nodes specifically for those two architectures.

What exists today is not a complete solution. Is there a reason not to extend the API?

@jaypoulz
Copy link
Contributor

I'm not on the team, so I cannot explain the underlying reasons. One of the folks linked above can speak to that better.

However, I do know the team is fully engaged in building the next generation of OLM, so I suspect this is a resource-constraint decision to invest in the replacement rather than diverting attention to the thing being replaced. This is speculation on my part, so I defer to the wisdom of my peers mentioned above.

@Jamstah
Copy link
Contributor Author

Jamstah commented Apr 14, 2023

That is possible, but until the next generation of olm is released and we have adopted it, there are going to be things we need in the current generation. This request is based on a change in the ecosystem, specifically open shift starting to support multi-arch clusters.

As I am contributing the code and tests, I would hope that any resource constraint issue is less pronounced.

@Jamstah
Copy link
Contributor Author

Jamstah commented Apr 24, 2023

With what is available it is not possible to take a catalog that works for exactly two architectures and only allow it to run on those two architectures without changing the labeling of nodes specifically for those two architectures.

I'm still waiting to hear what the suggested solution to this problem is without this change to the API - can anyone help?

If this change is a definite no from the team, I'd like to hear it.

@dinhxuanvu
Copy link
Member

I understand the desire to minimize the API changes to maintain the stability. However, I do not see this change as a breaking change given it is optional and doesn't require a API version bump. I think accepting this change has minimal impact on API stability. Plus, while nodeSelection is the similar concept with Affinity, Affinity has more control over selection logic that will be helpful for James' use case.

@awgreene
Copy link
Member

awgreene commented May 3, 2023

Yeesh sorry about the delay @Jamstah, I'll review today.

@awgreene
Copy link
Member

awgreene commented May 4, 2023

/approve

@openshift-ci
Copy link

openshift-ci bot commented May 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, Jamstah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2023
@awgreene
Copy link
Member

awgreene commented May 4, 2023

/hold

I'd like to make sure that @joelanford reviews and lgtm/unholds this PR as I know there was a discussion on multi-arch recently that I wasn't able to attend.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 4, 2023
@awgreene
Copy link
Member

awgreene commented May 4, 2023

Checked with Joe, got the go-ahead
/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 4, 2023
@Jamstah
Copy link
Contributor Author

Jamstah commented May 4, 2023

Awesome, thanks. Once it merges I'll make the PR for olm to use it.

@joelanford
Copy link
Member

/lgtm
/unhold

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 4, 2023
@joelanford joelanford merged commit ce75af3 into operator-framework:master May 4, 2023
@Jamstah
Copy link
Contributor Author

Jamstah commented May 5, 2023

Follow on PR for OLM: operator-framework/operator-lifecycle-manager#2963

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants