
Use library-go ApplyDeployment and ApplyDaemonset #2882

Closed

Conversation

Contributor

@mkenigs mkenigs commented Dec 16, 2021

Use library-go ApplyDeployment and ApplyDaemonset

Our current functions in ./lib are out of date, and we should be using
library-go instead.

The library-go functions require passing the generation of the
previously applied deployment/daemonset, which should eventually be
stored on an MCO CRD. Because no such CRD exists yet, the object's
existing generation is passed as the expected generation. This is no
worse than how the MCO's current functions behave, so the library-go
functions should still be adopted until an MCO CRD is created.

Files containing the now-unused functions and their helpers were
removed:
resourceapply/apps.go
resourcemerge/{apps.go,apps_test.go,core.go,core_test.go}

Helps #819
Depends on #2833
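
For reference, a minimal sketch of the pattern this PR adopts, assuming library-go's resourceapply helpers. The Operator struct, field names, and the exact ApplyDeployment signature are illustrative and depend on the library-go version vendored:

package operator

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/client-go/kubernetes"

    "github.com/openshift/library-go/pkg/operator/events"
    "github.com/openshift/library-go/pkg/operator/resource/resourceapply"
)

// Operator is a stand-in for the MCO operator struct; deploymentGeneration
// holds the last generation we applied and is lost when the process restarts.
type Operator struct {
    kubeClient           kubernetes.Interface
    eventRecorder        events.Recorder
    deploymentGeneration int64
}

func (optr *Operator) syncDeployment(ctx context.Context, required *appsv1.Deployment) error {
    // ApplyDeployment updates the live object roughly when the required spec
    // differs or the live generation no longer matches the expected one.
    deployment, updated, err := resourceapply.ApplyDeployment(
        ctx, optr.kubeClient.AppsV1(), optr.eventRecorder, required, optr.deploymentGeneration)
    if err != nil {
        return err
    }
    if updated {
        // Remember what we applied so the next sync can pass it back in.
        optr.deploymentGeneration = deployment.Generation
    }
    return nil
}

ApplyDaemonSet follows the same shape, with a separately tracked daemonsetGeneration.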

Contributor Author

mkenigs commented Dec 16, 2021

@deads2k does this look right?

Contributor Author

mkenigs commented Dec 16, 2021

/cc @yuqi-zhang

@mkenigs mkenigs changed the title Use library-go ApplyDeployment and ApplyDaemonset Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset Jan 4, 2022
Contributor

openshift-ci bot commented Jan 4, 2022

@mkenigs: An error was encountered querying GitHub for users with public email (rioliu@redhat.com) for bug 2034364 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2022
Contributor

openshift-ci bot commented Jan 7, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkenigs
To complete the pull request process, please assign kikisdeliveryservice after the PR has been reviewed.
You can assign the PR to them by writing /assign @kikisdeliveryservice in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 10, 2022
@openshift-ci openshift-ci bot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jan 10, 2022
Contributor

openshift-ci bot commented Jan 10, 2022

@mkenigs: This pull request references Bugzilla bug 2034364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

if err != nil {
    return err
}
if updated {
    optr.daemonsetGeneration = daemonset.Generation
Contributor

does this get persisted so when you restart you have it?

Contributor Author

I believe it lasts as long as the MCO process. Are you saying it needs to persist beyond that?

if err != nil {
    return err
}
if updated {
    optr.deploymentGeneration = deployment.Generation
Contributor

does this get persisted so when you restart you have it?

Contributor

deads2k commented Jan 10, 2022

just the question about persistence, to ensure that the right thing happens on restart. Looks good otherwise.

Contributor

deads2k commented Jan 10, 2022

example from openshiftapiserver status:

status:
  generations:
  - group: apps
    hash: ""
    lastGeneration: 5
    name: apiserver
    namespace: openshift-apiserver
    resource: deployments
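
A hedged sketch of how that pattern is wired up in code, assuming library-go's resourcemerge helpers and an operator status that carries a generations list; the function names below are illustrative, and the helper signatures should be checked against the vendored library-go:

package operator

import (
    appsv1 "k8s.io/api/apps/v1"

    operatorv1 "github.com/openshift/api/operator/v1"
    "github.com/openshift/library-go/pkg/operator/resource/resourcemerge"
)

// expectedGenerationFor looks up the lastGeneration previously recorded for
// this deployment in status.generations; that value is what gets passed to
// ApplyDeployment as the expected generation.
func expectedGenerationFor(required *appsv1.Deployment, status *operatorv1.OperatorStatus) int64 {
    return resourcemerge.ExpectedDeploymentGeneration(required, status.Generations)
}

// recordGeneration writes the freshly applied deployment's generation back
// into status.generations so it survives operator restarts.
func recordGeneration(status *operatorv1.OperatorStatus, actual *appsv1.Deployment) {
    resourcemerge.SetDeploymentGeneration(&status.Generations, actual)
}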

@yuqi-zhang
Contributor

@mkenigs and I did some investigating on storing generations properly to use for apply functions, and this is what we've determined (please correct any mistakes):

Using the OpenShiftAPIServer object as an example (i.e. oc describe OpenShiftAPIServer/cluster), the generations are supposed to be stored on some object the operator manages.

The first issue is that the MCO has never had any object tracking daemonset/deployment generations like this.

The second issue is that the MCO doesn't actually have an equivalent object to track this with. The MCO creates the following CRDs:

  • Containerruntimeconfig/Kubeletconfig

  • MachineConfigs

  • MachineConfigPools

  • ControllerConfig

None of these CRDs is a direct equivalent of what OpenShiftAPIServer is in the above context. The closest is ControllerConfig, which is a cluster-level config used only by the MCO (perhaps the naming is not very apt): the operator pod uses it to read cluster objects (e.g. proxy, osimage) and populate them for the machine-config-controller to render into templates for the machine-config-daemons.

So with that in mind, our options seem to be:

  1. For now, store the generations we last acted on in the operator Go struct. This is the least optimal option, since every time the pod restarts the value would get reset, which triggers a forced apply (otherwise harmless); normal syncs would not trigger applies
  2. Every time we want to apply, read the observedGeneration of the object first, because as far as I understand that is basically what the library-go functions are asking for: the last generation we acted upon. If this is an OK path, we wouldn't have to rework any object to store generations data
  3. Combine 1 and 2, so we read on pod start and then sync via the operator. This saves a read operation every sync (sketched below)
  4. Add daemonset/deployment data to ControllerConfig and store generations properly there, although that is not what the object was originally created to do
  5. Create a new CRD (OpenshiftMachineConfig...?) that, for now, just does the generations part of the OpenShiftAPIServer equivalent

Not sure which approach is best for our case.
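
A minimal sketch of option 3, continuing the Operator struct from the earlier sketch. It assumes that seeding from the live object's metadata.generation on startup is an acceptable approximation of "the last generation we acted on", which is exactly the open question in this thread; names are illustrative, not the MCO's actual code:

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// seedGenerations runs once at operator start so that the first sync does not
// force an apply just because the in-memory value was reset by the restart.
func (optr *Operator) seedGenerations(ctx context.Context) error {
    ds, err := optr.kubeClient.AppsV1().DaemonSets("openshift-machine-config-operator").
        Get(ctx, "machine-config-daemon", metav1.GetOptions{})
    switch {
    case apierrors.IsNotFound(err):
        optr.daemonsetGeneration = -1 // no live object yet; the first apply will create it
    case err != nil:
        return err
    default:
        optr.daemonsetGeneration = ds.Generation
    }
    return nil
}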

@cgwalters
Member

The closest is ControllerConfig which is a cluster level config but only used by the MCO (mayhaps naming is not very apt), and is used by the operator pod to read cluster objects (e.g. proxy, osimage) and populate for the machine-config-controller to render into templates for the machine-config-daemons.

Tangentially related, I keep thinking of ControllerConfig as basically a private channel between the operator pod and the controller pod. Which could go away entirely if they were in the same process.

Contributor

deads2k commented Jan 27, 2022

  • For now, store the generations we last acted on in the operator Go struct. This is the least optimal since every time the pod restarts, this would get reset which triggers a forced apply (which shouldn't cause any issues otherwise), but normal syncs would not trigger applies
  • Every time we want to apply, read the observedGeneration of the object first, because as far as I understand that is basically what the library-go functions are asking for: the last generation we acted upon. If this is an ok path, we wouldn't have to rework any object to store generations data
  • Combine 1 and 2 so we read on pod start and then sync via operator. This saves a read operation every sync
  • Add daemonset/deployment data to ControllerConfig and store generations properly there, although it is not what the object was originally created to do
  • Create a new CRD (OpenshiftMachineConfig...?) that for now, is just doing the generations part of the equivalent of OpenShiftAPIServer

I suggest creating a CRD that is, to start, the equivalent of openshiftapiserver.operator.openshift.io. This would also provide a spot for operand log level, operator log level, unsupported overrides, and detailed status for adding additional degraded loops.

Kicking deployments and daemonsets on every process restart isn't a state we should ship.
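
A hedged sketch of what that could look like if it followed the openshift/api operator/v1 conventions; the type name MachineConfigOperator is purely illustrative, not an agreed-on API:

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    operatorv1 "github.com/openshift/api/operator/v1"
)

// MachineConfigOperator is a hypothetical cluster-scoped operator config CRD.
type MachineConfigOperator struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MachineConfigOperatorSpec   `json:"spec"`
    Status MachineConfigOperatorStatus `json:"status,omitempty"`
}

// Embedding OperatorSpec brings logLevel, operatorLogLevel,
// unsupportedConfigOverrides, and managementState along for free.
type MachineConfigOperatorSpec struct {
    operatorv1.OperatorSpec `json:",inline"`
}

// Embedding OperatorStatus brings conditions and status.generations, the
// natural home for the last-applied deployment/daemonset generations
// discussed above.
type MachineConfigOperatorStatus struct {
    operatorv1.OperatorStatus `json:",inline"`
}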

@openshift-bot
Contributor

/bugzilla refresh

The requirements for Bugzilla bugs have changed (BZs linked to PRs on master branch need to target OCP 4.11), recalculating validity.

@openshift-ci openshift-ci bot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jan 28, 2022
Contributor

openshift-ci bot commented Jan 28, 2022

@openshift-bot: This pull request references Bugzilla bug 2034364, which is invalid:

  • expected the bug to target the "4.11.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

The requirements for Bugzilla bugs have changed (BZs linked to PRs on master branch need to target OCP 4.11), recalculating validity.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

openshift-ci bot commented Jan 31, 2022

@mkenigs: This pull request references Bugzilla bug 2034364, which is invalid:

  • expected the bug to target the "4.11.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 31, 2022
Contributor

openshift-ci bot commented Jan 31, 2022

@mkenigs: This pull request references Bugzilla bug 2034364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

openshift-ci bot commented Jan 31, 2022

@mkenigs: An error was encountered querying GitHub for users with public email (rioliu@redhat.com) for bug 2034364 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

wking commented Jan 31, 2022

/bugzilla refresh

Contributor

openshift-ci bot commented Jan 31, 2022

@wking: This pull request references Bugzilla bug 2034364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor Author

mkenigs commented Feb 1, 2022

/hold
Going to wait for https://issues.redhat.com/browse/MCO-163 to do this correctly

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 1, 2022
@kikisdeliveryservice kikisdeliveryservice changed the title Bug 2034364: Use library-go ApplyDeployment and ApplyDaemonset Use library-go ApplyDeployment and ApplyDaemonset Mar 7, 2022
@openshift-ci openshift-ci bot removed bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Mar 7, 2022
Contributor

openshift-ci bot commented Mar 7, 2022

@mkenigs: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Use library-go ApplyDeployment and ApplyDaemonset

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice
Contributor

/hold Going to wait for https://issues.redhat.com/browse/MCO-163 to do this correctly

Note: I removed the BZ since this is going to be worked on via the above Jira issue, and so that the underlying BZ can be closed based on the other two PRs that merged.

Contributor

openshift-ci bot commented Mar 24, 2022

@mkenigs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                     Commit   Required  Rerun command
ci/prow/e2e-metal-ipi                         e65917a  false     /test e2e-metal-ipi
ci/prow/e2e-aws-single-node                   e65917a  false     /test e2e-aws-single-node
ci/prow/e2e-ovn-step-registry                 e65917a  false     /test e2e-ovn-step-registry
ci/prow/e2e-aws-serial                        e65917a  false     /test e2e-aws-serial
ci/prow/e2e-aws-workers-rhel7                 e65917a  false     /test e2e-aws-workers-rhel7
ci/prow/e2e-gcp-op-single-node                e65917a  false     /test e2e-gcp-op-single-node
ci/prow/e2e-aws-disruptive                    e65917a  false     /test e2e-aws-disruptive
ci/prow/e2e-vsphere-upgrade                   e65917a  false     /test e2e-vsphere-upgrade
ci/prow/e2e-aws-workers-rhel8                 e65917a  false     /test e2e-aws-workers-rhel8
ci/prow/e2e-agnostic-upgrade                  e65917a  true      /test e2e-agnostic-upgrade
ci/prow/e2e-gcp-op                            e65917a  true      /test e2e-gcp-op
ci/prow/e2e-aws                               e65917a  true      /test e2e-aws
ci/prow/e2e-aws-upgrade-single-node           e65917a  false     /test e2e-aws-upgrade-single-node
ci/prow/okd-e2e-aws                           e65917a  false     /test okd-e2e-aws
ci/prow/4.12-upgrade-from-stable-4.11-images  e65917a  true      /test 4.12-upgrade-from-stable-4.11-images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2022
Contributor

openshift-ci bot commented Mar 24, 2022

@mkenigs: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2022
Contributor Author

mkenigs commented Jun 22, 2022

/lifecycle frozen

Contributor

openshift-ci bot commented Jun 22, 2022

@mkenigs: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 23, 2022
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Aug 22, 2022
Contributor

openshift-ci bot commented Aug 22, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
