Operator restructure #8

Merged: 2 commits, Dec 21, 2018

Conversation

mrogers950

Use the cluster-osin-operator repo layout and boilerplate (with minor modifications).
@openshift/sig-auth

@openshift-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Dec 19, 2018
@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 19, 2018
@mrogers950 force-pushed the refactor branch 2 times, most recently from 21832e9 to 53f0343, on December 19, 2018 22:51
@mrogers950

@enj I fixed the roles and disabled authn/authz delegation in the default config. I also reverted my changes to the boilerplate and used controller.New as you suggested.

- apiGroups:
  - ""
  resources:
  - pods

Are all these required because of the controller command's default behavior?

@enj

enj commented Dec 20, 2018

Going to tag because I have a feeling that changing #8 (comment) will require messing with the controller command bits.

/lgtm
/retest

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Dec 20, 2018
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

func NewMachineApproverOperatorCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "machine-approver",
		Short: "OpenShift osin OAuth server operator",

reminder to update.

Looks like you didn't fix this.

@mrogers950

/retest

@openshift-ci-robot removed the lgtm label (indicates that a PR is ready to be merged) on Dec 21, 2018
@enj

enj commented Dec 21, 2018

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Dec 21, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj, mrogers950


@openshift-merge-robot merged commit 78714da into openshift:master on Dec 21, 2018
@smarterclayton

I would have liked to see a lot more green runs before this merged, given that it was a clean controller. It looks like you got exactly 1 pass after failing 29 times in a row - that alone is an eyebrow raiser.

In general, whenever we rewrite / refactor a controller, please ensure you see multiple green e2e-aws runs before merging. If I see continued flakes in the next few days, and it looks like this controller is at fault, I will be reverting this (unless you can point to a different cause).

@@ -0,0 +1,131 @@
#!/bin/bash

Why do you have a build-rpms script?

Inert junk from SSCS.

@@ -0,0 +1,47 @@
# junitreport

You are no longer supposed to vendor this.

Junk from SSCS.

@@ -0,0 +1,28 @@
package operator

Why do you have a package called boilerplate?

It is my personal variation on the controller / operator sync loop. It is used today by the following operators:

  1. Alfred
  2. Console
  3. Authentication
  4. Here

My goal was to add it to library-go once I felt it was "good enough."

FROM registry.svc.ci.openshift.org/openshift/release:golang-1.10 AS builder
COPY . /go/src/github.com/openshift/cluster-machine-appover
RUN cd /go/src/github.com/openshift/cluster-machine-appover && go build -o machine-approver .
#

This is less consistent with our default pattern than the previous version (which was also not correct).

See https://github.com/openshift/cluster-version-operator/blob/master/Dockerfile for the standard format, you should only have to change the RUN line in the builder and insert your custom steps after the COPY line in the runtime image.

👍

@smarterclayton

W1221 23:01:41.804742       1 cmd.go:127] Using insecure, self-signed certificates
I1221 23:01:41.805108       1 crypto.go:459] Generating new CA for cluster-machine-approver-signer@1545433301 cert, and key in /tmp/serving-cert-143867860/serving-signer.crt, /tmp/serving-cert-143867860/serving-signer.key
I1221 23:01:42.028512       1 crypto.go:536] Generating server certificate in /tmp/serving-cert-143867860/tls.crt, key in /tmp/serving-cert-143867860/tls.key
I1221 23:01:42.288075       1 observer_polling.go:37] Adding reactor for file "/var/run/configmaps/config/operator-config.yaml"
I1221 23:01:42.288156       1 observer_polling.go:37] Adding reactor for file "/var/run/secrets/serving-cert/tls.crt"
I1221 23:01:42.288179       1 observer_polling.go:37] Adding reactor for file "/var/run/secrets/serving-cert/tls.key"
I1221 23:01:42.289040       1 observer_polling.go:96] Starting file observer
panic: unable to obtain replicaset reference for events: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-machine-approver/pods: dial tcp 127.0.0.1:6443: connect: connection refused

goroutine 1 [running]:
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerBuilder).Run(0xc420532240, 0xc4200e20f0, 0xc4200941e0, 0x3, 0x3)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/builder.go:155 +0xabd
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).StartController(0xc4205f8840, 0xc420094120, 0x247c9d8, 0x247c9d8)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:172 +0x557
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommand.func1(0xc4200d7b80, 0xc4206966e0, 0x0, 0x2)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:67 +0x265
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200d7b80, 0xc420696680, 0x2, 0x2, 0xc4200d7b80, 0xc420696680)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:760 +0x2c1
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4200d7900, 0x1, 0xc4200d7900, 0xc4200d7b80)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:846 +0x30a
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4200d7900, 0x181d990, 0x245b620)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:794 +0x2b
main.main()
	/go/src/github.com/openshift/cluster-machine-approver/cmd/machine-approver/main.go:29 +0x187

in
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/2708/artifacts/release-e2e-aws/pods/openshift-cluster-machine-approver_machine-approver-5bb5cb965d-28ljh_machine-approver-controller_previous.log.gz

After the restart it looks like you are renewing your lease incredibly often:

I1221 23:05:37.153030       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:39.231941       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:41.440786       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:43.456531       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:45.581277       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:47.651659       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1

Please fix that (and fix the library-go default if that's at fault). This controller doesn't even really need a lock.
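
As a rough check of the renewal cadence, the timestamps in the lease log above can be diffed. This is a minimal Go sketch, assuming klog-style lines whose second whitespace-separated field is the time of day; `renewIntervals` is a hypothetical helper written for illustration and is not part of library-go or client-go.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// renewIntervals parses the time-of-day field of klog-style lines
// ("I1221 23:05:37.153030 ...") and returns the gap between consecutive lines.
// It assumes all lines fall on the same day.
func renewIntervals(lines []string) ([]time.Duration, error) {
	var stamps []time.Time
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return nil, fmt.Errorf("unexpected log line: %q", line)
		}
		// fields[0] is severity plus date (I1221); fields[1] is the time of day.
		t, err := time.Parse("15:04:05.000000", fields[1])
		if err != nil {
			return nil, err
		}
		stamps = append(stamps, t)
	}
	var gaps []time.Duration
	for i := 1; i < len(stamps); i++ {
		gaps = append(gaps, stamps[i].Sub(stamps[i-1]))
	}
	return gaps, nil
}

func main() {
	// Prefixes of the six renewal lines quoted in the comment above.
	lines := []string{
		"I1221 23:05:37.153030       1 leaderelection.go:209] successfully renewed lease",
		"I1221 23:05:39.231941       1 leaderelection.go:209] successfully renewed lease",
		"I1221 23:05:41.440786       1 leaderelection.go:209] successfully renewed lease",
		"I1221 23:05:43.456531       1 leaderelection.go:209] successfully renewed lease",
		"I1221 23:05:45.581277       1 leaderelection.go:209] successfully renewed lease",
		"I1221 23:05:47.651659       1 leaderelection.go:209] successfully renewed lease",
	}
	gaps, err := renewIntervals(lines)
	if err != nil {
		panic(err)
	}
	for _, g := range gaps {
		fmt.Println(g)
	}
}
```

For the six lines quoted above, every gap comes out a little over two seconds.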

@enj

enj commented Dec 22, 2018

W1221 23:01:41.804742       1 cmd.go:127] Using insecure, self-signed certificates
I1221 23:01:41.805108       1 crypto.go:459] Generating new CA for cluster-machine-approver-signer@1545433301 cert, and key in /tmp/serving-cert-143867860/serving-signer.crt, /tmp/serving-cert-143867860/serving-signer.key
I1221 23:01:42.028512       1 crypto.go:536] Generating server certificate in /tmp/serving-cert-143867860/tls.crt, key in /tmp/serving-cert-143867860/tls.key
I1221 23:01:42.288075       1 observer_polling.go:37] Adding reactor for file "/var/run/configmaps/config/operator-config.yaml"
I1221 23:01:42.288156       1 observer_polling.go:37] Adding reactor for file "/var/run/secrets/serving-cert/tls.crt"
I1221 23:01:42.288179       1 observer_polling.go:37] Adding reactor for file "/var/run/secrets/serving-cert/tls.key"
I1221 23:01:42.289040       1 observer_polling.go:96] Starting file observer
panic: unable to obtain replicaset reference for events: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-machine-approver/pods: dial tcp 127.0.0.1:6443: connect: connection refused

goroutine 1 [running]:
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerBuilder).Run(0xc420532240, 0xc4200e20f0, 0xc4200941e0, 0x3, 0x3)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/builder.go:155 +0xabd
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).StartController(0xc4205f8840, 0xc420094120, 0x247c9d8, 0x247c9d8)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:172 +0x557
github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommand.func1(0xc4200d7b80, 0xc4206966e0, 0x0, 0x2)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/openshift/library-go/pkg/controller/controllercmd/cmd.go:67 +0x265
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200d7b80, 0xc420696680, 0x2, 0x2, 0xc4200d7b80, 0xc420696680)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:760 +0x2c1
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4200d7900, 0x1, 0xc4200d7900, 0xc4200d7b80)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:846 +0x30a
github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4200d7900, 0x181d990, 0x245b620)
	/go/src/github.com/openshift/cluster-machine-approver/vendor/github.com/spf13/cobra/command.go:794 +0x2b
main.main()
	/go/src/github.com/openshift/cluster-machine-approver/cmd/machine-approver/main.go:29 +0x187

I still do not understand the 127.0.0.1:6443 bit. I would expect something like ci-op-rx4tgcrl-3361d-api.origin-ci-int-aws.dev.rhcloud.com:6443.

in
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/2708/artifacts/release-e2e-aws/pods/openshift-cluster-machine-approver_machine-approver-5bb5cb965d-28ljh_machine-approver-controller_previous.log.gz

After the restart it looks like you are renewing your lease incredibly often:

I1221 23:05:37.153030       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:39.231941       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:41.440786       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:43.456531       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:45.581277       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1221 23:05:47.651659       1 leaderelection.go:209] successfully renewed lease openshift-cluster-machine-approver/cluster-machine-approver-lock
I1

Please fix that (and fix the library-go default if that's at fault). This controller doesn't even really need a lock.

Yeah,

https://github.com/openshift/library-go/blob/077f461517a8617b5bdb5088afa7b451dd612b15/pkg/config/leaderelection/leaderelection.go#L83-L85

@enj

enj commented Dec 22, 2018

Ah nvm, this runs with hostNetwork: true so the local IP makes sense. I think we may just need a custom controller command to better account for this controller's requirements.
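
One thing a custom controller command might do is tolerate a local API server that is not accepting connections yet, instead of panicking on the first "connection refused". The sketch below only illustrates that idea and is not what this PR or library-go does; `retryWithBackoff` and `errRefused` are hypothetical names, and the attempt count and delays are made-up values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRefused stands in for the startup failure seen in the panic above.
var errRefused = errors.New("dial tcp 127.0.0.1:6443: connect: connection refused")

// retryWithBackoff calls fn until it succeeds or attempts run out,
// doubling the wait between tries. It returns the number of calls made.
func retryWithBackoff(attempts int, initial time.Duration, fn func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	wait := initial
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return attempts, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Hypothetical lookup that fails twice before the API server comes up.
	lookup := func() error {
		calls++
		if calls < 3 {
			return errRefused
		}
		return nil
	}
	n, err := retryWithBackoff(5, 10*time.Millisecond, lookup)
	fmt.Println(n, err)
}
```

Whether retrying is appropriate here, rather than exiting and letting the pod restart, is a design choice for the maintainers.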
