
Bug 1928581: validate the proxy by trying oc image info #2539

Closed
wants to merge 3 commits

Conversation

QiWang19
Member

@QiWang19 QiWang19 commented Apr 20, 2021

Try to inspect an image with skopeo using the HTTP proxy config for proxy validation. If the skopeo command fails, do not render the proxy.

- What I did

Close Bug 1928581 https://bugzilla.redhat.com/show_bug.cgi?id=1928581

- How to verify it

- Description for the changelog

Add HTTP proxy validation.
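As a rough illustration of the approach (a sketch only, assuming a skopeo binary on the PATH; the proxy endpoint and image pull spec below are placeholders, not values taken from this PR):

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// validateProxy tries to inspect an image through the given proxy settings.
// If skopeo cannot reach the registry via the proxy, an error is returned and
// the caller can choose not to render the proxy configuration.
func validateProxy(httpProxy, httpsProxy, noProxy, image string) error {
	cmd := exec.Command("skopeo", "inspect", "docker://"+image)
	cmd.Env = append(os.Environ(),
		"HTTP_PROXY="+httpProxy,
		"HTTPS_PROXY="+httpsProxy,
		"NO_PROXY="+noProxy,
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("invalid http proxy: %w: %s", err, out)
	}
	return nil
}

func main() {
	// Placeholder proxy endpoint and image pull spec, for illustration only.
	err := validateProxy(
		"http://proxy.example.com:3128",
		"http://proxy.example.com:3128",
		".cluster.local",
		"quay.io/openshift-release-dev/ocp-release:4.8.0-x86_64",
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}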

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 20, 2021
@openshift-ci-robot
Contributor

@QiWang19: This pull request references Bugzilla bug 1928581, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (schoudha@redhat.com), skipping review request.

In response to this:

Bug 1928581: validate the proxy by trying image pull

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19
To complete the pull request process, please assign kikisdeliveryservice after the PR has been reviewed.
You can assign the PR to them by writing /assign @kikisdeliveryservice in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sgreene570
Contributor

sgreene570 commented Apr 21, 2021

What about proxy config changes made as a day 2 operation? Does the podman "test pull" code in this PR verify that image pulls work when modifying the cluster proxy config after installation?

@QiWang19
Member Author

/retest

@QiWang19 QiWang19 force-pushed the valid-httpproxy branch 2 times, most recently from 643480d to 9a31c9a, July 7, 2021 17:12
@QiWang19
Member Author

QiWang19 commented Jul 7, 2021

/retest

@sinnykumari
Contributor

Considering that the proxy config is a global config in the cluster, shouldn't this be fixed at the source, i.e. checking the proxy config when it gets applied/updated to the cluster? Consumers like the MCO come into the picture later on.

@QiWang19
Member Author

QiWang19 commented Jul 8, 2021

/retest

@QiWang19
Member Author

QiWang19 commented Jul 8, 2021

Considering that the proxy config is a global config in the cluster, shouldn't this be fixed at the source, i.e. checking the proxy config when it gets applied/updated to the cluster? Consumers like the MCO come into the picture later on.

@sinnykumari From the Bugzilla discussion, when applying to the cluster the proxy will be validated by the CNO (comment 17). It is not an MCO change.
Or could you give me a pointer to where a proper place to validate it would be? Something like:

func ApplyMachineConfig(client mcfgclientv1.MachineConfigsGetter, required *mcfgv1.MachineConfig) (*mcfgv1.MachineConfig, bool, error) {

@sinnykumari
Contributor

@sinnykumari From the Bugzilla discussion, when applying to the cluster the proxy will be validated by the CNO (comment 17). It is not an MCO change.

ah ok, I may have been confused then because this PR is making changes to MCO's MCC bootstrap mode. If CNO is going to validate the proxy (which makes sense to me), shouldn't this validation code be in the CNO repo?

@QiWang19
Member Author

QiWang19 commented Jul 8, 2021

If the bootstrap mode has an invalid proxy, the CNO pod will fail to launch since the CNO images cannot be pulled down.

@QiWang19
Member Author

QiWang19 commented Jul 9, 2021

@sinnykumari From the Bugzilla discussion, when applying to the cluster the proxy will be validated by the CNO (comment 17). It is not an MCO change.

ah ok, I may have been confused then because this PR is making changes to MCO's MCC bootstrap mode. If CNO is going to validate the proxy (which makes sense to me), shouldn't this validation code be in the CNO repo?

@sinnykumari PTAL. Added validation for the code paths that run after installation.

@QiWang19
Member Author

/retest

@QiWang19 QiWang19 changed the title Bug 1928581: validate the proxy by trying image pull WIP: Bug 1928581: validate the proxy by trying image pull Aug 7, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 7, 2021
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2022
Signed-off-by: Qi Wang <qiwan@redhat.com>
@QiWang19
Member Author

/retest-required

@QiWang19
Member Author

@yuqi-zhang @palonsoro Could you review? The new commit installs openshift-clients and execs the oc command to fetch the CNO image pull spec.

@palonsoro
Contributor

@QiWang19 it looks good to me. Thanks!

@@ -12,6 +12,10 @@ COPY --from=builder /go/src/github.com/openshift/machine-config-operator/instroo
RUN cd / && tar xf /tmp/instroot.tar && rm -f /tmp/instroot.tar
COPY install /manifests

RUN dnf -y update && dnf -y reinstall shadow-utils && \
dnf -y install skopeo && dnf -y install openshift-clients && \
Member

It's a bit unfortunate to add a whole new copy of skopeo and oc into the image, because we have them right there on the host too...
(And for that matter, it's actually useful to validate the proxy configuration from the host network namespace since that's where most image pulling will be happening)
Tricky to deal with without making the MCC privileged enough for host mounts though. But OTOH, the MCC really is privileged in a cluster sense anyways, so making it a privileged container (at least enough for host mounts) isn't really adding any new attack surface.

Contributor

@cgwalters I agree with the host network part, you made a great point here, that may be something worth considering.

However, regarding including the binaries, that's a usual burden we are already paying many times across many components; nothing that should surprise us, given that OCP4 consists of a number of clusteroperators which deploy a number of operands, almost all of them running inside containers.

Making the MCC deployment require access to the host, and require the host to always have these binaries available, even if not crazy in practice, goes against the spirit of having every component in a container so it is, well, self-contained.

A possible way of improvement here would be to make MCO image derive from the tools image shipped in the release instead of base, because the tools image includes the correct version of oc already. Other images that require oc already do it and benefit from image layer de-duplication with regard to the oc client.

Member

A possible way of improvement here would be to make MCO image derive from the tools image shipped in the release instead of base, because the tools image includes the correct version of oc already.

Yeah, that'd help, but doesn't get us out of also shipping skopeo, which today also vendors large parts of the container runtime again.

Hmm, do we actually need to use skopeo vs just forking oc?

Contributor

Yes, we do. oc is only used to find out which image must be checked (CNO image). Once found, the check is done with skopeo (oc doesn't provide a way to do it).

Member

Wouldn't just running e.g. oc image info be sufficient?

Contributor

mmm, as long as executing this successfully guarantees a successful pull, it would.

Member

oc today vendors the docker Go library for interacting with registries, whereas skopeo uses the github.com/containers/image bits. But ultimately...I can't imagine a case where one worked but not the other.

Today oc's fetching is kind of load-bearing because it's where we have e.g. oc image mirror etc that people use for disconnected.

I can't imagine a case where oc succeeds but skopeo (and/or podman/cri-o) would fail with respect to the proxy.

Member Author

I agree, we can just run oc image info
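For illustration, a minimal sketch of that simpler check, assuming an oc binary on the PATH and the proxy already exported in the process environment (the helper name is made up for this example):

package validation

import (
	"fmt"
	"os/exec"
)

// checkImageInfo runs `oc image info` against the CNO image pull spec.
// With HTTP_PROXY/HTTPS_PROXY/NO_PROXY set in the environment, a failure
// here suggests the registry is not reachable through the configured proxy.
func checkImageInfo(cnoImage string) error {
	out, err := exec.Command("oc", "image", "info", cnoImage).CombinedOutput()
	if err != nil {
		return fmt.Errorf("oc image info %s failed: %w: %s", cnoImage, err, out)
	}
	return nil
}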

@yuqi-zhang
Contributor

Sorry for the delay. I think I would need to check in with the team on this as a more general discussion on how to proceed.

@yuqi-zhang
Contributor

/retest-required

} else if err != nil {
return err
}
if err := ctrlcommon.ProxyValidation(&proxy.Status, clusterversionCfg.Status.Desired.Version, icspRules); err != nil {
Contributor

@jkyros jkyros Oct 20, 2022

I feel like you could talk me into putting this in a separate place for "sanity checks" before config rollout/eviction/drain, but I have concerns about the proposed location in SyncRenderConfig() -- if the proxyconfig fails this test for whatever reason, we don't get a RenderConfig, which prevents the rest of the sync functions from running, and that is problematic for general cluster stability during normal operations (among other things, it affects certificate rotations).

Specifically, in a case where the proxy was "valid" when it was configured, but is down/unreachable/etc at the time of the check, the MCO would degrade, wouldn't it?

I don't know that I have a spot picked out where it should go, because the MCO has typically not accepted preflight checks of this nature, but I'm kind of sympathetic here 😄

Member Author

@yuqi-zhang could you help locate where the MCO deploys the proxy settings to the nodes, so that the proxy validation can be done there?

Contributor

Right, so the exact details are a bit up to debate. The way it's set up in this PR right now is blocking at the operator level which is potentially dangerous for reasons John has listed, and I agree that we should probably think about moving this to a consolidated "checking" location.

So, if we want to achieve the point of "don't roll out the proxy to nodes unless it most likely works", then it likely will have to happen at the controller level, between:

  1. https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/render/render_controller.go#L546, where the rendered MC is generated for a pool
  2. https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/node/node_controller.go#L846, where the config is rolled out to the MCP

So then this would be something like validateIncomingRenderedConfig before it gets rolled out to the pool, somehow, where right now we just validate the proxy, but could be extended as a node-specific pre-flight check of some sort. We could even have a flag that enables/disables this, if we don't want to change default behaviour.

The other side of this issue is, as we move towards layering, what if I built a new format OS image with a proxy built into it somehow? This validation path would not catch that if done directly in the image.

One last extension thought on validation that's a bit more encompassing: have a (flag enabled?) option to create an extra canary node on incoming updates to see if that node would break, before upgrading the rest of the nodes. That's a bit too far though.

In summary, I think this is a pretty complex topic. Right now I think maybe the safest option is at the controller level, but how exactly that would be done is a bit up in the air

Member

The other side of this issue is, as we move towards layering, what if I built a new format OS image with a proxy built into it somehow? This validation path would not catch that if done directly in the image.

Ultimately I think what we want is ostreedev/ostree#2725 - basically, we try booting the new configuration, and roll back if kubelet isn't able to start.

Member Author

@yuqi-zhang Please review. Need help with the operator code auto-generation regarding func (f *fixture) newController() (pkg/controller/render/render_controller_test.go).

Contributor

Need help with the operator code auto-generation regarding func (f *fixture) newController() (pkg/controller/render/render_controller_test.go)

Sorry, I don't quite follow, what is the issue with the code autogen?

Signed-off-by: Qi Wang <qiwan@redhat.com>
@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19, rphillips
Once this PR has been reviewed and has the lgtm label, please assign cgwalters for approval by writing /assign @cgwalters in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -183,6 +183,12 @@ func (b *Bootstrap) Run(destDir string) error {
configs = append(configs, kconfigs...)
}

if releaseVersion, ok := cconfig.Annotations[ctrlcommon.ReleaseImageVersionAnnotationKey]; ok {
if err := ctrlcommon.ProxyValidation(cconfig.Spec.Proxy, releaseVersion, icspRules); err != nil {
Contributor

Hmm, just for my own curiosity, this will check via the bootstrap network on the bootstrap node right?

Is there a possibility that the bootstrap network is different? Does it even use the proxy you provide to the cluster?

Contributor

Bootstrap network will be the masters network in most if not all the cases.
For example, in on-prem environments where the keepalived VIP is deployed, the kube-apiserver VIP is first assigned to the bootstrap and eventually moves to one of the masters, so bootstrap and masters must be in the same subnet for that to happen.

const (
tagName = "cluster-network-operator"
imageInfo = "adm release info %s --image-for %s"
imageInfoWithICSP = "adm release info %s --image-for %s --icsp-file %s"
Contributor

Could you explain how the ICSP affects the imagespec here?

Member Author

The oc command can accept the ICSP file as an alternative source to retrieve the release image: https://github.com/openshift/oc/blob/3cdf3c29f0c109c94eb67124548a6b21fc5f6a22/pkg/cli/admin/release/info.go#L136.
If ICSPs have been configured on the cluster, I think we should pass them to oc to get the release image and the CNO pull spec.
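A sketch of what that lookup could look like, reusing the command template from the diff above; the release pull spec and ICSP file path passed by callers are placeholders:

package validation

import (
	"fmt"
	"os/exec"
	"strings"
)

const imageInfoWithICSP = "adm release info %s --image-for %s --icsp-file %s"

// cnoImageFor resolves the CNO pull spec from a release image, optionally
// honouring mirrors from an ICSP file written out from the cluster's ICSPs.
func cnoImageFor(releaseImage, icspFile string) (string, error) {
	args := strings.Fields(fmt.Sprintf(imageInfoWithICSP, releaseImage, "cluster-network-operator", icspFile))
	out, err := exec.Command("oc", args...).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("resolving CNO image failed: %w: %s", err, out)
	}
	return strings.TrimSpace(string(out)), nil
}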

@@ -301,6 +303,25 @@ func (optr *Operator) syncRenderConfig(_ *renderConfig) error {
}
}
spec.AdditionalTrustBundle = trustBundle
clusterversionCfg, err := optr.configClient.ConfigV1().ClusterVersions().Get(context.TODO(), "version", metav1.GetOptions{})
Contributor

We are looking to remove these changes right?

}
if releaseVersion, ok := cc.Annotations[ctrlcommon.ReleaseImageVersionAnnotationKey]; ok {
if err := ctrlcommon.ProxyValidation(cc.Spec.Proxy, releaseVersion, icspRules); err != nil {
return err
Contributor

So if we err out here, I think we just don't generate the rendered config, right? I feel like maybe we should still generate the rendered config, then have the node controller do the validation and fail there, so we can reference which rendered MC is failing.

Contributor

Put another way, we still would have an issue where the rendered config doesn't get generated if we do it here, I think?

Member Author

Yes, the error is returned before the render config is generated and synced.
To let the node controller do the check, we can drop the validation from render_controller, and in the node controller add validation before this line https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/node/node_controller.go#L846, something like:

cconfigs, err := ctrl.ccLister.List(labels.Everything())
if err != nil {
	return err
}
for _, cc := range cconfigs {
	// retry the proxy validation for each controller config (pseudocode)
	retry(func() error { return validation(cc) })
}

what do you think?

Contributor

I think somewhere in the sync MCP function could work.

Although, hmm, this does mean that every time we perform an update of any sort, for every node that gets synced, we re-check the proxy, and even for scenarios that don't have any changes to the proxy, we re-validate, which seems like... a lot of unnecessary work.

So in that case, maybe having it as we do now is better, but only validate on a change to the proxy between old and new?

What do you think @jkyros ? I'm leaning towards reducing the # of times we validate if there isn't a change in the proxy, just not sure where the best place to do so would be. From a logic perspective, I think maybe render controller is easier, but comes with the downside of not generating a new rendered MC.

In my view, the best place would be: after we generate the rendered MC and before we roll out to an MCP, we do a one-time check of whether the current->desired MC contains a proxy change, before allowing the node controller to roll out. That way, if there is an error, the user would see a new rendered MC, but the MCP would not start an upgrade because it is degraded on the proxy check (with a certain amount of retries). But I don't know how well that plugs into what we have now without it either 1. looking clunky or 2. adding a whole new interface of some sort to do so.
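A rough sketch of the "only validate when the proxy actually changed" idea (the helper and its caller are hypothetical; the comparison is reduced to the proxy status for brevity):

package validation

import (
	configv1 "github.com/openshift/api/config/v1"
)

// maybeValidateProxy skips the (potentially slow) external check when the
// proxy settings did not change between the old and new configuration.
func maybeValidateProxy(oldProxy, newProxy *configv1.ProxyStatus, validate func(*configv1.ProxyStatus) error) error {
	if newProxy == nil || *newProxy == (configv1.ProxyStatus{}) {
		return nil // no proxy configured, nothing to validate
	}
	if oldProxy != nil && *oldProxy == *newProxy {
		return nil // proxy unchanged, skip re-validation
	}
	return validate(newProxy)
}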

if strings.Contains(string(rawOut), proxyErr) {
return fmt.Errorf("invalid http proxy: %w: error running %s %s: %s", err, oc, strings.Join(args, " "), string(rawOut))
}
return fmt.Errorf("%w: error running %s %s: %s", err, oc, strings.Join(args, " "), string(rawOut))
Contributor

Should we have some kind of retry here, for transient failures?

I guess we always retry via re-syncing technically, maybe it would be worth adding a requeue somewhere...

The issue I'm thinking about is, let's say the network is unstable, and we happen to fail the one validation, but the proxy is otherwise valid, what is the user experience there?

Member Author

Yes, I agree. We can retry to deal with the risks.
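A minimal sketch of such a retry, using the wait helpers from k8s.io/apimachinery; the backoff values are arbitrary examples:

package validation

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// validateWithRetry retries the proxy check with exponential backoff so a
// single transient network blip does not immediately degrade the operator.
func validateWithRetry(validate func() error) error {
	backoff := wait.Backoff{Duration: 10 * time.Second, Factor: 2, Steps: 4}
	var lastErr error
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		lastErr = validate()
		return lastErr == nil, nil
	})
	if err != nil {
		return lastErr
	}
	return nil
}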

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2022

@QiWang19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-techpreview-featuregate a6bc3173636256e2df7a32c9f3691503dcaaf9f6 link /test e2e-aws-techpreview-featuregate
ci/prow/e2e-aws-single-node a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-single-node
ci/prow/e2e-vsphere-upgrade a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-vsphere-upgrade
ci/prow/e2e-aws-serial a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-serial
ci/prow/e2e-aws-workers-rhel8 a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-workers-rhel8
ci/prow/e2e-aws-upgrade-single-node a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-disruptive a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-disruptive
ci/prow/e2e-aws-workers-rhel7 a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test e2e-aws-workers-rhel7
ci/prow/okd-e2e-aws a3b414110c85473a66c4b8f1b5a72b259dd43205 link false /test okd-e2e-aws
ci/prow/4.12-upgrade-from-stable-4.11-images a3b414110c85473a66c4b8f1b5a72b259dd43205 link true /test 4.12-upgrade-from-stable-4.11-images
ci/prow/okd-scos-e2e-gcp-op 30a0c20 link false /test okd-scos-e2e-gcp-op
ci/prow/okd-scos-e2e-upgrade 30a0c20 link false /test okd-scos-e2e-upgrade
ci/prow/okd-scos-e2e-vsphere 30a0c20 link false /test okd-scos-e2e-vsphere
ci/prow/unit a49df6a link true /test unit
ci/prow/okd-scos-e2e-aws a49df6a link false /test okd-scos-e2e-aws
ci/prow/e2e-gcp-op a49df6a link true /test e2e-gcp-op
ci/prow/e2e-agnostic-upgrade a49df6a link true /test e2e-agnostic-upgrade
ci/prow/e2e-aws a49df6a link true /test e2e-aws
ci/prow/verify a49df6a link true /test verify

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@QiWang19
Member Author

Sorry, I don't quite follow, what is the issue with the code autogen?

The function signature change made for getting the ICSP: https://github.com/openshift/machine-config-operator/pull/2539/files#diff-d38c494535eacf2f0876136ce2b6a6329c78e91d238f7cb2b8f75379427747c0R80
So in the test we need to match the arguments in the call; the arguments for this function are auto-generated by informer-gen: https://github.com/QiWang19/machine-config-operator/blob/a49df6a2bcf2803f77aff5c2247d549fbdc62fff/pkg/controller/render/render_controller_test.go#L69
I have run make update but it did not regenerate them.

@yuqi-zhang
Contributor

I have run make update but it did not generate.

Hmm, it's been a long time since we last updated that test.

What happens if you just try to manually update the test function with the additional necessary items?

i.e. a f.operatorClient = fakeoperatorclient.NewSimpleClientset(f.operatorObjects...) -> oi := operatorinformer.NewSharedInformerFactory(f.operatorClient, noResyncPeriodFunc()) -> oi.Operator().V1alpha1().ImageContentSourcePolicies(), like we do for e.g. containerruntimeconfigcontroller?
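For reference, a sketch of that manual wiring (a fragment, not the real test file; the package aliases and the New(...) argument order are assumptions modelled on the containerruntimeconfig controller test):

// Assumed imports:
//   fakeoperatorclient "github.com/openshift/client-go/operator/clientset/versioned/fake"
//   operatorinformer  "github.com/openshift/client-go/operator/informers/externalversions"

// ...inside (f *fixture) newController():
f.operatorClient = fakeoperatorclient.NewSimpleClientset(f.operatorObjects...)
oi := operatorinformer.NewSharedInformerFactory(f.operatorClient, noResyncPeriodFunc())

c := New(
	// ...existing informer and client arguments stay as they are...
	oi.Operator().V1alpha1().ImageContentSourcePolicies(), // pass the new ICSP informer
)

oi.Start(stopCh)
oi.WaitForCacheSync(stopCh)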

@yuqi-zhang
Contributor

I've spent some time thinking through this general problem (validation of the configuration on the nodes), and I'd like to bring this up more for general discussion.

First, I'd like to go back to the MCO mission statement:

The MCO keeps the underlying CoreOS system up-to-date and applies configs. The team chooses:
Simplicity over sophistication
Being OPEN and Transparent over being opinionated
Verbosity and clarity over imposed safety
Empowering debugging over preventing bugs

The MCO was never designed to be, and I believe should still not be, the place where we provide imposed safety. The MCO does not check whether your configuration is correct today (other than syntax); instead it is simply the bridge between your configuration and the nodes. If we wanted to ensure configuration safety, there is simply too large of a matrix to ensure every configuration you provide is "safe", and the validation complexity will only increase as we move towards CoreOS Layering and providing image based updates. Simply put, this is a tradeoff we have made.

Side note: there is a sort of mitigation in place for "breaking updates": we roll out changes one node at a time (generally), and any singular node should always be replaceable.

Back to the point of this PR, proxy has always been a contentious issue. Fundamentally, the proxy object is not owned by the MCO. If any validation were to happen, the root object owner should be validating the object changes before they are provided to the cluster for consumption. If a user provides a broken proxy, shouldn't the change be rejected in the first place, instead of having it get all the way to the MCO generating a new config before saying: actually, your proxy changes aren't valid because the MCC container can't pull the CNO image? The CNO could have done that before we even got here, reducing the complexity of transit. The MCO would then react to it via the existing check

if proxy.Status == (configv1.ProxyStatus{}) {

and not lay down the "bad proxy".

Side note again: more broadly speaking, validation at the source doesn't cover some other cases we've run into in the past, such as important secrets/certs etc. being deleted. Some of these objects are created at install time and never "managed", and the MCO simply consumes them.

I would also like to revisit the bug for a moment: up until this point https://bugzilla.redhat.com/show_bug.cgi?id=1928581#c12 we were still discussing how to properly validate at a CNO level, but right after, we did some component switching for which there is no context in the bug. Trevor makes some good points in https://bugzilla.redhat.com/show_bug.cgi?id=1928581#c17 and then we flipped it back to node. Did we ever get a chance to discuss this at a higher level?

Now, to also look at the flip side: "the MCO does not do validation" is not a view that cannot be changed. OpenShift is constantly growing and adapting, such that if there is sufficient need to tackle a problem, I think we should consider it. As I see it, there are a few alternatives floating in mind:

  1. the MCO does validation as we see fit approach: this would be basically this PR: we add validations for issues that are seen a lot and are annoying to deal with, and we scatter it across the code. This obviously does not scale well, but can solve some immediate issues
  2. the MCO creates a new validation schema/API/controller approach: we would spend time designing and crafting a whole new method (controller?) that allows us to create extendable methods to validate configuration, with potential options to allow users to specify extra validation schema. This would probably take more design and I am not sure what's the best way to do so today
  3. the root owner takes responsibility approach, where the validation happens at a higher level before it reaches the MCO, and the MCO continues to be a consumer. This would likely also work better with layering, since the validation would happen pre-build
  4. the validate coreos layered images approach, where we create a new image validation schema specifically for the new layered image update workflow
  5. the create OCP config validator flow, where a new operator is generally responsible for watching important configuration changes

And lastly, I feel bad for this writeup, since many people have put a lot of work into this PR, but after all the back and forth I am still leaning towards "this isn't something we should do in the MCO". I am happy to discuss this further in any context, and I am willing to change my mind.

@QiWang19 QiWang19 changed the title Bug 1928581: validate the proxy by trying skopeo inspect image Bug 1928581: validate the proxy by trying oc image info Oct 26, 2022
@cgwalters
Member

cgwalters commented Oct 26, 2022

Excellent writeup, I agree with most of it. My view on this is what we really want is automated rollbacks. Basically in this scenario:

  • node boots into new config
  • we fail to contact the proxy (I don't think kubelet fails in this scenario, but we can't fetch OS updates anymore? Presumably other pod workloads on the node fail)
  • This is detected by a health check
  • Roll back to previous config
  • Error from previous state is saved and reported

A question here is whether we then try to reconcile again later. I think it'd make sense to do so, with a backoff so we only try the change again e.g. once a day at most or so?

@cgwalters
Member

But to be clear I agree in this specific instance it'd make sense to have the proxy config be validated by an owning component before it gets rolled out.

@sinnykumari
Contributor

Thank you Jerry for adding all the context and reasoning, great explanation! 100% agree with it, and will echo again that validation should be done at the source, not at the consumer level. This scales better and is less error prone, as the provider has better knowledge of what is correct.
On a similar note, we recently backed off a proposal which involved the MCO updating the infra object: openshift/enhancements#1102 (comment)

@palonsoro
Contributor

I agree that it should have been the CNO and not the MCO doing this test. Honestly, I don't understand how the bug ended up in the MCO in the first place. But if this is to be returned to the CNO, we need some higher-level coordination to make this possible.

BTW, the PR may not be the best place to discuss all of this; the bugzilla would be.

@rphillips
Contributor

I am in agreement with the latest discussion. Let's close this PR and document a procedure to test the settings. The proxy settings should be tested in a staging environment.

@QiWang19 QiWang19 closed this Oct 31, 2022
@openshift-ci
Contributor

openshift-ci bot commented Oct 31, 2022

@QiWang19: This pull request references Bugzilla bug 1928581. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.
Warning: Failed to comment on Bugzilla bug with reason for changed state.

In response to this:

Bug 1928581: validate the proxy by trying oc image info

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@palonsoro
Contributor

palonsoro commented Nov 1, 2022

I am in agreement with the latest discussion. Let's close this PR and document a procedure to test the settings. The proxy settings should be tested in a staging environment.

I agree with closing this as far as the MCO is concerned, because this should not be (and never should have been) checked by the MCO.

However, just relying on customer validation in a staging environment is not a correct approach, because mistakes will always happen. The main point of this bug was not to protect from the error itself, but from the fact that there is no sane way to recover from it once it has happened.

If this issue arises again, we should open a bug against the CNO, which is where a proper solution should be placed.

@yuqi-zhang
Contributor

Thanks everyone for the work and comments! Will try to continue tracking this in Jira so we don't lose the context.
